Today, I called a US airline because one of their automated systems called to tell me that a trip I had planned on their airline had been changed. You can only imagine my delight when I discovered that the airline in question had helpfully installed yet another voice recognition system to handle many of their call center chores.
In the interest of full disclosure, let me start by saying that I hate automated telephone systems. In fact, if I could find a stronger word than hate, I would use it. I suppose I could string together a series of words to more accurately express my feelings, but since at least twelve of those words would be expletives, I’ll leave that as a mental exercise for the reader.
There is nothing worse than calling a company to accomplish some relatively well-defined task and finding yourself ear to processor with a voice recognition telephone system (known as a “speech enabled telephone system” in the biz). Speech enabled telephone systems never work correctly. Amtrak’s, for instance, is funny if you don’t need it to do anything for you, but frustrating as hell if you need information from it. The problems with speech enabled telephone systems (SETS, hereafter) are numerous.
SETS require you to speak in a manner that is wholly unlike regular conversation. You can always tell when some other unfortunate has encountered a SETS:
- They start speaking various bits of information as incantations into the handset with no surrounding context.
- Their voice attempts to lose all inflection (since we all know how crappy SETS technology really is and how poorly it handles dialects and accents).
- Frustration immediately creeps into their voice.

I’d love to put a voice stress analyzer on someone dialing up a SETS-saddled number. I’d wager money that nearly 99% of people experience increased stress when dealing with a SETS.
One selling point of SETS technology is that it supposedly makes it more “natural” for people to get information out of a system. After all, people talk all the time, so why not let them talk to a computer to get information out of the computer?
Unfortunately, speaking to computers is not easy for people to do, primarily because computers simply are not smart enough to process human languages. The interface glue, in lazy technical terms, between the human brain and the computer processor is the English language. The English language, as any foreigner will tell you, is incredibly complex. In English, words can have multiple meanings and pronunciations depending on context. The human brain is incredibly skilled at taking a spoken sentence and deriving meaning from it without the need to first translate that sentence into text. However, computers need to translate spoken words into text that can then be lexically analyzed. From that analysis, some sort of meaning can then be derived. If context is anything but crystal clear, computers have no choice but to make a computerized version of the WAG (Wild Assed Guess).
If I say, “He’s wrapping it up right now,” humans can guess my meaning from the context in which the sentence is spoken. Computers have a much harder time. Their ability to judge context is so poor at this point that attempts to judge meaning from context are more often than not going to lead the computer astray. So, the computer is left analyzing the sentence at hand without considering what has gone before it. The computer might have a hard time deciding if I meant that he is “rapping it up” (as in rhythmic vocal speaking), or if he is “wrapping it up” (as in finishing up some discrete task), or if he is “wrapping it up” (as in surrounding an object with some sort of decorative or protective packaging).
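To make the guessing problem concrete, here is a toy Python sketch. It is not how any real speech engine works, and everything in it is made up for illustration: the `HOMOPHONES` and `SENSES` tables and the `candidate_readings` function are my own assumptions. All it shows is that a system hearing one word, with no context to lean on, is stuck with several equally plausible readings.

```python
# Toy illustration only. These tables are hypothetical and hand-written;
# a real recognizer would use acoustic and language models instead.
HOMOPHONES = {
    "rapping": ["rapping", "wrapping"],
    "wrapping": ["rapping", "wrapping"],
}
SENSES = {
    "rapping": ["performing rhythmic vocals"],
    "wrapping": ["finishing a task", "covering an object in packaging"],
}


def candidate_readings(heard_word):
    """Return every (spelling, sense) pair a context-free system must consider."""
    readings = []
    # Words with no listed sound-alikes are treated as unambiguous spellings.
    for spelling in HOMOPHONES.get(heard_word, [heard_word]):
        for sense in SENSES.get(spelling, ["(no listed sense)"]):
            readings.append((spelling, sense))
    return readings


readings = candidate_readings("wrapping")
for spelling, sense in readings:
    print(f"{spelling}: {sense}")
```

Without anything to rank those readings, picking one is exactly the WAG described above.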
English, like all languages, also overflows with idioms. If parsing meaning from a sentence without a firm grasp of context is hard, deciphering idioms can be impossible. Even humans, with their innate grasp of language and decades of practice, can be tripped up by idioms.
When I used to tutor a Korean fellow in California, he was constantly being tripped up by even the most common idioms. The guy was really smart and he was working hard to learn the language, but idioms were still far, far beyond his grasp. I noticed that even though I constantly made conscious attempts to simplify my speech around him, idioms still littered my sentences. Even really common idioms that I expected a guy in his early 20s to encounter (“What’s up?” for example) were completely outside of his experience.
How on earth can a computer that can hardly understand simple two-word responses or spoken letters hope to understand idioms? And, if the system cannot understand idioms, can it really be that speech enabled?
Of course not. People using speech enabled systems don’t have natural conversations with the system; they have stilted one-way question and answer sessions in which the humans attempt to modify their voice and inflections enough to satisfy the computer’s limited pattern recognition technologies.
Is that helping people? Is that making their lives easier? Do you really want the face of your company to be a brainless SETS that cannot understand simple sentences like “I changed my mind about going to Florida”? Is that the first impression of your call center that people should get? A system that has trouble understanding clearly spoken letters and numbers?
Today, for instance, I called the airline, and immediately gritted my teeth when I recognized that a damn SETS had been installed. The system started by asking for my frequent flyer number, which I didn’t have handy. I didn’t have that number handy because no human agent has ever started the conversation by asking for my frequent flyer number. So, right off the bat, the SETS was violating the human expectation of how a call should proceed. The process of reforming human behavior to fit machine needs had begun.
When I took, literally, less than twelve seconds to find it, the system responded to my silence by explaining that it could not find my number and asking me to please restate it. By this time, I had the number, so I spoke it into the telephone, slowly and clearly.
The system could not understand the digits I spoke into the telephone, so it prompted me to speak them again. (And how is this easier than pushing the buttons on the telephone, considering that my frequent flyer number consists entirely of digits?) So, I spoke the numbers again, even more slowly and clearly this time.
The computer gods took pity on me, found my record, and handed it to the SETS. Now, the SETS asked for my itinerary confirmation number. I spoke the six-letter code slowly and clearly. I said, “L… Q.”
The computer responded, “Did you say ‘L… G’?”
No, you lousy piece of leprous water buffalo sphincter, I did not say that. So, I said my confirmation number once again, slowly and clearly. The system then responded with another equally horribly incorrect number. I tried speaking the confirmation number three more times, and three more times the computer asked if I had spoken a completely different number.
At this point, I reverted to my usual plan of attack when faced with a SETS: get to a human by any means necessary. So, I trotted out my dictionary of possible terms to short-circuit the damn computer: “Agent. Operator. Human.” I tried the terms in sequence over and over, regardless of the computer’s prompts, until it finally said, “I think you want to speak to an agent. Is that correct?”
“Yes,” I practically screamed into the handset.
Once I reached an agent, the business end of the call took very little time.
Before I hung up with the agent, I asked her if she finds that customers are more frustrated when they reach her because of the new automated system. She laughed and said, “Actually, they are really nice because they are so relieved to finally reach a human.”
She explained that the new system was supposed to “help” the humans in the call center and to be a “tool” for them to use. By the tone in her voice (something else a computer could never hope to comprehend), I understood that she was clearly telling me how management had pitched the system to the call center staff it was designed to replace. Her tone also revealed that she didn’t believe a word of it.
The agent’s experience had been that, in general, most people did not get the information or help they needed from the SETS and that humans generally helped and understood the needs of humans much better than the computer. Of course, she is a biased source since her job would be one of the first to go if SETS technology were to actually be worth a bucket of warm chicken entrails. However, given my experience with SETS, and my general technological bent, I am inclined to believe she was telling the truth.