Computers That Deserve a Good Talking To
Wei Hung-chin / photos Diago Chiu / tr. by Phil Newell
March 1993
The more different voice characteristics you plug into the computer, the more accurately it will be able to differentiate words in the future. (drawing by Lee Su-ling)
Feeling depressed by the complicated Chinese language system for computers? Wouldn't you just like to smash that bewildering keyboard? In fact, spoken data entry systems have already appeared to replace traditional keyboard systems. Perhaps in the future when using a computer you won't even have to use your hands; instead your every wish will be its command.
Still, things aren't quite so perfect yet. Computers that can currently do voice entry are limited to simple terms; it's going to take some time before one is ready to digest an extended discourse.
If you pay attention, you'll note that many recent information exhibitions have had displays of computers that "follow spoken commands." You say something to them, and the computer turns what you say into printed words on the screen. Voice entry lets the human brain and the electronic brain communicate without a keyboard, and lets those who have long kept their distance from computers, for fear of having to learn the software, drop their "guarded" attitude.

In researching computer voice input, analysis of voice frequency is a crucial step.
Freudian slips:
But don't be ecstatic too soon. If you think that in the near future computers will "do whatever you say," you've gotten a bit too idealistic. At one of the exhibition booths, the speaker is excitedly telling the audience how easy their computer voice entry system is, and often asks someone from the crowd to say a simple word to test the machine's functions. The person who steps up has a slight lisp, it seems. He says "Government Information Office" (hsin-wen-chu) but the computer screen shows "excitement stimulant" (hsing-fen-chi). The display host feels that perhaps the Mandarin of this particular gent is less than perfect, so he tries himself. He repeats the three sounds hsin-wen-chu in his most staid tone of voice, and the computer answers this time with hsing-wan-chu: "sex toys"! The embarrassed salesman can only haltingly try to get himself out of the mess by saying, "the computer is playing a joke."

Voice computers are still rather stupid. Don't speak carelessly or it won't have any idea what you're talking about. (drawing by Lee Su-ling)
Still immature:
Don't think the above anecdote is going too far, because voice input still has a long way to go.
Wang Hsiao-chuan, a professor in the Electrical Engineering Department of National Tsing Hua University, who began to research a Chinese language voice entry system in 1984, says that thus far his research has yielded the following: If it's simply recognition of basic proper nouns, like Taiwan place names, road names, famous buildings, and so on, the computer's accuracy is 96%. But distinguishing syllables is more complex; if you just want to talk about distinguishing tones of voice, by 1987, computers had already reached accuracy of 98% in distinguishing among the four tones in Mandarin Chinese. But if you want the computer to show both the correct pronunciation and the tone of every syllabic sound in Chinese, then the rate of accuracy is only 90%.
And even if it gets the sound and tone correct, things don't look good for actually getting the correct character. This is because Chinese has too many characters that share identical pronunciations. Worse, the errors compound: if a sentence has ten characters and each syllable is recognized with 90% accuracy, the chance of getting the whole sentence correct, even just for pronunciation, is only 0.90 raised to the tenth power, or about 35%. This is the reason why "Government Information Office" becomes "sex toys."
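The 35% figure can be checked with a few lines of Python. This is only a sketch of the arithmetic, and it assumes that a mistake on one syllable is independent of mistakes on the others; the function name sentence_accuracy is made up for this illustration.

```python
def sentence_accuracy(per_syllable_accuracy, syllables):
    """Chance that every syllable in a sentence is recognized correctly.

    Assumes recognition errors on different syllables are independent,
    so the per-syllable accuracies simply multiply.
    """
    return per_syllable_accuracy ** syllables


# Ten characters at 90% accuracy each, as in the article.
print(round(sentence_accuracy(0.90, 10), 3))  # 0.349
```

Under the same assumption, a twenty-character sentence would come out right only about 12% of the time, which helps explain why the systems on the market stick to short, simple terms.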

One microphone can replace a complex keyboard. Voice entry is really appealing.
Keep it simple:
Although voice entry software is already on the market, computers still don't mix well with people. These types of software are developed for special purposes only, as for inquiries about Taipei streets or a bank account. Their special feature is that the data being sought is fixed and of a limited scope. If all the needed data is already entered into the computer, the chances of misrecognition are small, with an accuracy rate of 98%.
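Why a fixed, limited vocabulary helps can be sketched in Python. This is not how any of the products mentioned here actually works: real recognizers compare sound signals, while this toy compares romanized spellings using the standard library's difflib. The vocabulary list and the function name snap_to_vocabulary are assumptions made up for the illustration.

```python
from difflib import get_close_matches

# Hypothetical closed vocabulary in romanized form. A real system would hold
# every street name or account command it is expected to hear.
VOCABULARY = ["hsin-wen-chu", "chung-shan-pei-lu", "jen-ai-lu"]


def snap_to_vocabulary(heard, vocabulary=VOCABULARY, cutoff=0.6):
    """Return the closest known entry, or None if nothing is similar enough."""
    matches = get_close_matches(heard, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# A slightly mis-heard "hsing-wen-chu" still lands on the only entry near it.
print(snap_to_vocabulary("hsing-wen-chu"))  # hsin-wen-chu
```

With only a handful of permitted answers, a small slip in what the machine hears rarely matters, because nothing else in the list is close. Open the vocabulary to the whole language and the same slip has many plausible neighbors to land on.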
But if you want recognition of a whole sentence given current technology, that would be extremely difficult. You say, "I love you," and the computer is likely to understand, and can accurately transmit your sentiment. But if you say "I'll love you until the end of time, until the seas dry up and the mountains crumble . . . ." a human brain will find this useful, while a computer would just respond "words not recognized."
Looking back over the development of voice entry systems, specialists in every nation have in fact invested a great deal of effort. Yet the results have not been commensurate. In the 1970s the U.S. Department of Defense began researching voice entry computer systems, but after twenty years the computer still doesn't completely understand. Although IBM, AT&T, and other giants have developed some rather user-friendly experimental systems, these need a huge computer capacity, so not just any machine can handle them, and they cannot be immediately commercialized.

Voice entry is very helpful to busy assembly line workers, and some mechanical actions can be performed entirely by machines following simple voice commands.
Chinese is tougher:
If a Chinese language voice entry system can be successfully developed, the market potential is indeed great. Taiwan has long been a center of personal computer manufacturing, but the degree of information technology in use here is still not particularly high. Many people keep a respectful distance from computers. A major reason may very well be that Chinese is not an alphabetic language like European languages or Japanese. People in these countries are used to typing from a young age, and there are few problems with using a computer keyboard. But for Chinese, whether it be the system of entering each character according to its various radicals and components, using phonetic symbols, or using romanization, all of them require that a person make the effort to study hard in order to get familiar with the keyboard and software. Thus, either script (handwritten) or voice data entry would spark a rapid spread of information technology for Chinese; and of the two the highest expectations are reserved for the more convenient spoken system.
Professor Li Lin-shan of the Electrical Engineering Department at National Taiwan University, one of the few people studying voice entry methods in Taiwan, says that if a voice entry system can overcome the technical obstacles, this would be one of the most suitable data entry systems for Chinese speakers. Although voice entry is a key to breaking through obstacles to computer use for Chinese, "a Chinese voice entry system is a topic that is not popular in either academia or the business world," says Li Lin-shan. Because of language difficulties, Chinese publications on voice entry systems cannot find their way into international scholarly journals, which reduces the interest of those involved. In terms of business, the main market for Taiwan computer products right now is the U.S. and Europe. The local market is too small, and incomes are still too low in mainland China. Thus the market is just not ready, and consequently willingness to invest in this area is very low.

A voice entry system needs to fit on such a small PC board, so that in the future it can hopefully replace keyboards.
Keep struggling, comrades:
Not only is there little research on voice entry systems in Taiwan, it got a late start to begin with. Even Wang Hsiao-chuan, who got into the field early on, beginning in 1984, has been doing it less than nine years. He admits that at first he had a great deal of ambition in exploring this virgin territory, but in the last nine years, "sometimes I really feel there's no sense of accomplishment." He says there are still many areas that require breakthroughs.
Professor Wang points out that the biggest problem in voice entry is that computers can't comprehend human thought processes, so cannot use human language to talk to us. As a result, if you wanted to come up with a clever product like those in science fiction movies, it would require the help of artificial intelligence. But the study of artificial intelligence is still struggling.
Academics want the research to come out perfect, but companies have already rushed out products, even though these products can only handle relatively simple information. "For something like simple words, you just need to have all the data already inside, then the results are usually satisfactory," says Liu Li-cheng, manager of Voice Processing Development at Taicom Data Systems, which is in the process of promoting a voice entry system in the market. The voice system requires that data on the special features of the user's voice first be fed in, and the more people who enter their voice data in the memory bank, the higher the computer's "listening comprehension." Further, if the information to be distinguished is relatively simple, like street names, then the accuracy rate is very high.

Successful development of a voice entry system would be a boon to those who have vision problems. You just have to talk to the computer and it will do whatever you say.
A handy thing to have around:
Liu Li-cheng points out that overseas many doctors use voice entry systems to prepare prescriptions. In busy factories, where workers often don't have a free hand to spare, simple commands can be used to get the computer to operate. Simple bank telephone voice systems allow customers to inquire about their accounts. Successful development of voice entry systems would make life much more convenient for everyone.
But Liu confesses that given the current level of technology, their sales channels are limited. "So far we've only sold 100-200 sets of voice entry technology, with most of the customers being research units," he says. This shows that most of the people who buy this software do so not to use it but to study it.
Although the market is still limited, Liu is still very confident about the future of voice entry systems. Once they are successfully completed, the Chinese language market around the world could be calculated "in the hundreds of millions."
Lack of a database:
Although virtually every country in the world has people investing their energy in devising a computer voice entry system, there is little exchange among them. "The grammar, pronunciation, and even thought processes differ from nation to nation, so there is little of reference value to share," says Wang Hsiao-chuan. Voice system researchers in fact are rather isolated internationally.
As for which language is relatively at an advantage in computer voice entry, Wang Hsiao-chuan replies, "it's hard to say." He indicates that the advantage of English is that it has no tones, but it also has the problem of too many words and sounds. You have to put more than 10,000 words into the computer before you can use it, and the number of words is increasing all the time. Although the Chinese syllabary only adds up to 1,000-plus common sounds, much more pared down than English, it has those four tones which are not easy for the computers to recognize. He says, "Anyway, if the computer can't understand human thought processes, there is a bottleneck in voice entry no matter what the language."
Further, domestic research into voice entry does not necessarily face only technical problems. The biggest problem is the lack of a "language data base." Because computers are quite "stupid," they need people to input all the data they will require, so it is vitally important to provide the computer with a complete database.
"Love" you, not "leave" you:
For example, computers do not hear with ears as people do; they decide what someone is saying by analyzing tone, frequency, and other variables. When a genteel-voiced young lady whispers "I love you," and an 80-year-old granny without a tooth in her head says the same thing, you and I can tell the words are identical, but if the older woman's voice data is lacking, the computer might hear "I leave you." It's only a slight difference in verbiage, but polar opposites in meaning. To get the computer to get "love" and "leave" straight, you have to tell it more about how to say "love." The way the young girl, the elderly man, the young Romeo, and the granny each say "love" must all be entered into the computer. And the more "love" the computer receives, the more it understands what all these different pronunciations of "love" have in common. If you can speak Mandarin Chinese, whether it be "Taiwan Mandarin," "Chekiang Mandarin," or even Mandarin with a Sinkiang accent, the computer will become accustomed to the user's inflections. And the computer can still accept "love" from someone speaking perfect standard Mandarin. Here the computer is ahead of people, for it has no prejudices about people's provincial accents and origins, and in the future if you give it "love" the chance that it will understand correctly will be very high.
Perfect lover:
But in terms of domestic language data bases, only the Institute of Electronic Information Systems of the Ministry of Transportation and Communications has such a data base. And even this only has the voice data for Chinese from about 100 people. Similar operations overseas require the voice data from at least 1,000 people before they can really be useful.
"If the data bank is comprehensive, with the help of a grammar and a complete electronic dictionary, computer voice entry systems should have a real future," concludes Professor Wang. Already several scholars are cooperating to establish a voice data bank that is practicable and publicly trusted.
Although research into voice entry is advancing step by step, there are still many bottlenecks. But don't feel too deflated. In fact, the sense of anticipation of the birth of a computer that will follow your every wish is somewhat romantic. Just imagine: A computer that tamely obeys in everything, just like a perfect lover . . . .