.EQ
delim $$
.EN
.CH "1 WHY SPEECH OUTPUT?"
.ds RT "Why speech output?
.ds CX "Principles of computer speech
.pp
Speech is our everyday, informal, communication medium. But although we use
it a lot, we probably don't assimilate as much information through our
ears as we do through our eyes, by reading or looking at pictures and diagrams.
You go to a technical lecture to get the feel of a subject \(em the overall
arrangement of ideas and the motivation behind them \(em and fill in the details,
if you still want to know them, from a book. You probably find out more about
the news from ten minutes with a newspaper than from a ten-minute news broadcast.
So it should be emphasized from the start that speech output from computers is
not a panacea. It doesn't solve the problems of communicating with computers;
it simply enriches the possibilities for communication.
.pp
What, then, are the advantages of speech output? One good reason for listening
to a radio news broadcast instead of spending the time with a newspaper
is that you can listen while shaving, doing the housework, or driving the car.
Speech leaves hands and eyes free for other tasks.
Moreover, it is omnidirectional, and does not require a free line of sight.
Related to this is the
use of speech as a secondary medium for status reports and warning messages.
Occasional interruptions by voice do not interfere with other activities,
unless they demand unusual concentration, and people can assimilate spoken messages
and queue them for later action quite easily and naturally.
.pp
The second key feature of speech communication stems from the telephone.
It is the universality of the telephone receiver itself that is important
here, rather than the existence of a world-wide distribution network;
for with special equipment (a modem and a VDU) one does not need speech to take advantage of
the telephone network for information transfer.
But speech needs no tools other than the telephone, and this gives
it a substantial advantage.
You can go into a phone booth anywhere in the world,
carrying no special equipment, and have access to your computer within seconds.
The problem of data input is still there: perhaps your computer
system has a limited word recognizer, or you use the touchtone telephone
keypad (or a portable calculator-sized tone generator). Easy remote access
without special equipment is a great, and unique, asset to speech communication.
.pp
The third big advantage of speech output is that it is potentially very cheap.
Being all-electronic, except for the loudspeaker, speech systems are well
suited to high-volume, low-cost, LSI manufacture. Other computer output
devices are at present tied either to mechanical moving parts or to the CRT.
This was realized quickly by the computer hobbies market, where speech output
peripherals have been selling like hot cakes since the mid 1970's.
.pp
A further point in favour of speech is that it is natural-seeming and
somehow cuddly when compared with printers or VDU's. It would have been much
more difficult to make this point before the advent of talking toys like
Texas Instruments' "Speak 'n Spell" in 1978, but now it is an accepted fact that friendly
computer-based gadgets can speak \(em there are talking pocket-watches
that really do "tell" the time, talking microwave ovens, talking pinball machines, and,
of course, talking calculators.
It is, however, difficult to assess whether the appeal stems from
mechanical speech's novelty \(em it
is still a gimmick \(em and also to what extent it is tied up with
economic factors.
After all, most of the population don't use high-quality VDU's, and their major
experience of real-time interactive computing is through the very limited displays
and keypads provided on video games and teletext systems.
.pp
Articles on speech communication with computers often list many more advantages of voice output
(see Hill 1971, Turn 1974, Lea 1980).
.[
Hill 1971 Man-machine interaction using speech
.]
.[
Lea 1980
.]
.[
Turn 1974 Speech as a man-computer
communication channel
.]
For example, speech
.LB
.NP
can be used in the dark
.NP
can be varied from a (confidential) whisper to a (loud) shout
.NP
requires very little energy
.NP
is not appreciably affected by weightlessness or vibration.
.LE
However, these either derive from the three advantages we have discussed above,
or relate
mainly to exotic applications in space modules and divers' helmets.
.pp
Useful as it is at present, speech output would be even more attractive if it could
be coupled with speech input. In many ways, speech input is its "big brother".
Many of the benefits of speech output are even more striking for speech input.
Although people can assimilate information faster through the eyes than the
ears, the majority of us can generate information faster with the mouth than
with the hands. Rapid typing is a relatively uncommon skill, and even high
typing rates are much slower than speaking rates (although whether we can
originate ideas quickly enough to keep up with fast speech is another matter!) To
take full advantage of the telephone for interaction with machines, machine
recognition of speech is obviously necessary. A microwave oven, calculator,
pinball machine, or alarm clock that responds to spoken commands is certainly
more attractive than one that just generates spoken status messages. A book
that told you how to recognize speech by machine would undoubtedly be more
useful than one like this that just discusses how to synthesize it! But the
technology of speech recognition is nowhere near as advanced as that of
synthesis \(em it's a much more difficult problem. However, because speech input
is obviously complementary to speech output, and even very limited input
capabilities will greatly enhance many speech output systems, it is worth
summarizing the present state of the art of speech recognition.
.pp
Commercial speech recognizers do exist.
Almost invariably, they accept
words spoken in isolation, with gaps of silence between them, rather than
connected utterances.
It is not difficult to discriminate with high accuracy up to a hundred
different words spoken by the same speaker, especially if the vocabulary
is carefully selected to avoid words which sound similar. If several
different speakers are to be comprehended, performance can be greatly improved
if the machine is given an opportunity to calibrate their voices in a training
session, and is informed at recognition time which one is to speak.
With a large population of unknown speakers, accurate recognition is difficult
for vocabularies of more than a few carefully-chosen words.
.pp
A half-way house between isolated word discrimination and recognition of connected
speech is the problem of spotting known words in continuous speech. This
allows much more natural input, if the dialogue is structured as keywords
which may be
interspersed with unimportant "noise words". To speak in truly isolated
words requires a great deal of self-discipline and concentration \(em it is
surprising how much of ordinary speech is accounted for by vague sounds
like um's and aah's, and false starts. Word spotting disregards these and so
permits a more relaxed style of speech. Some progress has been made on it in
research laboratories, but the vocabularies that can be accommodated are still
very small.
.pp
The difficulty of recognizing connected speech depends crucially on what is
known in advance about the dialogue: its pragmatic, semantic, and syntactic
constraints. Highly structured dialogues constrain very heavily the choice of
the next word.
Recognizers which can deal with vocabularies of over 1000 words
have been built in research laboratories, but the structure of the input has
been such that the average "branching factor" \(em the size of the set out of
which the next word must be selected \(em is only around 10 (Lea, 1980).
.[
Lea 1980
.]
Whether such
highly constrained languages would be acceptable in many practical applications
is a moot point. One commercial recognizer, developed in 1978, can cope with
up to five words spoken continuously from a basic 120-word vocabulary.
.pp
There has been much debate about whether it will ever be possible for a speech
recognizer to step outside rigid constraints imposed on the utterances it can
understand, and act, say, as an automatic dictation machine. Certainly the most
advanced recognizers to date depend very strongly on a tight context being
available. Informed opinion seems to accept that in ten years' time,
voice data entry in the office will be an important and economically feasible
prospect, but that it would be rash to predict the appearance of unconstrained
automatic dictation by then.
.pp
Let's return now to speech output and take a look at some systems which use it,
to illustrate the advantages and disadvantages of speech in practical
applications.
.sh "1.1 Talking calculator"
.pp
Figure 1.1 shows a calculator that speaks.
.FC "Figure 1.1"
Whenever a key is pressed,
the device confirms the action by saying the key's name.
The result of any computation is also spoken aloud.
For most people, the addition of speech output to a calculator is simply a
gimmick.
(Note incidentally that speech
.ul
input
is a different matter altogether. The ability to dictate lists of numbers and
commands to a calculator, without lifting one's eyes from the page, would have
very great advantages over keypad input.)
Used-car
salesmen find that speech output sometimes helps to clinch a deal: they key in
the basic car price and their bargain-basement deductions, and the customer is so
bemused by the resulting price being spoken aloud to him by a machine that he
signs the cheque without thinking! More seriously, there may be some small
advantage to be gained when keying a list of figures by touch from having their
values read back for confirmation. For blind people, however, such devices
are a boon \(em and there are many other applications, like talking elevators
and talking clocks, which benefit from even very restricted voice output.
Much more sophisticated is a typewriter with audio feedback, designed by
IBM for the blind. Although blind typists can remember where the keys on a
typewriter are without difficulty, they rely on sighted proof-readers to help
check
their work. This device could make them more useful as office typists and
secretaries. As well as verbalizing the material (including punctuation)
that has been typed, either by attempting to pronounce the words or by spelling
them out as individual letters, it prompts the user through the more complex action sequences
that are possible on the typewriter.
.pp
The vocabulary of the talking calculator comprises the 24 words of Table 1.1.
.RF
.nr x1 2.0i+\w'percent'u
.nr x1 (\n(.l-\n(x1)/2
.in \n(x1u
.ta 2.0i
zero	percent
one	low
two	over
three	root
four	em (m)
five	times
six	point
seven	overflow
eight	minus
nine	plus
times-minus	clear
equals	swap
.ta 0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i
.in 0
.FG "Table 1.1 Vocabulary of a talking calculator"
This represents a total of about 13 seconds of speech.
It is stored
electronically in read-only memory (ROM), and Figure 1.2 shows the circuitry
of the speech module inside the calculator.
.FC "Figure 1.2"
There are three large integrated circuits.
Two of them are ROMs, and the other is a special synthesis chip which decodes the
highly compressed stored data into an audio waveform.
Although the mechanisms used for storing speech by commercial devices are
not widely advertised by the manufacturers, the talking calculator almost
certainly uses linear predictive coding \(em a technique that we will examine
in Chapter 6.
The speech quality is very poor because of the highly compressed storage, and
words are spoken in a grating monotone.
However, because of the very small vocabulary, the quality is certainly good
enough for reliable identification.
.sh "1.2 Computer-generated wiring instructions"
.pp
I mentioned earlier that one big advantage of speech over visual output is that
it leaves the eyes free for other tasks.
When wiring telephone equipment during manufacture, the operator needs to use
his hands as well as eyes to keep his place in the task.
For some time tape-recorded instructions have been used for this in certain
manufacturing plants. For example, the instruction
.LB
.NI
Red 2.5 11A terminal strip 7A tube socket
.LE
directs the operator to cut 2.5" of red wire, attach one end to a specified point
on the terminal strip, and attach the other to a pin of the tube socket. The
tape recorder is fitted with a pedal switch to allow a sequence of such instructions
to be executed by the operator at his own pace.
.pp
The usual way of recording the instruction tape is to have a human reader
dictate the instructions from a printed list.
The tape is then checked against the list by another listener to ensure that
the instructions are correct.
Since wiring lists are usually stored and
maintained in machine-readable form, it is natural to consider whether speech
synthesis techniques could be used to generate the acoustic tape directly by
a computer (Flanagan
.ul
et al,
1972).
.[
Flanagan Rabiner Schafer Denman 1972
.]
.pp
Table 1.2 shows the vocabulary needed for this application.
.RF
.nr x1 2.0i+2.0i+\w'tube socket'u
.nr x1 (\n(.l-\n(x1)/2
.in \n(x1u
.ta 2.0i +2.0i
A	green	seventeen
black	left	six
bottom	lower	sixteen
break	make	strip
C	nine	ten
capacitor	nineteen	terminal
eight	one	thirteen
eighteen	P	thirty
eleven	point	three
fifteen	R	top
fifty	red	tube socket
five	repeat coil	twelve
forty	resistor	twenty
four	right	two
fourteen	seven	upper
.ta 0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i +0.8i
.in 0
.FG "Table 1.2 Vocabulary needed for computer-generated wiring instructions"
It is rather larger
than that of the talking calculator \(em about 25 seconds of speech \(em but well
within the limits of single-chip storage in ROM, compressed by the linear
predictive technique. However, at the time that the scheme was investigated
(1970\-71) the method of linear predictive coding had not been fully developed,
and the technology for low-cost microcircuit implementation was not available.
But this is not important for this particular application, for there is
no need to perform the synthesis on a miniature low-cost computer system,
nor need it
be accomplished in real time. In fact a technique of concatenating
spectrally-encoded words was used (described in Chapter 7), and it was
implemented on a minicomputer. Operating much slower than real-time, the system
calculated the speech waveform and wrote it to disk storage. A subsequent phase
read the pre-computed messages and recorded them on a computer-controlled analogue
tape recorder.
.pp
Informal evaluation showed the scheme to be quite successful.
Indeed, the
synthetic speech, whose quality was not high, was actually preferred to
natural speech in the noisy environment of the production line, for each