<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0066)http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm -->
<HTML><HEAD><TITLE>Speech Synthesis & Speech Recognition Using SAPI 5.1</TITLE>
<META content="text/html; charset=windows-1252" http-equiv=Content-Type>
<META content="MSHTML 5.00.2614.3500" name=GENERATOR></HEAD>
<BODY bgColor=lightblue><A name=Top></A><FONT
face="Verdana, Arial, Helvetica, sans-serif" size=2><IMG align=right alt=Athena
height=164
src="Speech Synthesis & Speech Recognition Using SAPI 5_1.files/Athena.gif"
width=174>
<H1>
<P align=center>Speech Synthesis & Speech Recognition Using SAPI
5.1</P></H1>
<P align=center><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#AboutBrian"><I>Brian
Long</I></A> (<A href="http://www.blong.com/"
target=_blank>http://www.blong.com/</A>)</P>
<H2>Table of Contents</H2>
<UL>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#Introduction">Introduction</A>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#TTS">Speech
Synthesis</A>
<UL>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#EnumVoices">Enumerating
Voices</A>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#Speech">Making
Your Computer Talk</A>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#Events">Voice
Events</A>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#Animation">Animating
Speech</A>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#KeepingTrack">Keeping
Track Of Spoken Text</A>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#SpeakingDialogs">Speaking
Dialogs</A> </LI></UL>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#SR">Speech
Recognition</A>
<UL>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#Grammars">Grammars</A>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#DSR">Continuous
Dictation Recognition</A>
<UL>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#GramNotify">Grammar
Notifications</A>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#EngineDialogs">Engine
Dialogs</A> </LI></UL>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#CnC">Command
and Control Recognition</A>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#Troubleshooting">Speech
Recognition Troubleshooting</A> </LI></UL>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#Deployment">SAPI
5.1 Deployment</A>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#Summary">Summary</A>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#References">References/Further
Reading</A>
<LI><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.htm#AboutBrian">About
Brian Long</A> </LI></UL>
<P><A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.zip">Click
here</A> to download the files associated with this article.</P>
<HR>
<H2><A name=Introduction>Introduction</A></H2>
<P>This article looks at adding support for speech capabilities to Microsoft
Windows applications written in Delphi, using the Microsoft Speech API version
5.1 (SAPI 5.1). For an overview on the subject of speech technology please <A
href="http://www.blong.com/Conferences/DCon2002/Speech/Speech.htm">click
here</A>.</P>
<P>There is also coverage of using SAPI 4 to build speech-enabled applications.
Information on using the SAPI 4 high level interfaces can be found by <A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI4HighLevel/SAPI4.htm">clicking
here</A>, whilst discussion of the low level interfaces can be found by <A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI4LowLevel/SAPI4.htm">clicking
here</A>.</P>
<P>SAPI 5.1 exposes most of the important interfaces, types and constants
through a registered type library (SAPI 5.0 did not do this, making it difficult
to use in Delphi without someone writing the equivalent of the JEDI import unit
for SAPI 5). This means that you can access SAPI 5.1 functionality through late
bound or early bound Automation. We will focus our attention on early bound
Automation, which requires you to import the type library.</P>
<P>Choose <FONT face="Courier New, Courier, mono">Project | Import Type
Library...</FONT> and locate the type library described as <I>Microsoft Speech
Object Library (Version 5.1)</I> in the list. Now ensure the <FONT
face="Courier New, Courier, mono">Generate Component Wrapper</FONT> checkbox is
checked so the type library import unit will include component wrapper classes
for each exposed Automation object. These components will go on the
<I>ActiveX</I> page of the Component Palette by default, but you may wish to
specify a more appropriate page, such as <I>SAPI 5.1</I>.</P>
<P>Now press <FONT face="Courier New, Courier, mono">Install...</FONT> so the
type library will be imported and the generated components will be installed
onto the Component Palette (pressing <FONT
face="Courier New, Courier, mono">Create Unit</FONT> would also generate the
type library import unit, but would require us to install it manually).</P>
<P>The generated import unit is called SpeechLib_TLB.pas and will be installed
in a package. You can either accept the default package offered (the <I>Borland
User Components</I> package), choose to open a different package or
even create a new one. When the package is compiled and installed you will get a
whopping set of 19 new components on the <I>SAPI 5.1</I> page of the Component
Palette.</P>
<P>Each component is named after the primary interface it implements. So for
example, the <FONT face="Courier New, Courier, mono">TSpVoice</FONT> component
implements the <FONT face="Courier New, Courier, mono">SpVoice</FONT> interface.
You can find abundant documentation on all these interfaces in the SAPI 5.1 SDK
documentation.</P>
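<P>The components are a convenience rather than a requirement. As a minimal
sketch of early bound Automation without them, the console program below assumes
the import unit declares the <FONT
face="Courier New, Courier, mono">CoSpVoice</FONT> creator class and the <FONT
face="Courier New, Courier, mono">ISpeechVoice</FONT> interface, as the Delphi
importer normally does for this type library. Check your own SpeechLib_TLB.pas,
since the generated declarations can differ between Delphi versions; the program
and procedure names are hypothetical.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>program</B> EarlyBoundSpeak;
{$APPTYPE CONSOLE}
<B>uses</B>
  ActiveX, SpeechLib_TLB;

<B>procedure</B> SayHello;
<B>var</B>
  Voice: ISpeechVoice;
<B>begin</B>
  // CoSpVoice.Create is assumed to be generated by the type library importer
  Voice := CoSpVoice.Create;
  // SVSFDefault makes the call synchronous
  Voice.Speak(<I>'Hello from early bound Automation'</I>, SVSFDefault)
<B>end</B>; // the interface reference is released here, before COM shuts down

<B>begin</B>
  CoInitialize(<B>nil</B>);
  <B>try</B>
    SayHello
  <B>finally</B>
    CoUninitialize
  <B>end</B>
<B>end</B>.
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>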
<P>Ready made SAPI 5.1 packages containing Automation components for Delphi 5, 6
and 7 can be found in appropriately named subdirectories under SAPI 5.1 in the
accompanying files.</P>
<P><B><U>Note:</U></B> if you are using Delphi 6 you will encounter a problem
that is still present even with Update Pack 2 installed. The type library
importer has a bug where the parameters to Automation events are incorrectly
dispatched (they are sent in reverse order) meaning that all the Automation
events operate incorrectly (if at all). You can avoid this by importing the type
library in Delphi 5 or 7 and using the generated type library import unit in
Delphi 6. A Delphi 6 compatible package is supplied with <A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.zip">this
article's files</A> (it uses a Delphi 5 generated type library import unit).</P>
<P><B><U>Note:</U></B> The Delphi 7 type library importer has been improved to
produce more accurate Pascal representations of items in the type library than
Delphi 5 did (and than Delphi 6 tried to). As a result of this, the event
handlers will often have different parameter lists in the Delphi 7 imported type
library. This means that the sample programs won't compile in Delphi 7 when
built against an import unit generated by Delphi 7 itself.</P>
<P>If you wish, you can write late bound Automation that calls <FONT
face="Courier New, Courier, mono">CreateOleObject</FONT> to instantiate the
Automation objects. In the case of the <FONT
face="Courier New, Courier, mono">SpVoice</FONT> interface, you would
execute:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>var</B>
SpVoice: Variant;
...
SpVoice := CreateOleObject(<I>'SAPI.SpVoice'</I>)
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
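<P>To put that statement in context, here is a minimal sketch of a complete late
bound console program. The program and procedure names are hypothetical. Late
bound code does not use the import unit, so the flag is passed as a literal; 0
is the value of <FONT face="Courier New, Courier, mono">SVSFDefault</FONT>.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>program</B> LateBoundSpeak;
{$APPTYPE CONSOLE}
<B>uses</B>
  ActiveX, ComObj;

<B>procedure</B> SayHello;
<B>var</B>
  SpVoice: Variant;
<B>begin</B>
  SpVoice := CreateOleObject(<I>'SAPI.SpVoice'</I>);
  // Method names are resolved at run time, so typos are not caught by the compiler
  SpVoice.Speak(<I>'Hello from late bound Automation'</I>, 0) // 0 = SVSFDefault
<B>end</B>; // the Variant releases the Automation object here

<B>begin</B>
  CoInitialize(<B>nil</B>);
  <B>try</B>
    SayHello
  <B>finally</B>
    CoUninitialize
  <B>end</B>
<B>end</B>.
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>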
<H2><A name=TTS>Speech Synthesis</A></H2>
<P>At its simplest level, all you need to do to get your program to speak is to
use a <FONT face="Courier New, Courier, mono">TSpVoice</FONT> Automation object
and call the <FONT face="Courier New, Courier, mono">Speak</FONT> method. A
trivial application that does this can be found in the TextToSpeechSimple.dpr
project in <A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.zip">the
files associated with this article</A>. The code looks like this:</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmTextToSpeech.Button1Click(Sender: TObject);
<B>begin</B>
SpVoice1.Speak(memText.Text, SVSFDefault)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>
<P>And there you have it: a speaking application. The call to <FONT
face="Courier New, Courier, mono">Speak</FONT> takes two parameters that we
should examine:</P>
<UL>
<LI>The first is the text to speak, passed as a <FONT
face="Courier New, Courier, mono">WideString</FONT>. Because the second
parameter in the code above is <FONT
face="Courier New, Courier, mono">SVSFDefault</FONT>, this call is synchronous
and so will not return until the text has been spoken.
<LI>The second parameter represents some flags that indicate how to use the
first parameter (you can combine multiple flags with the <FONT
face="Courier New, Courier, mono">or</FONT> operator). For example:<BR>
<UL>
<LI><FONT face="Courier New, Courier, mono">SVSFDefault</FONT> means the
<FONT face="Courier New, Courier, mono">Speak</FONT> method will be
synchronous
<LI><FONT face="Courier New, Courier, mono">SVSFlagsAsync</FONT> makes the
<FONT face="Courier New, Courier, mono">Speak</FONT> method asynchronous and
so it returns immediately (you can use events to find out when speech
terminates, or call the <FONT
face="Courier New, Courier, mono">WaitUntilDone</FONT> method, or call <FONT
face="Courier New, Courier, mono">SpeakCompleteEvent</FONT> to receive a
Win32 event handle, which can be passed to <FONT
face="Courier New, Courier, mono">WaitForSingleObject</FONT>).<BR>Note that
the <FONT face="Courier New, Courier, mono">Speak</FONT> method returns a
stream number. When queuing several asynchronous voice streams, the stream
number allows you to identify them; each voice event passes the stream
number to which it relates as a parameter.
<LI><FONT face="Courier New, Courier, mono">SVSFPurgeBeforeSpeak</FONT>
means any text being spoken and any text queued to speak will be immediately
cancelled.
<LI><FONT face="Courier New, Courier, mono">SVSFNLPSpeakPunc</FONT> means
punctuation marks are read out by their names, rather than being used as
punctuation (so ? is read out as <I>question mark</I>)
<LI><FONT face="Courier New, Courier, mono">SVSFIsFilename</FONT> means the
first parameter is a file name containing text to speak.
<LI><FONT face="Courier New, Courier, mono">SVSFIsXML</FONT> means the text
includes XML tags to alter attributes of the
spoken text. For example this text controls the pitch, rate, volume,
emphasis and pronunciation of the spoken text:<BR>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
&lt;EMPH&gt;Hello&lt;/EMPH&gt;
&lt;PRON SYM="d eh l f y"&gt;Delphi&lt;/PRON&gt; developers!
&lt;VOLUME LEVEL="70"&gt;
I can speak &lt;PITCH MIDDLE="+10"&gt;high&lt;/PITCH&gt; and &lt;PITCH MIDDLE="-10"&gt;low&lt;/PITCH&gt;.
I can speak &lt;RATE SPEED="+10"&gt;very quickly&lt;/RATE&gt; and &lt;RATE SPEED="-10"&gt;very slowly&lt;/RATE&gt;.
I can speak &lt;VOLUME LEVEL="40"&gt;quietly&lt;/VOLUME&gt; and &lt;VOLUME LEVEL="100"&gt;loudly&lt;/VOLUME&gt;.
&lt;/VOLUME&gt;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE></LI></UL></LI></UL>
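<P>As a sketch of how these flags combine, the handler below queues XML marked-up
text asynchronously, cancels anything already being spoken, and records the
returned stream number. It assumes the form and <FONT
face="Courier New, Courier, mono">SpVoice1</FONT> component from the simple
project; the button and handler name are hypothetical and not part of the
accompanying files.</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmTextToSpeech.btnSpeakAsyncClick(Sender: TObject);
<B>var</B>
  StreamNum: Integer;
<B>begin</B>
  // Flags are combined with or; Speak returns at once because of SVSFlagsAsync
  StreamNum := SpVoice1.Speak(
    <I>'I can speak &lt;PITCH MIDDLE="+10"&gt;high&lt;/PITCH&gt; and &lt;PITCH MIDDLE="-10"&gt;low&lt;/PITCH&gt;.'</I>,
    SVSFlagsAsync <B>or</B> SVSFPurgeBeforeSpeak <B>or</B> SVSFIsXML);
  // The stream number identifies this request in any voice events that follow
  Caption := Format(<I>'Speaking stream %d'</I>, [StreamNum]);
  // Optional: block until the speech has finished (-1 means no timeout).
  // Calling this straight away gives up the benefit of the asynchronous call,
  // so it is shown here only to illustrate the method.
  SpVoice1.WaitUntilDone(-1)
<B>end</B>;
</FONT></CODE></PRE></TD></TR></TBODY></TABLE>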
<P>When the program runs, you type some text into a memo and press the button to
have it spoken.</P>
<P align=center><IMG
src="Speech Synthesis & Speech Recognition Using SAPI 5_1.files/TextToSpeechSimple.png"></P>
<P>That's the simple example out of the way, but what can we achieve if we dig a
little deeper and get our hands a little dirtier? The next project, which
answers that question, can be found as TextToSpeech.dpr in <A
href="http://www.blong.com/Conferences/DCon2002/Speech/SAPI51/SAPI51.zip">this
article's files</A>. You can see it running in the screenshot below; notice that
as the text is spoken, the current sentence is italicised, the current word is
selected, and the phonemes spoken are written to a memo.</P>
<P align=center><IMG
src="Speech Synthesis & Speech Recognition Using SAPI 5_1.files/TextToSpeech.png"></P>
<P>The following sections describe the important parts of the code from this
project.</P>
<H3><A name=EnumVoices>Enumerating Voices</A></H3>
<P>The first thing the program does is to add a list of all the available voices
to the combobox and set the rate and volume track bar positions. The latter part
of this is trivial as the voice rate and volume are always within predetermined
ranges (the volume is in the range 0 to 100 and the rate is in the range -10 to
10).</P>
<TABLE bgColor=white border=1>
<TBODY>
<TR>
<TD><PRE><CODE><FONT color=black size=2>
<B>procedure</B> TfrmTextToSpeech.FormCreate(Sender: TObject);
<B>var</B>