CodeProject: audio_ostream - A Text-to-Speech ostream
<LI>A simple example of how to use <A
href="http://www.boost.org/libs/iostreams/doc/index.html">boost::iostreams</A>
</LI></UL>
<H2>Background</H2>
<P>I recently had to add audio output to a program (running on
Windows).</P>
<P><A
href="http://www.microsoft.com/downloads/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530&DisplayLang=en">Microsoft's
SAPI SDK</A> provides a COM interface through which wide character strings
can be spoken via SAPI's TTS engine. The Code Project has many articles
explaining how to use SAPI to varying degrees of complexity. So why
another?</P>
<P>Well, there were some additional features that I wanted that did not
exist in those articles.</P>
<OL>
<LI>Little or no COM hassle. Ideally, it should work within even the
simplest console application.
<LI>Full (transparent) support for types other than wide-char. e.g.
<CODE><SPAN class=code-keyword>char</SPAN>*</CODE>, <CODE>std::<SPAN
class=code-SDKkeyword>string</SPAN></CODE>s and even <CODE><SPAN
class=code-keyword>int</SPAN></CODE>s, <CODE><SPAN
class=code-keyword>float</SPAN></CODE>s, etc.
<LI>Intuitive (or at least familiar) syntax </LI></OL>
<P>To achieve these goals I developed <CODE>audio_ostream</CODE>.</P>
<P><CODE>audio_ostream</CODE> is a full-fledged <CODE>std::ostream</CODE>
which supports any type that has an <CODE><SPAN
class=code-keyword>operator</SPAN><SPAN
class=code-keyword><</SPAN><SPAN
class=code-keyword><</SPAN>()</CODE>.</P>
<P>You can have as many <CODE>audio_ostream</CODE>s as you like all
working in parallel.</P>
<P>To handle COM issues, I used the wonderful COMSTL library which takes
care of all the delicate and brittle COMplications, such as
(un-)initialization, resource (de-)allocation, reference counting etc.</P>
<P><CODE>boost::iostreams</CODE> is used to provide the full
<CODE>std::ostream</CODE> support with very little effort writing
boilerplate code.</P>
<P>Since both <CODE>boost::iostreams</CODE> and COMSTL are header-only
libraries, I decided to make my class header-only too. The minor price of
this decision is that the SAPI headers are included in any file that
uses <CODE>audio_ostream</CODE>.</P>
<H2>Using the code</H2>
<P>Using the code could hardly be easier:</P><PRE lang=c++><SPAN class=code-preprocessor>#include</SPAN><SPAN class=code-preprocessor> <SPAN class=code-string>"</SPAN><SPAN class=code-string>audiostream.hpp"</SPAN>
</SPAN><SPAN class=code-keyword>using</SPAN> <SPAN class=code-keyword>namespace</SPAN> std;
<SPAN class=code-keyword>using</SPAN> <SPAN class=code-keyword>namespace</SPAN> audiostream;
<SPAN class=code-keyword>int</SPAN> main()
{
audio_ostream aout;
aout <SPAN class=code-keyword><</SPAN><SPAN class=code-keyword><</SPAN> <SPAN class=code-string>"</SPAN><SPAN class=code-string>Hello World!"</SPAN> <SPAN class=code-keyword><</SPAN><SPAN class=code-keyword><</SPAN> endl;
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> some more code...
</SPAN> <SPAN class=code-keyword>return</SPAN> <SPAN class=code-digit>0</SPAN>;
}</PRE>
<P>This little program will, obviously, say "Hello World!".</P>
<P>The audio stream is asynchronous, so the program continues running
while the text is being spoken (that is why the line <CODE><SPAN
class=code-comment>//</SPAN><SPAN class=code-comment> some more
code...</SPAN></CODE> is there: the program has to stay alive long enough
for the speech to finish). This is conceptually similar to how a
<CODE>std::ostream</CODE> buffers its output and only displays the text
once the internal buffer is full or flushed.</P>
<P>To use the class:</P>
<OL>
<LI><CODE><SPAN class=code-preprocessor>#include</SPAN><SPAN
class=code-preprocessor></SPAN></CODE> the <CODE>audiostream.hpp</CODE>
header file.
<LI>Create an instance of <CODE>audio_ostream</CODE> (or
<CODE>waudio_ostream</CODE>)
<LI>Use the stream as you would any <CODE>std::ostream</CODE>. </LI></OL>
<P>That's really all you need to do to start using the class.</P>
<H2>Pre-Requisites</H2>
<P>For the code to compile and run you will need 3 libraries:</P>
<OL>
<LI>For the TTS engine, you will need to install the <A
href="http://www.microsoft.com/downloads/details.aspx?FamilyID=5e86ec97-40a7-453f-b0ee-6583171b4530&DisplayLang=en">Microsoft
Speech SDK</A> (I used ver. 5.1).
<LI>For COMSTL you will need the <A
href="http://synesis.com.au/software/stlsoft/">STLSoft libraries</A>
(you'll need STLSoft version 1.9.1 beta 44, or later).
<LI>The <A href="http://boost.org/">Boost</A> Iostreams library. You can
download Boost <A
href="http://sourceforge.net/project/showfiles.php?group_id=7586">here</A>.
</LI></OL>
<P>Set your compiler and linker paths accordingly (Boost and STLSoft are
header-only).</P>
<H2>Advanced Usage</H2>
<P>It's possible to change the voice gender, speed, language and many more
parameters of the voice using the SAPI text-to-speech (TTS) XML tags.</P>
<P>Just insert the relevant XML tags into the stream to effect the change. The
complete list of possible XML tags can be found <A
href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/Whitepapers/WP_XML_TTS_Tutorial.asp">here</A>.</P>
<P>For example:</P><PRE lang=xml>audio_ostream aout;
// Select a male voice.
aout &lt;&lt; "&lt;voice required='Gender=Male'&gt;Hello World!" &lt;&lt; endl;
aout &lt;&lt; "Five hundred milliseconds of silence" &lt;&lt; flush &lt;&lt;
        "&lt;silence msec='500'/&gt; just occurred." &lt;&lt; endl;
</PRE>
<P>For some reason, the XML tags must be the first items in the SAPI
spoken string, without any preceding text. <CODE>flush</CODE>ing the
stream before the tag, as in the example, facilitates this.</P>
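<P>As a rough illustration of why the <CODE>flush</CODE> matters, the sketch below replays the example against a stand-in stream buffer that records each flushed chunk instead of speaking it. It uses only the standard library; <CODE>recording_buf</CODE> and <CODE>record_silence_example</CODE> are hypothetical names, and the assumption is that each flush of the real stream ends up as one spoken string.</P>

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Stand-in for the audio sink (an assumption for illustration): instead of
// calling Speak(), it records each flushed chunk so the boundaries are visible.
class recording_buf : public std::stringbuf
{
public:
    std::vector<std::string> chunks;

protected:
    // Called by flush and endl: hand the buffered text over as one chunk.
    int sync() override
    {
        std::string text = str();
        if (!text.empty())
        {
            chunks.push_back(text);
            str(std::string()); // reset the buffer for the next chunk
        }
        return 0;
    }
};

// Replays the article's example against the recording buffer.
inline std::vector<std::string> record_silence_example()
{
    recording_buf buf;
    std::ostream out(&buf);
    out << "Five hundred milliseconds of silence" << std::flush
        << "<silence msec='500'/> just occurred." << std::endl;
    return buf.chunks;
}
```

<P>The text before the <CODE>flush</CODE> becomes one chunk, and the second chunk starts with the <CODE>&lt;silence&gt;</CODE> tag, which is the position the tag needs to be in.</P>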
<P>You can also call <CODE>setRate()</CODE> with values in the range
[-10, 10] to control the speed of the speech.</P>
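<P>If the rate comes from user input, it may be worth limiting it to that range first. The helper below is a hypothetical addition, not part of <CODE>audiostream.hpp</CODE>; it is only a small sketch of the idea.</P>

```cpp
#include <algorithm>

// Hypothetical helper (not part of the article's code): limit a requested
// rate to the [-10, 10] range before passing it to setRate().
inline long clamp_rate(long requested)
{
    return std::max(-10L, std::min(10L, requested));
}
```

<P>It would be used as <CODE>aout.setRate(clamp_rate(value));</CODE>, where <CODE>value</CODE> is whatever rate the caller asked for.</P>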
<H2>The Magic</H2>
<H3>The Core Class</H3>
<P>The heart of the code is the <CODE>audio_sink</CODE> class:</P><PRE lang=c++><SPAN class=code-keyword>template</SPAN> <SPAN class=code-keyword><</SPAN> <SPAN class=code-keyword>class</SPAN> SinkType <SPAN class=code-keyword>></SPAN>
<SPAN class=code-keyword>class</SPAN> audio_sink: <SPAN class=code-keyword>public</SPAN> SinkType
{
<SPAN class=code-keyword>public</SPAN>:
audio_sink()
{
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Initialize the COM libraries
</SPAN> <SPAN class=code-keyword>static</SPAN> comstl::com_initializer coinit;
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Get SAPI Speech COM object
</SPAN> HRESULT hr;
<SPAN class=code-keyword>if</SPAN>(FAILED(hr = comstl::co_create_instance(CLSID_SpVoice, _pVoice)))
<SPAN class=code-keyword>throw</SPAN> comstl::com_exception(
<SPAN class=code-string>"</SPAN><SPAN class=code-string>Failed to create SpVoice COM instance"</SPAN>,hr);
}
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> speak a character string
</SPAN> std::streamsize write(<SPAN class=code-keyword>const</SPAN> <SPAN class=code-keyword>char</SPAN>* s, std::streamsize n)
{
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> make a null terminated string.
</SPAN> std::<SPAN class=code-SDKkeyword>string</SPAN> str(s,n);
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> convert to wide character and call the actual speak method.
</SPAN> <SPAN class=code-keyword>return</SPAN> write(winstl::a2w(str), str.size());
}
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> speak a wide character string
</SPAN> std::streamsize write(<SPAN class=code-keyword>const</SPAN> <SPAN class=code-keyword>wchar_t</SPAN>* s, std::streamsize n)
{
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> make a null terminated wstring.
</SPAN> std::wstring str(s,n);
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> The actual COM call to Speak.
</SPAN> _pVoice-<SPAN class=code-keyword>></SPAN>Speak(str.c_str(), SPF_ASYNC, <SPAN class=code-digit>0</SPAN>);
<SPAN class=code-keyword>return</SPAN> n;
}
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Set the speech speed.
</SPAN> <SPAN class=code-keyword>void</SPAN> setRate(<SPAN class=code-keyword>long</SPAN> n) { _pVoice-<SPAN class=code-keyword>></SPAN>SetRate(n); }
<SPAN class=code-keyword>private</SPAN>:
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> COM object smart pointer.
</SPAN> stlsoft::ref_ptr<SPAN class=code-keyword><</SPAN> ISpVoice <SPAN class=code-keyword>></SPAN> _pVoice;
};</PRE>
<P>There's a lot going on in this little class. Let's tease apart the
pieces one-by-one.</P>
<H3>COMSTL, stlsoft::ref_ptr<> and ISpVoice</H3>
<P>The only member of the class is <CODE>stlsoft::ref_ptr<SPAN
class=code-keyword><</SPAN> ISpVoice <SPAN
class=code-keyword>></SPAN> _pVoice</CODE>.</P>
<P>This is the smart pointer that will handle all the COM stuff for us.
The STLSoft class <A
href="http://www.synesis.com.au/software/stlsoft/doc-1.9/classstlsoft_1_1ref__ptr.html">stlsoft::ref_ptr&lt;&gt;</A>
provides RAII-safe handling of reference-counted interfaces (RCIs).
In particular, it is ideal for handling COM objects.</P>
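<P>To make the RAII idea concrete, here is a deliberately simplified sketch of a reference-counting holder. It is not the STLSoft implementation: <CODE>fake_interface</CODE> and <CODE>simple_ref_ptr</CODE> are hypothetical stand-ins that only mimic the <CODE>AddRef()</CODE>/<CODE>Release()</CODE> protocol of a COM interface.</P>

```cpp
#include <utility>

// Stand-in for a COM interface: counts references the way IUnknown does.
struct fake_interface
{
    long refs = 1;             // a freshly created object starts with one reference
    void AddRef()  { ++refs; }
    void Release() { --refs; } // a real COM object would destroy itself at zero
};

// Minimal RAII holder: takes over one reference and releases it on destruction.
template <class T>
class simple_ref_ptr
{
public:
    explicit simple_ref_ptr(T* p = nullptr) : p_(p) {}
    simple_ref_ptr(const simple_ref_ptr& other) : p_(other.p_)
    {
        if (p_) p_->AddRef(); // every copy owns its own reference
    }
    simple_ref_ptr& operator=(simple_ref_ptr other)
    {
        std::swap(p_, other.p_); // copy-and-swap: the old reference dies with 'other'
        return *this;
    }
    ~simple_ref_ptr() { if (p_) p_->Release(); }

    T* operator->() const { return p_; }

private:
    T* p_;
};

// Returns the reference count once every holder has gone out of scope.
inline long refs_after_scope()
{
    fake_interface obj; // refs == 1, taken over by the first holder
    {
        simple_ref_ptr<fake_interface> a(&obj);
        simple_ref_ptr<fake_interface> b(a); // the copy adds a reference
    } // both holders release their reference here
    return obj.refs;
}
```

<P>The point of the pattern is that no code path, including one that exits through an exception, has to remember to call <CODE>Release()</CODE> by hand.</P>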
<P>We are using it with the <CODE>ISpVoice</CODE> interface. From
Microsoft's <A
href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/SAPI51sr/html/ISpVoice.asp">site</A>:</P><EM>The
<CODE>ISpVoice</CODE> interface enables an application to perform text
synthesis operations. Applications can speak text strings and text files,
or play audio files through this interface. All of these can be done
synchronously or asynchronously.</EM>
<P>In the constructor, we first initialize COM via
<CODE>comstl::com_initializer</CODE>. Because it is a static object, this
happens only once and needs no further attention. To initialize
<CODE>_pVoice</CODE> we call <CODE>comstl::co_create_instance()</CODE>
with the <CODE>CLSID_SpVoice</CODE> ID. If all goes well, we are now
holding an <CODE>ISpVoice</CODE> object handle. All reference counting
issues will be handled by <CODE>stlsoft::ref_ptr<SPAN
class=code-keyword>&lt;</SPAN><SPAN class=code-keyword>&gt;</SPAN></CODE>.
If the call fails, a <CODE>comstl::com_exception</CODE> is
thrown and the class instance will not be created.</P>
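<P>The two mechanisms in this constructor, a function-local static that runs once and an exception that prevents a half-built object, can be sketched without COM. The names <CODE>fake_initializer</CODE> and <CODE>fake_sink</CODE> below are hypothetical stand-ins for <CODE>comstl::com_initializer</CODE> and <CODE>audio_sink</CODE>.</P>

```cpp
#include <stdexcept>

// Stand-in for comstl::com_initializer: counts how often "COM" is initialized.
struct fake_initializer
{
    static int& count() { static int n = 0; return n; }
    fake_initializer() { ++count(); }
};

// Stand-in for audio_sink: the static local is constructed on first use only.
class fake_sink
{
public:
    explicit fake_sink(bool creation_succeeds = true)
    {
        static fake_initializer init; // runs once, however many sinks are created
        if (!creation_succeeds)
            throw std::runtime_error("Failed to create SpVoice COM instance");
    }
};
```

<P>Creating several <CODE>fake_sink</CODE> objects leaves the counter at one, and a failed construction throws before any object exists. Note that <CODE>audio_sink</CODE> is a template, so in the real class each instantiation gets its own static initializer object.</P>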
<P>To speak some text we just need to call <CODE>_pVoice-<SPAN
class=code-keyword>></SPAN>Speak()</CODE> with a wide character
string.</P>
<P>However, we would like to support other character types like
<CODE><SPAN class=code-keyword>char</SPAN>*</CODE>, <CODE>std::<SPAN
class=code-SDKkeyword>string</SPAN></CODE> and more. In fact, we want to
support any type that can be converted to a string or wide-string via an
<CODE><SPAN class=code-keyword>operator</SPAN><SPAN
class=code-keyword><</SPAN><SPAN
class=code-keyword><</SPAN>()</CODE>.</P>
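<P>In practice this means that a user-defined type only needs the usual stream insertion operator. The sketch below uses a hypothetical <CODE>temperature</CODE> type, and <CODE>std::ostringstream</CODE> stands in for <CODE>audio_ostream</CODE> so that it runs without SAPI.</P>

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user type, used only for illustration.
struct temperature
{
    double degrees;
};

// Because this overload targets std::ostream, it works with any stream
// derived from it.
inline std::ostream& operator<<(std::ostream& os, const temperature& t)
{
    return os << t.degrees << " degrees";
}

// Builds the text that would be sent to the audio stream;
// std::ostringstream stands in for audio_ostream here.
inline std::string spoken_text(const temperature& t)
{
    std::ostringstream out;
    out << "It is " << t << " outside, on day " << 3;
    return out.str();
}
```

<P>With the real class, the same expression would be written as <CODE>aout &lt;&lt; "It is " &lt;&lt; t &lt;&lt; " outside" &lt;&lt; endl;</CODE>.</P>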
<H3>Boost Iostreams </H3>
<P><A
href="http://www.boost.org/libs/iostreams/doc/index.html">boost::iostreams</A>
makes it easy to create standard C++ streams and stream buffers for
accessing new Sources and Sinks. To rephrase from the <A
href="http://www.boost.org/libs/iostreams/doc/index.html">site</A>:</P>
<P><EM>A Sink provides write-access to a sequence of characters of a given
type. A Sink may expose this sequence by defining a member function
<CODE>write</CODE>, invoked indirectly by the Iostreams library through
the function <CODE>boost::iostreams::write</CODE>.</EM></P>
<P>There are two predefined sinks, <CODE>boost::iostreams::sink</CODE> and
<CODE>boost::iostreams::wsink</CODE>, for writing narrow and wide strings
respectively.</P>
<P>To make our class a Sink and get all its functionality, all we have to
do is derive it from one of these classes (depending on whether we
want narrow or wide character output). Thus, <CODE>audio_sink</CODE> is a
template class that derives from its template parameter.</P>
<P>To use our sink and create a concrete <CODE>ostream</CODE>, we need to
use the <CODE>boost::iostreams::stream</CODE> class.</P>
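<P>To give a feel for what the library does on our behalf, here is a simplified, unbuffered stand-in written with the standard library only. It is not how Boost implements it: <CODE>collecting_sink</CODE> and <CODE>sink_streambuf</CODE> are hypothetical names, and the only thing taken from the Sink concept is the <CODE>write(const char*, std::streamsize)</CODE> member.</P>

```cpp
#include <ios>
#include <ostream>
#include <streambuf>
#include <string>

// A Sink-like class: all it offers is write(const char*, streamsize).
// This one collects the text instead of speaking it.
struct collecting_sink
{
    std::string received;

    std::streamsize write(const char* s, std::streamsize n)
    {
        received.append(s, static_cast<std::string::size_type>(n));
        return n;
    }
};

// Unbuffered stream buffer that forwards every character sequence to
// sink.write(), so an ordinary std::ostream can sit on top of it.
template <class Sink>
class sink_streambuf : public std::streambuf
{
public:
    explicit sink_streambuf(Sink& sink) : sink_(sink) {}

protected:
    // Bulk output, e.g. string literals.
    std::streamsize xsputn(const char* s, std::streamsize n) override
    {
        return sink_.write(s, n);
    }

    // Single characters, e.g. the digits produced when formatting numbers.
    int_type overflow(int_type ch) override
    {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const char c = traits_type::to_char_type(ch);
        return sink_.write(&c, 1) == 1 ? ch : traits_type::eof();
    }

private:
    Sink& sink_;
};

// Formats mixed types through a std::ostream and returns what the sink saw.
inline std::string collect_demo()
{
    collecting_sink sink;
    sink_streambuf<collecting_sink> buf(sink);
    std::ostream out(&buf);
    out << "Answer: " << 42 << '\n';
    return sink.received;
}
```

<P>All formatting is done by <CODE>std::ostream</CODE>; the sink only ever sees finished character sequences, which is why <CODE>audio_sink</CODE> needs nothing more than its <CODE>write</CODE> members.</P>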
<P>The supporting class is <CODE>audio_ostream_t</CODE>: </P><PRE lang=c++><SPAN class=code-keyword>template</SPAN> <SPAN class=code-keyword><</SPAN> <SPAN class=code-keyword>class</SPAN> SinkType <SPAN class=code-keyword>></SPAN>
<SPAN class=code-keyword>class</SPAN> audio_ostream_t: <SPAN class=code-keyword>public</SPAN> boost::iostreams::stream<SPAN class=code-keyword><</SPAN> SinkType <SPAN class=code-keyword>></SPAN>,
<SPAN class=code-keyword>public</SPAN> SinkType
{
<SPAN class=code-keyword>public</SPAN>:
audio_ostream_t()
{
<SPAN class=code-comment>//</SPAN><SPAN class=code-comment> Connect to Sink
</SPAN> <SPAN class=code-keyword>this</SPAN>-<SPAN class=code-keyword>&gt;</SPAN>open(*<SPAN class=code-keyword>this</SPAN>);
}
};
<SPAN class=code-keyword>typedef</SPAN> audio_ostream_t<SPAN class=code-keyword><</SPAN> audio_sink<SPAN class=code-keyword><</SPAN> boost::iostreams::sink <SPAN class=code-keyword>></SPAN> <SPAN class=cod