\begin{verbatim}
    HSLab noname
\end{verbatim}
This will cause a window to appear with a waveform display area in the upper
half and a row of buttons, including a record button, in the lower half. When
the name of a normal file is given as argument, \htool{HSLab} displays its
contents. Here, the special file name \texttt{noname} indicates that new data
is to be recorded. \htool{HSLab} makes no special provision for prompting the
user. However, each time the record button is pressed, it writes the
subsequent recording alternately to a file called \verb|noname_0.| and to a
file called \verb|noname_1.|. Thus, it is simple to write a shell script
which, for each successive line of a prompt file, outputs the prompt, waits for
either \verb|noname_0.| or \verb|noname_1.| to appear, and then renames
the file to the name which precedes the prompt (see Fig.~\href{f:step3}).
\index{extensions!wav@\texttt{wav}}

While the prompts for the training sentences were already provided above, the
prompts for the test sentences need to be generated before recording them. The tool
\index{prompt script!generation of}\index{hsgen@\htool{HSGen}}\htool{HSGen}
can be used to do this by randomly traversing a word network and outputting
each word encountered. For example, typing
\begin{verbatim}
    HSGen -l -n 200 wdnet dict > testprompts
\end{verbatim}
would generate 200 numbered test utterances, the first few of which would
look something like:
\begin{verbatim}
    1. PHONE YOUNG
    2. DIAL OH SIX SEVEN SEVEN OH ZERO
    3. DIAL SEVEN NINE OH OH EIGHT SEVEN NINE NINE
    4. DIAL SIX NINE SIX TWO NINE FOUR ZERO NINE EIGHT
    5. CALL JULIAN ODELL
    ... etc
\end{verbatim}
These can be piped to construct the prompt file \texttt{testprompts} for
the required test data.

\subsection{Step 4 - Creating the Transcription Files}

\sidefig{step3}{50}{Step 3}{-4}{}
To train a set of HMMs, every file of training data must have an associated
phone level transcription.
Since there is no hand labelled data to bootstrap a
set of models, a flat-start scheme will be used instead. To do this, two sets
of phone transcriptions will be needed. The set used initially will have no
short-pause (\texttt{sp}) models between words. Then once reasonable phone
models have been generated, an \texttt{sp} model will be inserted between words
to take care of any pauses introduced by the speaker.
\index{flat start}

The starting point for both sets of phone transcriptions is an
orthographic\index{transcription!orthographic} transcription in \HTK\ label
format. This can be created fairly easily using a text editor or a scripting
language. An example of this is found in the RM Demo at point 0.4. Alternatively, the
script \texttt{prompts2mlf} has been provided in the \texttt{HTKTutorial}
directory.
The effect should be to convert the prompt utterances shown above into the
following form:
\begin{verbatim}
    #!MLF!#
    "*/S0001.lab"
    ONE
    VALIDATED
    ACTS
    OF
    SCHOOL
    DISTRICTS
    .
    "*/S0002.lab"
    TWO
    OTHER
    CASES
    ALSO
    WERE
    UNDER
    ADVISEMENT
    .
    "*/S0003.lab"
    BOTH
    FIGURES
    (etc.)
\end{verbatim}
As can be seen, the prompt labels need to be converted into path names, each
word should be written on a single line and each utterance should be terminated
by a single period on its own. The first line of the file just identifies the
file as a \textit{Master Label File} (MLF). This is a single file containing a
complete set of transcriptions. \HTK\ allows each individual transcription to
be stored in its own file but it is more efficient to use an MLF.
\index{master label files}\index{MLF}

The form of the path name used in the MLF deserves some explanation since it is
really a \textit{pattern} and not a name.
\index{master label files!patterns}
When \HTK\ processes speech files, it expects to find a transcription
(or {\it label file}) with the same name but a different extension.
Thus, if the file
\texttt{/root/sjy/data/S0001.wav} was being processed, \HTK\ would look for a
label file called \texttt{/root/sjy/data/S0001.lab}. When MLF files are used,
\HTK\ scans the file for a pattern which matches the required label file name.
However, an asterisk will match any character string and hence the pattern used
in the example is in effect path independent. It therefore allows the same
transcriptions to be used with different versions of the speech data
stored in different locations.

Once the word level MLF has been created, phone level MLFs can be generated
using the label editor \htool{HLEd}\index{hled@\htool{HLEd}}. For example,
given the word level MLF \texttt{words.mlf}, the command
\begin{verbatim}
\end{verbatim}
will generate a phone level transcription of the following form,
where the \texttt{-l} option is needed to generate the path '\verb+*+' in the
output patterns.
\begin{verbatim}
    #!MLF!#
    "*/S0001.lab"
    sil
    w
    ah
    n
    v
    ae
    l
    ih
    d
    .. etc
\end{verbatim}
This process is illustrated in Fig.~\href{f:step4}.

The \htool{HLEd} edit script \texttt{mkphones0.led} contains the following
commands
\begin{verbatim}
    EX
    IS sil sil
    DE sp
\end{verbatim}
The expand \texttt{EX} command replaces each word in \texttt{words.mlf} by the
corresponding pronunciation in the dictionary file \texttt{dict}. The \texttt{IS}
command inserts a silence model \texttt{sil} at the start and end of
every utterance. Finally, the delete \texttt{DE} command deletes all
short-pause \texttt{sp} labels, which are not wanted in the transcription
labels at this point.

\centrefig{step4}{60}{Step 4}

\subsection{Step 5 - Coding the Data}

The final stage of data preparation is to parameterise the raw speech
waveforms into sequences of feature vectors. \HTK\ supports both FFT-based
\index{analysis!FFT-based}
and LPC-based\index{analysis!LPC-based} analysis.
Here Mel Frequency Cepstral Coefficients (MFCCs)\index{MFCC coefficients},
which are derived from FFT-based log spectra, will be used.

Coding can be performed using the tool
\htool{HCopy}\index{hcopy@\htool{HCopy}} configured to\index{coding}
automatically convert its input into MFCC vectors. To do this, a configuration
file (\texttt{config}) is needed which specifies all of the conversion
parameters\index{parameterisation}. Reasonable settings for these are as follows
\begin{verbatim}
    # Coding parameters
    TARGETKIND = MFCC_0
    TARGETRATE = 100000.0
    SAVECOMPRESSED = T
    SAVEWITHCRC = T
    WINDOWSIZE = 250000.0
    USEHAMMING = T
    PREEMCOEF = 0.97
    NUMCHANS = 26
    CEPLIFTER = 22
    NUMCEPS = 12
    ENORMALISE = F
\end{verbatim}
Some of these settings are in fact the default settings, but they
are given explicitly here for completeness. In brief, they specify
that the target parameters are to be MFCC using $C_0$ as the energy
component, the frame period is 10msec and the analysis window is 25msec
(\HTK\ uses units of 100ns),
the output should be saved in compressed format, and a CRC checksum should
be added. The FFT should use a Hamming window and the signal should
have first order preemphasis applied using a coefficient of 0.97.
The filterbank should have 26 channels and 12 MFCC coefficients should
be output. The variable \texttt{ENORMALISE} is by default true and performs energy
normalisation on recorded audio files. It cannot be used with live audio and
since the target system is for live audio, this variable should be set to
false.

Note that explicitly creating coded data files is not necessary, as coding can
be done ``on-the-fly'' from the original waveform files by specifying the
appropriate configuration file (as above) with the relevant \HTK\ tools. However,
creating these files reduces the amount of preprocessing required during
training, which itself can be a time-consuming process.

To run \htool{HCopy}, a list of
each source file and its corresponding output file is needed.
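
Such a list is rarely typed by hand. As a minimal sketch (not part of \HTK,
and assuming waveform names ending in \texttt{.wav}), the Python fragment below
pairs each waveform with an output file of the same base name and the
extension \texttt{mfc}. The function name \texttt{make\_scp\_lines} and the two
directories are illustrative assumptions only.

```python
from pathlib import PurePosixPath


def make_scp_lines(wav_paths, out_dir):
    """Return one 'source target' line per waveform file.

    Hypothetical helper for illustration; it only builds strings and
    does not touch the file system.
    """
    lines = []
    for wav in wav_paths:
        src = PurePosixPath(wav)
        # Keep the base name, swap the extension for .mfc, place it in out_dir.
        dst = PurePosixPath(out_dir) / src.with_suffix(".mfc").name
        lines.append(f"{src} {dst}")
    return lines


if __name__ == "__main__":
    # Assumed layout: waveforms in one directory, coded files in another.
    waves = [f"/root/sjy/waves/S{n:04d}.wav" for n in range(1, 5)]
    for line in make_scp_lines(waves, "/root/sjy/train"):
        print(line)
```

Writing the returned lines to a text file gives a list with one source and
target pair per line.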
For example,
the first few lines might look like
\index{extensions!mfc@\texttt{mfc}}
\begin{verbatim}
    /root/sjy/waves/S0001.wav /root/sjy/train/S0001.mfc
    /root/sjy/waves/S0002.wav /root/sjy/train/S0002.mfc
    /root/sjy/waves/S0003.wav /root/sjy/train/S0003.mfc
    /root/sjy/waves/S0004.wav /root/sjy/train/S0004.mfc
    (etc.)
\end{verbatim}
Files containing lists of files are referred to as script files\footnote{Not
to be confused with files containing \textit{edit} scripts}
and\index{extensions!scp@\texttt{scp}}
by convention are given the extension \texttt{scp} (although \HTK\ does not
demand this). Script files are specified using the standard
\texttt{-S} option and their contents are read simply as extensions
to the command line. Thus, they avoid the need for command lines with
several thousand arguments\footnote{Most UNIX shells, especially the C shell,
only allow a limited and quite small number of arguments.}.
\index{command line!arguments}
\index{command line!script files}

\centrefig{step5}{100}{Step 5}

\noindent
Assuming that the above script is stored in the file \texttt{codetr.scp},
the training data would be coded by executing
\begin{verbatim}
    HCopy -T 1 -C config -S codetr.scp
\end{verbatim}
This is illustrated in Fig.~\href{f:step5}. A similar procedure is
used to code the test data (using \verb|TARGETKIND = MFCC_0_D_A| in
config) after which all of the pieces are in place to start training
the HMMs.

\mysect{Creating Monophone HMMs}{egcreatmono}

In this section, the creation of a well-trained set of single-Gaussian
monophone HMMs will be described. The starting point will be
a set of identical monophone HMMs in which every mean and variance is
identical. These are then retrained, short-pause models are
added and the silence model is extended slightly. The monophones
are then retrained.

Some of the dictionary entries have multiple pronunciations.
However,
when \htool{HLEd} was used to expand the word level MLF to create the
phone level MLFs, it arbitrarily selected the first pronunciation it found.
Once reasonable monophone HMMs have been created, the recogniser tool
\htool{HVite} can be used to perform a \textit{forced alignment} of
\index{forced alignment}
the training data. By this means, a new phone level MLF is created in which
the choice of pronunciations depends on the acoustic evidence. This new
MLF can be used to perform a final re-estimation of the monophone HMMs.
\index{monophone HMM!construction of}

\subsection{Step 6 - Creating Flat Start Monophones}

The first step in HMM training is to define a prototype model. The
parameters of this model are not important; its purpose is to
define the model topology. For phone-based systems, a good
topology to use is 3-state left-right with no skips such as the following
\begin{verbatim}
    ~o <VecSize> 39 <MFCC_0_D_A>
    ~h "proto"
    <BeginHMM>
      <NumStates> 5
      <State> 2
        <Mean> 39
          0.0 0.0 0.0 ...
        <Variance> 39
          1.0 1.0 1.0 ...
      <State> 3
        <Mean> 39
          0.0 0.0 0.0 ...
        <Variance> 39
          1.0 1.0 1.0 ...
      <State> 4
        <Mean> 39
          0.0 0.0 0.0 ...
        <Variance> 39