\end{verbatim}
The file \texttt{mktri.hed} can be generated using the {\em Perl} script
\texttt{maketrihed} included in the \texttt{HTKTutorial} directory.

When running the \htool{HHEd}\index{hled@\htool{HHEd}} command you
will get warnings about trying to tie transition matrices for the sil
and sp models. Since neither model is context-dependent there aren't
actually any matrices to tie.

The clone command \texttt{CL}\index{cl@\texttt{CL} command} takes as its
argument the name of the file containing the list of triphones (and
biphones)\index{cloning}\index{parameter tying}\index{item lists} generated
above. For each model of the form \texttt{a-b+c} in this list, it looks for
the monophone \texttt{b} and makes a copy of it.\index{tying!transition matrices}
Each \texttt{TI} command takes as its argument the name of a macro
and a list of HMM components. The latter uses a notation which attempts to
mimic the hierarchical structure of the HMM parameter set in which the
transition matrix \texttt{transP} can be regarded as a sub-component of each
HMM. The items listed within brackets are patterns designed to match the set
of triphones, right biphones and left biphones for each phone.

\centrefig{egtranstie}{80}{Tying Transition Matrices}

Up to now macros and tying have only been mentioned in passing. Although a
full explanation must wait until chapter~\ref{c:HMMDefs}, a brief explanation
is warranted here. Tying means that two or more HMMs share the same set of
parameters. On the left side of Fig.~\href{f:egtranstie}, two HMM definitions
are shown. Each HMM has its own individual transition matrix. On the right
side, the effect of the first \texttt{TI} command in the edit script
\texttt{mktri.hed} is shown. The individual transition matrices have been
replaced by a reference to a \textit{macro} called \texttt{T\_ah} which
contains a matrix shared by both models.
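
As a rough illustration of what a generator such as \texttt{maketrihed} has to
produce, the following Python sketch writes a \texttt{CL} command followed by
one \texttt{TI} command per phone. It is not the Perl script shipped with the
tutorial: the function name \texttt{make\_tri\_hed}, the macro naming scheme
(modelled on the \texttt{T\_ah} example above) and the exact item-list pattern
are assumptions made for illustration. The sketch also skips \texttt{sil} and
\texttt{sp}, which is a design choice rather than documented behaviour.

```python
def make_tri_hed(monophones, triphone_list="triphones1", skip=("sil", "sp")):
    """Return the text of an HHEd edit script that clones monophones and
    ties the transition matrices within each phone set.

    Assumption: each TI item list matches the triphones (*-p+*), right
    biphones (p+*) and left biphones (*-p) of phone p.
    """
    lines = [f"CL {triphone_list}"]
    for phone in monophones:
        # Context-independent models have no variants to tie, so they
        # are left out here (hypothetical choice for this sketch).
        if phone in skip:
            continue
        lines.append(
            f"TI T_{phone} {{(*-{phone}+*,{phone}+*,*-{phone}).transP}}"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy phone list; a real system would read it from the monophone list file.
    print(make_tri_hed(["ah", "ax", "sil", "sp"]), end="")
```

Leaving out the context-independent models has the same effect on the model
set as tying a single matrix to itself, and avoids emitting commands that
have nothing to tie.
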
When re-estimating tied parameters,
the data which would have been used for each of the original untied parameters
is pooled so that a much more reliable estimate can be obtained.

Of course, tying could affect performance if performed indiscriminately.
Hence, it is important to only tie parameters which have little effect on
discrimination. This is the case here where the transition parameters do not
vary significantly with acoustic context but nevertheless need to be estimated
accurately. Some triphones will occur only once or twice and so very poor
estimates would be obtained if tying was not done. These problems of data
insufficiency will affect the output distributions too, but this will be dealt
with in the next step.

Hitherto, all HMMs have been stored in text format and could be inspected like
any text file. Now, however, the model files will be getting larger, and space
and load/store times become an issue. For increased efficiency,
\HTK\ can store and load MMFs in binary\index{HMM!binary storage}
format. Setting the standard \texttt{-B} option causes this to happen.

\sidefig{step9}{55}{Step 9}{-4}{
Once the context-dependent models have been cloned, the new triphone set can be
re-estimated using \htool{HERest}. This is done as previously except that the
monophone model list is replaced by a triphone list and the triphone
transcriptions are used in place of the monophone transcriptions.
For the final pass of \htool{HERest}, the \texttt{-s} option should be used to
generate a file of state occupation statistics called \texttt{stats}. In
combination with the means and variances, these enable likelihoods to be
calculated for clusters of states and are needed during the state-clustering
process \index{statistics!state occupation} described below.
Fig.~\href{f:step9} illustrates this step of the HMM construction
procedure. Re-estimation should again be done twice, so that the resultant
model sets will ultimately be saved in \texttt{hmm12}.
}
\begin{verbatim}
    HERest -B -C config -I wintri.mlf -t 250.0 150.0 1000.0 -s stats \
       -S train.scp -H hmm11/macros -H hmm11/hmmdefs -M hmm12 triphones1
\end{verbatim}

\subsection{Step 10 - Making Tied-State Triphones}

The outcome of the previous stage is a set of triphone HMMs with all triphones
in a phone set sharing the same transition matrix. When estimating these
models, many of the variances in the output distributions
will have been floored since there will be
\index{variance!flooring problems}\index{state tying}\index{tying!states}\index{data insufficiency}
insufficient data associated with many of the states. The last step in
the model building process is to tie states within triphone sets
in order to share data and thus be able to make robust parameter estimates.

In the previous step, the \texttt{TI} command was used to
explicitly tie all members of a set of transition matrices together. However,
the choice of which states to tie requires a bit more subtlety since
the performance of the recogniser depends crucially on how accurately
the state output distributions capture the statistics of the speech data.

\htool{HHEd} provides two mechanisms which allow states to be clustered and
\index{state clustering}
then each cluster tied. The first is data-driven and uses a similarity
measure between states. The second uses decision trees\index{decision trees}
and is based on asking questions about the left and right contexts of each
triphone. The decision tree attempts to find those contexts which make the largest
difference to the acoustics and which should therefore distinguish clusters.

Decision tree state tying is performed by running \htool{HHEd} in the normal way, i.e.
\begin{verbatim}
    HHEd -B -H hmm12/macros -H hmm12/hmmdefs -M hmm13 \
         tree.hed triphones1 > log
\end{verbatim}
Notice that the output is saved in a log file.
This is important since
some tuning of thresholds is usually needed.

The edit script \texttt{tree.hed}, which contains the instructions regarding
which contexts to examine for possible clustering, can be rather long and
complex. A script for automatically generating this file, \texttt{mkclscript},
is found in the RM Demo. A version of the \texttt{tree.hed} script, which can
be used with this tutorial, is included in the \texttt{HTKTutorial} directory.
Note that \texttt{mkclscript} is only capable of creating the \texttt{TB}
commands (decision tree clustering of states). The questions (\texttt{QS})
still need to be defined by the user. There is, however, an example list of
questions which may be suitable for some tasks (or at least useful as an
example) supplied with the RM demo (\texttt{lib/quests.hed}). The entire
script appropriate for clustering English phone models is too long to show
here in the text; however, its main components are given by the following
fragments:
\begin{verbatim}
    RO 100.0 stats
    TR 0
    QS "L_Class-Stop" {p-*,b-*,t-*,d-*,k-*,g-*}
    QS "R_Class-Stop" {*+p,*+b,*+t,*+d,*+k,*+g}
    QS "L_Nasal" {m-*,n-*,ng-*}
    QS "R_Nasal" {*+m,*+n,*+ng}
    QS "L_Glide" {y-*,w-*}
    QS "R_Glide" {*+y,*+w}
    ....
    QS "L_w" {w-*}
    QS "R_w" {*+w}
    QS "L_y" {y-*}
    QS "R_y" {*+y}
    QS "L_z" {z-*}
    QS "R_z" {*+z}

    TR 2

    TB 350.0 "aa_s2" {(aa, *-aa, *-aa+*, aa+*).state[2]}
    TB 350.0 "ae_s2" {(ae, *-ae, *-ae+*, ae+*).state[2]}
    TB 350.0 "ah_s2" {(ah, *-ah, *-ah+*, ah+*).state[2]}
    TB 350.0 "uh_s2" {(uh, *-uh, *-uh+*, uh+*).state[2]}
    ....
    TB 350.0 "y_s4" {(y, *-y, *-y+*, y+*).state[4]}
    TB 350.0 "z_s4" {(z, *-z, *-z+*, z+*).state[4]}
    TB 350.0 "zh_s4" {(zh, *-zh, *-zh+*, zh+*).state[4]}

    TR 1

    AU "fulllist"
    CO "tiedlist"

    ST "trees"
\end{verbatim}

Firstly, the \texttt{RO}\index{ro@\texttt{RO} command} command is used to set
the outlier threshold\index{outlier threshold} to 100.0 and load the statistics
file\index{statistics file} generated at the end of the previous step.
The
outlier threshold determines the minimum occupancy\index{minimum occupancy} of
any cluster and prevents a single outlier state forming a singleton cluster
just because it is acoustically very different to all the other states. The
\texttt{TR}\index{tr@\texttt{TR} command} command sets the trace level to zero
in preparation for loading in the questions. Each
\texttt{QS}\index{qs@\texttt{QS} command} command loads a single question and
each question is defined by a set of contexts. For example, the first
\texttt{QS} command defines a question called \texttt{L\_Class-Stop} which is
true if the left context is any of the stops \texttt{p},
\texttt{b}, \texttt{t}, \texttt{d}, \texttt{k} or \texttt{g}.

\sidefig{step10}{50}{Step 10}{-4}{}

Notice that for a triphone system, it is necessary to include questions
referring to both the right and left contexts of a phone. The questions should
progress from wide, general classifications (such as consonant, vowel, nasal,
diphthong, etc.) to specific instances of each phone.
Ideally, the full set of questions loaded using the \texttt{QS} command would
include every possible context which can influence the acoustic realisation of
a phone, and can include any linguistic or phonetic classification which may be
relevant. There is no harm in creating extra unnecessary questions, because
those which are determined to be irrelevant to the data will be ignored.

The second \texttt{TR} command enables intermediate level progress reporting so
that each of the following \texttt{TB} commands\index{tb@\texttt{TB} command}
can\index{tree building} be monitored. Each of these \texttt{TB} commands
clusters one specific set of states. For example, the first \texttt{TB}
command applies to the first emitting state of all context-dependent models for
the phone \texttt{aa}.

Each \texttt{TB} command works as follows. Firstly, the set of states defined
by the final argument is pooled to form a single cluster.
Each question in the
question set loaded by the \texttt{QS} commands is used to split the pool into
two sets. Using two sets rather than one allows the log likelihood of
the training data to be increased, and the question which maximises this
increase is selected for the first branch of the tree. The process is then
repeated until the increase in log likelihood achievable by any question at any
node is less than the threshold specified by the first argument (350.0 in this
case).

Note that the values given in the \texttt{RO} and \texttt{TB} commands affect
the degree of tying and therefore the number of states output in the clustered
system. The values should be varied according to the amount of training data
available.

As a final step to the clustering, any pair of clusters which can be merged
\index{cluster merging} such that the decrease in log likelihood is below
the threshold is merged. On completion, the states in each cluster $i$ are
tied to form a single shared state with macro name \texttt{xxx\_i} where
\texttt{xxx} is the name given by the second argument of the \texttt{TB}
command.

The set of triphones used so far only includes those needed to cover the
training data. The \texttt{AU} command takes as its argument a new list of
triphones expanded to include all those needed for recognition. This list can
be generated, for example, by using \htool{HDMan} on the entire dictionary (not
just the training dictionary), converting it to triphones using the command
\texttt{TC} and outputting a list of the distinct triphones to a file using the
option \texttt{-n}
\begin{verbatim}
    HDMan -b sp -n fulllist -g global.ded -l flog beep-tri beep
\end{verbatim}
\noindent
The \texttt{-b sp} option specifies that the \texttt{sp} phone is used as a
word boundary, and so is excluded from triphones.
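
To make the role of the word boundary symbol concrete, the following Python
sketch expands pronunciations into word-internal triphones and collects the
distinct model names, which is roughly what the \texttt{TC} edit command and
the \texttt{-n} option achieve together. It is an illustration only and not a
reimplementation of \htool{HDMan}: the function names and the toy dictionary
are hypothetical, and the treatment of the boundary phone (copied unchanged
and never used as context) is an assumption based on the description above.

```python
def to_word_internal_triphones(phones, boundary=("sp",)):
    """Convert one pronunciation into word-internal triphone names (l-p+r).

    Assumption: boundary phones are copied unchanged and never serve as
    left or right context for their neighbours.
    """
    out = []
    for i, phone in enumerate(phones):
        if phone in boundary:
            out.append(phone)
            continue
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        name = phone
        if left is not None and left not in boundary:
            name = f"{left}-{name}"
        if right is not None and right not in boundary:
            name = f"{name}+{right}"
        out.append(name)
    return out


def distinct_models(dictionary, boundary=("sp",)):
    """Return the sorted list of distinct model names covering a dictionary."""
    names = set()
    for pronunciation in dictionary.values():
        names.update(to_word_internal_triphones(pronunciation, boundary))
    return sorted(names)


if __name__ == "__main__":
    # Hypothetical two-word dictionary, each pronunciation ending in sp.
    toy = {"BIT": ["b", "ih", "t", "sp"], "IT": ["ih", "t", "sp"]}
    for name in distinct_models(toy):
        print(name)
```

In this toy example the word-initial and word-final phones become biphones
because no context is taken across the boundary symbol, which is why the
resulting list mixes triphones, biphones and the unchanged \texttt{sp}.
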
The effect of the \texttt{AU} command is to use the decision trees to
synthesise all of the new previously unseen triphones in the new list.
\index{au@\texttt{AU} command}

Once all state-tying has been completed and new models synthesised, some models may share exactly
the same three states and transition matrices and are thus identical.
The \texttt{CO} command\index{co@\texttt{CO} command}\index{model compaction} is used
to compact the model set by finding all identical models and tying them
together\footnote{Note that if the transition matrices had not been tied, the \texttt{CO}
command would be ineffective since all models would be different by
virtue of their unique transition matrices.}, producing a new list of models
called \texttt{tiedlist}.

One of the advantages of using decision tree clustering is that it allows
previously\index{unseen triphones}
unseen triphones to be synthesised. To do this, the trees must
be saved and this is done by the \texttt{ST} command\index{st@\texttt{ST} command}.
Later, if new previously unseen triphones are required, for example in the
pronunciation of a new vocabulary item, the existing model set can be
reloaded into \htool{HHEd}, the trees reloaded using the \texttt{LT} command
\index{lt@\texttt{LT} command}
and then a new extended list of triphones created using the \texttt{AU} command.
\index{au@\texttt{AU} command}

After \htool{HHEd} has completed, the effect of tying can be studied and
the thresholds adjusted if necessary. The log file will
include summary statistics which give the total number of physical
states remaining and the number of models after compacting.

Finally, and for the last time, the models are re-estimated twice using
\htool{HERest}. Fig.~\href{f:step10} illustrates this last step in the HMM
build process. The trained models are then contained in the file
\texttt{hmm15/hmmdefs}.

\mysect{Recogniser Evaluation}{egrectest}

The recogniser is now complete and its performance can be evaluated.
The recognition network and dictionary have already been constructed, and test
data has been recorded. Thus, all that is necessary is to run the recogniser
and then evaluate the results using the \HTK\ analysis tool
\htool{HResults}.\index{recogniser evaluation}

\subsection{Step 11 - Recognising the Test Data}

Assuming that \texttt{test.scp} holds a list of the coded test files,
then each test file will be recognised and its transcription output to
an MLF called \texttt{recout.mlf} by executing the following
\begin{verbatim}
    HVite -H hmm15/macros -H hmm15/hmmdefs -S test.scp \
          -l '*' -i recout.mlf -w wdnet \