.\" $Id: lattice-tool.1,v 1.57 2006/09/20 21:05:57 stolcke Exp $
.TH lattice-tool 1 "$Date: 2006/09/20 21:05:57 $" "SRILM Tools"
.SH NAME
lattice-tool \- manipulate word lattices
.SH SYNOPSIS
.B lattice-tool
[\c
.BR \-help ]
option\&...
.SH DESCRIPTION
.B lattice-tool
performs operations on word lattices in
.BR pfsg-format (5)
or in HTK Standard Lattice format (SLF).
Operations include size reduction, pruning, null-node removal,
weight assignment from
language models, lattice word error computation, and decoding of the best
hypotheses.
.PP
Each input lattice is processed in turn, and a series of optional
operations is performed in a fixed sequence (regardless of the order
in which corresponding options are specified).
The sequence of operations is as follows:
.TP
1.
Read input lattice.
.TP
2.
Score pronunciations (if dictionary was supplied).
.TP
3.
Split multiword word nodes.
.TP
4.
Posterior- and density-based pruning (before reduction).
.TP
5.
Write word posterior lattice.
.TP
6.
Perform word-posterior based decoding.
.TP
7.
Write word mesh (confusion network).
.TP
8.
Compute word and transition posteriors (forward-backward algorithm),
and N-gram counts if specified.
.TP
9.
Compute lattice density.
.TP
10.
Check lattice connectivity.
.TP
11.
Compute node entropy.
.TP
12.
Compute lattice word error.
.TP
13.
Output reference word posteriors.
.TP
14.
Remove null nodes.
.TP
15.
Lattice reduction.
.TP
16.
Posterior- and density-based pruning (after reduction).
.TP
17.
Remove pause nodes.
.TP
18.
Lattice reduction (post-pause removal).
.TP
19.
Language model replacement or expansion.
.TP
20.
Pause recovery or insertion.
.TP
21.
Lattice reduction (post-LM expansion).
.TP
22.
Multiword splitting (post-LM expansion).
.TP
23.
Merging of same-word nodes.
.TP
24.
Lattice algebra operations (or, concatenation).
.TP
25.
Viterbi-decode best hypothesis
and/or generate N-best lists.
.TP
26.
Lattice-LM perplexity computation.
.TP
27.
Writing output lattice.
.PP
The following options control which of these steps actually apply.
.SH OPTIONS
Each filename argument can be an ASCII file,
or a compressed file (name ending in .Z or .gz), or ``-'' to indicate
stdin/stdout.
.TP
.B \-help
Print option summary.
.TP
.B \-version
Print version information.
.TP
.BI \-debug " level"
Set the debugging output level (0 means no debugging output).
Debugging messages are sent to stderr.
.TP
.BI \-in-lattice " file"
Read input lattice from
.IR file .
.TP
.BI \-in-lattice2 " file"
Read additional input lattice (for binary lattice operations) from
.IR file .
.TP
.BI \-in-lattice-list " file"
Read list of input lattices from
.IR file .
Lattice operations are applied to each filename listed in
.IR file .
.TP
.BI \-out-lattice " file"
Write result lattice to
.IR file .
.TP
.BI \-out-lattice-dir " dir"
Write result lattices from processing of
.B \-in-lattice-list
to directory
.IR dir .
.TP
.B \-read-mesh
Assume input lattices are in word mesh (confusion network) format, as described
in
.BR wlat-format (5).
.TP
.B \-write-internal
Write output lattices with internal node numbering instead of compact,
consecutive numbering.
.TP
.B \-overwrite
Overwrite existing output lattice files.
.TP
.BI \-vocab " file"
Initialize the vocabulary to words listed in
.IR file .
This is useful in conjunction with
.BR \-limit-vocab .
.TP
.B \-limit-vocab
Discard LM parameters on reading that do not pertain to the words
specified in the vocabulary.
The default is that words used in the LM are automatically added to the
vocabulary.
This option can be used to reduce the memory requirements for large LMs;
to this end,
.B \-vocab
typically specifies the set of words used in the lattices to be
processed (which has to be generated beforehand, see
.BR pfsg-scripts (1)).
.TP
.BI \-vocab-aliases " file"
Reads vocabulary alias definitions from
.IR file ,
consisting of lines of the form
.br
\fIalias\fP \fIword\fP
.br
This causes all tokens
.I alias
to be mapped to
.IR word .
.TP
.B \-unk
Map lattice words not contained in the known vocabulary to the
unknown word tag.
This is useful if the rescoring LM contains a probability for the unknown
word (i.e., is an open-vocabulary LM).
The known
vocabulary is given by what is specified by the
.B \-vocab
option, as well as all words in the LM used for rescoring.
.TP
.BI \-map-unk " word"
Map out-of-vocabulary words to
.IR word ,
rather than the default
.B <unk>
tag.
.TP
.B \-tolower
Map all vocabulary to lowercase.
.TP
.BI \-nonevents " file"
Read a list of words from
.I file
that are used only as context elements, and are not predicted by the LM,
similar to ``<s>''.
If
.B \-keep-pause
is also specified then pauses are not treated as nonevents by default.
.TP
.BI \-max-time " T"
Limit processing time per lattice to
.I T
seconds.
.PP
Options controlling lattice operations:
.TP
.BI \-write-posteriors " file"
Compute the posteriors of lattice nodes and transitions (using the
forward-backward algorithm) and write out a word posterior lattice
in
.BR wlat-format (5).
This and other options based on posterior probabilities make most sense
if the input lattice contains combined acoustic-language model weights.
.TP
.BI \-write-posteriors-dir " dir"
Similar to the above, but posterior lattices are written to
separate files in directory
.IR dir ,
named after the utterance IDs.
.TP
.BI \-write-mesh " file"
Construct a word confusion network ("sausage") from the lattice and
write it to
.IR file .
If reference words are available for the utterance (specified by
.B \-ref-file
or
.BR \-ref-list )
their alignment will be recorded in the sausage.
.TP
.BI \-write-mesh-dir " dir"
Similar, but write sausages to files in
.I dir
named after the utterance IDs.
.TP
.BI \-init-mesh " file"
Initialize the word confusion network by reading an existing sausage from
.IR file .
This effectively aligns the lattice being processed to the existing
sausage.
.TP
.BI \-acoustic-mesh
Preserve word-level acoustic information (times, scores, and pronunciations)
in sausages, encoded as described in
.BR wlat-format (5).
.TP
.BI \-posterior-prune " P"
Prune lattice nodes with posteriors less than
.I P
times the highest posterior path.
.TP
.BI \-density-prune " D"
Prune lattices such that the lattice density
(non-null words per second)
does not exceed
.IR D .
.TP
.BI \-nodes-prune " N"
Prune lattices such that the total number of non-null, non-pause nodes
does not exceed
.IR N .
.TP
.B \-fast-prune
Choose a faster pruning algorithm that does not recompute posteriors
after each iteration.
.TP
.BI \-write-ngrams " file"
Compute posterior expected N-gram counts in lattices and output them
to
.IR file .
The maximal N-gram length is given by the
.B \-order
option (see below).
The counts from all lattices processed are accumulated and output at the end.
.TP
.BI \-write-ngram-index " file"
Output an index file of all N-gram occurrences in the lattices processed,
including their start times, durations, and posterior probabilities.
The maximal N-gram length is given by the
.B \-order
option (see below).
.TP
.BI \-min-count " C"
Prune N-grams with count less than
.I C
from output with
.B \-write-ngrams
and
.BR \-write-ngram-index .
In the former case, the threshold applies to the aggregate occurrence counts;
in the latter case, the threshold applies to the posterior probability of
an individual occurrence.
.TP
.BI \-max-ngram-pause " T"
Index only N-grams that contain internal pauses (between words) not exceeding
.I T
seconds (assuming time stamps are recorded in the input lattice).
.TP
.BI \-posterior-scale " S"
Scale the transition weights by dividing by
.I S
for the purpose of posterior probability computation.
If the input weights represent combined acoustic-language model scores
then this should be approximately the language model weight of the
recognizer in order to avoid overly peaked posteriors (the default value is 8).
.TP
.BI \-write-vocab " file"
Output the list of all words found in the lattice(s) to
.IR file .
.TP
.B \-reduce
Reduce lattice size by a single forward node merging pass.
.TP
.BI \-reduce-iterate " I"
Reduce lattice size by up to
.I I
forward-backward node merging passes.
.TP
.BI \-overlap-ratio " R"
Perform approximate lattice reduction by merging nodes that share more
than a fraction
.I R
of their incoming or
outgoing nodes.
The default is 0, i.e., only exact lattice reduction is performed.
.TP
.BI \-overlap-base " B"
If
.I B
is 0 (the default), then the overlap ratio
.I R
is taken relative to the smaller set of transitions being compared.
If the value is 1, the ratio is relative to the larger of the two sets.
.TP
.B \-reduce-before-pruning
Perform lattice reduction before posterior-based pruning.
The default order is to first prune, then reduce.
.TP
.BI \-pre-reduce-iterate " I"
Perform iterative reduction prior to lattice expansion, but after
pause elimination.
.TP
.BI \-post-reduce-iterate " I"
Perform iterative reduction after lattice expansion and pause node recovery.
Note: this is not recommended as it changes the weights assigned from
the specified language model.
.TP
.B \-no-nulls
Eliminate NULL nodes from lattices.
.TP
.B \-no-pause
Eliminate pause nodes from lattices
(and do not recover them after lattice expansion).
.TP
.B \-compact-pause
Use compact encoding of pause nodes that saves nodes but allows
optional pauses where they might not have been included in the original
lattice.
.TP
.B \-loop-pause
Add self-loops on pause nodes.
.TP
.B \-insert-pause
Insert optional pauses after every word in the lattice.
The structure of inserted pauses is affected by
.B \-compact-pause
and
.BR \-loop-pause .
.TP
.B \-collapse-same-words
Perform an operation on the final lattices that collapses all nodes
with the same words, except null nodes, pause nodes, or nodes with noise words.
This can reduce the lattice size dramatically, but also introduces new paths.
.TP
.B \-connectivity
Check the connectedness of lattices.
.TP
.B \-compute-node-entropy
Compute the node entropy of lattices.
.TP
.B \-compute-posteriors
Compute node posterior probabilities
(which are included in HTK lattice output).
.TP
.B \-density
Compute and output lattice densities.
.TP
.BI \-ref-list " file"
Read reference word strings from
.IR file .
Each line starts with a sentence ID (the basename of the lattice file name),
followed by the words.
This and the next
option trigger computation of lattice word errors
(minimum word error counts of any path through a lattice).
.TP
.BI \-ref-file " file"
Read reference word strings from
.IR file .
Lines must contain reference words only, and must be matched to input
lattices in the order processed.
.TP
.BI \-write-refs " file"
Write the references back to
.I file
(for validation).
.TP
.BI \-add-refs " P"
Add the reference words as an additional path to the lattice,
with probability
.IR P .
Unless
.B \-no-pause
is specified, optional pause nodes between words are also added.
Note that this operation is performed before lattice reduction and
expansion, so the new path can be merged with existing ones, and the
probabilities for the new path can be reassigned from an LM later.
.TP
.BI \-noise-vocab " file"
Read a list of ``noise'' words from
.IR file .
These words are ignored when computing lattice word errors,
when decoding the best word sequence using
.B \-viterbi-decode
or
.BR \-posterior-decode ,
or when collapsing nodes with
.BR \-collapse-same-words .
.TP
.B \-keep-pause
Causes the pause word ``-pau-'' to be treated like a regular word.
It prevents pause from being implicitly added to the list of noise
words.
.TP
.BI \-ignore-vocab " file"
Read a list of words that are to be ignored in
lattice operations, similar to pause tokens.
Unlike noise words (see above) they are also skipped during LM evaluation.
With this option and
.BR \-keep-pause ,
pause words are not ignored by default.
.TP
.BI \-split-multiwords
Split lattice nodes with multiwords into a sequence of non-multiword
nodes.
This option is necessary to compute lattice error of multiword lattices
against non-multiword references, but may be useful in its own right.
.TP
.BI \-split-multiwords-after-lm
Perform multiword splitting after lattice expansion using the specified LM.
This should be used if the LM uses multiwords, but the final lattices
are not supposed to contain multiwords.
.TP
.BI \-multiword-dictionary " file"
Read a dictionary from
.I file
containing multiword
pronunciations and word boundary markers (a ``|'' phone
label).
Specifying such a dictionary allows the multiword splitting options
to infer accurate time marks and pronunciation information for the
multiword components.
.TP
.BI \-multi-char " C"
Designate
.I C
as the character used for separating multiword components.
The default is an underscore ``_''.
.TP
.BI \-operation " O"
Perform a lattice algebra operation
.I O
on the lattice or lattices processed, with