?? multi-ngram.html
字號:
<! $Id: multi-ngram.1,v 1.3 2004/12/03 17:59:01 stolcke Exp $><HTML><HEADER><TITLE>multi-ngram</TITLE><BODY><H1>multi-ngram</H1><H2> NAME </H2>multi-ngram - build multiword N-gram models<H2> SYNOPSIS </H2><B> multi-ngram </B>[<B>-help</B>]<B></B>option...<H2> DESCRIPTION </H2><B> multi-ngram </B>builds N-gram language models that contain multiwords, i.e., compound wordsthat are a concatenation of words from some prior given model.It will optionally generate multiword N-grams and insert them intoan existing, reference N-gram model, so as to cover multiwords occuring in a specified vocabulary.It will then assign probabilities to the multiword N-grams so that wordstrings containing multiwords have the same probabilities as the stringsof component words in the reference model.<P>Note that the inverse operation (expanding a multiword N-gram to containonly regular words) is subsumed by the <B> ngram -expand-classes </B>function.<H2> OPTIONS </H2>Each filename argument can be an ASCII file, or a compressed file (name ending in .Z or .gz), or ``-'' to indicatestdin/stdout.<DL><DT><B> -help </B><DD>Print option summary.<DT><B> -version </B><DD>Print version information.<DT><B>-order</B><I> n</I><B></B><DD>Set the maximal N-gram order to be used from the reference model.NOTE: The order of the model is not set automatically when a modelfile is read, so the same file can be used at various orders.To use models of order higher than 3 it is always necessary to specify thisoption.<DT><B>-multi-order</B><I> n</I><B></B><DD>The maximal N-gram order in the multiword-based model.<DT><B>-debug</B><I> level</I><B></B><DD>Set the debugging output level (0 means no debugging output).<DT><B>-vocab</B><I> file</I><B></B><DD>Words to be added to the model.In particular, this should include all the multiwords to be added.<DT><B>-multi-char</B><I> C</I><B></B><DD>Character used to delimit component words in multiwords(an underscore character by default).<DT><B>-lm</B><I> file</I><B></B><DD>Reference N-gram model.<DT><B>-multi-lm</B><I> file</I><B></B><DD>Model containing multiwords; the N-grams in this model will be assignednew probabilities based on the reference model.If this option is <I> not </I>given then the multiword model will be generated by adding multiwordN-grams to the reference model.<DT><B> -prune-unseen-ngrams </B><DD>This option prevents the insertion of multiword N-grams whose componentN-grams are not contained in the reference model.For example, for a multiword bigram "a_b c_d" to be inserted, a trigramreference model must contain the trigrams "a b c" and "b c d".If the reference model were a bigram LM, it would have to contain"a b", "b c", and "c d".This option is important to control the size of the multiword LM forlarge vocabularies.<DT><B>-write-lm</B><I> file</I><B></B><DD>Output location of the generated multiword model.</DD></DL><H2> SEE ALSO </H2><A HREF="ngram.html">ngram(1)</A>, <A HREF="ngram-format.html">ngram-format(5)</A>.<H2> BUGS </H2>This program is a hack for cases were the original training data is not available and a multiword model has to be generated from an existingmodel.<BR>The resulting model is no longer properly normalized, since the same word string can potentially be represented with or without multiwords.<BR>The generation of multiword N-grams uses a heuristic algorithm that works well for bigrams and trigrams, but is not exhaustive.<H2> AUTHOR </H2>Andreas Stolcke <stolcke@speech.sri.com>.<BR>Copyright 2000-2004 SRI International</BODY></HTML>
?? 快捷鍵說明
復制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號
Ctrl + =
減小字號
Ctrl + -