multi-ngram(1)                                                  multi-ngram(1)

NAME
       multi-ngram - build multiword N-gram models

SYNOPSIS
       multi-ngram [-help] option ...

DESCRIPTION
       multi-ngram builds N-gram language models that contain multiwords,
       i.e., compound words formed by concatenating words from some
       previously given model. It can optionally generate multiword N-grams
       and insert them into an existing, reference N-gram model, so as to
       cover the multiwords occurring in a specified vocabulary. It then
       assigns probabilities to the multiword N-grams so that word strings
       containing multiwords have the same probabilities as the
       corresponding strings of component words in the reference model.

       Note that the inverse operation (expanding a multiword N-gram so that
       it contains only regular words) is subsumed by the -expand-classes
       function of ngram(1).

OPTIONS
       Each filename argument can be an ASCII file, or a compressed file
       (name ending in .Z or .gz), or "-" to indicate stdin/stdout.

       -help
              Print option summary.

       -version
              Print version information.

       -order n
              Set the maximal N-gram order to be used from the reference
              model. NOTE: The order of the model is not set automatically
              when a model file is read, so the same file can be used at
              various orders. To use models of order higher than 3 it is
              always necessary to specify this option.

       -multi-order n
              The maximal N-gram order in the multiword-based model.

       -debug level
              Set the debugging output level (0 means no debugging output).

       -vocab file
              Words to be added to the model. In particular, this should
              include all the multiwords to be added.

       -multi-char C
              Character used to delimit component words in multiwords (an
              underscore character by default).

       -lm file
              Reference N-gram model.

       -multi-lm file
              Model containing multiwords; the N-grams in this model will be
              assigned new probabilities based on the reference model.
              If this option is not given, the multiword model is generated
              by adding multiword N-grams to the reference model.

       -prune-unseen-ngrams
              Prevent the insertion of multiword N-grams whose component
              N-grams are not contained in the reference model. For example,
              for the multiword bigram "a_b c_d" to be inserted, a trigram
              reference model must contain the trigrams "a b c" and "b c d".
              If the reference model were a bigram LM, it would have to
              contain "a b", "b c", and "c d". This option is important for
              controlling the size of the multiword LM with large
              vocabularies.

       -write-lm file
              Output location of the generated multiword model.

SEE ALSO
       ngram(1), ngram-format(5).

BUGS
       This program is a hack for cases where the original training data is
       not available and a multiword model has to be generated from an
       existing model. The resulting model is no longer properly
       normalized, since the same word string can potentially be
       represented with or without multiwords.

       The generation of multiword N-grams uses a heuristic algorithm that
       works well for bigrams and trigrams, but is not exhaustive.

AUTHOR
       Andreas Stolcke <stolcke@speech.sri.com>.
       Copyright 2000-2004 SRI International

SRILM Tools              $Date: 2004/12/03 17:59:01 $              multi-ngram(1)
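The component check behind -prune-unseen-ngrams (see OPTIONS) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not SRILM's implementation. It assumes the required N-grams are all contiguous windows of the reference order over the expanded word string, or the whole expansion if that is shorter. This generalizes the "a_b c_d" example given for that option. The reference model is represented simply as a set of word tuples, and all function names are hypothetical.

```python
def split_multiword(token, multi_char="_"):
    """Split a (possibly multiword) token into its component words."""
    return token.split(multi_char)


def required_component_ngrams(multi_ngram, ref_order, multi_char="_"):
    """Return the reference N-grams a multiword N-gram is assumed to need.

    The multiword N-gram is expanded into component words; every contiguous
    window of ref_order words (or the whole expansion, if shorter) is listed.
    """
    words = [w for tok in multi_ngram for w in split_multiword(tok, multi_char)]
    n = min(ref_order, len(words))
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def keep_multiword_ngram(multi_ngram, ref_ngrams, ref_order, multi_char="_"):
    """True if all assumed component N-grams occur in the reference set."""
    return all(g in ref_ngrams
               for g in required_component_ngrams(multi_ngram, ref_order,
                                                  multi_char))


if __name__ == "__main__":
    # Reproduces the man page example for a trigram and a bigram reference LM.
    print(required_component_ngrams(["a_b", "c_d"], 3))
    print(required_component_ngrams(["a_b", "c_d"], 2))
```

With a trigram reference model this lists ("a", "b", "c") and ("b", "c", "d"). With a bigram model it lists the three bigrams named in the option description.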
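The probability assignment described under DESCRIPTION can be illustrated with a toy reference model. In this sketch, which is an assumption and not the tool's actual code, a multiword token's log probability is the sum of the reference log probabilities of its component words. Each component is conditioned on the expanded history plus the components before it. The reference model is a hypothetical dictionary of fully specified conditional probabilities, so the backoff weights a real ngram-format(5) model would need are ignored.

```python
import math


def ref_logprob(ref, order, history, word):
    """log10 P_ref(word | history) from a toy model without backoff.

    `ref` maps (context_tuple, word) -> probability, where the context is
    the last order-1 words of the history (hypothetical representation).
    """
    context = tuple(history[-(order - 1):]) if order > 1 else ()
    return math.log10(ref[(context, word)])


def multiword_logprob(ref, order, multi_history, multi_word, multi_char="_"):
    """log10 P_multi(multi_word | multi_history), assumed to equal the sum of
    the reference log probabilities of the multiword's components."""
    history = [w for tok in multi_history for w in tok.split(multi_char)]
    total = 0.0
    for w in multi_word.split(multi_char):
        total += ref_logprob(ref, order, history, w)
        history.append(w)  # later components see the earlier ones
    return total


if __name__ == "__main__":
    # Toy bigram reference model covering "<s> a b c".
    ref = {(("<s>",), "a"): 0.5, (("a",), "b"): 0.4, (("b",), "c"): 0.25}
    tokens = ["<s>", "a_b", "c"]
    total = sum(multiword_logprob(ref, 2, tokens[:i], tokens[i])
                for i in range(1, len(tokens)))
    print(total, math.log10(0.5 * 0.4 * 0.25))
```

Scoring "<s> a_b c" token by token gives the same total as "<s> a b c" under the reference model. That is the equal-probability property the DESCRIPTION states.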
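The normalization problem noted under BUGS becomes visible when you enumerate the ways a word string can be written with and without multiwords. The helper below is hypothetical. With the multiwords a_b and b_c, the string "a b c" has three representations. The multiword model gives each one the probability of the component string, so that string's probability mass is effectively counted three times.

```python
def segmentations(words, multiwords, multi_char="_"):
    """Return every way to write `words` (a list) as a token sequence drawn
    from single words and the given multiwords."""
    if not words:
        return [[]]
    results = []
    # Option 1: the first word stays a regular word.
    for rest in segmentations(words[1:], multiwords, multi_char):
        results.append([words[0]] + rest)
    # Option 2: a multiword covers the first few words.
    for mw in sorted(multiwords):
        parts = mw.split(multi_char)
        if len(parts) > 1 and words[:len(parts)] == parts:
            for rest in segmentations(words[len(parts):], multiwords, multi_char):
                results.append([mw] + rest)
    return results


if __name__ == "__main__":
    for seg in segmentations(["a", "b", "c"], {"a_b", "b_c"}):
        print(" ".join(seg))
```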