%% !HVER!hlmfund [SJY 05/04/97]
%% Updated (and about 90% rewritten) - Gareth Moore 16/01/02 - 27/03/02
%
\mychap{Fundamentals of language modelling}{hlmfund}

The \HTK\ language modelling tools are designed for constructing and
testing statistical $n$-gram language models.  This chapter introduces
language modelling and provides an overview of the supplied tools.  It
is strongly recommended that you read this chapter and then work
through the tutorial in the following chapter -- this will provide you
with everything you need to know to get started building language
models.

\sidepic{HLMoperation}{80}{
An $n$-gram is a sequence of $n$ symbols (e.g.\ words, syntactic
categories, etc.) and an $n$-gram language model (LM) is used to
predict each symbol in the sequence given its $n-1$ predecessors.  It
is built on the assumption that the probability of a specific $n$-gram
occurring in some unknown test text can be estimated from the
frequency of its occurrence in some given training text.  Thus, as
illustrated by the picture above, $n$-gram construction is a
three-stage process.  Firstly, the training text is scanned and its
$n$-grams are counted and stored in a database of \textit{gram} files.
In the second stage some words may be mapped to an out-of-vocabulary
class or some other class mapping may be applied, and then in the
final stage the counts in the resulting gram files are used to compute
$n$-gram probabilities which are stored in the \textit{language model}
file.  Lastly, the \textit{goodness} of a language model can be
estimated by using it to compute a measure called \textit{perplexity}
on a previously unseen test set.  In general, the better a language
model, the lower its test-set perplexity.
}

Although the basic principle of an $n$-gram LM is very simple, in
practice there are usually many more potential $n$-grams than can ever
be collected in a training text in sufficient numbers to yield robust
frequency estimates.  Furthermore, for any real application such as
speech recognition, the use of an essentially static and finite
training text makes it difficult to generate a single LM which is
well-matched to varying test material.  For example, an LM trained on
newspaper text would be a good predictor for dictating news reports,
but the same LM would be a poor predictor for personal letters or a
spoken interface to a flight reservation system.  A final difficulty
is that the \textit{vocabulary} of an $n$-gram LM is finite and fixed
at construction time.  Thus, if the LM is word-based, it can only
predict words within its vocabulary, and furthermore new words cannot
be added without rebuilding the LM.

The following four sections provide a thorough introduction to the
theory behind $n$-gram models.  It is well worth reading through them
because they will provide you with at least a basic understanding of
what many of the tools and their parameters actually do -- you can
safely skip the equations if you choose, because the text explains all
the most important parts in plain English.  The final section of this
chapter then introduces the tools provided to implement the various
aspects of $n$-gram language modelling that have been described.
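As a concrete, if deliberately tiny, illustration of this count,
estimate and evaluate cycle, the Python sketch below builds an
unsmoothed bigram model from a few words of training text and then
measures perplexity on a short test text.  It stands in for no
particular \HTK\ tool (the class-mapping stage is skipped entirely,
and without smoothing every test bigram must also appear in the
training data); it is intended purely to make the three stages and the
perplexity measure tangible.
\begin{verbatim}
# Toy bigram model: count, estimate, evaluate.  Not an HTK tool --
# just an unsmoothed illustration of the three-stage process.
import math
from collections import Counter

train = "the cat sat on the mat the dog sat on the cat".split()
test  = "the dog sat on the mat".split()

# Stage 1: count bigrams and their single-word histories.
bigrams   = Counter(zip(train, train[1:]))
histories = Counter(train[:-1])

# Stage 3 (stage 2, class mapping, is skipped here):
# maximum likelihood estimate P(w | h) = C(h, w) / C(h).
def prob(h, w):
    return bigrams[(h, w)] / histories[h]

# Evaluation: perplexity of the test text under the model.
logprob = sum(math.log(prob(h, w)) for h, w in zip(test, test[1:]))
print("perplexity: %.2f" % math.exp(-logprob / (len(test) - 1)))
\end{verbatim}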
\mysect{{\it n}-gram language models}{ngramLMs}

Language models estimate the probability of a word sequence, $\hat
P(w_1, w_2, \ldots, w_m)$ -- that is, they evaluate $P(w_i)$ as
defined in equation \ref{e:3} in chapter
\ref{c:fundaments}.\footnote{The theory components of this chapter --
these first four sections -- are condensed from portions of
\textbf{``Adaptive Statistical Class-based Language Modelling''},
G.L. Moore; \textit{Ph.D. thesis, Cambridge University}, 2001.}

The probability $\hat P(w_1, w_2, \ldots, w_m)$ can be decomposed as a
product of conditional probabilities:
\begin{equation}
\hat P(w_1, w_2, \ldots, w_m) = \prod_{i=1}^{m} \hat P(w_i \;|\; w_1,
\ldots, w_{i-1})
\label{cond_prob_model}
\end{equation}

\mysubsect{Word {\it n}-gram models}{wordngrams}

Equation \ref{cond_prob_model} presents an opportunity for
approximating $\hat{P}(\mathcal{W})$ by limiting the context:
\begin{equation}
\hat P(w_1, w_2, \ldots, w_m) \simeq \prod_{i=1}^{m} \hat P(w_i \;|\;
w_{i-n+1}, \ldots, w_{i-1})
\label{ngram_model}
\end{equation}
for some $n \geqslant 1$.  If language is assumed to be ergodic --
that is, it has the property that the probability of any state can be
estimated from a large enough history independent of the starting
conditions\footnote{See section 5 of [Shannon 1948] for a more formal
definition of ergodicity.} -- then for sufficiently high $n$ equation
\ref{ngram_model} is exact.  Because of data sparsity, however, values
of $n$ in the range of 1 to 4 inclusive are typically used, and there
are also practicalities of storage space for these estimates to
consider.  Models using contiguous but limited context in this way are
usually referred to as $n$-gram language models, and the conditional
context component of the probability (``$w_{i-n+1}, \ldots, w_{i-1}$''
in equation \ref{ngram_model}) is referred to as the {\it history}.

Estimates of probabilities in $n$-gram models are commonly based on
maximum likelihood estimates -- that is, by counting events in context
in some given training text:
\begin{equation}
\hat P(w_i \;|\; w_{i-n+1}, \ldots, w_{i-1}) =
\frac{C(w_{i-n+1}, \ldots, w_i)}{C(w_{i-n+1}, \ldots, w_{i-1})}
\label{ngramcountdiv}
\end{equation}
where $C(.)$ is the count of a given word sequence in the training
text.  Refinements to this maximum likelihood estimate are described
later in this chapter.

The choice of $n$ has a significant effect on the number of potential
parameters the model can have, which is bounded above by
$|\mathbb{W}|^n$, where $\mathbb{W}$ is the set of words in the
language model, also known as the {\it vocabulary}.  A 4-gram model
with a typically-sized 65,000-word vocabulary can therefore
potentially have $65{,}000^4 \simeq 1.8\times10^{19}$ parameters.  In
practice, however, only a small subset of the possible parameter
combinations represent likely word sequences, so the storage
requirement is far less than this theoretical maximum -- of the order
of $10^{11}$ times less, in fact.\footnote{Based on the analysis of
170 million words of newspaper and broadcast news text.}  Even given
this significant reduction in coverage and a very large training
text\footnote{A couple of hundred million words, for example.} there
are still many plausible word sequences which will not be encountered
in the training text, or will not be found a statistically significant
number of times.  It would not be sensible to assign all unseen
sequences zero probability, so methods of coping with low- and
zero-occurrence word tuples have been developed; these are discussed
later in section \ref{robust_estimation}.
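The scale of this sparsity problem is easy to observe directly.  The
short Python sketch below, which is not part of \HTK\ and assumes two
hypothetical whitespace-separated text files \texttt{train.txt} and
\texttt{test.txt}, counts what fraction of the $n$-grams in a held-out
text never occur in the training text for each order $n$; this
fraction typically grows very rapidly with $n$.
\begin{verbatim}
# Measure how n-gram coverage collapses as n grows.  'train.txt' and
# 'test.txt' are placeholder names for any two word corpora.
def ngrams(words, n):
    return zip(*(words[i:] for i in range(n)))

train = open("train.txt").read().split()
test  = open("test.txt").read().split()

for n in range(1, 5):
    seen   = set(ngrams(train, n))
    grams  = list(ngrams(test, n))
    missed = sum(1 for g in grams if g not in seen)
    print("%d-grams of the test text unseen in training: %5.1f%%"
          % (n, 100.0 * missed / max(1, len(grams))))
\end{verbatim}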
It is not only the storage space that must be considered, however --
it is also necessary to be able to attach a reasonable degree of
confidence to the derived estimates.  Suitably large quantities of
example training text are therefore also required to ensure
statistical significance.  Increasing the amount of training text
gives greater confidence in the model estimates, but it also demands
more storage space and longer analysis periods when estimating model
parameters, which may place feasibility limits on how much data can be
used in constructing the final model or how thoroughly it can be
analysed.  At the other end of the scale, for restricted-domain models
there may be only a limited quantity of suitable in-domain text
available, so local estimates may need smoothing with global priors.

In addition, if language models are to be used for speech recognition
then it is good to train them on {\it precise} acoustic transcriptions
where possible -- that is, text which features the hesitations,
repetitions, word fragments, mistakes and all the other sources of
deviation from purely grammatical language that characterise everyday
speech.  However, such acoustically accurate transcriptions are in
limited supply since they must be specifically prepared; real-world
transcripts produced for other purposes almost ubiquitously correct
any disfluencies or mistakes made by speakers.

\mysubsect{Equivalence classes}{HLMequivalenceclasses}

The word $n$-gram model described in equation \ref{ngram_model} uses
an equivalence mapping on the word history which assumes that all
contexts sharing the same most recent $n-1$ words give rise to the
same probability.  This concept can be expressed more generally by
defining an equivalence class function that acts on word histories,
$\mathcal E(.)$, such that if $\mathcal E(x) = \mathcal E(y)$ then
$\forall w:\; P(w \;|\; x) = P(w \;|\; y)$:
\begin{equation}
P(w_i \;|\; w_1, w_2, \ldots, w_{i-1}) =
P(w_i \;|\; \mathcal E(w_1, w_2, \ldots, w_{i-1}))
\label{equiv_cond_prob_model}
\end{equation}
A definition of $\mathcal{E}$ that describes a word $n$-gram is thus:
\begin{equation}
\mathcal E_{\textrm{word-{\it n}-gram}}(w_1, \ldots, w_{i}) =
\mathcal E(w_{i-n+1}, \ldots, w_{i})
\end{equation}

In a good language model the choice of $\mathcal{E}$ should be such
that it provides a reliable predictor of the next word, results in
classes which occur frequently enough in the training text that they
can be well modelled, and does not produce so many distinct history
equivalence classes that it is infeasible to store or analyse all the
resultant separate probabilities.
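To make the word $n$-gram instance of $\mathcal E$ concrete, the small
sketch below (a hypothetical helper, not anything supplied with \HTK)
maps an arbitrary history onto the tuple of its last $n-1$ words, so
that two superficially different histories ending in the same $n-1$
words fall into a single equivalence class, in the spirit of the word
$n$-gram definition of $\mathcal E$ given above.
\begin{verbatim}
# Word n-gram history equivalence: keep only the last n-1 words.
def word_ngram_equiv(history, n):
    return tuple(history[-(n - 1):]) if n > 1 else ()

h1 = "the cat sat on the".split()
h2 = "my dog lay on the".split()
# Both histories end in "on the", so a trigram model treats them
# identically when predicting the next word.
print(word_ngram_equiv(h1, 3))                             # ('on', 'the')
print(word_ngram_equiv(h1, 3) == word_ngram_equiv(h2, 3))  # True
\end{verbatim}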
\mysubsect{Class {\it n}-gram models}{HLMclassngram}
\label{classngram-description}

One method of reducing the number of word history equivalence classes
to be modelled in the $n$-gram case is to consider some words as
equivalent.  This can be implemented by mapping a set of words to a
word class $g \in \mathbb{G}$ using a classification function
$G(w) = g$.  If any class contains more than one word then this
mapping results in fewer distinct word classes than there are words,
$|\mathbb{G}| < |\mathbb{W}|$, thus reducing the number of separate
contexts that must be considered.  The history equivalence classes can
then be described in terms of a sequence of these word classes:
\begin{equation}
\mathcal E_{\textrm{class-{\it n}-gram}}(w_1, \ldots, w_{i}) =
\mathcal E(G(w_{i-n+1}), \ldots, G(w_{i}))
\label{equiv_classes}
\end{equation}

A deterministic word-to-class mapping like this has some advantages
over a word $n$-gram model, since the reduction in the number of
distinct histories reduces the storage space and training data
requirements whilst improving the robustness of the probability
estimates for a given quantity of training data.  Because multiple
words can be mapped to the same class, the model can make more
confident predictions for infrequent words in a class on the basis of
other, more frequent words in the same class\footnote{Since it is
assumed that words are placed in the same class because they share
certain properties.} than is possible in the word $n$-gram case --
and, for the same reason, it can make generalising assumptions about
words used in contexts which are not explicitly encountered in the
training text.  These gains, however, clearly come at the cost of a
reduced ability to distinguish between different histories, although
this may be offset by the ability to choose a higher value of $n$.

The most commonly used form of class $n$-gram model uses a single
classification function, $G(.)$, as in equation \ref{equiv_classes},
which is applied to each word in the $n$-gram, including the word
which is being predicted.  Considering for clarity the
bigram\footnote{By convention {\it unigram} refers to a 1-gram, {\it
bigram} indicates a 2-gram and {\it trigram} is a 3-gram.  There is no
standard term for a 4-gram.} case, then given $G(.)$ the language
model has the terms $w_i$, $w_{i-1}$, $G(w_i)$ and $G(w_{i-1})$
available to it.  The probability estimate can be decomposed as
follows:
\begin{eqnarray}
P_{\textrm{class'}}(w_i \;|\; w_{i-1})
  & = & P(w_i \;|\; G(w_i), G(w_{i-1}), w_{i-1}) \nonumber\\
  & \times & P(G(w_i) \;|\; G(w_{i-1}), w_{i-1})
\label{chap2equalToOne}
\end{eqnarray}
It is assumed that $P(w_i \;|\; G(w_i), G(w_{i-1}), w_{i-1})$ is
independent of $G(w_{i-1})$ and $w_{i-1}$, and that
$P(G(w_i) \;|\; G(w_{i-1}), w_{i-1})$ is independent of $w_{i-1}$,
resulting in the model:
\begin{equation}
P_{\textrm{class}}(w_i \;|\; w_{i-1}) = P(w_i \;|\; G(w_{i}))
\;\times\; P(G(w_i) \;|\; G(w_{i-1}))
\label{normclass}
\end{equation}

Almost all reported class $n$-gram work using statistically-found
classes is based on clustering algorithms which optimise $G(.)$ on the
basis of bigram training-set likelihood, even if the class map is to
be used with longer-context models.  It is interesting to note that
this approximation appears to work well, however, suggesting that the
class maps found are in some respects ``general'' and capture features
of natural language which apply irrespective of the context length
used when finding them.
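The decomposition in equation \ref{normclass} is straightforward to
realise once a class map is available.  The sketch below is again a
toy illustration rather than anything taken from \HTK: a small
hand-written class map stands in for one found automatically, unknown
words are placed in singleton classes, and the two factors
$P(w_i\,|\,G(w_i))$ and $P(G(w_i)\,|\,G(w_{i-1}))$ are estimated
directly from unsmoothed counts before being multiplied together.
\begin{verbatim}
# Class bigram model P(w | w') = P(w | G(w)) * P(G(w) | G(w')),
# with a small hand-made class map in place of a clustered one.
from collections import Counter

G = {"monday": "DAY", "tuesday": "DAY", "cat": "ANIMAL",
     "dog": "ANIMAL", "sat": "VERB", "slept": "VERB"}

train = "the cat sat monday the dog slept tuesday the cat slept".split()
G = {**{w: w.upper() for w in train}, **G}  # unknown words: own class

word_count  = Counter(train)                           # C(w)
class_count = Counter(G[w] for w in train)             # C(g)
class_bi    = Counter((G[v], G[w]) for v, w in zip(train, train[1:]))

def p_class_bigram(w, prev):
    g, gprev = G[w], G[prev]
    p_w_given_g = word_count[w] / class_count[g]       # P(w | G(w))
    p_g_given_g = class_bi[(gprev, g)] / class_count[gprev]
    return p_w_given_g * p_g_given_g                   # class bigram model

print(p_class_bigram("dog", "the"))       # "dog" following "the"
print(p_class_bigram("tuesday", "slept"))
\end{verbatim}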
\mysect{Statistically-derived Class Maps}{HLMclustering}
\label{clustering_section}

An obvious question that arises is how to compute or otherwise obtain
a class map for use in a language model.  This section discusses one
strategy which has been used successfully.

Methods of statistical class map construction seek to maximise the
likelihood of the training text given the class model by making
iterative controlled changes to an initial class map -- in order to
make this problem more computationally feasible they typically use a
deterministic map.

\mysubsect{Word exchange algorithm}{HLMexchangealg}
\label{KN-clustering}

[Kneser and Ney 1993]\footnote{R. Kneser and H. Ney,
\textbf{``Improved Clustering Techniques for Class-Based Statistical
Language Modelling''}; \textit{Proceedings of the European Conference
on Speech Communication and Technology} 1993, pp.\ 973--976} describes
an algorithm to build a class map by starting from some initial guess
at a solution and then iteratively searching for changes to improve
the existing class map.  This is repeated until some minimum change
threshold has been reached or a chosen number of iterations have been
performed.  The initial guess at a class map is typically chosen by a
simple method such as randomly distributing words amongst classes, or
placing all words in the first class except for the most frequent
words, which are put into singleton classes.  Potential moves are then
evaluated, and those which most increase the likelihood of the
training text are applied to the class map.  The algorithm is
described in detail below, and is implemented in the \HTK\ tool
\htool{Cluster}.

Let $\mathcal{W}$ be the training text list of words $(w_1, w_2, w_3,
\ldots)$ and let $\mathbb{W}$ be the set of all words in
$\mathcal{W}$.  From equation \ref{cond_prob_model} it follows that:
\begin{equation}
P_\mathrm{class}(\mathcal{W}) \;=\; \prod_{x, y \in \mathbb{W}}
P_\mathrm{class}(x \;|\; y)^{C(x,y)}
\label{classnorm_totprob}
\end{equation}
where $(x, y)$ is some word pair `$x$' preceded by `$y$' and $C(x, y)$
is the number of times that the word pair `$y$ $x$' occurs in the list
