% hlmfund.tex -- Hidden Markov Model Toolkit (excerpt: page 1 of 3)
$\mathcal{W}$.  In general evaluating equation \ref{classnorm_totprob}
will lead to problematically small values, so logarithms can be used:
\begin{equation}
\log P_\mathrm{class}(\mathcal{W}) \;=\; \sum_{x, y \in \mathbb{W}}
C(x, y) . \log P_\mathrm{class}(x \;|\; y)
\label{classnorm_logprob}
\end{equation}
Given the definition of a class $n$-gram model in equation
\ref{normclass}, the maximum likelihood bigram probability estimate of
a word is:
\begin{eqnarray}
P_\mathrm{class}(w_i \;|\; w_{i-1}) & = & \frac{C(w_i)}{C(G(w_i))}
  \times \frac{C\left(G(w_i), G(w_{i-1})\right)}{C(G(w_{i-1}))}
\label{classnorm_breakdown}
\end{eqnarray}
where $C(w)$ is the number of times that the word `$w$' occurs in the
list $\mathcal{W}$ and $C(G(w))$ is the number of times that the class
$G(w)$ occurs in the list resulting from applying $G(.)$ to each entry
of $\mathcal{W}$;\footnote{That is,
$C(G(w))=\sum_{x:G(x)=G(w)}C(x)$.}  similarly $C(G(w_x), G(w_y))$ is
the count of the class pair `$G(w_y)$ $G(w_x)$' in that resultant
list.

Substituting equation \ref{classnorm_breakdown} into equation
\ref{classnorm_logprob} and then rearranging gives:
\begin{eqnarray}
\log P_\mathrm{class}(\mathcal{W}) & \;=\; &
\sum_{x,y \in \mathbb{W}} C(x,y) . \log\left(
  \frac{C(x)}{C(G(x))} \times \frac{C(G(x),G(y))}{C(G(y))}
  \right) \nonumber\\
&\;=\;& \sum_{x,y \in \mathbb{W}} C(x,y) . \log
  \left(\frac{C(x)}{C(G(x))}\right) \;+\; \sum_{x,y \in \mathbb{W}} C(x,y)
  . \log\left(\frac{C(G(x),G(y))}{C(G(y))}\right) \nonumber\\
&\;=\;& \sum_{x \in \mathbb{W}} C(x) . \log
  \left(\frac{C(x)}{C(G(x))}\right) \;+\; \sum_{g,h \in \mathbb{G}} C(g,h)
  . \log\left(\frac{C(g,h)}{C(h)}\right) \nonumber\\
&\;=\;& \sum_{x \in \mathbb{W}} C(x) . \log C(x)
  \;-\; \sum_{x \in \mathbb{W}} C(x) . \log C(G(x)) \nonumber\\
&&\;+\; \sum_{g,h \in \mathbb{G}} C(g,h) . \log C(g,h)
  \;-\; \sum_{g \in \mathbb{G}} C(g) . \log C(g) \nonumber\\
&\;=\;& \sum_{x \in \mathbb{W}} C(x) . \log C(x)
  \;+\; \sum_{g,h \in \mathbb{G}} C(g,h) . \log C(g,h) \nonumber\\
&&\;-\; 2 \sum_{g \in \mathbb{G}} C(g) . \log C(g)
\label{classnorm_ml}
\end{eqnarray}
where $(g,h)$ is some class sequence `$h$ $g$'.

Note that the first of these three terms in the final stage of
equation \ref{classnorm_ml}, ``$\sum_{x \in \mathbb{W}} C(x) \,.\,
\log C(x)$'', is independent of the class map function $G(.)$,
therefore it is not necessary to consider it when optimising $G(.)$.
The value a class map must seek to maximise,
$F_{\mathrm{M}_\mathrm{C}}$, can now be defined:
\begin{eqnarray}
F_{\mathrm{M}_\mathrm{C}} &\;=\;& \sum_{g,h \in \mathbb{G}} C(g,h) . \log C(g,h)
\;-\; 2 \sum_{g \in \mathbb{G}} C(g) . \log C(g)
\label{classnorm_Fml}
\end{eqnarray}
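
To make the quantity being optimised concrete, the following short
Python sketch evaluates $F_{\mathrm{M}_\mathrm{C}}$ for a candidate
class map from word unigram and bigram counts, following equation
\ref{classnorm_Fml} directly.  It is purely illustrative: the function
name \texttt{f\_mc} and the dictionary-based data structures are
assumptions of this example, not part of any toolkit.
\begin{verbatim}
from collections import Counter
from math import log

def f_mc(unigram_counts, bigram_counts, class_map):
    """F_MC = sum_{g,h} C(g,h).log C(g,h) - 2 sum_{g} C(g).log C(g)

    unigram_counts: dict mapping each word w to its count C(w)
    bigram_counts:  dict mapping (y, x), word y followed by word x,
                    to the count C(x, y)
    class_map:      dict mapping each word w to its class G(w)
    """
    class_uni = Counter()  # C(g): total count of words mapped to class g
    class_bi = Counter()   # C(g, h): count of class h followed by class g
    for word, count in unigram_counts.items():
        class_uni[class_map[word]] += count
    for (prev_word, word), count in bigram_counts.items():
        class_bi[(class_map[word], class_map[prev_word])] += count
    return (sum(c * log(c) for c in class_bi.values())
            - 2.0 * sum(c * log(c) for c in class_uni.values()))
\end{verbatim}
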
A fixed number of classes must be decided before running the
algorithm, which can now be formally defined:
\begin{center}
\framebox[13.5cm]{\parbox{12cm}{
\vspace{0.5cm}
\begin{enumerate}
\item {\bfseries Initialise}:\label{Cstepone} $\forall w \in
  \mathbb{W}:\; G(w) = 1$\\
  Set up the class map so that all words are in the first class and
  all other classes are empty ({\it or} initialise using some other
  scheme)
\item {\bfseries Iterate}: $\forall i \in \{1\ldots n\} \; \wedge \; \neg s$\\
  For a given number of iterations $1 \ldots n$ or until some stop
  criterion $s$ is fulfilled
  \begin{enumerate}
  \item {\bfseries Iterate}: $\forall w \in \mathbb{W}$\\
    For each word $w$ in the vocabulary
    \begin{enumerate}
    \label{Csteptwo}
    \item {\bfseries Iterate}: $\forall c \in \mathbb{G}$\\
      For each class $c$
      \begin{enumerate}
      \item {\bfseries Move} word $w$ to class $c$, remembering its
        previous class
      \item {\bfseries Calculate} the change in $F_{\mathrm{M}_\mathrm{C}}$
        for this move
      \item {\bfseries Move} word $w$ back to its previous class
      \end{enumerate}
    \item {\bfseries Move} word $w$ to the class which increased
      $F_{\mathrm{M}_\mathrm{C}}$ by the most, or do not move it if no
      move increased $F_{\mathrm{M}_\mathrm{C}}$
    \end{enumerate}
  \end{enumerate}
\end{enumerate}
\vspace{0.5cm}
}}
\end{center}
The initialisation scheme given here in step \ref{Cstepone} represents
a word unigram language model, making no assumptions about which words
should belong in which class.\footnote{Given this initialisation, the
first $(|\mathbb{G}|-1)$ moves will be to place each word into an
empty class, since the class map which maximises
$F_{\mathrm{M}_\mathrm{C}}$ is the one which places each word into a
singleton class.}  The algorithm is greedy and so can get stuck in a
local maximum; it is therefore not guaranteed to find the optimal
class map for the training text.  The algorithm is rarely run until
total convergence, however, and it is found in practice that an extra
iteration can compensate for even a deliberately poor choice of
initialisation.

The above algorithm requires the number of classes to be fixed before
running.  It should be noted that as the number of classes utilised
increases, so the overall likelihood of the training text will tend
towards that of the word model.\footnote{Which will be higher, given
maximum likelihood estimates.}  This is why the algorithm does not
itself modify the number of classes; otherwise it would
na\"{\i}vely converge on $|\mathbb{W}|$ classes.
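
A direct, if highly inefficient, transcription of the boxed algorithm
is sketched below, reusing the illustrative \texttt{f\_mc} function
above.  It recomputes $F_{\mathrm{M}_\mathrm{C}}$ from scratch for
every tentative move; a practical implementation would instead
evaluate only the change in $F_{\mathrm{M}_\mathrm{C}}$ caused by
moving a single word.
\begin{verbatim}
def cluster_words(vocabulary, unigram_counts, bigram_counts,
                  num_classes, iterations=2):
    """Greedy exchange clustering (illustrative sketch only)."""
    # Initialise: all words in class 1, classes 2..num_classes empty
    class_map = {w: 1 for w in vocabulary}
    for _ in range(iterations):
        for w in vocabulary:
            best_class = class_map[w]
            best_score = f_mc(unigram_counts, bigram_counts, class_map)
            for c in range(1, num_classes + 1):
                class_map[w] = c  # tentatively move w to class c
                score = f_mc(unigram_counts, bigram_counts, class_map)
                if score > best_score:
                    best_class, best_score = c, score
            class_map[w] = best_class  # keep the best move, if any
    return class_map
\end{verbatim}
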
\mysect{Robust model estimation}{HLMrobustestimates}
\label{robust_estimation}

Given a suitably large amount of training data, an extremely long
$n$-gram could be trained to give a very good model of language, as
per equation \ref{cond_prob_model} -- in practice, however, any actual
extant model must be an approximation.  Because it is an
approximation, it will be detrimental to include within the model
information which in fact was just noise introduced by the limits of
the bounded sample set used to train the model -- this information may
not accurately represent text not contained within the training
corpus.  In the same way, word sequences which were not observed in
the training text cannot be assumed to represent impossible sequences,
so some probability mass must be reserved for these.  The issue of how
to redistribute the probability mass, as assigned by the maximum
likelihood estimates derived from the raw statistics of a specific
corpus, into a sensible estimate of the real world is addressed by
various standard methods, all of which aim to create more robust
language models.

\subsection{Estimating probabilities}
\label{discounting_and_other_fun_things}

Language models seek to estimate the probability of each possible word
sequence event occurring.  In order to calculate maximum likelihood
estimates this set of events must be finite so that the language model
can ensure that the sum of the probabilities of all events is 1 given
some context.  In an $n$-gram model the combination of the finite
vocabulary and fixed length history limits the number of unique events
to $|\mathbb{W}|^n$.  For any $n>1$ it is highly unlikely that all
word sequence events will be encountered in the training corpora, and
many that do occur may only appear one or two times.  A language model
should not give any unseen event zero probability,\footnote{If it did
then from equation \ref{cond_prob_model} it follows that the
probability of any piece of text containing that event would also be
zero, giving infinite perplexity.} but without an infinite quantity of
training text it is almost certain that there will be events it does
not encounter during training, so various mechanisms have been
developed to redistribute probability within the model such that these
unseen events are given some non-zero probability.

As in equation \ref{ngramcountdiv}, the maximum likelihood estimate of
the probability of an event $\mathcal{A}$ occurring is defined by the
number of times that event is observed, $a$, and the total number of
observed samples in the training set, $A$, where $P(\mathcal{A}) =
\frac{a}{A}$.  With this definition, events that do not occur in the
training data are assigned zero probability since it will be the case
that $a=0$.  [Katz 1987]\footnote{S.M. Katz, \textbf{``Estimation of
Probabilities from Sparse Data for the Language Model Component of a
Speech Recognizer''}; \textit{IEEE Transactions on Acoustics, Speech
and Signal Processing} 1987, vol. 35 no. 3 pp. 400-401} suggests
multiplying each observed count by a discount coefficient, $d_a$,
which is dependent upon the number of times the event is observed,
$a$, such that $a' = d_a \,.\, a$.  Using this discounted occurrence
count, the probability of an event that occurs $a$ times now becomes
$P_\mathrm{discount}(\mathcal{A}) = \frac{a'}{A}$.
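
As a small illustration of the discounted estimate (a sketch only: the
helper names and the dictionary of coefficients are assumptions of
this example, with the coefficients themselves coming from one of the
schemes described below, which also use the counts-of-counts statistic
computed here):
\begin{verbatim}
from collections import Counter

def counts_of_counts(event_counts):
    """c_a: how many distinct events were observed exactly a times."""
    return Counter(event_counts.values())

def discounted_probability(a, A, d):
    """P_discount = d_a . a / A for an event observed a times out of A
    samples; d maps a count a to its coefficient d_a, and counts
    absent from d are left undiscounted."""
    return d.get(a, 1.0) * a / A if a > 0 else 0.0
\end{verbatim}
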
Different discounting schemes have been proposed that define the
discount coefficient, $d_a$, in specific ways.  The same discount
coefficient is used for all events that occur the same number of
times, on the basis of the symmetry requirement that two events that
occur with equal frequency, $a$, must have the same probability,
$p_a$.  Defining $c_a$ as the number of events that occur exactly $a$
times, such that $A = \sum_{a\ge 1} a\,.\,c_a$, it follows that the
total amount of reserved mass, left over for distribution amongst the
unseen events, is
\begin{equation}
1 \;-\; \frac{1}{A} \sum_{a\ge 1} d_a\,.\,a\,.\,c_a
\end{equation}

\subsubsection{Discounting}

In [Good 1953]\footnote{I.J. Good, \textbf{``The Population
Frequencies of Species and the Estimation of Population Parameters''};
\textit{Biometrika} 1953, vol. 40 (3,4) pp. 237-264} a method of
discounting maximum likelihood estimates was proposed whereby the
count of an event occurring $a$ times is discounted with
\begin{equation}
d_a = (a+1) \frac{c_{a+1}}{a\,.\,c_a}
\end{equation}
A problem with this scheme, referred to as {\it Good-Turing}
discounting, is that due to the count in the denominator it will fail
if there is a case where $c_a = 0$ whilst there is some count
$c_b > 0$ for $b>a$.  Inevitably as $a$ increases the count $c_a$ will
tend towards zero and for high $a$ there are likely to be many such
zero counts.  A solution to this problem was proposed in [Katz 1987],
which defines a cut-off value $k$ such that counts $a > k$ are not
discounted\footnote{It is suggested that ``$k=5$ or so is a good
choice''.} -- this is justified by considering these more frequently
observed counts as reliable and therefore not needing to be
discounted.  Katz then describes a revised discount equation which
preserves the same amount of mass for the unseen events:
\begin{equation}
d_a = \left\{ \begin{array}{c@{\quad:\quad}l}
  \frac{(a+1) \frac{c_{a+1}}{a\,.\,c_a} \;-\; (k+1)\frac{c_{k+1}}{c_1}}
       {1 - (k+1)\frac{c_{k+1}}{c_1}}  & 1 \le a \le k\\
  1 & a>k
\end{array}\right.
\end{equation}
This method is itself unstable, however -- for example if $k.c_k >
c_1$ then $d_a$ will be negative for $1 \le a \le k$.

\subsubsection{Absolute discounting}

An alternative discounting method is {\it absolute}
discounting,\footnote{H. Ney, U. Essen and R. Kneser, \textbf{``On
Structuring Probabilistic Dependences in Stochastic Language
Modelling''}; \textit{Computer Speech and Language} 1994, vol. 8
no. 1 pp. 1-38} in which a constant value $m$ is subtracted from each
count.  The effect of this is that the events with the lowest counts
are discounted relatively more than those with higher counts.  The
discount coefficient is defined as
\begin{equation}
d_a = \frac{a-m}{a}
\end{equation}
In order to discount the same amount of probability mass as the
Good-Turing estimate, $m$ must be set to:
\begin{equation}
m = \frac{c_1}{\sum_{a\ge 1}c_a}
\end{equation}
%\subsubsection{Linear discounting}
%In {\it linear} discounting, event counts are discounted in
%proportion to their magnitude, thus $d_a$ is constant over all values
%of $a$.  In order to discount the same quantity of probability mass
%as the Good-Turing discounting scheme, $d_a$ must be defined as
%\begin{equation}
%d_a = 1 \,-\, \frac{c_1}{A}
%\end{equation}
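
The two discounting schemes above can be sketched as follows.  This is
again illustrative only: the handling of empty count-of-count cells
and the choice to return a plain dictionary of coefficients indexed by
$a$ are assumptions of this example rather than part of any particular
implementation.
\begin{verbatim}
def katz_good_turing_coefficients(c, k=5):
    """Good-Turing discounts with Katz's cut-off: counts a > k are not
    discounted.  c maps each count a >= 1 to c_a (counts of counts);
    this sketch assumes at least one singleton event, i.e. c_1 > 0."""
    ratio = (k + 1) * c.get(k + 1, 0) / c[1]
    d = {}
    for a in range(1, k + 1):
        if c.get(a, 0) == 0 or c.get(a + 1, 0) == 0:
            d[a] = 1.0  # no usable estimate here; leave the count alone
        else:
            gt = (a + 1) * c[a + 1] / (a * c[a])  # plain Good-Turing d_a
            d[a] = (gt - ratio) / (1.0 - ratio)   # Katz's revised d_a
    return d

def absolute_discount_coefficients(c):
    """Absolute discounting: d_a = (a - m) / a, with m chosen so that
    the reserved mass matches the Good-Turing scheme."""
    m = c[1] / sum(c.values())
    return {a: (a - m) / a for a in c}
\end{verbatim}
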
\mysubsect{Smoothing probabilities}{HLMsmoothingprobs}
\label{smoothing_probs}

The above discounting schemes present various methods of
redistributing probability mass from observed events to unseen
events.  Additionally, if events are infrequently observed then they
can be smoothed with less precise but more frequently observed
events.  In [Katz 1987] a {\it back off} scheme is proposed and used
alongside Good-Turing discounting.  In this method probabilities are
redistributed via the recursive utilisation of lower level conditional
distributions.  Given the $n$-gram case, if the $n$-tuple is not
observed frequently enough in the training text then a probability
based on the occurrence count of a shorter-context $(n-1)$-tuple is
used instead -- using the shorter context estimate is referred to as
{\it backing off}.  In practice probabilities are typically considered
badly-estimated if their corresponding word sequences are not
explicitly stored in the language model, either because they did not
occur in the training text or because they have been discarded using
some pruning mechanism.  Katz defines a function
$\hat{\beta}(w_{i-n+1},\ldots w_{i-1})$ which represents the total
probability of all the unseen events in a particular context.
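
The remainder of the back-off construction lies beyond this excerpt,
but assuming the usual Katz formulation -- in which
$\hat{\beta}(\cdot)$ for a context is divided by the total lower-level
probability of the words unseen in that context to give a back-off
weight -- a bigram version of the lookup can be sketched as follows
(all names here are illustrative):
\begin{verbatim}
def backoff_bigram_probability(word, prev_word, discounted_bigram_probs,
                               unigram_probs, backoff_weight):
    """Katz back-off sketch for a bigram model.

    discounted_bigram_probs: (prev_word, word) -> discounted probability,
        for word pairs explicitly stored in the model
    unigram_probs:  word -> unigram probability
    backoff_weight: prev_word -> beta-hat(prev_word) divided by the total
        unigram probability of the words unseen after prev_word
    """
    if (prev_word, word) in discounted_bigram_probs:
        return discounted_bigram_probs[(prev_word, word)]
    return backoff_weight[prev_word] * unigram_probs[word]
\end{verbatim}
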
