<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<!-- saved from url=(0067)http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes.html -->
<HTML><HEAD><TITLE>Naive Bayes algorithm for learning to classify text</TITLE>
<META http-equiv=Content-Type content="text/html; charset=gb2312"><!-- Changed by: Jason Rennie, 2-Feb-1997 -->
<META content="MSHTML 6.00.2600.0" name=GENERATOR></HEAD>
<BODY aLink=#5e5a80 bgColor=#eff7ff>
<H1>Naive Bayes algorithm for learning to classify text </H1>
<H3>Companion to Chapter 6 of <A
href="http://www.cs.cmu.edu/~tom/mlbook.html"><I>Machine Learning</I></A>
textbook. </H3>Naive Bayes classifiers are among the most successful known
algorithms for learning to classify text documents. This page provides an
implementation of the Naive Bayes learning algorithm similar to that described
in Table 6.2 of the textbook. It also provides a dataset containing 20,000
newsgroup messages drawn from the 20 newsgroups described in Table 6.3. As
mentioned in the textbook, the dataset contains 1000 documents from each of the
20 newsgroups.
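<P>As a rough illustration of the kind of algorithm Table 6.2 describes, the
following Python sketch estimates class priors and add-one-smoothed word
probabilities from labeled word lists, then classifies a new document by summing
log probabilities over the words it shares with the vocabulary. It is not the
Rainbow/Libbow C code offered on this page. The function names, the toy
documents, and the two newsgroup labels are assumptions chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def learn_naive_bayes_text(examples):
    """Estimate log priors and smoothed log word probabilities.

    examples: list of (list_of_words, label) pairs (hypothetical input format).
    """
    vocabulary = set()
    for words, _ in examples:
        vocabulary.update(words)

    docs_per_class = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for words, label in examples:
        word_counts[label].update(words)

    log_prior = {}
    log_likelihood = {}
    for label, n_docs in docs_per_class.items():
        log_prior[label] = math.log(n_docs / len(examples))
        # n: total number of word positions in all documents of this class.
        n = sum(word_counts[label].values())
        denom = n + len(vocabulary)
        # Add-one estimate: (n_k + 1) / (n + |Vocabulary|).
        log_likelihood[label] = {
            w: math.log((word_counts[label][w] + 1) / denom) for w in vocabulary
        }
    return vocabulary, log_prior, log_likelihood


def classify_naive_bayes_text(model, words):
    """Return the most probable label; words outside the vocabulary are skipped."""
    vocabulary, log_prior, log_likelihood = model
    known = [w for w in words if w in vocabulary]
    scores = {
        label: log_prior[label] + sum(log_likelihood[label][w] for w in known)
        for label in log_prior
    }
    return max(scores, key=scores.get)


# Toy data, invented for this example only.
toy_examples = [
    ("the goalie stopped the puck".split(), "rec.sport.hockey"),
    ("the team won the hockey game".split(), "rec.sport.hockey"),
    ("the shuttle reached orbit".split(), "sci.space"),
    ("nasa launched the rocket into orbit".split(), "sci.space"),
]
model = learn_naive_bayes_text(toy_examples)
prediction = classify_naive_bayes_text(model, "the rocket is in orbit".split())
print(prediction)
```

<P>Working in log space avoids numerical underflow when a document contains many
words, since the product of many small probabilities becomes a sum.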
<P>
<H3>Note on downloading </H3>This code and data are supported only under the
Unix and Linux operating systems. (If you would like to volunteer support for
Windows, please contact me.) To reconstruct the original files from a downloaded
file such as xxx.tar.gz, type the following two commands at the Unix prompt:
<P><I>gunzip xxx.tar.gz <BR>tar -xf xxx.tar </I>
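<P>For readers who prefer to script this step, a minimal Python sketch using the
standard <I>tarfile</I> module can decompress and unpack the archive in one
pass. The function name and its parameters are hypothetical, and the sketch
assumes the archive comes from a source you trust.

```python
import tarfile


def extract_archive(archive_path, dest_dir="."):
    """Decompress and unpack a .tar.gz archive into dest_dir."""
    # Mode "r:gz" reads a gzip-compressed tar file, combining the
    # gunzip and tar -xf steps.
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(dest_dir)
```

<P>For example, <I>extract_archive("xxx.tar.gz")</I> unpacks the archive into
the current directory.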
<P>
<H3>Code</H3>This code is based on the Rainbow/Libbow software package developed
by Andrew McCallum. It includes efficient C code for indexing text documents
along with code implementing the Naive Bayes learning algorithm. Libbow also
provides implementations of two additional text learning algorithms: TFIDF and
prTFIDF. This code may be used either as a building block for creating other
programs or as a stand-alone learning/classification system.
<P>Note: this code is a minor variant of the code described in Table 6.2 of <A
href="http://www.cs.cmu.edu/~tom/mlbook.html"><I>Machine Learning</I></A>.
<UL>
<LI><A href="http://www.cs.cmu.edu/~mccallum/bow">Most recent Libbow source
code and documentation</A>
<LI><A
href="http://www.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes/bow-latest.tar.gz">Old
Libbow source code and documentation (tarred and gzipped)</A> </LI></UL>
<P>
<H3>Newsgroup Data</H3><!--One of the datasets which has been used to evaluate textlearning algorithms --><!-- -->
<UL>
<LI>The <A
href="http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes/20_newsgroups.tar.gz">tarred
and gzipped data directory </A>(easiest for downloading).
<LI>A <A
href="http://www-2.cs.cmu.edu/afs/cs/project/theo-11/www/naive-bayes/mini_newsgroups.tar.gz">tarred
and gzipped</A> subset of the Newsgroup data which contains 100 randomly
selected messages from each newsgroup. This is a useful dataset for learning
to use Rainbow. </LI></UL>
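<P>Once unpacked, the data can be read into labeled word lists with a few lines
of Python. The sketch below assumes the data directory holds one subdirectory
per newsgroup with one file per message; that layout, the function name, and the
simple lowercase tokenizer are assumptions, not something this page specifies.
Message headers are not stripped, so header words are counted along with the
body text.

```python
import os
import re


def load_newsgroup_examples(root_dir):
    """Return (list_of_words, newsgroup_name) pairs for every message file.

    Assumes root_dir/NEWSGROUP/MESSAGE_FILE (hypothetical layout).
    """
    examples = []
    for group in sorted(os.listdir(root_dir)):
        group_dir = os.path.join(root_dir, group)
        if not os.path.isdir(group_dir):
            continue
        for name in sorted(os.listdir(group_dir)):
            path = os.path.join(group_dir, name)
            # latin-1 decodes any byte sequence, so odd characters in old
            # messages do not raise an error.
            with open(path, encoding="latin-1") as handle:
                words = re.findall(r"[a-z]+", handle.read().lower())
            examples.append((words, group))
    return examples
```

<P>The resulting pairs are in a form suitable for training a text classifier;
starting with the smaller subset keeps experiments fast.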
<P>
<H3>On-Line Documentation</H3>
<UL>
<LI><A href="http://www.cs.cmu.edu/~mccallum/bow/rainbow">Rainbow
Documentation</A> <!-- <LI><A HREF="/afs/cs/project/theo-11/www/naive-bayes/quick_intro.html">Quick 'n Dirty Intro to Rainbow</A> --></LI></UL>
<P><I>Visitors from outside CMU are invited to use this material free of charge
for any educational purpose, provided attribution is given in any lectures or
publications that make use of this material. </I>
<P><I>This page organized by Jason Rennie. </I><BR><IMG
src="Naive Bayes algorithm for learning to classify text.files/colorsep.gif">
<CENTER><I><A href="mailto:jr6b@cs.cmu.edu">jr6b@cs.cmu.edu</A> | Last updated
4/6/97 </I></CENTER></BODY></HTML>