<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<!-- ===================================================================
  File    : dtree.html
  Contents: Description of decision and regression tree programs
  Author  : Christian Borgelt
==================================================================== -->
<html>
<head>
<title>Decision and Regression Trees</title>
</head>

<!-- =============================================================== -->

<body bgcolor=white>
<h1><a name="top">Decision and Regression Trees</a></h1>
<h3>(A Brief Documentation of the Programs dti / dtp / dtx / dtr)</h3>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>

<h3>Contents</h3>

<ul type=disc>
<li><a href="#intro">Introduction</a></li>
<li><a href="#domains">Determining Attribute Domains</a></li>
<li><a href="#induce">Inducing a Decision Tree</a></li>
<li><a href="#prune">Pruning a Decision Tree</a></li>
<li><a href="#exec">Executing a Decision Tree</a></li>
<li><a href="#xmat">Computing a Confusion Matrix</a></li>
<li><a href="#rules">Extracting Rules from a Decision Tree</a></li>
<li><a href="#other">Other Decision Tree Examples</a></li>
<li><a href="#copying">Copying</a></li>
<li><a href="#download">Download</a></li>
<li><a href="#contact">Contact</a></li>
</ul>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>

<h3><a name="intro">Introduction</a></h3>

<p>I am sorry that there is no detailed documentation yet. Below you
can find a brief explanation of how to grow a decision tree with the
program <tt>dti</tt>, how to prune a decision tree with the program
<tt>dtp</tt>, how to execute a decision tree with the program
<tt>dtx</tt>, and how to extract rules from a decision tree with the
program <tt>dtr</tt>.
For a list of options, call the programs without
any arguments.</p>

<p>Enjoy,<br>
<a href="http://fuzzy.cs.uni-magdeburg.de/~borgelt/">Christian Borgelt</a></p>

<p>As a simple example for the explanations below, I use the dataset
in the file <tt>dtree/ex/drug.tab</tt>, which lists 12 records of
patient data (sex, age, and blood pressure) together with an effective
drug (effective w.r.t. some unspecified disease). The contents of this
file are:</p>

<pre>
  Sex    Age  Blood_pressure  Drug
  male   20   normal          A
  female 73   normal          B
  female 37   high            A
  male   33   low             B
  female 48   high            A
  male   29   normal          A
  female 52   normal          B
  male   42   low             B
  male   61   normal          B
  female 30   normal          A
  female 26   low             B
  male   54   high            A
</pre>

<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>

<h3><a name="domains">Determining Attribute Domains</a></h3>

<p>To induce a decision tree for the effective drug, you first
have to determine the domains of the table columns with the program
<tt>dom</tt> (which is part of the table package, see below):</p>

<pre>
  dom -a drug.tab drug.dom
</pre>

<p>The program <tt>dom</tt> assumes that the first line of the table
file contains the column names. (This is the case for the example file
<tt>drug.tab</tt>.) If you have a table file without column names, you
can let the program read the column names from another file (using the
<tt>-h</tt> option) or you can let the program generate default names
(using the <tt>-d</tt> option), which are simply the column numbers.
The <tt>-a</tt> option tells the program to determine the column data
types automatically.
Thus the values of the <tt>Age</tt> column are
automatically recognized as integer values.</p>

<p>After <tt>dom</tt> has finished, the contents of the file <tt>drug.dom</tt>
should look like this:</p>

<pre>
  dom(Sex) = { male, female };
  dom(Age) = ZZ;
  dom(Blood_pressure) = { normal, high, low };
  dom(Drug) = { A, B };
</pre>

<p>The special domain <tt>ZZ</tt> represents the set of integers,
and the special domain <tt>IR</tt> (not used here) represents the set of
real numbers. (The double <tt>Z</tt> and the <tt>I</tt> in front of the
<tt>R</tt> are intended to mimic the boldface or double-struck font used in
mathematics to denote the set of integers and the set of real numbers.
All programs that need to read a domain description also recognize
a single <tt>Z</tt> or a single <tt>R</tt>.)</p>

<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>

<h3><a name="induce">Inducing a Decision Tree</a></h3>

<p>To induce a decision tree with the <tt>dti</tt> program
(<tt>dti</tt> is simply an abbreviation for Decision Tree Induction),
type</p>

<pre>
  dti -a drug.dom drug.tab drug.dt
</pre>

<p>You need not tell the program <tt>dti</tt> that the <tt>Drug</tt> column
contains the class, because by default it uses the last column as the
class column (the <tt>Drug</tt> column is the last column in the file
<tt>drug.tab</tt>). If a different column contains the class, you can
specify its name on the command line with the <tt>-c</tt> option,
e.g.
<tt>-c Drug</tt>.</p>

<p>At first glance it may seem superfluous to provide the
<tt>dti</tt> program with a domain description, since it is also
given the table file and thus could determine the domains itself.
Without a domain description, however, the <tt>dti</tt> program would be
forced to use all columns in the table file, with their automatically
determined data types. Occasions may arise in which you want to induce a
decision tree from only a subset of the columns, or in which the numbers
in a column are actually codes for symbolic values. In such a case the
domain file provides a way to tell the <tt>dti</tt> program which columns
to use and what their data types are. To ignore a column, simply remove
the corresponding domain definition from the domain description file or
comment it out (both C-style (<tt>/* ... */</tt>) and C++-style
(<tt>// ...</tt>) comments are supported). To change the data type of a
column, simply change its domain definition.</p>

<p>By default the program <tt>dti</tt> uses the information gain ratio as
the attribute selection measure. Other measures can be selected with
the <tt>-e</tt> option. Call <tt>dti</tt> with the option <tt>-!</tt> for
a list of the available attribute selection measures.</p>

<p>With the above command the induced decision tree is written to the
file <tt>drug.dt</tt>. The contents of this file should look like
this:</p>

<pre>
  dtree(Drug) =
  { (Blood_pressure)
    normal:{ (Age|41)
             &lt;:{ A: 3 },
             &gt;:{ B: 3 }},
    high  :{ A: 3 },
    low   :{ B: 3 }};
</pre>

<p>Since the <tt>-a</tt> option was given, the colons after the values
of an attribute (here, for example, the values of the attribute
<tt>Blood_pressure</tt>) are aligned.
This makes a decision tree easier
to read, but may result in larger-than-necessary output files.</p>

<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>

<h3><a name="prune">Pruning a Decision Tree</a></h3>

<p>Although it is not necessary for our simple example, the induced
decision tree can be pruned, i.e., simplified by removing some
decisions. This is done by invoking the program <tt>dtp</tt>
(<tt>dtp</tt> is simply an abbreviation for Decision Tree Pruning):</p>

<pre>
  dtp -a drug.dt drug_p.dt
</pre>

<p>The table from which the decision tree was induced can be given as a
third argument to the <tt>dtp</tt> program. In this case an additional
way of pruning is enabled: replacing an inner node (i.e., an attribute
test) by its largest child.</p>

<p>By default, <tt>dtp</tt> uses confidence level pruning with a confidence
level of 0.5 as the pruning method. The confidence level can be
changed with the <tt>-p</tt> option (pruning parameter), and the pruning
method with the <tt>-m</tt> option.
Call <tt>dtp</tt> without arguments
for a list of the available pruning methods.</p>

<table width="100%" border=0 cellpadding=0 cellspacing=0>
<tr><td width="95%" align=right><a href="#top">back to the top</a></td>
    <td width=5></td>
    <td><a href="#top"><img src="uparrow.gif" border=0></a></td></tr>
</table>

<!-- =============================================================== -->
<p><img src="line.gif" alt="" height=7 width=704></p>

<h3><a name="exec">Executing a Decision Tree</a></h3>

<p>An induced decision tree can be used to classify new data with the
program <tt>dtx</tt> (<tt>dtx</tt> is simply an abbreviation for
Decision Tree Execution):</p>

<pre>
  dtx -a drug.dt drug.tab drug.cls
</pre>

<p><tt>drug.tab</tt> is the table file (since we do not have separate
test data, we simply use the training data), and <tt>drug.cls</tt> is
the output file. After <tt>dtx</tt> has finished, <tt>drug.cls</tt>
contains a new column <tt>dt</tt>, which holds the class predicted by
the decision tree. This column is added to the columns appearing in the
decision tree and, for preclassified data, to the class column.
You can give this new column a different name with the <tt>-p</tt>