.TH RIPPER 1
.SH NAME
.PP
ripper \- learns a rule set from examples
.SH SYNOPSIS
.PP
.B ripper
[ options ] filestem
.SH DESCRIPTION
.PP
.I Ripper
is a program for inducing classification rules from a set of
preclassified examples; as such it is broadly similar to learning
methods such as neural nets, nearest neighbor, and decision tree
induction systems such as CART, C4.5, and ID3. The user provides a set
of examples, each of which has been labeled with the appropriate
.IR class .
Ripper will then look at the examples and find a set of rules that
will predict the class of later examples.
.PP
Ripper has several advantages over other learning techniques. First,
ripper's hypothesis is expressed as a set of if-then rules. These
rules are relatively easy for people to understand; if the ultimate
goal is to gain insight into the data, then ripper is probably a
better choice than a neural network learning method, or even a
decision tree induction system like CART. Second, ripper is
asymptotically faster than other competitive rule learning
algorithms; this means that it will be much faster on large
datasets. Third, ripper allows the user to specify constraints on the
format of the learned if-then rules. If there is some prior knowledge
about the concept to be learned, then these constraints can often
lead to more accurate hypotheses. Fourth, ripper allows attributes to
be either nominal, continuous, or "set-valued". The value of a
set-valued attribute is a set of atoms: for example, a set-valued
attribute could be used to encode the set of words that appeared in
the body of a document. Recent versions of ripper also support
bag-valued attributes.
.SH OPTIONS TO RIPPER
.PP
The sole argument to ripper is a
.I filestem
that determines the names of ripper's input files. (The input files
for ripper are described below.) The options to ripper have the
following meanings.
.TP 10
.B \-c
Expect noise-free data.
.TP
.B \-n
Expect noisy data (the default).
.TP
.BI \-k " num"
Estimate error rates with k-fold cross-validation. The training set
is split into k disjoint partitions, and the learning algorithm is
trained on every collection of k-1 partitions and then tested on the
remaining partition.
.TP
.B \-l
Estimate error rates with leave-one-out cross-validation (i.e.,
N-fold cross-validation, where N is the size of the training set).
.TP
.BI \-v " lev"
Set the trace level ("verbosity") to
.IR lev ,
which must be either 0, 1, 2, or 3. The default is 0.
.TP
.BI \-a " ordering"
Before learning, ripper first heuristically orders the classes by one
of the following methods: +freq, order by increasing frequency (the
default); -freq, order by decreasing frequency; given, order classes
as in the names file; mdl, use heuristics to guess an optimal
ordering; unordered (see below).
.PP
After arranging the classes, ripper finds rules to separate class1
from classes class2, ..., classn, then rules to separate class2 from
classes class3, ..., classn, and so on. The final class classn
becomes the default class. The end result is that rules for a single
class will always be grouped together, and rules for classi may be
simplified, because they can assume that the class of the example is
one of classi, ..., classn. If an example is covered by rules from
two or more classes, the conflict is resolved in favor of the class
that comes first in the ordering.
.PP
With the '\-a unordered' option, ripper will separate each class from
the remaining classes, thus ending up with rules for every class.
Conflicts are resolved in favor of the rule with the lowest
training-set error.
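.PP
The following short Python sketch (not part of ripper itself, and
written only to illustrate the semantics just described) shows how an
ordered rule set of this kind classifies an example: the classes are
tried in order, the first class with a matching rule wins, and the
default class is assigned if no rule fires. The names used here are
invented for the illustration.
.PP
.nf
    # Illustrative sketch only; "classify" and "rule_groups" are
    # hypothetical names, not part of ripper's interface.
    def classify(example, rule_groups, default_class):
        # rule_groups: list of (class_name, rules) pairs, in the
        # class ordering chosen by the -a option; each rule is a
        # predicate over an example.
        for class_name, rules in rule_groups:
            if any(rule(example) for rule in rules):
                return class_name   # earliest class in the ordering wins
        return default_class        # no rule fired: use the default class
.fi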
.TP
.B \-s
Read the training data from standard input, rather than from
filestem.data.
.TP
.BI \-g " filename"
Use grammar file filename.gram.
.TP
.BI \-f " filename"
Use names file filename.names.
.TP
.BI \-O " n"
Control optimization of rules. Ripper makes
.I n
optimization passes over the rules it learns. The default is n=2.
.TP
.BI \-M " n"
Use statistics collected on a class-stratified subsample of
.I n
examples (instead of the entire dataset) to make certain frequently
repeated decisions. For very large datasets, ripper using subsamples
of a few hundred or a few thousand examples will typically produce a
slightly inferior ruleset; however, it will run much more quickly
than ripper without subsampling.
.TP
.BI \-I " n"
Discretize continuous attributes into
.I n
equal-frequency segments. (If
.I n
is zero, discretize into the maximal possible number of segments.)
The default is to not discretize continuous values. Discretization
usually speeds up ripper on large datasets with many continuous
values, but may cost some accuracy.
.TP
.B \-G
Print the grammar and exit. This is sometimes useful when one would
like to make a change to the default grammar.
.TP
.B \-N
Print a names file and exit. This is sometimes useful when one would
like to generate a names file for use by C4.5. (Ripper can usually
infer the type of an attribute from a dataset, so a names file for
ripper is optional.)
.TP
.B \-R
Randomize operation. (By default, a fixed random seed is used.)
.TP
.BI \-! " string"
Allow or disallow negative tests in rules. If the string contains an
"s", then negative tests of the form "attribute !~ value" for set-
and bag-valued attributes will be allowed in rules. (The symbol "!~"
stands for "does not contain".) If the string contains an "n", then
negative tests of the form "attribute != value" for nominal
attributes will be allowed in rules.
.TP
.BI \-D " n"
Change the maximum "decompression".
.TP
.BI \-S " n"
Simplify the hypothesis more (n>1) or less (n<1).
.TP
.BI \-L " n"
Change the "loss" ratio, i.e., the ratio of the cost of a false
negative to the cost of a false positive. A value of n>1 will usually
improve recall of the minority classes, and a value of n<1 will
usually improve precision.
.TP
.B \-A
Add redundant tests to rules. This sometimes improves precision and
readability, principally for set- or bag-valued attributes that
contain sets of English words.
.TP
.BI \-F " n"
Force each rule to cover at least
.I n
examples.
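.PP
For illustration, the following is one plausible invocation, assuming
a hypothetical filestem "animals" (so that the training examples are
read from animals.data): it requests noisy-data mode, a 10-fold
cross-validation estimate of the error rate, and trace level 1.
.PP
.nf
    ripper -n -k 10 -v 1 animals
.fi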
.SH INPUT FILES
.PP
The files read and written by ripper are of the form
.IR filestem.ext ,
where
.I filestem
is the first and only argument to ripper. All of ripper's input files
are free format (i.e., white space is not significant). Anything on a
line following a percent sign character is a comment.
.PP
Ripper expects to find four files: a
.B data file
called
.I filestem.data
containing some preclassified examples, a
.B test file
called
.I filestem.test
that contains some additional preclassified examples to be used as
test cases, a
.B names file
called
.I filestem.names
defining the names of the classes and attributes used in the data
file, and a
.B grammar file
called
.I filestem.gram
defining the rules that are allowed to be used in a hypothesis.
Except for the grammar file, the format of these files is roughly the
same as that used by C4.5. The format will be described in more
detail below. The last three files are optional.
.PP
If there is no test file, ripper will either not test its learned
rule set, or (if directed by the user to do so through the \fB\-k\fR
or \fB\-l\fR options) ripper will use cross-validation to test its
learned rule set. If there is no names file, ripper will assign
arbitrary names to the attributes and classes, and will try to figure
out the types of the attributes from the data. If there is no grammar
file, ripper will use the default grammar described below.
.PP
Ripper also creates a file
.I filestem.hyp
containing the ruleset or rulesets it found, in a format that is
intended to be computer-readable.
.PP
An example for ripper is described by a fixed set of
.IR attributes .
These attributes can be either continuous, nominal, set-valued, or
bag-valued. Continuous attributes have real-number values. The value
of a nominal attribute is one of a fixed set of symbolic values, for
example "on, off" or "low, medium, high". The value of a set- or
bag-valued attribute is a set of atoms (rather than a single symbolic
value). These attributes, as well as the classes that are to be
predicted, are defined in the
.IR "names file" .
.PP
The names file contains, first, a comma-separated list of atoms
representing the classes, terminated by a period. (An
.I atom
contains only letters, digits, and the underscore character, and must
begin with a letter. Alternatively, an atom is any sequence of
characters enclosed in single quotes.) The list of classes is
followed by a list of
.IR "attribute definitions" .
Each attribute definition consists of the name of the attribute,
e.g., "height" or "sex"; a colon; and either the atom
.I continuous
if the attribute is continuous, the atom
.I set
if the attribute is set-valued, the atom
.I bag
if the attribute is bag-valued, the atom
.I symbolic
if the attribute can take on any symbolic value, or a comma-separated
list of atoms representing the possible symbolic values of the
attribute, if the attribute is nominal. Finally, every attribute
definition must be terminated by a period.
.PP
Ripper also supports
.I ignored
and
.I suppressed
attributes. Ignored attributes are completely ignored by the learning
system. To define an ignored attribute, use a declaration of the form
.IR "attribute_name: ignore" .
Suppressed attributes are similar, except that while they are not
used in ripper's hypotheses, the number of values of the attribute
does affect MDL-based pruning. Hence, suppressing an attribute that
was not used in a hypothesis should not change ripper's performance
in any way.
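.PP
As an illustration, a names file for a hypothetical dataset with
classes "good" and "bad" and four attributes might look like the
following (the attribute names and values here are invented for the
example):
.PP
.nf
    good, bad.              % the classes
    height: continuous.     % a continuous attribute
    sex: male, female.      % a nominal attribute
    body_text: set.         % a set-valued attribute
    record_id: ignore.      % ignored by the learning system
.fi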