<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f401"><b>Q: The output of training C-SVM is like the following. What do they mean?</b></a>
<br/>
<br>optimization finished, #iter = 219
<br>nu = 0.431030
<br>obj = -100.877286, rho = 0.424632
<br>nSV = 132, nBSV = 107
<br>Total nSV = 132
<p>
obj is the optimal objective value of the dual SVM problem.
rho is the bias term in the decision function
sgn(w^Tx - rho).
nSV and nBSV are the numbers of support vectors and bounded support
vectors (i.e., alpha_i = C). nu-SVM is a somewhat equivalent
form of C-SVM where C is replaced by nu. nu simply shows the
corresponding parameter. More details are in the
<a href="http://www.csie.ntu.edu.tw/~cjlin/papers/libsvm.pdf">
libsvm document</a>.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f402"><b>Q: Can you explain more about the model file?</b></a>
<br/>
<p>
After the parameters, each line represents a support vector.
Support vectors are listed in the same order as the "labels" shown
earlier (i.e., those from the first class in the "labels" list are
grouped first, and so on).
If k is the total number of classes,
in front of each support vector of class j there are
k-1 coefficients
y*alpha, where the alpha's are the dual solutions of the
following two-class problems:
<br>
1 vs j, 2 vs j, ..., j-1 vs j, j vs j+1, j vs j+2, ..., j vs k
<br>
with y=1 for the first j-1 coefficients and y=-1 for the remaining
k-j coefficients.
For example, if there are 4 classes, the file looks like:
<pre>
+-+-+-+--------------------+
|1|1|1| |
|v|v|v| SVs from class 1 |
|2|3|4| |
+-+-+-+--------------------+
|1|2|2| |
|v|v|v| SVs from class 2 |
|2|3|4| |
+-+-+-+--------------------+
|1|2|3| |
|v|v|v| SVs from class 3 |
|3|3|4| |
+-+-+-+--------------------+
|1|2|3| |
|v|v|v| SVs from class 4 |
|4|4|4| |
+-+-+-+--------------------+
</pre>
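<p>
If you access the model through the C API instead of parsing the
file, recent libsvm versions expose this layout in struct svm_model
(see svm.h): sv_coef is a (k-1) x l array, and sv_coef[j][i] is the
j-th coefficient in front of the i-th support vector, matching the
columns shown above. A minimal sketch, assuming a model already
loaded with svm_load_model:
<pre>
#include <stdio.h>
#include "svm.h"

/* print the k-1 coefficients in front of each support vector,
   in the same layout as the model file */
void print_sv_coef(const struct svm_model *model)
{
	int k = model->nr_class;	/* number of classes */
	int l = model->l;		/* total number of support vectors */
	int i, j;

	for (i = 0; i < l; i++) {
		for (j = 0; j < k-1; j++)
			printf("%f ", model->sv_coef[j][i]);
		printf("\n");
	}
}
</pre>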
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f403"><b>Q: Should I use float or double to store numbers in the cache ?</b></a>
<br/>
<p>
We have float as the default as you can store more numbers
in the cache.
In general this is good enough but for few difficult
cases (e.g. C very very large) where solutions are huge
numbers, it might be possible that the numerical precision is not
enough using only float.
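<p>
Concretely, the cache type is controlled by a single typedef near
the top of svm.cpp; changing it to double (at the cost of halving
the number of cached entries) looks like this:
<pre>
typedef float Qfloat;		/* default: single-precision cache */
/* typedef double Qfloat; */	/* use this if float is not precise enough */
</pre>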
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f404"><b>Q: How do I choose the kernel?</b></a>
<br/>
<p>
In general we suggest trying the RBF kernel first.
A result by Keerthi and Lin
(<a href=http://www.csie.ntu.edu.tw/~cjlin/papers/limit.ps.gz>
download paper here</a>)
shows that if RBF is used with model selection,
then there is no need to consider the linear kernel.
The kernel matrix using sigmoid may not be positive definite,
and in general its accuracy is not better than RBF
(see the paper by Lin and Lin,
<a href=http://www.csie.ntu.edu.tw/~cjlin/papers/tanh.pdf>
download paper here</a>).
Polynomial kernels are ok, but if a high degree is used,
numerical difficulties tend to happen
(think of the d-th power of a number: values less than 1
go towards 0 and values greater than 1 towards infinity).
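<p>
A tiny illustration of this effect (plain C, not part of libsvm):
<pre>
#include <stdio.h>
#include <math.h>

int main()
{
	/* the d-th power of values below/above 1 quickly
	   underflows/overflows as d grows */
	printf("%g\n", pow(0.5, 50));	/* about 8.9e-16 */
	printf("%g\n", pow(2.0, 50));	/* about 1.1e+15 */
	return 0;
}
</pre>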
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f405"><b>Q: Does libsvm have special treatments for linear SVM?</b></a>
<br/>
<p>
No, libsvm solves linear and nonlinear SVMs in the
same way.
Some tricks that could save training/testing time when the
linear kernel is used are not implemented,
so libsvm is <b>NOT</b> particularly efficient for linear SVM,
especially when
C is large and
the number of data points is much larger
than the number of attributes.
You can either
<ul>
<li>
Use small C only. We have shown in the following paper
that after C is larger than a certain threshold,
the decision function is the same.
<p>
<a href="http://guppy.mpe.nus.edu.sg/~mpessk/">S. S. Keerthi</a>
and
<B>C.-J. Lin</B>.
<A HREF="papers/limit.ps.gz">
Asymptotic behaviors of support vector machines with
Gaussian kernel
</A>
.
<I><A HREF="http://mitpress.mit.edu/journal-home.tcl?issn=08997667">Neural Computation</A></I>, 15(2003), 1667-1689.
<li>
Check <a href=http://www.csie.ntu.edu.tw/~cjlin/liblinear>liblinear</a>,
which is designed for large-scale linear classification.
</ul>
<p> Please also see our <a href=../papers/guide/guide.pdf>SVM guide</a>
for a discussion of using the RBF and linear
kernels.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f406"><b>Q: The number of free support vectors is large. What should I do?</b></a>
<br/>
<p>
This usually happens when the model overfits the data.
If the attributes of your data are in large ranges,
try scaling them. Then the region
of appropriate parameters may be larger.
Note that libsvm provides a scaling program,
svm-scale.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f407"><b>Q: Should I scale training and testing data in a similar way?</b></a>
<br/>
<p>
Yes, you can do the following:
<br> svm-scale -s scaling_parameters train_data > scaled_train_data
<br> svm-scale -r scaling_parameters test_data > scaled_test_data
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f408"><b>Q: Does it make a big difference if I scale each attribute to [0,1] instead of [-1,1]?</b></a>
<br/>
<p>
For the linear scaling method, if the RBF kernel is
used and parameter selection is conducted, there
is no difference. Assume Mi and mi are
respectively the maximal and minimal values of the
ith attribute. Scaling to [0,1] means
<pre>
x'=(x-mi)/(Mi-mi)
</pre>
For [-1,1],
<pre>
x''=2(x-mi)/(Mi-mi)-1.
</pre>
In the RBF kernel,
<pre>
x'-y'=(x-y)/(Mi-mi), x''-y''=2(x-y)/(Mi-mi).
</pre>
Since x''-y'' = 2(x'-y'), the squared distance is multiplied
by four. Hence, using (C,g) on the [0,1]-scaled data is the
same as using (C,g/4) on the [-1,1]-scaled data.
<p> Though the performance is the same, the computational
time may be different. For data with many zero entries,
[0,1]-scaling keeps the sparsity of the input data and hence
may save time.
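<p>
A quick numerical check of this equivalence for a single
attribute (plain C, not part of libsvm; x, y, and g are
arbitrary example values):
<pre>
#include <stdio.h>
#include <math.h>

int main()
{
	double x = 0.3, y = 0.8, g = 2.0;	/* data already in [0,1] */
	double x2 = 2*x-1, y2 = 2*y-1;		/* same points in [-1,1] */
	double k1 = exp(-g*(x-y)*(x-y));	/* RBF with gamma = g */
	double k2 = exp(-(g/4)*(x2-y2)*(x2-y2));/* RBF with gamma = g/4 */
	printf("%f %f\n", k1, k2);		/* prints the same value twice */
	return 0;
}
</pre>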
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f409"><b>Q: The prediction rate is low. How could I improve it?</b></a>
<br/>
<p>
Try the model selection tool grid.py in the python
directory to find
good parameters. To see the importance of model selection,
please
see my talk:
<A HREF="http://www.csie.ntu.edu.tw/~cjlin/talks/freiburg.pdf">
A practical guide to support vector
classification
</A>
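<p>
For example, an invocation like the following searches a default
grid of (C,g) by cross validation (this is an assumed typical usage;
run grid.py without arguments to see the exact options of your copy,
and note that the paths of svm-train and gnuplot may need to be set
inside the script):
<br> python grid.py scaled_train_data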
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f410"><b>Q: My data are unbalanced. Could libsvm handle such problems?</b></a>
<br/>
<p>
Yes, there is a -wi option. For example, if you use
<p>
svm-train -s 0 -c 10 -w1 1 -w-1 5 data_file
<p>
the penalty for class "-1" is larger (C=50 instead of C=10).
Note that this -w option is for C-SVC only.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f411"><b>Q: What is the difference between nu-SVC and C-SVC?</b></a>
<br/>
<p>
Basically they are the same thing but with different
parameters. The range of C is from zero to infinity,
while nu is always between 0 and 1. A nice property
of nu is that it is an upper bound on the fraction of
training errors and a lower bound on the fraction of
support vectors.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f412"><b>Q: The program keeps running (without showing any output). What should I do?</b></a>
<br/>
<p>
You may want to check your data. Each training/testing
instance must be on a single line; it cannot be split
across lines. In addition, you have to remove empty lines.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f413"><b>Q: The program keeps running (with output, i.e. many dots). What should I do?</b></a>
<br/>
<p>
In theory libsvm is guaranteed to converge if the kernel
matrix is positive semidefinite.
After version 2.4 it can also handle non-PSD
kernels such as the sigmoid (tanh).
If the program keeps running, you are probably
handling an ill-conditioned situation
(e.g., too large/small parameters), so numerical
difficulties occur.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f414"><b>Q: The training time is too long. What should I do?</b></a>
<br/>
<p>
For large problems, please specify enough cache size (i.e.,
-m).
Slow convergence may happen for some difficult cases (e.g. -c is large).
You can try to use a looser stopping tolerance with -e.
If that still doesn't work, you may want to train only a subset of the data.
You can use the program subset.py in the directory "tools"
to obtain a random subset.
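<p>
For example, a command like the following would draw a random subset
of 5000 instances (an assumed typical usage; the exact options may
differ across versions, so run subset.py without arguments to see
its usage):
<br> python subset.py -s 1 train_data 5000 train_subset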
<p> When using large -e, you may want to check if -h 0 (no shrinking) or -h 1 (shrinking) is faster.
See the next question below.
<p>
If you are using polynomial kernels, please check the question on the pow() function.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f4141"><b>Q: Does shrinking always help?</b></a>
<br/>
<p>
If the number of iterations is high, then shrinking
often helps.
However, if the number of iterations is small
(e.g., you specify a large -e), then
probably using -h 0 (no shrinking) is better.
See the
<a href=../papers/libsvm.pdf>implementation document</a> for details.
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f415"><b>Q: How do I get the decision value(s)?</b></a>
<br/>
<p>
We print out decision values for regression. For classification,
we solve several binary SVMs for multi-class cases. You
can obtain the values by calling the subroutine
svm_predict_values. Their corresponding labels
can be obtained from svm_get_labels.
Details are in the
README of the libsvm package.
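<p>
A minimal C sketch (assuming a classification model already loaded,
e.g. by svm_load_model; for k classes, svm_predict_values fills in
the k*(k-1)/2 pairwise decision values):
<pre>
#include <stdio.h>
#include <stdlib.h>
#include "svm.h"

void show_decision_values(const struct svm_model *model,
                          const struct svm_node *x)
{
	int k = svm_get_nr_class(model);
	int *labels = (int *) malloc(k*sizeof(int));
	double *dec_values = (double *) malloc(k*(k-1)/2*sizeof(double));
	int i;

	svm_get_labels(model, labels);		/* labels in stored order */
	svm_predict_values(model, x, dec_values);

	/* values follow the pairwise order 1-vs-2, 1-vs-3, ..., (k-1)-vs-k */
	for (i = 0; i < k*(k-1)/2; i++)
		printf("%f\n", dec_values[i]);

	free(labels);
	free(dec_values);
}
</pre>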
<p>
We do not recommend the following. But if you would
like to get values for
TWO-class classification with labels +1 and -1
(note: +1 and -1 but not things like 5 and 10)
in the easiest way, simply add
<pre>
printf("%f\n", dec_values[0]*model->label[0]);
</pre>
after the line
<pre>
svm_predict_values(model, x, dec_values);
</pre>
of the file svm.cpp.
Positive (negative)
decision values correspond to data predicted as +1 (-1).
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f4151"><b>Q: How do I get the distance between a point and the hyperplane?</b></a>
<br/>
<p>
The distance is |decision_value| / |w|.
We have |w|^2 = w^Tw = alpha^T Q alpha = 2*(dual_obj + sum alpha_i).
Thus in svm.cpp please find the place
where we calculate the dual objective value
(i.e., the subroutine Solve())
and add a statement to print w^Tw.
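<p>
As a hypothetical fragment (the names obj, sum_alpha, and dec are
placeholders for the dual objective value, the sum of all alpha_i
available where Solve() finishes, and a decision value,
respectively):
<pre>
double w_norm = sqrt(2*(obj + sum_alpha));	/* |w| = sqrt(w^Tw) */
double distance = fabs(dec)/w_norm;		/* distance to the hyperplane */
</pre>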
<p align="right">
<a href="#_TOP">[Go Top]</a>
<hr/>
<a name="/Q4:_Training_and_prediction"></a>
<a name="f416"><b>Q: On 32-bit machines, if I use a large cache (i.e. large -m) on a linux machine, why sometimes I get "segmentation fault ?"</b></a>
<br/>
<p>
On 32-bit machines, the maximum addressable
memory is 4GB. The Linux kernel uses a 3:1
split, which means user space gets 3GB and
kernel space 1GB. Although there is 3GB of
user space, the maximum dynamically allocatable
memory is 2GB. So, if you specify -m near 2GB,
memory will be exhausted, and svm-train
will fail when it asks for more.
For more details, please read
<a href=http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=3BA164F6.BAFA4FB%40daimi.au.dk>
this article</a>.
<p>
The easiest solution is to switch to a
64-bit machine.
Otherwise, there are two ways to solve this. If your
machine supports Intel's PAE (Physical Address
Extension), you can turn on the option HIGHMEM64G
in the Linux kernel, which uses a 4G:4G split for
kernel and user space. If it doesn't, you can
try a software called `tub', which can eliminate the 2GB
boundary for dynamically allocated memory.
<p align="right">
<a href="#_TOP">[Go Top]</a>