Rainbow is a C program that performs document classification using one of several different methods, including naive Bayes, TFIDF/Rocchio, k-nearest neighbor, Maximum Entropy, Support Vector Machines, Fuhr's Probabilistic Indexing, and a simple-minded form of shrinkage with naive Bayes.
* acousticfeatures.m: Matlab script to generate training and testing files from event timeseries.
* afm_mlpatterngen.m: Matlab script to extract feature information from acoustic event timeseries.
* extractevents.m: Matlab script to extract event timeseries using the complete run timeseries and the ground truth/label information.
* extractfeatures.m: Matlab script to extract feature information from all acoustic and seismic event timeseries for a given run and set of nodes.
* sfm_mlpatterngen.m: Matlab script to extract feature information from seismic event timeseries.
* ml_train1.m: Matlab script implementation of the Maximum Likelihood Training Module.
* ml_test1.m: Matlab script implementation of the Maximum Likelihood Testing Module.
* knn.m: Matlab script implementation of the k-Nearest Neighbor Classifier Module.
How K-means clustering works
Step 1. Begin by deciding the value of k, the number of clusters.
Step 2. Choose any initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
Take the first k training samples as single-element clusters.
Assign each of the remaining (N-k) training samples to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.
Step 3. Take each sample in sequence and compute its distance from the centroid of each cluster. If a sample is not currently in the cluster with the closest centroid, switch it to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
Step 4. Repeat step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.
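The four steps above can be sketched in Python as follows. This is a minimal illustration, not the code of any tool listed here. The function names (`centroid`, `nearest`, `kmeans_sequential`), the `max_passes` cap, the use of Euclidean distance, and the rule that a cluster's last remaining sample is never moved out are all assumptions made for the example.

```python
import math

def centroid(points):
    """Mean of a non-empty list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def kmeans_sequential(samples, k, max_passes=100):
    # Step 2: the first k samples start as single-element clusters.
    labels = list(range(k)) + [None] * (len(samples) - k)
    members = [[i] for i in range(k)]
    centroids = [tuple(samples[i]) for i in range(k)]
    # Assign the remaining N-k samples, updating the gaining centroid each time.
    for i in range(k, len(samples)):
        j = nearest(samples[i], centroids)
        labels[i] = j
        members[j].append(i)
        centroids[j] = centroid([samples[m] for m in members[j]])
    # Steps 3 and 4: reassign sample by sample until a full pass moves nothing.
    for _ in range(max_passes):  # max_passes is a safety cap (assumption)
        moved = False
        for i, x in enumerate(samples):
            old, new = labels[i], nearest(x, centroids)
            strictly_closer = math.dist(x, centroids[new]) < math.dist(x, centroids[old])
            # Assumption: never empty a cluster by removing its last sample.
            if new != old and strictly_closer and len(members[old]) > 1:
                members[old].remove(i)
                members[new].append(i)
                labels[i] = new
                centroids[old] = centroid([samples[m] for m in members[old]])
                centroids[new] = centroid([samples[m] for m in members[new]])
                moved = True
        if not moved:
            break
    return labels, centroids
```

Calling `kmeans_sequential(samples, 2)` on a list of 2-D tuples returns one cluster index per sample together with the final centroids. Because the first k samples seed the clusters, the result depends on the order of the input.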
ClustanGraphics is a cluster analysis tool. It provides 11 clustering algorithms:
Single Linkage (or Minimum Method, Nearest Neighbor)
Complete Linkage (or Maximum Method, Furthest Neighbor)
Average Linkage (UPGMA)
Weighted Average Linkage (WPGMA)
Mean Proximity
Centroid (UPGMC)
Median (WPGMC)
Increase in Sum of Squares (Ward's Method)
Sum of Squares
Flexible (β space distortion parameter)
Density (or k-linkage, density-seeking mode analysis)
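The first three methods in the list differ only in how they measure the distance between two clusters. The sketch below illustrates those textbook definitions in Python. It is not ClustanGraphics code, and the function names and the choice of Euclidean distance are assumptions made for the example.

```python
import math
from itertools import product

def pairwise(a, b):
    """All Euclidean distances between points of cluster a and cluster b."""
    return [math.dist(p, q) for p, q in product(a, b)]

def single_linkage(a, b):
    # Minimum method: distance between the two nearest neighbors.
    return min(pairwise(a, b))

def complete_linkage(a, b):
    # Maximum method: distance between the two furthest neighbors.
    return max(pairwise(a, b))

def average_linkage(a, b):
    # UPGMA: unweighted mean of all between-cluster pairwise distances.
    d = pairwise(a, b)
    return sum(d) / len(d)
```

An agglomerative procedure would repeatedly merge the pair of clusters with the smallest such distance. The remaining methods in the list change only how that between-cluster distance is defined or updated after a merge.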