This directory contains code implementing the K-means algorithm. Source code
may be found in KMEANS.CPP. Sample data is found in KM2.DAT. The KMEANS
program accepts input consisting of vectors and calculates the given
number of cluster centers using the K-means algorithm. Output is
directed to the screen.

How K-means Clustering Works
Step 1. Choose the value of k, the number of clusters.
Step 2. Form an initial partition that classifies the data into k clusters. You may assign the training samples randomly, or systematically as follows:
    Take the first k training samples as single-element clusters.
    Assign each of the remaining N-k training samples (N being the total number of samples) to the cluster with the nearest centroid. After each assignment, recompute the centroid of the gaining cluster.
Step 3. Take each sample in sequence and compute its distance from the centroid of each cluster. If a sample is not currently in the cluster with the closest centroid, move it to that cluster and update the centroids of both the cluster gaining the sample and the cluster losing it.
Step 4. Repeat step 3 until convergence is achieved, that is, until a pass through the training samples causes no new assignments.
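
The four steps above can be sketched in C++ roughly as follows. This is an illustrative sketch only and is not the code in KMEANS.CPP. The names Point, KMeansResult, squaredDistance, nearestCentroid and kMeans are hypothetical. The sketch assumes squared Euclidean distance (the steps above do not name a distance measure), the systematic initialization from step 2, and a non-empty sample set in which every vector has the same dimension.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;  // hypothetical alias: one input vector

// Squared Euclidean distance (assumed metric) between equal-length vectors.
double squaredDistance(const Point& a, const Point& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Index of the centroid closest to x (lowest index wins a tie).
std::size_t nearestCentroid(const Point& x, const std::vector<Point>& centroids) {
    std::size_t best = 0;
    double bestDist = std::numeric_limits<double>::max();
    for (std::size_t c = 0; c < centroids.size(); ++c) {
        const double dist = squaredDistance(x, centroids[c]);
        if (dist < bestDist) { bestDist = dist; best = c; }
    }
    return best;
}

struct KMeansResult {
    std::vector<Point> centroids;     // k cluster centers
    std::vector<std::size_t> labels;  // cluster index of each sample
};

KMeansResult kMeans(const std::vector<Point>& samples, std::size_t k) {
    const std::size_t n = samples.size();
    assert(k > 0 && k <= n);  // Step 1: k is chosen by the caller
    const std::size_t dim = samples[0].size();

    std::vector<Point> sums(k, Point(dim, 0.0));  // per-cluster coordinate sums
    std::vector<std::size_t> counts(k, 0);        // per-cluster sample counts
    std::vector<Point> centroids(k, Point(dim, 0.0));
    std::vector<std::size_t> labels(n, 0);

    // Put sample i into cluster c and recompute that cluster's centroid.
    auto add = [&](std::size_t i, std::size_t c) {
        ++counts[c];
        for (std::size_t d = 0; d < dim; ++d) {
            sums[c][d] += samples[i][d];
            centroids[c][d] = sums[c][d] / static_cast<double>(counts[c]);
        }
        labels[i] = c;
    };
    // Take sample i out of cluster c and recompute that cluster's centroid.
    // If the cluster becomes empty it keeps its last centroid (an assumption).
    auto remove = [&](std::size_t i, std::size_t c) {
        --counts[c];
        for (std::size_t d = 0; d < dim; ++d) {
            sums[c][d] -= samples[i][d];
            if (counts[c] > 0)
                centroids[c][d] = sums[c][d] / static_cast<double>(counts[c]);
        }
    };

    // Step 2: the first k samples seed single-element clusters; each remaining
    // sample joins the nearest cluster, whose centroid is updated at once.
    for (std::size_t i = 0; i < k; ++i) add(i, i);
    for (std::size_t i = k; i < n; ++i) add(i, nearestCentroid(samples[i], centroids));

    // Steps 3 and 4: sweep the samples until a full pass moves nothing.
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t best = nearestCentroid(samples[i], centroids);
            // Move only if another centroid is strictly closer than the current one.
            if (squaredDistance(samples[i], centroids[best]) <
                squaredDistance(samples[i], centroids[labels[i]])) {
                remove(i, labels[i]);
                add(i, best);
                changed = true;
            }
        }
    }
    return {centroids, labels};
}
```

Calling kMeans(samples, 2) on a vector of 2-D points returns the two cluster centers together with one cluster index per input sample. Keeping running sums and counts for each cluster lets a centroid be updated in time proportional to the vector dimension whenever a sample moves, rather than by re-averaging the whole cluster.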