?? algorithm
字號(hào):
Here is some information regarding the clustering method used in cluster,since many people have asked for some.THE METHODThe algorithm works by iteratively merging smaller clusters into biggerones.It starts with one data point per cluster. Then look for the smallestdistance between any two clusters (i.e., data points at this stage).Distance is Euclidean by default, but can be any p-norm (see the -noption). Those two cluster are merged, i.e., they form a new clusterwith the two earlier ones as subclusters, which gives rise to a branchin the cluster tree. For purposes of distance computation the new clusteris represented by the average of all its data points.The merging is repeated until only one cluster is left.LITERATUREAs pointed out by Fionn Murtagh (fmurtagh@eso.org), the algorithm implementedby cluster is known as the Centroid or "Unweighted pair group" method.(see Sneath:73 below).An account of the origin of this algorithm can be found inthe following mail from Yoshiro Miyata: Date: Sat, 6 Feb 93 10:50:41 JST From: miyata@sccs.chukyo-u.ac.jp (Yoshiro MIYATA) To: stolcke@ICSI.Berkeley.EDU Subject: cluster algorithm When I wrote that code several years ago, being a psychology grad student I knew next to nothing about this branch of statistics. It was just a fun project on a rainy weekend sort of thing. I don't remember reading any book .. I probably heard about clustering analysis in someone's talk, imagined what it should be like and coded it up. I must have been lucky if that was close enough to something in the literature!!The included MAIL from fmurtagh@eso.org mentions several alternative methodsand points out references.REFERENCES@book{Sneath:73,author = {Peter H. A. Sneath and Robert R. Sokal},title = {Numerical taxonomy; the principles and practice of numerical classification},publisher = {W. H. Freeman},address = {San Francisco},yyear = {1973}}@book{Murtagh:85,author = {Fionn Murtagh},title = {Multidimensional clustering algorithms},publisher = {Physica-Verlag},address = {Vienna},year = {1985}}@article{Gower:69,author = {J. C. Gower and G. J. S. Ross},title = {Minimum Spanning Trees and Single Linkage Cluster Analysis},journal = {Applied Statistics},volume = {18},number = {1},pages = {54-64},year = {1969},annote = {Describes algorithms for MST and SLCA, in particular how to obtain the latter from the former.}}@article{Zahn:71,author = {Charles T. Zahn},title = {Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters},pages = {68-86},journal = {IEEE Transactions on Computers},volume = {C-20},number = {1},month = jan,year = {1971},annote = {Explores the minimum spanning tree as a tool for clustering. Applications include cluster segmentation, density gradient detection, particle tracking, description of Gaussian clusters, hierarchical clustering, and relative compactness computation. Also contains many pointers into the clustering literature.}}THE ALGORITHMIn the latest version of cluster, minimum distance pairs are found bykeeping track of the nearest neighbor of each data point (as well asthe distance). The nnbs are initially computed in O(n^2) time. When anew cluster is formed, one of the child nodes is discarded, while theother is replaced by the new node representing the new cluster. Foreach of the other points, it is checked whether the nearest neighborhas changed due to the new cluster, and updated if so. This can bedone in constant time for each point (except for the rare case wherethe previous nearest neighbor happened to be one of the two points nowreplaced by the cluster). On each iteration, searching for theshortest distance pair takes O(n) time using nearest neighbor list.There are n iterations overall, so the total time is O(n^2).For historical notes on the less efficient algorithm in the old cluster,see the BENCHMARKS file. Also note that an even more efficientimplementation (O(n log n)) should be possible. Unfortunately, I'm just a dedicated user of cluster, and not expert in these matters.Andreas Stolcke1/21/93
?? 快捷鍵說(shuō)明
復(fù)制代碼
Ctrl + C
搜索代碼
Ctrl + F
全屏模式
F11
切換主題
Ctrl + Shift + D
顯示快捷鍵
?
增大字號(hào)
Ctrl + =
減小字號(hào)
Ctrl + -