?? algorithm

?? Andreas Stolcke開發(fā)的聚類源碼, 版本2.9。聚類方法是層次合并聚類

字號(hào):

Here is some information regarding the clustering method used in cluster,since many people have asked for some.THE METHODThe algorithm works by iteratively merging smaller clusters into biggerones.It starts with one data point per cluster.  Then look for the smallestdistance between any two clusters (i.e., data points at this stage).Distance is Euclidean by default, but can be any p-norm (see the -noption). Those two cluster are merged, i.e., they form a new clusterwith the two earlier ones as subclusters, which gives rise to a branchin the cluster tree.  For purposes of distance computation the new clusteris represented by the average of all its data points.The merging is repeated until only one cluster is left.LITERATUREAs pointed out by Fionn Murtagh (fmurtagh@eso.org), the algorithm implementedby cluster is known as the Centroid or "Unweighted pair group" method.(see Sneath:73 below).An account of the origin of this algorithm can be found inthe following mail from Yoshiro Miyata:    Date: Sat, 6 Feb 93 10:50:41 JST    From: miyata@sccs.chukyo-u.ac.jp (Yoshiro MIYATA)    To: stolcke@ICSI.Berkeley.EDU    Subject: cluster algorithm    When I wrote that code several years ago, being a psychology grad    student I knew next to nothing about this branch of statistics.  It    was just a fun project on a rainy weekend sort of thing.  I don't    remember reading any book .. I probably heard about clustering    analysis in someone's talk, imagined what it should be like and coded    it up.  I must have been lucky if that was close enough to something    in the literature!!The included MAIL from fmurtagh@eso.org mentions several alternative methodsand points out references.REFERENCES@book{Sneath:73,author =	{Peter H. A. Sneath and Robert R. Sokal},title =		{Numerical taxonomy; the principles and practice of numerical		classification},publisher =	{W. H. Freeman},address =	{San Francisco},yyear =		{1973}}@book{Murtagh:85,author =	{Fionn Murtagh},title =		{Multidimensional clustering algorithms},publisher =	{Physica-Verlag},address =	{Vienna},year =		{1985}}@article{Gower:69,author =        {J. C. Gower and G. J. S. Ross},title =         {Minimum Spanning Trees and Single Linkage Cluster Analysis},journal =       {Applied Statistics},volume =        {18},number =        {1},pages =         {54-64},year =          {1969},annote =        {Describes algorithms for MST and SLCA, in particular how to		obtain the latter from the former.}}@article{Zahn:71,author =        {Charles T. Zahn},title =         {Graph-Theoretical Methods for Detecting and Describing                Gestalt Clusters},pages =         {68-86},journal =       {IEEE Transactions on Computers},volume =        {C-20},number =        {1},month =         jan,year =          {1971},annote =        {Explores the minimum spanning tree as a tool for                clustering.  Applications include cluster segmentation,                density gradient detection, particle tracking, description                of Gaussian clusters, hierarchical clustering,                and relative compactness computation.                Also contains many pointers into the clustering literature.}}THE ALGORITHMIn the latest version of cluster, minimum distance pairs are found bykeeping track of the nearest neighbor of each data point (as well asthe distance).  The nnbs are initially computed in O(n^2) time.  When anew cluster is formed, one of the child nodes is discarded, while theother is replaced by the new node representing the new cluster.  Foreach of the other points, it is checked whether the nearest neighborhas changed due to the new cluster, and updated if so.  This can bedone in constant time for each point (except for the rare case wherethe previous nearest neighbor happened to be one of the two points nowreplaced by the cluster).  On each iteration, searching for theshortest distance pair takes O(n) time using nearest neighbor list.There are n iterations overall, so the total time is O(n^2).For historical notes on the less efficient algorithm in the old cluster,see the BENCHMARKS file.  Also note that an even more efficientimplementation (O(n log n)) should be possible.  Unfortunately, I'm just a dedicated user of cluster, and not expert in these matters.Andreas Stolcke1/21/93

?? 文件大小 35 K

?? 上傳用戶 lhf123290507

?? 所屬分類人工智能/神經(jīng)網(wǎng)絡(luò)

??? 相關(guān)標(biāo)簽

#Andreas #Stolcke #聚類 #2.9

?? 快捷鍵說(shuō)明

復(fù)制代碼 Ctrl + C

搜索代碼 Ctrl + F

全屏模式 F11

切換主題 Ctrl + Shift + D

顯示快捷鍵 ?

增大字號(hào) Ctrl + =

減小字號(hào) Ctrl + -

亚洲欧美第一页_禁久久精品乱码_粉嫩av一区二区三区免费野_久草精品视频

?? algorithm

?? 快捷鍵說(shuō)明