mcli/mclx/mcle/cli/clx/cle - probabilistic and fuzzy clustering

This file provides some explanations on how to use the programs mcli,
mclx, mcle, cli, clx, cle to induce, execute and evaluate a set of
clusters. However, it does not explain all options of these programs.
For a list of options, call mcli, mclx, mcle, cli, clx, and cle without
any arguments.

Enjoy,
Christian Borgelt

e-mail: borgelt@iws.cs.uni-magdeburg.de
WWW:    http://fuzzy.cs.uni-magdeburg.de/~borgelt

------------------------------------------------------------------------

In this directory (cluster/ex) you can find the well-known iris data
(measurements of the sepal length / width and the petal length / width
of three types of iris flowers) in formats suitable for the clustering
programs. There are two versions: a matrix version iris.pat, which
contains only a matrix of numbers, and a table version iris.tab, which
contains column names and an additional column with the iris type
information.

The matrix version can be processed with the programs mcli and mclx.
To induce a set of three clusters with the fuzzy c-means algorithm, type

  mcli -c3 iris.pat iris.cls

The option -c3 instructs the program to find three clusters. iris.pat
is the input file containing the data, iris.cls the output file to
which a description of the clusters will be written. The result of
this program call should look like this (contents of iris.cls):

  function = cauchy(2,0);
  normmode = sum1;
  params = {{ [-1.00478, 0.846484, -1.28465, -1.23865] },
            { [-0.0383645, -0.818721, 0.32297, 0.232151] },
            { [1.06925, 0.0374249, 0.970174, 1.02979] }};
  scales = [5.84333, 1.21168], [3.05733, 2.30197],
           [3.758, 0.568374],  [1.19933, 1.31632];

The first line states the membership function used (it is the same for
all clusters). In this case it is the (generalized) Cauchy function

  f(d) = 1/(d^a + b),

where d is the distance from the cluster center, with parameters a = 2
and b = 0.
That is, the (unnormalized) degree of membership is computed
as the inverse squared distance from the cluster center. An alternative
is the (generalized) Gaussian function

  f(d) = exp(-0.5 * d^a),

which can be selected with the option -G.

The second line states the normalization mode for the membership
degrees. Here it is "sum1", which means that the membership degrees
are scaled in such a way that they sum up to 1.

The line starting with "params" and the two lines following it specify
the cluster parameters, which in this case (fuzzy c-means algorithm)
are the coordinates of the cluster centers. Each section enclosed in
curly braces specifies the center of one cluster.

The last two lines specify the scaling parameters (offset and scaling
factor), which describe how the input data are scaled in order to
achieve a distribution with mean 0 and variance 1 in each dimension.
The reason for this scaling is to avoid a distortion of the clustering
result due to considerably different ranges of values in the input
dimensions.

If such a normalization is not desired, it can be switched off with
the option -q. For example

  mcli -qc3 iris.pat iris.cls

(note how several options can be combined) yields

  function = cauchy(2,0);
  normmode = sum1;
  params = {{ [5.00397, 3.41409, 1.48282, 0.253546] },
            { [5.88893, 2.76107, 4.36395, 1.39732] },
            { [6.77501, 3.05238, 5.64678, 2.05355] }};
  scales = [0, 1], [0, 1], [0, 1], [0, 1];

Here the scaling parameters all specify the identity function, so that
the clustering algorithm is executed directly in the input space.

The induced set of clusters can then be executed on the data in order
to compute the membership degrees for the different data points. This
is done with the program mclx. For example,

  mclx iris.cls iris.pat iris.out

creates a table iris.out, which contains three additional columns,
one for each cluster. These columns hold the degrees of membership,
rounded to two decimal places.
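
As a rough illustration of what these membership columns contain, the
following Python sketch applies the two radial functions and the "sum1"
normalization described above to the cluster centers of the unscaled
(-q) result. It is not the code of mclx. The Euclidean distance, the
handling of a point that coincides with a center, and all function
names are assumptions made for this example only.

```python
import math

# Cluster centers copied from the unscaled (-q) iris.cls output above.
CENTERS = [
    [5.00397, 3.41409, 1.48282, 0.253546],
    [5.88893, 2.76107, 4.36395, 1.39732],
    [6.77501, 3.05238, 5.64678, 2.05355],
]


def cauchy(d, a=2.0, b=0.0):
    """Generalized Cauchy function f(d) = 1/(d^a + b)."""
    return 1.0 / (d ** a + b)


def gaussian(d, a=2.0):
    """Generalized Gaussian function f(d) = exp(-0.5 * d^a)."""
    return math.exp(-0.5 * d ** a)


def memberships(point, centers=CENTERS, radial=cauchy):
    """Return membership degrees normalized to sum 1 ("sum1")."""
    # Assumption: plain Euclidean distance to each cluster center.
    dists = [math.dist(point, c) for c in centers]
    # Assumption: a point lying exactly on a center would divide by
    # zero with cauchy(2,0), so that cluster gets the full membership.
    if radial is cauchy and any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    raw = [radial(d) for d in dists]
    total = sum(raw)
    return [r / total for r in raw]


# Example pattern: sepal length/width and petal length/width of a
# typical Iris setosa flower (chosen for illustration).
degrees = memberships([5.1, 3.5, 1.4, 0.2])
print([round(u, 2) for u in degrees])
```

Passing radial=gaussian to memberships() gives the corresponding
degrees for the Gaussian function.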
(If a higher (or lower) accuracy is
desired, the output format of the membership degrees can be changed
with the option -o.)

If only the cluster with the highest degree of membership is desired,
one may use the option -c, which produces only one additional column
containing the index of the cluster with the highest degree of
membership. The option -m adds a further column containing the
membership degree for this cluster.

The programs cli and clx perform exactly the same tasks as the programs
mcli and mclx, only on a different input format, namely the format of
the file iris.tab. This format is processed in connection with a domain
description file (here: iris.dom) that specifies which columns are to
be used and the data types of these columns. In this way it is possible
to execute the clustering algorithm on a subset of the attributes
without changing the data file. It is also possible to handle symbolic
attributes, which are coded by a simple 1-in-n code before they are
presented to the clustering algorithm.
As a consequence, the output of
the program cli contains (compared to the output of the program mcli)
an additional section stating the domain information for the attributes.

Both programs, mcli as well as cli, are highly parameterizable, so
that a large variety of clustering algorithms can be carried out.
Here is a list of some options that lead to well-known algorithms:

options     algorithm
-jhard      hard  c-means algorithm
none        fuzzy c-means algorithm
-v          axes-parallel Gustafson-Kessel algorithm
-V          general       Gustafson-Kessel algorithm
-wvG        axes-parallel Gath-Geva (FMLE) algorithm
-wVG        general       Gath-Geva (FMLE) algorithm
-wvGNx1     axes-parallel mixture of Gaussians (EM algorithm)
-wVGNx1     general       mixture of Gaussians (EM algorithm)

Explanation of the individual options:

-j#      membership normalization mode
-v       adaptable variances
-V       adaptable covariances (covariance matrix)
-Z       adaptable cluster sizes
-w       adaptable weights/prior probabilities
-G       Gaussian radial function (default: Cauchy function)
-N       normalize to unit integral (probability density)
-x       exponent for pattern weight

It is usually advisable to initialize the higher algorithms (like
Gustafson-Kessel and Gath-Geva) with a few epochs of the fuzzy c-means
algorithm. This can be achieved by exploiting that the programs mcli
and cli can read in a clustering result. That is, by a call like

  mcli -OV iris.pat iris.gk iris.cls

the fuzzy c-means result obtained with the program call stated above
(stored in iris.cls) is further processed with the Gustafson-Kessel
algorithm. The result is written to the file iris.gk. The option -O
is necessary to overwrite the cluster type and radial function
parameters read from the input file with the command line values.

The shape and size regularization options (-H and -R) are described
briefly in the file ../doc/regular.tex
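
To give an idea of what "a few epochs of the fuzzy c-means algorithm"
means, here is a minimal Python sketch of one epoch of the textbook
algorithm: the membership degrees are recomputed from the current
centers, then each center is moved to the weighted mean of the data.
It is not taken from the sources of mcli or cli. The function name,
the fuzzifier m = 2 (which yields the inverse squared distance
memberships normalized to sum 1), the Euclidean distance and the toy
data are assumptions made for this example.

```python
def fcm_epoch(data, centers, m=2.0):
    """One textbook fuzzy c-means epoch; returns the updated centers."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]  # weighted coordinate sums
    weights = [0.0] * len(centers)         # sum of u^m per cluster
    for x in data:
        # Squared Euclidean distance of x to every cluster center.
        d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        if min(d2) == 0.0:
            # Assumption: a point lying on a center belongs fully to it.
            hits = [1.0 if d == 0.0 else 0.0 for d in d2]
            u = [h / sum(hits) for h in hits]
        else:
            # u_k is proportional to d_k^(-2/(m-1)), normalized to sum 1.
            raw = [d ** (-1.0 / (m - 1.0)) for d in d2]
            total = sum(raw)
            u = [r / total for r in raw]
        for k, uk in enumerate(u):
            w = uk ** m
            weights[k] += w
            for j in range(dim):
                sums[k][j] += w * x[j]
    # New center = weighted mean; keep the old one if it got no weight.
    return [
        [s / w for s in row] if w > 0.0 else list(c)
        for row, w, c in zip(sums, weights, centers)
    ]


# Hypothetical toy data with two obvious groups.
data = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers = [[1.0, 1.0], [9.0, 9.0]]
for _ in range(5):  # "a few epochs"
    centers = fcm_epoch(data, centers)
print(centers)
```

Centers obtained in this way play the role of the iris.cls result that
is passed on to the Gustafson-Kessel run in the call above.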