2025-03-25 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 Report
This article explains how to use the clusplot() function in non-hierarchical clustering. The explanation is kept simple and concrete; follow the worked example below to see how clusplot() is used with k-medoids (PAM) clustering.
K-medoids partitioning
Because the k-means algorithm places each cluster center at the distance mean, outliers have a strong influence on it, and isolated points may end up grouped together. An improved method builds the clusters around actual data points: partitioning around medoids (PAM), also known as k-medoids clustering. Like k-means, it selects k representative objects (medoids) from the observations to reflect the main structure of the data, then assigns every observation to its nearest medoid to form k clusters. The algorithm iterates, swapping medoids until the total dissimilarity between each object and its medoid is minimized.
The k-medoids algorithm is thus a variant of k-means that differs in how the cluster center is chosen. In k-means the center is the average of all data points in the cluster, which is generally not a real data point; in k-medoids the center is restricted to the data points contained in the cluster, and the point with the smallest total distance to all other points in the cluster is chosen as the medoid. The difference between k-means and k-medoids is analogous to the difference between the mean and the median of a sample.
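The mean-versus-median analogy can be illustrated in a few lines of base R (an illustration added here, not part of the original tutorial): an outlier drags a k-means-style mean far from the bulk of the data, while the medoid, being an actual observation that minimizes the summed distance to the others, stays put.

```r
x <- c(1, 2, 3, 4, 100)            # one extreme outlier

# k-means-style centre: the arithmetic mean is pulled toward the outlier
mean(x)                            # 22

# k-medoids-style centre: the observation minimising the summed
# absolute distance to all other observations
cost <- sapply(x, function(m) sum(abs(x - m)))
medoid <- x[which.min(cost)]
medoid                             # 3
```

The medoid (3) sits inside the main group of points, while the mean (22) lies nowhere near any observation; this is exactly why PAM is more robust to outliers.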
The pam() function in the cluster package accepts either raw data or a distance matrix, which is very convenient, and the optimal number of clusters can be judged from the average silhouette width; the pamk() function in the fpc package goes further and estimates the optimal number of clusters automatically. An example analysis follows:
# read data
data=read.table(file="otu_table.txt", header=TRUE, check.names=FALSE)
rownames(data)=data[, 1]
data=as.matrix(data[, -1])
# standardize the species data of each sample (relative abundance, %)
library(vegan)
data=decostand(data, MARGIN=2, method="total") * 100
otu=t(data)
# calculate the distance matrix
otu_dist=vegdist(otu, method="bray", diag=TRUE, upper=TRUE)
# PAM clustering
library(fpc)
library(cluster)
# determine the optimal number of clusters
pambest=pamk(otu_dist)
k=pambest$nc
otu_pam=pam(otu_dist, k)
# the colour index vector was garbled in the source; any palette of k colours works
mycol=rainbow(k)
clusplot(otu_pam, color=TRUE, labels=3, lines=0, cex=1,
         col.clus=mycol, col.p=otu_pam$clustering)
The clusplot() function projects the data onto the first two principal components (or principal coordinates, when a distance matrix is supplied) and plots the clustering result on those axes. The result is as follows:
According to the pamk() function, the optimal number of clusters is 10, but fewer than 10 clusters are distinguishable in the principal component plot, which illustrates how different algorithms can disagree. We can also draw a silhouette width plot for PAM clustering to help select the best number of clusters, as shown below:
# average silhouette width for k = 2 .. n-1
asw=numeric(nrow(otu))
for (i in 2:(length(asw)-1)) {
  asw[i]=pam(otu_dist, i)$silinfo$avg.width
}
k.best=which.max(asw)
plot(1:length(asw), asw, type="h", main="Silhouette of PAM",
     lwd=2, xlab="k (number of clusters)", ylab="average silhouette width")
The results are as follows:
The highest silhouette width is obtained at k = 22, which is consistent with the fact that the 66 samples in our data were taken from 22 sampling sites, with 3 parallel samples each. On the whole, however, the silhouette widths are fairly high across many values of k, so it is not strictly necessary to choose k = 22; this serves only as a cross-check on the result from pamk().
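As a reading aid (added here, not part of the original article), Kaufman and Rousseeuw's well-known rule of thumb for interpreting the average silhouette width can be encoded directly; the cut-offs below follow their published guideline values.

```r
# Kaufman & Rousseeuw's rule of thumb for average silhouette width
interpret_asw <- function(asw) {
  if (asw > 0.70)      "strong structure"
  else if (asw > 0.50) "reasonable structure"
  else if (asw > 0.25) "weak structure, could be artificial"
  else                 "no substantial structure"
}

interpret_asw(0.62)   # "reasonable structure"
```

Applying this to the value stored in otu_pam$silinfo$avg.width gives a quick qualitative verdict on any candidate k, complementing the visual inspection of the silhouette plot above.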
Different algorithms emphasize different things, so their results differ. All of them can serve as tools in our toolbox, and in practical research we can choose the algorithm that suits our own data. Reply "Cluster Analysis" in the official-account dialog box to get the download link for the sample data.