2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
Today we will look at the highly praised clustering algorithm published in Science. Many readers may not be familiar with it, so the editor has summarized the main points below. I hope you get something out of this article.
The paper proposes a simple and elegant clustering algorithm that can identify clusters of various shapes and whose hyperparameters are easy to determine.
Algorithm idea
The algorithm assumes that a cluster center is surrounded by neighbors with lower local density and lies relatively far from any point with higher local density. Two quantities are first defined for each point i: the local density ρ_i and the distance δ_i to the nearest point of higher local density:
$$\rho_i = \sum_j \chi(d_{ij} - d_c), \qquad \delta_i = \min_{j:\,\rho_j > \rho_i} d_{ij}$$
where
$$\chi(x) = \begin{cases} 1, & x < 0 \\ 0, & \text{otherwise} \end{cases}$$
Here d_ij is the distance between points i and j, and d_c is a cutoff distance, which is a hyperparameter. ρ_i therefore equals the number of points whose distance from point i is less than d_c. Because the algorithm is sensitive only to the relative values of ρ_i across points, it is robust to the choice of d_c. One recommended practice is to choose d_c so that the average number of neighbors of each point is 1% to 2% of the total number of points.
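The density definition and the 1% to 2% rule of thumb can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's reference code: the function names `pairwise_distances`, `choose_dc`, and `local_density` and the parameter `neighbor_fraction` are made up for this example, and taking a quantile of all pairwise distances is only one assumed way to reach the target average neighbor count.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix for an (n, dims) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def choose_dc(dist, neighbor_fraction=0.02):
    """Assumed heuristic: take d_c as a low quantile of all pairwise distances.

    If a fraction f of all pairs is closer than d_c, each point has on
    average about f * n neighbors, which matches the 1% to 2% rule.
    """
    n = dist.shape[0]
    pair_distances = dist[np.triu_indices(n, k=1)]  # each pair counted once
    return np.quantile(pair_distances, neighbor_fraction)

def local_density(dist, dc):
    """rho_i = number of other points strictly closer than d_c to point i."""
    # The diagonal (distance 0 to itself) always passes the test, so subtract 1.
    return (dist < dc).sum(axis=1) - 1
```

For example, with an `(n, 2)` array `points`, one would call `dist = pairwise_distances(points)`, then `dc = choose_dc(dist)` and `rho = local_density(dist, dc)`. The full distance matrix needs O(n²) memory, so this sketch only suits small data sets.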
For the point with the highest density, set δ_i = max_j d_ij. Note that δ_i is much larger than the typical nearest-neighbor distance only for points that are local or global maxima of the density.
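The distance δ_i can be computed by visiting points in order of decreasing density, so that all higher-density candidates are already known when a point is processed. The sketch below assumes a precomputed distance matrix `dist` and density array `rho`; the function name `delta_and_nearest_higher` is hypothetical, and breaking density ties by sort order is an assumption of this example, not something the article specifies.

```python
import numpy as np

def delta_and_nearest_higher(dist, rho):
    """Return delta_i and the index of each point's nearest higher-density point."""
    n = len(rho)
    # Indices sorted by decreasing density. Assumption: among equal densities,
    # the point that comes first in this order is treated as the denser one.
    order = np.argsort(-np.asarray(rho), kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)  # -1 marks the densest point
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # all points treated as denser than i
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    # The densest point has no denser neighbor: use its largest distance.
    top = order[0]
    delta[top] = dist[top].max()
    return delta, nearest_higher
```

Returning `nearest_higher` alongside `delta` is a convenience: the same index is what a later assignment step would follow to propagate cluster labels.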
Clustering process
Points with both a large local density ρ_i and a large δ_i are taken as cluster centers. Points with a low local density but a large δ_i are outliers. Once the cluster centers are determined, each remaining point is assigned to the same cluster as its nearest neighbor of higher density. The illustration is as follows:
The left panel shows the distribution of all points in two-dimensional space. The right panel plots ρ on the horizontal axis and δ on the vertical axis; this kind of plot is called a decision graph. Points 1 and 10 both have relatively large ρ_i and δ_i, so they serve as cluster centers. Points 26, 27, and 28 also have a large δ_i but a small ρ_i, so they are outliers.
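A minimal sketch of the clustering step follows. It picks centers with two thresholds and then gives every remaining point the label of its nearest neighbor of higher density, which is the assignment rule used in the original Science paper. The thresholds `rho_min` and `delta_min` are hypothetical parameters added for this example (in practice centers are usually read off the decision graph by eye), and `assign_clusters` is not a name from the article.

```python
import numpy as np

def assign_clusters(dist, rho, delta, rho_min, delta_min):
    """Select centers by thresholds, then propagate labels down the density order."""
    rho = np.asarray(rho)
    delta = np.asarray(delta)
    n = len(rho)
    labels = np.full(n, -1)
    # Centers: large rho AND large delta (thresholds are illustrative inputs).
    centers = np.flatnonzero((rho > rho_min) & (delta > delta_min))
    labels[centers] = np.arange(len(centers))
    # Visit points from densest to sparsest, so the nearest denser neighbor
    # of each point has already been labeled when the point is reached.
    order = np.argsort(-rho, kind="stable")
    for rank in range(1, n):
        i = order[rank]
        if labels[i] != -1:
            continue  # already a center
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        labels[i] = labels[j]
    # Note: if the densest point is not selected as a center, it keeps label -1
    # and passes -1 on to the points that follow it.
    return centers, labels
```

Because labels flow from denser to sparser points, the assignment takes a single pass and needs no iterative optimization.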
Cluster analysis
In cluster analysis, it is usually necessary to assess how reliably each point has been assigned to its cluster. In this algorithm, we first define a border region for each cluster: the set of points assigned to that cluster that lie within a distance d_c of points belonging to other clusters. Then, for each cluster, we find the point with the highest local density within its border region and denote that density ρ_h. All points of the cluster whose local density is greater than ρ_h are considered part of the cluster core (that is, their assignment to the cluster is very reliable), and the remaining points are considered the cluster halo, which can be regarded as noise. The illustration is as follows:
Panel A shows the probability distribution from which the data are generated, while panels B and C show samples of 4000 and 1000 points drawn from that distribution, respectively. Panels D and E are the decision graphs of B and C, respectively. In both data sets, only five points have a relatively large ρ_i and a very large δ_i; these points are taken as the cluster centers. After the centers are determined, each point is assigned either to a cluster (colored points) or to a cluster halo (black points). Panel F shows that the clustering error rate decreases gradually as the number of sampled points increases, which indicates that the algorithm is robust.
The figures show the algorithm's clustering results on a variety of data distributions, and the results are very good.
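The core and halo split described above can be sketched as follows. This follows the article's wording (the highest density among a cluster's border points becomes the threshold) and is only an illustration under assumptions: `halo_mask` is a hypothetical name, `labels` is assumed to contain cluster indices 0..k-1 for every point, and clusters with no border points are treated as having no halo.

```python
import numpy as np

def halo_mask(dist, rho, labels, dc):
    """Return a boolean array that is True for halo points, False for core points."""
    rho = np.asarray(rho, dtype=float)
    labels = np.asarray(labels)
    n_clusters = labels.max() + 1
    # A point is in a border region if some point of another cluster
    # lies closer than d_c to it.
    different = labels[:, None] != labels[None, :]
    near_other = ((dist < dc) & different).any(axis=1)
    # Threshold per cluster: highest density found in its border region.
    # -inf means "no border region", so every point of that cluster is core.
    border_rho = np.full(n_clusters, -np.inf)
    for c in range(n_clusters):
        in_border = near_other & (labels == c)
        if in_border.any():
            border_rho[c] = rho[in_border].max()
    # Core points have density strictly above the threshold; the rest are halo.
    return rho <= border_rho[labels]
```

Points flagged `True` can then be plotted in black or dropped as noise, while the remaining points form the reliable core of each cluster.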
After reading the above, do you have a better understanding of the highly praised clustering algorithm published in Science? If you want to learn more, please follow the industry information channel. Thank you for your support.