2025-04-05 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains what the k-means algorithm is and how to run it in R and Python.
Clustering is a form of unsupervised machine learning: there are no category labels y, and similar data points have to be grouped according to their features. K-means is one of the simplest and most common clustering algorithms. It groups highly similar points together by computing the distances between them.
Algorithm flow
Randomly select k points as the initial cluster centers. Compute the distance from every other point to each center and assign each point to its nearest center. Once all points are assigned, compute the new center of each cluster as the mean of its points. Then recompute the distance from each point to the new centers and reassign each point to the nearest one. Repeat this process until the centers no longer change.
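The flow above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, the choice to leave an empty cluster's center where it was, and the sample points are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick k of the data points as the initial centers.
    centers = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center (Euclidean distance).
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Step 3: move each center to the mean of the points assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centers.append(mean)
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        # Step 4: stop once the centers no longer change.
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


# Two well-separated groups of made-up 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centers, labels = kmeans(points, k=2)
print(centers)
print(labels)
```

The first three points should end up with one label and the last three with the other. Which group is numbered 0 depends on the random starting centers.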
Note that k, the number of clusters to divide the data into, has to be chosen before running k-means. A small value such as 3 to 5 is a common starting point, although the right choice depends on the data. The figure below, taken from the Internet, shows points being grouped into three clusters over four iterations.
R implementation
To run k-means clustering in R, you can use the kmeans() function directly. The following example uses the iris dataset for demonstration.
The color represents the clustering result, the shape represents the true class, and "*" marks the cluster centers. The cluster assigned to each sample point can be viewed as follows:
Python implementation
K-means clustering in Python can be demonstrated in the same way, using the KMeans class in sklearn.cluster on the iris dataset.
The color represents the result of clustering.
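A minimal sketch of such a demonstration, assuming scikit-learn is installed. The settings `n_clusters=3`, `n_init=10` and `random_state=0` are illustrative choices rather than values from the original example, and the plotting step is left out.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load the iris measurements: 150 samples with 4 features each.
iris = load_iris()
X = iris.data

# Fit k-means with 3 clusters and get the cluster index of every sample.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)

print(model.cluster_centers_)  # one row of 4 feature means per cluster
print(labels[:10])             # cluster index of the first 10 samples
```

The cluster indices are arbitrary, so they do not have to line up with the species codes in `iris.target`.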
Advantages and disadvantages of k-means
Advantages:
(1) The algorithm is simple in principle and clusters quickly.
(2) It is easy to implement.
Disadvantages:
(1) The value of k has to be given in advance, and it is sometimes unclear how many clusters are most appropriate.
(2) The choice of initial centers affects the clustering result, which is why the results can differ from run to run.
(3) Because clustering judges the similarity of points by distance, the applicability of k-means is limited. The results are best when the underlying clusters are roughly circular, of similar size, and clearly separated from one another.
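Disadvantages (1) and (2) can be observed directly. The sketch below assumes scikit-learn is installed and reuses the iris dataset. It records the within-cluster sum of squared distances (scikit-learn's `inertia_` attribute) for several values of k, then for several single runs started from random centers. The ranges of k and of seeds are arbitrary choices for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# Disadvantage (1): k must be chosen up front. One way to compare candidates
# is to look at the within-cluster sum of squares for each k.
inertia_by_k = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertia_by_k[k] = model.inertia_

# Disadvantage (2): a single run from random starting centers can settle on
# a different solution depending on the seed.
inertia_by_seed = {}
for seed in range(5):
    model = KMeans(n_clusters=3, init="random", n_init=1, random_state=seed).fit(X)
    inertia_by_seed[seed] = model.inertia_

for k, value in inertia_by_k.items():
    print(f"k={k}: inertia={value:.2f}")
for seed, value in inertia_by_seed.items():
    print(f"seed={seed}: inertia={value:.2f}")
```

A lower inertia means tighter clusters, but it keeps falling as k grows, so it cannot pick k on its own. Running several initializations and keeping the best one, which is what `n_init=10` does, reduces the dependence on the starting centers.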
That concludes this introduction to the k-means algorithm. I hope the content above is of some help to you.