2025-04-03 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
Today we will walk through a case study of the K-means algorithm. Many people may not be familiar with it, so the editor has summarized the following to help you understand it better. I hope you get something out of this article.
Background introduction
K-means is an unsupervised algorithm that solves clustering problems. It follows a simple procedure to partition a given data set into a chosen number of clusters (say, k clusters). Data points within a cluster are homogeneous, and heterogeneous with respect to the other clusters.
Remember figuring out shapes from ink blots? K-means is somewhat similar to that activity. You look at the shape and spread of the data to work out how many different clusters / populations are present!
How K-means forms clusters:
1. K-means picks k points, one for each cluster, known as centroids.
2. Each data point forms a cluster with its closest centroid, which gives k clusters.
3. The centroid of each cluster is recomputed from the cluster's current members, which gives k new centroids.
4. With the new centroids in place, repeat steps 2 and 3: find the closest new centroid for each data point and assign the point to that cluster. Repeat this process until convergence, that is, until the centroids no longer change.
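The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a production implementation: the function name kmeans, its max_iter and seed parameters, the choice of random data points as initial centroids, and the toy data are all assumptions made for this example and do not come from the original article.

```python
import numpy as np

def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (n_samples x n_features) into k groups (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Step 1: pick k distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 2: assign every point to its closest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Two well-separated toy groups (made-up data for illustration).
toy_points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                       [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
toy_labels, toy_centroids = kmeans(toy_points, k=2)
print(toy_labels)
print(toy_centroids)
```

On this toy data the first three points should end up in one cluster and the last three in the other; which cluster receives label 0 depends on the random initialization.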
How to determine the value of K:
In K-means we have clusters, and each cluster has its own centroid. The sum of the squared differences between the centroid and the data points within a cluster is the within-cluster sum of squares for that cluster. When the sums of squares of all the clusters are added together, the result is the total within-cluster sum of squares for the cluster solution.
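As a small worked example of this quantity, the sketch below adds up the squared distances between each point and the centroid of its cluster. The helper name within_cluster_ss and the four sample points are made up for illustration.

```python
import numpy as np

def within_cluster_ss(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    diffs = points - centroids[labels]  # each row minus its own cluster's centroid
    return float(np.sum(diffs ** 2))

# Made-up example: two clusters of two points each.
points = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0], [14.0, 0.0]])
labels = np.array([0, 0, 1, 1])
centroids = np.array([[1.0, 0.0], [12.0, 0.0]])

total_ss = within_cluster_ss(points, labels, centroids)
print(total_ss)
```

Here cluster 0 contributes 1 + 1 = 2 and cluster 1 contributes 4 + 4 = 8, so the total is 10. scikit-learn exposes the same quantity for a fitted model as the inertia_ attribute.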
We know that this value keeps decreasing as the number of clusters increases. If you plot the result, however, you may see that the sum of squared distances drops sharply up to some value of k and much more slowly after that. That point (often called the "elbow") is where we can find the best number of clusters.
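One way to look for that point is to fit K-means for a range of k values and compare the resulting sums of squares. The sketch below does this on synthetic data, assuming three well-separated groups; the data, the range of k, and the n_init and random_state settings are illustrative choices and not part of the original example. It relies on scikit-learn's inertia_ attribute, which holds the sum of squared distances of samples to their closest cluster center.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data (an assumption for this sketch): three groups of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit K-means for k = 1..6 and record the total within-cluster sum of squares.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(f"k={k}: sum of squared distances = {value:.1f}")
```

On data like this the value should fall steeply up to k = 3 and only slightly afterwards, which is the pattern described above.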
Let's take a look at an example of using Python:
'''
The following code is for K-Means
Created by - ANALYTICS VIDHYA
'''

# importing required libraries
import pandas as pd
from sklearn.cluster import KMeans

# read the train and test dataset
train_data = pd.read_csv('train-data.csv')
test_data = pd.read_csv('test-data.csv')

# shape of the dataset
print('Shape of training data:', train_data.shape)
print('Shape of testing data:', test_data.shape)

# Now, we need to divide the training data into different clusters
# and predict in which cluster a particular data point belongs.

'''
Create the object of the K-Means model
You can also add other parameters and test your code here
Some parameters are: n_clusters and max_iter
Documentation of sklearn KMeans:

https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
'''

model = KMeans()

# fit the model with the training data
model.fit(train_data)

# number of clusters
print('\nDefault number of Clusters:', model.n_clusters)

# predict the clusters on the train dataset
predict_train = model.predict(train_data)
print('\nCLusters on train data', predict_train)

# predict the clusters on the test dataset
predict_test = model.predict(test_data)
print('Clusters on test data', predict_test)

# Now, we will train a model with n_clusters = 3
model_n3 = KMeans(n_clusters=3)

# fit the model with the training data
model_n3.fit(train_data)

# number of clusters
print('\nNumber of Clusters:', model_n3.n_clusters)

# predict the clusters on the train dataset
predict_train_3 = model_n3.predict(train_data)
print('\nCLusters on train data', predict_train_3)

# predict the clusters on the test dataset
predict_test_3 = model_n3.predict(test_data)
print('Clusters on test data', predict_test_3)
Running result:
Shape of training data: (100, 5)
Shape of testing data: (100, 5)
Default number of Clusters: 8
CLusters on train data [6 7 0 7 6 5 5 7 7 3 1 1 3 0 7 1 0 4 5 6 4 3 3 0 4 0 1 1 0 3 4 3 3 0 0 1 2 1 4 3 0 2 1 1 0 3 3 0 7 1 3 0 5 1 0 1 5 4 6 4 3 6 5 0 3 0 4 3 3 1 5 1 6 5 7 7 6 3 5 3 5 3 1 5 2 5 0 3 2 3 4 7 1 0 1 5 3 6 1 6]
Clusters on test data [3 6 2 0 5 6 0 3 5 2 3 4 5 5 5 3 3 5 5 7 0 0 5 5 3 5 0 6 5 0 1 6 3 5 6 0 1 7 3 0 0 6 2 0 5 3 5 7 3 3 4 6 3 1 6 3 1 3 3 2 3 3 5 1 7 5 1 5 3 3 5 2 0 1 5 0 3 0 3 6 3 5 4 0 2 6 3 5 6 0 6 4 3 5 0 6 6 6 1 0]
Number of Clusters: 3
CLusters on train data [2 0 1 0 2 1 2 0 0 2 0 0 2 1 0 0 1 2 2 2 2 2 2 1 2 1 0 0 1 2 2 2 2 1 1 0 2 0 2 2 1 2 0 0 1 2 2 1 0 0 2 1 2 0 1 0 2 2 2 2 2 2 2 1 2 1 2 2 2 0 1 0 2 2 0 0 0 2 0 2 2 2 0 2 2 2 1 2 2 2 2 0 0 1 0 2 2 2 0 2]
Clusters on test data [2 2 2 1 2 2 1 2 2 2 2 2 2 1 1 2 2 2 2 0 1 1 2 2 1 2 2 1 0 2 2 2 2 1 0 2 1 2 2 1 0 2 1 2 2 1 2 2 2 0 2 2 2 0 2 2 2 0 2 2 2 0 2 2 2 2 2 0 2 2 2 0 2 2 1 0 2 2 1 2 2 2 2 2 2 2 1 2 2 1 2 2 2 0 1]
After reading the above, do you have a better understanding of how to carry out a case study of the K-means algorithm? If you would like to learn more or read related content, please follow the industry information channel. Thank you for your support.
© 2024 shulou.com SLNews company. All rights reserved.