Updated 2025-01-22. From: SLTechnology News&Howtos > Development
Shulou(Shulou.com)06/02 Report--
This article explains how to implement a clustering algorithm with the KMeans and PCA classes from Python's scikit-learn package. Many beginners find this unclear, so the steps are worked through in detail below.
Topic: using the given driver behavior data (trip.csv), cluster the driver's driving style in different time periods into three categories: ordinary, aggressive, and very calm. Use the KMeans algorithm in Python's scikit-learn package to practice applying a clustering algorithm. Then use the PCA algorithm in the same package to reduce the dimensionality of the clustered data, and plot the result to show the clustering effect. Adjusting the parameters of the clustering algorithm and observing how the clustering changes is a way to practice parameter tuning.
Data introduction: the data set trip.csv contains the processed data of one driver. The features of each time period (trip) of that driver are clustered. (Note: driver and trip_no do not take part in clustering.)
Field introduction:
driver: driver number
trip_no: trip number
v_avg: average speed
v_var: variance of speed
a_avg: average acceleration
a_var: variance of acceleration
r_avg: average rotational speed
r_var: variance of rotational speed
v_a: proportion of time at speed level a (similarly v_b, v_c, v_d)
a_a: proportion of time at acceleration level a (similarly a_b, a_c)
r_a: proportion of time at rotational speed level a (similarly r_b, r_c)
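As a minimal sketch of how the feature columns can be separated from the two identifier columns, the example below builds a small synthetic stand-in for trip.csv. The random values and the reduced set of columns are assumptions made only for illustration; the real file has more fields.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for trip.csv: the values are random and only a subset
# of the documented columns is used, so only the column layout is meaningful.
rng = np.random.default_rng(0)
columns = ["driver", "trip_no", "v_avg", "v_var", "a_avg", "a_var", "r_avg", "r_var"]
df = pd.DataFrame(rng.random((6, len(columns))), columns=columns)
df["driver"] = 1
df["trip_no"] = np.arange(1, 7)

# Keep every column except the two identifiers, which must not be clustered.
features = df.drop(columns=["driver", "trip_no"])
print(list(features.columns))
```

Dropping the identifiers by name gives the same result as slicing by position with `df.iloc[:, 2:]` when driver and trip_no are the first two columns, and it keeps working if the column order changes.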
Requirements for the clustering algorithm:
(1) Count the number of samples in each category.
(2) Find the cluster centers.
(3) Merge the category of each row (named jllable) with the original data set to form a new dataframe named new_df, and export it locally as new_df.csv.
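The three clustering requirements above can be sketched on synthetic data. The three blobs, the column names f1 and f2, and the explicit `n_init=10` are assumptions for illustration and do not come from trip.csv.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Three well-separated synthetic blobs of 20 points each (illustrative data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])
data = pd.DataFrame(points, columns=["f1", "f2"])
data.insert(0, "trip_no", np.arange(len(data)))  # identifier, not clustered

features = data[["f1", "f2"]]
kmeans = KMeans(n_clusters=3, n_init=10, random_state=10).fit(features)

# (3) Attach the label of each row to a copy of the original data.
new_df = data.copy()
new_df["jllable"] = kmeans.labels_

# (1) Number of rows per category, (2) cluster centers.
counts = new_df["jllable"].value_counts().sort_index()
print(counts)
print(kmeans.cluster_centers_)
```

The label column is added to a copy of the full data set, so the identifier column survives in new_df while the model itself only ever sees the features. Writing the result with `new_df.to_csv('new_df.csv', index=False)` keeps the row index out of the file.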
Requirements for the dimensionality reduction algorithm:
(1) Reduce the features used for clustering to 2 dimensions, and store the reduced data in a dataframe named new_pca.
(2) Draw a plot to show the clustering effect (you can use the following code):
import matplotlib.pyplot as plt
# Plot each category with its own marker: red dots, green circles, blue stars
d = new_pca[new_df['jllable'] == 0]
plt.plot(d[0], d[1], 'r.')
d = new_pca[new_df['jllable'] == 1]
plt.plot(d[0], d[1], 'go')
d = new_pca[new_df['jllable'] == 2]
plt.plot(d[0], d[1], 'b*')
# Save the figure to a local file, then display it
plt.gcf().savefig('kmeans.png')
plt.show()
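The dimensionality reduction step that produces new_pca can be sketched as follows. The random 5-column feature table is an assumption standing in for the real clustering features.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Illustrative feature table: 50 rows, 5 numeric columns of random values.
rng = np.random.default_rng(0)
features = pd.DataFrame(rng.normal(size=(50, 5)),
                        columns=[f"f{i}" for i in range(5)])

# Project the features onto the first two principal components.
pca = PCA(n_components=2)
new_pca = pd.DataFrame(pca.fit_transform(features), index=features.index)

print(new_pca.shape)
# Share of the total variance kept by each of the two components.
print(pca.explained_variance_ratio_)
```

Because no column names are passed, the resulting dataframe has the integer column labels 0 and 1, which is why the plotting code can refer to `d[0]` and `d[1]`. Reusing the index of the feature table keeps the boolean masks built from the label column aligned with new_pca.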
The Python implementation is as follows:
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('trip.csv', header=0, encoding='utf-8')
# Features start at column index 2; driver and trip_no are excluded
df1 = df.iloc[:, 2:]
kmeans = KMeans(n_clusters=3, random_state=10).fit(df1)

## New dataframe new_df: the original data plus the category of each row
new_df = df.copy()
new_df['jllable'] = kmeans.labels_

## Number of samples in each category
df_count_type = new_df.groupby('jllable').size()
print(df_count_type)

## Cluster centers
print(kmeans.cluster_centers_)

## Export new_df locally as new_df.csv
print(new_df)
new_df.to_csv('new_df.csv')

## Reduce the features used for clustering to 2 dimensions and store them in a dataframe named new_pca
pca = PCA(n_components=2)
new_pca = pd.DataFrame(pca.fit_transform(df1))

## Visualization
d = new_pca[new_df['jllable'] == 0]
plt.plot(d[0], d[1], 'r.')
d = new_pca[new_df['jllable'] == 1]
plt.plot(d[0], d[1], 'go')
d = new_pca[new_df['jllable'] == 2]
plt.plot(d[0], d[1], 'b*')
plt.gcf().savefig('kmeans.png')
plt.show()
The running results are as follows:
[Output images not reproduced: number of samples in each category; cluster centers; the new dataframe new_df, exported locally as new_df.csv; the visualization saved as kmeans.png]
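For the parameter tuning part of the exercise, one possible approach is to refit KMeans for several values of n_clusters and compare the results. The sketch below uses synthetic blobs, and scoring with inertia and the silhouette coefficient is an assumption of this example rather than something the exercise prescribes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Three well-separated synthetic blobs of 30 points each (illustrative data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

results = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=10).fit(X)
    # inertia_: within-cluster sum of squared distances (lower is tighter).
    # silhouette: between -1 and 1, higher means better separated clusters.
    results[k] = (model.inertia_, silhouette_score(X, model.labels_))
    print(k, round(results[k][0], 2), round(results[k][1], 3))

# Pick the number of clusters with the highest silhouette score.
best_k = max(results, key=lambda k: results[k][1])
print("best n_clusters by silhouette:", best_k)
```

Inertia always falls as n_clusters grows, so it is read by looking for the point where the improvement levels off, while the silhouette score can be compared directly across values. The same loop can be run over other parameters such as random_state or n_init to see how stable the clustering is.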