How to Use Python's KMeans and PCA Packages to Implement a Clustering Algorithm


2025-07-04 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02 Report

Many beginners are unsure how to use Python's KMeans and PCA packages to implement a clustering algorithm. This article walks through the process step by step for anyone who needs it.

Task: using the given driver behavior data (trip.csv), cluster the driver's driving style in different time periods into three categories: ordinary, aggressive and extremely calm. Use the KMeans algorithm in Python's scikit-learn package to practice applying a clustering algorithm, then use the PCA algorithm in scikit-learn to reduce the clustered data to two dimensions and plot it to show the clustering result. By adjusting the parameters of the clustering algorithm, you can observe how the clustering result changes and practice parameter tuning.

Data: trip.csv is a processed dataset for a single driver; the features of each of the driver's time periods are clustered. (Note: driver and trip_no do not take part in clustering.)

Fields: driver: driver ID; trip_no: trip number; v_avg: average speed; v_var: variance of speed; a_avg: average acceleration; a_var: variance of acceleration; r_avg: average rotational speed; r_var: variance of rotational speed; v_a: proportion of time at speed level a (likewise v_b, v_c, v_d); a_a: proportion of time at acceleration level a (likewise a_b, a_c); r_a: proportion of time at rotational-speed level a (likewise r_b, r_c).
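Because driver and trip_no only identify a record, they must be excluded before clustering. A minimal sketch, assuming the column names listed above, selects the features by name rather than by position, so the exclusion still holds if the column order changes. The two rows below are made-up illustration data, not values from trip.csv.

```python
import pandas as pd

# Made-up example rows; only the column names follow the field list above.
df = pd.DataFrame({
    "driver": [1, 1],
    "trip_no": [1, 2],
    "v_avg": [45.2, 60.1],
    "v_var": [12.0, 30.5],
    "a_avg": [0.3, 0.8],
})

# driver and trip_no identify the record and do not take part in clustering.
features = df.drop(columns=["driver", "trip_no"])
print(list(features.columns))  # ['v_avg', 'v_var', 'a_avg']
```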

Clustering requirements:

(1) Count the number of records in each category.

(2) Find the cluster centers.

(3) Merge each record's category (in a column named jllable) with the original dataset to form a new DataFrame named new_df, and save it locally as new_df.csv.
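The three clustering requirements can be sketched end to end on synthetic data. The data, the two feature columns and the temporary output path are assumptions for illustration only; with the real file you would read trip.csv and keep all of its feature columns. value_counts counts rows per label, which is what requirement (1) asks for.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for trip.csv: three well-separated groups of 20 trips each.
rng = np.random.default_rng(0)
centers = np.array([[40.0, 5.0], [70.0, 20.0], [25.0, 1.0]])
X = np.vstack([c + rng.normal(scale=1.0, size=(20, 2)) for c in centers])
df = pd.DataFrame(X, columns=["v_avg", "v_var"])
df.insert(0, "trip_no", np.arange(len(df)))
df.insert(0, "driver", 1)

# Cluster on the feature columns only.
features = df.drop(columns=["driver", "trip_no"])
kmeans = KMeans(n_clusters=3, random_state=10, n_init=10).fit(features)

# (3) attach each record's label to the original columns
new_df = df.copy()
new_df["jllable"] = kmeans.labels_

# (1) number of records in each category
counts = new_df["jllable"].value_counts().sort_index()
print(counts)

# (2) cluster centers, one row per category, in feature-column order
centers_df = pd.DataFrame(kmeans.cluster_centers_, columns=features.columns)
print(centers_df)

# Hypothetical output location; the task itself asks for new_df.csv locally.
out_path = os.path.join(tempfile.gettempdir(), "new_df.csv")
new_df.to_csv(out_path, index=False)
```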

Dimensionality reduction requirements:

(1) Reduce the features used for clustering to 2 dimensions and store the reduced data in a DataFrame named new_pca.

(2) Plot the data to show the clustering result (you can use the following code):

import matplotlib.pyplot as plt

# plot each category with its own colour and marker
d = new_pca[new_df['jllable'] == 0]
plt.plot(d[0], d[1], 'r.')
d = new_pca[new_df['jllable'] == 1]
plt.plot(d[0], d[1], 'go')
d = new_pca[new_df['jllable'] == 2]
plt.plot(d[0], d[1], 'b*')

# save the figure locally, then display it
plt.gcf().savefig('kmeans.png')
plt.show()
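The same plot can be written as a loop over the categories. This is a sketch under stated assumptions: the features are synthetic, the labels come from a fresh KMeans fit, the Agg backend is chosen so it runs without a display, and the output file name is hypothetical. Note that only the clustering features go into PCA, never the label column.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for the clustering features (4 columns, 3 groups of 30).
rng = np.random.default_rng(1)
features = pd.DataFrame(
    np.vstack([rng.normal(loc=m, size=(30, 4)) for m in (0.0, 5.0, 10.0)]),
    columns=["v_avg", "v_var", "a_avg", "a_var"],
)
labels = KMeans(n_clusters=3, random_state=10, n_init=10).fit_predict(features)

# Project only the clustering features onto 2 principal components.
pca = PCA(n_components=2)
new_pca = pd.DataFrame(pca.fit_transform(features))
print(pca.explained_variance_ratio_)  # share of variance kept by each component

# One plot call per cluster: red dots, green circles, blue stars.
fig, ax = plt.subplots()
for label, style in zip(range(3), ["r.", "go", "b*"]):
    d = new_pca[labels == label]
    ax.plot(d[0], d[1], style, label=f"cluster {label}")
ax.legend()

out_path = os.path.join(tempfile.gettempdir(), "kmeans_sketch.png")  # hypothetical name
fig.savefig(out_path)
```

The loop keeps the marker styles in one list, so changing n_clusters only requires extending that list.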

The Python implementation is as follows:


from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('trip.csv', header=0, encoding='utf-8')

# driver and trip_no (the first two columns) do not take part in clustering;
# positional selection uses .iloc because DataFrame.ix was removed in pandas 1.0
features = df.iloc[:, 2:]
# n_init=10 is set explicitly because its default changed in newer scikit-learn
kmeans = KMeans(n_clusters=3, random_state=10, n_init=10).fit(features)

## new DataFrame: the original data plus each record's category in column jllable
new_df = df.copy()
new_df['jllable'] = kmeans.labels_

## number of records in each category
df_count_type = new_df.groupby('jllable').size()
print(df_count_type)

## clustering centers (one row per category, in feature-column order)
print(kmeans.cluster_centers_)

## export new_df locally as new_df.csv
print(new_df)
new_df.to_csv('new_df.csv')

## reduce the clustering features (not the jllable column) to 2 dimensions, stored as new_pca
pca = PCA(n_components=2)
new_pca = pd.DataFrame(pca.fit_transform(features))

## Visualization
d = new_pca[new_df['jllable'] == 0]
plt.plot(d[0], d[1], 'r.')
d = new_pca[new_df['jllable'] == 1]
plt.plot(d[0], d[1], 'go')
d = new_pca[new_df['jllable'] == 2]
plt.plot(d[0], d[1], 'b*')
plt.gcf().savefig('kmeans.png')
plt.show()

The running results are as follows:

[Output: number of records in each category]

[Output: cluster centers]

[Output: new_df, exported locally as new_df.csv]

[Figure: clustering visualization, kmeans.png]
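The task also asks you to adjust the clustering parameters and watch how the result changes. One way to do that, sketched here on synthetic data, is to vary n_clusters and compare two standard scikit-learn measures: inertia_ (within-cluster sum of squared distances) and silhouette_score. These metrics are an addition not used in the original code, and the data is a made-up stand-in for the trip.csv features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in data with three groups; replace with the trip.csv features.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(loc=m, scale=0.5, size=(40, 3)) for m in (0.0, 4.0, 8.0)])

results = {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, random_state=10, n_init=10).fit(X)
    # inertia_: within-cluster sum of squared distances (lower means tighter clusters)
    # silhouette_score: in [-1, 1], higher means better-separated clusters
    results[k] = (km.inertia_, silhouette_score(X, km.labels_))
    print(k, round(km.inertia_, 1), round(results[k][1], 3))

best_k = max(results, key=lambda k: results[k][1])
print("best k by silhouette:", best_k)
```

Inertia almost always keeps falling as k grows, so on its own it favours more clusters; the silhouette score penalises splitting a compact group and is easier to read as a single choice.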
