Case Analysis of python clustering

2025-02-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article walks through a case analysis of clustering in Python. It applies hierarchical clustering to word-frequency data and shows a simple, easy-to-use workflow for anyone unsure how to approach this kind of analysis.

Cluster analysis.

Store the data in csv format, import it into python, and view the first 10 rows of data.

import pandas as pd

reviewsdata = pd.read_csv('reviewsdata.csv', index_col=0)  # index_col=0: use the first column as the row index

reviewsdata.head(10)
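The effect of index_col=0 in the read_csv call above can be checked on a small stand-in table. This is only a sketch: the CSV text, the column names M40s and F60s, and the values are invented for illustration and are not the contents of reviewsdata.csv.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for reviewsdata.csv (values are invented).
csv_text = "word,M40s,F60s\nprice,10,1\nservice,2,9\n"

# index_col=0 turns the first column ("word") into the row index
# instead of keeping it as an ordinary data column.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df.index.tolist())    # the words become row labels
print(df.columns.tolist())  # the gender/age groups remain as columns
print(df.T.shape)           # transposing gives one row per group
```

With the words as the row index, each remaining column is a pure numeric vector for one group, which is what the distance calculation later needs.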

The table shows how often each word is used by each gender and age group. Next, run a cluster analysis on the data and draw the clustering tree diagram (dendrogram).

import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist
import matplotlib.pyplot as plt

# use a font that can render Chinese labels (set before drawing)
plt.rcParams['font.sans-serif'] = ['SimHei']

# generate the distance matrix between points (one vector per column, hence the transpose); Euclidean distance is used here:
disMat = pdist(reviewsdata.T, 'euclidean')

# hierarchical clustering with average linkage:
Z = sch.linkage(disMat, method='average')

# represent the hierarchical clustering results as a tree and save them as plot_dendrogram.png
sch.dendrogram(Z, labels=list(reviewsdata.columns), leaf_font_size=7.5)
plt.title("Clustering of word of mouth")
plt.savefig('plot_dendrogram.png')
plt.show()

For the cluster analysis, each gender and age group is represented by a vector of word frequencies. The distances between these vectors are compared, and the closest ones are grouped together. Proximity means that the wording is similar, and clustering is the process of repeatedly merging the two closest vectors (or clusters). The dendrogram shows that men in their 40s and men in their 50s use very similar words, while both differ clearly from women in their 60s. Overall, the result suggests that opinions differ across age groups and genders.
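The merging process described above can be traced step by step on a tiny table. This is a minimal sketch under stated assumptions: the group names (M40s, M50s, F60s), the words, and the frequencies are hypothetical and were chosen only so that the merge order is easy to follow. They are not the article's data.

```python
import pandas as pd
import scipy.cluster.hierarchy as sch
from scipy.spatial.distance import pdist

# Hypothetical word-frequency table: rows are words, columns are groups.
toy = pd.DataFrame(
    {
        "M40s": [10, 2, 5],
        "M50s": [11, 2, 4],
        "F60s": [1, 9, 8],
    },
    index=["word_a", "word_b", "word_c"],
)

# One vector per group, so transpose before computing pairwise distances.
dist = pdist(toy.T, metric="euclidean")
Z = sch.linkage(dist, method="average")

# Each row of Z records one merge: the two cluster ids, the distance
# at which they merge, and the size of the new cluster. Ids beyond the
# original groups refer to clusters created by earlier rows.
names = list(toy.columns)
for a, b, height, size in Z:
    left, right = names[int(a)], names[int(b)]
    print(f"merge {left} and {right} at distance {height:.2f} (size {int(size)})")
    names.append(f"({left} + {right})")
```

In this toy table the two male groups merge first because their vectors are closest, and the female group joins last at a much larger distance, which is the same pattern the dendrogram expresses visually.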

A few small concepts

Cluster analysis: a method of dividing data into groups according to similarity. The characteristics of each group are not known before grouping. Similarity is judged by distance, and there are many ways to measure distance, the simplest of which is Euclidean distance. This article uses hierarchical clustering. The DBSCAN clustering method is introduced in "Clustering (1): DBSCAN algorithm implementation (R language)".
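As a small worked example of distance, the sketch below computes the Euclidean distance between two vectors by hand and with scipy, and compares it with Manhattan (city block) distance as one alternative metric. The two vectors are hypothetical values used only for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

# Hypothetical frequency vectors for two groups (illustrative values only).
u = np.array([10.0, 2.0, 5.0])
v = np.array([1.0, 9.0, 8.0])

# Euclidean distance: square root of the summed squared differences.
manual = np.sqrt(np.sum((u - v) ** 2))

# pdist computes the distance for every pair of rows; with two rows
# it returns a single value.
pair = np.vstack([u, v])
euclid = pdist(pair, metric="euclidean")[0]

# Manhattan distance sums the absolute differences instead.
manhattan = pdist(pair, metric="cityblock")[0]

print(manual, euclid, manhattan)
```

The choice of metric changes which vectors count as close, so it can change the shape of the resulting tree.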

This concludes the "python clustering case analysis" walkthrough. Theory is easier to absorb when combined with practice, so try running the steps above on your own data.
