What are the common clustering algorithms in big data's development?

2025-03-29 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

This article introduces the common clustering algorithms used in big data development. It describes K-Means in detail, lists the other widely used algorithms, and outlines the steps of a K-Means implementation.

Common clustering algorithms

1. K-Means clustering

Algorithm steps

(1) First, choose the number of classes / groups and randomly initialize a center point for each one. Each center point is a vector with the same length as a data point. This requires deciding the number of classes (that is, the number of center points) in advance.

(2) Calculate the distance from each data point to every center point, and assign the data point to the class whose center is closest.

(3) Recalculate the center of each class as the mean of the points assigned to it, and use it as the new center point.

(4) Repeat steps (2) and (3) until the center of each class changes very little between iterations. You can also run the algorithm several times with different random initializations and keep the run that gives the best result.
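The steps above can be written compactly as follows. This is a sketch under stated assumptions: the article does not say which distance is used, so squared Euclidean distance is assumed here, and "the one that works best" in step (4) is read as the run with the smallest within-cluster sum of squares J. The symbols x_i, \mu_j, S_j, c_i and J are notation introduced for this illustration.

```latex
% Assumption: squared Euclidean distance. x_i are data points, \mu_j are center points,
% S_j is the set of points currently assigned to center j, k is the number of classes.
\begin{align}
c_i   &= \arg\min_{j \in \{1,\dots,k\}} \lVert x_i - \mu_j \rVert^2
      && \text{(step 2: assignment)} \\
\mu_j &= \frac{1}{|S_j|} \sum_{x_i \in S_j} x_i
      && \text{(step 3: update)} \\
J     &= \sum_{j=1}^{k} \sum_{x_i \in S_j} \lVert x_i - \mu_j \rVert^2
      && \text{(quantity compared across restarts)}
\end{align}
```

Under this reading, neither the assignment step nor the update step can increase J, which is why the centers settle after a finite number of iterations.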

[Figure: the K-Means clustering process]

Advantages:

Fast and computationally simple.

Disadvantages:

We must know in advance how many categories / groups there are in the data.

K-Medians is a variant of K-Means that uses the median of the points in each class, instead of the mean, to calculate the center.

The advantage of K-Medians is that a center calculated from the median is far less sensitive to outliers. The disadvantage is that calculating the median requires sorting the data, which makes K-Medians slower than K-Means on large datasets.
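A small example may make the outlier effect concrete. The sketch below uses only the Python standard library; the data points and the helper `center` are hypothetical, and taking the median separately for each coordinate is an assumption about how the K-Medians center is computed.

```python
from statistics import mean, median

# Hypothetical 2-D class; the last point is a deliberate outlier.
cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0), (100.0, 100.0)]


def center(points, reducer):
    """Apply reducer (mean or median) to each coordinate separately."""
    return tuple(reducer(coords) for coords in zip(*points))


mean_center = center(cluster, mean)      # K-Means style center
median_center = center(cluster, median)  # K-Medians style center

print("mean:  ", mean_center)    # (22.0, 22.0), pulled toward the outlier
print("median:", median_center)  # (3.0, 3.0), stays with the bulk of the points
```

The single outlier drags the mean far away from the other four points, while the median stays among them.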

2. Mean shift clustering

3. Density-based clustering (DBSCAN)

4. Expectation-maximization (EM) clustering using Gaussian mixture models (GMM)

5. Agglomerative hierarchical clustering

6. Graph community detection

Details of the other algorithms can be found at:

https://blog.csdn.net/Katherine_hsr/article/details/79382249

K-Means clustering

Code implementation

1. Import data

2. Calculate the distance from each data point to the center point

3. Grouping data points

4. Update the centroids and iterate until convergence
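The four steps above can be sketched as follows. This is a minimal illustration, not the author's original code, which is not reproduced in this article. It uses only the Python standard library so that it runs without extra packages. The synthetic data in `load_points`, the Euclidean distance, all function names, and the parameters `max_iter`, `tol` and `seed` are assumptions made for the example.

```python
import math
import random


def load_points():
    """Step 1: stand-in for importing data; returns hypothetical 2-D points."""
    rng = random.Random(0)
    blob_a = [(rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)) for _ in range(50)]
    blob_b = [(rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)) for _ in range(50)]
    return blob_a + blob_b


def distances_to_centers(point, centers):
    """Step 2: Euclidean distance from one point to every center."""
    return [math.dist(point, c) for c in centers]


def assign_groups(points, centers):
    """Step 3: label each point with the index of its nearest center."""
    labels = []
    for p in points:
        d = distances_to_centers(p, centers)
        labels.append(d.index(min(d)))
    return labels


def update_centers(points, labels, centers):
    """Step 4 (update): move each center to the mean of its assigned points."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == j]
        if not members:
            new_centers.append(old)  # an empty class keeps its previous center
            continue
        n = len(members)
        new_centers.append(tuple(sum(coords) / n for coords in zip(*members)))
    return new_centers


def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Step 4 (iterate): repeat until no center moves more than tol."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization from the data
    for _ in range(max_iter):
        labels = assign_groups(points, centers)
        new_centers = update_centers(points, labels, centers)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, assign_groups(points, centers)


points = load_points()
centers, labels = kmeans(points, k=2)
print(centers)
```

To apply the sketch to real data, replace `load_points` with code that reads your own points into a list of equal-length tuples, and choose `k` to match the number of classes you expect.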

[Figure: the clustering result]

Similar code and demo data have been uploaded to a network disk. You can download them, try them out, and apply them to your own project.

That concludes this article on the common clustering algorithms in big data development. Thank you for reading, and we hope the content has been helpful.
