Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to learn DBSCAN clustering algorithm

2025-01-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

How to learn DBSCAN clustering algorithm, in view of this problem, this article introduces the corresponding analysis and solution in detail, hoping to help more partners who want to solve this problem to find a more simple and feasible method.

DBSCAN is a very famous density-based clustering algorithm. Its full name in English is Density-Based Spatial Clustering of Applications with Noise, which means: a spatial clustering algorithm based on density and robust to noise. Intuitively, the DBSCAN algorithm can find all the dense areas of the sample points, and regard these dense areas as a cluster.

DBSCAN algorithm has the following characteristics:

Based on density, robust to noise points far away from the density core

There is no need to know the number of clusters

Clusters of arbitrary shapes can be found.

DBSCAN is usually suitable for clustering analysis of lower-dimensional data.

First, basic concepts

The basic concepts of DBSCAN can be summed up in 1, 2, 3, 4.

1 core idea: based on density.

Intuitively, the DBSCAN algorithm can find all the dense areas of the sample points, and regard these dense areas as a cluster.

Two algorithm parameters: neighborhood radius R and minimum number of points minpoints.

These two algorithm parameters can actually describe what is dense-when the number of points in the neighborhood radius R is greater than the minimum number of points minpoints, it is dense.

Three types of points: core points, boundary points and noise points.

A point whose number of sample points is greater than or equal to minpoints in the neighborhood radius R is called the core point. A point that is not a core point but is in the neighborhood of a core point is called a boundary point. What is neither the core point nor the boundary point is the noise point.

The relationship of four kinds of points: density direct, density reachable, density connected, non-density connected.

If P is the core point and Q is in the R neighborhood of P, then the density from P to Q is direct. Any core point is direct to its own density, the density direct is not symmetrical, if the P to Q density is direct, then Q to P is not necessarily density direct.

If there is a core point, P _ 2 and P _ 3, , Pn, and P1 to P2 density direct, P2 to P3 density direct,... , P (nMel 1) to Pn density direct, Pn to Q density direct, then P1 to Q density can reach. The density is up to and does not have symmetry.

If there is a core point S such that the density of S to P and Q can be reached, then the density of P and Q is connected. The density connection is symmetrical, and if the P and Q densities are connected, then Q and P are also densely linked. Two points whose densities are connected belong to the same cluster.

If the two points are not densely connected, then the two points are not densely connected. The two points that are not connected in density belong to different clusters, or there are noise points in them.

Second, DBSCAN algorithm steps

The algorithm of DBSCAN is divided into two steps.

1. Look for the core points to form temporary clusters.

Scan all the sample points, and if the number of points within the R radius of a sample point > = MinPoints, it will be included in the list of core points, and the points with direct density will form a corresponding temporary cluster.

2, merge temporary clusters to get clusters.

For each temporary cluster, check whether the point is the core point, and if so, merge the corresponding temporary cluster with the current temporary cluster to get a new temporary cluster.

Repeat this until each point in the current temporary cluster is either not in the core point list, or the point whose density reaches directly is already in the temporary cluster, and the temporary cluster is upgraded to a cluster.

Continue the same merge operation for the remaining temporary clusters until all temporary clusters are processed.

Third, the example of using DBSCAN

1, generate sample points

Import numpy as npimport pandas as pdfrom sklearn import datasets%matplotlib inline

Df _ = datasets.make_moons (500 noise = 0.1) df = pd.DataFrame (XMagna columns = ['feature1','feature2'])

Df.plot.scatter ('feature1','feature2', s = 100ma alpha = 0.6, title =' dataset by make_moon')

2. Call the dbscan API to complete clustering.

From sklearn.cluster import dbscan

# eps is the neighborhood radius, min_samples is the minimum number of points core_samples,cluster_ids = dbscan (X, eps = 0.2, min_samples=20) # cluster_ids indicates that the corresponding point is the noise point

Df = pd.DataFrame (np.c_ [Xminute clustermates], columns = ['feature1','feature2','cluster_id']) df [' cluster_id'] = df ['cluster_id'] .astype (' i2')

Df.plot.scatter ('feature1','feature2', s = 100, c = list (df [' cluster_id']), cmap = 'rainbow',colorbar = False, alpha = 0.6 title =' DBSCAN cluster result')

The answer to the question about how to learn the DBSCAN clustering algorithm is shared here. I hope the above content can be of some help to everyone. If you still have a lot of doubts to solve, you can follow the industry information channel to learn more related knowledge.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report