How to measure the similarity or dissimilarity between data points is a fundamental problem in clustering, and the choice directly affects the quality of the clustering result. The most intuitive approach is to use a distance function or a similarity function. The following are common ways to compute similarity or dissimilarity.
1. Calculation formulas
1.Minkowski distance
Many distance measures can be expressed in terms of the vector p-norm, i.e., as the Minkowski distance.
$d_{ij} = \left( \sum_{h=1}^{s} |x_{ih} - x_{jh}|^{p} \right)^{1/p}$
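As a quick illustration (the code section later in the article does not cover the general case), here is a minimal sketch of the Minkowski distance for an arbitrary order p; the helper name minkowski_distance and the sample vectors are assumptions for illustration only:

import numpy as np

def minkowski_distance(x, y, p):
    # Minkowski distance of order p between two vectors
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1])
print("p=1 (City-block):", minkowski_distance(a, b, 1))  # 8.0
print("p=2 (Euclidean):", minkowski_distance(a, b, 2))   # ~4.4721

With p = 1 and p = 2 this reproduces the City-block and Euclidean distances discussed below.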
2.Euclidean distance
With p = 2, the Minkowski distance reduces to the Euclidean distance. Most clustering algorithms that use the Euclidean distance can only find hyperspherical clusters in low-dimensional space and are sensitive to noise in the data set.
$d_{ij} = \left( \sum_{h=1}^{s} |x_{ih} - x_{jh}|^{2} \right)^{1/2}$
3.City-block distance
With p = 1, the Minkowski distance reduces to the City-block (Manhattan) distance. The City-block distance can effectively improve the robustness of fuzzy clustering algorithms to noise and outliers.
$d_{ij} = \sum_{h=1}^{s} |x_{ih} - x_{jh}|$
4.Sup distance
With p = ∞, the Minkowski distance becomes the Sup (Chebyshev) distance.
$d_{ij} = \max_{h} |x_{ih} - x_{jh}|$
5.Cosine similarity
$s_{ij} = \dfrac{x_i^{T} x_j}{\|x_i\| \, \|x_j\|}$
6.Mahalanobis distance
The Mahalanobis distance of two points in the original feature space equals their Euclidean distance in a linearly projected (whitened) space. Using the Mahalanobis distance, a clustering algorithm can find hyperellipsoidal clusters in the data set, but it comes at a considerable computational cost.
$d_{ij} = \sqrt{(x_i - x_j)^{T} S^{-1} (x_i - x_j)}$
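Here S denotes the covariance matrix of the data. Since the code section below does not include the Mahalanobis distance, the following is a minimal sketch; the toy data matrix X is an assumption for illustration only:

import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])        # toy data set, one point per row
S = np.cov(X, rowvar=False)       # covariance matrix of the features
S_inv = np.linalg.inv(S)          # assumes S is non-singular

xi, xj = X[0], X[2]
diff = xi - xj
dist_maha = np.sqrt(diff @ S_inv @ diff)
print("Mahalanobis distance =", dist_maha)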
7.Alternative distance
The Alternative distance is insensitive to noise in the data set.
$d_{ij} = 1 - \exp(-\beta \|x_i - x_j\|^{2})$
8.Feature weighted distance
$d_{ij} = \left( \sum_{h=1}^{s} w_{h}^{a} \, |x_{ih} - x_{jh}| \right)^{1/2}$
2. Code
Code
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1])
print(a)
print(b)

# Euclidean distance
distEu = np.sqrt(np.sum((a - b) ** 2))
print("Euclidean distance =", distEu)

# City-block distance
distCb = np.sum(np.abs(a - b))
print("City-block distance =", distCb)

# Sup distance
distSup = np.max(np.abs(a - b))
print("Sup distance =", distSup)

# Cosine similarity
cosineSimi = np.dot(a, b) / (np.sqrt(np.sum(a ** 2)) * np.sqrt(np.sum(b ** 2)))
print("Cosine similarity =", cosineSimi)

# Alternative distance (note: this code applies exp to -beta*||a-b||, not the squared norm)
beta = 0.5
distAlter = 1 - np.exp(-beta * np.sqrt(np.sum((a - b) ** 2)))
print("Alternative distance =", distAlter)

# Feature weighted distance (weights reconstructed to match the reported output)
weigh = np.array([0.5, 0.3, 0.1, 0.1])
distFea = np.sqrt(np.dot(weigh, np.abs(a - b)))
print("Feature weighted distance =", distFea)
Output
[1 2 3 4]
[4 3 2 1]
Euclidean distance = 4.472135955
City-block distance = 8
Sup distance = 3
Cosine similarity = 0.666666666667
Alternative distance = 0.89312207434
Feature weighted distance = 1.48323969742