How to measure the similarity or dissimilarity between data points is a fundamental problem in clustering, and the choice directly affects the quality of the clustering result. The most intuitive approach is to use a distance function or a similarity function. The following are common ways to compute similarity or dissimilarity.
1. Calculation formulas
1.Minkowski distance
Many distance measures can be expressed in terms of the vector p-norm, i.e., as the Minkowski distance.
$d_{ij} = \left( \sum_{h=1}^{s} |x_{ih} - x_{jh}|^{p} \right)^{1/p}$
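As a quick illustration (the code section later in the article does not cover the general case), here is a minimal sketch of the Minkowski distance for an arbitrary order p; the helper name minkowski_distance and the sample vectors are assumptions for illustration only:

import numpy as np

def minkowski_distance(x, y, p):
    # Minkowski distance of order p between two vectors
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1])
print("p=1 (City-block):", minkowski_distance(a, b, 1))  # 8.0
print("p=2 (Euclidean):", minkowski_distance(a, b, 2))   # ~4.4721

With p = 1 and p = 2 this reproduces the City-block and Euclidean distances discussed below.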
2.Euclidean distance
With p = 2, the Minkowski distance reduces to the Euclidean distance. Most clustering algorithms that use the Euclidean distance can only find hyperspherical clusters in low-dimensional space and are sensitive to noise in the data set.
$d_{ij} = \left( \sum_{h=1}^{s} |x_{ih} - x_{jh}|^{2} \right)^{1/2}$
3.City-block distance
With p = 1, the Minkowski distance reduces to the City-block (Manhattan) distance. The City-block distance can effectively improve the robustness of fuzzy clustering algorithms to noise and outliers.
$d_{ij} = \sum_{h=1}^{s} |x_{ih} - x_{jh}|$
4.Sup distance
With p = ∞, the Minkowski distance becomes the Sup (Chebyshev) distance.
$d_{ij} = \max_{h} |x_{ih} - x_{jh}|$
5.Cosine similarity
$s_{ij} = \dfrac{x_i^{T} x_j}{\|x_i\| \, \|x_j\|}$
6.Mahalanobis distance
The Mahalanobis distance of two points in the original feature space equals their Euclidean distance in a linearly projected (whitened) space. Using the Mahalanobis distance, a clustering algorithm can find hyperellipsoidal clusters in the data set, but it comes at a considerable computational cost.
$d_{ij} = \sqrt{(x_i - x_j)^{T} S^{-1} (x_i - x_j)}$
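Here S denotes the covariance matrix of the data. Since the code section below does not include the Mahalanobis distance, the following is a minimal sketch; the toy data matrix X is an assumption for illustration only:

import numpy as np

X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0]])        # toy data set, one point per row
S = np.cov(X, rowvar=False)       # covariance matrix of the features
S_inv = np.linalg.inv(S)          # assumes S is non-singular

xi, xj = X[0], X[2]
diff = xi - xj
dist_maha = np.sqrt(diff @ S_inv @ diff)
print("Mahalanobis distance =", dist_maha)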
7.Alternative distance
The Alternative distance is insensitive to noise in the data set.
$d_{ij} = 1 - \exp(-\beta \|x_i - x_j\|^{2})$
8.Feature weighted distance
$d_{ij} = \left( \sum_{h=1}^{s} w_{h}^{a} \, |x_{ih} - x_{jh}| \right)^{1/2}$
2. Code
Code
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 3, 2, 1])
print(a)
print(b)

# Euclidean distance
distEu = np.sqrt(np.sum((a - b) ** 2))
print("Euclidean distance =", distEu)

# City-block distance
distCb = np.sum(np.abs(a - b))
print("City-block distance =", distCb)

# Sup distance
distSup = np.max(np.abs(a - b))
print("Sup distance =", distSup)

# Cosine similarity
cosineSimi = np.dot(a, b) / (np.sqrt(np.sum(a ** 2)) * np.sqrt(np.sum(b ** 2)))
print("Cosine similarity =", cosineSimi)

# Alternative distance (note: this code applies exp to -beta*||a-b||, not the squared norm)
beta = 0.5
distAlter = 1 - np.exp(-beta * np.sqrt(np.sum((a - b) ** 2)))
print("Alternative distance =", distAlter)

# Feature weighted distance (weights reconstructed to match the reported output)
weigh = np.array([0.5, 0.3, 0.1, 0.1])
distFea = np.sqrt(np.dot(weigh, np.abs(a - b)))
print("Feature weighted distance =", distFea)
Output
[1 2 3 4]
[4 3 2 1]
Euclidean distance = 4.472135955
City-block distance = 8
Sup distance = 3
Cosine similarity = 0.666666666667
Alternative distance = 0.89312207434
Feature weighted distance = 1.48323969742