2025-04-01 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to use the kNN (k-nearest neighbors) algorithm. It covers the principle first and then works through two examples.
I. Overview
Advantages: high accuracy, insensitive to outliers, no assumptions about the input data.
Disadvantages: high computational complexity and high space complexity.
Applicable data types: numeric and nominal values.
II. Principle
We start with a sample data set (the training set) in which every item carries a label, so the class of each item is known. When new, unlabeled data arrives, each of its features is compared with the corresponding features of the items in the training set, and the algorithm extracts the class labels of the most similar items (the nearest neighbors). Only the k most similar items are considered, which is where the k in kNN comes from; k is an integer, usually less than 20. The class that occurs most often among those k items becomes the class of the new data.
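The principle can be sketched in a few lines of plain Python. This is a minimal illustration rather than the article's own implementation: the function name `knn_classify`, the tiny two-cluster data set, and the choice of standard-library helpers instead of NumPy are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(query, samples, labels, k):
    """Return the majority label among the k samples closest to query."""
    # Euclidean distance from the query point to every training sample
    distances = [math.dist(query, sample) for sample in samples]
    # Indices of the k training samples with the smallest distances
    nearest = sorted(range(len(samples)), key=lambda i: distances[i])[:k]
    # Majority vote over the labels of those k neighbors
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: two small clusters labeled "A" and "B"
samples = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]

print(knn_classify((0.1, 0.2), samples, labels, k=3))
```

Every query is compared against every training sample, which is why the computational and storage costs listed in the overview grow with the size of the training set.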
III. Example 1: movie classification
The task is to classify movies as romance or action. For each movie we count the number of fight scenes and the number of kissing scenes.
Movie title                   Fight scenes   Kissing scenes   Movie type
California Man                3              104              Romance
He's Not Really into Dudes    2              100              Romance
Beautiful Woman               1              81               Romance
Kevin Longblade               101            10               Action
Robo Slayer 3000              99             5                Action
Amped II                      98             2                Action
?                             18             90               ?

Distance between each movie and the unknown movie:

Movie title                   Distance to the unknown movie
California Man                20.5
He's Not Really into Dudes    18.9
Beautiful Woman               19.2
Kevin Longblade               115.3
Robo Slayer 3000              117.4
Amped II                      118.9
Suppose k = 3. The three nearest movies are California Man, He's Not Really into Dudes, and Beautiful Woman. All three are romances, so the unknown movie is classified as a romance.
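The distances and the k = 3 vote can be reproduced with a short script. This is a sketch using only the Python standard library; the dictionary layout and the variable names are assumptions for illustration, while the scene counts come from the movie data above.

```python
import math
from collections import Counter

# (fight scenes, kissing scenes, type) for the six labeled movies
movies = {
    "California Man": (3, 104, "romance"),
    "He's Not Really into Dudes": (2, 100, "romance"),
    "Beautiful Woman": (1, 81, "romance"),
    "Kevin Longblade": (101, 10, "action"),
    "Robo Slayer 3000": (99, 5, "action"),
    "Amped II": (98, 2, "action"),
}
unknown = (18, 90)  # the unlabeled movie

# Euclidean distance from the unknown movie to each labeled movie
distances = {
    title: math.dist(unknown, (fights, kisses))
    for title, (fights, kisses, _) in movies.items()
}

# Print the movies from nearest to farthest
for title in sorted(distances, key=distances.get):
    print(f"{title}: {distances[title]:.1f}")

# Take the k = 3 nearest movies and let them vote on the type
nearest = sorted(distances, key=distances.get)[:3]
prediction = Counter(movies[title][2] for title in nearest).most_common(1)[0][0]
print("Predicted type:", prediction)
```

The three action movies are all more than 100 units away, so they would only enter the vote if k were raised above 3.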
IV. Example 2: improving matches on a dating website
Sample data:

% of time playing video games   Frequent flyer miles per year   Liters of ice cream per week   Class
0.8                             400                             0.5                            1
12                              134000                          0.9                            3
0                               20000                           1.1                            2
1. Collect data: a text file is provided
Each sample occupies one row. There are 1,000 rows in total, each with 3 features:
a. Frequent flyer miles earned per year
b. Percentage of time spent playing video games
c. Liters of ice cream consumed per week
2. Prepare data: normalization
kNN uses the Euclidean distance formula. For example, the distance between the first two samples is:
[(0.8 - 12)^2 + (400 - 134000)^2 + (0.5 - 0.9)^2]^0.5
It is easy to see that the attribute with the largest values dominates the result. In other words, the impact of "frequent flyer miles per year" is far greater than that of the other two attributes. Since the three features should carry equal weight, we rescale each one to the range 0 to 1:
newValue = (oldValue - min) / (max - min)
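The effect of the formula can be checked on the three sample rows listed earlier. The sketch below uses only the Python standard library; the helper name `min_max_normalize` is an assumption for illustration, and it assumes every column contains at least two different values so that max - min is never zero.

```python
import math

# The three sample rows: (% game time, miles per year, liters of ice cream)
samples = [
    (0.8, 400.0, 0.5),
    (12.0, 134000.0, 0.9),
    (0.0, 20000.0, 1.1),
]


def min_max_normalize(rows):
    """Rescale every column to [0, 1] with (value - min) / (max - min)."""
    columns = list(zip(*rows))
    mins = [min(col) for col in columns]
    # Assumes max > min in every column, otherwise this divides by zero
    ranges = [max(col) - min(col) for col in columns]
    normalized = [
        tuple((value - lo) / rng for value, lo, rng in zip(row, mins, ranges))
        for row in rows
    ]
    return normalized, ranges, mins


# Distance between the first two samples before normalization:
# almost entirely determined by the miles column
raw_distance = math.dist(samples[0], samples[1])

normalized, ranges, mins = min_max_normalize(samples)
# Distance between the same two samples after normalization
scaled_distance = math.dist(normalized[0], normalized[1])

print("raw distance:   ", raw_distance)
print("scaled distance:", scaled_distance)
```

Before normalization the distance is essentially the difference in miles; afterwards all three features contribute on a comparable scale.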
The first four lines of the data file:

    $ cat datingTestSet2.txt | head -n 4
    40920   8.326976    0.953952    3
    14488   7.153469    1.673904    2
    26052   1.441871    0.805124    1
    75136   13.147394   0.428964    1

The code that loads the file, normalizes the data, and tests the classifier:

    from numpy import zeros, tile

    # Convert a text file into a data matrix
    def file2matrix(filename):
        with open(filename) as fr:
            arrayOLines = fr.readlines()
        numberOfLines = len(arrayOLines)          # number of lines determines the matrix size
        returnMat = zeros((numberOfLines, 3))     # matrix initialized to 0
        classLabelVector = []
        index = 0
        for line in arrayOLines:
            line = line.strip()
            listFromLine = line.split('\t')       # fields on each line are separated by \t
            # the three feature values form one row of the matrix
            returnMat[index, :] = [float(v) for v in listFromLine[0:3]]
            classLabelVector.append(int(listFromLine[-1]))   # label vector
            index += 1
        return returnMat, classLabelVector

    # Data normalization
    # newValue = (oldValue - min) / (max - min)
    def autoNorm(dataSet):
        minVals = dataSet.min(0)                  # minimum of each column
        maxVals = dataSet.max(0)                  # maximum of each column
        ranges = maxVals - minVals                # element-wise subtraction
        m = dataSet.shape[0]                      # number of rows
        # tile(minVals, (m, 1)) repeats the minimums to match the number of rows
        normDataSet = dataSet - tile(minVals, (m, 1))
        # tile(ranges, (m, 1)) repeats the ranges to match the number of rows
        normDataSet = normDataSet / tile(ranges, (m, 1))
        return normDataSet, ranges, minVals

    # classify0(inX, dataSet, labels, k) is the kNN classifier; it is assumed
    # to be defined elsewhere.
    def datingClassTest():
        hoRatio = 0.1                             # first 10% is the test set, the rest is the training set
        # the file name is left empty in the original; pass the path to the data file
        datingDataMat, datingLabels = file2matrix('')
        normMat, ranges, minVals = autoNorm(datingDataMat)
        m = normMat.shape[0]
        numTestVecs = int(m * hoRatio)
        errorCount = 0.0
        for i in range(numTestVecs):
            # classify each row of the test set against the training rows
            classifierResult = classify0(normMat[i, :], normMat[numTestVecs:m, :],
                                         datingLabels[numTestVecs:m], 3)
            print('the classifier came back with: %d, the real answer is: %d'
                  % (classifierResult, datingLabels[i]))
            if classifierResult != datingLabels[i]:   # compare with the true label
                errorCount += 1.0
        print('the total error rate is: %f' % (errorCount / float(numTestVecs)))
        print(errorCount)

That covers how to use the kNN algorithm. How it performs on a specific problem still needs to be verified in practice.