How to use kNN algorithm 07/11 Update SLTechnology News&Howtos

How to use kNN algorithm

2025-07-11 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)05/31 Report--

This article mainly explains "how to use the kNN algorithm", the content of the article is simple and clear, easy to learn and understand, now please follow the editor's ideas slowly in depth, together to study and learn "how to use the kNN algorithm" bar!

I. Overview

Advantages: high precision, insensitive to abnormal values, no data input limit

Disadvantages: high computational complexity and high space complexity

Use data range: numerical and nominal.

Second, principle

There is a sample data set (training sample set), and there is a label for each data in the sample set, that is, we know the corresponding relationship between each data in the sample set and its classification. Input new data without labels, and compare each feature value of the new data with the characteristics corresponding to the data in the sample set. Then the algorithm extracts the classification tags with the most similar features in the sample set. An integer whose k is usually less than 20.

III. Example 1: movie classification

Classify romance and action movies. Count the fighting and kissing scenes in many movies.

Movie title fight scene kissing scene Movie Type

California man

3104 romance he's not really into dudes2100 romance beautiful woman181 romance kevin longblade10110 action film

Robo slayer 3000

995 action film

Amped II

982 action movies? 1890? The distance between the movie name and the unknown movie

California man

20.5he's not really into dudes18.7beautiful woman19.2kevin longblade115.3

Robo slayer 3000

117.4

Amped II

118.9

Suppose that Kappa 3 is california man,he's not really into dudes,beautiful woman = > determined as a romance film.

4. Example 2: the matching effect of dating websites

Percentage of time spent playing video games

Annual frequent flyer mileage per week ice cream liter consumption sample classification 0.84000.51121340000.930200001.12

1. Collect data: provide text files

Each sample data occupies one row, with a total of 1000 rows, including 3 features

a. Get frequent flyer miles per year

b. Percentage of time spent playing video games

c. Liters of ice cream consumed per week

two。 Preparing data: data normalization

KNN uses Euclid distance formula

[(0.80-12) ^ 2 + (134000) ^ 2 + (0.5-0.90) ^ 2] ^ 0.5

It is easy to see that attributes with large values have the greatest impact on the calculation results. In other words, the impact of "frequent flyer mileage per year" is far greater than that of other attributes. We need equal weight characterization.

NewValue = (oldValue-min) / (max-min)

$cat datingTestSet2.txt | head-n 440920 8.326976 0.953952 314488 7.153469 1.673904 226052 1.441871 0.805124 175136 13.147394 0.428964 "A text file is converted into a data matrix def file2matrix (filename): fr = open (filename) arrayOLines = fr.readlines () = = > read how many lines there are in the document To build a matrix numberOfLines = len (arrayOLines) returnMat = zeros ((numberOfLines,3)) = > initial 0 matrix classLabelVector= [] index = 0 for line in arrayOLines: line = line.strip () listFromLine = line.split ('\ t') = = >\ t separate each line returnMat [index :] = listFromLine [0:3] = > one row of classLabelVector.append (int (listFromLine [- 1])) of each eigenvalue composition matrix = > label vector index + = 1 return returnMat ClassLabelVector # data normalization # newValue = (oldValue-min) / (max-min) def autoNorm (dataSet): minVals = dataSet.min (0) = > calculate the minimum of each column of the matrix maxVals = dataSet.max (0) = > calculate the maximum of each column of the matrix ranges = maxVals-minVals = = > subtract from the matrix That is, corresponding to the row minus m = dataSet.shape [0] = = > the number of rows of matrix normDataSet = dataSet-tile (minVals, (mpen1)) = > tile (minVals, (mpen1)), construct a minimum matrix with the same number of rows normDataSet = normDataSet / tile (ranges, (mPower1)) = > tile (ranges, (mMagne1) construct a difference matrix return normDataSet,ranges with the same number of rows MinVals def datingClassTest (): hoRatio = 0.1 = > split sample Part as training sample and part as test sample datingDataMat,datingLabels = file2matrix ('') = = > load text file And transformed into data matrix normMat,ranges,minVals = autoNorm (datingDataMat) = > data normalization m = normMat.shape [0] numTestVecs = int (m*hoRatio) errorCount = 0.0 for i in range (numTestVecs): classifierResult = classify0 (normMat [numTestVecs:m,:], datingLabels [numTestVecs:m] 3) = = > print 'the classifier came back with:%d for each row of the input test set, the real answer is:%d'% (classifierResult,datingLabels [I]) if (classifierResult! = datingLabels [I]): = > compare the calculated results with the original results Statistical error rate errorCount+=1.0 print 'the total error rate is:% f'% (errorCount/float (numTestVecs)) print errorCount Thank you for reading, this is the content of "how to use the kNN algorithm". After the study of this article, I believe you have a deeper understanding of how to use the kNN algorithm, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.