2025-01-16 Update From: SLTechnology News&Howtos
This article explains how to understand and implement the KNN algorithm, which many readers find hard to grasp. The material below is summarized to make it easier to follow, and I hope you get something out of it.
KNN introduction
The nearest-neighbor algorithm, or k-nearest neighbor (kNN, k-NearestNeighbor) classification algorithm, is one of the simplest classification methods in data mining. "k nearest neighbors" means that each sample can be represented by the k samples closest to it. We apply the same idea subconsciously in everyday life: judging whether a person is rich or poor by the neighborhood they live in, or by the friends they keep, is kNN thinking.
KNN classifies a sample by measuring the distance between feature values. The idea is that if most of the k most similar samples in the feature space (that is, its k nearest neighbors) belong to a certain category, then the sample belongs to that category too. k is usually an integer no greater than 20. The neighbors are drawn from samples that have already been correctly classified. When making the classification decision, the method determines the category of the sample only from the categories of the nearest sample or samples.
In KNN, the distance between objects serves as a dissimilarity measure, which avoids the problem of matching objects directly. The distance used is generally the Euclidean distance.
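As a rough illustration of the distance mentioned above, here is a minimal sketch that computes the Euclidean distance, the square root of the sum of squared coordinate differences, with NumPy. The function name euclidean_distance and the sample points are made up for this example and do not come from the original article.

```python
import numpy as np

def euclidean_distance(x, y):
    """Return the Euclidean distance between two points of equal dimension."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Square the coordinate differences, sum them, then take the square root
    return float(np.sqrt(np.sum((x - y) ** 2)))

# Hypothetical sample points forming a 3-4-5 right triangle
d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(d)  # 5.0
```

A smaller distance means the two samples are more similar, which is why KNN ranks neighbors by it.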
Implementation of KNN algorithm
This part mainly draws on Liu Jianping (Pinard)'s blog post "K nearest neighbor method (KNN) principle summary". His posts offer real insight into each algorithm; when a passage of Li Hang's "Statistical Learning Method" does not make sense, reading his blog often clears it up. His post notes that scikit-learn implements only brute force (brute-force), KD tree (KDTree), and ball tree (BallTree), so it discusses the principles of those implementations only and leaves out others such as the BBF tree and MVP tree. Readers who want a deeper understanding of the algorithm should read Liu Jianping (Pinard)'s article.
Hands-on code
This part works through a hands-on example and then explains some of the implementation details.
The following code imports the libraries the program needs:
from numpy import *
import operator
The following code generates the test data:
def createDataSet():
    group = array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels
group, labels = createDataSet()
Output:
In [2]: group
Out[2]:
array([[1. , 1.1],
       [1. , 1. ],
       [0. , 0. ],
       [0. , 0.1]])
In [3]: labels
Out[3]: ['A', 'A', 'B', 'B']
The following code implements classification with kNN:
def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]
    # tile repeats inX dataSetSize times so that it has the same shape as dataSet
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqdiffMat = diffMat**2
    sqDistances = sqdiffMat.sum(axis=1)
    distances = sqDistances**0.5
    sortedDistIndicies = distances.argsort()
    print(sortedDistIndicies)
    classCount = {}
    for i in range(k):
        voteLabels = labels[sortedDistIndicies[i]]
        # dict.get returns the value for the given key. Unlike dict['key'], which raises KeyError when the key is missing, it returns a default (None unless specified; 0 here).
        classCount[voteLabels] = classCount.get(voteLabels, 0) + 1
    print(classCount)
    # Python 3: iteritems became items (Python 2 used classCount.iteritems())
    # items() yields the (key, value) pairs of the dict
    # The key parameter of sorted takes a function. operator.itemgetter does not fetch a value itself; it builds a function that fetches one from the object it is applied to.
    # operator.itemgetter(1) takes the second element of each pair from classCount.items(), i.e. the vote count
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    print(sortedClassCount)
    return sortedClassCount[0][0]
Given an input point, the function returns its classification:
In [7]: classify0([0, 0.2], group, labels, 2)
[3 2 1 0]
{'B': 2}
[('B', 2)]
Out[7]: 'B'
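For comparison, the same classification can be sketched without tile by relying on NumPy broadcasting, with collections.Counter taking over the vote counting. This is only one possible variant under those assumptions, not the article's implementation; the name classify_knn and the second query point are made up for illustration.

```python
from collections import Counter

import numpy as np

def classify_knn(in_x, data_set, labels, k):
    """Classify in_x by majority vote among its k nearest training points."""
    data_set = np.asarray(data_set, dtype=float)
    in_x = np.asarray(in_x, dtype=float)
    # Broadcasting subtracts in_x from every row, so tile() is not needed
    distances = np.sqrt(((data_set - in_x) ** 2).sum(axis=1))
    nearest = distances.argsort()[:k]
    votes = Counter(labels[i] for i in nearest)
    # most_common(1) returns [(label, count)] for the most frequent label
    return votes.most_common(1)[0][0]

group = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(classify_knn([0, 0.2], group, labels, 2))    # B
print(classify_knn([1.0, 1.2], group, labels, 3))  # A
```

The distance computation matches classify0 step for step; only the bookkeeping around it is shorter.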
In-depth interpretation of the hands-on code
argsort function
The argsort() function sorts the elements of x in ascending order and returns their corresponding indices as y.
Example:
import numpy as np
a = np.array([2, 0, 4, 1, 2, 4, 5])
a.argsort()
The output is the array of indices that sorts a from smallest to largest:
Out[12]: array([1, 3, 0, 4, 2, 5, 6], dtype=int64)
That is, the output holds the positions of the elements, listed in ascending order of their values.
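A short sketch of how the index array is typically used may help here. It reuses the array from the example; passing kind='stable' is an addition of this sketch (not in the original code) so that equal values are guaranteed to keep their original relative order.

```python
import numpy as np

a = np.array([2, 0, 4, 1, 2, 4, 5])
# kind='stable' keeps equal values in their original order
order = a.argsort(kind='stable')
print(order)        # [1 3 0 4 2 5 6]
# Indexing with the index array yields the sorted values
print(a[order])     # [0 1 2 2 4 4 5]
# Indices of the 3 smallest values, as when picking the k nearest neighbors
print(order[:3])    # [1 3 0]
# Descending order: reverse the ascending index array
print(order[::-1])  # [6 5 2 4 0 3 1]
```

In classify0 the first k entries of this index array are exactly the positions of the k nearest training samples.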
Sort interpretation
dict.get vs dict['key']
a = {'name': 'wang'}
dict['key'] output:
a['age']
Out[16]: KeyError: 'age'
dict.get output:
a.get('age')
a.get('age', 10)
Out[17]: 10
dict['key'] can only retrieve a key that exists; if the key does not exist, KeyError is raised.
dict.get(key, default=None) returns a default value when the key does not exist: the specified default if one is given, otherwise None.
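The counting pattern that relies on this default can be sketched in isolation as follows. The votes list is a made-up stand-in for the labels of the nearest neighbors, and collections.Counter is shown only as a standard-library alternative, not as something the original code uses.

```python
from collections import Counter

votes = ['B', 'B', 'A']  # hypothetical neighbor labels

counts = {}
for label in votes:
    # get() returns 0 the first time a label is seen, so no KeyError occurs
    counts[label] = counts.get(label, 0) + 1
print(counts)  # {'B': 2, 'A': 1}

# Counter does the same bookkeeping in one call
print(dict(Counter(votes)) == counts)  # True
```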
sort and sorted functions in Python
Sorting a list with the sort method modifies the list itself, but sorted does not.
a = [1, 2, 1, 4, 3, 5]
a.sort()
a
Out[18]: [1, 1, 2, 3, 4, 5]
The sort method changed the order of a.
a = [1, 2, 1, 4, 3, 5]
sorted(a)
a
Out[19]: [1, 2, 1, 4, 3, 5]
sorted did not change the order of a.
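The difference also shows up in the return values, which is a common source of bugs. A minimal sketch, using a sample list chosen for this example:

```python
a = [1, 2, 1, 4, 3, 5]

b = sorted(a)       # returns a new list; a is untouched
print(a)            # [1, 2, 1, 4, 3, 5]
print(b)            # [1, 1, 2, 3, 4, 5]

result = a.sort()   # sorts in place and returns None
print(result)       # None
print(a)            # [1, 1, 2, 3, 4, 5]

# sorted() accepts any iterable, not only lists; a dict yields its keys
print(sorted({'b': 1, 'a': 2}))  # ['a', 'b']
```

Writing a = a.sort() therefore replaces the list with None, whereas a = sorted(a) is safe.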
sorted function
sorted(iterable, cmp, key, reverse) (Python 2 usage)
Python 3's sorted has removed support for cmp.
list1 = [('david', 90), ('mary', 90), ('sara', 80), ('lily', 95)]
sorted(list1, cmp=lambda x, y: cmp(x[0], y[0]))
TypeError: 'cmp' is an invalid keyword argument for this function
Sort with the key function:
sorted(list1, key=lambda list1: list1[0])
Out[23]: [('david', 90), ('lily', 95), ('mary', 90), ('sara', 80)]
list1[0] means sorting by the first element of each tuple in the list.
sorted(list1, key=lambda list1: list1[1])
Out[24]: [('sara', 80), ('david', 90), ('mary', 90), ('lily', 95)]
list1[1] means sorting by the second element of each tuple in the list.
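If an old cmp-style comparison has to be kept in Python 3, the standard library's functools.cmp_to_key can wrap it into a key function. The sketch below assumes that situation; compare_names is a hypothetical helper written for this example, and the reverse=True call is added only to show descending order.

```python
from functools import cmp_to_key

list1 = [('david', 90), ('mary', 90), ('sara', 80), ('lily', 95)]

def compare_names(x, y):
    """Old-style comparison: negative, zero, or positive, like Python 2's cmp()."""
    return (x[0] > y[0]) - (x[0] < y[0])

by_name = sorted(list1, key=cmp_to_key(compare_names))
print(by_name)
# [('david', 90), ('lily', 95), ('mary', 90), ('sara', 80)]

# reverse=True flips the order; here by score, highest first
by_score_desc = sorted(list1, key=lambda item: item[1], reverse=True)
print(by_score_desc)
# [('lily', 95), ('david', 90), ('mary', 90), ('sara', 80)]
```

Because the sort is stable, david still comes before mary when both have a score of 90, even with reverse=True.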
Three sorted interview questions
1) Application of the key function
students = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
sorted(students, key=lambda s: s[2])  # sort by age
2) Sorting a string of mixed characters
'asdf234GDSdsf23' is the string to sort; sorting rule: lowercase
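Returning to the students list from question 1, a key function can also return a tuple so that several fields are compared in order. The sketch below is an illustrative extension rather than one of the original interview questions; the variable names are made up.

```python
students = [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]

# Sort by grade first, then by age within the same grade
by_grade_then_age = sorted(students, key=lambda s: (s[1], s[2]))
print(by_grade_then_age)
# [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)]

# Grade ascending, age descending: negate the numeric field
by_grade_age_desc = sorted(students, key=lambda s: (s[1], -s[2]))
print(by_grade_age_desc)
# [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)]
```

Tuples compare element by element, so the second field is consulted only when the first fields are equal.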