How to understand the distance discrimination in R language classification algorithm

2025-04-29 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Many newcomers to R are unsure how distance discrimination works in classification algorithms. This article summarizes the principle of the method and shows how to apply it in R, using the iris dataset as an example.

1. Analysis of the principle of distance discrimination

Distance discrimination classifies a sample according to its distance from samples of known classes. A distance discriminant function is built from the known-class samples; the attribute values of each sample to be classified are then substituted into it one by one to obtain a distance to each class, and the sample is assigned to the class with the smallest distance.
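The procedure described above can be sketched in base R. This is only an illustration under stated assumptions: the discriminant function is taken to be the Euclidean distance to each class mean (other distances are possible), and the helper name classify_by_distance is hypothetical.

```r
# Minimal sketch of distance discrimination using class centroids.
# Assumption: the discriminant function is the Euclidean distance to each class mean.
train_x <- iris[, 1:4]
train_y <- iris$Species

# One row of feature means per class (3 x 4 matrix, row names are the class labels)
centroids <- t(sapply(split(train_x, train_y), colMeans))

# Hypothetical helper: return the label of the class whose centroid is closest to x
classify_by_distance <- function(x, centroids) {
  # Euclidean distance from x to every class centroid
  d <- sqrt(rowSums(sweep(centroids, 2, x)^2))
  names(which.min(d))
}

# A made-up sample with short petals, to be assigned to the nearest class
new_sample <- c(5.0, 3.4, 1.5, 0.2)
classify_by_distance(new_sample, centroids)
```

The sample is substituted into the distance function once per class, and the class with the smallest value wins, which mirrors the description above.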

The K-nearest neighbor (KNN) algorithm is the most widely used distance discriminant method. Its idea is that if most of the K most similar (that is, nearest) samples in the feature space belong to a certain category, then the sample also belongs to that category.

In the figure, the three solid points are the samples to be classified. They are surrounded by samples of three known classes, drawn as hollow circles, triangles and squares. Take K = 5: circle the five known samples closest to each sample to be classified and check their classes; the unknown sample is assigned to the class that contributes the most of those five points. It is easy to see that the unknown samples (from left to right) belong to the circle, triangle and square classes in turn.
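The K = 5 vote can be written out by hand in a few lines of base R. This is a rough sketch, assuming Euclidean distance and the iris data as the known samples; the function name knn_vote is hypothetical and is not part of any package.

```r
# Minimal sketch of a K-nearest neighbor majority vote.
# Assumptions: Euclidean distance; knn_vote is a hypothetical helper name.
knn_vote <- function(train_x, train_y, x, k = 5) {
  train_x <- as.matrix(train_x)
  # Euclidean distance from x to every known sample
  d <- sqrt(rowSums(sweep(train_x, 2, x)^2))
  # Class labels of the k closest known samples
  nearest <- train_y[order(d)[1:k]]
  # Most frequent label among them (on a tie, the first level in table order wins)
  votes <- table(nearest)
  names(votes)[which.max(votes)]
}

# Classify a made-up sample using its 5 nearest neighbors in iris
knn_vote(iris[, 1:4], iris$Species, c(6.0, 2.9, 4.5, 1.5), k = 5)
```

Only the k closest samples take part in the vote; the rest of the training set has no influence on the result.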

Because the K-nearest neighbor method relies mainly on a limited number of neighboring samples rather than on discriminating class domains, it is better suited than other methods to sample sets whose class domains intersect or overlap heavily.

2. Application in R language

For the K-nearest neighbor (KNN) algorithm, we mainly use the following function from the class package:

knn(train, test, cl, k = 1, l = 0, prob = FALSE, use.all = TRUE)
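As a rough illustration of how knn() from the class package might be called, the sketch below uses the iris data. The random seed and the 100/50 split are arbitrary assumptions made only for this example.

```r
library(class)  # recommended package, shipped with standard R installations

set.seed(1)                          # hypothetical seed, only for reproducibility
idx   <- sample(nrow(iris), 100)     # hypothetical split: 100 training rows, 50 test rows
train <- iris[idx, 1:4]              # training features
test  <- iris[-idx, 1:4]             # test features
cl    <- iris$Species[idx]           # true classes of the training rows

# Predict the class of each test row from its 5 nearest training rows
pred <- knn(train, test, cl, k = 5, prob = TRUE)

head(pred)                 # predicted classes
head(attr(pred, "prob"))   # proportion of votes for the winning class
```

Here train and test hold only the feature columns, while cl carries the known classes of the training rows; with prob = TRUE the vote share of the winning class is attached to the result as the "prob" attribute.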

For weighted K-nearest neighbor (KKNN), we mainly use the following function from the kknn package:

kknn(formula = formula(train), train, test, na.action = na.omit(), k = 7, distance = 2, kernel = "optimal", ykernel = NULL, scale = TRUE, contrasts = c('unordered' = "contr.dummy", ordered = "contr.ordinal"))
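The idea behind weighting can be illustrated in base R: closer neighbors get a larger say in the vote. The sketch below is a simplified illustration, not the implementation used by the kknn package. It assumes plain inverse-distance weights, and the function name weighted_knn_vote is hypothetical.

```r
# Simplified sketch of a distance-weighted K-nearest neighbor vote.
# Assumptions: Euclidean distance and inverse-distance weights;
# this is NOT how the kknn package computes its kernel weights.
weighted_knn_vote <- function(train_x, train_y, x, k = 7) {
  train_x <- as.matrix(train_x)
  # Euclidean distance from x to every known sample
  d <- sqrt(rowSums(sweep(train_x, 2, x)^2))
  idx <- order(d)[1:k]
  # Closer neighbors get larger weights; the small constant avoids division by zero
  w <- 1 / (d[idx] + 1e-8)
  # Total weight per class; classes with no neighbor among the k get 0
  scores <- tapply(w, train_y[idx], sum)
  scores[is.na(scores)] <- 0
  names(scores)[which.max(scores)]
}

# Classify a made-up sample using its 7 nearest neighbors in iris
weighted_knn_vote(iris[, 1:4], iris$Species, c(6.0, 2.9, 4.5, 1.5), k = 7)
```

Compared with an unweighted vote, a single very close neighbor can outweigh several more distant ones.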

3. Discriminant analysis with iris dataset as an example

1) apply the model and observe the output

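library(kknn)
# data_train and data_test are assumed to be training and test subsets of iris; column 5 is Species
fit_pre_kknn = kknn(Species ~ ., data_train, data_test[, -5], k = 5)
fit_pre_kknn[1:length(fit_pre_kknn)]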
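The call above uses data_train and data_test without showing how they are created. One possible way to build them from iris is sketched below; the seed and the 70/30 split ratio are assumptions made for illustration, not values taken from the article.

```r
# One possible way to create data_train and data_test from iris.
# Assumptions: random 70/30 split and an arbitrary seed.
set.seed(42)                                     # hypothetical seed, only for reproducibility
n <- nrow(iris)
test_idx   <- sample(n, size = round(0.3 * n))   # row numbers held out for testing
data_train <- iris[-test_idx, ]                  # rows used to fit the model
data_test  <- iris[test_idx, ]                   # rows used to evaluate it

dim(data_train)
dim(data_test)
```

Both data frames keep all five columns, so data_test[, -5] drops the Species column before prediction while data_test$Species remains available for checking the result.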

2) testing the accuracy of the model

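# Confusion matrix: actual classes in rows, predicted classes in columns
table(data_test$Species, fit_pre_kknn$fitted.values)
# Proportion of misclassified test samples (1 minus the accuracy)
sum(as.numeric(as.numeric(fit_pre_kknn$fitted.values) != as.numeric(data_test$Species))) / nrow(data_test)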
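The relationship between the confusion matrix, the accuracy and the error rate can be checked on a small example. The six labels below are made up for illustration and stand in for data_test$Species and fit_pre_kknn$fitted.values.

```r
# Hypothetical labels standing in for the actual and predicted classes
lv <- c("setosa", "versicolor", "virginica")
actual    <- factor(c("setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"), levels = lv)
predicted <- factor(c("setosa", "setosa", "versicolor", "virginica",  "virginica", "virginica"), levels = lv)

# Confusion matrix: correct predictions lie on the diagonal
confusion <- table(actual, predicted)

# Accuracy is the share of diagonal entries; the error rate is its complement
accuracy   <- sum(diag(confusion)) / sum(confusion)
error_rate <- mean(predicted != actual)

confusion
accuracy
error_rate
```

In this made-up example one of the six samples is misclassified, so the accuracy is 5/6 and the error rate is 1/6.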
