Shulou (Shulou.com) — SLTechnology News & Howtos, Internet Technology. Updated 2025-04-06.
This article walks through a skill-test quiz on "what are the problems of using the kNN algorithm in machine learning". The explanations are simple and clear, so work through the questions to test and deepen your understanding of k-NN.
Skill test questions and answers
1) The k-NN algorithm does more computation at test time than at training time.
A) True B) False
Solution: A
The training phase of the algorithm consists only of storing the feature vectors and class labels of the training samples.
In the testing phase, a test point is classified by assigning it the most frequent label among the k training samples nearest to the query point, which requires far more computation.
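This train/test asymmetry can be sketched in a few lines of Python. The sketch below is illustrative only (the class name `SimpleKNN` and the toy data are made up for this example): `fit` merely stores the data, while `predict_one` does all the distance computation and voting.

```python
import numpy as np

class SimpleKNN:
    """Minimal k-NN classifier: 'training' is just storing the data."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # The whole training phase: store feature vectors and labels.
        self.X = np.asarray(X, dtype=float)
        self.y = np.asarray(y)
        return self

    def predict_one(self, x):
        # The testing phase does the real work: compute every distance,
        # take the k nearest, and vote for the most frequent label.
        dists = np.linalg.norm(self.X - np.asarray(x, dtype=float), axis=1)
        nearest = self.y[np.argsort(dists)[:self.k]]
        labels, counts = np.unique(nearest, return_counts=True)
        return labels[np.argmax(counts)]

X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y = ['+', '+', '+', '-', '-', '-']
model = SimpleKNN(k=3).fit(X, y)
print(model.predict_one([0.5, 0.5]))  # '+': all 3 nearest neighbours are '+'
```

Note that every prediction scans the full training set, which is exactly why the test phase is the expensive one.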
2) Suppose you are using the k nearest neighbor algorithm; in the following image, ___ would be the best value of k.
A) 3 B) 10 C) 20 D) 50
Solution: B
The validation error is lowest when the value of k is 10.
3) Which distance metric cannot be used in k-NN?
A) Manhattan B) Minkowski C) Tanimoto D) Jaccard E) Mahalanobis F) All of these can be used
Solution: F
All of these can serve as distance metrics for k-NN.
4) Which of the following options is true about the k-NN algorithm?
A) It can be used for classification B) It can be used for regression C) It can be used for both classification and regression
Solution: C
We can also use k-NN for regression problems, in which case the prediction is based on the mean or median of the k most similar instances.
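A minimal sketch of k-NN regression, assuming we predict from the mean (or median) of the k nearest targets; the function name `knn_regress` and the toy data are invented for this illustration:

```python
import numpy as np

def knn_regress(X_train, y_train, x, k=3, use_median=False):
    """Predict a continuous target as the mean (or median)
    of the targets of the k nearest training points."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    dists = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    return np.median(nearest) if use_median else nearest.mean()

X = [[1], [2], [3], [10], [11], [12]]
y = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
print(knn_regress(X, y, [2], k=3))  # mean of {1, 2, 3} -> 2.0
```

Using the median instead of the mean makes the prediction more robust to outlying neighbour targets.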
5) Which of the following statements is true about the k-NN algorithm?
1. k-NN performs better if all the features are on the same scale.
2. k-NN works well with a small number of input variables (p), but struggles when the number of inputs is large.
3. k-NN makes no assumptions about the functional form of the problem being solved.
A) 1 and 2 B) 1 and 3 C) Only 1 D) All of the above
Solution: D
All of the above statements are assumptions of the k-NN algorithm.
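Statement 1 is worth seeing numerically. In the hedged sketch below (toy numbers chosen for illustration), an income feature in the tens of thousands completely dominates an age feature until both are min-max scaled:

```python
import numpy as np

# Two features on very different scales: income and age.
a = np.array([50_000.0, 25.0])
b = np.array([51_000.0, 60.0])
print(np.linalg.norm(a - b))  # ~1000.6: income swamps the age difference

# Min-max scale each feature to [0, 1] before computing distances.
X = np.array([[50_000.0, 25.0], [51_000.0, 60.0], [90_000.0, 30.0]])
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(np.linalg.norm(X_scaled[0] - X_scaled[1]))  # ~1.0: age now matters
```

This is why feature scaling is standard preprocessing for k-NN.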
6) Which of the following machine learning algorithms can be used to impute missing values of both categorical and continuous variables?
A) k-NN B) Linear regression C) Logistic regression
Solution: A
k-NN can be used to impute missing values of both categorical and continuous variables.
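The idea behind k-NN imputation is to fill a missing entry using the rows most similar on the observed columns. scikit-learn ships a ready-made `KNNImputer` for this; the helper below is only a hand-rolled toy sketch of the idea (the function name and data are invented for illustration):

```python
import numpy as np

def knn_impute_value(X, row, col, k=2):
    """Fill X[row, col] (a NaN) with the mean of that column over the k
    rows nearest to `row`, measured on the remaining observed columns."""
    mask = np.ones(X.shape[1], dtype=bool)
    mask[col] = False
    # Candidate donors: other rows that actually have this column.
    donors = [i for i in range(len(X))
              if i != row and not np.isnan(X[i, col])]
    dists = [np.linalg.norm(X[i, mask] - X[row, mask]) for i in donors]
    nearest = [donors[i] for i in np.argsort(dists)[:k]]
    return X[nearest, col].mean()

X = np.array([[1.0,  2.0],
              [1.1,  2.2],
              [8.0,  9.0],
              [1.05, np.nan]])
print(knn_impute_value(X, row=3, col=1, k=2))  # mean of 2.0 and 2.2 -> 2.1
```

For categorical variables the same scheme applies, except the donors vote (mode) instead of averaging.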
7) Which of the following is true about Manhattan distance?
A) It can be used for continuous variables B) It can be used for categorical variables C) It can be used for both categorical and continuous variables D) None of these
Solution: A
Manhattan distance is designed for computing distances between real-valued features.
8) Which distance measure do we use for categorical variables in k-NN?
1. Hamming distance
2. Euclidean distance
3. Manhattan distance
A) 1 B) 2 C) 3 D) 1 and 2 E) 2 and 3 F) 1 and 3
Solution: A
Euclidean and Manhattan distances are used for continuous variables, while Hamming distance is used for categorical variables.
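Hamming distance simply counts the positions at which two categorical vectors differ. A minimal sketch (the example categories are invented for illustration):

```python
def hamming(u, v):
    """Count the positions where two equal-length categorical vectors differ."""
    assert len(u) == len(v), "vectors must have the same length"
    return sum(a != b for a, b in zip(u, v))

# Differ in size ('S' vs 'M') and material ('cotton' vs 'wool'): distance 2.
print(hamming(['red', 'S', 'cotton'], ['red', 'M', 'wool']))  # -> 2
```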
9) What is the Euclidean distance between the two data points A(1, 3) and B(2, 3)?
A) 1 B) 2 C) 4 D) 8
Solution: A
sqrt((1 - 2)^2 + (3 - 3)^2) = sqrt(1 + 0) = 1
10) What is the Manhattan distance between the two data points A(1, 3) and B(2, 3)?
A) 1 B) 2 C) 4 D) 8
Solution: A
|1 - 2| + |3 - 3| = 1 + 0 = 1 (note that, unlike Euclidean distance, no square root is taken)
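The two worked answers above can be checked directly; this sketch just evaluates both formulas for the points A(1, 3) and B(2, 3):

```python
import math

A, B = (1, 3), (2, 3)

# Euclidean: square root of the sum of squared coordinate differences.
euclidean = math.sqrt((A[0] - B[0]) ** 2 + (A[1] - B[1]) ** 2)
# Manhattan: sum of absolute coordinate differences -- no square root.
manhattan = abs(A[0] - B[0]) + abs(A[1] - B[1])

print(euclidean)  # 1.0
print(manhattan)  # 1
```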
Questions 11-12:
Suppose you are given the following data, where x and y are the two input variables and Class is the dependent variable.
The following scatter plot shows the above data in 2D space.
11) Suppose you want to predict the class of the new data point x = 1 and y = 1 using Euclidean distance in 3-NN. To which class does this data point belong?
A) + class B) - class C) Cannot judge D) None of these
Solution: A
All three nearest points are of class +, so this point is classified as class +.
12) In the previous question, suppose you now use 7-NN instead of 3-NN. To which class does the point x = 1, y = 1 belong?
A) + class B) - class
C) Cannot judge
Solution: B
The point is now classified as class -, because the nearest circle contains four points of class - and three points of class +.
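The flip between questions 11 and 12 is purely a consequence of widening the vote. Since the article's scatter plot is not reproduced here, the sketch below uses an invented layout with the same structure: three '+' points closest to the query and four '-' points just beyond them, so 3-NN and 7-NN disagree.

```python
import numpy as np

def knn_vote(points, labels, query, k):
    """Majority vote among the k points nearest to `query`."""
    pts = np.asarray(points, dtype=float)
    dists = np.linalg.norm(pts - np.asarray(query, dtype=float), axis=1)
    nearest = [labels[i] for i in np.argsort(dists)[:k]]
    return max(set(nearest), key=nearest.count)

# Illustrative layout (not the article's figure): three '+' points
# close to (1, 1), four '-' points slightly further out.
points = [(1.0, 1.2), (1.2, 1.0), (0.8, 1.0),
          (1.0, 1.5), (1.5, 1.0), (0.5, 1.0), (1.0, 0.5)]
labels = ['+', '+', '+', '-', '-', '-', '-']

print(knn_vote(points, labels, (1, 1), k=3))  # '+': all 3 nearest are '+'
print(knn_vote(points, labels, (1, 1), k=7))  # '-': 4 '-' outvote 3 '+'
```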
Questions 13-14:
Suppose you are given the following data for two classes, where "+" represents the positive class and "-" represents the negative class.
13) Which of the following values of k in k-NN gives the lowest leave-one-out cross-validation error?
A) 3 B) 5 C) Both are the same D) None of these
Solution: B
5-NN has the lowest leave-one-out cross-validation error.
14) What is the leave-one-out cross-validation accuracy when k = 5?
A) 2/14 B) 4/14 C) 6/14 D) 8/14 E) None of the above
Solution: E
With 5-NN, the leave-one-out cross-validation accuracy is 10/14.
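Leave-one-out cross-validation for k-NN means holding out each point in turn and predicting it from all the others. The sketch below implements this from scratch on an invented dataset (not the article's figure): two clean clusters plus one '-' noise point inside the '+' cluster, so k = 1 is badly hurt by the noise while k = 3 mostly ignores it.

```python
import numpy as np

def knn_predict(X, y, x, k):
    dists = np.linalg.norm(X - x, axis=1)
    nearest = [y[i] for i in np.argsort(dists)[:k]]
    return max(set(nearest), key=nearest.count)

def loocv_accuracy(X, y, k):
    """Leave-one-out CV: hold out each point and predict it from the rest."""
    X = np.asarray(X, dtype=float)
    hits = 0
    for i in range(len(X)):
        keep = [j for j in range(len(X)) if j != i]
        if knn_predict(X[keep], [y[j] for j in keep], X[i], k) == y[i]:
            hits += 1
    return hits / len(X)

# Two clean clusters plus one '-' noise point inside the '+' cluster.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5],
     [5, 5], [5, 6], [6, 5], [6, 6]]
y = ['+', '+', '+', '+', '-', '-', '-', '-', '-']
for k in (1, 3):
    print(k, round(loocv_accuracy(X, y, k), 3))  # k=1: 0.444, k=3: 0.889
```

On the article's 14-point dataset the same procedure would yield the 10/14 score quoted above.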
15) In terms of bias, which of the following is true about k in k-NN?
A) Bias increases as you increase k B) Bias increases as you decrease k C) Cannot judge D) None of these
Solution: A
A large k means a simpler model, and a simpler model is always considered to have high bias.
16) In terms of variance, which of the following is true about k in k-NN?
A) Variance increases as you increase k B) Variance increases as you decrease k C) Cannot judge D) None of these
Solution: B
A simpler model (large k) is considered to have low variance, so decreasing k increases variance.
17) Below, the two distances (Euclidean distance and Manhattan distance) that we usually use in k-NN are drawn between point A(x1, y1) and point B(x2, y2).
Your task is to label the two distances by looking at the following two figures. Which of the following options is true about the pictures?
A) Left is Manhattan distance, right is Euclidean distance B) Left is Euclidean distance, right is Manhattan distance C) Neither left nor right is Manhattan distance D) Neither left nor right is Euclidean distance
Solution: B
The left picture shows how Euclidean distance works, and the right picture shows Manhattan distance.
18) When you find noise in your data, which of the following options would you consider in k-NN?
A) Increase the value of k B) Decrease the value of k C) Noise does not depend on k D) None of these
Solution: A
To make the classification more robust to noise, you can try increasing the value of k.
19) In k-NN, overfitting is likely due to the curse of dimensionality. Which of the following options would you consider to address this problem?
1. Dimensionality reduction
2. Feature selection
A) 1 B) 2 C) 1 and 2 D) None of these
Solution: C
In this case you can use a dimensionality reduction algorithm or a feature selection algorithm.
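As a hedged illustration of option 1, the sketch below reduces a noisy high-dimensional dataset to two dimensions with PCA computed via the SVD (the data is synthetic and invented for this example; in practice one would typically use scikit-learn's `PCA` in a pipeline before k-NN):

```python
import numpy as np

rng = np.random.default_rng(0)

# 100 samples in 50 dimensions, but only 2 dimensions carry signal;
# the remaining 48 are noise that inflates every pairwise distance.
signal = rng.normal(size=(100, 2))
noise = 0.1 * rng.normal(size=(100, 48))
X = np.hstack([signal, noise])

# PCA via SVD: center the data, then project onto the top-2 components.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
X_reduced = Xc @ Vt[:2].T

print(X.shape, '->', X_reduced.shape)  # (100, 50) -> (100, 2)
```

Running k-NN on `X_reduced` instead of `X` keeps the informative directions while discarding the noise dimensions that cause distances to concentrate.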
20) Two statements are given below. Which of them is true?
1. k-NN is a memory-based method; the classifier adapts as soon as we collect new training data.
2. In the worst case, the computational complexity of classifying a new sample grows linearly with the number of samples in the training data set.
A) 1 B) 2 C) 1 and 2 D) None of these
Solution: C
21) Suppose you are given the following images (1 on the left, 2 in the middle, 3 on the right). Your task is to compare the value of k used by k-NN in each image, where k1 corresponds to the first image, k2 to the second, and k3 to the third.
A) k1 > k2 > k3 B) k1 < k2 C) k1 = k2 = k3 D) None of these
Solution: D
The value of k is highest in k3 and lowest in k1.
22) In the figure below, which value of k gives the lowest leave-one-out cross-validation accuracy?
A) 1 B) 2 C) 3 D) 5
Solution: B
With k = 2, the cross-validation accuracy is lowest. You can verify this yourself.
23) A company built a kNN classifier that achieves 100% accuracy on the training data. When they deployed the model on the client side, it turned out not to be accurate at all. Which of the following may have gone wrong?
Note: the model was deployed successfully, and no technical problems other than model performance were found on the client side.
A) The model may be overfitting B) The model may be underfitting C) Cannot judge D) None of these
Solution: A
An overfitted model appears to perform well on the training data but does not generalize well enough to give the same results on new data.
24) Given the following two statements, which option is correct for k-NN?
1. If the value of k is very large, we may include points of other classes in the neighborhood.
2. If the value of k is too small, the algorithm is very sensitive to noise.
A) 1 B) 2 C) 1 and 2 D) None of these
Solution: C
Both statements are correct and self-evident.
25) Which of the following statements is true for a k-NN classifier?
A) The larger the value of k, the better the classification accuracy
B) The smaller the value of k, the smoother the decision boundary
C) The decision boundary is linear
D) k-NN does not require an explicit training step
Solution: D
Option A: not always the case; you must make sure the value of k is neither too high nor too low.
Option B: incorrect; the decision boundary can be somewhat jagged.
Option C: incorrect, for the same reason as option B.
Option D: correct.
26) True or false: a 2-NN classifier can be constructed using 1-NN classifiers?
A) True B) False
Solution: A
You can implement a 2-NN classifier by combining 1-NN classifiers.
27) What happens in k-NN when the value of k is increased or decreased?
A) The larger the value of k, the smoother the boundary
B) The boundary becomes smoother as the value of k decreases
C) The smoothness of the boundary is independent of the value of k
D) None of these
Solution: A
By increasing the value of k, the decision boundary becomes smoother.
28) Two statements are given for the k-NN algorithm; which of them is true?
1. We can choose the optimal value of k with the help of cross-validation.
2. Euclidean distance treats every feature equally.
A) 1 B) 2 C) 1 and 2 D) None of these
Solution: C
Both statements are correct.
Questions 29-30: Suppose you have trained a k-NN model and now want to make predictions on test data. Before obtaining the predictions, you want to calculate the time k-NN takes to predict the classes of the test data.
Note: computing the distance between two observations takes time D.
29) If there are N (very large) observations in the test data, how much time will 1-NN take?
A) N * D B) N * D * 2 C) (N * D) / 2 D) None of these
Solution: A
The value of N is very large, so option A is correct.
30) What is the relationship between the times taken by 1-NN, 2-NN, and 3-NN?
A) 1-NN > 2-NN > 3-NN
B) 1-NN < 2-NN < 3-NN
C) 1-NN ~ 2-NN ~ 3-NN
D) None of these
Solution: C
In k-NN, prediction time is dominated by the distance computations, which are the same for any value of k, so 1-NN, 2-NN, and 3-NN take roughly the same time.
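This can be made concrete by counting the distance computations directly. In the sketch below (function name and random data invented for illustration), the count is always N = len(X_train), no matter which k we ask for:

```python
import numpy as np

def knn_predict_counting(X_train, y_train, x, k):
    """Predict a label and report how many distances were computed."""
    dists = np.linalg.norm(X_train - x, axis=1)  # one distance per training point
    n_distance_computations = len(X_train)       # always N, regardless of k
    nearest = [y_train[i] for i in np.argsort(dists)[:k]]
    return max(set(nearest), key=nearest.count), n_distance_computations

X = np.random.default_rng(1).normal(size=(1000, 3))
y = ['+' if p[0] > 0 else '-' for p in X]
for k in (1, 2, 3):
    _, n = knn_predict_counting(X, y, np.zeros(3), k)
    print(k, n)  # cost is N = 1000 distances for every k
```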
Thank you for reading. That concludes "what are the problems of using the kNN algorithm in machine learning". After working through these questions you should have a deeper understanding of k-NN, though the specifics still need to be verified in practice.