How to evaluate the effect of classification in machine learning

This article explains in detail how to evaluate the effect of classification in machine learning. The editor finds it very practical and shares it here as a reference; I hope you get something out of it.

Let me start with a question. Suppose the boss asks a colleague to check 10,000 banknotes and report how many are real and how many are counterfeit. The colleague reports back: of the 10,000 banknotes, 2,000 are real and 8,000 are counterfeit. Now the boss asks you to evaluate this colleague's report. What would you do?

You could run all 10,000 banknotes through the machine again and compare the result with the colleague's 2,000 real and 8,000 counterfeit. But suppose you are not allowed to do that: if the 10,000 banknotes become one million, there is no way you will be given time to check them all again. What then?

Spot check.

That seems to be the only move. So how should the spot check be conducted in order to evaluate the colleague's report fairly?

The usual approach goes like this. First, spot-check the 2,000 notes reported as real: "you say these are all real; how many of them actually are?" In data analysis this ratio is called precision. 100% would of course be best, but even a precision of 100% does not mean the report is fine, because the other 8,000 "counterfeit" notes might all be real.

So we also have to spot-check the 8,000 notes reported as counterfeit: "you say these are all fake; how many of them really are fake?" This ratio tells the same kind of story. If you draw a hundred notes and 90 of them turn out to be genuine (only 10% are actually counterfeit), then the claim that these 8,000 notes are counterfeit is very unreliable. Xiao Cheng considers this, too, a precision: "how many of the ones you call fake are actually fake?" The higher the value, the better.
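To make the spot-check idea concrete, here is a minimal Python sketch (the label lists are hypothetical, just for illustration): precision on the "real" pile is the fraction of notes called real that actually are real, and likewise for the "counterfeit" pile.

```python
# Hypothetical spot check: what the notes really are vs. what the colleague reported.
# 1 = real banknote, 0 = counterfeit.
true_labels = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
reported    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

def precision(true, pred, positive):
    """Of the notes reported as `positive`, what fraction really is `positive`?"""
    called = [(t, p) for t, p in zip(true, pred) if p == positive]
    return sum(t == positive for t, _ in called) / len(called)

print(precision(true_labels, reported, positive=1))  # precision on the "real" pile: 3/4
print(precision(true_labels, reported, positive=0))  # precision on the "counterfeit" pile: 5/6
```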

Is precision alone enough, then?

It seems so, because precision reflects the ability to "tell right from wrong". In the example above, say the precision on notes judged real is 98%: only 2% of them are actually counterfeit, so the chance of a counterfeit note being called real is very low (assume 2% counts as very low in this industry). And say the precision on notes judged counterfeit is 95%: only 5% of them are actually real. In that case the proportion of fakes judged real, or of real notes judged fake, is very low. Isn't that the end of it? It shows the ability to "tell right from wrong" is strong, the prediction system is credible, and it won't confuse true and false: give it a pile of banknotes and it will separate the real from the counterfeit without mistakes.

There is a premise, though: you only know whether the system judges correctly on the notes it actually judges. If the system does not judge every banknote, say only 1,000 out of 10,000, what can you expect from it? It is very accurate, but only on what it judges, and a great deal is left unjudged. That said, as long as it is very precise, it will still find a use; more on that later.

So there is another indicator, called sensitivity, also known as recall.

Recall reflects the ability to "find things". For example, if I give the system 1,000 real notes and it finds 800 of them, that 80% is the recall. If its recall is 10%, it found only 100 notes; what about the other 900? There are two possibilities: either the system could not make a judgment on those 900 notes, so they were never found, or it misjudged them as counterfeit. But what would misjudging mean? Misjudging means poor precision, so if the precision is very high, only one possibility remains: it simply could not judge them.
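A minimal sketch of recall under the same kind of hypothetical labels: of the notes that truly are real, what fraction did the system get back?

```python
# Hypothetical example: recall = how many of the truly real notes were found.
true_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # 1 = real, 0 = counterfeit
predicted   = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]   # the system only finds 3 of the 5 real notes

def recall(true, pred, positive):
    """Of the samples that truly are `positive`, what fraction did the system recover?"""
    actually_positive = [(t, p) for t, p in zip(true, pred) if t == positive]
    return sum(p == positive for _, p in actually_positive) / len(actually_positive)

print(recall(true_labels, predicted, positive=1))  # 0.6 -> 60% recall on real notes
```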

So precision and recall measure different things: precision tells you whether what the system says can be trusted, and recall tells you how much of the data it manages to find (how many samples it covers).

Both precision and recall are single numbers, and to get those numbers you usually run a prediction test over many samples. To show where the numbers come from, you lay those samples out, and that is where the "confusion matrix" comes in.

A confusion matrix, also called an error matrix (both names are translations from English; what matters is the meaning, not the literal wording), is a table with the true values along one side and the predicted values along the other. You can lay it out either way. Just look at the following picture:

"confusion" shows the ability to classify. As an example, Apple's recall rate (the ability to recover how many apples can be recovered in a pile of apples) is 90%. The recall rates of bananas, pears and strawberries are 80%, 95% and 97%, respectively. Xiaocheng uses exactly 100 real values for each category, for the convenience of you who are not good at mental arithmetic. Thus it can be seen that the value on the right diagonal is the recall value, that is, the value that is judged accurately.

Now look at the precision; for that you read along the other direction of the table, everything predicted as a given class. For example, the precision for apples is the 90 correctly predicted apples divided by all samples predicted as apples, which works out to about 89%; for bananas it is about 89.9%, and for pears and strawberries about 90.4% and 92.4% respectively.

Now put recall and precision side by side:

            Apples   Bananas   Pears   Strawberries
Recall      90%      80%       95%     97%
Precision   89%      89.9%     90.4%   92.4%

Strawberry has the highest recall and the highest precision; congratulations to it!
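The same per-class numbers can be recomputed from the matrix itself. In the sketch below, the diagonal and the row totals follow the article's example (100 true samples per fruit, diagonal 90/80/95/97), while the off-diagonal entries are made up so that the column totals roughly reproduce the quoted precisions.

```python
import numpy as np

# Rows = true class, columns = predicted class: apple, banana, pear, strawberry.
# Diagonal and row sums match the example; off-diagonal values are hypothetical.
cm = np.array([
    [90,  4,  4,  2],   # true apples
    [ 8, 80,  6,  6],   # true bananas
    [ 2,  3, 95,  0],   # true pears
    [ 1,  2,  0, 97],   # true strawberries
])

recall    = cm.diagonal() / cm.sum(axis=1)   # found / all true samples of each class
precision = cm.diagonal() / cm.sum(axis=0)   # correct / all samples predicted as each class

print(recall)     # [0.9   0.8   0.95  0.97]
print(precision)  # roughly [0.891 0.899 0.905 0.924]
```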

Besides recall and precision, there are two more indicators for evaluating a classifier (the apple-and-banana predictor above is a classifier). One is overall accuracy: the correctly predicted samples (the values on the main diagonal) summed and divided by the total number of samples, here (90 + 80 + 95 + 97) / 400 = 90.5%. The other is specificity: for apples, it is the proportion of the truly non-apple samples that are correctly reported as not apples. Xiao Cheng feels, however, that these two indicators are not of much reference value.
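Under the same hypothetical matrix, overall accuracy and specificity come out like this (specificity for apples: of everything that is truly not an apple, how much is correctly reported as not an apple).

```python
import numpy as np

cm = np.array([[90, 4, 4, 2], [8, 80, 6, 6], [2, 3, 95, 0], [1, 2, 0, 97]])

# Overall accuracy: correct predictions (the diagonal) over all samples.
accuracy = cm.diagonal().sum() / cm.sum()                      # (90 + 80 + 95 + 97) / 400 = 0.905

# Specificity for apples (class index 0): true non-apples predicted as non-apples.
not_apple = cm[1:, :]                                          # rows of truly non-apple samples
specificity_apple = not_apple[:, 1:].sum() / not_apple.sum()   # 289 / 300, about 0.963

print(accuracy, specificity_apple)
```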

From one table of samples we have abstracted a recall and a precision for each category. Abstract once more and you get each category's F1 score, which combines recall and precision: F1 = 2 × precision × recall / (precision + recall). In the example above, the F1 score for apples is 2 × 90% × 89% / (90% + 89%) ≈ 89.5%.
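As a quick check of the arithmetic, here is the apple F1 score as a one-liner (using the article's rounded precision and recall):

```python
precision, recall = 0.89, 0.90                      # apple values from the example above
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 4))                                 # 0.895 -> about 89.5%
```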

Basically, the quality of a classifier is described in terms of precision and recall, with at most an F1 score added on top.

Let's go a little deeper with a question: in what scenarios is high precision useful, and in what scenarios is high recall useful?

High precision means few mistakes on whatever the system does judge. It can be used for classification: given a batch of samples containing several classes A, B, C and D, a high-precision classifier can point out which samples are A, which are B, and so on. If precision is high but recall is low, it can still be used for classification, except that many A samples go unfound (and you cannot tell which ones); the same goes for the other classes.

High recall with low precision means the classification itself is unreliable: if something labelled A may well actually be B, don't use it to classify. But because the recall is high, it can be used to screen an existing class. For example, if a batch of samples is basically class A but may contain a few Bs and Cs, you can use such a classifier to filter it, which amounts to error correction: because recall is high, it will find essentially all the As, and you throw the rest away.

With both high recall and high precision, it can be used for classification as well as for screening existing categories.

What if both recall and precision are low? Then it's useless; do the work by hand!

That is the end of this article on "how to evaluate the effect of classification in machine learning". I hope the content above is of some help and lets you learn something new. If you think the article is good, please share it for more people to see.
