

Principle Analysis of AdaBoost

2025-03-28 Update From: SLTechnology News&Howtos


This article explains how the AdaBoost algorithm works. The content is concise and easy to follow, and I hope you get something out of the detailed introduction below.

Basic principles

The basic principle of the AdaBoost algorithm is to combine multiple weak classifiers (the weak classifier is usually a single-layer decision tree) into one strong classifier.

AdaBoost adopts an iterative approach: only one weak classifier is trained in each iteration, and the weak classifiers trained earlier continue to be used in later iterations. In other words, at the N-th iteration there are N weak classifiers in total, of which N-1 were trained previously and their parameters no longer change; only the N-th classifier is trained in this round. The relationship among the weak classifiers is that the N-th weak classifier is more likely to correctly classify the data that the previous N-1 weak classifiers misclassified, and the final classification output depends on the combined effect of all N classifiers.

Weak classifier (single-layer decision tree)

AdaBoost generally uses a single-layer decision tree as its weak classifier. The single-layer decision tree is the simplest version of a decision tree: it has only one decision point. That is, even if the training data has multi-dimensional features, a single-layer decision tree can select only one feature dimension to decide on, and there is one more key point: the decision threshold on that dimension must also be chosen.

With regard to the decision point of a single-layer decision tree, let's look at a few examples. When the feature has only one dimension, values less than (or equal to) 7 can be put into one class, labeled +1, and values greater than 7 into the other class, labeled -1. Of course, 13 could also be used as the decision point, with values greater than 13 put into the +1 class and values less than (or equal to) 13 into the -1 class. In a single-layer decision tree there is only one decision point, so the two decision points in the following figure cannot be selected at the same time.

By the same token, when the feature has two dimensions, the ordinate 7 can be used as the decision point: points below 7 go into the +1 class, and points at or above 7 go into the -1 class. Alternatively, the abscissa 13 can be used as the decision point: points to the right of 13 go into the +1 class, and points at or to the left of 13 go into the -1 class. Again, a single-layer decision tree has only one decision point, so the two decision points in the following figure cannot be selected at the same time.

The same holds when extending to three, four, or N dimensions: since a single-layer decision tree has only one decision point, we can only choose an appropriate decision threshold in one of the dimensions as the decision point.
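As a minimal sketch of such a stump in Python (the function name and the polarity convention are my own illustration, not code from this article):

```python
import numpy as np

def stump_predict(X, dim, threshold, polarity):
    """Single-layer decision tree ("decision stump"): one feature
    dimension, one threshold. With polarity=+1, samples with
    X[:, dim] <= threshold are labeled +1 and the rest -1;
    polarity=-1 flips the two sides."""
    pred = np.ones(X.shape[0])
    if polarity == 1:
        pred[X[:, dim] > threshold] = -1.0
    else:
        pred[X[:, dim] <= threshold] = -1.0
    return pred
```

With dim=0, threshold=7, polarity=+1 this is exactly the first example above; threshold 13 with polarity=-1 gives the second.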

The two kinds of weights in AdaBoost

There are two kinds of weights in the AdaBoost algorithm: the weights on the data and the weights on the weak classifiers. The data weights are mainly used by each weak classifier to find the decision point with the smallest weighted classification error, and that minimum error is then used to compute the weak classifier's weight (its say in the vote). The greater a classifier's weight, the more say that weak classifier has in the final decision.

AdaBoost data weights and the weak classifier

The principle of the single-layer decision tree has just been introduced, and there is a problem here: if the training data stays the same, then the best decision point found by the single-layer decision tree will be the same every time. Why? Because the single-layer decision tree examines all possible decision points and chooses the best one; if the training data does not change, the best point found is of course the same point every time.

This is where the AdaBoost data weights come in handy, as in the statement above that "the data weights are mainly used by the weak classifier to find the decision point with the smallest classification error". Concretely, when computing the error of a single-layer decision tree, AdaBoost multiplies each point's error by that point's weight, that is, it computes a weighted error.

For example, previously, with no weights (in effect, uniform weights), if there were 10 points, each point had weight 0.1 and each misclassified point added 0.1 to the error rate; with 3 misclassified points the error rate was 0.3. Now suppose the weights differ: still 10 points, but with weights [0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.91]. If the first point is misclassified, the error rate is 0.01; if the third point is misclassified, the error rate is also 0.01; but if the last point is misclassified, the error rate is 0.91. When selecting a decision point, it is therefore natural to try hard to classify the important point (here, the last one) correctly in order to lower the error rate. The weight distribution thus influences the choice of decision point in the single-layer decision tree: heavily weighted points get more attention, lightly weighted ones get less.
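Continuing the sketch above, the weighted-error stump search could look like this (hypothetical helper names again; the weighted error is simply the sum of the weights of the misclassified points, exactly as in the 10-point example):

```python
import numpy as np

def train_stump(X, y, w):
    """Exhaustive stump search: try every feature dimension, every
    candidate threshold, and both polarities, keeping the combination
    with the smallest *weighted* error (a sum of the weights of the
    misclassified points, not a plain count)."""
    best, best_err = None, np.inf
    for dim in range(X.shape[1]):
        for threshold in np.unique(X[:, dim]):
            for polarity in (1, -1):
                pred = stump_predict(X, dim, threshold, polarity)
                err = np.sum(w[pred != y])
                if err < best_err:
                    best_err, best = err, (dim, threshold, polarity)
    return best, best_err
```

Under uniform weights this search always returns the same stump for the same data, which is precisely the problem the data weights solve.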

In the AdaBoost algorithm, the data weights are adjusted after each weak classifier is trained: the weights of the points misclassified in the previous round are increased. In the current round, under the influence of these weights, the weak classifier is more likely to classify those previously misclassified points correctly. If a point is still misclassified, its weight keeps increasing, and the next weak classifier pays it even more attention and tries harder to classify it correctly.

In this way, the principle of "I will focus on what you got wrong" is realized: each new classifier concentrates mainly on the points the previous classifiers got wrong, and each classifier has its own emphasis.
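The article describes the effect of this re-weighting but not its formula; the standard AdaBoost update rule, given here as an assumption, scales misclassified points up and correctly classified points down:

```python
import numpy as np

def update_weights(w, alpha, y, pred):
    """Standard AdaBoost re-weighting (assumed; the article does not
    state the formula): misclassified points are scaled by e^{+alpha},
    correctly classified points by e^{-alpha}, and the weights are
    renormalized to sum to 1."""
    w = w * np.exp(-alpha * y * pred)  # y * pred is +1 if correct, -1 if wrong
    return w / np.sum(w)
```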

AdaBoost classifier weights

Because the relationship among the classifiers in AdaBoost is that the N-th classifier is more likely to correctly classify the data the previous N-1 classifiers misclassified, it cannot at the same time guarantee that previously well-classified data stays correctly classified. Therefore, in AdaBoost each weak classifier has the points it cares about most and attends to only part of the whole data set, so the classifiers must be combined to be effective. The final vote is therefore weighted by each weak classifier's weight, which is computed from that classifier's classification error rate. The general rule is: the lower a weak classifier's error rate, the higher its weight.
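The article does not give the weight formula; the usual AdaBoost choice, which matches the rule "lower error rate, higher weight", is alpha = 0.5 * ln((1 - error) / error):

```python
import numpy as np

def classifier_weight(error):
    """Usual AdaBoost classifier weight (assumed formula):
    alpha = 0.5 * ln((1 - error) / error).
    error = 0.5 (no better than chance) gives alpha = 0, and alpha
    grows as the weighted error approaches 0."""
    error = np.clip(error, 1e-10, 1.0 - 1e-10)  # guard against log(0)
    return 0.5 * np.log((1.0 - error) / error)
```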

Illustrating the structure of the AdaBoost classifier

The figure shows the overall structure of the AdaBoost classifier. From right to left, you can see the final summation and the sign function. To the left of the summation, the dotted lines in the figure represent the effect of different iteration rounds. In the first iteration there is only the structure of the first row; in the second iteration there are the structures of the first and second rows. Each iteration adds one row, and the "cloud" at the bottom of the figure indicates omitted iteration structure.

Each iteration round i does several things:

1. Add a new weak classifier WeakClassifier(i) and a new weak-classifier weight alpha(i).

2. Train the weak classifier WeakClassifier(i) on the data set data with data weights W(i), obtain its weighted classification error rate, and compute the weak-classifier weight alpha(i) from that error rate.

3. Combine the predictions of all weak classifiers so far by weighted voting to obtain the final prediction, and compute the final classification error rate. If the final error rate is below the set threshold (for example, 5%), the iteration ends; if it is above the threshold, update the data weights to obtain W(i+1) and continue. A sketch of this loop follows below.
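Putting the three steps together, here is a sketch of the whole loop built from the hypothetical helpers defined in the earlier snippets (stump_predict, train_stump, classifier_weight, update_weights):

```python
import numpy as np

def adaboost_train(X, y, max_rounds=20, target_error=0.05):
    """Sketch of the iteration described above; the helper names are
    the illustrative ones from the earlier snippets."""
    n = X.shape[0]
    w = np.full(n, 1.0 / n)          # W(1): start with uniform data weights
    stumps, alphas = [], []
    for _ in range(max_rounds):
        stump, err = train_stump(X, y, w)        # steps 1-2
        alpha = classifier_weight(err)
        stumps.append(stump)
        alphas.append(alpha)
        # step 3: weighted vote of all weak classifiers so far
        votes = sum(a * stump_predict(X, *s) for a, s in zip(alphas, stumps))
        if np.mean(np.sign(votes) != y) <= target_error:
            break                                # final error below threshold
        w = update_weights(w, alpha, y, stump_predict(X, *stump))  # W(i+1)
    return stumps, alphas
```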

Illustration of AdaBoost weighted voting results

With regard to the final weighted vote, here are a few examples:

For example, with a one-dimensional feature, suppose three iterations have been run and we know each iteration's decision point and each weak classifier's say (its weight); let us see how the weighted vote works.

As shown in the figure, three decision points are obtained after three iterations.

The leftmost decision point puts values less than (or equal to) 7 into the +1 class and values greater than 7 into the -1 class; this classifier's weight is 0.5.

The middle decision point puts values greater than (or equal to) 13 into the +1 class and values less than 13 into the -1 class; its weight is 0.3.

The rightmost decision point puts values less than (or equal to) 19 into the +1 class and values greater than 19 into the -1 class; its weight is 0.4.

For the leftmost weak classifier, its vote is +0.5 for the region less than (or equal to) 7 and -0.5 for the region greater than 7. Similarly, the middle classifier's vote is +0.3 for values greater than (or equal to) 13 and -0.3 for values less than 13. The rightmost classifier's vote is +0.4 for values less than (or equal to) 19 and -0.4 for values greater than 19, as shown in the following figure:

Summing the votes gives:

Finally, applying the sign function to the sum gives the final classification result.
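For concreteness, here is the weighted vote of these three stumps evaluated at one sample point per region (boundary handling follows the "less than or equal to" conventions stated above):

```python
def sign(x):
    return 1 if x > 0 else -1

def vote(x):
    s1 = 0.5 if x <= 7 else -0.5    # leftmost stump,  weight 0.5
    s2 = 0.3 if x >= 13 else -0.3   # middle stump,    weight 0.3
    s3 = 0.4 if x <= 19 else -0.4   # rightmost stump, weight 0.4
    return s1 + s2 + s3

for x in (5, 10, 15, 25):           # one sample point per region
    print(x, vote(x), sign(vote(x)))
# x <= 7: +0.6 -> +1;  7 < x < 13: -0.4 -> -1
# 13 <= x <= 19: +0.2 -> +1;  x > 19: -0.6 -> -1
```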


For a more intuitive view, let's look at a more complex example. The same idea applies in two dimensions, and there happens to be an example to analyze. The original data is distributed as shown in the following figure:

An AdaBoost classifier tries to separate the two classes of data; running the program shows the decision points, as in the following figure:

From this view the classes appear to be separated, but what are the specific parameters? Looking at the program's output, we can read off the decision points and the weak classifiers' weights, annotated in the figure as follows:

The figure is divided into six sub-regions, and the corresponding category for each region is:

No. 1: sign(-0.998277 + 0.874600 - 0.608198) = -1

No. 2: sign(+0.998277 + 0.874600 - 0.608198) = +1

No. 3: sign(+0.998277 + 0.874600 + 0.608198) = +1

No. 4: sign(-0.998277 - 0.874600 - 0.608198) = -1

No. 5: sign(+0.998277 - 0.874600 - 0.608198) = -1

No. 6: sign(+0.998277 - 0.874600 + 0.608198) = +1

Here sign(x) is the sign function: it returns +1 for a positive number and -1 for a negative number.
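These six sums can be checked mechanically; the sign pattern contributed by the three weak classifiers in each region below is read off the annotated figure:

```python
alphas = (0.998277, 0.874600, 0.608198)

# Sign contributed by each of the three weak classifiers in regions 1-6.
regions = {
    1: (-1, +1, -1),
    2: (+1, +1, -1),
    3: (+1, +1, +1),
    4: (-1, -1, -1),
    5: (+1, -1, -1),
    6: (+1, -1, +1),
}

for r, signs in regions.items():
    total = sum(s * a for s, a in zip(signs, alphas))
    print(f"No. {r}: sign({total:+.6f}) = {1 if total > 0 else -1:+d}")
```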

Finally, the results are as follows:

Through these two examples, I believe you now understand what happens when the AdaBoost algorithm performs weighted voting.

So much has been said, and so many examples given, so that you can understand the basic principles of AdaBoost from the details. Understanding the relationship between AdaBoost's two kinds of weights is the key to understanding the AdaBoost algorithm.

The above is the principle analysis of AdaBoost. Have you learned something from it? If you want to learn more skills or enrich your knowledge, you are welcome to follow the industry information channel.
