
How to understand Stacking and Blending

2025-01-18 Update From: SLTechnology News&Howtos


This article explains how to understand Stacking and Blending. The editor finds it very practical and shares it here; I hope you get something out of it after reading.

I. Introduction to the idea of Stacking

1. Stacking is a layered (hierarchical) fusion model. For example, to fuse three GBDT models trained on different feature sets, we take the three GBDTs as base models and train a secondary learner (usually a linear model such as LR) on top of them. The secondary learner takes the base models' answers as its input, i.e., it learns how to weight and combine the predictions produced by the base models.

2. The figure below gives a simple example: A and B are base learners, while C, D and E are secondary learners that re-combine the answers; the secondary learners work only on the answers provided by the underlying models.
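As a concrete illustration of this layered structure, here is a minimal sketch, assuming scikit-learn is available, that mirrors the GBDT + LR example: three gradient-boosting base learners feed a logistic-regression secondary learner. The synthetic data set and the particular model settings are assumptions for illustration only, not part of the original example.

```python
# A minimal sketch of the stacking idea, assuming scikit-learn.
# Three gradient-boosting base learners (standing in for the "three GBDTs
# trained on different features") feed a logistic-regression meta learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base_learners = [
    ("gbdt1", GradientBoostingClassifier(n_estimators=50, random_state=1)),
    ("gbdt2", GradientBoostingClassifier(n_estimators=100, random_state=2)),
    ("gbdt3", GradientBoostingClassifier(max_depth=5, random_state=3)),
]

# The meta learner (LR) is trained on the base learners' out-of-fold predictions.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,  # 5-fold scheme, as described in the next section
)
stack.fit(X_train, y_train)
print("stacking accuracy:", stack.score(X_test, y_test))
```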

II. Interpretation of the Stacking process

The main idea of Stacking is to train a model that learns from the base learners' predictions. The figure below shows the 5-fold process by which one base model generates predictions for the entire data set; the secondary learner is then retrained on these predictions. The procedure for a single base model is as follows:

* First split all the data into a training set and a test set (say the training set has 10,000 rows and the test set has 2,500 rows). Then apply 5-fold cross-validation on the training set: in each fold, 8,000 rows are used to fit the model and the remaining 2,000 rows form the validation fold (orange in the figure).

* Each fold trains a model on the 8,000 blue rows, uses it to predict the 2,000-row validation fold, and also predicts the 2,500-row test set. After the five folds we therefore have 5 × 2,000 validation predictions (i.e., one out-of-fold prediction for every training row, orange in the figure) and 5 × 2,500 test-set predictions.

* The validation-fold predictions are then concatenated into a column of length 10,000, denoted A1, while the five sets of 2,500 test-set predictions are (weighted-)averaged into a single column of length 2,500, denoted B1.

* This gives the prediction results A1 and B1 of one base model on the data set; with three base models we obtain six matrices: A1, A2, A3 and B1, B2, B3.

* We then stack A1, A2 and A3 column-wise into a 10,000 × 3 matrix used as training data, stack B1, B2 and B3 into a 2,500 × 3 matrix used as test data, and retrain the second-level learner on these data.

* In this retraining, each base model's predictions serve as features (three features here). The secondary learner learns weights w for the base learners' predictions so that the final prediction is as accurate as possible (see the code sketch below).
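As a concrete reading of the steps above, here is a minimal sketch, assuming scikit-learn and the row counts used in the walkthrough (10,000 training rows, 2,500 test rows), of how one base model's out-of-fold column A1 and averaged test column B1 are produced; the synthetic data and the GBDT model are illustrative assumptions.

```python
# A minimal sketch of the 5-fold out-of-fold procedure described above,
# using the row counts from the walkthrough (10,000 train / 2,500 test).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold

X_train, y_train = make_classification(n_samples=10000, n_features=20, random_state=0)
X_test, _ = make_classification(n_samples=2500, n_features=20, random_state=1)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
A1 = np.zeros(len(X_train))               # out-of-fold predictions, 10,000 rows
test_preds = np.zeros((5, len(X_test)))   # 5 x 2,500 test-set predictions

for i, (fit_idx, val_idx) in enumerate(kf.split(X_train)):
    model = GradientBoostingClassifier(random_state=0)
    model.fit(X_train[fit_idx], y_train[fit_idx])               # fit on 8,000 rows
    A1[val_idx] = model.predict_proba(X_train[val_idx])[:, 1]   # predict 2,000 rows
    test_preds[i] = model.predict_proba(X_test)[:, 1]           # predict 2,500 rows

B1 = test_preds.mean(axis=0)  # average the five test predictions into one column

# Repeating this for three base models gives A1..A3 and B1..B3, which are
# stacked column-wise into the second-level learner's train and test features.
```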

That is the idea of Stacking. A Stacking ensemble also requires the base learners to be as independent (diverse) of each other as possible while achieving comparable performance.

III. Stacking characteristics

With stacking you can combine more than a thousand models, and training such an ensemble can take dozens of hours. Still, these monstrous ensemble methods have their uses:

(1) It can help you beat the current state-of-the-art algorithms.

(2) The knowledge of the ensemble can be transferred (distilled) into a simple classifier.

(3) An automated large-scale ensembling strategy can effectively resist over-fitting by adding regularization terms, and it does not require much parameter tuning or feature selection. So, in principle, stacking is well suited to lazy people.

(4) It is one of the best and most efficient ways to improve the performance of machine learning, akin to ensemble learning performed by humans.

IV. Comparison between Stacking and Blending

1. Blending is very similar to Stacking, but simpler. The differences between the two are:

* Blending sets aside a holdout set (e.g., 10% of the training data) and makes second-level predictions only on that holdout set; the base models are trained on the disjoint remaining data, and their outputs are combined by a (weighted) average or a meta learner. The implementation is simple, but less of the training data is used (see the sketch after this comparison).

2. The advantage of blending is that it is simpler than stacking and avoids data leakage (information leakage, for example when globally computed statistical features are used while training on part of the data, producing deceptively good results). The generalizers and the stacker use different data, and other models can be added to the blender at any time.

3. The disadvantage is that blending uses only part of the data set as a holdout set for validation, while stacking uses multi-fold cross-validation, which is more robust than a single holdout set.

4. Both methods are good. Depending on your preference, you can partly Blend and partly Stack.
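To make the contrast concrete, here is a minimal blending sketch under assumptions similar to the stacking code above: a single 10% holdout set replaces the 5-fold loop, and the meta learner (the "blender") is fitted only on the base models' holdout predictions. The data and model choices are illustrative assumptions.

```python
# A minimal blending sketch, assuming scikit-learn and a synthetic data set.
# A single 10% holdout set replaces the 5-fold cross-validation loop used
# in stacking: base models are fitted on the 90% split, and the meta learner
# is fitted only on their predictions for the 10% holdout.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10000, n_features=20, random_state=0)
X_fit, X_hold, y_fit, y_hold = train_test_split(X, y, test_size=0.1, random_state=0)

# Train the base models on the 90% split only.
bases = [GradientBoostingClassifier(random_state=1),
         RandomForestClassifier(random_state=2)]
for m in bases:
    m.fit(X_fit, y_fit)

# The blender sees only the base models' holdout predictions, so no row is
# used both to fit a base model and to fit the blender.
hold_features = np.column_stack([m.predict_proba(X_hold)[:, 1] for m in bases])
blender = LogisticRegression().fit(hold_features, y_hold)

# At prediction time, new rows flow through the base models first.
new_features = np.column_stack([m.predict_proba(X_hold[:5])[:, 1] for m in bases])
print(blender.predict(new_features))
```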

The above is how to understand Stacking and Blending. The editor believes these are points you may encounter or use in your daily work, and hopes you learned something from this article.
