
How to Use the Blending Algorithm in Python

2025-04-07 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

This article introduces how to use the Blending algorithm in Python. The content is detailed but easy to follow, and the steps are simple and quick to run. Let's take a look.

I. Foreword

General Machine Learning: Learning a hypothesis from training data.

Ensemble approach: Constructing a set of hypotheses and combining them. Ensemble learning is a machine learning paradigm in which multiple learners are trained to solve the same problem.

Ensemble methods are classified as:

Bagging (parallel training): Random Forest

Boosting: AdaBoost; GBDT; XGBoost

Stacking

Blending

Alternatively, they can be classified into serial and parallel ensemble methods:

1. Serial methods: exploit the dependence between base models, improving performance by giving larger weights to the samples that earlier models misclassified.

2. Parallel methods: exploit the independence between base models, so that averaging their predictions can reduce the error considerably.
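The benefit of averaging independent models can be checked numerically. The following is only a minimal sketch with synthetic numbers rather than real classifiers: each hypothetical base model is assumed to return the true value plus independent Gaussian noise, and the names `noise_std`, `n_models`, and `n_trials` are illustrative, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 1.0   # quantity every hypothetical base model tries to estimate
noise_std = 0.5    # assumed standard deviation of each model's independent error
n_models = 25      # T, the number of base models being averaged
n_trials = 20000   # number of repeated experiments

# Each row is one experiment; each column is one base model's noisy estimate.
estimates = true_value + rng.normal(0.0, noise_std, size=(n_trials, n_models))

# Mean squared error of one model versus the average of all T models.
single_mse = np.mean((estimates[:, 0] - true_value) ** 2)
averaged_mse = np.mean((estimates.mean(axis=1) - true_value) ** 2)

print("MSE of a single model:", single_mse)
print("MSE of the averaged ensemble:", averaged_mse)
```

Under this independence assumption the averaged error should shrink by roughly a factor of T. Real base models are usually correlated, so the gain in practice is smaller.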

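II. Introduction to Blending

Blending splits the original training data into a training set and a validation set, and then builds a new training set and a new test set from the base models' predictions.

The training data is divided into two parts: one part is used to train the base models, and the other part is predicted by the trained base models, with those predictions used to train the meta-model.

The test data is also predicted by the base models to form the new test data. Finally, the meta-model predicts on the new test data. The Blending framework looks like this:

[Figure: the Blending framework]

Note: Blending is based on stacking; the difference lies in how the data is divided, with Blending using a single hold-out validation split.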
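The difference in how the data is divided can be illustrated with a short sketch. It assumes that the stacking side uses out-of-fold predictions (obtained here with scikit-learn's `cross_val_predict`) and that a single `LogisticRegression` stands in for the base models; both choices are illustrative and not taken from the article.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split

X, y = make_blobs(n_samples=1000, centers=2, random_state=1)
base_model = LogisticRegression()

# Stacking-style: out-of-fold predictions, so every training row gets a meta-feature.
stack_feature = cross_val_predict(
    base_model, X, y, cv=5, method="predict_proba")[:, 1]

# Blending-style: fit on one part, predict only the held-out part.
X_fit, X_hold, y_fit, y_hold = train_test_split(
    X, y, test_size=0.3, random_state=1)
blend_feature = base_model.fit(X_fit, y_fit).predict_proba(X_hold)[:, 1]

print("stacking meta-feature rows:", stack_feature.shape[0])
print("blending meta-feature rows:", blend_feature.shape[0])
```

In this sketch the blending meta-model sees fewer rows than the stacking one, but each base model is fitted only once, which makes the procedure simpler and cheaper.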

III. Blending process

Step 1: Divide the raw training data into a training set and a validation set.

Step 2: Train T different base models on the training set.

Step 3: Use the T base models to predict the validation set, and use the results as the new training data.

Step 4: Train a meta-model on the new training data.

Step 5: Use the T base models to predict the test data, and use the results as the new test data.

Step 6: Use the meta-model to predict the new test data and obtain the final result.
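The six steps above can be sketched as one function. This is a minimal sketch for binary classification, not the article's exact code: `blend_predict` is a hypothetical helper, and the base models and the `LogisticRegression` meta-model are chosen only to keep the example quick to run.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def blend_predict(base_models, meta_model, X_train_full, y_train_full, X_test,
                  val_size=0.3, random_state=0):
    """Hypothetical helper: run the six blending steps, return test predictions."""
    # Step 1: split the raw training data into training and validation sets.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X_train_full, y_train_full, test_size=val_size, random_state=random_state)

    new_train = np.zeros((X_val.shape[0], len(base_models)))
    new_test = np.zeros((X_test.shape[0], len(base_models)))
    for j, model in enumerate(base_models):
        model.fit(X_tr, y_tr)                               # Step 2
        new_train[:, j] = model.predict_proba(X_val)[:, 1]  # Step 3
        new_test[:, j] = model.predict_proba(X_test)[:, 1]  # Step 5

    meta_model.fit(new_train, y_val)                        # Step 4
    return meta_model.predict(new_test)                     # Step 6


X, y = make_blobs(n_samples=1000, centers=2, random_state=1)
X_train_full, X_test, y_train_full, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1)

base_models = [DecisionTreeClassifier(max_depth=3, random_state=0),
               KNeighborsClassifier(),
               LogisticRegression()]
y_pred = blend_predict(base_models, LogisticRegression(),
                       X_train_full, y_train_full, X_test)
print("blended test accuracy:", np.mean(y_pred == y_test))
```

Taking column 1 of `predict_proba` assumes exactly two classes; a multi-class version would need to keep one column per class for each base model.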

IV. Case study

Load the required libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use("ggplot")
# Jupyter notebook magic; omit this line when running as a plain script
%matplotlib inline
import seaborn as sns

Create the data

from sklearn import datasets
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
data, target = make_blobs(n_samples=10000, centers=2, random_state=1, cluster_std=1.0)
## Create training and test sets
X_train1, X_test, y_train1, y_test = train_test_split(data, target, test_size=0.2, random_state=1)
## Create training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train1, y_train1, test_size=0.3, random_state=1)
print("The shape of training X:", X_train.shape)
print("The shape of training y:", y_train.shape)
print("The shape of test X:", X_test.shape)
print("The shape of test y:", y_test.shape)
print("The shape of validation X:", X_val.shape)
print("The shape of validation y:", y_val.shape)
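The sizes produced by the two successive splits can be checked without generating the data. The sketch below uses an index array as a stand-in for the 10,000 samples; the variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

indices = np.arange(10000)  # stand-in for the 10,000 generated samples

# Same two splits as above: 20% test, then 30% of the remainder as validation.
train_full_idx, test_idx = train_test_split(indices, test_size=0.2, random_state=1)
train_idx, val_idx = train_test_split(train_full_idx, test_size=0.3, random_state=1)

print(len(train_idx), len(val_idx), len(test_idx))  # 5600 2400 2000
```

So the base models are trained on 56% of the data, the meta-model on 24%, and the remaining 20% is held back for testing.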

Set up the first-level classifiers

from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
clfs = [SVC(probability=True),
        RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
        KNeighborsClassifier()]

Set up the second-level classifier

from sklearn.linear_model import LinearRegression
lr = LinearRegression()

Train the first layer

# One column per base model: predicted probability of class 1
val_features = np.zeros((X_val.shape[0], len(clfs)))
test_features = np.zeros((X_test.shape[0], len(clfs)))
for i, clf in enumerate(clfs):
    clf.fit(X_train, y_train)
    val_feature = clf.predict_proba(X_val)[:, 1]
    test_feature = clf.predict_proba(X_test)[:, 1]
    val_features[:, i] = val_feature
    test_features[:, i] = test_feature
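The `[:, 1]` indexing in the first layer relies on how `predict_proba` lays out its columns. The following is a small sketch on a toy dataset, with a single `KNeighborsClassifier` standing in for the three classifiers above.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

X, y = make_blobs(n_samples=200, centers=2, random_state=1)
clf = KNeighborsClassifier().fit(X, y)

proba = clf.predict_proba(X[:5])
print(clf.classes_)        # column order of predict_proba follows classes_
print(proba.shape)         # one row per sample, one column per class
print(proba.sum(axis=1))   # each row sums to 1

positive_proba = proba[:, 1]  # probability of clf.classes_[1]
```

Because the two columns sum to 1 in the binary case, keeping only column 1 loses no information, which is why each base model contributes a single meta-feature.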

Train the second layer

lr.fit(val_features, y_val)
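Because `LinearRegression` is a regressor, `lr.predict` returns continuous scores rather than class labels. One possible way to turn them into labels is sketched below, assuming a cut-off of 0.5. The meta-features here are synthetic stand-ins produced by the hypothetical `fake_meta_features` helper, not real model output.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)


def fake_meta_features(labels, n_models=3):
    """Synthetic stand-in for base-model probabilities (not real model output)."""
    noise = rng.normal(0.0, 0.2, size=(labels.shape[0], n_models))
    return np.clip(labels[:, None] + noise, 0.0, 1.0)


y_val = rng.integers(0, 2, size=300)
y_test = rng.integers(0, 2, size=200)
val_features = fake_meta_features(y_val)
test_features = fake_meta_features(y_test)

lr = LinearRegression()
lr.fit(val_features, y_val)

scores = lr.predict(test_features)     # continuous values, not class labels
labels = (scores >= 0.5).astype(int)   # assumed cut-off of 0.5
print("accuracy:", accuracy_score(y_test, labels))
```

A classifier such as `LogisticRegression` could serve as the meta-model instead, in which case `predict` returns labels directly and no threshold is needed.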

Output the prediction results

from sklearn.model_selection import cross_val_score
# cross_val_score refits a clone of lr on folds of the new test data;
# for a regressor such as LinearRegression the default score is R^2.
cross_val_score(lr, test_features, y_test, cv=5)

That concludes this introduction to using the Blending algorithm in Python. Thank you for reading!
