2025-04-07 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article introduces how to use the Blending algorithm in Python. The explanation is detailed yet easy to follow, the steps are simple and quick to carry out, and the material should serve as a useful reference. Let's take a look.
I. Foreword
General machine learning: learn a single hypothesis from the training data.
Ensemble approach: construct a set of hypotheses and combine them. Ensemble learning is a machine learning paradigm in which multiple learners are trained to solve the same problem.
Ensemble methods are classified as:
Bagging (parallel training): Random Forest
Boosting: AdaBoost; GBDT; XGBoost
Stacking
Blending
Alternatively, they can be classified into serial and parallel ensemble methods:
1. Serial model: exploits the dependence between the base models, improving performance by giving larger weights to the samples that earlier models misclassified.
2. Parallel model: exploits the independence of the base models; averaging their outputs can substantially reduce the error.
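The effect of averaging independent base models can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: each "model" is the true value plus independent unit-variance Gaussian noise, and the names `n_models` and `n_trials` are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 1.0      # quantity every base model tries to estimate
n_models = 10         # number of independent base models (assumed)
n_trials = 10_000     # repeated experiments used to estimate the error

# Each base model = truth + independent zero-mean noise with unit variance.
predictions = true_value + rng.normal(0.0, 1.0, size=(n_trials, n_models))

# Mean squared error of one model versus the average of all models.
single_mse = np.mean((predictions[:, 0] - true_value) ** 2)
ensemble_mse = np.mean((predictions.mean(axis=1) - true_value) ** 2)

print(f"single model MSE:   {single_mse:.3f}")
print(f"averaged model MSE: {ensemble_mse:.3f}")
```

Under these assumptions the error of the average shrinks roughly in proportion to the number of models. Real base models are rarely fully independent, so the gain in practice is usually smaller.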
II. Introduction to Blending
Blending splits the training data into a training set and a validation set, and then builds a new training set and a new test set from the base models' predictions.
The training data is divided into two parts: one part is used to train the base models, and the other part, once the base models have predicted on it, is used to train the meta-model.
The test data is also passed through the base models to form the new test data. Finally, the meta-model predicts on the new test data. The Blending framework looks like this:
(Figure: the Blending framework.)
Note: Blending is a variant of stacking that relies on a holdout split of the data.
III. Blending process
Step 1: Divide the raw training data into training set and validation set.
Step 2: Train T different base models on the training set.
Step 3: Use the T base models to predict on the validation set, and use the results as the new training data.
Step 4: Train a meta-model on the new training data.
Step 5: Use the T base models to predict on the test data, and use the results as the new test data.
Step 6: Use the meta-model to predict on the new test data and obtain the final result.
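The six steps can be condensed into one function. The following is a minimal sketch for binary classification, not a definitive implementation: the helper name `blend_fit_predict`, its parameters, the choice of base models, and the use of `LogisticRegression` as the meta-model are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def blend_fit_predict(base_models, meta_model, X, y, X_test, val_size=0.3, seed=0):
    """Hypothetical helper that follows Steps 1-6 for binary classification."""
    # Step 1: split the raw training data into training and validation sets.
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=val_size, random_state=seed
    )

    val_feats = np.zeros((X_val.shape[0], len(base_models)))
    test_feats = np.zeros((X_test.shape[0], len(base_models)))
    for j, model in enumerate(base_models):
        model.fit(X_tr, y_tr)                                  # Step 2
        val_feats[:, j] = model.predict_proba(X_val)[:, 1]     # Step 3
        test_feats[:, j] = model.predict_proba(X_test)[:, 1]   # Step 5

    meta_model.fit(val_feats, y_val)                           # Step 4
    return meta_model.predict(test_feats)                      # Step 6


# Toy data, assumed for this sketch only.
X, y = make_blobs(n_samples=1000, centers=2, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1
)

y_pred = blend_fit_predict(
    [DecisionTreeClassifier(random_state=0), KNeighborsClassifier()],
    LogisticRegression(),
    X_train, y_train, X_test,
)
accuracy = np.mean(y_pred == y_test)
print("blended test accuracy:", accuracy)
```

Note that the meta-model only ever sees predictions made on data the base models were not trained on, which is the point of holding out the validation set.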
IV. Cases
Load the required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use("ggplot")
# Jupyter notebook magic; remove the next line when running as a plain script
%matplotlib inline
import seaborn as sns
Create the data
from sklearn import datasets
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

data, target = make_blobs(n_samples=10000, centers=2, random_state=1, cluster_std=1.0)

# Create the training and test sets
X_train1, X_test, y_train1, y_test = train_test_split(data, target, test_size=0.2, random_state=1)

# Create the training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train1, y_train1, test_size=0.3, random_state=1)

print("The shape of training X:", X_train.shape)
print("The shape of training y:", y_train.shape)
print("The shape of test X:", X_test.shape)
print("The shape of test y:", y_test.shape)
print("The shape of validation X:", X_val.shape)
print("The shape of validation y:", y_val.shape)
Set up the first-level classifiers
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

clfs = [
    SVC(probability=True),
    RandomForestClassifier(n_estimators=5, n_jobs=-1, criterion='gini'),
    KNeighborsClassifier(),
]
Set up the second-level classifier
from sklearn.linear_model import LinearRegression

lr = LinearRegression()
The first layer
# One column per base model: predicted probability of class 1
val_features = np.zeros((X_val.shape[0], len(clfs)))
test_features = np.zeros((X_test.shape[0], len(clfs)))

for i, clf in enumerate(clfs):
    # Train each base model on the training set only
    clf.fit(X_train, y_train)
    val_feature = clf.predict_proba(X_val)[:, 1]
    test_feature = clf.predict_proba(X_test)[:, 1]
    val_features[:, i] = val_feature
    test_features[:, i] = test_feature
The second layer
lr.fit(val_features, y_val)
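Because the second-level model here is a LinearRegression, its predictions are real-valued scores rather than class labels. One possible way to read them as labels is to threshold at 0.5. The sketch below uses made-up stand-in data (`val_features_demo`, `y_val_demo`) rather than the arrays built above, and the 0.5 cutoff is an assumption, not something the article prescribes.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Stand-in for the first-layer outputs: three noisy probability columns that
# track the true label (hypothetical data, not the article's val_features).
y_val_demo = rng.integers(0, 2, size=200)
val_features_demo = np.clip(
    y_val_demo[:, None] + rng.normal(0.0, 0.2, size=(200, 3)), 0.0, 1.0
)

meta = LinearRegression().fit(val_features_demo, y_val_demo)

# LinearRegression returns real-valued scores, so cut them at 0.5 to get labels.
scores = meta.predict(val_features_demo)
labels = (scores >= 0.5).astype(int)

print("score range:", scores.min(), scores.max())
print("accuracy on the demo data:", np.mean(labels == y_val_demo))
```

A classifier such as LogisticRegression could be swapped in as the meta-model to obtain labels and probabilities directly.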
Output the predicted results
lr.fit(val_features, y_val)

from sklearn.model_selection import cross_val_score

# cross_val_score refits a clone of lr on each fold of the test features and
# reports the regressor's default score (R^2) for every fold
cross_val_score(lr, test_features, y_test, cv=5)

That concludes this introduction to using the Blending algorithm in Python. Thank you for reading! Hopefully you now have a working understanding of how to use the Blending algorithm in Python.