This article explains the pipeline processing mechanism in machine learning. The approach introduced here is simple, fast, and practical, so interested readers may wish to follow along and learn what the pipeline processing mechanism is.
Why Pipeline?
Have you ever encountered this situation: in a machine learning project, the various data preprocessing operations applied to the training set, such as feature extraction, standardization, and principal component analysis, all have to be repeated on the test set.
To avoid such repetitive operations, machine learning provides the pipeline mechanism.
According to the explanation on sklearn's website, pipeline has the following benefits (point 3 is illustrated by the sketch after this list):
1. Convenience and encapsulation: call fit and predict once to train and predict with every algorithm model in the pipeline.
2. Joint parameter selection: a grid search can select the parameters of all estimators in the pipeline at once.
3. Safety: the transformers and the predictor are trained on the same samples, and the pipeline helps prevent statistics from the test data from leaking into the cross-validated model.
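To make point 3 concrete, here is a minimal sketch (not from the original article; the Iris data is used purely for illustration): by cross-validating the whole pipeline, the scaler is re-fitted on each training fold only, so statistics of the held-out fold never leak into training.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
pipe = Pipeline([('sc', StandardScaler()), ('svc', SVC())])
# StandardScaler.fit runs only on each fold's training split, never on the held-out fold
print(cross_val_score(pipe, iris.data, iris.target, cv=5))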
The principle of Pipeline
Pipeline connects many algorithm models together to form a typical machine learning workflow.
The pipeline processing mechanism is like stuffing all the models into one tube: the data are processed by each model in turn to produce the final classification result.
Note (a minimal sketch of the transformer contract follows this note):
Estimator: all machine learning algorithm models are called estimators.
Transformer: a transformer converts its input data; the output of a transformer can be fed into another transformer or into an estimator as input.
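For illustration only (not from the original article), here is a minimal custom transformer showing that contract: fit learns statistics from the data and transform applies them, so its output can be fed to the next transformer or to the estimator.

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# hypothetical example transformer: centers each column on its mean
class MeanCenterer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        self.mean_ = np.asarray(X).mean(axis=0)  # learn the column means
        return self
    def transform(self, X):
        return np.asarray(X) - self.mean_  # apply the learned statistics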
An example of the steps of a complete Pipeline (a sketch assembling these five steps follows the list):
1. Preprocess the data, e.g. handle missing values.
2. Standardize the data.
3. Reduce dimensionality.
4. Apply a feature selection algorithm.
5. Apply a classification, prediction, or clustering algorithm (the estimator).
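As a sketch of what such a five-step pipeline could look like, here is one plausible sklearn class per step (the specific class choices are illustrative assumptions, not prescribed by the article):

from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer           # 1. missing-value handling
from sklearn.preprocessing import StandardScaler   # 2. standardization
from sklearn.decomposition import PCA              # 3. dimensionality reduction
from sklearn.feature_selection import SelectKBest  # 4. feature selection
from sklearn.svm import SVC                        # 5. final estimator

pipe = Pipeline([
    ('impute', SimpleImputer(strategy='mean')),
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=3)),
    ('select', SelectKBest(k=2)),
    ('clf', SVC()),
])
iris = load_iris()
pipe.fit(iris.data, iris.target)  # the first four steps fit and transform in turn; SVC fits last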
In fact, when the fit method of a pipeline is called, the first n-1 transformers process the features in turn, and the result is passed to the final estimator for training. The pipeline then inherits all the methods of that last estimator.
The usage of Pipeline
Call signature:
sklearn.pipeline.Pipeline(steps, memory=None, verbose=False)
Parameter details:
steps: a list of (key, value) pairs, where key is the name you give the step and value is an estimator object.
memory: used to cache the fitted transformers; default None.
verbose: whether to print the time elapsed while fitting each step; default False.
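A small sketch of the memory parameter (the cache directory here is an illustrative choice): fitted transformers are cached on disk, which can save time when the same pipeline is fitted repeatedly, e.g. inside a grid search.

from tempfile import mkdtemp
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cache_dir = mkdtemp()  # any writable directory works
pipe = Pipeline([('sc', StandardScaler()), ('svc', SVC())], memory=cache_dir)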
The methods of Pipeline
Calling a method on a Pipeline executes the corresponding method of its learners; if a learner lacks that method, an error is raised. Suppose there are n learners in the Pipeline:
transform: executes the transform method of each learner in turn.
fit: executes fit and transform on the first n-1 learners in turn, then the nth learner (the last one) executes its fit method.
predict: executes the predict method of the nth learner.
score: executes the score method of the nth learner.
set_params: sets the parameters of the learners, addressed as stepname__parameter (see the sketch after this list).
get_params: gets the parameters of the learners.
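A minimal sketch of set_params and get_params (reusing the Iris pipeline defined in the next section): a step's parameters are addressed as stepname__parameter.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

pipe = Pipeline([('sc', StandardScaler()), ('pca', PCA()), ('svc', SVC())])
pipe.set_params(svc__C=10.0, pca__n_components=2)  # set C of the 'svc' step and n_components of the 'pca' step
print(pipe.get_params()['svc__C'])  # 10.0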
The wonderful use of Pipeline: modular Feature Transform
Take the classification task on the Iris dataset as an example:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

iris = load_iris()
pipe = Pipeline([('sc', StandardScaler()), ('pca', PCA()), ('svc', SVC())])
# in ('sc', StandardScaler()), 'sc' is a custom transformer name, and StandardScaler() is the transformer that performs the standardization
pipe.fit(iris.data, iris.target)
First, StandardScaler standardizes each column of the dataset (transformer).
Then PCA performs principal component analysis for feature dimensionality reduction (transformer).
Finally, the SVC model is applied (estimator).
Output:
Pipeline(memory=None,
         steps=[('sc', StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('pca', PCA(copy=True, iterated_power='auto', n_components=None,
                            random_state=None, svd_solver='auto', tol=0.0,
                            whiten=False)),
                ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
                            decision_function_shape='ovr', degree=3,
                            gamma='auto_deprecated', kernel='rbf', max_iter=-1,
                            probability=False, random_state=None, shrinking=True,
                            tol=0.001, verbose=False))],
         verbose=False)
The result of training is a model that can be used for prediction directly. When predicting, the data pass through every transform step first, so there is no need to write extra code to preprocess the data the model predicts on. The accuracy of the model on the training set X can be obtained via pipe.score(X, y). For example, pipe.predict(iris.data) outputs:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
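As a small follow-up sketch (assuming pipe was fitted as above), the individual steps of a fitted pipeline can be inspected through named_steps:

print(pipe.named_steps['pca'].explained_variance_ratio_)  # variance explained by each principal component
print(pipe.named_steps['svc'].support_.shape)             # shape of the support-vector index array, i.e. how many support vectors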
The wonderful use of Pipeline: automated Grid Search
Pipeline can be combined with GridSearchCV to select parameters:
from sklearn.datasets import fetch_20newsgroups
import numpy as np

news = fetch_20newsgroups(subset='all')

# train_test_split now lives in sklearn.model_selection (the sklearn.cross_validation module used in the original has been removed)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(news.data[:3000], news.target[:3000], test_size=0.25, random_state=33)

from sklearn.feature_extraction.text import TfidfVectorizer
vec = TfidfVectorizer()
X_count_train = vec.fit_transform(X_train)
X_count_test = vec.transform(X_test)

from sklearn.svm import SVC
from sklearn.pipeline import Pipeline

# use a pipeline to simplify the system construction process, connecting text feature extraction with the classifier model
clf = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')), ('svc', SVC())
])
# note: the pipeline handles the feature processing and the SVC model training, directly yielding the trained classifier clf
parameters = {
    'svc__gamma': np.logspace(-2, 1, 4),
    'svc__C': np.logspace(-1, 1, 3),
    'vect__analyzer': ['word']
}

# n_jobs=-1 means use all of the computer's CPU cores
# GridSearchCV now lives in sklearn.model_selection (formerly sklearn.grid_search)
from sklearn.model_selection import GridSearchCV
gs = GridSearchCV(clf, parameters, verbose=2, refit=True, cv=3, n_jobs=-1)
gs.fit(X_train, y_train)
print(gs.best_params_, gs.best_score_)
print(gs.score(X_test, y_test))
Output
{'svc__C': 10.0, 'svc__gamma': 0.1, 'vect__analyzer': 'word'} 0.7906666666666666
0.8226666666666667
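As a small follow-up sketch (continuing the example above): with refit=True, the grid search re-fits the best parameter combination on the whole training set, and the resulting fitted pipeline is available as best_estimator_.

best_pipe = gs.best_estimator_         # a fitted Pipeline with the best found parameters
print(best_pipe.named_steps['svc'].C)  # the chosen C value (10.0 in the run above)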
Other uses of Pipeline
Pipeline has some other uses as well. Here is a brief introduction to the two most commonly used: make_pipeline and FeatureUnion.
sklearn.pipeline.make_pipeline(*steps, **kwargs)
The make_pipeline function is a shorthand for the Pipeline class: you only pass in the class instance of each step without naming them yourself, and the lowercased class name is automatically used as the step name.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB

make_pipeline(StandardScaler(), GaussianNB())

Output:
Pipeline(steps=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('gaussiannb', GaussianNB(priors=None))])

p = make_pipeline(StandardScaler(), GaussianNB())
p.steps

Output:
[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True)),
 ('gaussiannb', GaussianNB(priors=None))]
FeatureUnion
A FeatureUnion is also built from (key, value) pairs, with parameters set through set_params. The difference is that each of its steps is computed independently on the input, and FeatureUnion concatenates their results into a single array; there is no final estimator, so it exposes no estimator methods. FeatureUnion is very useful when the data need standardization, a logarithmic transform, or one-hot encoding to form multiple feature sets, from which important features are then selected.
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import FunctionTransformer
from numpy import log1p

step1 = ('Standar', StandardScaler())
step2 = ('ToLog', FunctionTransformer(log1p))
steps = FeatureUnion(transformer_list=[step1, step2])
steps.fit_transform(iris.data)
data = steps.fit_transform(iris.data)
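For illustration (a sketch under assumed choices, not from the original article), a FeatureUnion is often used as one step inside a Pipeline: the concatenated features feed a feature selector and a final classifier.

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, FunctionTransformer
from sklearn.feature_selection import SelectKBest
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from numpy import log1p

iris = load_iris()
union = FeatureUnion(transformer_list=[('Standar', StandardScaler()),
                                       ('ToLog', FunctionTransformer(log1p))])
model = Pipeline([('union', union), ('select', SelectKBest(k=4)), ('svc', SVC())])
model.fit(iris.data, iris.target)  # 4+4=8 concatenated features -> 4 selected -> SVC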
At this point, I believe you have a deeper understanding of what the pipeline processing mechanism is. You might as well try it out in practice.