This article explains the pipeline processing mechanism in machine learning. The approach introduced here is simple, fast, and practical, so interested readers may wish to follow along and learn what the pipeline processing mechanism is.
Why Pipeline?
Have you ever encountered this situation: in a machine learning project, the various data preprocessing operations applied to the training set, such as feature extraction, standardization, and principal component analysis, all have to be repeated on the test set.
To avoid such repetitive operations, machine learning provides the pipeline mechanism.
According to the explanation on sklearn's website, pipeline has the following benefits (point 3 is illustrated by the sketch after this list):
1. Convenience and encapsulation: call fit and predict once to train and predict with every algorithm model in the pipeline.
2. Joint parameter selection: a grid search can select the parameters of all estimators in the pipeline at once.
3. Safety: the transformers and the predictor are trained on the same samples, and the pipeline helps prevent statistics from the test data from leaking into the cross-validated model.
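To make point 3 concrete, here is a minimal sketch (not from the original article; the Iris data is used purely for illustration): by cross-validating the whole pipeline, the scaler is re-fitted on each training fold only, so statistics of the held-out fold never leak into training.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()
pipe = Pipeline([('sc', StandardScaler()), ('svc', SVC())])
# StandardScaler.fit runs only on each fold's training split, never on the held-out fold
print(cross_val_score(pipe, iris.data, iris.target, cv=5))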
The principle of Pipeline
Pipeline connects many algorithm models together to form a typical machine learning workflow.
The pipeline processing mechanism is like stuffing all the models into one tube: the data are processed by each model in turn to produce the final classification result.
Note (a minimal sketch of the transformer contract follows this note):
Estimator: all machine learning algorithm models are called estimators.
Transformer: a transformer converts its input data; the output of a transformer can be fed into another transformer or into an estimator as input.
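For illustration only (not from the original article), here is a minimal custom transformer showing that contract: fit learns statistics from the data and transform applies them, so its output can be fed to the next transformer or to the estimator.

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# hypothetical example transformer: centers each column on its mean
class MeanCenterer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        self.mean_ = np.asarray(X).mean(axis=0)  # learn the column means
        return self
    def transform(self, X):
        return np.asarray(X) - self.mean_  # apply the learned statistics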
An example of the steps of a complete Pipeline (a sketch assembling these five steps follows the list):
1. Preprocess the data, e.g. handle missing values.
2. Standardize the data.
3. Reduce dimensionality.
4. Apply a feature selection algorithm.
5. Apply a classification, prediction, or clustering algorithm (the estimator).
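As a sketch of what such a five-step pipeline could look like, here is one plausible sklearn class per step (the specific class choices are illustrative assumptions, not prescribed by the article):

from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer           # 1. missing-value handling
from sklearn.preprocessing import StandardScaler   # 2. standardization
from sklearn.decomposition import PCA              # 3. dimensionality reduction
from sklearn.feature_selection import SelectKBest  # 4. feature selection
from sklearn.svm import SVC                        # 5. final estimator

pipe = Pipeline([
    ('impute', SimpleImputer(strategy='mean')),
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=3)),
    ('select', SelectKBest(k=2)),
    ('clf', SVC()),
])
iris = load_iris()
pipe.fit(iris.data, iris.target)  # the first four steps fit and transform in turn; SVC fits last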
In fact, when the fit method of a pipeline is called, the first n-1 transformers process the features in turn, and the result is passed to the final estimator for training. The pipeline then inherits all the methods of that last estimator.
The usage of Pipeline
Call signature:
sklearn.pipeline.Pipeline(steps, memory=None, verbose=False)
Parameter details:
steps: a list of (key, value) pairs, where key is the name you give the step and value is an estimator object.
memory: used to cache the fitted transformers; default None.
verbose: whether to print the time elapsed while fitting each step; default False.
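A small sketch of the memory parameter (the cache directory here is an illustrative choice): fitted transformers are cached on disk, which can save time when the same pipeline is fitted repeatedly, e.g. inside a grid search.

from tempfile import mkdtemp
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

cache_dir = mkdtemp()  # any writable directory works
pipe = Pipeline([('sc', StandardScaler()), ('svc', SVC())], memory=cache_dir)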
The methods of Pipeline
Calling a method on a Pipeline executes the corresponding method of its learners; if a learner lacks that method, an error is raised. Suppose there are n learners in the Pipeline:
transform: executes the transform method of each learner in turn.
fit: executes fit and transform on the first n-1 learners in turn, then the nth learner (the last one) executes its fit method.
predict: executes the predict method of the nth learner.
score: executes the score method of the nth learner.
set_params: sets the parameters of the learners, addressed as stepname__parameter (see the sketch after this list).
get_params: gets the parameters of the learners.
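A minimal sketch of set_params and get_params (reusing the Iris pipeline defined in the next section): a step's parameters are addressed as stepname__parameter.

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.svm import SVC

pipe = Pipeline([('sc', StandardScaler()), ('pca', PCA()), ('svc', SVC())])
pipe.set_params(svc__C=10.0, pca__n_components=2)  # set C of the 'svc' step and n_components of the 'pca' step
print(pipe.get_params()['svc__C'])  # 10.0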
The wonderful use of Pipeline: modular Feature Transform
Take the classification task on the Iris dataset as an example:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

iris = load_iris()
pipe = Pipeline([('sc', StandardScaler()), ('pca', PCA()), ('svc', SVC())])
# in ('sc', StandardScaler()), 'sc' is a custom transformer name, and StandardScaler() is the transformer that performs the standardization
pipe.fit(iris.data, iris.target)
First, StandardScaler standardizes each column of the dataset (transformer).
Then PCA performs principal component analysis for feature dimensionality reduction (transformer).
Finally, the SVC model is applied (estimator).
Output:
Pipeline(memory=None,
         steps=[('sc', StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('pca', PCA(copy=True, iterated_power='auto', n_components=None,
                            random_state=None, svd_solver='auto', tol=0.0,
                            whiten=False)),
                ('svc', SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
                            decision_function_shape='ovr', degree=3,
                            gamma='auto_deprecated', kernel='rbf', max_iter=-1,
                            probability=False, random_state=None, shrinking=True,
                            tol=0.001, verbose=False))],
         verbose=False)
The result of training is a model that can be used for prediction directly. When predicting, the data pass through every transform step first, so there is no need to write extra code to preprocess the data the model predicts on. The accuracy of the model on the training set X can be obtained via pipe.score(X, y). For example, pipe.predict(iris.data) outputs:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
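As a small follow-up sketch (assuming pipe was fitted as above), the individual steps of a fitted pipeline can be inspected through named_steps:

print(pipe.named_steps['pca'].explained_variance_ratio_)  # variance explained by each principal component
print(pipe.named_steps['svc'].support_.shape)             # shape of the support-vector index array, i.e. how many support vectors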
The wonderful use of Pipeline: automated Grid Search
Pipeline can be combined with GridSearchCV to select parameters:
from sklearn.datasets import fetch_20newsgroups
import numpy as np

news = fetch_20newsgroups(subset='all')

# train_test_split now lives in sklearn.model_selection (the sklearn.cross_validation module used in the original has been removed)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(news.data[:3000], news.target[:3000], test_size=0.25, random_state=33)

from sklearn.feature_extraction.text import TfidfVectorizer
vec = TfidfVectorizer()
X_count_train = vec.fit_transform(X_train)
X_count_test = vec.transform(X_test)

from sklearn.svm import SVC
from sklearn.pipeline import Pipeline

# use a pipeline to simplify the system construction process, connecting text feature extraction with the classifier model
clf = Pipeline([
    ('vect', TfidfVectorizer(stop_words='english')), ('svc', SVC())
])
# note: the pipeline handles the feature processing and the SVC model training, directly yielding the trained classifier clf
parameters = {
    'svc__gamma': np.logspace(-2, 1, 4),
    'svc__C': np.logspace(-1, 1, 3),
    'vect__analyzer': ['word']
}

# n_jobs=-1 means use all of the computer's CPU cores
# GridSearchCV now lives in sklearn.model_selection (formerly sklearn.grid_search)
from sklearn.model_selection import GridSearchCV
gs = GridSearchCV(clf, parameters, verbose=2, refit=True, cv=3, n_jobs=-1)
gs.fit(X_train, y_train)
print(gs.best_params_, gs.best_score_)
print(gs.score(X_test, y_test))
Output
{'svc__C': 10.0, 'svc__gamma': 0.1, 'vect__analyzer': 'word'} 0.7906666666666666
0.8226666666666667
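As a small follow-up sketch (continuing the example above): with refit=True, the grid search re-fits the best parameter combination on the whole training set, and the resulting fitted pipeline is available as best_estimator_.

best_pipe = gs.best_estimator_         # a fitted Pipeline with the best found parameters
print(best_pipe.named_steps['svc'].C)  # the chosen C value (10.0 in the run above)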
Other uses of Pipeline
Pipeline has some other uses as well. Here is a brief introduction to the two most commonly used: make_pipeline and FeatureUnion.
sklearn.pipeline.make_pipeline(*steps, **kwargs)
The make_pipeline function is a shorthand for the Pipeline class: you only pass in the class instance of each step without naming them yourself, and the lowercased class name is automatically used as the step name.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.naive_bayes import GaussianNB

make_pipeline(StandardScaler(), GaussianNB())

Output:
Pipeline(steps=[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('gaussiannb', GaussianNB(priors=None))])

p = make_pipeline(StandardScaler(), GaussianNB())
p.steps

Output:
[('standardscaler', StandardScaler(copy=True, with_mean=True, with_std=True)),
 ('gaussiannb', GaussianNB(priors=None))]
FeatureUnion
A FeatureUnion is also built from (key, value) pairs, with parameters set through set_params. The difference is that each of its steps is computed independently on the input, and FeatureUnion concatenates their results into a single array; there is no final estimator, so it exposes no estimator methods. FeatureUnion is very useful when the data need standardization, a logarithmic transform, or one-hot encoding to form multiple feature sets, from which important features are then selected.
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import FunctionTransformer
from numpy import log1p

step1 = ('Standar', StandardScaler())
step2 = ('ToLog', FunctionTransformer(log1p))
steps = FeatureUnion(transformer_list=[step1, step2])
steps.fit_transform(iris.data)
data = steps.fit_transform(iris.data)
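For illustration (a sketch under assumed choices, not from the original article), a FeatureUnion is often used as one step inside a Pipeline: the concatenated features feed a feature selector and a final classifier.

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.preprocessing import StandardScaler, FunctionTransformer
from sklearn.feature_selection import SelectKBest
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from numpy import log1p

iris = load_iris()
union = FeatureUnion(transformer_list=[('Standar', StandardScaler()),
                                       ('ToLog', FunctionTransformer(log1p))])
model = Pipeline([('union', union), ('select', SelectKBest(k=4)), ('svc', SVC())])
model.fit(iris.data, iris.target)  # 4+4=8 concatenated features -> 4 selected -> SVC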
At this point, I believe you have a deeper understanding of what the pipeline processing mechanism is. You might as well try it out in practice.