This article shows how to spot-check regression algorithms in Python. The content is concise and easy to follow, and I hope you can take something away from the detailed walkthrough below.
12 Spot-check regression algorithms
12.1 Overview of algorithms
This chapter discusses the following algorithms.
Linear algorithms:
Linear regression
Ridge regression
LASSO regression
Elastic Net regression (a derived algorithm that combines the two penalties above)
Nonlinear algorithms:
K-nearest neighbors (KNN)
CART decision tree
SVM (support vector machine)
Aren't the last three usually classification methods? Why they are also discussed here will become clear from the concrete examples.
The problem used throughout is the Boston housing price dataset, in which all attributes are numeric. Each of the algorithms above is spot-checked with 10-fold cross-validation, and the mean squared error (MSE) is used to evaluate the results.
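As a side note, scikit-learn reports this metric through the 'neg_mean_squared_error' scorer, so the cross-validation scores come back negated. Below is a minimal sketch on synthetic stand-in data (the random data is illustrative, not the housing data) showing how the sign is flipped to recover the MSE.
# Sketch: 'neg_mean_squared_error' returns negated MSE values.
import numpy as np
from sklearn.model_selection import KFold, cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(7)
X = rng.rand(100, 13)                         # stand-in for the 13 input attributes
Y = X @ rng.rand(13) + 0.1 * rng.randn(100)   # stand-in target
kfold = KFold(n_splits=10, shuffle=True, random_state=7)
scores = cross_val_score(LinearRegression(), X, Y, cv=kfold,
                         scoring='neg_mean_squared_error')
print("MSE: %.4f" % -scores.mean())           # flip the sign to report MSE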
12.2 Linear machine learning algorithms
12.2.1 Linear regression
Linear regression assumes that the input variables follow a Gaussian distribution, that each feature is correlated with the output, and that the features are not strongly correlated with one another.
Linear regression fits the sample points with a line that reflects the underlying pattern of the data as closely as possible.
# Linear Regression
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression
filename = 'housing.csv'
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
dataframe = read_csv(filename, delim_whitespace=True, names=names)
array = dataframe.values
X = array[:, 0:13]
Y = array[:, 13]
# note: newer scikit-learn versions require shuffle=True whenever random_state is set on KFold
kfold = KFold(n_splits=10, random_state=7)
model = LinearRegression()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
# -34.7052559445
12.2.2 Ridge regression
What is Ridge regression? It is an extension of linear regression. Let us first go over a few basic concepts.
The basic routine of linear regression is as follows: first there must be a prediction model; second, a loss function is defined so that the task becomes an optimization problem; third, an algorithm is needed to compute the coefficients that make the loss function optimal (the error as small as possible).
The prediction model here is linear.
Defining the loss function. There are two approaches: 1) maximum likelihood estimation: assume the samples come from a certain distribution, and ask how to choose the parameters so that the probability of observing these samples is maximized, hence the name maximum likelihood; 2) least squares: compute the sum of squared differences between the model's predictions and the sample outputs, and ask how to minimize that sum.
These two formulations reach the same goal by different paths. How can that be proved? (A short sketch follows.) Let us now work out this problem and see which algorithms can be used to solve it.
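A brief sketch of the equivalence, under the Gaussian-noise assumption already stated for linear regression (the notation m for the number of samples and sigma for the noise standard deviation is introduced here for illustration):
% With y_i = \theta^T x_i + \epsilon_i and \epsilon_i \sim N(0, \sigma^2):
\log L(\theta) = -m \log\!\big(\sqrt{2\pi}\,\sigma\big)
  - \frac{1}{2\sigma^2} \sum_{i=1}^{m} \big(y_i - \theta^T x_i\big)^2
\quad\Longrightarrow\quad
\arg\max_{\theta} \log L(\theta) = \arg\min_{\theta} \sum_{i=1}^{m} \big(y_i - \theta^T x_i\big)^2
Since the first term does not depend on theta, maximizing the likelihood is exactly minimizing the sum of squared errors.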
The commonly used solvers are as follows (a small code sketch of 1) and 2) follows this list).
1) Gradient descent. The basic idea is to start from an initial theta vector and repeatedly update it: the new theta is the old theta minus the product of the step size and the direction of steepest descent, which is obtained by differentiating the loss with respect to each component of theta. The iteration continues until the result converges. Note that it may converge only to a local optimum, so the starting point matters.
2) Normal equation. Conceptually, the coefficients are obtained in closed form by setting the derivative to zero and inverting the matrix directly.
3) Gauss-Newton method. It works by applying a Taylor expansion.
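A minimal NumPy sketch of approaches 1) and 2) on synthetic data (the variable names, learning rate, and iteration count are illustrative assumptions, not from the article):
import numpy as np

rng = np.random.RandomState(7)
X = np.hstack([np.ones((100, 1)), rng.rand(100, 2)])  # intercept column plus 2 features
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + 0.01 * rng.randn(100)

# 1) Gradient descent: step against the gradient of the mean squared error.
theta = np.zeros(3)
lr = 0.1
for _ in range(5000):
    grad = X.T @ (X @ theta - y) / len(y)
    theta -= lr * grad

# 2) Normal equation: invert X^T X directly (pinv for numerical robustness).
theta_ne = np.linalg.pinv(X.T @ X) @ X.T @ y

print(theta, theta_ne)  # both should be close to true_theta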
Extension of the problem: a model obtained this way overfits easily, because the analytic solution tries to fit all of the points. That is why regularization is introduced.
First, if the regularization term is lambda multiplied by the sum of the absolute values of the coefficients (the L1 norm), the model is called LASSO, i.e. L1-regularized linear regression. The difficulty is that the absolute value is not differentiable everywhere, so gradient descent and the normal equation can no longer be used directly. Two dedicated solvers are therefore introduced: coordinate descent and least angle regression (Least Angle Regression, LARS).
If the regularization term is squared (the L2 norm) instead, the model is Ridge regression. In that case the model easily ends up keeping too many dimensions. Why? L2 shrinks coefficients toward zero but rarely makes them exactly zero, whereas L1 can produce genuinely sparse solutions.
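A small sketch of that difference on synthetic data (the alpha values and data are illustrative assumptions): Lasso drives some coefficients to exactly zero, Ridge only shrinks them.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.RandomState(7)
X = rng.rand(100, 10)
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(100)  # only 2 informative features

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
print("Ridge coefficients equal to zero:", np.sum(ridge.coef_ == 0))  # typically 0
print("Lasso coefficients equal to zero:", np.sum(lasso.coef_ == 0))  # typically most of the 8 noise features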
This section is Ridge regression, and the next is LASSO regression.
# Ridge Regression
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Ridge
filename = 'housing.csv'
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
dataframe = read_csv(filename, delim_whitespace=True, names=names)
array = dataframe.values
X = array[:, 0:13]
Y = array[:, 13]
num_folds = 10
kfold = KFold(n_splits=10, random_state=7)
model = Ridge()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
# -34.0782462093
12.2.3 LASSO regression
# Lasso Regression
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import Lasso
filename = 'housing.csv'
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
dataframe = read_csv(filename, delim_whitespace=True, names=names)  # delim_whitespace=True: housing.csv is whitespace-delimited
array = dataframe.values
X = array[:, 0:13]
Y = array[:, 13]
kfold = KFold(n_splits=10, random_state=7)
model = Lasso()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
# -34.4640845883
12.2.4 ElasticNet regression
ElasticNet regression combines Ridge regression and LASSO regression, that is, both the L1 and the L2 penalties are added. Let us see how it performs.
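The example below uses ElasticNet() with its defaults. As a side note (not from the original article), scikit-learn exposes the mix between the two penalties through the l1_ratio parameter; the values here are illustrative.
from sklearn.linear_model import ElasticNet

# l1_ratio close to 1.0 behaves like LASSO, close to 0.0 like Ridge; 0.5 mixes them equally.
model = ElasticNet(alpha=1.0, l1_ratio=0.5)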
# ElasticNet Regression
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import ElasticNet
filename = 'housing.csv'
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
dataframe = read_csv(filename, delim_whitespace=True, names=names)
array = dataframe.values
X = array[:, 0:13]
Y = array[:, 13]
kfold = KFold(n_splits=10, random_state=7)
model = ElasticNet()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
# -31.1645737142
12.3 Nonlinear machine learning algorithms
12.3.1 K-nearest neighbors
K-nearest neighbors is a distance-based algorithm: it finds the k records in the training set that are closest to the new record and uses their average as the prediction.
Because all distances have to be computed for every query, the prediction step of this algorithm is relatively slow.
The default distance is the Minkowski distance, of which the Euclidean distance and the Manhattan distance are special cases (see the sketch below).
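A minimal sketch of how the metric could be selected on KNeighborsRegressor (the parameter values are illustrative): with the default metric='minkowski', p=2 gives the Euclidean distance and p=1 gives the Manhattan distance.
from sklearn.neighbors import KNeighborsRegressor

knn_euclidean = KNeighborsRegressor(n_neighbors=5, metric='minkowski', p=2)  # Euclidean distance
knn_manhattan = KNeighborsRegressor(n_neighbors=5, metric='minkowski', p=1)  # Manhattan distance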
# KNN Regression
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
filename = 'housing.csv'
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
dataframe = read_csv(filename, delim_whitespace=True, names=names)
array = dataframe.values
X = array[:, 0:13]
Y = array[:, 13]
kfold = KFold(n_splits=10, random_state=7)
model = KNeighborsRegressor()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
# -107.28683898
12.3.2 CART
# DecisionTree Regression
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
filename = 'housing.csv'
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
dataframe = read_csv(filename, delim_whitespace=True, names=names)
array = dataframe.values
X = array[:, 0:13]
Y = array[:, 13]
kfold = KFold(n_splits=10, random_state=7)
model = DecisionTreeRegressor()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
# -34.74746 (the tree is not seeded, so this value varies from run to run)
12.3.3 SVM
Note that the SVM here is based on the LIBSVM package.
# SVM Regression
from pandas import read_csv
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR
filename = 'housing.csv'
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
dataframe = read_csv(filename, delim_whitespace=True, names=names)
array = dataframe.values
X = array[:, 0:13]
Y = array[:, 13]
kfold = KFold(n_splits=10, random_state=7)
model = SVR()
scoring = 'neg_mean_squared_error'
results = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)
print(results.mean())
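The block above uses SVR() with its default settings. As a side note (a sketch, not from the original article), the kernel and the regularization strength C are usually the first parameters worth tuning.
from sklearn.svm import SVR

# These are scikit-learn's documented defaults, written out explicitly for clarity.
model = SVR(kernel='rbf', C=1.0, epsilon=0.1)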