How to select key features with Lasso regression in Python

Updated 2025-03-28. From: SLTechnology News&Howtos

This article explains how to use Lasso regression in Python to select key features.

Lasso regression can be used to select key features. Exploratory data analysis often introduces too many features. Rather than modeling with all of them directly, we need to screen the original features further and retain only the important ones. The Lasso algorithm minimizes the residual sum of squares subject to the constraint that the sum of the absolute values of the model coefficients is less than a constant. For variable selection it often performs better than stepwise regression, principal component regression, ridge regression, and partial least squares, and it can overcome some of the shortcomings of these traditional methods in model selection.

We select some of the GDP metrics as follows:

Lasso regression concept

Lasso regression is a regularization method and a form of shrinkage estimation. By constructing a penalty function it obtains a more parsimonious model: it shrinks some coefficients and sets others exactly to zero. It therefore retains the advantage of subset selection, and it is a biased estimator suited to data with complex collinearity.

Basic principles of Lasso

Lasso is a shrinkage estimation method built on the idea of reducing the feature set (dimension reduction). It shrinks the coefficients of the features and drives some regression coefficients to exactly 0, which achieves feature selection, so it is widely used for model improvement and model selection. Model selection is essentially a search for a sparse representation of the model, and this search can be carried out by optimizing an objective of the form "loss" + "penalty".
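The shrink-to-zero behavior described above can be seen on a small synthetic problem. The following is a minimal sketch, not the article's own program: the data, the seed, and the penalty value 0.5 are assumptions chosen for illustration. Only the first two of six features influence y, and an ordinary least squares fit is shown for comparison.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

# Synthetic data (an assumption for illustration): 6 features,
# only the first two actually influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Ordinary least squares keeps a (small) coefficient on every feature.
ols = LinearRegression().fit(X, y)

# Lasso adds an L1 penalty; scikit-learn calls the penalty weight `alpha`.
lasso = Lasso(alpha=0.5).fit(X, y)

print('OLS coefficients:  ', np.round(ols.coef_, 3))
print('Lasso coefficients:', np.round(lasso.coef_, 3))
print('Indices of features kept by Lasso:', np.flatnonzero(lasso.coef_))
```

The Lasso coefficients of the irrelevant features come out as exact zeros, while the two informative coefficients are kept but pulled toward zero, which is the bias mentioned earlier.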

Lasso parameter estimation definition

The Lasso estimate of the coefficients is defined as

$$\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \left\{ \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j| \right\}$$

where $\lambda$ is a non-negative regularization parameter that controls the complexity of the model. The larger $\lambda$ is, the greater the penalty on linear models with more features, so the fit ends up with fewer features. The term $\lambda \sum_{j=1}^{p} |\beta_j|$ is called the penalty term.

The value of $\lambda$ can be determined by cross-validation, taking the value with the minimum cross-validation error. Finally, using the selected $\lambda$, the model is refit on all the data.
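Choosing the penalty by cross-validation and then refitting can be sketched with scikit-learn's LassoCV. This is an illustrative example under stated assumptions: the synthetic data, the seed, and the choice of 5 folds are not from the article. Note that scikit-learn names the penalty weight `alpha` rather than λ.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

# Synthetic data (an assumption for illustration): 8 features, 2 informative.
rng = np.random.default_rng(1)
X = rng.normal(size=(150, 8))
y = 2.0 * X[:, 0] + 1.0 * X[:, 3] + rng.normal(scale=0.5, size=150)

# 5-fold cross-validation over an automatically generated grid of penalties.
cv_model = LassoCV(cv=5).fit(X, y)
best_lambda = cv_model.alpha_  # penalty with the lowest mean CV error

# Refit on all the data with the selected penalty. LassoCV already does
# this internally; it is repeated here only to mirror the steps in the text.
final_model = Lasso(alpha=best_lambda).fit(X, y)

print('Selected penalty:', best_lambda)
print('Coefficients:', np.round(final_model.coef_, 3))
```

In practice `cv_model` can be used directly for prediction, since it holds the model refit on the full data.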

Lasso applicable scenario

When the original features are multicollinear, Lasso regression is a good way to handle the collinearity, and it can effectively screen multicollinear features. In machine learning, when facing massive data, the first thing that comes to mind is dimensionality reduction: solving the problem with as little data as possible. In this sense, feature selection with a Lasso model is also an effective dimensionality reduction method. In theory, Lasso places few restrictions on the data and can accept most numeric feature types. Because the L1 penalty acts on the size of each coefficient, however, the result depends on feature scale, so features are usually standardized before fitting.

Advantages and disadvantages of Lasso regression method

The advantage of Lasso regression is that it makes up for the shortcomings of least squares estimation and of the locally optimal estimates produced by stepwise regression: it selects features well and effectively handles multicollinearity among features. The disadvantage is that when a group of features is highly correlated, Lasso tends to pick one of them and ignore the rest, which can make the results unstable. Despite these drawbacks, Lasso regression works well in suitable scenarios.
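The behavior with highly correlated features can be explored with a small experiment. This is a hedged sketch on made-up data: x2 is almost a copy of x1, and both contribute equally to y. The penalty value and the seed are assumptions. The way the weight is split between x1 and x2 depends on the data and the solver, so no particular split is claimed here.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # almost a copy of x1
x3 = rng.normal(size=n)                   # independent, irrelevant feature
X = np.column_stack([x1, x2, x3])
y = x1 + x2 + rng.normal(scale=0.1, size=n)

model = Lasso(alpha=0.1).fit(X, y)
print('Coefficients for x1, x2, x3:', np.round(model.coef_, 3))
print('Combined weight on x1 and x2:', round(model.coef_[0] + model.coef_[1], 3))
```

The combined weight on the correlated pair stays close to the true total of 2, but how it is divided between the two features is not well determined. Rerunning with a different seed or noise level is a simple way to check how stable the selection is.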

After Lasso regression, the coefficient of each feature is shown in the following table.

Eliminating the features whose Lasso coefficient is 0.000, we conclude that the key factors affecting GDP are X1, X2, X3, X4, X5, X6, X7, X8, X9, X11, and X12. These features are used for further research.

# Lasso model program
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

inputfile = 'C:\\Users\\27342\\Desktop\\data.csv'  # input data file
data = pd.read_csv(inputfile)  # read the data
features = data.iloc[:, 0:12]  # the first 12 columns are the candidate features

lasso = Lasso(alpha=1000)  # create the Lasso model with the penalty weight λ set to 1000
lasso.fit(features, data['y'])
print('Regression coefficients:', np.round(lasso.coef_, 5))  # keep five decimal places

# Count the non-zero regression coefficients
print('Number of non-zero coefficients:', np.sum(lasso.coef_ != 0))
mask = lasso.coef_ != 0  # Boolean array, True where the coefficient is non-zero
print('Coefficient is non-zero:', mask)

outputfile = 'C:\\Users\\27342\\Desktop\\new_reg_data.csv'  # output data file
# Apply the mask to the feature columns only, so its length matches the column count
new_reg_data = features.loc[:, mask]  # features with a non-zero coefficient
new_reg_data.to_csv(outputfile)  # save the selected features
print('Dimensions of the output data:', new_reg_data.shape)
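The program above prints the coefficients as a bare array, which makes it harder to tell which feature each value belongs to. One possible refinement, sketched here on a small hypothetical DataFrame standing in for data.csv (the column names x1 to x5, the data, and the penalty are assumptions), is to label the coefficients with the column names:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

# Hypothetical stand-in for data.csv: five features and a target column 'y'.
rng = np.random.default_rng(3)
cols = [f'x{i}' for i in range(1, 6)]
data = pd.DataFrame(rng.normal(size=(100, 5)), columns=cols)
data['y'] = 4.0 * data['x1'] - 3.0 * data['x4'] + rng.normal(scale=0.1, size=100)

X = data[cols]
lasso = Lasso(alpha=0.5).fit(X, data['y'])

# Pair each coefficient with its column name, then keep the non-zero ones.
coefs = pd.Series(lasso.coef_, index=X.columns)
selected = coefs[coefs != 0].index.tolist()

print(coefs.round(5))
print('Selected features:', selected)
```

Selecting columns by name with `data[selected]` then avoids any mismatch between the length of a Boolean mask and the number of columns in the DataFrame.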

The above is how to select key features with Lasso regression in Python.
