L1, L2 regularization terms and how to use them in machine learning

This article explains what the L1 and L2 regularization terms are and how they are used in machine learning. The content is kept simple and clear; I hope it helps resolve your doubts as we study this topic together.
Empirical risk and structural risk
In machine learning tasks, a loss function (loss function) is used to measure the difference between the model's output $f(X)$ and the true value $Y$, for example the squared loss:

$$L(Y, f(X)) = (Y - f(X))^2$$

If the data $(X, Y)$ obeys the joint distribution $P(X, Y)$, the expected value of the loss function, also called the true risk of the model and written $R_{exp}(f)$, is

$$R_{exp}(f) = E_P[L(Y, f(X))] = \int L(y, f(x)) \, P(x, y) \, dx \, dy$$

Our goal is to find the optimal model or concept that minimizes the true risk, that is, $f^* = \arg\min_f R_{exp}(f)$. Because the distribution of the data is unknown, we can only approximate the true risk by the average loss, over the training set, of a model trained on historical data. This average loss on the training set $\{(x_1, y_1), \dots, (x_N, y_N)\}$ is called the empirical risk (empirical risk):

$$R_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} L(y_i, f(x_i))$$

So the goal becomes minimizing the empirical risk over the training data to obtain the optimal model or concept: $f^* = \arg\min_f R_{emp}(f)$.
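To make these definitions concrete, here is a minimal sketch (illustrative Python with made-up toy data, assuming the squared loss and a simple linear model $f(x) = wx + b$) of computing the empirical risk on a training set:

```python
import numpy as np

def squared_loss(y_true, y_pred):
    # L(Y, f(X)) = (Y - f(X))^2
    return (y_true - y_pred) ** 2

def empirical_risk(w, b, X, Y):
    # R_emp(f) = (1/N) * sum_i L(y_i, f(x_i)): the average training loss
    return np.mean(squared_loss(Y, w * X + b))

# Toy training data; in reality (X, Y) comes from an unknown joint distribution
X = np.array([0.0, 1.0, 2.0, 3.0])
Y = np.array([0.1, 0.9, 2.1, 2.9])

print(empirical_risk(1.0, 0.0, X, Y))   # a good fit gives a small risk
print(empirical_risk(0.0, 0.0, X, Y))   # a poor fit gives a larger risk
```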
Figure: Three kinds of fitting effects (picture from Andrew Ng's Machine Learning open course video)
In general, the smaller the value of the loss function, the better the model fits the training data. In practice, however, the goal is not simply to make the loss as small as possible: in the most extreme case, the trained model fits every sample on the training set exactly, like the third model in the figure above. This phenomenon is overfitting, meaning the model's generalization ability weakens and it cannot produce good results on unseen data samples. Overfitting also means that the structural complexity of the model is particularly high, which is the drawback captured by structural risk (structural risk). Therefore, besides reducing the model's empirical risk, we also need to reduce its structural risk. The regularization term described below does exactly this: it reduces the complexity of the model, and thus its structural risk.

Regularization term
The regularization term (regularization), also known as the penalty term, is added to the loss function to form the objective function (objective function) that training actually minimizes. Its purpose is to constrain the parameters learned during training. The commonly used regularization terms are L1 regularization and L2 regularization, often written $\lambda \|w\|_1$ and $\lambda \|w\|_2$, where $w$ denotes the parameters (coefficients) trained by the model and $\|\cdot\|$ denotes the norm operation. In general, the more complex the model, the larger the penalty term; the simpler the model, the smaller the penalty term.
The L1 and L2 penalties are computed as follows, where $\lambda$ is the regularization strength: the larger $\lambda$ is, the more strongly the optimal solution is pushed toward model parameters equal to 0:

L1 regularization: $\lambda \|w\|_1 = \lambda \sum_i |w_i|$, the sum of the absolute values of the elements of the weight vector.

L2 regularization: $\lambda \|w\|_2 = \lambda \sqrt{\sum_i w_i^2}$, the square root of the sum of squares of the elements of the weight vector.
Besides the L1 and L2 regularization terms, there is also an L0 regularization term, $\|w\|_0$, which counts the number of non-zero parameters. A small sketch of all three appears below.
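Here is an illustrative NumPy sketch (the weight vector, $\lambda$, and loss value are made up) computing the three penalties and forming the objective function as loss plus penalty:

```python
import numpy as np

w = np.array([0.5, -2.0, 0.0, 3.0])   # hypothetical trained weight vector
lam = 0.1                              # lambda, the regularization strength

l1 = lam * np.sum(np.abs(w))           # L1: sum of absolute values -> 0.55
l2 = lam * np.sqrt(np.sum(w ** 2))     # L2: sqrt of sum of squares -> ~0.364
l0 = np.count_nonzero(w)               # L0: number of non-zero parameters -> 3

# The objective function adds the penalty term to the empirical loss
empirical_loss = 0.8                   # placeholder value for illustration
objective = empirical_loss + l2
print(l1, l2, l0, objective)
```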
The role of L1 and L2 regularization
The effects of L1 and L2 regularization can be summarized as follows:
L1 regularization produces sparse solutions: it drives the optimal values of many parameters to exactly 0, so the learned parameter vector or matrix is sparse. It can therefore be used for feature selection.
L2 regularization produces parameters with very small values: the optimal values of many parameters become small but non-zero. It helps prevent the model from overfitting.
Because L1 regularization yields sparse solutions, it can be used for model feature selection. Taking a linear regression model as an example: if the parameters of many features are 0, those features contribute nothing to the prediction, so only the features with non-zero parameters need to be retained, which is exactly feature selection, as the sketch below shows.
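The following is a hedged sketch of L1-based feature selection using scikit-learn's Lasso (an L1-penalized linear regression); the data is synthetic and the alpha value is arbitrary:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only features 0 and 1 actually influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

model = Lasso(alpha=0.1)   # alpha plays the role of lambda above
model.fit(X, y)

# L1 drives irrelevant coefficients to exactly 0; the non-zero ones
# mark the features worth keeping
print(model.coef_)
print("selected features:", np.nonzero(model.coef_)[0])
```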
L2 regularization prevents the model from overfitting because, with the L2 term added to the objective function, training favors parameters that are as small as possible, so the final model has small parameters. In a model with large parameters, a small change in a sample's features can cause a large change in the model's output: the third model in the earlier figure contains high-order terms, and when their coefficients are large the prediction is bound to swing wildly. If the parameters are small, small changes in the input have little impact on the output, which strengthens the model's generalization ability, as the sketch below illustrates.
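Here is an illustrative comparison (synthetic data, arbitrary alpha) between unregularized linear regression and scikit-learn's Ridge, its L2-penalized counterpart, on a classic overfitting setup of a high-degree polynomial fit to a few noisy points:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# Classic overfitting setup: a degree-9 polynomial fit to 20 noisy points
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=20)
X = np.vander(x, N=10, increasing=True)   # columns: 1, x, x^2, ..., x^9
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=20)

plain = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)        # alpha is the L2 strength

# The L2 penalty shrinks the high-order coefficients, so small input
# changes perturb the prediction far less
print("largest |coef|, unregularized:", np.abs(plain.coef_).max())
print("largest |coef|, ridge:        ", np.abs(ridge.coef_).max())
```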
That is all for "L1, L2 regularization terms and how to use them in machine learning". Thank you for reading!