This article explains what regularization means in big data and deep learning. The content is quite practical, so it is shared here as a reference.
Preface
By designing network models with different depths and widths, we fix an initial hypothesis space (the nominal capacity of the network) for the optimization algorithm. However, as the network parameters are optimized and updated, the actual capacity of the model changes accordingly.
Take the polynomial function model as an example:
y = r_0 + r_1 x + r_2 x^2 + r_3 x^3 + ... + r_n x^n + error
The capacity of this model can be roughly measured by n. During training, if some model parameters r_i become 0, the expressiveness of the function is reduced, and the actual capacity of the network shrinks accordingly. Therefore, the actual capacity of the network can be constrained by limiting the sparsity of its parameters.
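As a minimal illustrative sketch (not from the original article; the data and degrees are assumptions), the NumPy snippet below fits a high-degree polynomial to noisy samples and then zeroes its higher-order coefficients, which is equivalent to reducing the model to a lower-capacity one:

```python
import numpy as np

# Hypothetical noisy samples drawn from a quadratic function.
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.shape)

# A degree-9 polynomial has capacity n = 9; a degree-2 polynomial has n = 2.
coeffs_high = np.polyfit(x, y, deg=9)
coeffs_low = np.polyfit(x, y, deg=2)

# Forcing the high-order coefficients of the degree-9 model to zero
# reduces its effective capacity to that of the degree-2 model.
coeffs_truncated = coeffs_high.copy()
coeffs_truncated[:-3] = 0.0  # keep only the x^2, x, and constant terms

print("degree-9 coefficients:", np.round(coeffs_high, 3))
print("truncated coefficients:", np.round(coeffs_truncated, 3))
```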
Regularization
Regularization adds an extra parameter-sparsity penalty term (the regularization term) to the loss function. This limits the sparsity of the network parameters and constrains the actual capacity of the network, which helps prevent the model from overfitting.
Therefore, after adding a sparsity penalty on the model parameters, the target loss function becomes

L(f_θ(x), y) + λ · Ω(θ)

The first term is the original loss function, and the second term is the sparsity constraint on the network parameters, i.e. the regularization term, weighted by the coefficient λ.
Let's focus on the regularization term. In general, the sparsity constraint on the parameters is realized by constraining a norm of the parameters θ, that is

Ω(θ) = Σ_i ||θ_i||_ℓ

where ℓ denotes the order of the norm.
Besides minimizing the original loss function, the new optimization objective also constrains the sparsity of the network parameters. While reducing the loss, the optimization algorithm is forced to make the parameters θ as sparse as possible. The trade-off between the two terms is balanced by the hyperparameter λ: a larger λ means the sparsity of the network matters more, while a smaller λ means the training error matters more. By choosing an appropriate λ, we can obtain good training performance while keeping the network sparse, and thus obtain good generalization ability.
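A minimal sketch of this composite objective in PyTorch (the model, data, penalty choice, and λ value are assumptions for illustration; the original article does not specify a framework):

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: a small fully-connected model and random data.
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))
x, y = torch.randn(32, 4), torch.randn(32, 1)

criterion = nn.MSELoss()
lam = 0.01  # regularization coefficient (hyperparameter lambda)

# Total objective: original loss + lambda * sparsity penalty Omega(theta).
data_loss = criterion(model(x), y)
omega = sum(p.abs().sum() for p in model.parameters())  # L1 penalty as an example
total_loss = data_loss + lam * omega
total_loss.backward()
```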
The commonly used regularization methods are L0, L1, and L2 regularization, i.e. penalties based on the 0-norm, 1-norm, and 2-norm.
L0 regularization
L0 regularization refers to the regularization scheme that uses the L0 norm as the sparsity penalty term Ω(θ), that is

Ω(θ) = Σ_i ||θ_i||_0

where the L0 norm ||θ_i||_0 is defined as the number of non-zero elements of θ_i. By constraining the size of Ω(θ), most of the connection weights in the network can be forced to zero. However, the L0 norm is not differentiable and cannot be optimized by gradient descent, so this form of regularization is not widely used in neural networks.
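A quick illustrative sketch (not from the original article; the example tensor is assumed): counting the non-zero weights gives the L0 penalty, but the count is piecewise constant, so it provides no useful gradient for an optimizer.

```python
import torch

theta = torch.tensor([0.0, -1.5, 0.0, 0.3, 2.0])

# L0 "norm": number of non-zero entries. The count is an integer-valued,
# piecewise-constant function of theta, so its gradient is zero almost
# everywhere and gradient descent cannot use it directly.
l0_penalty = (theta != 0).sum()
print(l0_penalty.item())  # 3
```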
L1 regularization
The regularization scheme that uses the L1 norm as the sparsity penalty term Ω(θ) is called L1 regularization, that is

Ω(θ) = Σ_i ||θ_i||_1

where the L1 norm ||θ_i||_1 is defined as the sum of the absolute values of all elements of θ_i. L1 regularization is also called Lasso Regularization. Unlike the L0 norm, it is continuous (and differentiable everywhere except at zero), so it can be optimized with gradient-based methods and is widely used in neural networks.
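A small sketch of the L1 penalty and its gradient (illustrative only; the parameter values are assumed):

```python
import torch

theta = torch.tensor([0.5, -2.0, 0.0, 1.5], requires_grad=True)

# L1 penalty: sum of absolute values of all elements.
l1_penalty = theta.abs().sum()
l1_penalty.backward()

# The gradient is sign(theta) for non-zero entries, which pushes every
# weight toward zero by a constant amount and therefore encourages sparsity.
print(theta.grad)  # tensor([ 1., -1.,  0.,  1.])
```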
L2 regularization
The regularization scheme that uses the L2 norm as the sparsity penalty term Ω(θ) is called L2 regularization, that is

Ω(θ) = Σ_i ||θ_i||_2

where the penalty for each θ_i is computed from the sum of the squares of its elements (in practice, the squared L2 norm is usually used). L2 regularization is also called Ridge Regularization. Like L1 regularization, it can be optimized with gradient descent, and the squared form is continuously differentiable, so it is widely used in neural networks.
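As a sketch, many frameworks expose L2 regularization directly. In PyTorch, for example, the weight_decay argument of an optimizer adds a penalty proportional to the squared L2 norm of the weights (the model, data, and coefficient here are assumptions for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)

# weight_decay adds an L2 penalty on the parameters to each update; for SGD
# this is equivalent to appending a squared-L2 term to the loss function.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=1e-4)

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = nn.functional.mse_loss(model(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()  # parameters shrink slightly toward zero at each step
```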
Regularization effect
In the following experiment, keeping the network structure and all other hyperparameters fixed, an L2 regularization term is added to the loss function, and different degrees of regularization are obtained by varying the hyperparameter λ.
The experimental results are as follows:
It can be seen that as the regularization coefficient λ increases, the penalty on parameter sparsity becomes larger, forcing the optimization algorithm to find a model with smaller effective capacity. At λ = 0.00001 the regularization effect is relatively weak and the network still overfits; at λ = 0.1 the network has been optimized to an appropriate capacity, with no obvious overfitting or underfitting.
Note that during actual training, we generally start with a smaller regularization coefficient λ and observe whether the network overfits. We then gradually increase λ to improve the sparsity of the network parameters and the generalization ability. However, a λ that is too large may prevent the network from converging, so it must be tuned according to the actual task.
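A minimal sketch of such a sweep over λ (the toy data, model, candidate values, and training budget are all assumptions for illustration, not the experiment from this article):

```python
import torch
import torch.nn as nn

# Hypothetical toy data: the true relationship is low-dimensional and noisy.
torch.manual_seed(0)
x = torch.randn(200, 20)
y = x[:, :2].sum(dim=1, keepdim=True) + 0.1 * torch.randn(200, 1)
x_train, y_train, x_val, y_val = x[:150], y[:150], x[150:], y[150:]

# Sweep the L2 coefficient from weak to strong and watch the validation loss:
# a large train/validation gap suggests overfitting (lambda too small),
# while both losses staying high suggests underfitting or non-convergence
# (lambda too large).
for lam in [1e-5, 1e-3, 1e-1]:
    model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2, weight_decay=lam)
    for _ in range(500):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x_train), y_train)
        loss.backward()
        opt.step()
    with torch.no_grad():
        val_loss = nn.functional.mse_loss(model(x_val), y_val)
    print(f"lambda={lam:g}  train={loss.item():.4f}  val={val_loss.item():.4f}")
```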
Thank you for reading! This concludes the article on what regularization means in big data. I hope the content above is helpful and lets you learn something new. If you found the article useful, feel free to share it with others.