What is the BackPropagation chain rule in Python
This article explains what the chain rule behind backpropagation is and how it is used in Python. It goes into some detail and should serve as a useful reference; readers who are interested are encouraged to read to the end!
1. Chain rule
From what we have seen before, to find good values for the target parameters we first pick initial values and then update them repeatedly by gradient descent until the final loss is as small as possible. The crucial ingredient in gradient descent is the gradient itself, that is, the derivative of the final loss function with respect to the parameters.
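To make the update rule concrete, here is a minimal gradient-descent sketch on a one-parameter quadratic loss; the loss, learning rate, and iteration count are all illustrative assumptions, not from the article:

```python
# Minimal gradient-descent sketch on loss(w) = (w - 3)**2,
# whose derivative is 2 * (w - 3). All constants are assumed.
w = 0.0     # initial value for the target parameter
lr = 0.1    # learning rate (assumed)

for step in range(50):
    grad = 2 * (w - 3)  # derivative of the loss w.r.t. w
    w -= lr * grad      # move against the gradient

print(w)  # approaches 3.0, the minimizer of the loss
```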
As shown in the figure below, suppose we have a single neuron in the input layer with two inputs, weights w1 and w2, and a bias term b. These are combined into z = w1·x1 + w2·x2 + b, which is fed into the sigmoid function to produce the neuron's output. In this setup, the derivative of z with respect to each weight is easy: it is just the corresponding input, x1 or x2. By the chain rule, as shown in the lower-left corner of the figure, the overall computation splits in two: forward propagation gives the partial derivative of z with respect to w, and back propagation gives the partial derivative of the loss function C with respect to z.
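As a minimal sketch of this single neuron (the numeric values of the inputs, weights, and bias are assumed for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed example values; the article only names the symbols.
x1, x2 = 1.0, -1.0   # inputs
w1, w2 = 0.5, -0.3   # weights
b = 0.1              # bias term

z = w1 * x1 + w2 * x2 + b  # combine parameters into z
a = sigmoid(z)             # the neuron's output

# Chain rule: dC/dw1 = (dz/dw1) * (dC/dz).
# Forward propagation gives dz/dw1 = x1; back propagation must supply dC/dz.
print(z, a)
```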
2. Forward propagation
Computing ∂z/∂w: forward propagation is quite simple. The partial derivative of z with respect to a weight is just the input that the weight multiplies, as shown in the figure below. For the input layer that input is the raw data (1 and -1 here); for the other layers it is the previous layer's output after the sigmoid transformation.
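That ∂z/∂w really equals the corresponding input can be verified numerically; a quick check with assumed values:

```python
# Finite-difference check that dz/dw1 equals x1 (values assumed).
x1, x2, w2, b = 1.0, -1.0, -0.3, 0.1
w1 = 0.5
eps = 1e-6

def z(w1):
    return w1 * x1 + w2 * x2 + b

numeric = (z(w1 + eps) - z(w1 - eps)) / (2 * eps)
print(numeric, x1)  # both are 1.0: the partial derivative is the input
```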
3. Backward propagation
Computing ∂C/∂z:
Suppose the final output of the sigmoid in each neuron is a. By the chain rule, ∂C/∂z can be written as the derivative of a with respect to z multiplied by the partial derivative of C with respect to a: ∂C/∂z = (∂a/∂z) · (∂C/∂a).
The first factor, ∂a/∂z, is just the derivative of the sigmoid function, σ'(z) = σ(z)(1 − σ(z)) = a(1 − a), which is easy to compute.
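A short sketch of this closed form, a(1 − a), which makes ∂a/∂z cheap to evaluate:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    a = sigmoid(z)
    return a * (1.0 - a)  # da/dz = a * (1 - a)

print(sigmoid_prime(0.0))  # 0.25, the sigmoid derivative's maximum
```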
The second factor, ∂C/∂a, is trickier, because a is fed into several neurons in the next layer. Suppose there are two, with pre-sigmoid values z' and z''. Then ∂C/∂a is the sum of the contributions through those two neurons. For the first, z' depends on a through the weight w3, so its contribution is (∂z'/∂a) · (∂C/∂z') with ∂z'/∂a = w3; likewise ∂z''/∂a = w4. Hence ∂C/∂a = w3 · ∂C/∂z' + w4 · ∂C/∂z''.
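In code, the sum over the two downstream neurons looks like this; the weights and the downstream gradients are assumed placeholders:

```python
# dC/da when a feeds two downstream neurons z' and z''.
w3, w4 = 0.7, -0.2    # weights from a into z' and z'' (assumed)
dC_dz_prime = 0.05    # dC/dz', assumed known from the next layer
dC_dz_dprime = -0.03  # dC/dz'', assumed known from the next layer

# dz'/da = w3 and dz''/da = w4, so the chain rule gives:
dC_da = w3 * dC_dz_prime + w4 * dC_dz_dprime
print(dC_da)
```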
So the question becomes: what are the partial derivatives of C with respect to z' and z''?
Suppose, for the moment, that the partial derivatives of the loss function C with respect to z' and z'' are known:
Then the calculation of ∂C/∂z above can be written as the following formula, where the factor outside the brackets is ∂a/∂z and the part inside the brackets is ∂C/∂a: ∂C/∂z = σ'(z) · (w3 · ∂C/∂z' + w4 · ∂C/∂z'').
This formula can be thought of as a backpropagating neuron, as shown in the following figure:
In this neuron, the inputs are the derivatives of the loss function C with respect to z' and z'', the pre-sigmoid values of the next layer, and w3 and w4 are the corresponding weights. The two inputs are multiplied by their weights, summed, and then multiplied by the derivative of the sigmoid with respect to z, which yields ∂C/∂z. Note that σ'(z) behaves like a constant here: z was already fixed during forward propagation, so its value is known.
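A sketch of this backward neuron as a single function; all numeric inputs are assumed for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def backward_neuron(z, w3, w4, dC_dz_prime, dC_dz_dprime):
    # dC/dz = sigmoid'(z) * (w3 * dC/dz' + w4 * dC/dz'')
    a = sigmoid(z)
    sigmoid_deriv = a * (1.0 - a)  # fixed once the forward pass sets z
    return sigmoid_deriv * (w3 * dC_dz_prime + w4 * dC_dz_dprime)

# Assumed example values.
print(backward_neuron(0.3, 0.7, -0.2, 0.05, -0.03))
```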
With the first outputs of back propagation in hand, we can obtain the corresponding results for the other neurons in the hidden layers, and so on: for every neuron we can compute the partial derivative of the loss function with respect to its z. Combining each of these with the corresponding ∂z/∂w then gives the gradient of every parameter w, which is exactly what gradient descent needs.
4. Organizing the computation
Suppose we are at the output layer. After forward propagation we have the network's output, and therefore the loss function C, and forward propagation has also given us z' and z''. Everything we need is then ready, and we can compute ∂C/∂z' and ∂C/∂z'' directly.
If instead we are at a middle layer, then computing ∂C/∂z' also needs the ∂C/∂z values propagated back from the next layer, and computing those needs the layer after that, and so on, until we reach the output layer. There the derivatives can be computed directly, and we can then work backwards, recursively filling in all the terms that were left undetermined.
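A sketch of this back-to-front order on a tiny two-layer network; the architecture, squared-error loss, and every numeric value are assumptions for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Tiny network: 2 inputs -> 2 hidden sigmoid units -> 1 sigmoid output,
# with squared-error loss C = 0.5 * (a2 - y)**2. All values assumed.
x = [1.0, -1.0]
W1 = [[0.5, -0.3], [0.2, 0.4]]  # W1[j][i]: input i -> hidden unit j
b1 = [0.1, -0.1]
W2 = [0.7, -0.2]                # hidden unit j -> output
b2 = 0.05
y = 1.0

# Forward propagation fixes every z and a.
z1 = [sum(W1[j][i] * x[i] for i in range(2)) + b1[j] for j in range(2)]
a1 = [sigmoid(v) for v in z1]
z2 = sum(W2[j] * a1[j] for j in range(2)) + b2
a2 = sigmoid(z2)

# Back propagation: the output layer can be computed directly...
dC_dz2 = (a2 - y) * a2 * (1 - a2)  # dC/da2 * da2/dz2

# ...and each hidden dC/dz then needs only the layer after it.
dC_dz1 = [W2[j] * dC_dz2 * a1[j] * (1 - a1[j]) for j in range(2)]

print(dC_dz2, dC_dz1)
```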
5. Summary
Since back propagation needs the output layer's quantities as its input, the practical recipe after forward propagation is: don't stop to think about which derivative is needed where; simply start at the output end and compute the partial derivative of the loss function C with respect to every z, working backwards.
At this point we have, for every neuron, the forward-propagation factor ∂z/∂w (which is in fact the sigmoid output a of the previous layer) and the back-propagation factor ∂C/∂z. Multiplying the two gives the result we need: the gradient of each parameter.
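Putting the two halves together for a single weight, with a finite-difference check that the product really is the gradient; the neuron, loss, and values are assumed as before:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Single sigmoid neuron, squared-error loss C = 0.5 * (a - y)**2.
x1, x2, w1, w2, b, y = 1.0, -1.0, 0.5, -0.3, 0.1, 1.0

a = sigmoid(w1 * x1 + w2 * x2 + b)
dC_dz = (a - y) * a * (1 - a)  # backward half: dC/dz
dC_dw1 = x1 * dC_dz            # forward half: dz/dw1 = x1

def loss(w1):
    a = sigmoid(w1 * x1 + w2 * x2 + b)
    return 0.5 * (a - y) ** 2

eps = 1e-6
numeric = (loss(w1 + eps) - loss(w1 - eps)) / (2 * eps)
print(dC_dw1, numeric)  # the two agree: gradient = input * dC/dz
```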