This article walks through worked examples of gradient descent optimization in PyTorch; I hope you find it a useful reference.
I. Activation functions
1. Sigmoid function
Its expression is sigmoid(x) = 1 / (1 + e^(-x)).
This function compresses any input, from negative infinity to positive infinity, into the range (0, 1); at x = 0 the output is 0.5.
The implementation through PyTorch is as follows:
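A minimal sketch of what that might look like, using torch.sigmoid (the input values are purely illustrative):

```python
import torch

a = torch.linspace(-100, 100, 10)   # illustrative inputs from -100 to 100
print(torch.sigmoid(a))             # every value is squashed into (0, 1); sigmoid(0) = 0.5
```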
2. Tanh function
It is commonly used in RNNs and is obtained from the sigmoid function: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)) = 2 * sigmoid(2x) - 1.
Its output lies between -1 and 1, and its derivative is 1 - tanh(x)**2.
The implementation via PyTorch is as follows:
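A minimal sketch using torch.tanh (again with illustrative inputs):

```python
import torch

a = torch.linspace(-3, 3, 10)   # illustrative inputs
print(torch.tanh(a))            # outputs lie strictly between -1 and 1
```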
3. ReLU function
This function truncates negative inputs to 0 and leaves positive inputs unchanged. Its derivative is therefore 0 where the input is less than 0 and 1 where it is greater than 0, which makes gradients very cheap to compute.
The implementation via PyTorch is as follows:
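A minimal sketch, showing both the torch.relu call and its functional equivalent:

```python
import torch
import torch.nn.functional as F

a = torch.linspace(-1, 1, 10)   # illustrative inputs
print(torch.relu(a))            # negative entries become 0, positive entries are unchanged
print(F.relu(a))                # equivalent functional form
```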
II. Loss functions and their gradients
We usually use the mean squared error (MSE), the average of the squared differences between targets and predictions, as the loss function.
1. autograd.grad
torch.autograd.grad(loss, [w1, w2, ...])
The first argument is the loss, the second is a list of parameters; even a single parameter must be wrapped in a list.
The MSE loss itself can be built directly with F.mse_loss.
Passing this loss and the list of tensors to differentiate, [w], to torch.autograd.grad returns the gradient directly.
Note: w must be created with requires_grad=True before it can be differentiated.
Alternatively, gradients can be enabled afterwards with w.requires_grad_(); the loss then has to be recomputed so that the graph records w.
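A minimal sketch of the autograd.grad usage described above; x, y, and w are illustrative placeholder tensors:

```python
import torch
import torch.nn.functional as F

x = torch.ones(1)                              # illustrative input
y = torch.ones(1) * 2                          # illustrative target
w = torch.tensor([1.5], requires_grad=True)    # parameter created with requires_grad=True
# (alternatively: create w first and call w.requires_grad_() before building the loss)

loss = F.mse_loss(y, x * w)                    # mean squared error between target and prediction

grads = torch.autograd.grad(loss, [w])         # second argument is a list, even for one parameter
print(grads)                                   # tuple with one tensor: d(loss)/d(w)
```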
2. loss.backward()
This method is called directly on the loss tensor.
It does not return the gradients; instead it stores them on the parameters, where they can be read directly via w.grad.
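A minimal sketch with the same illustrative tensors as above, this time using loss.backward():

```python
import torch
import torch.nn.functional as F

x = torch.ones(1)
y = torch.ones(1) * 2
w = torch.tensor([1.5], requires_grad=True)

loss = F.mse_loss(y, x * w)
loss.backward()     # returns nothing; gradients are stored on the parameters
print(w.grad)       # the gradient can now be read directly from w
```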
3. Softmax and its derivative
Softmax converts inputs with large differences in magnitude into probabilities between 0 and 1 whose sum is 1.
Derivative of the softmax function:
Let the input be a and the softmax output be p, i.e. p_i = exp(a_i) / sum_k exp(a_k). Then dp_i/da_j = p_i * (1 - p_j) when i = j, and dp_i/da_j = -p_i * p_j when i != j.
Note: when i = j the partial derivative is positive, and when i != j it is negative.
The implementation through PyTorch is as follows:
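A minimal sketch using F.softmax and autograd.grad; differentiating a single output p[1] with respect to all inputs illustrates the sign pattern noted above (the input vector is random and purely illustrative):

```python
import torch
import torch.nn.functional as F

a = torch.rand(3, requires_grad=True)   # illustrative input vector
p = F.softmax(a, dim=0)                 # probabilities in (0, 1) summing to 1

# gradient of one output, p[1], with respect to all inputs a
grads = torch.autograd.grad(p[1], [a], retain_graph=True)
print(grads)                            # entry 1 is positive, the other entries are negative
```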
III. Chain rule
1. Single-layer perceptron gradient
A single-layer perceptron has only one node: data * weight is fed into the node, passed through the sigmoid function, and the result is the output. The gradient then follows directly from the chain rule.
Both the forward computation and the gradient are easy to obtain in PyTorch, as sketched below.
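A minimal sketch of the single-node case; the shapes (1 sample, 10 features, 1 output) and the dummy target of 1 are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 10)                       # one sample with 10 features
w = torch.randn(1, 10, requires_grad=True)   # weights of the single output node

o = torch.sigmoid(x @ w.t())                 # data * weight, then the sigmoid node
loss = F.mse_loss(torch.ones(1, 1), o)       # dummy target of 1

loss.backward()
print(w.grad)                                # d(loss)/d(w), shape (1, 10)
```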
2. Multi-output perceptron gradient
There are now several output values and therefore several nodes, but the gradient is derived in exactly the same way.
Computing the gradients in PyTorch looks like this:
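A minimal sketch extending the previous one to two output nodes; the shapes remain illustrative assumptions:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 10)
w = torch.randn(2, 10, requires_grad=True)   # two output nodes instead of one

o = torch.sigmoid(x @ w.t())                 # shape (1, 2): one value per output node
loss = F.mse_loss(torch.ones(1, 2), o)

loss.backward()
print(w.grad.shape)                          # torch.Size([2, 10]): one gradient row per output node
```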
3. Gradient with a hidden layer in the middle
A hidden layer is added in the middle, which only changes what the output node receives as input: instead of feeding the data to the output node directly, the output of the middle layer becomes the input of the output node. A PyTorch sketch follows below.
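A minimal sketch with one hidden layer; the layer sizes (10 → 8 → 1) are assumptions chosen for illustration:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 10)
w1 = torch.randn(8, 10, requires_grad=True)   # hidden layer: 10 inputs -> 8 hidden units
w2 = torch.randn(1, 8, requires_grad=True)    # output layer: 8 hidden units -> 1 output

h = torch.sigmoid(x @ w1.t())                 # the hidden layer's output...
o = torch.sigmoid(h @ w2.t())                 # ...becomes the input of the output node
loss = F.mse_loss(torch.ones(1, 1), o)

loss.backward()
print(w1.grad.shape, w2.grad.shape)           # gradients flow back through both layers
```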
4. Backpropagation in multilayer perceptrons
The chain rule still applies: each node's sigmoid output sigmoid(x) is the input of the next node, so a forward pass gives every node's activation and the final output. After computing the loss, a backward pass yields the gradient of every parameter at every node.
The delta(k) used below merely collects part of the expression into a single symbol; the detailed derivation is omitted here.
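A minimal sketch of the same idea in module form; the three-layer architecture (10 → 8 → 4 → 1, sigmoid after each layer) is an assumption for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# each layer's sigmoid output is the next layer's input
net = nn.Sequential(
    nn.Linear(10, 8), nn.Sigmoid(),
    nn.Linear(8, 4), nn.Sigmoid(),
    nn.Linear(4, 1), nn.Sigmoid(),
)

x = torch.randn(1, 10)
loss = F.mse_loss(net(x), torch.ones(1, 1))   # forward pass, then the loss
loss.backward()                               # backward pass: chain rule applied layer by layer

for name, p in net.named_parameters():
    print(name, p.grad.shape)                 # every layer's weights and biases now carry a gradient
```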
IV. Optimization example
We now optimize a concrete target function as a complete example.
Optimization process: initialize the parameters → forward pass to compute the prediction → compute the loss → backward pass to obtain the gradients → update the parameters → forward pass again, and so on.
In this example the procedure is written slightly differently: an optimizer is selected up front and given the parameters directly, and it reads their gradients when performing updates.
① pred = f(x) computes the predicted value from the function; it is used to compute the gradients afterwards.
② optimizer.zero_grad() zeroes the gradients. After backpropagation the gradients are accumulated on the parameters (as shown above, they can be inspected there), so they must be cleared before the next step.
③ pred.backward() computes the gradients from the predicted value.
④ optimizer.step() updates the parameters using those gradients.
These steps are simply repeated in a loop, as in the sketch below.
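A minimal sketch of the loop; the target f(x) (the Himmelblau function is substituted here purely for illustration), the Adam optimizer, the learning rate, and the iteration count are all assumptions, not the article's original choices:

```python
import torch

def f(x):
    # hypothetical target function used only for illustration (Himmelblau)
    return (x[0] ** 2 + x[1] - 11) ** 2 + (x[0] + x[1] ** 2 - 7) ** 2

x = torch.zeros(2, requires_grad=True)       # initialize the parameters
optimizer = torch.optim.Adam([x], lr=1e-3)   # the optimizer is given the parameters up front

for step in range(20000):
    pred = f(x)                 # ① forward pass: compute the value to minimize
    optimizer.zero_grad()       # ② clear the gradients left over from the previous step
    pred.backward()             # ③ backward pass: write gradients onto x
    optimizer.step()            # ④ update the parameters using those gradients

    if step % 2000 == 0:
        print(step, x.tolist(), pred.item())
```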