In this article, the editor shares how to deal with loss backpropagation errors in PyTorch. Many readers are not familiar with this problem, so the article is shared for reference; hopefully you will learn something useful from it. A typical error looks like this:
File "train.py", line 143, in train
Loss.backward ()
File "/ usr/local/lib/python3.6/dist-packages/torch/tensor.py", line 198, in backward
Torch.autograd.backward (self, gradient, retain_graph, create_graph)
File "/ usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 94, in backward
Grad_tensors = _ make_grads (tensors, grad_tensors)
File "/ usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py", line 35, in _ make_grads
Raise RuntimeError ("grad can be implicitly created only for scalar outputs")
RuntimeError: grad can be implicitly created only for scalar outputs
Problem analysis:
Calling loss.backward() without an argument is equivalent to calling loss.backward(torch.tensor(1.0)); the gradient argument defaults to a scalar 1. This only works when the loss itself is a scalar. Here the loss is a two-dimensional tensor rather than a scalar, so PyTorch raises the error above.
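A minimal sketch that reproduces the error (pred and targets are made-up tensors for illustration):

import torch
import torch.nn as nn

pred = torch.randn(4, 3, requires_grad=True)
targets = torch.randn(4, 3)

criterion = nn.L1Loss(reduction='none')   # keeps a per-element loss
loss = criterion(pred, targets)           # shape (4, 3), not a scalar
loss.backward()                           # RuntimeError: grad can be implicitly created only for scalar outputs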
Solutions:

1. Pass a gradient tensor with the same shape as the loss to loss.backward():

loss = criterion(pred, targets)
loss.backward()
# changed to:
loss = criterion(pred, targets)
loss.backward(loss.clone().detach())

2. Modify the output dimension of the loss function
Change the output of the loss from a tensor to a scalar, for example by summing or averaging the loss over its dimensions. This may not suit every task, so adjust it to your own case.

criterion = nn.L1Loss(reduction='none')
# remove the reduction='none' argument and change it to:
criterion = nn.L1Loss()

While we are at it, here is a quick introduction to the reduction parameter of PyTorch's loss functions.
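A small sketch showing both fixes side by side (pred and targets are placeholder tensors; torch.ones_like(loss) is another common choice of gradient weights for the non-scalar case):

import torch
import torch.nn as nn

pred = torch.randn(4, 3, requires_grad=True)
targets = torch.randn(4, 3)

# Fix 1: keep the per-element loss but pass an explicit gradient tensor
criterion = nn.L1Loss(reduction='none')
loss = criterion(pred, targets)          # shape (4, 3)
loss.backward(loss.clone().detach())     # or torch.ones_like(loss)

# Fix 2: let the loss function reduce to a scalar
pred.grad = None
criterion = nn.L1Loss()                  # default reduction returns a scalar mean
loss = criterion(pred, targets)          # 0-dim tensor
loss.backward()                          # no argument needed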
In newer versions of PyTorch, the reduction parameter replaces the older size_average and reduce parameters.
There are three choices for the reduction parameter:
'elementwise_mean' (called 'mean' in current PyTorch versions): the default; the losses of the N samples are averaged and the mean is returned (equivalent to reduce=True, size_average=True)
'sum': the losses of the N samples are summed (equivalent to reduce=True, size_average=False)
'none': the per-sample losses are returned directly, without reduction (equivalent to reduce=False)
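A quick sketch of how the three options differ in output shape (this uses the current names 'mean'/'sum'/'none' and made-up tensors):

import torch
import torch.nn as nn

pred = torch.randn(5, 2)
targets = torch.randn(5, 2)

loss_mean = nn.L1Loss(reduction='mean')(pred, targets)   # 0-dim tensor: average over all elements
loss_sum = nn.L1Loss(reduction='sum')(pred, targets)     # 0-dim tensor: sum over all elements
loss_none = nn.L1Loss(reduction='none')(pred, targets)   # shape (5, 2): no reduction

print(loss_mean.shape, loss_sum.shape, loss_none.shape)  # torch.Size([]) torch.Size([]) torch.Size([5, 2])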
Addendum: why the loss does not decrease when backpropagation is set up incorrectly in PyTorch, and how to fix it
I have only been working with deep learning for a short while, mostly on computer vision, and I am now trying to implement some ideas of my own. Along the way I ran into a problem: I had not set up back propagation (backward) correctly for some parameters, so the loss did not decrease. This section writes that experience up.
Alternating training of multiple networks
Briefly, my network has two branches, one after the other. The first branch is trained first: its output is supervised with the ground truth to compute loss_steam1, producing a trained feature. That feature is then concatenated into the second branch, which produces the final result, and the final result is again supervised with the ground truth to compute loss.
The whole network is based on VGG19, built in PyTorch and run on a GPU.
This problem puzzled me for quite a while before I found the cause. Here is the analysis and the answer:
Gradient propagation in PyTorch
In PyTorch, data passed into a network for computation must be of Variable type (in current versions, a Tensor with requires_grad=True plays this role). A Variable wraps a Tensor and stores both the gradient and a reference to the function that created it. In other words, it records the gradient and the computation graph for each layer of the network, which is what makes backward propagation of the gradient possible. Starting from the final loss, the gradient of each layer can then be computed recursively, step by step, and the weights can be updated.
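A tiny sketch of this bookkeeping (in modern PyTorch, Variable is merged into Tensor, so requires_grad=True is enough):

import torch

x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # the graph records how y was created from x
y.backward()         # backpropagate from the scalar y
print(x.grad)        # tensor([4., 6.]), i.e. dy/dx = 2 * x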
There are three main steps to propagate gradients backward, combined in the sketch after this list:
1. Initialize the gradients: net.zero_grad() clears the accumulated gradients.
2. Solve for the gradients backward: loss.backward() backpropagates and computes the gradients.
3. Update the parameters: optimizer.step() applies the update.
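Put together, one training step looks roughly like this (the tiny Linear model and random data below are placeholders, not the author's actual network):

import torch
import torch.nn as nn

net = nn.Linear(10, 1)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
criterion = nn.L1Loss()
inputs, targets = torch.randn(8, 10), torch.randn(8, 1)

net.zero_grad()                      # 1. clear accumulated gradients
outputs = net(inputs)                # forward pass builds the computation graph
loss = criterion(outputs, targets)
loss.backward()                      # 2. backpropagate to compute gradients
optimizer.step()                     # 3. update the parameters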
Solution
When I wrote the code, I had not fully thought my own code through: when solving the gradients backward, nothing was backpropagated to the first branch, so its parameters could of course not be updated. So I added another step:
loss_steam1.backward(retain_graph=True)  # without retain_graph=True, the computation graph is freed after each backward call
loss.backward()
Was that enough? I thought so at the time, but loss_steam1 still did not decrease, and I worried about it for a long time before realizing: computing the gradients without ever updating the parameters cannot possibly help!
optimizer_steam1.step()  # this call must be added; optimizer.step() is what actually updates the parameters
Ha ha! With that it finally worked, and the results were indeed much better than before.
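A rough, runnable sketch of the resulting two-branch update pattern (the tiny Linear branches, optimizers and random data below are placeholders, not the author's actual VGG19-based model):

import torch
import torch.nn as nn

net_steam1 = nn.Linear(8, 4)
net_steam2 = nn.Linear(4, 2)
optimizer_steam1 = torch.optim.SGD(net_steam1.parameters(), lr=0.01)
optimizer_steam2 = torch.optim.SGD(net_steam2.parameters(), lr=0.01)
criterion = nn.L1Loss()

inputs = torch.randn(16, 8)
gt_steam1 = torch.randn(16, 4)   # ground truth supervising the first branch
gt = torch.randn(16, 2)          # ground truth supervising the final output

optimizer_steam1.zero_grad()
optimizer_steam2.zero_grad()

feature = net_steam1(inputs)                # first branch
loss_steam1 = criterion(feature, gt_steam1)

output = net_steam2(feature)                # second branch consumes the feature
loss = criterion(output, gt)

loss_steam1.backward(retain_graph=True)     # keep the graph so the second backward can run
loss.backward()

optimizer_steam1.step()                     # without this, the first branch never updates
optimizer_steam2.step()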
That is all of the article on how to deal with loss backpropagation errors in PyTorch. Thank you for reading! I hope the content shared here helps you; if you want to learn more, you are welcome to keep following this channel.