This article introduces the common pitfalls of autograd in PyTorch. It is fairly detailed and should be a useful reference; if you are interested, read it to the end.
About Variable and Tensor
In older versions of PyTorch, Variable was a wrapper around Tensor. Since v0.4, Variable and Tensor have been merged, which means a Tensor can do everything the old Variable did. The Variable wrapper can still be used in newer versions, but wrapping a Tensor in Variable simply returns a Tensor.
import torch as t
from torch.autograd import Variable

a = t.ones(3, 4)
print(type(a))   # output: <class 'torch.Tensor'>
a = Variable(a)
print(type(a))   # output is still: <class 'torch.Tensor'>
print(a.volatile)
# output: __main__:1: UserWarning: volatile was removed (Variable.volatile is always False)
a.volatile = True
print(a.volatile)
# output: __main__:1: UserWarning: volatile was removed (Variable.volatile is always False)
# the volatile attribute has been removed in current PyTorch versions, i.e. volatile is always False
Leaf nodes
A node created by the user that is not the output of any function (Function) is called a leaf node, and the grad_fn of a leaf node is None.
import torch as t

a = t.ones(3, requires_grad=True)
b = t.rand(3, requires_grad=True)
a, a.is_leaf
# output: (tensor([1., 1., 1.], requires_grad=True), True)
b, b.is_leaf
# output: (tensor([0.4254, 0.8763, 0.5901], requires_grad=True), True)
c = a * b
c.is_leaf
# output: False. c is not a leaf node
a.grad_fn
# output: None. The grad_fn of a leaf node is None
c.grad_fn
# output: <MulBackward0 object at 0x...>
Autograd operations
First, a Tensor does not require gradients by default, i.e. requires_grad defaults to False.
import torch as t

a = t.ones(3)
a.requires_grad
# output: False. A Tensor does not require gradients by default
If a node's requires_grad is set to True, then every node that depends on it also has requires_grad equal to True.
import torch as t

a = t.ones(3)
b = t.ones(3, requires_grad=True)
b.requires_grad
# output: True
c = a + b
c.requires_grad
# output: True. c does not explicitly require gradients, but c depends on b and b requires gradients, so c.requires_grad is True
Only a scalar can call backward() without arguments, and backward accumulates into the grad of the leaf nodes. If you only run the forward computation without calling backward, the grad of the leaf nodes does not change.
To be precise, it is not only scalars that can call backward(); vectors and matrices can too, as long as a gradient argument of the corresponding shape is passed to backward().
import torch as t

a = t.ones(3, requires_grad=True)
b = t.rand(3, requires_grad=True)
a, b
# output: (tensor([1., 1., 1.], requires_grad=True),
#          tensor([0.9373, 0.0556, 0.6426], requires_grad=True))
c = a * b
c
# output: tensor([0.9373, 0.0556, 0.6426], grad_fn=<MulBackward0>)
c.backward(retain_graph=True)
# output: RuntimeError: grad can be implicitly created only for scalar outputs
# only a scalar can call backward() without arguments
d = c.sum()
d.backward(retain_graph=True)
# retain_graph=True keeps the intermediate buffers; otherwise calling backward again would raise an error
a.grad
# output: tensor([0.9373, 0.0556, 0.6426])
b.grad
# output: tensor([1., 1., 1.])
# after backward, a and b both have grad values
e = c.sum()
e.backward(retain_graph=True)
b.grad
# output: tensor([2., 2., 2.]). b's grad accumulates over the two backward calls
f = c.sum()
b.grad
# output: tensor([2., 2., 2.])
# computing without backward does not update the gradients
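If this accumulation is not wanted between calls, the leaf gradients can be reset explicitly. A minimal sketch of my own (not part of the original example), using the same setup as above:

import torch as t

a = t.ones(3, requires_grad=True)
b = t.rand(3, requires_grad=True)
c = (a * b).sum()
c.backward(retain_graph=True)
print(b.grad)    # tensor([1., 1., 1.])
b.grad.zero_()   # reset the accumulated gradient in place
c.backward()
print(b.grad)    # tensor([1., 1., 1.]) again, instead of [2., 2., 2.]

In training code the same reset is usually done for all parameters at once with optimizer.zero_grad().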
Tensor.data and Tensor.detach()
If a tensor's value needs to take part in a computation, but you do not want that computation to be tracked in the computation graph, you can use tensor.data in the computation. This lets you use the tensor's value without updating any gradients.
import torch as t

a = t.ones(3, 4, requires_grad=True)
b = t.rand(3, 4, requires_grad=True)
a.data.requires_grad
# output: False. a.data is detached from the computation graph
c = a.data * b.data
d = c.sum()
d.backward()
# output: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
# backward() is impossible because c is detached from the graph and requires_grad is False
When tensor.data is modified, the tensor itself is modified along with it. If that tensor is then used in a computation followed by backward, the gradient values are no longer correct, because the tensor has been changed!
import torch as t

a = t.ones(3, 4, requires_grad=True)
b = t.rand(3, 4, requires_grad=True)
c = a * b
d = c.sum()
a.data.sigmoid_()
# output: tensor([[0.7311, 0.7311, 0.7311, 0.7311],
#                 [0.7311, 0.7311, 0.7311, 0.7311],
#                 [0.7311, 0.7311, 0.7311, 0.7311]])
# although sigmoid_ was applied to a.data, the value of a itself has been modified
d.backward()
b.grad
# output: tensor([[0.7311, 0.7311, 0.7311, 0.7311],
#                 [0.7311, 0.7311, 0.7311, 0.7311],
#                 [0.7311, 0.7311, 0.7311, 0.7311]])
# b's grad is wrong; it should be all ones!
To avoid incorrect gradients caused by modifying tensor.data, tensor.detach() can be used instead. The detached tensor is likewise excluded from the computation graph, but if its value is changed in place, backward() raises an error rather than silently producing the wrong gradients as above.
import torch as t

a = t.ones(3, 4, requires_grad=True)
b = t.rand(3, 4, requires_grad=True)
c = a * b
d = c.sum()
a.detach().sigmoid_()
a
# output: tensor([[0.7311, 0.7311, 0.7311, 0.7311],
#                 [0.7311, 0.7311, 0.7311, 0.7311],
#                 [0.7311, 0.7311, 0.7311, 0.7311]], requires_grad=True)
# the value of a has changed
d.backward()
# error: RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
# because the value of a has been modified, backward is no longer possible
It is recommended to use tensor.detach() rather than tensor.data, because it is safer.
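As a small illustration of the safe pattern (a minimal sketch of my own, not from the original): use detach() to read or reuse a value without touching the graph, and avoid in-place edits on the detached view.

import torch as t

a = t.ones(3, requires_grad=True)
b = t.rand(3, requires_grad=True)
c = (a * b).sum()

loss_value = c.detach()      # plain value with no graph history; c itself is untouched
scaled = a.detach() * 2      # out-of-place op on a detached view is safe

c.backward()
print(b.grad)                # tensor([1., 1., 1.]) -- gradients are still correct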
autograd.grad and hook
Sometimes we want to use the grad of a non-leaf node in a computation, but the grad of a non-leaf node is automatically freed after backward:
import torch as t

a = t.ones(3, 4, requires_grad=True)
b = t.rand(3, 4, requires_grad=True)
c = a * b
d = c.sum()
d.backward(retain_graph=True)  # keep the graph so it can be reused below
a.grad
# output: tensor([[0.3114, 0.3017, 0.8461, 0.6899],
#                 [0.3878, 0.8712, 0.2406, 0.7396],
#                 [0.6369, 0.0907, 0.4984, 0.5058]])
c.grad
# output: None
# c is a non-leaf node, so its grad is freed after backward
You can use autograd.grad and hook to handle this situation:
# use autograd.grad to obtain the gradient of an intermediate node
t.autograd.grad(d, c)
# output: (tensor([[1., 1., 1., 1.],
#                  [1., 1., 1., 1.],
#                  [1., 1., 1., 1.]]),)

# use a hook to obtain the gradient of an intermediate node
import torch as t

a = t.ones(3, 4, requires_grad=True)
b = t.rand(3, 4, requires_grad=True)
c = a * b
d = c.sum()

def print_grad(grad):
    print(grad)

# register the hook on c
c_hook = c.register_hook(print_grad)
d.backward()
# output: tensor([[1., 1., 1., 1.],
#                 [1., 1., 1., 1.],
#                 [1., 1., 1., 1.]])
# remove the hook when it is no longer needed
c_hook.remove()
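Another option, not covered in the original text: Tensor.retain_grad() asks autograd to keep the grad of a non-leaf node after backward. A minimal sketch:

import torch as t

a = t.ones(3, 4, requires_grad=True)
b = t.rand(3, 4, requires_grad=True)
c = a * b
c.retain_grad()   # keep the grad of this non-leaf node
d = c.sum()
d.backward()
print(c.grad)     # a tensor of ones, no longer None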
Addendum: some notes on autograd and backward in PyTorch
1 Tensor
All computation in PyTorch ultimately comes back to Tensor, so it is worth taking a fresh look at Tensor.
If we need to compute derivatives with respect to a Tensor, we have to set its .requires_grad attribute to True. For convenience, in this article we call such user-defined variables leaf nodes, and the intermediate or final variables computed from leaf nodes result nodes.
A Tensor typically records the following attributes:
data: the stored data
requires_grad: when set to True, this Tensor requires gradients
grad: the gradient value of this Tensor. Each time backward is run, the gradient from the previous step needs to be zeroed out, otherwise the gradient keeps accumulating; this is discussed further below.
grad_fn: for a leaf node this is usually None; only a result node has a meaningful grad_fn, which indicates what kind of gradient function produced it.
is_leaf: indicates whether this Tensor is a leaf node.
For example:
import torch

x = torch.rand(3, requires_grad=True)
y = x ** 2
z = x + x
print('x requires grad: {}, is leaf: {}, grad: {}, grad_fn: {}.'.format(x.requires_grad, x.is_leaf, x.grad, x.grad_fn))
print('y requires grad: {}, is leaf: {}, grad: {}, grad_fn: {}.'.format(y.requires_grad, y.is_leaf, y.grad, y.grad_fn))
print('z requires grad: {}, is leaf: {}, grad: {}, grad_fn: {}.'.format(z.requires_grad, z.is_leaf, z.grad, z.grad_fn))
Running result:
x requires grad: True, is leaf: True, grad: None, grad_fn: None.
y requires grad: True, is leaf: False, grad: None, grad_fn: <PowBackward0 object at 0x...>.
z requires grad: True, is leaf: False, grad: None, grad_fn: <AddBackward0 object at 0x...>.
2 torch.autograd.backward
The code is as follows:
x = torch.tensor(1.0, requires_grad=True)
y = torch.tensor(2.0, requires_grad=True)
z = x ** 2 + y
z.backward()
print(z, x.grad, y.grad)

>>> tensor(3., grad_fn=<AddBackward0>) tensor(2.) tensor(1.)
When z is a scalar, calling its backward method automatically computes the gradients of the leaf nodes according to the chain rule.
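To check this against the output above: z = x ** 2 + y, so dz/dx = 2 * x = 2 and dz/dy = 1, which is exactly what x.grad and y.grad show.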
But what if z is a vector or a matrix? In that case we need to pass grad_tensors in order to compute the gradient.
Before we explain why we use it, let's take a look at how the interface for backward is defined in the source code:
torch.autograd.backward(tensors, grad_tensors=None, retain_graph=None, create_graph=False, grad_variables=None)
tensors: the tensor(s) to compute gradients from. In other words, the following two calls are equivalent: torch.autograd.backward(z) and z.backward() (see the sketch after this list).
grad_tensors: used when computing gradients of a non-scalar output. It is itself a tensor, and its shape generally needs to match that of the tensor above.
retain_graph: normally PyTorch frees the computation graph after one call to backward, so if you want to call backward repeatedly on a variable, you need to set this parameter to True.
create_graph: when set to True, higher-order gradients can be computed.
grad_variables: the official message is "'grad_variables' is deprecated. Use 'grad_tensors' instead." In other words, this parameter will be removed in later versions; just use grad_tensors directly.
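A minimal sketch of the equivalence mentioned for the tensors argument (my own illustration, not from the original):

import torch

x = torch.ones(2, requires_grad=True)
z = (x * 3).sum()
torch.autograd.backward(z)   # same effect as z.backward()
print(x.grad)
# >>> tensor([3., 3.])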
Why did PyTorch design a parameter like grad_tensors? Its function is essentially that of a "weight".
Let's first look at an example:
x = torch.ones(2, requires_grad=True)
z = x + 2
z.backward()

>>> ...
RuntimeError: grad can be implicitly created only for scalar outputs
The error message above means that gradients can only be created implicitly for scalar outputs; for a vector or matrix output, backward cannot work this out by itself. One option is to sum the output first:
x = torch.ones(2, requires_grad=True)
z = x + 2
z.sum().backward()
print(x.grad)

>>> tensor([1., 1.])
The grad_tensors parameter plays exactly this role of helping to sum up.
In other words, z is first combined with the weight tensor grad_tensors by an element-wise (Hadamard) product and then summed. This is why grad_tensors needs to have the same shape as the tensor being passed in.
x = torch.ones(2, requires_grad=True)
z = x + 2
z.backward(torch.ones_like(z))   # grad_tensors needs to have the same shape as z
print(x.grad)

>>> tensor([1., 1.])
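To make the "weight" interpretation concrete, here is a minimal sketch of my own (not from the original) using non-uniform weights:

import torch

x = torch.ones(2, requires_grad=True)
z = x + 2
# equivalent to calling backward on 0.1 * z[0] + 10 * z[1]
z.backward(torch.tensor([0.1, 10.0]))
print(x.grad)

>>> tensor([ 0.1000, 10.0000])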
3 torch.autograd.grad
torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)
After reading the previous section, this function is easy to understand. Its parameters are as follows:
outputs: the result nodes, i.e. the tensors being differentiated
inputs: the leaf nodes
grad_outputs: similar to grad_tensors in the backward method
retain_graph: same as above
create_graph: same as above
only_inputs: defaults to True. If True, only the gradients of the specified inputs are returned; if False, the gradients of all leaf nodes are computed and accumulated into their respective .grad attributes.
allow_unused: defaults to False, meaning every specified input must be used when computing the outputs; otherwise an error is raised.
Note that this function returns a tuple.
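A minimal usage sketch (my own, assuming a simple scalar output):

import torch

x = torch.ones(2, requires_grad=True)
z = (x ** 2).sum()
(grad_x,) = torch.autograd.grad(outputs=z, inputs=x)   # returns a tuple
print(grad_x)
# >>> tensor([2., 2.])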
That is everything in "What are the autograd pitfalls in PyTorch?". Thank you for reading, and I hope the content has been helpful.