How PyTorch automatically calculates gradients

2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how PyTorch calculates gradients automatically. It should be a useful reference for interested readers, so let's walk through it.

1. Concept

Tensor is the core class of PyTorch's automatic differentiation. If you set its attribute .requires_grad=True, PyTorch begins to track all operations on that tensor, so that gradients can be propagated with the chain rule. When the computation is complete, you can call .backward() to compute all the gradients. The gradient of the tensor is accumulated into its .grad attribute.

If you do not want a tensor to be tracked any further, call .detach() to separate it from the tracking history, so that no gradient flows back through it. In addition, you can wrap code in a with torch.no_grad(): block when you do not want its operations to be tracked. This is common when evaluating a model, because gradients are not needed there.

Function is another important class. Tensor and Function together build a directed acyclic graph (DAG) that records the whole computation. Each Tensor has a .grad_fn attribute, which refers to the Function that created it. If the Tensor was produced by an operation, grad_fn returns an object associated with that operation; otherwise it is None.

2. Implementation

2.1. Create a tensor that supports automatic differentiation

First, let's create a tensor with requires_grad=True:

    import torch

    x = torch.ones(2, 2, requires_grad=True)
    print(x)
    print(x.grad_fn)

Output:

    tensor([[1., 1.],
            [1., 1.]], requires_grad=True)
    None

A tensor created directly like x is called a leaf node, and the grad_fn corresponding to a leaf node is None. If you perform an operation:

    y = x + 1
    print(y)
    print(y.grad_fn)
    '''
    tensor([[2., 2.],
            [2., 2.]], grad_fn=<AddBackward0>)
    <AddBackward0 object at 0x...>
    '''

y is created by an addition operation, so it has a grad_fn for that operation.
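The difference between a leaf tensor and a computed tensor can also be checked with the is_leaf attribute. The following is a minimal sketch that rebuilds x and y so that it runs on its own:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # created directly by the user
y = x + 1                                 # produced by an operation

# A leaf tensor has no grad_fn; a computed tensor does.
print(x.is_leaf, x.grad_fn is None)
print(y.is_leaf, y.grad_fn is None)
```

The first line reports a leaf without a grad_fn, the second a non-leaf that has one.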

Try to do something more complex:

    z = y ** 2
    out = z.mean()
    print(z, out)
    '''
    tensor([[4., 4.],
            [4., 4.]], grad_fn=<PowBackward0>) tensor(4., grad_fn=<MeanBackward0>)
    '''

The out above is a scalar with value 4. For a scalar, you can call out.backward() directly, without passing the gradient argument that non-scalar outputs need. This is explained in detail later.
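The chain of operations recorded for out can be inspected by following grad_fn backwards. This is only a sketch for illustration: it follows the first input of each node through the next_functions attribute, and the printed node names depend on the PyTorch version.

```python
import torch

x = torch.ones(2, 2, requires_grad=True)
out = ((x + 1) ** 2).mean()

# Follow the first input of each recorded operation back towards x.
node = out.grad_fn
names = []
while node is not None:
    names.append(node.name())
    # next_functions is a tuple of (node, input_index) pairs; it is empty
    # for the node that accumulates the gradient into the leaf tensor.
    node = node.next_functions[0][0] if node.next_functions else None

print(" -> ".join(names))
```

The printed chain runs from the mean, through the power and the addition, to the node that accumulates the gradient into x.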

You can also change the requires_grad attribute in place with .requires_grad_():

    a = torch.randn(3, 2)   # requires_grad defaults to False
    a = (a ** 2)
    print(a.requires_grad)  # False
    a.requires_grad_(True)  # change the attribute in place
    print(a.requires_grad)  # True
    b = (a * a).sum()
    print(b.grad_fn)
    '''
    False
    True
    <SumBackward0 object at 0x...>
    '''

2.2. Gradient calculation

torch.autograd implements the chain rule for gradients. At its core it computes products of Jacobian matrices, that is, products of the first derivatives of the composed functions.

Note: gradients are accumulated during backpropagation. Every call to backward() adds the new gradient to the one already stored, so you generally need to clear it, for example with x.grad.data.zero_(), before backpropagating again.

    x = torch.ones(2, 2, requires_grad=True)
    y = x + 1
    z = y ** 2
    out = z.mean()
    print(z, out)
    out.backward()
    print(x.grad)

    # Note that grad is cumulative
    out2 = x.sum()
    out2.backward()
    print(out2)
    print(x.grad)

    out3 = x.sum()
    x.grad.data.zero_()  # clear the accumulated gradient first
    out3.backward()
    print(out3)
    print(x.grad)
    '''
    tensor([[4., 4.],
            [4., 4.]], grad_fn=<PowBackward0>) tensor(4., grad_fn=<MeanBackward0>)
    tensor([[1., 1.],
            [1., 1.]])
    tensor(4., grad_fn=<SumBackward0>)
    tensor([[2., 2.],
            [2., 2.]])
    tensor(4., grad_fn=<SumBackward0>)
    tensor([[1., 1.],
            [1., 1.]])
    '''
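Accumulation matters most in iterative loops, where backward() is called once per step. The following is a minimal sketch of plain gradient descent. The objective (w - 3)^2, the learning rate and the number of steps are assumptions chosen only for illustration.

```python
import torch

# Hypothetical objective: f(w) = (w - 3)^2, which is minimised at w = 3.
w = torch.tensor(0.0, requires_grad=True)
lr = 0.1  # assumed learning rate

for step in range(100):
    loss = (w - 3) ** 2
    if w.grad is not None:
        w.grad.zero_()     # clear the gradient left over from the previous step
    loss.backward()        # w.grad now holds d(loss)/dw for this step only
    with torch.no_grad():  # the parameter update itself should not be tracked
        w -= lr * w.grad

print(w.item())
```

Without the zero_() call, each step would use the sum of all earlier gradients instead of the current one.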

Automatic differentiation is very convenient when the output is a scalar, as with out.backward() above. When the tensor y being backpropagated is not a scalar, you must pass a tensor of the same shape as y to y.backward(). PyTorch does not allow differentiating a tensor with respect to a tensor; it only allows differentiating a scalar with respect to a tensor, and the result has the same shape as the independent variable.

The extra argument avoids differentiating a vector (or higher-dimensional tensor) with respect to a tensor: it acts as weights that reduce the output to a scalar, which is then differentiated with respect to the tensor.
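One way to see this reduction is to compare the two routes directly. In the sketch below the weights w and the function x ** 2 are assumptions chosen for illustration; passing w to backward() gives the same gradient as taking the weighted sum first and then calling backward() with no argument.

```python
import torch

w = torch.tensor([1.0, 0.1, 0.01])  # hypothetical weights, same shape as y

# Route 1: pass the weights to backward()
x1 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y1 = x1 ** 2
y1.backward(w)

# Route 2: reduce to a scalar first, then call backward() with no argument
x2 = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y2 = x2 ** 2
(y2 * w).sum().backward()

print(x1.grad)
print(x2.grad)
print(torch.allclose(x1.grad, x2.grad))
```

Both routes give 2 * x * w for each element.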

    x = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)
    y = 2 * x
    z = y.view(2, 2)
    print(z)
    '''
    tensor([[2., 4.],
            [6., 8.]], grad_fn=<ViewBackward0>)
    '''

Tensor z above is clearly not a scalar, so when you call z.backward() you need to pass in a weight tensor of the same shape as z; the weighted sum is the scalar that actually gets differentiated.

    c = torch.tensor([[1.0, 0.1], [0.01, 0.001]], dtype=torch.float)
    z.backward(c)
    print(x.grad)
    '''
    tensor([2.0000, 0.2000, 0.0200, 0.0020])
    '''

2.3. Stop gradient tracking

We can use .detach() or a with torch.no_grad(): block to stop gradient tracking:

    x = torch.tensor(1.0, requires_grad=True)
    y1 = x ** 2
    with torch.no_grad():
        y2 = x ** 3
    y3 = y1 + y2

    print(x.requires_grad)
    print(y1, y1.requires_grad)  # True
    print(y2, y2.requires_grad)  # False
    print(y3, y3.requires_grad)  # True
    '''
    True
    tensor(1., grad_fn=<PowBackward0>) True
    tensor(1.) False
    tensor(2., grad_fn=<AddBackward0>) True
    '''

We try to calculate the gradient:

    y3.backward()
    print(x.grad)
    # y2.backward()  # this would raise an error: y2 does not require grad, so backward() cannot be called on it
    '''
    tensor(2.)
    '''

The result is 2 because y2 was computed inside torch.no_grad(), so no gradient flows back through it. Only y1 = x ** 2 contributes, giving 2x = 2 at x = 1; with y2 tracked the result would have been 2x + 3x ** 2 = 5.
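The two cases can be compared side by side. This is a minimal sketch that uses .detach() in place of the no_grad block, which stops the gradient in the same way for this example:

```python
import torch

x = torch.tensor(1.0, requires_grad=True)

# Both terms tracked: dy/dx = 2x + 3x^2
full = x ** 2 + x ** 3
full.backward()
grad_full = x.grad.clone()

# Cubic term detached: only x^2 contributes, dy/dx = 2x
x.grad.zero_()
partial = x ** 2 + (x ** 3).detach()
partial.backward()
grad_partial = x.grad.clone()

print(grad_full, grad_partial)
```

At x = 1 the first gradient is 5 and the second is 2, matching the calculation above.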

2.4. Modify the value of tensor

If we want to change the value of a tensor without the change being recorded by autograd (x.data has requires_grad = False), that is, without affecting the ongoing backpropagation, we can operate on tensor.data. However, be careful: changing values this way can cause problems, for example when values are overwritten with 0, because autograd will not notice the change.

    x = torch.ones(1, requires_grad=True)
    print(x.data)                # still a tensor
    print(x.data.requires_grad)  # False: it is independent of the computation graph
    y = 2 * x
    x.data *= 100  # only changes the value; not recorded in the graph, so gradient propagation is unaffected
    y.backward()
    print(x)       # changing data also changes the value of the tensor
    print(x.grad)

PyTorch 0.4 and later still keep .data, but the official documentation recommends .detach(). With x.detach(), an in-place change to a value that backward needs causes backward to report an error, so .detach() is a safer way to exclude subgraphs from gradient calculations.

For example:

    a = torch.tensor([1, 2, 3.], requires_grad=True)
    out = a.sigmoid()
    c = out.detach()
    c.zero_()             # zeroes the values in place
    print(out)            # out was modified by c.zero_(): its values are now all 0
    out.sum().backward()  # needs the original value of out, which was overwritten by c.zero_()
    '''
    RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation
    '''

    a = torch.tensor([1, 2, 3.], requires_grad=True)
    out = a.sigmoid()
    c = out.data
    c.zero_()             # zeroes the values in place
    print(out)            # out was modified by c.zero_(): its values are now all 0
    out.sum().backward()
    a.grad                # no error is raised, but out was changed, so the computed gradient is wrong
    '''
    tensor([0., 0., 0.])
    '''
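The different behaviour comes from how in-place changes are detected. As a hedged illustration, the sketch below reads the tensor's _version attribute, an internal counter that autograd uses for this check; it is an implementation detail and may change between releases.

```python
import torch

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

out = a.sigmoid()
before = out._version
out.detach().zero_()  # the detached tensor shares out's version counter
bumped_detach = out._version - before

out = a.sigmoid()
before = out._version
out.data.zero_()      # .data does not share the version counter
bumped_data = out._version - before

print(bumped_detach, bumped_data)
```

A counter that moved tells backward() that a saved value is stale, which is why the .detach() route raises an error while the .data route silently yields a wrong gradient.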

Supplement: how PyTorch calculates derivatives (automatic gradient calculation with autograd)

Deep learning is essentially an optimization problem: finding the minimum of a loss. Because there are so many independent variables, finding the minimum is very difficult, so many optimization methods exist, and gradient descent is a typical one. This part explains automatic gradient calculation in Python's PyTorch library in detail.

Tensor

A tensor in PyTorch can be used to store a scalar or a vector.

    torch.tensor(1)    # scalar
    torch.tensor([1])  # vector with one element

A tensor can also specify its data type and where the data is stored (for example in video memory, for hardware acceleration).

    torch.tensor([1, 2], dtype=torch.float64)

Gradient

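In mathematics, the gradient is defined as follows (written here for a function f of three variables x, y, z, with unit vectors i, j, k):

$$\nabla f = \frac{\partial f}{\partial x}\,\mathbf{i} + \frac{\partial f}{\partial y}\,\mathbf{j} + \frac{\partial f}{\partial z}\,\mathbf{k}$$

That is, each partial derivative of the dependent variable with respect to an independent variable is multiplied by the corresponding unit vector, and the terms are added to give the gradient vector.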

In PyTorch, we do not define the function symbolically, nor can we obtain a closed-form expression for the gradient vector. Instead, we obtain the values of the partial derivatives of the function with respect to the independent variables at a given point.

Take a function of one variable: y = x^2 + 3x + 1. In PyTorch, with x = 2:

    >>> x = torch.tensor(2, dtype=torch.float64, requires_grad=True)
    >>> y = x * x + 3 * x + 1
    >>> y.backward()
    >>> x.grad
    tensor(7., dtype=torch.float64)

As you can see, the derivative of y with respect to x at x = 2 is 7. To verify this mathematically:

y' = 2x + 3, so at x = 2, y' = 2 * 2 + 3 = 7, which agrees exactly with the gradient computed automatically by torch.
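The same check can be done numerically. The sketch below compares the autograd result with a central finite difference; the helper f and the step size h are assumptions introduced for illustration.

```python
import torch

def f(x):
    # the function from the text: y = x^2 + 3x + 1
    return x * x + 3 * x + 1

x = torch.tensor(2.0, dtype=torch.float64, requires_grad=True)
f(x).backward()  # autograd derivative, stored in x.grad

h = 1e-6  # assumed step size
with torch.no_grad():
    numeric = (f(x + h) - f(x - h)) / (2 * h)  # central difference

print(x.grad.item(), numeric.item())
```

Both values should be 7 up to floating-point rounding.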

Next, let's see what happens with a function of two variables:

    >>> x1 = torch.tensor(1.0, requires_grad=True)
    >>> x2 = torch.tensor(2.0, requires_grad=True)
    >>> y = 3 * x1 * x1 + 9 * x2
    >>> y.backward()
    >>> x1.grad
    tensor(6.)
    >>> x2.grad
    tensor(9.)

As you can see, we obtain the partial derivative of y with respect to x2, which is 9.
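If the partial derivatives are wanted as return values rather than in the .grad attributes, torch.autograd.grad can be used. A minimal sketch, assuming both inputs require gradients:

```python
import torch

x1 = torch.tensor(1.0, requires_grad=True)
x2 = torch.tensor(2.0, requires_grad=True)
y = 3 * x1 * x1 + 9 * x2

# Returns one gradient per input, in order, without writing to .grad
dy_dx1, dy_dx2 = torch.autograd.grad(y, [x1, x2])

print(dy_dx1, dy_dx2)    # 6 * x1 and 9
print(x1.grad, x2.grad)  # both stay None
```

This avoids the accumulation into .grad described earlier, which can be convenient for one-off checks.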

The examples above use scalars. Next, we discuss the case where the independent variables are vectors.

    >>> mat1 = torch.tensor([[1, 2, 3]], dtype=torch.float64, requires_grad=True)
    >>> mat2 = torch.tensor([[1], [2], [3]], dtype=torch.float64, requires_grad=True)
    >>> mat2
    tensor([[1.],
            [2.],
            [3.]], dtype=torch.float64, requires_grad=True)

mat1 is a 1x3 matrix, mat2 is a 3x1 matrix, and their matrix product is a 1x1 matrix. In PyTorch, you can call backward on it directly to get the gradients with respect to mat1 and mat2.

    >>> y = torch.mm(mat1, mat2)
    >>> y.backward()
    >>> mat1.grad
    tensor([[1., 2., 3.]], dtype=torch.float64)
    >>> mat2.grad
    tensor([[1.],
            [2.],
            [3.]], dtype=torch.float64)

In fact, every element of mat1 can be regarded as an independent variable, so the gradient with respect to mat1 consists of the partial derivatives with respect to its three elements.

The product is equivalent to y = mat1[0, 0] * mat2[0, 0] + mat1[0, 1] * mat2[1, 0] + mat1[0, 2] * mat2[2, 0]

The partial derivative of y is then computed for each element of mat1 and mat2.
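The element-wise form can be differentiated directly to confirm this. In the sketch below, y is built term by term instead of with torch.mm, and each gradient is compared with the transpose of the other matrix, which is what the partial derivatives of the sum work out to.

```python
import torch

mat1 = torch.tensor([[1, 2, 3]], dtype=torch.float64, requires_grad=True)
mat2 = torch.tensor([[1], [2], [3]], dtype=torch.float64, requires_grad=True)

# Same value as torch.mm(mat1, mat2), written element by element
y = sum(mat1[0, i] * mat2[i, 0] for i in range(3))
y.backward()

# d y / d mat1[0, i] = mat2[i, 0], and d y / d mat2[i, 0] = mat1[0, i]
print(torch.equal(mat1.grad, mat2.detach().T))
print(torch.equal(mat2.grad, mat1.detach().T))
```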

In addition, if the final output is an N x M tensor rather than a scalar and we want its derivative with respect to the independent variables, we need to pass a gradient argument of the same shape to the backward function.

In fact, the core of PyTorch's autograd is computing a vector-Jacobian product. The Jacobian is the matrix of partial derivatives of a dependent-variable vector with respect to an independent-variable vector. The vector is the gradient of some scalar function with respect to the dependent-variable vector. Their product is the gradient of that scalar with respect to the independent-variable vector.
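The vector-Jacobian product can be made concrete by building the Jacobian explicitly and comparing. The sketch below uses torch.autograd.functional.jacobian; the function f, the input x and the vector v are assumptions chosen for illustration.

```python
import torch
from torch.autograd.functional import jacobian

def f(x):
    # hypothetical vector-valued function from R^3 to R^2
    return torch.stack([x[0] * x[1], x[1] + x[2] ** 2])

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
v = torch.tensor([1.0, 0.5])  # assumed gradient of a scalar with respect to f(x)

J = jacobian(f, x.detach())   # full Jacobian, shape (2, 3)
f(x).backward(v)              # autograd computes the product without forming J

print(J)
print(v @ J)
print(x.grad)
```

The last two lines print the same values, which shows that backward(v) returns v multiplied by the Jacobian.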
