The purpose of this article is to share an example-based analysis of how detach() is used to cut off backpropagation in a PyTorch network. The editor finds it very practical, so it is shared here as a reference; follow along and have a look.
The most common way to cut off backpropagation in a PyTorch network is to use detach().
Detach
This is how this method is introduced in the official documentation.
detach = _add_docstr(_C._TensorBase.detach, r"""
Returns a new Tensor, detached from the current graph.

The result will never require gradient.

.. note::

  Returned Tensor uses the same data tensor as the original one.
  In-place modifications on either of them will be seen, and may trigger
  errors in correctness checks.
""")
Returns a new Variable, detached from the current graph.
The returned Variable will never require a gradient.
If the Variable being detached has volatile=True, the detached Variable also has volatile=True.
One more note: the returned Variable and the Variable it was detached from point to the same underlying tensor.
import torch
from torch.nn import init

t1 = torch.tensor([1., 2.], requires_grad=True)
t2 = torch.tensor([2., 3.], requires_grad=True)
v3 = t1 + t2
v3_detached = v3.detach()
v3_detached.data.add_(t1)   # change the value of the tensor in v3_detached
print(v3, v3_detached)      # the value of the tensor in v3 changes as well
print(v3.requires_grad, v3_detached.requires_grad)
'''
tensor([4., 7.], grad_fn=<AddBackward0>) tensor([4., 7.])
True False
'''
In PyTorch this is done by copying the tensor at the position where the graph needs to be cut off. A tensor has two copying functions: clone() and copy_(). clone() makes a complete copy of the original tensor, and its gradient is replicated as well; in backpropagation the clone and the original are equivalent, so clone() can loosely be understood as giving the same tensor a different name, much like '='. So if you want to generate a new, independent tensor, use copy_().
However, for exactly this kind of operation PyTorch provides a dedicated function: detach().
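As a hedged illustration of the difference (a minimal sketch of my own, assuming PyTorch 0.4 or later): gradients flow back to the original tensor through clone(), while detach() blocks them.

import torch

a = torch.tensor([1., 2.], requires_grad=True)

# clone(): the copy stays in the graph, so the gradient flows back to a
a.clone().sum().backward()
print(a.grad)            # tensor([1., 1.])

a.grad = None            # reset the accumulated gradient

# detach(): the copy is cut out of the graph and requires no gradient
d = a.detach()
print(d.requires_grad)   # False
# d.sum().backward()     # would raise a RuntimeError, since d does not require grad
print(a.grad)            # None -- nothing flows back through detach()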
Nodes created by the user are leaf nodes (leaf_node; for example, the three nodes a, b and c in the figure). They do not depend on other variables, and in-place operations cannot be performed on a leaf_node. The root node is the final target of the computation graph (y in the figure); the gradient of every node with respect to the root node can be computed via the chain rule by calling root.backward().
So what detach() does is declare a new variable that points to the location of the original variable but has requires_grad=False. A deeper way to see it is that the computation graph is broken at the detached variable, which becomes a leaf_node: even if its requires_grad is later set back to True, no gradient will flow through it back into the original graph.
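A minimal sketch of this point (my own example, assuming a recent PyTorch version): the detached tensor is a leaf with requires_grad=False, and even after re-enabling requires_grad it accumulates its own gradient without anything reaching the original input.

import torch

x = torch.tensor([1., 2.], requires_grad=True)
y = x * 3
d = y.detach()
print(d.is_leaf, d.requires_grad)   # True False -- detach() produces a leaf

d.requires_grad_(True)              # re-enable gradients on the detached leaf
(d * 2).sum().backward()
print(d.grad)                       # tensor([2., 2.]) -- stops at the leaf
print(x.grad)                       # None -- the original graph never sees it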
PyTorch gradients
Since version 0.4, Tensor and Variable have been merged, and a tensor has grad, grad_fn and other attributes.
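A minimal sketch of these attributes (my own example, assuming PyTorch 0.4 or later):

import torch

t = torch.tensor([1., 2.], requires_grad=True)
s = t * 2
print(s.requires_grad)   # True -- inherited from t
print(s.grad_fn)         # <MulBackward0 object at ...> -- records how s was made
print(t.grad)            # None -- nothing accumulated until backward() runs
s.sum().backward()
print(t.grad)            # tensor([2., 2.])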
A newly created tensor has requires_grad=False by default. If a tensor's grad is None, no gradient is propagated back through it; if other branches do carry grad, only the gradients of those branches are propagated.
# Tensors are created with requires_grad=False by default
import torch
import torch.nn.functional as F

x = torch.ones(1)   # create a tensor with requires_grad=False (default)
print(x.requires_grad)
# out: False

# create another Tensor, also with requires_grad=False
y = torch.ones(1)

# both inputs have requires_grad=False, so does the output
z = x + y
# because both tensors x and y have requires_grad=False, automatic
# differentiation is not available, so z = x + y also has requires_grad=False
print(z.requires_grad)
# out: False

# then autograd won't track this computation. Let's verify!
# calling backward() on z therefore raises an error:
# z.backward()
# out: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

# now create a tensor with requires_grad=True
w = torch.ones(1, requires_grad=True)
print(w.requires_grad)
# out: True

# add it to the previous result, which has requires_grad=False; because w
# has requires_grad=True, the result of the operation supports
# backpropagation and automatic differentiation
total = w + z
# the total sum now requires grad!
print(total.requires_grad)
# out: True

# autograd can compute the gradients as well
total.backward()
print(w.grad)
# out: tensor([1.])

# and no computation is wasted to compute gradients for x, y and z,
# which don't require grad
z.grad == x.grad == y.grad == None
# True

# nn.Parameter
# With square kernels and equal stride
# (example shapes, as in the F.conv2d documentation example)
filters = torch.randn(8, 4, 3, 3)
weights = torch.nn.Parameter(torch.randn(8, 4, 3, 3))
inputs = torch.randn(1, 4, 5, 5)
out = F.conv2d(inputs, weights, stride=2, padding=1)
print(out.shape)

conv2d = torch.nn.Conv2d(4, 8, kernel_size=3, stride=2, padding=1)
out_2 = conv2d(inputs)
print(out_2.shape)
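The point of the nn.Parameter part, as a hedged follow-up sketch of my own: wrapping a tensor in torch.nn.Parameter gives it requires_grad=True by default, which is why module weights take part in backpropagation without any extra setup.

import torch

w = torch.nn.Parameter(torch.randn(3))
print(w.requires_grad)              # True -- Parameters require grad by default

layer = torch.nn.Conv2d(4, 8, kernel_size=3)
print(layer.weight.requires_grad)   # True -- module parameters as well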
Supplement: usage of PyTorch detach()
Objective:
When training a neural network, we sometimes want to keep some of the network parameters unchanged and adjust only the rest.
Or we want to train only part of a branch network so that its gradient does not affect the gradient of the main network. In these cases we need the detach() function to cut off backpropagation along certain branches; a minimal sketch of this pattern follows.
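Here is that sketch (my own example; the names backbone and aux_head are hypothetical, and a recent PyTorch version is assumed). The auxiliary branch is trained on detached features, so its loss never sends gradient back into the main network.

import torch
import torch.nn as nn

backbone = nn.Linear(4, 8)   # main network (hypothetical)
aux_head = nn.Linear(8, 2)   # auxiliary branch (hypothetical)

x = torch.randn(5, 4)
features = backbone(x)

# detach() cuts the graph here: the auxiliary loss can update aux_head,
# but no gradient from this branch reaches the backbone
aux_loss = aux_head(features.detach()).sum()
aux_loss.backward()

print(backbone.weight.grad)  # None -- untouched by the auxiliary branch
print(aux_head.weight.grad)  # populated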
1 tensor.detach()
It returns a new tensor, detached from the current computation graph. The new tensor still points to the original tensor's storage, but its requires_grad is False. The returned tensor never requires gradient computation and does not have grad.
Even if its requires_grad is later set back to True, it will not get a grad from the original graph. We can then keep computing with this new tensor, and when we later run backpropagation, it stops at the tensor on which detach() was called and cannot propagate any further.
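A minimal sketch of where backpropagation stops (my own example, assuming a recent PyTorch version): a result built only from the detached tensor cannot call backward() at all, since nothing in its history requires grad.

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
out = a.sigmoid()
c = out.detach()          # c.requires_grad is False
loss = c.sum()            # built only from the detached tensor
print(loss.requires_grad) # False
# loss.backward()         # would raise: RuntimeError: element 0 of tensors
#                         # does not require grad and does not have a grad_fn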
Note:
The tensor returned by detach() shares memory with the original tensor: if one of them is modified, the other changes accordingly.
First, for reference, a normal case without detach():
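As a hedged illustration of this note (my own example, assuming a recent PyTorch version), an in-place change to the detached tensor is visible through the original and trips autograd's correctness check when backward() is called:

import torch

a = torch.tensor([1., 2., 3.], requires_grad=True)
out = a.sigmoid()
c = out.detach()
c.zero_()               # in-place modification, also visible through out
print(out)              # tensor([0., 0., 0.], grad_fn=<SigmoidBackward>)
# out.sum().backward()  # would raise: RuntimeError: one of the variables
#                       # needed for gradient computation has been modified
#                       # by an inplace operation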
import torch

a = torch.tensor([1, 2, 3.], requires_grad=True)
print(a)
print(a.grad)
out = a.sigmoid()
out.sum().backward()
print(a.grad)
Output:
tensor([1., 2., 3.], requires_grad=True)
None
tensor([0.1966, 0.1050, 0.0452])
1.1 When detach() is used to detach a tensor but the tensor is not modified, backward() is not affected:
import torch

a = torch.tensor([1, 2, 3.], requires_grad=True)
print(a.grad)
out = a.sigmoid()
print(out)

# add detach(); the requires_grad of c is False
c = out.detach()
print(c)

# c has not been modified at this point, so backward() is not affected
out.sum().backward()
print(a.grad)

'''
return:
None
tensor([0.7311, 0.8808, 0.9526], grad_fn=<SigmoidBackward>)
tensor([0.7311, 0.8808, 0.9526])
tensor([0.1966, 0.1050, 0.0452])
'''

Thank you for reading! This is the end of the article "Example Analysis of How detach Cuts off Backpropagation in PyTorch". I hope the content above has been of some help and lets you learn a bit more. If you think the article is good, feel free to share it so more people can see it!