This article shows how to use gradient clipping in PyTorch to avoid NaN values in the training loss. The content is straightforward and easy to follow; I hope it helps resolve your doubts.
An example of how it is used in training code:

from torch.nn.utils import clip_grad_norm_

outputs = model(data)
loss = loss_fn(outputs, target)
optimizer.zero_grad()
loss.backward()
# clip the gradient before the optimizer step
clip_grad_norm_(model.parameters(), max_norm=20, norm_type=2)
optimizer.step()
Here, max_norm is the maximum allowed norm of the gradients and is the main parameter to set when clipping.
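Incidentally, clip_grad_norm_ returns the total gradient norm before clipping, so printing it for a few iterations shows what magnitudes your network actually produces and helps pick max_norm. Below is a minimal, self-contained sketch; the toy linear model and random data are only for illustration and are not from the training code above.

import torch
from torch import nn
from torch.nn.utils import clip_grad_norm_

# toy setup purely for illustration
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data, target = torch.randn(4, 10), torch.randn(4, 1)

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(data), target)
loss.backward()
# clip_grad_norm_ returns the total gradient norm *before* clipping,
# which is handy for deciding what max_norm should be
total_norm = clip_grad_norm_(model.parameters(), max_norm=20, norm_type=2)
print(f"gradient norm before clipping: {total_norm.item():.4f}")
optimizer.step()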
Note: some people online have warned that gradient clipping greatly increases training time. I have not run into this problem in my own network training so far; I will update this post if I do.
Addendum: how to think about NaN appearing during PyTorch training
1. The most common cause is dividing by 0 or taking log(0).
Check whether the code adds a small constant in such operations; it should be several orders of magnitude smaller than the values being operated on, typically 1e-8.
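As a concrete sketch of this pattern (eps is a hypothetical name, not from the article):

import torch

eps = 1e-8                       # several orders of magnitude below the data
x = torch.rand(4)                # values in [0, 1), possibly exactly 0

ratio = x / (x.sum() + eps)      # guard the division against a zero denominator
log_x = torch.log(x + eps)       # guard the log against log(0) = -inf

print(ratio, log_x)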
2. Clip the gradient before optim.step():

optim.zero_grad()
loss.backward()
nn.utils.clip_grad_norm_(model.parameters(), max_norm, norm_type=2)
optim.step()
max_norm is usually 1, 3, or 5.
3. If the first two steps do not fix the NaN problem, use the following procedure to narrow it down.
...
loss = model(input)
# 1. First check whether loss is nan. If it is, a division by 0 or log(0)
#    has probably occurred somewhere in the forward pass.
assert torch.isnan(loss).sum() == 0, print(loss)

optim.zero_grad()
loss.backward()
# 2. If loss is not nan, the forward pass is fine; the problem may be a
#    gradient explosion, so try clipping before stepping.
nn.utils.clip_grad_norm_(model.parameters(), max_norm, norm_type=2)

# 3.1 Before step, check whether the parameter is nan; if not, check again after step.
assert torch.isnan(model.mu).sum() == 0, print(model.mu)
optim.step()
# 3.2 After step, check whether the parameter and its gradient are nan.
#     If 3.1 is not nan but 3.2 is, and especially if the NaN appears in the
#     gradient, consider whether the learning rate is too large: reduce it
#     or try another optimizer.
assert torch.isnan(model.mu).sum() == 0, print(model.mu)
assert torch.isnan(model.mu.grad).sum() == 0, print(model.mu.grad)

That is all of "How to avoid NaN training loss with PyTorch gradient clipping". Thank you for reading! I hope the content shared here helps you.
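As an extra diagnostic that is not part of the procedure above, PyTorch's built-in anomaly detection (torch.autograd.set_detect_anomaly) makes the backward pass raise an error pointing back at the forward operation whose backward produced NaN. It slows training noticeably, so only enable it while debugging. A minimal sketch with toy tensors:

import torch

torch.autograd.set_detect_anomaly(True)  # debugging only; adds overhead

x = torch.tensor([0.0], requires_grad=True)
y = torch.sqrt(x)        # forward value is finite (0.0)
loss = (y * y).sum()     # still finite

# The backward of sqrt computes grad_output / (2 * sqrt(x)) = 0 / 0 = nan,
# so anomaly mode raises an error whose traceback points at torch.sqrt above.
loss.backward()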