
What is the difference between eval and no_grad in PyTorch


This article introduces the difference between eval and no_grad in PyTorch. It should be a useful reference for interested readers, and I hope you learn something from it.

First of all, there is an essential difference between the two.

model.eval() tells every layer inside the model to work in eval mode. Its main purpose is to handle special layers, such as dropout and batchnorm, which need to behave differently in training mode and evaluation mode. It can be toggled at any point, during both training and testing.
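
As a quick illustration, here is a minimal sketch with a standalone Dropout layer (a toy example, not the full evaluation code below): in training mode it randomly zeroes activations, while in eval mode it is the identity.

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()
print(drop(x))   # roughly half the entries are zeroed; survivors are scaled by 1 / (1 - p) = 2
drop.eval()
print(drop(x))   # all ones: dropout does nothing in eval mode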

torch.no_grad() tells the autograd engine not to record operations for gradient computation. The point of this is to speed up computation and save memory. But because no gradients are produced, there is no way to call backward(), so it should only be enabled during testing.
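
A minimal sketch of what the context manager does to autograd (illustrative only):

import torch

x = torch.randn(3, requires_grad=True)

y = (x * 2).sum()
print(y.requires_grad)   # True: the multiply was recorded in the graph

with torch.no_grad():
    z = (x * 2).sum()
print(z.requires_grad)   # False: nothing was recorded
# z.backward() would raise a RuntimeError here, because z has no grad_fn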

So during evaluation, you need to use both at the same time.

import torch

model = ...
dataset = ...
loss_fun = ...

# training
lr = 0.001
model.train()
for x, y in dataset:
    model.zero_grad()
    p = model(x)
    l = loss_fun(p, y)
    l.backward()
    for param in model.parameters():   # manual SGD step
        param.data -= lr * param.grad

# evaluating
sum_loss = 0.0
model.eval()
with torch.no_grad():
    for x, y in dataset:
        p = model(x)
        l = loss_fun(p, y)
        sum_loss += l
print('total loss:', sum_loss)

In addition, no_grad can also be used as a function decorator to simplify the code.

def train(model, dataset, loss_fun, lr=0.001):
    model.train()
    for x, y in dataset:
        model.zero_grad()
        p = model(x)
        l = loss_fun(p, y)
        l.backward()
        for param in model.parameters():   # manual SGD step
            param.data -= lr * param.grad

@torch.no_grad()
def test(model, dataset, loss_fun):
    sum_loss = 0.0
    model.eval()
    for x, y in dataset:
        p = model(x)
        l = loss_fun(p, y)
        sum_loss += l
    return sum_loss

# main block:
model = ...
dataset = ...
loss_fun = ...

# training
train(model, dataset, loss_fun)

# test
sum_loss = test(model, dataset, loss_fun)
print('total loss:', sum_loss)

Addendum: the usage of model.train(), model.eval() and torch.no_grad() in PyTorch

1. model.train()

Enable BatchNormalization and Dropout

model.train() puts the model into training mode, so that dropout and batch normalization operate in their training behavior, which helps prevent the network from overfitting.
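
A minimal sketch with a toy model shows that train() and eval() simply flip a training flag on the module and propagate it to every submodule:

import torch.nn as nn

# toy model, purely for illustration
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5), nn.BatchNorm1d(4))

print(model.training)                          # True: modules start out in training mode
model.eval()
print([m.training for m in model.modules()])   # all False: eval() propagates to every submodule
model.train()
print(model[1].training)                       # True: the Dropout layer is back in training mode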

2. model.eval()

Do not enable BatchNormalization and Dropout

With model.eval(), PyTorch automatically fixes the BN and Dropout layers and uses the values learned during training. Otherwise, if the test batch_size is too small, the BN layer can easily cause severe color distortion in the generated images.

After training on the training samples, the resulting model is used on the test samples. Before calling model(test_input), you need to add model.eval(); otherwise, any input data fed to the model will update the BN running statistics (and therefore change the model's behavior) even though no training step is performed. This behavior comes from the batch normalization layers inside the model.
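
A minimal sketch with a standalone BatchNorm1d layer makes this concrete: a forward pass in training mode moves the running statistics, while in eval mode they stay frozen.

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(4)
x = torch.randn(16, 4) + 5.0     # data whose mean is far from zero

bn.train()
bn(x)                            # forward pass only: no backward(), no optimizer step
print(bn.running_mean)           # has already moved away from its initial zeros

before = bn.running_mean.clone()
bn.eval()
bn(x)                            # the same forward pass in eval mode
print(torch.equal(bn.running_mean, before))   # True: the statistics did not change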

The reason for behaving differently during training and testing can be understood from the following two paragraphs:

During training, BN calculates the mean and var within each batch. But because training proceeds mini-batch by mini-batch, the mean and var of each batch are accumulated with a momentum-style weighting: when the current batch is processed, its statistics get a weight of only 0.1, while everything accumulated from earlier batches keeps a weight of 0.9. The advantage of this is that a single strange batch cannot destabilize training.
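
In plain Python, the update rule looks roughly like this (a simplified sketch; PyTorch's default momentum for BatchNorm layers is 0.1, and the running variance is actually updated with the unbiased batch variance):

def update_running_stats(running_mean, running_var, batch_mean, batch_var, momentum=0.1):
    # each new batch contributes only `momentum` of the estimate; the accumulated
    # history keeps the remaining (1 - momentum), so one odd batch cannot dominate
    new_mean = (1 - momentum) * running_mean + momentum * batch_mean
    new_var = (1 - momentum) * running_var + momentum * batch_var
    return new_mean, new_var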

OK, now suppose training is finished. There is then a final running mean and var accumulated over the whole training set, and the parameters inside the BN layer have also been learned (if you asked them to be learned). But now you need to test, and at test time samples are often processed one at a time; there is no batch, so computing a mean and var over a single sample is meaningless. So what do you do? In fact, the mean and var that BN uses during testing are exactly the final running mean and var from training. You could also say that at test time BN is just a fixed transformation. So pay attention to this when using PyTorch: before training, call model.train() to tell the network that training mode is on, and during evaluation call model.eval() to tell the network to enter test mode, because BN behaves differently in these two modes.
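
Written out, the fixed transformation BN applies at test time is just a per-channel affine map built from the stored statistics and the learned parameters (a sketch, using BN's usual gamma/beta/eps notation):

import torch

def bn_inference(x, running_mean, running_var, gamma, beta, eps=1e-5):
    # at test time no batch statistics are computed; everything here is fixed
    return gamma * (x - running_mean) / torch.sqrt(running_var + eps) + beta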

3. torch.no_grad()

The purpose of this statement is to skip gradient computation during testing, which effectively reduces GPU memory usage at test time and helps avoid out-of-memory (OOM) errors.

This statement is usually wrapped around the code where the network makes predictions.

4. The difference between model.eval() and with torch.no_grad() in PyTorch

When you run validation in PyTorch, you use model.eval() to switch the model to test mode. It is mainly used to tell the dropout and batch normalization layers to switch between train and val behavior.

In train mode, the dropout layer randomly zeroes activation units according to the configured probability p (each unit is kept with probability 1 - p), and the batchnorm layer keeps calculating and updating statistics such as the mean and var.

In val mode, the dropout layer lets all activation units pass through, while the batchnorm layer stops calculating and updating mean and var and directly uses the mean and var values learned during the training phase.

This mode does not affect how gradients are computed for each layer; gradient computation and storage behave exactly as in training mode, it is just that no backpropagation is performed.

with torch.no_grad(), on the other hand, is mainly used to switch off the autograd engine in order to speed things up and save GPU memory. Concretely, it stops gradient computation, which saves GPU compute and memory, but it does not affect the behavior of the dropout and batchnorm layers.

Usage scenario

If you do not care about GPU memory or computation time, using model.eval() alone is enough to get correct validation results; with torch.no_grad() additionally speeds things up and saves GPU memory (because gradients no longer have to be computed and stored), so you can run faster and use a larger batch size for testing.

Thank you for reading this article carefully. I hope "What is the difference between eval and no_grad in PyTorch" has been helpful to you.
