
How to use Schedule and warmup_steps in pytorch


This article mainly introduces how to use Schedule and warmup_steps in PyTorch. It has some reference value, and interested readers may find it useful; I hope you learn a lot from reading it. Below, the editor takes you through it.

1. Lr_scheduler related

lr_scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=num_train_optimization_steps)

Here, args.warmup_steps can be regarded as a patience coefficient.

num_train_optimization_steps is the total number of updates to the model parameters.

Generally speaking:

num_train_optimization_steps = int(total_train_examples / args.train_batch_size / args.gradient_accumulation_steps)
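
Putting these pieces together, a minimal sketch might look like the following. It assumes the older pytorch_transformers package (which provides WarmupLinearSchedule); the dataset size, batch size, accumulation steps, warmup steps, model, and optimizer are made-up placeholders.

# Sketch only: WarmupLinearSchedule is assumed to come from the older
# pytorch_transformers package; all values below are placeholders.
import torch
from pytorch_transformers import WarmupLinearSchedule

total_train_examples = 10000        # assumed dataset size
train_batch_size = 32               # stands in for args.train_batch_size
gradient_accumulation_steps = 2     # stands in for args.gradient_accumulation_steps
warmup_steps = 100                  # stands in for args.warmup_steps

# total number of parameter updates over the whole training run
num_train_optimization_steps = int(
    total_train_examples / train_batch_size / gradient_accumulation_steps)

model = torch.nn.Linear(10, 2)      # dummy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
lr_scheduler = WarmupLinearSchedule(
    optimizer, warmup_steps=warmup_steps, t_total=num_train_optimization_steps)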

The schedule adjusts the learning rate. For the linear adjustment case, step in the code below is the current iteration number.

def lr_lambda(self, step):
    # Linear transformation: returns a multiplier x, which is passed back to LambdaLR,
    # so the final learning rate becomes initial_lr * x
    if step < self.warmup_steps:
        # increase the learning rate
        return float(step) / float(max(1, self.warmup_steps))
    # decrease the learning rate
    return max(0.0, float(self.t_total - step) / float(max(1.0, self.t_total - self.warmup_steps)))

In practice, lr_scheduler.step() initializes lr to 0. At the first parameter update (step=1), lr goes from 0 to the initial value initial_lr. At the second update (step=2), the code above produces a real number alpha, and the new lr = initial_lr * alpha. At the third update, the new lr is again generated on the basis of initial_lr, that is, new lr = initial_lr * alpha.

Again, warmup_steps can be regarded as the patience coefficient of the lr adjustment.

Because of warmup_steps, lr first increases slowly, and then decreases slowly once the step count exceeds warmup_steps.

In practice, the gradients computed on the training data may point in a direction opposite to the expected one at the very start of training, so a smaller lr is used at that stage. The lr then grows linearly with the iteration count, at a rate of 1/warmup_steps. When the iteration count equals warmup_steps, the learning rate equals the initially configured learning rate. Once the iteration count exceeds warmup_steps, the learning rate gradually decays at a rate of 1/(t_total - warmup_steps), fine-tuning the model.
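
To see this behaviour concretely, here is a small self-contained sketch that reproduces the same warmup-then-linear-decay rule with torch.optim.lr_scheduler.LambdaLR; the values of warmup_steps, t_total, and the initial lr of 0.1 are made up so the printout stays short.

import torch

warmup_steps = 4     # assumed small value for the demo
t_total = 10         # assumed total number of updates

def lr_lambda(step):
    # multiplier on the initial lr: ramps 0 -> 1 during warmup, then decays 1 -> 0
    if step < warmup_steps:
        return float(step) / float(max(1, warmup_steps))
    return max(0.0, float(t_total - step) / float(max(1.0, t_total - warmup_steps)))

model = torch.nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # initial_lr = 0.1
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

for step in range(1, t_total + 1):
    optimizer.step()      # parameter update for this mini-batch (backward pass omitted)
    scheduler.step()      # lr rises until warmup_steps is reached, then decays linearly
    print(step, scheduler.get_last_lr())

Running this prints an lr that climbs from 0.025 to 0.1 at step 4 and then falls back toward 0.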

2. Gradient_accumulation_steps related

gradient_accumulation_steps addresses insufficient local GPU memory by accumulating gradients.

Suppose the original batch_size = 6, the total number of samples is 24, and gradient_accumulation_steps = 2.

Then the number of parameter updates = 24 / 6 = 4.

Now reduce the batch size to 6 / 2 = 3; the number of parameter updates stays the same: 24 / 3 / 2 = 4.

During backpropagation, the gradients are computed with loss.backward() as usual, but the parameters are only updated once every gradient_accumulation_steps mini-batches.
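
A minimal sketch of such a loop, with a made-up model and a fake dataset matching the 24-sample example above:

import torch

gradient_accumulation_steps = 2
model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = torch.nn.MSELoss()

# fake dataset: 24 samples, mini-batches of 3 (the reduced batch_size above)
data = torch.randn(24, 10)
target = torch.randn(24, 1)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(data, target), batch_size=3)

optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y)
    # scale the loss so the accumulated gradient matches a single larger batch
    loss = loss / gradient_accumulation_steps
    loss.backward()                          # gradients keep accumulating in .grad
    if (step + 1) % gradient_accumulation_steps == 0:
        optimizer.step()                     # one parameter update every 2 mini-batches
        optimizer.zero_grad()

With 24 samples, batch_size 3, and gradient_accumulation_steps 2, this performs exactly the 4 parameter updates worked out above.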

Supplement: PyTorch study notes - optimizer.step() and scheduler.step()

The difference between optimizer.step() and scheduler.step()

optimizer.step() is usually called once per mini-batch, while scheduler.step() is usually called once per epoch, but this is not absolute and can be arranged according to your specific needs. The model parameters are only updated when optimizer.step() is called; scheduler.step() only adjusts the lr.

Usually we write:

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)
model = net.train(model, loss_function, optimizer, scheduler, num_epochs=100)

The step_size in the scheduler means that after scheduler.step() has been called step_size times, the learning rate is adjusted according to the policy.

So if scheduler.step() is placed inside the mini-batch loop, step_size means the learning rate changes after that many iterations.
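
As an assumed illustration of that placement, the author's net.train helper above is replaced here by a plain training loop with made-up data, calling optimizer.step() once per mini-batch and scheduler.step() once per epoch:

import torch
from torch import optim
from torch.optim import lr_scheduler

model = torch.nn.Linear(10, 1)
loss_function = torch.nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)

# fake dataset for illustration
data = torch.randn(32, 10)
target = torch.randn(32, 1)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(data, target), batch_size=8)

for epoch in range(100):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_function(model(x), y)
        loss.backward()
        optimizer.step()      # parameter update once per mini-batch
    scheduler.step()          # counted once per epoch: lr is multiplied by 0.1 after 100 calls

If scheduler.step() were moved inside the inner loop instead, the same step_size=100 would mean the lr decays after 100 mini-batches rather than after 100 epochs.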

Thank you for reading this article carefully. I hope the article "how to use Schedule and warmup_steps in pytorch" has been helpful to you.
