This article shows how to deploy a half-precision (FP16) model with PyTorch and, as an addendum, how to speed up training with torch.cuda.amp.autocast. I hope you find it useful.
Background
As a computing framework for deep learning, PyTorch is being used in more and more places.
Besides the training stage, we have recently started using PyTorch for deployment as well.
To reduce the amount of computation, a 16-bit floating-point model can be used at deployment time, whereas training involves gradient computation and therefore requires 32-bit floats. We tested this precision mismatch, and the resulting drop in model quality is limited and acceptable.
At inference time, however, the amount of computation is roughly halved, so concurrency can nearly double on the same compute resources.
Concrete method
In PyTorch, model definitions generally inherit from the torch.nn.Module base class, which provides a half() method that converts all parameters to 16-bit floats. So after the model is loaded, call half() to switch the model over, and then cast the input tensors to 16-bit floats at inference time.
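As a minimal sketch of the idea (the Net class, the checkpoint path "model.pt", and the input shape are placeholders, not from the original article):

import torch

# Net, "model.pt" and the input shape below are hypothetical placeholders
model = Net()
model.load_state_dict(torch.load("model.pt"))
model = model.half().cuda().eval()   # half() converts all parameters to float16

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224).half().cuda()   # inputs must be float16 too
    y = model(x)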
One additional trick: during inference the model's output tensors will naturally be 16-bit floats. If you need to create a new tensor, prefer the existing tensor's new_zeros, new_full, and similar methods over torch.zeros and torch.full. The former automatically inherit the dtype (and device) of the existing tensor, so you do not need to sprinkle 16-bit-vs-32-bit checks throughout the code; only the input tensor needs an explicit cast.
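A small sketch of why this matters, where x simply stands in for a float16 tensor produced by the model:

import torch

x = torch.randn(2, 3).half()        # stands in for a float16 model output

mask = x.new_zeros(x.shape)         # inherits dtype and device: float16
pad = x.new_full((2, 1), -1.0)      # also float16, no explicit check needed

wrong = torch.zeros(x.shape)        # float32 by default; would need dtype=x.dtype
print(mask.dtype, pad.dtype, wrong.dtype)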
Addendum: accelerating training with half precision via amp.autocast in PyTorch
Prerequisites
PyTorch 1.6+
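A quick sanity check of the environment (not from the original article):

import torch

print(torch.__version__)            # should be 1.6 or newer
print(torch.cuda.is_available())    # autocast as used below targets CUDA tensors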
How do you use automatic mixed precision (AMP) in PyTorch? Following the official approach, the answer is: autocast + GradScaler.
1. autocast
As mentioned above, you need the autocast class from the torch.cuda.amp module, and it is very easy to use:
from torch.cuda.amp import autocast as autocast

# create the model; parameters default to torch.FloatTensor
model = Net().cuda()
optimizer = optim.SGD(model.parameters(), ...)

for input, target in data:
    optimizer.zero_grad()
    # forward pass (model + loss) runs inside autocast
    with autocast():
        output = model(input)
        loss = loss_fn(output, target)
    # backpropagation runs outside the autocast context
    loss.backward()
    optimizer.step()

2. GradScaler
GradScaler is the gradient scaler; you need to instantiate a GradScaler object before training starts.
Therefore, the classic use of AMP in PyTorch is as follows:
from torch.cuda.amp import autocast, GradScaler

# create the model; parameters default to torch.FloatTensor
model = Net().cuda()
optimizer = optim.SGD(model.parameters(), ...)

# instantiate a GradScaler object
scaler = GradScaler()

for epoch in epochs:
    for input, target in data:
        optimizer.zero_grad()
        # forward pass (model + loss) runs inside autocast
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
        # scale the loss, backpropagate, step the optimizer, and update the scaler
        scaler.scale(loss).backward()
        scaler.step(optimizer)
        scaler.update()

3. nn.DataParallel
For single-GPU training, the code above is enough. In my own test on a 2080 Ti it reduced GPU memory usage by at least a third; the effect on speed is less clear-cut.
If you want to train on multiple GPUs with nn.DataParallel, however, this alone is not enough: you will find that every result produced in forward is still float32.
import torch.nn as nn
from torch.cuda.amp import autocast

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()

    def forward(self, input_data_c1):
        # run the body of forward inside an autocast block
        with autocast():
            # model code goes here
            return
In other words, wrap the code inside forward in an autocast block. (autocast state is thread-local, and nn.DataParallel runs forward in separate threads, so an autocast context opened in the main thread does not carry over to the replicas.)
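autocast can also be applied as a decorator on forward, which achieves the same thing; a minimal sketch:

import torch.nn as nn
from torch.cuda.amp import autocast

class Model(nn.Module):
    # decorating forward is equivalent to wrapping its body in "with autocast():"
    @autocast()
    def forward(self, input_data_c1):
        # model code goes here
        return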
Operations that autocast converts automatically
Inside an autocast region, tensors are automatically converted to half precision (torch.HalfTensor) for the following operations (a quick check is sketched after the list):
1. __matmul__
2. addbmm
3. addmm
4. addmv
5. addr
6. baddbmm
7. bmm
8. chain_matmul
9. conv1d
10. conv2d
11. conv3d
12. conv_transpose1d
13. conv_transpose2d
14. conv_transpose3d
15. linear
16. matmul
17. mm
18. mv
19. prelu
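A quick way to see this in action (requires a CUDA device; a and b are arbitrary example tensors):

import torch
from torch.cuda.amp import autocast

a = torch.randn(4, 4, device="cuda")   # float32
b = torch.randn(4, 4, device="cuda")   # float32

with autocast():
    c = a @ b                           # matmul is on the list above
    print(c.dtype)                      # torch.float16
print((a @ b).dtype)                    # outside autocast: torch.float32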
So can only these operations run in half precision? No. Other operations, such as RNNs, can also run in half precision, but they have to be converted manually; automatic conversion is not provided for them at the moment.
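A hedged sketch of the manual route for an RNN (layer sizes and input shape are made up; requires a CUDA device):

import torch

rnn = torch.nn.LSTM(input_size=32, hidden_size=64).cuda().half()   # cast weights manually
x = torch.randn(10, 8, 32, device="cuda").half()                   # cast inputs manually
output, (h, c) = rnn(x)
print(output.dtype)                                                 # torch.float16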
That covers deploying half-precision models with PyTorch and accelerating training with amp.autocast. Thanks for reading!