This article shares a theoretical analysis of converting PyTorch models to ONNX. I think it is quite practical, and I hope you get something out of it.
(1) The significance of converting PyTorch to ONNX
Generally speaking, converting to ONNX is only a means to an end: once you have the ONNX model, you still need to convert it further, for example to TensorRT, to complete deployment. Some people add one more step, going first from ONNX to Caffe and then from Caffe to TensorRT, on the grounds that Caffe is friendlier to TensorRT (what "friendly" means here is discussed below).
Therefore, before starting the conversion to ONNX, the target backend must be clearly defined. ONNX is just a format, much like JSON: any file that satisfies certain rules is legal, so simply exporting from PyTorch to an ONNX file is easy. But different backends accept different flavors of ONNX, and that is where the pitfalls come from.
PyTorch exports to ONNX through torch.onnx.export, but ONNX Runtime requires one kind of ONNX graph and TensorRT requires another.
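For reference, the export call itself looks like this. This is only a minimal sketch: the model, input shape, file name, and opset version are all illustrative choices, not fixed requirements.

    import torch
    import torchvision

    # Any traceable model works; resnet18 is just a stand-in here.
    model = torchvision.models.resnet18().eval()
    dummy = torch.randn(1, 3, 224, 224)  # tracing input: shapes get baked in
    torch.onnx.export(model, dummy, "resnet18.onnx", opset_version=11)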
Take MaxPool and MaxUnpool, among the simplest ops, as an example. MaxUnpool can be regarded as the inverse operation of MaxPool. Let's first look at MaxPool. Suppose we have a C*H*W tensor of shape [2, 3, 3], where the 2D matrix in each channel is identical (a concrete construction appears in the sketch below).
If we now call MaxPool(kernel_size=2, stride=1, padding=0) on it in PyTorch, we get two outputs. The first is the value after MaxPool.
The other is MaxPool's Idx, which records which input element each output came from; this way, during backpropagation, the gradient of each output can be passed directly to the corresponding input. Both outputs appear in the runnable sketch below.
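Here is a runnable sketch of this example; the concrete values are my own illustrative choice.

    import torch
    import torch.nn.functional as F

    # Shape [2, 3, 3]: two channels containing the same 3x3 matrix.
    plane = torch.arange(9, dtype=torch.float32).view(3, 3)
    x = plane.repeat(2, 1, 1)

    out, idx = F.max_pool2d(x, kernel_size=2, stride=1, padding=0,
                            return_indices=True)
    print(out)  # pooled values, shape [2, 2, 2]
    print(idx)  # indices; note both channels report the same idx,
                # i.e. numbering restarts at 0 in every channel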
Careful readers will notice that there is actually another way to write MaxPool's Idx: number the elements of all channels together, rather than restarting from 0 in each channel. Neither convention is wrong; all that matters is that the forward and backward passes agree.
But when I was adding support for OpenMMEditing, I ran into MaxUnpool, the inverse operation of MaxPool: given MaxPool's Idx and MaxPool's output, recover MaxPool's input.
PyTorch's MaxUnpool implementation expects Idx in the format where each channel restarts at 0, while ONNX Runtime expects the opposite. So to get the same result from ONNX Runtime (with the same input as PyTorch), the input Idx has to be preprocessed first. In other words, the graph PyTorch exports is not the graph ONNX Runtime needs.
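A sketch of that preprocessing, under the index conventions described above (the function name and the unbatched C*H*W layout are my assumptions):

    import torch

    def flatten_indices(idx, in_h, in_w):
        # PyTorch-style MaxUnpool indices restart at 0 in every channel;
        # ONNX Runtime wants them numbered across channels, so channel c
        # is offset by c * in_h * in_w (in_h, in_w: pre-pooling size).
        c = idx.shape[0]
        offsets = torch.arange(c).view(c, 1, 1) * (in_h * in_w)
        return idx + offsets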
(2) ONNX and Caffe
There are two mainstream paths for model deployment. Taking TensorRT as an example, one is PyTorch -> ONNX -> TensorRT, and the other is PyTorch -> Caffe -> TensorRT. Personally, I think the latter is more mature at present, which is mainly determined by the nature of ONNX, Caffe, and TensorRT themselves.
ONNX and Caffe differ in several ways, the most important being op granularity. For example, when converting Bert's Attention layer, ONNX decomposes it into a combination of MatMul, Scale, and SoftMax ops, while Caffe might directly generate a single layer called Multi-Head Attention and tell the CUDA engineer, "write me a big kernel."
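In PyTorch terms, the fine-grained version looks roughly like the sketch below (generic scaled dot-product attention, not Bert's exact code):

    import torch

    def attention(q, k, v, d_k):
        # Exported to ONNX, each operation here becomes its own node:
        # MatMul, Div (the scale), Softmax, MatMul.
        scores = torch.matmul(q, k.transpose(-2, -1)) / (d_k ** 0.5)
        return torch.matmul(torch.softmax(scores, dim=-1), v)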
So if a researcher proposes a new state-of-the-art op tomorrow, it can most likely be converted to ONNX directly (provided its PyTorch implementation is composed entirely of ATen library calls), whereas a Caffe engineer would have to write a new kernel for it.
The advantage of fine-grained ops is flexibility; the downside is speed. In recent years a great deal of work has gone into op fusion (for example, merging a convolution with the ReLU that follows it); both XLA and TVM invest heavily in it, i.e., in combining several small ops into one big op.
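As a toy graph-level illustration of that conv + ReLU example (real fusers like XLA and TVM work at the kernel level; the module below only shows the idea of one node standing in for two):

    import torch
    import torch.nn as nn

    class FusedConvReLU(nn.Module):
        # One coarse op replacing the Conv2d -> ReLU pair in the graph.
        def __init__(self, in_ch, out_ch, k):
            super().__init__()
            self.conv = nn.Conv2d(in_ch, out_ch, k)

        def forward(self, x):
            return torch.relu(self.conv(x))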
TensorRT is NVIDIA's deployment framework, so performance is naturally its primary concern, and its layer granularity is accordingly very coarse. Converting from Caffe has a natural advantage in this setting.
In addition, coarse granularity also solves the problem of branching. In TensorRT's eyes a neural network is a simple DAG: given input of a fixed shape, perform the same operations, and produce output of a fixed shape.
(An aside: one current development direction of TensorRT is support for dynamic shapes, but it is still quite immature.)

Consider the following network:

    tensor i = funcA();
    if (i > 0)
        j = funcB(i);
    else
        j = funcC(i);
    funcD(j);

Assuming funcA, funcB, funcC, and funcD are all fine-grained operators supported by ONNX, ONNX faces a dilemma: the exported DAG will be either funcA -> funcB -> funcD or funcA -> funcC -> funcD. Either way, one branch is lost, so the graph is wrong for some inputs.
Caffe, by contrast, can bypass the problem with coarse granularity:

    tensor i = funcA();
    coarse_func(tensor i) {
        if (i > 0) return funcB(i);
        else return funcC(i);
    }
    funcD(coarse_func(i));

so the DAG it gets is: funcA -> coarse_func -> funcD.
Of course, the price Caffe pays is that some poor HPC engineer has to hand-write a kernel for coarse_func. (Here's hoping deep learning compilers liberate HPC engineers as soon as possible.)
(3) The limitations of PyTorch itself
Anyone familiar with deep learning frameworks knows that the main reason PyTorch could take off while TensorFlow already dominated the mainstream is that it is very flexible. To borrow an imperfect analogy: TensorFlow is like a compiled language, while PyTorch is like Python itself, interpreted step by step.
TensorFlow compiles the entire neural network before running, generates a DAG (directed acyclic graph), and then executes that graph. PyTorch, on the other hand, takes one step at a time: it does not know what the next node should be until the current node has been run and its result computed.
What ONNX actually does is turn the network model of an upstream deep learning framework into a graph. Since TensorFlow already has a graph, you can simply take that graph and tinker with it.
PyTorch, however, has no concept of a graph. To convert from PyTorch to ONNX, ONNX has to sit beside PyTorch with a little notebook, run the model once, write down every operation executed, and then abstract that record into a graph. This tracing approach gives PyTorch-to-ONNX conversion two inherent limitations.
1. The result of the conversion is only valid for the specific input used during export. If a different input would change the network structure, ONNX cannot perceive it. The most common case is an if statement in the network: if the export-time input takes the if branch, ONNX generates a graph only for that branch and throws away all the information in the else branch (see the sketch after this list).
2. The conversion is computationally heavy, because the neural network genuinely has to be run once.
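A minimal sketch of limitation 1 (the module, input, and file name are illustrative):

    import torch
    import torch.nn as nn

    class Branchy(nn.Module):
        def forward(self, x):
            if x.sum() > 0:      # data-dependent Python branch
                return x * 2
            return x - 1

    model = Branchy()
    x = torch.ones(1, 3)         # this input takes the if branch
    torch.onnx.export(model, x, "branchy.onnx")
    # The exported graph hard-codes x * 2; the else branch is gone,
    # and PyTorch emits a TracerWarning about the Python boolean.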
PS: For the above two limitations, my undergraduate thesis proposed a solution: use lexical and syntax analysis techniques from compilers to scan the PyTorch or TensorFlow source code directly and obtain the graph structure. This makes the model-to-ONNX conversion lightweight, and it also recovers information such as branch conditions. The PyTorch team currently hopes to solve the branch-statement problem with TorchScript, but as far as I know it is not yet very mature.