What is the tracing mechanism for in converting PyTorch to ONNX?

2025-01-16 Update From: SLTechnology News & Howtos


This article explains what the tracing mechanism does when converting PyTorch models to ONNX, and how it can even be exploited deliberately. The content is simple and clear, and I hope it helps resolve your doubts.

(1) The tracing mechanism

As mentioned above, PyTorch-to-ONNX conversion is based on tracing. Roughly speaking, the ONNX export code watches PyTorch run the model once and records what gets executed. Not every Python operation is recorded, though. For example, in the following code:

c = torch.matmul(a, b)
print("Blabla")
e = torch.matmul(c, d)

only lines 1 and 3 are recorded, because only they involve PyTorch; the second line is just an ordinary Python statement.
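This behavior can be observed directly with torch.jit.trace, the tracer the ONNX exporter is built on. A minimal sketch (the function f and its random inputs are made up for illustration):

```python
import torch

def f(a, b, d):
    c = torch.matmul(a, b)      # ATen op: recorded by the tracer
    print("Blabla")             # plain Python: invisible to the tracer
    return torch.matmul(c, d)   # ATen op: recorded by the tracer

traced = torch.jit.trace(
    f, (torch.randn(2, 2), torch.randn(2, 2), torch.randn(2, 2)))
graph_str = str(traced.graph)   # two aten::matmul nodes; no trace of print
```

Inspecting graph_str shows exactly two aten::matmul nodes and no record of the print call.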

Specifically, only ATen operations are recorded. ATen can be understood as PyTorch's basic operator library: all PyTorch functions are built from these primitives. For example, if ATen provided only addition, subtraction, multiplication and division, then every other PyTorch operation, such as squaring or computing a sigmoid, could still be composed out of those four.

* The reason ONNX cannot record if statements is the same: if is not an ATen operation.
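As a toy illustration of composing an operation out of primitives (not how ATen actually implements it), sigmoid can be written using only negation, exp, addition and division:

```python
import torch

x = torch.linspace(-3.0, 3.0, steps=7)

# sigmoid(x) = 1 / (1 + exp(-x)), built from elementwise primitives
manual = 1.0 / (1.0 + torch.exp(-x))

# PyTorch's built-in version for comparison
builtin = torch.sigmoid(x)
```

The two results agree to floating-point precision.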

Although tracing records every PyTorch action (that is, every ATen operation), ONNX then prunes the result based on the output, cutting away useless operations.

For example, in the following snippet, the first line is clearly useless:

t1 = torch.matmul(a, b)
t2 = torch.matmul(c, d)
return t2

After collecting all the operations and the input-output relations between them (represented as a DAG), ONNX traverses backward from the DAG's outputs: every node reachable in that traversal is kept, and all other nodes are simply thrown away.
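The pruning step can be sketched as a backward reachability pass over a toy DAG. The dict-based graph below is a made-up stand-in for ONNX's internal representation, with node names mirroring the snippet above:

```python
# Toy DAG: each node maps to the nodes it consumes as inputs.
graph = {
    "t1": ["a", "b"],  # t1 = matmul(a, b) -- never used downstream
    "t2": ["c", "d"],  # t2 = matmul(c, d) -- the returned value
    "a": [], "b": [], "c": [], "d": [],
}

def backward_reachable(outputs, graph):
    """Walk backward from the outputs; any node never visited is pruned."""
    keep, stack = set(), list(outputs)
    while stack:
        node = stack.pop()
        if node in keep:
            continue
        keep.add(node)
        stack.extend(graph[node])
    return keep

kept = backward_reachable(["t2"], graph)  # t1, a, b are dropped
```

Starting from the single output t2, only t2 and its inputs c and d survive; the dead t1 = matmul(a, b) chain is pruned.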

In MMDetection (https://github.com/open-mmlab/mmdetection), the NMS (non-maximum suppression) code contains the following:

if bboxes.numel() == 0:
    bboxes = multi_bboxes.new_zeros((0, 5))
    labels = multi_bboxes.new_zeros((0,), dtype=torch.long)

    if torch.onnx.is_in_onnx_export():
        raise RuntimeError('[ONNX Error] Can not record NMS '
                           'as it has not been executed this time')

    return bboxes, labels

dets, keep = batched_nms(bboxes, scores, labels, nms_cfg)

The logic is simple: if the preceding network produced no valid bbox (the branch condition on the first line), then the NMS result is obviously a pile of zeros, so there is no need to run NMS; the code returns zeros directly.

If we want to convert this code to ONNX, recall from above that tracing cannot handle branch logic, so one path must be chosen and the model recorded along it. Under normal circumstances we naturally expect some bboxes to exist and NMS to be called with them.

So if the traced execution turns out to hit the empty-bbox branch, we must check whether we are currently exporting to ONNX, and if so, raise an error immediately, because the ONNX model that would come out is clearly not what we want.

Suppose we did nothing: what kind of model would we get in that case? A little thought shows that, assuming the function's return value is the network's final output, we would end up with a DAG of only two nodes: the two new_zeros operations on lines 2 and 3. As described earlier, ONNX prunes the DAG after recording everything. Backtracking from the return values (bboxes, labels), it reaches those two operations and stops there; everything else, the backbone, FPN, RPN and so on, gets thrown away.

Therefore, when converting an MMDet model, the export must be run with real data and trained weights; otherwise there will usually be no valid bbox, and the RuntimeError in the branch above will be triggered.
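The guard pattern can be sketched in isolation. The function name and error message here are illustrative, not MMDetection's exact code:

```python
import torch

def nms_or_fail(bboxes, labels):
    """Degenerate-input guard, sketching the MMDetection pattern.

    Tracing freezes whichever branch actually runs. If the empty branch
    runs during ONNX export, the exported graph would contain only the
    zero-filled outputs, so raise instead of exporting a useless model.
    """
    if bboxes.numel() == 0:
        if torch.onnx.is_in_onnx_export():
            raise RuntimeError('[ONNX Error] empty input: the traced '
                               'graph would not contain NMS')
        return bboxes, labels
    # ... the real batched_nms call would go here ...
    return bboxes, labels

# Outside of export, the empty branch just returns the zero tensors.
out_boxes, out_labels = nms_or_fail(
    torch.zeros(0, 5), torch.zeros(0, dtype=torch.long))
```

In normal eager execution torch.onnx.is_in_onnx_export() is False, so the empty case returns quietly; only during an actual export does it abort.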

(2) Using the tracing mechanism for optimization

MMSeg contains a clever example of exploiting the tracing mechanism for optimization.

In slide inference, we need to compute a count_mat matrix. If h, w and the corresponding strides are fixed, count_mat is a constant.

During training, however, these are exactly the parameters we want to tune, so MMSeg chooses not to cache the constant but to recompute it every time:

count_mat = img.new_zeros((batch_size, 1, h_img, w_img))
for h_idx in range(h_grids):
    for w_idx in range(w_grids):
        y1 = h_idx * h_stride
        x1 = w_idx * w_stride
        y2 = min(y1 + h_crop, h_img)
        x2 = min(x1 + w_crop, w_img)
        y1 = max(y2 - h_crop, 0)
        x1 = max(x2 - w_crop, 0)
        crop_img = img[:, :, y1:y2, x1:x2]
        crop_seg_logit = self.encode_decode(crop_img, img_meta)
        preds += F.pad(crop_seg_logit,
                       (int(x1), int(preds.shape[3] - x2), int(y1),
                        int(preds.shape[2] - y2)))
        count_mat[:, :, y1:y2, x1:x2] += 1
assert (count_mat == 0).sum() == 0
if torch.onnx.is_in_onnx_export():
    # cast count_mat to constant while exporting to ONNX
    count_mat = torch.from_numpy(
        count_mat.cpu().detach().numpy()).to(device=img.device)

At deployment time, however, these parameters are fixed, so there is no need to recompute count_mat. That is why the if branch at the end does something that looks useless:

count_mat = torch.from_numpy(count_mat.cpu().detach().numpy()).to(device=img.device)

That is, it converts the computed count_mat from a tensor to a numpy array, then back to a tensor.

The real goal is to cut off tracing.

As mentioned earlier, ONNX records only ATen operations, and converting between a tensor and a numpy array is certainly not one of them. So when the backward traversal reaches count_mat, ONNX cannot find the operations that produced it; count_mat is therefore saved into the graph as a constant, and all the code that computed it is pruned away.
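The constant-folding effect can be reproduced with torch.jit.trace, assuming the same tracer behavior the ONNX exporter relies on. In this contrived module, a tensor that takes a round trip through numpy gets frozen at its trace-time value:

```python
import torch

class Frozen(torch.nn.Module):
    def forward(self, x):
        count = x * 2  # value-dependent computation
        # Round-trip through numpy: the tracer cannot follow non-ATen
        # operations, so `count` is baked into the graph as a constant
        # equal to whatever value it had during tracing.
        count = torch.from_numpy(count.detach().cpu().numpy())
        return count

traced = torch.jit.trace(Frozen(), torch.ones(2, 2))

# Feeding a different input does not change the output: the trace-time
# value (ones * 2) was frozen into the graph.
out = traced(torch.full((2, 2), 5.0))
```

Without the numpy round trip, the traced graph would contain the multiplication and out would be a tensor of tens; with it, out stays at the frozen value of twos. (PyTorch emits a TracerWarning about the tensor-to-numpy conversion, which is exactly the point.)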

That covers the tracing mechanism in PyTorch-to-ONNX conversion: what tracing records, how the resulting DAG is pruned, and how that pruning can be exploited deliberately. Thank you for reading!
