Author: JD.com AI Research Institute Zhang Jianhao
When "alchemists" (model trainers) convert a model between frameworks, they often find that feeding the same image into the model before and after conversion produces slightly different results. The reasons include numerical error, different JPEG decoding libraries producing different pixels, and differences in how certain operators are implemented in different frameworks.
While contributing the spec of the Resize operator to ONNX, I found that Resize is an operator that highlights these implementation differences: multiple resize types, inconsistent hyperparameters, and historical bugs are easy to overlook, and as a result almost every framework produces different Resize results. ONNX, as an intermediate format for neural network models, should preserve the semantics of the original frameworks' operators as faithfully as possible. After reading the relevant papers and the source code of various frameworks, I analyzed and summarized the many ways Resize is implemented, and finally contributed to ONNX a fairly complete and standardized Resize spec built from several essentially orthogonal parameters. The resize/interpolation methods of TensorFlow 1.x, TensorFlow 2.x, PyTorch, and OpenCV can all be expressed with this operator 100% losslessly. This article briefly introduces the common flow of resize operations and analyzes the factors that cause resize results to differ across frameworks.
The resize of a multi-dimensional tensor (such as a two-dimensional image) is composed of several resize operations performed on one-dimensional tensors, so we only need to discuss resize on a one-dimensional tensor. After reading the source code of each framework, I find that the flow can be summarized as follows: for the i-th pixel of the output tensor, first compute its coordinate w(i); map that coordinate back into the input tensor with a function f; then use a function g to find the input pixels adjacent to the mapped coordinate, together with the offset r from the adjacent pixel on its left; finally, compute the output value with a function h as a weighted average of those pixels.
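To make this skeleton concrete, here is a minimal sketch in Python (my own illustration, not the ONNX reference implementation); to_input_coord and neighbors_and_weights are placeholder hooks for the w/f and g/h choices discussed in the next two sections.

```python
import numpy as np

def resize_1d(x, out_len, to_input_coord, neighbors_and_weights):
    """Resize 1-D array x to out_len samples.

    to_input_coord(i)           -> real-valued coordinate in the input tensor (the role of w and f)
    neighbors_and_weights(c, n) -> (indices, weights) of the input pixels to blend (the role of g and h)
    """
    x = np.asarray(x, dtype=np.float64)
    out = np.empty(out_len, dtype=np.float64)
    for i in range(out_len):
        c = to_input_coord(i)                        # map the output index to an input coordinate
        idx, wts = neighbors_and_weights(c, len(x))  # find the neighboring pixels and their weights
        out[i] = float(np.dot(x[list(idx)], wts))    # weighted average
    return out
```

A two-dimensional resize is then just resize_1d applied along each axis in turn.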
1.
Let's first discuss w and f. w(i) is the coordinate of the i-th pixel. At first glance it seems w(i) could simply be i itself, but it is not that simple. For example, for a tensor of length 3, if the coordinate of the i-th pixel equals i, the three pixels sit on the tensor as in the leftmost case of the illustration below, where the horizontal line represents the extent of the one-dimensional tensor and the circles represent pixels:
The three pixels are not distributed symmetrically on the tensor; they are shifted to the left. Intuitively this does not feel like a good property. Frameworks commonly solve the problem in one of two ways:
One is to choose w(i) = i + 0.5. Taking a one-dimensional tensor of length 3 as an example, the 0th pixel sits at coordinate 0.5, the 1st at 1.5, and the 2nd at 2.5, so the pixels are distributed symmetrically. This is called half_pixel and is the middle case in the illustration above; here f remains a plain proportional scaling between the input and output lengths, which is very intuitive. The other is to keep w(i) = i but change f so that the scaling factor becomes (input length - 1) / (output length - 1). Taking the length-3 tensor as an example again, this is equivalent to cutting off a segment of length 1 at the right end before resizing, so that the pixel distribution becomes symmetric. This is called align_corner and is the rightmost case in the illustration above; it is the familiar align_corners=True/False parameter of the resize methods in various frameworks. The name comes from the fact that the first and last pixels (the corners) of the tensor stay in place after scaling.
So what happens if we use neither of these two methods and insist on the "intuitive" asymmetric method? TensorFlow 1.x provides a negative example: its implementation for align_corners=False is wrong because it uses the asymmetric mapping in the figure above, which leads to strange scaling results. In this blog post, https://hackernoon.com/how-tensorflows-tf-image-resize-stole-60-days-of-my-life-aba5eb093f35, the author describes how a super-resolution network trained with TensorFlow 1.x kept behaving strangely until he traced the root cause to TensorFlow's incorrect resize implementation. He also gives a visual example: shrinking the 16x16 image on the left to 4x4 should produce the image on the right, but TensorFlow 1.x produces the strange result in the middle, with the symmetry of the image completely destroyed, for the reason described above. This incorrect implementation is one of the main reasons TensorFlow 1.x's resize results differ from other frameworks'; TensorFlow 2.x has fixed the problem.
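As a sketch (following the corresponding ONNX coordinate_transformation_mode definitions, with scale = output length / input length), the three coordinate mappings discussed above can be written as combined "output index to input coordinate" functions:

```python
def half_pixel(i, scale):
    # w(i) = i + 0.5 in the output, scale by 1/scale, then convert back to a pixel index
    return (i + 0.5) / scale - 0.5

def align_corners(i, in_len, out_len):
    # w(i) = i, but f scales by (length - 1) so the first and last pixels stay fixed
    return i * (in_len - 1) / (out_len - 1)

def asymmetric(i, scale):
    # the "intuitive" mapping TensorFlow 1.x used for align_corners=False
    return i / scale
```

For example, when shrinking a length-4 tensor to length 2 (scale = 0.5), half_pixel maps output indices 0 and 1 to input coordinates 0.5 and 2.5, symmetric about the input's center, while asymmetric maps them to 0 and 2, shifted toward the left edge.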
2.
Next, let us discuss the other two functions, g and h. nearest, linear, and cubic are the three common resize modes, and they differ precisely in g and h. As mentioned above, f maps the output pixel's coordinate back to a coordinate in the input tensor; the function g then finds the input pixels adjacent to that coordinate: nearest needs only the single nearest pixel, linear needs the nearest two (one on each side), and cubic needs the nearest four (two on each side). The function h(a, r) computes a weighted average of these one/two/four pixels, where the weights are determined by r (as mentioned above, r is the distance from the mapped coordinate to the adjacent pixel on its left). For each of nearest/linear/cubic there is a standard way to derive the weights from r. nearest needs no weighting to speak of; for linear, the weights of the two pixels are 1 - r and r; for cubic, the weights of the four pixels are given by the cubic convolution kernel [1], in which A is a fixed parameter whose value nevertheless differs from framework to framework. The two common choices are -0.5 (some versions of TensorFlow) and -0.75 (PyTorch). Because there is no single standard value for A, it is common for different frameworks' cubic resize results to differ.
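As a sketch, the standard weight formulas for linear and cubic can be written as follows; r is the distance from the mapped coordinate to its left neighbor, and the cubic weights come from the cubic convolution kernel of [1] with the framework-dependent parameter A:

```python
def linear_weights(r):
    # two neighbors: the left one gets 1 - r, the right one gets r
    return [1.0 - r, r]

def cubic_kernel(x, A):
    # cubic convolution kernel [1], parameterized by A
    x = abs(x)
    if x <= 1.0:
        return (A + 2.0) * x**3 - (A + 3.0) * x**2 + 1.0
    if x < 2.0:
        return A * (x**3 - 5.0 * x**2 + 8.0 * x - 4.0)
    return 0.0

def cubic_weights(r, A=-0.75):
    # the four neighbors sit at distances 1 + r, r, 1 - r and 2 - r from the mapped
    # coordinate; A = -0.75 matches PyTorch, while some TensorFlow code uses A = -0.5
    return [cubic_kernel(1.0 + r, A), cubic_kernel(r, A),
            cubic_kernel(1.0 - r, A), cubic_kernel(2.0 - r, A)]
```

For any r in [0, 1) the four cubic weights sum to 1, so a constant signal is preserved no matter which value of A a framework picks.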
A digression: the weight calculation for cubic resize is much more complex than for linear resize, so it certainly takes longer, but it produces better image quality. This paper, https://arxiv.org/abs/1812.01187, found that using cubic resize in image preprocessing can improve the accuracy of classification networks.
Another detail that can cause cubic resize results to differ is the boundary: cubic resize needs the two nearest pixels on each side of the mapped coordinate, but near the edge of the tensor there may not be two pixels available on both sides (for instance, there may be only one pixel on the left). Here again there are two different approaches. One is to edge-pad the image, i.e. still take two pixels from the left, with both taking the value of the first pixel. The other is to treat it as having found only three pixels instead of four and to renormalize the weights of those three pixels.
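Both strategies can be sketched roughly as below (hypothetical helper functions for illustration; real frameworks fold this logic into their resize kernels):

```python
import numpy as np

def gather_with_edge_padding(x, left):
    """Take the four cubic taps around the gap between pixel `left` and `left + 1`,
    clamping out-of-range indices to the border pixel (edge padding)."""
    idx = np.clip([left - 1, left, left + 1, left + 2], 0, len(x) - 1)
    return np.asarray(x)[idx]

def drop_and_renormalize(idx, weights, in_len):
    """Alternative: discard out-of-range taps and rescale the remaining weights to sum to 1."""
    kept = [(i, w) for i, w in zip(idx, weights) if 0 <= i < in_len]
    total = sum(w for _, w in kept)
    return [(i, w / total) for i, w in kept]
```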
3. Summary
To sum up, there are many reasons why Resize results differ across frameworks: TensorFlow 1.x used an incorrect implementation of its own invention, the parameter A in cubic resize has no fixed value, non-integer coordinates may or may not be rounded automatically, and so on.
The spec of the ONNX Resize operator was written based on the above analysis; the full description is at https://github.com/onnx/onnx/blob/master/docs/Operators.md#Resize
The Python reference implementation is at https://github.com/onnx/onnx/blob/master/onnx/backend/test/case/node/resize.py
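For illustration, a Resize node following this spec can be built with the ONNX Python helpers roughly as below (a minimal sketch; the tensor names X/roi/scales and the surrounding graph are assumed):

```python
from onnx import helper

# A cubic resize whose coordinate mapping and cubic coefficient are spelled out
# explicitly, so the semantics of the source framework can be preserved.
resize_node = helper.make_node(
    "Resize",
    inputs=["X", "roi", "scales"],   # an optional fourth input, "sizes", may be used instead of scales
    outputs=["Y"],
    mode="cubic",
    coordinate_transformation_mode="half_pixel",
    cubic_coeff_a=-0.75,             # the parameter A discussed above
)
```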
The core attribute coordinate_transformation_mode is a single function obtained by combining w, f, and the inverse of w, that is, x_original = w^-1(f(w(x_resized))). The reason for not using independent w and f here is not only that it looks simpler, but also to solve a practical problem: some frameworks' resize implementations do not use the w^-1(f(w(x))) form, but directly let x_original = f(w(x_resized)).
Although this is obviously unreasonable (coordinate_transformation_mode=tf_half_pixel_for_nn describes exactly such an unreasonable implementation), it can only be acknowledged for compatibility. By comparison, the authors of the previous version of the ONNX Resize spec did not appreciate the complexity of the Resize operator and simply imitated TensorFlow's implementation, which was not only inconsistent with other frameworks' results but also copied TensorFlow's bug.
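As a sketch, the composed form and the "missing last step" variant differ only in the final w^-1 term (formulas as in the ONNX spec, with scale = output length / input length):

```python
def half_pixel(x_resized, scale):
    # w^-1(f(w(x))): add 0.5, divide by the scale, then subtract 0.5 again
    return (x_resized + 0.5) / scale - 0.5

def tf_half_pixel_for_nn(x_resized, scale):
    # same w and f, but the final w^-1 step is skipped, which biases the mapping
    return (x_resized + 0.5) / scale
```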
Now both TensorFlow and PyTorch support exporting this version of the Resize operator, and deployment frameworks such as TensorRT support importing and running it. It gives me a great sense of achievement that something I designed has been adopted by many well-known frameworks.
Reference: [1] https://ieeexplore.ieee.org/document/1163711