
Introduction and usage of TVM Compiler

2025-01-15

This article mainly explains the introduction and usage of the TVM compiler. The content is simple, clear, and easy to understand; please follow the editor's train of thought to study "the introduction and usage of the TVM compiler".

1. Overall architecture

TVM is an end-to-end compilation stack: it takes a model from a deep learning framework as input, transforms and optimizes the computation graph, and finally generates low-level code (instructions) for deployment on the target hardware. The whole architecture is built around a graph description: whether for optimization or code generation, the graph structure clearly describes the flow of data, the dependencies between operations, and so on. A machine-learning-based optimizer is the key to the optimization process; the space of possible code is enormous, so searching it with a learned cost function to find the optimum is a very reasonable approach. Its main features are as follows:

1) Tensor operations are treated as basic operators, matching hardware such as GPUs and TPUs, and a deep learning network is described as a graph that abstracts the data-computation flow. On top of such a graph structure it is easier to optimize memory; the approach also offers good compatibility up and down the stack, supporting a variety of deep learning frameworks and hardware architectures.

2) A huge optimization search space. Graph optimization is no longer limited to one fixed approach; instead, machine learning methods search the space of possible implementations to maximize deployment efficiency. Although this increases the compiler's own computational cost, it is more general.

TVM provides a very simple end-to-end user interface, and deployment takes only a few calls to TVM's API. For example:
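A minimal sketch of such a flow, using TVM's Relay frontend for Keras (the model, the input name "input_1", and the shapes are illustrative):

```python
import tvm
from tvm import relay
from tensorflow import keras

# A stand-in model; any Keras model can be imported the same way.
keras_model = keras.applications.ResNet50(weights=None)

# Map the model's input name to an NCHW shape, as the Keras frontend expects.
shape_dict = {"input_1": (1, 3, 224, 224)}
mod, params = relay.frontend.from_keras(keras_model, shape_dict)

# Choose the deployment target (a CUDA GPU here), optimize, and generate code.
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda", params=params)
```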

The above takes a Keras model as input to TVM, specifies the GPU as the deployment target, and then optimizes the model and generates code.

TVM also provides Java, C++ and Python interfaces for users to call.

2. Basic optimization of the graph structure

Graph structures are a common description method in most deep learning frameworks. Such a graph is a high-level description: each tensor operation is described as one operator rather than being split into finer pieces. This is convenient for optimization and also matches the architecture of GPUs and TPUs, whose compute units are very powerful and usually complete a large computation at once, such as a convolution or a matrix operation. Consider a convolutional network as an example: the whole graph includes 2D convolution, ReLU, dense, softmax, and so on. Such a graph structure also fits the structure of an FPGA accelerator, where a compute core likewise performs one large computation. In a TVM graph, nodes describe tensor data or operators, while edges represent the dependencies between computations.

Based on the graph structure, TVM applies many graph-optimization strategies. These include operator fusion, which merges several consecutive operations that the hardware can complete as a single operator; constant folding, which pre-computes constant data in the compiler to reduce hardware computation; storage planning, which pre-allocates buffer space for intermediate values so that intermediate data does not spill off chip and incur off-chip storage overhead; and data layout planning, which rearranges data into a form favorable to hardware computation.

1) operator fusion

In TVM, there are four kinds of operations: injective (one-to-one) operations, such as addition and pointwise multiplication; reduction operations, such as accumulation; complex fusable operations, such as 2D convolution, which combines multiplication with accumulation; and opaque operations, such as classification and data rearrangement, which cannot be fused. Operator fusion reduces storage overhead and enables pipelining, especially on FPGAs. For example, our current project is developing a general RNN accelerator IP that involves matrix multiplication and addition; the addition can be fused into the matrix multiplication, which removes the computational overhead of a separate addition module as well as the overhead of reading and writing the cache between them.
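As a rough sketch of what this looks like in TVM's Relay IR, the FuseOps pass merges a convolution with the injective ReLU that follows it (shapes here are illustrative):

```python
import tvm
from tvm import relay

# conv2d (complex but fusable) followed by relu (injective): FuseOps
# merges them into one fused function the backend can emit as one kernel.
x = relay.var("x", shape=(1, 3, 32, 32))
w = relay.var("w", shape=(8, 3, 3, 3))
y = relay.nn.relu(relay.nn.conv2d(x, w, padding=(1, 1)))
mod = tvm.IRModule.from_expr(relay.Function([x, w], y))

seq = tvm.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.FuseOps(fuse_opt_level=2),
])
with tvm.transform.PassContext(opt_level=3):
    print(seq(mod))
```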

2) data planning

Take our XRNN as an example: the chip has a matrix-operation array of fixed size, so the matrix size for one computation is also fixed. For instance, computing a 32x32 matrix times a 32x1 vector requires the weights and vectors to be aligned to multiples of 32, which in turn requires planning the layout of the weight data and so on.
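A minimal sketch of this kind of weight planning, as a hypothetical NumPy helper (not part of TVM or XRNN) that zero-pads a matrix to the array's tile size:

```python
import numpy as np

def pad_to_multiple(w, tile=32):
    """Zero-pad a weight matrix so both dimensions are multiples of `tile`,
    matching a fixed tile-by-tile compute array (hypothetical helper)."""
    pad_rows = -w.shape[0] % tile
    pad_cols = -w.shape[1] % tile
    return np.pad(w, ((0, pad_rows), (0, pad_cols)))

w = np.random.rand(50, 70).astype("float32")
print(pad_to_multiple(w).shape)  # (64, 96): aligned for a 32x32 array
```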

3. Tensor calculation

The tensor description language used in TVM is transparent and can be adapted to hardware needs. This is flexible and good for optimization, although it may increase the complexity of compiler optimization. An example of a TVM description follows:
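A sketch of such a description, using TVM's tensor expression (te) language to declare a matrix multiplication (sizes are illustrative):

```python
from tvm import te

# Declare C = A * B purely as a tensor expression: the description fixes
# the output shape and the computation rule, but says nothing about loop
# order, tiling, or memory layout.
n = 1024
A = te.placeholder((n, n), name="A")
B = te.placeholder((n, n), name="B")
k = te.reduce_axis((0, n), name="k")
C = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
```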

The description of an operator contains the size of the result and how it is computed, but it says nothing about the loop structure or other details of data manipulation. Borrowing the idea of Halide, TVM uses schedules to transform a tensor computation into the loop structure that executes most efficiently. The overall scheduling process works as follows:
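A hedged sketch of this, repeating the matrix-multiplication declaration above and then rewriting its loop nest with schedule primitives without changing the result:

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n, n), name="A")
B = te.placeholder((n, n), name="B")
k = te.reduce_axis((0, n), name="k")
C = te.compute((n, n), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")

# A default schedule is a plain loop nest; primitives transform it
# equivalently: split j into 32-wide chunks, move the reduction loop
# outward, and vectorize the innermost spatial loop.
s = te.create_schedule(C.op)
i, j = C.op.axis
jo, ji = s[C].split(j, factor=32)
s[C].reorder(i, jo, k, ji)
s[C].vectorize(ji)
print(tvm.lower(s, [A, B, C], simple_mode=True))
```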

It can be seen that TVM not only adopts Halide's scheduling model but also adds three schedule methods for GPUs and TPUs: special memory scope, tensorization, and latency hiding. These schedule methods transform a tensor computation equivalently, producing a variety of code structures from which the one most favorable to the hardware is selected.

3.1 Parallel optimization

Parallel computing is an important way to improve hardware efficiency: convolutions, matrix computations and the like consist of large amounts of parallel work, so optimizing the parallel structure matters greatly for performance. Two issues must be considered here: one is parallelism itself, the other is data sharing. If data is not shared, the cost of reading and writing it grows; making the best use of shareable data requires careful design of the computing structure.

TVM introduces the concept of memory scope and classifies computations into parallel and non-parallel ones. Parallel computations can run on multiple threads, while non-parallel ones must wait until the data they depend on has been computed. Take a matrix computation as an example:
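On a GPU, the memory scope primitive (for example, s.cache_read(A, "shared", [y])) additionally places shared data in on-chip memory; the CPU sketch below shows just the parallel/serial split, for a matrix-vector product with illustrative sizes:

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n, n), name="A")
x = te.placeholder((n,), name="x")
k = te.reduce_axis((0, n), name="k")
# Matrix-vector product: each output element is independent of the others.
y = te.compute((n,), lambda i: te.sum(A[i, k] * x[k], axis=k), name="y")

s = te.create_schedule(y.op)
# The i loop carries no dependence, so it can run across threads in
# parallel; the reduction over k stays sequential within each thread.
s[y].parallel(y.op.axis[0])
print(tvm.lower(s, [A, x, y], simple_mode=True))
```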

A similar situation arises in XRNN, where different computations, such as matrix multiplication and activation functions, can be performed in parallel when there is no dependency between them.

3.2 Storage read and write optimization

Reading and writing caches or external DDR on an FPGA also has a cost, so reducing storage read/write overhead likewise improves hardware execution efficiency. For example, in our XRNN it takes a long time to save data to DDR, and when that data is needed again the load time is incurred as well; if the data is instead cached on chip, both the load and the save overhead are removed. Similarly, even the on-chip cache is read and written back and forth; if the result of one computation can be sent directly to the next compute core to form a pipeline, the cache read/write overhead is saved as well.

Another way to hide storage read/write overhead is to load off-chip data for the next computation while the current computation is still in progress; the load overlaps with the computation and the extra loading time disappears.
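A hedged sketch of this overlap in TVM, using the double_buffer primitive on a staged read buffer (the sizes and the "local" scope are illustrative):

```python
import tvm
from tvm import te

n = 1024
A = te.placeholder((n, n), name="A")
k = te.reduce_axis((0, n), name="k")
B = te.compute((n,), lambda i: te.sum(A[i, k], axis=k), name="B")

s = te.create_schedule(B.op)
# Stage 32-element tiles of A through a local buffer, and double-buffer
# it so the next tile is fetched while the current one is being reduced.
AA = s.cache_read(A, "local", [B])
ko, ki = s[B].split(k, factor=32)
s[AA].compute_at(s[B], ko)
s[AA].double_buffer()
print(tvm.lower(s, [A, B], simple_mode=True))
```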

4. Automatic optimizer

On top of its rich set of schedule methods, TVM proposes a machine learning model to find the optimal schedule structure. It consists of two parts: one generates the possible computation structures from the schedules, and the other is a machine-learning-driven cost model that predicts their performance.

The schedule space is huge: it can produce many kinds of computation-flow structures, and exploring them all to find the most suitable one requires a great deal of computation. In our XRNN structure, however, the space can be reduced because of the constraints of the hardware kernel, so only the schedule choices most likely to affect performance are used. For example, different computations such as matrix multiplication, addition, pointwise multiplication and activation can be arranged either concurrently or sequentially; this space is relatively small, which helps speed up compiler generation.

The machine learning cost model predicts performance mainly from the latency characteristics of different operations: memory access patterns, data reuse, pipelining and so on. In TVM the cost function is not derived from random statistics; instead it is trained on real measurement data collected at run time, and the candidate code structures are periodically updated. Promising optimization structures are explored and fed back to the model, which keeps refining its predictions. This approach avoids the time cost of exhaustively exploring the huge schedule space and stays closer to actual behavior.
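A minimal AutoTVM sketch of this loop, assuming a hypothetical template ("example/matvec") that exposes one split factor as a tunable knob; the XGBoost-based tuner fits its cost model to real measured runtimes:

```python
import tvm
from tvm import te, autotvm

@autotvm.template("example/matvec")  # hypothetical task name
def matvec(n):
    A = te.placeholder((n, n), name="A")
    x = te.placeholder((n,), name="x")
    k = te.reduce_axis((0, n), name="k")
    y = te.compute((n,), lambda i: te.sum(A[i, k] * x[k], axis=k), name="y")
    s = te.create_schedule(y.op)
    # Expose one schedule decision, the reduction split factor, as a knob.
    cfg = autotvm.get_config()
    cfg.define_split("tile_k", k, num_outputs=2)
    ko, ki = cfg["tile_k"].apply(s, y, k)
    return s, [A, x, y]

task = autotvm.task.create("example/matvec", args=(1024,), target="llvm")
measure = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.LocalRunner(number=5),
)
# The tuner proposes configurations, measures them on the real device,
# and retrains its cost model on the results.
tuner = autotvm.tuner.XGBTuner(task)
tuner.tune(
    n_trial=20,
    measure_option=measure,
    callbacks=[autotvm.callback.log_to_file("matvec.log")],
)
```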

Summary

The above briefly introduces the overall architecture and basic methods of TVM, which in fact match the hardware structure of FPGA acceleration quite well; many of its methods can be borrowed. TVM is also a highly compatible compiler architecture, and there is still plenty of room for designs tailored to the features of our own FPGA.

Thank you for reading. The above is the content of "Introduction and usage of TVM Compiler"; after studying this article, I believe you have a deeper understanding of the introduction and usage of the TVM compiler, and specific usage still needs to be verified in practice. The editor will push more articles on related knowledge points for you; welcome to follow!
