

Domestic AI framework evolves! Baidu releases Paddle Lite: first to support Huawei NPU online compilation

2025-02-27 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

Qian Ming from Aofei Temple

Qubit Report | WeChat official account QbitAI

The domestic AI framework PaddlePaddle has just evolved again: Paddle Lite is officially released!

Highly scalable, high-performance, and lightweight, it is the first deep learning on-device inference framework to support online compilation for the Huawei NPU.

Given the current climate, this progress in an independently developed foundational framework carries added significance.

Its sincerity and strength are both commendable: support for a broader range of heterogeneous AI hardware is one of Paddle Lite's core highlights.

With this release, Paddle Lite's architecture has been substantially upgraded, with more complete support for multiple hardware targets, multiple platforms, and hybrid hardware scheduling.

It covers not only mobile chips such as ARM CPUs, Mali GPUs, Adreno GPUs, and the Huawei NPU, but also common edge-device hardware such as FPGAs, and it is capable of supporting mainstream cloud chips.

Paddle Lite is thus the first deep learning inference framework to support Huawei NPU online compilation. Earlier, Baidu and Huawei had announced a close partnership at Baidu's AI developer conference.

It is worth noting that Paddle Lite is benchmarked against Google's TensorFlow Lite, and this upgrade is aimed squarely at shoring up the latter's shortcomings.

Officials say it not only supports a wider range of AI hardware terminals, improving deployment versatility, but also holds clear performance advantages.

The competition among AI frameworks is growing fiercer and has entered a new stage.

What is Paddle Lite?

Paddle Lite, the evolution of Paddle Mobile, is an inference engine for high-performance, lightweight on-device deployment.

Its core purpose is to rapidly deploy trained models across different hardware platforms, run predictive inference on input data to produce results, and support real business applications.

In putting AI technology into practice, the inference stage is tied to real applications and directly affects user experience, making it a highly challenging part.

More challenging still, the hardware that carries inference is increasingly heterogeneous: cloud, mobile, and edge each correspond to a variety of hardware with very different underlying chip architectures.

How can so many hardware architectures be fully supported, while optimizing AI application performance on each of them for greater speed?

The solution given by Paddle Lite is:

Through a new architecture, the underlying computation is modeled with high scalability and flexibility; the ability to mix and schedule multiple hardware targets, quantization methods, and data layouts is strengthened, guaranteeing broad hardware support; and extreme low-level optimization delivers leading model performance.
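To make the idea of mixed scheduling concrete, here is a minimal, purely illustrative Python sketch of picking kernels by a (target, precision, layout) triple in caller-preference order. All names (`Place`, `KERNELS`, `pick_kernel`) are assumptions for illustration, not Paddle Lite's actual API.

```python
# Hypothetical sketch of preference-ordered kernel selection, loosely modeled
# on the (hardware, quantization, data layout) scheduling described above.
from collections import namedtuple

Place = namedtuple("Place", ["target", "precision", "layout"])

# Registry: op name -> places for which a kernel implementation exists.
KERNELS = {
    "conv2d": [Place("npu", "float32", "NCHW"),
               Place("arm", "int8", "NCHW"),
               Place("arm", "float32", "NCHW")],
    "softmax": [Place("arm", "float32", "NCHW")],
}

def pick_kernel(op, valid_places):
    """Return the first registered kernel matching the caller's preference
    order, mimicking hybrid scheduling across heterogeneous hardware."""
    for place in valid_places:
        if place in KERNELS.get(op, []):
            return place
    raise LookupError(f"no kernel for {op}")

# Prefer NPU, fall back to ARM CPU INT8, then ARM CPU FP32.
prefs = [Place("npu", "float32", "NCHW"),
         Place("arm", "int8", "NCHW"),
         Place("arm", "float32", "NCHW")]
print(pick_kernel("conv2d", prefs))   # conv2d lands on the NPU
print(pick_kernel("softmax", prefs))  # softmax falls back to ARM CPU
```

The point is only that one model can be split across hardware: each op independently falls back down the preference list until a kernel exists for it.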

Five characteristics of Paddle Lite

Officially, Paddle Lite has five features: high scalability, seamless integration of training and inference, versatility, high performance, and light weight.

1. High scalability.

The new architecture describes hardware more abstractly and can easily integrate new hardware under a single framework, with excellent extensibility. For example, extending support to FPGAs becomes very simple.

In addition, drawing on LLVM's Type System and MIR (Machine IR), hardware and models can be analyzed and optimized in a modular way, making it easier and more efficient to extend optimization strategies.

At present, Paddle Lite supports 21 Pass optimization strategies, covering different kinds of optimization such as mixed scheduling of hardware compute modes, INT8 quantization, operator fusion, and redundant-computation pruning.
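Operator fusion, one of the Pass types mentioned above, can be sketched with a toy graph rewrite. The pattern (fusing `conv2d` followed by `relu` into one op) and the list-of-strings "graph" are simplifying assumptions, far cruder than a real MIR pass.

```python
# Toy illustration of an operator-fusion pass: collapse adjacent
# conv2d + relu pairs into a single fused op, reducing kernel launches.
def fuse_conv_relu(ops):
    """Rewrite [..., 'conv2d', 'relu', ...] into [..., 'conv2d_relu', ...]."""
    fused, i = [], 0
    while i < len(ops):
        if i + 1 < len(ops) and ops[i] == "conv2d" and ops[i + 1] == "relu":
            fused.append("conv2d_relu")  # one kernel instead of two
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused

graph = ["conv2d", "relu", "conv2d", "relu", "pool2d", "fc", "softmax"]
print(fuse_conv_relu(graph))
# ['conv2d_relu', 'conv2d_relu', 'pool2d', 'fc', 'softmax']
```

A real pass works on a dataflow graph with tensors and attributes, but the payoff is the same: fewer ops to dispatch and fewer intermediate results to write out.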

2. Seamless integration of training and inference.

Unlike standalone inference engines, Paddle Lite builds on the PaddlePaddle training framework and its rich, complete operator library; the computation logic of the underlying operators is strictly consistent with training, so models are fully compatible and risk-free, and more models can be supported quickly.

It connects with PaddlePaddle's PaddleSlim model-compression tool, directly supports models trained with INT8 quantization, and achieves better accuracy than offline quantization.

3. Versatility.

Benchmarks for 18 officially released models cover image classification, detection, segmentation, image text recognition, and other fields, corresponding to 80 operators (Ops) and 85 Kernels; these operators can generally support other models as well.

It is also compatible with models trained by other frameworks: models trained in Caffe or TensorFlow can be converted with the companion X2Paddle tool and then used for inference.


Multiple hardware targets are supported, including ARM CPUs, Mali GPUs, Adreno GPUs, the Huawei NPU, and FPGAs. Optimizations for AI chips from Cambricon and Bitmain are underway, and more hardware will be supported in the future.

It also provides a Web front-end development interface, supports JavaScript calls to the GPU, and can quickly run deep learning models in the browser.

4. High performance.

Performance on ARM CPUs is excellent: kernels are deeply optimized for different micro-architectures, showing a speed advantage on mainstream mobile models.

Paddle Lite also supports INT8 quantized computation. Through framework-level optimization and efficient low-level quantized kernels, combined with the INT8 quantization-aware training in the PaddleSlim model-compression tool, it delivers high-accuracy, high-performance prediction.
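The arithmetic behind INT8 quantization can be shown in a few lines. This is the standard textbook symmetric per-tensor scheme, not Paddle Lite's exact implementation; function names are illustrative.

```python
# Back-of-the-envelope sketch of symmetric INT8 quantization: map floats
# into [-127, 127] with a single per-tensor scale, then map back.
def quantize(values, num_bits=8):
    qmax = 2 ** (num_bits - 1) - 1            # 127 for INT8
    scale = max(abs(v) for v in values) / qmax
    q = [round(v / scale) for v in values]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.51, -1.27, 0.02, 0.89]
q, scale = quantize(weights)
print(q)                                      # [51, -127, 2, 89]
restored = dequantize(q, scale)
print(max(abs(a - b) for a, b in zip(weights, restored)))  # tiny error
```

Storing 8-bit integers instead of 32-bit floats quarters the weight size and lets the CPU use wide integer SIMD instructions; quantization-aware training, as in PaddleSlim, recovers most of the accuracy lost to rounding.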

There is also a good performance on Huawei NPU and FPGA.

5. Lightweight.

It is deeply customized and optimized for the characteristics of on-device hardware, with no third-party dependencies.

The whole inference process is divided into model loading and parsing, computation-graph optimization and analysis, and efficient execution on the device. The mobile side can directly deploy the optimized, analyzed graph and run prediction.

On Android, the ARMv7 dynamic library is only about 800 KB and the ARMv8 dynamic library only about 1.3 MB, and both can be trimmed further as needed.

At present, Paddle Lite and its predecessors are widely used in products such as the Baidu App, Baidu Maps, Baidu Netdisk, and autonomous driving.

For example, the Baidu App recently launched real-time dynamic multi-target recognition. With Paddle Lite's support, the original 200-plus-layer cloud vision model was compressed to just over 10 layers, recognizing objects within 100 ms and updating object-tracking positions within 8 ms.

By comparison, the human eye generally needs 170 ms to 400 ms to recognize an object and about 40 ms to refresh object tracking, meaning the feature's recognition speed now exceeds that of the human eye.

All of this rests on Paddle Lite's strong on-device inference capability, which carries PaddlePaddle's efficient deployment across multiple hardware platforms and achieves extreme performance optimization for model applications.

Detailed explanation of the new architecture

Backed by Baidu, Paddle Lite's architecture embodies a series of independently developed technologies.

According to the introduction, Paddle Lite draws on several prediction-library architectures inside Baidu, integrating their strengths, with a focus on complete hybrid scheduling across multiple compute modes (hardware, quantization method, data layout). The new architecture is designed as follows:

The top layer is the model layer: models trained directly by PaddlePaddle are converted by the model-optimization tool into the dedicated NaiveBuffer format, to better suit mobile deployment scenarios.

The second layer is the program layer: an execution program composed of a sequence of operators.

The third layer is a complete analysis module, including MIR (Machine IR)-related components, which optimizes the original model's computation graph for a specific hardware list, performing operator fusion, computation pruning, and so on.

Unlike the IR (Internal Representation) used during PaddlePaddle training, hardware and execution information are also incorporated into the analysis at this layer.

The bottom layer is the execution layer: a Runtime Program composed of a sequence of Kernels. The execution layer's scheduling overhead is extremely low, involving only Kernel execution, and it can be deployed on its own to support extremely lightweight deployment.
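The execution layer described above can be sketched as nothing more than a list of kernels run in order. The class and the stand-in "kernels" below are toy assumptions, meant only to show why such a runtime can be so light: there is no graph logic left at this stage, just sequential dispatch.

```python
# Minimal sketch of a Runtime Program: an ordered sequence of kernels,
# executed one after another with no remaining graph machinery.
class RuntimeProgram:
    def __init__(self, kernels):
        self.kernels = kernels           # ordered list of callables

    def run(self, x):
        for kernel in self.kernels:      # scheduling is pure sequential dispatch
            x = kernel(x)
        return x

# Two stand-in "kernels": a scaling layer and a ReLU.
scale2 = lambda xs: [2 * v for v in xs]
relu = lambda xs: [max(0, v) for v in xs]

program = RuntimeProgram([scale2, relu])
print(program.run([-1, 0, 3]))           # [0, 0, 6]
```

Because all analysis and optimization happened in the layer above, the on-device binary only needs this loop plus the kernels themselves, which is what makes the sub-megabyte library sizes quoted earlier plausible.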

Overall, the design focuses not only on multi-hardware and multi-platform support, but also strengthens mixed execution of multiple hardware targets within a single model, performance optimization at multiple levels, and a lightweight design for on-device applications.

The rise of domestic deep learning framework

The evolution of PaddlePaddle is not just a simple product upgrade.

Against the broader trend and environment, its meaning is changing.

On the one hand, there is a general trend.

This year is an important year for AI deployment. Domestic AI hardware R&D is in full swing, with giants including Baidu, Alibaba, and Huawei actively laying out AI chip design and manufacturing.

But rapid hardware development cannot make up for missing software, and foreign technology giants have quickened their pace to occupy this market gap.

At this year's TensorFlow developer conference, Google emphasized deploying TensorFlow Lite for edge AI applications, yet that framework clearly does not fit well with the various kinds of hardware developed by domestic companies.

Foreign technology companies will not devote much energy to domestic chips from many different manufacturers with different architectures. PaddlePaddle saw the opportunity and has achieved initial results.

According to Baidu's just-released Q2 results, developer downloads of PaddlePaddle grew 45% in the second quarter of 2019 over the previous quarter.

As the most popular domestic machine learning framework, PaddlePaddle has, with Paddle Lite, devoted considerable effort to solving the narrow applicability and development difficulty of domestic AI hardware.

On the other hand, the larger situation cannot be sidestepped.

Compared with the past, concerns about independent R&D and supply continuity in AI development have been raised repeatedly.

Beyond patents and hardware, the underlying algorithm framework was also put on the table after Google cut off Huawei's Android supply.

At present, although the two leading deep learning frameworks, TensorFlow and PyTorch, are open-source projects, they are controlled by American companies and may have to "abide by American law."

So the risk of a chokehold is not absent.

Previously, on the question of how to develop such low-level core technology, experts talked at length and appealed earnestly, but actually turning talk into action proved far harder.

It requires not only investment of time, talent, and resources, but also the right timing: without long-term accumulation, such problems are not easily solved.

So the Paddle Lite upgrade seems well timed: on one hand, the accumulation is there; on the other, it is not too late to catch up.

In the end, though, the most direct test is to try it.

Without further ado, here are the goods:

Portal

The key feature upgrades in this Paddle Lite release are summarized as follows:

1. Major architecture upgrades, such as Machine IR, the Type system, and lightweight Operators and Kernels, adding general multi-platform and multi-hardware support, mixed-precision and data-layout hybrid scheduling, dynamic optimization, lightweight deployment, and other important features.
2. An improved Java API with one-to-one correspondence to the C++ API.
3. A new NaiveBuffer model storage format that decouples mobile deployment from protobuf, shrinking the prediction library.
4. Support for predicting with Caffe and TensorFlow models via X2Paddle; 6 models are officially verified so far.
5. In-depth support for the Huawei NPU, making it the first framework to support Huawei NPU online compilation.
6. FPGA support, verified on the ResNet-50 model.
7. For Mali GPUs and Adreno GPUs, mixed scheduling of OpenCL and ARM CPU Kernels, verified on MobileNetV1, MobileNetV2, ResNet-50, and other models.
8. For ARM-architecture CPUs, added support and verification of common models such as VGG-16, EfficientNet-B0, and ResNet-18.
9. 70 new hardware Kernels.

https://www.toutiao.com/i6727480978421318158/
