2025-01-14 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
In this issue, the editor looks at what FPGA-based neural network design is like. The article is rich in content and analyzes the topic from a professional point of view; I hope you take something away from it.
Introduction
I hadn't read a paper on FPGA-based neural network implementation for a long time, because once you have followed neural network accelerator design for a while, you find the architectures are all similar. Everyone focuses on improving the same metrics: FPGA compute throughput, network accuracy, and network model size. The FPGA architecture is built from roughly the same modules: on-chip cache, convolution acceleration module, pooling module, load/save modules, and an instruction control module. The hardware architecture is not the hard part; the difficulty is the software compilation. The compiler must adapt to different network models, stay compatible with changes in the FPGA hardware, and at the same time give customers an easy-to-use interface. That remains quite difficult today. First, the FPGA hardware varies a lot, with each module's parameters subject to change (such as the number of parallel convolution modules); second, there are many network models and many open-source frameworks (TensorFlow, PyTorch, etc.). There are also many network compression algorithms, and these generally reduce model accuracy. In general, FPGA-based network acceleration papers emphasize how much the model was compressed and how fast it runs on the FPGA, but rarely focus on improving accuracy.
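To make the "similar modules" concrete, here is an illustrative sketch (not from the paper; all names are hypothetical) of how such an accelerator is typically organized: compute modules plus an on-chip buffer, sequenced by a small instruction stream from the control module.

```python
# Illustrative model of a typical FPGA CNN accelerator: on-chip cache,
# 1x1-convolution and pooling modules, and load/save, all driven by
# instructions. Purely a software sketch; names are hypothetical.
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: x (C_in, H, W), w (C_out, C_in) -> (C_out, H, W)."""
    return np.tensordot(w, x, axes=([1], [0]))

def pool2x2(x):
    """2x2 max-pooling with stride 2."""
    c, h, w = x.shape
    return x[:, :h // 2 * 2, :w // 2 * 2] \
        .reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

class Accelerator:
    """Compute modules plus an on-chip cache, sequenced by instructions."""
    def __init__(self):
        self.cache = {}                      # models on-chip buffers
    def run(self, program, ddr):
        for op, *args in program:
            if op == "LOAD":                 # DDR -> on-chip cache
                self.cache[args[0]] = ddr[args[0]]
            elif op == "CONV":
                dst, src, wname = args
                self.cache[dst] = conv1x1(self.cache[src], self.cache[wname])
            elif op == "POOL":
                dst, src = args
                self.cache[dst] = pool2x2(self.cache[src])
            elif op == "SAVE":               # on-chip cache -> DDR
                ddr[args[0]] = self.cache[args[0]]

# One layer: load data and weights, convolve, pool, write back.
ddr = {"x": np.random.rand(3, 8, 8), "w": np.random.rand(16, 3)}
acc = Accelerator()
acc.run([("LOAD", "x"), ("LOAD", "w"),
         ("CONV", "y", "x", "w"), ("POOL", "z", "y"), ("SAVE", "z")], ddr)
```

The software-compilation difficulty mentioned above is exactly the problem of generating such instruction streams from arbitrary TensorFlow/PyTorch models for varying hardware parameters.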
This paper proposes co-design of hardware and network, which is a good idea. Previously, neural network accelerator hardware design and network compression were done separately; here, the compression takes the characteristics of the hardware into account as far as possible, so that the network model better fits the hardware architecture. That is the kind of work this paper actually does. I don't think it truly achieves joint hardware/network design (though it claims to), but it does give us a new research direction: how to design a network that suits the hardware from the very beginning. OK, enough preamble; let's look at the paper.
1. Criticism from the author
To publish a paper, one always has to summarize the strengths and weaknesses of previous work and then point out its shortcomings to highlight one's own contribution. This article likewise spends a lot of space criticizing the deficiencies of past research. To summarize, the points are:
1) Past studies used old networks such as VGG, ResNet, and AlexNet, which are out of date and little used in the market.
2) The datasets used in the past were also small; CIFAR-10, for example, contains too few image classes and too few images to be representative of commercial applications.
3) The compression techniques developed for old networks no longer suit the latest networks. SqueezeNet, for example, is 50x smaller than AlexNet yet achieves the same accuracy.
4) Earlier ResNet-style networks with additive skip connections are not well suited to FPGA deployment because they increase data movement.
5) Earlier networks use relatively large convolution kernels, such as 3x3 and 5x5, which are not well suited to hardware acceleration.
6) Past compression work focused on the old networks, which contain a lot of redundancy and are therefore easy to compress; the latest networks, such as ShuffleNet, are not so easy to compress, and such results are rarely reported.
In short, the authors are saying that previous work picked the low-hanging fruit and is relatively backward. Let's see what comes of such a confident tone.
2. From ShuffleNetV2 to DiracDeltaNet
ShuffleNetV2 is a recently developed neural network with a much smaller model (60x smaller than VGG16) whose accuracy is only 2% lower than VGG16's. Instead of summing the skip-connected data as ResNet does, ShuffleNet concatenates it, which removes the addition operations. Skip connections let the network grow deeper and improve the accuracy of deep networks, but additive skips are unfavorable for FPGA implementation: the additions consume resources and time, and the skip data increases transfer time. A concat connection serves the same purpose as an additive skip, increasing network depth and accuracy.
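The difference described above can be sketched on toy feature maps (shapes are illustrative, not from the paper):

```python
# ResNet-style additive skip vs. ShuffleNet-style concat skip.
import numpy as np

x = np.random.rand(32, 14, 14)        # block input: (channels, H, W)
branch = np.random.rand(32, 14, 14)   # output of the block's conv branch

# ResNet: elementwise addition -- needs adders and moves both operands
# through the arithmetic units.
res_out = x + branch                  # shape stays (32, 14, 14)

# ShuffleNetV2: concatenate along channels -- no arithmetic at all; on
# hardware the skip path is just wiring/addressing.
cat_out = np.concatenate([x, branch], axis=0)   # shape (64, 14, 14)
```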
The authors fine-tune the ShuffleNetV2 structure to make it more amenable to FPGA deployment, in three ways:
1) Replace all 3x3 convolutions (including 3x3 depth-wise convolutions) with shift plus 1x1 convolution. This reduces feature-map data movement: a 3x3 convolution must read each piece of image data three times, while a 1x1 convolution reads it only once, which lowers logic complexity and raises speed. The shift operation moves a neighborhood of pixels toward the center position as its result, removing the corresponding multiplications. The replacement costs some accuracy but reduces the number of operations on the FPGA.
2) Reduce the 3x3 max-pooling operation to 2x2.
3) Adjust the channel order to suit the FPGA.
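Point 1) can be sketched as follows. This is my own minimal reading of shift + 1x1, not the paper's code: each channel is spatially shifted by one of the nine 3x3 offsets (a pure re-addressing step, no multiplications), and a 1x1 convolution then mixes channels.

```python
# Shift + 1x1 convolution standing in for a 3x3 convolution (sketch).
import numpy as np

def shift(x):
    """Shift each channel by one of the nine 3x3 offsets, zero-padding."""
    c, h, w = x.shape
    offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    out = np.zeros_like(x)
    for i in range(c):
        dy, dx = offsets[i % 9]
        dst = np.zeros((h, w))
        dst[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
            x[i, max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
        out[i] = dst
    return out

def conv1x1(x, w):
    """1x1 convolution: a matrix multiply over the channel dimension."""
    return np.tensordot(w, x, axes=([1], [0]))   # (C_out,C_in) x (C_in,H,W)

x = np.random.rand(18, 8, 8)
w = np.random.rand(24, 18)
y = conv1x1(shift(x), w)   # stands in for an 18->24-channel 3x3 convolution
```

The shift gives the following 1x1 convolution access to neighboring pixels across channels, which is why the pair can approximate a 3x3 receptive field without any 3x3 multiply-accumulate array.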
3. Quantization
To further reduce the number of network parameters, the authors adopt the quantization method of DoReFa-Net to quantize the full-precision weights, and they quantize the activations as well. The quantization results are as follows:
The loss of precision is very small.
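For reference, here is a sketch of DoReFa-Net's k-bit weight quantization as I understand it from the DoReFa-Net paper (forward pass only; training uses a straight-through estimator for the gradient, which is omitted here):

```python
# DoReFa-Net-style k-bit weight quantization (forward pass sketch).
import numpy as np

def quantize_k(r, k):
    """Quantize r in [0, 1] to k bits: round to 2**k - 1 uniform levels."""
    n = 2 ** k - 1
    return np.round(r * n) / n

def dorefa_weight(w, k):
    """Map full-precision weights to k-bit values in [-1, 1]."""
    t = np.tanh(w)
    r = t / (2 * np.max(np.abs(t))) + 0.5    # squash into [0, 1]
    return 2 * quantize_k(r, k) - 1

w = np.random.randn(4, 4)
wq = dorefa_weight(w, k=4)   # weights snapped to a 16-level grid in [-1, 1]
```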
The paper applies many fine-tuning tricks during the network modification, with a lot of detail involved. Clearly, squeezing further compression out of a network that already has few parameters takes real work, and the approach may not generalize well; all this fine-tuning must cost considerable time and effort.
4. Hardware architecture
The main operations implemented by the hardware are very few, only the following:
1) 1x1 convolution
2) 2x2 max-pooling
3) shift
4) shuffle and concat
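The shuffle operation in 4) is ShuffleNet's channel shuffle, which like shift and concat is pure re-addressing on hardware. A minimal sketch (with an illustrative 4-channel input):

```python
# Channel shuffle: reshape channels to (groups, c/groups), transpose, flatten.
import numpy as np

def channel_shuffle(x, groups):
    c, h, w = x.shape
    return x.reshape(groups, c // groups, h, w) \
            .transpose(1, 0, 2, 3) \
            .reshape(c, h, w)

x = np.arange(4)[:, None, None] * np.ones((4, 2, 2))   # channels 0,1,2,3
y = channel_shuffle(x, groups=2)
# channel order after shuffle: 0, 2, 1, 3
```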
So the hardware architecture also becomes very concise; the authors say in the article that just two people implemented it in HLS in only a month.
Very few resources are used.
The following compares the results with other work:
That is the editor's overview of what network design based on FPGA hardware is like. If you happen to have similar questions, the analysis above may help you understand; to learn more, you are welcome to follow the industry information channel.