Source: https://www.toutiao.com/a6672168138810851853/
With the growth of big data and improvements in computing power, AI algorithms have matured rapidly. Whoever seizes the high ground of AI chips will hold a dominant position in the market, and across the AI chip market the competition is already fierce.
I. The overall size of the AI chip market
According to statistics from the "Market Prospect and Investment Strategic Planning Analysis Report of China's Artificial Intelligence Industry" released by the Qianzhan Industry Research Institute, China's artificial intelligence market exceeded 10 billion yuan in 2015, reached 14.2 billion yuan in 2016 and 21.7 billion yuan in 2017, and was expected to reach 33.9 billion yuan in 2018. The market is forecast to reach 50 billion yuan in 2019 and 71 billion yuan in 2020, a compound annual growth rate of 44.5% from 2015 to 2020.
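As a quick arithmetic check, the quoted 44.5% figure is consistent with the endpoint values above. The minimal sketch below recomputes the compound annual growth rate in Python; the 2015 base of about 11.3 billion yuan is an assumed value implied by "exceeded 10 billion yuan", not a figure stated in the report.

```python
# Rough sanity check of the cited 44.5% CAGR for 2015-2020.
# The exact 2015 base is not given; ~11.3 billion yuan is an assumption
# consistent with "exceeded 10 billion yuan" in the report.
start_2015 = 11.3   # billion yuan, assumed
end_2020 = 71.0     # billion yuan, forecast cited above
years = 5

cagr = (end_2020 / start_2015) ** (1 / years) - 1
print(f"Implied CAGR 2015-2020: {cagr:.1%}")   # ~44.4%, close to the cited 44.5%
```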
McKinsey's forecasts are striking: between 2017 and 2025, artificial intelligence semiconductors will lead the market, with a compound annual growth rate five times that of all other semiconductors combined. A Tractica survey breaks this growth down further by comparing central processing units (CPU) with graphics processing units (GPU), field-programmable gate arrays (FPGA) and application-specific integrated circuits (ASIC). CPU-based revenue will start at about $3 billion in 2019 and grow to about $12 billion by 2025. Revenue from GPU-based systems will grow from nearly $6 billion in 2019 to about $20 billion by 2025. The FPGA contribution is small, probably only about $1 billion by 2025. But ASIC revenue will grow from about $2 billion in 2019 to about $30 billion in 2025, and by around 2022 ASIC-based AI revenue is expected to overtake GPU-based AI revenue.
II. Market development environment
A new round of favorable artificial intelligence policies is landing intensively in China. A reporter from the Economic Information Daily learned that at the beginning of 2019, provinces and cities including Chengdu and Zhejiang issued artificial intelligence industry development plans one after another, focusing on key technical problems and financial support at foundational layers such as smart chips and smart sensors, and accelerating the cultivation of AI industry clusters and leading enterprises. AI chips at this foundational layer are therefore expected to offer broad room for investment.
III. Market demand for AI chips and competition among enterprises
At present, the market demand for AI chips is mainly in three categories:
1. Training: the training needs of major AI enterprises and laboratories in the R&D phase (mainly in the cloud; device-side training demand is not yet clear).
2. Inference on Cloud: mainstream AI applications such as Face++, Mobvoi and Siri all provide services through the cloud.
3. Inference on Device: device-side inference for smartphones, smart cameras, robots/drones, autonomous driving, VR and other devices requires highly customized, low-power AI chip products. For example, the Huawei Kirin 970 carries a "neural network processing unit" (NPU, actually a Cambricon IP), and the Apple A11 carries a "neural engine" (Neural Engine).
(1) Training
Before 2007, artificial intelligence research was limited by algorithms, data and other factors, and there was no particularly strong demand for chips: general-purpose CPUs could provide sufficient computing power. The Google Brain project built by Andrew Ng and Jeff Dean used a parallel computing platform with 16,000 CPU cores to train a deep neural network with more than 1 billion neurons. However, the serial structure of the CPU is not suited to the massive data operations required by deep learning, and training deep learning models on CPUs is very inefficient. In an early speech-recognition model based on deep learning, the input layer had 429 neurons, the whole network had 156M parameters, and training took more than 75 days.
Compared with the small number of logic units in a CPU, a GPU is a huge computing matrix. A GPU has thousands of computing cores, can achieve 10-100 times the application throughput, and supports the parallel computing that is essential for deep learning, so it can be much faster than a traditional processor and greatly speeds up the training process.
Comparing their internal structures, about 70% of the transistors in a CPU are used to build caches and part of the control unit, the portion responsible for logic operations (the ALU modules) is small, and instructions are executed serially one after another. A GPU is composed of parallel computing units, control units and memory units; it has a large number of cores (up to thousands) and a large amount of high-speed memory, is good at the kind of parallel computation found in image processing, and performs computation in the distributed form of matrices. Unlike the CPU, the GPU has many more computing units, which makes it especially suitable for large-scale parallel computing.
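To make the parallelism argument concrete, the minimal sketch below times the same large matrix multiplication, the core operation of deep learning training, on the CPU and, if one is available, on a CUDA GPU. It assumes PyTorch is installed and is an illustration of the throughput gap, not a benchmark of any specific chip mentioned in this article.

```python
import time
import torch

def time_matmul(device: str, n: int = 4096) -> float:
    """Time an n x n matrix multiplication on the given device."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()        # make sure setup has finished
    start = time.perf_counter()
    _ = a @ b                           # the massively parallel operation
    if device == "cuda":
        torch.cuda.synchronize()        # wait for the GPU kernel to finish
    return time.perf_counter() - start

cpu_t = time_matmul("cpu")
print(f"CPU: {cpu_t:.3f} s")
if torch.cuda.is_available():
    gpu_t = time_matmul("cuda")
    print(f"GPU: {gpu_t:.3f} s  (~{cpu_t / gpu_t:.0f}x faster)")
```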
NVIDIA now dominates the general-purpose GPU market for artificial intelligence. NVIDIA began to lay out artificial intelligence products in 2010 and in 2014 announced the Pascal GPU architecture, its fifth-generation GPU architecture and the first designed for deep learning, supporting all mainstream deep learning computing frameworks. In the first half of 2016, NVIDIA launched the Tesla P100 chip based on the Pascal architecture and the corresponding DGX-1 supercomputer for neural network training. The DGX-1 contains Tesla P100 GPU accelerators connected with NVLink interconnect technology, and its software stack includes the main deep learning frameworks, the deep learning SDK, the DIGITS GPU training system, drivers and CUDA. It can be used to design deep neural networks (DNN) quickly, delivers half-precision floating-point performance of up to 170 TFLOPS, equivalent to 250 traditional servers, can speed up deep learning training by 75 times, and offers 56 times the performance of a CPU.
At present, Google is the company that can compete with NVIDIA in the training market. In May this year, Google released TPU 2.0. The TPU (Tensor Processing Unit) is an ASIC chip developed by Google to accelerate deep learning. The first-generation TPU could only be used for inference, while TPU 2.0 can be used for both neural network training and inference. According to reports, TPU 2.0 includes four chips and can handle 180 trillion floating-point operations per second. Google has also found a way to combine 64 TPUs over a new computer network into so-called TPU Pods, which provide about 11,500 trillion floating-point operations per second. Google says that training its new deep learning translation model on 32 of the best GPUs takes a whole day, while one eighth of a TPU Pod can do the same job in six hours. At present, Google does not sell TPU chips directly; instead it combines them with its open-source deep learning framework TensorFlow to offer TPU cloud acceleration services to AI developers, building up TPU 2.0 applications and an ecosystem, for example through the TensorFlow Research Cloud (TFRC) released alongside TPU 2.0.
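The way developers reach these TPUs is through TensorFlow rather than by buying chips. The sketch below shows what that access path looks like today, assuming TensorFlow 2.4+ and a Cloud TPU environment; the endpoint name "my-tpu" is a placeholder, and the article itself describes the earlier TF1-era service, so this is only an illustration of the "framework plus cloud accelerator" model.

```python
import tensorflow as tf

# Assumes TensorFlow 2.4+ with access to a Cloud TPU.
# "my-tpu" is a placeholder endpoint name, not from the article.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Any model built under the strategy scope is replicated across the TPU cores.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
```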
In addition to these two, the traditional CPU/GPU makers Intel and AMD are also trying to enter the training market, for example with Intel's Xeon Phi + Nervana plan and AMD's next-generation Vega architecture GPU chips, but judging from current market progress it will be difficult for them to threaten NVIDIA. Among startups, Graphcore's IPU (Intelligence Processing Unit) also claims to support both training and inference. The IPU uses a homogeneous many-core architecture with more than 1,000 independent processors; it supports all-to-all inter-core communication, adopts the Bulk Synchronous Parallel computing model, and uses a large amount of on-chip memory rather than connecting directly to DRAM.
In short, for cloud-based training (and inference) systems, the industry view is fairly consistent: the core of the competition is not a single chip but the construction of an entire software and hardware ecosystem, NVIDIA's CUDA + GPU versus Google's TensorFlow + TPU 2.0. The competition among the giants has only just begun.
(2) Inference on Cloud (cloud inference)
Compared with the training market, where NVIDIA dominates, competition in the inference market is more fragmented. If, as the industry says, training accounts for only 5% of the deep learning market and inference for 95%, the inference market is bound to see even fiercer competition.
In cloud inference, although GPUs still have applications, they are not the best choice; more heterogeneous computing solutions (CPU/GPU + FPGA/ASIC) are being used to perform cloud inference tasks. In the FPGA field, two of the four major vendors (Xilinx, Altera, Lattice and Microsemi), namely Xilinx and Altera (acquired by Intel), have a clear advantage in cloud acceleration. Altera was acquired by Intel in December 2015 and has since launched Xeon + FPGA cloud solutions in cooperation with Azure, Tencent Cloud and Aliyun. Xilinx has deeper cooperation with IBM, Baidu Cloud, AWS and Tencent Cloud, and has also made a strategic investment in the domestic AI chip startup DeePhi Tech. At present, the other FPGA vendors still lag far behind Xilinx and Altera in cloud acceleration.
In the ASIC field, the commercial AI chips used for cloud inference are currently mainly Google's TPU 1.0/2.0. TPU 1.0 is used only for datacenter inference. Its core is a matrix multiplication unit built from 65,536 8-bit MACs with a peak of 92 TeraOps/second (TOPS), and it has a large on-chip memory totaling 28 MiB. It supports common neural networks such as MLP, CNN and LSTM, as well as the TensorFlow framework. Its average performance (TOPS) is 15 to 30 times that of contemporary CPUs and GPUs, and its energy efficiency (TOPS/W) is 30 to 80 times higher; if the TPU used the GPU's GDDR5 memory, these two figures would rise to about 70 times the GPU and 200 times the CPU. TPU 2.0 is used for both training and inference, as described in the previous section.
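Much of the TPU's efficiency comes from doing inference with 8-bit integer multiply-accumulate operations instead of floating point. The NumPy sketch below illustrates the basic idea of such a scheme (scale FP32 values to int8, accumulate in int32, rescale); it is a simplified illustration of symmetric per-tensor quantization, not the TPU's actual arithmetic.

```python
import numpy as np

def quantize(x: np.ndarray):
    """Map float32 values to int8 with a per-tensor symmetric scale."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

# Float reference result
activations = np.random.randn(1, 256).astype(np.float32)
weights = np.random.randn(256, 128).astype(np.float32)
reference = activations @ weights

# 8-bit MAC version: multiply int8 operands, accumulate in int32, rescale.
qa, sa = quantize(activations)
qw, sw = quantize(weights)
acc = qa.astype(np.int32) @ qw.astype(np.int32)   # int32 accumulator
approx = acc.astype(np.float32) * sa * sw

print("max abs error vs. float32:", np.abs(approx - reference).max())
```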
Cambricon, a domestic AI chip company, is also reported to be independently developing high-performance cloud AI chips and is currently cooperating with iFLYTEK and Sugon, but no detailed product has been announced yet.
(3) Inference on Device (device-side inference)
The application scenarios for device-side inference are more diverse: smartphones, ADAS, smart cameras, voice interaction, VR/AR and other devices require more customized, low-power, low-cost embedded solutions. This gives startups more opportunities, and the competitive ecosystem will be more varied.
1) Smartphones
The Kirin 970 AI chip released by Huawei in early September carries a neural network processing unit, the NPU (a Cambricon IP). The Kirin 970 uses TSMC's 10nm process and has 5.5 billion transistors, consuming 20% less power than the previous-generation chip. The CPU is an 8-core design of 4 Cortex-A73 cores plus 4 Cortex-A53 cores, with energy efficiency 20% better than the previous generation. The GPU is a 12-core Mali-G72 MP12, improving graphics processing by 20% and energy efficiency by 50%. The NPU uses the HiAI mobile computing architecture and provides up to 1.92 TFLOPS of FP16 computing performance; compared with four Cortex-A73 cores, it offers roughly 50 times the energy efficiency and 25 times the performance on the same AI tasks.
Apple's newly released A11 Bionic chip is also equipped with a neural network unit. According to reports, the A11 Bionic has 4.3 billion transistors and is manufactured on TSMC's 10nm FinFET process. The CPU is a six-core design with two high-performance cores and four energy-efficient cores; compared with the A10 Fusion, the two performance cores are 25% faster and the four efficiency cores are 70% faster. The GPU is Apple's self-designed three-core graphics processor, up to 30% faster than the previous generation. The neural engine (NPU) uses a dual-core design and can perform up to 600 billion operations per second. It is mainly used for machine learning tasks, can recognize people, places and objects, and can take over tasks from the CPU and GPU, greatly improving the chip's computing efficiency.
In addition, Qualcomm has disclosed NPU research and development going back to 2014, and the results are reflected in the latest two generations of Snapdragon 8xx chips; for example, the Snapdragon 835 integrates the "Snapdragon Neural Processing Engine" software framework, which supports custom neural network layers and lets OEMs and software developers build their own neural network units on top of it. ARM has also brought its DynamIQ technology to the Cortex-A75 and Cortex-A55 released this year; according to reports, DynamIQ can deliver up to 50 times the AI performance of current devices within the next 3-5 years and can increase the response speed of dedicated hardware accelerators by 10 times. On the whole, it is safe to conclude that the AI chips in future smartphones will remain in the hands of the traditional SoC vendors.
2) Autonomous driving
NVIDIA released the DRIVE PX 2 autonomous-driving development platform last year. Built on a 16nm FinFET process, it consumes up to 250W and uses a water-cooled design; it supports 12 camera inputs plus lidar, radar and ultrasonic sensors; the CPU side uses two next-generation NVIDIA Tegra processors containing 8 A57 cores and 4 Denver cores; the GPU uses the new Pascal architecture, with single-precision performance of 8 TFLOPS, surpassing the TITAN X, and more than 10 times its deep learning performance. Automotive electronics giants such as Mobileye (acquired by Intel), NXP (which Qualcomm had agreed to acquire), Infineon and Renesas also supply ADAS chips and algorithms. Among startups, Horizon Robotics' deep learning processor (BPU, Brain Processing Unit) IP and its self-developed Hugo platform also focus on autonomous driving.
3) Computer vision
Movidius, acquired by Intel, is a major chip supplier in this area; its Myriad series chips are used in smart surveillance cameras from DJI, Hikvision and Dahua. Among the domestic computer vision companies, such as SenseTime, Face++, CloudWalk and YITU, some may in future extend upstream into CV chip development as they accumulate computer vision technology of their own. In addition, startups such as Renren and Smart Core provide camera-side AI acceleration IP and chip solutions.
4) Others
The representative VR device chip is the HPU developed by Microsoft for its own HoloLens headset. This TSMC-manufactured chip can simultaneously process data from five cameras, a depth sensor and a motion sensor, and provides hardware acceleration for computer vision matrix operations and CNN computation. For voice interaction device chips, two domestic companies, Chipintelli (Qiying Tailun) and Unisound (Yunzhisheng), provide chip solutions with built-in deep-neural-network acceleration optimized for speech recognition, enabling offline speech recognition. In the broader IoT field, NovuMind has designed an AI chip that uses only 3x3 convolution filters; the first prototype is expected by the end of this year, aiming to perform 15 trillion floating-point operations while consuming no more than 5 watts, and it could be widely used in all kinds of small "edge" devices on the Internet.
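Restricting hardware to 3x3 filters is less limiting than it may sound: as popularized by VGG-style networks, two stacked 3x3 convolutions cover the same 5x5 receptive field with fewer parameters. The PyTorch sketch below illustrates that general principle only; it is not a description of NovuMind's actual architecture, and the layer sizes are arbitrary.

```python
import torch
import torch.nn as nn

# Two stacked 3x3 convolutions see a 5x5 receptive field, so hardware that
# only supports 3x3 filters can still express networks that would otherwise
# use larger kernels. Channel counts here are illustrative.
five_by_five = nn.Conv2d(32, 32, kernel_size=5, padding=2)
stacked_3x3 = nn.Sequential(
    nn.Conv2d(32, 32, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 32, kernel_size=3, padding=1),
)

x = torch.randn(1, 32, 56, 56)
print(five_by_five(x).shape)   # torch.Size([1, 32, 56, 56])
print(stacked_3x3(x).shape)    # same spatial size and receptive field
# Parameter count: one 5x5 layer uses 25*C*C weights, two 3x3 layers use 18*C*C.
```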
(4) New architectures: brain-like computing chips
"brain-like chip" refers to the chip designed with reference to the structure of human brain neurons and the cognitive style of human brain perception, and its goal is to develop a chip that breaks the von Neumann architecture. At present, this field is still in the exploratory stage, such as SpiNNaker and BrainScaleS supported by EU, Neurogrid of Stanford University, TrueNorth of IBM and Zeroth of Qualcomm. Domestic Westwell, Tsinghua University, Zhejiang University, University of Electronic Science and Technology and so on.
IBM's TrueNorth was released in 2014. A single chip integrates 4,096 cores, 1 million neurons and 256 million programmable synapses, is built on Samsung's 28nm process with a total of 5.4 billion transistors, and can perform 46 billion synaptic operations per second with a total power consumption of 70mW (20mW per square centimeter). IBM's ultimate goal is to build a computer with 10 billion neurons and 100 trillion synapses whose performance is 10 times that of the human brain, while consuming only one kilowatt and occupying a volume of less than two liters.
Westwell, a domestic AI startup, uses FPGAs to simulate neurons and implement the way a spiking neural network (SNN) works. It has two products:
1. DeepSouth, a bionic brain-neuron chip: a chip based on spiking neural networks (SNN, the third generation of neural networks) that implements a complete synaptic neural network using the STDP (spike-timing-dependent plasticity) algorithm, with circuits that imitate real biological neurons to generate spikes (see the sketch after this list). Through dynamic allocation it can simulate up to 50 million "neurons", and on the same task its power consumption is between one tenth and one hundredth of that of a traditional chip.
2. DeepWell, a deep-learning brain-neuron chip: a general-purpose intelligent chip for pattern recognition problems. It learns and adjusts the connection weights between neurons on chip using an online pseudo-inverse matrix solving algorithm (OPIUM lite), has 128 million neurons, and allocates on-chip neuron resources through a proprietary instruction set. Its learning and recognition speed is far higher than that of traditional methods such as CNNs running on general-purpose hardware like CPUs and GPUs, and its power consumption is lower.
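For readers unfamiliar with STDP, the sketch below shows the pair-based form of the spike-timing-dependent plasticity rule that chips like DeepSouth implement in hardware: a synapse is strengthened when the presynaptic spike precedes the postsynaptic spike and weakened otherwise, with an exponential dependence on the timing difference. The amplitudes and time constants are illustrative textbook defaults, not DeepSouth's actual parameters.

```python
import numpy as np

def stdp_delta_w(t_pre: float, t_post: float,
                 a_plus: float = 0.01, a_minus: float = 0.012,
                 tau_plus: float = 20.0, tau_minus: float = 20.0) -> float:
    """Pair-based STDP: weight change as a function of spike timing (ms).

    Pre before post (dt > 0) -> potentiation (strengthen the synapse).
    Post before pre (dt < 0) -> depression (weaken the synapse).
    Constants are illustrative, not DeepSouth's parameters.
    """
    dt = t_post - t_pre
    if dt > 0:
        return a_plus * np.exp(-dt / tau_plus)
    return -a_minus * np.exp(dt / tau_minus)

print(stdp_delta_w(t_pre=10.0, t_post=15.0))   # pre fired 5 ms earlier -> positive
print(stdp_delta_w(t_pre=15.0, t_post=10.0))   # post fired 5 ms earlier -> negative
```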
Generally speaking, the field of brain-like computing chips is still in the exploratory stage, and there is still a long way to go before large-scale commercial use.