Six sects lay siege to the bright peak of cloud AI chips

Shulou (Shulou.com) 11/24 report --

The AI chip battlefield has clearly grown livelier.

Just last Friday, MLPerf, the authoritative international artificial intelligence (AI) performance benchmark, released its latest inference results for data-center and edge scenarios, and both the roster of participants and the measured AI chip performance are more interesting than in the past.

Leading the pack, naturally, is the international AI computing giant Nvidia. This is the first time Nvidia has submitted results for the H100 Tensor Core GPU, the flagship AI accelerator it released this year, and its AI inference performance is up to 4.5 times that of the previous-generation GPU.

Qualcomm, for its part, proved with the latest results for its Cloud AI 100 cloud AI chip that it still commands high energy efficiency.

Chinese AI chip companies showed no weakness either: Biren Technology and Moffett AI entered the fray for the first time, and their records were respectable, even surpassing the flagship A100 and H100 on some models.

Biren submitted results for two data-center models, ResNet and BERT at 99.9% accuracy, covering both Offline and Server modes. In Offline mode on the BERT model, its 8-card machine delivered 1.58 times the performance of Nvidia's 8-card A100 system.

Moffett AI's S30 computing card took first place in single-card ResNet-50 throughput at 95,784 FPS, 1.2 times that of Nvidia's H100 and twice that of the A100.

South Korea's SK Telecom also took part: its Sapeon X220, launched in November 2020 as the country's first AI chip, outperformed Nvidia's entry-level A2 AI accelerator card in the test.

However, the Google TPU v4 chip, which showed high performance and high energy efficiency on June's training benchmark list, did not appear on the inference list.

In addition, Intel and Alibaba demonstrated how systems based solely on their server CPUs perform at accelerating AI inference.

Overall, the Nvidia A100 remains the all-rounder sweeping most of the scores, while the not-yet-shipping H100 is only showing the tip of its blade this time; its improvement on training workloads is expected to be even more dramatic.

Although the domestic AI chips were evaluated on only some models, such as ResNet and BERT, their single-point performance is already comparable to Nvidia's flagship computing products, showing they can substitute for leading international products when running specific models.

MLPerf data center inference results:

https://mlcommons.org/en/inference-datacenter-21/

MLPerf edge inference results:

https://mlcommons.org/en/inference-edge-21/

01. With the H100's royal debut, Nvidia still dominates

The MLPerf benchmark is divided into four scenario types by deployment mode: data center, edge, mobile, and IoT. It covers six of the most representative mainstream AI models: image classification (ResNet-50), natural language processing (BERT), speech recognition (RNN-T), object detection (RetinaNet), medical image segmentation (3D-UNet), and recommendation (DLRM).

Among them, the natural language processing, medical image segmentation, and recommendation tasks are tested at both 99% and 99.9% accuracy targets, to examine how raising the accuracy bar for AI inference affects computing performance.
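To make those accuracy gates concrete, here is a minimal Python sketch of such a check; the reference F1 value and function name are illustrative assumptions, not MLPerf's official harness.

```python
# Hypothetical sketch of an MLPerf-style accuracy gate: a result only
# counts if it reaches a fixed fraction of the FP32 reference score.
REFERENCE_F1 = 90.874  # assumed BERT-Large SQuAD reference F1 (illustrative)

def meets_gate(measured_f1: float, target_ratio: float) -> bool:
    """True if the measured accuracy clears the 99% or 99.9% gate."""
    return measured_f1 >= target_ratio * REFERENCE_F1

print(meets_gate(90.1, 0.99))   # True: clears the 99% target
print(meets_gate(90.1, 0.999))  # False: misses the stricter 99.9% target
```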

So far, Nvidia is the only company that has entered every major workload in every round of the MLPerf benchmark.

The Nvidia A100 still sweeps the field on the latest MLPerf inference list, ranking among the best across many model types. Its successor, the H100, made its MLPerf debut, breaking several world records with up to 4.5 times the A100's performance.

▲ Nvidia H100 delivers up to 4.5 times the A100's performance (image source: Nvidia)

Nvidia submitted two systems based on a single H100 GPU: one with an AMD EPYC CPU as the host processor, the other with an Intel Xeon CPU.

Although the H100, built on Nvidia's latest Hopper architecture, appears only in single-chip results this time, its performance in many cases exceeded that of systems with two, four, or eight A100 chips.

▲ Nvidia H100 sets new performance records on every data-center workload (image source: Nvidia)

Especially on BERT-Large, the natural language processing model demanding the most scale and performance, the H100 far outpaces the A100 GPU, thanks mainly to its Transformer Engine.

The H100 GPU is expected to ship at the end of this year and will face the MLPerf training benchmark after that.

In edge computing, meanwhile, Nvidia Orin, which integrates an Ampere-architecture GPU and Arm CPU cores on a single chip, ran every MLPerf benchmark and was the most broadly tested of all low-power systems-on-chip.

Notably, the Orin chip's edge AI inference energy efficiency improved by a further 50% compared with its MLPerf debut in April this year.

▲ In energy efficiency, Orin's edge AI inference performance improved by up to 50% (image source: Nvidia)

Nvidia's past MLPerf submissions show software contributing an ever larger share of the performance gains: since its MLPerf debut in July 2020, the A100's performance has improved sixfold thanks to continuous improvements in Nvidia's AI software.

Currently, Nvidia AI is the only platform that runs all MLPerf inference workloads and scenarios across data center and edge computing. Through software-hardware co-optimization, Nvidia GPUs achieve standout results in accelerating AI inference in both settings.

02. Biren's general-purpose GPU surpasses the A100 on the ResNet and BERT models

The BR104, a general-purpose GPU chip Biren released just this August, also made its public benchmark debut at MLPerf.

The MLPerf inference list is divided into two divisions: Closed (fixed task) and Open (open optimization). The Closed division mainly tests a vendor's hardware system and software optimization ability, while the Open division highlights a vendor's AI technique innovation.

This time Biren entered the Closed division for the data-center scenario, fielding an Inspur NF5468M6 server fitted with eight Bili 104-300W boards built around the BR104 chip. Biren submitted results for the ResNet and BERT 99.9%-accuracy models, covering both Offline and Server modes.

Offline mode corresponds to cases where all data is available locally; it matters more for models such as ResNet-50 and BERT. Server mode models real-time traffic, with queries arriving online in intermittent bursts; it matters more for models such as DLRM.
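As a rough illustration of the two modes' query patterns, here is a small self-contained Python sketch; it is conceptual only, not the official MLPerf LoadGen, and the function names are invented for this example.

```python
import random

def offline_queries(n_samples, batch_size):
    """Offline: the whole dataset is on hand, so queries can be issued
    in large batches and only total throughput (FPS) matters."""
    for start in range(0, n_samples, batch_size):
        yield list(range(start, min(start + batch_size, n_samples)))

def server_queries(n_samples, qps, seed=0):
    """Server: queries arrive singly at random (Poisson-like) intervals,
    and each must also meet a per-query latency bound."""
    rng = random.Random(seed)
    t = 0.0
    for i in range(n_samples):
        t += rng.expovariate(qps)   # exponential inter-arrival gap
        yield t, i                  # (arrival time in seconds, sample id)

print(next(offline_queries(1000, 256)))     # one big batch of sample ids
print(list(server_queries(3, qps=100.0)))   # individually timed queries
```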

Biren reportedly confined itself to these two model types because they are the most widely used and most important for its target customers, especially BERT.

▲ BR104 leads in both Offline and Server modes on the BERT model (image source: Biren Technology)

According to the results, on the BERT model the 8-card BR104 machine delivered 1.58 times the performance of the 8-card A100 system Nvidia submitted.

▲ BR104's single-card performance exceeds the A100 on the ResNet-50 and BERT models

Overall, the performance of Biren's 8-card PCIe solution is estimated to sit between Nvidia's 8-card A100 and 8-card H100.

Beyond Biren's own 8-card submission, the well-known server vendor Inspur Information also submitted a server carrying four Bili 104 boards, the first time Inspur has submitted server results based on a domestic vendor's chips.

Among all 4-card systems, the Inspur server also took first place worldwide on the ResNet-50 (Offline) and BERT (Offline & Server, 99.9% accuracy) models.

For a fledgling startup fielding its first chip, that is already a remarkable showing.

03. Moffett's S30 takes first place in image classification, its 95,784 FPS single-card throughput far exceeding the H100

Moffett AI, another Chinese cloud AI chip company, also entered MLPerf for the first time, achieving higher single-card throughput than Nvidia's H100 on the image classification inference task.

In designing its ANTOUM AI processor, Moffett built its chip-architecture innovation on self-developed dual-sparsity technology, addressing data centers' twin demands of high performance and high energy efficiency. At this year's GTIC 2022 Global AI Chip Summit, Moffett unveiled its first batch of high-sparsity computing cards for data-center AI inference: the S4, S10, and S30, carrying one, two, and three chips respectively.

▲ Moffett AI S30 computing card

Moffett entered the Open (open optimization) division. According to the latest MLPerf inference list, the S30 card took first place in single-card ResNet-50 throughput at 95,784 FPS, 1.2 times the H100 and twice the A100.

On the high-accuracy (99.9%) BERT-Large model, the S30 did not beat the H100 but still achieved twice the A100's performance, with a single-card throughput of 3,837 SPS.

▲ Moffett S30 compared with the A100 and H100 on the ResNet-50 and BERT-Large models (image source: Moffett AI)

Notably, the S30 is built on a 12nm process while the Nvidia H100 uses the far more advanced 4nm process. That the two can stand level on mainstream data-center AI models despite a generational process gap is owed mainly to the sparsity algorithms and architecture Moffett developed in-house.

MLPerf's testing requirements are stringent: besides measuring raw throughput, it imposes accuracy gates of 99% and above, precisely to examine how strict accuracy demands affect computing performance. In other words, vendors cannot trade accuracy for higher throughput, so the results also show that Moffett's sparse computing preserves accuracy essentially without loss.
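For readers unfamiliar with sparsity, the sketch below shows generic magnitude pruning in NumPy and why an accuracy gate matters; it is a textbook illustration, not Moffett's proprietary dual-sparsity design.

```python
import numpy as np

# Push magnitude pruning to ever higher sparsity and watch the output
# drift from the dense reference -- the tension an accuracy gate polices.
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256))   # stand-in dense weight matrix
x = rng.normal(size=256)          # stand-in input activations
ref = W @ x                       # dense reference output

def prune(weights, sparsity):
    """Zero out the smallest-magnitude `sparsity` fraction of weights."""
    cut = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) >= cut, weights, 0.0)

for s in (0.5, 0.75, 0.875):      # 2x, 4x, 8x reduction in weights
    err = np.linalg.norm(prune(W, s) @ x - ref) / np.linalg.norm(ref)
    print(f"sparsity {s:.0%}: relative output error {err:.2f}")

# Real deployments fine-tune after pruning to recover accuracy; a raw
# random matrix, as here, only shows the error growing with sparsity.
```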

04. High energy efficiency remains Qualcomm's cloud AI ace

Qualcomm released its first cloud AI chip, the Cloud AI 100, as early as 2019, and has kept participating steadily in MLPerf, competing with a succession of new AI accelerators.

Judging by the results, in image-processing energy efficiency the 7nm Qualcomm Cloud AI 100 chip still reigns supreme.

▲ Qualcomm Cloud AI 100

Among the latest MLPerf results, Foxconn, Thundercomm, Inventec, Dell, HPE, and Lenovo all submitted systems using the Qualcomm Cloud AI 100, a sign that Qualcomm's AI chips are gaining acceptance in the Asian cloud server market.

The Cloud AI 100 comes in two versions, Professional (400 TOPS) and Standard (300 TOPS), both built around high energy efficiency. On image processing the chip delivers twice the performance per watt of the standard-configuration Nvidia Jetson Orin, and on the BERT-99 natural language processing model its energy efficiency is also slightly ahead.

▲ Qualcomm Cloud AI 100 leads in energy efficiency on the ResNet-50 and BERT-99 model tests (image source: Qualcomm)

Maintaining high energy efficiency has not cost Qualcomm's AI chip high performance: a server with five 75W cards achieves nearly 50% higher performance than a 2-card A100 server, whose A100s each draw as much as 300W.
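A quick back-of-envelope check of what those figures imply for performance per watt; the throughput is normalized and the per-card power draws are as we read the article, so treat the inputs as assumptions:

```python
# Interpreted from the article: five 75 W Cloud AI 100 cards vs. two
# A100s at up to 300 W each; throughput is normalized, not measured.
qc_cards, qc_watts = 5, 75
a100_cards, a100_watts = 2, 300

a100_perf = 1.0   # normalized throughput of the A100 server
qc_perf = 1.5     # "nearly 50% higher performance"

qc_eff = qc_perf / (qc_cards * qc_watts)          # perf per watt
a100_eff = a100_perf / (a100_cards * a100_watts)
print(f"perf/W advantage: {qc_eff / a100_eff:.1f}x")  # ~2.4x
```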

▲ Performance per watt of the Qualcomm Cloud AI 100 (image source: Qualcomm)

For edge computing, the Cloud AI 100 is already highly competitive in image-processing energy efficiency, but large data centers demand more versatility from a chip. To push further into the cloud market, Qualcomm may need its next generation of cloud-edge AI chips to support more mainstream AI models, such as recommendation engines.

▲ High edge-server energy efficiency achieved without sacrificing performance (image source: Qualcomm)

05. South Korea's first AI chip debuts against Nvidia's entry-level accelerator card

This MLPerf list also features a South Korean company, from a country with relatively little presence in the AI chip field.

The Sapeon X220 is an AI chip developed in-house by SK Telecom, the well-known South Korean technology company. It is also South Korea's first commercial non-memory chip for data centers, performing the large-scale computation AI services require at high speed and low power.

▲ Selected Sapeon X220 specifications

The results are also interesting: running in Supermicro servers, the Sapeon X220 outperformed the A2, the entry-level AI accelerator GPU Nvidia released at the end of last year, on the data-center inference benchmark.

Specifically, the X220-Compact delivered 2.3 times the A2's performance, and the X220-Enterprise 4.6 times.

Energy efficiency held up as well: in performance per watt at maximum rated power, the X220-Compact is 2.2 times as efficient as the A2, and the X220-Enterprise 2.0 times.

▲ Performance and energy efficiency of the Sapeon X220 series versus the Nvidia A2 (image source: SAPEON)

Notably, the Nvidia A2 uses an advanced 8nm process, whereas the Sapeon X220 is built on a mature 28nm process.

Sapeon chips are reportedly already used in smart speakers, intelligent video security, AI-based media quality optimization, and other applications. This year SK Telecom also spun off its AI chip business into a company named SAPEON.

SAPEON CEO Soojung Ryu revealed that the company plans to expand the X220 into more application areas, and is confident that its next-generation X330 chip, due in the second half of next year, will further raise performance and widen the gap with competing products.

06. Intel previews its next-generation server CPU; Alibaba's Yitian 710 enters for the first time

Cloud AI inference chips may be contending like a hundred schools of thought, but so far the server CPU still leads the AI inference market.

On this MLPerf list, only Intel Xeon and Alibaba's self-developed Yitian 710 appear as CPU-only entries. These systems carry no AI accelerators, so they genuinely reflect what the server CPUs themselves can do to accelerate AI inference.

In the Closed division, Intel submitted a preview 2-socket Sapphire Rapids system running PyTorch. Its inference performance was soundly beaten by the H100 yet sufficed to beat the A2. For a server CPU, AI inference acceleration is only a bonus, and Intel's formidable CPU acceleration evidently already meets the needs of conventional AI inference tasks.

In the Open division, a startup named Neural Magic submitted systems with only Intel Xeon CPUs, demonstrating that its more refined pruning-based software can match other submissions' performance with less computing power.
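The compute saving that pruning buys can be seen in a short NumPy/SciPy sketch of generic magnitude pruning with sparse storage; this illustrates the idea only and is not Neural Magic's actual software:

```python
import numpy as np
from scipy import sparse

# A 90%-sparse weight matrix stores, moves, and multiplies only ~10%
# of the original values -- the source of the CPU compute saving.
rng = np.random.default_rng(0)
W = rng.normal(size=(4096, 4096))
W[np.abs(W) < np.quantile(np.abs(W), 0.9)] = 0.0  # prune smallest 90%

W_csr = sparse.csr_matrix(W)   # compressed sparse row storage
x = rng.normal(size=4096)

dense_flops = 2 * W.size       # one multiply-add per weight
sparse_flops = 2 * W_csr.nnz
print(f"FLOPs reduction: {dense_flops / sparse_flops:.1f}x")  # ~10x
assert np.allclose(W @ x, W_csr @ x)  # same output, far less work
```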

Alibaba, for its part, demonstrated for the first time an entire cluster running as a single machine, surpassing all other results in total throughput. Its self-developed Yitian 710 CPU appeared on the MLPerf list for the first time.

In addition, the system configurations vendors submitted to MLPerf show AMD EPYC server CPUs gaining an ever stronger presence in data-center inference, with momentum to keep pace with Intel Xeon.

07. Conclusion: Nvidia holds firm as domestic AI chip newcomers charge

Overall, Nvidia continues its steady play and dominates the MLPerf inference benchmarks as the undisputed big winner. Although competitors have overtaken it on some single-point results, in versatility the A100 and H100 still wipe the floor with other AI chips.

Nvidia has yet to submit inference energy-efficiency data for the H100, or its training results; once those appear, the H100's standing is expected to be even more commanding.

Domestic AI chip companies are also showing their edge. After Alibaba T-Head's self-developed Hanguang 800 cloud AI chip topped the single-card MLPerf ResNet-50 inference test in 2019, Biren and Moffett have now likewise had their chips' measured performance validated on this authoritative third-party AI benchmark platform.

The results in this Open division show sparse computing becoming a hot trend in data-center AI inference; we expect the technique to enter the Closed division, where finer and fairer system-to-system comparisons can further verify its practical value.

As the participating organizations, system scales, and configurations grow in number and diversity, the MLPerf benchmark is becoming ever more complex. Round after round, its results also trace the shifting technology and industry landscape of global AI chips.
