Thanks to CTOnews.com netizen "Wu Yanzu in South China" for the tip! [Xin Zhiyuan guide] The legendary Nvidia GH200 made a stunning debut in MLPerf 3.1, leading the H100 by 17%.
After adding an LLM training test in April, MLPerf has once again received a major update!
Just now, MLCommons released MLPerf v3.1, adding two new benchmarks: the LLM inference test MLPerf Inference v3.1 and the storage performance test MLPerf Storage v0.5.
This is also the debut of Nvidia's GH200 test results!
Compared with a single H100 paired with an Intel CPU, the GH200's Grace CPU + H100 GPU combination delivers roughly a 15 percent improvement on every test.
Nvidia's GH200 superchip makes its debut. Without a doubt, Nvidia's GPUs turned in the most striking results in the MLPerf Inference 3.1 benchmark.
Among them, the newly released GH200 Grace Hopper superchip made its first appearance in MLPerf Inference 3.1.
The Grace Hopper superchip integrates Nvidia's Grace CPU with the H100 GPU over an ultra-high-bandwidth connection, delivering stronger performance than a single H100 paired with another CPU.
"Grace Hopper demonstrated very strong performance in its first showing, with a 17 percent performance improvement over our H100 GPU submission, and we are ahead across the board," Dave Salvator, director of AI at Nvidia, said at a press briefing.
The performance leap comes from the architecture: the GH200 integrates an H100 GPU with a Grace CPU, connected via 900 GB/s NVLink-C2C.
The CPU and GPU are equipped with 480 GB of LPDDR5X memory and 96 GB of HBM3 (or 144 GB of HBM3e) memory respectively, for up to 576 GB of high-speed, directly accessible memory.
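As a quick sanity check on those figures, the sketch below (plain Python, using only the numbers quoted above) adds up the CPU and GPU memory pools and estimates how long streaming the GPU's full HBM3 contents over the 900 GB/s NVLink-C2C link would take; the transfer estimate is purely illustrative.

```python
# Back-of-envelope arithmetic on the GH200 memory figures quoted above (HBM3 variant).
cpu_lpddr5x_gb = 480          # Grace CPU memory
gpu_hbm3_gb = 96              # Hopper GPU memory
nvlink_c2c_gb_per_s = 900     # CPU <-> GPU interconnect bandwidth

total_gb = cpu_lpddr5x_gb + gpu_hbm3_gb
print(f"Combined fast memory: {total_gb} GB")   # 576 GB

# Time to move the entire GPU memory pool across NVLink-C2C (illustrative only).
print(f"Full HBM3 transfer over NVLink-C2C: {gpu_hbm3_gb / nvlink_c2c_gb_per_s * 1000:.0f} ms")   # ~107 ms
```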
The Nvidia GH200 Grace Hopper superchip is designed for compute-intensive workloads and can meet a wide range of demanding requirements,
such as training and running large Transformer models with trillions of parameters, or recommendation systems and vector databases with embedding tables several terabytes in size.
The GH200 Grace Hopper superchip also performed very well in the MLPerf Inference tests, beating the best results previously set by a single Nvidia H100 SXM on every test.
▲ NVIDIA Grace Hopper MLPerf Inference data-center performance compared with the DGX H100 SXM; each value is GH200's performance lead. The GH200 Grace Hopper superchip integrates 96 GB of HBM3 and provides up to 4 TB/s of HBM3 memory bandwidth, compared with 80 GB and 3.35 TB/s for the H100 SXM.
Compared with the H100 SXM, the larger memory capacity and higher memory bandwidth allow larger batch sizes to be used on the NVIDIA GH200 Grace Hopper superchip when running these workloads.
For example, in the server scenario, the batch size doubled for both RetinaNet and DLRMv2, and in the offline scenario the batch size increased by 50 percent.
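A minimal sketch of why more memory means bigger batches: if the model's weights and per-sample activations must fit in GPU memory, the leftover capacity bounds the batch size. The model and per-sample sizes below are hypothetical assumptions, not MLPerf figures.

```python
# Rough upper bound on the inference batch size that fits in GPU memory.
# All workload numbers here are illustrative assumptions.

def max_batch_size(gpu_mem_gb: float, model_mem_gb: float, per_sample_mb: float) -> int:
    free_mb = (gpu_mem_gb - model_mem_gb) * 1024   # memory left after loading weights
    return int(free_mb // per_sample_mb)

# Hypothetical workload: 20 GB of weights, 50 MB of activations per sample.
print(max_batch_size(80, 20, 50))   # H100 SXM, 80 GB HBM3  -> 1228
print(max_batch_size(96, 20, 50))   # GH200, 96 GB HBM3     -> 1556
```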
The GH200 Grace Hopper superchip's high-bandwidth NVLink-C2C link between the Hopper GPU and the Grace CPU enables fast CPU-GPU communication, which further improves performance.
For example, in MLPerf DLRMv2, transferring a batch of tensors over PCIe takes about 22 percent of the batch inference time on the H100 SXM.
The GH200 Grace Hopper superchip, using NVLink-C2C, completes the same transfer in only 3 percent of the inference time.
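Much of that gap is raw link bandwidth. The sketch below compares the time to move a hypothetical batch of input tensors over PCIe Gen5 x16 (roughly 64 GB/s) versus NVLink-C2C (900 GB/s); the batch size and PCIe figure are assumptions for illustration, not MLPerf measurements.

```python
# Illustrative host-to-device transfer time for one batch of input tensors.

def transfer_ms(batch_bytes: int, link_gb_per_s: float) -> float:
    return batch_bytes / (link_gb_per_s * 1e9) * 1e3

batch_bytes = 2 * 1024**3   # assume a 2 GiB batch of DLRM inputs
print(f"PCIe Gen5 x16 (~64 GB/s): {transfer_ms(batch_bytes, 64):.1f} ms")   # ~33.6 ms
print(f"NVLink-C2C    (900 GB/s): {transfer_ms(batch_bytes, 900):.1f} ms")  # ~2.4 ms
```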
Thanks to its higher memory bandwidth and larger memory capacity, the Grace Hopper superchip's single-chip advantage over the H100 GPU in MLPerf Inference v3.1 is as high as 17 percent.
An across-the-board lead in inference and training. In its MLPerf debut, the GH200 Grace Hopper superchip showed excellent performance on every workload and scenario in the Closed Division.
In mainstream server applications, the L4 GPU provides a low-power, compact compute solution, with performance far ahead of CPU-based solutions.
"Compared with the best x86 CPU in the test, the L4's performance is also very strong, with a six-fold improvement," Salvator said.
For other AI and robotics applications, the Jetson AGX Orin and Jetson Orin NX modules achieved excellent performance, and future software optimizations will further unlock the potential of the powerful Nvidia Orin SoC in these modules.
On RetinaNet, a popular object-detection network, Nvidia's products improved performance by as much as 84 percent.
Nvidia's results in the Open Division demonstrate the potential of model optimization to dramatically improve inference performance while maintaining extremely high accuracy.
The new MLPerf 3.1 benchmark. Of course, this is not the first time MLCommons has tried to benchmark the performance of large language models. As early as June this year, MLPerf v3.0 added an LLM training benchmark for the first time. However, LLM training and inference are very different tasks.
Inference workloads are compute-intensive and highly varied, requiring a platform to quickly handle many kinds of data and run predictions on a wide range of AI models.
Enterprises that want to deploy AI systems need a way to objectively evaluate infrastructure performance across a variety of workloads, environments, and deployment scenarios, so benchmarking matters for both training and inference.
MLPerf Inference v3.1 includes two important updates to better reflect how AI is actually used today:
First, it adds a large language model (LLM) inference test based on GPT-J. GPT-J is an open-source 6B-parameter LLM, and the task is to summarize articles from the CNN/DailyMail dataset.
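For context, the snippet below is a minimal sketch of that kind of task: summarizing a news article with the open-source GPT-J checkpoint via Hugging Face Transformers. The checkpoint name and prompt format are assumptions for illustration; this is not the MLPerf test harness.

```python
# Minimal GPT-J summarization sketch (illustrative, not the MLPerf harness).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6b"                 # public 6B-parameter GPT-J checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

article = "..."                                   # a CNN/DailyMail news article goes here
prompt = f"Summarize the following article:\n{article}\nSummary:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
summary = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(summary)
```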
In addition to GPT-J, the DLRM test has also been updated. Following the DLRM introduced in MLPerf Training v3.0, it adopts a new model architecture and a larger dataset, better reflecting the scale and complexity of real recommendation systems.
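To illustrate what such a recommendation model looks like, here is a toy model in the DLRM family: embedding tables for sparse categorical features plus an MLP over dense features. All sizes are made-up assumptions; this is not the MLPerf DLRMv2 architecture.

```python
# Toy DLRM-style recommender (PyTorch); sizes are illustrative assumptions.
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    def __init__(self, num_tables=8, rows_per_table=10_000, emb_dim=16, dense_dim=13):
        super().__init__()
        # One embedding table per sparse (categorical) feature.
        self.embeddings = nn.ModuleList(
            nn.Embedding(rows_per_table, emb_dim) for _ in range(num_tables))
        # Bottom MLP for dense features, top MLP over the combined representation.
        self.bottom_mlp = nn.Sequential(nn.Linear(dense_dim, emb_dim), nn.ReLU())
        self.top_mlp = nn.Sequential(
            nn.Linear(emb_dim * (num_tables + 1), 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, dense, sparse):              # sparse: (batch, num_tables) indices
        feats = [emb(sparse[:, i]) for i, emb in enumerate(self.embeddings)]
        feats.append(self.bottom_mlp(dense))
        return torch.sigmoid(self.top_mlp(torch.cat(feats, dim=1)))

model = TinyDLRM()
scores = model(torch.randn(4, 13), torch.randint(0, 10_000, (4, 8)))
print(scores.shape)                                # torch.Size([4, 1])
```

Real MLPerf recommendation workloads scale this pattern up until the embedding tables alone reach terabytes, which is why memory capacity and bandwidth dominate their performance.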
David Kanter, founder and executive director of MLCommons, said the training benchmark focuses on larger foundation models, while the tasks in the inference benchmark represent the broader set of use cases that most organizations can actually deploy.
To cover a wide variety of inference platforms and use cases, MLPerf defines four different scenarios.
Each benchmark is defined by a dataset and a quality target.
Each benchmark is run under the following scenarios: single stream, multi stream, server, and offline.
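The summary below paraphrases the usual intent of each scenario; the descriptions are an informal gloss, not quoted from the MLPerf rules.

```python
# Informal summary of the four MLPerf Inference scenarios (paraphrased).
SCENARIOS = {
    "SingleStream": "one query at a time; measures per-query latency",
    "MultiStream":  "fixed bursts of queries; measures latency to finish each burst",
    "Server":       "queries arrive at random; measures queries/second under a latency bound",
    "Offline":      "all samples available up front; measures raw throughput",
}
for name, description in SCENARIOS.items():
    print(f"{name:>12}: {description}")
```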
In the MLPerf v3.1 benchmark there were more than 13,500 results, and many submitters improved performance by 20 percent or more over the 3.0 benchmark.
Other submitters include ASUSTeK, Azure, cTuning, Connect Tech, Dell, Fujitsu, Giga Computing, Google, H3C, HPE, IEI, Intel, Intel Habana Labs, Krai, Lenovo, Moffett, Neural Magic, Nutanix, Oracle, Qualcomm, Quanta Cloud Technology, SiMA, Supermicro, TTA, and xFusion.
Detailed data: https://mlcommons.org/en/inference-datacenter-31/
References:
https://developer.nvidia.com/blog/leading-mlperf-inference-v3-1-results-gh200-grace-hopper-superchip-debut/?ncid=so-twit-408646&=&linkId=100000217826658
https://mlcommons.org/en/inference-datacenter-31/
https://venturebeat.com/ai/mlperf-3-1-adds-large-language-model-benchmarks-for-inference/