
Nvidia H100 gives the AI world a "small shock": GPT-3 trained in 11 minutes, records in all 8 tests, and cluster performance scaling close to linearly.

2025-02-25 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 11/24 report:

GPT-3 trained in 11 minutes, and BERT training finished in mere seconds.

This is the "small shock" Nvidia has just given the AI world.

In the latest MLPerf training benchmarks, the Nvidia H100 cluster swept all eight tests, setting new records in each, with the large language model task standing out in particular.

In the large language model task, the H100 cluster's speedup scaled almost linearly.

That is, as the number of processors in the cluster increases, the speedup grows almost proportionally.

This means that inter-GPU communication within the cluster is highly efficient.
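That near-linear behavior can be made concrete as parallel efficiency: the achieved speedup divided by the ideal speedup from adding GPUs. Below is a minimal sketch in Python; the timings are hypothetical, since the report does not list raw per-cluster-size numbers here.

```python
def scaling_efficiency(base_gpus: int, base_minutes: float,
                       scaled_gpus: int, scaled_minutes: float) -> float:
    """Parallel efficiency: achieved speedup over ideal (linear) speedup.

    1.0 means perfectly linear scaling; real clusters fall short of that
    because inter-GPU communication adds overhead.
    """
    speedup = base_minutes / scaled_minutes
    ideal = scaled_gpus / base_gpus
    return speedup / ideal

# Hypothetical illustration only: a 768-GPU run taking 45 minutes
# vs. a 3,584-GPU run taking 11 minutes.
print(f"{scaling_efficiency(768, 45.0, 3584, 11.0):.0%}")  # → 88%
```

An efficiency near 100% across such a jump in cluster size is what "approaching linear growth" means in practice.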

Beyond the LLM task, the H100 also completed the recommendation, computer vision, medical image recognition, and speech recognition tasks, and was the only chip to take part in all eight tests.

In an era where computing power is productivity, this wave of results means everything.

It is reported that the test system was jointly developed by Nvidia and Inflection AI and hosted by the cloud provider CoreWeave.

Significant performance growth on a single node

MLPerf Training v3.0 adds two new tasks this round:

Large language model (based on GPT-3)

Recommendation algorithm

This means the benchmark now covers larger datasets and more advanced models.

The record-setter across the board was a very large cluster composed of 3,584 H100s.

Its specific achievements are as follows:

This is the largest cluster Nvidia fielded in this round of testing.

In fact, they also submitted results for a 768-H100 cluster, deployed both in the cloud and on premises.

The results show that the two performed almost identically.

This further demonstrates that as the number of GPUs in the cluster grows, the performance gain stays close to linear.

(NVIDIA Pre-Eos is the on-premises deployment; NVIDIA+CoreWeave is the cloud deployment)

In addition, this round of tests also set a new record for single-node acceleration.

Compared with MLPerf Training v2.1 six months ago, a single DGX H100 system (made up of eight H100s) averaged 17% faster across the tasks.

Compared with the A100 Tensor Core GPU, it is up to 3.1 times faster (on the BERT task).
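A note on how "averaged 17% faster" figures like this are typically produced: benchmark speedup ratios are usually averaged with a geometric mean rather than an arithmetic one, since the geometric mean treats ratios symmetrically. A small sketch with made-up per-task ratios (not the actual MLPerf numbers):

```python
import math

def geomean_speedup(speedups: list[float]) -> float:
    """Geometric mean of per-task speedup ratios."""
    return math.prod(speedups) ** (1 / len(speedups))

# Made-up per-task ratios, for illustration only.
print(round(geomean_speedup([1.10, 1.15, 1.20, 1.25]), 3))  # → 1.174
```

Whether MLPerf press summaries use a geometric or arithmetic mean here is not stated in the article; the point is only how ratio averages are conventionally computed.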

These speedups come mainly from two sources.

On the one hand, the H100 itself is strong enough.

Built on the new Hopper architecture and TSMC's 4nm process, the H100 packs 80 billion transistors, 26 billion more than the A100.

Its core count reaches an unprecedented 16,896, 2.5 times that of the A100.

For AI computing, the H100 is equipped with a dedicated Transformer Engine, which can speed up large-model training by as much as 6x.

On the other hand, it relies on the accelerated network within the cluster.

The cluster uses the Nvidia Quantum-2 InfiniBand network, the seventh generation of Nvidia's InfiniBand architecture.

According to the official website, it delivers up to 400Gb/s of throughput, along with software-defined networking, in-network computing, performance isolation, advanced acceleration engines, RDMA, and security acceleration.

It is reported that a total of 90 systems took part in this latest round of testing; 82 of them used Nvidia GPUs, while Intel was involved in 7 systems.

Intel's accelerated systems use 64-96 Intel Xeon Platinum 8380 processors and 256-389 Intel Habana Gaudi2 accelerators.

Its highest-spec system completed the LLM training task in 311 minutes.
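A rough way to relate the two headline times is accelerator-minutes (chip count times wall-clock time), using only the figures reported above. This ignores chip price, power, and software maturity, and it assumes Intel's 311-minute result came from its largest listed configuration (the article only gives a range), so treat it as a back-of-envelope sketch, not an official MLPerf comparison:

```python
def accelerator_minutes(num_accelerators: int, minutes: float) -> float:
    """Total accelerator time consumed by one training run."""
    return num_accelerators * minutes

# Figures reported above: 3,584 H100s finishing GPT-3 in 11 minutes,
# vs. an assumed 389 Gaudi2 accelerators finishing in 311 minutes.
h100_cost = accelerator_minutes(3584, 11)    # 39,424 accelerator-minutes
gaudi2_cost = accelerator_minutes(389, 311)  # 120,979 accelerator-minutes
print(f"ratio: {gaudi2_cost / h100_cost:.1f}x")  # → ratio: 3.1x
```

By this crude metric the H100 run consumed roughly a third of the accelerator time, though the raw 311-vs-11-minute gap is what made headlines.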

Based on these results, one analyst said the biggest shock was not the performance of the H100 itself, but the excellent results achieved by training AI in the cloud.

So who is CoreWeave, the cloud vendor Nvidia partnered with this time? And what about Inflection AI, which co-developed the system?

The computing cluster will expand further

Let's first take a look at CoreWeave.

Founded in 2017, CoreWeave is a cloud provider that claims to offer the industry's fastest and most flexible large-scale GPU compute, with cloud solutions for rendering, machine learning, and more that it says are 35 times faster and 80% cheaper than the big public clouds.

The cloud provider is popular with tech giants, and Nvidia itself has given it plenty of shout-outs before.

In May, CoreWeave raised another $200 million, mainly from hedge fund Magnetar Capital, bringing its Series B round to a total of $421 million.

In June, it was reported that Microsoft had signed an agreement with CoreWeave for AI computing infrastructure, potentially worth billions of dollars over the next few years.

Nvidia has also invested $100m in CoreWeave, which was valued at $2 billion in April.

Inflection AI, another AI startup, was founded by Mustafa Suleyman, a founding member of DeepMind, among others.

The company, founded in March 2022, has raised $225 million in financing and is valued at more than $1.2 billion.

The company has developed a large language model Pi, which is trained on the H100 cluster.

Pi is reportedly positioned to help humans interact with computers more naturally: it gradually gets to know users through their chat history and then gives more personalized answers, something like a personal AI butler.

In its latest blog post, Inflection AI says that, building on the current partnership, it plans to further expand its underlying computing infrastructure in the coming months.

Reference link:

[1] https://blogs.nvidia.com/blog/2023/06/27/generative-ai-debut-mlperf/?continueFlag=685ee2dc8db6455efed731baa85e2741

[2] https://developer.nvidia.com/blog/breaking-mlperf-training-records-with-nvidia-h100-gpus/

[3] https://www.forbes.com/sites/stevemcdowell/2023/06/27/nvidia-h100-dominates-new-mlperf-v30-benchmark-results/?sh=62b226c35e99

This article comes from the WeChat official account QbitAI (ID: QbitAI), written by Mingmin.
