[New Zhiyuan Guide] Nvidia's chief scientist has revealed the four main reasons Nvidia's GPUs are so successful: four key numbers that underpin its sustained competitive edge in the industry.
Today, Nvidia sits firmly on the GPU throne.
The arrival of ChatGPT triggered a generative AI explosion, setting off a global scramble for computing power.
Not long ago, one report estimated that total global demand for the H100 exceeds 430,000 units, a shortfall expected to persist at least through the end of 2024.
Over the past 10 years, Nvidia has improved the performance of its chips on AI tasks a thousandfold.
How did a company that only recently crossed the trillion-dollar mark pull this off?
Recently, Bill Dally, Nvidia's chief scientist, delivered a keynote on high-performance microprocessors at the IEEE Hot Chips 2023 symposium in Silicon Valley.
In his slides, he summed up the four elements behind Nvidia's success so far.
Moore's Law accounts for only a small part of Nvidia's "magic"; new "number representations" account for a large part of it.
How Nvidia improved the performance of its GPUs on AI tasks a thousandfold in 10 years
Multiply all of the above together and you get Huang's Law (16 × 12.5 × 2.5 × 2 = 1,000).
Jensen Huang once said: "With the advent of GPUs, Moore's Law is no longer tenable; it has been replaced by a new super law."
Number representation: 16x
Overall, Dally says, the biggest gain has come from better "number representation."
These numbers represent the "key parameters" of neural networks.
One is the weights, the strengths of the connections between neurons in the model.
The other is the activations: the weighted sum of a neuron's inputs, which determines whether and how strongly the neuron fires, passing information on to the next layer.
Before the P100, Nvidia GPUs represented these weights with single-precision floating-point numbers.
As defined by the IEEE 754 standard, these numbers are 32 bits long: 23 bits for the fraction, 8 bits for the exponent applied to that fraction, and 1 bit for the number's sign.
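As a concrete illustration, here is a minimal sketch (standard library only) that unpacks a Python float into those three IEEE 754 single-precision fields:

```python
import struct

def fp32_fields(x: float):
    # Reinterpret the 32-bit pattern of x as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                  # 1 sign bit
    exponent = (bits >> 23) & 0xFF     # 8 exponent bits, biased by 127
    fraction = bits & 0x7FFFFF         # 23 fraction (mantissa) bits
    return sign, exponent, fraction

sign, exp, frac = fp32_fields(-0.15625)   # -0.15625 = -1.25 * 2**-3
print(sign, exp - 127, frac)              # 1, -3, and the bits of 0.25
```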
But machine learning researchers soon discovered that many calculations could use less precise numbers while the neural network still gave answers that were just as accurate.
The clear advantage: if the key machine learning operations, multiply and accumulate, have fewer bits to process, the logic can be made faster, smaller, and more efficient.
So with the P100, Nvidia dropped to half-precision FP16.
Google even came up with its own version, called bfloat16.
The difference between the two lies in how the bits are split between fraction and exponent: fraction bits provide precision, exponent bits provide range. Bfloat16 has the same number of exponent bits as FP32, which makes converting back and forth between the two formats easier.
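A minimal sketch of that trade-off, emulating bfloat16 by truncating an FP32 to its top 16 bits (which is exactly the bfloat16 layout):

```python
import struct
import numpy as np

def to_bfloat16(x: float) -> float:
    # bfloat16 is the top 16 bits of an FP32: 1 sign, 8 exponent, 7 fraction.
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]

# Range: FP16 (5 exponent bits) overflows where bfloat16 (8, like FP32) does not.
print(np.float16(1e20))    # inf -- beyond FP16's max of ~65504
print(to_bfloat16(1e20))   # ~1e20 -- coarse, but representable

# Precision: FP16's 10 fraction bits resolve what bfloat16's 7 cannot.
print(np.float16(1.001))   # ~1.001
print(to_bfloat16(1.001))  # 1.0
```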
Today, Nvidia's flagship GPU, the H100, can use 8-bit numbers for parts of large Transformer neural networks, such as ChatGPT and other large language models.
However, Nvidia found that this is not a one-size-fits-all solution.
For example, Nvidia's Hopper GPU architecture actually computes with two different FP8 formats, one with slightly higher precision and the other with a slightly wider range. Nvidia's special advantage is knowing when to use which.
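These two formats are widely documented as E4M3 (4 exponent bits, 3 mantissa bits) and E5M2 (5 and 2). A rough sketch of the range-versus-precision split, assuming IEEE-like layouts (the real E4M3 deviates slightly, reclaiming top exponent codes so its documented maximum is 448 rather than 480):

```python
def fp8_stats(exp_bits: int, man_bits: int):
    bias = 2 ** (exp_bits - 1) - 1
    max_exp = 2 ** exp_bits - 2 - bias       # top exponent code reserved for inf/NaN
    largest = (2 - 2 ** -man_bits) * 2.0 ** max_exp
    step_near_one = 2.0 ** -man_bits         # spacing of values just above 1.0
    return largest, step_near_one

print("E4M3:", fp8_stats(4, 3))  # (240.0, 0.125) under IEEE rules; spec max is 448
print("E5M2:", fp8_stats(5, 2))  # (57344.0, 0.25): far wider range, coarser steps
```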
Dally and his team have all sorts of interesting ideas for extracting more AI performance from fewer bits, and floating point is clearly not the ideal system.
One major problem is that floating-point accuracy is fairly uniform regardless of how large or small a number is.
But neural network parameters do not use large numbers; they cluster mainly around zero. Nvidia's R&D therefore focuses on finding efficient ways to represent numbers so that they are more accurate near zero.
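As an illustration of why that matters (this is not Nvidia's actual scheme), compare a 4-bit uniform quantizer against a 4-bit mu-law companding quantizer, which spends its levels near zero, on Gaussian-distributed weights:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, 100_000).clip(-1, 1)   # weights cluster around zero

def quant_uniform(x, levels=16):
    step = 2.0 / (levels - 1)                    # uniform grid over [-1, 1]
    return np.round(x / step) * step

def quant_mulaw(x, levels=16, mu=255.0):
    # Compress toward zero, quantize uniformly, then expand back.
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)
    yq = quant_uniform(y, levels)
    return np.sign(yq) * np.expm1(np.abs(yq) * np.log1p(mu)) / mu

for name, q in [("uniform", quant_uniform(w)), ("mu-law ", quant_mulaw(w))]:
    # The companding quantizer's MSE should come out markedly lower here.
    print(name, "MSE:", float(np.mean((w - q) ** 2)))
```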
Complex instructions: 12.5x
"The overhead of fetching and decoding instructions far exceeds the cost of performing a simple arithmetic operation," Dally said.
He points to the multiply instruction as an example: its overhead consumes 20 times the roughly 1.5 picojoules needed to do the math itself. By designing GPUs to perform large calculations in a single instruction rather than in a sequence of many, Nvidia has drastically cut per-operation overhead and reaped huge gains.
Dally said some overhead still remains, but with complex instructions it is amortized over many more arithmetic operations. With the complex integer matrix multiply and accumulate (IMMA) instruction, for example, the overhead is only 16 percent of the energy cost of the math.
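Back-of-the-envelope arithmetic makes the amortization concrete (the 20x and 16 percent figures are from the talk as reported; the per-instruction MAC count is derived here, not quoted):

```python
math_pj = 1.5                 # energy of one multiply itself, picojoules
overhead_pj = 20 * math_pj    # fetch/decode overhead per instruction

# One multiply per instruction: overhead dominates the energy budget.
print(overhead_pj / (overhead_pj + math_pj))   # ~0.95 of energy is overhead

# A complex instruction amortizes the same overhead over many MACs.
# For overhead to be ~16% of the math energy (the IMMA figure), one
# instruction must cover roughly this many operations:
print(overhead_pj / (0.16 * math_pj))          # ~125 MACs per instruction
```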
Moore's Law: 2.5x
Keeping Moore's Law alive requires billions of dollars of investment, enormously complex engineering, and even tolerance of international instability. Yet none of that investment is the main reason for the success of Nvidia's GPUs.
Nvidia has consistently manufactured its GPUs with the world's most advanced process technology; the H100 uses TSMC's N5 (5-nanometer) process. TSMC did not begin ramping its next-generation N3 process until the end of 2022, and until then N5 was the industry's most advanced node.
Sparsity: 2x
Making these networks "sparse" to lighten the computational load is a tricky business.
But with the A100, the H100's predecessor, Nvidia introduced a new technique: "structured sparsity." The hardware can force two out of every four possible pruning events, yielding a new, smaller matrix computation.
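A minimal sketch of the 2:4 pruning just described: keep the two largest-magnitude weights in every group of four, storing half the values plus small indices (the layout here is illustrative, not Nvidia's on-chip format):

```python
import numpy as np

def prune_2_of_4(weights: np.ndarray):
    groups = weights.reshape(-1, 4)
    # Indices of the two largest-magnitude weights in each group of four.
    keep = np.argsort(np.abs(groups), axis=1)[:, 2:]
    keep.sort(axis=1)                       # preserve original ordering
    values = np.take_along_axis(groups, keep, axis=1)
    return values, keep                     # 50% of the values + 2-bit indices

w = np.array([0.9, -0.1, 0.05, -0.7, 0.2, 0.0, -0.3, 0.4])
vals, idx = prune_2_of_4(w)
print(vals)   # [[ 0.9 -0.7] [-0.3  0.4]]
print(idx)    # [[0 3] [2 3]]
```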
"our work on sparsity is not over yet," Dally said. We need to process the activation function again, and there can be more sparsity in the weight. "
Reference:
https://spectrum.ieee.org/nvidia-gpu