

Gao Wen, academician of the Chinese Academy of Engineering: it is impossible to build a big model without big computing power

2025-01-17 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 11/24 report --

On July 7, Gao Wen, academician of the Chinese Academy of Engineering and director of Pengcheng Laboratory, said in a speech at the Ascend Artificial Intelligence Industry Summit Forum of the World Artificial Intelligence Conference that it is impossible to build a big model without big computing power. Computing power is like electricity: it determines what products can be made, and anyone who really knows what they are doing must take computing power as the foundation.

"The United States now ranks first in computing power, about 30 percent higher than ours, and its GDP is also about 30 percent higher than ours. When our computing power exceeds that of the United States, our GDP can surpass that of the United States as well."

He said that, from now on, computing power is also an index of the development of the digital economy: if computing power is sufficient, the digital economy will develop well; if it is not, it will not.

The following is the full text of Gao Wen's speech:

Good afternoon, experts and leaders.

The previous speakers all spoke very well, and computing power is a very important thing. Secretary Chen also said in his speech this morning that we should pay attention to three major issues, one of which, in artificial intelligence, is the construction of computing power. Computing power construction is a very important matter, just like electricity.

From now on, computing power is also an index of the development of the digital economy. If computing power is sufficient, the digital economy will develop well; if it is not, it will not develop well. Intelligent computing in particular is therefore very critical.

So I would like to share with you the Pengcheng Yunnao intelligent computing platform, and then the Pengcheng Mind big model built on this platform.

First of all, we say that computing power is very important. How important is it?

A 2022 consulting report from Tsinghua University found, after statistical analysis, a positive correlation between the computing power index and GDP: the stronger a country's computing power, the stronger its GDP. As you can see in the chart on the right, the United States now ranks first in the computing power index, about 30% higher than ours, and its GDP is also about 30% higher than ours. When our computing power exceeds that of the United States, our GDP can surpass that of the United States as well.

It is precisely because computing power is so important that artificial intelligence and the economy cannot develop without it. You have to have computing power: chips are built into machines, and those machines are used to train models. To do this, we built a machine in Shenzhen in 2020 called Pengcheng Yunnao II. In 2020 it was the most powerful machine in the world for artificial intelligence training, stronger than the machines of Microsoft and Google at that time, equivalent in computing power to a machine with 4,000 A100 cards. It is connected by an all-optical network, and the latency between nodes is very low.

With this machine, we can do a lot of things, including scientific research, industrial applications, and key technology research and development.

I just said that this machine has 4,000 cards: the CPUs are Kunpeng and the NPUs are Ascend, all made by Huawei.

With this machine in hand, is its performance up to standard? We entered the world's supercomputing rankings. Besides the overall TOP500 list, there are other tracks; the one we entered is IO500, the track that measures input and output capability. Since the machine was completed in October 2020, we have entered the ranking every six months starting that November, six times in a row, and have taken first place on the full-system list in consecutive rankings.

Therefore, for artificial intelligence training, no other machine can compare with it. This machine has also entered the AI computing power ranking, which is held periodically, and has ranked first three times in a row. It not only has strong hardware, strong interconnect capability, and a strong network, but also relatively complete software, including how to do distributed computing and performance tuning, self-developed scheduling and planning, and so on. A machine of this scale is actually composed of four sub-machines, and there are many software challenges on top of it.

Some experts may have heard it said that only a few thousand people in the world can train a model on 1,000 cards at the same time, fewer still on 4,000 cards, and even fewer on 10,000 cards. Getting large numbers of cards to work together is a great challenge for software planning and resource scheduling.

We not only make good use of Pengcheng Yunnao II; we have also undertaken a task from the National Development and Reform Commission: connecting Pengcheng Yunnao II with similar Huawei-ecosystem Ascend AI clusters, whether 100P or 900P computing nodes, through the network, so that resources are pooled and users can be told what resources are available for use.

At the same time, in line with the NDRC's requirement that this be a heterogeneous computing network platform, we went beyond the Huawei ecosystem and also brought in computing power from some other manufacturers. By the project acceptance in June 2022, the system had aggregated 2300P of intelligent computing power; Yunnao II alone is only 1000P.

With Yunnao II we can train big models, which still takes a great deal of computation; but even though we can train big models, it is simply not enough for current demand, so we are planning Yunnao III.

Now let me talk about the big model Pengcheng is building. It has 200 billion parameters, that is, 200B. Why build it?

There is no need to spend more time on the motivation, because ChatGPT and ChatGPT-like models are emerging all the time, and many companies now use them for industrial applications and services. So this kind of model has become very important. Rich companies can spend hundreds of millions, even billions, to build such machines, but most companies cannot.

What should we do if the demand in this field is so great?

Our Pengcheng Laboratory can build a base model, open it after training, and let vertical applications be built on top of it. Following this line of thinking, we first have the computing power of Pengcheng Yunnao II. In the previous stage we accumulated a lot of data; a few months ago we obtained data through various channels, including purchase, and cleaned it first. Data cleaning is very heavy work: of 100 units of data collected, only a few may remain after cleaning, because much of it is repetitive or non-standard, and once that is removed the volume becomes very small. Although there is a lot of raw data, only about 1% to 5% of it is actually used for training. With this data, we can build a large model base.

For this large model base, we use a generative pre-trained model, whose underlying approach is essentially the same as GPT's, to train a good model, which we hope to open up.

This model, as we envision it, has 200 billion parameters, 200B. After training, we hope to hand it to our partners and provide the corresponding instruction fine-tuning tools and human-feedback reinforcement learning tools, and even build one or two vertical domains as templates for vertical applications. Because Huawei has a lot of experience in this field, we suggest that whoever wants to build a vertical application go to Huawei for advice and use this model in their application.

With this in hand, we hope to finish the model quickly and push it out to society, allowing society to develop China's own large-model application systems for artificial intelligence on top of it. We are now pushing forward with great effort: the first round of training should be finished by the end of August, and the model should be available in September.

How big is the training data? In terms of feeding data, we feed about 10B a day, and the key target is to feed 1T of data in total, where that 1T has been washed out of hundreds of T of raw data; the data is all Chinese text and code. A machine with more than 4,000 cards can consume about 10B a day, which is 1000B, or 1T, in 100 days. So training a model with 200 billion parameters takes 4,000 cards about 100 days.
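The throughput arithmetic quoted here can be sketched as follows (a minimal illustration; it assumes the "B" and "T" units in the speech refer to billions and trillions of tokens):

```python
# Rough sketch of the training-throughput arithmetic quoted in the speech.
# Figures are the ones stated: ~10B tokens consumed per day, a 1T-token
# target, and a 4,000-card cluster. "Tokens" as the unit is an assumption.

TOKENS_PER_DAY = 10e9   # cluster consumes about 10B tokens per day
TARGET_TOKENS = 1e12    # goal: one pass over ~1T tokens
NUM_CARDS = 4000        # cards in Pengcheng Yunnao II

days_needed = TARGET_TOKENS / TOKENS_PER_DAY
tokens_per_card_per_day = TOKENS_PER_DAY / NUM_CARDS

print(f"days to feed 1T tokens: {days_needed:.0f}")               # 100
print(f"tokens per card per day: {tokens_per_card_per_day:,.0f}")  # 2,500,000
```

The 100-day figure matches the speech; the per-card daily throughput (2.5M tokens per card per day) is just the same numbers divided out.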

This is what computing power means: if yours is not big enough, making a model is still very hard. So far I have trained on only one T of data; to train on two or three T, each additional T means roughly another 100 days. It is impossible to build a big model without big computing power. Of course, many people say they can use tricks such as "speed limit" (phonetic) and so on, but in fact this is just like electricity: you can save some electricity and still make something, yet people who really know what they are doing must take computing power as the foundation.

We also have a system and an open source community, the OpenI (Qizhi) open source community, in which many engineers work on data-flywheel data engineering, enabling very fast data cleaning; there are many automatic and semi-automatic data cleaning tools that can help you.

During model training, the 4,000 cards eat 10B of data a day, and the overall loss falls a little every day. We are very happy watching the daily data reports, with the loss dropping about 0.2 a day; it is now around 2, and we hope it will eventually drop to about 1.8.

Training is one process; at the same time, we have to consider that applications involve private data, and some users who want to apply the model do not want their data to be leaked or seen. We provide a module for private data protection, which we call the "damage prevention" package. With such a system, we can support applications.

We hope Pengcheng Mind can quickly empower society, allowing people to build all kinds of possible applications, such as digital government, the Belt and Road Initiative, intelligent manufacturing, smart finance, smart medical care, and so on. On the chart there are yellow and white parts: the yellow parts are what we are putting into people's hands now, where you need to do instruction fine-tuning and further training; the white parts are for partners to do. We also have education programs and a talent plan: through colleges, universities, and partners, we hope the release of this model will cultivate a large number of talents, so that large model applications in China can start quickly.

To sum up, Pengcheng Lab is working with Huawei on the Pengcheng Yunnao II hardware platform and the Pengcheng Mind big model. We hope to contribute a little, as a cornerstone, to China's large artificial intelligence models, and I hope everyone will pay more attention and participate more.

Thank you.
