Inspur Information's Owen ZHU: Large Models Are Blooming Like a Hundred Flowers, and Speed Is Determined by Computing Efficiency


Shulou (Shulou.com) 11/24 Report --

Compared with narrow artificial intelligence, artificial general intelligence can serve a far wider range of scenarios and reach a higher level of logical understanding and tool use through large models that cross domains, disciplines, tasks, and modalities. In 2023, as large language model (LLM) technology made breakthrough after breakthrough, large models brought a new dawn to the exploration of higher-level general artificial intelligence, and the field entered a period of rapid development. In China, large models are blooming like a hundred flowers, with new models emerging one after another.

To take the lead in the era of "a hundred models competing on stage", AI development teams need to focus on the enormous challenges of computing power, algorithms, and data. Development efficiency and training speed are not only core factors in a large model's market competitiveness, but also the key levers going forward. Recently, Owen ZHU, an AI architect in the Artificial Intelligence and High Performance Application Software Department of Inspur Information, spoke at the first NPCon conference, jointly organized by CSDN and New Programmer. He presented computing-system solutions for large models facing the new round of the AIGC industrial revolution, and stressed that end-to-end optimization across computing power, algorithms, data, and system architecture plays a vital role in large model training.

The following is a transcript of Owen ZHU's speech at the NPCon conference:

● The computing power bottleneck in the era of "a hundred models competing on stage"

The core technology of large model R&D consists of two parts: pre-training and alignment. The first part, pre-training, requires vast amounts of data so that the model converges faster and performs better. The second part is alignment. Alignment is not exactly the same thing as reinforcement learning; it optimizes the model's outputs through a variety of methods and strategies so that the AI learns how to communicate and express itself through interaction and feedback with people. These two parts are the core elements for improving the quality of large models.
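To make the pre-training objective concrete, the sketch below shows next-token prediction trained with cross-entropy loss on dummy tensors. It is a minimal illustration only; the shapes and values are invented, and this is not code from the speech or from Inspur.

```python
import torch
import torch.nn.functional as F

# Dummy stand-ins for a language model's outputs and its training text.
vocab_size, seq_len, batch = 1000, 16, 4
logits = torch.randn(batch, seq_len, vocab_size)          # model predictions
tokens = torch.randint(0, vocab_size, (batch, seq_len))   # token ids

# Shift by one position so each step predicts the next token.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss)  # the quantity that pre-training drives down
```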

At present, a model's basic capability is determined by the data, the number of parameters, and the computing power. The more parameters a model has and the more training data it is fed, the stronger its generalization ability. With limited resources, how should one choose when both cannot be maximized? OpenAI's conclusion was that, compared with increasing the amount of data, it is better to increase the number of parameters first: a 200-billion-parameter model trained on 100 billion tokens performs better than a 100-billion-parameter model trained on 200 billion tokens.
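The trade-off is real because the two configurations above cost roughly the same compute. Here is a back-of-the-envelope check using the commonly cited approximation that training FLOPs ≈ 6 × parameters × tokens, which is an assumption not stated in the speech:

```python
# Rough training-compute estimate: FLOPs ~ 6 * N * D (a common approximation).
def train_flops(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

a = train_flops(100e9, 200e9)   # 100B-parameter model, 200B tokens
b = train_flops(200e9, 100e9)   # 200B-parameter model, 100B tokens
print(f"{a:.2e} vs {b:.2e}")    # both about 1.2e23 FLOPs
```

Under a fixed budget, the question is therefore where to spend the compute, and the conclusion cited above is to favor parameters.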

It can be seen that the number of parameters is an important indicator of a model's capability. When the parameter count exceeds a certain threshold, the model's capability improves in a leap: language understanding, generation, and logical reasoning all improve markedly. This is what we call the model's emergent ability.

How large does a model need to be to produce emergent abilities? At present, tens of billions of parameters is the threshold, and models with hundreds of billions of parameters show stronger emergent abilities. This does not mean, however, that model scale will escalate into a trillion-parameter race, because existing large models have not been fully trained. For example, each parameter of GPT-3 was trained on only about 1-2 tokens, whereas DeepMind's research shows that fully training a large model requires about 20 tokens per parameter. Today's hundred-billion-parameter models would therefore still need roughly ten times more training data to reach a better level of performance.
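A quick illustration of the "20 tokens per parameter" guideline mentioned above. The 175-billion-parameter count and ~300-billion-token training set for GPT-3 are assumptions taken from public reports, not figures from the speech:

```python
# The DeepMind guideline: roughly 20 training tokens per model parameter.
def tokens_for_full_training(n_params: float, tokens_per_param: float = 20.0) -> float:
    return n_params * tokens_per_param

gpt3_params = 175e9                              # assumed public figure
gpt3_tokens = 300e9                              # assumed public figure
print(tokens_for_full_training(gpt3_params))     # ~3.5e12 tokens for "full" training
print(gpt3_tokens / gpt3_params)                 # ~1.7 tokens per parameter actually used
```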

Whether one increases the number of parameters or the scale of the data, computing power remains the core driving force behind improvements in large model capability: "sufficiently large" computing power is needed to support the generalization ability of a "sufficiently accurate" model. The compute equivalent of large model training is still rising; from GPT-3 to GPT-4 it grew 68-fold. The larger the compute equivalent, the lower the cross-entropy loss and the stronger the model. As the number of training tokens, the model parameters, and the amount of computation increase, the language model's loss decreases smoothly, which means that the accuracy of a large language model can keep improving as computation, parameter scale, and token count are scaled up.
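The smooth decline in loss described here is usually written as a power law in the scaling-law literature (e.g. Kaplan et al., 2020). The schematic form below is drawn from that literature, not from the speech; the constants and exponents are empirical fits:

```latex
% N = parameters, D = training tokens, C = compute; N_c, D_c, C_c and the
% exponents alpha are empirically fitted constants from the literature.
L(N) \approx \left(\tfrac{N_c}{N}\right)^{\alpha_N},\qquad
L(D) \approx \left(\tfrac{D_c}{D}\right)^{\alpha_D},\qquad
L(C) \approx \left(\tfrac{C_c}{C}\right)^{\alpha_C}
```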

● To build a large model well, first sharpen the tools

A large model's capability comes from a great deal of hands-on engineering, and the engineering challenges of pre-training are enormous. They show up in several ways. First, the evolution of large AI models places high demands on a cluster's parallel computing efficiency, on-chip storage, bandwidth, and low-latency memory access; planning and building a ten-thousand-accelerator AI platform, tuning its performance, and scheduling its computation are all hard problems. Second, large-scale training runs into problems that small-scale training never encounters, such as hardware failures and gradient explosions. Third, a lack of engineering practice makes it hard for enterprises to improve model quality quickly.
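One standard guard against the gradient explosions mentioned above is gradient-norm clipping. The PyTorch sketch below is purely illustrative of that technique; it is not Inspur's training loop, and the model and hyperparameters are invented:

```python
import torch

model = torch.nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(8, 1024)
loss = model(x).pow(2).mean()
loss.backward()

# Rescale gradients so their global L2 norm never exceeds 1.0,
# preventing a single bad batch from blowing up the update.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()
```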

As one of the earliest enterprises to invest in large models, Inspur Information was the first in the industry to release the Chinese AI mega-model "Source 1.0", with a parameter scale of 245.7 billion. Through hands-on innovation with hundred-billion-parameter models, Inspur Information has accumulated practical engineering experience in the large model field and has a professional R&D team that provides the industry with reference designs for AI computing systems.

On computational efficiency, to address the complex computation patterns and low cluster utilization in large model training, Source 1.0 adopted a three-dimensional parallel strategy combining tensor parallelism, pipeline parallelism, and data parallelism in large-scale distributed training. Training took about 15 days over a total of 180 billion tokens, and the model's final loss converged to 1.73, significantly lower than other language models such as GPT-3. Inspur also proposed, for the first time, a collaborative design method for large model structure that optimizes for both efficiency and accuracy, with deep optimization around the deep learning framework, training-cluster I/O, and communication. With only 2x200G interconnects, Source 1.0 reached a computing efficiency of 45%, a world-leading level. At the cluster-interconnect level, full line-rate networking of the whole cluster based on native RDMA, together with network topology optimization, effectively removes the bottlenecks of hybrid parallel computing and keeps the cluster in its best state throughout large model training.
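For readers unfamiliar with the three-dimensional parallelism mentioned above, the sketch below shows how a cluster's accelerators are factored into tensor-, pipeline-, and data-parallel groups. The cluster size and parallel degrees are hypothetical examples, not the actual configuration used to train Source 1.0:

```python
# A 3D-parallel layout factors the cluster: world_size = tensor * pipeline * data.
def check_3d_layout(world_size: int, tensor: int, pipeline: int, data: int) -> None:
    assert tensor * pipeline * data == world_size, "degrees must multiply to the GPU count"
    print(f"{world_size} GPUs = {tensor} (tensor) x {pipeline} (pipeline) x {data} (data)")

# Hypothetical example: 2048 GPUs split 8-way tensor, 8-way pipeline, 32-way data.
check_3d_layout(world_size=2048, tensor=8, pipeline=8, data=32)
```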

● Seeking the optimal solution for a healthy large model ecosystem

At present, there is still a large computing power gap between domestic large models and the industry's most advanced ones. Measured by compute equivalent, GPT-4 has reached 248,842 PD, while most mainstream domestic large models sit at only a few thousand PD, a gap of nearly a hundred times.

At the same time, there are also large gaps between China and the industry's leaders in algorithms and data. On algorithms, although open source gives domestic large models a good opportunity to overtake on the curve, open-source models such as LLaMA still face a capability "ceiling" compared with top self-developed models such as GPT-4.

On data, there is a significant gap in both scale and quality between Chinese and English datasets. Compared with English corpora measured in hundreds of billions of words, Chinese data for large models is only on the order of ten billion, and it is less open and more closed.

The development of large models and general artificial intelligence is a highly complex piece of systems engineering. We urgently need to find, at the system level, the optimal path toward a healthy large model ecosystem, drawing on hands-on experience and building efficient, stable intelligent computing systems to accelerate improvements in model development efficiency.

Recently, Inspur Information officially released OGAI (Open GenAI Infra), its software stack for large model intelligent computing. Through full-stack capabilities that are "tool-based, systematic, and full-chain", Inspur Information is saving time and effort in model refinement, making large models faster, more stable, and smarter, and helping a hundred models truly race ahead in AIGC.
