
Roundtable dialogue on AIGC intelligent computing: Inspur Information + Baichuan + Kuaishou + Kingsoft Office

2025-01-31 Update | From: SLTechnology News & Howtos


Shulou(Shulou.com)11/24 Report--

Driven by massive computing power, the capability of "emergent intelligence" is appearing, bringing not only the dawn of general intelligence but also accelerating the integration of AI into thousands of industries. AI computing power is now both the core engine driving the evolution of large models and a challenge the large model industry must confront.

Recently, Yang Jing, founder & CEO of New Zhiyuan, hosted the roundtable forum "On the AIGC Era and the Way of Intelligent Computing," exchanging views with Liu Jun, Senior Vice President of Inspur Information; Chen Weipeng, co-founder of Baichuan Intelligence; Liu Lingzhi, head of heterogeneous computing at Kuaishou; and Xiong Longfei, technology director of Kingsoft Office. The discussion focused on the opportunities, pain points, and ways forward for intelligent computing in the large model era.

The participants pointed out that China's large models are still in the early stage of building a commercial ecosystem. Broad ecosystem cooperation is needed in intelligent computing system construction, platform support, algorithm R&D, and application deployment, so as to create more outstanding large models and promote their deployment in real scenarios.

The following is a transcript of the roundtable Q&A:

Yang Jing: Now that the "war of a hundred models" has finished its first half, how have you positioned your businesses around large models so far?

Liu Jun: Inspur Information's most important industrial layout in AIGC still centers on intelligent computing. We hope to accelerate the innovation and deployment of generative AI and large models through intelligent computing power, carrying out product planning and technological innovation around four elements: computing power, algorithms, data, and interconnection. Through the combination of our products and technologies with better services, we hope to speed up the pace of generative AI innovation for our users.

Chen Weipeng: Since Baichuan Intelligence entered the large model field, its product releases have moved very quickly. On the ToB side, Baichuan's open source work has gained considerable influence and brought many business opportunities; in the future we hope to use open source to open up ToB cooperation. On the ToC side, Baichuan has strong Internet product genes and hopes to create a super-assistant product, advancing on both the ToB and ToC legs.

Liu Lingzhi: Kuaishou's large models are widely used internally, and we have made some useful attempts in search, recommendation, advertising, and audio/video creation. In the future, we hope to communicate more directly with users through our ToB platform StreamLake and work together to grow large model products.

Xiong Longfei: As an application-side player, Kingsoft Office does not build large models itself for the time being. At present we adopt a multi-model approach, applying different models to different scenarios, and we have set three main directions based on our business positioning: first, AIGC content generation, helping users write and get things done; second, Copilot, serving as a personal assistant; and third, knowledge insight, helping users analyze documents and discover the important information in them.

Yang Jing: Computing power is the core driving force and super engine of the large model era, and demand for it has become the bottleneck of the current AIGC wave. In other words, the AIGC business represented by large models is compute-bound, and many enterprises face the bottleneck and pressure of computing power. What kind of computing infrastructure do we need to support large model innovation and application?

Liu Jun: From Inspur Information's perspective, the first thing is to ensure the supply of computing power. Against the backdrop of tight compute supply, how to build a more productive computing system has become the industry's focus. To this end, Inspur Information developed its own large model "Yuan" to study how large models stress the computing system, so as to deliver higher-performance, easier-to-use computing systems to customers. The release of the OGAI intelligent computing software stack is likewise intended to help partners and customers maximize computing performance.

Chen Weipeng: Everyone has experienced the computing power shortage, and it is a very hard problem to overcome. We have also observed that computing power will remain very tight over the next 9-12 months. Baichuan Intelligence addressed the problem early on by cooperating with cloud vendors and seeking cooperation with Inspur Information. At the same time, we spend a lot of time studying how to improve training efficiency and R&D success rates, and exploring more efficient computing systems.

Liu Lingzhi: Kuaishou mainly relies on large intelligent computing infrastructure providers to relieve the compute shortage to some extent, but the shortage will always exist. Kuaishou has three main views on the computing power problem: first, develop a diverse, heterogeneous computing system to create more options for compute; second, large model computing is a brand-new track, and there will be many opportunities in the next 2-3 years; third, Kuaishou began laying out its own chips three years ago when it hit a bottleneck in video computing, and has achieved good results. We hope the whole industry will unite, from the client side to the infrastructure side, and work toward a common goal over the long term to solve the computing problem.

Yang Jing: The compute shortage will indeed remain a difficult problem for the next two or three years. Some voices in the industry say that training a large model is a systems engineering effort on the scale of a rocket launch, not just a matter of having GPUs, because in large-scale distributed training, problems such as computational efficiency and training instability affect the model's training efficiency and accuracy. What technical problems have you encountered in large model training?

Liu Jun: Today's large model training cluster is like a Ferrari: the performance is tremendous, but the bar for tuning and driving this sports car is very high. For current large model computing infrastructure to realize its full potential, three aspects deserve attention:

The first is computing efficiency, which involves the system's underlying drivers, system-level optimization, and the pipeline parallelism and data parallelism adapted to the large model. The second is linear scalability: after achieving high computing efficiency on a single machine, can it be extended to hundreds of nodes and thousands of cards while maintaining a near-linear performance scaling ratio? This is an important factor in the design of the whole computing cluster and its parallelization strategy. The third is sustained computing capability: large model training runs into failures caused by all kinds of software, hardware, algorithm, and framework problems, so the system needs more mechanisms against training instability, making recovery more automatic and intelligent.
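The linear scalability mentioned above is often quantified as the ratio of measured speedup to ideal speedup. A minimal sketch of that check, using hypothetical throughput numbers (the function name and all figures are illustrative, not from the speakers):

```python
def scaling_efficiency(throughput_one_node, throughput_n_nodes, n_nodes):
    """Ratio of measured speedup to ideal linear speedup (1.0 = perfectly linear)."""
    measured_speedup = throughput_n_nodes / throughput_one_node
    return measured_speedup / n_nodes

# Hypothetical: one node trains 180 samples/s; 512 nodes together reach 82,000 samples/s.
eff = scaling_efficiency(180, 82_000, 512)
print(f"scaling efficiency: {eff:.2%}")  # roughly 89% of ideal linear scaling
```

Cluster and parallel-strategy design aims to keep this ratio as close to 1.0 as possible as the node count grows.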

Chen Weipeng: We hold two judgments about training. The first is that people may have overestimated the difficulty of training a large model, but underestimated the difficulty of training a good one. There is a tension here: in the future, I think training a model per se will become easier and easier, but training a model well will be very difficult. Training a good model is not just an empirical project but a systems project: as model scale grows, training costs are very high and the margin for error becomes very small. Such catastrophic costs may become increasingly unbearable, which is why training a model well will be hard. In addition, large model experiments are very expensive, so how do you propose effective hypotheses to reduce experimental cost? These things require a great deal of experimentation and know-how, and I think this may be the biggest deciding factor in future competition.

The other cognitive difference is that we are now very focused on training cost, but in the second half (from the second half of this year into next year) the focus may shift to inference cost. Model training can be done at great scale, but inference cost is the key. Controlling inference cost, and even driving it lower than others', may be another key point in the future.

Liu Lingzhi: Training a good model is not easy. In terms of engineering capability, as the business progresses the compute problems of training can always be solved, and the challenge moves to the inference stage. How to reduce inference cost still requires a lot of research. In the future the inference problem will be more serious, because hardware utilization during model training is relatively high, while utilization of inference cards is very low.

Yang Jing: These high costs and the slowdown of Moore's Law will, to a certain extent, hinder the development of large model technology. How can large model development break through the compute bottleneck in the future?

Liu Lingzhi: As of July 2023, China had more than 70 large models with over 1 billion parameters, but at present only models above roughly 50 billion parameters appear noticeably "smarter." Many vertical-industry models that are not generative-dialogue systems may only need 7B or 13B scale, where the compute bottleneck may not be obvious; whether they prove truly usable will be tested by time.

Chen Weipeng: From GPT-3.5 to GPT-4, the parameter count increased by roughly 10x and the corresponding data volume also grew by more than 10x, so compute grew by about two orders of magnitude, and that growth happened in less than a year. We can now see that growth in computing power certainly brings growth in capability, and this pattern inevitably creates a tension between cost and capability. There are two possible ways out: first, the current algorithmic paradigm is dominated by the Transformer, whose pattern is relatively fixed, so there is the possibility of moving from general-purpose chips to special-purpose chips; second, the information a person reads in a lifetime is less than 10B tokens, whereas current models gain capability only through scale, so in the future new algorithmic ideas may emerge that break out of endless scale expansion.

Liu Jun: I think there is a very close relationship between a large model's generalization ability and computing power. On one hand there is the model's parameter count; on the other there is the concept of compute equivalent that we keep talking about, that is, the computational cost needed to train a high-performing model. The metric is PetaFLOP/s-day: the total compute delivered by a machine performing 10^15 floating-point operations per second running for one full day. The current challenge is that the required model performance is hard to match with enough compute. Inspur Information hopes to provide partners and customers with sufficient computing power so they do not have to worry about whether the model, or the compute equivalent, is big enough.
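The compute-equivalent metric can be made concrete with the commonly cited approximation that training compute is about 6 x parameters x tokens FLOPs. The GPT-3-scale figures below (175B parameters, ~300B tokens) are illustrative public estimates, not numbers given by the speaker:

```python
PFLOP_S_DAY = 1e15 * 86_400  # one PetaFLOP/s sustained for a full day, in FLOPs

def training_pf_days(n_params, n_tokens):
    """Rough training compute via the ~6*N*D FLOPs approximation, in PetaFLOP/s-days."""
    return 6 * n_params * n_tokens / PFLOP_S_DAY

# Illustrative: a GPT-3-scale run, 175B parameters on ~300B tokens.
print(f"{training_pf_days(175e9, 300e9):,.0f} PetaFLOP/s-days")  # on the order of 3,600
```

Under these assumptions, a single 1-PetaFLOP/s machine would need roughly a decade to deliver that compute equivalent, which is why training clusters scale to thousands of cards.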

As for the compute challenge, first, change at the chip level will be an opportunity: there may be computing chips purpose-built for large models, which deserve special attention next. The second thing to look forward to is that the fading of Moore's Law means the compute problem cannot be solved at the chip level alone, so we must think about it from a systems perspective: how to build an optimized computing system that can still train well even with limited bandwidth.

Yang Jing: The deployment of large models faces more and more problems and obstacles, making it all the more necessary for the industry to work together. What else is needed to promote the deployment of large models?

Liu Jun: Applying large models requires establishing an optimization path from the model to enterprise deployment, and the key is to get users involved. Many customers' thinking stops at buying a large model for direct use, whereas generative AI must be closely integrated with users' data, scenarios, applications, and even the customer's value chain.

Xiong Longfei: In the years Kingsoft Office has been doing AI, our path has generally been to do ToC first, because we can use public cloud computing power and large clusters to meet the needs of all C-end users. On the other hand, demand for private deployment from B-end and G-end customers is even stronger. Private ToB or ToG deployment requires different solutions depending on the enterprise's or institution's situation, size, and specific data requirements, which brings many challenges as well as opportunities.

Yang Jing: Competition in the large model industry is becoming fiercer and fiercer. How should enterprises build competitive moats and find new growth points?

Xiong Longfei: Kingsoft Office's technology has been highly continuous; we have always focused on document technology. Having worked on one thing for more than 30 years, we have made it very deep and thorough and slowly built relatively high technical barriers. At the same time, we must retain the ability to embrace new technology and learn and adapt quickly to keep our innovation current. Kingsoft Office has been through many technological shifts, so our sensitivity and response speed to new technology are high, and we hold ourselves to high standards: whenever a new technology arrives, we quickly embrace it, understand it, and apply it. In this way we can avoid being disrupted by it.

Liu Lingzhi: The combination of short video and AI models will always be a direction for Kuaishou. Our optimization goal is also fairly clear: reduce users' costs. We therefore very much hope the industry can find solutions that effectively reduce inference cost per token, and we are very willing to work with industry peers toward this goal.
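The cost per token mentioned above reduces, to a first approximation, to accelerator cost divided by token throughput. A minimal sketch with hypothetical price and throughput figures (all numbers are illustrative, not Kuaishou data):

```python
def cost_per_million_tokens(gpu_cost_per_hour, tokens_per_second):
    """USD per million generated tokens for one accelerator at steady throughput."""
    tokens_per_hour = tokens_per_second * 3_600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

# Hypothetical: a $2.00/hour card sustaining 2,000 tokens/s across batched requests.
print(f"${cost_per_million_tokens(2.00, 2_000):.3f} per 1M tokens")
```

The formula makes the two levers explicit: either cheaper compute per hour or higher sustained tokens per second (e.g. through batching and model optimization) lowers the per-token cost.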

Chen Weipeng: The unreliability of large models comes from two things: timeliness and hallucination. For Baichuan Intelligence, trying to combine search with large models to provide better solutions in a more economical way is what we want to explore technically.

Liu Jun: Generative AI is only the starting point; there is still a long way to go. We need to build competitiveness continuously: better data, better algorithms, better computing power, better engineering, and a better ecosystem. Training a large model places very high demands on engineering capability. On the other hand, no one can build out the ecosystem alone; we must join hands to create a flywheel ecosystem that can keep running.

Yang Jing: How can industrial ecosystem cooperation solve the various problems encountered in large model development?

Liu Jun: Inspur Information has an ecosystem slogan, roughly "gather the Yuannao (MetaBrain) ecosystem, support a hundred models, bring intelligence to a thousand industries": helping a hundred model customers and partners create intelligence for thousands of industries. We have noticed that model companies are innovative, technology-leading teams, but when it comes to enterprise and B-end customers, there is a gap that needs bridging. Yet building a huge channel and sales system is bound to cost a lot of money, and the results are not necessarily good. Inspur Information's Yuannao ecosystem calls model partners and AI technology companies "left-hand partners," and integrators and software vendors "right-hand partners"; the three parties join hands to form joint solutions that serve customers and jointly bring intelligence to thousands of industries.

Chen Weipeng: Our commercialization path is still being explored. In the process of building open source influence, we come into contact with many enterprises using our models; by tracking how they use them, we gradually capture enterprise needs and keep polishing the product.

Liu Lingzhi: As an end-user platform, Kuaishou attaches great importance to win-win cooperation with upstream and downstream ecosystem partners. On one hand it gives the whole industry a relatively clear picture of end users' ecosystem needs; on the other, it keeps polishing its own technology by exporting it.

Xiong Longfei: In the past we lived in an era of heroism, when a single programmer could write software that shaped the times. But that era is changing: projects keep getting bigger, and the large model industry chain in particular is very long, spanning underlying hardware drivers, hardware algorithms, solution systems, and the application side; one company can no longer do everything. So this era must be one of win-win cooperation. We position ourselves clearly on the application side of the ecosystem, cooperate with different roles along the industry chain, let each link play to its greatest strength, and then create together. I believe that through upstream and downstream cooperation, domestic models will develop very well and very fast.
