The Cretaceous of the AI model

2025-02-05 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

The wonderful holiday is about to end, and we are about to get busy again. In AI, one of the busiest foundational technologies of the past two years has been large models.

As AI painting, AI-generated video and other recent capabilities keep reshaping the public's sense of where the boundaries of AI technology lie, the large models that stand behind these AI creators have risen in status too. The vigorous "refining large models" movement seems to have reached its harvest season.

However, as large models grow more popular, one problem is not hard to see: although pre-trained large models have shown good results in many fields, the commercial value those results generate is still hard to match against the training and infrastructure costs of large models.

In fact, behind the glossy exterior, large models are going through a somewhat difficult transition. The "magic" that large models keep displaying has drawn great attention from capital, industry and academia. But as one large model after another is trained and pushed to market, it becomes clear that application scenarios and commercial value exist but are not abundant. How to move from "refining large models" to "using large models" is becoming a key test. It is especially worth noting that China's AI industry has invested in and built large models more aggressively, so this test of turning large models into applications will show up earlier and more visibly in the Chinese market.

The state of AI pre-trained large models at this stage reminds me of one word: Cretaceous.

The Cretaceous is the last period of the Mesozoic Era. During it the climate warmed and the modern arrangement of the continents began to take shape. Dinosaurs still ruled the world, but mammals were already active.

Large models seem to be in just such a phase. The approach to large models defined by BERT and GPT-3 still looms over the AI industry, but how to carry large models into a new, application-driven era has become an urgent and somewhat bewildering question.

New species are emerging, old species are still dominant

Before discussing the transformation of large models, we need to take a moment to review how they developed and the logic behind their application.

A so-called pre-trained large model is a foundation model trained on large-scale, broad data. It exploits a basic property of deep learning, namely that the more data an algorithm is trained on, the more robust the model becomes, and simply "force-feeds" the model with data. After pre-training on large-scale data, the model can adapt to more diverse and complex downstream tasks and deliver a better intelligent experience.

Large-scale pre-training is not really an innovation in technical path; it is closer to an engineering innovation that makes the most of the technology's characteristics. It is widely accepted that the road to large models began with Google's release of BERT in October 2018. BERT was trained on massive data from BooksCorpus and Wikipedia and set new state-of-the-art results on 11 downstream tasks at the time.

We can think of a large-scale pre-trained model as a "pre-cooked dish." Cooking from scratch is too hard for users and wastes both labor and fuel, so the merchant might as well prepare the dish in advance. Users buy it, heat it up, add their favorite seasonings, and it is ready to serve. Large models follow the same idea: by pre-training the model upstream and fine-tuning it for tasks downstream, they let more industries apply high-quality AI models with good results.

After a few years of development, large models have now reached a critical point where the old and the new alternate. This alternation can be understood at two levels. First, large models themselves are undergoing constant technical innovation. The most representative large model, and the one that broke into mainstream awareness, is GPT-3, which OpenAI released in May 2020. It has 175 billion parameters and excels at a wide range of text generation tasks. Both BERT and GPT-3 are large models in the field of natural language processing. Since GPT-3, large models have kept growing in parameter count and iterating in technology. For example, large models for machine vision have become a new mainstream in the industry, and multimodal large models and large models that incorporate industry knowledge have begun to appear. The capabilities that large models power have expanded from language to vision, and then to more complex composite tasks.

The second level of this alternation of old and new shows up on the industry side, in how large models are applied and called. As the years have passed, the enthusiasm of "we have to have a large model soon" has begun to fade, replaced by an application anxiety: "we do have a large model, and then what?" This is especially true in the Chinese market. In the AI community in the United States, large models have always been built by a small number of technology giants and academic organizations, and many large models are positioned basically as part of AI technology investment. In China, however, with the emphasis on technology competition, a large number of Internet and cloud computing enterprises have joined the race to build large models, and they must find effective commercial outlets to recover their investment. A large number of research institutions and universities have joined in as well. As a result, large models have sprung up across China. The advantage is that Chinese AI is far ahead in the number of large models; the drawback is the question of how to absorb and use so many large model projects.

At this stage, the large model industry is characterized by projects that directly benchmark themselves against GPT-3: they still dominate, yet offer little convincing value beyond it. At the same time, new technical ideas for large models and new ideas for industrial transformation have begun to emerge. That is what the Cretaceous was like: dinosaurs and mammals coexisted, and new species were waiting for more change.

The wild growth of large models is showing signs of exhaustion

Over the past few years, refining large models has become the hottest activity in AI, one that also attracts the attention of public opinion and capital. With so many large model projects launched so quickly, it is hard to judge which are driven by the Internet mentality of "competitors are doing it, so I want to do it too," and which exist to attach themselves to popular concepts such as new infrastructure and the national science and technology system.

On the whole, the growth of the large model industry has created a positive atmosphere for the entire AI field and made it easier to integrate large models with various industries and research fields. At the same time, if we compare large models with other AI technologies, or even with VR, the metaverse, blockchain and other technologies that have been hailed as the next big thing, it is easy to see that the development of large models also bears many traces of "wild growth."

In fact, from the application point of view, large models, like cloud computing, are a form of consolidation that concentrates the industry's upstream investment. Generally speaking, enterprises have three ways to apply AI. The simplest is to call a standardized API with AI capability directly, which provides only simple capabilities and cannot cover complex intelligent requirements. The second is a fully customized AI solution, which carries high customization and expert costs and is the least cost-effective. The third is to develop AI in-house, which comes closest to the enterprise's real needs but tends to produce models that are not standardized and that lag behind the industry's leading level, and it also requires the enterprise to have AI development experience and a suitable organizational structure.

The emergence of large models can be seen as striking a balance among these options. Through the large-scale pre-training plus fine-tuning mode, many enterprises and industries can share a large model and apply it repeatedly. Enterprises get high-level AI capabilities while avoiding excessive development and construction costs. This is the so-called move of AI into the era of industrialized production, leaving workshop-style AI development behind.
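The sharing logic of pre-training plus fine-tuning can be put as simple arithmetic. The sketch below is only an illustration: the function name and the cost figures are hypothetical, in arbitrary units, and are not taken from any real project. It assumes one upstream pre-training cost is split evenly among the enterprises that reuse the model, while each enterprise pays its own fine-tuning cost.

```python
def cost_per_enterprise(pretrain_cost: float,
                        finetune_cost: float,
                        num_enterprises: int) -> float:
    """Share of one pre-training run plus one fine-tuning run, per enterprise.

    Hypothetical cost model: the pre-training cost is split evenly,
    and the fine-tuning cost is paid in full by each enterprise.
    """
    if num_enterprises < 1:
        raise ValueError("num_enterprises must be at least 1")
    return pretrain_cost / num_enterprises + finetune_cost


# Made-up figures in arbitrary cost units, for illustration only.
PRETRAIN_COST = 1000.0
FINETUNE_COST = 10.0

for n in (1, 10, 100):
    share = cost_per_enterprise(PRETRAIN_COST, FINETUNE_COST, n)
    print(f"{n:>3} enterprises sharing one model: {share:.1f} units each")
```

Under these assumptions the per-enterprise cost falls quickly as more users share one model, which is why the economics favor a few widely reused large models over many models with few users each.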

However, this logic should ultimately lead to an industry pattern with a small number of large models and a very rich set of downstream applications. At this stage, the opposite is true. Downstream large model applications are only just getting started, and the related enterprises and solutions are still being developed, while upstream large model projects emerge one after another and show a certain degree of homogenization. This wild growth generally involves several potential problems:

1. Overemphasis on parameter counts and benchmark results. GPT-3, with 175 billion parameters, officially pulled large models up to the 100-billion-parameter scale. The parameter race then kept escalating, and soon we saw large models with trillions of parameters. The pursuit of ever more parameters was for a time the mainstream in AI, and it has since prompted considerable reflection. Chasing ever larger models and ever larger training sets makes models hard to deploy in real scenarios, and too much low-quality training data often backfires.

Another problem is the chase for new records on benchmark datasets. There is nothing wrong with judging the capability of large models on standardized datasets. But in many cases there are tricks to dataset testing, and a model can be tuned specifically for it. Blindly focusing on test results may leave large models short on practical usefulness.

2. Technological innovation is too "personalized." Because competition among large models is fierce and the engineering route is relatively uniform, the industry has seen a wave of "micro-innovation" aimed at showing that one's own large model is different. The usual approach is to present oneself as the industry's first large model built on a certain technology. Whether that technology is convincing and has enough practical value is open to question. When everyone is the first large model of some kind, the definition of a large model grows more complicated and the yardstick for evaluation grows fuzzier. Downstream users also find it harder to choose among large models. The insistence on being the "first" leads to chaotic innovation in large models.

3. Heavy redundant investment in the name of localization. With the trend toward independent control and domestic substitution, enterprises and research institutions have begun to make a large number of duplicate investments in large models. Localizing large models is of course reasonable and necessary. However, poor coordination among different enterprises and research institutions, and among projects and policies in different regions, can easily leave localization projects stuck in low-level, redundant construction, which reduces the final effect of localization.
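To make the deployment concern in the first item concrete, here is a rough back-of-the-envelope sketch of how much memory is needed just to hold a model's weights. It assumes each parameter is stored as a 16-bit float (2 bytes) and ignores activations, optimizer state and any serving overhead. The function name is hypothetical, and the 1-trillion figure is simply a round example of a "trillions of parameters" model.

```python
def weight_memory_gb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate memory for the weights alone, in gigabytes (10**9 bytes).

    Assumes a fixed number of bytes per parameter (2 bytes for 16-bit floats).
    """
    return num_params * bytes_per_param / 1e9


# GPT-3's 175 billion parameters, and a round 1-trillion-parameter example.
gpt3_gb = weight_memory_gb(175_000_000_000)
trillion_gb = weight_memory_gb(1_000_000_000_000)

print(f"175B parameters at 2 bytes each: {gpt3_gb:.0f} GB")
print(f"1T parameters at 2 bytes each:   {trillion_gb:.0f} GB")
```

Under these assumptions the weights alone come to hundreds of gigabytes, which is far more than a single commodity accelerator holds and gives a sense of why parameter count translates directly into deployment difficulty.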

Given all these problems, the wild growth of large models, though not over, has shown signs of exhaustion. The core task at this stage is to shift large models from being parameter-centered to being application-centered.

The transformation of large models follows two lines of thinking

Whether we call it "refining large models" or the wild growth of large models, the first stage of large model development in China's AI industry has reached saturation. Although there may be all kinds of waste and redundant investment, it has laid a solid foundation for the long-term development of the whole industry.

The most direct sign of this is that the infrastructure for developing large models in China's AI industry is already well developed. This is an advantage that many earlier technologies, even deep learning itself, did not have when they first emerged. According to the IDC report "Market Glance: Overview of China AI Large Model Market, 2022", large models are an inevitable form for industry-level AI practice, the underlying infrastructure is now basically in place, multiple types of chips continue to iterate, and vendors are optimizing in depth around training capability, core operator libraries and upper-layer software platforms.

With the underlying foundations and supporting facilities in place, applying large models becomes much smoother. Today, the application-centered transformation of large models mainly follows two lines of development.

1. Embrace AIGC and align with European and American thinking on large model development. From the text generation that first brought GPT-3 wide attention, to the recent popularity of AI painting, to the recent bets by Google and Meta on AI-generated video, these abilities can be summed up as AI-Generated Content (AIGC).

AIGC can produce high-quality, complex and even convincingly realistic content, and the "brain power" behind it generally comes from large models. AIGC is therefore the most direct and clearest path to commercialization at a time when large models need to connect with commercial value. At this stage, however, the commercial potential of AIGC has yet to be developed in depth. The most widely used AIGC capability is probably AI painting, but its regular users are mostly illustrators, designers and independent content creators, while many consumer users are only trying it out as early adopters, so it is not clear how much commercial value it can ultimately unlock. Generally speaking, large models play two roles in AIGC. One is to directly support the software of the enterprise that owns the large model, which monetizes by charging per use or by credits. The other is to empower other software developers, realizing commercial value through model usage fees or by driving consumption of cloud computing and cloud storage. Either way, it is imperative to turn AIGC from a niche demand into a mass demand and further expand its commercial space.

From another angle, AIGC is also an area of heavy investment by European and American technology giants such as Google and Meta, so domestic Internet and AI enterprises have a reference for their own development. This certainly brings plenty of competition, but it also keeps the development route within a rhythm that Chinese technology enterprises are familiar with.

2. Expand the space for combining large models with industrial intelligence and scientific computing, and explore large models independently in China. As with AI technology itself, the deeper expectation that Chinese industry, academia and government hold for large models is that they will unlock the value of industrial intelligence, integrate with the overall application space of China's economy, and even help ignite a fourth industrial revolution in China. This path is entirely new, with few references for how large models might develop along it. The biggest problem is that, after years of development, industrial AI still struggles with high cost and with the difficulty of scaling and handling complexity; large models are more expensive still, so whether they can escape AI's cost-benefit trap is an even more complicated question. Moreover, combining large models with specific industries and specific research fields has been attempted in Europe and America, but China, relying on rich industrial demand and society-wide enthusiasm for digitalization, has already reached the frontier. How to unlock the long-term value of large models in this no man's land is both an opportunity and a severe challenge for Chinese AI.

At this stage, some AI vendors have launched industry-specific large models, such as large models for finance and for energy. Many fields have also begun cross-sector cooperation around large models: for example, Commercial Aircraft Corporation of China (COMAC), working with technology vendors, has released a large fluid simulation model applied to large aircraft testing, and teams at Xi'an Jiaotong University have made breakthroughs on super drug-resistant bacteria by applying large models.

China's AI large models have been explored extensively in cross-modal search, autonomous driving, digital humans, biomedicine, material chemistry and mathematical energy. However, these explorations are generally still at the stage of cross-field cooperation and case building, and there is a long way to go before they yield solid commercial value. In particular, many challenges remain in spreading the cost of large models and promoting large-scale application.

In any case, large models have reached the point where they must move from waiting and seeing to transforming. Those that do not adapt to the new changes may not get much further.

About 66 million years ago, the Cretaceous period came to an end and the earth entered its most recent geological era: the Cenozoic. With the extinction of the dinosaurs and the rise of primates, the story of the entire planet began to take on a new look.

Today, we might wonder how much of the development achievements, infrastructure construction and technical exploration of today's large models will carry over to the next stage. Some certainly will, but most likely not very much.

We must be soberly aware that a large number of large models will eventually lose their place in the industry. Like other computing, storage and AI infrastructure, large models will end up with only a small number of frequently used offerings that become established infrastructure. This transition will inevitably bring a new boom in investment and construction and the decline of the old one. When industry and academia no longer need to churn out large models at full tilt, where should the computing power, networks and development platforms built for that purpose go? That, too, seems to be a variable worth thinking about in advance.

In addition, we need to realize that the road ahead for large models is not smooth. Integrating large models with industry is a new path rooted in China's economic and social characteristics, and it holds great potential value. But after years of development, we find that all AI problems end up being cost problems. Can large models break AI's "useful, but too expensive" cost cycle and deliver enough value for IT, cloud computing and Internet vendors? These questions still lack clear answers.

So the Cretaceous period of large models is not over yet. But we also know that the era of parameter-driven development and heavy redundant construction will eventually pass, and the real test may only begin then.

Many people think of large models as Deep Learning 2.0, a Noah's Ark that will keep AI from falling into a third winter. They carry too many expectations.

Until a new AI spark is lit, large models will remain hard to replace for a long time.

This article comes from the WeChat official account Brain Body (ID: unity007). Author: Feng Ciyuan
