Ten-thousand-word record of the MEET Intelligent Future Conference. ChatGPT: it was exciting to watch

2025-04-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)12/24 Report--

Twenty industry leaders in heated debate

Excluding the tech giants, five or six large-model companies will have the last laugh.

Truly ambitious developers should build AI-First / AI-Native applications.

In as little as two years, autonomous driving will have its "ChatGPT moment".

On-device inference will create the real killer applications. The next phase of AI is more like Minesweeper.

In the first year of the large model, 20 industry leaders gave this year-end summary at the MEET2024 Intelligent Future Conference.

ChatGPT spoke highly of it: it was exciting to see AI technology applied in so many different fields.

Hundreds of on-site attendees and nearly 3 million online viewers, all human, also said the talks were packed with practical content.

Perhaps because the technical density was so high, some netizens even wondered whether the guests were real people or digital humans. 😂

Well, maybe next year.

Centered on the theme "starting from a new starting point", the conference covered "new thinking" in the large-model era, as well as the changes that "new applications", "new terminals" and "new models" bring to players in the industry.

Let's follow large models such as ChatGPT and Claude 2 and pick out the key points.

New thinking, new trends

Kai-Fu Lee: truly ambitious developers should build AI-First / AI-Native applications

The first speaker was Dr. Kai-Fu Lee, chairman of Sinovation Ventures and CEO of 01.AI. This year, Time magazine named him one of the world's 25 AI leaders of 2023. Also this year, Sinovation Ventures incubated the AI 2.0 company 01.AI, whose Yi series of large models has achieved industry-leading results.

More than 40 years ago, Kai-Fu Lee studied at Columbia University and Carnegie Mellon University, where he entered the field of AI and went on to become an international expert and investor in it. He admitted that for more than 40 years he had looked forward to witnessing AGI, and at one point feared that "I will not see AGI in my lifetime."

But now the path to AGI is becoming clear.

Lee believes that AI 2.0 is the greatest technological and platform revolution in history: it will not only rewrite every user interface and app, but also create enormous value across industries. "The platform opportunities brought by AI 2.0 are ten times greater than those of the PC and mobile Internet eras."

At present, China's large-model race has entered the stage of the "war of a hundred models", and competition is fierce. In terms of entrepreneurial opportunities, the window for startups building large pre-trained models is gradually closing, but there are still many opportunities in other areas, such as AI 2.0 infrastructure and applications.

Apps in the AI 2.0 era will be infused with super intelligence and bring new interfaces and user experiences. Their growth will set new records and be even more explosive than in the mobile Internet era, creating more value and reaching more users.

Developers with real dreams and ambitions should build AI-First and AI-Native applications, which will make full use of AI technology and become the greatest and most commercially valuable companies.

Second, on the fierce contest between open-source and closed-source models, Lee sees the two as growing in a continuous game of catch-up, but "in the end, there will not be just one great closed-source GPT or large-model company."

He predicted that, counting China and the United States together and excluding the tech giants, five or six large-model companies would have the last laugh.

In the closing rapid-fire Q&A, Kai-Fu Lee said that AGI does not have to be good at everything humans do: as long as it is 100 times smarter than humans in some areas, it is a valuable AGI. Although it is impossible to say when AI will have real emotions such as love and empathy, it is already 100 times smarter than humans in some areas. As for the risks and challenges AI brings, he believes that problems created by technology can be solved by technology, supplemented by sound laws and regulations to govern AI, so that AI benefits more people.

He also said that the traditional Turing test no longer suits the fast-moving AI 2.0 era. Agents have entered a stage of rapid deployment, and more advanced techniques are needed to distinguish AI from real people.

Finally, Kai-Fu Lee left a teaser: his digital avatar will attend next year's QbitAI event.

Academician Li Peigen: it is difficult for machines to have an ineffable stream of consciousness like human beings

"The manufacturing industry needs to achieve breakthroughs by standing on the shoulders of the AI giant." Academician Li Peigen elaborated on this point in his speech.

Li Peigen believes that AI is a "knowledge giant" that not only holds known knowledge but may also generate new knowledge, and that manufacturers should consider how to stand on its shoulders and make full use of AI for innovative design and for insight into complex relationships.

For example, traditional industrial automation mainly handles problems that have fixed patterns, are deterministic, and are causal.

In reality, however, many problems involve uncertainty, have no fixed pattern, and feature complex relationships that are not based on causality.

Knowledge can be regarded as the relationships among data in time and space. Human beings can usually understand and recognize only simple, linear, low-order relationships, while higher-order relationships often go unrecognized and sink into the sea of so-called "dark knowledge".

But now, with big data and AI technology, we can stand on the shoulders of AI giants to gain insight into complex relationships.

It should be noted that although machines can surpass human thinking in many ways, it is difficult for them to have an ineffable stream of consciousness like human beings.

The concept of the "stream of consciousness" was put forward by the American psychologist William James:

The stream of consciousness is like a continuous, indivisible river. Human consciousness has two aspects: part of it is rational, conscious awareness, and part is illogical, irrational unconsciousness.

Academician Li Peigen said that it is the stream of consciousness that keeps human beings from being enslaved by AI and lets them instead use AI to enhance their creativity.

Ouyang Wanli: AI for Science lets scientists serve "delicious food" quickly and economically

Ouyang Wanli, a leading scientist at the Shanghai Artificial Intelligence Laboratory, shared his laboratory's research and exploration in AI for Science.

He likens AI for Science to gourmet cooking, which requires AI researchers to work together with natural scientists.

If scientific research is compared to gourmet cooking, experimental data are the high-quality ingredients, and AI for Science lets scientists serve delicious food "more, faster, better and cheaper".

For him personally, there were two reasons to move from computer vision to AI for Science: first, the problems themselves are very important; second, the problems themselves are very interesting.

As for importance, in Ouyang Wanli's view, the natural sciences face the same problems as the AI field, often in more severe form.

One is the problem of few labels and few samples. For example, obtaining a protein structure takes enormous time and resources: it may take a scholar a year to obtain a single protein structure, that is, a single labeled sample.

The other is the variety of data representations. The natural sciences, from physics to biology to earth science, use different representations: atomic representations at the lowest level, then molecular representations, gene and protein representations, and, in earth science, atmospheric representations.

With such varied representations, handling the data well is itself a problem.

So how should these problems be solved? Ouyang Wanli went on to explain, drawing on his research team's results:

In meteorology, Fengwu, a large model for global medium-range weather forecasting, has for the first time produced effective high-resolution forecasts of core atmospheric variables more than 10 days ahead. Fengwu treats atmospheric variables as multimodal inputs, so that multimodal, multitask deep learning methods can be applied. It breaks through the bottleneck of traditional forecasting methods and fits the relationships in meteorological data well. It takes only 30 seconds to generate a high-precision global forecast for the next 10 days, far more efficient than traditional models.

New applications, new scenarios

Kunlun Wanwei's Fang Han: on-device inference will produce the real killer applications

Fang Han, chairman and CEO of Kunlun Wanwei, gave a talk titled "Kunlun Wanwei's road of AGI and AIGC exploration: from large models to an AI Agent development platform."

First, Fang Han shared Kunlun Wanwei's exploration of AGI. The company has built a matrix of six AI businesses: AI models, AI search, AI music, AI games, AI animation and AI social networking. He believes that having in-house model generation capabilities and proprietary models is very important for a company's development in AI. The company has launched a consumer-facing AI search product in China, and plans to launch AI products for games, music, animation and social networking in overseas markets.

He then talked in detail about three opportunities: AI search, Agents and on-device inference.

AI search, for example, can in his view greatly shorten users' search time and improve the quality of the information they obtain.

He also talked about the importance of Agents. The real form of AGI is the Agent, but at present, as with many large models, using the API still involves a certain threshold. Fang Han believes that what is needed now is the Agent as a low-code interface for secondary development on large models, so that all users can use Agents to make large models do real work for them, and so that large models can be deployed more effectively.

He also mentioned ways to reduce the cost of AI training and inference, including technical iteration, content revolution, and on-device inference.

On on-device inference, Fang Han believes it is an opportunity for all enterprises. Only on-device inference is the final solution, and it will lead to the emergence of truly killer applications.

He believes that the current paid model for large models is only a transitional stage. As technical iteration, the content revolution and on-device inference are realized, large AI models will eventually become free, and only then will consumer applications see a real explosion.

This wave of AI must be the spring tide to rise and fall to be king.

Wang Xiaogang: intelligent cars will reach a critical point in the next one to two years

Wang Xiaogang, co-founder and chief scientist of SenseTime, shared the technological breakthroughs and development opportunities that general artificial intelligence and large models bring to intelligent cars.

Wang Xiaogang believes that ChatGPT has established a new paradigm for artificial intelligence and opened a new path to the large-scale industrial application of AI. The most obvious sign of this shift is the surge in demand for computing power. In 2018, SenseTime spent 5 billion building its large-scale AI infrastructure, which many people did not understand at the time. But all the large models we talk about today are built on strong software and hardware infrastructure.

So in the large-model era, which trends are worth watching? Wang Xiaogang addressed two areas: the intelligent cockpit and intelligent driving.

On the intelligent cockpit, he described a future "cockpit brain" built on large language models that controls all the software and hardware in the cockpit and uses sensors inside and outside the cabin to perceive the environment and the needs of passengers, including the driver, in every respect. At the application level, the trends visible today, such as content generation, AI user manuals, health consultation and travel planning, all raise the intelligent experience in the cockpit to a new level.

On intelligent driving, he mainly discussed the trend toward pure vision. At present, only the perception part of the intelligent driving system uses AI, and much of the rest is based on hand-written rules. To truly solve the many corner cases, however, the system has to be data-driven: a large model handles perception, fusion, localization, decision-making, and planning and control, connecting all the modules together and covering as many scenarios as possible.

For example, Tesla's end-to-end autonomous driving solution and SenseTime's CVPR best paper this year, which connects multiple modules within one large model, follow the same line of thinking.

Finally, Wang Xiaogang looked ahead to the future of smart cars: in the next one or two years, smart cars will be at a critical breakthrough point.

Three things are involved: first, end-to-end, data-driven autonomous driving; second, the emergence of a cockpit brain based on a large model; and third, cockpit-driving integration, in which all cockpit and driving experiences are delivered on the same chip and for the same user. This greatly reduces cost and computing power requirements, achieves better integration at the product level, and delivers a better intelligent driving and intelligent cockpit experience.

And all of this is based on the big model.

Baidu's Ma Yanjun: the development of AI-native applications is entering its best era

Ma Yanjun, general manager of Baidu AI Technology Ecology, used Wenxin as an example to give a comprehensive introduction to knowledge-enhanced large language models, as well as the ecosystem and future trends around large models.

Ma Yanjun pointed out that data and alignment techniques are particularly important to improve the effectiveness of large models:

How to use data, how to mine, analyze, synthesize, label and evaluate data is very important.

In addition, Ma Yanjun also summarized the differences between the big model and other technological breakthroughs in the AI field from three aspects.

The first is the mode of interaction. "There is a truly disruptive change this time": future applications will be invoked as AI-native applications through natural-language prompts. How well this interaction works directly affects how widely the technology spreads.

The second is a much lower threshold for AI development. Previously, "developing an AI application meant writing a lot of code", whereas application development based on large models can be almost zero-code.

Finally, large models not only affect industrial applications, but also promote the new trend of AI for Science in scientific research.

Driven by these breakthroughs, Ma Yanjun said, the development of AI-native applications is entering its best era, and more powerful agents are emerging on top of large models with plug-in access. Based on these capabilities, there will be more AI-native applications, and the connection and integration of digital technology and the physical world will accelerate.

Ma Yanjun also mentioned that training large models poses great challenges, including large model size, high training difficulty, heavy computing power demands, high performance requirements, large data scale and uneven data quality. These problems in turn place higher demands on the underlying software and hardware.

Li Dahai: large models make people and machines more equal

Li Dahai, co-founder and CEO of Wall-facing Intelligence, gave a talk titled "Wisdom encompassing all things: let AI agents unleash the productivity of large models".

Wall-facing Intelligence is the earliest team to build large models in China. Li Dahai believes that logical reasoning is the key ability large models need in order to be used in real production environments, and the company has focused on improving its model's logical reasoning.

According to him, the company's newly launched CPM-Cricket, a multimodal large model at the hundred-billion-parameter scale, matches the level of GPT-3.5, and its logical reasoning is especially strong. To test that reasoning, Wall-facing Intelligence had the model take a civil service exam; its overall accuracy reached 63.76%, even exceeding GPT-4's 61.88%. On the English GMAT test, the model's score was 93% of GPT-4's, which is very close.

The industry has now reached a consensus on the technical route of large models. But is the large-model shift a technology wave like Web3, or an industrial revolution lasting a decade?

Li Dahai believes that the big model is the fourth technological revolution, which can be compared with the industrial revolution and the information revolution, and this revolution will last at least 20-30 years.

In addition to large models, Li Dahai also talked about the development of AI Agents, which he believes need several characteristics: persona, IQ, EQ, perception, values and growth. As for growth, Li Dahai believes that today it still relies on a closed data loop on a T+1 or T+2 cycle, and he hopes to achieve more real-time growth in the future.

Li Dahai gave an analogy: a large model is like a car engine, but it still needs to be assembled with various accessories such as steering system, car chassis, interior, and so on, in order to really provide a complete automobile product. Therefore, agents need to stack more capabilities on the basis of large models in order to achieve more application and imagination space.

In addition, as more individual agents begin to collaborate, they will generate more productivity. A more advanced form of intelligence, swarm intelligence, then emerges. Nature offers many similar cases, such as ant colonies, bee colonies and schools of fish, which show higher intelligence than any individual.

Based on this thinking, over the past few months Wall-facing Intelligence has released three agent frameworks: AgentVerse, a general agent platform that brings together many experts; ChatDev, a multi-agent collaborative development platform; and XAgent, a super single-agent application framework whose overall capabilities comprehensively surpass AutoGPT. At present, the company's "large model + Agent" technology has been deployed in legal and other scenarios.

Will there be super applications based on large models in the future? Li Dahai believes that the most fundamental change brought about by large model technology is the change in the relationship between people and machines: machines become more like people, and people and machines will be more equal.

At the end of his speech, he also shared the "Internet of Agents" concept put forward by Wall-facing Intelligence: they believe the future world will be an Internet of everything connected by agents.

Xiaoice's Li Di: the next stage of AI is more like a game of Minesweeper

Li Di, CEO of Xiaoice, opened his talk with the hotly debated "first AI copyright case".

B used a picture in an article, and the picture had been generated by A using open-source AI painting software. In the end, the court ruled that B had infringed A's intellectual property rights and must pay 500 yuan in compensation.

"The compensation of 500 yuan is probably the biggest return this picture has earned in the business world so far." This leads to a key point: AI is creating great value, but is not reaping a correspondingly high return.

Li Di said that, in fact, this is one of the dilemmas of today's business model in the AI field.

In the past year, AI technology has made great progress, and the prejudice against AI products is rapidly melting. In Li Di's eyes, the past year has been a golden year for the industry.

Specifically:

The efficiency of generative AI models has greatly improved. A few years ago, creating an AI being that could evaluate articles meant building its values across 82 types of knowledge graphs, which took about six months. Now it can be done in a very short time.

Social prejudice against AI is dissipating, and giving AI more room for error helps the technology develop quickly.

However, Li Di observed that at present, AI applications are generally faced with commercial difficulties:

On the one hand, the existing pay-per-API-call model struggles to reflect the creative value of AI systems. Take article writing as an example: the market for AI that completely replaces writers is very limited.

On the other hand, the income most vertical AI systems earn from their work does not match the commercial value of what they replace.

Li Di believes that new business models need to be found so that AI systems can directly get a share of revenue from content creation.

He also stressed that AI is still in a stage of rapid technical iteration. The future is not like a race whose track is fixed once the starting gun fires, but more like a game of Minesweeper in which AI capabilities have no upper limit.

At this stage, diverse exploration and tolerance are needed in order to seize the great opportunities of these few years, truly carry technology through to application scenarios, and change human life.

Ant Group's Yang Ming: embracing multimodal large models from the business and application dimensions

"Technology is the core driving force for creating the future." Yang Ming, a researcher at Ant Group and head of multimodal large model research and development for Bailing, opened with this sentence as soon as he took the stage. He says this is what Ant Group has always believed.

Guided by this belief, over the past year Ant Group has focused on key technical problems and delivered its answer: the Bailing language model and multimodal model.

Why does Ant need multimodal large models?

According to Yang Ming, Ant has a rich set of application scenarios for multimodal understanding, which fall along two dimensions. On the business dimension, there are numbers and numbers; on the application dimension, there are image-text understanding, video analysis, and image and video content generation.

To this end, Ant Group collected billions of Chinese and English image-text pairs and, through unsupervised learning, trained from scratch an image-text understanding model with tens of billions of parameters.

Training from scratch raises many difficulties: without open-source weights for initialization, training may fail to converge; training costs are high and iteration cycles are long; and the training cluster poses scheduling and stability problems.

In the end, Ant solved the convergence problem with a staged training strategy, and addressed the high training cost by optimizing the training algorithm, IO and storage, and by using an efficient parallel training platform.

Yang Ming explained that Ant has derived many downstream vertical models from this image-text model, applying the image-text understanding model to image-text dialogue, video understanding, text-to-image generation, image-to-image generation, and so on.

With image-text dialogue in place, Ant has begun gradually deploying these capabilities in its business. Advertising content review, for example, is a typical business scenario. On top of image-text understanding, Ant introduced temporal modeling to analyze the relationships between frames and understand motion, extending the image-text model into a video task model that supports video-to-text retrieval, text-to-video retrieval, and video content generation and understanding.

In addition, Yang Ming said that to address the difficulty of putting image generation models directly into products, Ant has developed a number of controllable generation techniques, which achieve controllable style generalization by extracting the target style from reference images. A single input image is enough to achieve style transfer, face effects and similar results, greatly speeding up the path from technology to product.

Liang Zhihui: in the era of large models, everyone can be enhanced rather than replaced

Liang Zhihui, vice president of 360 Group and head of 360 large model applications, shared their experience and cases of applying large models in enterprise production.

First, Liang Zhihui believes that in the era of large models, the relationship between models and people is one of enhancement, not replacement. For everyone, whether in daily office work or enterprise marketing, large models can greatly speed up reading, writing and searching.

However, generative AI and generative large models are not omnipotent. Many large models still face challenges such as hallucinations, a lack of industry knowledge, and the need for prompt engineering.

Take prompt engineering as an example. First, prompt templates are so complex that only AI enthusiasts can master them, which hinders the adoption of large models. Second, large models struggle to generate high-quality content, so promoting them at scale means playing to their strengths and avoiding their weaknesses.

Based on this thinking, they chose to deploy through a new form of human-machine collaboration: making the large model everyone's assistant.

The strengths of large models lie in content generation and content understanding. Over the years, many chatbots have appeared. But building such a bot is like a primary school student hypnotizing the large model: telling it that it is now playing a role and should answer according to a script. The model does not know the product, the company, or how it is supposed to collaborate.

"We want the large model to be like an autonomous Agent that has a variety of skills and industry knowledge and can use a variety of tools," said Liang Zhihui. With the entire Internet as its knowledge background, such an Agent can be trained to help you check exchange rates and the weather, and even book air tickets.

Building on a hundred-billion-parameter model and the Agent architecture, Liang Zhihui shared the three application scenarios they are now focused on: smart marketing, smart office and intelligent customer service.

In particular, a Zhuge Liang digital human built for cultural tourism was well received by the audience and netizens at the event.

New terminal, new interaction

Rokid's Zhu Mingming: next year, XR technology may be hotter than AI

"In the next five years, I hope to replace everyone's glasses with smart glasses."

The sentence above is the firm vision that Misa (Zhu Mingming), founder and CEO of Rokid, holds for the near future.

In the speech, Misa shared his views on the integration of AI and AR technologies, and how Rokid combines the two technologies to create a new generation of human-computer interaction platform.

In 2014, Misa left Alibaba and founded Rokid. In his view, AI and AR technology represent the ability to understand and interact with the physical world and the digital world respectively, and his mission is to integrate AI and AR into one thing.

People are more easily drawn to the hardware, but in fact Rokid is not just an eyewear company; it is a company dedicated to human-computer interaction that combines AI and AR.

On stage, Misa broke down Rokid's approach: through continuous refinement of hardware, software, algorithms and other areas, gradually bring products to the consumer market.

This year, Rokid released Rokid AR Studio, a consumer OST (Optical See Through) personal spatial computing platform.

What is spatial computing? Misa's explanation is that its essence is the integration of the physical world and the digital world, and the question of how to display and exchange information in a natural, easy-to-use way under that integration.

He added that there are currently two routes in the industry--

One is VST (Video See Through), represented by Apple, which wraps users in a purely digital world, digitizing the physical world through sensors and rebuilding it in the virtual world.

The other is OST, the route chosen by Rokid, which is lighter and superimposes digital content on the real world, which users see with the naked eye.

Misa gave his own judgment: in the short term, neither route is right or wrong, and the two will coexist for a long time.

"As for which is better, let time decide." Misa concluded by saying that he believes XR technology will make a bigger breakthrough next year, and may even be hotter than AI.

vivo's Zhou Wei: the mobile phone is the scenario where large models can best close the experience loop and the commercial loop

Since the second half of 2023, mobile phone manufacturers around the world have been racing to "cram" large models into phones.

Taking vivo as an example, the company's large model strategy can be summarized into five points: large and comprehensive, strong algorithm, true security, self-evolution, and wide open source.

The specific approach revolves around two steps, one is the development of the large model, and the other is the landing of the large model.

In terms of large model development, the company officially released its self-developed matrix of large AI models, the Blue Heart large models, along with the new mobile operating system OriginOS 4.

Zhou Wei, vice president of vivo, vice president of OS products and director of the vivo AI Global Research Institute, shared at the MEET2024 conference that the Blue Heart matrix spans three parameter scales, 1 billion, 10 billion and 100 billion, with five large models in total. The 7-billion-parameter version is now open source, and the 13-billion-parameter version runs on-device.

Large models seem so magical because they abstract thousands of years of human civilization's knowledge in a high-dimensional space and condense it into knowledge and information that everyone can access.

Next, consider how the large models are deployed: vivo's combined software-hardware route.

On the hardware side, vivo works closely with chip manufacturers to accelerate large models on phones. On the software side, it launches a variety of application forms that are deeply integrated with the underlying system, so that consumers can start using them more quickly.

But the pace should not stagnate here.

Zhou Wei noted that phone manufacturers care most about the actual experience of large model applications, so he believes large models should also have logical thinking, emotions and values like those of human beings.

From this point of view, the best way to close the experience loop and the commercial loop is to deploy on the mobile phone and build an agent.

"In the future, we hope to use the capabilities of AI to further restructure the system and, through the popularity of smartphones, move together toward the age of agents," Zhou Wei said.

Xiaomi's Luan Jian: flexing muscle on technical parameters makes no sense

In Xiaomi's view, a large model has three elements: big data, big parameters, and big tasks. Which of these is the key to a large model's generalization ability?

Luan Jian, head of the large model team in the AI Lab of the Technical Committee of Xiaomi Group, gave his view:

We think that the number of parameters is not the most critical factor, and a smaller model can also produce generalization ability.

This view is also reflected in the whole process of Xiaomi's research and development of large models.

Xiaomi has been in the AI field since 2016, and its investment in AI has continued to grow in recent years. The company has said that total investment in technology research and development is expected to exceed 20 billion yuan this year, and to reach 100 billion yuan over the five years from 2022 to 2026.

The breakthrough point for Xiaomi's large model is not being "big", but being lightweight and locally deployable.

Luan Jian said this follows from Xiaomi's own characteristics. Xiaomi has a wide range of hardware devices and is the world's largest consumer IoT platform. As of the third quarter of this year, nearly 700 million devices were connected, and 13.7 million users owned more than five Xiaomi IoT devices.

Xiaomi's idea is to use the big model as a brain and carry it into hardware.

A robot vacuum, for example, does not need to chat or write short essays, but it does need to know how to plan a path, avoid obstacles, and so on.

Luan Jian said, "What Xiaomi pays special attention to is not what the industry calls a general model, nor a vertical model, but a scenario-specific large model."

I don't think it makes any sense to flex muscle on technical parameters; let's get back to how to make good use of large models.

Next, Xiaomi will also explore collaboration among multiple devices within a scenario and across scenarios. Luan Jian said that combining cloud and edge is a very important path for future development.

Finally, Luan Jian gave his own view on the question "What is the key to the success or failure of large model applications?":

All applications have two key points. One is: where is the traffic entry point? The other is: what does user stickiness depend on?

Luan Jian believes that the large model itself is an entry point; once it is deeply integrated with the operating system, the operating system becomes the entry point; and in the final analysis, the operating system needs hardware to run on. As for user stickiness, the task is to explore how to weave large models into every part of daily life.

That is, "the entry point lies in the hardware, and the stickiness depends on the ecosystem".

New model and new opportunity

Qualcomm's Yan Chenwei: generative AI will only truly take off once it is widely deployed on the device side

Generative AI models keep growing in complexity, new applications built on foundation models keep emerging, and the number of users keeps increasing.

Against this backdrop, Yan Chenwei, senior vice president of product management at Qualcomm Technologies, argued that AI must ultimately land on the device side to achieve a real explosion.

Qualcomm gives three reasons for supporting generative AI on the device side.

First, inference with cloud AI models is expensive. When billions of users are running ever more complex models, the total cost of cloud inference rises sharply, and the economics of the cloud struggle to support large-scale expansion of generative AI.

Second, a large amount of data is generated on the device in the first place, so processing it with AI on the device is the most economical option and also better protects user privacy.

Third, some application scenarios may have no 5G data connection, such as in-cabin driver-vehicle interaction in remote areas. In those cases, local computing power is a must.

Therefore, only when devices can run use cases based on AI models, and the device side and the cloud side work well together, can generative AI be adopted at scale and reach its full potential.

In addition, large language models are becoming more and more capable, and as the underlying models improve, many use cases can run entirely on the device. This will really change the way people interact.

To achieve a breakthrough in on-device AI computing power, Yan Chenwei said, Qualcomm recently released two new platforms built specifically for generative AI: Snapdragon X Elite for PCs and the third-generation Snapdragon 8 for smartphones. He pointed out that the third-generation Snapdragon 8 can support generative AI models with up to 10 billion parameters on the device and run large language models at 20 tokens per second. Snapdragon X Elite is the most powerful computing processor Qualcomm has built for PCs so far; it supports generative AI models with more than 13 billion parameters on the device and offers 4.5 times the AI processing speed of competing products, which will continue to extend Qualcomm's lead in AI.
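To put the 10-billion and 13-billion parameter figures in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need at different precisions, and how long a reply takes at 20 tokens per second. The bit widths and the 200-token reply length are assumptions for illustration; the talk as reported here does not say which precision the platforms use.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory needed to hold the weights alone, in GB (1 GB = 1e9 bytes).

    Ignores activations, KV cache and runtime overhead.
    """
    return num_params * bits_per_weight / 8 / 1e9


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decoding speed."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical precisions: 16-bit, 8-bit and 4-bit weights.
    for params in (10e9, 13e9):
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{params / 1e9:.0f}B params at {bits}-bit: {gb:.1f} GB")
    # Hypothetical 200-token reply at the quoted 20 tokens per second.
    print(f"200 tokens at 20 tok/s: {generation_seconds(200, 20):.0f} s")
```

The point of the arithmetic is that models of this size only fit in a phone's or laptop's memory once the weights are stored at reduced precision.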

Inspur Information's Wu Shaohua: the shift in training paradigm from GPT-3 to GPT-4 requires improving algorithms and data together

Wu Shaohua, director of AI software R&D at Inspur Information, reviewed the important changes from GPT-3 to GPT-4 / ChatGPT on stage at the MEET conference.

GPT-3 is a pre-trained model that is used directly through prompting, while GPT-4 adds fine-tuning and reinforcement learning on top of pre-training, which greatly improves the model's capabilities.

"Both OpenAI and DeepMind have done a lot of work on scaling laws for large models," Wu Shaohua said, adding to the current industry consensus. "For example, for a given model structure (that is, the Transformer structure), as the number of model parameters, the amount of compute and the amount of data increase, model accuracy tends to rise. In the past, research on these scaling laws was carried out under the pre-training paradigm. Under the pre-training + fine-tuning paradigm, especially as fine-tuning becomes more and more important, it is worth rethinking how to improve algorithms and data to suit the characteristics of the different stages of pre-training and fine-tuning."

Wu Shaohua believes that the change from GPT-3 to GPT-4 training mode requires simultaneous improvement of algorithms and data.
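The trend Wu Shaohua refers to, accuracy improving as parameters and data grow, is often written as a power law. The sketch below uses one commonly cited parametric form (an irreducible loss plus one term that shrinks with model size and one that shrinks with data size). Every constant is a placeholder chosen for illustration, not a fitted value and not a figure from the talk.

```python
def predicted_loss(
    n_params: float,
    n_tokens: float,
    e: float = 1.5,      # placeholder: irreducible loss
    a: float = 300.0,    # placeholder: model-size coefficient
    alpha: float = 0.3,  # placeholder: model-size exponent
    b: float = 300.0,    # placeholder: data-size coefficient
    beta: float = 0.3,   # placeholder: data-size exponent
) -> float:
    """Power-law loss estimate: e + a / N^alpha + b / D^beta."""
    return e + a / n_params ** alpha + b / n_tokens ** beta


if __name__ == "__main__":
    # Grow the model at a fixed data budget, then grow the data at a fixed size.
    for n in (1e9, 1e10, 1e11):
        print(f"N={n:.0e}, D=1e12 -> loss ~ {predicted_loss(n, 1e12):.3f}")
    for d in (1e11, 1e12, 1e13):
        print(f"N=1e10, D={d:.0e} -> loss ~ {predicted_loss(1e10, d):.3f}")
```

A formula like this describes pre-training only, which is exactly the limitation raised in the talk: it says nothing about how the fine-tuning stage changes the picture.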

He illustrated Inspur Information's thinking on this with practical experience:

In September 2021, Inspur Information released Source 1.0 (Yuan 1.0), a large model with 245.7 billion parameters built on the classical Transformer structure, and in September 2023 it released the new Source 2.0. The main improvements between the two versions fall into three areas:

The first is the algorithm. Inspur Information proposes a new attention mechanism, LFA (Localized Filtering-based Attention), which models the local dependencies of natural language. By taking the local dependencies between words into account, it improves accuracy by 4.4% compared with a LLaMA-structure model.
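As a loose illustration of what "modeling local dependency" can mean, the sketch below mixes each token vector with its immediate predecessors through a small causal filter before any attention would be applied. This is not the LFA implementation: the function name, the kernel size and the weights are all hypothetical, and the article does not describe how LFA is actually built.

```python
from typing import List, Sequence


def causal_local_filter(
    seq: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[List[float]]:
    """Mix each token vector with its predecessors using a causal 1-D filter.

    weights[0] applies to the current token, weights[1] to the previous one,
    and so on. Positions before the start of the sequence contribute nothing,
    and no token ever sees a later token.
    """
    dim = len(seq[0])
    out: List[List[float]] = []
    for t in range(len(seq)):
        mixed = [0.0] * dim
        for j, w in enumerate(weights):
            if t - j >= 0:
                for d in range(dim):
                    mixed[d] += w * seq[t - j][d]
        out.append(mixed)
    return out


if __name__ == "__main__":
    tokens = [[1.0], [2.0], [4.0]]  # toy one-dimensional "embeddings"
    print(causal_local_filter(tokens, weights=[0.5, 0.5]))
```

The filtered vectors would then feed the usual query and key projections, so neighboring words influence each other before global attention is computed.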

The second is the improvement of data. Source 2.0 innovates in training data sources, data enhancement and synthesis methods compared with source 1.0. Source 2.0 pays more attention to improving the quality of data than simply improving the volume of data. Due to the limited resources of Chinese mathematics and code data, the training data of Source 2.0 not only comes from the Internet, but also adopts data production and filtering methods based on large models, which not only ensures the diversity of data, but also improves the data quality in each category, and obtains a number of high-quality mathematics and code pre-training data.

The third is computation. Given the large differences in P2P bandwidth among diverse heterogeneous chips, Inspur Information proposes a distributed training method based on non-uniform pipeline parallelism, which greatly reduces the demand for interconnect bandwidth between chips.
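One simple way to picture non-uniform pipeline parallelism is that pipeline stages receive different numbers of layers instead of an equal share. The sketch below splits layers in proportion to a per-stage weight. The weights and the rounding rule are assumptions for illustration; the article does not say how the actual method chooses its split.

```python
from typing import List, Sequence


def split_layers(num_layers: int, stage_weights: Sequence[float]) -> List[int]:
    """Assign layers to pipeline stages in proportion to stage_weights.

    Each stage first gets the whole-number part of its ideal share; leftover
    layers go to the stages with the largest fractional remainders.
    """
    total = sum(stage_weights)
    shares = [num_layers * w / total for w in stage_weights]
    counts = [int(s) for s in shares]
    leftover = num_layers - sum(counts)
    by_remainder = sorted(
        range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Uniform split versus a hypothetical non-uniform split of 24 layers.
    print(split_layers(24, [1, 1, 1, 1]))
    print(split_layers(24, [1, 2, 2, 3]))
```

With equal weights this reduces to ordinary uniform pipeline parallelism; unequal weights let stages with less memory or slower links carry fewer layers.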

According to Wu Shaohua, after the release of Source 2.0, Inspur Information launched the "Source 2.0 Model Co-training Plan". Developers can report capability gaps they find in the model in their application scenarios; the Inspur Information R&D team will then collect and clean the relevant data for enhancement training, and the resulting models will continue to be open-sourced.

Luchen's Bian Zhengda: distributed algorithms lower the deployment threshold and training cost of large models

The talk by Bian Zhengda, co-founder and CTO of Luchen Technology, was titled "Colossal-AI: Challenges and System Optimization for Large AI Models".

He first introduced the background of large models and the trend of rising AI model training costs. He then introduced the Colossal-AI framework, which lowers the deployment threshold and training cost of large models through distributed algorithms.

Bian Zhengda went on to describe the design of the whole framework, which includes three core technologies.

The first is an N-dimensional parallel system. Bian Zhengda's team found that although many parallel techniques were already available, ordinary users facing a concrete need found it hard to choose a suitable parallel scheme and turn it into a working deployment.

Therefore, the core idea of the Colossal-AI framework is to integrate the most efficient parallel techniques into one system, choose the appropriate parallel scheme according to each user's needs, and provide the most efficient implementation.

The second is an efficient memory management system. Bian Zhengda said that in deep learning training, the compute-heavy parts have relatively small storage overhead, while the large storage overhead is concentrated in the optimizer's parameter updates.

So their idea is to move the redundant storage overhead onto cheaper storage devices. In the Colossal-AI framework, this is realized through an adaptive management system that stores and manages parameters more efficiently.
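The claim that storage overhead is concentrated in the optimizer can be checked with a common rule of thumb for mixed-precision training with Adam: roughly 16 bytes per parameter, of which 12 belong to optimizer state. The sketch below applies that accounting and shows the effect of moving the optimizer state to host memory. The byte counts are the rule of thumb, not figures from the talk; the 7-billion-parameter size is hypothetical; and the all-or-nothing offload is a simplification, not Colossal-AI's adaptive policy.

```python
from typing import Tuple

# Bytes per parameter for mixed-precision Adam training (rule of thumb).
BYTES_PER_PARAM = {
    "fp16_weights": 2,
    "fp16_gradients": 2,
    "fp32_master_weights": 4,  # optimizer state
    "adam_momentum": 4,        # optimizer state
    "adam_variance": 4,        # optimizer state
}
OPTIMIZER_KEYS = ("fp32_master_weights", "adam_momentum", "adam_variance")


def training_memory_gb(
    num_params: float, offload_optimizer: bool = False
) -> Tuple[float, float]:
    """Return (accelerator_gb, host_gb) for model states only, no activations."""
    optimizer_bytes = sum(BYTES_PER_PARAM[k] for k in OPTIMIZER_KEYS)
    other_bytes = sum(BYTES_PER_PARAM.values()) - optimizer_bytes
    optimizer_gb = num_params * optimizer_bytes / 1e9
    other_gb = num_params * other_bytes / 1e9
    if offload_optimizer:
        return other_gb, optimizer_gb
    return other_gb + optimizer_gb, 0.0


if __name__ == "__main__":
    for offload in (False, True):
        gpu_gb, host_gb = training_memory_gb(7e9, offload_optimizer=offload)
        print(f"offload={offload}: accelerator {gpu_gb:.0f} GB, host {host_gb:.0f} GB")
```

Under these assumptions three quarters of the model-state memory is optimizer state, which is why moving it to cheaper storage frees so much accelerator memory.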

In addition, Colossal-AI implements a chunk-based management system, which provides flexible management of heterogeneous storage.
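The general idea behind chunk-based management is to group many small tensors into fixed-size chunks so that memory can be moved between devices one chunk at a time. The sketch below packs tensors into chunks in order. It is a toy illustration, not Colossal-AI's API: the function name, the tensor names and the capacity are hypothetical.

```python
from typing import List, Sequence, Tuple


def pack_into_chunks(
    tensors: Sequence[Tuple[str, int]], chunk_capacity: int
) -> List[List[str]]:
    """Pack (name, size) tensors into chunks of at most chunk_capacity, in order.

    A new chunk is opened whenever the next tensor does not fit in the
    current one.
    """
    chunks: List[List[str]] = []
    used = 0
    for name, size in tensors:
        if size > chunk_capacity:
            raise ValueError(f"{name} ({size}) exceeds chunk capacity {chunk_capacity}")
        if not chunks or used + size > chunk_capacity:
            chunks.append([])
            used = 0
        chunks[-1].append(name)
        used += size
    return chunks


if __name__ == "__main__":
    # Hypothetical tensor sizes in arbitrary units.
    params = [("embed", 4), ("attn", 3), ("mlp", 5), ("norm", 2)]
    print(pack_into_chunks(params, chunk_capacity=8))
```

Moving whole chunks rather than individual tensors keeps transfers between accelerator memory and host memory large and regular, which is easier to schedule.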

Through these system optimizations, the Colossal-AI framework greatly lowers the threshold for deploying large AI models and improves the speed of model training and inference.

Finally, Bian Zhengda shared results from applying the Colossal-AI framework in practice, including adapting the LLaMA-2 model to Chinese for less than $1,000.

Roundtable: autonomous driving could reach its "ChatGPT moment" in as little as two years

Finally came the autonomous driving roundtable, a regular fixture of the annual MEET conference. This year's topic was "How far away is the ChatGPT moment for autonomous driving?"

The ChatGPT wave has shown everyone the disruptive impact of generative AI. When will autonomous driving, long a focus of attention, have its own ChatGPT moment? The conference invited representatives of the new wave of autonomous driving startups to share their views. They are:

Tian Shan, co-founder and CTO of DeepWay, who previously led Baidu's Apollo commercial vehicle program.

Liao Ruoxue, co-founder of Qianhang Technology, who was once called a "technology ceiling" alongside Zhang Yiming.

And Huang Ze Wah, known as a "young genius of self-driving", a founder and CEO who previously co-founded TuSimple.

On this topic, Tian Shan answered from two perspectives. From the perspective of the technology itself, general-purpose, fully autonomous driving is certainly needed, but a breakthrough will take some time. From the perspective of demand, autonomous driving can be done well without large models in many restricted scenarios, but for general scenarios large models are indispensable.

Liao Ruoxue said that the key sign of reaching the "ChatGPT moment" is whether autonomous driving is widely recognized and seen. In their commercial vehicle business, more and more customers realize that self-driving can bring significant cost savings and efficiency gains.

Huang Ze Wah, on the other hand, is more optimistic, saying that the capability ChatGPT already shows exceeds the knowledge that autonomous driving itself requires.

What are the key elements for reaching the "ChatGPT moment"? Huang Ze Wah pointed to one: integrating the sub-modules to move toward an end-to-end approach. In the past, perception had to be defined by hand; with a data-driven approach based on large models, more long-tail scenarios can be solved.

Liao Ruoxue stressed the importance of a closed loop. He believes the volume of autonomous driving data is still small, so the question is how to obtain data and use it well. Large models are a good path, but where does the data come from? Only when enough customers use the product can a large amount of data be collected. In this way, technology development and commercial deployment form a mutually reinforcing closed loop.

Tian Shan described three challenges. The first is the widely recognized data challenge: differences between companies make data hard to reuse. The second is limited computing power, since end-to-end autonomous driving requires large amounts of data and compute. The third is safety: he suggested adding some human intervention to the end-to-end learning process to ensure safety.

When can we get to the ChatGPT moment? Another big factor this year is policy. The founders said that policy does a lot to boost the confidence of the whole industry, including customers and investors. But policy is not the final answer, Huang Ze Wah added: only with enough data to back it up can there be further exploration.

In that case, when can we get to the ChatGPT moment?

Liao Ruoxue thinks it is 2025.

Huang Ze Wah also believes it is about two years away.

Tian Shan, on the other hand, gave a more conservative estimate of three to five years.

More detailed write-ups of each guest's talk will follow, so stay tuned!
