"Not inferior to GPT-4"! Baidu releases its most powerful model, and we tested it right away

2026-02-13 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)11/24 Report--

Just now, version 4.0 of the Wenxin large model was officially released!

On stage at Shougang Park in Beijing, Robin Li (Li Yanhong) put it bluntly:

The overall capability of Wenxin Model 4.0 is not inferior to that of GPT-4.

Without further ado, let's look at the live demonstration.

It started with a prompt whose information was deliberately out of order:

I want to go back to Chengde to buy a house. Can I use a provident fund loan? What about the formalities? I work in Beijing.

Not only was the key information, "I work in Beijing", placed at the very end, but the location of the provident fund account was never stated explicitly.

However, the new version of Wenxin Yiyan was not fooled by these small traps and gave the correct answer.

On the generation side, it effortlessly produced a complete digital-human presenter video on the spot:

It also solves math problems with ease, which makes it a godsend for parents helping with homework (doge).

The new version of Wenxin Yiyan also wrote a martial arts novel on the spot. Even as more characters and dramatic conflicts were added, it did not lose track of earlier details or contradict what it had already written:

This performance really got the live audience excited.

Topics related to Wenxin Model 4.0 were immediately picked up by netizens at home and abroad.

According to the on-site presentation, Wenxin Model 4.0 is a marked step up from the 3.5 version currently online: since small-traffic testing began in September, it has improved by 30% in the past month alone.

So here is the question: is Wenxin Model 4.0 really that good? How far is it from GPT-4?

Wenxin Model 4.0 is now in invitation-only testing, and QbitAI obtained test access right away.

Let's go straight to the hands-on test.

How does it compare with GPT-4 in actual testing?

Once you have test access, switch to Wenxin Model 4.0 and you can start playing.

Compared with Wenxin Model 3.5 at launch, Wenxin Model 4.0 has gained more functions, with 8 plug-ins alone, including Yijing Liuying (text to video), Shuotu Jiehua (image understanding) and E-Yan Yitu (visual data analysis).

These plug-ins can also be freely combined to accomplish more complex tasks.

At Baidu World, Baidu focused on demonstrating practical functions of Wenxin Model 4.0 such as image-and-text creation and mathematical and logical reasoning. As usual, we take a more basic angle and measure its four "basic skills":

Understanding, generation, logic and memory.

Understanding ability, especially Chinese comprehension ability

In the first round, let's look at the comprehension ability of Wenxin Model 4.0.

Here we mainly examine how it handles "language traps" and whether it can recognize internet memes.

Let's start with a "Chinese proficiency level 10" question to see whether the model understands what "true or false" means.

Wenxin Model 4.0 is concise and gives the answer directly.

GPT-4 carefully analyzes the meaning of each sentence before giving an answer:

It is more careful, but it feels a bit like a foreigner earnestly sitting a Chinese exam (doge).

Let's make it a little harder with "the thief sneakily steals things".

Wenxin Model 4.0 quickly broke the sentence down into "thief", "sneakily" and "steal" and got its meaning:

GPT-4, however, fell into the trap: it treated the two "steals" in the middle as verbs too, and ended up losing the thief.

With language traps covered, let's look at how both sides understand internet memes.

On the home-grown meme "which Li is expensive", Wenxin Model 4.0 quickly gave the answer, with the people and events involved laid out clearly:

Without search enabled, GPT-4 does not get it, since its knowledge stops at January 2022:

But with search enabled, it quickly "keeps pace with the times" and answers the question:

By the same token, we also tried a meme that came to China from abroad.

Both Wenxin Model 4.0 and GPT-4 can answer it. Wenxin Model 4.0 is more concise, while GPT-4 delivers a whole encyclopedia entry (more detailed, but it also costs more tokens 💰 …):

On internet memes, Wenxin Model 4.0 and GPT-4 with search each have their own strengths.

Multimodal generation capability

The next round tests multimodal generation, the capability people care about most in large models.

First, let's try image generation and, along the way, test its understanding of the classical verse "An old man in a straw cloak and bamboo hat sits in a lone boat, fishing alone in the snow on the cold river."

Wenxin Model 4.0 quickly produced 4 images, consistent in style and faithful to the basic mood of the verse:

GPT-4 also used DALL·E 3 to draw four pictures in different styles:

This round is a draw.

What about video generation? Here we call the built-in plug-in of Wenxin Model 4.0. We expected only a clip of falling leaves, but it also came with matching copy, subtitles and voice-over, and with a high degree of polish:

At present, GPT-4 itself does not support video generation; it needs external plug-ins (such as CapCut) to do this.

Logical ability

Next comes our favorite: the math and logical reasoning test.

Wenxin Model 4.0 is said to have focused on upgrading its mathematical ability, so we did not hold back and went straight to the Old McDonald problem, which trips up many large models:

Old McDonald's farm has one horse, two cows and three sheep. How many more cattle must be raised on the farm so that the total number of animals is exactly twice the total number of cattle?

Wenxin Model 4.0 listed 4 unknowns in one go (doge), but its solution process is fairly rigorous and the final answer is correct.
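For reference, the problem needs only one unknown. With x extra cattle, the condition is 1 + 2 + 3 + x = 2 × (2 + x), which gives x = 2 (8 animals in total, 4 of them cattle). Below is a minimal Python check; the function name and the search limit are our own illustrative choices, not part of the demo.

```python
from typing import Optional


def extra_cattle_needed(horses: int, cows: int, sheep: int,
                        limit: int = 1000) -> Optional[int]:
    """Smallest number of extra cattle x such that
    total animals == 2 * total cattle, or None if no x <= limit works."""
    for x in range(limit + 1):
        total_animals = horses + cows + sheep + x
        total_cattle = cows + x
        if total_animals == 2 * total_cattle:
            return x
    return None


if __name__ == "__main__":
    # Old McDonald's farm: 1 horse, 2 cows, 3 sheep.
    print(extra_cattle_needed(horses=1, cows=2, sheep=3))  # prints 2
```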

Previously, we fed this problem to a batch of large models, including Claude and ChatGPT, in a side-by-side evaluation of their math ability; at that time only GPT-4 got it right.

Next, we go straight to the Ruozhiba benchmark to test logical reasoning.

On the first question, Wenxin Model 4.0 and GPT-4 both quickly gave the correct answer:

On the second question, both sides also answered quickly, and Wenxin Model 4.0 even threw in the geography mnemonic "seven parts ocean, three parts land":

It seems both sides are good at math and logic. Thumbs up.

Memory ability

One accepted yardstick for large language models is multi-turn dialogue. GPT-4 has already been through many rounds of such tests, so here we take a brief look at Wenxin Model 4.0.

First, we have it interpret a long paper; no problem:

Then we ask it to write a poem on this theme and translate it into English along the way. It handles that too:

We ask it to rhyme a bit more; no problem:

Finally, we ask about the Transformer concepts used in the poem and pick one to have its principle explained, which it also handles easily:

In addition, when we replace the concept above with "it", Wenxin Model 4.0 still follows the conversation and gives the relevant answer.

It seems that neither long-text interpretation nor multi-turn dialogue can trip up Wenxin Model 4.0.

Bonus question

After the serious testing, let's end with a bit of fun (doge).

Recently, a tricky question has been going around on social media such as Xiaohongshu to "stump people". It goes like this:

According to the Marriage Law of the People's Republic of China, which couple can get married?

A. Lin Daiyu and Jia Baoyu

B. Jia Lian and You Erjie

C. Yang Guo and Xiaolongnü

D. Zhang Qiling and Wu Xie

At first glance the answer is really not obvious, so we might as well give it to Wenxin Model 4.0 and GPT-4.

The answer from Wenxin Model 4.0 is well reasoned. There is still a small bug in the details, but overall it is fine.

However, when we put this question to GPT-4, it paused for a long time and then lapsed into its mother tongue (doge).

GPT-4 thinks that option D is correct.

We tried again. This time GPT-4 answered in Chinese, but it started dodging the question. For every option, its answer was:

In reality, their marriage qualification depends on whether they meet the provisions of China's marriage law.

At this point, a brief summary:

Overall, Wenxin Model 4.0 does not lose ground to GPT-4 in comprehensive ability, especially in Chinese comprehension and general knowledge.

So how did such a model come about?

How is Wenxin Model 4.0 refined?

First, let's look at how far Wenxin Model 4.0 has "self-evolved".

According to Wang Haifeng, CTO of Baidu, the creation, programming, problem-solving and planning abilities that large models show actually rest on four core basic capabilities:

Understanding, generation, logic and memory.

Compared with version 3.5, all four basic capabilities of Wenxin Model 4.0 have improved considerably, with the biggest gains in logic and memory.

The improvement in logic is nearly 3 times that in understanding, and the improvement in memory is more than 2 times that in understanding:

Take using a large model to write code as an example.

Many Baidu employees now use Comate, a coding application built on large models, with an average code adoption rate of 40%, rising to 60% among high-frequency users.

Already 20% of Baidu's new code each day is generated by Comate, and the share is still growing.

So how was Wenxin Model 4.0, the model behind Wenxin Yiyan, actually built?

According to Wang Haifeng, the core architecture still carries over from Wenxin Model 3.0 and 3.5, including the supervised fine-tuning and reinforcement learning from human feedback introduced with 3.0, and the knowledge point enhancement, logical reasoning enhancement and plug-in mechanism of 3.5.

However, the technical improvements of Wenxin Model 4.0 can be summed up in three "mores":

More computing power, more data, stronger algorithms.

For training, the PaddlePaddle platform can now run on a ten-thousand-card compute cluster, relying on cluster infrastructure, the scheduling system and hardware-software co-optimization to support large-scale, stable and efficient training. It also uses incremental parameter tuning, part of its renewable training technology, to save training resources and time.

Based on this technology, the training algorithm efficiency of the Wenxin model series has improved 3.6-fold since March, with a weekly effective training rate above 98%:

On data, the team built a multi-dimensional data system that runs from data mining and analysis through synthesis and labeling to evaluation, forming a "pipeline" that further improves training results.

On algorithms, multi-stage alignment is carried out based on supervised fine-tuning, preference learning and reinforcement learning, so that the large model aligns better with human judgment and choices.

Among them, there are two key technical details.

The first is knowledge point enhancement.

In the past, large models might apply knowledge point enhancement at only one stage, but Baidu now applies it at both input and output.

On the input side, the system first understands the user's question, breaks out the knowledge points needed to answer it, retrieves that knowledge from the search engine, knowledge graph and database, and generates a first result.

On the output side, the first result is analyzed and "double-checked" against the search engine, knowledge graph and database to correct errors.
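The two-sided flow described above can be sketched roughly as follows. This is only an illustrative toy under our own assumptions, not Baidu's implementation: the `KNOWLEDGE` dictionary stands in for the search engine, knowledge graph and database, keyword matching stands in for knowledge point extraction, and every function name is hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the search engine, knowledge graph and database.
KNOWLEDGE: Dict[str, str] = {
    "capital of france": "Paris",
    "boiling point of water": "100 degrees Celsius at sea level",
}

Generator = Callable[[str, List[str]], str]


def knowledge_points(question: str) -> List[str]:
    """Break the question into the knowledge points it touches (toy keyword match)."""
    lowered = question.lower()
    return [point for point in KNOWLEDGE if point in lowered]


def enhance_input(question: str, generate: Generator) -> str:
    """Input side: retrieve a fact per knowledge point, then draft a first result."""
    facts = [f"{point}: {KNOWLEDGE[point]}" for point in knowledge_points(question)]
    return generate(question, facts)


def enhance_output(question: str, draft: str) -> str:
    """Output side: double-check the draft against the same sources and patch misses."""
    for point in knowledge_points(question):
        fact = KNOWLEDGE[point]
        if fact not in draft:
            draft = f"{draft} [corrected: {point} is {fact}]"
    return draft


def answer(question: str, generate: Generator) -> str:
    """Run input-side enhancement, then output-side verification."""
    return enhance_output(question, enhance_input(question, generate))


def careless_model(question: str, facts: List[str]) -> str:
    """Toy generator that ignores the retrieved facts, to simulate an error."""
    return "Lyon"


def faithful_model(question: str, facts: List[str]) -> str:
    """Toy generator that simply repeats the retrieved facts."""
    return "; ".join(facts) if facts else "I don't know."


if __name__ == "__main__":
    q = "What is the capital of France?"
    print(answer(q, careless_model))
    print(answer(q, faithful_model))
```

The point of the second pass is that even when the generator ignores what was retrieved, the output-side check still has a chance to catch and correct the error.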

The second is the agent mechanism.

The book Thinking, Fast and Slow divides the cognitive system into System 1 (fast but error-prone) and System 2 (slow but more rational and accurate).

Following this idea, Baidu has built a System 2 on top of the large model.

In other words, rather than having the large model give the answer directly, it now further learns to understand, plan, reflect and evolve, so that its execution is more reliable, it can even evolve by itself, and its thinking process is "white-boxed".
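To make the idea concrete, here is a minimal propose-check-retry loop in Python that records every step in a readable trace. It is a sketch under our own assumptions rather than Baidu's agent design: `propose` plays the fast System 1 guess, `check` plays the slower reflective step, and all names and the toy task are hypothetical.

```python
from typing import Callable, List, Tuple

Task = Tuple[int, int]


def solve_with_reflection(
    task: Task,
    propose: Callable[[Task, str], int],
    check: Callable[[Task, int], str],
    max_rounds: int = 3,
) -> Tuple[int, List[str]]:
    """Propose an answer, reflect on it, and retry with the feedback.

    Returns the last answer and a step-by-step trace (the "white box").
    """
    trace: List[str] = []
    feedback = ""
    answer = propose(task, feedback)
    for round_no in range(1, max_rounds + 1):
        trace.append(f"round {round_no}: proposed {answer}")
        feedback = check(task, answer)
        if not feedback:
            trace.append(f"round {round_no}: accepted")
            break
        trace.append(f"round {round_no}: reflection -> {feedback}")
        if round_no < max_rounds:
            answer = propose(task, feedback)
    return answer, trace


def propose(task: Task, feedback: str) -> int:
    """Toy solver: a hasty first guess (sum), a careful one after feedback (product)."""
    a, b = task
    return a * b if feedback else a + b


def check(task: Task, answer: int) -> str:
    """Toy reflection step: an empty string means the answer is accepted."""
    a, b = task
    return "" if answer == a * b else "the task asks for the product, not the sum"


if __name__ == "__main__":
    result, steps = solve_with_reflection((3, 4), propose, check)
    print(result)
    for line in steps:
        print(line)
```

Returning the trace alongside the answer is what makes the process inspectable: a reader can see which guess was rejected and why.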

These two techniques have also driven a rapid improvement in Wenxin Model 4.0, by 30% in the past month alone.

Such technology has also led to rapid growth in the number of users and developers of Wenxin Model 4.0.

So far, Wenxin Yiyan has reached 45 million users and 54,000 developers, covering more than 4,300 scenarios, with 825 applications and more than 500 plug-ins connected.

Beyond the technology, what is more noteworthy is that, as revealed at Baidu World, Wenxin Model 4.0 has been used to fully rebuild dozens of applications such as Baidu Search, GBI, Wenku, Netdisk and Maps.

The curtain rises on the AI-native era

Why do we say that? Speaking at Baidu World, Robin Li stressed:

The emergence of intelligence brought about by large models is the basis for developing AI-native applications. Likewise, without a rich set of AI-native applications built on the foundation model, the foundation model has no value.

Coincidentally, in "Generative AI's Act Two", Sequoia Capital also argues that the generative AI market is entering its "second act":

Hype and flashy demos are being replaced by real value and complete product experiences.

The underlying logic is simple: the importance of the underlying technology is beyond doubt, but cutting-edge technology has to be applied before it creates real value in people's lives.

If large models have set off a sea change in human-computer interaction, then AI-native applications are the concrete form of pure natural-language interaction.

As Baidu demonstrated on the spot, data analysis can now be done just by talking:

Ask a question about any data directly and the AI can produce a specific analysis within minutes, with no need for manual cross-database, cross-table analysis.

In office software such as Ruliu, describe your travel plan and the AI super assistant arranges hotels and travel right away.

Generating a PPT from a document takes a single sentence, turning products such as Baidu Wenku into "the best starting point for producing content".

Everyday apps we know well, such as Netdisk and Maps, also offer new experiences built on large-model capabilities.

For example, Netdisk can extract key content directly from a video.

Or you can tell the AI in Maps to book a restaurant.

With this launch, Baidu has in effect shown large models permeating applications across the board, lifting a corner of the curtain on the AI-native era.

And the first-mover advantage of Baidu being "the first to rebuild all its products with a large model" is starting to show on a larger scale.

Robin Li revealed that Baidu's large model technology has been applied in manufacturing, energy, electric power, chemicals, transportation and other real-economy industries, with 17,000 enterprises taking part, and that large models are becoming an important driving force for new industrialization.

From the release of Wenxin Yiyan in March, to the update to the Wenxin 3.5 model, to the debut of 4.0 now, the iteration speed of Baidu's Wenxin model has been anything but slow.

Behind this is not only the fierce competition among domestic large models as they move from technical demos to real applications, but also a reflection of Baidu's deep technical accumulation in large models.

And with the arrival of Wenxin Model 4.0 and Baidu's AI-native applications, the direction of competition in the next stage of the large-model field is becoming increasingly clear.

As Robin Li said:

We are about to enter an AI-native era, an era of human-computer interaction through prompts.

In this process, both the rapid catch-up of domestic large models in basic capabilities and the active push into AI-native application development are exciting.

The AI-native era, at every level, is more and more worth looking forward to.
