Baidu Wenxin Model 4.0 revealed: the largest parameter count ever trained on a Wanka (10,000-GPU) cluster, possibly arriving as soon as next week

2025-02-27 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 11/24 report --

CTOnews.com, October 9 news: yesterday, the Financial Associated Press revealed that Baidu's Wenxin large model 4.0 is in intensive training and close to release. Today, CTOnews.com has also obtained more information about Wenxin 4.0, including key details such as its underlying architecture, infrastructure, training data set, and cost.

Let's start with the core conclusions:

1. Yesterday's revelations are basically accurate. From what has been learned so far, Wenxin 4.0 has indeed been undergoing small-traffic testing.

2. Wenxin 4.0 has a larger parameter count than any LLM whose parameters have been publicly disclosed, and it is also the first large model in China to be trained on a Wanka (10,000-GPU) cluster.

3. Its inference cost is much higher than Wenxin 3.5's, rumored to be roughly 8-10 times as high. (Large models really are expensive.)

Next, let's look at the details of the revelations.

The largest-parameter model ever trained on a Wanka cluster?

According to the information obtained by CTOnews.com, the parameter count of Wenxin 4.0 is larger than that of any LLM whose parameters have been publicly disclosed, which means it is expected to exceed the trillion-parameter level.

Looking at the parameter count alone, many people may find this unremarkable; after all, according to current leaks, GPT-4's parameter count is around 1.8 trillion. However, the source further said that Wenxin 4.0 is still a single dense model and does not adopt the Mixture-of-Experts (MoE) architecture used by GPT-4 and many other large language models.

The "genius hacker" George Hotz has claimed that GPT-4 adopted a mixture model precisely because a single model's parameter count could not be pushed beyond roughly 220 billion: OpenAI wanted the model to keep getting better, but simply training it for longer yields diminishing returns.

Therefore, if Baidu has achieved a breakthrough with a single dense model, whether the model's capability improves significantly can only be judged after its actual release.
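As background on the dense-versus-MoE distinction discussed above, here is a minimal sketch in PyTorch. It is not tied to Wenxin or GPT-4; the dimensions, expert count, and top-2 routing are illustrative assumptions. It shows how a dense feed-forward layer applies the same weights to every token, while an MoE layer routes each token to only a few experts, so total parameters grow with the number of experts while per-token compute stays roughly constant.

```python
# Illustrative sketch only: contrasts a dense feed-forward layer with a simplified
# Mixture-of-Experts layer. All sizes and the top-2 routing are assumptions,
# not details of Wenxin 4.0 or GPT-4.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseFFN(nn.Module):
    """Every token passes through the same weights (a 'single model' layer)."""
    def __init__(self, d_model=512, d_hidden=2048):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                                 nn.Linear(d_hidden, d_model))

    def forward(self, x):
        return self.net(x)

class MoEFFN(nn.Module):
    """Each token is routed to its top-k experts: total parameters scale with
    the number of experts, but per-token compute stays close to a dense layer."""
    def __init__(self, d_model=512, d_hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([DenseFFN(d_model, d_hidden) for _ in range(n_experts)])
        self.router = nn.Linear(d_model, n_experts)
        self.top_k = top_k

    def forward(self, x):                                  # x: (batch, seq, d_model)
        gate = F.softmax(self.router(x), dim=-1)           # routing probabilities per token
        weights, idx = gate.topk(self.top_k, dim=-1)       # keep only the top-k experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)                  # tokens whose k-th choice is expert e
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

if __name__ == "__main__":
    x = torch.randn(2, 16, 512)
    print(DenseFFN()(x).shape, MoEFFN()(x).shape)          # both: (2, 16, 512)
```

A production MoE also needs load-balancing losses and expert capacity limits; the point of the sketch is only the parameter-versus-compute trade-off that makes a trillion-parameter single dense model notable.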

For a model with so many parameters, the compute requirements are bound to be substantial. The current news is that Wenxin 4.0 is being trained on a Wanka (10,000-GPU) AI cluster, which would also make it the first large language model in China trained on a cluster of that scale.

What does a Wanka cluster imply? At present, only Huawei and Alibaba have disclosed that they have built 10,000-GPU AI clusters in China, but we have not yet seen specific models trained on them.

This shows that building a Wanka cluster is not easy, and putting one to full use is even harder. According to the analysis, it is precisely the deep integration with PaddlePaddle that makes it possible to train a model of this scale on a Wanka cluster.

Costs have soared, and a low-key small-traffic public test is already underway

Not only has the training cost gone up; Wenxin 4.0's inference cost has also risen compared with 3.5. CTOnews.com has not yet obtained the specific inference cost per thousand tokens, but it is rumored to be roughly 8-10 times that of the previous version, and that is under high utilization (MFU). If utilization is lower, the cost is estimated to rise further.

Finally, according to internal employees, Baidu has indeed begun quietly testing Wenxin 4.0 with a small amount of traffic, and a small number of Wenxin users are already using the latest version of the model; an official announcement could come as soon as next week.

Many people consider this claim fairly credible, and it is echoed by some recent posts from the tech community. Perhaps when you ask Wenxin a question now, you are already using Wenxin 4.0; whether the results can rival GPT-4 remains to be seen.

CTOnews.com stresses once again that the above information has not been officially confirmed; readers should judge its accuracy for themselves.
