Open-source large model surpasses GPT-3.5! Measured results of the MoE model released; netizens: OpenAI is losing its moat



A mysterious magnet link set the entire AI community abuzz, and now the official results have finally arrived:

Mixtral 8x7B, the first open-source MoE large model, matches or even exceeds the level of Llama 2 70B and GPT-3.5.

(Yes, this is the same architecture that GPT-4 is rumored to use.)

And because it is a sparse model, only about 12.9B parameters are active for each token, so its inference speed and cost are comparable to those of a 12.9B dense model.

As soon as the news broke, it set off another round of heated discussion on social media.

Andrej Karpathy, a founding member of OpenAI, showed up right away with organized notes, highlighting that the strongest model this "European version of OpenAI" has revealed so far is only its "medium cup".

P.S.: Mixtral 8 × 7B is merely the small cup.

Nvidia AI scientist Jim Fan praised:

More than a dozen new models pop up every month, but few really stand the test, let alone attract the attention of the big names.

In this wave, not only has Mistral AI, the company behind the model, attracted a great deal of attention; MoE (Mixture of Experts) has also once again become the hottest topic in the open-source AI community.

HuggingFace struck while the iron was hot and published an explainer blog post on MoE, which was itself widely retweeted.

It is worth noting that the latest valuation of Mistral AI has exceeded $2 billion, a more than sevenfold increase in just six months.

Basically surpasses Llama 2 70B

Mistral AI is also a company that does not do things the usual way. The big labs next door hold a press conference first and then slowly roll out the model; Mistral simply reversed the whole process:

It first dropped a magnet link to open up downloads, then submitted a PR to the vLLM project (a large-model inference acceleration tool), and only afterward got around to publishing a technical blog post to formally announce its model.

△ This is how the model was first released.

Now, let's take a look at what the official announcement actually reveals, and how it differs from the details onlookers have dug up on their own over the past two days.

First of all, the official announcement states confidently:

Mixtral 8 × 7B outperforms Llama 2 70B on most benchmarks, with 6x faster inference.

It is the strongest open-weight model with a permissive license, and also the best choice in terms of cost-performance.

Specifically, Mixtral is a decoder-only model that uses a sparse mixture-of-experts network, in which each feed-forward block picks from eight distinct groups of parameters:

In other words, Mixtral 8 × 7B is not a collection of eight 7B-parameter models; rather, each Transformer layer contains eight different feed-forward blocks (the "experts").

This is why Mixtral's parameter count is 46.7B rather than 56B.
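For readers unfamiliar with the architecture, here is a minimal PyTorch sketch of a sparse mixture-of-experts feed-forward block with top-2 routing. The class name, layer sizes and routing loop are illustrative assumptions for exposition, not Mixtral's actual implementation; only the attention layers (not shown) are shared, which is why total parameters exceed active parameters per token.

```python
# Minimal sketch of a sparse MoE feed-forward block (illustrative, not Mixtral's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # A router scores every token against all experts.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Eight independent feed-forward "experts".
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        scores = self.router(x)                    # (tokens, n_experts)
        weights, chosen = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)       # normalize over the chosen experts
        out = torch.zeros_like(x)
        # Each token is processed only by its top-k experts, so only a fraction
        # of the total parameters is active per token.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: route a batch of 4 token vectors through the block.
block = MoEFeedForward()
print(block(torch.randn(4, 512)).shape)            # torch.Size([4, 512])
```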

Its main features include:

Outperforms Llama 2 70B on most benchmarks, and even beats GPT-3.5

Context window of 32k tokens

Handles English, French, Italian, German and Spanish

Excellent performance in code generation

Released under the Apache 2.0 license (free for commercial use)

The specific test results are as follows:

In addition, Mixtral performs better than Llama 2 70B on hallucination problems.

Its score on the TruthfulQA benchmark is 73.9% vs. 50.2%; it shows less bias on the BBQ benchmark; and on BOLD, Mixtral displays more positive sentiment than Llama 2.

Alongside the base version of Mixtral 8 × 7B, this release also includes Mixtral 8x7B Instruct. The latter, optimized with SFT and DPO, scores 8.3 on MT-Bench, comparable to GPT-3.5 and better than other large open-source models.

Mistral has also officially announced the launch of its API service, but it is currently invite-only; uninvited users need to join the waitlist.

It is worth noting that the API is divided into three tiers:

Mini cup (Mistral-tiny): the corresponding model is Mistral 7B Instruct

Small cup (Mistral-small): the corresponding model is the Mixtral 8 × 7B released this time

Medium cup (Mistral-medium): the corresponding model has not yet been released, but officials say it scores 8.6 on MT-Bench

Some netizens went straight to comparing it with GPT-4. As you can see, the medium-cup model scores higher than GPT-4 on WinoGrande (a commonsense reasoning benchmark).

In terms of price, going from the smallest to the medium tier, input costs range from 0.14 to 2.5 euros per million tokens and output costs from 0.42 to 7.5 euros per million tokens, while the embedding model costs 0.10 euros per million tokens.
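To make the quoted prices concrete, here is a tiny sketch that estimates the cost of a hypothetical request at the two price endpoints given above; the 10,000 input / 2,000 output token counts are made up purely for illustration.

```python
# Cost estimate per request at the quoted per-million-token prices (illustrative).
PRICES_EUR_PER_M = {            # (input, output) in euros per million tokens
    "mistral-tiny":   (0.14, 0.42),
    "mistral-medium": (2.50, 7.50),
}
input_tokens, output_tokens = 10_000, 2_000   # hypothetical request size
for tier, (p_in, p_out) in PRICES_EUR_PER_M.items():
    cost = input_tokens / 1e6 * p_in + output_tokens / 1e6 * p_out
    print(f"{tier}: {cost:.4f} EUR")
# mistral-tiny: 0.0022 EUR
# mistral-medium: 0.0400 EUR
```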

At present, the online version can only be experienced on third-party platforms (Poe, HuggingFace, etc.).

Can read Chinese, but reluctant to speak it

Although the official announcement does not mention Chinese support, our hands-on test (using the HuggingFace Chat online version and the Instruct model) shows that Mixtral already has some Chinese ability, at least at the comprehension level.

At the generation level, Mixtral does not tend to answer in Chinese, but it will reply in Chinese if explicitly asked to, though some mixing of Chinese and English still occurs.

Faced with trick questions in the style of the "Ruozhiba" forum, Mixtral's answers are rather stiff, but it at least appears to grasp the literal meaning.

In mathematics, Mixtral's answer to the classic "chickens and rabbits in a cage" problem is completely correct, from the working to the result.
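For readers unfamiliar with the problem, here is a worked instance using the traditional numbers (35 heads, 94 legs); the article does not say which figures were used in the actual test.

```python
# Classic "chickens and rabbits in a cage" problem, traditional numbers (illustrative).
heads, legs = 35, 94
# Each animal has one head; chickens have 2 legs, rabbits 4:
#   chickens + rabbits = heads
#   2*chickens + 4*rabbits = legs
rabbits = (legs - 2 * heads) // 2   # eliminate chickens from the system
chickens = heads - rabbits
print(chickens, rabbits)            # 23 12
```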

Even for more advanced mathematics, such as the differentiation of complicated functions, Mixtral gives the correct answer, and, more commendably, the working contains no mistakes.

And since the official announcement specifically emphasized Mixtral's strong coding ability, it naturally became a focus of our testing.

On a hard-difficulty LeetCode problem, the code Mixtral gave passed the tests in one go.

Given an unsorted integer array nums, find the smallest positive integer that does not appear in it.

Please implement a solution that runs in O(n) time and uses only constant extra space.
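For reference, the problem quoted above is LeetCode's classic "First Missing Positive". Below is a minimal sketch of one standard O(n)-time, O(1)-extra-space approach (in-place index placement); it is not the code Mixtral actually produced.

```python
# In-place index placement: put each value v in 1..n at index v-1, then scan.
from typing import List

def first_missing_positive(nums: List[int]) -> int:
    n = len(nums)
    for i in range(n):
        # Keep swapping until the value at i is out of range or already in place.
        while 1 <= nums[i] <= n and nums[nums[i] - 1] != nums[i]:
            j = nums[i] - 1
            nums[i], nums[j] = nums[j], nums[i]
    # The first slot that does not hold index+1 reveals the missing positive.
    for i in range(n):
        if nums[i] != i + 1:
            return i + 1
    return n + 1

print(first_missing_positive([3, 4, -1, 1]))  # 2
```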

But as we kept asking questions, Mixtral's answers inadvertently revealed that it may have been trained specifically on LeetCode, and on the Chinese version of the site at that.

To test Mixtral's coding ability more realistically, we instead asked it to write a practical utility: a web calculator in JavaScript.

After several rounds of adjustment, the basic four arithmetic operations all work, although the button layout is a bit odd.

We also found that if we keep adding new requirements within the same conversation window, Mixtral's performance may degrade, with issues such as garbled code formatting; it returns to normal after a new conversation is started.

In addition to the API and the online version, Mistral AI also offers model downloads for local deployment, via the magnet link posted on 𝕏 or via Hugging Face.

On 𝕏, many netizens have already got Mixtral running on their own devices and shared performance figures.

On an Apple M3 Max machine with 128GB of memory, running Mixtral at 16-bit floating-point precision consumes 87GB of memory and generates 13 tokens per second.

Meanwhile, some netizens reached 52 tokens per second using llama.cpp on an M2 Ultra.

Seeing all this, how would you rate the strength of Mistral AI's models?

Many netizens are already excited:

"OpenAI does not have a moat", it looks sure to become a reality.

Bear in mind that Mistral AI was only founded in May of this year.

In just six months, it has reached a $2 billion valuation on the one hand and stunned the entire AI community with its models on the other.

More importantly, Princeton PhD student Tianle Cai analyzed the correlation between the weights of the Mistral-7B and Mixtral-8x7B models, demonstrating that the former was successfully reused in the latter.

Later, netizens noticed that a founder of Mistral AI had personally confirmed that the MoE model was indeed made by copying the 7B base model eight times and then training it further.

With such models free for commercial use, the entire open-source community and new startups can build on them to advance the development of MoE models, much like the wave that Llama set off.

As an onlooker, I can only say:

Reference links:

[1] https://mistral.ai/news/mixtral-of-experts/

[2] https://mistral.ai/news/la-plateforme/

[3] https://huggingface.co/blog/mixtral#about-the-name

This article is from the WeChat official account QbitAI (量子位, ID: QbitAI); authors: Creasy, Fish and Sheep.
