Big inside story: OpenAI is about to open source a new model. Does the prosperity of the open source community depend entirely on "handouts" from big companies?

2026-02-14 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

OpenAI is going open source again: is this a "handout" from Big Tech, or the "redemption" of the open source community?

According to the latest report from The Information, OpenAI is about to release a new open source large language model.

It is not yet clear whether OpenAI intends to use the soon-to-be-released open source model to win market share from Vicuna and other open source models.

But it is almost certain that the new model will not be able to compete with GPT-4, or even GPT-3.5.

After all, OpenAI's $27 billion valuation dictates that its most advanced models be reserved for commercial use, even though the first two versions of GPT were open source.

A spokesman for OpenAI didn't respond to a request for comment.

Around ten days ago, amid the open source boom of the alpaca family, an internal Google document was leaked. In the memo, titled "We Have No Moat, And Neither Does OpenAI", the author laments the heavy blow that open source is dealing to Google and OpenAI.

Indeed, neither Google nor OpenAI seems to be the winner in this arms race because the open source community is eating up their "interests".

As soon as ChatGPT came out, it set off a global LLM revolution. However, OpenAI is not open, and many companies and developers could only look on anxiously.

At that point, Meta stepped forward and released LLaMA, to the benefit of developers all over the world.

Meta originally said that LLaMA would be released only for non-commercial research use. But just a week after the release, LLaMA's weights leaked on 4chan and were downloaded thousands of times almost immediately.

This "epic leak" directly reshaped the open source LLM field. In just a few weeks, ChatGPT alternatives of every kind proliferated.

Alpaca, Vicuna, Koala, ChatLLaMA, FreedomGPT, ColossalChat: it can fairly be called the big bang of the alpaca family.

In fact, open source models had already dented OpenAI's ambitions long before the alpacas arrived.

At that time, the newly released DALL-E 2 caused a sensation online with its striking image generation.

However, while OpenAI was still trying to sell API access, an open source alternative suddenly emerged: Stable Diffusion.

With the rapid rise of Stable Diffusion, developers quickly left DALL-E 2 behind.

Open source large models: out to upend Silicon Valley's big companies?

Ion Stoica, a computer science professor at UC Berkeley, is one of the researchers who used Meta's model to develop Vicuna.

To improve Vicuna's capabilities, Stoica and his colleagues are working to increase the amount of computation the model performs, which should help with tasks that involve reasoning, such as writing code.

Vicuna was developed by a Berkeley team with an annual budget of several million dollars, of which about $500,000 comes from publicly traded companies, including Microsoft, Google and Amazon.

Stoica said that the performance of free AI models is "quite close" to that of the proprietary models from Google and OpenAI, and that most developers will undoubtedly choose the free models in the end.

On the one hand, the open source model allows developers to use their own data to solve specific problems.

On the other hand, the training cost of a model like Vicuna can be as low as a few hundred dollars and you don't have to pay expensive fees to big companies.

https://lmsys.org/blog/2023-03-30-vicuna/

If Stoica is right, open source AI will upend the business plans of Google, OpenAI, Microsoft and other big companies that sell access to proprietary models.

Vicuna's quality and the Cambrian explosion of open source AI led Google engineer Luke Sernau to warn colleagues that Google was focusing too heavily on proprietary software in its effort to catch up with OpenAI.

If free, high-quality alternatives come with no usage restrictions, who will pay for a Google product that has them? Open source AI is outpacing us, and Google should establish itself as a leader in the open source community and give up some control over our models.

The memo quickly struck a chord across the industry. Even though Sernau may overestimate the capabilities of open source AI and underestimate its costs and risks, most practitioners agree that Meta is likely to benefit from it.

For example, Meta uses AI models internally for content recommendation and ad targeting, and when developers improve Meta's models, Meta can incorporate those improvements into its own internal AI.

Meta CEO Mark Zuckerberg has been planning for this for a long time.

In April, on a conference call with analysts, he described the company's strategy:

If the industry can standardize on the basic tools we use, then we can benefit from the improvements of others, which is even better.

Google's approach to AI software is not entirely proprietary, either.

Back in 2020, Google released an open source language model, T5, that lets developers build software for translation and summarization tasks. Google later released the more advanced Flan-T5.

However, according to Stoica and other practitioners, the software released by Meta is a significant improvement over the Google models, which makes developers far more likely to choose Meta's.

However, Stoica says Google still has two advantages when it comes to open source software.

1. If Google draws on its non-public user data, the model may perform better in certain specialized areas, such as content recommendation.

However, a Google spokesman said the company does not train its foundation models on existing user data.

2. The search company's expertise in managing large-scale computer infrastructure means it can run the model at a lower cost, including when providing services to cloud customers.

At the same time, OpenAI has a head start in collecting data on how millions of people interact with ChatGPT, which will help OpenAI improve AI software, not to mention its partnership with Microsoft.

Is the prosperity of open source a "handout" from big companies?

However, this open source prosperity rests on unstable ground.

At present, most open source projects still rely on giant models released by large, deep-pocketed companies. If OpenAI and Meta decide to close off access, the booming open source community could wither.

For example, many open source alternatives are now built on Meta's LLaMA.

Other models use a large public dataset called the Pile, compiled by the open source non-profit organization EleutherAI.

EleutherAI exists because OpenAI's earlier openness allowed a group of developers to reverse-engineer how GPT-3 was made and then create their own models in their spare time.

But everything can change.

OpenAI is no longer open, and Meta is considering restricting its open source releases to keep startups from misusing the code.

Joelle Pineau, executive director of Meta AI, said that opening up the code to outsiders is the right thing to do for now, but she was not sure that Meta would follow the same strategy over the next five years.

If this trend toward closing off continues, not only will the open source community be left behind, but the next generation of AI breakthroughs will once again belong to the largest and best-funded AI labs.

Obviously, the future of the way AI models are made and used is at a crossroads.

Had OpenAI been stingy from the start, today's open source boom would not exist. Others, meanwhile, are weighing whether this open source free-for-all will bring greater rewards or greater risks.

At the same time as Meta AI released LLaMA, Hugging Face introduced an access control mechanism that requires users to apply for access and be approved before downloading models on the platform, in order to limit access to those with legitimate reasons.

"I'm not an open source evangelist," said Margaret Mitchell, chief ethics scientist at Hugging Face. "I can see the reasons for not open-sourcing."

One of the drawbacks of the widespread use of large models is that it may lead to the proliferation of AI pornography.

Mitchell, who used to work at Google and co-founded its AI ethics team, is well aware of the risks of model abuse. She therefore supports Meta AI releasing the model in a controlled manner.

At the same time, OpenAI is turning off the tap. When it released GPT-4, it disclosed no details about the architecture (including model size), hardware, training compute, dataset construction, or training method, citing "the competitive landscape and the safety implications of large-scale models like GPT-4".

This limitation reflects a change in the mindset of OpenAI. Ilya Sutskever, co-founder and chief scientist, says OpenAI's past openness was a mistake.

"In the past, if something was open source, maybe a small group of tinkerers cared," said Sandhini Agarwal, a policy researcher at OpenAI. "But now, the whole environment has changed. Open source can really accelerate development and lead to competition."

Going back three years, if OpenAI had applied the same principle when it published the details of GPT-3, there would have been no EleutherAI and no vigorous open source innovation.

Today, EleutherAI plays an important role in the open source ecosystem. The Pile has been used to train several open source projects, including Stability AI's StableLM.

But with GPT-4, 5, and 6 locked up, the open source community may once again be left behind by several big companies.

They will be stuck with the previous generation of models, and any further progress will have to be made in isolation.

Reference:

https://www.technologyreview.com/2023/05/12/1072950/open-source-ai-google-openai-eleuther-meta/

https://www.theinformation.com/articles/open-source-ai-is-gaining-on-google-and-chatgpt

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era).
