Is GPT-5 here? OpenAI is reported to be speeding up training of its multimodal large model Gobi, aiming to beat Google's Gemini in one stroke

Updated 2025-01-25. From: SLTechnology News&Howtos (shulou), IT Information

Shulou(Shulou.com)11/24 Report--

[Xin Zhiyuan introduction] Word is getting around on the multimodal model battlefield. According to foreign media reports, OpenAI is preparing a new multimodal model called Gobi, and a showdown between Google and OpenAI appears to be imminent.

As autumn approaches, the multimodal model battle between Google and OpenAI has reached a white-hot phase. Just last week, Google gave a few outside companies access to its multimodal large model, Gemini.

OpenAI, of course, is not sitting idle. It is racing against the clock to integrate multimodal capabilities into GPT-4 and to launch a large multimodal model comparable to Gemini, hoping to beat Google to the punch.

The much-talked-about multimodal capability was first demonstrated at OpenAI's GPT-4 launch event in March this year, which made headlines around the world:

Draw a sketch on paper, take a photo, and send it to GPT-4 with the request "give me a website according to this layout," and it immediately writes the web page code.

▲ Greg Brockman personally gave the demonstration online.

After that, however, multimodality seemed like a flash in the pan: no one saw the feature in an actual product again.

So is the multimodal war between Google and OpenAI finally coming?

Against Google, OpenAI scrambles to launch a multimodal model

Faced with rumors that Google is about to release its big weapon, OpenAI is certainly not standing by.

According to The Information, a new multimodal model called Gobi is under intensive preparation. OpenAI plans to launch a multimodal LLM before Gemini is released, so as to get ahead of Google.

▲ OpenAI's Greg Brockman vs. Google's Demis Hassabis

In fact, after previewing GPT-4's multimodal feature in March, OpenAI made it available to one company, Be My Eyes, but not to any others. As the name suggests, that company develops technology to help blind or visually impaired people.

OpenAI now plans to roll out the feature, called GPT-Vision, more widely.

Why has it taken OpenAI so long? The main reason is concern that the new vision capabilities could be misused by bad actors, for example to impersonate humans by automatically solving CAPTCHAs, or to track people through facial recognition.

However, OpenAI engineers seem to have addressed these legal and safety risks. Similarly, a Google spokesperson said that Google has taken measures to prevent Gemini from being abused.

In a commitment made in July, Google pledged to develop artificial intelligence responsibly across all of its products.

Could Gobi be GPT-5?

After GPT-Vision, OpenAI is likely to launch a more powerful multimodal large model, codenamed Gobi. Unlike GPT-4, Gobi is being built as a multimodal model from the start.

So, is Gobi the legendary GPT-5?

Right now, we don't know. There is no definite information on how far Gobi has been trained.

In early September, Mustafa Suleyman, co-founder of DeepMind and now CEO of Inflection AI, dropped a bombshell in an interview: he guessed that OpenAI was secretly training GPT-5.

Suleyman suggested that Sam Altman may not have been telling the truth when he recently said that OpenAI was not training GPT-5. (His original words: "Come on. I don't know. I think it's better that we're all just straight about it.")

Meanwhile, according to people who have tried it, Gemini produces fewer hallucinations than existing models. The reasons are detailed below.

In short, the multimodal model war between Google and OpenAI can be said to be an AI version of iPhone versus Android.

One is the Silicon Valley giant that has dominated the AI field for many years, and the other is a top-tier AI start-up. Everyone is holding their breath.

Google is secretly testing Gemini

On the other side, Google has also begun inviting some external developers to speed up testing of Gemini, its upcoming next-generation multimodal model.

Last week, The Information reported exclusively that Gemini may soon be ready for a test release and be integrated into services such as Google Cloud Vertex AI.

At this year's Google I/O developer conference, Sundar Pichai publicly introduced Gemini as a multimodal model that is efficient at tool and API integration.

To concentrate its efforts, Google has also merged Google Brain with DeepMind.

It is said that at least 20 executives are involved in the development of Gemini, led by Demis Hassabis, co-founder of DeepMind, and Sergey Brin, co-founder of Google.

Hundreds of Google DeepMind employees are also involved, including Jeff Dean, the former head of Google Brain.

One person who has tested it says Gemini has at least one advantage over GPT-4: in addition to public information on the web, the model uses a large amount of proprietary data from Google's consumer products (Search, YouTube).

As a result, Gemini should be particularly accurate in understanding the user's intentions for a particular query, and it seems to produce fewer wrong answers, or hallucinations.

According to earlier reporting by SemiAnalysis analysts, Google's next-generation large model, Gemini, has begun training on the new TPUv5 pods, with total training compute as high as 1e26 FLOPs, about five times the compute used to train GPT-4.

In addition, Gemini's training data reportedly includes subtitles from 93.6 billion minutes of YouTube video, and the total dataset is about twice the size of GPT-4's.
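To put the reported figures in perspective, here is a small back-of-envelope calculation. It is only a sketch: it assumes the 1e26 figure is a total operation count for the training run and that "five times" means a 5x ratio, neither of which is confirmed by Google or OpenAI.

```python
# Back-of-envelope arithmetic on the reported (unconfirmed) figures.
GEMINI_TRAIN_FLOPS = 1e26   # reported training compute, read here as a total count
RATIO_TO_GPT4 = 5           # "five times" GPT-4, read here as a 5x ratio
SUBTITLE_MINUTES = 93.6e9   # reported minutes of subtitled YouTube video

implied_gpt4_flops = GEMINI_TRAIN_FLOPS / RATIO_TO_GPT4
subtitle_hours = SUBTITLE_MINUTES / 60
subtitle_years = subtitle_hours / (24 * 365)

print(f"Implied GPT-4 training compute: {implied_gpt4_flops:.1e} FLOPs")
print(f"Subtitled video: {subtitle_hours:.2e} hours, about {subtitle_years:,.0f} years")
```

Under these assumptions, the implied GPT-4 figure is on the order of 2e25 operations, and the video corpus amounts to well over a hundred thousand years of footage.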

It is said that Google's next-generation large model will also come in multiple sizes, possibly using a mixture-of-experts (MoE) architecture and speculative sampling. In speculative sampling, a small model generates tokens in advance and passes them to the large model for evaluation, which speeds up the model's overall inference.
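The idea of speculative sampling can be sketched in a few lines. This is a toy illustration, not Google's implementation: `draft_model` and `target_model` are hypothetical stand-ins that return fixed probability tables over a four-token vocabulary, where a real system would call a small and a large neural network.

```python
import random

VOCAB_SIZE = 4

def draft_model(context):
    """Hypothetical small model: cheap to run, only roughly right."""
    return [0.4, 0.3, 0.2, 0.1]

def target_model(context):
    """Hypothetical large model: the distribution we actually want to sample from."""
    return [0.25, 0.25, 0.25, 0.25]

def sample(weights, rng):
    # random.choices accepts unnormalized weights.
    return rng.choices(range(VOCAB_SIZE), weights=weights, k=1)[0]

def speculative_step(context, k, rng):
    """Draft k tokens with the small model, then verify them with the large one."""
    # 1. The draft model proposes k tokens one after another.
    proposed, draft_probs = [], []
    ctx = list(context)
    for _ in range(k):
        q = draft_model(ctx)
        tok = sample(q, rng)
        proposed.append(tok)
        draft_probs.append(q)
        ctx.append(tok)

    # 2. The target model scores every position (one batched pass in a real system).
    output = []
    ctx = list(context)
    for tok, q in zip(proposed, draft_probs):
        p = target_model(ctx)
        # Accept the drafted token with probability min(1, p/q).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            output.append(tok)
            ctx.append(tok)
        else:
            # On rejection, resample from the residual max(0, p - q) and stop.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            output.append(sample(residual, rng))
            return output

    # 3. Every draft was accepted: take one extra token from the target model.
    output.append(sample(target_model(ctx), rng))
    return output

if __name__ == "__main__":
    print(speculative_step(context=[], k=4, rng=random.Random(0)))
```

Each call returns between 1 and k + 1 tokens. The accept-or-resample rule is what keeps the output distributed as if it had been sampled from the large model alone, while most of the token-by-token work is done by the small one.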

Hassabis, head of Google DeepMind, said in an interview that developing Gemini is expected to cost tens of millions to hundreds of millions of dollars, comparable to the cost of developing GPT-4.

Gemini will integrate the technologies used in AlphaGo, which will give the system a new ability to plan and solve problems.

In Hassabis's description, Gemini combines some of the strengths of AlphaGo-type systems with the amazing language capabilities of large language models, along with some other interesting innovations.

The technology behind AlphaGo is reinforcement learning, a technique that DeepMind has done much to advance.

RL agents interact with the environment over time and learn strategies through trial and error to maximize long-term cumulative rewards.

Through reinforcement learning, an AI system adjusts its behavior by trial and error in response to feedback, and thereby learns to handle hard problems such as choosing the next move in Go or in a video game.
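The trial-and-error loop described above can be illustrated with tabular Q-learning, one of the simplest reinforcement learning algorithms. This is a minimal sketch on a made-up five-state corridor, not the method used in AlphaGo or Gemini; the environment, the reward, and the hyperparameters are all assumptions chosen for illustration.

```python
import random

N_STATES = 5          # states 0..4; reaching state 4 ends the episode with reward 1
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right

def step(state, action):
    """Toy environment: move along the corridor, clamped at the left wall."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the current estimate, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            nxt, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward plus discounted future value.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q

if __name__ == "__main__":
    for s, (left, right) in enumerate(train()):
        print(f"state {s}: left={left:.3f} right={right:.3f}")
```

After training, the value of moving right exceeds the value of moving left in every non-terminal state, so the greedy policy walks straight to the reward.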

In addition, AlphaGo uses Monte Carlo tree search (MCTS) to explore and remember promising moves on the board, rather than enumerating every possible one.
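To show what Monte Carlo tree search does, here is a minimal sketch on a much smaller game than Go: a pile of stones from which players alternately take one to three, with the player who takes the last stone winning. The game, the `Node` class, and the UCT exploration constant are illustrative assumptions; AlphaGo additionally guides the search with neural networks, which this sketch omits.

```python
import math
import random

class Node:
    def __init__(self, stones, parent=None, move=None):
        self.stones = stones          # stones left for the player about to move
        self.parent = parent
        self.move = move              # the move that led to this node
        self.children = []
        self.untried = [m for m in (1, 2, 3) if m <= stones]
        self.visits = 0
        self.wins = 0.0               # wins for the player who moved INTO this node

def uct_child(node, c=1.4):
    """Pick the child balancing observed win rate against under-exploration."""
    return max(node.children,
               key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, rng):
    """Random playout; True if the player to move at the start takes the last stone."""
    player = 0
    while stones > 0:
        stones -= rng.randint(1, min(3, stones))
        if stones == 0:
            return player == 0
        player ^= 1
    return False  # no stones left: the player to move has already lost

def mcts(root_stones, iterations=3000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_child(node)
        # 2. Expansion: add one untried move as a new child.
        if node.untried:
            move = node.untried.pop()
            child = Node(node.stones - move, parent=node, move=move)
            node.children.append(child)
            node = child
        # 3. Simulation from the new position.
        mover_wins = rollout(node.stones, rng)
        # 4. Backpropagation, flipping the perspective at each level.
        result = 0.0 if mover_wins else 1.0
        while node is not None:
            node.visits += 1
            node.wins += result
            result = 1.0 - result
            node = node.parent
    # The most-visited move is the recommendation.
    return max(root.children, key=lambda ch: ch.visits).move

if __name__ == "__main__":
    print("suggested move with 5 stones:", mcts(5))
```

The search spends most of its iterations on moves that keep winning in simulation, which is how it concentrates on promising lines instead of the whole game tree.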

Compared with existing models, Gemini is expected to greatly improve code generation for software developers, and Google hopes to use it to catch up with Microsoft's GitHub Copilot code assistant.

Google has also discussed using Gemini for features such as chart analysis, for example asking the model to interpret a finished chart, and for navigating web browsers or other software through text or voice commands.

Google Cloud's developer platform, Vertex AI, will also offer Gemini in both large and small versions, so that developers can pay for smaller models that run on personal devices.

Google is now fully prepared for battle and is waiting for Gemini to open its counterattack.

In July, when OpenAI announced that the GPT-4 API was generally available, it also said that a new model, gpt-3.5-turbo-instruct, would be released in the coming months.

Now users have received emails announcing the new gpt-3.5-turbo-instruct model, which replaces the older text-davinci-003.

According to reports, gpt-3.5-turbo-instruct is an InstructGPT-style model, and its training method is similar to that of text-davinci-003.

It is used in the same way as earlier prompt-completion models: given a prompt, it completes the text according to the instructions in that prompt.

In terms of price, it costs the same as gpt-3.5-turbo with the 4K context window.
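A prompt-completion call of this kind can be sketched with the Python standard library alone. The example assumes OpenAI's HTTP completions endpoint and its `model`, `prompt`, `max_tokens`, and `temperature` fields; the helper name `build_completion_request` and the sample prompt are made up for illustration, and the request is only sent when an `OPENAI_API_KEY` environment variable is present.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/completions"

def build_completion_request(prompt, api_key, max_tokens=64, temperature=0.0):
    """Build (but do not send) a completion request for gpt-3.5-turbo-instruct."""
    body = {
        "model": "gpt-3.5-turbo-instruct",
        "prompt": prompt,            # plain text; the model continues it
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY")
    req = build_completion_request("Translate to French: Good morning.",
                                   key or "sk-placeholder")
    if key:
        # Only contacts the API when a real key is configured.
        with urllib.request.urlopen(req) as resp:
            print(json.load(resp)["choices"][0]["text"])
    else:
        print(req.data.decode("utf-8"))
```

Unlike the chat models, there is no list of role-tagged messages here: the whole request is a single string that the model continues.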

One user has found that the latest model can play chess at around 1800 Elo. He had previously found that GPT could not play chess at all, but it now appears that this was a limitation only of the RLHF chat models; the pure completion model succeeds.

In these games, gpt-3.5-turbo-instruct easily beat Stockfish level 4 (1700 points), though it still fell short against level 5 (2000 points).

It never attempted an illegal move, used a clever opening sacrifice, and delivered a cheeky pawn-and-king checkmate while letting its opponent promote a pawn to no effect.

The user gave the model a PGN-style prompt that imitates the record of a grandmaster game. (The highlighting in his screenshot is slightly off.) GPT made all of its own moves, and he entered Stockfish's moves manually.
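A PGN-style prompt of this kind can be assembled as follows. The exact prompt used in the experiment is not reproduced in this article, so the header values and the helper `build_pgn_prompt` are hypothetical; the sketch only shows the general shape: PGN tag lines, then numbered moves, ending where the model is expected to continue.

```python
def build_pgn_prompt(moves, white="Grandmaster A", black="Grandmaster B"):
    """Return a PGN-style text prompt that ends where the next move belongs."""
    # Hypothetical header tags; a completion model tends to continue in kind.
    headers = [
        '[Event "Hypothetical Championship"]',
        f'[White "{white}"]',
        f'[Black "{black}"]',
        '[Result "1-0"]',
    ]
    parts = []
    for i, move in enumerate(moves):
        if i % 2 == 0:                      # White's move starts a numbered pair
            parts.append(f"{i // 2 + 1}.")
        parts.append(move)
    if len(moves) % 2 == 0:                 # White to move: open the next number
        parts.append(f"{len(moves) // 2 + 1}.")
    return "\n".join(headers) + "\n\n" + " ".join(parts)

if __name__ == "__main__":
    print(build_pgn_prompt(["e4", "e5", "Nf3", "Nc6"]))
```

The text the model appends after the final move number is read as its move, the engine's reply is appended by hand, and the prompt is rebuilt for the next turn.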

By the way, registration has opened for OpenAI's first developer conference, which will be held in November, so apply now if you are interested.

Reference:

https://www.theinformation.com/articles/openai-hustles-to-beat-google-to-launch-multimodal-llm

https://devday.openai.com/

https://news.ycombinator.com/item?id=37558911#:~:text=Key%20Features%3A%20Gpt%2D3.5%2D,speed%20as%20our%20turbo%20models.
