
OpenAI President Greg Brockman: GPT-4 is not perfect, but it is absolutely different.

2025-03-29 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

March 16 news: artificial-intelligence research company OpenAI yesterday released its much-anticipated text-generating AI model, GPT-4. Greg Brockman, co-founder and president of OpenAI, said in an interview that GPT-4 is not perfect, but it is absolutely different.

GPT-4 improves on its predecessor, GPT-3, in many key areas, such as producing more truthful statements and allowing developers to control its style and behavior more easily. GPT-4 is also multimodal in a sense: it can understand images, annotate photos, and even describe the contents of a photo in detail.

But GPT-4 also has serious flaws. Like GPT-3, the model "hallucinates" (that is, it generates text that is irrelevant to or unsupported by its source material) and makes basic reasoning errors. OpenAI gives an example on its blog in which GPT-4 describes Elvis Presley as "the son of an actor," even though neither of his parents was an actor.

When asked to compare GPT-4 with GPT-3, Brockman gave a one-word answer: different. "GPT-4 is absolutely different, although it still has a lot of problems and mistakes," he explained. "But you can see a jump in its skill in subjects such as calculus or law. It used to perform badly in some areas, but now it has reached a level that exceeds that of ordinary people."

Test results support Brockman's view. On the AP Calculus BC exam, GPT-4 scored a 4 (out of 5) while GPT-3 scored a 1; GPT-3.5, which sits between GPT-3 and GPT-4 in capability, also scored a 4. On a simulated bar exam, GPT-4's score landed in the top 10% of test takers, while GPT-3.5's hovered around the bottom 10%.

Meanwhile, the more interesting aspect of GPT-4 is the multimodality mentioned above. Unlike GPT-3 and GPT-3.5, which can accept only text prompts (such as "write an essay about giraffes"), GPT-4 can accept both image and text prompts to perform certain tasks, such as identifying a giraffe photographed in the Serengeti and giving a basic description of the image's contents.
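As a rough sketch of what an image-plus-text prompt looks like in practice: OpenAI's chat API (in the form later released publicly, not necessarily the launch-day partner interface) expresses a multimodal message as a list of content parts. The model name and image URL below are illustrative placeholders.

```python
# Sketch of a multimodal (image + text) chat request payload, in the
# content-parts shape OpenAI's chat API uses for vision-style input.
# The model name and image URL are hypothetical placeholders.

def build_multimodal_message(question: str, image_url: str) -> dict:
    """Combine a text prompt and an image reference into one user message."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

request = {
    "model": "gpt-4",  # placeholder; image input was partner-only at launch
    "messages": [
        build_multimodal_message(
            "What animal is in this photo, and where might it have been taken?",
            "https://example.com/serengeti-giraffe.jpg",
        )
    ],
}
```

A text-only prompt would simply pass a plain string as `content`; the content-parts list is what lets one message carry both modalities.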

This is because GPT-4 was trained on both image and text data, while its predecessors were trained only on text. OpenAI said the training data came from "a variety of licensed and publicly available data sources, which may include publicly available personal information," but Brockman declined to give details when asked. Training data has landed OpenAI in legal disputes before.

GPT-4's ability to understand images is quite impressive. For example, given the prompt "What's funny about this picture?", GPT-4 will break the whole image down and correctly explain the joke.

Currently, only one partner has access to GPT-4's image-analysis capability: an assistive app for the visually impaired called Be My Eyes. Brockman said any wider rollout will be "slow and intentional" while OpenAI assesses the risks and the pros and cons.

"Some policy issues also need to be addressed, such as facial recognition and how to handle images of people," he said. "We need to figure out where the danger zones are, where the red lines are, and then find solutions over time."

OpenAI has faced similar ethical dilemmas with DALL-E 2, its text-to-image system. After initially disabling the capability, OpenAI later allowed customers to upload faces for editing with its AI-powered image-generation system. At the time, OpenAI claimed that upgrades to its safety system made the face-editing feature possible because they minimized the potential for deepfakes as well as attempts to create sexual, political, and violent content.

Another long-term problem is preventing GPT-4 from being used in unintended ways that could cause harm. Hours after the model was released, Adversa AI, an Israeli cybersecurity startup, published a blog post demonstrating ways to bypass OpenAI's content filters and get GPT-4 to generate phishing emails, offensive descriptions of gay people, and other objectionable text.

This is not a new problem in the field of language models. Meta's chatbot BlenderBot and OpenAI's own ChatGPT have also been coaxed into outputting inappropriate content, even revealing sensitive details of their inner workings. But many, including journalists, had hoped that GPT-4 might bring significant improvements in this area.

When asked about GPT-4's robustness, Brockman stressed that the model had undergone six months of safety training. In internal tests, it was 82 percent less likely than GPT-3.5 to respond to requests for content disallowed by OpenAI's usage policies, and 40 percent more likely than GPT-3.5 to produce a "factual" response.

"We spent a lot of time trying to understand what GPT-4 is capable of," Brockman said. "We are constantly updating it, including a series of improvements, so that the model can better adapt to whatever personality or mode people want it to take on."

Frankly, the early real-world results have not been entirely satisfying. Beyond the Adversa AI tests, Microsoft's chatbot Bing Chat has also proved very easy to jailbreak. Using carefully crafted inputs, users have gotten the chatbot to profess love, threaten harm, defend the Holocaust, and invent conspiracy theories.

Brockman does not deny GPT-4's shortcomings in this respect, but he highlighted the model's new mitigation tools, including an API-level capability known as "system" messages. System messages are essentially instructions that set the tone and boundaries of a GPT-4 interaction. For example, a system message might read: "You are a tutor who always responds in the Socratic style. You never give the student the answer, but always try to ask the right question to help them learn to think for themselves."
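A minimal sketch of how a system message is attached in practice: in the chat-message format OpenAI's API uses, the system message is simply the first entry in the conversation, before any user turn. The helper function below is illustrative, not part of any OpenAI SDK.

```python
# Sketch: a "system" message fixes tone and boundaries before any user turn.
# The tutor text follows the example quoted in the article.
SOCRATIC_TUTOR = (
    "You are a tutor who always responds in the Socratic style. "
    "You never give the student the answer, but always try to ask "
    "the right question to help them learn to think for themselves."
)

def with_system(system_text: str, user_text: str) -> list[dict]:
    """Prepend the guardrail system message to a user prompt."""
    return [
        {"role": "system", "content": system_text},
        {"role": "user", "content": user_text},
    ]

conversation = with_system(SOCRATIC_TUTOR, "What is the derivative of x**2?")
```

Because the system message travels with every request, it acts as a persistent guardrail rather than something the user has to restate each turn.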

The idea is that system messages act as guardrails to keep GPT-4 from going off track. "Really figuring out the tone, style, and substance of GPT-4 has been a big focus for us," Brockman said. "I think we're starting to learn more about how to do the engineering, how to build a repeatable process that gets you predictable results that are really useful to people."

Brockman also pointed to Evals, OpenAI's newly open-sourced software framework for evaluating the performance of its AI models, as a sign of OpenAI's commitment to "enhancing" its models. Evals lets users develop and run benchmarks for models such as GPT-4 and inspect their performance, a crowdsourced approach to model testing.

"With Evals, we can better see and test the use cases that users care about," Brockman said. "Part of the reason we open-sourced this framework is that we are moving away from releasing a new model every three months in favor of continuous improvement. You don't build things you can't measure, right? But as we introduce new versions of the model, we can at least know what has changed."
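For illustration, an Evals-style benchmark is typically driven by a JSONL file of samples, each pairing an input conversation with an ideal answer; the `input`/`ideal` keys below follow the conventions of the openai/evals repository, while the sample content and the exact-match checker are hypothetical.

```python
import json

# Sketch of an Evals-style benchmark: each JSONL line pairs an input
# conversation with the ideal answer ("input"/"ideal" per openai/evals).
samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with a single number."},
            {"role": "user", "content": "What is 17 + 25?"},
        ],
        "ideal": "42",
    },
]

def to_jsonl(rows: list[dict]) -> str:
    """Serialize samples one JSON object per line (JSONL)."""
    return "".join(json.dumps(row) + "\n" for row in rows)

def exact_match(model_answer: str, sample: dict) -> bool:
    """The simplest kind of Evals check: exact match against the ideal."""
    return model_answer.strip() == sample["ideal"]
```

A full eval would feed each sample's `input` to the model under test and aggregate the per-sample checks into an accuracy score.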

Brockman was also asked whether OpenAI would compensate people for testing its models with Evals. He would not commit to that, but he did note that OpenAI is granting Evals users who apply early access to the GPT-4 API for a limited period.

Brockman also talked about GPT-4's context window, the text the model can take into account before generating additional text. OpenAI is testing a version of GPT-4 that can "remember" roughly 50 pages of content, five times the "memory" of the standard GPT-4 and eight times that of GPT-3.
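To see why the window size matters in practice, consider a hypothetical client that must fit a long conversation into the model's token budget, dropping the oldest turns when it overflows. The 4-characters-per-token heuristic and the budget number are illustrative assumptions, not OpenAI's actual tokenizer or limits.

```python
def rough_token_count(text: str) -> int:
    """Crude heuristic: roughly 4 characters per token for English text.
    (A real client would use the model's actual tokenizer.)"""
    return max(1, len(text) // 4)

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Drop the oldest non-system messages until the conversation fits
    the context-window budget. The system message, if any, is kept."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(
        rough_token_count(m["content"]) for m in system + rest
    ) > budget:
        rest.pop(0)  # discard the oldest turn first
    return system + rest

history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "old question " * 200},
    {"role": "user", "content": "newest question"},
]
trimmed = trim_history(history, budget=100)
```

A larger context window simply raises `budget`, so fewer turns (or document pages) have to be discarded or summarized before each request.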

Brockman believes the expanded context window will lead to new, previously unexplored use cases, especially in the enterprise. He envisions an AI chatbot built for a company that draws on background and knowledge from different sources, including employees across departments, to answer questions in an expert yet conversational way.

This is not a new concept. But Brockman believes GPT-4's answers will be far more useful than those offered by today's chatbots and search engines. "In the past, the model had no idea who you are, what you're interested in, and so on," he said. "Having a larger context window will certainly make it stronger, greatly enhancing the support it can offer people."
