After "self-reflection", GPT-4's abilities improve sharply, with test performance up 30%

2025-12-14 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

CTOnews.com, April 4: OpenAI's latest language model, GPT-4, can not only generate many kinds of human-like text but also design and run tests to evaluate and improve its own output. This "reflection" technique has produced significant gains for GPT-4 on several harder benchmarks, with test performance improving by 30 percent.

GPT-4 is the most advanced system OpenAI has released, following GPT, GPT-2, and GPT-3. It is also a large multimodal model: it accepts image and text input and produces text output. It uses deep learning with artificial neural networks to imitate human writing.

Researchers Noah Shinn and Ashwin Gopinath wrote in their paper: "We have developed a novel technique that allows AI agents to simulate human self-reflection and evaluate their own performance. When GPT-4 completes various tests, it adds extra steps in which it designs its own tests to check its answers, identifies errors and deficiencies, and then revises its solution based on its findings."
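The loop the researchers describe (generate an answer, test it, collect the errors, revise) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The names `run_tests`, `reflect_and_retry`, and `toy_generator` are hypothetical, the model call is replaced by a hard-coded stand-in, and the tests are passed in by hand rather than designed by the model, which is a simplification of what the paper describes.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical stand-in for a language model call: given a task and the
# feedback gathered so far, return candidate Python source code.
Generator = Callable[[str, List[str]], str]


def run_tests(source: str, tests: List[Tuple[tuple, object]], func_name: str) -> List[str]:
    """Load the candidate source and return one message per failed test."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # acceptable for a toy demo; never run untrusted code this way
    except Exception as exc:
        return [f"code failed to load: {exc!r}"]
    func = namespace.get(func_name)
    if func is None:
        return [f"function {func_name} is not defined"]
    failures = []
    for args, expected in tests:
        try:
            got = func(*args)
        except Exception as exc:
            failures.append(f"{func_name}{args} raised {exc!r}")
            continue
        if got != expected:
            failures.append(f"{func_name}{args} returned {got!r}, expected {expected!r}")
    return failures


def reflect_and_retry(
    task: str,
    generate: Generator,
    tests: List[Tuple[tuple, object]],
    func_name: str,
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate, test, and feed failures back until the tests pass or rounds run out."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = generate(task, feedback)
        failures = run_tests(candidate, tests, func_name)
        if not failures:
            return candidate, round_no
        feedback.extend(failures)  # the "reflection" passed to the next attempt
    return None, max_rounds


def toy_generator(task: str, feedback: List[str]) -> str:
    """Assumed behaviour: a buggy first draft, then a fix once feedback exists."""
    if not feedback:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"


solution, rounds = reflect_and_retry(
    "add two numbers", toy_generator, [((1, 2), 3), ((0, 0), 0)], "add"
)
print(rounds)  # prints 2: the first draft fails one test, the revision passes
```

In a real system `toy_generator` would be replaced by a model call that receives the failure messages as part of its prompt; the control flow stays the same.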

In the HumanEval coding test, GPT-4 with the self-reflection loop raised its accuracy from 67% to 88%.

By designing and executing tests, GPT-4 can critique its own performance, which can greatly improve its results, as the AlfWorld results show. The research team applied the technique to GPT-4 on several different benchmarks.

In the HumanEval test, GPT-4 had to solve 164 previously unseen Python programming problems and reached an accuracy of 67%. With reflection, accuracy rose to 88%.

In the AlfWorld test, the AI has to make decisions and solve multi-step tasks by performing permitted actions in a variety of interactive environments. With reflection, GPT-4's accuracy rose from 73% to 97%, and only 4 tasks failed.

In the HotPotQA test, GPT-4 had access to Wikipedia and answered 100 questions that required parsing content and reasoning across multiple supporting documents. Its accuracy was 34% without reflection and 54% with it.
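The reported scores can be read either as absolute gains in percentage points or as relative gains over the baseline. The article does not say which reading the headline "30%" refers to, so the short calculation below simply shows both for the three benchmarks, using only the numbers quoted above. The helper name `gains` is chosen for illustration.

```python
# Accuracy before and after reflection, in percent, as reported in the article.
scores = {
    "HumanEval": (67, 88),
    "AlfWorld": (73, 97),
    "HotPotQA": (34, 54),
}


def gains(before: float, after: float) -> tuple:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


for name, (before, after) in scores.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} points ({relative:.1f}% relative)")

# HumanEval: +21 points (31.3% relative)
# AlfWorld: +24 points (32.9% relative)
# HotPotQA: +20 points (58.8% relative)
```

On this arithmetic, the relative gains on HumanEval and AlfWorld are both close to 30%, while the HotPotQA relative gain is considerably larger.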

This study shows that the solution to an AI problem sometimes lies with the AI itself. CTOnews.com notes that this is somewhat like a generative adversarial network (GAN), in which two AIs improve each other's skills: one tries to generate images that look real, while the other tries to tell which are real and which are fake. In this case, however, GPT-4 is both writer and editor, improving the quality of its output through self-reflection.
