2025-03-28 Update | From: SLTechnology News & Howtos > IT Information
Shulou (Shulou.com) 11/24 report --
Thanks to CTOnews.com users Alejandro86 and 1520111 for the tip! Adding "Take a deep breath" to the prompt raises an AI model's math score by another 8.4 points.
The Google DeepMind team recently found that combining this new "Take a deep breath" with the already familiar "Let's think step by step" improved large-model performance on the GSM8K dataset from 71.8 to 80.2.

And the most effective prompt was found by the AI itself.
Some netizens joked that after the model takes a deep breath, the cooling fans spin faster. Others quipped that newly hired, highly paid prompt engineers should also take a deep breath, since the job may not last much longer.
The related paper, "Large Language Models as Optimizers," has caused a stir once again. Specifically, prompts designed by the large model improved performance by up to 50% on the Big-Bench Hard dataset.

Others focused on the finding that the best prompts differ from model to model.
Beyond prompt design, the team also tested large models on classical optimization tasks such as linear regression and the traveling salesman problem.

Optimization problems are everywhere. Derivative- and gradient-based algorithms are powerful tools, but in real applications one often finds that gradients are unavailable.
To address this, the team developed a new method called OPRO (Optimization by PROmpting). Instead of formally defining the optimization problem and solving it programmatically, OPRO describes the problem in natural language and asks the large model to generate new solutions.
Summarized in one diagram, the whole process is an iterative loop of calls to the large model: at each optimization step, previously generated solutions and their scores are fed in as input, the large model produces new solutions, and these are scored and added to the prompt for the next round of optimization.
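The loop described above can be sketched as follows. This is a minimal toy, not the paper's implementation: `propose` and `score` are hypothetical stand-ins (in the real method, `propose` is a call to an optimizer LLM given a meta-prompt of past scored solutions, and `score` evaluates a prompt on a benchmark like GSM8K). Here both are faked so the loop is runnable.

```python
import random

def score(candidate: str) -> float:
    # Stand-in for evaluating a candidate prompt on a task.
    # Toy objective: reward mentions of "step" and "breath".
    return float("step" in candidate) + float("breath" in candidate)

def propose(top_history):
    # Stand-in for the optimizer LLM reading the meta-prompt of
    # past (solution, score) pairs and emitting a new candidate.
    words = ["think", "step", "by", "take", "a", "deep", "breath", "carefully"]
    return " ".join(random.sample(words, k=4))

def opro(steps=50, keep=5):
    history = [("Let's solve it.", 0.0)]  # (solution, score) pairs
    for _ in range(steps):
        # Feed the best `keep` scored solutions back in, get a new one.
        top = sorted(history, key=lambda p: p[1])[-keep:]
        candidate = propose(top)
        history.append((candidate, score(candidate)))
    return max(history, key=lambda p: p[1])

random.seed(0)
best_prompt, best_score = opro()
print(best_prompt, best_score)
```

The key design point OPRO relies on is the meta-prompt: each round, the model sees the trajectory of solutions sorted by score, so it can infer which direction to move in without any gradients.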
The paper mainly uses the text-bison version of Google's PaLM 2, and Bard, as evaluation models. Together with GPT-3.5 and GPT-4, four models serve as optimizers.
The results show not only that different models design prompts in different styles, but also that the prompt styles each model responds to best are different.
The best AI-designed prompt previously known for the GPT series was "Let's work this out in a step by step way to be sure we have the right answer." That prompt was produced with the APE method (published at ICLR 2023), and on GPT-3 (text-davinci-002) it beat the human-written "Let's think step by step." But this time, on Google's PaLM 2 and Bard, the APE version used as a baseline fell short of the human version.
Among the new prompts designed by OPRO, "take a deep breath" and "break the problem down" work best for PaLM. For Bard's text-bison model, more detailed prompts were preferred.
In addition, the paper demonstrates the large model's potential as a mathematical optimizer: linear regression serves as the example of a continuous optimization problem, and the traveling salesman problem as the example of a discrete one.
Just by prompting, large models can find good solutions, sometimes matching or even exceeding hand-designed heuristic algorithms. However, the team also believes that large models cannot yet replace traditional gradient-based optimization, and OPRO performs poorly when the problem is large (such as traveling salesman instances with many nodes).
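The linear-regression experiment can be sketched in the same loop shape. This is a runnable toy, not the paper's setup: the real method serializes past (w, b, loss) triples into a natural-language meta-prompt for the LLM, whereas here `propose` is a hypothetical numeric stand-in that perturbs the best solution seen so far.

```python
import random

# Toy data: y = 3x + 1 plus a little noise.
random.seed(1)
data = [(x, 3 * x + 1 + random.uniform(-0.1, 0.1)) for x in range(10)]

def loss(w, b):
    # Mean squared error of the fit y = w*x + b on the toy data.
    return sum((y - (w * x + b)) ** 2 for x, y in data) / len(data)

def propose(best_w, best_b):
    # Stand-in for the LLM suggesting a new (w, b) informed by the
    # best scored solution in the meta-prompt.
    return (best_w + random.uniform(-0.5, 0.5),
            best_b + random.uniform(-0.5, 0.5))

best = (0.0, 0.0, loss(0.0, 0.0))  # (w, b, loss), starting from zero
for _ in range(2000):
    nw, nb = propose(best[0], best[1])
    l = loss(nw, nb)
    if l < best[2]:
        best = (nw, nb, l)

print(best)  # should approach w near 3, b near 1
```

This also illustrates the scaling caveat above: a gradient-free, propose-and-score loop works on a two-parameter problem, but the number of evaluations needed grows quickly with problem size, which is where gradient-based methods keep their advantage.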
As for future directions, the team notes that current large models cannot effectively learn from error cases: merely providing failures does not let the model capture the cause of the error. A promising direction is to incorporate richer feedback about error cases and to summarize, along the optimization trajectory, the key feature differences between high-quality and low-quality prompts. Such information may help the optimizer model improve on past prompts more efficiently, and may further reduce the number of samples needed for prompt optimization.
The paper is a joint effort of Google and DeepMind, but the authors come mainly from the original Google Brain team, including Quoc Le and Denny Zhou. The co-first authors are Chengrun Yang, a Fudan alumnus who earned his PhD at Cornell University, and Xinyun Chen, a Shanghai Jiao Tong University alumna who earned her PhD at UC Berkeley.
The paper also lists a large number of optimal prompts found across its experiments, covering practical scenarios such as movie recommendation and parody movie naming, which interested readers can use directly.
Paper address:
https://arxiv.org/abs/2309.03409
Reference link:
[1] https://x.com/emollick/status/1700207590607552740