Yao Qizhi takes the lead in proposing a large-model "thinking" framework! Logical reasoning accuracy reaches 98%, and the way of thinking is more human-like.

2025-03-27 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 11/24 Report --

Here comes the first large language model paper led by Turing Award winner Yao Qizhi.

Right out of the gate, it takes aim at "making large models think like humans":

Not only does it make large models reason step by step, it also teaches them to advance methodically, remembering every verified result produced along the way.


Specifically, the new paper proposes a method called Cumulative Reasoning (CR), which significantly improves the ability of large models to perform complex reasoning.

Large models can already reason about problems using chains of thought, but they still stumble on questions that take several twists and turns.

Cumulative reasoning builds on this by adding a "verifier" that judges right and wrong as reasoning proceeds. The model's thinking framework also changes from a chain or a tree into a more complex "directed acyclic graph".

In this way, large models not only solve problems with a clearer line of attack, but also pick up a knack for the Game of 24:

On mathematical problems such as algebra and geometric number theory, the relative accuracy of large models improved by 42%, and the success rate on the Game of 24 soared to 98%.

According to Tsinghua University's Institute for Interdisciplinary Information Sciences, co-first author Zhang Yifan explained the starting point of the paper:

Kahneman holds that human cognition involves two systems: "System 1" is fast, instinctive and emotional, while "System 2" is slow, deliberate and logical.

At present, large language models behave much closer to System 1, which may be why they struggle with complex tasks.

Cumulative reasoning, designed from this starting point, works better than chain of thought (CoT) and tree of thought (ToT). So what does this new method actually look like? Let's take a closer look.

Breaking through the "bottleneck" of thought chains and trees: the core of cumulative reasoning lies in improving the "shape" of a large model's thinking process.

Specifically, this approach uses three large language models in distinct roles:

Proposer: keeps putting forward new propositions, i.e. suggests the next step based on the current thinking context.

Verifier: checks the accuracy of the proposer's propositions and, if correct, adds them to the thinking context.

Reporter: decides whether a final solution has been reached, i.e. whether the reasoning process should end.

During reasoning, the "proposer" makes suggestions, the "verifier" evaluates them, and the "reporter" decides whether to finalize the answer and stop the thinking process.
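The control flow of this three-role loop can be sketched in code. Below is a minimal sketch with deterministic stub functions standing in for the three LLMs, run on a toy modus-ponens task; the function names and the toy domain are illustrative assumptions, not the paper's implementation.

```python
# Toy cumulative-reasoning loop: three stub "roles" on a tiny logic task.
# In the paper each role is an LLM with its own prompt; here they are
# deterministic functions so the control flow is easy to see.

def proposer(context, rules):
    """Suggest a new proposition derivable from the current context."""
    for premise, conclusion in rules:
        if premise in context and conclusion not in context:
            return conclusion
    return None  # nothing new to propose

def verifier(proposition, context, rules):
    """Accept the proposition only if some rule actually derives it."""
    return any(p in context and q == proposition for p, q in rules)

def reporter(goal, context):
    """Decide whether the reasoning process can stop."""
    return goal in context

def cumulative_reasoning(premises, rules, goal, max_steps=10):
    context = set(premises)  # all verified propositions so far
    for _ in range(max_steps):
        if reporter(goal, context):
            return True, context
        proposition = proposer(context, rules)
        if proposition is None:
            break
        if verifier(proposition, context, rules):
            context.add(proposition)  # kept available for every later step
    return reporter(goal, context), context

# Derive "c" from premise "a" and the rules a->b, b->c.
solved, context = cumulative_reasoning(
    premises={"a"}, rules=[("a", "b"), ("b", "c")], goal="c")
```

Note that if the verifier call is removed (every proposal accepted unconditionally), the loop degenerates into a single linear chain of steps, which is how CR relates to plain chain of thought.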

▲ An example of CR reasoning. It is a bit like three roles in a team project: the team members brainstorm all kinds of ideas, the advisor "checks" which ideas hold up, and the team leader decides when the project is done.

So, how on earth does this method change the "shape" of large model thinking?

To understand this, we have to start with the "ancestor" of methods for strengthening large-model thinking: chain of thought (Chain of Thought, CoT).

This method was proposed by OpenAI scientist Jason Wei and others in January 2022. Its core is to add a piece of "step-by-step reasoning" text to the inputs in a dataset to elicit the thinking ability of large models.

▲ An example based on the chain-of-thought principle, selected from the GSM8K dataset. Google quickly followed up with a "chain of thought PLUS" version, CoT-SC, which runs multiple chain-of-thought processes and takes a majority vote to select the best answer, further improving reasoning accuracy.
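The majority-vote step of CoT-SC is simple to sketch. This assumes the final answers of the sampled chains have already been extracted (the sampling itself would be LLM calls); the function name is illustrative.

```python
from collections import Counter

def majority_vote(final_answers):
    """CoT-SC aggregation: keep the most common final answer across
    independently sampled chains of thought."""
    return Counter(final_answers).most_common(1)[0][0]

# Five sampled chains, three of which agree on "18":
best = majority_vote(["18", "26", "18", "18", "26"])
```

The vote only looks at final answers, so individually wrong chains are harmless as long as the correct answer is the most frequent one.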

However, both chain of thought and CoT-SC overlook one problem: a question can have more than one solution path, especially for humans.

As a result, a new study called Tree of Thought (ToT) appeared.

This is a tree-search scheme that lets the model try several different lines of reasoning, evaluate itself, choose the next action, and backtrack when necessary.

The tree of thought clearly goes a step further than the chain of thought, making the model's thinking "more active". That is why, on the Game of 24, GPT-4 with chain of thought succeeds only 4% of the time, while with tree of thought the success rate soars to 74%.

BUT, whether chain of thought, CoT-SC or tree of thought, all share a common limitation:

None of them provides a place to store the intermediate results produced during thinking.

After all, not every thought process can be made into a chain or a tree; the way humans think about things is often messier than that.

The new cumulative reasoning framework breaks through exactly this point in its design:

The whole thinking process of a large model need not be a chain or a tree; it can also be a directed acyclic graph (DAG)! (Well, that does rather resemble synapses.)

▲ The edges in the graph are directed and there is no circular path; each directed edge is a derivation step. This means CR can store all historically correct reasoning results in memory and explore them within the current search branch. (By contrast, a tree of thought does not store information from other branches.)
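The DAG context can be sketched as a plain adjacency store: each verified proposition records which earlier propositions it was derived from, and nothing is discarded when the search moves to another branch. The `ReasoningDAG` class and its method names below are hypothetical, for illustration only.

```python
class ReasoningDAG:
    """Verified propositions plus the derivation edges between them."""

    def __init__(self, premises):
        # Premises have no parents; they are the roots of the DAG.
        self.parents = {p: [] for p in premises}

    def add_step(self, proposition, derived_from):
        """Record one verified derivation step (a set of directed edges).
        New nodes may only point at already-verified nodes, which keeps
        the graph acyclic."""
        assert all(p in self.parents for p in derived_from)
        self.parents[proposition] = list(derived_from)

    def nodes(self):
        # Every verified result remains available, whichever branch
        # of the search produced it.
        return set(self.parents)

dag = ReasoningDAG({"x > 0", "x > 0 implies x**2 > 0"})
dag.add_step("x**2 > 0", ["x > 0", "x > 0 implies x**2 > 0"])
```

Because edges only ever point backward to already-verified nodes, acyclicity holds by construction, and any later search branch can reuse `x**2 > 0` without re-deriving it.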

Cumulative reasoning can also switch seamlessly to chain of thought: remove the "verifier" and it becomes the standard chain-of-thought mode.

Cumulative reasoning built this way has achieved good results on a variety of tests.

Good at both math and logical reasoning: the researchers selected the FOLIO wiki and AutoTNLI datasets, the Game of 24, and the MATH dataset to put cumulative reasoning "to the test".

In each experiment, the proposer, verifier and reporter use the same large language model, with different prompts to set their roles.

The basic models used for experiments here are GPT-3.5-turbo, GPT-4, LLaMA-13B and LLaMA-65B.

It is worth mentioning that, ideally, the model should be pre-trained specifically on the relevant derivation tasks, and the "verifier" should also be augmented with modules such as a formal mathematical prover or a propositional logic solver.

1. Logical reasoning ability. FOLIO is a first-order logic reasoning dataset whose question labels can be "True", "False" or "Unknown"; AutoTNLI is a higher-order logic reasoning dataset.

On the FOLIO wiki dataset, compared with direct output (Direct), chain of thought (CoT) and self-consistent chain of thought (CoT-SC), cumulative reasoning (CR) always performs best.

After deleting problematic instances from the dataset (such as those with incorrect answers), GPT-4 with the CR method reaches a reasoning accuracy of 98.04%, with an error rate of only 1.96%.

Let's take a look at the performance on the AutoTNLI dataset:

Compared with the CoT method, CR significantly improves the performance of LLaMA-13B and LLaMA-65B.

On the LLaMA-65B model, CR yields a 9.3% improvement over CoT.

2. Game of 24 ability. The original ToT paper used the Game of 24, so the researchers use the same dataset here to compare CR and ToT.

ToT uses search trees with fixed width and depth, whereas CR lets the large model determine the search depth on its own.

In the experiments, the researchers found that CR and ToT behave very similarly on the Game of 24. The difference is that CR generates at most one new state per iteration, while ToT generates many candidate states per iteration, then filters and retains only some of them.

Generally speaking, ToT has no equivalent of CR's "verifier" and cannot tell whether a state (a, b, c) is right or wrong, so ToT ends up exploring more invalid states than CR.

In the end, the CR method reaches a correct rate as high as 98% (versus 74% for ToT), while its average number of visited states is far smaller than ToT's.

In other words, CR not only has higher search accuracy, but also has higher search efficiency.
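For context, the Game of 24 asks the player to combine four numbers with +, -, × and ÷ to reach 24. The task itself is small enough for exhaustive search; the brute-force checker below (an illustration, not the paper's LLM-based method) shows what a "state", i.e. a remaining list of values, looks like in this search space.

```python
from fractions import Fraction

# Exact rational arithmetic avoids floating-point issues with division.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b else None,  # skip division by zero
}

def solve24(nums):
    """Return an expression over nums that evaluates to 24, or None."""
    def search(vals):
        # vals is the current state: a list of (value, expression) pairs.
        if len(vals) == 1:
            value, expr = vals[0]
            return expr if value == 24 else None
        # Combine any ordered pair of values with any operator,
        # shrinking the state by one value each step.
        for i in range(len(vals)):
            for j in range(len(vals)):
                if i == j:
                    continue
                rest = [vals[k] for k in range(len(vals)) if k not in (i, j)]
                for sym, op in OPS.items():
                    result = op(vals[i][0], vals[j][0])
                    if result is None:
                        continue
                    expr = f"({vals[i][1]} {sym} {vals[j][1]})"
                    found = search(rest + [(result, expr)])
                    if found:
                        return found
        return None
    return search([(Fraction(n), str(n)) for n in nums])
```

For example, `solve24([4, 9, 10, 13])` finds a valid expression (one solution is (13 - 9) × (10 - 4)), while `solve24([1, 1, 1, 1])` correctly reports that no solution exists. The methods in the article search this same state space with a language model proposing the combinations instead of enumerating them.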

3. Mathematical ability. The MATH dataset contains a large number of mathematical reasoning questions spanning algebra, geometry, number theory and more, with difficulty divided into five levels.

With the CR method, the model can decompose a problem into sub-questions it can answer well, asking and answering itself until the final answer is produced.

The experimental results show that under two different experimental settings, CR's accuracy exceeds existing methods, reaching 58% overall; on Level 5 problems it achieves a 42% relative accuracy improvement, setting a new SOTA under the GPT-4 model.

This paper comes from the AI for Math research group led by Yao Qizhi and Yuan Yang at Tsinghua's Institute for Interdisciplinary Information Sciences.

The co-first authors of the paper are Zhang Yifan and Yang Jingqin, doctoral students who entered the institute in 2021.

The advisors and corresponding authors are Assistant Professor Yuan Yang and Academician Yao Qizhi.

Zhang Yifan graduated from Yuanpei College of Peking University in 2021 and is now pursuing a doctorate under Assistant Professor Yuan Yang. His main research interests are the theory and algorithms of foundation models (large language models), self-supervised learning, and trustworthy artificial intelligence.

Yang Jingqin received his bachelor's degree from Tsinghua's Institute for Interdisciplinary Information Sciences in 2021 and is currently pursuing a doctorate under Assistant Professor Yuan Yang. His main research interests include large language models, self-supervised learning, and intelligent healthcare.

Yuan Yang is an assistant professor at Tsinghua's Institute for Interdisciplinary Information Sciences. He graduated from the Department of Computer Science at Peking University, received a doctorate in computer science from Cornell University in 2018, and worked as a postdoctoral fellow at MIT's big data science institute until 2019.

His main research interests are intelligent healthcare, fundamental AI theory, and applied category theory.

Yao Qizhi is an academician of the Chinese Academy of Sciences and dean of Tsinghua's Institute for Interdisciplinary Information Sciences. He is the first Asian scholar to win the Turing Award and, so far, the only Chinese computer scientist to receive it.

Professor Yao resigned from Princeton in 2004 and returned to Tsinghua University. In 2005 he founded the Yao Class, a computer science experimental class for Tsinghua undergraduates; in 2011 he founded the Tsinghua Center for Quantum Information and the Institute for Interdisciplinary Information Sciences; and in 2019 he established an artificial intelligence class for Tsinghua undergraduates, known as the "Intelligence Class".

Today, the Institute for Interdisciplinary Information Sciences he leads has long been renowned, with both the Yao Class and the Intelligence Class affiliated to it.

Professor Yao is an international pioneer and authority in algorithms, cryptography, quantum computing and more. He recently appeared at the 2023 World Artificial Intelligence Conference, and the Shanghai Qi Zhi Institute, which he leads, is currently working on "embodied general artificial intelligence".

Paper link:

https://arxiv.org/abs/2308.04371
