Google releases a new backward-reasoning algorithm, LAMBADA, undaunted by search-space explosion!
Automated reasoning has long been a hard problem in natural language processing: a model must derive valid and correct conclusions from the given premises and knowledge.
Although the NLP field has reached high performance on various "natural language understanding" tasks such as reading comprehension and question answering in recent years through large-scale pre-trained language models, these models' performance on logical reasoning still lags behind.
When Chain of Thought (CoT) prompting appeared in May last year, researchers found that merely adding "Let's think step by step" to the prompt could greatly improve GPT-3's reasoning performance; on MultiArith, for example, reasoning accuracy jumped from 17.7% to 78.7%.
However, methods such as CoT and Selection Inference search for a proof in the forward direction, starting from the axioms and working toward the final conclusion. This suffers from a combinatorial explosion of the search space, so the failure rate rises for longer reasoning chains.
Recently, Google Research developed a backward chaining algorithm, LAMBADA (LAnguage Model augmented BAckwarD chAining), which brings to language models (LMs) the conclusion from the classical reasoning literature that backward reasoning is markedly more efficient than forward reasoning.
Paper link: https://arxiv.org/abs/2212.13894
LAMBADA divides the reasoning process into four sub-modules, each implemented by few-shot prompted language model inference.
In the end, compared with the current state-of-the-art forward-reasoning methods, LAMBADA achieves significant performance gains on two logical-reasoning datasets, and the improvement is especially pronounced when the problem requires deep and precise proof chains.
"reverse reasoning" into a version of the answer? Logical reasoning, especially for unstructured natural texts, is not only the basic component of automatic knowledge discovery, but also the key to progress in various scientific fields in the future.
Although progress on many NLP tasks has benefited from ever-larger pre-trained language models, it has been observed that scaling up model size brings only very limited gains on complex reasoning problems.
In the classical literature, there are two main approaches to logical reasoning (a toy sketch of both follows the list):
1. Forward chaining (Forward Chaining, FC): start from the facts and rules, iteratively derive new inferences and add them to the theory, until the goal statement can be proved or disproved.
2. Backward chaining (Backward Chaining, BC): start from the goal and recursively decompose it into sub-goals, until the sub-goals can be proved or refuted from the facts.
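To make the difference concrete, here is a toy Python sketch (not code from the paper) of the two strategies over a small symbolic theory; the underscore atoms are illustrative stand-ins for the natural-language statements LAMBADA actually operates on, and they anticipate the Fiona example introduced below.

    # Toy sketch: forward vs. backward chaining over propositional
    # "if all antecedents hold then the consequent holds" rules.
    FACTS = {"fiona_is_nice", "fiona_is_rough"}
    RULES = [  # (antecedents, consequent)
        ({"fiona_is_smart"}, "fiona_is_nice"),
        ({"fiona_is_rough", "fiona_is_nice"}, "fiona_is_red"),
        ({"fiona_is_nice", "fiona_is_red"}, "fiona_is_round"),
    ]

    def forward_chain(goal):
        """Derive new facts from any applicable rule until the goal appears
        or nothing changes; the search is not guided by the goal."""
        known = set(FACTS)
        changed = True
        while changed:
            changed = False
            for antecedents, consequent in RULES:
                if antecedents <= known and consequent not in known:
                    known.add(consequent)
                    changed = True
        return goal in known

    def backward_chain(goal, depth=0, max_depth=8):
        """Work backwards from the goal: only rules whose consequent matches
        the current (sub-)goal are ever considered."""
        if goal in FACTS:
            return True
        if depth >= max_depth:
            return False
        return any(
            consequent == goal
            and all(backward_chain(a, depth + 1, max_depth) for a in antecedents)
            for antecedents, consequent in RULES
        )

    print(forward_chain("fiona_is_red"), backward_chain("fiona_is_red"))  # True True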
In the past, most reasoning methods built on language models adopted forward chaining, which requires selecting a subset of facts and rules from the full set; this can be difficult for an LM because it must search over a large space.
In addition, deciding when to stop searching and declare a failure to prove is also very hard in FC, sometimes even requiring a dedicated module trained on intermediate labels.
In fact, the classical automated-reasoning literature largely focuses on backward chaining or goal-directed proof strategies.
LAMBADA stands for "LAnguage Model augmented BAckwarD chAining", i.e. a language model augmented with backward chaining. Through experiments, the researchers show that BC is better suited to text-based deductive logical reasoning.
BC does not require a large combinatorial search to select subsets, and it has more natural halting criteria.
LAMBADA focuses on automatic reasoning over facts, that is, natural-language assertions such as "Nice people are red", which are coherent but not necessarily grounded in reality.
A rule is a natural-language statement that can be rewritten formally as "if P then Q"; for example, "Rough, nice people are red" can be rewritten as "If a person is rough and nice, then they are red".
Here P is called the antecedent of the rule and Q is called its consequent.
A theory C consists of facts F = {f1, f2, ..., fn} and rules R = {r1, r2, ..., rm}; G denotes a goal that one wants to prove or disprove based on the facts and rules.
Example 1. An example theory C with fictional characters and rules:
F = {"Fiona is nice", "Fiona is rough"}
R = {"If someone is smart, then they are nice", "If someone is rough and nice, then they are red", "If someone is nice and red, then they are round"}
Given this theory, one may want to prove or disprove a goal such as "Is Fiona red?".
Backward chaining determines whether a rule applies to a goal through an operation that logicians call unification.
For example, for the goal in Example 1, "Is Fiona red?", the consequent of the second rule unifies with the goal, so that rule applies; the consequents of the other two rules do not match, so those rules cannot be applied.
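As a minimal illustration of what unification means here, the sketch below reduces it to matching a one-variable pattern against a concrete goal; the toy grammar with "someone" as the only variable, and the helper name unify, are assumptions made for this article rather than the paper's implementation.

    def unify(consequent_pattern, goal):
        """Match e.g. "someone is red" against "Fiona is red" and return the
        variable binding, or None when the statements cannot be unified."""
        p_words, g_words = consequent_pattern.split(), goal.split()
        if len(p_words) != len(g_words):
            return None
        binding = {}
        for p, g in zip(p_words, g_words):
            if p == "someone":      # the only variable in this toy grammar
                binding["someone"] = g
            elif p != g:            # constant words must match exactly
                return None
        return binding

    print(unify("someone is red", "Fiona is red"))    # {'someone': 'Fiona'}
    print(unify("someone is round", "Fiona is red"))  # None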
Considering the theory and goal in Example 1, BC starts reasoning from the goal "Is Fiona red?".
First, BC checks whether the goal can be proved or disproved from any fact. Since no fact proves or disproves it, BC then checks whether the goal unifies with the consequent of any rule, and finds that it matches the second rule, "If someone is rough and nice, then they are red".
The goal can therefore be decomposed into two sub-goals: 1) Is Fiona rough? and 2) Is Fiona nice?
Since both sub-goals can be proved from the facts, BC concludes that the original goal is proved.
For a given goal, BC's outcome is one of three: proved, disproved, or unknown (for example, the goal "Is Fiona smart?" can be neither proved nor disproved from this theory).
The language model modules in LAMBADA: to apply BC to text-based reasoning, the researchers introduce four LM-based modules: Fact Check, Rule Selection, Goal Decomposition, and Sign Agreement.
Fact Check: given the set of facts F in a theory and a goal G, the Fact Check module verifies whether there exists a fact f ∈ F such that f entails G (in which case the goal is proved) or f entails the negation of G (in which case the goal is disproved).
If such a fact cannot be found, then the truth of G is still unknown.
Fact Check is implemented with two sub-modules: the first selects the fact from the fact set that is most relevant to the goal, and the second verifies whether the goal can be proved or disproved from that fact.
Because the fact-selection sub-module may not pick the best fact on its first attempt, if the truth of the goal is still unknown after one round of calling the sub-modules, the selected fact can be removed and the sub-modules called again; this process can be repeated several times.
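A hedged sketch of this two-sub-module structure plus the retry loop is shown below; the function names and the word-overlap and string-matching heuristics are stand-ins invented for this article, whereas in LAMBADA both sub-modules are few-shot prompted language model calls.

    def select_fact(facts, goal):
        """Sub-module 1: pick the fact that looks most relevant to the goal
        (word overlap here; an LM call with a few-shot prompt in LAMBADA)."""
        def overlap(fact):
            return len(set(fact.lower().split()) & set(goal.lower().split()))
        return max(facts, key=overlap, default=None)

    def verify(fact, goal):
        """Sub-module 2: does the selected fact prove or disprove the goal?"""
        if fact == goal:
            return "PROVED"
        if fact == "it is not true that " + goal:
            return "DISPROVED"
        return "UNKNOWN"

    def fact_check(facts, goal, max_retries=3):
        remaining = list(facts)
        for _ in range(max_retries):   # retry: drop a bad pick and try again
            fact = select_fact(remaining, goal)
            if fact is None:
                break
            verdict = verify(fact, goal)
            if verdict != "UNKNOWN":
                return verdict
            remaining.remove(fact)
        return "UNKNOWN"

    print(fact_check({"Fiona is nice", "Fiona is rough"}, "Fiona is nice"))   # PROVED
    print(fact_check({"Fiona is nice", "Fiona is rough"}, "Fiona is smart"))  # UNKNOWN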
Rule Selection: given the set of rules R in a theory and a goal G, the Rule Selection module identifies the rules r ∈ R whose consequent unifies with G; these rules are then used to decompose the goal into sub-goals.
If such a rule cannot be determined, then the truth of G is still unknown.
Rule Selection likewise consists of two sub-modules: the first identifies the consequent of each rule (independently of the goal), and the second takes the rule consequents and the goal as input and determines which consequent unifies with the goal.
Note that, because of the recursive nature of BC, the Rule Selection module may be called many times while proving a single goal. Since identifying each rule's consequent does not depend on the goal, that first sub-module only needs to be called once.
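The minimal sketch below shows this intended division of labour, with naive "If ... then ..." string parsing standing in for the two few-shot prompted sub-modules; the rule phrasing and helper names are again assumptions for illustration.

    RULES = [
        "If someone is smart then they are nice",
        "If someone is rough and nice then they are red",
        "If someone is nice and red then they are round",
    ]

    def rule_consequent(rule):
        """Sub-module 1 (goal-independent, so it only needs to run once per
        rule): extract the consequent, i.e. the part after "then"."""
        return rule.split(" then ", 1)[1]            # e.g. "they are red"

    def select_rules(rules, goal):
        """Sub-module 2: keep the rules whose consequent unifies with the goal."""
        goal_property = goal.split(" is ", 1)[1]     # "Fiona is red" -> "red"
        return [r for r in rules if rule_consequent(r).endswith(" " + goal_property)]

    print(select_rules(RULES, "Fiona is red"))
    # ['If someone is rough and nice then they are red']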
Goal Decomposition: given a rule r whose consequent unifies with a goal G, the Goal Decomposition module identifies the sub-goals that must be proved in order for G to be proved or disproved.
If the antecedent of r is successfully proved, whether the goal is proved or disproved depends on whether the sign of the goal agrees with the sign of r's consequent.
For example, for the goal "Is Fiona red?", the sign of the goal agrees with the sign of the second rule's consequent, and the rule's antecedent is proved, so the goal is concluded to be proved.
Sign Agreement: given a rule r and a goal G, the Sign Agreement module verifies whether the sign of r's consequent agrees or disagrees with the sign of the goal.
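Putting the four modules together, the end-to-end sketch below shows one way a LAMBADA-style backward-chaining loop could behave on Example 1, returning PROVED, DISPROVED, or UNKNOWN. Every module here is a symbolic stand-in for the few-shot prompted LM calls, and the "it is not true that" negation convention and the function names are assumptions made for this article, not the paper's implementation.

    FACTS = {"Fiona is nice", "Fiona is rough"}
    RULES = [  # (antecedents, consequent)
        (["Fiona is smart"], "Fiona is nice"),
        (["Fiona is rough", "Fiona is nice"], "Fiona is red"),
        (["Fiona is nice", "Fiona is red"], "Fiona is round"),
    ]
    NEG = "it is not true that "

    def split_sign(statement):
        """Return (core statement, is_negated)."""
        if statement.startswith(NEG):
            return statement[len(NEG):], True
        return statement, False

    def fact_check(goal):
        """Fact Check: can any single fact prove or disprove the goal?"""
        core, neg = split_sign(goal)
        for fact in FACTS:
            f_core, f_neg = split_sign(fact)
            if f_core == core:
                return "PROVED" if f_neg == neg else "DISPROVED"
        return "UNKNOWN"

    def prove(goal, depth=0, max_depth=6):
        verdict = fact_check(goal)
        if verdict != "UNKNOWN" or depth >= max_depth:
            return verdict
        goal_core, goal_neg = split_sign(goal)
        for antecedents, consequent in RULES:
            c_core, c_neg = split_sign(consequent)
            if c_core != goal_core:                  # Rule Selection (unification)
                continue
            # Goal Decomposition: the rule's antecedents become sub-goals.
            if all(prove(sub, depth + 1) == "PROVED" for sub in antecedents):
                # Sign Agreement: matching signs prove the goal,
                # mismatched signs disprove it.
                return "PROVED" if c_neg == goal_neg else "DISPROVED"
        return "UNKNOWN"

    for g in ["Fiona is red", "it is not true that Fiona is red", "Fiona is smart"]:
        print(g, "->", prove(g))   # PROVED, DISPROVED, UNKNOWN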
For the experiments, the researchers chose Chain of Thought (CoT), the state-of-the-art neural reasoning method based on explicit reasoning, and Selection Inference (SI), the state-of-the-art modular reasoning method, as baseline models.
The evaluation datasets are ProofWriter and PrOntoQA, both challenging for LM reasoning: they contain examples whose proofs require chains of up to 5 hops, as well as examples whose goal can be neither proved nor disproved from the given theory.
The results show that LAMBADA significantly outperforms both baselines, especially on the ProofWriter-PUD dataset, which contains UNKNOWN labels (a 44% improvement over CoT and 56% over SI at depth-5), and at the higher depths of PrOntoQA (a 37% improvement over CoT and 113% over SI at depth-5).
These results demonstrate LAMBADA's advantage in logical reasoning and suggest that backward chaining (the backbone of reasoning in LAMBADA) may be a better choice than forward chaining (the backbone of SI).
They also reveal a flaw in the CoT approach when handling UNKNOWN labels: unlike examples labeled PROVED or DISPROVED, there is no natural chain of thought for cases labeled UNKNOWN.
For problems with deeper (3+) proof chains, the predictions produced by SI are close to majority-class predictions on all three datasets.
Specifically, in the binary-classification setting SI tends to over-predict DISPROVED, while in the three-class setting it tends to over-predict UNKNOWN; this makes its depth-5 performance on PrOntoQA even worse than the majority class, because at that depth there are more PROVED labels than DISPROVED ones.
The researchers were, however, surprised to find that CoT's performance on the ProofWriter-PD dataset remains relatively high, with accuracy that does not drop.
In short, LAMBADA achieves higher reasoning accuracy on these datasets. Compared with techniques that can reach the correct conclusion through spurious proof traces, LAMBADA is far more likely to produce valid reasoning chains, and it is also more query-efficient than other LM-based modular reasoning methods.
The researchers say the results strongly suggest that future work on reasoning with LMs should incorporate backward chaining or goal-directed strategies.
Reference:
https://arxiv.org/abs/2212.13894
This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era); editor: LRS.