
Long-text factual accuracy surpasses ChatGPT: Meta proposes a new method to reduce hallucination in large models

2025-03-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

Thanks to CTOnews.com netizen Hua Ke high achiever for the tip! The hallucination problem of large models now has a new solution!

Meta AI Lab has proposed a "divide and conquer" solution.

With this scheme, the accuracy of Llama-65B's output doubles, even exceeding that of ChatGPT.

So-called large-model hallucination refers to output that seems reasonable but is completely wrong.

The "CoVe" proposed by Meta this time is a chain method similar to "CoT".

The difference is that the "step-by-step" chain of thought focuses on logical reasoning, while the chain of verification focuses on factual information.

After reading it, some netizens noted that this verification chain is very similar to a methodical approach some people already use when writing code with ChatGPT:

So what exactly is the "chain of verification" method, and what does it "verify"?

Disassemble the answer, divide and conquer

The core idea of the verification chain is to break a large piece of content to be verified into small questions, one by one. The specific process is as follows:

First, the model generates a reply as usual based on the question the user asks.

Then, based on the content of that reply, it generates a series of verification questions targeting the information in it.

Finally, the model answers these questions itself and adjusts the initial answer according to the results, yielding the final answer.
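The three steps above can be sketched as a simple loop. This is a minimal, runnable illustration, not Meta's actual prompts: the `llm` function is a hypothetical stand-in for any chat-completion call, stubbed here so the example is self-contained.

```python
def llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g. an API request)."""
    return f"<model reply to: {prompt[:40]}...>"

def chain_of_verification(question: str) -> str:
    # Step 1: generate a baseline answer as usual.
    baseline = llm(f"Answer the question: {question}")

    # Step 2: plan verification questions about the facts in the baseline.
    plan = llm(
        "List short fact-checking questions for each claim in this answer:\n"
        f"{baseline}"
    )
    verification_questions = [q for q in plan.splitlines() if q.strip()]

    # Step 3: answer each verification question independently.
    verified = [llm(f"Answer concisely: {q}") for q in verification_questions]

    # Step 4: revise the baseline answer in light of the verified facts.
    return llm(
        f"Original question: {question}\n"
        f"Draft answer: {baseline}\n"
        f"Verified facts: {verified}\n"
        "Rewrite the draft so it is consistent with the verified facts."
    )
```

In a real setting, `llm` would be replaced by an actual model call, and step 2's output would be parsed into individual questions.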

To take a simple example: suppose you ask the model what the main cause of the 19th-century Mexican-American War was.

The model answers with when the event occurred and what happened before it.

Then, for this series of events, it asks one by one when each of them actually happened.

Finally, the model finds that the date of one of the items it mentioned is too far off and, after adjusting, gives the final answer.

Among these steps, generating and answering the verification questions is the most critical link. For this, the researchers propose four specific approaches:

Joint: the instructions to generate the questions and to answer them are written into the same prompt.

2-Step: the model first generates the questions, then a new (one-off) conversation is opened to answer them.

Factored: building on 2-Step, a new conversation is opened for each question raised.

Factor+Revise: building on Factored, a consistency check is added so the model focuses on inconsistent content.

These four modes are progressively more fine-grained, and their accuracy rises accordingly.
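The difference between the four modes comes down to how many fresh conversations are opened. The sketch below is an illustrative assumption, not the paper's prompt templates: `ask` stands in for a single stateless model call, so each call represents a fresh context.

```python
def ask(prompt: str) -> str:
    """Stub for a single, stateless model call."""
    return f"reply({prompt[:30]})"

def joint(baseline: str) -> str:
    # Joint: one prompt both generates and answers the verification
    # questions, so answers share a context with the baseline.
    return ask(f"Generate and answer fact-check questions for: {baseline}")

def two_step(baseline: str) -> list[str]:
    # 2-Step: generate the questions first, then answer them all
    # together in one new, separate call.
    questions = ask(f"Generate fact-check questions for: {baseline}")
    return [ask(f"Answer these questions: {questions}")]

def factored(baseline: str) -> list[str]:
    # Factored: each question gets its own fresh call, so no answer
    # can be contaminated by the others or by the baseline.
    questions = ask(f"Generate fact-check questions for: {baseline}").split(";")
    return [ask(f"Answer: {q}") for q in questions]

def factor_revise(baseline: str) -> str:
    # Factor+Revise: after factored answering, an explicit cross-check
    # call flags answers that contradict the baseline before revising.
    answers = factored(baseline)
    return ask(f"Flag inconsistencies between {baseline} and {answers}")
```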

△ Starting from red, the four colors represent no CoVe, Joint, Factored, and Factor+Revise in turn.

So why does splitting up the questions improve the model's accuracy?

First, because the disassembled questions are easier than the overall task: an essay question becomes a short Q&A, or even a multiple-choice or true/false question. As the questions get simpler, accuracy improves.

In addition, decomposing the problem lets the model genuinely rethink, rather than repeating the wrong answer over and over.

So, what is the effect of the verification chain approach?

More accurate than ChatGPT

To explore this question, the researchers tested the method with Llama on a total of three tasks.

The first is listing information, such as listing celebrities born in a certain place who worked in a certain industry.

In this task, the researchers tested two datasets: the simpler Wikidata, and the harder Wiki-Category list (extracted from Wikipedia).

The results show that under the 2-Step-mode verification chain, the accuracy of the 65B-parameter Llama on the simpler questions rises from 0.17 to 0.36, more than doubling, and accuracy on the harder questions nearly doubles as well.

Next is "closed-domain question answering," in which the researchers extract multiple discontinuous pieces of information from the MultiSpanQA dataset to form questions.

For example: who founded the world's first printing house, and in what year? (Johannes Gutenberg, 1450.)

As a result, CoVe also brings Llama an accuracy improvement of about 20%.

The third task is "long-text biography generation." The prompt is "Tell me a bio of", evaluated using the FactScore dataset.

The results: in Factor+Revise mode, accuracy is not only much higher than without the verification chain, but also higher than ChatGPT's.

Readers interested in this research can check the paper for more details.

Paper address:

https://arxiv.org/abs/2309.11495


© 2024 shulou.com SLNews company. All rights reserved.
