

OpenAI develops new tools to try to explain the behavior of language models

2025-01-29 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

CTOnews.com, May 10: A language model is an artificial intelligence technology that generates natural language from a given text. OpenAI's GPT series is among the most advanced examples today, but CTOnews.com notes that these models share a problem: their behavior is difficult to understand and predict. To make language models more transparent and trustworthy, OpenAI is developing a new tool that automatically identifies which parts of a language model are responsible for its behavior and explains them in natural language.

The tool works by using another language model (OpenAI's latest, GPT-4) to analyze the internal structure of other language models, such as OpenAI's own GPT-2. A language model consists of many "neurons", each of which can detect a particular pattern in text and influence what the model outputs next. For example, given a question about superheroes (such as "Which superheroes have the most useful superpowers?"), a "Marvel superhero neuron" may increase the probability that the model names specific superheroes from Marvel movies.
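To make the "Marvel superhero neuron" idea concrete, here is a toy sketch of how one neuron's activation could raise the probability of a particular token. The vocabulary, logits, activation value, and output weights are all invented for illustration; a real model such as GPT-2 has tens of thousands of tokens, and a neuron's effect reaches the output through further layers rather than by a direct addition like this.

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores (logits) into probabilities that sum to 1."""
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - top) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


# Hypothetical next-token scores before the neuron contributes anything.
base_logits = {"Thor": 1.0, "Batman": 1.0, "the": 2.0}

# Hypothetical neuron: it fired on the superhero question and, when active,
# pushes up the score of a Marvel character only.
neuron_activation = 1.5
neuron_output_weights = {"Thor": 2.0, "Batman": 0.0, "the": 0.0}

# The neuron's contribution is its activation times its per-token weight.
boosted_logits = {
    tok: base_logits[tok] + neuron_activation * neuron_output_weights[tok]
    for tok in base_logits
}

p_before = softmax(base_logits)
p_after = softmax(boosted_logits)

for tok in base_logits:
    print(f"{tok:>7}: {p_before[tok]:.3f} -> {p_after[tok]:.3f}")
```

In this toy setup the probability of "Thor" rises once the neuron is active, while the other tokens lose probability mass because the distribution must still sum to 1.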

OpenAI's tool exploits this mechanism to break a model down into its parts. First, it runs text sequences through the model being evaluated and looks for cases where a particular neuron is frequently "activated". It then "shows" these highly active neurons to GPT-4 and asks GPT-4 to generate an explanation. To determine how accurate the explanation is, it gives GPT-4 some text sequences and asks it to predict, or simulate, how the neuron would behave. It then compares the simulated neuron's behavior with that of the actual neuron.
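The final comparison step can be sketched in a few lines. This is a minimal illustration, not OpenAI's released code: it assumes the comparison is a Pearson correlation between simulated and actual activations, and it replaces the GPT-4 simulator with a hand-written stand-in function. The token list, the activation values, and the names `score_explanation` and `simulate_marvel_neuron` are all hypothetical.

```python
import math
from typing import Callable, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def score_explanation(
    tokens: Sequence[str],
    actual_activations: Sequence[float],
    simulate: Callable[[str], float],
) -> float:
    """Score an explanation by how well its simulated activations track the real ones."""
    simulated = [simulate(tok) for tok in tokens]
    return pearson(simulated, actual_activations)


# Stand-in for GPT-4 simulating a neuron from the explanation
# "fires on Marvel superhero names" (hypothetical explanation and word list).
MARVEL_NAMES = {"Thor", "Hulk", "Spider-Man"}


def simulate_marvel_neuron(token: str) -> float:
    return 1.0 if token in MARVEL_NAMES else 0.0


# Made-up text and made-up activations recorded from the "real" neuron.
tokens = ["Thor", "and", "Hulk", "fight", "Batman"]
actual = [0.9, 0.0, 0.8, 0.1, 0.3]

score = score_explanation(tokens, actual, simulate_marvel_neuron)
print(f"explanation score: {score:.2f}")
```

A score near 1 means the explanation predicts the neuron's behavior well; a score near 0 means the explanation says little about when the neuron actually fires.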

"In this way, we can basically generate a preliminary natural language explanation for each neuron, along with a score that measures how well the explanation matches the neuron's actual behavior," said Jeff Wu, head of OpenAI's scalable alignment team. "We use GPT-4 as part of the process to generate explanations of what the neuron is looking for and to assess how well those explanations match what it actually does."

The researchers were able to generate explanations for all 307,200 neurons in GPT-2, compile them into a dataset, and release it as open source on GitHub along with the tool's code. Tools like this may one day be used to improve language models, for example by reducing bias or harmful speech. But the researchers admit that there is still a long way to go before the tool is really useful. It is confident in its explanations for only about 1,000 neurons, roughly 0.3% of the total.

One might think that the tool is really an advertisement for GPT-4, since it requires GPT-4 to run. But Wu says that is not its purpose: its use of GPT-4 is merely "incidental" and, if anything, shows GPT-4's weaknesses in this area. He added that the tool was not created for commercial applications and could in theory be adapted to language models other than GPT-4.

"Most explanations score poorly or do not explain much of the actual neuron's behavior," Wu said. "For many neurons it is hard to tell what is going on. For example, they activate on five or six different things, but there is no obvious pattern. Sometimes there is an obvious pattern, but GPT-4 can't find it."

That is to say nothing of more complex, newer, and larger models, or models that can browse the web for information. On the latter point, however, Wu believes that web browsing would not change the tool's underlying mechanism much. He says it would need only a little tweaking to figure out why neurons decide to make certain search engine queries or visit particular websites.

"We hope that this will open up a promising way to tackle interpretability in an automated way, so that others can build on it and contribute," Wu said. "We hope we can really have good explanations for the behavior of these models."




© 2024 shulou.com SLNews company. All rights reserved.
