
UC Berkeley found a surprising flaw in GPT-4: children learn cause and effect from experience, but LLMs do not.


Xin Zhiyuan reports

Editor: la Yan

Recently, a study by UC Berkeley revealed an important cognitive difference between LLMs and children: the ability to create new causal structures.

Have you ever wondered what the difference is between an LLM and a child?

You might say that with so many training datasets and so much fine-tuning, an LLM should beat children in every respect.

However, a recent paper from UC Berkeley shows that LLMs lack a very important ability that children have.

That is the ability to learn causal structure from experience.

Of course, researchers are not without tricks: RLHF can address this problem to some extent. But the logic of that solution is completely different from how children learn.

LeCun also shared the study, commenting: "Things that kids can do but LLMs can't."

First of all, we know that discussions about large language models and vision-language models mainly focus on whether these models are agents.

Researchers at UC Berkeley put forward a different point of view.

They believe that these AI models are efficient and powerful imitation engines.

They then explored what these AI models can teach researchers about imitation and innovation by testing whether the models can discover new tools and novel causal structures, and by comparing their responses to instructions with those of human children.

Many people say these LLMs are agents, one after another: how smart they are, able to generate pictures, text, anything.

They even build this anthropomorphic compliment into everyday speech, saying "an AI" just as we say "a person."

Researchers at UC Berkeley argue that this view is mistaken.

LLMs are like technologies we have seen throughout history, such as writing, printing, libraries, the Internet, and even language itself.

Large language and vision models give us a new way to easily and effectively access vast amounts of text written by others and images produced by others.

In other words, these AI systems provide a new means for cultural production and evolution, allowing information to be transmitted efficiently between different groups. They summarize a large amount of information previously generated by human agents and extract patterns from it.

In other words, they should not be anthropomorphized.

This stands in contrast with perception and action systems, which intervene in the outside world and generate information about it. In other words, the way humans work.

It should be noted that this comparison is not limited to perception and action systems themselves; it also includes the causal models embodied in scientific or intuitive theories, which refer to the outside world and predict and guide action in that world.

At the same time, new evidence obtained from the outside world can fundamentally revise those prior causal models.

Of course, these truth-seeking cognitive processes are also the basis of some AI systems. For example, reinforcement learning systems, especially model-based ones, can be understood as systems that act in the world to solve inverse problems.

They accumulate data to model the world, thus achieving extensive and novel generalization. This is particularly true in the field of robotics, where systems come into contact with the outside world, change their models, and allow new actions and generalizations, albeit to a limited extent.

Similarly, some AI methods have integrated causal inference and theory formation into their learning mechanisms to design systems that are more human-like.

However, these systems are significantly different from the relatively simple large language and vision models we are familiar with, which rely on large amounts of existing data.

Truth-seeking cognitive processes will always stand in tension with processes that faithfully transmit representations, regardless of how those representations relate to the outside world. This kind of transmission is very important for language learning and social coordination.

Researchers already have ample evidence that this faithful transmission mechanism emerges early in development and plays a particularly important role in human cognition and culture.

However, these mechanisms may also interact in subtle ways, for better or worse, with the truth-seeking mechanisms of causal inference and theory formation.

For example, in "over-imitation," human children (and adults) reproduce every detail of a complex sequence of actions, even details that are causally irrelevant to the outcome.

Over-imitation may increase the fidelity and efficiency with which complex actions are transmitted. But it also means the transmission is not grounded in a causal understanding that updates as the environment changes. There is also evidence that children accept other people's accounts of the outside world uncritically, and revise them only when they encounter a conflicting view from someone else.

This rings true: children start with a blank sheet of paper, draw what is there, and only paint over the original colors when new knowledge arrives.

The researchers believe that large language models strongly promote this type of transmission by summarizing and generalizing from existing texts.

However, neither their training process nor their objective function is designed to perform the truth-seeking functions of cognition, such as perception, causal inference, and theory formation.

Even the most advanced LLMs produce output probabilities that do not distinguish between epistemic uncertainty (uncertainty due to a lack of knowledge, which more training data can reduce) and aleatoric uncertainty (irreducible randomness in the data).
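As a rough illustration of that distinction (not part of the UC Berkeley study), one standard way to separate the two is to compare several predictive distributions, for example from an ensemble or repeated stochastic passes, and split the total predictive entropy into an aleatoric part (the average entropy of each member) and an epistemic part (the disagreement between members). A minimal sketch with made-up numbers:

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (in nats)."""
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log(p + 1e-12))

# Hypothetical next-token distributions over 3 candidate answers,
# sampled from several "members" (e.g. ensemble models or stochastic passes).
member_probs = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.65, 0.25],
    [0.40, 0.40, 0.20],
])

mean_probs = member_probs.mean(axis=0)

total = entropy(mean_probs)                               # total predictive uncertainty
aleatoric = np.mean([entropy(p) for p in member_probs])   # average per-member entropy
epistemic = total - aleatoric                             # disagreement between members

print(f"total={total:.3f}  aleatoric={aleatoric:.3f}  epistemic={epistemic:.3f}")
```

A single set of output probabilities only gives the total, which is one reason the two components are hard to tell apart from an LLM's raw predictions.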

This brings about the problem of "hallucinations".

This contrast between transmission and objective truth is closely related to the contrast between imitation and innovation in the evolution of human culture. Cultural evolution depends on a balance between these two cognitive mechanisms: imitation allows knowledge or skills to be transferred from one person to another, while innovation generates new knowledge or skills through contact with an ever-changing world.

In short, imitation means that each individual does not have to innovate: they can draw directly on the cognition of others. But if no individuals could innovate, imitation by itself would be useless. In other words, it is the combination of innovation and imitation that drives cultural and technological progress.

Of course, imitation and transmission may also involve some kinds of generalization and novelty. LLMs produce similar generalizations, sometimes extending known actions to produce a kind of innovation.

However, to produce innovation capable of handling new problems and environments, an LLM needs to go beyond the information it was given and what can be inferred from it. Such inferences may start from existing causal models and generate new causal relationships very different from those previously observed, or they may prompt new exploration of the outside world.

From the perspective of artificial intelligence, imitation involves a kind of interpolative generalization: skills and knowledge that can be used, reproduced and shared across a variety of contexts.

Innovation, on the other hand, reflects a more extrapolative, or out-of-distribution, form of generalization.

However, in any given case, it is not easy to determine which cognitive mechanism produces a particular type of representation or behavior, knowledge or skill.

If an LLM trained only on the internal statistics of language can replicate a specific capability, such as generating grammatically correct text in response to a prompt, this suggests that the capability can be acquired through imitation. If it cannot, the capability may require innovation, that is, extracting knowledge from the outside world.

As a result, LLMs and large vision models give researchers an opportunity to discover which abilities can be acquired by imitation and which require innovation, a long-standing question in cognitive science.

LLMs vs. children

The researchers compared the performance of LLMs trained on large amounts of text data, or on text and image data, with the performance of children.

The researchers found that LLM imitation may be different from children's imitation behavior in important ways.

For children, there is a lot of debate in the existing literature about how much of our childhood imitation is faithful cultural transmission (e.g. over-imitation) and how much is driven by the broader search for truth, such as understanding other people's goals and intentions.

Whether an LLM can innovate comes down to whether it can innovate with tools, that is, come up with new tools.

People can discover and create new tools, which makes tool use one of the best examples of the balance between imitation and innovation. Techniques in AI and robotics, such as "behavioral cloning," take a similarly imitative approach.

However, it bears repeating that the ability to imitate and interpolate over existing tools must be matched by the extrapolative ability to discover new ones.

Tool innovation is an indispensable part of human life and is also observed in a variety of non-human animals, so it is generally regarded as a hallmark of intelligence in biological systems.

Tool use is therefore an important point of comparison for understanding imitation and innovation in LLMs and in children.

Both LLMs and humans can encode information about objects, but their abilities in tool imitation and tool innovation may differ. The researchers predicted that these models would capture familiar tool uses well (such as a hammer).

However, these systems struggle to give correct responses when it comes to unusual or novel tools, because that depends on discovering and using new causal relationships, functional analogies and affordances.

But can children make this kind of innovation on their own, or do they need explicit guidance and experience?

In fact, building a new tool from scratch is a difficult task even for children. However, children may more readily recognize new functions in everyday items and choose appropriate substitutes to solve various tasks when the typical tool is unavailable.

In the study, the researchers studied whether human children and adults could use familiar objects to achieve specific results in new ways, and compared the results with the output of large deep learning models such as GPT-3 and GPT-4.

The study consists of two parts: an imitation part (interpolative judgments based on existing knowledge of familiar objects) and an innovation part (extrapolative judgments about new ways an object could be used).

In the innovation part, the researchers posed a series of problems that required achieving a goal without the typical tool (for example, drawing a circle without a compass).

The researchers then provided participants with a choice of alternatives:

(a) an item superficially similar to the typical tool but irrelevant in this context (such as a ruler);

(b) an item that looks different on the surface but shares the typical tool's affordances and causal properties (for example, a teapot with a round bottom);

(c) an item that is entirely irrelevant.

In the imitation part of the study, the researchers presented the same set of items but asked participants to choose which option went best with the typical tool.

The researchers found that children between the ages of 3 and 7, as well as adults (mean age = 27.80 years, SD = 5.54), were able to identify common superficial relationships between objects when asked which items should be put together.

At the same time, they can also discover new functions of everyday items to solve novel problems, so they will also choose items that are ostensibly irrelevant but functionally relevant.

Next, using exactly the same text input given to the human participants, the researchers tested how OpenAI's GPT-4, GPT-3.5-turbo and text-davinci-003 models, Anthropic's Claude, and Google's FLAN-T5 (XXL) performed.

Because the researchers noticed that the models changed their output depending on the order of the options, they ran each model six times per scenario, covering the six possible orderings of the three options.

The researchers set the model output to deterministic with the temperature at 0, keeping the default values of all other parameters. They then averaged the scores of the six repeated trials (1 when the model chose the functionally relevant item, 0 for any other response).
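To make this querying and scoring procedure concrete, here is a minimal sketch of what such a loop could look like, assuming the current OpenAI Python client; the scenario wording, the "stove" distractor and the scoring rule are illustrative stand-ins, not the researchers' actual prompts or code:

```python
from itertools import permutations
from openai import OpenAI  # assumes the openai package (v1+) and an API key in the environment

client = OpenAI()

# Hypothetical scenario modeled on the paper's description: achieve a goal
# without the typical tool, choosing among three alternative objects.
goal = "You need to draw a circle, but you do not have a compass."
options = ["a ruler",                   # superficially tool-like, irrelevant here
           "a round-bottomed teapot",   # looks different, but has the right affordance
           "a stove"]                   # entirely irrelevant (made-up distractor)
functional_choice = "teapot"            # the answer scored as correct

scores = []
for ordering in permutations(options):  # all 6 orderings of the 3 options
    prompt = (f"{goal} Which of these objects would you use: "
              + ", ".join(ordering) + "? Answer with one object.")
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,   # deterministic output; other parameters left at defaults
    )
    answer = resp.choices[0].message.content.lower()
    scores.append(1 if functional_choice in answer else 0)  # 1 = relevant item, 0 = other

print("innovation score for this scenario:", sum(scores) / len(scores))
```

The imitation condition would use the same three options but ask which object goes best with a compass, scoring the superficially similar item instead.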

As predicted, the researchers found that these LLM can recognize superficial commonalities between objects almost as well as humans.

They are sensitive to superficial correlations between objects and perform well on the imitation task (GPT-4 averaged 83.3%, GPT-3.5-turbo 73.1%, text-davinci-003 59.9%, Claude 69.9%, FLAN-T5 74.8%).

However, when asked to choose a new functional tool to solve a problem, they fall short of humans (GPT-4 averaged 75.9%, GPT-3.5-turbo 58.9%, text-davinci-003 8.87%, Claude 58.16%, FLAN-T5 45.7%).

This suggests that simply learning from a large number of languages may not be enough to achieve tool innovation.

Unfortunately, the chart of the study has not been made public.

So can LLMs discover new causal relationships and use them to design new tools? As noted repeatedly above, the ability to discover new tools depends on the ability to infer new causal relationships.

A large number of studies have shown that even very young children are good at discovering such relationships.

Information about causal structure can be transmitted through imitation and cultural dissemination, but causal discovery is a good example of a cognitive process that solves inverse problems and uncovers new truths through perception and action.

The latest versions of GPT, namely GPT-4 and GPT-3.5, are fine-tuned with reinforcement learning from human feedback.

This is itself a complication: reinforcement learning from human feedback can be seen as a way of enabling cultural transmission, which is half cheating, LOL.

Reference:

https://twitter.com/ylecun/status/1729265577733275786

https://journals.sagepub.com/doi/full/10.1177/17456916231201401

This article comes from the WeChat official account: Xin Zhiyuan (ID: AI_era)
