Nvidia AI agent taps GPT-4, beats AutoGPT, and writes its own code to conquer Minecraft without human intervention

2025-04-21 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)11/24 Report--

Want to give the game industry a little GPT-4 shock? An agent called Voyager can not only train itself from the game's feedback, but also write its own code to advance through in-game tasks.

Following Stanford's virtual town of 25 agents, AI agents have come up with something new.

Recently, Nvidia AI scientist Jim Fan and colleagues plugged GPT-4 into Minecraft and proposed a new AI agent, Voyager.

The beauty of Voyager is that it not only outperforms AutoGPT, but is also capable of open-ended lifelong learning inside the game.

Compared with the previous SOTA, Voyager obtained 3.3 times more unique items, traveled 2.3 times farther, and unlocked key skill-tree milestones up to 15.3 times faster.

Netizens were stunned: we are one step closer to artificial general intelligence (AGI).

So will the games of the future be populated by NPCs driven by large models?

Real digital life

Once connected to GPT-4, Voyager needs no human supervision at all; it is entirely self-taught.

It has not only mastered basic survival skills such as mining, building houses, gathering, and hunting, but has also learned to explore on its own.

It travels to different cities, passes by oceans and pyramids, and even builds its own portal.

Driven by its own curiosity, it keeps exploring this magical world and expanding its inventory and equipment: it wears different tiers of armor, blocks damage with a shield, and pens animals in with fences.

Paper address: https://arxiv.org/abs/2305.16291

Project address: https://voyager.minedojo.org/

Voyager's feats include, but are not limited to:

Fighting an Enderman

Building a base

Mining amethyst

Mining gold

Collecting cactus

Hunting

Fishing

How much potential does digital life have? All we know is that Voyager is still exploring Minecraft and steadily expanding its territory.

"Training" without gradient descent

One long-standing challenge in AI has been to build embodied agents with general capabilities that can explore an open world on their own and develop new skills by themselves.

In the past, academia relied on reinforcement learning and imitation learning, but these methods often fall short in systematic exploration, interpretability, and generalization.

The emergence of large language models opens up a new possibility for building embodied agents. Because LLM-based agents can draw on the world knowledge contained in pre-trained models to generate coherent action plans or executable policies, they are well suited to tasks such as games and robotics.

Earlier, Stanford researchers stunned the AI community by building a virtual town with 25 AI agents, and LLM-based agents have also done well on non-embodied natural language processing tasks.

However, these agents still lack lifelong learning: they cannot progressively acquire and accumulate knowledge over long time spans.

The most important contribution of this work is that it uses GPT-4 to open up a new paradigm in which "training" happens through code execution rather than gradient descent.

Jim Fan explained: we had this idea before BabyAGI / AutoGPT and spent a lot of time working out the best gradient-free architecture. The "trained model" is a library of skill code that Voyager builds iteratively, rather than matrices of floating-point numbers. In this way, the team is pushing the gradient-free architecture to its limits.

Agents trained this way have a lifelong learning ability similar to that of human players.

For example, if Voyager finds itself in a desert rather than a forest, it knows that learning to collect sand and cactus matters more than learning to collect iron ore.

Moreover, it can not only pick the most suitable task for its current skill level and the state of the world, but also keep refining a skill based on feedback, commit it to memory, and reuse it the next time it is needed.

So how far are we from silicon-based life?

Karpathy, who recently returned to OpenAI, praised the work as a "gradient-free architecture" for high-level skills. Here the LLM plays the role of the prefrontal cortex, driving the lower-level Mineflayer API through the code it generates.

Karpathy recalls that around 2016, the performance of agents in Minecraft-like environments looked hopeless. At that time, RL could only explore at random to learn long-horizon tasks from ultra-sparse rewards, and the field felt thoroughly stuck.

Now that barrier has largely been lifted. The right approach turned out to be a different one: first train LLMs on Internet text to learn world knowledge, reasoning, and tool use (especially writing code), and then point them directly at the problem.

Finally, he remarked: if I had read about this "gradient-free" approach to agents in 2016, I would have been astonished.

Weibo influencer "Baoyu xp" also spoke highly of the work.

It is a great attempt, and all of the code is open source. The idea of automatically generating tasks -> automatically writing code to carry them out -> saving a reusable code library should transfer easily to other fields.

Unlike other games commonly used in AI research, Minecraft does not impose a predefined end goal or a fixed storyline; instead it provides a playground with endless possibilities.

An effective lifelong learning agent should have abilities similar to those of human players:

1. Propose suitable tasks based on its current skill level and the state of the world. For example, if it finds itself in a desert rather than a forest, it learns to collect sand and cactus before learning to collect iron.

2. Refine skills based on environmental feedback and commit mastered skills to memory for reuse in similar situations (for example, fighting zombies is similar to fighting spiders).

3. Keep exploring the world and seek out new tasks in a self-driven way.

To give Voyager these capabilities, teams from Nvidia, the California Institute of Technology, the University of Texas at Austin, and Arizona State University proposed three key components:

1. An iterative prompting mechanism that combines game feedback, execution errors, and self-verification to improve programs

2. A skill library of code for storing and retrieving complex behaviors

3. An automatic curriculum that maximizes the agent's exploration

First, Voyager tries to write a program that achieves a specific goal using a popular JavaScript API for Minecraft (Mineflayer).

The program may well be wrong on the first attempt, but feedback from the game environment and JavaScript execution errors, if any, help GPT-4 improve it.

Left: environmental feedback. GPT-4 realizes that it needs two more planks before it can craft sticks.

Right: execution error. GPT-4 realizes that it should craft a wooden axe rather than an acacia axe, because there is no acacia axe in Minecraft.

Given the agent's current state and the task, GPT-4 judges whether the program has completed the task.

In addition, if the task fails, GPT-4 provides a critique that suggests how to complete it.
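The refinement loop described above can be sketched roughly as follows. This is a minimal illustration, not Voyager's actual code: `refine_until_verified`, the `Attempt` record, the `max_rounds` limit of 4, and the three `toy_*` functions are all hypothetical stand-ins. In the real system, generation and verification are GPT-4 calls and execution runs Mineflayer code in the game.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """What is fed back to the model after one failed round (hypothetical)."""
    code: str
    env_feedback: str  # what the game reported, e.g. missing materials
    exec_error: str    # interpreter error text, empty if the code ran
    critique: str      # the verifier's advice, empty if it was not consulted


def refine_until_verified(task, generate, execute, verify, max_rounds=4):
    """Regenerate a program until the verifier accepts it or rounds run out."""
    last = None
    for _ in range(max_rounds):  # max_rounds is an arbitrary assumption
        code = generate(task, last)
        env_feedback, exec_error = execute(code)
        if exec_error:
            # The program crashed, so there is nothing to verify yet.
            success, critique = False, ""
        else:
            success, critique = verify(task, env_feedback)
        if success:
            return code
        last = Attempt(code, env_feedback, exec_error, critique)
    return None


# Toy stand-ins that mimic the "acacia axe" example; none of this calls GPT-4.
def toy_generate(task, last):
    if last is not None and last.exec_error:
        return "craft('wooden_axe')"
    return "craft('acacia_axe')"


def toy_execute(code):
    if "acacia_axe" in code:
        return "", "Error: no item named acacia_axe"
    return "inventory: wooden_axe x1", ""


def toy_verify(task, env_feedback):
    ok = "wooden_axe" in env_feedback
    return ok, "" if ok else "Craft a wooden axe first."


result = refine_until_verified("craft an axe", toy_generate, toy_execute, toy_verify)
print(result)
```

In this toy run the first program fails with an execution error, the error text is handed back to the generator, and the second program passes verification.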

Self-verification

Second, Voyager gradually builds a skill library by storing successful programs in a vector database. Each program can be retrieved via the embedding of its docstring.

Complex skills are synthesized by combining simpler ones, which lets Voyager's capabilities grow rapidly over time and alleviates catastrophic forgetting.
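To illustrate how complex skills can be assembled from simpler ones, here is a small sketch. The registry `skills`, the `skill` decorator, and the three example skills are hypothetical; Voyager's real skills are JavaScript programs written against Mineflayer, whereas this sketch only records the order of actions in a list.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory registry: skill name -> callable.
skills: Dict[str, Callable[[List[str]], None]] = {}


def skill(fn):
    """Register a function as a reusable skill under its own name."""
    skills[fn.__name__] = fn
    return fn


@skill
def collect_wood(log: List[str]) -> None:
    log.append("collect wood")


@skill
def craft_planks(log: List[str]) -> None:
    skills["collect_wood"](log)  # reuse a simpler, already stored skill
    log.append("craft planks")


@skill
def craft_sticks(log: List[str]) -> None:
    skills["craft_planks"](log)  # build on the previous composite skill
    log.append("craft sticks")


trace: List[str] = []
skills["craft_sticks"](trace)
print(trace)
```

Because each new skill calls stored skills instead of re-deriving them, earlier abilities remain available while new ones are added on top.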

Top: adding a skill. Each skill is indexed by the embedding of its description and can be retrieved in similar situations in the future.

Bottom: retrieving skills. When faced with a new task from the automatic curriculum, the agent queries the library and identifies the top five relevant skills.
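The add-and-retrieve behavior described here can be sketched with a tiny in-memory index. This is an illustration under stated assumptions: the `SkillLibrary` class is hypothetical, and `embed` is a bag-of-words stand-in for a real embedding model, with cosine similarity used for ranking. The article only says that programs live in a vector database and that the top five skills are retrieved.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts (a real system would use a model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SkillLibrary:
    """Hypothetical skill store indexed by the embedding of each description."""

    def __init__(self) -> None:
        self._entries: List[Tuple[str, Counter, str]] = []

    def add(self, name: str, description: str, code: str) -> None:
        self._entries.append((name, embed(description), code))

    def retrieve(self, query: str, k: int = 5) -> List[str]:
        """Return the names of the k skills most similar to the query."""
        q = embed(query)
        ranked = sorted(self._entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [name for name, _, _ in ranked[:k]]


lib = SkillLibrary()
lib.add("fight_zombie", "fight a zombie with a sword", "// code omitted")
lib.add("fight_spider", "fight a spider with a sword", "// code omitted")
lib.add("mine_iron", "mine iron ore with a stone pickaxe", "// code omitted")
lib.add("collect_sand", "collect sand in the desert", "// code omitted")

print(lib.retrieve("fight a skeleton with a sword", k=2))
```

A query about fighting a new mob surfaces the two existing combat skills, which matches the article's point that skills learned for one enemy can be reused for a similar one.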

Third, the automatic curriculum proposes suitable exploration tasks based on the agent's current skill level and the state of the world.

For example, if the agent finds itself in a desert instead of a forest, it learns to collect sand and cactus rather than iron.

Specifically, the curriculum is generated by GPT-4 with the overarching goal of "discovering as many things as possible".

Automatic curriculum
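One way to picture this step is as a prompt built from the agent's state and handed to the model, which replies with the next task. The sketch below is an assumption-laden illustration: `AgentState`, `build_curriculum_prompt`, `next_task`, and the prompt wording are hypothetical, and `toy_model` is a rule-based stand-in for GPT-4 that simply mirrors the desert example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

GOAL = "discover as many things as possible"


@dataclass
class AgentState:
    biome: str
    inventory: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)


def build_curriculum_prompt(state: AgentState) -> str:
    """Summarize the agent's situation for the task-proposing model."""
    return "\n".join([
        f"Overall goal: {GOAL}",
        f"Biome: {state.biome}",
        f"Inventory: {', '.join(state.inventory) or 'empty'}",
        f"Completed tasks: {', '.join(state.completed) or 'none'}",
        "Propose the next task, matched to the agent's current skills.",
    ])


def next_task(state: AgentState, ask_model: Callable[[str], str]) -> str:
    return ask_model(build_curriculum_prompt(state)).strip()


def toy_model(prompt: str) -> str:
    """Rule-based stand-in for GPT-4, only to make the sketch runnable."""
    if "Biome: desert" in prompt:
        return "collect cactus" if "collect sand" in prompt else "collect sand"
    return "collect wood"


print(next_task(AgentState(biome="desert"), toy_model))
print(next_task(AgentState(biome="desert", completed=["collect sand"]), toy_model))
```

The proposed task changes as the list of completed tasks grows, which is the sense in which the curriculum adapts to the agent's progress.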

Experiments

Next, let's look at some experiments!

The team systematically compared Voyager in Minecraft with other LLM-based agent techniques, such as ReAct, Reflexion, and the popular AutoGPT.

Within 160 prompt iterations, Voyager discovered 63 unique items, 3.3 times more than the previous SOTA.

The automatic curriculum's pursuit of novelty naturally drives Voyager to travel widely. Even without explicit instructions, Voyager covers longer distances (2.3x) and visits more types of terrain.

By contrast, previous methods appear rather "lazy", often circling within a small area.

Map coverage

So how does the "trained model", the skill library, perform after lifelong learning?

The team cleared the agent's inventory and armor, created a new world, and tested the agent on tasks it had never seen before.

As you can see, Voyager solves tasks significantly faster than other methods.

It is worth noting that the skill library built through lifelong learning improves not only Voyager's performance but also AutoGPT's.

This shows that the skill library is a general tool that can be used in a plug-and-play fashion to improve performance.

Zero-shot generalization

The numbers in the figure above are the average number of prompt iterations over three trials. The fewer the iterations, the more efficient the method. As you can see, Voyager solved all of the tasks, while AutoGPT failed to solve them within 50 prompt iterations.

In addition, compared with other methods, Voyager is 15.3 times faster at unlocking wooden tools, 8.5 times faster at stone tools, and 6.4 times faster at iron tools. Voyager, with its skill library, is also the only method to unlock diamond tools.

Skill tree mastery (wooden tools → stone tools → iron tools → diamond tools)

Currently, Voyager only supports text, but it could be augmented with visual perception in the future.

In a preliminary study conducted by the team, humans act like an image captioning model and provide feedback to the agent.

This allows Voyager to build complex 3D structures, such as Nether portals and houses.

The results show that Voyager outperforms all of the alternatives. In addition, GPT-4 is significantly better than GPT-3.5 at code generation.

Ablation study

Conclusion

Voyager is the first LLM-driven embodied agent capable of lifelong learning. It uses GPT-4 to keep exploring the world, develop increasingly complex skills, and continually make new discoveries without human intervention.

Voyager shows superior performance in discovering new items, unlocking the Minecraft tech tree, traversing diverse terrain, and applying its learned skill library to unseen tasks in a newly generated world.

Voyager can serve as a starting point for developing general-purpose agents without tuning model parameters.

Reference:

https://voyager.minedojo.org/

This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era).
