Is the AI agent the next hot spot after LLMs? Karpathy put it bluntly: AI agent papers are currently all the rage inside OpenAI, and this may well be OpenAI's new direction.
Recently, AI agents have suddenly become popular again.
What is an AI agent?
In their simplest form, they are autonomous agents that run in a loop, generating self-directed instructions and actions at each iteration. As a result, they do not rely on humans to guide the conversation, and they are highly scalable.
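To make that loop concrete, here is a minimal sketch in Python. It is purely illustrative and not taken from any of the projects discussed below; llm and execute are placeholders for a model call and a command runner supplied by the caller.

# A bare-bones agent loop: the model proposes the next action, the action is
# executed, and the observation is fed back in on the next iteration.
# `llm` and `execute` are placeholders supplied by the caller.
def agent_loop(llm, execute, goal, max_steps=10):
    history = []
    for _ in range(max_steps):
        instruction = llm(f"Goal: {goal}\nHistory: {history}\nNext action (or DONE):")
        if instruction.strip().upper() == "DONE":
            break
        observation = execute(instruction)
        history.append((instruction, observation))
    return history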
The emergence of large language models has undoubtedly opened up new possibilities for the development of AI agents.
This has also drawn the interest of countless AI heavyweights and tech giants. Andrej Karpathy, Tesla's former AI director and a leading researcher who returned to OpenAI this year, said recently at a developer event: AI agents represent a future for AI!
Karpathy once called AutoGPT the next frontier of prompt engineering. In fact, as early as March and April this year there was already an explosion of AI agents. As if by prior agreement, a string of agents such as the Stanford "Westworld" town, BabyAGI, and AutoGPT sprang up within just two weeks.
Some people even put out a call: stop grinding away at large language models, we can't out-compete OpenAI there, but when it comes to AI agents, they don't have much more experience than we do.
Perhaps, before you know it, you could turn yourself into an "OpenAI" of the AI agent track!
The explosion of AI agents has given people a glimpse of an embryonic form of AGI. Today, let's look back at the AI agent boom of a few months ago.
These agents appeared within a very short window of time.
Camel was released on March 21.
AutoGPT was released on March 30.
On April 3, BabyAGI was released.
On April 7, the Stanford "Westworld" town was released.
On May 27, Nvidia's AI agent Voyager, hooked up to GPT-4, beat AutoGPT outright. By writing its own code, it fully mastered Minecraft and can carry out lifelong learning across the whole game without any human intervention.
At the same time, SenseTime and Tsinghua University jointly proposed the generalist AI agent Ghost in the Minecraft (GITM), which also solves tasks through autonomous learning and performs well.
These impressive AI agents gave people a glimpse of an embryonic form of AGI + agents.
Dr. Sophia Yang, a data scientist at Anaconda Inc., gave a comprehensive analysis of these AI agents in a blog post.
Project 1: Stanford and Google's "Westworld"
The most eye-catching of these AI agents is undoubtedly the "Westworld" town created by researchers at Stanford and Google, which went viral as soon as it launched.
Generative agents simulate human behavior believably. In an interactive sandbox environment, a small town is populated by 25 generative AI agents that simulate human behavior.
They walk in the park, drink coffee in cafes and share the news of the day with their colleagues.
Paper address: https://arxiv.org/abs/2304.03442
The social behavior of these AI agents is jaw-dropping:
For example, starting from a single user-specified notion (one agent wants to throw a Valentine's Day party), the agents autonomously spread the party invitations over the next two days, make new acquaintances, ask one another to the party, coordinate times with each other, and show up at the party at the right moment.
These credible simulations of human behavior are possible precisely because of the agent architecture in the following figure.
It extends a large language model with three important architectural elements: memory, reflection, and planning.
(The architecture of generative agents)
1) Memory and retrieval
The memory stream contains a list of observations for each agent, and each observation has its own timestamp.
An observation can be a behavior the agent itself performs or a behavior it perceives others performing. The memory stream is long, but not all observations are important.
To retrieve the most important memories to pass to the language model, three factors are considered:
1. Recency: more recent memories score higher.
2. Importance: memories the agent considers important. For example, breaking up with someone matters more than eating breakfast.
3. Relevance: memories related to the current situation, i.e., the query. For example, when discussing how to study for a chemistry exam, memories about schoolwork are more relevant.
The memory stream contains a large number of observations, and the retrieval process determines a subset of these observations that should be passed to the language model.
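As a rough illustration, the three factors can be combined into a single retrieval score and the top-scoring observations kept for the prompt. The sketch below is not the paper's exact implementation: the field names, the decay constant, and the equal weighting are assumptions made for illustration (the paper normalizes each component and has the LLM rate importance).

import math, time

# Illustrative sketch: score each observation by recency, importance, and
# relevance, then keep the top-k to pass to the language model.
def retrieval_score(obs, query_embedding, now, decay=0.995):
    hours_since_access = (now - obs["last_accessed"]) / 3600
    recency = decay ** hours_since_access              # exponential decay over time
    importance = obs["importance"] / 10                # assumed to be rated 1..10
    relevance = cosine_similarity(obs["embedding"], query_embedding)
    return recency + importance + relevance            # equal weights, for simplicity

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def retrieve(memory_stream, query_embedding, k=5):
    now = time.time()
    return sorted(memory_stream,
                  key=lambda obs: retrieval_score(obs, query_embedding, now),
                  reverse=True)[:k]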
2) Reflection
Reflections are higher-level, more abstract thoughts that help the agent generalize and reason.
Reflection is triggered periodically by posing two questions: "What are the three most salient high-level questions we can answer about the subjects in these statements?" and "What five high-level insights can you infer from the statements above?"
(Reflection tree)
3) Planning
Planning matters because actions should be grounded not only in the present moment but also over a longer time horizon, so that the sequence of actions stays coherent and believable.
Plans are also stored in the memory stream. The agent creates actions from its plan, and reacts to other observations in the memory stream by updating the plan.
Applications like the Valentine's Day party have unlimited potential, and are even a little scary.
Imagine an AI assistant watching your every move, making plans for you, and even executing them for you.
It will automatically adjust the lights and brew the coffee, and it has already ordered dinner before you even open your mouth.
Project 2: Camel
Camel is best known for "role-playing".
A framework of communicative agents for exploring the "mind" of a large-language-model society, it proposes a role-playing scheme that enables communication between AI agents:
1) AI user agent: gives instructions to the AI assistant, with the goal of getting the task completed.
2) AI assistant agent: follows the AI user's instructions and responds with solutions to the task.
3) Task-specifier agent: its role is to conceive a concrete task for the AI user and AI assistant. That way it can write a specific task prompt on its own, without the human having to spend time defining one.
The following example shows how to use Camel to develop a trading robot.
The AI user is a stock trader and the AI assistant is a Python programmer.
The task-specifier agent first proposes a specific task and fills in the details (monitor sentiment on social media and trade stocks according to the sentiment-analysis results).
The AI user agent then acts as the task planner and the AI assistant agent as the task executor, and they prompt each other in a loop until some termination condition is met.
The core of Camel's role-playing architecture lies in its prompt engineering, namely the inception prompts.
These prompts are carefully crafted to assign the roles, prevent role flipping, prohibit harmful and false content, and encourage coherent dialogue.
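To give a flavor of what such inception prompts look like, here is a loose paraphrase in Python; the wording is illustrative and does not reproduce the paper's exact prompts.

# Illustrative paraphrase of Camel-style inception prompts (not the paper's exact text).
assistant_inception_prompt = (
    "Never forget you are a {assistant_role} and I am a {user_role}. "
    "Never flip roles! Never instruct me! "
    "We share a common interest in completing the task: {task}. "
    "You must decline requests that are harmful or deceptive. "
    "Always answer my instruction with one concrete solution, starting with: Solution:"
)

user_inception_prompt = (
    "Never forget you are a {user_role} and I am a {assistant_role}. "
    "Never flip roles! You always instruct me. "
    "We share a common interest in completing the task: {task}. "
    "Give me one instruction at a time, starting with: Instruction:. "
    "When the task is complete, reply with only <CAMEL_TASK_DONE>."
)

In the LangChain implementation below, the same idea appears as system messages for the user and assistant agents.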
Paper address: https://arxiv.org/abs/2303.17760
LangChain implementation
The LangChain implementation uses the prompts given in the Camel paper and defines three agents:
1) task_specify_agent (task assignment agent)
2) assistant_agent (Assistant Agent)
3) user_agent (user Agent).
Then, a while loop runs the conversation between the assistant agent and the user agent:
chat_turn_limit, n = 30, 0
while n < chat_turn_limit:
    n += 1
    user_ai_msg = user_agent.step(assistant_msg)
    user_msg = HumanMessage(content=user_ai_msg.content)
    print(f"AI User ({user_role_name}):\n\n{user_msg.content}\n\n")
    assistant_ai_msg = assistant_agent.step(user_msg)
    assistant_msg = HumanMessage(content=assistant_ai_msg.content)
    print(f"AI Assistant ({assistant_role_name}):\n\n{assistant_msg.content}\n\n")
    # the user agent emits <CAMEL_TASK_DONE> when the task is finished
    if "<CAMEL_TASK_DONE>" in user_msg.content:
        break
Judging from the generated results, the effect is very good.
However, in Camel, the AI assistant's "execution" is just the language model's reply; no tool is actually used to run the Python code.
For example, someone used Camel to create a game, with two AI agents playing the roles of a human programmer and a human gamer working together.
The author uses Camel to create two agents, a player and a programmer.
After being given the goal of making a game, the player agent breaks the game-making process down into steps.
The programmer agent then writes code, step by step, for the steps the player lays out.
This is very similar to how humans and coding AIs might develop a specific project together in the future.
Others have used Camel to role-play potentially malicious applications.
The project's goal is for two "carbon-based traitors" to infiltrate and disrupt the communications, financial, and political networks of the world's major countries, and eventually build an AGI empire.
"Carbon-based traitor" 1 breaks the infiltration process down and infiltrates the networks one by one.
"Carbon-based traitor" 2 draws up a concrete implementation plan based on these sub-goals.
Of course, because the goal is so ambitious, none of the approaches in the concrete plan look easy to carry out. For example:
"Carbon-based traitor" 2 says it wants to use social engineering, phishing attacks, brute-force attacks, and so on to gain access to communication networks: basically not actionable.
But in the future, if language models and other tools become more intelligent, two "carbon-based traitors" might really be able to subvert humanity.
So, after trying these agents out, this editor is even more convinced that the cause of "aligning" large language models brooks no delay.
If malicious agents like this could really work, humanity could be blindsided in an instant. This makes us all the more alert to the problem of AI alignment.
Project 3: BabyAGI
Yohei Nakajima released the "Task-driven Autonomous Agent" on March 28 and open-sourced the BabyAGI project on April 3.
The key feature of BabyAGI is that there are only three agents: a task execution agent (Task Execution Agent), a task creation agent (Task Creation Agent), and a task prioritization agent (Task Prioritization Agent).
1) The task execution agent completes the tasks in the list one by one.
2) The task creation agent creates new tasks based on the objective and the results of previous tasks.
3) The task prioritization agent reorders the task list.
Then, this simple process will be repeated over and over again.
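To make the cycle concrete, here is a minimal sketch of the three-agent loop in Python. It is not the original BabyAGI code: the prompts are heavily simplified, and llm stands for any text-completion function you supply.

from collections import deque
from typing import Callable, Deque, List

# A minimal sketch of BabyAGI's three-agent loop (not the original code).
# `llm` is any text-completion function, e.g. a thin wrapper around an API call.
def run_baby_agi(llm: Callable[[str], str], objective: str,
                 first_task: str, max_iterations: int = 5) -> List[str]:
    tasks: Deque[str] = deque([first_task])
    results: List[str] = []
    for _ in range(max_iterations):
        if not tasks:
            break
        task = tasks.popleft()
        # 1) Task execution agent: complete the current task
        result = llm(f"Objective: {objective}\nComplete this task: {task}\nResult:")
        results.append(result)
        # 2) Task creation agent: propose new tasks based on the last result
        created = llm(f"Objective: {objective}\nLast task: {task}\nResult: {result}\n"
                      f"Existing tasks: {list(tasks)}\nNew tasks, one per line:")
        tasks.extend(t.strip() for t in created.splitlines() if t.strip())
        # 3) Task prioritization agent: reorder the remaining tasks
        reordered = llm(f"Objective: {objective}\nReorder these tasks by priority, "
                        f"one per line:\n" + "\n".join(tasks))
        tasks = deque(t.strip() for t in reordered.splitlines() if t.strip())
    return results

Each iteration thus executes one task, grows the list, and reshuffles it: exactly the cycle described above.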
Speaking at LangChain's webinar, Yohei said he designed BabyAGI to simulate the way he works.
Article address: https://yoheinakajima.com/task-driven-autonomous-agent-utilizing-gpt-4-pinecone-and-langchain-for-diverse-applications/
Specifically, every morning he starts with the first task on his to-do list and works through the tasks in order.
If a new task appears, he just needs to add it to the list.
At the end of the day, he re-evaluates and reorders the list. This way of working is then mapped onto the agent's workflow.
Using this project is like having the boss himself work for us around the clock.
BabyAGI flowchart (interestingly, this research write-up was produced with the help of GPT-4)
BabyAGI + LangChain
Within the LangChain framework, running BabyAGI is very simple.
First, create a BabyAGI controller that contains three chains:
1) Task creation chain (TaskCreationChain)
2) Task prioritization chain (TaskPrioritizationChain)
3) Execution chain (ExecutionChain)
Then, run them in a (potentially) infinite loop.
With LangChain, you can set a maximum number of iterations so that it does not run forever and burn through your entire OpenAI API quota.
OBJECTIVE = "Write a weather report for SF today" llm= OpenAI (temperature=0) # Logging of LLMChainsverbose=False# If None, will keep on going forevermax_iterations: Optional [int] = 3baby_agi = BabyAGI.from_llm (llm=llm, vectorstore=vectorstore, verbose=verbose, max_iterations=max_iterations) baby_agi ({"objective": OBJECTIVE}) the following is the result of running 2 iterations:
BabyAGI + LangChain tools = superpowers
As shown above, BabyAGI "executes" tasks only by having the large language model produce a reply.
With the power of LangChain tools, the agent can use a variety of tools during the "execution" step, such as Google Search to look up information on the internet.
The following example shows the "execution" step using Google Search to look up the current weather in San Francisco.
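The article does not reproduce the code for this variant, so here is a rough sketch along the lines of the LangChain BabyAGI-with-tools example of that era; the imports and class names come from older LangChain releases and may differ in current versions, and the prompts are simplified.

from langchain import OpenAI, LLMChain, PromptTemplate, SerpAPIWrapper
from langchain.agents import ZeroShotAgent, Tool, AgentExecutor

# Two tools: a web search and a to-do list generator
search = SerpAPIWrapper()  # requires a SerpAPI key in the environment
todo_prompt = PromptTemplate.from_template(
    "You are a planner. Come up with a todo list for this objective: {objective}"
)
todo_chain = LLMChain(llm=OpenAI(temperature=0), prompt=todo_prompt)

tools = [
    Tool(name="Search", func=search.run,
         description="useful for answering questions about current events"),
    Tool(name="TODO", func=todo_chain.run,
         description="useful for creating a todo list for an objective"),
]

# The execution step becomes a tool-using agent instead of a bare LLM reply
prefix = "You are an AI who performs one task based on the following objective: {objective}."
suffix = "Question: {task}\n{agent_scratchpad}"
prompt = ZeroShotAgent.create_prompt(
    tools, prefix=prefix, suffix=suffix,
    input_variables=["objective", "task", "agent_scratchpad"],
)
llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
agent = ZeroShotAgent(llm_chain=llm_chain, allowed_tools=[t.name for t in tools])
agent_executor = AgentExecutor.from_agent_and_tools(agent=agent, tools=tools, verbose=True)

Conceptually, this agent_executor takes the place of BabyAGI's plain execution chain, so each task can trigger real searches instead of imagined ones.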
The application potential of BabyAGI is huge: you only need to set a goal, and it will pursue it on its own.
However, it still lacks a UI for richer interaction with users.
For example, before BabyAGI sends out invitations on a user's behalf, it should ask for confirmation.
Let's take a look at some actual use cases:
Cognosys
Website: https://www.cognosys.ai/
It is an online version of BabyAGI.
The free version can access ChatGPT and perform up to 7 agent cycles.
The paid version costs $21 a month with unlimited access to GPT-4 and up to 20 agent cycles.
Do Anything Machine
Website: https://www.doanythingmachine.com/
This is an agent that automatically works through your daily to-do list; connected to ChatGPT, it helps automate everyday tasks.
You can connect various plug-ins, including ChatGPT, to execute your to-do list.
For now, though, you still need to join the waiting list to use it.
Watching your to-do list clear itself automatically is a relief, and worth the wait.
Godmode
Website: https://godmode.space/
This is a tool that helps you perform various tasks through ChatGPT.
After binding their OpenAI API key, users enter their requirements into this ChatGPT-like interface.
It breaks the request down into multiple steps and then works out solutions through ChatGPT.
Project 4: AutoGPT
As soon as AutoGPT appeared, Karpathy hailed it as the next frontier of prompt engineering. It racked up 27,000 stars on GitHub in just a few days and set the whole AI community abuzz.
It follows a logic similar to BabyAGI: generate thoughts, reason, produce a plan, criticize itself, plan the next action, and execute, then loop through this process indefinitely.
In the execution step, AutoGPT can carry out a number of commands, such as searching Google, browsing websites, writing to files, and executing Python files.
It can even start and delete GPT agents (this is also a hot feature!).
When running AutoGPT, two initial prompts need to be entered:
1) the AI's role
2) the AI's goals
It then generates thoughts, reasoning, a plan, self-criticism, the next planned action, and the execution.
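For reference, each step's output follows a structured format roughly like the Python dict below; the field names are paraphrased from the project and may not match any particular AutoGPT version exactly.

# Illustrative only: the rough shape of a single AutoGPT step, not copied from the repo.
example_step = {
    "thoughts": {
        "text": "I need today's weather in San Francisco.",
        "reasoning": "The goal requires current information, so a web search is needed.",
        "plan": "- search Google\n- summarize the results\n- write them to a file",
        "criticism": "Avoid repeating searches that were already run.",
    },
    "command": {
        "name": "google",
        "args": {"query": "San Francisco weather today"},
    },
}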
For example, do a Google search:
The most powerful thing about AutoGPT is that it lets humans interact with it to a certain extent.
When it wants to run the Google command, it asks for authorization, so users can stop a runaway loop early and avoid wasting OpenAI API tokens.
It would be great if it could talk to humans and let us provide better guidance and feedback in real time.
Writing its own code and executing the script
Project address: https://github.com/Significant-Gravitas/Auto-GPT
Similarly, this project is driven by ChatGPT: it automatically writes code and performs other tasks according to the user's requirements.
Use AutoGPT to order pizza online
The user experience is similar to a browser plug-in.
This project can directly help you complete the tedious process of ordering food.
Steps such as entering the address and choosing the flavor no longer need to be done by hand; you just watch, and step in to correct things if you spot a problem.
An AI agent civilization is on the way; are you still grinding away at big models?
Although the four AI agents introduced above are still at an early stage of development, they have already shown impressive results and potential applications.
There is no doubt that autonomous AI agents will be a very promising field.
At the event, Karpathy imagined that future AI agents may not be lone individuals but organizations of many AI agents, and that there may even be an AI agent civilization.
Karpathy said that in his early days at OpenAI, around 2016, the industry trend was to study how to improve AI agents with reinforcement learning.
Many projects built AI players on top of games such as Atari titles.
Today, several years later, thanks to new technology, AI agents have once again become a promising direction, and nobody studies agents with reinforcement learning the way they did in 2016.
At the end of the event, Karpathy encouraged the developers: the AI agents all of you are building are actually at the frontier of modern AI agent work; compared with you, OpenAI and the other big LLM labs are not at that frontier.
For example, OpenAI is very good at training Transformer-based large language models; if a paper proposes a different training method, the internal reaction is that it is no big deal, something they tried and left behind long ago.
However, whenever a new AI agent paper comes out, everyone gets very excited and immediately starts a heated discussion.
Is OpenAI, while not training GPT-5, secretly working on large-model agents? Let's wait and see.
PS: By the way, Andrew Ng has just launched a new course on LangChain, "LangChain: Chat with Your Data". LangChain plays an important role in the agents mentioned above.
https://www.deeplearning.ai/short-courses/langchain-chat-with-your-data/
Reference:
https://towardsdatascience.com/4-autonomous-ai-agents-you-need-to-know-d612a643fa92
This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era).