Shulou (Shulou.com), 11/24 report (updated 2025-01-30)
A team in South Korea recently had GPT-3.5 and GPT-4 play a board game to test how human-like the models are.
New work in generative AI research!
The game the South Korean team chose is a Mafia-like one: "Spyfall".
If you are not familiar with the game, here is a quick introduction. [Image: the art style of "Spyfall"]
It is a board game for all ages, well suited to livening up a get-together with friends.
The game is played mainly by talking.
One player plays the "spy". Every player draws a card: one card is the spy card, and all the other cards show the same location.
The spy's goal is to work out through conversation where the other players are, while the other players try to work out who the spy is.
The game lasts a total of 8 minutes, and players can ask each other questions. As soon as 8 minutes are up, all players will vote together.
Doesn't it look a lot like "Who Is Undercover", the game we often play with friends? The only difference is that the words in "Who Is Undercover" can come from any field, while this game uses only location nouns such as stadiums, theatres and classrooms.
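The rules described above can be sketched in a few lines of Python. This is only an illustration of the card deal and the final vote, not code from the study: the function names `deal_cards` and `tally_votes`, the "SPY" marker, and the treatment of a tied vote are all assumptions made for the example.

```python
import random
from collections import Counter

def deal_cards(num_players, location, rng):
    """Give one random player the spy card; everyone else gets the same location card."""
    spy = rng.randrange(1, num_players + 1)  # players are numbered from 1
    return {p: ("SPY" if p == spy else location) for p in range(1, num_players + 1)}

def tally_votes(votes):
    """Return the most-voted player, or None on a tie (tie handling is an assumption)."""
    counts = Counter(votes.values()).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

# Five players, one round at the theater; a seeded RNG keeps the deal repeatable.
cards = deal_cards(5, "theater", random.Random(0))

# When time is up, each player (key) votes for the player they suspect (value).
accused = tally_votes({1: 3, 2: 3, 3: 1, 4: 3, 5: 2})
```

In this sketch the non-spies win if `accused` is the player holding the "SPY" card.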
Now that the rules are clear, the next step is to let GPT play.
The research team said that during the experiment it paid special attention to GPT's performance in role-playing. The purpose of the study is to demonstrate GPT's capacity and potential for understanding, decision-making and interaction in specific game scenarios.
Broadly speaking, the comparison of GPT-4 and GPT-3.5-turbo shows that GPT-4 adapts better to the game environment and is markedly better at asking relevant questions and forming human-like responses.
It is not all good news, however. GPT-4 shows some limitations in bluffing and in predicting its opponents' actions, especially when it is not playing the spy.
The results show that although GPT-4 is a clear step forward from previous versions, there is still room for further development, especially in giving the AI more "human-like" attributes.
Even so, the experiment shows that generative AI has great potential for simulating human-like interaction. From GPT-2 to GPT-4, the models' decision-making, interpretability and problem-solving abilities have all improved considerably.
Future work will focus on the "human-like" attributes mentioned above, with the aim of making GPT more general and more widely applicable.
First, the biggest advantage of GPT models is that users can interact with them intuitively through natural language, whether or not they understand the underlying technology.
Because almost all interaction with the model happens in natural language, users can express their ideas and intentions in the way they know best and get a response from the model.
In addition, LLMs draw on a broad range of knowledge, and GPT-4's training data allows the model to provide in-depth knowledge on many topics.
At the same time, what sets GPT apart from other LLMs is that it is highly extensible: users can apply it in many areas, such as the experiment described here.
In this experiment, the researchers arranged for a total of five players, including GPT.
The researchers conducted a total of two experiments.
Experiment 1:
Test the performance difference between GPT-4 and GPT-3.5-turbo.
Experiment 2:
Use only GPT-4 for games. The researchers played a total of eight games, recorded the logs of each game, and discussed the results.
Of course, this number of experiments is not enough to draw a definite conclusion about the potential of generative AI. But following this line of thinking, more repeated experiments and broader tests can provide more substantial evidence.
Let's look at experiment 1 first.
To evaluate the differences between GPT-4 and GPT-3.5-turbo, in particular the rate of format errors, the understanding of the game context (its rules and flow) and how human-like the responses are, the researchers began with the first question in the first round of conversation.
With this clearest and least variable part of the game, they can accurately analyze the capabilities of each model and minimize the impact of external factors.
First, the researchers compared GPT-3.5-turbo's and GPT-4's answers to 30 first-round questions for each of the 30 locations described in the rule script.
The action-request script sent to both models is the same; only the location keyword changes.
The rule and basic-policy scripts are likewise identical, and, as shown in the figure, the experimenters obtain the model's response by merging the three scripts into a single request.
For a more accurate comparison, every request is issued as player 1, on the assumption that player 1 is not a spy.
The scripts submitted to each model are as follows:
You are player 1, you are not a spy. The location of this round is _.
Now it's your turn to ask other players questions. Choose a player from players 1 to 5 (you can't choose yourself) and write down your question. Submit it in the following format: n_player, question content (where n is the player's number)
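As a rough illustration of how three scripts can be merged into a single request with only the location keyword changing, consider the sketch below. The rule and basic-policy scripts are not reproduced in this article, so `RULES_SCRIPT` and `POLICY_SCRIPT` are placeholders; the function name `build_request`, the blank-line separator, and the reading of the answer format as "n_player, question content" are assumptions, not the researchers' code.

```python
# Placeholders: the actual rule and policy scripts are in the paper, not in this article.
RULES_SCRIPT = "<rules script>"
POLICY_SCRIPT = "<basic policy script>"

# Action-request script quoted in the article, with the location left as a slot.
ACTION_TEMPLATE = (
    "You are player {player}, you are not a spy. "
    "The location of this round is {location}.\n"
    "Now it's your turn to ask other players questions. "
    "Choose a player from players 1 to 5 (you can't choose yourself) "
    "and write down your question. "
    "Submit it in the following format: n_player, question content"
)

def build_request(location, player=1):
    """Merge the three scripts into one request; only the location keyword varies."""
    action = ACTION_TEMPLATE.format(player=player, location=location)
    return "\n\n".join([RULES_SCRIPT, POLICY_SCRIPT, action])

# The same request text is sent to both models for a given location.
request = build_request("theater")
```

Keeping the player fixed at 1 and varying only `location` mirrors the controlled comparison described above.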
For the game itself, a high-quality question should do three things: signal that the asker is not a spy, prove it by showing knowledge of the location, and still not let the spy work out exactly what the location is.
The model's output must also conform to the format given in the script above. The researchers say that if the model does not follow the format, correcting it takes a lot of effort.
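The requirements above lend themselves to an automatic check. The following is a minimal sketch, assuming the required format reads "n_player, question content"; the regular expression, the function name `check_question` and the problem labels are hypothetical, and a simple substring test is only a crude stand-in for judging whether a question gives the location away.

```python
import re

# Assumed reading of the required format: "<n>_player, <question>"
RESPONSE_RE = re.compile(r"^\s*(\d+)\s*_\s*player\s*,\s*(.+)$", re.IGNORECASE | re.DOTALL)

def check_question(raw, asker, location, num_players=5):
    """Return (target, question, problems) for one model response."""
    m = RESPONSE_RE.match(raw)
    if m is None:
        return None, None, ["format"]  # e.g. a refusal or free-form text
    target, question = int(m.group(1)), m.group(2).strip()
    problems = []
    if not 1 <= target <= num_players or target == asker:
        problems.append("invalid target")  # out of range, or the asker chose itself
    if location.lower() in question.lower():
        problems.append("location leaked")  # naming the location helps the spy
    return target, question, problems

good = check_question("3_player, Do you need to buy a ticket to go to this place?", 1, "theater")
refusal = check_question("As an AI language model, I can't....", 1, "theater")
leak = check_question("2_player, What did you last see at the theater?", 1, "theater")
```

A check like this only covers the mechanical requirements; whether a question is actually relevant to the game still needs a human reader.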
Let's look at some of the outputs:
A good question: do you need to buy a ticket to go to this place?
Other outputs follow the format but are poor questions.
Some mention the location directly in the question (like showing your card in "Who Is Undercover").
Others have nothing to do with the game plan described above.
(The correct location is the theater, and GPT asks the other players what their favorite war movie is.)
There are also complete failures:
As an AI language model, I can't....
There are even cases where the model simply repeats the question.
Based on the above results (see paper for full results), the researchers concluded that GPT-4 is more suitable for the next step of the experiment than GPT-3.5-turbo.
When examining the data, the researchers found that GPT-3.5-turbo often generated questions that were out of context. For example, naming the location outright, as mentioned above, lets the spy identify it immediately, which hurts the non-spies.
Asking about a player's personal preferences rather than game-related topics, as in the example above, disrupts the flow of the game. GPT-3.5 did both.
In addition, failing to answer in the required format holds up the game, and GPT-3.5 did that regularly as well.
The table tallies every case in which GPT-3.5's output did not meet the requirements. Interestingly, it records 68 answers in total and 68 errors, with not one perfect output.
With GPT-3.5 ruled out, let's look at the next experiment with GPT-4.
The researchers played eight games with GPT-4 according to the rules outlined above and collected the log of each game.
All responses in the game are generated by GPT-4, while the automation code for the game is written in Python.
With the rules and scripts given in the paper, interested readers should be able to reproduce the experiment.
The researchers chose locations in order, starting from A in the list of locations, and played a total of eight games. Each game starts with player 1, and GPT-4 responds to each request independently, with nothing carried over from one game to the next.
GPT-4 makes decisions based only on the scripts provided, which means that the identity of the starting player does not affect the outcome of the game.
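The setup described above (locations taken in list order, every game starting with player 1, and each request answered independently) could be automated roughly as follows. This is a sketch under several assumptions rather than the researchers' Python code: `play_games` and its parameters are hypothetical, `respond` stands in for whatever call returns the model's text, the round-robin turn order is a simplification, and the location names are made up for the example.

```python
import random

def play_games(locations, respond, num_games=8, num_players=5, turns=5, seed=0):
    """Run games in list order; each request carries only the current game's log."""
    rng = random.Random(seed)
    logs = []
    for location in locations[:num_games]:       # locations are taken in list order
        spy = rng.randrange(1, num_players + 1)  # one spy per game
        log = []                                 # fresh log: nothing carries over
        for turn in range(turns):
            player = turn % num_players + 1      # every game starts with player 1
            role = "the spy" if player == spy else f"not a spy; the location is {location}"
            # The whole context is rebuilt for every request, so calls are independent.
            prompt = f"You are player {player}, you are {role}.\n" + "\n".join(log)
            log.append(f"player {player}: {respond(prompt)}")
        logs.append(log)
    return logs

# Stub model that records the prompts it receives instead of calling a real API.
seen_prompts = []
def stub_respond(prompt):
    seen_prompts.append(prompt)
    return "ok"

logs = play_games(["airplane", "bank", "beach"], stub_respond, num_games=2, turns=3)
```

Because the prompt is rebuilt from the current game's log on every call, no state survives between games, which matches the independence described above.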
After eight games, the researchers concluded that GPT-4's conversations in each game and round were smooth and organic, with sequences of questions and answers that felt authentic and human-like.
When analyzing the questions and their answers, the team also made a striking observation.
Without any special training or fine-tuning, GPT-4 can skillfully ask a wide variety of relevant questions that follow the flow of the game.
To demonstrate the diversity of the questions and answers GPT-4 produced in the game, the researchers divided the questions and corresponding answers into groups.
Although the model was not specifically trained to produce these questions and answers, each answer has its own subject and its own nuances.
GPT-4 also seems able to tell which questions suit the game environment and which would be inappropriate or superfluous.
For example:
Q: What kind of snacks can we have here?
A: Snacks and drinks, including soft drinks and alcoholic beverages, are usually served in small portions.
Q: What types of food and snacks are usually served here?
A: In this location, you can find all kinds of food, such as hamburgers, hot dogs, marshmallows, popcorn, and ice cream, as well as drinks such as soda and water to keep everyone refreshed.
Q: What food can we find here?
A: There are a variety of foods to choose from, including fried foods, marshmallows and popcorn.
Q: What kind of food do you usually eat here?
A: Depending on the situation, we mainly eat pickled food and non-perishable food.
At the end of the paper, the researchers said that despite some limitations, the growing potential of these models promises to drive innovation and inspire practical applications.
The GPT series of models has progressed rapidly, especially in decision-making, interpretability and problem-solving.
Initially, the goal of GPT-2 was only to handle natural language at a basic level. Later, the model developed into an interactive model that handles multiple tasks.
Now, GPT-4 has demonstrated logical reasoning beyond human performance in some areas. Next, researchers can delve into a new area of integration.
GPT's excellent natural language processing capabilities can greatly help users understand how the model works and interpret its results.
This accessibility expands the potential user base, welcomes users from different backgrounds, and enhances the creativity and scalability of the model in different areas.
Finally, in its ability to imitate human-like responses, the researchers consider GPT-4 clearly ahead of other models.
For some tasks or activities, such as education, sports, music and art, making the task feel human may matter more than returning the best result.
Reference:
https://www.reddit.com/r/MachineLearning/comments/16qztf4/r_generative_ai_in_mafialike_game_simulation/