Shulou (Shulou.com), 11/24 report --
From Go to video games, DeepMind has beaten human masters one after another. But this game AI has "learned" something new from Arena of Valor.
On November 28, NeurIPS 2022 officially opened.
As one of the most prominent artificial intelligence conferences in the world, NeurIPS is the focus of computer science at the end of each year. The papers it accepts represent the highest level of research in neural information processing and artificial intelligence today, and also reflect shifts in industry trends.
Interestingly, this year's accepted research seems to have a special fondness for games.
For example, Fei-Fei Li's team won the award for best Datasets and Benchmarks paper with MineDojo, built on the Minecraft game environment. Thanks to the game's openness, researchers can train agents on a wide variety of tasks in MineDojo, giving AI more general capabilities.
Another paper that made it past the strict acceptance bar also belongs to the gaming field, and it may concern many gamers.
After all, who hasn't played Arena of Valor?
In this paper, "Arena: A Generalized Environment for Competitive Reinforcement Learning" (https://openreview.net/pdf?id=7e6W6LEOBg3), the researchers propose a test environment based on the MOBA game Arena of Valor. Its purpose is similar to MineDojo's: training AI.
Why are MOBA game environments so favored? Ever since DeepMind launched AlphaGo, games, as virtual environments with high degrees of freedom and high complexity, have been an important choice for AI research and experimentation.
However, unlike humans, who can keep learning from open-ended tasks, agents trained in games of lower complexity cannot generalize their abilities beyond specific tasks. Put simply, such an AI can only play chess, or only play classic Atari games.
To develop a more "general" AI, the focus of academic research has gradually shifted from board games to more complex ones, including imperfect-information games (such as poker) and strategy games (such as MOBA and RTS titles).
At the same time, as Fei-Fei Li's team noted in their award-winning paper, for an agent to generalize across more tasks, the training environment must supply enough tasks.
DeepMind, invincible at Go with AlphaGo and its derivative AlphaZero, realized this early on.
In 2016, DeepMind teamed up with Blizzard to launch the StarCraft II Learning Environment (SC2LE), whose state space has a complexity of about 10^1685. SC2LE provides researchers with action and reward specifications for agents, as well as an open-source Python interface for communicating with the game engine.
In China, there is also an eminently qualified "AI training ground":
As a well-known MOBA game, Arena of Valor has a player action-state space as large as 10^20000, far exceeding Go and other games, and even the total number of atoms in the universe (about 10^80).
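For a sense of scale, the two magnitudes quoted above can be compared on a log10 scale (a trivial sketch; the raw numbers are far beyond ordinary floating point):

```python
# Putting the quoted magnitudes side by side on a log10 scale.
log10_hok = 20000   # Arena of Valor action-state space (from the article)
log10_atoms = 80    # atoms in the universe (from the article)
print(log10_hok - log10_atoms)   # the game is 19920 orders of magnitude larger
```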
Like DeepMind, Tencent AI Lab has teamed up with Arena of Valor to build the "Arena of Valor AI Open Research Environment," which is better suited to AI research.
At present, the Arena of Valor AI Open Research Environment includes a 1v1 combat environment and baseline algorithm models, and supports mirror and non-mirror combat tasks for 20 heroes.
Specifically, considering only the choice of heroes on the two sides, the environment can support 20 × 20 = 400 combat subtasks; if summoner skills are also taken into account, there are 40,000 subtasks.
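The counting above can be reproduced in a few lines; note that the figure of 10 summoner-skill options per player is our assumption, chosen because it recovers the 40,000 total quoted here:

```python
# Reproducing the subtask arithmetic above. The count of summoner-skill
# options per player (10) is an assumption that recovers the quoted total.
n_heroes = 20
hero_pairings = n_heroes * n_heroes        # 20 x 20 = 400 hero-vs-hero subtasks
n_skills = 10                              # assumed summoner-skill choices per side
with_skills = hero_pairings * n_skills * n_skills
print(hero_pairings, with_skills)          # 400 40000
```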
To better understand the generalization challenges agents face in the Arena of Valor AI Open Research Environment, we can look at two tests from the paper.
First, build a behavior-tree AI (BT) at an entry-level "Gold" standard. Against it, an agent (RL) is trained with a reinforcement learning algorithm.
In the first experiment, Diao Chan (RL) was trained only against Diao Chan (BT); the trained RL Diao Chan was then sent to challenge different BT heroes.
The results after 98 rounds of tests are as follows:
When the opponent hero changes, the performance of the same trained policy drops sharply: changing the opponent makes the test environment differ from the training environment, and the policies learned by existing methods lack generalization.
Figure 1: the cross-opponent generalization challenge
In the second experiment, Diao Chan (RL) was again trained only against Diao Chan (BT); the trained RL model was then used to control other heroes against Diao Chan (BT).
The results after 98 rounds of tests are as follows:
When the hero controlled by the model changed from Diao Chan to others, the performance of the same trained policy dropped sharply: changing the controlled hero changes what each action means relative to the training environment, where the model played Diao Chan.
Figure 2: the cross-target generalization challenge
The reason for this result is simple: every hero has unique mechanics to master. Handed a new hero, an agent trained on a single hero has no idea how to play it and can only fumble in the dark.
Human players are much the same: someone who dominates mid lane may not post a good KDA after switching roles.
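The two tests above can be sketched as a simple evaluation loop. This is an illustrative skeleton only: `play_match` is a placeholder returning dummy win rates, and the hero names besides Diao Chan are invented; the real study runs 98 rounds per pairing in the game environment.

```python
# Schematic sketch of the two generalization tests described above.
import random

HEROES = ["diaochan", "hero_a", "hero_b"]  # illustrative roster, not real names

def play_match(rl_hero, bt_hero, n_rounds=98):
    """Placeholder: in the real study this runs n_rounds games between
    the RL-controlled hero and the behavior-tree hero and returns the
    RL side's win rate. Here it just produces a deterministic dummy value."""
    rng = random.Random(hash((rl_hero, bt_hero)))
    wins = sum(rng.random() < 0.5 for _ in range(n_rounds))
    return wins / n_rounds

# Test 1 (cross-opponent): RL trained as Diao Chan faces varying BT heroes.
cross_opponent = {h: play_match("diaochan", h) for h in HEROES}

# Test 2 (cross-target): the RL model controls varying heroes vs BT Diao Chan.
cross_target = {h: play_match(h, "diaochan") for h in HEROES}

print(cross_opponent)
print(cross_target)
```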
It is not hard to see that this returns to the question raised at the beginning: a "general" AI is hard to train in a simple environment. Highly complex MOBA games provide a convenient setting for testing model generalization.
Of course, a game cannot be used to train AI directly, so specially optimized "training grounds" have emerged.
As a result, researchers can test and train their models in environments such as the StarCraft II Learning Environment and the Arena of Valor AI Open Research Environment.
How, then, do domestic researchers access suitable platform resources? DeepMind's progress is inseparable from Google's strong backing, and MineDojo from Fei-Fei Li's team drew not only on the resources of Stanford, a top university, but also on strong support from Nvidia.
The domestic AI industry, however, is still not solid enough at the infrastructure level, and ordinary companies and universities in particular face a shortage of R&D resources.
In order to get more researchers involved, Tencent officially opened the "Arena of Valor AI Open Research Environment" to the public on November 21 this year.
Users only need to register an account on the official website of Tencent's Kaiwu ("Enlightenment") platform, submit materials, and pass the platform's review; they can then download it for free.
Website link: https://aiarena.tencent.com/aiarena/zh/open-gamecore
It is worth mentioning that, to better support scholars and algorithm developers, the Kaiwu platform not only packages the Arena of Valor AI Open Research Environment for ease of use, but also provides standard code and a training framework.
Next, let's briefly walk through how to start an AI training program on the Kaiwu platform.
Since we want AI to "play" Arena of Valor, the first thing to do is create the agent that will control a hero.
Does that sound complicated? In the Arena of Valor AI Open Research Environment it is actually very simple.
First, start the gamecore server:
```shell
cd gamecore
gamecore-server.exe server --server-address :23432
```

Then install the hok_env package:

```shell
git clone https://github.com/tencent-ailab/hok_env.git
cd hok_env/hok_env/
pip install -e .
```

And run the test script:

```shell
cd hok_env/hok_env/hok/unit_test/
python test_env.py
```

Now we are ready to import hok and call hok.HoK1v1.load_game to create the environment:

```python
import hok

env = hok.HoK1v1.load_game(
    runtime_id=0,
    game_log_path="./game_log",
    gamecore_path="~/.hok",
    config_path="config.dat",
    config_dicts=[{"hero": "diaochan", "skill": "rage"} for _ in range(2)],
)
```

Then we get our first observations from the agents by resetting the environment:

```python
obs, reward, done, infos = env.reset()
```

obs is a list of NumPy arrays describing each agent's observation of the environment.
reward is a list of floating-point scalars describing the immediate rewards received from the environment.
done is a list of Booleans describing the state of the game.
infos is a tuple of dictionaries whose length is the number of agents.
Actions are then performed in the environment until time runs out or the agent is killed.
For this, you only need the env.step method:
```python
done = False
while not done:
    action = env.get_random_action()
    obs, reward, done, state = env.step(action)
```

As in the StarCraft II Learning Environment, visualization tools can be used to view agent replays in the Arena of Valor AI Open Research Environment.
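Because the real environment needs the gamecore server running, the interaction loop above can also be illustrated with a self-contained mock that mirrors the reset/step structure. This is a sketch under our own assumptions: `MockHoK1v1`, `OBS_DIM`, and the scalar `done` flag are simplifications, not part of hok_env (the article describes `done` as a list of Booleans):

```python
# Illustrative mock of the interface sketched above; NOT part of hok_env.
import numpy as np

N_AGENTS = 2    # a 1v1 match has two agents
OBS_DIM = 128   # placeholder size; the real env defines its own feature vector

class MockHoK1v1:
    """Toy stand-in that mirrors the reset()/step() return structure."""

    def __init__(self, max_steps=5):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        obs = [np.zeros(OBS_DIM, dtype=np.float32) for _ in range(N_AGENTS)]
        reward = [0.0] * N_AGENTS
        done = False  # simplified to a scalar; the real env returns a list
        infos = tuple({} for _ in range(N_AGENTS))
        return obs, reward, done, infos

    def get_random_action(self):
        return [np.random.randint(0, 10) for _ in range(N_AGENTS)]

    def step(self, action):
        self.t += 1
        obs = [np.random.rand(OBS_DIM).astype(np.float32) for _ in range(N_AGENTS)]
        reward = [0.0] * N_AGENTS
        done = self.t >= self.max_steps  # episode ends when "time runs out"
        state = tuple({} for _ in range(N_AGENTS))
        return obs, reward, done, state

env = MockHoK1v1()
obs, reward, done, infos = env.reset()
while not done:
    action = env.get_random_action()
    obs, reward, done, state = env.step(action)
print("episode finished after", env.t, "steps")
```

Swapping MockHoK1v1 for the real environment instance leaves the loop unchanged, which is the point of a Gym-style interface.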
At this point, your first agent has been created.
Next, you can put "her/him" through all kinds of training!
By this point it is not hard to see that the Arena of Valor AI Open Research Environment does not simply throw an AI-trainable environment over the wall; through familiar operations and rich documentation, it makes the whole process simple and easy to follow.
This lowers the barrier for more people who want to enter the AI field.
Games + AI: what are the possibilities? Reading this far, one question remains unanswered: why would Tencent's Kaiwu platform, a research platform led by an enterprise, choose to open up at scale?
In August this year, the Chengdu Artificial Intelligence Industry Ecological Alliance and the think tank Yuqian jointly released the country's first game-AI report. The report makes clear that games are one of the keys to advancing artificial intelligence; specifically, games can promote the practical application of AI in three ways.
First of all, games are an excellent training and testing ground for AI.
Fast iteration: games allow free interaction and unrestricted trial and error at no real-world cost, and their explicit reward mechanisms let algorithms be fully trained and their effectiveness demonstrated.
Rich tasks: games span many genres, difficulties, and complexities; AI must adopt sophisticated strategies to cope, and conquering different types of games reflects algorithmic progress.
Clear success criteria: game scores calibrate an AI's ability, making further optimization straightforward.
Secondly, games can train different abilities of AI, each mapping to different applications.
For example, board games train sequential decision-making for long-horizon planning; card games train dynamic adaptation for coping with change; and real-time strategy games train machine memory, long-term planning, multi-agent cooperation, and action coherence.
In addition, games can break through environmental constraints and promote the deployment of decision intelligence.
For example, games can advance real-time rendering and information synchronization in virtual simulation, and upgrade interactive simulation terminals.
The Kaiwu platform draws on the strengths of Tencent AI Lab and Arena of Valor in algorithms, computing power, and complex scenarios. Opening it up builds an effective bridge between games and AI development, linking university curriculum building, competition organization, and industry talent incubation. Once the talent pool is sufficient, research progress and commercial applications will spring up like bamboo shoots after rain.
Over the past two years, the Kaiwu platform has made many moves in industry-university-research collaboration: it held the "Kaiwu Multi-Agent Reinforcement Learning Competition," which attracted teams from top universities, including the famous top-two schools Tsinghua and Peking University; it formed a university science-education consortium; and the popular elective "Algorithms in Game AI" at Peking University's School of Information Science and Technology uses Arena of Valor's 1v1 environment for after-class experiments.
Looking ahead, the talent emerging from the Kaiwu platform can be expected to spread across every field of the AI industry, bringing the platform's upstream and downstream ecosystem into full bloom.
This article comes from the WeChat official account Xin Zhiyuan (ID: AI_era).