
How to use GCP to develop Roguelike games with reinforcement learning function


This article explains how to use GCP to develop a Roguelike game with reinforcement learning features. The editor finds it very practical and shares it with you in the hope that you will get something out of reading it. Let's take a look.

Many applications of reinforcement learning (RL) are designed specifically to keep manual effort out of the training loop. For example, OpenAI Gym provides a framework for training RL models to act as players in Atari games, and much of the literature is rooted in applying RL to robotics. One area that is rarely discussed, however, is using RL methods to improve people's subjective experience.

To demonstrate this type of application, I developed a simple game called "Trials of the Forbidden Ice Palace". The game uses reinforcement learning to improve the user experience by tailoring the game's difficulty to each player.

How the game works

The game is a traditional Roguelike: a turn-based dungeon crawler with RPG elements and heavy use of procedural generation. The player's goal is to escape the ice palace floor by floor, fighting monsters and collecting useful items along the way. Traditionally, the enemies and items that appear on each floor are generated at random, but this game allows the RL model to generate these entities based on the collected data.

Reinforcement learning algorithms are known to require a lot of data, so the game was designed with the following constraints to keep the complexity of the RL model down:

1) The game has 10 floors, after which the player wins.

2) The number of enemies and items that can be spawned on each floor is fixed.

Reinforcement learning and environment

The core concept of reinforcement learning is that an autonomous agent (Agent) interacts with the environment (Env) through observations and actions (Action), as shown in Figure 1. Through this interaction with the environment, the agent receives rewards (positive or negative), which it uses to learn and to influence future decisions.

In this application, the agent is the RL algorithm, which adjusts the game's difficulty through the entities it chooses to spawn, and the game is the environment that the RL algorithm can observe and control.
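As a concrete illustration of this loop, the sketch below wires a stand-in environment and agent together in Python. The class names, toy dynamics, and reward values are assumptions made for illustration, not the game's actual code.

```python
# Minimal sketch of the agent/environment loop from Figure 1, applied to this game.
# FloorEnv, SpawnAgent, and all dynamics/rewards are illustrative assumptions.
import random

class FloorEnv:
    """Stand-in environment: one step corresponds to the player playing through a floor."""
    def __init__(self, num_floors=10):
        self.num_floors = num_floors
        self.floor = 1

    def step(self, spawned):
        # Toy dynamics: more spawned enemies make death slightly more likely.
        died = random.random() < 0.05 * len(spawned)
        self.floor += 1
        done = died or self.floor > self.num_floors
        reward = -1.0 if died else 1.0  # placeholder values, not the game's reward table
        return self.floor, reward, done

class SpawnAgent:
    """Stand-in agent: decides which enemies/items to spawn on the next floor."""
    def act(self, floor):
        return ["enemy"] * random.randint(1, 3)  # placeholder policy

    def learn(self, state, action, reward, next_state):
        pass  # the Q-Learning update described later would go here

env, agent = FloorEnv(), SpawnAgent()
state, done = env.floor, False
while not done:
    action = agent.act(state)                    # agent acts on the environment
    next_state, reward, done = env.step(action)  # environment returns observation and reward
    agent.learn(state, action, reward, next_state)
    state = next_state
```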

State

The state is any observation of the environment that the agent can use to decide which actions to take. Although there is a large amount of data the agent could observe (the player's health, the number of turns taken, and so on), the first version of the game considers only the floor the player has reached and the player's character level.
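A minimal sketch of this state representation is shown below, assuming the state is just the (floor, player level) pair. The class name, level cap, and indexing scheme are illustrative assumptions.

```python
# Sketch of the state described above: only the floor reached and the player's character
# level are observed. Names, level cap, and indexing are assumptions.
from typing import NamedTuple

class GameState(NamedTuple):
    floor: int         # 1..10, the floor the player has reached
    player_level: int  # the player's character level

def state_index(state: GameState, max_level: int = 20) -> int:
    """Flatten the (floor, level) pair into a single row index of a Q-matrix."""
    level = min(state.player_level, max_level)
    return (state.floor - 1) * max_level + (level - 1)

print(state_index(GameState(floor=3, player_level=5)))  # -> 44
```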

Actions

Because of the procedurally generated nature of the game, the agent spawns monsters and items at random rather than making a deterministic choice each time. Since the game contains so many random elements, the agent does not explore in the typical RL way; instead, it controls the weighted probabilities with which the different enemies and items in the game are spawned.

When the agent exploits the best policy learned so far, it decides which enemies and items to spawn through weighted random sampling based on the learned Q-matrix; conversely, when the agent chooses to explore, it spawns enemies and items with equal probability across all entities in the game.
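The sketch below illustrates this explore/exploit split, assuming an epsilon-greedy style switch. The entity names, the epsilon value, and the weight-shifting trick are assumptions, not the game's actual implementation.

```python
# Sketch of the explore/exploit behaviour described above. Entity names, epsilon,
# and the weight shift are illustrative assumptions.
import random

ENTITIES = ["slime", "skeleton", "ice_golem", "health_potion", "sword"]

def choose_spawns(q_row, n_spawns, epsilon=0.1):
    """q_row: learned Q-values for each entity in the current state (same order as ENTITIES)."""
    if random.random() < epsilon:
        # Exploration: every entity is equally likely to be spawned.
        return random.choices(ENTITIES, k=n_spawns)
    # Exploitation: shift Q-values to be positive and use them as sampling weights.
    min_q = min(q_row)
    weights = [q - min_q + 1e-6 for q in q_row]
    return random.choices(ENTITIES, weights=weights, k=n_spawns)

print(choose_spawns([0.2, 1.5, -0.3, 0.8, 0.1], n_spawns=4))
```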

Reward

The reward model of a reinforcement learning algorithm is crucial for shaping the behavior the learned model should exhibit, because machine learning will take shortcuts to achieve its goals. Since the intended goal is to maximize players' enjoyment, the following assumptions are made so that fun can be quantified through the rewards given to the RL algorithm:

It is more fun for players to progress through the game than to die prematurely.

Winning every game without any challenge becomes boring.

With these goals in mind, the RL model is rewarded when the player reaches a new floor, as shown in Table 1, and when the game is completed, as described in Table 2.

Table 1: player promotion reward model

Table 2: reward model for completing the game

Given only the progression and completion rewards above, the RL algorithm could maximize its reward by letting the player advance as far as floor 8 and then eventually die there. To minimize the chance of this unintended behavior, the RL algorithm is also penalized when the player dies prematurely.
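Since Tables 1 and 2 are not reproduced in the text, the sketch below only illustrates the shape of such a reward function: a progression reward per new floor, a completion reward, and a penalty for premature death. The numeric values are placeholders, not the game's actual reward tables.

```python
# Shape of the reward scheme described above. All numeric values are placeholders,
# since Tables 1 and 2 are not reproduced here.
def reward(new_floor_reached: bool, game_completed: bool, player_died: bool, floor: int) -> float:
    r = 0.0
    if new_floor_reached:
        r += 1.0   # placeholder per-floor progression reward (Table 1)
    if game_completed:
        r += 10.0  # placeholder completion reward (Table 2)
    if player_died and floor < 10:
        r -= 5.0   # placeholder penalty for dying prematurely
    return r

print(reward(new_floor_reached=True, game_completed=False, player_died=True, floor=4))  # -> -4.0
```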

Update model

The RL algorithm uses Q-Learning, modified to accommodate the stochastic way the agent acts. In traditional Q-Learning the agent takes a single action between states; here the update is instead applied according to the probability distribution of all the enemies and items spawned on the floor, as shown in the following formula.

Q'(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ r_t + \gamma\, \overline{Q(s_{t+1}, a)} - Q(s_t, a_t) \right]

where Q' is the updated value of the Q-matrix, Q is the Q-matrix entry for the state-action pair at time step t, α is the learning rate, r_t is the reward, γ is the discount factor, and the overlined term is the estimate of future value taken as the average return at time step t + 1.
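A possible Python rendering of this modified update is sketched below, assuming the Q-matrix is a nested dict (state to entity to value), that each spawned entity's Q-value is updated in proportion to how often it appeared on the floor, and that the future value is the mean of the next state's Q-values. The variable names and weighting scheme are assumptions, not the game's actual code.

```python
# Sketch of the modified Q-Learning update described above. Data layout and
# per-entity weighting are illustrative assumptions.
from collections import Counter

def q_update(Q, state, spawned, next_state, reward, alpha=0.1, gamma=0.9):
    future_value = sum(Q[next_state].values()) / len(Q[next_state])  # averaged return at t+1
    counts = Counter(spawned)
    total = sum(counts.values())
    for entity, count in counts.items():
        weight = count / total                                        # share of this entity on the floor
        td_error = reward + gamma * future_value - Q[state][entity]
        Q[state][entity] += alpha * weight * td_error
    return Q

Q = {
    (1, 1): {"slime": 0.0, "skeleton": 0.0},
    (2, 1): {"slime": 0.0, "skeleton": 0.0},
}
q_update(Q, state=(1, 1), spawned=["slime", "slime", "skeleton"], next_state=(2, 1), reward=1.0)
print(Q[(1, 1)])
```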

Global RL training with GCP

The global AI model is trained on the game data collected from all players, and it serves as the baseline RL model for players who have not yet played the game. New players receive a local copy of the global RL model when they start the game for the first time; this copy adapts to their own play style as they play, and their game data is used to further improve the global model for future new players.

The architecture shown in Figure 2 outlines how data is collected and how the global model is updated and distributed. GCP was chosen because its free-tier products are well suited to collecting and storing game data for model training. To that end, the game routinely calls a GCP Cloud Function to store data in a Firebase database.

Free-tier GCP components are used to collect game session data from all players and build the global RL model. Although players start the game with the global RL model, their personal play produces a customized local RL model that better suits their play style.
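As an illustration of the data-collection step, the sketch below shows how a finished run could be posted to a GCP Cloud Function that writes it into Firebase. The endpoint URL, function name, and payload fields are assumptions, not the game's actual API.

```python
# Sketch of posting one game session to a GCP Cloud Function for storage in Firebase.
# The URL, function name, and payload fields are hypothetical.
import json
import urllib.request

CLOUD_FUNCTION_URL = "https://REGION-PROJECT.cloudfunctions.net/record_game_session"  # hypothetical

def upload_session(session_data: dict) -> int:
    body = json.dumps(session_data).encode("utf-8")
    req = urllib.request.Request(
        CLOUD_FUNCTION_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status  # HTTP status code returned by the Cloud Function

# Example payload: the per-floor transitions collected during one run.
upload_session({
    "player_id": "anonymous-1234",
    "floors_reached": 7,
    "completed": False,
    "transitions": [{"floor": 1, "player_level": 1, "spawns": ["slime"], "reward": 1.0}],
})
```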

The above is how to use GCP to develop a Roguelike game with reinforcement learning. The editor believes it covers some knowledge points you may see or use in your daily work. I hope you learned more from this article; for more details, please follow the industry information channel.

