

Revealing the IT infrastructure behind LOL: the key role of "scheduling"


Welcome to the Tungsten Fabric user case series, where you can discover more application scenarios for TF. The protagonist of this "Revealing LOL" series is Tungsten Fabric user Riot Games. As the developer and operator of League of Legends (LOL), Riot Games faces complex deployment challenges on a global scale. Let's reveal the "heroes" behind LOL and see how its online services are run.

Authors: Kyle Allan and Carl Quinn (article source: Riot Games)

We are Kyle Allan and Carl Quinn, and we work on the infrastructure team at Riot. Welcome to the second article in this series, which details how we deploy and operate back-end features around the world. In this article, we dive into the first core component of our deployment ecosystem: container scheduling.

In the first article of this series, Jonathan discussed Riot's deployment history and the challenges we faced. In particular, he outlined how deploying our software became increasingly difficult as we kept adding infrastructure to League of Legends, especially in scenarios such as "manually configuring a server for each application." Then came a tool called Docker, which changed our approach to server deployment; further iteration on it inside Riot produced Admiral, our internal tool for cluster scheduling and management.

Importantly, the journey of application deployment is far from over; it is still evolving, and we are preparing for the next phase (likely with DC/OS, which we will discuss later). This article describes how we got to this point and why we made the decisions we did, in the hope that others can benefit from our story.

What is scheduling, and why do we need it?

When Docker emerged and Linux containerization became a more widely known technology, we realized we could benefit from building a containerized infrastructure. A Docker container image provides an immutable, deployable "artifact" that is built once and can be deployed through development, testing, and production. In addition, it guarantees that the dependencies of an image running in production are exactly the same as the ones that were tested.

Another benefit is particularly important: Docker allows the deployment unit (the container) to be decoupled from the computing unit (the host) by using a scheduler to assign containers to hosts, ideally in an intelligent way. This removes the coupling between server and application: a given container can run on any of a number of possible servers.

By packaging back-end services as Docker images that can be deployed and scaled across a server cluster at any time, we should be able to adapt quickly to change. We can add new player features, expand capacity when traffic increases, and quickly roll out updates and fixes. When considering deploying containerized services to production, three main questions need to be answered:

1. Given a cluster of hosts, how do we select a specific set of hosts to receive a set of containers?
2. How do those containers actually get started on a remote host?
3. What happens when a container crashes or shuts down?

The answer to all three questions is that we need a scheduler: a service that runs at the cluster level and enforces our container policy. The scheduler is a key component of maintaining a cluster, ensuring that containers run in the right places and restarting them when they exit.

For example, we might want to start a service such as Hextech Crafting that requires six container instances to handle its load. The scheduler is responsible for finding hosts with enough memory and CPU to support those containers and for doing whatever is necessary to get them running. If one of those servers fails, the scheduler is likewise responsible for finding replacement hosts for the affected containers.
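As a rough illustration of that first responsibility, here is a minimal sketch in Go of how a scheduler might pick hosts with enough spare memory and CPU for a set of containers. The types and the first-fit policy are our own simplification for illustration, not Admiral's actual algorithm.

```go
package main

import "fmt"

// Host is a simplified view of a cluster machine's spare capacity.
type Host struct {
	Name      string
	FreeCPU   float64 // unreserved CPU cores
	FreeMemMB int64   // unreserved memory in MB
}

// Container describes what one container instance needs.
type Container struct {
	CPU   float64
	MemMB int64
}

// place assigns each container to the first host that still has room,
// reserving resources as it goes. Returns nil if the cluster is full.
func place(hosts []Host, containers []Container) map[int]string {
	placement := make(map[int]string)
	for i, c := range containers {
		placed := false
		for j := range hosts {
			if hosts[j].FreeCPU >= c.CPU && hosts[j].FreeMemMB >= c.MemMB {
				hosts[j].FreeCPU -= c.CPU
				hosts[j].FreeMemMB -= c.MemMB
				placement[i] = hosts[j].Name
				placed = true
				break
			}
		}
		if !placed {
			return nil // not enough capacity anywhere
		}
	}
	return placement
}

func main() {
	hosts := []Host{{"host-a", 4, 8192}, {"host-b", 2, 4096}}
	// Six identical instances, e.g. for a service like Hextech Crafting.
	need := make([]Container, 6)
	for i := range need {
		need[i] = Container{CPU: 0.5, MemMB: 512}
	}
	fmt.Println(place(hosts, need))
}
```

A real scheduler would also weigh spreading instances across failure domains rather than simply filling the first host that fits.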

Once we decided to use a scheduler, we quickly built prototypes to see whether containerized services would work for us in production. We also needed to make sure that existing open source options would work in our environment, or that their maintainers would be willing to accept our changes.

Why write our own?

Before we started writing the Admiral scheduler, we surveyed the state of existing cluster managers and schedulers. Who was scheduling containers across clusters of Docker hosts, and how were they doing it? Could their technology solve our problems?

In our initial research, we investigated a number of projects:

Mesos + Marathon

These technologies were quite mature and ran at large scale, but they were complex and tricky to install, which made them difficult to try out and evaluate. At the time, their support for containers was very limited, had not kept pace with Docker's rapid development, and did not fit well into the Docker ecosystem. They also did not support pods; we believed we would need sidecar containers bundled with many of our services (note: a sidecar is a pattern in which a helper container, such as a log collector, is deployed alongside a service's main container).

LMCTFY => Kubernetes

Kubernetes had just evolved from LMCTFY, and although it looked promising, it was unclear whether its future development would meet our needs. Kubernetes also did not have the kind of constraint system for container placement that we needed.

Fleet

Fleet was open-sourced later and was not yet mature at the time. Fleet also seemed to focus more on deploying system services than regular application services.

We prototyped a small command-line tool that talked to the Docker API over REST, and we successfully used it to demonstrate coordinating a deployment. We then decided to go ahead and write our own scheduler.

We drew on some of the best features of the systems we had studied, including the core ideas behind Kubernetes's Pods and Marathon's constraint system. Our vision was to track the architecture and functionality of those systems, influence them where possible, and eventually try to converge with one of them in the future.

Overview of Admiral

After creating a basic JSON-based deployment metadata language, which we call CUDL (ClUster Description Language), we began writing Admiral. CUDL became the language used by Admiral's REST API, and it has two main elements:

● Clusters: a set of Docker hosts.
● Packs: the metadata needed to start a set of one or more containers. Similar to a Kubernetes Pod plus a Replication Controller.

Clusters and Packs each have two aspects: Spec and Live. Each aspect describes a different stage of the container life cycle.

Spec, representing the desired state of an element:

● Published to Admiral from some external source of truth (such as source control)
● Spec Clusters and Hosts describe the resources available in a cluster
● Spec Packs describe the resources, constraints, and metadata needed to run a service

Live, representing the realized state of an element:

● Mirrors objects that are actually running
● Live Clusters and Hosts mirror running Docker daemons
● Live Packs mirror running groups of Docker containers
● Recoverable by communicating with the Docker daemons
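To make the Spec side concrete, here is a sketch of what a CUDL-style Spec Pack might look like as Go types with JSON tags. The field names, schema, and image name are our own guesses for illustration; the article does not show the actual CUDL schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SpecContainer describes one container in a Pack (hypothetical schema).
type SpecContainer struct {
	Name  string   `json:"name"`
	Image string   `json:"image"`
	Env   []string `json:"env,omitempty"`
}

// SpecPack is the desired state for a group of containers, roughly a
// Kubernetes Pod plus a Replication Controller.
type SpecPack struct {
	Name        string          `json:"name"`
	Count       int             `json:"count"` // desired instances
	CPU         float64         `json:"cpu"`   // per instance
	MemMB       int64           `json:"memMB"` // per instance
	Constraints []string        `json:"constraints,omitempty"`
	Containers  []SpecContainer `json:"containers"`
}

func main() {
	pack := SpecPack{
		Name:  "hextech-crafting",
		Count: 6,
		CPU:   0.5,
		MemMB: 512,
		Containers: []SpecContainer{
			{Name: "service", Image: "riot/hextech-crafting:1.0"}, // placeholder image
		},
	}
	out, _ := json.MarshalIndent(pack, "", "  ")
	fmt.Println(string(out)) // the JSON a user would submit to Admiral
}
```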

Admiral is written in Go and, for our production data centers, is compiled and packaged into a Docker container. Admiral has several internal subsystems, described below.

From the user's point of view, interaction with Admiral happens through the admiralctl command-line tool, which communicates with Admiral via its REST API. With admiralctl, users can access all of Admiral's functionality through standard verbs: POST a new Spec Pack for scheduling, DELETE an old Pack, and GET the current state.
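For a sense of what those calls might look like from code, here is a hedged Go sketch of a REST client. The endpoint paths, port, and payloads are assumptions made up for illustration; the article does not document Admiral's actual API routes.

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

const admiralURL = "http://localhost:8080" // hypothetical Admiral address

func main() {
	// POST a new Spec Pack for scheduling (path is a guess).
	spec := []byte(`{"name":"hextech-crafting","count":6}`)
	resp, err := http.Post(admiralURL+"/v1/clusters/local/packs",
		"application/json", bytes.NewReader(spec))
	if err != nil {
		panic(err)
	}
	resp.Body.Close()

	// GET the current state of the cluster.
	resp, err = http.Get(admiralURL + "/v1/clusters/local")
	if err != nil {
		panic(err)
	}
	resp.Body.Close()

	// DELETE an old Pack.
	req, _ := http.NewRequest(http.MethodDelete,
		admiralURL+"/v1/clusters/local/packs/hextech-crafting", nil)
	resp, err = http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("done")
}
```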

In production, Admiral uses HashiCorp's Consul to store its Spec state, backing it up periodically in case of catastrophic failure. In the event of a complete loss of data, Admiral can also partially rebuild its Spec state from the Live state information retrieved from each Docker daemon.
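As a sketch of how Spec state might be persisted to Consul's key/value store from Go, using HashiCorp's official client library; the key layout under "admiral/spec/..." is our own assumption.

```go
package main

import (
	"fmt"

	consul "github.com/hashicorp/consul/api"
)

func main() {
	// Connect to the local Consul agent (default address 127.0.0.1:8500).
	client, err := consul.NewClient(consul.DefaultConfig())
	if err != nil {
		panic(err)
	}
	kv := client.KV()

	// Persist a Spec Pack under a hypothetical key layout.
	spec := []byte(`{"name":"hextech-crafting","count":6}`)
	_, err = kv.Put(&consul.KVPair{
		Key:   "admiral/spec/packs/hextech-crafting",
		Value: spec,
	}, nil)
	if err != nil {
		panic(err)
	}

	// Read it back, as a scheduler would on startup.
	pair, _, err := kv.Get("admiral/spec/packs/hextech-crafting", nil)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(pair.Value))
}
```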

The reconciler is the core of Admiral and the key subsystem driving the scheduling workflow. It periodically compares the actual Live state with the desired Spec state and, when they differ, schedules the actions necessary to bring the Live state back in line.

The Live state and its driver package support the reconciler by caching Live host and container state and by handling all communication with the Docker daemons on the cluster hosts via their REST APIs.
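Here is a minimal sketch of the shape of such a reconcile loop in Go, using our own simplified types; Admiral's real reconciler is certainly more involved.

```go
package main

import "fmt"

// State maps a pack name to its number of desired (Spec) or running (Live) instances.
type State map[string]int

// reconcile compares desired Spec state to observed Live state and returns
// the actions needed to converge, e.g. start or stop container instances.
func reconcile(spec, live State) []string {
	var actions []string
	for pack, want := range spec {
		if have := live[pack]; have < want {
			actions = append(actions, fmt.Sprintf("start %d instance(s) of %s", want-have, pack))
		} else if have > want {
			actions = append(actions, fmt.Sprintf("stop %d instance(s) of %s", have-want, pack))
		}
	}
	for pack, have := range live {
		if _, ok := spec[pack]; !ok {
			actions = append(actions, fmt.Sprintf("stop all %d instance(s) of %s", have, pack))
		}
	}
	return actions
}

func main() {
	spec := State{"hextech-crafting": 6}
	live := State{"hextech-crafting": 5} // one container has crashed

	// A real reconciler would run this comparison on a timer, forever.
	for _, a := range reconcile(spec, live) {
		fmt.Println(a) // e.g. "start 1 instance(s) of hextech-crafting"
	}
}
```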

A deeper look at scheduling

Admiral's reconciler operates on Spec Packs, effectively converting them into Live Packs. When a Spec Pack is submitted to Admiral, the reconciler creates the containers and starts them via the Docker daemon. It is through this mechanism that the reconciler achieves the two most important high-level scheduling goals described earlier. When the reconciler receives a Spec Pack, it will:

1. Evaluate the cluster's resources and the Pack's constraints to find suitable hosts for the containers.
2. Start the containers on those remote hosts using the data in the Spec (see the sketch after this list).
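To illustrate the second step, here is a minimal Go sketch that creates and starts a container by talking to a local Docker daemon's REST API over its Unix socket. The /containers/create and /containers/{id}/start endpoints are part of the Docker Engine API; the image name is a placeholder, and this is our own simplification rather than Admiral's code.

```go
package main

import (
	"bytes"
	"context"
	"encoding/json"
	"fmt"
	"net"
	"net/http"
)

func main() {
	// Talk to the local Docker daemon over its Unix socket.
	client := &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			return net.Dial("unix", "/var/run/docker.sock")
		},
	}}

	// POST /containers/create with the image to run (placeholder image).
	body, _ := json.Marshal(map[string]string{"Image": "nginx:alpine"})
	resp, err := client.Post("http://docker/containers/create",
		"application/json", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	var created struct {
		Id string `json:"Id"`
	}
	json.NewDecoder(resp.Body).Decode(&created)
	resp.Body.Close()

	// POST /containers/{id}/start to actually run it.
	resp, err = client.Post("http://docker/containers/"+created.Id+"/start",
		"application/json", nil)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	fmt.Println("started container", created.Id)
}
```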

Let's walk through an example of launching a container on a Docker host. In this example, we will use the local Docker daemon as the Docker host and interact with a local instance of the Admiral server.

First, we create a Pack with the "admiral pack create" command. This command targets a specific cluster and submits the Spec Pack's JSON to the Admiral server.

You'll notice that a container started on my machine almost as soon as the command ran, launched with the parameters from my pack file.

Next, we can use the "admiral pack show" command to view the Live Pack that Admiral created.

Finally, by hitting the service running inside the container, we can verify that the Pack is working properly. Using the host and port information reported by "admiral pack show", we can reach our service with a simple curl.

Within Admiral, the reconciler is always running to ensure that the cluster's Live state matches the desired Spec state. This also lets us recover when a container crashes and exits, or when an entire server becomes unavailable due to hardware failure. The reconciler works to keep the states in sync so that players never experience an interruption. This answers the third and final question raised earlier: when a container exits unexpectedly, we recover quickly and keep the impact to a minimum.

To demonstrate, I terminate the container that was started by the "admiral pack create" command above. Within seconds, the reconciler starts a new container (with a different ID) because it notices that the Live state no longer matches the Spec state.

Resources and constraints

To allocate containers well, the scheduler must have insight into the host cluster. Two key concepts address this:

Resources: a representation of what a server has available, including memory, CPU, I/O, and other resources such as network capacity.

Constraints: a set of conditions attached to a Pack that give the scheduler detailed restrictions on where the Pack may be placed. For example, we might want to place a Pack instance:

● On every host in the entire cluster
● On a specific host, named "myhost.riotgames.com"
● In each labeled zone of the cluster

By defining resources on the host, we give the scheduler the flexibility to decide where to place the container.

By defining constraints on Packs, we restrict the scheduler's choices, allowing us to force specific placement patterns onto the cluster, as sketched below.
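Here is a rough Go sketch of how constraints might prune the set of candidate hosts before resource checks. The constraint forms mirror two of the examples above, but the encoding is our own invention for illustration.

```go
package main

import "fmt"

// Host carries the attributes constraints can match against.
type Host struct {
	Name string
	Zone string
}

// Constraint reports whether a host is an acceptable placement target.
type Constraint func(Host) bool

// onHost restricts placement to one specific host by name.
func onHost(name string) Constraint {
	return func(h Host) bool { return h.Name == name }
}

// inZone restricts placement to hosts in a given labeled zone.
func inZone(zone string) Constraint {
	return func(h Host) bool { return h.Zone == zone }
}

// filter keeps only the hosts that satisfy every constraint.
func filter(hosts []Host, cs []Constraint) []Host {
	var out []Host
	for _, h := range hosts {
		ok := true
		for _, c := range cs {
			if !c(h) {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, h)
		}
	}
	return out
}

func main() {
	hosts := []Host{
		{"myhost.riotgames.com", "zone-a"},
		{"other.riotgames.com", "zone-b"},
	}
	// "Must run in zone-a" and "must run on a specific host", respectively.
	fmt.Println(filter(hosts, []Constraint{inZone("zone-a")}))
	fmt.Println(filter(hosts, []Constraint{onHost("myhost.riotgames.com")}))
}
```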

Conclusion

For Riot, Admiral is an important part of the evolution of our deployment technology. By leveraging the capabilities of Docker and a scheduling system, we can deliver back-end features to players faster than ever before.

In this article, we took a deep look at some of Admiral's functionality and showed how it schedules containers across a cluster of machines. As Jonathan mentioned in his first article, the open source world has quickly converged on very similar models. Looking ahead, we will wind down our work on Admiral and focus on deploying DC/OS, which has become one of the leading open source systems for scheduling container workloads.

If you have experienced a similar journey, or if you feel you have something to add, you are welcome to contact us.

More articles in the "Revealing LOL" series

● Revealing the IT infrastructure behind LOL: embarking on the journey of deployment diversity
