Revealing the IT infrastructure behind LOL: SDN unlocks the new infrastructure


Welcome to the Tungsten Fabric user case series, where we explore more of TF's application scenarios. The protagonist of the "Revealing LOL" series is Tungsten Fabric user Riot Games. As the developer and operator of League of Legends (LOL), Riot Games faces the challenge of complex deployments on a global scale. Let's reveal the "heroes" behind LOL and see how they run online services.

Authors: Doug Lardo and David Press (source: Riot Games)

David Press and Doug Lardo are two Riot engineers working to improve the data center network that supports Riot's online services. This article, the third in a series on this topic, discusses our SDN (software-defined networking) approach, how we integrate SDN with Docker, and the new infrastructure possibilities this combination unlocks for us. If you are curious how SDN transforms infrastructure, how to let developers provision and secure network resources through an API, or how to stop buying ever-larger proprietary network appliances, read on.

In the first article, Jonathan mentioned some of the network challenges in launching services to support new features of League of Legends. It turns out that deploying on the network is not as simple as installing code on a server and pressing Enter.

These new features require capabilities from the network infrastructure, including:

- Connectivity: low-latency, high-throughput access to players and internal services
- Security: prevent unauthorized access and DoS attacks, and allow communication only where needed to minimize the impact of a compromise
- Packet services: load balancing, network address translation (NAT), virtual private network (VPN) connectivity, and multicast forwarding

Traditionally, setting up these network services has been the domain of highly specialized network engineers, who log in to individual network devices and type commands that, I'm fairly sure, are pure witchcraft. Configuring them usually requires a deep understanding of the network, the relevant configuration, and how to respond when things go wrong.

However, as we kept expanding, the differences between data centers grew larger and larger, complicating the situation. For two network engineers in two different data centers, the same goal could translate into completely different actions and tasks.

All of this means that changes to the data center network infrastructure often become a bottleneck for introducing new services. Fortunately, at Riot, anything that stands between players and new features receives immediate and serious attention. The rCluster platform was designed to address this bottleneck, and in the following sections we will dig into its key components: the overlay network concept, our OpenContrail implementation (editor's note: since renamed Tungsten Fabric; where OpenContrail appeared below, it has been replaced by Tungsten Fabric), and the integration with Docker. In the next article in this series, we will cover further details such as security, load balancing, and scaling the system.

SDN and overlay networks

SDN has become a buzzword that means different things to different people. For some, it means that network configuration should be defined in software; at Riot, it means that our network functions should be programmable through a consistent API.

By making the network programmable, we can write automation that greatly expands our ability to deploy network changes quickly. We only need to run one command, without having to translate the change into commands on a large number of devices (note: "translate" here means turning the network's functional requirements into commands for each different network device). We have cut global network changes from days to minutes, and freed ourselves up to do other cool things with the time saved.

Network devices have been programmable for some time, but the interfaces for programming them vary and keep evolving across the industry; there is no uniform standard across device types and vendors. Writing robust automation that can speak to every interface from multiple vendors is therefore very difficult. We also knew that a consistent API serving as an abstraction layer above the hardware was a key requirement for Riot to scale its network configuration management and operations effectively. So we turned to overlay networking. (Editor's note: the programmability of network devices is explained before overlays because the network serves applications, and as applications constantly change, network configuration must change with them. Although network devices are programmable and can be orchestrated for business and network needs, the challenge remains: different vendors, different configurations, and different APIs are hard to unify.)

As the name suggests, an overlay network sits on top of an existing network. Applications inside the overlay are unaware of its existence, because it feels exactly like a physical network. If you are familiar with virtual machines, the same "virtual inside physical" paradigm applies to virtual networks: one physical network can host many virtual networks. In a virtual machine, applications think they have an entire physical machine to themselves, when in fact they have only a slice of one. Overlay networking is the same idea: virtual networks created on top of a physical infrastructure (called the underlay network).

This approach lets us hide the physical network details that Riot engineers shouldn't have to worry about. Engineers no longer need to ask questions like "how many ports are left?", "which vendor is this?" or "where should the security policy go?" Instead, we provide a consistent API that lets engineers focus on what they want to accomplish.

Using the same API in every data center Riot operates lets us write automation that works anywhere and at any time, whether in our very first data center or in a more modern design. We can even look to other cloud providers, such as Amazon, Rackspace, or Google Compute Engine, and our API still works.

This way, our underlying physical hardware can be Cisco, Juniper, Arista, Dell, D-Link, white box, gray box, or a pile of Linux boxes with 10GbE ports; it doesn't matter. The underlay network does have to be built in specific ways, such as with automated configuration templates (see the next article in the series for more), but this lets us decouple the physical build and configuration from the service configuration that applications require. Keeping the underlay network stable brings further benefits when we connect it to software services: the underlay can focus on providing highly available packet forwarding, and we can upgrade the physical network without worrying about breaking applications that used to be tightly coupled to the physical infrastructure. It also simplifies operations, lets services move in and out of any data center, and eliminates the risk of vendor lock-in.

All in all, we think the overlay network is great.

Tungsten Fabric

When we first started evaluating SDN, we studied SDN projects from across the industry. Some configure the physical network through a central controller, while others provide an abstraction layer that converts API calls into vendor-specific instructions. Some solutions require new hardware, while others run on existing infrastructure. Some are developed by large companies, others are open source projects or come from startups.

In short, we spent a lot of time doing our homework, and it was not an easy decision. The requirements we needed to meet included:

- Provides the functionality we need in our data centers (old or new), on bare metal, and in the cloud
- Is an open source project, but one that won't disappear overnight
- Can provide professional support for our deployment journey

In the end, our eyes fell on Juniper Networks' Tungsten Fabric project. Tungsten Fabric was designed from the start as an open source, vendor-independent solution that works with any existing network. At its core are BGP and MPLS, protocols that have proven to scale across the entire Internet. Juniper Networks is certainly not going to disappear anytime soon, and it provided a great deal of help when we designed and installed our first cluster. (See the "TF Architecture Series" articles for full details of this controller.)

Tungsten Fabric consists of three main components: a centralized controller (the "brain"), vRouters (virtual routers), and external gateways. Each component is a member of a highly available cluster, so no single device failure can take down the whole system. Any API interaction with the controller immediately triggers it to push all necessary changes to the vRouters and gateways, which then physically forward traffic on the network.
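To make that API interaction concrete, here is a minimal sketch of creating a virtual network through the controller's REST configuration API, in the spirit of Tungsten Fabric's VNC API. The host name, port, project name, and the absence of authentication are assumptions for illustration, not Riot's actual tooling:

```python
# Minimal sketch: create a virtual network via the Tungsten Fabric
# (formerly OpenContrail) configuration REST API.
# Assumptions: API server at controller.example.com:8082, no auth,
# and a pre-existing project "default-domain:demo".
import requests

API = "http://controller.example.com:8082"

payload = {
    "virtual-network": {
        # Fully qualified name: domain, project, network
        "fq_name": ["default-domain", "demo", "game-service-net"],
        "parent_type": "project",
    }
}

resp = requests.post(f"{API}/virtual-networks", json=payload, timeout=10)
resp.raise_for_status()
net = resp.json()["virtual-network"]
print(f"created network uuid={net['uuid']}")

# The controller now pushes the new network state to every vRouter
# and gateway that needs it; no per-device CLI sessions are involved.
```

The point is the workflow: one API call against the controller replaces logging in to individual devices, and the controller distributes the resulting state everywhere it is needed.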

The overlay network consists of a series of tunnels between vRouters; the supported encapsulations are MPLS over GRE, MPLS over UDP, and VXLAN. When a container wants to communicate with another container, the vRouter first looks up the destination's location in the forwarding and policy state previously pushed to it by the controller, and then forms a tunnel from one compute node to the other. The vRouter at the receiving end of the tunnel checks the inner traffic against its policy, and then delivers it to the intended destination.
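As a conceptual sketch only (not vRouter source code, which implements this in a kernel module or DPDK datapath), the per-packet decision looks roughly like this; the route and policy structures are invented for illustration:

```python
# Conceptual model of a vRouter forwarding decision (illustrative only).
from dataclasses import dataclass

@dataclass
class Route:
    next_hop: str      # compute node hosting the destination
    encap: str         # "MPLSoGRE", "MPLSoUDP", or "VXLAN"
    label: int         # MPLS label or VXLAN VNI identifying the network

# State pushed down by the controller ahead of time.
routes = {"10.1.0.7": Route("compute-12.dc1", "MPLSoUDP", 42)}
allowed = {("10.1.0.3", "10.1.0.7", 7777)}  # (src, dst, port) policy

def forward(src: str, dst: str, port: int) -> str:
    # Policy check first, then tunnel selection from the routing state.
    if (src, dst, port) not in allowed:
        return "drop: policy denies this flow"
    r = routes[dst]
    return f"encapsulate with {r.encap} label {r.label} -> tunnel to {r.next_hop}"

print(forward("10.1.0.3", "10.1.0.7", 7777))
```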

If a container wants to communicate with the Internet or other non-overlay destinations, the traffic is sent to one of the external gateways. The gateway strips off the tunnel and sends the traffic to the Internet, with the container's unique IP address intact. This makes integrating with legacy applications and networks easy, because nobody outside the cluster can tell that the traffic came from an overlay network.

Docker integration

If we couldn't get containers running on the overlay network and doing real work for players, all of this would just be an interesting thought experiment.

Tungsten Fabric is a virtualization-agnostic SDN product, so it needs to be integrated with an orchestrator that associates scheduled compute instances with the network capabilities Tungsten Fabric provides. Tungsten Fabric has strong integration with OpenStack through a Neutron API driver, but since we have our own scheduler, Admiral, we needed to write our own custom integration.

In addition, the integration between Tungsten Fabric and OpenStack was originally designed for virtual machines, and we wanted to apply it to Docker containers. This required working with Juniper Networks to build a service we call "Ensign" that runs on each host and handles the integration between Admiral, Docker, and Tungsten Fabric.

To explain how we integrate Docker with Tungsten Fabric, we need a little background on Linux networking. Docker uses a Linux kernel feature called network namespaces to isolate containers and prevent them from accessing each other. A network namespace is essentially a separate stack of network interfaces, routing tables, and iptables rules; those elements apply only to processes started inside the namespace. It is very similar to chroot for the file system, except applied to the network.
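You can see network namespaces in action outside of Docker entirely, using the iproute2 tooling. This sketch assumes a Linux host with iproute2 installed and root privileges:

```python
# Demonstrate network namespace isolation with iproute2 (requires root).
# Each namespace gets its own interfaces and routing table, invisible
# to processes outside it -- the same mechanism Docker relies on.
import subprocess

def run(cmd: str) -> str:
    return subprocess.run(cmd.split(), capture_output=True, text=True).stdout

run("ip netns add demo")                  # create an empty namespace
print(run("ip netns exec demo ip link"))  # only a (down) loopback inside
print(run("ip link"))                     # the host still sees all its NICs
run("ip netns delete demo")               # clean up
```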

When we started using Docker, there were four ways to configure a container's attachment to a network namespace:

- Host mode: Docker places the process in the host's network namespace, effectively leaving it completely unisolated.
- Bridge mode (the default): Docker creates a Linux bridge that connects the network namespaces of all containers on the host, and manages iptables rules to NAT traffic from outside the host to the containers.
- Container mode: Docker reuses the network namespace of another container.
- None mode: Docker sets up a network namespace with no interfaces, so processes inside it cannot connect to anything outside the namespace.
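For reference, here is how the four modes look from the Docker SDK for Python; it assumes a local Docker daemon, the docker package installed (pip install docker), and arbitrary image and container names:

```python
# The four Docker network modes, via the Docker SDK for Python.
import docker

client = docker.from_env()

# Host mode: share the host's network namespace (no isolation).
client.containers.run("alpine", "ip addr", network_mode="host", remove=True)

# Bridge mode (the default): attached to a Linux bridge, NATed by iptables.
client.containers.run("alpine", "ip addr", network_mode="bridge", remove=True)

# None mode: a namespace with only an unconfigured loopback interface.
client.containers.run("alpine", "ip addr", network_mode="none", remove=True)

# Container mode: join another container's network namespace.
holder = client.containers.run("alpine", "sleep 60", name="netns-holder",
                               network_mode="none", detach=True)
client.containers.run("alpine", "ip addr",
                      network_mode=f"container:{holder.name}", remove=True)
```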

The "networked model" is created specifically for third-party network integration, which is very helpful to what we are trying to do. After starting the container, a third party can connect the container to all the components required by the network and insert it into the network namespace.

However, this poses a problem: for some time after the container has started, it has no network connectivity. This is a bad experience for applications, because many want to know what IP address they have been assigned at startup. Riot developers could have implemented retry logic, but we didn't want to burden them with that; besides, many third-party containers cannot handle this situation, and there is nothing we can do about those. A more complete solution was needed.

To overcome this problem, we borrowed the idea of a "network" container from Kubernetes, one that starts before the main application container. We first start the network container in none mode (it needs no connectivity or IP address at that point, so this is fine); then, after the network setup is complete and an IP has been assigned via Tungsten Fabric, we start the main application container in container mode, attaching it to the network container's network namespace. With this arrangement, the application has a fully working network stack the moment it starts.
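Below is a minimal sketch of that startup sequence, again using the Docker SDK for Python. The setup_overlay_networking step is a hypothetical placeholder for the real Tungsten Fabric/Ensign plumbing, and the image names are invented:

```python
# Sketch of the Kubernetes-style "network container" startup pattern.
import docker

client = docker.from_env()

# 1. Start the network-holder container with no network at all.
net_holder = client.containers.run(
    "alpine", "sleep 3600", name="app-netns",
    network_mode="none", detach=True)

# 2. Build the real network. In production this is where the vRouter
#    creates a virtual NIC, assigns the container's unique IP, and
#    installs routes/policies inside the holder's namespace.
def setup_overlay_networking(container):  # hypothetical placeholder
    pass

setup_overlay_networking(net_holder)

# 3. Start the application inside the holder's (now fully configured)
#    namespace, so it sees a working network stack from the start.
app = client.containers.run(
    "my-game-service:latest",  # invented image name
    network_mode=f"container:{net_holder.name}", detach=True)
```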

When we launch a new container on a physical compute node (or host), the vRouter provides the container with a virtual NIC, a globally unique IP address, and any routing or security policies associated with that container.

This is very different from the default Docker network configuration, in which every container on a server shares the same IP address and all containers on a machine can talk to each other freely. That behavior violates our security policy: by default, two applications should never be able to do that. Giving each container its own IP address on a secure, feature-rich virtual network lets us provide a consistent, best-in-class network experience for every container. It simplifies our configuration and security policy, and avoids the complexity of many Docker containers sharing a single IP address with the host.

Conclusion

We still have a long way to go with SDN and infrastructure automation. We have learned a great deal about best practices for building self-service networks, debugging connectivity problems on overlay networks, and handling new failure modes. In addition, we had to deploy this SDN across two generations of the cluster's own network architecture and integrate it with six "traditional" data center designs. That has meant investing in automation and learning how to ensure our systems are trustworthy and our testing is well balanced.

That said, we now see the results of this work every day. Riot engineers can develop, test, and deploy their services globally through self-service workflows, which has transformed the network from a source of constant delays and setbacks into a powerful, value-adding tool in every developer's toolbox.

In the next rCluster article, we will discuss security, network blueprints, and ACLs, including how the system scales and some of the work we have done to improve uptime.

If you have any thoughts or questions, you are welcome to contact us.

More articles in the "Revealing LOL" series:

Revealing the IT infrastructure behind LOL: embarking on a journey of deployment diversity

Revealing the IT infrastructure behind LOL: the key role of "scheduling"
