Meituan Container platform Architecture and Container Technology practice

Based on a talk given by Ouyang Jian, Technical Director of Meituan's Infrastructure Department / Container R&D Center, at QCon 2018 (the Global Software Development Conference), this article describes the architecture of the Meituan container platform and Meituan's practice of container technology.

Background

Meituan's container cluster management platform is called HULK. In the Marvel comics, the Hulk grows huge when he gets angry, which is much like the "elastic scaling" of a container, so we named the platform after him. Some other companies' container platforms happen to share this name; that is pure coincidence.

Meituan started using containers in 2016. By then Meituan already had considerable scale and many systems that predated containers, including CMDB, service governance, monitoring and alerting, the publishing platform, and so on. In exploring container technology we could hardly abandon those assets, so the first step of containerization was to integrate the container lifecycle with these platforms: container application/creation, deletion/release, publishing, migration, and so on. We then verified the feasibility of containers and confirmed that they could serve as the runtime environment for core online services.

In 2018, after two years of operation and practical exploration, we upgraded the container platform, which is the container cluster management platform HULK 2.0.

The OpenStack-based scheduling system was replaced with Kubernetes (hereafter K8s), the de facto standard in the field of container orchestration.

Richer and more reliable container elasticity strategies were provided.

Problems previously encountered in the underlying system were optimized and polished.

Meituan currently runs more than 3,000 online services and more than 30,000 container instances, and many high-concurrency, low-latency core-link services run stably on HULK. This article mainly introduces our practice in container technology, which falls under base-system optimization and polishing.

The basic architecture of Meituan container platform

First, let's introduce the basic architecture of Meituan's container platform; I believe the architecture of most container platforms is roughly similar.

First, the container platform interfaces with service governance, the publishing platform, CMDB, monitoring and alerting, and so on. By integrating with these systems, containers achieve an experience basically consistent with virtual machines: developers can use containers the same way they use VMs, without changing their original habits.

In addition, the platform provides elastic scaling: it can dynamically add or remove container nodes according to an elasticity strategy, thereby adjusting a service's processing capacity. There is also a special module, the "Service Portrait". By collecting and aggregating runtime metrics from a service's container instances, it helps the platform schedule containers better and optimize resource allocation. For example, from the CPU, memory, and IO usage of a service's container instances we can tell whether the service is compute-intensive or IO-intensive, and try to place complementary containers together when scheduling.

For another example, if we know that each container instance of a service runs about 500 processes, we add a reasonable process-count limit (say, at most 1,000 processes) when creating the container, to prevent a faulty container from consuming excessive system resources. If a container of that service suddenly tries to create 20,000 processes at runtime, we have good reason to believe it has hit a bug; the earlier resource limit contains the container, and an alarm notifies the business team to handle it promptly.
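As a rough sketch of the process-limit policy described above (the function names and the 2x headroom factor are our illustrative assumptions, not Meituan's actual values):

```python
def pids_limit(observed_procs: int, headroom: float = 2.0) -> int:
    """Derive a container process-count limit from the Service Portrait's
    observed steady-state process count (e.g. ~500 processes -> limit 1000)."""
    return int(observed_procs * headroom)


def over_limit(current_procs: int, limit: int) -> bool:
    """True when the container should be contained and an alarm raised,
    e.g. a runaway jump to 20000 processes against a limit of 1000."""
    return current_procs > limit
```

In practice such a limit would be written to the container's pids cgroup (pids.max), so the kernel enforces it even if the alarm is missed.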

The next layer is container orchestration and image management. Container orchestration solves the dynamic side of containers: when a container is created, where it is created, when it is deleted, and so on. Image management solves the static side: how container images are built, how they are distributed, where they are distributed, and so on.

The lowest layer is the container runtime. Meituan uses the mainstream Linux + Docker container stack, and HULK Agent is our management agent on each server.

Expanding the container runtime layer specifically, you can see this architecture diagram, introduced from bottom to top:

The lowest layer is the basic physical resources such as CPU, memory, disk, and network.

One level up, we use CentOS 7 as the host operating system, with Linux kernel version 3.10. On top of the default kernel of the CentOS distribution, we add some features Meituan developed for container scenarios, and tune some kernel parameters for high-concurrency, low-latency service scenarios.

Further up, we use the Docker that ships with the CentOS distribution, currently version 1.13, again with some of our own features and enhancements. HULK Agent is our own host-management agent. Falcon Agent runs both on the host and in the container; it collects basic monitoring metrics from each and reports them to the monitoring backend.

The top layer is the container itself. We currently support mainly CentOS 6 and CentOS 7 containers. In CentOS 6 there is a container-init process, developed by us, that runs as the container's PID 1 to initialize the container and start the business processes. In CentOS 7 we use the system's own systemd as the container's PID 1. Our containers support a variety of mainstream programming languages, including Java, Python, Node.js, C/C++, and so on. Above the language layer sit various agent services, including the service governance agent, log agent, encryption agent, and so on. Our containers also support some Meituan-internal business environments, such as set information and swimlane information, which together with the service governance system enable intelligent routing of service calls.
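A key duty of that PID 1 process, whether a custom container-init or systemd, is reaping orphaned children that get re-parented to it. A minimal sketch of the reaping half (assuming Linux; this is an illustration, not Meituan's actual container-init):

```python
import os
import signal


def reap_children(signum, frame):
    """SIGCHLD handler for a container's PID 1: reap every exited child
    (including re-parented orphans) so no zombie processes accumulate."""
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none have exited yet


# A real init would install the handler before starting business processes:
# signal.signal(signal.SIGCHLD, reap_children)
```

Without this, processes whose parents exit inside the container become zombies attached to PID 1 and are never cleaned up.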

Meituan mainly uses open-source components from the CentOS family because we believe Red Hat has strong open-source engineering strength; rather than using community versions directly, we hoped Red Hat's distribution would solve most system problems for us. Yet we found that even with CentOS components deployed, we still ran into problems that neither the community nor Red Hat had solved. To some extent this shows that large Chinese internet companies have reached a world-leading level in the scale and complexity of their technology application scenarios; running ahead of the community and of Red Hat's other customers, they hit these problems first.

Some problems encountered by the container

In container technology itself, we mainly encountered four problems: isolation, stability, performance, and promotion.

Isolation has two levels: first, can a container correctly perceive its own resource allocation; second, can containers running on the same server interfere with each other? For example, a container doing very heavy IO can increase the latency of other containerized services on the same host.

Stability: under high pressure, large scale, and long-running operation, system functions may become unstable; for example, containers cannot be created or deleted, or software problems cause hangs and crashes.

Performance: comparing virtualization technology with container technology, containers are generally believed to execute more efficiently, but in practice we encountered special cases: with the same code on an identically configured container, a service's throughput and response latency were worse than on a virtual machine.

Promotion: even after we have basically solved the previous problems, businesses may still be unwilling to use containers, partly for technical reasons: the difficulty of container adoption, peripheral tooling, the ecosystem, and other factors all affect the cost of using containers. Promotion is not a purely technical issue; it is closely tied to a company's stage of business development, technical culture, organizational structure, KPIs, and other factors.

Implementation of Container

A container is essentially a group of related processes serving the same business goal, placed in a space called a namespace. Processes in the same namespace can communicate with one another but cannot see processes in other namespaces. Each namespace can have its own independent hostname, process ID space, IPC, network, file system, users, and so on. To some extent this implements simple virtualization: multiple systems, unaware of one another, can run on a single host at the same time.
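Each process's namespace membership is visible under /proc/&lt;pid&gt;/ns on Linux. As a small illustration (assuming a Linux host; the helper name is ours):

```python
import os


def namespaces(pid: str = "self") -> dict:
    """Map each namespace type of a process (pid, net, ipc, uts, mnt, ...)
    to its kernel identifier; two processes are in the same namespace
    exactly when the corresponding links match."""
    ns_dir = f"/proc/{pid}/ns"
    return {name: os.readlink(os.path.join(ns_dir, name))
            for name in sorted(os.listdir(ns_dir))}
```

Comparing the output for two PIDs shows which namespaces they share; inside a container, entries such as pid and net differ from the host's.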

In addition, to limit a namespace's use of physical resources, we need to restrict the CPU, memory, and other resources its processes can use. This is cgroup technology; cgroup stands for control group. For example, the "4c4g" container we often speak of actually limits the processes in the container's namespace to at most 4 cores of compute and 4 GB of memory.
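As an illustration of how a "4c4g" spec maps onto cgroup (v1) limit files (the helper name is our own):

```python
def cgroup_limits(cores: int, mem_gb: int, cfs_period_us: int = 100_000) -> dict:
    """Translate an 'XcYg' container spec into cgroup v1 limit values.
    CPU is capped via the CFS bandwidth controller (quota / period = cores);
    memory is a hard byte limit."""
    return {
        "cpu.cfs_quota_us": cores * cfs_period_us,    # 4c -> 400000
        "memory.limit_in_bytes": mem_gb * 1024 ** 3,  # 4g -> 4294967296
    }
```

A platform agent would write these values into the container's cgroup directory; under cgroup v2 the equivalent knobs are cpu.max and memory.max.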

In short, the Linux kernel provides namespaces for isolation and cgroups for resource limits. Namespaces plus cgroups make up the container's underlying technology (rootfs is the container's file system layer technology).

Meituan's solutions, improvements and optimizations: isolation

I had worked with virtual machines before, and only after using containers did I discover that the CPU and memory information seen inside a container is the host's, not the container's own configuration. To this day, this is still the case with community containers. For example, inside a 4c4g container you can see 40 CPUs and 196GB of memory; those resources actually belong to the host the container runs on. This gives the container a kind of "self-inflation", believing itself far more capable than it is, and it causes many problems.

The figure above shows an example of memory-information isolation. When system memory is queried, the community Linux kernel returns the host's memory information whether the query is made on the host or inside a container. If an application in a container configures itself according to the host memory it sees, actual resources fall far short, and the system soon hits an OOM exception.

The isolation work we did: when memory information is requested inside a container, the kernel returns the container's own memory information, derived from the container's cgroup data (similar to what LXCFS does).
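The user-space variant of this idea (what LXCFS does) synthesizes a container view of /proc/meminfo from the cgroup's limit and usage. A toy sketch, with field handling deliberately simplified:

```python
def container_meminfo(host_meminfo: str, limit_bytes: int, usage_bytes: int) -> str:
    """Rewrite the host's /proc/meminfo so MemTotal reflects the cgroup
    memory limit and MemFree/MemAvailable reflect the remaining quota."""
    total_kb = limit_bytes // 1024
    free_kb = (limit_bytes - usage_bytes) // 1024
    lines = []
    for line in host_meminfo.splitlines():
        key = line.split(":", 1)[0]
        if key == "MemTotal":
            lines.append(f"MemTotal:       {total_kb} kB")
        elif key in ("MemFree", "MemAvailable"):
            lines.append(f"{key}:       {free_kb} kB")
        else:
            lines.append(line)  # pass every other field through unchanged
    return "\n".join(lines)
```

An application sizing itself from this view sees 4 GB in a 4c4g container, not the host's full memory.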

CPU information isolation is implemented similarly to memory isolation, so we will not expand on it; instead, here is an example of how the CPU count affects application performance.

As we all know, JVM GC (garbage collection) has a certain impact on the performance of Java programs. By default, HotSpot sizes its parallel GC thread pool with the formula ParallelGCThreads = ncpus when ncpus &lt;= 8, and ParallelGCThreads = 8 + (ncpus - 8) * 5/8 when ncpus &gt; 8. Because a container reports the host's CPU count, a JVM in a 4c4g container on a 40-core host sizes its GC thread pool for 40 CPUs rather than 4.
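That default sizing can be sketched numerically; the integer arithmetic below follows HotSpot's documented default for the parallel collector:

```python
def parallel_gc_threads(ncpus: int) -> int:
    """HotSpot default: one GC thread per CPU up to 8 CPUs,
    then 5 more threads for every 8 additional CPUs."""
    return ncpus if ncpus <= 8 else 8 + (ncpus - 8) * 5 // 8


# A 4c container that sees its 40-core host sizes the pool for 40 CPUs:
# parallel_gc_threads(40) == 28, versus the 4 its quota warrants.
```

The extra GC threads contend for the container's small CPU quota, which is one way an un-isolated CPU count degrades performance.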
