This article will show how Alibaba built container technology at the scale of millions of containers. As many know, Alibaba launched millions of containers ahead of the "Double 11" shopping festival. Faced with such scale, what features of Alibaba's container technology helped it land so quickly? I will share the story from the perspective of the pain points we hit and the solutions we built.
Containers and Kubernetes are very hot topics today, yet there are still enterprises that have not containerized their internal businesses, and feedback from our conference audience confirms this. It shows that however a technology is adopted, containerization places certain demands on the environment, and there are always problems that must be solved first. Let's look at how Alibaba moved its online businesses to 100% containerization.
First, what do we mean by online business? When people buy things on Taobao or Xianyu, or pay bills with Alipay, those services must respond in real time. This kind of business is widespread at Alibaba, and it also includes video (Youku), search, and Alibaba's proprietary cloud, all of which run on container capabilities.
Introduction to PouchContainer
Let me start with some history of PouchContainer. Alibaba began building container technology in 2011, but that technology never became as popular as Docker did later. Why? Mainly because it only served internal needs: the goal was to create container environments that raised resource utilization across the group, and it never abstracted out the image technology that the community later built.
Image technology can be considered one of the keys to the explosion of container technology, because it gives businesses continuous delivery and lets applications be released incrementally. In 2011, Alibaba built on the cgroup and namespace technology underlying LXC; we already knew how to make containers, adopted LXC, and quickly pushed that container technology into production.
In 2015, we saw that Docker was becoming more and more popular outside the company, so we brought Docker's image technology into the group. Combining LXC with Docker, plus the necessary technical evolution, produced what is now PouchContainer. On Singles' Day 2017, PouchContainer was open-sourced. Today every feature of Alibaba's internal PouchContainer, and every change to any line of its code, can be seen on GitHub.
Let's talk about what Alibaba's container technology had to consider as it evolved, and how that differs from the container technology in the community. Docker, for example, advocates one process per container (only one application ever runs inside the container), but is that really the application architecture inside enterprises? Often it is not. Many businesses, over the course of their development, have come to depend on other system applications.
For example, many of our applications are hard to unbind from the underlying infrastructure. They are delivered together with systemd and crond, their logs naturally go through syslog, and for the convenience of the operations staff we even need to keep an sshd running in the runtime environment. Docker's philosophy does not fit such needs particularly well, yet the container technology still has to meet them.
We must first meet the requirements of developers, and then those of operations staff, so that the new technology we provide is not intrusive to the existing architecture. Once a new technology is non-intrusive, its rollout tends to be more transparent and faster, and there is less resistance to adopting it inside the enterprise.
By contrast, if an infrastructure technology imposes many requirements on the business side, its rollout will face a lot of resistance. So why did we build PouchContainer as a "rich container"? Because we could not afford to be intrusive to developers or operators, or to their processes. We had to provide a technology that is completely transparent to them, so that all businesses could be containerized in a very short time. That is a large part of its value.
PouchContainer architecture
PouchContainer's architecture can be read as a layered diagram.
Cutting the diagram horizontally: the top layer holds the deployment and orchestration concepts, such as Kubernetes; CRI and the Pod are orchestration-level concepts, and PouchContainer supports Kubernetes readily while also enhancing the Kubernetes experience (more on that at the end of this article). The layer below is the container API, behind which sits the container manager domain, including networking and storage. In short, the upper layer is the orchestration domain and the lower layer is the container domain.
Cutting the diagram vertically is even easier to follow: on the left is the scheduling domain and Kubernetes; the next layer is the engine; and below that is the runtime layer, including runc, runlxc, and Kata Containers. On the far right are the units that actually run: the Pod, the container, or the Kata container.
Technical characteristics of PouchContainer
Rich container
From a functional point of view, what problems does PouchContainer solve in practice? First, the rich container: everything the business operations domain sees is put inside one container. Why do that? When we provide a container technology, we do not want it to affect application development or operations in any way; otherwise it would be hard to promote inside the enterprise. The operations team's job is to maintain stability, and they do not want too many new variables introduced. Operations engineers may rely heavily on a particular set of tools, and if those tools stop working in the face of our new technology, they will simply say "no" to it.
Rich container technology packages everything operations needs into the image, and those operations components keep doing their jobs while the application runs. Only in this way can the technology adapt well to the application. Rich containers were an important prerequisite for Alibaba Group reaching 100% containerization so quickly. If you find the adoption of container technology in your enterprise slow or obstructed, rich containers are worth trying to simplify the process.
From a technical architecture point of view, what are the advantages of rich containers? First, there is a systemd inside the container, much like in a virtual machine. What does systemd do there? It manages the inside of the container in finer detail. Zombie processes, for example, are easy to handle with it, which Docker alone may not do; it can also run more system services inside the container, including syslogd, sshd, and so on. That way, operations can carry on without any changes to their systems.
Second, rich containers place more detailed requirements on the container's control layer, such as our prestart and poststop hooks. To meet some needs of the operations staff around initialization, we run some preparatory work before the container starts, and some cleanup work after it stops.
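To make the zombie-process point concrete, here is a minimal Go sketch of the reaping duty that an init such as systemd performs as PID 1 inside a rich container. It is an illustration only, not PouchContainer or systemd code.

```go
// zombie-reaper: a minimal illustration of what an init process (PID 1)
// must do so that exited children do not linger as zombies.
package main

import (
	"fmt"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	sigs := make(chan os.Signal, 1)
	// SIGCHLD is delivered whenever a child process exits.
	signal.Notify(sigs, syscall.SIGCHLD)

	for range sigs {
		// Reap every exited child so none is left as a zombie.
		for {
			var status syscall.WaitStatus
			pid, err := syscall.Wait4(-1, &status, syscall.WNOHANG, nil)
			if pid <= 0 || err != nil {
				break
			}
			fmt.Printf("reaped child %d, exit status %d\n", pid, status.ExitStatus())
		}
	}
}
```

Without an init playing this role, a container whose main process spawns and abandons children slowly accumulates zombie entries in the process table.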
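As an illustration of what such hooks look like at the container-spec level, here is a hedged sketch using the OCI runtime-spec Go types. The hook scripts named are hypothetical examples, and this is not PouchContainer's internal hook mechanism.

```go
// Declaring prestart/poststop hooks with the OCI runtime-spec Go types.
// The script paths are hypothetical placeholders.
package main

import (
	"encoding/json"
	"os"

	specs "github.com/opencontainers/runtime-spec/specs-go"
)

func main() {
	spec := specs.Spec{
		Version: specs.Version,
		Hooks: &specs.Hooks{
			// Runs on the host after the container is created but before
			// the user process starts: e.g. network or volume setup.
			Prestart: []specs.Hook{
				{Path: "/usr/local/bin/setup-net.sh", Args: []string{"setup-net.sh"}},
			},
			// Runs after the container stops: e.g. deregistering from
			// service discovery or collecting logs.
			Poststop: []specs.Hook{
				{Path: "/usr/local/bin/teardown.sh", Args: []string{"teardown.sh"}},
			},
		},
	}

	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	_ = enc.Encode(spec) // prints a fragment of an OCI config.json
}
```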
The value of a rich container can be divided into two points:
Rich containers are fully compatible with standard container images, so compatibility costs nothing in business delivery efficiency.
Rich containers are compatible with the existing operations system, fully preserving its current capabilities and introducing no intrusiveness into it.
Enhanced isolation
Next, I'll share how PouchContainer approaches isolation, starting from practical experience:
First, resource visibility isolation. If you use containers heavily in production, especially with Java applications, you will have felt this acutely. If you create a container on a host with 10 GB of memory and give the container 1 GB, then /proc/meminfo inside the container still shows the host's 10 GB rather than the 1 GB you set. What impact does that have?
In many cases, starting an application is not just launching a binary. Java applications, for example, tend to read the resource figures in /proc/meminfo and size the JVM heap from them dynamically. If a Java application inside a container starts that way, whatever resource limits you set will not help, and the application may hit OOM exceptions. We solved this across the board with LXCFS, so that applications, developers, and operations do not need to change anything.
What LXCFS does is relatively simple. Once we set a resource limit on a container, the cgroup filesystem records it in a file such as memory.limit_in_bytes. LXCFS reads that value dynamically, generates a virtual file, and mounts it over the corresponding path inside the container. During startup, the value the application reads is the limit that is actually enforced.
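A small probe makes the mismatch tangible. The Go sketch below assumes the cgroup v1 layout described in this article: without LXCFS, /proc/meminfo reports host totals while the enforced limit lives in memory.limit_in_bytes.

```go
// Compare what the container "sees" with what the kernel actually enforces
// (cgroup v1 layout assumed; run inside a memory-limited container).
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

func main() {
	// Without LXCFS, /proc/meminfo shows the host's totals.
	f, err := os.Open("/proc/meminfo")
	if err != nil {
		panic(err)
	}
	defer f.Close()
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		if strings.HasPrefix(scanner.Text(), "MemTotal:") {
			fmt.Println("/proc/meminfo:", scanner.Text())
			break
		}
	}

	// The value the kernel actually enforces for this container.
	limit, err := os.ReadFile("/sys/fs/cgroup/memory/memory.limit_in_bytes")
	if err != nil {
		fmt.Println("no cgroup v1 memory limit found:", err)
		return
	}
	fmt.Println("memory.limit_in_bytes:", strings.TrimSpace(string(limit)))
	// LXCFS mounts a virtual /proc/meminfo derived from this limit, so the
	// JVM (and other tools) size themselves against the real quota.
}
```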
With resource visibility isolation solved, Java applications no longer hit those OOM exceptions. Why did we put so much effort into this? Because a large share of our e-commerce applications are Java applications. The next topic is DiskQuota, which is mainly used to enforce disk quotas on containers.
When containers share the host's disk, two problems are worth briefly describing:
First, if container storage is managed not through block devices but through a plain shared filesystem, the filesystem view can be isolated but there is no disk quota. In other words, if two containers share one filesystem, then even though I cannot see your files, I can fill up the disk and leave you with nothing to use. That is one form of mutual interference.
Second, inodes. I can create huge numbers of small files in my container without using much disk space, and while I am doing so the other container stops working entirely (even basic commands fail to execute) because the inodes are exhausted. These are all isolation problems. Many applications running on PouchContainer have hit this: a poorly written application may write files in a loop and blow up the disk. We solve this with DiskQuota.
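The two shared resources in question, data blocks and inodes, can be observed directly. The following Linux-only Go sketch simply reports both; it is an illustration of what DiskQuota has to constrain, not DiskQuota itself.

```go
// Report free space and free inodes on a filesystem (Linux). Without a
// per-container quota, one container can exhaust either of these for
// everyone sharing the filesystem.
package main

import (
	"fmt"
	"syscall"
)

func main() {
	var st syscall.Statfs_t
	if err := syscall.Statfs("/", &st); err != nil {
		panic(err)
	}
	// Free space: one noisy neighbour filling this starves the others.
	fmt.Printf("free space: %d MiB\n", st.Bavail*uint64(st.Bsize)/(1<<20))
	// Free inodes: millions of tiny files can exhaust these while the
	// disk still looks half empty.
	fmt.Printf("free inodes: %d of %d\n", st.Ffree, st.Files)
}
```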
So much for the shared-disk problems. Let's move on to the second aspect of enhanced isolation.
The second isolation feature is the hypervisor-based container. Some of you may ask: can PouchContainer support older kernels? We have done work in this area too: an old kernel can run inside a runV virtual machine created by PouchContainer. What does that achieve?
It means all of our existing businesses can embrace Kubernetes. Every enterprise has plenty of applications that still run on the 2.6.32 kernel, and until now those applications have had nothing to do with Docker or Kubernetes. Why?
Because those technologies have kernel requirements: the new technology demands a newer kernel, and applications that depend on old kernels simply cannot use it. Do enterprises want to move to Kubernetes? Of course they do. But how? Do they have the ability to upgrade every Host OS to 3.10, or to kernels like 3.19, 4.4, or 4.9? Why has almost no one tried?
Because the operations team cannot bear the impact a kernel upgrade would have on the businesses running above it. But there is a way: if the old kernel runs inside our runV virtual machine while the host runs an upgraded kernel, then we gain the ability to fully upgrade the kernels of the physical machines in the data center.
Because the host kernel is no longer coupled to the business, it can be upgraded freely, while the runV virtual machine still runs the old kernel and the application still runs inside it. In this way, the infrastructure of the entire data center can be upgraded. I believe this capability is very powerful in traditional industries, where everyone is wrestling with heterogeneous operating systems.
If your kernel cannot be upgraded, you cannot adopt certain new technologies, and digital transformation slows down.
P2P image distribution
Now for Alibaba's image distribution technology; we consider image distribution a very big topic. Do you know the average size of your container images? Are there many enterprises whose average image size is under 500 MB? That is actually rare, which means images are large. What problems arise when images are large? During Alibaba's big promotions, such as Singles' Day, we need to distribute a huge number of images to a huge number of machines, and distribution efficiency becomes a key concern.
If you ignore the problem, you will find 1,000 machines sending image download requests to one registry, and that registry may well fall over. Starting from this problem, what we built is Dragonfly, a P2P intelligent file distribution system, whose main purpose is to remove these distribution bottlenecks. The now open-source Dragonfly has outside users as well, including companies like Lazada.
Its main features focus on the distribution of cloud-native applications along three dimensions (a conceptual sketch of the P2P idea follows the list):
Distribution efficiency
Flow control during distribution
Distribution security
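To illustrate the basic P2P idea behind the efficiency dimension, here is a conceptual Go sketch: a layer is split into pieces, and each piece is fetched from a peer in parallel instead of every node pulling the whole layer from one registry. This is a toy model, not Dragonfly's actual scheduler or wire protocol.

```go
// Conceptual sketch of P2P piece fetching: spread downloads across peers.
package main

import (
	"fmt"
	"sync"
)

// fetchPiece stands in for "download piece i from peer p"; in a real P2P
// distributor a scheduler decides which peer serves which piece.
func fetchPiece(peer string, piece int) []byte {
	return []byte(fmt.Sprintf("piece-%d-from-%s", piece, peer))
}

func main() {
	peers := []string{"peer-a", "peer-b", "peer-c"}
	const pieces = 6

	result := make([][]byte, pieces)
	var wg sync.WaitGroup
	for i := 0; i < pieces; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// Each piece comes from a different peer, not one registry.
			result[i] = fetchPiece(peers[i%len(peers)], i)
		}(i)
	}
	wg.Wait()

	for i, p := range result {
		fmt.Printf("assembled %d: %s\n", i, p)
	}
}
```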
In-place container upgrade
This is a feature for stateful applications. PouchContainer provides an Upgrade interface at the container-engine level to implement in-place container upgrades. Much of what CNCF advocates is stateless, but are there really that many stateless applications in the enterprise? Perhaps not; in reality a considerable portion are still stateful.
So how should these stateful businesses be upgraded and updated? During an update, can the container inherit its previous state and carry on? That is what we had to consider, and it is why we built the container Upgrade operation. As far as I know, even though application architectures at Chinese Internet companies are fairly advanced, most companies still run some stateful containers, and each has implemented a similar capability.
How exactly is a container upgraded in place? It is actually very simple: everything stateful stays exactly where it is, and only what needs upgrading is upgraded.
So what actually needs upgrading? Technically, we generally consider the image to be what gets upgraded. During the process we do stop the container, remove the old image, lay down the new image while rebuilding the filesystem, and then start the original command, so the upgrade completes entirely in place. We are currently integrating this function into our own scheduling systems; our internal Sigma also relies on it for business upgrades.
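The flow can be sketched as follows. All helper functions here are hypothetical placeholders rather than PouchContainer's real API; the point is that only the image layers change, while the container's identity, volumes, and network configuration are preserved.

```go
// Sketch of an in-place upgrade: swap the image, keep everything stateful.
package main

import "fmt"

type Container struct {
	ID      string
	Image   string
	Volumes []string // stateful data stays attached across the upgrade
	IP      string   // network identity is also preserved
}

func stopProcess(c *Container) { fmt.Println("stop entrypoint in", c.ID) }

func swapImageLayers(c *Container, img string) {
	c.Image = img
	fmt.Println("rebuild rootfs from", img)
}

func startProcess(c *Container) { fmt.Println("start original command in", c.ID) }

// Upgrade replaces the image in place; ID, volumes and IP are untouched.
func Upgrade(c *Container, newImage string) {
	stopProcess(c)
	swapImageLayers(c, newImage)
	startProcess(c)
}

func main() {
	c := &Container{ID: "app-1", Image: "app:v1", Volumes: []string{"/data"}, IP: "10.0.0.8"}
	Upgrade(c, "app:v2")
	fmt.Printf("still %s at %s, volumes %v, now running %s\n", c.ID, c.IP, c.Volumes, c.Image)
}
```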
Native support for Kubernetes
Finally, PouchContainer's native support for Kubernetes. This support comes mainly from implementing CRI, the Container Runtime Interface, which you have probably heard mentioned many times today; it decouples Kubernetes from the underlying container runtime. So why did we implement CRI ourselves?
The reason is simple: we want to bring PouchContainer's many production-grade features into the Kubernetes system. Does Kubernetes itself support LXCFS? Does it support our in-place upgrade? It does not. And if Kubernetes does not support them, can it land directly inside Alibaba? Often it cannot. We have to pass these runtime enhancements up into Kubernetes before container technology can truly land.
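For a sense of what "implementing CRI" means, here is a hand-written, heavily simplified interface modeled on the shape of CRI's RuntimeService; the real interface is a generated gRPC API in k8s.io/cri-api, so this is only a conceptual sketch, not PouchContainer's actual CRI shim.

```go
// A trimmed-down stand-in for the CRI runtime interface and a toy shim.
package main

import "fmt"

// RuntimeService mimics the shape of a few CRI runtime calls.
type RuntimeService interface {
	RunPodSandbox(name string) (sandboxID string, err error)
	CreateContainer(sandboxID, image string) (containerID string, err error)
	StartContainer(containerID string) error
}

// pouchCRI is a toy implementation; a real shim would translate these
// calls into container-engine API calls.
type pouchCRI struct{ next int }

func (p *pouchCRI) RunPodSandbox(name string) (string, error) {
	p.next++
	return fmt.Sprintf("sandbox-%d(%s)", p.next, name), nil
}

func (p *pouchCRI) CreateContainer(sandboxID, image string) (string, error) {
	p.next++
	return fmt.Sprintf("ctr-%d(%s,%s)", p.next, sandboxID, image), nil
}

func (p *pouchCRI) StartContainer(id string) error {
	fmt.Println("started", id)
	return nil
}

func main() {
	var rt RuntimeService = &pouchCRI{}
	sb, _ := rt.RunPodSandbox("web-pod")
	ctr, _ := rt.CreateContainer(sb, "registry.example.com/app:v2") // hypothetical image name
	_ = rt.StartContainer(ctr)
}
```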
Conclusion
The above is Alibaba's practice in the container field. You are welcome to communicate with us later.
Original link: https://mp.weixin.qq.com/s/cBJT1C3YzXCsXwORQs1ovw