Getting started with K8s from scratch | detailed explanation of Pod and container design patterns

2025-02-22 Update From: SLTechnology News&Howtos


Author | Zhang Lei, senior technical expert on the Alibaba Cloud container platform, official CNCF Ambassador

I. Why we need the Pod concept

We know that Pod is a very important concept in the Kubernetes project and its atomic scheduling unit, but why do we need such a concept? Nothing like it exists when you use Docker containers directly. To understand Pod, you must first understand containers, so let's review the concept of a container:

The essence of a container is a process: a process whose view is isolated and whose resources are limited.

The process with PID=1 inside the container is the application itself. This means that managing a virtual machine is managing infrastructure, because we are managing the machine, whereas managing a container is managing the application directly. This is also the best embodiment of the "immutable infrastructure" idea mentioned earlier: your application is your infrastructure, and it must be immutable.

With this analogy in place, what is Kubernetes? Many people say Kubernetes is the operating system of the cloud era. This is an interesting comparison: if Kubernetes is the operating system, then the container image is the software installation package for that operating system; that is the analogical relationship between them.

An example from a real operating system

If Kubernetes is the operating system, let's look at an example from a real operating system.

In this example there is a program called Helloworld, which is actually made up of a group of processes. Note that the "processes" mentioned here are what Linux implements as threads.

Threads in Linux are lightweight processes, so if you run pstree against the Helloworld program on a Linux system, you will see that Helloworld is actually made up of four threads: {api, main, log, compute}. In other words, these four threads work together and share the resources of the Helloworld program; together they constitute the actual work the program does.

This is a very real example of a process group (or thread group) in an operating system.

Now think about it: in a real operating system, a program is often managed as a process group. If Kubernetes is analogous to an operating system such as Linux, and a container, as mentioned earlier, is analogous to a process (a Linux thread), then what is Pod? Pod is exactly the process group we just described, that is, the thread group in Linux.

Process group concept

When it comes to process groups, I suggest you first get at least a conceptual understanding; we will then explain it in detail.

Back to the earlier example: the Helloworld program consists of four processes that share some resources and files. So now there is a question: if you want to run the Helloworld program in a container, how would you do it?

Of course, the most natural solution is to start one Docker container and run the four processes inside it. But there is a problem: which process should be PID=1 in the container? It would presumably be the main process, but then the question becomes: "who" is responsible for managing the remaining three processes?

The core problem is that the container itself is designed around a "single process" model. This does not mean that only one process can exist in the container; rather, because the container's application is the process, the container can only manage the PID=1 process. Any other processes started inside it are effectively unmanaged, unless the application process itself has "process management" capabilities.

For example, the Helloworld program would need the capabilities of systemd, or the PID=1 process in the container would have to be changed to systemd itself; otherwise the application, and thus the container, has no way to manage multiple processes. If the PID=1 process is the application itself and you kill it, or it dies while running, the resources of the remaining three processes are never reclaimed, which is a very serious problem.

Conversely, if you really change the application to systemd, or run systemd in the container as PID=1, you hit another problem: managing the container no longer means managing the application itself, but managing systemd. The problem here is very obvious. If the process running in my container is systemd, then did the application exit? Did it fail? Did it crash abnormally? There is no direct way to know, because what the container is managing is systemd. This is one reason why it is often difficult to run a complex program in a container.

To sort this out once more: because the container is a "single process" model, if you start multiple processes in the container, only one of them can be PID=1. If that PID=1 process dies or exits with a failure, the other three processes naturally become orphans: nothing manages them, and nothing reclaims their resources. This is a very bad situation.

Note: the "single-process" model of Linux containers means that the container's life cycle is tied to that of the PID=1 process (the container's application process), not that multiple processes cannot be created in the container. In general, though, container application processes lack process management capabilities, so other processes you create in the container via exec or ssh easily become orphaned processes if their parent exits abnormally (for example, when the ssh session terminates).

Conversely, you can indeed run systemd in the container and let it manage all the other processes. This creates the second problem: there is no longer a way to manage the application directly, because the application has been taken over by systemd, so the life cycle of the application is no longer equal to the life cycle of the container. This management model is very complex.

Pod = "process group"

In Kubernetes, Pod is precisely the process-group-like concept that the Kubernetes project abstracts for you.

As mentioned earlier, the Helloworld application is composed of four processes; in Kubernetes it would actually be defined as one Pod containing four containers. This concept must be understood very carefully.

That is to say, when four collaborating processes with different responsibilities need to run together, Kubernetes does not put them into one container, because of the two problems described above. What does Kubernetes do instead? It starts the four processes in four separate containers and defines them inside one Pod.

So when Kubernetes brings up Helloworld, you will actually see four containers that share some resources, and those resources belong to the Pod. That is why we say Pod is only a logical unit in Kubernetes: there is no physical object corresponding to the Pod, and there never will be. What physically exists is four containers, and the combination of those four containers is called a Pod. There is another concept that must be very clear: Pod is the unit in which Kubernetes allocates resources, because the containers in it share certain resources; that is also why Pod is the atomic scheduling unit of Kubernetes.

The Pod design described above was not invented by the Kubernetes project itself; the problem was discovered as early as Google's development of Borg, and it is described very, very clearly in the Borg paper. To put it simply, Google engineers found that applications deployed on Borg often have relationships similar to "process and process group" in many scenarios. More specifically, these applications often collaborate so closely that they must be deployed on the same machine and share certain information.

This is the concept of process groups and the use of Pod.

Why must Pod be the atomic scheduling unit?

You may still have questions here: even if we understand that a Pod is a process group, why must Pod itself be abstracted as a concept? Could the problem be solved purely through scheduling instead? Why must Pod be the atomic scheduling unit in Kubernetes?

Let's explain it through an example.

Suppose there are two containers that work closely together, so they should be deployed in one Pod. Specifically, the first container, called App, is the business container and writes log files; the second container, called LogCollector, forwards the log files written by the App container to a back-end ElasticSearch.

The resource requirements of the two containers are: the App container needs 1G of memory, and LogCollector needs 0.5G. The available memory in the current cluster is: Node_A has 1.25G and Node_B has 2G.

Suppose there were no Pod concept, only two closely cooperating containers that must run on the same machine. What happens if the scheduler first places App on Node_A? You would then find that LogCollector cannot be scheduled onto Node_A, because there is not enough memory left. At this point the whole application has effectively gone wrong: scheduling has failed and must start over.

The above is a very typical example of group scheduling failing, known in English as the Task co-scheduling problem. It is not unsolvable; many projects have solutions to it.

In Mesos, for example, the solution is called resource hoarding: unified scheduling begins only once all tasks with Affinity constraints have arrived. This is a very typical approach to group scheduling.

So the App and LogCollector containers above would not be scheduled immediately in Mesos; Mesos waits until both containers have been submitted and then schedules them together. This brings new problems. First, scheduling efficiency is lost, because of the waiting. Second, the waiting can produce deadlocks, in which tasks wait on each other. Mesos has to solve these problems, which adds extra complexity.

Another approach is Google's. In the Omega system (the next generation of Borg), Google implemented a very complex and powerful solution called optimistic scheduling: ignore potential conflicts, schedule first, and rely on a very delicate rollback mechanism to resolve conflicts after they occur. This approach is more elegant and efficient, but its implementation is very complex; as most people can appreciate, pessimistic locking is easier to implement than optimistic locking.

In Kubernetes, the Task co-scheduling problem is solved directly by the Pod concept. Because in Kubernetes the App container and the LogCollector container must belong to one Pod, and they are scheduled as one Pod unit, the problem simply does not arise.
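As a hedged sketch (container names and image names are illustrative assumptions, not from the original), the App/LogCollector example could be declared as one Pod with per-container memory requests; the scheduler then places the whole Pod only on a node with at least 1.5G of free memory:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-collector
spec:
  containers:
  - name: app                          # business container, writes log files
    image: example.com/app:v1          # illustrative image name
    resources:
      requests:
        memory: "1Gi"
  - name: log-collector                # forwards logs to back-end ElasticSearch
    image: example.com/log-collector:v1  # illustrative image name
    resources:
      requests:
        memory: "512Mi"
```

Because the Pod is the scheduling unit, Node_A (1.25G) is rejected outright and both containers land together on Node_B (2G).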

Understand Pod again

With the previous knowledge points covered, let's look at Pod again. First of all, the containers in a Pod have a "super-intimate relationship".

The word "super" needs to be understood. An ordinary intimate relationship can certainly be satisfied through scheduling.

For example, if two Pods need to run on the same host, that is an intimate relationship, and the scheduler must be able to handle it. A super-intimate relationship, however, must be solved through Pod, because if the super-intimacy cannot be satisfied, the whole Pod, and hence the whole application, cannot start.

What is super-intimate relationship? It is roughly divided into the following categories:

- Two processes exchange files: for example, one writes a log and the other reads it.
- Two processes need to communicate through localhost or a local socket.
- Two containers or microservices need to make very frequent RPC calls to each other; for performance, we want them super-intimate.
- Two containers or applications need to share certain Linux Namespaces. The simplest and most common example: one container needs to join another container's Network Namespace, so that it can see the other container's network devices and network information.

Relationships like the above belong to super-intimate relationships, and they are all solved through the concept of Pod in Kubernetes.

Now we understand the conceptual design of Pod and why it is needed. It solves two problems:

1. How do we describe a super-intimate relationship?
2. How do we make unified scheduling of super-intimate containers or services? This is one of the most important demands on Pod.

II. The implementation mechanism of Pod

The problem Pod needs to solve

Pod itself is a logical concept. So how is it actually implemented on the machine? This is the second problem we will explain. Since Pod has to solve the problem above, the core question is how multiple containers in a Pod can share certain resources and data most efficiently.

Because containers are originally isolated from each other by Linux Namespaces and cgroups, what is really needed is a way to break through that isolation and share resources and information. This is the core problem Pod's design must solve.

So the specific solution is divided into two parts: network and storage.

1. Shared network

The first question: how do multiple containers in a Pod share the network? Here is an example:

Suppose a Pod contains a container A and a container B, and the two need to share a Network Namespace. The solution in Kubernetes is this: it creates an extra small container in each Pod, the Infra container, to hold the Network Namespace of the entire Pod.

The Infra container is a very small image, about 100~200KB, a container written in assembly language that stays permanently in a "paused" state. With the Infra container in place, all other containers join its Network Namespace via Join Namespace.

As a result, all containers in a Pod see exactly the same network view: the network devices, IP addresses, MAC addresses, and all other network-related information are one single copy, and it comes from the Infra container that the Pod created first. This is how Pod solves network sharing.

A Pod has one IP address, which is the address of the Pod's Network Namespace and also the IP address of the Infra container. What every container sees is this same copy, and all other network resources exist once per Pod and are shared by all containers in the Pod. This is how Pod is implemented on the network side.

Because there needs to be this intermediate container, the Infra container must be the first to start in the whole Pod. The life cycle of the Pod is equal to the life cycle of the Infra container and is independent of containers A and B. This is why Kubernetes allows updating the image of a single container in a Pod: doing so does not rebuild or restart the whole Pod, which is a very important design.
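As a hedged sketch of the shared network view (the images and the wget probe are illustrative assumptions), a second container can reach a server container over localhost, because both joined the Infra container's Network Namespace:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-network-demo
spec:
  containers:
  - name: web
    image: nginx:1.25          # serves on port 80
  - name: client
    image: busybox:1.36
    # "client" sees the same network view as "web", so nginx
    # is reachable on localhost without any Service or port mapping
    command: ["sh", "-c", "sleep 5; wget -qO- http://localhost:80; sleep 3600"]
```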

2. Shared storage

The second question: how does a Pod share storage? Pod shared storage is relatively simple.

For example, there are two containers: one is Nginx, and the other is an ordinary container that puts some files in place so that they can be accessed through Nginx. So the two need a shared directory. Sharing a file or directory is very simple in a Pod: volumes are simply promoted to the Pod level, and all containers belonging to the same Pod share all of the Pod's volumes.

For example, in the figure above, the volume is called shared-data and belongs to the Pod level. Each container can then directly declare that it mounts the shared-data volume; once the volume is mounted, the directory you look at inside either container is in fact the same copy. This is how Kubernetes shares storage between containers through Pod.

So in the earlier example, the App container writes its logs into a volume, and as soon as the LogCollector container declares a mount of the same volume, it can immediately see the log files. This is how Pod implements shared storage.
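A minimal sketch of a Pod-level shared volume (the content-writer container and its command are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: shared-volume-demo
spec:
  volumes:
  - name: shared-data              # Pod-level volume, visible to every container
    emptyDir: {}
  containers:
  - name: nginx
    image: nginx:1.25
    volumeMounts:
    - name: shared-data
      mountPath: /usr/share/nginx/html
  - name: content-writer
    image: busybox:1.36
    volumeMounts:
    - name: shared-data
      mountPath: /data
    # whatever this container writes into /data is immediately
    # visible to nginx under /usr/share/nginx/html
    command: ["sh", "-c", "echo hello > /data/index.html; sleep 3600"]
```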

III. Container design patterns in detail

Now we know why we need Pod and how Pod is implemented. Finally, on this basis, I will introduce in detail a concept that Kubernetes strongly advocates, called the container design pattern.

An example

Next, we will use an example to explain it to you.

For example, I have a very common requirement: I want to release an application written in Java. There is a WAR package that needs to be placed in Tomcat's webapps directory so that the application can start. How do you build and release a container for this WAR package plus Tomcat? There are several ways to do it.

The first way: package the WAR file and Tomcat together into a single image. But this brings a problem: the image now has two things kneaded into it. Whenever I want to update either the WAR package or Tomcat, I have to build a new image, which is troublesome.

The second way: package only Tomcat in the image. The image contains just Tomcat, but you need to use a data volume, such as hostPath, to mount the WAR package from the host into the Tomcat container under its webapps directory, so that the WAR is available once the container starts.

But then you find a problem: this approach requires maintaining a distributed storage system. A container may start on host A the first time and on host B after a restart; a container is a migratable thing, and its state is not persistent. So a distributed storage system must be maintained, so that the container can find the WAR package and its data whether it starts on A or B.

Note that even with a distributed storage system backing the Volume, you are still responsible for maintaining the WAR package inside it. For example, you may need to write a separate Kubernetes Volume plug-in that, before each Pod starts, downloads the WAR package the application needs into the Volume, so that it can then be mounted and used by the application.

This is fairly complex to operate, and the container itself must depend on a persistent storage plug-in (to manage the WAR package content in the Volume).

InitContainer

So, have you considered whether there is a more general way of combining the two? One that works even on a local Kubernetes, with no distributed storage at all?

There is. In Kubernetes, such a combination uses what is called an Init Container.

Back to the same example: in the yaml above, we first define an Init Container, which does only one thing: copy the WAR package from its image into a Volume, after which it exits. Init Containers therefore start before the user containers, and they execute strictly in the order they are defined.

The key lies in the destination directory of that copy: the APP directory, which is actually a Volume. As mentioned earlier, multiple containers in a Pod can share a Volume. The Tomcat container here is packaged from a plain Tomcat image, but at startup it declares the APP directory as a Volume and mounts it under its webapps directory.

At this point, since the Init Container has already run and completed the copy, the application's WAR package, sample.war, is guaranteed to be in the Volume. When the Tomcat container starts in the second step and mounts the Volume, it is certain to find the previously copied sample.war inside.

So this Pod can be described as a self-contained Pod, one that can start successfully on any Kubernetes in the world. There is no need to worry about the absence of distributed storage; the Volume is not persistent, yet the application can definitely be published.
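A hedged sketch of this self-contained Pod (the image names and the exact paths are illustrative assumptions, not from the original):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: javaweb
spec:
  initContainers:
  - name: war
    image: example.com/sample-war:v1       # illustrative: image carrying only sample.war
    # copy the WAR into the shared volume, then exit
    command: ["cp", "/sample.war", "/app/sample.war"]
    volumeMounts:
    - name: app-volume
      mountPath: /app
  containers:
  - name: tomcat
    image: tomcat:9                        # plain Tomcat image, no WAR baked in
    volumeMounts:
    - name: app-volume
      mountPath: /usr/local/tomcat/webapps # Tomcat finds sample.war here
  volumes:
  - name: app-volume
    emptyDir: {}                           # no distributed storage needed
```

Because the Init Container is guaranteed to finish before Tomcat starts, the WAR is always in place, and the Pod runs on any cluster.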

So this is a very typical example of combining two containers with different roles into one uniformly packaged application, using something like an Init Container, and using Pod to do it. A concept like this is a very classic container design pattern in Kubernetes, called "Sidecar".

Container design pattern: Sidecar

What is Sidecar? It means that in a Pod you can define special containers to perform auxiliary work needed by the main business container. In the previous example we have in fact already done this: the Init Container is a Sidecar, responsible only for copying the WAR package from its image into the shared directory so that Tomcat can use it.

What other operations are there? For example:

Things you would otherwise do by SSH-ing into the container, such as running scripts and setting up preconditions, can be handled with an Init Container or another Sidecar-like container.

Another typical example is log collection. Log collection is itself a process, a small container, so it can be packaged into the Pod to do the collecting.

Another very important use is debugging an application. You can now define an extra small container in the application Pod that can exec into the application Pod's namespaces.

Checking the working status of other containers is also something a Sidecar can do. You no longer need to SSH into the container to look around: install the monitoring component into an extra small container and start it as a Sidecar cooperating with the main business container, and the same business monitoring can be done through the Sidecar.

An obvious advantage of this approach is that auxiliary functions are decoupled from the business container, so Sidecar containers can be released independently. More importantly, the capability can be reused: the same monitoring Sidecar or log Sidecar can be shared by the whole company. This is one of the powers of design patterns.

Sidecar: application and log collection

Next, let's look at the Sidecar pattern in detail; it covers several other scenarios.

For example, in the application log collection mentioned earlier: the business container writes its logs into a Volume, and because the Volume is shared within the Pod, the log container, that is, the Sidecar container, can read the log files directly from the shared Volume and then store them in remote storage or forward them elsewhere. Commonly used logging components in the industry, such as the Fluentd logging agent, basically work in this way.
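A hedged sketch of the logging Sidecar (the image names, paths, and the app's logging command are illustrative assumptions):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-logging-sidecar
spec:
  volumes:
  - name: app-logs
    emptyDir: {}
  containers:
  - name: app
    image: example.com/app:v1        # illustrative business image
    # the business container only writes logs into the shared volume
    command: ["sh", "-c", "while true; do date >> /var/log/app/app.log; sleep 1; done"]
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  - name: log-collector
    image: fluentd:v1.16             # illustrative tag
    # the sidecar reads the same files and forwards them to a backend
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
      readOnly: true
```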

Sidecar: proxy container

The second use of Sidecar is the proxy container (Proxy). What is a proxy container?

Suppose a Pod needs to access an external system or external services, but those external systems form a cluster. How can the business container access the whole cluster in a unified, simple way, through a single address? One way is to modify the code, recording all the cluster addresses in it; the decoupled way is through a Sidecar proxy container.

To put it simply, write a small Proxy on its own to deal with the external service cluster, exposing only one address. The business container then talks only to the Proxy, and the Proxy connects to the service cluster behind it. The key here is that containers in a Pod communicate directly through localhost: because they belong to the same Network Namespace and share the same network view, localhost communication carries no performance loss.

So besides decoupling, the proxy container adds no performance penalty, and more importantly, the code of such a proxy container can be reused throughout the company.

Sidecar: adapter container

The third Sidecar design pattern is the adapter container (Adapter). What is an Adapter?

Suppose the API your business exposes is in format A, but an external system that needs to access your business container only understands format B. It would seem you have to modify the business container, that is, change the business code, to do the conversion. In fact, an Adapter can do this conversion for you.

Here is an example: the monitoring interface exposed by the business container is /metrics; accessing that URL returns the container's metrics. But the monitoring system has been upgraded, and now it probes /health; it only recognizes a URL exposing the health check, and knows nothing about /metrics. What to do? You could change the code, or, instead of changing the code, you could write an extra Adapter whose job is to forward all requests for /health to /metrics. The Adapter then exposes a monitoring URL like /health, and your business works again.

The key, again, is that containers in a Pod communicate directly through localhost, so there is no performance loss; and such an Adapter container can be reused by the whole company. These are the benefits of design patterns.
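A hedged sketch of the /health-to-/metrics example (both example.com images and the adapter's behavior are hypothetical illustrations, not a real published component):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-adapter
spec:
  containers:
  - name: app
    image: example.com/app:v1              # exposes metrics at :8080/metrics
    ports:
    - containerPort: 8080
  - name: health-adapter
    image: example.com/health-adapter:v1   # hypothetical adapter image
    # listens on :9090/health and forwards each request to
    # localhost:8080/metrics over the shared network namespace
    ports:
    - containerPort: 9090
```

The upgraded monitoring system probes port 9090 at /health, while the business code and its /metrics endpoint stay untouched.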

To summarize: Pod is the core mechanism for implementing "container design patterns" in the Kubernetes project; "container design patterns" are one of Google Borg's best practices for large-scale container cluster management, and also one of the foundations Kubernetes relies on for complex application orchestration; and the essence of all "design patterns" is decoupling and reuse.

Finally

Pod and the container design patterns are among the most important foundational knowledge points in the Kubernetes system; I hope readers can carefully understand and master them. Here, I suggest you examine whether your company or team has been using the so-called "rich container" design before adopting the Pod approach. That design is only a transitional form and cultivates many very bad operation and maintenance habits. I strongly recommend gradually adopting the idea of container design patterns: decouple rich containers, break them into multiple containers, and combine them into a Pod. This is precisely one of the important efforts Alibaba is pushing with all its strength in its current "all-in on cloud" campaign.

The Alibaba Cloud Native WeChat official account (ID: Alicloudnative) focuses on microservices, Serverless, containers, Service Mesh, and other technical areas, on popular cloud-native technology trends and large-scale cloud-native landing practices, and is the official technical account that best understands cloud-native developers.
