Introduction to CRI
At the bottom of every Kubernetes node sits a program responsible for creating and deleting containers; Kubernetes calls its interface to carry out container scheduling. This layer of software is called the container runtime (Container Runtime), and Docker is its best-known representative.
Docker is not the only container runtime, of course: CoreOS's rkt, hyper.sh's runV, Google's gVisor, and this article's protagonist, PouchContainer, each implement a complete set of container operations and can create containers with different characteristics. Different container runtimes have their own unique strengths and serve different users' needs, so it is imperative for Kubernetes to support multiple container runtimes.
Initially, Kubernetes hard-wired its calls to Docker into its own code base. The community then integrated rkt in Kubernetes 1.3, making it the first container runtime option besides Docker. At that point, however, the calls to both Docker and rkt were strongly coupled to Kubernetes's core code, which led to two problems:
First, it is hard for emerging container runtimes such as PouchContainer to join the Kubernetes ecosystem: a runtime's developers must understand the Kubernetes code base (at least Kubelet) in depth to wire the two together.
Second, the Kubernetes code base becomes harder to maintain, in two ways: (1) hard-coding the call interfaces of every container runtime into Kubernetes bloats its core code; (2) even a minor change to a container runtime's interface forces a change to Kubernetes's core code and makes Kubernetes less stable.
To solve these problems, the community introduced CRI (Container Runtime Interface) in Kubernetes 1.5. CRI defines a common set of container runtime interfaces, and the Kubernetes core code calls only these interfaces, shielding it from the specifics of any particular runtime. Any container runtime that implements the interfaces defined in CRI can connect to Kubernetes and become one of its container runtime options. The solution is simple, but it is liberating for both the Kubernetes community maintainers and container runtime developers.
Overview of CRI Design
Kubelet is the Node Agent of a Kubernetes cluster: it monitors the status of the containers on its node and ensures they are running as expected. To achieve this, Kubelet continually calls the relevant CRI interfaces to synchronize container state.
The CRI shim can be thought of as an interface translation layer: it converts a CRI call into the corresponding call of the underlying container runtime, executes it, and returns the result. For some container runtimes the CRI shim runs as a separate process; for example, when Docker is chosen as Kubernetes's container runtime, Kubelet starts a dockershim process during initialization, which is Docker's CRI shim. For PouchContainer, the CRI shim is embedded in Pouchd, and we call it CRI Manager. We will discuss this in more detail in the next section when we look at PouchContainer's architecture.
CRI is essentially a set of gRPC interfaces: Kubelet has a built-in gRPC client, and the CRI shim has a built-in gRPC server. Each time Kubelet invokes a CRI interface, the call is translated into a gRPC request that the client sends to the server in the CRI shim; the server calls the underlying container runtime to handle the request and returns the result, completing one CRI call.
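As a minimal sketch of the client side of that round trip, the Go snippet below dials a CRI shim's unix socket. The socket path is an assumption for this example, not a value mandated by CRI, and the generated client constructor mentioned in the final comment comes from the CRI protobuf definitions.

package main

import (
	"context"
	"log"
	"net"

	"google.golang.org/grpc"
)

func main() {
	// CRI shims listen on a local unix socket; this path is an assumption.
	const sock = "/var/run/pouchcri.sock"

	conn, err := grpc.Dial(
		sock,
		grpc.WithInsecure(), // CRI traffic stays on the local unix socket
		grpc.WithContextDialer(func(ctx context.Context, addr string) (net.Conn, error) {
			return (&net.Dialer{}).DialContext(ctx, "unix", addr)
		}),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// From here a real Kubelet wraps conn in the clients generated from the
	// CRI protobuf (e.g. NewRuntimeServiceClient(conn)), and every CRI call
	// becomes an ordinary unary RPC against the shim's gRPC server.
}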
The gRPC interfaces defined by CRI fall into two categories, ImageService and RuntimeService: ImageService is responsible for managing container images, while RuntimeService manages the container lifecycle and handles interactive calls such as exec/attach/port-forward.
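The sketch below is a scaled-down, hypothetical Go view of that split. The method names match real CRI RPCs, but the actual interfaces are generated from protobuf, where every method takes a dedicated request struct and returns a response struct; the placeholder types here are invented for illustration.

package cri

import "context"

// Placeholder types standing in for the generated protobuf messages.
type (
	PodSandboxConfig struct{}
	ContainerConfig  struct{}
	Filter           struct{}
	PodSandbox       struct{}
	Container        struct{}
	Image            struct{}
	StreamResponse   struct{} // carries the URL of a streaming endpoint
)

// RuntimeService manages the lifecycle of Pod sandboxes and containers,
// plus the interactive exec/attach/port-forward calls.
type RuntimeService interface {
	RunPodSandbox(ctx context.Context, cfg *PodSandboxConfig) (id string, err error)
	StopPodSandbox(ctx context.Context, id string) error
	RemovePodSandbox(ctx context.Context, id string) error
	ListPodSandbox(ctx context.Context, f *Filter) ([]PodSandbox, error)

	CreateContainer(ctx context.Context, sandboxID string, cfg *ContainerConfig) (id string, err error)
	StartContainer(ctx context.Context, id string) error
	StopContainer(ctx context.Context, id string) error
	RemoveContainer(ctx context.Context, id string) error
	ListContainers(ctx context.Context, f *Filter) ([]Container, error)

	Exec(ctx context.Context, containerID string, cmd []string) (*StreamResponse, error)
	Attach(ctx context.Context, containerID string) (*StreamResponse, error)
	PortForward(ctx context.Context, sandboxID string, ports []int32) (*StreamResponse, error)
}

// ImageService manages container images.
type ImageService interface {
	PullImage(ctx context.Context, image string) (ref string, err error)
	ListImages(ctx context.Context, f *Filter) ([]Image, error)
	RemoveImage(ctx context.Context, image string) error
}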
CRI Manager architecture design
In PouchContainer's overall architecture, CRI Manager implements all the interfaces defined by CRI and plays the role of PouchContainer's CRI shim. When Kubelet invokes a CRI interface, the request travels through Kubelet's gRPC client to the gRPC server embedded in Pouchd; the server parses the request and calls the corresponding method of CRI Manager to handle it.
Let's walk through the function of each module with an example. Suppose the incoming request is to create a Pod. CRI Manager first converts the configuration it receives in CRI format into a format that the PouchContainer API accepts, calls Image Manager to pull the required image, then calls Container Manager to create the required container, and calls CNI Manager to configure the Pod's network with a CNI plug-in. Finally, Stream Server handles interactive CRI requests such as exec/attach/portforward. A sketch of this dispatch follows below.
It is worth noting that CNI Manager and Stream Server are submodules of CRI Manager, while CRI Manager, Container Manager and Image Manager are three peer modules living in the same Pouchd binary. Calls between them are therefore direct function calls, with none of the remote-call overhead incurred when, for example, Docker shim interacts with Docker. Next, we will step inside CRI Manager for a deeper look at how its important functions are implemented.
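Here is a minimal sketch of that create-Pod path. Every type and manager interface below is an invented placeholder for illustration; the real Pouchd internals have different names and richer signatures.

package crimanager

import "context"

// Invented placeholder types; the real ones are the CRI protobuf messages
// and PouchContainer's own API types.
type (
	RunPodSandboxRequest  struct{ Config *PodSandboxConfig }
	RunPodSandboxResponse struct{ PodSandboxID string }
	PodSandboxConfig      struct{}
	ContainerConfig       struct{}
)

// Hypothetical slices of the three peer Pouchd modules and the CNI submodule.
type ImageMgr interface {
	PullImage(ctx context.Context, image string) error
}
type ContainerMgr interface {
	Create(ctx context.Context, cfg *ContainerConfig) (id string, err error)
	Start(ctx context.Context, id string) error
}
type CniMgr interface {
	SetUpPodNetwork(ctx context.Context, infraID string, cfg *PodSandboxConfig) error
}

type CriManager struct {
	ImageMgr     ImageMgr
	ContainerMgr ContainerMgr
	CniMgr       CniMgr
	SandboxImage string // the infra ("pause") image, e.g. pause-amd64:3.0
}

// toPouchConfig stands in for the CRI-to-PouchContainer config conversion.
func toPouchConfig(cfg *PodSandboxConfig) *ContainerConfig {
	return &ContainerConfig{}
}

// RunPodSandbox sketches the dispatch described above: convert the config,
// pull the infra image, create and start the infra container, then wire up
// the Pod network through CNI Manager. These are direct function calls, not
// remote calls, because all modules live in the same Pouchd binary.
func (c *CriManager) RunPodSandbox(ctx context.Context, req *RunPodSandboxRequest) (*RunPodSandboxResponse, error) {
	cfg := toPouchConfig(req.Config)

	if err := c.ImageMgr.PullImage(ctx, c.SandboxImage); err != nil {
		return nil, err
	}

	id, err := c.ContainerMgr.Create(ctx, cfg)
	if err != nil {
		return nil, err
	}
	if err := c.ContainerMgr.Start(ctx, id); err != nil {
		return nil, err
	}

	if err := c.CniMgr.SetUpPodNetwork(ctx, id, req.Config); err != nil {
		return nil, err
	}
	return &RunPodSandboxResponse{PodSandboxID: id}, nil
}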
Implementation of Pod Model
In the Kubernetes world, the Pod is the smallest unit of scheduling and deployment. Simply put, a Pod is a group of closely related containers. As a group, these "intimate" containers share certain resources, which makes interaction between them more efficient. For networking, containers in the same Pod share an IP address and port space, so they can reach each other directly via localhost. For storage, a volume defined in the Pod is mounted into each of its containers, so every container can access it.
In fact, all of the above can be achieved by having a group of containers share certain Linux Namespaces and mount the same volumes. Let's analyze how CRI Manager in PouchContainer implements the Pod model by walking through the creation of a concrete Pod:
When Kubelet needs to create a new Pod, it first calls the CRI interface RunPodSandbox. CRI Manager implements this interface by creating a special container that we call the "infra container". From the standpoint of how it is built, it is nothing special: CRI Manager simply calls Container Manager to create an ordinary container from the pause-amd64:3.0 image. From the standpoint of the Pod as a whole, however, it plays a special role: it contributes its own Linux Namespaces as the shared Namespaces mentioned above, binding all the containers of the group together. It acts as a carrier for the other containers in the Pod, providing the infrastructure they run on. We usually also use the infra container to represent the Pod itself.
After the infra container is created, Kubelet creates the other containers of the Pod. Each container is created by calling two CRI interfaces in succession: CreateContainer and StartContainer. For CreateContainer, CRI Manager simply converts the container configuration from CRI format into PouchContainer format and hands it to Container Manager, which performs the actual creation. The only question we need to care about here is how the new container joins the infra container's Linux Namespaces. The implementation is very simple: Container Manager's container configuration has three parameters, PidMode, IpcMode and NetworkMode, which configure the container's Pid Namespace, Ipc Namespace and Network Namespace, respectively. A container's Namespace can generally be configured in one of two modes: "None" mode, in which the container gets its own fresh Namespace, and "Container" mode, in which it joins another container's Namespace. We only need to set these three parameters to "Container" mode and join the infra container's Namespaces (see the sketch below); CRI Manager does not need to care how the join is actually performed. For StartContainer, CRI Manager merely forwards the request: it extracts the container ID and calls Container Manager's start interface.
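A minimal sketch of that conversion step, assuming a Docker-compatible "container:<id>" mode string (the field and function names below are invented for illustration, not quoted from Pouchd's source):

package crimanager

import "fmt"

// HostConfig stands in for the namespace-related slice of PouchContainer's
// container configuration; the field names mirror the three parameters
// mentioned above.
type HostConfig struct {
	PidMode     string
	IpcMode     string
	NetworkMode string
}

// applySandboxNamespaces points a regular Pod container at the infra
// container's namespaces. The "container:<id>" syntax is the
// Docker-compatible way of joining another container's namespace and is
// assumed here rather than quoted from Pouchd.
func applySandboxNamespaces(hc *HostConfig, infraID string) {
	mode := fmt.Sprintf("container:%s", infraID)
	hc.PidMode = mode
	hc.IpcMode = mode
	hc.NetworkMode = mode
}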
Finally, Kubelet continually calls the two CRI interfaces ListPodSandbox and ListContainers to obtain the running status of the containers on the node: ListPodSandbox lists the status of the infra containers, while ListContainers lists the status of all other containers. The problem is that, from Container Manager's point of view, an infra container is no different from any other container. So how does CRI Manager tell them apart? When creating a container, CRI Manager adds an extra label to the container configuration marking the container's type. When implementing ListPodSandbox and ListContainers, it filters containers of the right type by the value of that label, as in the sketch below.
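A minimal sketch of that filtering; the label key and values are invented for this example, and the real key used by CRI Manager may differ:

package crimanager

// containerTypeLabelKey is an invented label key for this sketch.
const containerTypeLabelKey = "io.kubernetes.pouch.type"

const (
	typeSandbox   = "sandbox"   // infra containers
	typeContainer = "container" // ordinary Pod containers
)

type ContainerMeta struct {
	ID     string
	Labels map[string]string
}

// filterByType keeps only the containers whose type label matches want.
// This is how ListPodSandbox and ListContainers can tell infra containers
// and ordinary containers apart even though Container Manager treats them
// identically.
func filterByType(all []ContainerMeta, want string) []ContainerMeta {
	var out []ContainerMeta
	for _, c := range all {
		if c.Labels[containerTypeLabelKey] == want {
			out = append(out, c)
		}
	}
	return out
}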
To sum up, creating a Pod boils down to creating the infra container first, then creating the other containers of the Pod and joining them to the infra container's Linux Namespaces.
Pod network configuration
Because all containers in a Pod share one Network Namespace, we only need to configure the Network Namespace of the infra container when it is created.
In the Kubernetes ecosystem, container networking is implemented by CNI. Like CRI, CNI is a set of standard interfaces: any networking scheme that implements these interfaces can plug seamlessly into Kubernetes. CNI Manager in CRI Manager is a thin encapsulation of CNI. During initialization it loads the configuration files under the directory /etc/cni/net.d; for example, a simple bridge network might be configured as follows:
$ cat >/etc/cni/net.d/10-mynet.conflist <<EOF
{
    "cniVersion": "0.3.0",
    "name": "mynet",
    "plugins": [
        {
            "type": "bridge",
            "bridge": "cni0",
            "isGateway": true,
            "ipMasq": true,
            "ipam": {
                "type": "host-local",
                "subnet": "10.22.0.0/16",
                "routes": [
                    { "dst": "0.0.0.0/0" }
                ]
            }
        }
    ]
}
EOF
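With that configuration loaded, CNI Manager only has to hand the infra container's network namespace to the plug-in chain. A minimal sketch using the upstream libcni library follows; the container ID, netns path and plug-in directory are placeholders, and note that newer libcni versions add a context.Context parameter to AddNetworkList:

package main

import (
	"fmt"
	"log"

	"github.com/containernetworking/cni/libcni"
)

func main() {
	// Load the plug-in chain written above (a bridge network in this example).
	netconf, err := libcni.ConfListFromFile("/etc/cni/net.d/10-mynet.conflist")
	if err != nil {
		log.Fatal(err)
	}

	// Tell libcni where the plug-in binaries live (path is an assumption).
	cni := &libcni.CNIConfig{Path: []string{"/opt/cni/bin"}}

	// Describe the infra container whose network namespace is being wired up.
	rt := &libcni.RuntimeConf{
		ContainerID: "infra-container-id", // placeholder
		NetNS:       "/proc/12345/ns/net", // placeholder netns path
		IfName:      "eth0",
	}

	// Invoke the whole plug-in list to attach the Pod network.
	result, err := cni.AddNetworkList(netconf, rt)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}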