2025-01-15 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article explains how to understand RuntimeClass and Pod Overhead. The content is concise and easy to follow, and I hope you gain something from the detailed introduction below.
I. Where the RuntimeClass requirement comes from
The evolution of container runtimes
Let's first take a look at the evolution of the container runtime, which is roughly divided into three stages:
First phase: June 2014
Kubernetes was officially open-sourced, and Docker was the only, and the default, container runtime at that time.
Second stage: Kubernetes v1.3
rkt merged into the Kubernetes trunk and became the second container runtime.
The third stage: Kubernetes v1.5
At the same time, more and more container runtimes wanted to integrate with Kubernetes. If each of them were built in, the way rkt and Docker were, code maintenance and quality assurance for Kubernetes would face serious challenges.
The community recognized this, so in version 1.5 it launched CRI, whose full name is Container Runtime Interface. The benefit is that runtimes are decoupled from Kubernetes: the community no longer has to adapt each runtime individually, or worry about version-maintenance problems caused by runtime and Kubernetes iteration cycles being out of sync. A typical example is the cri-plugin in containerd, which implements CRI; runtimes such as kata-containers and gVisor then only need to interface with containerd.
With the emergence of more and more container runtimes, and different requirement scenarios for each, the need to run multiple container runtimes arose. However, supporting multiple container runtimes still requires answering the following questions:
What container runtimes are available in the cluster?
How do I choose the appropriate container runtime for a Pod?
How do I get a Pod scheduled to a node that runs the specified container runtime?
A container runtime incurs some overhead beyond what the business itself consumes. How is this "extra overhead" accounted for?
The workflow of RuntimeClass
In order to solve the problems above, the community launched RuntimeClass. It was introduced in Kubernetes v1.12, initially in the form of a CRD. From v1.14 on it became a built-in cluster resource object, RuntimeClass. v1.16 then extended the v1.14 version with the Scheduling and Overhead capabilities.
Let's take v1.16 as an example to explain the workflow of RuntimeClass. As shown in the figure above, the workflow chart is on the left and a YAML file on the right.
The YAML file consists of two parts: the upper section creates a RuntimeClass object named runv, and the lower section creates a Pod that references that RuntimeClass through spec.runtimeClassName.
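Since the original figure is not reproduced here, a minimal sketch of the two-part YAML described above may help (field names follow the node.k8s.io/v1beta1 API of Kubernetes v1.16; the Pod name and image are illustrative, not from the original):

```yaml
# Part 1: a RuntimeClass object named runv
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: runv
handler: runv            # the container runtime that will create the containers
---
# Part 2: a Pod referencing that RuntimeClass
apiVersion: v1
kind: Pod
metadata:
  name: runv-pod         # illustrative name
spec:
  runtimeClassName: runv # references the RuntimeClass above
  containers:
  - name: app
    image: nginx         # illustrative image
```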
The core of the RuntimeClass object is handler, which represents the program that receives the container-creation request and also corresponds to a container runtime. For example, the Pod in this example will eventually be created by the runv container runtime. scheduling determines which nodes the Pod can be scheduled to.
Combining this with the figure on the left, the workflow of RuntimeClass is:
K8s-master received a request to create a Pod
The squares represent three types of nodes. Each node carries a Label identifying the container runtimes it supports, and each node runs one or more handlers, each corresponding to one container runtime. For example, the second square represents a node that supports both the runc and runv container runtimes, and the third square represents a node that supports the runhcs container runtime.
Based on scheduling.nodeSelector, the Pod is eventually scheduled to the node in the middle square, and the runv handler there creates the Pod.
II. Features of RuntimeClass
Definition of the RuntimeClass structure
Taking RuntimeClass in Kubernetes v1.16 as an example, let's first look at the structure definition of RuntimeClass.
A RuntimeClass object represents a container runtime, and its structure mainly contains three fields: Handler, Overhead, and Scheduling.
We already met Handler in the earlier example: it represents the program that receives the container-creation request, and it also corresponds to a container runtime.
Overhead is a new field introduced in v1.16, representing the fixed additional overhead a Pod incurs beyond the resources needed to run the business itself.
The third field, Scheduling, was also introduced in v1.16; its configuration is automatically injected into the Pod's nodeSelector.
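Putting the three fields together, a hedged sketch of a v1.16 RuntimeClass might look as follows (the overhead quantities and the label key/value are illustrative assumptions, not from the original):

```yaml
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: runv
handler: runv            # program that receives the create-container request
overhead:                # new in v1.16: fixed per-Pod cost
  podFixed:
    cpu: "250m"          # illustrative quantity
    memory: "160Mi"      # illustrative quantity
scheduling:              # new in v1.16: injected into the Pod's nodeSelector
  nodeSelector:
    runtime: runv        # illustrative label key/value
```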
Example of RuntimeClass resource definition
Referencing a RuntimeClass in a Pod is very simple: just set the name of the RuntimeClass in the runtimeClassName field.
Definition of Scheduling structure
As the name implies, Scheduling concerns scheduling, but not the scheduling of the RuntimeClass object itself; it affects the scheduling of Pods that reference the RuntimeClass.
Scheduling contains two fields, NodeSelector and Tolerations. These two are very similar to the NodeSelector and Tolerations contained in Pod itself.
NodeSelector represents the list of labels that a node supporting this RuntimeClass should carry. When a Pod references the RuntimeClass, the RuntimeClass admission controller merges this label list into the Pod's label list. If the two conflict, meaning they contain the same key with different values, the Pod is rejected by admission. Note also that RuntimeClass does not set labels on Nodes automatically; users need to set them up in advance.
Tolerations is the toleration list of the RuntimeClass. When a Pod references the RuntimeClass, admission likewise merges this list into the Pod's toleration list; if both places contain a toleration with the same configuration, the two are merged into one.
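The merge-and-conflict rule for NodeSelector can be sketched in a few lines of Python (an illustrative model of the admission behavior described above, not the actual Kubernetes source):

```python
def merge_node_selector(pod_labels: dict, rc_labels: dict) -> dict:
    """Merge a RuntimeClass scheduling.nodeSelector into a Pod's nodeSelector.

    Mirrors the admission rule described above: the same key with a
    different value is a conflict, and the Pod is rejected.
    """
    merged = dict(pod_labels)
    for key, value in rc_labels.items():
        if key in merged and merged[key] != value:
            # admission rejects the Pod on conflicting keys
            raise ValueError(f"nodeSelector conflict on key {key!r}: "
                             f"{merged[key]!r} != {value!r}")
        merged[key] = value
    return merged

# Non-conflicting keys are merged; identical entries collapse into one.
print(merge_node_selector({"disk": "ssd"}, {"runtime": "runv"}))
# → {'disk': 'ssd', 'runtime': 'runv'}
```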
Why introduce Pod Overhead?
The figure above shows a Docker Pod on the left and a Kata Pod on the right. A Docker Pod contains a pause container in addition to the business containers, but the pause container is ignored when calculating its overhead. In a Kata Pod, besides the business containers, the costs of kata-agent, pause, and the guest kernel are not accounted for either. These costs can often exceed 100MB and cannot be ignored.
This is why we introduced Pod Overhead. Its structure is defined as follows:
Its definition is very simple: it has only one field, PodFixed, which is a mapping whose key is a ResourceName and whose value is a Quantity. Each Quantity represents the amount of one resource, so PodFixed describes the fixed usage of various resources; CPU and memory, for example, can both be set through PodFixed.
Usage scenarios and limitations of Pod Overhead
There are three main usage scenarios for Pod Overhead:
Pod scheduling
Before the introduction of Overhead, a Pod could be scheduled to a node as long as the node's available resources were greater than or equal to the Pod's requests. After the introduction of Overhead, a node is eligible only if its available resources are greater than or equal to the Pod's requests plus its Overhead.
ResourceQuota
It is a resource quota at the namespace level. Suppose a namespace has a memory quota of 1G and we have Pods whose requests equal 500M; then at most two such Pods can be scheduled in this namespace. If an Overhead of 200MB is added to each of these Pods, at most one such Pod can be scheduled in the namespace.
Kubelet Pod eviction
After the introduction of Overhead, Overhead is counted into the node's used resources, increasing the used-resource ratio and ultimately affecting Kubelet's Pod eviction.
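The scheduling and ResourceQuota arithmetic above can be checked with a small sketch (numbers in MB, using the 1G / 500M / 200MB figures from the ResourceQuota example; this is illustrative arithmetic, not Kubernetes code):

```python
def fits_node(available_mb: int, requests_mb: int, overhead_mb: int) -> bool:
    """Scheduling check after Pod Overhead: the node must cover
    requests + overhead, not just the requests."""
    return available_mb >= requests_mb + overhead_mb

def max_pods_under_quota(quota_mb: int, requests_mb: int, overhead_mb: int) -> int:
    """ResourceQuota also charges the overhead against the namespace quota."""
    return quota_mb // (requests_mb + overhead_mb)

# A 1G quota fits two 500M Pods, but only one once each
# Pod also carries a 200MB overhead.
print(max_pods_under_quota(1024, 500, 0))    # 2
print(max_pods_under_quota(1024, 500, 200))  # 1
```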
The above is the usage scenario of Pod Overhead. In addition, Pod Overhead has some usage restrictions and considerations:
Pod Overhead is permanently injected into the Pod and cannot be changed manually. Even if the RuntimeClass is deleted or updated, the injected Pod Overhead remains present and in effect.
Pod Overhead can only be injected automatically by RuntimeClass admission (at least for now); attempts to add or change it manually are rejected.
HPA and VPA aggregate container-level metric data, so Pod Overhead does not affect them.
III. Multi-container-runtime example
Alibaba Cloud's ACK sandboxed containers already support multiple container runtimes. We take the environment shown in the figure above as an example to illustrate how multiple container runtimes work together.
As shown in the figure above, there are two Pods. On the left is a runc Pod, whose RuntimeClass is runc; on the right is a runv Pod, whose referenced RuntimeClass is runv. The corresponding requests are marked in different colors: blue for runc, red for runv. In the lower half of the figure, the core component is containerd; multiple container runtimes can be configured in containerd, and the requests above eventually arrive here to be forwarded.
Let's follow the runc request first. It arrives at kube-apiserver, which forwards it to kubelet; kubelet then sends the request to cri-plugin (a plug-in that implements CRI). cri-plugin looks up the Handler corresponding to runc in the containerd configuration file, finds that it maps to containerd-shim via Shim API runtime v1, and that shim then creates the corresponding container. That is the runc path.
The runv path is similar: the request first goes to kube-apiserver, then to kubelet, and then to cri-plugin; cri-plugin matches against the containerd configuration, finds that the container should be created through containerd-shim-kata-v2 via Shim API runtime v2, and that shim creates the Kata Pod.
Let's take a look at the specific configuration of containerd.
By default, the containerd configuration file lives at /etc/containerd/config.toml. The core configuration is under the plugins.cri.containerd section. Each runtime's configuration shares the prefix plugins.cri.containerd.runtimes, followed by the runtime name, here runc and runv; these names correspond to the Handler names in the RuntimeClass objects above. In addition, there is a special entry, plugins.cri.containerd.default_runtime, which means that if a Pod that specifies no RuntimeClass is scheduled to this node, the runc container runtime is used by default.
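An excerpt of what such a config.toml might look like (the runtime_type values vary by containerd version and shim; these are illustrative assumptions, not taken from the original):

```toml
# /etc/containerd/config.toml (excerpt)
[plugins.cri.containerd.default_runtime]
  # used for Pods that specify no RuntimeClass
  runtime_type = "io.containerd.runtime.v1.linux"

[plugins.cri.containerd.runtimes.runc]
  # "runc" matches the Handler name in the runc RuntimeClass
  runtime_type = "io.containerd.runtime.v1.linux"

[plugins.cri.containerd.runtimes.runv]
  # "runv" matches the Handler name; created via a Shim API v2 shim
  runtime_type = "io.containerd.kata.v2"
```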
The following example creates two RuntimeClass objects, runc and runv; once created, all currently available container runtimes can be listed with kubectl get runtimeclass.
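The two objects might be declared as follows (using the node.k8s.io/v1beta1 API as in Kubernetes v1.16):

```yaml
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: runc
handler: runc   # matches the runc handler configured in containerd
---
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: runv
handler: runv   # matches the runv handler configured in containerd
```

After applying this manifest, kubectl get runtimeclass should list both objects.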
The figure below shows a runc Pod and a runv Pod, from left to right. The key difference is that each references its container runtime, runc or runv, in the runtimeClassName field.
After the Pods are created, we can use kubectl to see each Pod's running status and the container runtime it uses. The cluster now has two Pods, runc-pod and runv-pod, referencing the runc and runv RuntimeClass respectively, and both are Running.
RuntimeClass is a built-in cluster resource of Kubernetes, mainly intended to solve the problem of using multiple container runtimes in one cluster.
Configuring Scheduling in a RuntimeClass allows Pods to be scheduled automatically to nodes that run the specified container runtime, provided users have set the appropriate labels on those Nodes in advance.
Configuring Overhead in a RuntimeClass accounts for the Pod costs beyond what the business itself needs, making scheduling, ResourceQuota, Kubelet Pod eviction, and other behaviors more accurate.
The above is how to understand RuntimeClass and Pod Overhead. Have you picked up any new knowledge or skills? If you want to learn more, you are welcome to follow the industry information channel.
© 2024 shulou.com SLNews company. All rights reserved.