
A Deep Analysis of Kubernetes Resource Management


Today we take a deep look at how Kubernetes manages resources, a topic many people find confusing. The following summary is meant to make the subject easier to understand; we hope you gain something from it.

Resources, whether computing, storage, or network, are a core concern for any container management and scheduling platform.

Kubernetes Resource Model

01 How does Kubernetes define resources?

In Kubernetes, anything that can be requested, allocated, and ultimately consumed is a resource, for example CPU and memory.

Each resource managed by Kubernetes is given a [resource type name]. These names follow RFC 1123 rules. For example, CPU's full name is kubernetes.io/cpu (usually abbreviated to cpu when displayed), and the GPU resource was named alpha.kubernetes.io/nvidia-gpu.

Besides the name, each resource has a single [base unit]. Kubernetes components internally use this base unit uniformly to represent quantities of the resource. For example, the base unit of memory is the byte.

For convenience, however, developers can still use a variety of more readable units, such as Gi for memory, when interacting with Kubernetes through YAML files or kubectl. Once this information enters Kubernetes, it is explicitly converted to the base unit.

All resource types fall into two categories: compressible and incompressible. The criterion is this: if the system limits or reduces a container's use of a compressible resource, only the service's performance is affected, not its availability. CPU is the typical compressible resource.

For incompressible resources, a shortage may make the service unavailable altogether; memory is the typical incompressible resource.

02 What types of resources does Kubernetes have?

Currently, Kubernetes ships with two basic resource types by default:

CPU

Memory

For CPU, no matter how the underlying machine provides it (physical machine or virtual machine), one unit of CPU resource is standardized into a "Kubernetes Compute Unit", roughly equivalent to a single hyperthreaded core of an x86 processor.

The base unit of CPU resources is the millicore, because CPU resources actually represent CPU time: 1 core equals 1000 millicores. This means Kubernetes can subdivide one core's worth of CPU time into 1000 slices and assign any number of them to a container.

The base unit of memory resources is easy to understand: the byte.
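For illustration, the following notations (the values are hypothetical) are all accepted in manifests and normalized internally to the base units:

cpu: "1"            (1 core = 1000m, i.e., 1000 millicores)
cpu: "250m"         (a quarter of a core)
memory: "1Gi"       (normalized to 1073741824 bytes)
memory: "128974848" (plain bytes are also accepted; here, 123Mi)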

In addition, Kubernetes provides a device plugin mechanism that lets users extend the set of resource types for their own needs, the most common example being nvidia gpu resources. This feature allows users to manage business-specific resources in Kubernetes without modifying Kubernetes's own source code.
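As a sketch of how such an extended resource is requested, assuming the NVIDIA device plugin is installed (current plugin versions advertise the resource as nvidia.com/gpu, replacing the alpha name mentioned above); extended resources must be requested as whole integers via limits:

resources:
  limits:
    nvidia.com/gpu: 1   (one whole GPU; fractional values are not allowed)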

Kubernetes itself is also expected to support more general resource types over time, such as network bandwidth, storage space, storage IOPS, and so on.

This answers the question of how Kubernetes defines resources.

Kubernetes Compute Node Resource Management

Now that we understand how kubernetes defines resources, let's take a look at

How does Kubernetes determine the amount of resources that can be used?

Each compute node in a Kubernetes-managed cluster contains a certain amount of resources. How does Kubernetes know how many resources a compute node has, and how many of them can be used by user containers?

When using Kubernetes, you will find that each Node object describes its resources as shown below (viewable via the kubectl get node xxx -o yaml command):

allocatable:
  cpu: "40"
  memory: 263927444Ki
  pods: "110"
capacity:
  cpu: "40"
  memory: 264029844Ki
  pods: "110"

[capacity] is the node's true amount of resources. For example, if the machine has 8 cores and 32 GB of memory, the capacity fields will show 8 cores of CPU and 32 GB of memory (memory may be displayed in Ki units).

[allocatable] is the amount of the machine's resources [that can be used by containers]. In all cases, allocatable is less than or equal to capacity.

For example, the 8-core machine just mentioned has a CPU capacity of 8 cores, but its CPU allocatable could be adjusted to 6 cores; containers scheduled to this machine would then use at most 6 cores of CPU, leaving the other 2 cores for the machine's non-container processes.

How are the capacity and allocatable values determined?

First, [capacity]. Since capacity reflects the machine's true amount of resources, the task of determining it falls to the kubelet running on each machine.

Kubelet currently vendors a large amount of cAdvisor code directly, which is effectively equivalent to running a small cAdvisor inside kubelet. After kubelet starts, this internal cAdvisor sub-module starts as well and collects all kinds of information about the machine, including its resource information. That information is taken as the machine's true resource capacity and reported to the apiserver by kubelet.

How is the allocatable information determined?

To explain how allocatable is determined, we must first introduce the Node Allocatable Resource feature [6], introduced in Kubernetes 1.6. The main problem it solves is reserving resources for each compute node's non-container processes.

Besides user containers, many other important components run on every node of a Kubernetes cluster, and they do not run as containers. They fall into two main categories:

Kubernetes daemons: Kubernetes-related daemon programs, such as kubelet, dockerd, and so on.

System daemons: system-level daemon programs unrelated to Kubernetes, such as sshd.

There is no doubt that both groups of processes matter to the stability of the whole machine. Therefore, Kubernetes provides the Kube-Reserved and System-Reserved features respectively, which reserve a fixed amount of compute resources for each group, for example, reserving 2 cores and 4 GB for system processes.
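A sketch of the corresponding kubelet flags (the values here are illustrative):

--kube-reserved=cpu=2,memory=4Gi     (reserve for kubernetes daemons)
--system-reserved=cpu=1,memory=2Gi   (reserve for system daemons)
--enforce-node-allocatable=pods,kube-reserved,system-reserved

Note that enforcing the kube-reserved and system-reserved entries via cgroups also requires pointing --kube-reserved-cgroup and --system-reserved-cgroup at existing cgroups.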

How does Kubernetes achieve this?

As many readers know, Kubernetes implements resource isolation and limits through the cgroup mechanism, and this kind of resource reservation is likewise implemented with cgroups.

By default, for each basic resource (CPU, memory), Kubernetes first creates a root cgroup named kubepods as the root of all container cgroups. This cgroup limits the resources used by all pods on the compute node. By default, the resources granted to the kubepods cgroup equal all of the node's resources.

However, when the Kube-Reserved and System-Reserved features are enabled, Kubernetes creates two sibling cgroups next to kubepods, named kube-reserved and system-reserved, which reserve resources for the kubernetes daemons and system daemons and share the machine's resources with the kubepods cgroup. The resources allocatable to kubepods are therefore necessarily less than the machine's actual resources, achieving the effect of reserving resources for the kubernetes daemons and system daemons.

Consequently, if Kube-Reserved and System-Reserved are enabled, the resources occupied by these two cgroups must be subtracted when computing the node's allocatable resources.

For example, suppose the machine's CPU capacity is 32 cores, and we set Kube-Reserved to 2 cores and System-Reserved to 1 core; then containers on this machine can really use only 29 cores of CPU.

When computing allocatable for memory, however, there is one more factor to consider besides Kube-Reserved and System-Reserved: the Eviction Threshold.

What is Eviction Threshold?

The eviction threshold belongs to Kubernetes's eviction policy feature, another important feature introduced to protect the stability of physical nodes. When the machine runs critically short of either of its two incompressible resources, memory and disk, the physical machine itself can easily enter an unstable state, which is obviously unacceptable.

Kubernetes therefore introduced the eviction policy feature, which lets users specify per-machine eviction hard thresholds for [memory] and [disk], i.e., resource-amount thresholds.

For example, if we set the memory eviction hard threshold to 100Mi, then when the machine's available memory falls below 100Mi, kubelet ranks all pods on the machine by their QoS level (described below) and their memory usage, and evicts the highest-ranked pods to free up sufficient memory.

So for memory resources, allocatable is:

allocatable = capacity - kube-reserved - system-reserved - eviction-hard

If you look closely, you will find that when the kube-reserved and system-reserved features are not enabled in your cluster, kubectl get node -o yaml shows CPU capacity equal to CPU allocatable, while memory capacity is always greater than memory allocatable. This is the eviction mechanism at work: a 100Mi memory eviction hard threshold is set by default, so by default memory capacity exceeds memory allocatable by exactly 100Mi.
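We can verify this with the node output shown earlier: 264029844Ki - 263927444Ki = 102400Ki = 100Mi, exactly the default eviction hard threshold.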

Kubernetes pod resource management and allocation

Now that we have seen how Kubernetes defines resources and determines how many are available in the cluster, let's look at how Kubernetes allocates them to each pod.

Before showing how resources are allocated to each pod, let's first look at how a pod requests resources. This will be familiar to most readers who already use Kubernetes.

01 How Kubernetes pods request resources

In Kubernetes, a pod requests resources with the container as the minimum unit. Each container can state its desired resource amount with two values:

Request

Limit

For example:

resources:
  requests:
    cpu: 2.5
    memory: "40Mi"
  limits:
    cpu: 4.0
    memory: "99Mi"

So what do request and limit mean, respectively?

Request:

The request is the minimum amount of a resource that the container wants guaranteed.

In practice, the CPU request is realized through the cgroup cpu.shares feature.

Memory, however, is incompressible, so in some scenarios a container's memory request may not actually be satisfiable, because other pod containers with generous limits are using far more memory than their requests.

Limit:

The limit, for both CPU and memory, is the upper bound on the container's use of that resource.

But the two resources behave differently when a container tries to exceed its limit.

For CPU, the kernel scheduler throttles the container so that it never uses more than its limit. For memory, a container that exceeds its limit is OOM-killed, causing a container restart. When a container specifies a limit but no request, the request defaults to the limit.

If a container specifies no limit, the values assigned to request and limit follow different policies for different resources.

02 Kubernetes pod QoS classification

Kubernetes lets user containers specify their resource needs through the request and limit fields, and pods are divided into three QoS levels according to what their containers specify:

Guaranteed

Burstable

BestEffort

The QoS level matters in many places, such as scheduling and eviction.

A Guaranteed-level pod must satisfy two requirements:

Every container in the pod must specify both limit and request for memory, and the two values must be equal.

Every container in the pod must specify both limit and request for CPU, and the two values must be equal.

A Burstable-level pod must satisfy two requirements:

The pod's resource requests do not meet the Guaranteed-level requirements.

At least one container in the pod specifies a cpu or memory request.

A BestEffort-level pod must satisfy:

No container in the pod specifies any cpu or memory request or limit.

From the above we can conclude the following.

Guaranteed-level pods have the highest priority; the system administrator generally knows exactly how such pods use resources.

Burstable-level pods come second. Administrators generally know the minimum resources such a pod needs, but want it to use more when the machine has resources to spare, so generally limit > request.

BestEffort-level pods have the lowest priority, and the amount of resources such a pod uses generally does not need to be specified. The pod must be scheduled regardless of current resource usage, and it opportunistically uses whatever is available: when the machine has spare resources it can use them fully, but when Guaranteed and Burstable pods claim resources, its share can be reclaimed and squeezed arbitrarily.

03 Principles of Kubernetes pod resource allocation

We introduced it in the above two sections:

How pod applies for resources

How the requested resources determine a pod's QoS level

Ultimately, kubelet allocates resources to a pod based on [the resources the pod requests] + [the pod's QoS level].

The fundamental allocation mechanism is cgroups.

After receiving a pod's resource requests, Kubernetes does the following for each resource:

For each container in the pod, create a container-level cgroup (note: this step is actually performed by Kubernetes sending a command to the docker daemon).

Then create a pod-level cgroup for the pod, which becomes the parent cgroup of all the container-level cgroups under the pod.

Finally, depending on the pod's QoS level, the pod-level cgroup may be placed under a QoS-level cgroup as one of its children.

Each QoS-level cgroup is in turn a child of kubepods, the root cgroup of all containers.

This nesting relationship can be shown clearly as a hierarchy.

The original figure showed the cgroup hierarchy of a Kubernetes compute node (the hierarchy is identical for cpu and memory): all pod cgroups sit under the large kubepods cgroup, while the kube-reserved cgroup and system-reserved cgroup introduced earlier sit at the same first level as kubepods, and the three share the machine's compute resources.
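A plain-text sketch of that hierarchy (assuming the cgroup v1 layout, with one such tree per subsystem):

/sys/fs/cgroup/cpu/ (likewise for memory)
  +- kubepods/          parent of all pod cgroups
  +- kube-reserved/     resources reserved for kubernetes daemons
  +- system-reserved/   resources reserved for system daemons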

Let's first look at how the container-level cgroup for a container, and the pod-level cgroup containing it, are determined.

Container level cgroup

First, each container's cgroup is configured according to that container's request and limit for the resource.

Let's look at how Kubernetes creates the container-level cgroup for cpu and memory resources.

CPU resources

First, CPU. Let's start with the CPU request.

The CPU request is implemented through the cpu.shares setting in the cgroup CPU subsystem.

When you set a container's CPU request to x millicores, Kubernetes sets the cpu.shares of the container's cgroup to x * 1024 / 1000. That is:

cpu.shares = (cpu request in millicores * 1024) / 1000

For example, if your container's CPU request is 1, i.e., 1000 millicores, the cpu.shares of its cgroup is 1024.

The end result is:

Even in the extreme case where every pod on the machine is CPU-hungry (using as much CPU as it can get), this container is still guaranteed the computing power of one core.

In effect, this guarantees the container's minimum CPU needs.

So cpu.request generally represents the container's minimum CPU requirement. In fact, setting cpu.shares alone is not enough to fully achieve this effect; the QoS-level cgroups must be adjusted in step. We will describe the exact mechanism in detail later.

The CPU limit is implemented by Kubernetes through the cpu.cfs_period_us and cpu.cfs_quota_us settings in the cgroup CPU subsystem. Kubernetes writes two values into the container's cgroup:

cpu.cfs_period_us = 100000 (i.e., 100ms)
cpu.cfs_quota_us = (cpu limit in millicores * 100000) / 1000

With these two settings, the cgroup CPU subsystem strictly caps the CPU used by the processes in the cgroup at cfs_quota_us / cfs_period_us, which is exactly the limit value we asked for.
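As a worked example using the manifest shown earlier (cpu limit 4.0, i.e., 4000 millicores):

cpu.cfs_quota_us = 4000 * 100000 / 1000 = 400000

meaning the container may consume at most 400ms of CPU time per 100ms period, i.e., 4 cores' worth.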

So this cgroup feature makes it possible to cap a container's maximum CPU usage.

Memory

For memory, the request is in fact not reflected in the container-level cgroup at all; Kubernetes configures the cgroup only from the memory limit.

Here Kubernetes uses the memory.limit_in_bytes setting of the memory cgroup subsystem. The configuration is:

memory.limit_in_bytes = memory limit in bytes
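For the example container above with a memory limit of 99Mi, this works out to memory.limit_in_bytes = 99 * 1024 * 1024 = 103809024.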

The limit_in_bytes setting caps the total memory that all processes in the cgroup may use. If the value is exceeded, the container is OOM-killed and, under Kubernetes's default configuration, the container instance is restarted.

With this implementation, Kubernetes in fact cannot guarantee that a pod can really obtain the memory it states in memory.request, which confuses many Kubernetes users. When scheduling, Kubernetes only ensures that the sum of the pods' memory.request on a machine is less than or equal to the node's allocatable memory. So if some pod sets a high memory.limit, or none at all, it may use a lot of memory (more than its request but within its limit); since memory is incompressible, other pods may then find no memory left to claim. This problem can be alleviated to some extent by a feature described below.

Readers may wonder: what happens if a pod specifies no request or limit?

If no limit is specified, cfs_quota_us is set to -1, meaning no cap. If neither limit nor request is specified, cpu.shares is set to 2, the minimum value cpu.shares allows. So Kubernetes grants such pods only the least possible CPU.

For memory, if no limit is specified, memory.limit_in_bytes is set to a very large value, usually 2^64, meaning memory is effectively unlimited.

To make the container-level cgroup discussion concrete, consider an example. Suppose a pod named pod-burstable-1 consists of two business containers, container1 and container2, with the following resource configurations:

- image: image1
  name: container1
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 1
      memory: 1Gi
- image: image2
  name: container2
  resources:
    limits:
      cpu: 2
      memory: 2Gi
    requests:
      cpu: 1
      memory: 1Gi

All containers in this pod specify request and limit, but request and limit are not all equal, so the pod's QoS level is Burstable.

There is also a pod named pod-guaranteed-1 consisting of one container, configured as follows:

- image: image3
  name: container3
  resources:
    limits:
      cpu: 1
      memory: 1Gi
    requests:
      cpu: 1
      memory: 1Gi

This configuration makes it a Guaranteed-level pod.

Finally, a pod named pod-besteffort-1 consists of one container with no resource configuration at all, making it a BestEffort-level pod.

With the cgroup rules described above, these three pods produce four container-level cgroups, sketched below.
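A sketch of what the original figure showed, derived from the formulas above (the besteffort pod's single container is called container4 here purely for illustration):

container1: cpu.shares = 1024, cpu.cfs_quota_us = 100000, memory.limit_in_bytes = 1Gi
container2: cpu.shares = 1024, cpu.cfs_quota_us = 200000, memory.limit_in_bytes = 2Gi
container3: cpu.shares = 1024, cpu.cfs_quota_us = 100000, memory.limit_in_bytes = 1Gi
container4: cpu.shares = 2,    cpu.cfs_quota_us = -1 (no cap), memory.limit_in_bytes = 2^64 (no cap)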

Pod level cgroup

After creating the container-level cgroups, Kubernetes creates a pod-level cgroup for the containers belonging to the same pod, as their parent cgroup. The pod-level cgroup exists mainly for two reasons:

To impose unified limits on the resources of the pod's containers

To account for the pod's total resource usage in one place

For the pod-burstable-1 example above, with its two containers container1 and container2, the pod cgroup directory structure is:

pod-burstable-1
  +- container1
  +- container2

Note: in reality the pod cgroup is named after the pod's UID (pod<UID>); pod names are used here for clarity.

To ensure the containers in the pod can obtain the resources they expect, the pod-level cgroup must also be configured accordingly, and the configuration essentially follows one principle:

The pod-level cgroup's resource configuration should equal the sum of the resource requirements of its containers.

The details of this rule differ slightly across QoS levels. For Guaranteed and Burstable pods, the pod cgroup configuration follows these three formulas:

cpu.shares = sum(pod.spec.containers.resources.requests[cpu])
cpu.cfs_quota_us = sum(pod.spec.containers.resources.limits[cpu])
memory.limit_in_bytes = sum(pod.spec.containers.resources.limits[memory])

That is, the pod-level cgroup's cpu.shares, cpu.cfs_quota_us, and memory.limit_in_bytes each equal the sum of the corresponding values of the pod's containers.

Of course, for Burstable and BestEffort pods, some containers may specify no cpu or memory limit. In that case cpu.cfs_quota_us and memory.limit_in_bytes do not use these formulas: the pod then has no upper bound on cpu or memory usage, and the two settings are handled the same way as the "no request and limit in container" case in the previous section.

So our pod-burstable-1 above is a very typical burstable pod, and Kubernetes creates its pod-level cgroup by the burstable formulas; pod-guaranteed-1 likewise gets its own pod-level cgroup. The configuration of these two pod-level cgroups is reconstructed below.
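A sketch of those two pod-level cgroups, derived from the formulas above:

pod-burstable-1:  cpu.shares = 2048, cpu.cfs_quota_us = 300000, memory.limit_in_bytes = 3Gi
pod-guaranteed-1: cpu.shares = 1024, cpu.cfs_quota_us = 100000, memory.limit_in_bytes = 1Gi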

Besteffort pod cgroup

The pod cgroup rules above cannot be applied to besteffort pods.

As the name suggests, for pods at this QoS level Kubernetes only guarantees best-effort use of resources; under extreme resource contention, such a pod's usage must be restricted, for both cpu and memory (though by completely different mechanisms).

Since no container in a besteffort pod carries request or limit information, its configuration is very uniform.

For cpu, the entire pod's cgroup configuration is just:

cpu.shares = 2

The intent is that when machine cpu is plentiful, the pod can use the whole machine's cpu, since no quota is set; but under extreme cpu contention it receives only a tiny weight, equivalent to 2 millicores.

For memory there is no special configuration; the default applies. The pod may then use as much memory as the machine has, but the Kubernetes eviction mechanism ensures that when memory runs short, BestEffort-level pods are deleted first to free sufficient memory.

QoS level cgroup

As mentioned earlier, after kubelet starts it creates a child cgroup named kubepods under the root of the cgroup hierarchy, which holds the cgroups of all pods on the node and thereby limits their total resource usage.

Under the kubepods cgroup, Kubernetes further creates two QoS-level cgroups, named:

Burstable

Besteffort

As the names suggest, these two QoS-level cgroups serve as the parent cgroups of all pods at their respective QoS levels.

Then the question arises:

Where is the pod cgroup for the guaranteed-level pod?

What is the purpose of these two QoS level cgroup?

For the first question: the cgroups of all guaranteed-level pods actually sit directly under the kubepods cgroup, at the same level as the burstable and besteffort QoS-level cgroups. The main reason is that a guaranteed-level pod carries explicit resource requests and limits, so it needs no QoS-level cgroup for unified management or restriction.

For burstable and besteffort pods, Kubernetes by default wants to maximize resource utilization, so it imposes no restriction on the resource usage of these two QoS classes.

However, in many scenarios system administrators still want to ensure that high-QoS pods (guaranteed-level pods), especially their incompressible resources such as memory, are not consumed in advance by low-QoS pods, leaving the high-QoS pods unable to obtain even their requested resources.

Kubernetes therefore introduced the QoS-level cgroup, mainly to limit low-QoS pods' use of incompressible resources (such as memory) and reserve resources for high-QoS pods. The priority of the three QoS levels from high to low is guaranteed > burstable > besteffort.

How is this reservation achieved? It is controlled by kubelet's experimental-qos-reserved parameter, which sets how strongly the resource usage of low-QoS pods is restricted, thereby reserving resources for higher-QoS pods and ensuring they can always obtain their requested resources.

Currently only the incompressible resource memory can be reserved. For example, experimental-qos-reserved=memory=100% means reserving 100% of requested resources for higher-QoS pods.

In that case, for memory, the whole QoS-level cgroup is configured by the following rules:

burstable/memory.limit_in_bytes = Node.Allocatable - {(sum of memory requests of all Guaranteed pods) * (reservePercent / 100)}
besteffort/memory.limit_in_bytes = Node.Allocatable - {(sum of memory requests of all Guaranteed and Burstable pods) * (reservePercent / 100)}

The formulas show that the burstable cgroup must reserve memory for the higher-level guaranteed pods. By default, when this parameter is not set, pods in the burstable cgroup can fill the machine's entire memory; with the feature on, the burstable cgroup's memory limit must be adjusted dynamically as guaranteed-level pods come and go.

Besteffort is similar, except that it reserves resources for both guaranteed-level and burstable-level pods.

For example, suppose the machine's allocatable memory is 8G and we enable the experimental-qos-reserved parameter on its kubelet, set to memory=100%. If a guaranteed-level pod with a 1G request is created, the memory.limit_in_bytes of the machine's burstable QoS-level cgroup is set to 7G, and the besteffort QoS-level cgroup's memory.limit_in_bytes is also set to 7G.

If a burstable-level pod with a 2G memory request is then created, the besteffort QoS-level cgroup's memory.limit_in_bytes is adjusted down to 5G.

Memory is thus handled, but cpu brings some trouble.

For the besteffort QoS cgroup, the cpu configuration remains very simple:

besteffort/cpu.shares = 2

But for the burstable QoS cgroup there is a subtlety: all burstable pods now live under the burstable child cgroup of kubepods, and because of how cpu.shares works, the same cpu.shares value at different levels of the hierarchy can translate into different amounts of cpu. For example, cpu.shares=1024 on a guaranteed-level pod cgroup may not mean the same thing as cpu.shares=1024 on a pod cgroup under burstable. This follows from the cpu.shares mechanism itself.

To solve this, Kubernetes must also dynamically adjust the burstable cgroup's cpu.shares, as described in the following rule:

burstable/cpu.shares = max(sum(Burstable pods' cpu requests), 2)

This ensures that the same cpu.shares value means the same thing for guaranteed-level pods and burstable pods.
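In our running example, the only burstable pod, pod-burstable-1, requests 2 cores of cpu in total, so burstable/cpu.shares = 2 * 1024 = 2048.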

Why this works will be explained in detail later.

So, with our three example pods at the guaranteed, burstable, and besteffort levels, suppose the machine's kubelet has the experimental-qos-reserved parameter enabled and set to memory=100%, and the three pods run on a machine with 3 cores and 8 GB of memory. The whole machine's cgroup configuration is then as sketched below.
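A sketch of that configuration, derived from the rules above:

kubepods:      cpu.shares = 3072
  burstable:   cpu.shares = 2048, memory.limit_in_bytes = 7G (8G - 1G of guaranteed requests)
  besteffort:  cpu.shares = 2,    memory.limit_in_bytes = 5G (8G - 1G - 2G of burstable requests)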

This series of configurations achieves the following results:

When machine cpu is idle, pod-guaranteed-1 can use at most 1 core of cpu, pod-burstable-1 can use at most 3 cores, and pod-besteffort-1 can soak up whatever computing power remains.

When machine cpu is contended, for example when pod-guaranteed-1 and pod-burstable-1 are both driving cpu at 100%, pod-guaranteed-1 still gets 1 core of computing power and pod-burstable-1 still gets 2 cores, but pod-besteffort-1 gets only about 2 millicores, roughly 1/1000 of pod-burstable-1's share.

The machine's besteffort pods can use at most [machine memory 8G] - [sum of burstable requests 2G] - [sum of guaranteed requests 1G] = 5G of memory.

The machine's burstable pods can use at most [machine memory 8G] - [sum of guaranteed requests 1G] = 7G of memory.

How the CPU request is realized

A CPU request represents the minimum amount of the resource the container expects, and it is implemented with the cgroup cpu.shares feature. However, what cpu.shares really expresses is the proportion of the machine's CPU the cgroup can obtain, not an absolute amount.

For example, a cgroup A with cpu.shares = 1024 is not thereby entitled to 1 core of compute. If its machine has 2 CPU cores in total and another cgroup B has cpu.shares = 2048, then under heavy CPU contention cgroup A obtains only 2 * (1024 / (1024 + 2048)), i.e., 2/3 of a CPU core.

So how does Kubernetes ensure that, regardless of a pod's QoS, a container with cpu.shares = 1024 can always obtain 1 core of compute?

The answer lies in the careful dynamic configuration of the QoS-level cgroups and pod-level cgroups.

Continuing with the example above, suppose the three pods sit on a physical machine with three CPU cores. The machine's CPU cgroup configuration then looks like this:

The cpu.shares of the kubepods cgroup is set to 3072.

The cpu.shares of pod-guaranteed-1's pod cgroup is set to 1024, and within it, the container cgroup of container3 also gets cpu.shares = 1024.

The burstable QoS-level cgroup containing pod-burstable-1 gets cpu.shares = 2048.

The cpu.shares of pod-burstable-1's pod cgroup is 2048.

The cpu.shares of container1 in pod-burstable-1 is 1024.

The cpu.shares of container2 in pod-burstable-1 is 1024.

So the hierarchy at this time is as follows:
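A plain-text sketch of that hierarchy, using the values listed above:

kubepods (cpu.shares = 3072)
  +- pod-guaranteed-1 (cpu.shares = 1024)
  |    +- container3 (cpu.shares = 1024)
  +- burstable (cpu.shares = 2048)
  |    +- pod-burstable-1 (cpu.shares = 2048)
  |         +- container1 (cpu.shares = 1024)
  |         +- container2 (cpu.shares = 1024)
  +- besteffort (cpu.shares = 2)
       +- pod-besteffort-1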

Because besteffort's cpu.shares is only 2, it can be ignored in the calculation.

So when CPU is busy, the CPU obtained by container1, container2, and container3 can be computed with the following formulas:

container3 = (1024 / 1024) * (1024 / (1024 + 2048)) * 3 = 1

container1 = (1024 / (1024 + 1024)) * (2048 / 2048) * (2048 / (1024 + 2048)) * 3 = 1

container2 = (1024 / (1024 + 1024)) * (2048 / 2048) * (2048 / (1024 + 2048)) * 3 = 1

(Each factor is a cgroup's share of its parent's total cpu.shares, walking up the hierarchy, multiplied by the machine's 3 cores.)

So by skillfully setting the cpu.shares of the kubepods cgroup and dynamically updating the configuration of the burstable QoS-level cgroup, Kubernetes achieves the effect that cpu.shares equals the minimum CPU a container can obtain.

After reading the above, do you have a deeper understanding of how Kubernetes manages resources? Thank you for reading, and we hope this analysis proves useful.
