

What is the concept of Kubernetes Resource QoS Classes


This article explains in detail what the concept of Kubernetes Resource QoS Classes is. The content is thorough, the steps are clear, and the details are handled carefully; I hope it helps you resolve your doubts about the topic.

Kubernetes Resource QoS Classes: basic concepts

Kubernetes determines a Pod's QoS Class from the request and limit values set on the resources of the Pod's containers. A container's request specifies the lower bound of resources the system will provide; its limit specifies the upper bound of resources the system will allow.

Pods need a guaranteed minimum amount of resources in order to run stably over the long term, but the resources a Pod can actually use are often not guaranteed.

In general, Kubernetes improves resource utilization by letting the request and limit values express an overcommit ratio. Scheduling in Kubernetes is based on request, not limit. Borg increased resource utilization by roughly 20% by using "non-guaranteed" resources.

In an overcommitted system (sum of limits > machine capacity), containers are killed when resources are exhausted. Ideally, the "unimportant" containers are killed first.

For each resource, containers fall into one of three QoS Classes: Guaranteed, Burstable, and Best-Effort, in decreasing order of priority. Under the hood, Kubernetes implements these QoS levels by interpreting the limit and request values.

Guaranteed: if, for every resource of every container in the Pod, limit and request are equal and non-zero, the Pod's QoS Class is Guaranteed.

Note that if a container specifies only a limit and no request, the request defaults to the value of the limit.

Examples:

containers:
  name: foo
    resources:
      limits:
        cpu: 10m
        memory: 1Gi
  name: bar
    resources:
      limits:
        cpu: 100m
        memory: 100Mi

containers:
  name: foo
    resources:
      limits:
        cpu: 10m
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 1Gi
  name: bar
    resources:
      limits:
        cpu: 100m
        memory: 100Mi
      requests:
        cpu: 100m
        memory: 100Mi
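The snippets above follow the abbreviated style of the upstream design proposal. As a minimal sketch only (the pod name and image are placeholders, not from the original), a complete manifest for a Guaranteed Pod could look like this:

apiVersion: v1
kind: Pod
metadata:
  name: qos-demo          # hypothetical name
spec:
  containers:
  - name: foo
    image: nginx          # placeholder image
    resources:
      limits:
        cpu: 10m
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 1Gi

Because every container's requests equal its non-zero limits, Kubernetes classifies this Pod as Guaranteed; the assigned class is visible in the Pod's status.qosClass field after creation.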

Best-Effort: if no request or limit is set for any resource in any container of the Pod, the Pod's QoS Class is Best-Effort.

Examples:

containers:
  name: foo
    resources:
  name: bar
    resources:

Burstable: any Pod that matches neither the Guaranteed nor the Best-Effort criteria is Burstable. When a limit is not specified, its effective value is the capacity of the corresponding Node resource.

Example: container bar specifies no resources.

containers:
  name: foo
    resources:
      limits:
        cpu: 10m
        memory: 1Gi
      requests:
        cpu: 10m
        memory: 1Gi
  name: bar

Example: containers foo and bar set limits for different resources.

containers:
  name: foo
    resources:
      limits:
        memory: 1Gi
  name: bar
    resources:
      limits:
        cpu: 100m

Example: container foo specifies no limit, and container bar specifies neither request nor limit.

containers:
  name: foo
    resources:
      requests:
        cpu: 10m
        memory: 1Gi
  name: bar

The difference between compressible and incompressible resources

Kube-scheduler selects a Node for a Pod based on the Pod's request values. A Pod and all of its containers are not allowed to consume more than the effective limit value (if one is specified).
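For example, a Pod whose containers request a total of 1.5 CPUs will only be scheduled onto a node that still has at least 1.5 CPUs of unrequested allocatable capacity, no matter how large its limits are.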

How request and limit take effect depends on whether the resource is compressible.

Guarantees for compressible resources

Currently, only CPU is supported.

Pods are guaranteed to receive the total amount of CPU they request, but are not guaranteed any additional CPU time. Even the requested lower bound is not fully guaranteed per Pod, because CPU isolation is currently applied at the container level; Pod-level cgroup resource isolation will be introduced later to address this.

Excess or contended CPU is distributed in proportion to the CPU request settings. This can be understood through cpu.shares: each container is allocated a proportional share of CPU time slices. If container A's request is 600 milli-CPU and container B's is 300 milli-CPU, then when the two compete for CPU time it is allocated in a 2:1 ratio.
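As a small sketch of this behavior (the container names and image are made-up placeholders), the two containers below would receive CPU time in roughly a 2:1 ratio when they contend, because the kubelet maps CPU requests to proportional cpu.shares in the container cgroups:

containers:
- name: foo
  image: nginx            # placeholder image
  resources:
    requests:
      cpu: 600m           # ~2/3 of the contended CPU time
- name: bar
  image: nginx            # placeholder image
  resources:
    requests:
      cpu: 300m           # ~1/3 of the contended CPU time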

If a Pod hits its CPU limit, its CPU is throttled rather than the Pod being killed. If a Pod does not set a CPU limit, it can use CPU beyond its request.

Guarantees for incompressible resources

Only memory is currently supported.

Pods are guaranteed the total amount of memory set in their requests. If a Pod uses more memory than its request, it may be killed when another Pod needs memory. If a Pod uses less memory than its request, it will not be killed unless a system task or daemon needs more resources. (In plain terms, it still depends on how all the processes on the system are scored when the OOM killer is triggered.)
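For example, under this policy a Pod that requests 512Mi of memory but is actually using 1Gi can be OOM-killed as soon as a neighboring Pod needs the memory it requested, whereas a Pod staying under its 512Mi request would normally survive.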

When a Pod uses more memory than its limit, a process in one of its containers that is using a lot of memory will be killed by the kernel.

Management and scheduling strategy

Pods are admitted by kubelet and scheduled by the scheduler; based on the requests assigned to the containers, they ensure that the sum of all container requests stays within the Node's allocatable capacity. See https://github.com/fabric8io/jenkinshift/blob/master/vendor/k8s.io/kubernetes/docs/proposals/node-allocatable.md

How resources are reclaimed according to QoS class

CPU: when a Pod cannot get its requested CPU, for example because system tasks and daemons are using a lot of CPU, the Pod is not killed; its CPU is throttled instead.

Memory: memory is an incompressible resource. From the perspective of memory management, the following distinctions apply:

Best-Effort pods have the lowest priority. If the system runs out of memory, processes in these Pods are the first to be killed. On the other hand, these containers can use any amount of free memory on the system.

Guaranteed pods have the highest priority. They are guaranteed not to be killed as long as they stay within their limits, and are evicted only when the system is under memory pressure and there are no lower-priority containers left to evict.

Burstable pods have some form of minimum resource guarantee but can use more resources when available. When the system hits a memory bottleneck, once these containers exceed their request and no Best-Effort containers remain, they are the first to be killed.

OOM score configuration on the Node

badness() in mm/oom_kill.c gives each process an OOM score; processes with higher OOM scores are more likely to be killed. The score depends on the following factors:

It depends mainly on the process's memory consumption, including resident memory, page tables, and swap usage.

The base score is generally the process's percentage of memory consumed multiplied by 10 (percent-times-ten).

User permissions are also taken into account; for example, processes started with root privileges have their score reduced by 30%.

OOM score adjustment factors: /proc/<pid>/oom_score_adj (additive) and /proc/<pid>/oom_adj (multiplicative).

oom_adj adjustment coefficient: -15 to 15.

oom_score_adj: its value is added to the oom_score.

The final oom_score value is still clamped to the range 0 to 1000.
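As a simple illustration of that arithmetic: a process whose base score comes out at 300 and whose oom_score_adj is 500 ends up with an oom_score of 800, while a strongly negative oom_score_adj can pin the final score at 0.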

Here is a script that lists the top 10 processes on the system by oom_score (the processes most likely to be killed by the OOM killer):

# vim oomscore.sh
#!/bin/bash
for proc in $(find /proc -maxdepth 1 -regex '/proc/[0-9]+'); do
    printf "%2d %5d %s\n" \
        "$(cat $proc/oom_score)" \
        "$(basename $proc)" \
        "$(cat $proc/cmdline | tr '\0' ' ' | head -c 50)"
done 2>/dev/null | sort -nr | head -n 10

Here are the OOM score settings for the various K8s QoS levels:

Best-effort

Set OOM_SCORE_ADJ: 1000

So the OOM_SCORE value of the best-effort container is 1000

Guaranteed

Set OOM_SCORE_ADJ: -998

So the OOM_SCORE value of the guaranteed container is 0 or 1

Burstable

If the total memory request is greater than 99.9% of available memory, OOM_SCORE_ADJ is set to 2. Otherwise OOM_SCORE_ADJ = 1000 - 10 * (% of memory requested), which ensures that a Burstable Pod's OOM_SCORE is > 1.
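As a worked example with illustrative numbers: a Burstable Pod requesting 1Gi of memory on a node with 4Gi of allocatable memory requests 25% of memory, so its OOM_SCORE_ADJ = 1000 - 10 * 25 = 750.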

If the memory request is 0, OOM_SCORE_ADJ defaults to 999. So when Burstable Pods conflict with Guaranteed Pods, the Burstable Pods are killed first.

If a Burstable Pod uses less memory than its request, its OOM_SCORE is less than 1000, so if Best-Effort Pods conflict with such Burstable Pods, the Best-Effort Pods are killed first.

If a process in a Burstable Pod's container uses more memory than the request, its OOM_SCORE is set to 1000; otherwise its OOM_SCORE is less than 1000.

Among a group of Burstable Pods, Pods using more memory than their request are killed before Pods using less than their request.

If multiple processes in a Burstable Pod's containers conflict, their OOM_SCOREs are effectively arbitrary and are not constrained by request and limit.

Pod infra containers or special Pod init processes

OOM_SCORE_ADJ: -998

Kubelet, Docker

OOM_SCORE_ADJ: -999 (will not be OOM killed)

If these key system processes conflict with Guaranteed processes, the Guaranteed processes are killed first, since the key processes have a lower OOM score. In the future, these key processes will be placed into a separate cgroup and their memory will be limited.

Known issues and potential optimization points

Swap support: the current QoS policy assumes swap is disabled. If swap is enabled, a Guaranteed container that reaches its memory limit could still allocate memory backed by disk (swap); only when swap space runs out would processes in the Pod be killed. The node would then need to take swap space into account in its isolation policy.

User-specified priority: let users tell kubelet which tasks can be killed first.

Source code analysis

The QoS source code is located in pkg/kubelet/qos. The code is quite simple, mainly two files: pkg/kubelet/qos/policy.go and pkg/kubelet/qos/qos.go. The OOM_SCORE_ADJ values for each QoS Class discussed above are defined as follows:

pkg/kubelet/qos/policy.go:21

const (
    PodInfraOOMAdj        int = -998
    KubeletOOMScoreAdj    int = -999
    DockerOOMScoreAdj     int = -999
    KubeProxyOOMScoreAdj  int = -999
    guaranteedOOMScoreAdj int = -998
    besteffortOOMScoreAdj int = 1000
)

The OOM_SCORE_ADJ of the container is calculated as follows:

pkg/kubelet/qos/policy.go:40

func GetContainerOOMScoreAdjust(pod *v1.Pod, container *v1.Container, memoryCapacity int64) int {
    switch GetPodQOS(pod) {
    case Guaranteed:
        // Guaranteed containers should be the last to get killed.
        return guaranteedOOMScoreAdj
    case BestEffort:
        return besteffortOOMScoreAdj
    }

    // Burstable containers are a middle tier, between Guaranteed and Best-Effort. Ideally,
    // we want to protect Burstable containers that consume less memory than requested.
    // The formula below is a heuristic. A container requesting for 10% of a system's
    // memory will have an OOM score adjust of 900. If a process in container Y
    // uses over 10% of memory, its OOM score will be 1000. The idea is that containers
    // which use more than their request will have an OOM score of 1000 and will be prime
    // targets for OOM kills.
    // Note that this is a heuristic, it won't work if a container has many small processes.
    memoryRequest := container.Resources.Requests.Memory().Value()
    oomScoreAdjust := 1000 - (1000*memoryRequest)/memoryCapacity
    // A guaranteed pod using 100% of memory can have an OOM score of 10. Ensure
    // that burstable pods have a higher OOM score adjustment.
    if int(oomScoreAdjust) < (1000 + guaranteedOOMScoreAdj) {
        return (1000 + guaranteedOOMScoreAdj)
    }
    // Give burstable pods a higher chance of survival over besteffort pods.
    if int(oomScoreAdjust) == besteffortOOMScoreAdj {
        return int(oomScoreAdjust - 1)
    }
    return int(oomScoreAdjust)
}
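As a rough worked example consistent with the comment in the code: with memoryCapacity of 10Gi and a container memoryRequest of 1Gi, oomScoreAdjust = 1000 - (1000 * 1Gi) / 10Gi = 900, so any process in that container that grows well beyond its request quickly becomes a prime OOM-kill target.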

The method that determines the QoS Class of a Pod is:

pkg/kubelet/qos/qos.go:50

// GetPodQOS returns the QoS class of a pod.
// A pod is besteffort if none of its containers have specified any requests or limits.
// A pod is guaranteed only when requests and limits are specified for all the containers and they are equal.
// A pod is burstable if limits and requests do not match across all containers.
func GetPodQOS(pod *v1.Pod) QOSClass {
    requests := v1.ResourceList{}
    limits := v1.ResourceList{}
    zeroQuantity := resource.MustParse("0")
    isGuaranteed := true
    for _, container := range pod.Spec.Containers {
        // process requests
        for name, quantity := range container.Resources.Requests {
            if !supportedQoSComputeResources.Has(string(name)) {
                continue
            }
            if quantity.Cmp(zeroQuantity) == 1 {
                delta := quantity.Copy()
                if _, exists := requests[name]; !exists {
                    requests[name] = *delta
                } else {
                    delta.Add(requests[name])
                    requests[name] = *delta
                }
            }
        }
        // process limits
        qosLimitsFound := sets.NewString()
        for name, quantity := range container.Resources.Limits {
            if !supportedQoSComputeResources.Has(string(name)) {
                continue
            }
            if quantity.Cmp(zeroQuantity) == 1 {
                qosLimitsFound.Insert(string(name))
                delta := quantity.Copy()
                if _, exists := limits[name]; !exists {
                    limits[name] = *delta
                } else {
                    delta.Add(limits[name])
                    limits[name] = *delta
                }
            }
        }
        if len(qosLimitsFound) != len(supportedQoSComputeResources) {
            isGuaranteed = false
        }
    }
    if len(requests) == 0 && len(limits) == 0 {
        return BestEffort
    }
    // Check if requests match limits for all resources.
    if isGuaranteed {
        for name, req := range requests {
            if lim, exists := limits[name]; !exists || lim.Cmp(req) != 0 {
                isGuaranteed = false
                break
            }
        }
    }
    if isGuaranteed && len(requests) == len(limits) {
        return Guaranteed
    }
    return Burstable
}
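Applied to the manifests earlier in the article, this function returns Guaranteed for the two examples in the Guaranteed section, BestEffort for the example with empty resources, and Burstable for the mixed examples.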

GetPodQOS is called by the eviction manager and in the Predicates phase of the scheduler, which means it is used both when K8s handles overcommitment (eviction) and during scheduling pre-selection.

This concludes the introduction to the concept of Kubernetes Resource QoS Classes. To really master these points, you still need to practice with them yourself.
