What is the implementation principle of the Kubernetes resource QoS mechanism?

2025-07-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the implementation principle of the Kubernetes resource QoS mechanism.

QoS is a resource protection mechanism in Kubernetes, aimed mainly at incompressible resources such as memory. Kubernetes assigns OOM scores to the containers of different Pods. When a node runs short of memory, the kernel's OOM killer uses those scores to decide which processes to kill first, starting with the lowest priority (the higher the score, the lower the priority). This article walks through the implementation behind it.

1. Key basic characteristics

1.1 Everything is a file

In Linux everything is a file, and cgroups are controlled through files as well. The following is the cgroup configuration of a container in a Pod that I created with a memory limit of 200M.

    # pwd
    /sys/fs/cgroup
    # cat ./memory/kubepods/pod8e172a5c-57f5-493d-a93d-b0b64bca26df/f2fe67dc90cbfd57d873cd8a81a972213822f3f146ec4458adbe54d868cf410c/memory.limit_in_bytes
    209715200

1.2 Kernel memory configuration

Here we focus on two memory-related settings. VMOvercommitMemory is set to 1, which tells the kernel to always allow memory allocations, so all physical memory can be handed out (swap is not counted here). VMPanicOnOOM is set to 0, which means that when memory runs out the kernel does not panic but triggers oom_killer to select some processes to kill. QoS works by influencing that selection.
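To check these two settings on a node, one option is to read them from /proc/sys. The sketch below assumes the two tunables correspond to the Linux sysctls vm.overcommit_memory and vm.panic_on_oom; the helper names parseSysctlValue and readSysctl are hypothetical and are not part of the kubelet.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)

// parseSysctlValue converts the raw contents of a /proc/sys file,
// for example "1\n", into an int. (Hypothetical helper.)
func parseSysctlValue(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// readSysctl reads a sysctl such as "vm/overcommit_memory" from below
// root, which is normally /proc/sys. (Hypothetical helper.)
func readSysctl(root, name string) (int, error) {
	data, err := os.ReadFile(filepath.Join(root, name))
	if err != nil {
		return 0, err
	}
	return parseSysctlValue(string(data))
}

func main() {
	// Assumption: these paths match the two tunables discussed in the text.
	for _, name := range []string{"vm/overcommit_memory", "vm/panic_on_oom"} {
		value, err := readSysctl("/proc/sys", name)
		if err != nil {
			fmt.Printf("%s: %v\n", name, err)
			continue
		}
		fmt.Printf("%s = %d\n", name, value)
	}
}
```

On a node prepared as described above, the two values would be expected to read 1 and 0 respectively.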

    func setupKernelTunables(option KernelTunableBehavior) error {
        desiredState := map[string]int{
            utilsysctl.VMOvercommitMemory: utilsysctl.VMOvercommitMemoryAlways,
            utilsysctl.VMPanicOnOOM:       utilsysctl.VMPanicOnOOMInvokeOOMKiller,
            utilsysctl.KernelPanic:        utilsysctl.KernelPanicRebootTimeout,
            utilsysctl.KernelPanicOnOops:  utilsysctl.KernelPanicOnOopsAlways,
            utilsysctl.RootMaxKeys:        utilsysctl.RootMaxKeysSetting,
            utilsysctl.RootMaxBytes:       utilsysctl.RootMaxBytesSetting,
        }
        // (the excerpt ends here; the rest of the function is not shown)

2. QoS scoring mechanism and decision implementation

The QoS mechanism first determines a Pod's QoS type from the resource restrictions in its requests and limits, and then derives a score from that type. Let's take a quick look at the implementation of this part.
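As a condensed preview of the type determination, the sketch below reimplements the classification rules with simplified stand-in types. It is not the kubelet code: Resources, Container, QOSClass and podQOS are hypothetical names, and plain int64 values replace resource.Quantity.

```go
package main

import "fmt"

// Resources maps a resource name ("cpu", "memory") to a quantity.
// Assumption: int64 stands in for resource.Quantity.
type Resources map[string]int64

// Container carries only the fields this sketch needs.
type Container struct {
	Requests Resources
	Limits   Resources
}

type QOSClass string

const (
	Guaranteed QOSClass = "Guaranteed"
	Burstable  QOSClass = "Burstable"
	BestEffort QOSClass = "BestEffort"
)

// isSupported mimics isSupportedQoSComputeResource for cpu and memory.
func isSupported(name string) bool { return name == "cpu" || name == "memory" }

// podQOS classifies a Pod from the requests and limits of its containers.
func podQOS(containers []Container) QOSClass {
	requests, limits := Resources{}, Resources{}
	isGuaranteed := true
	for _, c := range containers {
		for name, q := range c.Requests {
			if isSupported(name) && q > 0 {
				requests[name] += q
			}
		}
		found := map[string]bool{}
		for name, q := range c.Limits {
			if isSupported(name) && q > 0 {
				found[name] = true
				limits[name] += q
			}
		}
		// Every container must set both a cpu and a memory limit.
		if !found["cpu"] || !found["memory"] {
			isGuaranteed = false
		}
	}
	if len(requests) == 0 && len(limits) == 0 {
		return BestEffort
	}
	if isGuaranteed {
		for name, req := range requests {
			if lim, ok := limits[name]; !ok || lim != req {
				isGuaranteed = false
				break
			}
		}
	}
	if isGuaranteed && len(requests) == len(limits) {
		return Guaranteed
	}
	return Burstable
}

func main() {
	mem := int64(200 * 1024 * 1024)
	full := Resources{"cpu": 500, "memory": mem}
	fmt.Println(podQOS([]Container{{}}))
	fmt.Println(podQOS([]Container{{Requests: full, Limits: full}}))
	fmt.Println(podQOS([]Container{{Requests: Resources{"memory": mem}}}))
}
```

The three calls in main cover one Pod of each class: no requests or limits, requests equal to limits for both resources, and a memory request without limits.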

2.1 Determining the QoS type from the containers

2.1.1 Build the container list

First, collect all containers into one list. Note that this includes both the init containers and the regular business containers.

    requests := v1.ResourceList{}
    limits := v1.ResourceList{}
    zeroQuantity := resource.MustParse("0")
    isGuaranteed := true
    allContainers := []v1.Container{}
    allContainers = append(allContainers, pod.Spec.Containers...)
    // append all init containers
    allContainers = append(allContainers, pod.Spec.InitContainers...)

2.1.2 Handle requests and limits

Next, every resource named in requests and limits is visited and accumulated into the corresponding collection. A Pod can remain a Guaranteed candidate only if each container's limits include both CPU and memory; otherwise isGuaranteed is set to false.

    for _, container := range allContainers {
        // process requests
        for name, quantity := range container.Resources.Requests {
            if !isSupportedQoSComputeResource(name) {
                continue
            }
            if quantity.Cmp(zeroQuantity) == 1 {
                delta := quantity.DeepCopy()
                if _, exists := requests[name]; !exists {
                    requests[name] = delta
                } else {
                    delta.Add(requests[name])
                    requests[name] = delta
                }
            }
        }
        // process limits
        qosLimitsFound := sets.NewString()
        for name, quantity := range container.Resources.Limits {
            if !isSupportedQoSComputeResource(name) {
                continue
            }
            if quantity.Cmp(zeroQuantity) == 1 {
                qosLimitsFound.Insert(string(name))
                delta := quantity.DeepCopy()
                if _, exists := limits[name]; !exists {
                    limits[name] = delta
                } else {
                    delta.Add(limits[name])
                    limits[name] = delta
                }
            }
        }
        if !qosLimitsFound.HasAll(string(v1.ResourceMemory), string(v1.ResourceCPU)) {
            // must contain both cpu and memory limits
            isGuaranteed = false
        }
    }

2.1.3 BestEffort

If none of the containers in the Pod sets any requests or limits, the Pod is BestEffort.

    if len(requests) == 0 && len(limits) == 0 {
        return v1.PodQOSBestEffort
    }

2.1.4 Guaranteed

For a Pod to be Guaranteed, the request must equal the limit for every resource, and requests and limits must cover the same number of resources.

    // Check if requests match limits for all resources.
    if isGuaranteed {
        for name, req := range requests {
            if lim, exists := limits[name]; !exists || lim.Cmp(req) != 0 {
                isGuaranteed = false
                break
            }
        }
    }
    if isGuaranteed && len(requests) == len(limits) {
        return v1.PodQOSGuaranteed
    }

2.1.5 Burstable

If the Pod is neither of the above, it falls into the last class, Burstable.

    return v1.PodQOSBurstable

2.2 QoS OOM scoring mechanism

2.2.1 OOM scoring mechanism

guaranteedOOMScoreAdj is -998, and the reason lies in how OOM handling works. A node runs three main kinds of processes: the kubelet main process, the docker process, and the business container processes. An OOM score of -1000 means the process will never be killed by the OOM killer. Business processes cannot be given that guarantee, because nobody can promise that a workload will never misbehave, so the lowest value they could receive would be -999. In QoS, however, -999 is reserved for the kubelet and docker processes. The remaining range is available to business containers (the higher the score, the more likely the process is to be killed).
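The article does not show how such a score reaches the kernel. On Linux the per-process knob is the file /proc/<pid>/oom_score_adj, which accepts values from -1000 to 1000. The sketch below is a minimal illustration under that assumption; oomScoreAdjPath, validateOOMScoreAdj and applyOOMScoreAdj are hypothetical helpers, not kubelet functions.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Range accepted by /proc/<pid>/oom_score_adj.
const (
	minOOMScoreAdj = -1000
	maxOOMScoreAdj = 1000
)

// oomScoreAdjPath returns the procfs file that holds the adjustment
// for the given process. (Hypothetical helper.)
func oomScoreAdjPath(pid int) string {
	return fmt.Sprintf("/proc/%d/oom_score_adj", pid)
}

// validateOOMScoreAdj rejects values outside the accepted range.
func validateOOMScoreAdj(value int) error {
	if value < minOOMScoreAdj || value > maxOOMScoreAdj {
		return fmt.Errorf("oom_score_adj %d outside [%d, %d]",
			value, minOOMScoreAdj, maxOOMScoreAdj)
	}
	return nil
}

// applyOOMScoreAdj writes the adjustment for pid after validating it.
// It is defined for illustration and not called from main.
func applyOOMScoreAdj(pid, value int) error {
	if err := validateOOMScoreAdj(value); err != nil {
		return err
	}
	return os.WriteFile(oomScoreAdjPath(pid), []byte(strconv.Itoa(value)), 0644)
}

func main() {
	// Only print the path and check a few values; nothing is written.
	fmt.Println(oomScoreAdjPath(os.Getpid()))
	for _, v := range []int{-999, -998, 1000, 1001} {
		fmt.Printf("%d valid: %t\n", v, validateOOMScoreAdj(v) == nil)
	}
}
```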

    const (
        // KubeletOOMScoreAdj is the OOM score adjustment for Kubelet
        KubeletOOMScoreAdj int = -999
        // DockerOOMScoreAdj is the OOM score adjustment for Docker
        DockerOOMScoreAdj int = -999
        // KubeProxyOOMScoreAdj is the OOM score adjustment for kube-proxy
        KubeProxyOOMScoreAdj  int = -999
        guaranteedOOMScoreAdj int = -998
        besteffortOOMScoreAdj int = 1000
    )

2.2.2 Critical Pod

A critical Pod is a special case: it may be a Burstable or BestEffort Pod, yet it receives the same OOM score as a Guaranteed Pod. Three kinds of Pod count as critical: static Pods, mirror Pods, and high-priority Pods.

    if types.IsCriticalPod(pod) {
        return guaranteedOOMScoreAdj
    }

The check is implemented as follows:

    func IsCriticalPod(pod *v1.Pod) bool {
        if IsStaticPod(pod) {
            return true
        }
        if IsMirrorPod(pod) {
            return true
        }
        if pod.Spec.Priority != nil && IsCriticalPodBasedOnPriority(*pod.Spec.Priority) {
            return true
        }
        return false
    }

2.2.3 Guaranteed and BestEffort

These two classes map to fixed values: Guaranteed gets -998 and BestEffort gets 1000.

    switch v1qos.GetPodQOS(pod) {
    case v1.PodQOSGuaranteed:
        // Guaranteed containers should be the last to get killed.
        return guaranteedOOMScoreAdj
    case v1.PodQOSBestEffort:
        return besteffortOOMScoreAdj
    }

2.2.4 Burstable

The key line is oomScoreAdjust := 1000 - (1000*memoryRequest)/memoryCapacity. The more memory a container requests, the larger the term (1000*memoryRequest)/memoryCapacity and the smaller the final result; the less it requests, the smaller that term and the larger the result. In other words, a container that requests less memory gets a higher score and is comparatively easier to kill.

    memoryRequest := container.Resources.Requests.Memory().Value()
    oomScoreAdjust := 1000 - (1000*memoryRequest)/memoryCapacity
    // A guaranteed pod using 100% of memory can have an OOM score of 10. Ensure
    // that burstable pods have a higher OOM score adjustment.
    if int(oomScoreAdjust) < (1000 + guaranteedOOMScoreAdj) {
        return (1000 + guaranteedOOMScoreAdj)
    }
    // Give burstable pods a higher chance of survival over besteffort pods.
    if int(oomScoreAdjust) == besteffortOOMScoreAdj {
        return int(oomScoreAdjust - 1)
    }
    return int(oomScoreAdjust)
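To make the Burstable arithmetic concrete, the following sketch applies the same formula and the same two clamps to a few memory requests. The function name burstableOOMScoreAdj and the 16 GiB node capacity are assumptions chosen for illustration; the constants are the values quoted in this article.

```go
package main

import "fmt"

// Values quoted in the article.
const (
	guaranteedOOMScoreAdj = -998
	besteffortOOMScoreAdj = 1000
)

// burstableOOMScoreAdj applies the Burstable formula and its two clamps.
// Both arguments are byte counts. (Hypothetical standalone helper.)
func burstableOOMScoreAdj(memoryRequest, memoryCapacity int64) int {
	adj := 1000 - (1000*memoryRequest)/memoryCapacity
	// Keep Burstable above the Guaranteed floor (1000 + (-998) = 2).
	if int(adj) < 1000+guaranteedOOMScoreAdj {
		return 1000 + guaranteedOOMScoreAdj
	}
	// Keep Burstable just below BestEffort.
	if int(adj) == besteffortOOMScoreAdj {
		return int(adj - 1)
	}
	return int(adj)
}

func main() {
	const gib = int64(1) << 30
	capacity := 16 * gib // assumed node memory capacity
	for _, req := range []int64{0, 1 * gib, 4 * gib, 8 * gib, 16 * gib} {
		fmt.Printf("request %2d GiB -> oom_score_adj %d\n",
			req/gib, burstableOOMScoreAdj(req, capacity))
	}
}
```

With these inputs the adjustments come out as 999, 938, 750, 500 and 2: the score falls as the request grows, and the clamps keep Burstable containers strictly between the Guaranteed and BestEffort values.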
