How does Kubernetes preempt resources for Critical Pods?

2025-03-29 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31

This article explains how Kubernetes preempts resources on behalf of Critical Pods. It covers two mechanisms: kubelet preemption during predicate admission, and the special handling of Critical Pods in the Priority admission controller.

Preemption for Critical Pods during kubelet predicate admission

During predicate admission, the kubelet runs a series of predicate checks against each Pod. One of them, GeneralPredicates, checks whether the node has enough CPU, memory, and GPU resources. If GeneralPredicates fails for a non-critical Pod, admission fails immediately. For a CriticalPod, the failure instead triggers kubelet preemption: the kubelet kills some Pods according to fixed rules to free resources, and if preemption succeeds, the Pod is admitted.

To trace this flow in the source, start with kubelet initialization.

pkg/kubelet/kubelet.go:315

    // NewMainKubelet instantiates a new Kubelet object along with all the required internal modules.
    // No initialization of Kubelet and its modules should happen here.
    func NewMainKubelet(...) (*Kubelet, error) {
        ...
        criticalPodAdmissionHandler := preemption.NewCriticalPodAdmissionHandler(klet.GetActivePods, killPodNow(klet.podWorkers, kubeDeps.Recorder), kubeDeps.Recorder)
        klet.admitHandlers.AddPodAdmitHandler(lifecycle.NewPredicateAdmitHandler(klet.getNodeAnyWay, criticalPodAdmissionHandler, klet.containerManager.UpdatePluginResources))
        // apply functional Option's
        for _, opt := range kubeDeps.Options {
            opt(klet)
        }
        ...
        return klet, nil
    }

When NewMainKubelet initializes the kubelet, it registers criticalPodAdmissionHandler through AddPodAdmitHandler. This handler is what makes admission of a CriticalPod special.

Next, look at the Admit method of the kubelet's predicateAdmitHandler and at what it does after GeneralPredicates fails.

pkg/kubelet/lifecycle/predicate.go:58

    func (w *predicateAdmitHandler) Admit(attrs *PodAdmitAttributes) PodAdmitResult {
        ...
        fit, reasons, err := predicates.GeneralPredicates(podWithoutMissingExtendedResources, nil, nodeInfo)
        if err != nil {
            message := fmt.Sprintf("GeneralPredicates failed due to %v, which is unexpected.", err)
            glog.Warningf("Failed to admit pod %v - %s", format.Pod(pod), message)
            return PodAdmitResult{
                Admit:   fit,
                Reason:  "UnexpectedAdmissionError",
                Message: message,
            }
        }
        if !fit {
            fit, reasons, err = w.admissionFailureHandler.HandleAdmissionFailure(pod, reasons)
            if err != nil {
                message := fmt.Sprintf("Unexpected error while attempting to recover from admission failure: %v", err)
                glog.Warningf("Failed to admit pod %v - %s", format.Pod(pod), message)
                return PodAdmitResult{
                    Admit:   fit,
                    Reason:  "UnexpectedAdmissionError",
                    Message: message,
                }
            }
            ...
        }
        return PodAdmitResult{
            Admit: true,
        }
    }

When GeneralPredicates in predicateAdmitHandler finds that the Pod does not fit, for example because the node lacks CPU, memory, or GPU resources, Admit calls HandleAdmissionFailure for additional processing. As noted above, the kubelet registers criticalPodAdmissionHandler as the admission failure handler during initialization, so this call reaches CriticalPodAdmissionHandler.HandleAdmissionFailure.
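The hand-off from the resource check to the failure handler can be illustrated with a small model. This is only a sketch under simplifying assumptions: the Pod type, the single CPU resource, and the names admit, failureHandler, and criticalOnly are hypothetical and do not appear in the kubelet source.

```go
package main

import "fmt"

// Pod is a hypothetical, stripped-down stand-in for v1.Pod.
type Pod struct {
	Name     string
	Critical bool
	CPUMilli int64 // requested CPU in millicores
}

// failureHandler plays the role of the admission failure handler: it gets
// one chance to rescue a pod that failed the resource check.
type failureHandler func(pod Pod, shortfall int64) bool

// admit models the predicate admit step for a single resource (CPU).
func admit(pod Pod, freeCPUMilli int64, onFailure failureHandler) bool {
	if pod.CPUMilli <= freeCPUMilli {
		return true // fits, no special handling needed
	}
	// The resource check failed: delegate the decision to the handler.
	return onFailure(pod, pod.CPUMilli-freeCPUMilli)
}

// criticalOnly rescues only critical pods. It assumes preemption always
// succeeds, which the real handler does not guarantee.
func criticalOnly(pod Pod, shortfall int64) bool {
	return pod.Critical
}

func main() {
	web := Pod{Name: "web", Critical: false, CPUMilli: 500}
	dns := Pod{Name: "dns", Critical: true, CPUMilli: 500}
	fmt.Println(web.Name, admit(web, 200, criticalOnly))
	fmt.Println(dns.Name, admit(dns, 200, criticalOnly))
}
```

Keeping the handler behind a function type means the admit step itself never needs to know what a critical Pod is, which matches the way the handler is injected in NewMainKubelet.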

The CriticalPodAdmissionHandler struct is defined as follows:

pkg/kubelet/preemption/preemption.go:41

    type CriticalPodAdmissionHandler struct {
        getPodsFunc eviction.ActivePodsFunc
        killPodFunc eviction.KillPodFunc
        recorder    record.EventRecorder
    }

The HandleAdmissionFailure method of CriticalPodAdmissionHandler implements the logic specific to a CriticalPod.

pkg/kubelet/preemption/preemption.go:66

    // HandleAdmissionFailure gracefully handles admission rejection, and, in some cases,
    // to allow admission of the pod despite its previous failure.
    func (c *CriticalPodAdmissionHandler) HandleAdmissionFailure(pod *v1.Pod, failureReasons []algorithm.PredicateFailureReason) (bool, []algorithm.PredicateFailureReason, error) {
        if !kubetypes.IsCriticalPod(pod) || !utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) {
            return false, failureReasons, nil
        }
        // InsufficientResourceError is not a reason to reject a critical pod.
        // Instead of rejecting, we free up resources to admit it, if no other reasons for rejection exist.
        nonResourceReasons := []algorithm.PredicateFailureReason{}
        resourceReasons := []*admissionRequirement{}
        for _, reason := range failureReasons {
            if r, ok := reason.(*predicates.InsufficientResourceError); ok {
                resourceReasons = append(resourceReasons, &admissionRequirement{
                    resourceName: r.ResourceName,
                    quantity:     r.GetInsufficientAmount(),
                })
            } else {
                nonResourceReasons = append(nonResourceReasons, reason)
            }
        }
        if len(nonResourceReasons) > 0 {
            // Return only reasons that are not resource related, since critical pods cannot fail admission for resource reasons.
            return false, nonResourceReasons, nil
        }
        err := c.evictPodsToFreeRequests(admissionRequirementList(resourceReasons))
        // if no error is returned, preemption succeeded and the pod is safe to admit.
        return err == nil, nil, err
    }

If the Pod is not a CriticalPod, or the ExperimentalCriticalPodAnnotation feature gate is disabled, the method returns false immediately and admission fails.

Otherwise, the handler splits failureReasons into resource reasons (predicates.InsufficientResourceError) and all other reasons. If any non-resource reason exists, admission fails with those reasons. Only when every failure is an InsufficientResourceError does the handler call evictPodsToFreeRequests to trigger kubelet preemption. Note that kubelet preemption is distinct from scheduler preemption; do not confuse the two.
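The classification step relies on a Go type assertion over a list of heterogeneous failure reasons. The following is a minimal sketch of that pattern using plain error values; the types InsufficientResourceError and simpleReason and the functions splitReasons and shouldPreempt are hypothetical stand-ins, not the kubelet's own types.

```go
package main

import "fmt"

// InsufficientResourceError is a hypothetical stand-in for
// predicates.InsufficientResourceError.
type InsufficientResourceError struct {
	ResourceName string
	Missing      int64
}

func (e *InsufficientResourceError) Error() string {
	return fmt.Sprintf("insufficient %s: short by %d", e.ResourceName, e.Missing)
}

// simpleReason is a hypothetical non-resource failure reason.
type simpleReason string

func (s simpleReason) Error() string { return string(s) }

// splitReasons separates resource shortfalls from every other reason.
func splitReasons(reasons []error) (resource []*InsufficientResourceError, other []error) {
	for _, reason := range reasons {
		if r, ok := reason.(*InsufficientResourceError); ok {
			resource = append(resource, r)
		} else {
			other = append(other, reason)
		}
	}
	return resource, other
}

// shouldPreempt reports whether preemption may be attempted: the pod must
// be critical and every failure must be a resource shortfall.
func shouldPreempt(critical bool, reasons []error) bool {
	if !critical {
		return false
	}
	_, other := splitReasons(reasons)
	return len(other) == 0
}

func main() {
	cpu := &InsufficientResourceError{ResourceName: "cpu", Missing: 300}
	conflict := simpleReason("host port conflict")
	fmt.Println(shouldPreempt(true, []error{cpu}))
	fmt.Println(shouldPreempt(true, []error{cpu, conflict}))
	fmt.Println(shouldPreempt(false, []error{cpu}))
}
```

The point of the split is that freeing resources cannot fix a non-resource failure, so a single such reason is enough to skip preemption entirely.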

evictPodsToFreeRequests implements kubelet preemption. At its core, it calls getPodsToPreempt to select the Pods to kill (podsToPreempt).
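The article does not show the body of evictPodsToFreeRequests, so the following is only a sketch of the "select, then kill" shape it describes. The handler type, its function fields, errNoVictims, and evictToFree are hypothetical; in particular, aborting on the first failed kill is an assumption rather than confirmed kubelet behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical stand-in for v1.Pod.
type Pod struct{ Name string }

// errNoVictims is an illustrative error for "nothing can be evicted".
var errNoVictims = errors.New("no set of pods can free the requested resources")

// handler is a hypothetical miniature of CriticalPodAdmissionHandler:
// every dependency is injected as a plain function.
type handler struct {
	getPods       func() []Pod
	selectVictims func(pods []Pod) ([]Pod, error) // plays the role of getPodsToPreempt
	killPod       func(pod Pod) error
}

// evictToFree picks victims among the active pods and kills them in order.
// Assumption: the first failed kill aborts the whole attempt.
func (h handler) evictToFree() error {
	victims, err := h.selectVictims(h.getPods())
	if err != nil {
		return fmt.Errorf("preemption: %w", err)
	}
	for _, pod := range victims {
		if err := h.killPod(pod); err != nil {
			return fmt.Errorf("preemption: evicting %s: %w", pod.Name, err)
		}
	}
	return nil
}

func main() {
	var killed []string
	h := handler{
		getPods:       func() []Pod { return []Pod{{Name: "batch"}, {Name: "db"}} },
		selectVictims: func(pods []Pod) ([]Pod, error) { return pods[:1], nil },
		killPod:       func(pod Pod) error { killed = append(killed, pod.Name); return nil },
	}
	fmt.Println(h.evictToFree(), killed)
}
```

Injecting the pod lister and the kill function mirrors the getPodsFunc and killPodFunc fields of the real struct and makes the flow easy to exercise without a running kubelet.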

pkg/kubelet/preemption/preemption.go:121

    // getPodsToPreempt returns a list of pods that could be preempted to free requests >= requirements
    func getPodsToPreempt(pods []*v1.Pod, requirements admissionRequirementList) ([]*v1.Pod, error) {
        bestEffortPods, burstablePods, guaranteedPods := sortPodsByQOS(pods)

        // make sure that pods exist to reclaim the requirements
        unableToMeetRequirements := requirements.subtract(append(append(bestEffortPods, burstablePods...), guaranteedPods...)...)
        if len(unableToMeetRequirements) > 0 {
            return nil, fmt.Errorf("no set of running pods found to reclaim resources: %v", unableToMeetRequirements.toString())
        }
        // find the guaranteed pods we would need to evict if we already evicted ALL burstable and besteffort pods.
        guarateedToEvict, err := getPodsToPreemptByDistance(guaranteedPods, requirements.subtract(append(bestEffortPods, burstablePods...)...))
        if err != nil {
            return nil, err
        }
        // Find the burstable pods we would need to evict if we already evicted ALL besteffort pods, and the required guaranteed pods.
        burstableToEvict, err := getPodsToPreemptByDistance(burstablePods, requirements.subtract(append(bestEffortPods, guarateedToEvict...)...))
        if err != nil {
            return nil, err
        }
        // Find the besteffort pods we would need to evict if we already evicted the required guaranteed and burstable pods.
        bestEffortToEvict, err := getPodsToPreemptByDistance(bestEffortPods, requirements.subtract(append(burstableToEvict, guarateedToEvict...)...))
        if err != nil {
            return nil, err
        }
        return append(append(bestEffortToEvict, burstableToEvict...), guarateedToEvict...), nil
    }

Kubelet preemption selects the Pods to kill as follows:

If the amount still required of any resource exceeds the combined requests of all current bestEffortPods, burstablePods, and guaranteedPods, then podsToPreempt is nil and an error is returned: no set of Pods can be evicted to make room.

If evicting all bestEffortPods and burstablePods would not free enough resources, select guaranteedPods to evict (guarateedToEvict). The selection rules are:

Rule 1: evict as few Pods as possible.

Rule 2: free as few resources as possible.

Rule 1 takes precedence over Rule 2.

If evicting all bestEffortPods plus guarateedToEvict would not free enough resources, select burstablePods to evict (burstableToEvict), using the same rules.

If evicting burstableToEvict plus guarateedToEvict would not free enough resources, select bestEffortPods to evict (bestEffortToEvict), using the same rules.

In other words, Pods in lower QoS classes are preempted first. Within a single QoS class, selection follows the same two rules: evict as few Pods as possible, and, among equally small sets, free as few resources as possible. The first rule takes precedence over the second.
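The three-pass order and the two rules can be sketched for a single resource. This is a simplified model, not the kubelet's implementation: the real code works on several resources at once through getPodsToPreemptByDistance, whose body is not shown here. The Pod type and the functions pickWithinClass, total, and selectVictims are hypothetical, and the per-class heuristic is an assumption chosen to satisfy the two rules.

```go
package main

import "fmt"

// Pod is a hypothetical pod with a single CPU request in millicores.
type Pod struct {
	Name string
	CPU  int64
}

// pickWithinClass selects pods from one QoS class until need is covered.
// Each round prefers the smallest pod that covers the remaining need alone
// (Rule 2); if none does, it takes the largest pod (Rule 1).
func pickWithinClass(pods []Pod, need int64) []Pod {
	remaining := append([]Pod(nil), pods...)
	var chosen []Pod
	for need > 0 && len(remaining) > 0 {
		best := 0
		for i, p := range remaining {
			b := remaining[best]
			bCovers, pCovers := b.CPU >= need, p.CPU >= need
			switch {
			case pCovers && !bCovers:
				best = i
			case pCovers && bCovers && p.CPU < b.CPU:
				best = i
			case !pCovers && !bCovers && p.CPU > b.CPU:
				best = i
			}
		}
		chosen = append(chosen, remaining[best])
		need -= remaining[best].CPU
		remaining = append(remaining[:best], remaining[best+1:]...)
	}
	return chosen
}

// total sums the CPU requests of a pod list.
func total(pods []Pod) int64 {
	var sum int64
	for _, p := range pods {
		sum += p.CPU
	}
	return sum
}

// selectVictims follows the pass order described above: guaranteed first
// (assuming all lower classes go), then burstable, then best effort.
func selectVictims(bestEffort, burstable, guaranteed []Pod, need int64) ([]Pod, error) {
	if total(bestEffort)+total(burstable)+total(guaranteed) < need {
		return nil, fmt.Errorf("cannot free %dm CPU even by evicting every pod", need)
	}
	g := pickWithinClass(guaranteed, need-total(bestEffort)-total(burstable))
	b := pickWithinClass(burstable, need-total(bestEffort)-total(g))
	e := pickWithinClass(bestEffort, need-total(b)-total(g))
	victims := append([]Pod{}, e...)
	victims = append(victims, b...)
	return append(victims, g...), nil
}

func main() {
	bestEffort := []Pod{{Name: "log-agent", CPU: 0}} // BestEffort pods declare no requests
	burstable := []Pod{{Name: "cache", CPU: 300}, {Name: "batch", CPU: 700}}
	guaranteed := []Pod{{Name: "db", CPU: 1000}}
	victims, err := selectVictims(bestEffort, burstable, guaranteed, 600)
	fmt.Println(victims, err)
}
```

Computing the guaranteed set first, under the assumption that every lower-class Pod is already gone, is what keeps higher-QoS Pods from being evicted unless the lower classes cannot cover the shortfall on their own.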

Special treatment of CriticalPod by Priority Admission Controller

First, consider the two special kinds of CriticalPod reserved by the system:

ClusterCriticalPod: a Pod whose PriorityClassName is system-cluster-critical.

NodeCriticalPod: a Pod whose PriorityClassName is system-node-critical.

If the Priority admission controller is enabled, its priority check at Pod creation also handles CriticalPods specially.

The main job of the Priority admission controller is to resolve the PriorityClassName specified in the Pod to the corresponding Spec.Priority value.

plugin/pkg/admission/priority/admission.go:138

    // admitPod makes sure a new pod does not set spec.Priority field. It also makes sure that the PriorityClassName exists if it is provided and resolves the pod priority from the PriorityClassName.
    func (p *priorityPlugin) admitPod(a admission.Attributes) error {
        operation := a.GetOperation()
        pod, ok := a.GetObject().(*api.Pod)
        if !ok {
            return errors.NewBadRequest("resource was marked with kind Pod but was unable to be converted")
        }
        // Make sure that the client has not set `priority` at the time of pod creation.
        if operation == admission.Create && pod.Spec.Priority != nil {
            return admission.NewForbidden(a, fmt.Errorf("the integer value of priority must not be provided in pod spec. Priority admission controller populates the value from the given PriorityClass name"))
        }
        if utilfeature.DefaultFeatureGate.Enabled(features.PodPriority) {
            var priority int32
            // TODO: @ravig - This is for backwards compatibility to ensure that critical pods with annotations just work fine.
            // Remove when no longer needed.
            if len(pod.Spec.PriorityClassName) == 0 &&
                utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) &&
                kubelettypes.IsCritical(a.GetNamespace(), pod.Annotations) {
                pod.Spec.PriorityClassName = scheduling.SystemClusterCritical
            }
            if len(pod.Spec.PriorityClassName) == 0 {
                var err error
                priority, err = p.getDefaultPriority()
                if err != nil {
                    return fmt.Errorf("failed to get default priority class: %v", err)
                }
            } else {
                // Try resolving the priority class name.
                pc, err := p.lister.Get(pod.Spec.PriorityClassName)
                if err != nil {
                    if errors.IsNotFound(err) {
                        return admission.NewForbidden(a, fmt.Errorf("no PriorityClass with name %v was found", pod.Spec.PriorityClassName))
                    }
                    return fmt.Errorf("failed to get PriorityClass with name %s: %v", pod.Spec.PriorityClassName, err)
                }
                priority = pc.Value
            }
            pod.Spec.Priority = &priority
        }
        return nil
    }

When all of the following conditions hold, the controller sets the Pod's Spec.PriorityClassName to system-cluster-critical, which makes it a ClusterCriticalPod:

The ExperimentalCriticalPodAnnotation and PodPriority feature gates are both enabled.

The Pod does not specify a PriorityClassName.

The Pod belongs to the kube-system namespace.

The Pod carries the annotation scheduler.alpha.kubernetes.io/critical-pod="".
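The four conditions above can be written as a single defaulting function. This is a minimal sketch: PodMeta, FeatureGates, and defaultPriorityClassName are hypothetical names introduced for illustration, and the check for an empty annotation value follows the critical-pod="" form stated above.

```go
package main

import "fmt"

const (
	criticalPodAnnotation = "scheduler.alpha.kubernetes.io/critical-pod"
	systemNamespace       = "kube-system"
	systemClusterCritical = "system-cluster-critical"
)

// PodMeta is a hypothetical subset of the pod fields the check needs.
type PodMeta struct {
	Namespace         string
	Annotations       map[string]string
	PriorityClassName string
}

// FeatureGates is a hypothetical holder for the two relevant gates.
type FeatureGates struct {
	PodPriority                       bool
	ExperimentalCriticalPodAnnotation bool
}

// defaultPriorityClassName returns the class name the pod ends up with.
// It changes the name only when all four conditions hold.
func defaultPriorityClassName(pod PodMeta, gates FeatureGates) string {
	if !gates.PodPriority || !gates.ExperimentalCriticalPodAnnotation {
		return pod.PriorityClassName // condition 1: both gates enabled
	}
	if pod.PriorityClassName != "" {
		return pod.PriorityClassName // condition 2: no class name given
	}
	if pod.Namespace != systemNamespace {
		return pod.PriorityClassName // condition 3: kube-system namespace
	}
	if value, ok := pod.Annotations[criticalPodAnnotation]; !ok || value != "" {
		return pod.PriorityClassName // condition 4: annotation present and empty
	}
	return systemClusterCritical
}

func main() {
	gates := FeatureGates{PodPriority: true, ExperimentalCriticalPodAnnotation: true}
	pod := PodMeta{
		Namespace:   systemNamespace,
		Annotations: map[string]string{criticalPodAnnotation: ""},
	}
	fmt.Println(defaultPriorityClassName(pod, gates))
}
```

An explicitly set PriorityClassName always wins in this sketch, which is why the annotation path only matters for Pods that predate priority classes.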
