This article mainly explains how the Kubernetes Eviction Manager is started. The content is simple, clear, and easy to learn and understand; please follow the editor's train of thought to study how the Kubernetes Eviction Manager is started.
Kubernetes Eviction Manager source code analysis: where the Kubernetes Eviction Manager is started
When NewMainKubelet instantiates the Kubelet object, it calls eviction.NewManager to create the evictionManager object.
pkg/kubelet/kubelet.go:273
func NewMainKubelet(kubeCfg *componentconfig.KubeletConfiguration, kubeDeps *KubeletDeps, standaloneMode bool) (*Kubelet, error) {
    ...
    thresholds, err := eviction.ParseThresholdConfig(kubeCfg.EvictionHard, kubeCfg.EvictionSoft, kubeCfg.EvictionSoftGracePeriod, kubeCfg.EvictionMinimumReclaim)
    if err != nil {
        return nil, err
    }
    evictionConfig := eviction.Config{
        PressureTransitionPeriod: kubeCfg.EvictionPressureTransitionPeriod.Duration,
        MaxPodGracePeriodSeconds: int64(kubeCfg.EvictionMaxPodGracePeriod),
        Thresholds:               thresholds,
        KernelMemcgNotification:  kubeCfg.ExperimentalKernelMemcgNotification,
    }
    ...
    // setup eviction manager
    evictionManager, evictionAdmitHandler, err := eviction.NewManager(klet.resourceAnalyzer, evictionConfig, killPodNow(klet.podWorkers, kubeDeps.Recorder), klet.imageManager, kubeDeps.Recorder, nodeRef, klet.clock)
    if err != nil {
        return nil, fmt.Errorf("failed to initialize eviction manager: %v", err)
    }
    klet.evictionManager = evictionManager
    klet.admitHandlers.AddPodAdmitHandler(evictionAdmitHandler)
    ...
}
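For orientation, here is a minimal sketch of what the four threshold strings handed to eviction.ParseThresholdConfig above typically look like. The shape of the call mirrors the snippet; the concrete signal names and values (memory.available, nodefs.available, the sizes and durations) are illustrative assumptions, not values taken from this article.

    // Illustrative only: example inputs for eviction.ParseThresholdConfig, mirroring the
    // call in NewMainKubelet above. The concrete values are assumptions for demonstration.
    thresholds, err := eviction.ParseThresholdConfig(
        "memory.available<100Mi",                      // kubeCfg.EvictionHard   (--eviction-hard)
        "memory.available<300Mi,nodefs.available<10%", // kubeCfg.EvictionSoft   (--eviction-soft)
        "memory.available=30s,nodefs.available=1m",    // kubeCfg.EvictionSoftGracePeriod
        "memory.available=0Mi,nodefs.available=500Mi", // kubeCfg.EvictionMinimumReclaim
    )
    if err != nil {
        return nil, err
    }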
When the kubelet's Run method starts, it launches a goroutine that executes updateRuntimeUp every 5 seconds. In updateRuntimeUp, once the container runtime is confirmed to have started successfully, initializeRuntimeDependentModules is called to finish initializing the runtime-dependent modules, which is where the eviction manager's control loop is started.
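The Run snippet that showed this wiring did not survive intact on this page, so here is a small, self-contained Go sketch of the pattern it uses: a goroutine re-runs updateRuntimeUp every 5 seconds, and a sync.Once guard ensures the runtime-dependent modules (including the eviction manager's loop) are initialized exactly once. All names below are stand-ins for illustration, not the real kubelet types.

    package main

    import (
        "fmt"
        "sync"
        "time"
    )

    // fakeKubelet stands in for the real Kubelet struct; only what the pattern needs is shown.
    type fakeKubelet struct {
        oneTimeInitializer sync.Once
    }

    // updateRuntimeUp runs periodically; once the runtime is considered healthy it performs
    // the one-time initialization of runtime-dependent modules.
    func (kl *fakeKubelet) updateRuntimeUp() {
        // (the real kubelet first checks the container runtime status here)
        kl.oneTimeInitializer.Do(kl.initializeRuntimeDependentModules)
    }

    // initializeRuntimeDependentModules starts the eviction manager's own loop, which calls
    // synchronize on every monitoring interval.
    func (kl *fakeKubelet) initializeRuntimeDependentModules() {
        fmt.Println("starting eviction manager control loop")
        go func() {
            for range time.Tick(10 * time.Second) {
                fmt.Println("evictionManager.synchronize()")
            }
        }()
    }

    func main() {
        kl := &fakeKubelet{}
        // equivalent of: go wait.Until(kl.updateRuntimeUp, 5*time.Second, wait.NeverStop)
        go func() {
            for {
                kl.updateRuntimeUp()
                time.Sleep(5 * time.Second)
            }
        }()
        time.Sleep(30 * time.Second) // keep the demo process alive long enough to see output
    }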
pkg/kubelet/kubelet.go:1219
func (kl *Kubelet) Run(updates <-chan kubetypes.PodUpdate) {
    ...
}

The initialization of the runtime-dependent modules starts the eviction manager's control loop, which periodically calls evictionManager.synchronize. The portion of synchronize quoted here is:

pkg/kubelet/eviction/eviction_manager.go
func (m *managerImpl) synchronize(diskInfoProvider DiskInfoProvider, podFunc ActivePodsFunc) {
    ...
    // determine the set of thresholds previously met that have not yet satisfied the associated min-reclaim
    if len(m.thresholdsMet) > 0 {
        thresholdsNotYetResolved := thresholdsMet(m.thresholdsMet, observations, true)
        thresholds = mergeThresholds(thresholds, thresholdsNotYetResolved)
    }
    // determine the set of thresholds whose stats have been updated since the last sync
    thresholds = thresholdsUpdatedStats(thresholds, observations, m.lastObservations)
    // track when a threshold was first observed
    now := m.clock.Now()
    thresholdsFirstObservedAt := thresholdsFirstObservedAt(thresholds, m.thresholdsFirstObservedAt, now)
    // the set of node conditions that are triggered by currently observed thresholds
    nodeConditions := nodeConditions(thresholds)
    // track when a node condition was last observed
    nodeConditionsLastObservedAt := nodeConditionsLastObservedAt(nodeConditions, m.nodeConditionsLastObservedAt, now)
    // node conditions report true if it has been observed within the transition period window
    nodeConditions = nodeConditionsObservedSince(nodeConditionsLastObservedAt, m.config.PressureTransitionPeriod, now)
    // determine the set of thresholds we need to drive eviction behavior (i.e. all grace periods are met)
    thresholds = thresholdsMetGracePeriod(thresholdsFirstObservedAt, now)
    // update internal state
    m.Lock()
    m.nodeConditions = nodeConditions
    m.thresholdsFirstObservedAt = thresholdsFirstObservedAt
    m.nodeConditionsLastObservedAt = nodeConditionsLastObservedAt
    m.thresholdsMet = thresholds
    m.lastObservations = observations
    m.Unlock()
    // determine the set of resources under starvation
    starvedResources := getStarvedResources(thresholds)
    if len(starvedResources) == 0 {
        glog.V(3).Infof("eviction manager: no resources are starved")
        return
    }
    // rank the resources to reclaim by eviction priority
    sort.Sort(byEvictionPriority(starvedResources))
    resourceToReclaim := starvedResources[0]
    glog.Warningf("eviction manager: attempting to reclaim %v", resourceToReclaim)
    // determine if this is a soft or hard eviction associated with the resource
    softEviction := isSoftEvictionThresholds(thresholds, resourceToReclaim)
    // record an event about the resources we are now attempting to reclaim via eviction
    m.recorder.Eventf(m.nodeRef, v1.EventTypeWarning, "EvictionThresholdMet", "Attempting to reclaim %s", resourceToReclaim)
    // check if there are node-level resources we can reclaim to reduce pressure before evicting end-user pods
    if m.reclaimNodeLevelResources(resourceToReclaim, observations) {
        glog.Infof("eviction manager: able to reduce %v pressure without evicting pods.", resourceToReclaim)
        return
    }
    glog.Infof("eviction manager: must evict pod(s) to reclaim %v", resourceToReclaim)
    // rank the pods for eviction
    rank, ok := m.resourceToRankFunc[resourceToReclaim]
    if !ok {
        glog.Errorf("eviction manager: no ranking function for resource %s", resourceToReclaim)
        return
    }
    // the only candidates viable for eviction are those pods that had anything running.
    activePods := podFunc()
    if len(activePods) == 0 {
        glog.Errorf("eviction manager: eviction thresholds have been met, but no pods are active to evict")
        return
    }
    // rank the running pods for eviction for the specified resource
    rank(activePods, statsFunc)
    glog.Infof("eviction manager: pods ranked for eviction: %s", format.Pods(activePods))
    // we kill at most a single pod during each eviction interval
    for i := range activePods {
        pod := activePods[i]
        status := v1.PodStatus{
            Phase:   v1.PodFailed,
            Message: fmt.Sprintf(message, resourceToReclaim),
            Reason:  reason,
        }
        // record that we are evicting the pod
        m.recorder.Eventf(pod, v1.EventTypeWarning, reason, fmt.Sprintf(message, resourceToReclaim))
        gracePeriodOverride := int64(0)
        if softEviction {
            gracePeriodOverride = m.config.MaxPodGracePeriodSeconds
        }
        // this is a blocking call and should only return when the pod and its containers are killed.
        err := m.killPodFunc(pod, status, &gracePeriodOverride)
        if err != nil {
            glog.Infof("eviction manager: pod %s failed to evict %v", format.Pod(pod), err)
            continue
        }
        // success, so we return until the next housekeeping interval
        glog.Infof("eviction manager: pod %s evicted successfully", format.Pod(pod))
        return
    }
    glog.Infof("eviction manager: unable to evict any pods from the node")
}
The code is neatly written and thoroughly commented. The key process is as follows (a condensed, runnable sketch of the whole loop follows the list):
1. Register, via buildResourceToRankFunc and buildResourceToNodeReclaimFuncs, the ranking functions used to order pods for eviction for each resource and the reclaim functions used to free node-level resources.
2. Through makeSignalObservations, obtain the eviction SignalObservations and the pods' StatsFunc from cAdvisor data (the StatsFunc is needed later when ranking pods).
3. If the kubelet is started with --experimental-kernel-memcg-notification=true, start the soft and hard memory threshold notifiers via startMemoryThresholdNotifier. The first time system memory usage crosses the soft or hard memory threshold, the kernel notifies the kubelet immediately and evictionManager.synchronize is triggered to reclaim resources, which improves how quickly eviction reacts.
4. Based on the observations computed from the cAdvisor data and the configured thresholds, compute the thresholds currently met via thresholdsMet.
5. Then, from the observations and m.thresholdsMet (thresholds recorded earlier whose minimum reclaim has not yet been satisfied), compute the still-unresolved thresholds via thresholdsMet, and merge them with the thresholds from the previous step.
6. Using the signal timestamps in lastObservations, keep only the thresholds whose stats in observations have been updated since the last sync.
7. Update thresholdsFirstObservedAt and nodeConditions.
8. Keep only the thresholds whose grace period has elapsed between the time they were first observed and now.
9. Update the evictionManager's internal state: nodeConditions, thresholdsFirstObservedAt, nodeConditionsLastObservedAt, thresholdsMet, and lastObservations.
10. Derive the starvedResources from the thresholds and sort them; if memory is among the starved resources, it is ranked first.
11. Take the resource ranked first in starvedResources and call reclaimNodeLevelResources to reclaim that resource at the node level. If, after reclaiming, the available amount satisfies thresholdValue + evictionMinimumReclaim, the flow ends and no user pods need to be evicted.
12. If reclaimNodeLevelResources does not free enough, continue with evicting user pods: first rank all active pods with the ranking function registered earlier by buildResourceToRankFunc.
13. In that order, call killPodNow to kill the selected pod. If killing a pod fails, skip it and try the next pod in order. As soon as one pod is killed successfully, return; at most one pod is killed in this process.
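As referenced above, here is a condensed, self-contained sketch of the whole synchronize loop in plain Go. Everything is simplified: the types, helper functions, and sample data are stand-ins invented for illustration, not the real kubelet implementations.

    package main

    import (
        "errors"
        "fmt"
        "sort"
    )

    // threshold is a simplified stand-in for the real eviction threshold type.
    type threshold struct {
        resource        string // e.g. "memory", "nodefs"
        gracePeriodOver bool   // has the threshold been met for longer than its grace period?
    }

    // Stubs standing in for the real node-level reclaim, pod ranking and pod killing logic.
    func reclaimNodeLevelResources(resource string) bool { return false } // e.g. image GC for disk
    func rankActivePods(resource string) []string        { return []string{"pod-a", "pod-b"} }
    func killPod(pod string) error                       { return errors.New("not implemented in this sketch") }

    func main() {
        // Steps 4-9 (condensed): derive the set of starved resources from the thresholds
        // that are met and past their grace period.
        observed := []threshold{
            {resource: "nodefs", gracePeriodOver: true},
            {resource: "memory", gracePeriodOver: true},
        }
        var starved []string
        for _, t := range observed {
            if t.gracePeriodOver {
                starved = append(starved, t.resource)
            }
        }
        if len(starved) == 0 {
            return // nothing to do this interval
        }

        // Step 10: sort starved resources so that memory is reclaimed first.
        sort.Slice(starved, func(i, j int) bool {
            return starved[i] == "memory" && starved[j] != "memory"
        })
        resourceToReclaim := starved[0]

        // Step 11: try to relieve pressure with node-level reclaim before touching pods.
        if reclaimNodeLevelResources(resourceToReclaim) {
            fmt.Println("pressure relieved without evicting pods")
            return
        }

        // Steps 12-13: rank active pods and kill at most one per synchronize interval.
        for _, pod := range rankActivePods(resourceToReclaim) {
            if err := killPod(pod); err != nil {
                fmt.Println("failed to evict", pod, "-", err)
                continue // try the next pod in ranked order
            }
            fmt.Println("evicted", pod)
            return
        }
        fmt.Println("unable to evict any pod")
    }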
The two most critical steps in the above process are reclaiming node-level resources (reclaimNodeLevelResources) and evicting user pods (killPodNow).
pkg/kubelet/eviction/eviction_manager.go:340
// reclaimNodeLevelResources attempts to reclaim node level resources.
// Returns true if thresholds were satisfied and no pod eviction is required.
func (m *managerImpl) reclaimNodeLevelResources(resourceToReclaim v1.ResourceName, observations signalObservations) bool {
    nodeReclaimFuncs := m.resourceToNodeReclaimFuncs[resourceToReclaim]
    for _, nodeReclaimFunc := range nodeReclaimFuncs {
        // attempt to reclaim the pressured resource.
        reclaimed, err := nodeReclaimFunc()
        if err == nil {
            // update our local observations based on the amount reported to have been reclaimed.
            // note: this is optimistic, other things could have been still consuming the pressured resource in the interim.
            signal := resourceToSignal[resourceToReclaim]
            value, ok := observations[signal]
            if !ok {
                glog.Errorf("eviction manager: unable to find value associated with signal %v", signal)
                continue
            }
            value.available.Add(*reclaimed)
            // evaluate all current thresholds to see if with adjusted observations, we think we have met min reclaim goals
            if len(thresholdsMet(m.thresholdsMet, observations, true)) == 0 {
                return true
            }
        } else {
            glog.Errorf("eviction manager: unexpected error when attempting to reduce %v pressure: %v", resourceToReclaim, err)
        }
    }
    return false
}

pkg/kubelet/pod_workers.go:283
// killPodNow returns a KillPodFunc that can be used to kill a pod.
// It is intended to be injected into other modules that need to kill a pod.
func killPodNow(podWorkers PodWorkers, recorder record.EventRecorder) eviction.KillPodFunc {
    return func(pod *v1.Pod, status v1.PodStatus, gracePeriodOverride *int64) error {
        // determine the grace period to use when killing the pod
        gracePeriod := int64(0)
        if gracePeriodOverride != nil {
            gracePeriod = *gracePeriodOverride
        } else if pod.Spec.TerminationGracePeriodSeconds != nil {
            gracePeriod = *pod.Spec.TerminationGracePeriodSeconds
        }
        // we timeout and return an error if we don't get a callback within a reasonable time.
        // the default timeout is relative to the grace period (we settle on 2s to wait for kubelet->runtime traffic to complete in sigkill)
        timeout := int64(gracePeriod + (gracePeriod / 2))
        minTimeout := int64(2)
        if timeout < minTimeout {
            timeout = minTimeout
        }
        timeoutDuration := time.Duration(timeout) * time.Second
        // open a channel we block against until we get a result
        type response struct {
            err error
        }
        ch := make(chan response)
        podWorkers.UpdatePod(&UpdatePodOptions{
            Pod:        pod,
            UpdateType: kubetypes.SyncPodKill,
            OnCompleteFunc: func(err error) {
                ch ...
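The quoted killPodNow snippet is cut off right at the OnCompleteFunc callback. Based on the upstream Kubernetes source of that era, the function plausibly finishes roughly as sketched below; treat this tail as an assumption rather than a quote from the article.

            // (continuation sketch, assumed from the upstream source, not quoted from this article)
            OnCompleteFunc: func(err error) {
                ch <- response{err: err}
            },
            ...
        })
        // block until the pod worker reports completion, or give up after the timeout
        select {
        case r := <-ch:
            return r.err
        case <-time.After(timeoutDuration):
            return fmt.Errorf("timeout waiting to kill pod")
        }
    }
}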