What is the priority queue for Kubernetes Scheduler


This article explains what the priority queue of Kubernetes Scheduler is. The explanation is straightforward and easy to follow; please read along to learn how the scheduler's priority queue works.

Since Kubernetes 1.8, the Scheduler has supported preemptive scheduling based on Pod priority, which I analyzed in depth in my posts on Pod priority-based preemptive scheduling and the Kubernetes 1.8 preemption (Preemption) source code. But that alone is not enough: at the time, the scheduling queue was only a FIFO and did not support priorities. A High Priority Pod could preempt a Lower Priority Pod and then have to line up again at the back of the FIFO queue, so the freed resources were often taken by Lower Priority Pods ahead of it in the queue, leading to starvation of the High Priority Pod. To alleviate this problem, Kubernetes 1.9 introduced a Pod priority scheduling queue, PriorityQueue, which requires the user to enable the PodPriority feature gate.

PriorityQueue Struct

First take a look at the structural definition of PriorityQueue.

type PriorityQueue struct {
    lock sync.RWMutex
    cond sync.Cond

    activeQ             *Heap
    unschedulableQ      *UnschedulablePodsMap
    nominatedPods       map[string][]*v1.Pod
    receivedMoveRequest bool
}

activeQ: one of PriorityQueue's sub-queues, an ordered Heap that stores the pending Pods waiting to be scheduled in order of decreasing Pod priority. The highest-priority Pod sits at the top and is the one returned when the Heap is Popped.

unschedulableQ: the other sub-queue of PriorityQueue, an unordered Map whose key is pod.Name + "_" + pod.Namespace and whose value is the UnSchedulable Pod object that has already been tried and failed to schedule.

nominatedPods: a Map keyed by node name whose value is the list of Nominated Pod objects on that node. When preemptive scheduling happens, the preemptor Pod is given a NominatedNodeName annotation, indicating that after the preemption logic runs, the Pod expects to be scheduled onto that node. The scheduler takes this into account so that, in the window between the preempted low-priority Pods releasing their resources and the preemptor being scheduled again, those resources are not taken by other low-priority Pods. I will analyze how the scheduler handles Nominated Pods in a separate post.

receivedMoveRequest: set to true when the scheduler moves Pods from unschedulableQ to activeQ, and set back to false when the scheduler Pops a Pod from activeQ. It indicates whether the scheduler received a "move request" while it was trying to schedule a Pod. When a scheduling attempt fails with an error, the scheduler tries to put the unschedulable Pod back into the scheduling queue (unschedulableQ or activeQ); only if receivedMoveRequest is false and the Pod condition status is False or Unschedulable is the Pod added to (or updated in) unschedulableQ.

activeQ

activeQ is the Heap that actually provides priority ordering, so let's move on to the implementation of this Heap.

type Heap struct {
    data *heapData
}

type heapData struct {
    items    map[string]*heapItem
    queue    []string
    keyFunc  KeyFunc
    lessFunc LessFunc
}

type heapItem struct {
    obj   interface{} // The object which is stored in the heap.
    index int         // The index of the object's key in the Heap.queue.
}

heapData is the structure that actually stores the items in activeQ:

items: a Map whose key is the object's key in the Heap (generated by the keyFunc below) and whose value is a heapItem object; a heapItem holds the actual Pod object and its index in the Heap.

queue: a string slice that stores the Pods' keys in order; indexes run from 0 upward in order of decreasing Pod priority.

keyFunc: the function that generates the key for a Pod object, in the format meta.GetNamespace() + "/" + meta.GetName().

lessFunc: compares Pod objects in the Heap by Pod priority and thereby determines their indexes (the Pod at index 0 has the highest priority, and priority decreases as the index increases).

NewPriorityQueue

When the scheduler config factory is created, the creation func of podQueue is registered as NewSchedulingQueue. NewSchedulingQueue checks whether the PodPriority feature gate is enabled (as of Kubernetes 1.10 it is disabled by default). If PodPriority is enabled, it invokes NewPriorityQueue to create a PriorityQueue for managing unscheduled Pods; if it is disabled, the familiar FIFO queue is used instead.

func NewSchedulingQueue() SchedulingQueue {
    if util.PodPriorityEnabled() {
        return NewPriorityQueue()
    }
    return NewFIFO()
}

The NewPriorityQueue initialization code is as follows.

// NewPriorityQueue creates a PriorityQueue object.
func NewPriorityQueue() *PriorityQueue {
    pq := &PriorityQueue{
        activeQ:        newHeap(cache.MetaNamespaceKeyFunc, util.HigherPriorityPod),
        unschedulableQ: newUnschedulablePodsMap(),
        nominatedPods:  map[string][]*v1.Pod{},
    }
    pq.cond.L = &pq.lock
    return pq
}

It mainly initializes activeQ, unschedulableQ and nominatedPods.

When newHeap initializes activeQ, it registers the keyFunc and lessFunc for heapData.

When unschedulableQ is initialized, its keyFunc is registered.

cache.MetaNamespaceKeyFunc

When newHeap builds activeQ, two parameters are passed in; the first is the keyFunc: MetaNamespaceKeyFunc.

func MetaNamespaceKeyFunc(obj interface{}) (string, error) {
    if key, ok := obj.(ExplicitKey); ok {
        return string(key), nil
    }
    meta, err := meta.Accessor(obj)
    if err != nil {
        return "", fmt.Errorf("object has no meta: %v", err)
    }
    if len(meta.GetNamespace()) > 0 {
        return meta.GetNamespace() + "/" + meta.GetName(), nil
    }
    return meta.GetName(), nil
}

MetaNamespaceKeyFunc generates the key for a Pod object in the format meta.GetNamespace() + "/" + meta.GetName().

util.HigherPriorityPod

The second parameter passed to newHeap is the lessFunc: HigherPriorityPod.

const (
    DefaultPriorityWhenNoDefaultClassExists = 0
)

func HigherPriorityPod(pod1, pod2 interface{}) bool {
    return GetPodPriority(pod1.(*v1.Pod)) > GetPodPriority(pod2.(*v1.Pod))
}

func GetPodPriority(pod *v1.Pod) int32 {
    if pod.Spec.Priority != nil {
        return *pod.Spec.Priority
    }
    return scheduling.DefaultPriorityWhenNoDefaultClassExists
}

HigherPriorityPod compares Pod objects in the Heap by Pod priority and thereby determines their indexes in the Heap.

The Pod with an index of 0 has the highest priority, and as the index increases, the Pod priority decreases.

Note: if pod.Spec.Priority is nil (meaning there was no global default PriorityClass object in the cluster when the Pod was created), the priority is not filled in from whatever global default PriorityClass currently exists; it is simply treated as 0. Personally, I think treating it as this default value is reasonable.
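To make the ordering concrete, here is a minimal, self-contained sketch (not the scheduler's own code; the two lowercase helpers simply mirror GetPodPriority and HigherPriorityPod shown above): a Pod with a nil Spec.Priority is treated as priority 0 and therefore sorts behind any Pod with a positive priority.

package main

import (
    "fmt"

    v1 "k8s.io/api/core/v1"
)

// getPodPriority mirrors GetPodPriority: a nil Spec.Priority is treated as 0.
func getPodPriority(pod *v1.Pod) int32 {
    if pod.Spec.Priority != nil {
        return *pod.Spec.Priority
    }
    return 0
}

// higherPriorityPod mirrors HigherPriorityPod: "less" means higher priority.
func higherPriorityPod(pod1, pod2 *v1.Pod) bool {
    return getPodPriority(pod1) > getPodPriority(pod2)
}

func main() {
    high := int32(1000)
    p1 := &v1.Pod{Spec: v1.PodSpec{Priority: &high}} // explicit priority 1000
    p2 := &v1.Pod{}                                  // nil priority, treated as 0

    fmt.Println(higherPriorityPod(p1, p2)) // true: p1 would sit ahead of p2 in activeQ
}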

NewUnschedulablePodsMap

unschedulableQ is built by calling newUnschedulablePodsMap, which initializes the pods map of the UnschedulablePodsMap and registers its keyFunc.

func newUnschedulablePodsMap() *UnschedulablePodsMap {
    return &UnschedulablePodsMap{
        pods:    make(map[string]*v1.Pod),
        keyFunc: util.GetPodFullName,
    }
}

func GetPodFullName(pod *v1.Pod) string {
    return pod.Name + "_" + pod.Namespace
}

Note: the key generation rule implemented by the keyFunc of unschedulableQ is pod.Name + "_" + pod.Namespace, which differs from the keyFunc of activeQ (format meta.GetNamespace() + "/" + meta.GetName()). I don't understand why two different formats are used; it would be cleaner to unify on the keyFunc used by activeQ.
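A small self-contained sketch of the difference (the helper names are hypothetical; only the key formats are taken from the code above) shows how the same Pod ends up with two different keys:

package main

import (
    "fmt"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// activeQKey mirrors cache.MetaNamespaceKeyFunc for a namespaced object.
func activeQKey(pod *v1.Pod) string {
    return pod.Namespace + "/" + pod.Name
}

// unschedulableQKey mirrors util.GetPodFullName.
func unschedulableQKey(pod *v1.Pod) string {
    return pod.Name + "_" + pod.Namespace
}

func main() {
    pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{Name: "nginx", Namespace: "default"}}
    fmt.Println(activeQKey(pod))        // "default/nginx"
    fmt.Println(unschedulableQKey(pod)) // "nginx_default"
}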

Add Object to Heap

Now that we understand the structure of PriorityQueue, let's look at how objects are added to the priority Heap (activeQ).

func (h *Heap) Add(obj interface{}) error {
    key, err := h.data.keyFunc(obj)
    if err != nil {
        return cache.KeyError{Obj: obj, Err: err}
    }
    if _, exists := h.data.items[key]; exists {
        h.data.items[key].obj = obj
        heap.Fix(h.data, h.data.items[key].index)
    } else {
        heap.Push(h.data, &itemKeyValue{key, obj})
    }
    return nil
}

func Push(h Interface, x interface{}) {
    h.Push(x)
    up(h, h.Len()-1)
}

func up(h Interface, j int) {
    for {
        i := (j - 1) / 2 // parent
        if i == j || !h.Less(j, i) {
            break
        }
        h.Swap(i, j)
        j = i
    }
}

func (h *heapData) Less(i, j int) bool {
    if i > len(h.queue) || j > len(h.queue) {
        return false
    }
    itemi, ok := h.items[h.queue[i]]
    if !ok {
        return false
    }
    itemj, ok := h.items[h.queue[j]]
    if !ok {
        return false
    }
    return h.lessFunc(itemi.obj, itemj.obj)
}

When a Pod is added to activeQ: if the Pod already exists in the heap, its entry is updated and its index is fixed up according to its priority; otherwise it is pushed onto the heap.

Push is similar to Fix in that the Pod has to be re-sorted within the activeQ heap. The sorting is driven by the Less func, which is ultimately the lessFunc registered on activeQ earlier, i.e. HigherPriorityPod. In other words, both Push and Fix arrange Pods so that indexes run from the highest priority down to the lowest.

Pop Object from Heap

When PriorityQueue is used to manage Pods for scheduling, a Pod is Popped from activeQ; that Pod is the first element of the heap, i.e. the highest-priority Pod.

func (h *Heap) Pop() (interface{}, error) {
    obj := heap.Pop(h.data)
    if obj != nil {
        return obj, nil
    }
    return nil, fmt.Errorf("object was removed from heap data")
}

func Pop(h Interface) interface{} {
    n := h.Len() - 1
    h.Swap(0, n)
    down(h, 0, n)
    return h.Pop()
}

func down(h Interface, i, n int) {
    for {
        j1 := 2*i + 1
        if j1 >= n || j1 < 0 { // j1 < 0 after int overflow
            break
        }
        j := j1 // left child
        if j2 := j1 + 1; j2 < n && !h.Less(j1, j2) {
            j = j2 // = 2*i + 2  // right child
        }
        if !h.Less(j, i) {
            break
        }
        h.Swap(i, j)
        i = j
    }
}

When a Pod is Popped from the activeQ heap, the Less func (again HigherPriorityPod) is likewise what determines which Pod has the highest priority.
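As an illustration of this behavior, here is a self-contained sketch built directly on container/heap (the same standard-library package whose Push/Pop/up/down functions appear above, not the scheduler's Heap type): items pushed in arbitrary order are popped highest-priority first.

package main

import (
    "container/heap"
    "fmt"
)

type podItem struct {
    name     string
    priority int32
}

// podHeap keeps the highest-priority item at index 0, mirroring the lessFunc
// (HigherPriorityPod) registered on activeQ.
type podHeap []podItem

func (h podHeap) Len() int            { return len(h) }
func (h podHeap) Less(i, j int) bool  { return h[i].priority > h[j].priority }
func (h podHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *podHeap) Push(x interface{}) { *h = append(*h, x.(podItem)) }
func (h *podHeap) Pop() interface{} {
    old := *h
    n := len(old)
    item := old[n-1]
    *h = old[:n-1]
    return item
}

func main() {
    h := &podHeap{}
    heap.Push(h, podItem{name: "low", priority: 10})
    heap.Push(h, podItem{name: "high", priority: 1000})
    heap.Push(h, podItem{name: "mid", priority: 100})

    for h.Len() > 0 {
        fmt.Println(heap.Pop(h).(podItem).name) // prints high, mid, low
    }
}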

Pod Queue Handler

Having seen how PriorityQueue works and how Pods enter and leave the Heap, let's return to the scheduler config factory and look at how the EventHandlers registered on podInformer, nodeInformer, serviceInformer, pvcInformer and so on operate on the PriorityQueue.

func NewConfigFactory(...) scheduler.Configurator {
    ...
    // scheduled pod cache
    podInformer.Informer().AddEventHandler(
        cache.FilteringResourceEventHandler{
            FilterFunc: func(obj interface{}) bool {
                switch t := obj.(type) {
                case *v1.Pod:
                    return assignedNonTerminatedPod(t)
                case cache.DeletedFinalStateUnknown:
                    if pod, ok := t.Obj.(*v1.Pod); ok {
                        return assignedNonTerminatedPod(pod)
                    }
                    runtime.HandleError(fmt.Errorf("unable to convert object %T to *v1.Pod in %T", obj, c))
                    return false
                default:
                    runtime.HandleError(fmt.Errorf("unable to handle object in %T: %T", c, obj))
                    return false
                }
            },
            Handler: cache.ResourceEventHandlerFuncs{
                AddFunc:    c.addPodToCache,
                UpdateFunc: c.updatePodInCache,
                DeleteFunc: c.deletePodFromCache,
            },
        },
    )
    // unscheduled pod queue
    podInformer.Informer().AddEventHandler(
        cache.FilteringResourceEventHandler{
            FilterFunc: func(obj interface{}) bool {
                switch t := obj.(type) {
                case *v1.Pod:
                    return unassignedNonTerminatedPod(t)
                case cache.DeletedFinalStateUnknown:
                    if pod, ok := t.Obj.(*v1.Pod); ok {
                        return unassignedNonTerminatedPod(pod)
                    }
                    runtime.HandleError(fmt.Errorf("unable to convert object %T to *v1.Pod in %T", obj, c))
                    return false
                default:
                    runtime.HandleError(fmt.Errorf("unable to handle object in %T: %T", c, obj))
                    return false
                }
            },
            Handler: cache.ResourceEventHandlerFuncs{
                AddFunc:    c.addPodToSchedulingQueue,
                UpdateFunc: c.updatePodInSchedulingQueue,
                DeleteFunc: c.deletePodFromSchedulingQueue,
            },
        },
    )
    // ScheduledPodLister is something we provide to plug-in functions that
    // they may need to call.
    c.scheduledPodLister = assignedPodLister{podInformer.Lister()}

    nodeInformer.Informer().AddEventHandler(
        cache.ResourceEventHandlerFuncs{
            AddFunc:    c.addNodeToCache,
            UpdateFunc: c.updateNodeInCache,
            DeleteFunc: c.deleteNodeFromCache,
        },
    )
    c.nodeLister = nodeInformer.Lister()
    ...
    // This is for MaxPDVolumeCountPredicate: add/delete PVC will affect counts of PV when it is bound.
    pvcInformer.Informer().AddEventHandler(
        cache.ResourceEventHandlerFuncs{
            AddFunc:    c.onPvcAdd,
            UpdateFunc: c.onPvcUpdate,
            DeleteFunc: c.onPvcDelete,
        },
    )
    c.pVCLister = pvcInformer.Lister()

    // This is for ServiceAffinity: affected by the selector of the service is updated.
    // Also, if new service is added, equivalence cache will also become invalid since
    // existing pods may be "captured" by this service and change this predicate result.
    serviceInformer.Informer().AddEventHandler(
        cache.ResourceEventHandlerFuncs{
            AddFunc:    c.onServiceAdd,
            UpdateFunc: c.onServiceUpdate,
            DeleteFunc: c.onServiceDelete,
        },
    )
    c.serviceLister = serviceInformer.Lister()
    ...
}

PodInformer EventHandler for Scheduled Pod

The assignedNonTerminatedPod FilterFunc selects Pods that are already scheduled and not terminated, and Add/Update/Delete event handlers are registered for those Pods; here we only focus on the operations that touch the PriorityQueue.

// assignedNonTerminatedPod selects pods that are assigned and non-terminal (scheduled and running).
func assignedNonTerminatedPod(pod *v1.Pod) bool {
    if len(pod.Spec.NodeName) == 0 {
        return false
    }
    if pod.Status.Phase == v1.PodSucceeded || pod.Status.Phase == v1.PodFailed {
        return false
    }
    return true
}

addPodToCache Handler

The Add event handler for assignedNonTerminatedPods is registered as addPodToCache.

func (c *configFactory) addPodToCache(obj interface{}) {
    ...
    c.podQueue.AssignedPodAdded(pod)
}

// AssignedPodAdded is called when a bound pod is added. Creation of this pod
// may make pending pods with matching affinity terms schedulable.
func (p *PriorityQueue) AssignedPodAdded(pod *v1.Pod) {
    p.movePodsToActiveQueue(p.getUnschedulablePodsWithMatchingAffinityTerm(pod))
}

func (p *PriorityQueue) movePodsToActiveQueue(pods []*v1.Pod) {
    p.lock.Lock()
    defer p.lock.Unlock()
    for _, pod := range pods {
        if err := p.activeQ.Add(pod); err == nil {
            p.unschedulableQ.delete(pod)
        } else {
            glog.Errorf("Error adding pod %v to the scheduling queue: %v", pod.Name, err)
        }
    }
    p.receivedMoveRequest = true
    p.cond.Broadcast()
}

// getUnschedulablePodsWithMatchingAffinityTerm returns unschedulable pods which have
// any affinity term that matches "pod".
func (p *PriorityQueue) getUnschedulablePodsWithMatchingAffinityTerm(pod *v1.Pod) []*v1.Pod {
    p.lock.RLock()
    defer p.lock.RUnlock()
    var podsToMove []*v1.Pod
    for _, up := range p.unschedulableQ.pods {
        affinity := up.Spec.Affinity
        if affinity != nil && affinity.PodAffinity != nil {
            terms := predicates.GetPodAffinityTerms(affinity.PodAffinity)
            for _, term := range terms {
                namespaces := priorityutil.GetNamespacesFromPodAffinityTerm(up, &term)
                selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
                if err != nil {
                    glog.Errorf("Error getting label selectors for pod: %v.", up.Name)
                }
                if priorityutil.PodMatchesTermsNamespaceAndSelector(pod, namespaces, selector) {
                    podsToMove = append(podsToMove, up)
                    break
                }
            }
        }
    }
    return podsToMove
}

In addition to adding the pod to the schedulerCache, addPodToCache also calls podQueue.AssignedPodAdded.

For the PriorityQueue, AssignedPodAdded checks which pods in unschedulableQ have a pod-affinity term matching the newly bound pod, moves those pods from unschedulableQ to activeQ, and lets them wait there for scheduling.
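To make the affinity check concrete, here is a self-contained sketch (the pod and term are hypothetical; metav1.LabelSelectorAsSelector is the same call used in getUnschedulablePodsWithMatchingAffinityTerm above) of a required pod-affinity term on a pending pod matching the labels of a pod that has just been bound:

package main

import (
    "fmt"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/labels"
)

func main() {
    // The pod that was just bound to a node.
    bound := &v1.Pod{ObjectMeta: metav1.ObjectMeta{
        Namespace: "default",
        Labels:    map[string]string{"app": "db"},
    }}

    // A required pod-affinity term carried by a pending pod in unschedulableQ.
    term := v1.PodAffinityTerm{
        LabelSelector: &metav1.LabelSelector{MatchLabels: map[string]string{"app": "db"}},
        TopologyKey:   "kubernetes.io/hostname",
    }

    selector, err := metav1.LabelSelectorAsSelector(term.LabelSelector)
    if err != nil {
        panic(err)
    }
    // true: the bound pod satisfies the term, so the pending pod becomes a
    // candidate to move from unschedulableQ to activeQ.
    fmt.Println(selector.Matches(labels.Set(bound.Labels)))
}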

Notice here that receivedMoveRequest is set to true in movePodsToActiveQueue.

func (p *PriorityQueue) AddUnschedulableIfNotPresent(pod *v1.Pod) error {
    p.lock.Lock()
    defer p.lock.Unlock()
    if p.unschedulableQ.get(pod) != nil {
        return fmt.Errorf("pod is already present in unschedulableQ")
    }
    if _, exists, _ := p.activeQ.Get(pod); exists {
        return fmt.Errorf("pod is already present in the activeQ")
    }
    if !p.receivedMoveRequest && isPodUnschedulable(pod) {
        p.unschedulableQ.addOrUpdate(pod)
        p.addNominatedPodIfNeeded(pod)
        return nil
    }
    err := p.activeQ.Add(pod)
    if err == nil {
        p.addNominatedPodIfNeeded(pod)
        p.cond.Broadcast()
    }
    return err
}

AddUnschedulableIfNotPresent is called when a scheduling attempt for a Pod ends with an error, to put the Pod back into the scheduling queue: if receivedMoveRequest is false and the Pod condition status is False or Unschedulable, the Pod is added to (or updated in) unschedulableQ; otherwise it is added to activeQ.

A wrongly set receivedMoveRequest can therefore cause a Pod that should have gone to unschedulableQ to be put into activeQ instead, which makes the scheduler perform one more pointless scheduling attempt. The impact on performance is small, but it is still wasted work.

And there does appear to be such a flaw: if the podsToMove slice returned by getUnschedulablePodsWithMatchingAffinityTerm is empty, no Pod is actually moved from unschedulableQ to activeQ, so the "move request" is effectively a no-op, yet receivedMoveRequest is still set to true when it should arguably remain false.
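Reusing the PriorityQueue type and the glog logging from the code above, a minimal sketch of that adjustment (the fix the author implies, not the upstream implementation) could look like this:

func (p *PriorityQueue) movePodsToActiveQueue(pods []*v1.Pod) {
    p.lock.Lock()
    defer p.lock.Unlock()
    moved := false
    for _, pod := range pods {
        if err := p.activeQ.Add(pod); err == nil {
            p.unschedulableQ.delete(pod)
            moved = true // count only pods that actually reached activeQ
        } else {
            glog.Errorf("Error adding pod %v to the scheduling queue: %v", pod.Name, err)
        }
    }
    if moved {
        // Flag a move request only when something was really moved.
        p.receivedMoveRequest = true
        p.cond.Broadcast()
    }
}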

updatePodInCache

The Update event handler for assignedNonTerminatedPods is registered as updatePodInCache.

func (c *configFactory) updatePodInCache(oldObj, newObj interface{}) {
    ...
    c.podQueue.AssignedPodUpdated(newPod)
}

// AssignedPodUpdated is called when a bound pod is updated. Change of labels
// may make pending pods with matching affinity terms schedulable.
func (p *PriorityQueue) AssignedPodUpdated(pod *v1.Pod) {
    p.movePodsToActiveQueue(p.getUnschedulablePodsWithMatchingAffinityTerm(pod))
}

The operation on podQueue in updatePodInCache is AssignedPodUpdated, and its implementation is the same as AssignedPodAdded.

deletePodFromCache

The Delete event handler for assignedNonTerminatedPods is registered as deletePodFromCache.

func (c *configFactory) deletePodFromCache(obj interface{}) {
    ...
    c.podQueue.MoveAllToActiveQueue()
}

func (p *PriorityQueue) MoveAllToActiveQueue() {
    p.lock.Lock()
    defer p.lock.Unlock()
    for _, pod := range p.unschedulableQ.pods {
        if err := p.activeQ.Add(pod); err != nil {
            glog.Errorf("Error adding pod %v to the scheduling queue: %v", pod.Name, err)
        }
    }
    p.unschedulableQ.clear()
    p.receivedMoveRequest = true
    p.cond.Broadcast()
}

When a Delete event for an assignedNonTerminatedPod occurs, podQueue.MoveAllToActiveQueue is called to move all Pods in unschedulableQ to activeQ, and unschedulableQ is emptied.

If Pods are deleted frequently in the cluster, all Pods in unschedulableQ will be moved to activeQ frequently. If there are High Priority Pods in unschedulableQ, they will repeatedly preempt the scheduling opportunities of Lower Priority Pods, starving the Lower Priority Pods for a long time. The community is already considering adding a corresponding back-off mechanism to mitigate this.

PodInformer EventHandler for UnScheduled Pod

The unassignedNonTerminatedPod FilterFunc selects NonTerminated Pods that have not yet been scheduled, and Add/Update/Delete event handlers are registered for those Pods; again we only focus on the operations that touch the PriorityQueue.

// unassignedNonTerminatedPod selects pods that are unassigned and non-terminal.
func unassignedNonTerminatedPod(pod *v1.Pod) bool {
    if len(pod.Spec.NodeName) != 0 {
        return false
    }
    if pod.Status.Phase == v1.PodSucceeded || pod.Status.Phase == v1.PodFailed {
        return false
    }
    return true
}

addPodToSchedulingQueue

The Add event handler for unassignedNonTerminatedPods is registered as addPodToSchedulingQueue.

func (c *configFactory) addPodToSchedulingQueue(obj interface{}) {
    if err := c.podQueue.Add(obj.(*v1.Pod)); err != nil {
        runtime.HandleError(fmt.Errorf("unable to queue %T: %v", obj, err))
    }
}

func (p *PriorityQueue) Add(pod *v1.Pod) error {
    p.lock.Lock()
    defer p.lock.Unlock()
    err := p.activeQ.Add(pod)
    if err != nil {
        glog.Errorf("Error adding pod %v to the scheduling queue: %v", pod.Name, err)
    } else {
        if p.unschedulableQ.get(pod) != nil {
            glog.Errorf("Error: pod %v is already in the unschedulable queue.", pod.Name)
            p.deleteNominatedPodIfExists(pod)
            p.unschedulableQ.delete(pod)
        }
        p.addNominatedPodIfNeeded(pod)
        p.cond.Broadcast()
    }
    return err
}

When an Add event for an unassigned Pod arrives, addPodToSchedulingQueue adds the Pod to activeQ and makes sure it is not left behind in unschedulableQ.

updatePodInSchedulingQueue

The Update event handler for unassignedNonTerminatedPods is registered as updatePodInSchedulingQueue.

func (c *configFactory) updatePodInSchedulingQueue(oldObj, newObj interface{}) {
    pod := newObj.(*v1.Pod)
    if c.skipPodUpdate(pod) {
        return
    }
    if err := c.podQueue.Update(oldObj.(*v1.Pod), pod); err != nil {
        runtime.HandleError(fmt.Errorf("unable to update %T: %v", newObj, err))
    }
}

In updatePodInSchedulingQueue, skipPodUpdate is called first to check whether the pod update event can be ignored.

If the update cannot be ignored, podQueue.Update is invoked to update activeQ; if the pod is not in activeQ, it is deleted from unschedulableQ and the new pod is pushed into activeQ.

func (c *configFactory) skipPodUpdate(pod *v1.Pod) bool {
    // Non-assumed pods should never be skipped.
    isAssumed, err := c.schedulerCache.IsAssumedPod(pod)
    if err != nil {
        runtime.HandleError(fmt.Errorf("failed to check whether pod %s/%s is assumed: %v", pod.Namespace, pod.Name, err))
        return false
    }
    if !isAssumed {
        return false
    }

    // Gets the assumed pod from the cache.
    assumedPod, err := c.schedulerCache.GetPod(pod)
    if err != nil {
        runtime.HandleError(fmt.Errorf("failed to get assumed pod %s/%s from cache: %v", pod.Namespace, pod.Name, err))
        return false
    }

    // Compares the assumed pod in the cache with the pod update. If they are
    // equal (with certain fields excluded), this pod update will be skipped.
    f := func(pod *v1.Pod) *v1.Pod {
        p := pod.DeepCopy()
        // ResourceVersion must be excluded because each object update will
        // have a new resource version.
        p.ResourceVersion = ""
        // Spec.NodeName must be excluded because the pod assumed in the cache
        // is expected to have a node assigned while the pod update may or may
        // not have this field set.
        p.Spec.NodeName = ""
        // Annotations must be excluded for the reasons described in
        // https://github.com/kubernetes/kubernetes/issues/52914.
        p.Annotations = nil
        return p
    }
    assumedPodCopy, podCopy := f(assumedPod), f(pod)
    if !reflect.DeepEqual(assumedPodCopy, podCopy) {
        return false
    }
    glog.V(3).Infof("Skipping pod %s/%s update", pod.Namespace, pod.Name)
    return true
}

skipPodUpdate returns true, meaning the pod update event is ignored, only when both of the following hold (a sketch of the comparison follows the list).

The pod has already been Assumed: the pod is found in the assumedPods set of the scheduler cache. (A pod is marked Assumed as soon as it has passed the scheduler's Predicate and Priority phases, before the apiserver Bind call is made.)

The pod update only touches ResourceVersion, Spec.NodeName and/or Annotations.
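The comparison itself can be illustrated with a self-contained sketch (stripVolatileFields is a hypothetical name; the stripped fields are exactly the ones listed in skipPodUpdate above):

package main

import (
    "fmt"
    "reflect"

    v1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// stripVolatileFields clears the fields skipPodUpdate ignores before comparing.
func stripVolatileFields(pod *v1.Pod) *v1.Pod {
    p := pod.DeepCopy()
    p.ResourceVersion = ""
    p.Spec.NodeName = ""
    p.Annotations = nil
    return p
}

func main() {
    assumed := &v1.Pod{
        ObjectMeta: metav1.ObjectMeta{Name: "web", Namespace: "default", ResourceVersion: "100"},
        Spec:       v1.PodSpec{NodeName: "node-1"},
    }
    updated := &v1.Pod{
        ObjectMeta: metav1.ObjectMeta{Name: "web", Namespace: "default", ResourceVersion: "101"},
    }

    // true: only the ignored fields differ, so this update would be skipped.
    fmt.Println(reflect.DeepEqual(stripVolatileFields(assumed), stripVolatileFields(updated)))
}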

func (p *PriorityQueue) Update(oldPod, newPod *v1.Pod) error {
    p.lock.Lock()
    defer p.lock.Unlock()
    // If the pod is already in the active queue, just update it there.
    if _, exists, _ := p.activeQ.Get(newPod); exists {
        p.updateNominatedPod(oldPod, newPod)
        err := p.activeQ.Update(newPod)
        return err
    }
    // If the pod is in the unschedulable queue, updating it may make it schedulable.
    if usPod := p.unschedulableQ.get(newPod); usPod != nil {
        p.updateNominatedPod(oldPod, newPod)
        if isPodUpdated(oldPod, newPod) {
            p.unschedulableQ.delete(usPod)
            err := p.activeQ.Add(newPod)
            if err == nil {
                p.cond.Broadcast()
            }
            return err
        }
        p.unschedulableQ.addOrUpdate(newPod)
        return nil
    }
    // If pod is not in any of the two queues, we put it in the active queue.
    err := p.activeQ.Add(newPod)
    if err == nil {
        p.addNominatedPodIfNeeded(newPod)
        p.cond.Broadcast()
    }
    return err
}

When skipPodUpdate returns false, PriorityQueue.Update is then called:

If the pod is already in activeQ, update it.

If the pod is in unschedulableQ, check whether this is a substantive update (ResourceVersion, Generation and PodStatus are ignored in the comparison).

If it is a substantive update, the old entry is deleted from unschedulableQ and the updated pod is added to activeQ to wait for scheduling.

If it is not, only the pod information stored in unschedulableQ is updated.

If the pod is in neither activeQ nor unschedulableQ, it is added to activeQ.

deletePodFromSchedulingQueue

The Delete event handler for unassignedNonTerminatedPods is registered as deletePodFromSchedulingQueue.

func (c *configFactory) deletePodFromSchedulingQueue(obj interface{}) {
    ...
    if err := c.podQueue.Delete(pod); err != nil {
        runtime.HandleError(fmt.Errorf("unable to dequeue %T: %v", obj, err))
    }
}

func (p *PriorityQueue) Delete(pod *v1.Pod) error {
    p.lock.Lock()
    defer p.lock.Unlock()
    p.deleteNominatedPodIfExists(pod)
    err := p.activeQ.Delete(pod)
    if err != nil {
        // The item was probably not found in the activeQ.
        p.unschedulableQ.delete(pod)
    }
    return nil
}

deletePodFromSchedulingQueue simply calls podQueue.Delete to remove the pod from activeQ or unschedulableQ.

Node Informer

NodeInformer registers Node Add/Update/Delete event handlers; here we only focus on how those handlers operate on the PriorityQueue.

addNodeToCache and updateNodeInCache

The Add Node event handler is registered as addNodeToCache.

The Update Node event handler is registered as updateNodeInCache.

The Delete Node event handler is registered as deleteNodeFromCache.

func (c *configFactory) addNodeToCache(obj interface{}) {
    ...
    c.podQueue.MoveAllToActiveQueue()
}

func (c *configFactory) updateNodeInCache(oldObj, newObj interface{}) {
    ...
    c.podQueue.MoveAllToActiveQueue()
}

addNodeToCache and updateNodeInCache perform the same operation on the PriorityQueue: they call PriorityQueue.MoveAllToActiveQueue to move all Pods in unschedulableQ to activeQ. In other words, whenever a Node is added or updated in the cluster, every Pod that has not yet been scheduled successfully is re-queued in activeQ to wait for scheduling.

deleteNodeFromCache does not touch the PodQueue.

As mentioned in "PodInformer EventHandler for Scheduled Pod", frequent Node additions or updates in the cluster will cause all Pods in unschedulableQ to be moved to activeQ frequently. If there are High Priority Pods in unschedulableQ, they will repeatedly preempt the scheduling opportunities of Lower Priority Pods, starving them for a long time.

ServiceInformer

ServiceInformer registers Service Add/Update/Delete event handlers; here we only focus on how those handlers operate on the PriorityQueue.

The Add Service event handler is registered as onServiceAdd.

The Update Service event handler is registered as onServiceUpdate.

The Delete Service event handler is registered as onServiceDelete.

func (c *configFactory) onServiceAdd(obj interface{}) {
    ...
    c.podQueue.MoveAllToActiveQueue()
}

func (c *configFactory) onServiceUpdate(oldObj interface{}, newObj interface{}) {
    ...
    c.podQueue.MoveAllToActiveQueue()
}

func (c *configFactory) onServiceDelete(obj interface{}) {
    ...
    c.podQueue.MoveAllToActiveQueue()
}

Service's Add/Update/Delete event handlers all perform the same operation on the podQueue: they call PriorityQueue.MoveAllToActiveQueue to move all Pods in unschedulableQ to activeQ, which means that whenever a Service is added, updated or deleted in the cluster, all unsuccessfully scheduled Pods are re-queued in activeQ to wait for scheduling.

As mentioned in "PodInformer EventHandler for Scheduled Pod", frequent Service Add/Update/Delete actions in the cluster will cause all Pods in unschedulableQ to be moved to activeQ frequently. If there are High Priority Pods in unschedulableQ, they will repeatedly preempt the scheduling opportunities of Lower Priority Pods, starving them for a long time.

PVC Informer

PvcInformer registers PVC Add/Update/Delete event handlers; here we only focus on how those handlers operate on the PriorityQueue.

The Add PVC event handler is registered as onPvcAdd.

The Update PVC event handler is registered as onPvcUpdate.

The Delete PVC event handler is registered as onPvcDelete.

func (c *configFactory) onPvcAdd(obj interface{}) {
    ...
    c.podQueue.MoveAllToActiveQueue()
}

func (c *configFactory) onPvcUpdate(old, new interface{}) {
    ...
    c.podQueue.MoveAllToActiveQueue()
}

For both PVC Add and Update events the scheduler does the same thing: it calls PriorityQueue.MoveAllToActiveQueue to move all Pods in unschedulableQ to activeQ, which means that whenever a PVC is added or updated in the cluster, all unsuccessfully scheduled Pods are re-queued in activeQ to wait for scheduling.

PVC Delete does not touch the PodQueue.

PV Add/Update/Delete does not touch the PodQueue either.

As mentioned in "PodInformer EventHandler for Scheduled Pod", frequent PVC Add/Update actions in the cluster will cause all Pods in unschedulableQ to be moved to activeQ frequently. If there are High Priority Pods in unschedulableQ, they will repeatedly preempt the scheduling opportunities of Lower Priority Pods, starving them for a long time.

Thank you for reading; that concludes "What is the priority queue of Kubernetes Scheduler". Hopefully you now have a deeper understanding of how the scheduler's priority queue works; the specific behavior is best verified in practice.
