How to use Kubernetes scheduling algorithm


This article explains how the Kubernetes scheduling algorithm works, offering a detailed analysis that should help anyone looking for a simple and practical way to understand it.

Scheduling process

The scheduler is an independent process. It continually pulls the list of unscheduled pods and the list of schedulable nodes from the apiserver, runs them through a series of algorithms to select a node, binds the pod to that node, and writes the binding result back to the apiserver.
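As a rough illustration of that loop, here is a minimal Go sketch. The type and method names (apiClient, UnscheduledPods, SchedulableNodes, Bind) are stand-ins, not the real client-go API, and pods and nodes are reduced to plain names.

```go
package scheduler

// apiClient is a stand-in for the apiserver client; these method names are
// illustrative only, not the real client-go API.
type apiClient interface {
	UnscheduledPods() []string
	SchedulableNodes() []string
	Bind(pod, node string) error
}

// scheduleLoop mirrors the cycle described above: pull the unscheduled pods
// and the schedulable node list, pick a node for each pod, and write the
// binding result back to the apiserver.
func scheduleLoop(api apiClient, pickNode func(pod string, nodes []string) (string, bool)) {
	for {
		nodes := api.SchedulableNodes()
		for _, pod := range api.UnscheduledPods() {
			if node, ok := pickNode(pod, nodes); ok {
				_ = api.Bind(pod, node) // the binding is persisted via the apiserver
			}
		}
	}
}
```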

Scheduling algorithm

The following explanation is based on the k8s v1.6.6 source code.

The algorithm runs in two stages: filtering and scoring. Filtering first removes nodes that cannot run the pod, guaranteeing that every remaining node is schedulable; scoring then ranks the remaining nodes, and the highest-scoring node becomes the scheduler's output.
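A minimal sketch of those two stages, with pods and nodes reduced to names and the filter and score arguments standing in for the registered predicate and priority functions:

```go
package scheduler

// filterFunc reports whether the named node can run the pod; scoreFunc rates
// a feasible node on a 0-10 scale. Pods and nodes are just names here to keep
// the sketch self-contained.
type (
	filterFunc func(pod, node string) bool
	scoreFunc  func(pod, node string) int
)

// pickNode runs the two stages described above: drop nodes that fail the
// filter, then return the feasible node with the highest score (a tie here
// simply keeps the first candidate; the real scheduler picks one at random).
func pickNode(pod string, nodes []string, filter filterFunc, score scoreFunc) (string, bool) {
	best, bestScore, found := "", -1, false
	for _, n := range nodes {
		if !filter(pod, n) { // filtering stage
			continue
		}
		if s := score(pod, n); s > bestScore { // scoring stage
			best, bestScore, found = n, s, true
		}
	}
	return best, found
}
```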

Filter

The filtering stage is a filter chain made up of multiple filters. Each filter is essentially a function that takes a node and the pod to be scheduled as parameters and returns a bool indicating whether the node is schedulable. Combining multiple such functions yields an extensible filter chain.
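As an illustration of that chain (with made-up names rather than the scheduler's real types), each filter is a function from (pod, node) to bool, and the chain is simply the logical AND of all registered filters:

```go
package scheduler

// predicate is one link of the filter chain: it receives the node and the
// pod to be scheduled and reports whether the node is still schedulable.
type predicate func(pod, node string) bool

// passesAll combines an arbitrary list of predicates into one filter chain:
// a node survives only if every predicate returns true.
func passesAll(predicates ...predicate) predicate {
	return func(pod, node string) bool {
		for _, p := range predicates {
			if !p(pod, node) {
				return false
			}
		}
		return true
	}
}
```

A chain built this way can then be handed to a node-picking routine like the one sketched earlier.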

The filter functions currently registered in k8s are as follows:

NoVolumeZoneConflict: schedulable when the zone label on the host matches the zone label of the PersistentVolumes used by the pod; a host without a zone label carries no zone restriction and is schedulable.

MaxEBSVolumeCount: not schedulable when the AWS EBS volumes mounted on the host exceed the default limit of 39.

MaxGCEPDVolumeCount: not schedulable when the GCE Persistent Disks mounted on the host exceed the default limit of 16.

MaxAzureDiskVolumeCount: not schedulable when the AzureDisk volumes mounted on the host exceed the default limit of 16.

NoDiskConflict: not schedulable when a volume used by the pod conflicts with a volume already in use on the host, as detailed below.

When any volume used by the pod to be scheduled conflicts with a volume used by a pod already scheduled on the host, the pod is not scheduled to that host. This check only applies to GCE PD, Amazon EBS, Ceph RBD and iSCSI. The specific rules are as follows (a simplified sketch of the EBS rule follows this list):

GCE PersistentDisk allows the same volume to be mounted read-only multiple times

EBS forbids two pods from mounting a volume with the same ID

Ceph RBD forbids two pods from sharing the same monitor, pool, and image

iSCSI forbids two pods from sharing the same IQN
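As promised above, here is a minimal sketch of just the EBS rule. Volume IDs are reduced to plain strings; the real predicate inspects the volume sources on the pod spec.

```go
package scheduler

// noEBSDiskConflict is a simplified illustration of the EBS rule above: the
// pod to be scheduled must not use an EBS volume ID that a pod already
// running on the node is using.
func noEBSDiskConflict(podVolumeIDs, nodeVolumeIDs []string) bool {
	inUse := make(map[string]bool, len(nodeVolumeIDs))
	for _, id := range nodeVolumeIDs {
		inUse[id] = true
	}
	for _, id := range podVolumeIDs {
		if inUse[id] {
			return false // two pods would mount the same EBS volume ID
		}
	}
	return true
}
```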

The remaining registered filter functions are:

MatchInterPodAffinity: an inter-pod affinity check. Let X be the pod to be scheduled; the host is schedulable when X is not mutually exclusive with any pod already running on it.

PodToleratesNodeTaints: schedulable only when the pod tolerates all of the host's taints (a pod tolerates a taint by declaring the corresponding tolerations on itself).

CheckNodeMemoryPressure: when the host is under memory pressure, BestEffort pods cannot be scheduled to it.

CheckNodeDiskPressure: when the host is under disk pressure, no pod can be scheduled to it.

PodFitsHostPorts: not schedulable when a HostPort used by any container in the pod to be scheduled conflicts with a port already in use on the node.

PodFitsPorts: replaced by PodFitsHostPorts.

PodFitsResources: not schedulable when the host's total resources minus the total requests of all pods already on the host are less than the requests of the pod to be scheduled; CPU, memory, and GPU are currently checked (a simplified sketch follows this list).

HostName: if the pod to be scheduled specifies pod.Spec.Host, it is schedulable only on the node with that name.

MatchNodeSelector: schedulable when the host's labels match the pod's nodeSelector and the scheduler.alpha.kubernetes.io/affinity annotation.
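For the PodFitsResources rule specifically, a minimal sketch under simplified assumptions (CPU and memory only, with made-up field names):

```go
package scheduler

// resources is a simplified request/capacity pair (CPU in millicores,
// memory in bytes); the real check also covers GPU and uses the API types.
type resources struct {
	MilliCPU int64
	Memory   int64
}

// podFitsResources mirrors the rule above: the pod fits only if the node's
// capacity minus the sum of requests of the pods already on it still covers
// the request of the pod to be scheduled.
func podFitsResources(podRequest, nodeCapacity resources, requestsOnNode []resources) bool {
	var used resources
	for _, r := range requestsOnNode {
		used.MilliCPU += r.MilliCPU
		used.Memory += r.Memory
	}
	return nodeCapacity.MilliCPU-used.MilliCPU >= podRequest.MilliCPU &&
		nodeCapacity.Memory-used.Memory >= podRequest.Memory
}
```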

Scoring

The scoring stage is also a chain of multiple scoring functions. Each scoring function receives a node and the pod to be scheduled as parameters and returns a score in the range 0-10, and each scoring function also has a weight. A node's total score is the sum of (score * weight) over all scoring functions; the node with the highest total score (chosen at random if there is a tie) is the node the pod is finally scheduled to.

Example: suppose a node nodeA is scored by two scoring functions, priorityFunc1 and priorityFunc2 (each returning a score), with weights weight1 and weight2 respectively. The total score of nodeA is: finalScoreNodeA = (weight1 * priorityFunc1) + (weight2 * priorityFunc2)
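The same weighted sum, written as a small Go sketch (the priority functions and weights here are arbitrary placeholders, not the registered k8s ones):

```go
package scheduler

// weightedPriority pairs a scoring function (0-10 per node) with the weight
// that scales its contribution to the node's total.
type weightedPriority struct {
	score  func(pod, node string) int
	weight int
}

// totalScore is the formula from the example above:
// finalScore = sum over all scoring functions of (weight * score).
func totalScore(pod, node string, priorities []weightedPriority) int {
	sum := 0
	for _, p := range priorities {
		sum += p.weight * p.score(pod, node)
	}
	return sum
}
```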

At present, the scoring functions registered in k8s are as follows:

SelectorSpreadPriority (default, weight 1): the more the pods belonging to the same service/RC are spread across nodes, the higher the score.

ServiceSpreadingPriority (weight 1): the more the pods of the same service are spread across nodes, the higher the score; replaced by SelectorSpreadPriority but retained in the system.

InterPodAffinityPriority (default, weight 1): the higher the affinity between the pod and the other pods running on the node, the higher the score.

LeastRequestedPriority (default, weight 1): the more resources left on the node, the higher the score: (cpu((capacity - sum(requested)) * 10 / capacity) + memory((capacity - sum(requested)) * 10 / capacity)) / 2 (see the sketch after this list).

BalancedResourceAllocation (default, weight 1): the closer the CPU and memory utilization are to each other, the higher the score: 10 - abs(cpuFraction - memoryFraction) * 10.

NodePreferAvoidPodsPriority (default, weight 10000): when the node's annotation scheduler.alpha.kubernetes.io/preferAvoidPods is set, the node signals that it does not want pods scheduled on it and scores 0; nodes without the annotation score high. Because of the weight of 10000, unannotated nodes end up 10000 * higher, which effectively filters the annotated node out. (Thought: this could just as well be handled in the filtering stage.)

NodeAffinityPriority (default, weight 1): the better the affinity match between the pod and the node, the higher the score.

TaintTolerationPriority (default, weight 1): the more of the node's taints the pod tolerates, the higher the score.

EqualPriority (not default, weight 1): all nodes receive the same score.

ImageLocalityPriority (not default, weight 1): the pod to be scheduled will use certain images; nodes that already have those images score higher.

MostRequestedPriority (not default, weight 1): the more resources requested, the higher the score; the opposite of LeastRequestedPriority: (cpu(10 * sum(requested) / capacity) + memory(10 * sum(requested) / capacity)) / 2.
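The two arithmetic formulas in the list above, LeastRequestedPriority and BalancedResourceAllocation, written out as a sketch with simplified inputs (raw requested and capacity figures rather than the real resource types):

```go
package scheduler

import "math"

// leastRequestedScore implements the formula from the list:
// (cpu((capacity-sum(requested))*10/capacity) + memory((capacity-sum(requested))*10/capacity)) / 2
// so nodes with more free resources score higher.
func leastRequestedScore(requestedCPU, capacityCPU, requestedMem, capacityMem float64) float64 {
	cpuScore := (capacityCPU - requestedCPU) * 10 / capacityCPU
	memScore := (capacityMem - requestedMem) * 10 / capacityMem
	return (cpuScore + memScore) / 2
}

// balancedResourceScore implements 10 - abs(cpuFraction - memoryFraction) * 10,
// so nodes whose CPU and memory utilization are closest score highest.
func balancedResourceScore(requestedCPU, capacityCPU, requestedMem, capacityMem float64) float64 {
	cpuFraction := requestedCPU / capacityCPU
	memFraction := requestedMem / capacityMem
	return 10 - math.Abs(cpuFraction-memFraction)*10
}
```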

That is the answer to the question of how the Kubernetes scheduling algorithm works. I hope the content above has been helpful; if you still have questions, you can follow the industry information channel to learn more.

