
How to configure the K8s dynamic scheduler


This article explains how to configure the K8s dynamic scheduler. The content is straightforward and easy to follow; read on to study it step by step.

Introduction

While operating K8s clusters, we are often troubled by nodes whose CPU and memory utilization is too high, which both affects the stable operation of the Pods on those nodes and increases the probability of node failure. To cope with high node load and balance resource utilization across nodes, two strategies should be applied based on the nodes' actual resource utilization monitoring data:

In the Pod scheduling phase, preferentially schedule Pods onto nodes with low resource utilization rather than onto nodes with high resource utilization.

When monitoring shows that a node's resource utilization is high, automatically intervene and migrate some Pods from that node to nodes with low utilization.

For this we provide a Dynamic Scheduler + Descheduler solution. Installation entries for both plug-ins are currently available under Component Management > Scheduling in public cloud TKE clusters. A best-practice example from a real customer case is given at the end of this article.

Dynamic scheduler

The native Kubernetes scheduler has some good policies for dealing with uneven node resource allocation, such as BalancedResourceAllocation, but these work on static resource allocation (requests), which does not reflect real resource usage, so the CPU/memory utilization of nodes often remains unbalanced. A policy is therefore needed that schedules based on each node's actual resource utilization. This is what the dynamic scheduler does.

Technical principle

The native K8s scheduler provides the scheduler extender mechanism for extending scheduling. Compared with modifying the native scheduler's code to add policies, or implementing a custom scheduler, a scheduler extender is less intrusive and more flexible, so we chose the scheduler-extender approach to add scheduling policies based on the nodes' actual resource utilization.

A scheduler extender can add custom logic during the native scheduler's pre-selection (filter) and optimization (prioritize) phases, with the same effect as the scheduler's built-in policies.
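To make the mechanism concrete, here is a minimal, self-contained sketch of an extender: an HTTP service the kube-scheduler calls during both phases. The payload structs are simplified stand-ins for the real extender wire types, and all names, ports, thresholds, and utilization numbers are illustrative assumptions, not the TKE implementation.

```go
// Minimal scheduler-extender sketch: /filter rejects hot nodes in the
// pre-selection phase, /prioritize favors cold nodes in the optimization phase.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Simplified stand-ins for the extender request/response payloads.
type extenderArgs struct {
	PodName   string   `json:"podName"`
	NodeNames []string `json:"nodeNames"`
}

type filterResult struct {
	NodeNames   []string          `json:"nodeNames"`
	FailedNodes map[string]string `json:"failedNodes"`
}

type hostPriority struct {
	Host  string `json:"host"`
	Score int64  `json:"score"`
}

// In the real system this data would come from node annotations kept fresh
// by a component like node-annotator; hard-coded here for illustration.
var cpuUtil = map[string]float64{"node-a": 0.25, "node-b": 0.85}

const cpuThreshold = 0.80 // hypothetical pre-selection threshold

func filter(w http.ResponseWriter, r *http.Request) {
	var args extenderArgs
	if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	res := filterResult{FailedNodes: map[string]string{}}
	for _, n := range args.NodeNames {
		if cpuUtil[n] > cpuThreshold {
			res.FailedNodes[n] = "cpu utilization above threshold"
			continue
		}
		res.NodeNames = append(res.NodeNames, n)
	}
	json.NewEncoder(w).Encode(res)
}

func prioritize(w http.ResponseWriter, r *http.Request) {
	var args extenderArgs
	if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	var scores []hostPriority
	for _, n := range args.NodeNames {
		// Lower utilization -> higher score, on a 0..10 scale.
		scores = append(scores, hostPriority{Host: n, Score: int64((1 - cpuUtil[n]) * 10)})
	}
	json.NewEncoder(w).Encode(scores)
}

func main() {
	http.HandleFunc("/filter", filter)
	http.HandleFunc("/prioritize", prioritize)
	log.Fatal(http.ListenAndServe(":8888", nil))
}
```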

Architecture

node-annotator: pulls monitoring data from Prometheus and periodically syncs it to Node annotations; it also handles other logic, such as the metrics that measure the dynamic scheduler's scheduling effectiveness and scheduling-hotspot prevention.

dynamic-scheduler: implements the scheduler extender's pre-selection and optimization interfaces; it filters out nodes whose resource utilization exceeds the thresholds in the pre-selection stage, and prefers nodes with low resource utilization in the optimization stage.
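To illustrate the node-annotator's role, here is a minimal sketch of one sync pass. The annotation keys and both helper functions are hypothetical stand-ins, not the product's real names; a real annotator would use the Prometheus HTTP API and a client-go node patch, and run on a ticker.

```go
// Sketch of a node-annotator sync pass: fetch utilization from monitoring
// and write it to node annotations for the scheduler extender to read.
package main

import "fmt"

// queryPrometheus stands in for a real Prometheus HTTP API query.
func queryPrometheus(node, metric string) float64 { return 0.42 }

// patchNodeAnnotation stands in for a client-go node patch call.
func patchNodeAnnotation(node, key, value string) {
	fmt.Printf("patch node %s: %s=%s\n", node, key, value)
}

func main() {
	nodes := []string{"node-a", "node-b"}
	// A real annotator would repeat this loop periodically (e.g. every 30s).
	for _, n := range nodes {
		cpu := queryPrometheus(n, "cpu_usage_avg_5m")
		mem := queryPrometheus(n, "mem_usage_avg_5m")
		patchNodeAnnotation(n, "extender.example.io/cpu-usage-avg-5m", fmt.Sprintf("%.2f", cpu))
		patchNodeAnnotation(n, "extender.example.io/mem-usage-avg-5m", fmt.Sprintf("%.2f", mem))
	}
}
```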

Implementation details

How are the weights of the dynamic scheduler's optimization-phase policies configured?

The native scheduler's policies each have a weight configuration in the optimization phase: each policy's score is multiplied by its weight and summed into the node's total score, so the higher a policy's weight, the more likely the nodes it favors are to be selected. By default every policy has a weight of 1. To strengthen the effect of the dynamic scheduler's policy, we set the weight of its optimization policy to 2.
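As a small worked example of how the weight takes effect (all per-node scores below are made up):

```go
// Weighted priority aggregation: the dynamic-scheduler policy's weight of 2
// doubles its pull relative to a native policy with the default weight of 1.
package main

import "fmt"

func main() {
	// Hypothetical per-node scores (0..10) from two policies.
	native := map[string]int{"node-a": 5, "node-b": 7}  // weight 1
	dynamic := map[string]int{"node-a": 9, "node-b": 3} // weight 2
	for _, n := range []string{"node-a", "node-b"} {
		total := 1*native[n] + 2*dynamic[n]
		fmt.Printf("%s total score: %d\n", n, total)
	}
	// node-a: 5 + 18 = 23, node-b: 7 + 6 = 13 -> node-a wins thanks to
	// the doubled dynamic-scheduler score.
}
```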

How does the dynamic scheduler prevent scheduling hotspots?

When a new node joins the cluster, too many Pods could be scheduled onto it at once. To prevent this, we watch the scheduler's scheduling-success events to obtain the scheduling results, track how many Pods were scheduled onto each node over recent windows (for example 1 min, 5 min, and 30 min), derive a hotspot value for each node from these counts, and apply it as compensation to the node's optimization score.
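A minimal sketch of such hotspot compensation follows; the window weights and counts are illustrative assumptions, not the product's actual values.

```go
// Hotspot compensation sketch: recent per-window scheduling counts become a
// penalty subtracted from the node's optimization score, so a node that just
// received a burst of Pods is scored lower even if its utilization is low.
package main

import "fmt"

type scheduleCounts struct {
	last1m, last5m, last30m int // Pods scheduled onto the node per window
}

// hotspotPenalty grows as more Pods land on the node in short windows.
func hotspotPenalty(c scheduleCounts) float64 {
	return 0.5*float64(c.last1m) + 0.2*float64(c.last5m) + 0.05*float64(c.last30m)
}

func main() {
	fresh := scheduleCounts{last1m: 4, last5m: 6, last30m: 8} // newly added node
	base := 9.0 // low utilization would otherwise give a high score
	fmt.Printf("compensated score: %.2f\n", base-hotspotPenalty(fresh))
}
```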

Product capability

Component dependency

The component has few dependencies, relying only on the basic node monitoring components node-exporter and Prometheus. Both managed and self-built Prometheus are supported: with managed Prometheus the dynamic scheduler can be installed with one click, and for self-built Prometheus a monitoring-metric configuration method is also provided.

Component configuration

Scheduling policies can currently be based on both CPU and memory resource utilization.

Pre-selection stage

Thresholds can be configured for a node's 5-minute average CPU utilization, 1-hour maximum CPU utilization, 5-minute average memory utilization, and 1-hour maximum memory utilization. Nodes exceeding any of these thresholds are filtered out in the pre-selection stage.
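A minimal sketch of this check, with illustrative threshold values rather than product defaults:

```go
// Pre-selection sketch: a node is filtered out if any of the four configured
// utilization thresholds is exceeded.
package main

import "fmt"

type nodeStats struct {
	cpuAvg5m, cpuMax1h, memAvg5m, memMax1h float64
}

type thresholds struct {
	cpuAvg5m, cpuMax1h, memAvg5m, memMax1h float64
}

func passesPreselection(s nodeStats, t thresholds) bool {
	return s.cpuAvg5m <= t.cpuAvg5m &&
		s.cpuMax1h <= t.cpuMax1h &&
		s.memAvg5m <= t.memAvg5m &&
		s.memMax1h <= t.memMax1h
}

func main() {
	t := thresholds{cpuAvg5m: 0.7, cpuMax1h: 0.8, memAvg5m: 0.7, memMax1h: 0.8}
	hot := nodeStats{cpuAvg5m: 0.75, cpuMax1h: 0.9, memAvg5m: 0.5, memMax1h: 0.6}
	fmt.Println("hot node schedulable:", passesPreselection(hot, t)) // false
}
```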

Optimization phase

The dynamic scheduler's optimization-stage score is a composite of the six indicators shown in the screenshot, and each indicator's weight expresses how strongly that indicator is emphasized during optimization. The point of using the 1-hour and 1-day maximum utilization is to record each node's peak utilization over those windows: some services' Pods peak hourly or daily, and this avoids scheduling new Pods onto nodes whose load would rise further at peak time.
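A sketch of such a weighted score follows. The screenshot itself is not reproduced here, so the indicator set (5-minute average, 1-hour maximum, and 1-day maximum for both CPU and memory) is inferred from the surrounding text, and the weights and utilization values are made up.

```go
// Optimization-stage scoring sketch: a weighted sum over six utilization
// indicators, where lower utilization yields a higher score.
package main

import "fmt"

type indicators struct {
	cpuAvg5m, cpuMax1h, cpuMax1d float64
	memAvg5m, memMax1h, memMax1d float64
}

type weights indicators

func score(u indicators, w weights) float64 {
	s := 0.0
	s += w.cpuAvg5m * (1 - u.cpuAvg5m)
	s += w.cpuMax1h * (1 - u.cpuMax1h)
	s += w.cpuMax1d * (1 - u.cpuMax1d)
	s += w.memAvg5m * (1 - u.memAvg5m)
	s += w.memMax1h * (1 - u.memMax1h)
	s += w.memMax1d * (1 - u.memMax1d)
	return s
}

func main() {
	w := weights{cpuAvg5m: 0.2, cpuMax1h: 0.3, cpuMax1d: 0.5,
		memAvg5m: 0.2, memMax1h: 0.3, memMax1d: 0.5}
	quiet := indicators{0.2, 0.3, 0.5, 0.3, 0.4, 0.5}
	busy := indicators{0.6, 0.9, 0.95, 0.7, 0.9, 0.95}
	// The quiet node scores far higher, so it is preferred for scheduling.
	fmt.Printf("quiet node: %.2f, busy node: %.2f\n", score(quiet, w), score(busy, w))
}
```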

Product effect

To measure how well the dynamic scheduler steers Pods onto low-load nodes, we take the CPU/memory utilization of each scheduled-to node at scheduling time from the scheduler's actual scheduling results and compute the following metrics:

cpu_utilization_total_avg: Average CPU utilization of all scheduled nodes.

memory_utilization_total_avg: Average memory utilization of all scheduled nodes.

effective_dynamic_schedule_count: number of effective schedulings. When the scheduled-to node's CPU utilization is below the median CPU utilization of all current nodes, we count the scheduling as effective and add 0.5 to effective_dynamic_schedule_count; the same applies for memory.

total_schedule_count: total number of schedulings, incremented by 1 for each new scheduling.

effective_schedule_ratio: effective scheduling ratio, i.e. effective_dynamic_schedule_count / total_schedule_count.

The table below shows these metrics for the same cluster over one week, first without and then with dynamic scheduling enabled; the improvement in cluster scheduling is clear.

Indicator                          Dynamic scheduling disabled   Dynamic scheduling enabled
cpu_utilization_total_avg          0.30                          0.17
memory_utilization_total_avg       0.28                          0.23
effective_dynamic_schedule_count   2160                          3620
total_schedule_count               7860                          7470
effective_schedule_ratio           0.273                         0.486
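As a quick sanity check, the ratios in the last row can be recomputed from the counts above; the small differences from the published figures presumably reflect rounding in the underlying counts.

```go
// Recompute effective_schedule_ratio = effective_dynamic_schedule_count /
// total_schedule_count from the table's values.
package main

import "fmt"

func main() {
	fmt.Printf("disabled: %.3f\n", 2160.0/7860.0) // ~0.275; table reports 0.273
	fmt.Printf("enabled:  %.3f\n", 3620.0/7470.0) // ~0.485; table reports 0.486
}
```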

Descheduler

Existing cluster scheduling is one-time ("one-shot"): once a Pod is placed, if the node's CPU and memory utilization later become too high, the Pod distribution cannot be adjusted automatically unless the node's eviction manager is triggered or someone intervenes manually. So when a node's CPU/memory utilization is high, the stability of every Pod on it suffers, while resources on lightly loaded nodes go to waste.

For this scenario, drawing on the design of the K8s community Descheduler, we provide an eviction policy based on each node's actual CPU/memory utilization.

Architecture

The Descheduler obtains Node and Pod information from the apiserver and Node and Pod monitoring information from Prometheus, and then evicts Pods on nodes with high CPU/memory utilization through its eviction policy. We also strengthen the Descheduler's sorting and inspection rules when evicting Pods, to ensure that services do not fail during eviction. Evicted Pods are then scheduled onto low-utilization nodes by the dynamic scheduler, reducing the failure rate of highly utilized nodes and improving overall resource utilization.

Product capability

Component dependency

The Descheduler depends on the basic node monitoring components node-exporter and Prometheus. Both managed and self-built Prometheus are supported: with managed Prometheus the Descheduler can be installed with one click, and a monitoring-metric configuration method is provided for self-built Prometheus.

Component configuration

Based on the user-configured utilization threshold, the Descheduler starts evicting Pods on a node once the node exceeds the threshold, and tries to bring the node's load back below the target utilization as far as possible.
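Here is a minimal, self-contained sketch of this threshold/target eviction loop. The sorting and safety checks described above are reduced to a single ordering and one predicate, the threshold and target values are illustrative, and per-Pod memory fractions stand in for real apiserver/Prometheus data.

```go
// Descheduler loop sketch: on a node above the utilization threshold, evict
// the heaviest non-critical Pods until the node is expected to fall below
// the target utilization.
package main

import (
	"fmt"
	"sort"
)

type pod struct {
	name     string
	memUsage float64 // fraction of node memory used by this Pod
	critical bool    // e.g. would break a service if evicted -> skip
}

type node struct {
	name    string
	memUtil float64
	pods    []pod
}

const threshold, target = 0.80, 0.60 // illustrative values

func deschedule(n *node) {
	if n.memUtil <= threshold {
		return
	}
	// Sorting rule (simplified): prefer evicting the heaviest Pods first.
	sort.Slice(n.pods, func(i, j int) bool { return n.pods[i].memUsage > n.pods[j].memUsage })
	for _, p := range n.pods {
		if n.memUtil <= target {
			break
		}
		if p.critical { // inspection rule (simplified): never break a service
			continue
		}
		fmt.Printf("evicting %s from %s\n", p.name, n.name)
		n.memUtil -= p.memUsage // expected post-eviction utilization
	}
}

func main() {
	n := node{name: "node-b", memUtil: 0.85, pods: []pod{
		{"cache-1", 0.20, false}, {"db-0", 0.30, true}, {"web-2", 0.10, false},
	}}
	deschedule(&n)
	fmt.Printf("%s memory utilization now %.2f\n", n.name, n.memUtil)
}
```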

Product effect

K8s events

A message that a Pod has been rescheduled can be seen in the K8s events, so you can enable the cluster event persistence feature to review the Pod eviction history.

Node load variation

In a node CPU utilization monitoring view like the one below, you can see that the node's CPU utilization drops after eviction begins.

Best practice

Cluster status

Take one customer cluster as an example. Since most of the customer's workloads are memory-intensive, nodes with high memory utilization appear easily, and memory utilization is uneven across nodes. Before the dynamic scheduler was used, node monitoring looked as follows:

Dynamic scheduler configuration

The parameters for configuring the preselection and optimization phases are as follows:

In the pre-selection phase, nodes whose 5-minute average memory utilization exceeds 60% or whose 1-hour maximum memory utilization exceeds 70% are filtered out, i.e. Pods will not be scheduled onto those nodes.

In the optimization phase, the 5-minute average memory utilization weight is set to 0.8, the 1-hour and 1-day maximum memory utilization weights to 0.2 each, and the weights of the CPU indicators to 0.1. Scheduling then prefers nodes with low memory utilization, as the sketch below shows.
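Plugging these weights into the scoring sketch from the "Optimization phase" section shows how memory dominates the choice; the two nodes' utilization values are made up for illustration.

```go
// The customer's weights applied to the earlier weighted-score sketch:
// memAvg5m 0.8, memMax1h 0.2, memMax1d 0.2, each CPU indicator 0.1.
package main

import "fmt"

func main() {
	score := func(cpuAvg5m, cpuMax1h, cpuMax1d, memAvg5m, memMax1h, memMax1d float64) float64 {
		return 0.1*(1-cpuAvg5m) + 0.1*(1-cpuMax1h) + 0.1*(1-cpuMax1d) +
			0.8*(1-memAvg5m) + 0.2*(1-memMax1h) + 0.2*(1-memMax1d)
	}
	// Two nodes with identical CPU profiles but different memory pressure:
	// the low-memory node wins by a wide margin (0.92 vs 0.44).
	fmt.Printf("low-mem node:  %.2f\n", score(0.5, 0.6, 0.7, 0.3, 0.4, 0.4))
	fmt.Printf("high-mem node: %.2f\n", score(0.5, 0.6, 0.7, 0.7, 0.8, 0.8))
}
```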

Descheduler configuration

The Descheduler is configured as follows: when a node's memory utilization exceeds the 80% threshold, the Descheduler begins evicting Pods on that node and tries to reduce the node's memory utilization to the 60% target.

Cluster state after optimization

With the above configuration, after running for a period of time the memory utilization of each node in the cluster is as follows. You can see that the memory utilization distribution across the cluster's nodes has become balanced:

Thank you for reading. That covers how to configure the K8s dynamic scheduler; after studying this article you should have a deeper understanding of the topic, though specific usage still needs to be verified in practice.
