How to understand kubernetes scheduler Architecture Design 07/19 Update SLTechnology News&Howtos


2025-07-19 Update From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/01 Report--

This article explains how to understand the architecture design of the Kubernetes scheduler, analyzing it from a professional point of view.

The basics of resource scheduling

The scheduler is a core component of Kubernetes. It is responsible for selecting a suitable node for each pod declared by the user, while making the best possible use of cluster resources. Below are some basic concepts in the design of a resource scheduling system.

Basic task resource scheduling

Basic task resource scheduling usually consists of three parts:

Node: executes specific tasks and reports the resources it owns.
Resource manager: aggregates the resources provided by all nodes in the current cluster so that the upper-level scheduler can obtain them, and updates the cluster's resource state according to the task information reported by the nodes.
Scheduler: combines the current cluster resources with the tasks submitted by users, selects a node with suitable resources, and assigns the task to that node, doing its best to keep the task running.

A general-purpose scheduling framework often also includes an upper-level cluster manager. It manages the schedulers in the cluster and allocates resources among them, and it stores the state of the scheduler cluster and sometimes of the resource manager as well.
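The three roles can be sketched in a few lines of Python. This is a toy model, not Kubernetes code: the class names `Node`, `ResourceManager`, and `Scheduler`, and the first-fit selection rule, are illustrative assumptions chosen only to show how the roles hand information to each other.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: dict                      # e.g. {"cpu": 4, "memory": 8}
    allocated: dict = field(default_factory=dict)

    def free(self, resource):
        return self.capacity.get(resource, 0) - self.allocated.get(resource, 0)


class ResourceManager:
    """Aggregates the resources reported by every node (hypothetical)."""

    def __init__(self):
        self.nodes = {}

    def report(self, node):
        # A node reports (or re-reports) the resources it owns.
        self.nodes[node.name] = node

    def record_task(self, node_name, request):
        # Update the cluster state with a task that now runs on node_name.
        node = self.nodes[node_name]
        for resource, amount in request.items():
            node.allocated[resource] = node.allocated.get(resource, 0) + amount


class Scheduler:
    def __init__(self, resource_manager):
        self.rm = resource_manager

    def schedule(self, request):
        # First fit: pick the first node whose free resources cover the request.
        for node in self.rm.nodes.values():
            if all(node.free(r) >= amt for r, amt in request.items()):
                self.rm.record_task(node.name, request)
                return node.name
        return None  # no node can run the task right now


rm = ResourceManager()
rm.report(Node("node-a", {"cpu": 2, "memory": 4}))
rm.report(Node("node-b", {"cpu": 8, "memory": 16}))
sched = Scheduler(rm)
print(sched.schedule({"cpu": 4, "memory": 8}))  # node-b
print(sched.schedule({"cpu": 1, "memory": 1}))  # node-a
print(sched.schedule({"cpu": 16}))              # None
```

Real schedulers replace first fit with filtering and scoring, but the flow of information is the same: nodes report, the resource manager aggregates, and the scheduler decides.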

Challenges of resource scheduling design: maximizing versus balancing cluster resource utilization

Resource utilization in a traditional IDC cluster: in an IDC (data center) environment, we usually want utilization to be balanced across machines, keeping them at a moderate average utilization and reserving enough buffer for the cluster's peak resource demand. After all, purchasing hardware takes time, so we can neither leave machines idle nor run them at full load (the business cannot scale elastically).

Resource utilization in a cloud environment: resources can be allocated on demand, and cloud vendors can usually deliver new machines within seconds. Comparing the two cases shows that a difference in environment alone can lead to very different scheduling results, even when both aim to maximize cluster resource utilization.
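To make the contrast concrete, here is a minimal sketch of the two goals as node-scoring functions, assuming a single CPU dimension and hypothetical node dictionaries. Spreading favors the emptiest node (the IDC goal), while packing favors the fullest node that still fits (so that idle cloud machines can be released). These loosely resemble the LeastAllocated and MostAllocated scoring strategies of kube-scheduler's NodeResourcesFit plugin, but the code is an illustration, not that implementation.

```python
def utilization_after(node, request):
    """Fraction of the node's CPU in use after placing the request."""
    return (node["used"] + request) / node["capacity"]


def spread_score(node, request):
    # IDC-style balancing: prefer the node left with the most headroom.
    return 1 - utilization_after(node, request)


def pack_score(node, request):
    # Cloud-style bin packing: prefer the fullest node that still fits.
    return utilization_after(node, request)


def pick(nodes, request, score):
    feasible = [n for n in nodes if n["used"] + request <= n["capacity"]]
    if not feasible:
        return None
    return max(feasible, key=lambda n: score(n, request))["name"]


nodes = [
    {"name": "busy", "capacity": 8, "used": 6},
    {"name": "idle", "capacity": 8, "used": 1},
]
print(pick(nodes, 1, spread_score))  # idle
print(pick(nodes, 1, pack_score))    # busy
```

The same cluster and the same request land on different nodes purely because the scoring goal changed.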

Scheduling: minimum waiting time and priority of tasks

When the cluster is busy, its resources may be insufficient for all the tasks currently in it. While trying to complete all tasks as quickly as possible, we also need to ensure that high-priority tasks are completed first.

Scheduling: task locality

Locality is a mechanism commonly used in big data processing. Its core idea is to assign each task, as far as possible, to a node that already holds the data the task needs, so that data does not have to be copied.
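A locality-aware choice can be sketched as "pick the candidate node that already stores the most of the task's data blocks". The block-to-node map and the function name below are hypothetical, and the candidates are assumed to have already passed resource checks.

```python
def pick_with_locality(task_blocks, nodes, block_locations):
    """Prefer the node that already stores the most of the task's data blocks.

    task_blocks: block ids the task reads
    nodes: candidate node names (assumed to have enough free resources)
    block_locations: block id -> set of node names holding a replica
    """
    def local_blocks(node):
        return sum(1 for b in task_blocks if node in block_locations.get(b, set()))

    # max() keeps the first node on ties, so candidate order acts as a tiebreak.
    return max(nodes, key=local_blocks)


locations = {"b1": {"n1", "n2"}, "b2": {"n2"}, "b3": {"n3"}}
print(pick_with_locality(["b1", "b2"], ["n1", "n2", "n3"], locations))  # n2
```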

Clustering: high availability

During scheduling, tasks may become unavailable because of hardware, operating system, or software failures. High-availability mechanisms are therefore usually needed to ensure that the failure of a few nodes does not make the whole system unavailable.

Systems: extensibility

Extensibility refers to how the system adapts to changing business requirements by providing extension mechanisms. When the cluster's default scheduling policy does not meet business needs, the system can be extended through its extension interfaces to meet them.

Challenges of Pod scheduling scenarios

Pod scheduling can be regarded as a special kind of task scheduling. Besides the general resource scheduling challenges above, pod scheduling has some specific scenarios of its own (some of them are common to other systems but are easier to describe in terms of pods).

Affinity and anti-affinity

In Kubernetes, affinity involves two kinds of resources, pods and nodes, and has two aspects:
1. Affinity: (1) affinity between pods; (2) affinity between a pod and a node.
2. Anti-affinity: (1) anti-affinity between pods; (2) anti-affinity between a pod and a node.
Simple examples:
1. Anti-affinity between pods: to ensure high availability, we usually spread multiple instances of the same service across different data centers and racks.
2. Affinity between a pod and a node: a pod that performs heavy disk I/O can be scheduled onto machines with SSDs to improve I/O performance.
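Both examples can be expressed as hard filters over candidate nodes. The sketch below is a simplified model, not Kubernetes' implementation: the `disk` and `rack` labels, the data layout, and the function names are hypothetical. It keeps nodes that carry the required labels (pod-to-node affinity) and rejects nodes whose rack already runs a pod of the same app (pod-to-pod anti-affinity).

```python
def node_affinity_ok(node, required):
    # Pod-to-node affinity: every required label must match.
    return all(node["labels"].get(k) == v for k, v in required.items())


def anti_affinity_ok(node, app, topology_key, nodes_by_name, running):
    # Pod-to-pod anti-affinity: no pod of the same app in the same topology domain.
    domain = node["labels"].get(topology_key)
    return not any(
        other_app == app
        and nodes_by_name[other_node]["labels"].get(topology_key) == domain
        for other_app, other_node in running
    )


def feasible_nodes(app, required, topology_key, nodes, running):
    by_name = {n["name"]: n for n in nodes}
    return [
        n["name"]
        for n in nodes
        if node_affinity_ok(n, required)
        and anti_affinity_ok(n, app, topology_key, by_name, running)
    ]


nodes = [
    {"name": "n1", "labels": {"disk": "ssd", "rack": "r1"}},
    {"name": "n2", "labels": {"disk": "ssd", "rack": "r2"}},
    {"name": "n3", "labels": {"disk": "hdd", "rack": "r3"}},
]
running = [("web", "n1")]  # (app, node) pairs already placed
print(feasible_nodes("web", {"disk": "ssd"}, "rack", nodes, running))  # ['n2']
```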

Multi-tenancy and capacity planning

Multi-tenancy is usually about isolating cluster resources. In business systems, resources are typically isolated by line of business, and each line of business is given its own capacity. This prevents one line of business from overusing resources and affecting every other business in the company.
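Per-tenant capacity can be sketched as a simple admission check. Kubernetes offers per-namespace ResourceQuota objects for this purpose; the class below is a hypothetical, much simplified illustration of the idea (one CPU dimension, tenants known up front), not that mechanism.

```python
class TenantQuota:
    """Per-business-line CPU capacity (hypothetical, simplified)."""

    def __init__(self, limits):
        self.limits = limits                  # tenant -> max CPU cores
        self.used = {t: 0 for t in limits}

    def admit(self, tenant, cpu):
        # Reject the request if it would push this tenant past its own capacity,
        # regardless of how much capacity other tenants have left.
        if self.used[tenant] + cpu > self.limits[tenant]:
            return False
        self.used[tenant] += cpu
        return True


q = TenantQuota({"search": 10, "ads": 5})
print(q.admit("ads", 4))     # True
print(q.admit("ads", 2))     # False: ads would use 6 > 5
print(q.admit("search", 8))  # True: other tenants are unaffected
```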

Zone and node selection

A zone is a common concept in business disaster recovery. Distributing a service across multiple data centers keeps the service from becoming completely unavailable when a single data center fails.

Combined with the affinity constraints described above, selecting a suitable node across all zones is a major challenge.
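One simple way to combine the two concerns is to filter first (for example with affinity checks) and then, among the surviving nodes, prefer the zone that currently runs the fewest replicas of the service. Kubernetes exposes a related idea through topology spread constraints; the function and data below are hypothetical and only illustrate the principle.

```python
from collections import Counter


def pick_zone_spread(candidates, zone_of, replicas):
    """Among feasible nodes, prefer one in the zone with the fewest replicas.

    candidates: node names that already passed filtering (e.g. affinity checks)
    zone_of: node name -> zone
    replicas: node names currently running replicas of the same service
    """
    per_zone = Counter(zone_of[n] for n in replicas)
    # Counter returns 0 for zones that have no replicas yet.
    return min(candidates, key=lambda n: per_zone[zone_of[n]])


zone_of = {"n1": "zone-a", "n2": "zone-a", "n3": "zone-b", "n4": "zone-c"}
print(pick_zone_spread(["n2", "n3", "n4"], zone_of, ["n1", "n3"]))  # n4
```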

Extending to diverse resource types

Besides CPU and memory, resources also include network, disk I/O, GPUs, and so on. To allocate and schedule these other resources, Kubernetes needs to provide additional extension mechanisms that support scheduling them.
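If resource names are treated as opaque strings, a single fit check covers CPU, memory, and extra resource types alike. In the sketch below, `nvidia.com/gpu` is the resource name commonly advertised by NVIDIA's device plugin, while `example.com/fpga` is a made-up name; the function itself is an illustrative assumption, not the scheduler's code.

```python
def fits(request, allocatable, allocated):
    """True if every requested resource, standard or extended, fits on the node.

    Resource names are plain strings, so extra resource types are checked exactly
    like CPU and memory. A resource the node does not advertise counts as zero.
    """
    return all(
        allocated.get(name, 0) + amount <= allocatable.get(name, 0)
        for name, amount in request.items()
    )


node_alloc = {"cpu": 16, "memory": 64, "nvidia.com/gpu": 2}
node_used = {"cpu": 4, "memory": 16, "nvidia.com/gpu": 1}
print(fits({"cpu": 2, "nvidia.com/gpu": 1}, node_alloc, node_used))  # True
print(fits({"cpu": 2, "nvidia.com/gpu": 2}, node_alloc, node_used))  # False
print(fits({"example.com/fpga": 1}, node_alloc, node_used))          # False
```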

Mixed deployment (co-location) of workloads

In its early days, Kubernetes pod scheduling mainly targeted online web services, most of which are stateless. How to also support offline workloads, such as batch computing, is therefore a challenge.

Scheduling on top of Kubernetes' centralized data storage

Centralized data storage

Kubernetes is built around centralized data storage. All cluster data is stored in etcd through the apiserver, including each node's resource information, the pods on each node, and all pods in the current cluster. The apiserver therefore also acts as the resource manager, holding both the cluster's total resources and the resources already allocated.

Storage and acquisition of scheduling data

Kubernetes uses a list-watch mechanism so that other components in the cluster can learn about data from the apiserver. The scheduler uses the same mechanism: it watches for changes in apiserver data and builds a local in-memory cache of resource data for scheduling.
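The list-watch pattern can be sketched as "load a full snapshot, then apply incremental events". The real scheduler does this in Go through client-go informers; the class below is a hypothetical simplification that ignores resource versions and reconnects. The event type names `ADDED`, `MODIFIED`, and `DELETED` match those used by the Kubernetes watch API.

```python
class SchedulerCache:
    """Local in-memory copy of pod state, built by a list followed by a watch."""

    def __init__(self):
        self.pods = {}

    def list(self, snapshot):
        # Initial LIST: replace the cache with the full current state.
        self.pods = {p["name"]: p for p in snapshot}

    def on_event(self, event_type, pod):
        # Incremental WATCH events keep the cache in sync afterwards.
        if event_type in ("ADDED", "MODIFIED"):
            self.pods[pod["name"]] = pod
        elif event_type == "DELETED":
            self.pods.pop(pod["name"], None)


cache = SchedulerCache()
cache.list([{"name": "web-1", "node": "n1"}, {"name": "web-2", "node": None}])
cache.on_event("MODIFIED", {"name": "web-2", "node": "n2"})
cache.on_event("DELETED", {"name": "web-1", "node": "n1"})
cache.on_event("ADDED", {"name": "db-1", "node": None})
print(sorted(cache.pods))  # ['db-1', 'web-2']
```

Because scheduling decisions read from this local copy, the scheduler does not need to query the apiserver for every decision.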

High availability of scheduler

The high-availability mechanisms of many systems are built on CP systems such as ZooKeeper and etcd. Multiple instances compete through ephemeral nodes or a locking mechanism, so that when the primary instance goes down another can take over quickly. The scheduler uses the same mechanism: it competes for a lock through the apiserver, which is backed by etcd, and because all cluster state lives in the apiserver, a newly elected scheduler can continue where the old one stopped. This is how the scheduler achieves high availability.
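The lock competition can be sketched as a lease with an expiry time stored in a strongly consistent store. This is a single-process, hypothetical model: in a real cluster the check-and-set must be atomic in etcd (through the apiserver), and the durations here are illustrative only.

```python
class LeaseStore:
    """Stand-in for a strongly consistent store holding one leader lease."""

    def __init__(self):
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, candidate, now, duration):
        # Check-and-set: succeed if the lease is free, expired, or already ours.
        if self.holder in (None, candidate) or now >= self.expires_at:
            self.holder = candidate
            self.expires_at = now + duration
            return True
        return False


store = LeaseStore()
print(store.try_acquire("sched-a", now=0, duration=15))   # True: a becomes leader
print(store.try_acquire("sched-b", now=5, duration=15))   # False: a's lease is valid
print(store.try_acquire("sched-a", now=10, duration=15))  # True: a renews until t=25
print(store.try_acquire("sched-b", now=30, duration=15))  # True: a stopped renewing
```

The leader must keep renewing before the lease expires; if it crashes, a standby instance takes over once the lease lapses.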

The scheduling queue inside the scheduler

When the scheduler learns from the apiserver that a pod needs to be scheduled, it adds the pod to an internal priority queue ordered by pod priority. Later scheduling rounds take higher-priority pods first, so that scheduling respects priority.
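A priority queue with arrival order as the tiebreaker can be sketched with Python's `heapq`. kube-scheduler's default queue-sort plugin (PrioritySort) also orders pods by priority and then by timestamp, but its real queue has additional sub-queues for backoff and unschedulable pods; the class here is a hypothetical simplification.

```python
import heapq
import itertools


class SchedulingQueue:
    """Pops the highest-priority pod first; ties are served in arrival order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # arrival sequence used as a tiebreaker

    def add(self, pod_name, priority):
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), pod_name))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = SchedulingQueue()
q.add("batch-job", priority=0)
q.add("web", priority=1000)
q.add("log-agent", priority=0)
print([q.pop() for _ in range(len(q))])  # ['web', 'batch-job', 'log-agent']
```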

Note also that even if a lower-priority pod is scheduled first, it may still be evicted later during preemption.

Scheduling and preemptive scheduling

As mentioned earlier, Kubernetes attempts to schedule every pod by default. When cluster resources are insufficient, it tries preemptive scheduling, which lets high-priority pods be scheduled ahead of lower-priority ones. The core of this is implemented in the scheduling algorithm, namely ScheduleAlgorithm.
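The core idea of preemption can be sketched as: on each node, evict strictly lower-priority pods (lowest priority first) until the pending pod fits, then choose the node that needs the fewest evictions. This is a hypothetical, single-resource simplification; the real scheduler applies more criteria and tries to spare victims that turn out to be unnecessary, which this sketch does not do.

```python
def find_preemption(pod_priority, pod_cpu, nodes):
    """Return (node, victims) that frees enough CPU, or None.

    nodes: node name -> {"capacity": int, "pods": [(name, priority, cpu), ...]}
    Only pods with strictly lower priority may be evicted, lowest priority first;
    among workable nodes, the one needing the fewest evictions wins.
    """
    best = None
    for name, node in nodes.items():
        free = node["capacity"] - sum(cpu for _, _, cpu in node["pods"])
        lower = sorted((p for p in node["pods"] if p[1] < pod_priority),
                       key=lambda p: p[1])
        victims = []
        for victim_name, _, victim_cpu in lower:
            if free >= pod_cpu:
                break
            victims.append(victim_name)
            free += victim_cpu
        if free >= pod_cpu and (best is None or len(victims) < len(best[1])):
            best = (name, victims)
    return best


nodes = {
    "n1": {"capacity": 4, "pods": [("batch-1", 0, 2), ("web-1", 1000, 2)]},
    "n2": {"capacity": 4, "pods": [("batch-2", 0, 1), ("batch-3", 0, 3)]},
}
print(find_preemption(500, 2, nodes))  # ('n1', ['batch-1'])
```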

The scheduling algorithm here is actually a collection of scheduling algorithms and scheduling configurations.

External extension mechanism

The scheduler extender is a mechanism Kubernetes provides for extending the scheduler. We can define our own extender. When scheduling a resource, Kubernetes checks whether that resource requires an external extender; if it does, the scheduler sends the current scheduling data to the extender and then combines the extender's results with its own to determine the final scheduling decision.
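In practice an extender is an external HTTP service listed in the scheduler's configuration, and it can be restricted to pods that request certain managed resources. The sketch below replaces the HTTP round trip with a plain function call; the extender, the `example.com/gpu` resource, and the `gpu_healthy` flag are all hypothetical.

```python
def default_filter(pod, nodes):
    # Built-in filtering: keep nodes with enough free CPU.
    return [n for n in nodes if n["free_cpu"] >= pod["cpu"]]


def gpu_extender_filter(pod, nodes):
    # Hypothetical external extender that only accepts nodes with a healthy GPU.
    return [n for n in nodes if n.get("gpu_healthy")]


def schedule(pod, nodes, extenders):
    feasible = default_filter(pod, nodes)
    for ext in extenders:
        # Only consult extenders that manage a resource this pod requests.
        if ext["managed_resource"] in pod["resources"]:
            feasible = ext["filter"](pod, feasible)
    return [n["name"] for n in feasible]


nodes = [
    {"name": "n1", "free_cpu": 4, "gpu_healthy": True},
    {"name": "n2", "free_cpu": 4, "gpu_healthy": False},
    {"name": "n3", "free_cpu": 1, "gpu_healthy": True},
]
extenders = [{"managed_resource": "example.com/gpu", "filter": gpu_extender_filter}]
print(schedule({"cpu": 2, "resources": ["cpu", "example.com/gpu"]}, nodes, extenders))  # ['n1']
print(schedule({"cpu": 2, "resources": ["cpu"]}, nodes, extenders))  # ['n1', 'n2']
```

Because the extender runs out of process, it can be written and deployed without rebuilding the scheduler, at the cost of an extra network call per scheduling decision.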

Internal extension mechanism

As mentioned above, the scheduling algorithm is a collection of scheduling algorithms and scheduling configurations. The Kubernetes scheduler framework declares interfaces for the corresponding plugins, which lets users write their own plugins to influence scheduling decisions. Personally, I do not think this is a good mechanism, because loading custom plugins requires modifying the code or changing how the Kubernetes scheduler is started.
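The plugin idea can be sketched with two extension points, filter and score. The real framework defines Go interfaces for these and many more extension points, with normalization and weighting of scores; the Python classes below, including the custom `PreferLabel` plugin, are hypothetical stand-ins that show only the shape of the design.

```python
from typing import List, Protocol


class FilterPlugin(Protocol):
    def filter(self, pod: dict, node: dict) -> bool: ...


class ScorePlugin(Protocol):
    def score(self, pod: dict, node: dict) -> int: ...


class FitsCPU:
    def filter(self, pod, node):
        return node["free_cpu"] >= pod["cpu"]


class PreferLabel:
    """Hypothetical custom plugin: add points for nodes carrying a given label."""

    def __init__(self, key, value, weight=10):
        self.key, self.value, self.weight = key, value, weight

    def score(self, pod, node):
        return self.weight if node["labels"].get(self.key) == self.value else 0


class Framework:
    def __init__(self, filters: List[FilterPlugin], scorers: List[ScorePlugin]):
        self.filters, self.scorers = filters, scorers

    def schedule(self, pod, nodes):
        # Filter extension point: drop nodes that any plugin rejects.
        feasible = [n for n in nodes if all(f.filter(pod, n) for f in self.filters)]
        if not feasible:
            return None
        # Score extension point: sum plugin scores and take the best node.
        return max(feasible, key=lambda n: sum(s.score(pod, n) for s in self.scorers))["name"]


fw = Framework(filters=[FitsCPU()], scorers=[PreferLabel("disk", "ssd")])
nodes = [
    {"name": "n1", "free_cpu": 4, "labels": {"disk": "hdd"}},
    {"name": "n2", "free_cpu": 4, "labels": {"disk": "ssd"}},
    {"name": "n3", "free_cpu": 0, "labels": {"disk": "ssd"}},
]
print(fw.schedule({"cpu": 1}, nodes))  # n2
```

This also shows the drawback the author mentions: a new plugin has to be compiled into, and registered with, the scheduler process itself.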

Scheduling infrastructure

Combining the above gives the simplest architecture. The main scheduling process is divided into the following steps:
0. Elect a leader through the apiserver; the winner runs the scheduling workflow.
1. Learn about cluster resource data and pod data through the apiserver and update the local schedulerCache.
2. Learn about pod scheduling requests from users or controllers through the apiserver and add them to the local scheduling queue.
3. Run the scheduling algorithm on the queued pod requests and assign suitable nodes; preemptive scheduling may occur in this step.
4. Return the scheduling result to the apiserver; the kubelet component then handles the pod from there.
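The whole flow can be compressed into one toy function. Everything here is a hypothetical simplification: the cache is a dict of free CPU per node that is assumed to be already synced (step 1), there is no preemption, and "binding" just records the result. Updating the cache before the bind completes is similar in spirit to the scheduler's optimistic "assume" step.

```python
import heapq


def run_scheduler(is_leader, nodes, pending):
    """Toy end-to-end pass over the steps above.

    nodes: node name -> free CPU (the local cache, assumed already synced)
    pending: list of (priority, pod name, cpu) learned through the apiserver
    Returns the bindings that would be written back to the apiserver.
    """
    if not is_leader:            # step 0: only the elected leader schedules
        return {}
    queue = [(-prio, i, name, cpu) for i, (prio, name, cpu) in enumerate(pending)]
    heapq.heapify(queue)         # step 2: local priority queue
    bindings = {}
    while queue:
        _, _, name, cpu = heapq.heappop(queue)
        # step 3: pick the feasible node with the most free CPU (no preemption here)
        feasible = [n for n, free in nodes.items() if free >= cpu]
        if not feasible:
            continue             # a real scheduler would try preemption or requeue
        node = max(feasible, key=lambda n: nodes[n])
        nodes[node] -= cpu       # optimistically update the cache before binding
        bindings[name] = node    # step 4: result written back; kubelet takes over
    return bindings


print(run_scheduler(True, {"n1": 4, "n2": 3}, [(0, "batch", 3), (100, "web", 2)]))
# {'web': 'n1', 'batch': 'n2'}
```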

This is the simplest scheduling process and its basic component modules, so no Kubernetes source code is covered here. Later articles in this series will analyze each key scheduling data structure in detail, along with the implementation of some interesting scheduling algorithms.
