2025-02-23 Update From: SLTechnology News & Howtos > Servers
Shulou (Shulou.com) 05/31 Report
This article shows how to use the Scheduling Framework in practice. The topic is covered in detail below; we hope it helps readers who want to extend the Kubernetes scheduler.
Kubernetes is currently the most popular container management platform, providing declarative container orchestration, automatic deployment, resource scheduling, and other features. Kube-Scheduler, one of Kubernetes' core components, is responsible for scheduling across the whole cluster: following specific scheduling algorithms and policies, it places each Pod on the most suitable worker node so that cluster resources are used as fully and sensibly as possible. However, as the variety of workloads deployed on Kubernetes grows, the native Kube-Scheduler can no longer satisfy every scheduling requirement: machine-learning and deep-learning training jobs need co-scheduling; high-performance computing jobs and genomics workflows need dynamic binding of resources such as GPUs, network, and storage volumes; and so on. As a result, the need to customize the Kubernetes scheduler has become more and more pressing. Below we discuss the various ways of extending the Kubernetes scheduler, and then demonstrate how to extend it with the Scheduling Framework, currently the best of these approaches.
01 How to Customize the Scheduler

There are several common ways to customize scheduling in Kubernetes: forking and modifying the default Kube-Scheduler, running an additional custom scheduler alongside the default one, extending the default scheduler through the Scheduler Extender webhook, and implementing plugins for the Scheduling Framework. The Scheduling Framework is Kube-Scheduler's pluggable architecture and avoids the performance and maintenance drawbacks of the other approaches, which is why we focus on it below.
02 Understanding the Scheduling Framework
As shown in the figure below, the scheduling framework provides a rich set of extension points. In this figure, Filter corresponds to the old Predicates (pre-selection) stage and Score to the old Priorities (scoring) stage, and every extension point exposes an interface. We can implement the interface defined by an extension point to realize our own scheduling logic, and register the resulting plugins at that extension point.
When the Scheduling Framework executes the scheduling flow and reaches an extension point, it calls the plugins registered there, satisfying scheduling requirements through the policies of those custom plugins. A plugin may also be registered at several extension points in order to perform more complex or stateful work.
Each time the Scheduling Framework schedules a Pod, the work is split into two phases: the scheduling cycle and the binding cycle. The scheduling cycle selects a node for the Pod; the binding cycle then calls the API server to set the Pod's spec.nodeName field to the chosen node. Together they are called the "scheduling context". Scheduling cycles run serially, one Pod at a time, and are therefore thread-safe. Binding cycles involve calls to the API server and take comparatively long, so for efficiency they run asynchronously: several bind operations may be in flight at once, and binding is therefore not thread-safe.
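The split between a serial scheduling cycle and asynchronous binding cycles can be sketched in plain Go. This is a simplified model, not actual Kube-Scheduler code; the pod list, node picker, and bind step are stand-ins:

```go
package main

import (
	"fmt"
	"sync"
)

// bindResult records which node each pod was bound to.
type bindResult struct {
	mu    sync.Mutex
	bound map[string]string
}

// scheduleAll models the scheduling context: node selection runs serially
// (one pod at a time), while the bind step for each pod runs in its own
// goroutine so a slow API-server call does not block scheduling the next pod.
func scheduleAll(pods []string, pickNode func(pod string) string, res *bindResult) {
	var wg sync.WaitGroup
	for _, p := range pods {
		node := pickNode(p) // scheduling cycle: serial, thread-safe
		wg.Add(1)
		go func(pod, node string) { // binding cycle: asynchronous
			defer wg.Done()
			res.mu.Lock()
			res.bound[pod] = node // stand-in for updating spec.nodeName
			res.mu.Unlock()
		}(p, node)
	}
	wg.Wait()
}

func main() {
	res := &bindResult{bound: map[string]string{}}
	scheduleAll([]string{"pod-a", "pod-b"}, func(string) string { return "node-1" }, res)
	fmt.Println(res.bound)
}
```

Note the mutex around the shared map: because binds run concurrently, the binding side must protect shared state, mirroring the article's point that the binding cycle is not thread-safe.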
If a Pod is determined to be unschedulable, or an internal error occurs, the scheduling or binding cycle is aborted and the Pod is put back into the queue to await the next retry. If a binding cycle is aborted, the UnReserve method of the Reserve plugins is triggered.
Extension Points of the Scheduling Cycle

QueueSort

Used to sort the scheduling queue. By default all Pods are placed in a single queue; this extension orders the queue of pending Pods to decide which Pod is scheduled first. A QueueSort plugin essentially implements a single method, Less(Pod1, Pod2), which compares two Pods and decides which one gets scheduled first. Only one QueueSort plugin can be in effect at any given time.
PreFilter
Used to preprocess information about the Pod, or to check preconditions that the cluster or the Pod must satisfy, for example whether the Pod carries certain annotations or labels. If a PreFilter returns an error, the scheduling cycle is aborted.
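As an illustration, a PreFilter-style precondition check can be sketched without the framework types. The label key `scheduling.example.com/tenant` is invented for this example:

```go
package main

import (
	"errors"
	"fmt"
)

// requiredLabel is a hypothetical label every pod must carry to be scheduled.
const requiredLabel = "scheduling.example.com/tenant"

// preFilter mimics a PreFilter plugin: it validates preconditions on the
// pod and returns an error to abort the scheduling cycle when they fail.
func preFilter(podLabels map[string]string) error {
	if _, ok := podLabels[requiredLabel]; !ok {
		return errors.New("pod is missing required label " + requiredLabel)
	}
	return nil
}

func main() {
	fmt.Println(preFilter(map[string]string{requiredLabel: "team-a"})) // passes
	fmt.Println(preFilter(map[string]string{}))                        // aborts scheduling
}
```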
Filter
Used to exclude nodes that cannot run the Pod. For each node, the scheduler runs the Filter plugins in order; as soon as any Filter marks the node as infeasible, the remaining Filters are skipped for that node. If the default scheduler's pre-selection rules do not suit you, you can disable its built-in filtering algorithms in the configuration and run only your own filtering logic at this extension point. Filtering is executed concurrently across nodes, so a Filter plugin may be invoked many times within one scheduling cycle.
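The per-node short-circuit behavior of Filter plugins can be modeled like this. Nodes are plain structs here and the "cannot run" test is just a made-up CPU capacity check:

```go
package main

import "fmt"

type node struct {
	name    string
	freeCPU int // millicores; simplified stand-in for real resource accounting
}

// filterNodes keeps only nodes on which every filter returns true; as in
// the framework, later filters are skipped once one rejects a node.
func filterNodes(nodes []node, filters []func(node) bool) []node {
	var feasible []node
	for _, n := range nodes {
		ok := true
		for _, f := range filters {
			if !f(n) {
				ok = false
				break // remaining filters not evaluated for this node
			}
		}
		if ok {
			feasible = append(feasible, n)
		}
	}
	return feasible
}

func main() {
	nodes := []node{{"n1", 500}, {"n2", 100}}
	enoughCPU := func(n node) bool { return n.freeCPU >= 250 }
	fmt.Println(filterNodes(nodes, []func(node) bool{enoughCPU}))
}
```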
PostFilter
Plugins implementing this extension are called after the Filter phase, and only if no feasible node was found for the Pod. As soon as any PostFilter plugin marks the Pod as schedulable, the remaining PostFilter plugins are skipped. A typical PostFilter implementation is preemption, which tries to make the Pod schedulable by evicting other Pods.
PreScore
Called after filtering, typically to pre-compute information or to record logs and monitoring data before Score runs.
Score
Plugins implementing this extension score every node that survived the filtering phase; the scheduler calls each Score plugin once per node.
NormalizeScore
In the NormalizeScore phase, the scheduler combines each Score plugin's result for a node with that plugin's weight to produce the node's final score, an integer within a fixed range. If Score or NormalizeScore returns an error, the scheduling cycle is aborted.
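The score-combination step can be illustrated with a small sketch. The rescale-to-100 and plugin-weight steps mirror the framework's behavior in spirit, but the exact arithmetic here is simplified:

```go
package main

import "fmt"

// normalize rescales raw scores so the best node gets 100, matching the
// framework's convention that normalized scores fall within a fixed range.
func normalize(raw map[string]int64) map[string]int64 {
	var max int64
	for _, s := range raw {
		if s > max {
			max = s
		}
	}
	out := make(map[string]int64, len(raw))
	for n, s := range raw {
		if max == 0 {
			out[n] = 0
			continue
		}
		out[n] = s * 100 / max
	}
	return out
}

// weighted applies a plugin weight to normalized scores, as the scheduler
// does when combining results from several Score plugins.
func weighted(norm map[string]int64, weight int64) map[string]int64 {
	out := make(map[string]int64, len(norm))
	for n, s := range norm {
		out[n] = s * weight
	}
	return out
}

func main() {
	norm := normalize(map[string]int64{"n1": 30, "n2": 60})
	fmt.Println(weighted(norm, 2))
}
```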
Reserve
This extension point reserves, on the chosen node, the resources that the Pod will use. It prevents the scheduler from placing new Pods onto the node while this Pod is still waiting to be bound, which could otherwise make actual usage exceed the available resources (binding a Pod to a node happens asynchronously). Reserve is the last step of the scheduling cycle; once the Pod enters the Reserved state, either the Unreserve extension fires when binding fails, or the PostBind extension ends the binding flow when it succeeds.
Permit
The Permit extension runs after the Reserve plugin has reserved resources for the Pod, and before the Bind extension point performs the bind. It supports three outcomes: approve, deny, and wait.
1) approve: once all Permit plugins approve the binding of the Pod to the node, the scheduler continues with binding.
2) deny: if any Permit plugin denies the binding of the Pod to the node, the Pod is returned to the scheduling queue and the Unreserve extension is triggered.
3) wait: if a Permit plugin returns wait, the Pod stays in the Permit phase until it is approved by another plugin. If a timeout occurs, wait turns into deny: the Pod is returned to the scheduling queue and the UnReserve extension is triggered.
Extension Points of the Binding Cycle

PreBind

Used to run logic before the Pod is bound. This plugin exists because some resources must be provisioned on the chosen node before the Pod can run there, so the scheduler has to make sure those resources are successfully bound to the node before committing the Pod to it. For example, a PreBind plugin can mount a network-backed data volume on the node so the Pod can use it. If any PreBind plugin returns an error, the Pod is returned to the scheduling queue and the Unreserve extension is triggered.
Bind
The Bind extension calls the API server to bind the Pod to the selected node.
PostBind
PostBind is an informational extension point. It is invoked after the Pod has been bound successfully and can be used to clean up associated resources.
UnReserve
UnReserve is a notification extension. If resources were reserved for a Pod and the Pod is later rejected during binding, the UnReserve plugins are called; they should release the compute resources that were reserved for the Pod on the node. In a plugin, the Reserve and UnReserve extensions should be implemented as a pair.
03 Customizing the Scheduler with the Scheduling Framework
Customizing a plugin takes two steps:
1) implement the plugin's interface
2) register and configure the plugin
3.1 Implement the plugin's interface
Here we implement the QueueSort extension point. First, look at the interface it defines:
```go
// QueueSortPlugin is an interface that must be implemented by "QueueSort" plugins.
// These plugins are used to sort pods in the scheduling queue. Only one queue sort
// plugin may be enabled at a time.
type QueueSortPlugin interface {
	Plugin
	// Less are used to sort pods in the scheduling queue.
	Less(*QueuedPodInfo, *QueuedPodInfo) bool
}
```
The default scheduler schedules higher-priority Pods first, and it does so via the QueueSort plugin: the default implementation sorts by the Pods' Priority value and, when priorities are equal, compares the Pods' timestamps (the time each Pod joined the queue). We now want to sort first by the Pod's Priority value, then, when priorities are equal, by the Pod's QoS class, and finally by the Pod's timestamp.
The QoS classes are as follows:
1) Guaranteed: resource limits and requests are equal
2) Burstable: resource limits and requests are set but not equal
3) BestEffort: neither resource limits nor requests are set
Specifically, Guaranteed has a higher priority than Burstable, and Burstable has a higher priority than BestEffort.
To implement the plugin, we only need to implement the Less method of QueueSortPlugin:
```go
package qos

import (
	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/kubernetes/pkg/api/v1/pod"
	v1qos "k8s.io/kubernetes/pkg/apis/core/v1/helper/qos"
	framework "k8s.io/kubernetes/pkg/scheduler/framework/v1alpha1"
)

// Name is the name of the plugin used in the plugin registry and configurations.
const Name = "QOSSort"

// Sort is a plugin that implements QoS Class based sorting.
type Sort struct{}

var _ framework.QueueSortPlugin = &Sort{}

// Name returns name of the plugin.
func (pl *Sort) Name() string {
	return Name
}

// Less is the function used by the activeQ heap algorithm to sort pods.
// It sorts pods based on their priorities. When the priorities are equal, it uses
// the Pod QoS classes to break the tie.
func (*Sort) Less(pInfo1, pInfo2 *framework.PodInfo) bool {
	p1 := pod.GetPodPriority(pInfo1.Pod)
	p2 := pod.GetPodPriority(pInfo2.Pod)
	return (p1 > p2) ||
		(p1 == p2 && compQOS(pInfo1.Pod, pInfo2.Pod)) ||
		(p1 == p2 && pInfo1.Timestamp.Before(pInfo2.Timestamp))
}

func compQOS(p1, p2 *v1.Pod) bool {
	p1QOS, p2QOS := v1qos.GetPodQOS(p1), v1qos.GetPodQOS(p2)
	if p1QOS == v1.PodQOSGuaranteed {
		return true
	}
	if p1QOS == v1.PodQOSBurstable {
		return p2QOS != v1.PodQOSGuaranteed
	}
	return p2QOS == v1.PodQOSBestEffort
}

// New initializes a new plugin and returns it.
func New(_ *runtime.Unknown, _ framework.FrameworkHandle) (framework.Plugin, error) {
	return &Sort{}, nil
}
```
Note: one plugin can implement multiple extension points; for example, a single plugin can implement Filter, Score, and PreBind at the same time.
3.2 Registering and configuring the plugin
1) Registration: register the plugin in the default scheduler's plugin registry.
2) Configuration: decide, via configuration, which plugins are initialized.
3.2.1 Register the implemented Qos plugin

```go
func main() {
	rand.Seed(time.Now().UnixNano())
	command := app.NewSchedulerCommand(
		app.WithPlugin(qos.Name, qos.New),
	)
	logs.InitLogs()
	defer logs.FlushLogs()
	if err := command.Execute(); err != nil {
		os.Exit(1)
	}
}
```

3.2.2 Configure the Qos plugin and deploy the custom scheduler
1) configuration
The configuration tells the scheduler which plugins to initialize. Below, the Qos plugin is enabled for the QueueSort extension point; extension points for which no plugin is specified keep Kube-Scheduler's default implementation. The schedulerName field gives the name of the extended scheduler, the plugins field lists the plugin names for each extension point, and enabled indicates that your plugin should run at that extension point.
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: scheduler-config3
  namespace: kube-system
data:
  scheduler-config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1alpha1
    kind: KubeSchedulerConfiguration
    schedulerName: qos-scheduler
    leaderElection:
      leaderElect: true
      lockObjectName: qos-scheduler
      lockObjectNamespace: kube-system
    plugins:
      queueSort:
        enabled:
        - name: "QOSSort"
```
2) Then create RBAC rules for the scheduler
```yaml
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: qos-cr
rules:
- apiGroups:
  - '*'
  resources:
  - '*'
  verbs:
  - '*'
- nonResourceURLs:
  - '*'
  verbs:
  - '*'
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: qos-sa
  namespace: kube-system
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: qos-crb
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: qos-cr
subjects:
- kind: ServiceAccount
  name: qos-sa
  namespace: kube-system
```
3) Configure the scheduler's Deployment
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: qos-scheduler
  namespace: kube-system
  labels:
    component: qos-scheduler
spec:
  replicas: 1
  selector:
    matchLabels:
      component: qos-scheduler
  template:
    metadata:
      labels:
        component: qos-scheduler
    spec:
      imagePullSecrets:
      - name: hybrid-regsecret
      serviceAccount: qos-sa
      priorityClassName: system-cluster-critical
      volumes:
      - name: scheduler-config3
        configMap:
          name: scheduler-config3
      containers:
      - name: qos-scheduler
        image: hub.baidubce.com/kun/sxy/qos-scheduler:v1.0.0
        imagePullPolicy: Always
        args:
        - /qos-sample-scheduler
        - --config=/scheduler/scheduler-config.yaml
        - --v=3
        resources:
          requests:
            cpu: "50m"
        volumeMounts:
        - name: scheduler-config3
          mountPath: /scheduler
```
4) After deploying with kubectl apply, you can see that qos-scheduler has started:
```shell
$ kubectl get pods -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE
qos-scheduler-79c767954f-225mr   1/1     Running   0          44m
```
5) Schedule a Pod with the custom scheduler
Set spec.schedulerName to qos-scheduler in the Pod, and the custom scheduler will run its scheduling logic for that Pod.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: test
  labels:
    app: test
spec:
  schedulerName: qos-scheduler
  containers:
  - image: nginx
    name: nginx
    ports:
    - containerPort: 80
```
After creating it, you can see that the Pod is scheduled normally and starts successfully.
```shell
$ kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
test   1/1     Running   0          15s
```

04 Other Uses of the Scheduling Framework

4.1 Coscheduling (co-scheduling)
Sometimes we need co-scheduling, a feature similar to Kube-batch (also known as "gang scheduling"): a certain number of Pods are scheduled all at once, and if not every member of the gang can be scheduled together, none of them should be. Gang scheduling can be implemented in the Scheduling Framework with the Permit plugin.
1) The main scheduling thread processes Pods one by one and reserves nodes for them; each Pod invokes the gang-scheduling plugin at the Permit stage.
2) When the plugin finds that a Pod belongs to a gang, it checks the gang's state: if not enough members are scheduled or in the "wait" state yet, it returns "wait".
3) When the count reaches the expected value, all Pods in the waiting state are approved and sent on to binding.
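The counting logic behind steps 2 and 3 can be sketched as follows. The minMember field and the gang bookkeeping are simplified stand-ins for the real plugin's state:

```go
package main

import "fmt"

// gang tracks how many member pods have reached the Permit stage.
type gang struct {
	minMember int
	waiting   int
}

// permit is called once per member pod at the Permit extension point.
// It returns "wait" until minMember pods are parked, then "approve",
// signalling that all waiting members can proceed to binding together.
func permit(g *gang) string {
	g.waiting++
	if g.waiting < g.minMember {
		return "wait" // park this pod; not enough gang members yet
	}
	return "approve" // quorum reached: release the whole gang
}

func main() {
	g := &gang{minMember: 3}
	for i := 0; i < 3; i++ {
		fmt.Println(permit(g)) // wait, wait, approve
	}
}
```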
4.2 Dynamic Resource Binding
When a Pod is scheduled, some dynamic resources, such as volumes, may not yet exist on the candidate nodes. The scheduler must ensure that such cluster-level resources are bound to the selected node before the Pod can be placed on it. Plugins at the Scheduling Framework's PreBind extension point can be implemented to bind such dynamic resources.
That concludes this walkthrough of using the Scheduling Framework in practice. Thank you for reading.