What is the use of CFS load balancing in Linux process management

2025-07-03 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains what CFS load balancing does in Linux process management: why it is needed, how the kernel describes the CPU topology it balances over, when balancing is triggered, and how tasks are migrated.

What is load balancing?

The scheduling discussed so far is the default scheduling policy on a single CPU. To reduce interference between CPUs, each CPU has its own task queue (runqueue). At run time some CPUs may be overloaded while others sit idle, so load balancing is needed.

Load balancing is the process of moving tasks from a heavily loaded CPU to a relatively lightly loaded CPU for execution.

Before looking at load balancing, it helps to understand the CPU topology of the SoC.

A multi-core SoC (system on chip) has a complex internal structure, and the kernel uses a CPU topology to describe it. The kernel uses scheduling domains to describe the hierarchical relationship between CPUs. Load balancing between CPUs in a low-level scheduling domain has relatively small overhead; the higher the level of the scheduling domain, the greater the overhead of load balancing.

For example, take a 4-core SoC in which every two cores form a cluster that shares an L2 cache. Each cluster can be thought of as an MC scheduling domain; each MC scheduling domain has two scheduling groups, and each scheduling group contains only one CPU. The whole SoC can be regarded as a higher-level DIE scheduling domain with two scheduling groups: cluster0 belongs to one scheduling group and cluster1 to the other. Balancing load across clusters is expensive because a migrated task leaves behind the data it had warmed in its old cluster's L2 cache, so load balancing in the SoC-level DIE scheduling domain costs more.
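The 4-core example can be written down as a small data model. This is only an illustrative sketch in Python, not kernel code: the `Domain` class and `build_domains` function are hypothetical names, and the two domain objects are shared between CPUs of a cluster for brevity.

```python
from dataclasses import dataclass


@dataclass
class Domain:
    name: str         # "MC" or "DIE"
    span: frozenset   # all CPUs covered by this domain
    groups: list      # one frozenset of CPUs per scheduling group


def build_domains(clusters):
    """Return {cpu: [MC domain, DIE domain]} for a list of clusters."""
    all_cpus = frozenset(cpu for cluster in clusters for cpu in cluster)
    # In the DIE domain, every cluster is one scheduling group.
    die_groups = [frozenset(cluster) for cluster in clusters]
    domains = {}
    for cluster in clusters:
        # In the MC domain, every CPU of the cluster is its own group.
        mc = Domain("MC", frozenset(cluster),
                    [frozenset([cpu]) for cpu in cluster])
        die = Domain("DIE", all_cpus, die_groups)
        for cpu in cluster:
            domains[cpu] = [mc, die]   # base domain first, top domain last
    return domains


topology = build_domains([[0, 1], [2, 3]])
for level in topology[0]:
    print(level.name, sorted(level.span), [sorted(g) for g in level.groups])
```

For CPU 0 this lists an MC domain spanning CPUs 0 and 1 with two single-CPU groups, and a DIE domain spanning all four CPUs with one group per cluster.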

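The scheduling domains and scheduling groups of each CPU can be viewed under the procfs directory /proc/sys/kernel/sched_domain on kernels that expose it (it depends on scheduler debugging support, and recent kernels moved this information to debugfs).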

The main members of the scheduling domain sched_domain are as follows:

Member: Description

parent, child: sched domains form a hierarchy, and parent and child link domains at different levels of it. For the base domain, child is NULL; for the top domain, parent is NULL.
groups: a scheduling domain contains several scheduling groups, which form a circular linked list. The groups member is the head of that list.
min_interval, max_interval: balancing also has overhead, so the balance state of the scheduling domain cannot be checked all the time. These two parameters define the range of the interval between checks of the sched domain's balance state.
balance_interval: the interval between balancing runs of the sched domain under normal conditions.
busy_factor: if the CPU is busy, the balancing interval is made longer, namely busy_factor x balance_interval.
imbalance_pct: load balancing starts only after the imbalance in the scheduling domain reaches a certain degree; imbalance_pct defines this imbalance watermark.
level: the level of the sched domain in the hierarchy of the entire scheduling domain.
span_weight: the number of CPUs in the sched domain.
span: the span of the scheduling domain, that is, the CPUs it covers.
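How the interval and watermark members interact can be shown with a little arithmetic. The sketch below is a simplified model, not the kernel's code: the function names are hypothetical, the clamping order is an assumption, and the numbers in the usage lines are illustrative values rather than kernel defaults.

```python
def effective_interval(balance_interval, busy_factor, cpu_busy,
                       min_interval, max_interval):
    """Interval (in ms) until the next balance check of a domain."""
    # Assumption: balance_interval is kept inside [min_interval, max_interval].
    base = max(min_interval, min(balance_interval, max_interval))
    # A busy CPU balances less often: busy_factor x balance_interval.
    return base * busy_factor if cpu_busy else base


def is_imbalanced(busiest_load, local_load, imbalance_pct):
    """True when the busiest load exceeds the local load by the watermark.

    imbalance_pct is a percentage, so 125 means "more than 25% heavier".
    """
    return busiest_load * 100 > local_load * imbalance_pct


print(effective_interval(8, 32, cpu_busy=False, min_interval=4, max_interval=16))
print(effective_interval(8, 32, cpu_busy=True, min_interval=4, max_interval=16))
print(is_imbalanced(130, 100, imbalance_pct=125))
print(is_imbalanced(120, 100, imbalance_pct=125))
```

With these illustrative values, an idle CPU checks every 8 ms and a busy one every 256 ms, and a load of 130 against 100 crosses a 125 watermark while 120 against 100 does not.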

The main members of the scheduling group sched_group are as follows:

Member: Description

next: all sched groups in a sched domain form a circular linked list; next points to the next node in the groups list.
group_weight: how many CPUs the scheduling group contains.
sgc: the compute capacity information of the scheduling group.
cpumask: which CPUs the scheduling group contains.

CPU topology examples
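As a concrete example of the sched_group members, the sketch below links the two cluster-level groups of a hypothetical two-cluster system into a circular list and walks it once. It is a simplified Python model: the class and helper names are made up for illustration, `sgc` is left out, and `link_groups` assumes at least one group.

```python
class SchedGroup:
    def __init__(self, cpus):
        self.cpumask = frozenset(cpus)          # which CPUs the group contains
        self.group_weight = len(self.cpumask)   # how many CPUs it contains
        self.next = self                        # a lone group points to itself


def link_groups(cpu_sets):
    """Build a circular list of groups and return its head."""
    groups = [SchedGroup(cpus) for cpus in cpu_sets]
    for index, group in enumerate(groups):
        group.next = groups[(index + 1) % len(groups)]
    return groups[0]


def walk_groups(head):
    """Visit every group exactly once, starting from the list head."""
    group = head
    while True:
        yield group
        group = group.next
        if group is head:   # back at the head: the circle is complete
            break


head = link_groups([[0, 1], [2, 3]])
for group in walk_groups(head):
    print(sorted(group.cpumask), group.group_weight)
```

Because the list is circular, traversal can start from any group and ends when it reaches the starting node again.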

To reduce lock contention, each CPU has its own MC domain, DIE domain and sched groups, which together form the hierarchy of sched domains and the circular linked lists of sched groups. The sched domains here have two levels: the base domain is called the MC (multi-core) domain, and the top domain is called the DIE domain. You can view CPU topology information under /sys/devices/system/cpu/cpuX/topology.
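The topology directory can also be read programmatically. The following is a minimal sketch with hypothetical helper names; it reads whatever files the directory contains, since the exact set of files varies between kernel versions and architectures, and it raises `FileNotFoundError` if the directory does not exist.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,8" into a sorted list of ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_topology(cpu, root="/sys/devices/system/cpu"):
    """Return {file name: contents} for one CPU's topology directory."""
    topo_dir = Path(root) / f"cpu{cpu}" / "topology"
    result = {}
    for entry in sorted(topo_dir.iterdir()):
        if entry.is_file():
            result[entry.name] = entry.read_text().strip()
    return result
```

On a Linux system, `read_topology(0)` returns the entries for CPU 0, and list-style values such as the contents of `core_siblings_list` can be passed to `parse_cpu_list`.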

In this structure, the top-level DIE domain covers all CPUs in the system. The MC domain of the little-core cluster contains all CPUs of the little-core cluster, and the MC domain of the big-core cluster contains all CPUs of the big-core cluster.

The sched domain hierarchy that the balancing algorithms work on is built through the DTS and the CPU topology subsystem. The call chain is: kernel_init() -> kernel_init_freeable() -> smp_prepare_cpus() -> init_cpu_topology() -> parse_dt_topology()

Software Architecture of load balancing

In the architecture figure, the left side is divided into CPU load tracking and task load tracking.

CPU load tracking: tracks the load of each CPU and aggregates the load of all CPUs in a cluster, which makes it easy to calculate the load imbalance between clusters.

Task load tracking: determines whether a task suits the computing capacity of its current CPU and, once balancing is found to be necessary, how much task load has to be migrated between CPUs to reach balance.
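The two kinds of tracking can be illustrated with a few lines of arithmetic. This is a sketch under stated assumptions, not the kernel's load-tracking algorithm: all function names are hypothetical, the 0.8 capacity margin is an assumed parameter, and moving half of the difference is just the simplest way to equalise two loads.

```python
def cluster_loads(cpu_load, clusters):
    """Aggregate per-CPU load into per-cluster load."""
    return {name: sum(cpu_load[cpu] for cpu in cpus)
            for name, cpus in clusters.items()}


def task_fits(task_util, cpu_capacity, margin=0.8):
    """Does the task fit the CPU, leaving some headroom (assumed margin)?"""
    return task_util <= cpu_capacity * margin


def load_to_move(busiest_load, local_load):
    """Load to migrate so that both sides end up near their average."""
    return max(0, (busiest_load - local_load) // 2)


cpu_load = {0: 300, 1: 500, 2: 100, 3: 100}
loads = cluster_loads(cpu_load, {"cluster0": [0, 1], "cluster1": [2, 3]})
print(loads)
print(load_to_move(loads["cluster0"], loads["cluster1"]))
print(task_fits(400, cpu_capacity=446), task_fits(400, cpu_capacity=1024))
```

Here cluster0 carries 800 against 200 on cluster1, so moving a load of 300 would bring both to 500; a task with utilisation 400 does not fit a CPU of capacity 446 under the assumed margin but fits one of capacity 1024.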

The right side is the sched domain hierarchy, built through the DTS and the CPU topology subsystem with the initialization call chain listed earlier.

With the infrastructure on the left and right in place, when is load balancing triggered? This mainly depends on scheduling events. On events such as task wakeup, task creation and tick arrival, the kernel can check the current imbalance of the system and migrate tasks as appropriate, so that the system load is balanced.

When is the load balancing done?

There are two types of load balancer for CFS tasks. One is the periodic balancer for busy CPUs, which balances CFS tasks among busy CPUs. The other is the idle balancer for idle CPUs, which moves tasks from busy CPUs to idle CPUs; it comes in two forms, nohz idle balance and new idle balance.

Periodic load balance (also called tick load balance) means that the load balance of the system is checked periodically in the tick: the CPU finds the busiest group and the busiest CPU in each of its domains and pulls runnable tasks from there to itself, so that the load of the system is balanced.

Nohz load balance applies when other CPUs have entered idle while the load on this CPU is too heavy. The busy CPU then needs to wake an idle CPU through an IPI so that it carries out load balancing. Nohz idle load balance is also driven by the tick on the busy CPU: if the idle load balancer needs to be kicked, an IPI is sent through the GIC to a selected idle CPU, which balances load on behalf of all idle CPUs in the system.

New idle load balance is the easiest to understand: when a CPU has no task left to run and is about to enter the idle state, it checks whether other CPUs need help and pulls tasks from a busy CPU, so that the load of the whole system is balanced.
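The relationship between scheduling events and the three balancing paths can be summarised as a small decision function. This is a rough sketch of the description above, not the kernel's logic: the function name, the event strings and the rule of kicking the lowest-numbered idle CPU are all assumptions made for illustration.

```python
def balancers_for_event(event, this_cpu_overloaded, idle_cpus):
    """Return the balancing actions a CPU would take for an event.

    event is "tick" (periodic tick on a busy CPU) or "going_idle"
    (the CPU has run out of tasks and is about to enter idle).
    """
    actions = []
    if event == "tick":
        # The tick always drives the periodic balancer on a busy CPU.
        actions.append("periodic")
        if this_cpu_overloaded and idle_cpus:
            # Assumption: kick the lowest-numbered idle CPU via IPI; it
            # then balances on behalf of all idle CPUs.
            kicked = min(idle_cpus)
            actions.append(f"nohz_idle:kick_cpu{kicked}")
    elif event == "going_idle":
        # Pull work from a busy CPU before actually going idle.
        actions.append("new_idle")
    return actions


print(balancers_for_event("tick", True, {2, 3}))
print(balancers_for_event("tick", False, {2, 3}))
print(balancers_for_event("going_idle", False, set()))
```

The point of the model is that both periodic and nohz idle balancing originate from the tick of a busy CPU, while new idle balancing originates from the CPU that is about to become idle.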

The basic process of load balancing

When load balancing runs on a CPU, it always starts from the base domain and checks the load balance among the sched groups of that domain. If there is an imbalance, tasks are migrated within the cluster the CPU belongs to, so that the task load stays balanced across the CPU cores of the cluster. The same check is then repeated in the higher-level domain.

load_balance() is the core function of load balancing. Its unit of work is a scheduling domain (sched domain), and it handles the scheduling groups inside that domain. It works in four steps:

Find the busiest sched group in this domain

Within the busiest group, pick the busiest CPU runqueue; that CPU becomes the src of the task migration

Select the tasks to migrate from that runqueue (the choice is mainly based on task load, with priority given to heavily loaded tasks)

Migrate them to the runqueue of the dst CPU
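The four steps can be sketched as one function operating on plain Python data. This is a deliberately simplified model and not the kernel's load_balance(): the data layout is hypothetical, load is a plain number per task, the amount to move is assumed to be half the difference between src and dst, and watermark checks such as imbalance_pct are left out.

```python
def load_balance(dst_cpu, groups, runqueues):
    """Pull tasks to dst_cpu inside one domain; return the moved task names.

    groups:    list of CPU lists, one per sched group of the domain
    runqueues: {cpu: [(task_name, load), ...]}
    """
    def cpu_load(cpu):
        return sum(load for _, load in runqueues[cpu])

    local_group = next(g for g in groups if dst_cpu in g)
    others = [g for g in groups if g is not local_group]
    if not others:
        return []

    # Step 1: find the busiest sched group in this domain.
    busiest_group = max(others, key=lambda g: sum(cpu_load(c) for c in g))
    # Step 2: the busiest runqueue in that group becomes the src.
    src_cpu = max(busiest_group, key=cpu_load)

    # Assumption: aim to move half of the load difference.
    imbalance = (cpu_load(src_cpu) - cpu_load(dst_cpu)) // 2
    moved = []
    # Step 3: consider the heaviest tasks first.
    for task, load in sorted(runqueues[src_cpu], key=lambda t: t[1], reverse=True):
        if load <= imbalance:
            # Step 4: migrate the task to the dst runqueue.
            runqueues[src_cpu].remove((task, load))
            runqueues[dst_cpu].append((task, load))
            moved.append(task)
            imbalance -= load
    return moved


rqs = {0: [], 1: [("a", 300), ("b", 200), ("c", 100)]}
print(load_balance(0, [[0], [1]], rqs), rqs)
```

In this run CPU 0 is idle and CPU 1 carries a load of 600, so the heaviest task "a" is moved and both CPUs end up with a load of 300.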
