This article explains how the kernel scheduler is initialized in Linux. Most readers are probably not familiar with the details, so it is shared here for your reference; I hope you learn a lot from it. Let's dive in!
Basic concepts of the scheduler
Before analyzing the scheduler code, you need to understand the core data structures involved in the scheduler and their roles.
Run queue (rq)
The kernel creates a run queue for each CPU. The ready (runnable) tasks in the system are organized onto these kernel run queues, and the scheduler then picks tasks from a run queue to execute on the CPU according to the corresponding policy, as sketched below.
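For reference, here is an abridged sketch of the per-CPU run-queue structure, based on the 5.4 kernel's struct rq in kernel/sched/sched.h; the field subset and comments are mine, most fields are omitted, and SMP-only fields are shown unconditionally:

struct rq {
        raw_spinlock_t          lock;           /* protects this run queue */
        unsigned int            nr_running;     /* runnable tasks on this rq */

        struct cfs_rq           cfs;            /* run queue of the fair (CFS) class */
        struct rt_rq            rt;             /* run queue of the rt class */
        struct dl_rq            dl;             /* run queue of the deadline class */

        struct task_struct      *curr;          /* task currently running on this CPU */
        struct task_struct      *idle;          /* this CPU's idle (swapper) task */

        struct root_domain      *rd;            /* root domain, for RT/DL balancing */
        struct sched_domain     *sd;            /* lowest-level scheduling domain */
        int                     cpu;            /* CPU this rq belongs to */
};

Each scheduling class keeps its own sub-queue (cfs/rt/dl) inside the rq, which is exactly what sched_init initializes per CPU later in this article.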
Scheduling class (sched_class)
The kernel abstracts scheduling policies into scheduling classes (sched_class). A scheduling class fully decouples the scheduler's common code (the mechanism) from the scheduling policies provided by the individual classes, which is a typical OO (object-oriented) idea. This design makes the kernel scheduler highly extensible: a developer can add a new scheduling class with very little code (essentially without touching the common code) and thus implement a new scheduler. For example, the deadline scheduling class added in 3.x only required implementing the dl_sched_class functions at the code level; adding a new real-time scheduling type was very convenient.
In the current 5.4 kernel there are five scheduling classes, listed here in descending priority order:
stop_sched_class:
The highest-priority scheduling class. Like idle_sched_class, it is a dedicated class: apart from the migration threads, no task can (or should) be set to the stop scheduling class. It exists to implement "urgent" work that relies on the migration threads, such as active balance or stop machine.
dl_sched_class:
The deadline scheduling class has priority second only to the stop class. It is a real-time scheduler (scheduling policy) based on the EDF (Earliest Deadline First) algorithm.
rt_sched_class:
The rt scheduling class has lower priority than the dl class; it is a priority-based real-time scheduler.
fair_sched_class:
The CFS scheduler has lower priority than the three classes above. It is designed around the idea of fair scheduling and is the default scheduling class of the Linux kernel.
idle_sched_class:
The idle scheduling class runs the swapper thread. Its main job is to let the swapper thread take over the CPU and put the CPU into a power-saving state through frameworks such as cpuidle/nohz.
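To make the object-oriented design concrete, here is a heavily abridged sketch of the scheduling-class interface, based on the 5.4 kernel's struct sched_class in kernel/sched/sched.h; the hook subset is mine and the signatures are simplified:

struct sched_class {
        const struct sched_class *next;         /* next, lower-priority class */

        /* add @p to / remove @p from this class's run queue on @rq */
        void (*enqueue_task)(struct rq *rq, struct task_struct *p, int flags);
        void (*dequeue_task)(struct rq *rq, struct task_struct *p, int flags);

        /* should @p preempt the task currently running on @rq? */
        void (*check_preempt_curr)(struct rq *rq, struct task_struct *p, int flags);

        /* pick the next task of this class to run on @rq */
        struct task_struct *(*pick_next_task)(struct rq *rq);

        /* scheduler-tick work, e.g. time-slice accounting */
        void (*task_tick)(struct rq *rq, struct task_struct *p, int queued);
};

In pick_next_task() the kernel walks the classes from highest to lowest priority (stop, dl, rt, fair, idle) and runs the first task offered, which is what enforces the priority order listed above; a new scheduler only has to provide its own instance of these hooks.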
Scheduling domain (sched_domain)
Scheduling domains were introduced into the kernel in 2.6. By introducing multi-level scheduling domains, the scheduler can better adapt to the physical characteristics of the hardware (multi-level CPU caches, and the challenges NUMA brings to load balancing) and achieve better scheduling performance. sched_domain is a mechanism developed for CFS load balancing.
Scheduling Group (sched_group)
Scheduling groups were introduced into the kernel together with scheduling domains. A scheduling group cooperates with its scheduling domain to help the CFS scheduler balance load among multiple cores.
Root domain (root_domain)
The root domain is a data structure designed mainly for the load balancing of the real-time scheduling classes (dl and rt), helping those classes place real-time tasks sensibly. When the scheduling domains have not been modified by isolcpus= or cpuset cgroups, all CPUs are in the same root domain by default.
Group scheduling (group_sched)
To control system resources at a finer granularity, the kernel introduced the cgroup mechanism. group_sched is the underlying implementation of the cpu cgroup: by putting processes into a cgroup and configuring parameters such as bandwidth and shares through the cpu cgroup's control interface, CPU resources can be controlled per group. A minimal usage sketch follows.
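The program below creates a cpu cgroup, gives it half the shares of a default group, and caps it at half a CPU. It assumes a cgroup v1 cpu controller mounted at /sys/fs/cgroup/cpu, root privileges, and the arbitrary group name "demo" (on cgroup v2 the equivalent knobs are cpu.weight and cpu.max):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/stat.h>

static void write_file(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (!f) {
                perror(path);
                exit(1);
        }
        fputs(val, f);
        fclose(f);
}

int main(void)
{
        char pid[32];

        mkdir("/sys/fs/cgroup/cpu/demo", 0755);

        /* proportional share relative to sibling groups (default 1024) */
        write_file("/sys/fs/cgroup/cpu/demo/cpu.shares", "512");
        /* hard cap: 50 ms of CPU time per 100 ms period, i.e. 0.5 CPU */
        write_file("/sys/fs/cgroup/cpu/demo/cpu.cfs_quota_us", "50000");
        write_file("/sys/fs/cgroup/cpu/demo/cpu.cfs_period_us", "100000");

        /* move the current process into the group */
        snprintf(pid, sizeof(pid), "%d", getpid());
        write_file("/sys/fs/cgroup/cpu/demo/cgroup.procs", pid);

        return 0;
}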
Scheduler initialization (sched_init)
Now let's get to the point and analyze the initialization flow of the kernel scheduler. Through this analysis I hope you will understand:
1. How the run queue is initialized
2. How group scheduling is associated with rq (only after association can group scheduling be carried out through group_sched)
3. How the CFS softirq SCHED_SOFTIRQ is registered
Scheduling initialization (sched_init)
start_kernel
 |---- setup_arch
 |---- build_all_zonelists
 |---- mm_init
 |---- sched_init        /* scheduler initialization */
Scheduling initialization comes relatively late in start_kernel; by that point memory initialization has finished, so memory allocation functions such as kzalloc can already be called in sched_init.
sched_init needs to initialize the run queue (rq) of each CPU, the global default bandwidth for dl/rt, the run queue of each scheduling class, and register the CFS softirq.
Let's take a look at the specific implementation of sched_init (omitting some of the code):
void __init sched_init(void)
{
        unsigned long ptr = 0;
        int i;

        /*
         * Initialize the global default rt and dl CPU bandwidth control
         * structures.
         *
         * def_rt_bandwidth and def_dl_bandwidth here bound the global DL and
         * RT bandwidth usage, preventing real-time processes from using so
         * much CPU that ordinary CFS processes starve.
         */
        init_rt_bandwidth(&def_rt_bandwidth, global_rt_period(), global_rt_runtime());
        init_dl_bandwidth(&def_dl_bandwidth, global_rt_period(), global_rt_runtime());

#ifdef CONFIG_SMP
        /*
         * Initialize the default root domain.
         *
         * The root domain is an important data structure for the global
         * balancing of real-time (dl/rt) tasks. Taking rt as an example,
         * root_domain->cpupri records the highest priority of the RT tasks
         * running on each CPU in this root domain, as well as how the tasks
         * of each priority are distributed across CPUs. Using the cpupri
         * data at rt enqueue/dequeue time, the rt scheduler can make sure
         * high-priority tasks run first, based on the distribution of rt
         * tasks.
         */
        init_defrootdomain();
#endif

#ifdef CONFIG_RT_GROUP_SCHED
        /*
         * If the kernel supports rt group scheduling (RT_GROUP_SCHED), the
         * bandwidth of RT tasks can be controlled at cgroup granularity,
         * i.e. the CPU bandwidth used by the rt tasks in each group can be
         * controlled separately.
         *
         * RT_GROUP_SCHED lets rt tasks be bandwidth-controlled as a whole in
         * the form of a cpu cgroup, which makes RT bandwidth control much
         * more flexible (without RT_GROUP_SCHED, only the global RT
         * bandwidth can be controlled; the bandwidth of a particular group
         * of RT processes cannot).
         */
        init_rt_bandwidth(&root_task_group.rt_bandwidth,
                        global_rt_period(), global_rt_runtime());
#endif /* CONFIG_RT_GROUP_SCHED */

        /* Initialize the run queue of each CPU. */
        for_each_possible_cpu(i) {
                struct rq *rq;

                rq = cpu_rq(i);
                raw_spin_lock_init(&rq->lock);

                /*
                 * Initialize the cfs/rt/dl run queues on this rq.
                 *
                 * Each scheduling class has its own run queue on the rq and
                 * manages its own tasks. In pick_next_task() the kernel
                 * selects tasks in order of scheduling class priority, from
                 * high to low, which guarantees that tasks of
                 * higher-priority scheduling classes run first.
                 *
                 * stop and idle are special scheduling classes designed for
                 * specific purposes; users are not allowed to create tasks
                 * of these classes, so the kernel does not put corresponding
                 * run queues in the rq.
                 */
                init_cfs_rq(&rq->cfs);
                init_rt_rq(&rq->rt);
                init_dl_rq(&rq->dl);

#ifdef CONFIG_FAIR_GROUP_SCHED
                /*
                 * CFS group scheduling (group_sched): CFS can be controlled
                 * through the cpu cgroup. cpu.shares provides proportional
                 * CPU sharing between groups (different cgroups share the
                 * CPU in the configured ratio), and cpu.cfs_quota_us sets a
                 * quota (similar to RT bandwidth control). CFS group
                 * scheduling bandwidth control is one of the underlying
                 * technologies used to implement containers.
                 *
                 * root_task_group is the default root task_group; every
                 * other cpu cgroup has it as a parent or ancestor. The
                 * initialization here associates root_task_group with the
                 * rq's cfs run queue. Interestingly, it directly sets
                 * root_task_group->cfs_rq[cpu] = &rq->cfs, so that tasks in
                 * the root cpu cgroup, or a child cgroup's tg sched_entity,
                 * are placed directly on the rq->cfs queue, saving one level
                 * of lookup overhead.
                 */
                root_task_group.shares = ROOT_TASK_GROUP_LOAD;
                INIT_LIST_HEAD(&rq->leaf_cfs_rq_list);
                rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
                init_cfs_bandwidth(&root_task_group.cfs_bandwidth);
                init_tg_cfs_entry(&root_task_group, &rq->cfs, NULL, i, NULL);
#endif /* CONFIG_FAIR_GROUP_SCHED */

                rq->rt.rt_runtime = def_rt_bandwidth.rt_runtime;
#ifdef CONFIG_RT_GROUP_SCHED
                /*
                 * Initialize the rt run queue on the rq; similar to the CFS
                 * group scheduling initialization above.
                 */
                init_tg_rt_entry(&root_task_group, &rq->rt, NULL, i, NULL);
#endif

#ifdef CONFIG_SMP
                /*
                 * Attach the rq to the default def_root_domain. On an SMP
                 * system, the kernel will later create a new root_domain in
                 * sched_init_smp() and replace def_root_domain with it.
                 */
                rq_attach_root(rq, &def_root_domain);
#endif /* CONFIG_SMP */
        }

        /*
         * Register the SCHED_SOFTIRQ soft interrupt service function for
         * CFS. This softirq handles periodic load balancing and nohz idle
         * load balancing.
         */
        init_sched_fair_class();

        scheduler_running = 1;
}

Multicore scheduling initialization (sched_init_smp)
start_kernel
 |---- rest_init
        |---- kernel_init
               |---- kernel_init_freeable
                      |---- smp_init
                      |---- sched_init_smp
                             |---- sched_init_numa
                             |---- sched_init_domains
                                    |---- build_sched_domains
Multicore scheduling initialization mainly completes the initialization of the scheduling domains and scheduling groups (the root domain is also set up here, but its initialization is comparatively simple).
Linux runs on many chip architectures and memory architectures (UMA/NUMA), so it must adapt to a wide range of physical structures; its scheduling-domain design and implementation are therefore relatively complex.
Implementation principle of scheduling domain
Before looking at the scheduling domain initialization code, we need to understand how scheduling domains relate to the physical topology, because the design of the scheduling domain follows the physical topology closely; without understanding the physical topology there is no way to really understand the scheduling domain implementation.
Physical topology diagram of CPU
Let's assume a computer system (similar to Intel chips, but with the number of CPU cores reduced for ease of illustration): a dual-socket system in which each socket has 2 cores and 4 hyperthreads. The machine is therefore a 4-core, 8-thread NUMA system. (This is Intel's physical topology; AMD's Zen architecture uses a chiplet design, which adds an extra DIE domain layer between the MC and NUMA domains.)
Layer 1 (SMT domain):
The two hyperthreads of a core (e.g. CORE0) constitute an SMT domain. On Intel CPUs, hyperthread siblings share the L1 and L2 caches (even the store buffer is shared to some extent), so migrating between the CPUs of an SMT domain loses no cache warmth.
Layer 2 (MC domain):
CORE0 and CORE1 sit in the same SOCKET and belong to an MC domain. On Intel CPUs they generally share the LLC (usually L3). Although a process migrating within this domain loses the warmth of L1 and L2, the L3 cache stays warm.
Layer 3 (NUMA domain):
Between SOCKET0 and SOCKET1, migrating a process loses all cache warmth and carries a large cost, so cross-NUMA migration has to be done relatively cautiously. (A small sketch after this list shows where the kernel gets this topology information.)
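The topology information the scheduler builds its SMT/MC levels from is exported through sysfs, so a small user-space sketch can print it (assuming a Linux sysfs; the paths are standard, but the output depends on the machine):

#include <stdio.h>

int main(void)
{
        char path[128], buf[256];
        FILE *f;
        int cpu;

        for (cpu = 0; ; cpu++) {
                /* hyperthread siblings of this CPU: the SMT-domain span */
                snprintf(path, sizeof(path),
                         "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list", cpu);
                f = fopen(path, "r");
                if (!f)
                        break;  /* no more CPUs */
                if (fgets(buf, sizeof(buf), f))
                        printf("cpu%d smt siblings: %s", cpu, buf);
                fclose(f);

                /* which socket the CPU sits in: the higher-level grouping */
                snprintf(path, sizeof(path),
                         "/sys/devices/system/cpu/cpu%d/topology/physical_package_id", cpu);
                f = fopen(path, "r");
                if (f) {
                        if (fgets(buf, sizeof(buf), f))
                                printf("cpu%d socket: %s", cpu, buf);
                        fclose(f);
                }
        }
        return 0;
}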
It is precisely because of these hardware characteristics (levels of cache warmth, NUMA access latency, and other hardware factors) that the kernel abstracts sched_domain and sched_group to represent them. When load balancing, the kernel applies different policies according to the characteristics of the corresponding scheduling domain (the frequency of balancing, the imbalance factor, the wakeup CPU-selection logic, and so on), so as to balance CPU load while preserving cache affinity.
Specific implementation of scheduling domain
Next we can see how the kernel builds scheduling domains and scheduling groups on top of the physical topology above.
The kernel establishes a corresponding level of scheduling domain for each level of the physical topology, and then builds the corresponding scheduling groups at each domain level. When load balancing, a scheduling domain finds the busiest sg (sched_group) at its level, then checks whether the load of that busiest sg and the local sg (the scheduling group containing the current CPU) is unbalanced. If it is, the busiest CPU is picked from the busiest sg and load is balanced between the two CPUs. A self-contained toy version of this check follows.
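Here is that busiest-group comparison as a self-contained toy, not kernel code; the group loads and the 117 threshold are made-up illustrative values (sd->imbalance_pct really is a per-domain percentage threshold of roughly this magnitude):

#include <stdio.h>

struct toy_group {
        const char *name;
        unsigned long load;     /* sum of the loads of the CPUs in this sg */
};

int main(void)
{
        struct toy_group g[] = { { "local sg", 800 }, { "remote sg", 1000 } };
        struct toy_group *local = &g[0], *busiest = &g[0];
        unsigned int imbalance_pct = 117;       /* per-domain threshold */
        int i, n = sizeof(g) / sizeof(g[0]);

        /* step 1: find the busiest sched_group in this domain */
        for (i = 1; i < n; i++)
                if (g[i].load > busiest->load)
                        busiest = &g[i];

        /* step 2: migrate only if the imbalance exceeds the threshold */
        if (busiest != local &&
            busiest->load * 100 > local->load * imbalance_pct)
                printf("pull tasks from the busiest CPU in %s to %s\n",
                       busiest->name, local->name);
        else
                printf("balanced enough; do nothing\n");

        return 0;
}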
The SMT domain is the lowest-level scheduling domain: each hyperthread pair forms one smt domain, which contains two sched_groups, each with a single CPU. Load balancing in the smt domain therefore migrates processes between hyperthreads; it is the cheapest migration and has the most relaxed conditions.
For architectures without hyperthreading (or chips where hyperthreading is not enabled), the lowest-level domain is the MC domain (there are then only two levels, MC and NUMA). In that case each CORE in the MC domain forms a sched_group of its own, and the kernel adapts to such a scenario when scheduling.
The MC domain consists of all the CPUs in one socket, and each of its sgs consists of all the CPUs of one child smt domain. In the topology above, each sg of the MC domain therefore contains two CPUs. The kernel designs the MC domain so that CFS wakeup load balancing and idle load balancing balance between the sgs of the MC domain.
This design matters for hyperthreading, and we can observe its effect in real workloads. For example, we had a codec service whose benchmark numbers were better on some virtual machines and worse on others. Analysis showed the difference came down to whether the hyperthreading information was passed through to the virtual machine. When hyperthreading information is passed through, the virtual machine builds a two-level scheduling domain (SMT and MC domains), and on wakeup load balancing CFS prefers to place the task on an idle sg, i.e. an idle physical CORE rather than merely an idle CPU. While CPU utilization is not high (below roughly 40%), the workload can then exploit the full performance of a physical CORE. (When the two hyperthreads of a physical CORE both run CPU-bound work at the same time, the combined throughput is only about 1.2 times that of a single thread.) Without the hyperthreading pass-through, the virtual machine sees only a single-level topology (the MC domain), so two tasks may well be scheduled onto the two hyperthreads of one physical CORE; the system then cannot make full use of the physical CORE, and the workload performs worse.
The NUMA domain consists of all the CPUs in the system; all the CPUs of one SOCKET form one sg, so the NUMA domain in the topology above contains two sgs. Cross-NUMA process migration happens only when there is a large imbalance between the sgs of the NUMA domain (the imbalance here is at sg level, i.e. the sum of all CPU loads on one sg compared against another sg). Migrating across NUMA loses all L1/L2/L3 cache warmth and may introduce more cross-NUMA memory accesses, so it has to be handled carefully.
As the discussion above shows, through the cooperation of sched_domain and sched_group, the kernel can adapt to all kinds of physical topologies (with or without hyperthreading, with or without NUMA) and use CPU resources efficiently.
smp_init
/*
 * Called by boot processor to activate the rest.
 *
 * In an SMP architecture, the BSP needs to bring up all other non-boot CPUs.
 */
void __init smp_init(void)
{
        int num_nodes, num_cpus;
        unsigned int cpu;

        /* Create an idle thread for each CPU. */
        idle_threads_init();
        /* Register the cpuhp threads with the kernel. */
        cpuhp_threads_init();

        pr_info("Bringing up secondary CPUs ...\n");

        /*
         * FIXME: This should be done in userspace --RR
         *
         * If a CPU is not yet online, bring it up with cpu_up().
         */
        for_each_present_cpu(cpu) {
                if (num_online_cpus() >= setup_max_cpus)
                        break;
                if (!cpu_online(cpu))
                        cpu_up(cpu);
        }
        ...
}
Before sched_init_smp actually initializes the scheduling domains, all non-boot CPUs must be brought up, so that these CPUs are in the ready state when the multicore scheduling domains are initialized.
sched_init_smp
Now let's look at the concrete code of multicore scheduling initialization (if CONFIG_SMP is not configured, none of the implementation here is executed).
sched_init_numa
sched_init_numa() detects whether the system is NUMA; if it is, NUMA domains need to be added dynamically.
/*
 * Topology list, bottom-up.
 *
 * Linux's default physical topology.
 *
 * There are only three levels here; the NUMA domain is detected
 * automatically in sched_init_numa(), and if one exists, the corresponding
 * NUMA scheduling domains will be added.
 *
 * Note: the default_topology may be problematic on some platforms. For
 * example, some platforms have no DIE domain (Intel platforms), in which
 * case the LLC and DIE domains may overlap. Therefore, after the scheduling
 * domains are built, the kernel scans all domains in cpu_attach_domain(),
 * and if domains overlap, removes the overlapping ones with
 * destroy_sched_domain().
 */
static struct sched_domain_topology_level default_topology[] = {
#ifdef CONFIG_SCHED_SMT
        { cpu_smt_mask, cpu_smt_flags, SD_INIT_NAME(SMT) },
#endif
#ifdef CONFIG_SCHED_MC
        { cpu_coregroup_mask, cpu_core_flags, SD_INIT_NAME(MC) },
#endif
        { cpu_cpu_mask, SD_INIT_NAME(DIE) },
        { NULL, },
};
Default physical topology of Linux
/*
 * NUMA scheduling domain initialization (builds a new
 * sched_domain_topology based on hardware information).
 *
 * The kernel does not add a NUMA topology level by default; it decides
 * based on configuration (whether NUMA is enabled). If NUMA is enabled, it
 * determines from the hardware topology information whether a NUMA
 * sched_domain_topology_level needs to be added (only after this level is
 * added will the kernel create NUMA domains when the scheduling domains are
 * initialized later).
 */
void sched_init_numa(void)
{
        ...
        /*
         * Check, based on the NUMA distances, whether NUMA domains (even
         * multiple levels of them) exist, and update the physical topology
         * accordingly. When the scheduling domains are built later, the new
         * physical topology is used to build the new scheduling domains.
         */
        for (j = 1; j < level; i++, j++) {
                tl[i] = (struct sched_domain_topology_level){
                        .mask = sd_numa_mask,
                        .sd_flags = cpu_numa_flags,
                        .flags = SDTL_OVERLAP,
                        .numa_level = j,
                        SD_INIT_NAME(NUMA)
                };
        }

        sched_domain_topology = tl;
        sched_domains_numa_levels = level;
        sched_max_numa_distance = sched_domains_numa_distance[level - 1];

        init_numa_topology_type();
}

This detects the physical topology of the system; if NUMA domains exist, they are added to sched_domain_topology, and the scheduling domains are later built according to this sched_domain_topology.

sched_init_domains

Next, let's analyze sched_init_domains, the function that builds the scheduling domains.

/*
 * Set up scheduler domains and groups. For now this just excludes isolated
 * CPUs, but could be used to exclude other special cases in the future.
 */
int sched_init_domains(const struct cpumask *cpu_map)
{
        int err;

        zalloc_cpumask_var(&sched_domains_tmpmask, GFP_KERNEL);
        zalloc_cpumask_var(&sched_domains_tmpmask2, GFP_KERNEL);
        zalloc_cpumask_var(&fallback_doms, GFP_KERNEL);

        arch_update_cpu_topology();
        ndoms_cur = 1;
        doms_cur = alloc_sched_domains(ndoms_cur);
        if (!doms_cur)
                doms_cur = &fallback_doms;
        /*
         * doms_cur[0] is the cpumask the scheduling domains need to cover.
         *
         * If some CPUs were isolated with isolcpus=, they are not added to
         * the scheduling domains, i.e. they do not take part in load
         * balancing (DL/RT as well as CFS). Here
         * cpu_map & housekeeping_cpumask(HK_FLAG_DOMAIN) removes the
         * isolated CPUs, so the scheduling domains being built contain no
         * isolated CPU.
         */
        cpumask_and(doms_cur[0], cpu_map, housekeeping_cpumask(HK_FLAG_DOMAIN));
        /* The function that actually builds the scheduling domains. */
        err = build_sched_domains(doms_cur[0], NULL);
        register_sched_domain_sysctl();

        return err;
}

/*
 * Build sched domains for a given set of CPUs and attach the sched domains
 * to the individual CPUs
 */
static int
build_sched_domains(const struct cpumask *cpu_map, struct sched_domain_attr *attr)
{
        enum s_alloc alloc_state = sa_none;
        struct sched_domain *sd;
        struct s_data d;
        struct rq *rq = NULL;
        int i, ret = -ENOMEM;
        struct sched_domain_topology_level *tl_asym;
        bool has_asym = false;

        if (WARN_ON(cpumask_empty(cpu_map)))
                goto error;

        /*
         * The vast majority of processes in Linux belong to the CFS
         * scheduling class, so the sched_domains used by CFS are accessed
         * and modified frequently (e.g. nohz_idle and the various
         * statistics kept in sched_domain). The sched_domain design
         * therefore puts efficiency first: the kernel implements
         * sched_domain with percpu variables, allocating each level of sd
         * independently per CPU. This exploits the percpu property to avoid
         * concurrent contention between CPUs (1. no lock protection needed;
         * 2. no cacheline false sharing).
         */
        alloc_state = __visit_domain_allocation_hell(&d, cpu_map);
        if (alloc_state != sa_rootdomain)
                goto error;

        tl_asym = asym_cpu_capacity_level(cpu_map);

        /*
         * Set up domains for CPUs specified by the cpu_map:
         *
         * This iterates over all CPUs in cpu_map and creates the multi-level
         * scheduling domains corresponding to the physical topology
         * (for_each_sd_topology) for them.
         *
         * When a scheduling domain is built, tl->mask(cpu) yields the span
         * of the CPU at this topology level (i.e. which CPUs, together with
         * this one, make up this scheduling domain). The sds of CPUs in the
         * same scheduling domain are initialized identically at first
         * (including parameters such as sd->span, sd->imbalance_pct and
         * sd->flags).
         */
        for_each_cpu(i, cpu_map) {
                struct sched_domain_topology_level *tl;

                sd = NULL;
                for_each_sd_topology(tl) {
                        int dflags = 0;

                        if (tl == tl_asym) {
                                dflags |= SD_ASYM_CPUCAPACITY;
                                has_asym = true;
                        }

                        sd = build_sched_domain(tl, cpu_map, attr, sd, dflags, i);

                        if (tl == sched_domain_topology)
                                *per_cpu_ptr(d.sd, i) = sd;
                        if (tl->flags & SDTL_OVERLAP)
                                sd->flags |= SD_OVERLAP;
                        if (cpumask_equal(cpu_map, sched_domain_span(sd)))
                                break;
                }
        }

        /*
         * Build the groups for the domains
         *
         * Create the scheduling groups.
         *
         * The role of sched_group can be seen from two scheduling domains:
         * 1. the NUMA domain; 2. the LLC (MC) domain.
         *
         * The NUMA sched_domain->span contains all the CPUs in the NUMA
         * domain. When balancing is needed, the NUMA domain should not
         * balance in units of individual CPUs but in units of sockets,
         * i.e. CPUs should be migrated between the two sockets only when
         * socket1 and socket2 are extremely unbalanced. Implementing this
         * abstraction with sched_domain alone would be inflexible (as can
         * be seen with the MC domain below), so the kernel represents a set
         * of CPUs with a sched_group: each socket becomes one sched_group,
         * and migration is allowed only when two sched_groups are out of
         * balance.
         *
         * The MC domain is similar. A CPU may have hyperthreads, and the
         * performance of a hyperthread is not equivalent to that of a
         * physical core; a hyperthread pair offers roughly 1.2x the
         * performance of a single physical core. Scheduling therefore needs
         * to consider balance between hyperthread pairs: balance between
         * physical cores should be satisfied first, and only then
         * hyperthread balance within a core. Here sched_group is used for
         * the abstraction: one sched_group represents one physical core
         * (2 hyperthreads). The MC level then guarantees balance between
         * physical cores, avoiding the extreme case of being balanced
         * between hyperthreads but unbalanced across physical cores; it
         * also guarantees that when selecting a CPU for a wakeup, the
         * kernel prefers idle physical cores and falls back to sibling
         * hyperthreads only when the physical cores are used up, letting
         * the system exploit the CPU's compute capacity more fully.
         */
        for_each_cpu(i, cpu_map) {
                for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
                        sd->span_weight = cpumask_weight(sched_domain_span(sd));
                        if (sd->flags & SD_OVERLAP) {
                                if (build_overlap_sched_groups(sd, i))
                                        goto error;
                        } else {
                                if (build_sched_groups(sd, i))
                                        goto error;
                        }
                }
        }

        /*
         * Calculate CPU capacity for physical packages and nodes
         *
         * sched_group_capacity represents the CPU compute capacity an sg
         * can use.
         *
         * sched_group_capacity takes into account that individual CPUs have
         * different capacities (different maximum frequencies, big.LITTLE
         * cores on ARM, etc.), and it subtracts the capacity consumed by
         * DL/RT processes on the CPUs (sg capacity is prepared for CFS), so
         * what remains is the capacity available to CFS on this sg. Load
         * balancing must consider not only the load on the CPUs but also
         * the CFS capacity available on the sg: if an sg has few tasks but
         * also a small sched_group_capacity, tasks should not be migrated
         * to it.
         */
        for (i = nr_cpumask_bits - 1; i >= 0; i--) {
                if (!cpumask_test_cpu(i, cpu_map))
                        continue;

                for (sd = *per_cpu_ptr(d.sd, i); sd; sd = sd->parent) {
                        claim_allocations(i, sd);
                        init_sched_groups_capacity(i, sd);
                }
        }

        /* Attach the domains */
        rcu_read_lock();
        /*
         * Attach each CPU's rq to the rd (root_domain) and check whether
         * the sds overlap; overlapping domains are removed with
         * destroy_sched_domain(). (This is why Intel servers end up with
         * only three levels of scheduling domains: the DIE domain actually
         * overlaps with the LLC domain and is removed here.)
         */
        for_each_cpu(i, cpu_map) {
                rq = cpu_rq(i);
                sd = *per_cpu_ptr(d.sd, i);

                /* Use READ_ONCE()/WRITE_ONCE() to avoid load/store tearing: */
                if (rq->cpu_capacity_orig > READ_ONCE(d.rd->max_cpu_capacity))
                        WRITE_ONCE(d.rd->max_cpu_capacity, rq->cpu_capacity_orig);

                cpu_attach_domain(sd, d.rd, i);
        }
        rcu_read_unlock();

        if (has_asym)
                static_branch_inc_cpuslocked(&sched_asym_cpucapacity);

        if (rq && sched_debug_enabled) {
                pr_info("root domain span: %*pbl (max cpu_capacity = %lu)\n",
                        cpumask_pr_args(cpu_map), rq->rd->max_cpu_capacity);
        }

        ret = 0;
error:
        __free_domain_allocs(&d, alloc_state, cpu_map);
        return ret;
}
At this point we have built the kernel's scheduling domains, and CFS can use sched_domain to achieve load balancing among multiple cores.
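To check the result on a live machine: on 5.4-era kernels built with CONFIG_SCHED_DEBUG, the domains constructed above are exported under /proc/sys/kernel/sched_domain/ (newer kernels moved this information to debugfs). A small sketch that prints cpu0's domain hierarchy, e.g. MC then NUMA, or SMT/MC/NUMA with hyperthreading:

#include <stdio.h>

int main(void)
{
        char path[128], name[64];
        FILE *f;
        int level;

        for (level = 0; ; level++) {
                snprintf(path, sizeof(path),
                         "/proc/sys/kernel/sched_domain/cpu0/domain%d/name", level);
                f = fopen(path, "r");
                if (!f)
                        break;  /* no higher domain level */
                if (fgets(name, sizeof(name), f))
                        printf("domain%d: %s", level, name);
                fclose(f);
        }
        return 0;
}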