Shulou (Shulou.com) — SLTechnology News & Howtos > Servers — 2025-01-16 update (first published 05/31)
This article walks through Linux process scheduling. The content is concise and easy to follow; I hope you get something out of it.
To support multiple processes, an operating system needs process scheduling. Process scheduling deals with processes in the TASK_RUNNING (runnable) state; a process that is not runnable (because it is sleeping, for example) has little to do with the scheduler.
So if your system load is very low, and a runnable process appears only once in a blue moon, scheduling hardly matters: whenever a process is runnable, run it; there is nothing to think about.
If, on the other hand, the system load is very high, with many processes runnable at all times waiting to be scheduled, then the scheduler must work hard to coordinate their execution. Poor coordination drastically reduces system performance; this is when process scheduling matters a great deal.
Although many computers we usually come into contact with (such as desktop systems, network servers, etc.) have low load, linux, as a general operating system, cannot assume that the system load is low, and must be carefully designed to cope with process scheduling under high load.
Of course, these designs are of little use in environments with low load (and no real-time requirements). In extreme cases, if the load of CPU is always 0 or 1 (there is always only one process or no process to run on CPU), then these designs are basically futile.
Priority
To coordinate the "simultaneous" execution of multiple processes, the most basic tool of a modern operating system is process priority. Once priorities are defined, if several processes are runnable at the same time, the one with the highest priority runs; there is nothing to agonize over.
So, how to determine the priority of the process? There are two ways: specified by the user program and dynamically adjusted by the kernel scheduler. (I'll talk about it below)
The linux kernel divides processes into two classes: ordinary processes and real-time processes. Real-time processes have higher priority than ordinary ones, and their scheduling policies also differ.
Scheduling of real-time processes
Real-time, the original meaning is "a given operation must be completed within a certain time". The point is not how fast the operation must be handled, but that the time should be controllable (in the worst case, you can't break through the given time).
Such "real-time" is called "hard real-time" and is often used in very sophisticated systems (such as rockets, missiles, etc.). Generally speaking, hard real-time systems are relatively dedicated.
General-purpose operating systems such as linux are obviously unable to meet such requirements, and the existence of interrupt handling, virtual memory, and other mechanisms brings great uncertainty to the processing time. Hardware cache, disk seek, bus contention, will also bring uncertainty.
Consider, for example, the C statement "i++;". In most cases it executes very quickly, but in extreme cases the following can still happen:
1. The memory page holding i has not been allocated, and the CPU triggers a page fault. While handling the page fault, linux may fail to allocate memory because the system is short of it, putting the process to sleep.
2. The hardware raises an interrupt during execution; linux enters the interrupt handler and sets the current process aside. New interrupts may arrive while the handler is running, and interrupts can nest. And so on.
A general-purpose operating system like linux that claims to be "real-time" actually implements only "soft real-time": it satisfies a process's real-time needs as best it can.
If a process has real-time needs (it is a real-time process), the kernel keeps it executing as long as it is executable to meet its CPU needs as much as possible until it finishes what it needs to do, and then sleeps or exits (becomes non-executable).
If several real-time processes are runnable, the kernel first satisfies the CPU needs of the highest-priority one until it becomes non-runnable. So as long as a high-priority real-time process stays runnable, lower-priority real-time processes never get the CPU; and as long as any real-time process stays runnable, ordinary processes never get the CPU.
(Later kernels added two parameters, /proc/sys/kernel/sched_rt_runtime_us and /proc/sys/kernel/sched_rt_period_us, which limit real-time processes to at most sched_rt_runtime_us of running time within each sched_rt_period_us window. This leaves ordinary processes a small chance to run even when real-time processes are runnable all the time. See "Analysis of linux Group scheduling".)
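As a back-of-the-envelope illustration (my own sketch, not kernel code), the two parameters define a budget: real-time tasks may consume at most sched_rt_runtime_us out of every sched_rt_period_us microseconds. Assuming the common defaults of 950000/1000000:

```python
# Toy calculation of the real-time throttling budget, assuming the common
# kernel defaults (sched_rt_period_us = 1_000_000, sched_rt_runtime_us = 950_000).
def rt_cpu_share(runtime_us: int, period_us: int) -> float:
    """Fraction of each period that real-time tasks may consume."""
    if runtime_us < 0:          # -1 means "no throttling" in the kernel
        return 1.0
    return runtime_us / period_us

share = rt_cpu_share(950_000, 1_000_000)
print(f"RT tasks may use {share:.0%} of the CPU; "
      f"ordinary tasks keep at least {1 - share:.0%}")
```

With the defaults, ordinary processes are guaranteed at least 5% of each one-second window even under a runaway real-time process.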
What if multiple real-time processes with the same priority are executable? There are two scheduling strategies to choose from:
1. SCHED_FIFO: first in, first out. Later processes are not scheduled until the currently running one becomes non-runnable. Under this policy, the earlier process can call sched_yield to voluntarily give up the CPU to later processes.
2. SCHED_RR: round robin. The kernel assigns time slices to the real-time processes; when a process's slice runs out, the next process gets the CPU.
It is emphasized that these two scheduling strategies are only for situations where multiple real-time processes with the same priority are executable at the same time.
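To make the difference concrete, here is a toy single-CPU simulation (my own sketch, not kernel code) of how the two policies order two equal-priority tasks:

```python
from collections import deque

def schedule(tasks, policy, quantum=2):
    """Toy single-CPU scheduler for equal-priority real-time tasks.
    tasks: list of (name, remaining_time) pairs. Returns the execution order
    as (name, ticks_run) pairs."""
    queue = deque(tasks)
    order = []
    while queue:
        name, remaining = queue.popleft()
        if policy == "FIFO":            # run to completion (or until it yields)
            order.append((name, remaining))
        else:                           # "RR": run one quantum, then requeue
            slice_ = min(quantum, remaining)
            order.append((name, slice_))
            if remaining > slice_:
                queue.append((name, remaining - slice_))
    return order

tasks = [("A", 4), ("B", 4)]
print(schedule(tasks, "FIFO"))  # A runs 4 ticks, then B runs 4
print(schedule(tasks, "RR"))    # A and B alternate in 2-tick slices
```

Under FIFO, B only runs once A is done; under RR, they interleave at quantum granularity.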
Under linux, user programs can set a process's scheduling policy and parameters through the sched_setscheduler system call, while sched_setparam sets only the scheduling parameters. Raising a process's priority requires the CAP_SYS_NICE capability (generally root permission; see articles on capabilities).
The process becomes a real-time process by setting the process's policy to SCHED_FIFO or SCHED_RR. The priority of the process is specified when setting the scheduling parameters through the above two system calls.
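For illustration, Python's os module wraps these system calls on Linux. A minimal sketch (actually becoming SCHED_FIFO needs CAP_SYS_NICE, so it falls back gracefully):

```python
import os

# Sketch of turning the current process into a real-time SCHED_FIFO process
# (Linux-only; requires CAP_SYS_NICE / root, so we catch the failure).
lo = os.sched_get_priority_min(os.SCHED_FIFO)
hi = os.sched_get_priority_max(os.SCHED_FIFO)
print(f"SCHED_FIFO priority range: {lo}..{hi}")

try:
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(lo))
    print("now real-time:", os.sched_getscheduler(0) == os.SCHED_FIFO)
except OSError:
    print("no CAP_SYS_NICE; policy stays", os.sched_getscheduler(0))
```

On a typical Linux box the real-time priority range is 1..99; ordinary SCHED_OTHER processes sit outside that range entirely.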
For real-time processes, the kernel does not try to adjust the priority. Whether a process is real-time, and how real-time it is, depends on the application scenario; only the user can answer that, and the kernel cannot guess.
To sum up, the scheduling of real-time processes is very simple. The priority and scheduling strategy of the process are determined by the user, and the kernel only needs to choose the real-time process with the highest priority to schedule execution. The only slightly more troublesome thing is to consider two scheduling strategies when selecting real-time processes with the same priority.
Scheduling of ordinary processes
The central idea of real-time scheduling is to let the highest-priority runnable real-time process occupy the CPU as much as possible, because it has real-time requirements. Ordinary processes are considered to have no real-time requirements, so the scheduler instead tries to let all runnable ordinary processes share the CPU peacefully, giving users the impression that they run simultaneously.
Compared with real-time processes, scheduling ordinary processes is much more complex. The kernel has to deal with two troublesome issues:
I. dynamically adjust the priority of the process
According to the behavior characteristics of processes, processes can be divided into "interactive processes" and "batch processes":
The main task of interactive processes (such as desktop programs, servers, etc.) is to interact with the outside world. Such processes should have a higher priority, and they always sleep waiting for input from the outside world. When the input arrives and the kernel wakes it up, they should be scheduled and executed quickly to respond. For example, if a desktop program does not respond half a second after the mouse click, the user will feel that the system is "stuck".
The main task of batch processes, such as compilers, is to do continuous operations, so they remain executable. Such processes generally do not need high priority. For example, if the compiler runs for a few more seconds, most users will not care too much about it.
If the user knows what a process's priority should be, it can be set through the nice and setpriority system calls (for non-real-time processes). (Raising a process's priority requires the CAP_SYS_NICE capability.)
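As a small illustration, Python's os.nice wraps the nice system call; raising one's own niceness (i.e. lowering one's own priority) needs no special permission:

```python
import os

# os.nice(increment) adjusts the calling process's niceness (-20..19, where
# higher means lower priority). Becoming "nicer" is unprivileged; lowering
# niceness below the current value requires CAP_SYS_NICE.
before = os.nice(0)      # an increment of 0 just reads the current value
after = os.nice(1)       # be 1 nicer: drop our own priority slightly
print(f"niceness went from {before} to {after}")
```

The return value is the new niceness, clamped to the -20..19 range.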
However, applications are not always as typical as desktop programs and compilers. A program can behave in many ways, acting like an interactive process one moment and a batch process the next, so it is hard for the user to pick one appropriate priority. And even a user who knows perfectly well whether a process is interactive or batch may lack the permission, or simply not bother, to set its priority. (Have you ever set a priority for a program?)
So, in the end, the task of distinguishing between interactive processes and batch processes falls to the kernel scheduler.
The scheduler watches a process's recent behavior (mainly its sleep time and run time), uses empirical formulas to decide whether it currently looks interactive or batch, and how strongly so, and then adjusts its priority accordingly.
After the priority of the process is dynamically adjusted, two priorities appear:
1. The priority set by the user program (if it is not set, the default value is used), which is called static priority. This is the benchmark of the priority of the process, which is often unchanged during the execution of the process.
2. The actual, effective priority after dynamic adjustment. This value may change constantly.
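A toy model of the idea (the formula and numbers are illustrative, not the kernel's actual heuristics): derive an effective priority from the static one by rewarding sleep-heavy, interactive-looking behavior.

```python
def effective_priority(static_prio, sleep_time, run_time):
    """Toy dynamic-priority model (illustrative only, not the kernel's formula).
    Lower value = higher priority. Processes that sleep a lot look interactive
    and earn a bonus; processes that run continuously look batch and are
    penalized."""
    total = sleep_time + run_time
    if total == 0:
        return static_prio
    sleep_ratio = sleep_time / total
    bonus = round((sleep_ratio - 0.5) * 10)   # ranges over -5 .. +5
    return static_prio - bonus

print(effective_priority(120, sleep_time=9, run_time=1))   # interactive: boosted
print(effective_priority(120, sleep_time=1, run_time=9))   # batch: penalized
```

The static priority (120 here, echoing the kernel's default for nice 0) stays fixed; only the effective value moves around it.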
Second, the fairness of scheduling
In a system that supports multiple processes, each process should ideally get a fair share of the CPU according to its priority, with no uncontrolled outcomes like "whoever is lucky gets more".
There are basically two ways to realize fair scheduling in linux:
1. Assign time slices to runnable processes according to priority; a process that uses up its slice is moved to an "expired queue". When all runnable processes have expired, time slices are reassigned.
2. Dynamically adjust the priority of the process. As the process runs on CPU, its priority is continuously lowered so that other processes with lower priorities can be run.
The latter approach has a smaller scheduling granularity and combines "fairness" and "dynamic priority adjustment" into one, which greatly simplifies the code of the kernel scheduler. Therefore, this approach has also become the new favorite of kernel schedulers.
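The second approach is essentially what the CFS ("completely fair") scheduler does with virtual runtime. A toy sketch (my own simplification, not kernel code): always run the task that has received the least priority-weighted CPU so far.

```python
import heapq

def fair_run(tasks, ticks):
    """Toy fair scheduler. tasks: {name: weight}, higher weight = higher
    priority. Returns how many ticks each task actually received."""
    heap = [(0.0, name) for name in tasks]      # (vruntime, task), min-first
    heapq.heapify(heap)
    received = {name: 0 for name in tasks}
    for _ in range(ticks):
        vruntime, name = heapq.heappop(heap)    # least-served task runs next
        received[name] += 1
        # Running one tick advances vruntime inversely to weight, so
        # heavier (higher-priority) tasks accumulate vruntime more slowly
        # and therefore get scheduled more often.
        heapq.heappush(heap, (vruntime + 1.0 / tasks[name], name))
    return received

print(fair_run({"hi": 2, "lo": 1}, ticks=30))
```

With weights 2:1, the CPU ends up split 2:1 (20 ticks vs 10), which is exactly the "continuously lower the runner's priority" idea expressed as arithmetic.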
To emphasize, the above two points apply only to ordinary processes. For real-time processes, the kernel neither adjusts the priority dynamically nor enforces fairness.
The specific scheduling algorithms for ordinary processes are very complex and are constantly changing with the evolution of the linux kernel version (not just simple adjustments), so this article will not go any further. Interested friends can refer to the following link: "Overview of the Development of Linux Scheduler"
Efficiency of scheduler
Priority specifies which process should be scheduled for execution, and the scheduler must also be concerned about efficiency. The scheduler, like many processes in the kernel, is executed frequently, and if inefficient, a lot of CPU time will be wasted, resulting in system performance degradation.
In linux 2.4, runnable processes hung on a single linked list. On every schedule, the scheduler scanned the whole list to find the best process to run, at O(n) complexity.
In early linux 2.6, runnable processes hung on N (N = 140) linked lists, one per priority level, as many lists as the system supported priorities. On every schedule, the scheduler only had to take the process at the head of the first non-empty list. This greatly improved scheduler efficiency, to O(1) complexity.
In recent linux 2.6 versions, runnable processes hang on a red-black tree (think of it as a balanced binary tree) ordered by priority, and the scheduler picks the highest-priority process from the tree at O(log N) complexity.
Why did the complexity of picking a process regress from early 2.6 to recent 2.6?
Because, at the same time, the scheduler's approach to fairness moved from the first idea above to the second (dynamic priority adjustment). The O(1) algorithm rests on a small fixed set of linked lists, which keeps the priority range small and coarse-grained, and that cannot satisfy fairness. A red-black tree places no limit on priority values (they could be 32-bit, 64-bit, or wider), and O(log N) is still very efficient.
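A sketch of the O(1)-era structure, one FIFO per priority level (simplified: the real scheduler finds the first non-empty list via a bitmap rather than a linear scan, and keeps separate active/expired arrays):

```python
from collections import deque

NPRIO = 140  # priorities 0 (highest) .. 139 (lowest), as in early 2.6

class O1Runqueue:
    """Sketch of the O(1) design: one FIFO list per priority level."""
    def __init__(self):
        self.queues = [deque() for _ in range(NPRIO)]

    def enqueue(self, prio, task):
        self.queues[prio].append(task)

    def pick_next(self):
        # Bounded by NPRIO, never by the number of tasks; the kernel
        # replaces this scan with a find-first-set on a 140-bit bitmap.
        for q in self.queues:
            if q:
                return q.popleft()
        return None

rq = O1Runqueue()
rq.enqueue(120, "bash")
rq.enqueue(100, "audio")
rq.enqueue(120, "gcc")
print(rq.pick_next())   # lowest priority number (highest priority) wins
```

Equal-priority tasks come out in FIFO order, which is why this structure pairs naturally with time slices rather than fine-grained fairness.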
Timing of scheduling trigger
Scheduling can be triggered in the following ways:
1. The current process (the process running on CPU) becomes non-executable.
The process makes a system call and voluntarily becomes non-runnable, e.g. calling nanosleep to sleep, or exit to terminate.
A resource the process requested is unavailable, forcing it to sleep. For example, during a read system call the required data is not in the disk cache, so the process sleeps waiting for disk IO.
The process responds to a signal and becomes non-runnable, e.g. stopping on SIGSTOP or exiting on SIGKILL.
2. Preemption. When the process is running, it is unexpectedly deprived of the right to use CPU. This is divided into two situations: the process runs out of time slices, or there is a higher priority process.
A higher-priority process is woken by the one running on the CPU, e.g. by being sent a signal, or because a mutex it was waiting on (such as a lock) is released.
In response to the clock interrupt, the kernel found that the time slices of the current process had been used up.
While handling an interrupt, the kernel finds that an external resource a higher-priority process was waiting for has become available, and wakes it. For example, the CPU receives an interrupt from the network card; the kernel handles it, finds that some socket has become readable, and wakes the process waiting to read that socket. Or, while handling the clock interrupt, the kernel fires a timer and thereby wakes the process sleeping in the corresponding nanosleep call.
Other questions
1. Kernel preemption
Ideally, as soon as a higher-priority process becomes runnable, the current process should be preempted immediately. But just as multithreaded programs need locks to protect critical sections, the kernel is full of critical sections of its own and cannot allow preemption at arbitrary points.
The design of linux 2.4 was very simple: the kernel did not support preemption. While a process ran in kernel mode (executing a system call, or inside an exception handler), preemption was not allowed; scheduling could only be triggered on the way back to user mode (to be exact, just before returning, the kernel checks whether a reschedule is needed).
Linux 2.6 implements kernel preemption, but in many places it is necessary to temporarily disable kernel preemption in order to protect critical resources.
There are also places where preemption is disabled for efficiency reasons, typically spin_lock. A spin_lock is a lock in which, if the request for a lock is not satisfied (the lock is already occupied by another process), the current process detects the state of the lock in a dead loop until the lock is released.
Why busy-wait? Because the critical section is tiny, perhaps protecting nothing more than the single statement "i++;". Going through a full sleep-and-wake cycle on every failed lock attempt would cost more than it saves. So, given that the current process spins rather than sleeps, who releases the lock? The process holding the lock is running on another CPU with kernel preemption disabled; it cannot be preempted, so it will soon finish and release the lock. (What if there is only one CPU? Then there cannot be a process spinning on the lock in the first place.)
And what if kernel preemption were not disabled? Then the lock holder could be preempted and might not release the lock for a long time, and the process spinning on the lock could be stuck for who knows how long.
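A toy user-space model of a spinlock (hedged: user space cannot disable preemption, and CPython's GIL makes this purely illustrative of the acquire/spin/release shape, not of real contention behavior):

```python
import threading

class SpinLock:
    """Toy spinlock: on contention, busy-wait instead of sleeping."""
    def __init__(self):
        self._flag = threading.Lock()   # used only as an atomic test-and-set
    def acquire(self):
        while not self._flag.acquire(blocking=False):
            pass                        # spin: burn CPU until the lock frees
    def release(self):
        self._flag.release()

counter = 0
lock = SpinLock()

def bump(n):
    global counter
    for _ in range(n):
        lock.acquire()
        counter += 1                    # tiny critical section, like "i++;"
        lock.release()

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)   # 40000: the spinlock kept the increments race-free
```

The point of the trade-off in the text: this only pays off when the critical section is far shorter than a sleep-and-wake round trip, and when the holder is guaranteed to keep running.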
For systems with stricter real-time requirements, things like spin_lock are intolerable: better to switch to the more laborious sleep-and-wake path than to keep a higher-priority process waiting because preemption is disabled. The embedded real-time linux from MontaVista, for example, does exactly this.
Thus it can be seen that real-time does not represent efficiency. In many cases, in order to achieve "real-time", we still need to make some concessions to the performance.
2. Load balancing under multiprocessor
We didn't specifically discuss the impact of multiprocessors on the scheduler, but there's nothing special about having multiple processes running in parallel at the same time. So why is there such a thing as multiprocessor load balancing?
If there is only one executable queue in the system, which CPU is idle will go to the queue to find the most appropriate process to execute. Isn't that good and balanced?
That's true, but there are some problems with multiprocessors sharing an executable queue. Obviously, each CPU needs to lock up the queue when executing the scheduler, which makes it difficult for the scheduler to run in parallel and may lead to system performance degradation. There is no such problem if each CPU corresponds to an executable queue.
In addition, there is another advantage of multiple executable queues. This makes a process always execute on the same CPU for a period of time, so it is likely that the data of this process is cached in all levels of cache of this CPU, which is conducive to the improvement of system performance.
Therefore, under linux, each CPU has a corresponding executable queue, and a process in an executable state can only be in one executable queue at a time.
Hence the hassle of multiprocessor load balancing: the kernel must watch the number of processes on each CPU's runqueue and adjust when the numbers become uneven. When to adjust, and which processes to migrate, are things the kernel must decide; of course, it avoids adjusting when it can, since migration costs CPU time and requires locking the runqueues.
The kernel must also care about the relationships between CPUs. Two CPUs may be completely independent, may share a cache, or may even be two hyperthreads of the same physical CPU. These relationships are an important input to load balancing: the closer two CPUs are, the cheaper it is to migrate processes between them. See "linux Kernel SMP load balancing Analysis".
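A toy balancer over per-CPU runqueues (ignoring priorities, cache topology, and migration cost, all of which the real kernel weighs):

```python
def balance(runqueues):
    """Toy load balancer: migrate one task at a time from the busiest CPU's
    queue to the idlest, until queue lengths differ by at most 1."""
    while True:
        busiest = max(runqueues, key=len)
        idlest = min(runqueues, key=len)
        if len(busiest) - len(idlest) <= 1:
            return runqueues
        idlest.append(busiest.pop())    # migrate one task

cpus = [["a", "b", "c", "d", "e"], [], ["f"]]
print(balance(cpus))   # all three queues end up with 2 tasks each
```

Even this trivial version shows why balancing is not free: every migration touches two queues, which in the kernel means taking two runqueue locks.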
3. Priority inheritance
Because of mutual exclusion, a process (call it A) may sleep while waiting to enter a critical section, and it is not woken until the process occupying the corresponding resource (call it B) leaves the critical section.
It can happen that A has a very high priority and B a very low one. B has entered the critical section but is preempted by other, higher-priority processes (call one C); without running, B cannot leave the critical section, so A cannot be woken.
A has high priority, yet it is now dragged down to B's level and effectively preempted by C, whose priority is not even high, delaying its execution. This phenomenon is called priority inversion.
This is clearly unreasonable. A better response: when A starts waiting for B to leave the critical section, B temporarily inherits A's priority (assuming A's priority is higher than B's), so that B can promptly finish its work and leave the critical section; afterwards B's priority is restored. This is priority inheritance.
The kernel has to do a lot of things in order to achieve priority inheritance. For more details, refer to the article on "priority inversion" or "priority inheritance".
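A minimal model of the idea for a single lock (real implementations, such as Linux rt-mutexes, must also propagate the boost through chains of locks; this sketch does not):

```python
class Task:
    def __init__(self, name, prio):
        # base_prio is the task's own priority; prio is the effective one
        self.name, self.base_prio, self.prio = name, prio, prio

class PiLock:
    """Toy priority-inheriting lock (higher number = higher priority here)."""
    def __init__(self):
        self.owner = None
    def acquire(self, task):
        if self.owner is None:
            self.owner = task
            return True
        # Contended: boost the owner to the waiter's priority if it is higher,
        # so a mid-priority process cannot preempt the owner indefinitely.
        if task.prio > self.owner.prio:
            self.owner.prio = task.prio
        return False
    def release(self):
        self.owner.prio = self.owner.base_prio   # drop the inherited boost
        self.owner = None

A, B = Task("A", prio=90), Task("B", prio=10)
lock = PiLock()
lock.acquire(B)      # low-priority B enters the critical section
lock.acquire(A)      # A blocks; B inherits A's priority
print(B.prio)        # 90: B now outranks any mid-priority C
lock.release()
print(B.prio)        # 10: B's own priority is restored
```

While boosted, B outranks the hypothetical mid-priority C from the text, so it can finish the critical section and unblock A.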
4. Interrupt processing threading
Under linux, interrupt handlers run in a context that cannot be scheduled. From the moment the CPU responds to a hardware interrupt and jumps to the kernel's interrupt handler, until the handler exits, the whole sequence cannot be preempted.
A preempted process can resume later through the information saved in its process control block (task_struct). An interrupt context has no task_struct, so if it were preempted it could not be resumed.
The interrupt handler cannot be preempted, which means that the interrupt handler has a higher "priority" than any process (the process cannot be executed until the interrupt handler is completed). But in real-world application scenarios, it is possible that some real-time processes should be given higher priority than interrupt handlers.
Therefore, some systems with higher real-time requirements give interrupt handlers task_struct and priority, so that they can be preempted by high-priority processes when necessary. But obviously, doing this work will cause some overhead to the system, which is also a concession to performance in order to achieve "real-time".