How to understand process scheduling in Linux system

2025-01-16 Update From: SLTechnology News&Howtos > Servers


This article explains how to understand process scheduling in the Linux system. The content is simple and clear and easy to learn; follow along to study how Linux schedules processes.

In order to realize multi-process in the operating system, process scheduling is essential.

Some people say that process scheduling is the most important part of the operating system. I think that statement is a bit too absolute; like the oft-heard claim that "function X is N times more efficient than function Y", such conclusions are one-sided once divorced from the actual environment.

How important is process scheduling? First of all, we need to be clear: process scheduling is the scheduling of processes in TASK_RUNNING state (see "linux process State Analysis"). If a process is not executable (sleeping or otherwise), it has little to do with process scheduling.

So if your system load is very low, and an executable process appears only once in a blue moon, then process scheduling matters little: whenever a process is executable, just let it run, and there is nothing to think about.

On the other hand, if the system load is very high, there are more than N processes in the executable state all the time, waiting to be scheduled to run. Then the process scheduler must do a lot of work in order to coordinate the execution of these N processes. If the system is not well coordinated, the performance of the system will be greatly reduced. At this time, process scheduling is very important.

Although many computers we usually come into contact with (such as desktop systems, network servers, etc.) have low load, linux, as a general operating system, cannot assume that the system load is low, and must be carefully designed to cope with process scheduling under high load.

Of course, these designs are of little use in environments with low load (and no real-time requirements). In extreme cases, if the load of CPU is always 0 or 1 (there is always only one process or no process to run on CPU), then these designs are basically futile.

Priority

To coordinate the "simultaneous" running of multiple processes, the most basic means of today's operating systems is to define process priorities. Once priorities are defined, if multiple processes are executable at the same time, the one with the highest priority runs, and there is nothing to agonize over.

So, how to determine the priority of the process? There are two ways: specified by the user program and dynamically adjusted by the kernel scheduler. (I'll talk about it below)

The linux kernel divides processes into two classes: ordinary processes and real-time processes. Real-time processes have higher priority than ordinary processes, and their scheduling policies also differ.

Scheduling of real-time processes

Real-time, in its original sense, means "a given operation must be completed within a given time". The point is not how fast the operation is handled, but that the time must be controllable (even in the worst case it must not exceed the given bound).

Such "real-time" is called "hard real-time" and is often used in very sophisticated systems (such as rockets, missiles, etc.). Generally speaking, hard real-time systems are relatively dedicated.

General-purpose operating systems such as linux obviously cannot meet such requirements: interrupt handling, virtual memory, and other mechanisms introduce great uncertainty into processing time. Hardware caches, disk seeks, and bus contention add even more.

Consider, for example, the C statement "i++;". In most cases it executes very quickly, but in extreme cases the following can still happen:

1. The memory page holding i has not been allocated, and the CPU triggers a page fault exception. If, while handling the page fault, linux tries to allocate memory and the allocation fails because system memory is short, the process may be put to sleep.

2. A hardware interrupt fires during execution, and linux enters the interrupt handler, shelving the current process. New hardware interrupts may arrive while the handler runs, and interrupts can nest.

And so on...

General-purpose operating systems such as linux that claim to be "real-time" actually implement only "soft real-time": they satisfy a process's real-time needs as best they can.

If a process has real-time needs (it is a real-time process), the kernel keeps it executing as long as it is executable to meet its CPU needs as much as possible until it finishes what it needs to do, and then sleeps or exits (becomes non-executable).

If there are multiple real-time processes in the executable state, the kernel will first meet the CPU needs of the highest priority real-time process until it becomes non-executable.

Therefore, as long as a higher-priority real-time process is executable, lower-priority real-time processes cannot get the CPU; and as long as any real-time process is executable, ordinary processes cannot get the CPU.

What if multiple real-time processes with the same priority are executable? There are two scheduling strategies to choose from:

1. SCHED_FIFO: first in, first out. A later process is not scheduled until the process that ran first becomes unexecutable. Under this policy, the first-come process can execute the sched_yield system call to voluntarily give up the CPU and yield to later processes.

2. SCHED_RR: round-robin scheduling. The kernel allocates time slices to these real-time processes; when a process's time slice is used up, the next process gets the CPU.

To emphasize: both scheduling policies and the sched_yield system call only matter when multiple real-time processes of the same priority are executable at the same time.
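Each policy accepts a particular range of static priorities, which can be queried with the sched_get_priority_min/max system calls. A minimal sketch using Python's os-module wrappers around these Linux calls (assumes a Linux system):

```python
import os

# Query the static priority range that each scheduling policy accepts.
for name in ("SCHED_OTHER", "SCHED_FIFO", "SCHED_RR"):
    policy = getattr(os, name)
    lo = os.sched_get_priority_min(policy)
    hi = os.sched_get_priority_max(policy)
    print(f"{name}: static priority range {lo}..{hi}")
```

On Linux the real-time policies report 1..99, while SCHED_OTHER reports 0..0, since ordinary processes are distinguished by nice value rather than static priority.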

Under linux, a user program can set a process's scheduling policy and related scheduling parameters through the sched_setscheduler system call, while the sched_setparam system call sets only the scheduling parameters. Both system calls require that the user process have the capability to set process priorities (CAP_SYS_NICE, which generally requires root privileges) (see the article on capabilities).

The process becomes a real-time process by setting the process's policy to SCHED_FIFO or SCHED_RR. The priority of the process is specified when setting the scheduling parameters through the above two system calls.
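As a sketch of the calls just described, here is a hedged example using Python's os-module wrappers (the helper names are made up; switching to SCHED_FIFO needs CAP_SYS_NICE, so an unprivileged run is expected to fail with EPERM):

```python
import os

def current_policy(pid=0):
    """Return the name of a process's scheduling policy (Linux)."""
    names = {os.SCHED_OTHER: "SCHED_OTHER",
             os.SCHED_FIFO: "SCHED_FIFO",
             os.SCHED_RR: "SCHED_RR"}
    return names.get(os.sched_getscheduler(pid), "other")

def try_make_realtime(rt_priority=1):
    """Attempt to switch the calling process to SCHED_FIFO.

    Without CAP_SYS_NICE the kernel refuses with EPERM."""
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO,
                              os.sched_param(rt_priority))
        return True
    except PermissionError:
        return False

print(current_policy(), try_make_realtime())
```

Typically this prints `SCHED_OTHER False` for an unprivileged process; running as root, the second value becomes True and the process is now a real-time process.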

For real-time processes, the kernel does not attempt to adjust their priority. Whether a process is real-time, and how real-time it must be, depends on the application scenario; only the user can answer that, and the kernel cannot guess.

To sum up, the scheduling of real-time processes is very simple. The priority and scheduling strategy of the process are determined by the user, and the kernel only needs to choose the real-time process with the highest priority to schedule execution. The only slightly more troublesome thing is to consider two scheduling strategies when selecting real-time processes with the same priority.

Scheduling of ordinary processes

The central idea of real-time process scheduling is to make the real-time process with the highest priority in the executable state occupy CPU as much as possible because it has real-time requirements, while ordinary processes are considered to be processes without real-time requirements, so the scheduler tries to make all ordinary processes in executable state coexist peacefully to share CPU, so as to make users feel that these processes are running at the same time.

Compared with real-time processes, scheduling ordinary processes is much more complicated. The kernel has to handle two troublesome issues:

1. Dynamically adjusting process priority

According to the behavior characteristics of processes, processes can be divided into "interactive processes" and "batch processes":

The main task of interactive processes (such as desktop programs, servers, etc.) is to interact with the outside world. Such processes should have a higher priority. They spend most of their time sleeping, waiting for external input, and when input arrives and the kernel wakes them, they should be scheduled quickly to respond. For example, if a desktop program has not responded half a second after a mouse click, the user will feel the system is "stuck".

The main task of batch processes, such as compilers, is to do continuous operations, so they remain executable. Such processes generally do not need high priority. For example, if the compiler runs for a few more seconds, most users will not care too much about it.

If the user knows clearly what a process's priority should be, it can be set through the nice and setpriority system calls. (Raising a process's priority requires that the user process have the CAP_SYS_NICE capability.)
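A process may always lower its own priority (raise its nice value) without privilege; only raising it back requires CAP_SYS_NICE. A small sketch with Python's wrappers around getpriority/setpriority:

```python
import os

# Read the current nice value, then lower our priority by one step.
# Lowering priority (raising nice) needs no special privilege.
before = os.getpriority(os.PRIO_PROCESS, 0)
target = min(before + 1, 19)          # nice values are capped at 19
os.setpriority(os.PRIO_PROCESS, 0, target)
after = os.getpriority(os.PRIO_PROCESS, 0)
print(f"nice value: {before} -> {after}")
```

Attempting `os.setpriority(os.PRIO_PROCESS, 0, before)` afterwards (raising priority back) would raise PermissionError for an unprivileged process.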

However, applications are not all as typical as desktop programs and compilers. A program's behavior can vary, acting like an interactive process one moment and a batch process the next, which makes it hard for the user to set an appropriate priority.

Furthermore, even if the user clearly knows whether a process is interactive or batch, permissions may be lacking, or the user may simply be too lazy to set the priority. (Have you ever set a priority for a program?)

So, in the end, the task of distinguishing between interactive processes and batch processes falls to the kernel scheduler.

The scheduler watches a process's recent behavior (mainly its sleep time and running time), uses some empirical formulas to judge whether it is currently interactive or batch, and to what degree, and finally adjusts its priority accordingly.

After the priority of the process is dynamically adjusted, two priorities appear:

1. The priority set by the user program (or a default if none is set), called the static priority. This is the baseline of the process's priority and usually does not change while the process runs.

2. The actual effective priority after the priority is dynamically adjusted. This value may be changing all the time.
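As an illustration of how the static priority relates to the nice value: in the kernel's internal numbering, the 100 real-time priorities come first, and a normal task with nice 0 sits at 120. The helper below is a sketch of that conventional mapping, not kernel code:

```python
def static_prio(nice):
    """Map a nice value (-20..19) to the kernel's internal static
    priority: real-time priorities occupy 0..99, so normal tasks
    fall in 100..139, with nice 0 at 120."""
    assert -20 <= nice <= 19, "nice value out of range"
    return 120 + nice
```

So the most favorable nice value (-20) maps to 100, just below the real-time range, and the least favorable (19) maps to 139.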

2. Fairness of scheduling

In a system that supports multiple processes, ideally each process should get its fair share of the CPU according to its priority, with no uncontrollable outcomes like "whoever is lucky gets more".

There are basically two ways to realize fair scheduling in linux:

1. Assign time slices to executable processes (according to priority); a process that has used up its time slice is moved to an "expired queue". When all executable processes have expired, time slices are reassigned.

2. Dynamically adjust the priority of the process. As the process runs on CPU, its priority is continuously lowered so that other processes with lower priorities can be run.

The latter approach has a smaller scheduling granularity and combines "fairness" and "dynamic priority adjustment" into one, which greatly simplifies the code of the kernel scheduler. Therefore, this approach has also become the new favorite of kernel schedulers.
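The second approach can be modeled with a toy "virtual runtime" scheduler: each runnable task accumulates runtime scaled by its weight, and the task with the smallest accumulated value runs next. This is a simplification of the idea, not the kernel's actual algorithm, and the names are made up:

```python
import heapq

def fair_schedule(weights, quantum, steps):
    """Toy fair scheduler: always run the task with the least
    accumulated weighted runtime; heavier tasks accumulate it
    more slowly, so they get proportionally more CPU."""
    heap = [(0.0, name) for name in sorted(weights)]
    heapq.heapify(heap)
    order = []
    for _ in range(steps):
        vruntime, name = heapq.heappop(heap)
        order.append(name)
        # Charge this quantum, scaled down by the task's weight.
        heapq.heappush(heap, (vruntime + quantum / weights[name], name))
    return order

print(fair_schedule({"a": 2, "b": 1}, 1.0, 6))
```

With equal weights the tasks alternate; with weight 2 vs 1, task "a" receives two thirds of the slots, which is exactly the fairness-by-dynamic-adjustment behavior described above.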

To emphasize: the two points above apply only to ordinary processes. For real-time processes, the kernel neither adjusts priority dynamically nor enforces fairness.

The specific scheduling algorithms for ordinary processes are very complex and are constantly changing with the evolution of the linux kernel version (not just simple adjustments), so this article will not go any further.

Efficiency of scheduler

Priority specifies which process should be scheduled for execution, and the scheduler must also be concerned about efficiency. The scheduler, like many processes in the kernel, is executed frequently, and if inefficient, a lot of CPU time will be wasted, resulting in system performance degradation.

In linux 2.4, executable processes hang on a single linked list. On each schedule, the scheduler scans the entire list to find the optimal process to run, so the complexity is O(n).

In early linux 2.6, executable processes hang on N (N = 140) linked lists, one per priority; there are as many lists as the system supports priorities. On each schedule, the scheduler only needs to take the process at the head of the first non-empty list. This greatly improves the scheduler's efficiency, with complexity O(1).

In recent linux 2.6 versions, executable processes hang on a red-black tree (which can be thought of as a balanced binary tree) ordered by priority. On each schedule, the scheduler finds the highest-priority process in the tree, with complexity O(logN).

Why, then, did the complexity of selecting a process increase from early linux 2.6 to recent linux 2.6?

This is because the scheduler's implementation of fairness meanwhile shifted from the first approach above to the second (dynamic priority adjustment). The O(1) algorithm relies on a small, fixed number of linked lists, which keeps the range of priority values small (low discrimination) and cannot meet the needs of fairness. A red-black tree places no restriction on priority values (they can be represented with 32 bits, 64 bits, or more), and O(logN) is still quite efficient.

Timing of scheduling trigger

Scheduling can be triggered in the following ways:

1. The current process (the process running on CPU) becomes non-executable.

The process executes a system call and actively becomes non-executable: for example, calling nanosleep to go to sleep, or calling exit to terminate.

A resource the process requests is unavailable, forcing it to sleep. For example, during a read system call the required data is not in the disk cache, so the process sleeps waiting for disk IO.

The process responds to a signal and becomes non-executable: for example, responding to SIGSTOP by entering the stopped state, or responding to SIGKILL by exiting.

2. Preemption. While running, the process is involuntarily deprived of the CPU. This happens in two situations: the process has used up its time slice, or a higher-priority process has appeared.

A higher-priority process is awakened by the process running on the CPU, for example by being sent a signal, or because a mutex it was waiting on is released.

While responding to the clock interrupt, the kernel finds that the current process's time slice is used up.

While handling an interrupt, the kernel finds that a resource a higher-priority process was waiting for has become available and wakes that process up. For example, the CPU receives an interrupt from the network card; while handling it, the kernel finds that a certain socket has become readable and wakes the process waiting to read that socket. Or, while handling the clock interrupt, the kernel fires a timer and wakes the corresponding process sleeping in the nanosleep system call.

When all tasks adopt the linux time-sharing scheduling policy:

1. The task is created with the time-sharing policy and a priority given by a nice value (-20 to 19).

2. The task's execution time on the cpu (counter) is determined from its nice value.

3. If the task is not waiting for any resource, it is added to the ready queue.

4. The scheduler traverses the ready queue, computes each task's dynamic priority weight (counter + 20 - nice), and selects the task with the largest result. When the time slice is used up (counter drops to 0) or the task gives up the cpu voluntarily, the task is placed at the end of the ready queue (time slice used up) or the waiting queue (cpu given up while waiting for a resource).

5. The scheduler then repeats the calculation above, returning to step 4.

6. When the scheduler finds that the weights computed for all ready tasks are less than 0, it repeats step 2 (reassigning time slices).
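Steps 2-6 above can be sketched as a toy loop. The counter refill rule `20 - nice` is an assumption chosen to be consistent with the weight formula; this is a model of the described algorithm, not kernel code:

```python
def schedule_step(tasks):
    """One pick of the classic time-sharing loop described above.

    tasks: list of dicts with 'name', 'nice' and 'counter' keys."""
    runnable = [t for t in tasks if t["counter"] > 0]
    if not runnable:
        # Every slice has expired: refill counters from nice (step 6).
        for t in tasks:
            t["counter"] = 20 - t["nice"]
        runnable = tasks
    # weight = counter + 20 - nice (step 4): pick the largest.
    chosen = max(runnable, key=lambda t: t["counter"] + 20 - t["nice"])
    chosen["counter"] -= 1
    return chosen["name"]

tasks = [{"name": "a", "nice": 0, "counter": 2},
         {"name": "b", "nice": 10, "counter": 2}]
print([schedule_step(tasks) for _ in range(5)])
```

The nicer task "b" only runs once "a" has exhausted its slice, and after both expire the refill gives "a" the larger counter again.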

When all tasks use the FIFO policy:

1. The process is created with the FIFO policy and a real-time priority rt_priority (1 to 99).

2. If the task is not waiting for any resource, it is added to the ready queue.

3. The scheduler traverses the ready queue, computes each task's scheduling weight (1000 + rt_priority) from its real-time priority, and selects the task with the highest weight to use the cpu. The FIFO task occupies the cpu until a higher-priority task becomes ready (a task of merely equal priority does not preempt it) or it gives up the cpu voluntarily (to wait for a resource).

4. When the scheduler finds that a higher-priority task has arrived (it may have been awakened by an interrupt, by a timer task, by the currently running task, and so on), it immediately saves the cpu registers on the current task's stack, reloads the registers from the high-priority task's stack, and the high-priority task starts running. Repeat step 3.

5. If the current task gives up the cpu because it is waiting for a resource, it is removed from the ready queue and added to the waiting queue. Repeat step 3.

When all tasks adopt the RR scheduling policy:

1. The task is created with the RR policy, a real-time priority, and a nice value (the nice value is converted into the length of the task's time slice).

2. If the task is not waiting for any resource, it is added to the ready queue.

3. The scheduler traverses the ready queue, computes each task's scheduling weight (1000 + rt_priority) from its real-time priority, and selects the task with the highest weight to use the cpu.

4. When an RR task's time slice reaches 0, its time slice is reset according to its nice value and the task is placed at the end of the ready queue. Repeat step 3.

5. If the current task gives up the cpu because it is waiting for a resource, it joins the waiting queue. Repeat step 3.
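The rotation in step 4 can be modeled with a simple queue (equal priorities assumed; a sketch of the described behavior, not kernel code):

```python
from collections import deque

def rr_timeline(names, slice_len, steps):
    """Round-robin among equal-priority tasks: the head of the ready
    queue runs for its whole time slice, then moves to the back."""
    queue = deque(names)
    timeline = []
    while len(timeline) < steps:
        running = queue[0]
        # The running task consumes up to one full slice.
        for _ in range(min(slice_len, steps - len(timeline))):
            timeline.append(running)
        queue.rotate(-1)  # expired task goes to the end of the queue
    return timeline

print(rr_timeline(["a", "b"], 2, 6))
```

This produces the alternating pattern a, a, b, b, a, a: each task runs for a full slice before rotating to the back, which is exactly the fairness guarantee RR adds over FIFO for equal priorities.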

The system can mix time-sharing scheduling, time-slice round-robin scheduling, and first-in-first-out scheduling:

1. Processes scheduled by FIFO and RR are real-time processes; processes scheduled by time-sharing are non-real-time processes.

2. When a real-time process becomes ready, if the cpu is running a non-real-time process, the real-time process preempts it immediately.

3. Real-time priority serves as the scheduling weight for both RR and FIFO processes, and RR is an extension of FIFO. Under FIFO, if two processes have the same priority, which one runs is determined by their position in the queue, which causes some unfairness (if the priorities are the same, why should one of them keep running?). Setting the scheduling policy of two equal-priority tasks to RR guarantees that the two execute in turn, fairly.

Ingo Molnar's Real-time Patch

In order to integrate into the mainstream kernel, Ingo Molnar's real-time patch also adopts a very flexible strategy, which supports four preemption modes:

1. No Forced Preemption (Server): equivalent to a standard kernel with preemption options disabled; mainly suitable for server environments such as scientific computing.

2. Voluntary Kernel Preemption (Desktop): enables voluntary preemption but still not the preemptible-kernel option. It reduces preemption latency by adding preemption points, so it suits environments that need better responsiveness, such as desktops, at the cost of some throughput.

3. Preemptible Kernel (Low-Latency Desktop): includes voluntary preemption and also enables the preemptible-kernel option, so it has good response latency and in fact achieves soft real-time to some extent. It mainly suits desktops and some embedded systems, but throughput is lower than in mode 2.

4. Complete Preemption (Real-Time): enables all real-time features and thus fully meets soft real-time needs. It suits real-time systems with latency requirements of 100 microseconds or less.

Thank you for reading; that concludes "how to understand process scheduling in the Linux system". Hopefully this article has deepened your understanding of process scheduling in Linux; concrete usage still needs to be verified in practice.
