Detailed explanation of the kernel preemption mechanism in Linux

2025-07-12 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the kernel preemption mechanism in Linux: what it is, why it is needed, when it must be forbidden, and how the kernel implements it.

1. Overview of kernel preemption

Kernel preemption, new in the 2.6 kernel, means that when a higher-priority task becomes runnable while the current process is executing in kernel space, the kernel may suspend the current task and run the higher-priority one, provided preemption is allowed at that moment.

Before version 2.5.4, the Linux kernel was not preemptible: a high-priority process could not interrupt a low-priority process running in the kernel and take over the CPU. Once a process entered kernel mode (for example, a user process making a system call), it ran until it finished or left the kernel, unless it voluntarily gave up the CPU. A preemptible Linux kernel, by contrast, allows preemption in kernel space as well as in user space. When a high-priority process becomes runnable, the preemptible kernel schedules it regardless of whether the current process is in user mode or kernel mode.

2. User preemption

When the kernel is about to return to user space, it checks the need_resched flag; if the flag is set, schedule() is called and user preemption occurs. At the point of returning to user space the kernel knows it is in a safe state, so it is just as safe to pick another task as to continue the current one. The kernel therefore checks need_resched both when returning from an interrupt handler and when returning from a system call. If the flag is set, the kernel selects another, more suitable process to run.

In short, user preemption occurs when:

When returning to user space from a system call.

When returning to user space from an interrupt handler.

3. Characteristics of a non-preemptive kernel

In a kernel that does not support kernel preemption, kernel code runs until it completes. That is, the scheduler cannot reschedule a task while it is executing in the kernel: kernel tasks are scheduled cooperatively, not preemptively. A process running in kernel mode can, of course, give up the CPU voluntarily. For example, in a system call service routine, kernel code gives up the CPU while it waits for a resource; this is called a planned process switch. Kernel code therefore runs until it completes (returns to user space) or explicitly blocks.

On a single CPU, this design greatly simplifies the kernel's synchronization and protection mechanisms. The reasoning has two steps:

First, ignore the case where a process voluntarily gives up the CPU inside the kernel (that is, assume no process switch happens in the kernel). Once a process enters the kernel, it runs until it finishes or leaves the kernel, and no other process enters the kernel before then. Execution in the kernel is therefore serial: multiple processes can never be running in the kernel at the same time. Kernel code does not have to deal with concurrency between processes, and kernel developers do not have to arrange mutually exclusive access to critical resources. When a process reads or modifies a kernel data structure, it does not need a lock to keep other processes out of the critical section. Only interrupts need to be considered: an interrupt handler may access the same data structure the process is working on. In that case the process only has to disable interrupts before entering the critical section and re-enable them when leaving it.

Now consider the case where the process voluntarily gives up the CPU. Because giving up the CPU is voluntary and explicit, every process switch in the kernel is known in advance; a switch never happens without the code being aware of it. Concurrency between processes therefore only has to be considered at the points where a process switch can occur, not throughout the kernel.

4. Why do you need kernel preemption?

Implementing kernel preemption matters a great deal for Linux. First, it is necessary for using Linux in real-time systems. A real-time system places strict limits on response time: when a real-time process is woken by a hardware interrupt from a real-time device, it must be scheduled within a bounded time. A non-preemptive Linux kernel cannot meet this requirement, because the kernel cannot be preempted and there is no bound on how long the system stays in the kernel. While the kernel is executing a system call, the real-time process cannot be scheduled until the process running in the kernel leaves it, and the resulting response delay can reach the order of 100 ms on today's hardware.

This is unacceptable for systems that need fast real-time response. A preemptible kernel is important not only for real-time use of Linux; it also addresses Linux's weak support for multimedia (video, audio) and other applications that need low latency.

Because a preemptible kernel is so important, preemption was merged into the kernel in Linux 2.5.4 and, like SMP, became a standard configurable kernel option.

5. When kernel preemption is not allowed

A preemptible Linux kernel can be preempted at any point except in the following situations:

The kernel is handling an interrupt. In the Linux kernel, a process cannot preempt an interrupt (an interrupt can only be interrupted by another interrupt, never by a process), and process scheduling is not allowed in interrupt routines. The scheduling function schedule() checks for this and prints an error message if it is called from an interrupt.

The kernel is running bottom-half processing in interrupt context. Softirqs run before the hardware interrupt returns, so they are still in interrupt context.

Kernel code is holding a lock such as a spinlock or a read-write lock and is inside the region that lock protects. These locks exist to keep short critical sections correct when processes run concurrently on different CPUs in an SMP system. The kernel must not be preempted while holding one; otherwise other CPUs could be unable to acquire the lock for a long time and would appear to hang.

The kernel is executing the scheduler itself. Preemption exists to trigger a new scheduling decision, so there is no reason to preempt the scheduler only to run the scheduler again.

The kernel is operating on per-CPU data structures. On SMP, per-CPU data is not protected by spinlocks because it is protected implicitly: each CPU has its own copy, and a process running on one CPU does not use another CPU's copy. If preemption were allowed, however, a preempted process could be rescheduled onto a different CPU, and the per-CPU variable it was using would no longer be the right one. Preemption must therefore be disabled while per-CPU data is in use.

To make sure the kernel is not preempted in the situations above, the preemptible kernel uses a counter, preempt_count, known as the kernel preemption lock. It is kept per task, alongside the process descriptor task_struct (the 2.6 code later in this article reads it from the task's thread_info structure). Whenever the kernel enters one of these states, preempt_count is incremented by 1 to indicate that preemption is not allowed. Whenever the kernel leaves one of these states, preempt_count is decremented by 1, and the kernel checks whether it should now preempt and reschedule.

When returning to kernel space from an interrupt, the kernel checks both need_resched and preempt_count. If need_resched is set and preempt_count is 0, a more important task may be waiting and preemption is safe, so the scheduler is called. If preempt_count is not 0, the kernel is in a non-preemptible state and cannot reschedule; the interrupt simply returns to the current process as usual. Once the current process has released all the locks it holds, preempt_count drops back to 0. At that point the code that releases the lock checks whether need_resched is set and, if so, calls the scheduler.
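The interrupt-return check described above can be modelled in user space. The following is a minimal sketch, not kernel code: the struct `task_model` and the function `irq_return_should_schedule` are hypothetical names invented for illustration, and only the fields `need_resched` and `preempt_count` come from the text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model of the per-task state described in the text. */
struct task_model {
    bool need_resched;   /* set when a more suitable task is runnable */
    int  preempt_count;  /* greater than 0 while preemption is forbidden */
};

/*
 * Decision taken when an interrupt handler returns.
 * Returning to user space: only need_resched matters.
 * Returning to kernel space: preempt_count must also be 0.
 */
static bool irq_return_should_schedule(const struct task_model *t,
                                       bool returning_to_user)
{
    assert(t->preempt_count >= 0);
    if (!t->need_resched)
        return false;
    if (returning_to_user)
        return true;
    return t->preempt_count == 0;
}
```

With `need_resched` set and `preempt_count` at 1, the function returns false for a return to kernel space and true for a return to user space, which is the difference between user preemption and kernel preemption.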

6. When kernel preemption can occur

Version 2.6 introduced kernel preemption: as long as rescheduling is safe, the kernel can preempt the running task at any time.

So when is rescheduling safe? Whenever preempt_count is 0, the kernel can be preempted. Locks and interrupts are the usual markers of non-preemptible regions. Because the kernel already supports SMP, code that holds no lock is reentrant, and therefore preemptible.

If a process in the kernel blocks, or explicitly calls schedule(), kernel preemption happens explicitly. This form has always been supported (it is really just giving up the CPU voluntarily), because no extra logic is needed to make sure the kernel can be preempted safely: code that calls schedule() explicitly is expected to know that it is safe to be switched out.

Kernel preemption may occur when:

When an interrupt handler finishes and returns to kernel space.

When kernel code becomes preemptible again, for example when it releases a lock or re-enables softirqs.

If a task in the kernel explicitly calls schedule().

If a task in the kernel blocks (which also results in a call to schedule()).

7. How the kernel supports preemption

The preemptible Linux kernel required two main changes. The first is to the interrupt entry and return code: at interrupt entry, the kernel preemption lock preempt_count is incremented to forbid kernel preemption; at interrupt return, preempt_count is decremented, making preemption possible again.

We say that the preemptible Linux kernel can be preempted at any point mainly because interrupts can occur at any point. Whenever an interrupt occurs, the preemptible kernel checks for preemption when it returns after handling the interrupt. If the kernel is currently in a preemptible state, it reschedules and picks a high-priority process to run. This differs from the non-preemptible kernel, where returning from a hardware interrupt triggers rescheduling only if the interrupted process was running in user mode. If the interrupted process was running in kernel mode, no scheduling takes place and the interrupted process simply resumes.

The second change is to redefine spinlocks, read locks, and write locks so that lock operations also update preempt_count. Acquiring one of these locks increments preempt_count to forbid kernel preemption; releasing it decrements preempt_count and, if the preemption conditions are met and rescheduling is needed, performs a preemptive reschedule.
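The effect of folding the preemption counter into the lock operations can be sketched in user space. This is only an illustrative model under stated assumptions: `cpu_model`, `model_schedule`, `model_spin_lock` and `model_spin_unlock` are hypothetical names, the lock word itself is left out, and the real kernel primitives are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names are kernel APIs. */
struct cpu_model {
    int  preempt_count;   /* the kernel preemption lock */
    bool need_resched;    /* rescheduling has been requested */
    int  schedule_calls;  /* how many times the model "scheduled" */
};

/* Stand-in for schedule(): records the call and clears the request. */
static void model_schedule(struct cpu_model *c)
{
    c->schedule_calls++;
    c->need_resched = false;
}

/* Acquiring the lock forbids preemption first. */
static void model_spin_lock(struct cpu_model *c)
{
    c->preempt_count++;
}

/* Releasing the lock re-allows preemption and acts on a pending request. */
static void model_spin_unlock(struct cpu_model *c)
{
    assert(c->preempt_count > 0);
    c->preempt_count--;
    if (c->preempt_count == 0 && c->need_resched)
        model_schedule(c);
}
```

With two nested locks held and `need_resched` set in the middle, the first unlock does nothing extra and only the outermost unlock calls the scheduler, which is why a counter is used rather than a boolean.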

A different way to build a preemptible kernel is to insert preemption points into kernel code. In this scheme, the code paths in the kernel with long latencies are identified first, and a preemption point is then inserted at a suitable position in each, so that the system does not have to wait for the whole path to finish before rescheduling. Events that need a fast response can then get their service process onto the CPU as soon as possible. A preemption point is simply a conditional call to the process scheduling function.

The code is as follows:

    /* Preemption point: yield the CPU if rescheduling has been requested. */
    if (current->need_resched)
        schedule();

Such a code path is usually a loop body. Inserting a preemption point means checking need_resched on each pass through the loop and calling schedule() when necessary, so that the current process gives up the CPU.
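A preemption point inside a long loop can be illustrated with a small user-space model. This is a sketch under assumptions, not kernel code: `pp_task`, `pp_schedule` and `pp_sum` are hypothetical names, and the `tick_every` parameter merely simulates a timer that requests rescheduling every few iterations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; not a kernel data structure. */
struct pp_task {
    bool need_resched;  /* rescheduling has been requested */
    int  yields;        /* times the loop gave up the CPU */
};

/* Stand-in for schedule(): records the yield and clears the request. */
static void pp_schedule(struct pp_task *t)
{
    t->yields++;
    t->need_resched = false;
}

/*
 * Sum an array with a preemption point in every iteration.
 * tick_every simulates a timer that sets need_resched every N items;
 * 0 disables the simulated timer.
 */
static long pp_sum(struct pp_task *t, const int *v, size_t n, size_t tick_every)
{
    long total = 0;

    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += v[i];
        if (tick_every != 0 && (i + 1) % tick_every == 0)
            t->need_resched = true;   /* simulated timer tick */
        if (t->need_resched)          /* the preemption point */
            pp_schedule(t);
    }
    return total;
}
```

The result of the loop does not depend on how often it yields; only the latency seen by other tasks changes, which is the point of the technique.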

8. When rescheduling is needed

The kernel must know when to call schedule(). If rescheduling depended only on code calling schedule() explicitly, a user program could keep running forever. Instead, the kernel provides a need_resched flag that indicates whether rescheduling is required. scheduler_tick() sets this flag when a process uses up its time slice, and try_to_wake_up() sets it when a process with higher priority than the running one becomes runnable.

set_tsk_need_resched(): sets the need_resched flag in the specified process

clear_tsk_need_resched(): clears the need_resched flag in the specified process

need_resched(): checks the value of the need_resched flag; returns true if it is set, false otherwise

Semaphores, wait queues, completions and similar mechanisms all wake sleepers through a wait queue, whose default wake-up function is default_wake_function. It calls try_to_wake_up, which makes the process runnable and sets the rescheduling flag.

The kernel also checks the need_resched flag when returning to user space and when returning from an interrupt. If it is set, the kernel invokes the scheduler before continuing.

Each process has its own need_resched flag because accessing a value in the process descriptor is faster than accessing a global variable (the current macro is fast and the descriptor is usually in the cache). In kernels before 2.2, the flag was a global variable. From 2.2 through 2.4 it was a field in task_struct. In 2.6 it moved into the thread_info structure, where it is one bit of a flags variable. Kernel developers, evidently, never stop refining things.
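The three helper operations, with the flag stored as one bit of a flags word as in the 2.6 layout, might look roughly like this in a user-space model. All names here (`model_thread_info`, `MODEL_TIF_NEED_RESCHED`, the `model_*` functions) are hypothetical, and the bit position is an arbitrary assumption; the kernel's real bit number is defined per architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit position, chosen arbitrarily for this model. */
#define MODEL_TIF_NEED_RESCHED 3

/* Hypothetical stand-in for the per-thread structure holding the flags. */
struct model_thread_info {
    unsigned long flags;
};

/* Request rescheduling for this thread. */
static void model_set_need_resched(struct model_thread_info *ti)
{
    assert(ti != NULL);
    ti->flags |= 1UL << MODEL_TIF_NEED_RESCHED;
}

/* Withdraw the rescheduling request, leaving other flag bits untouched. */
static void model_clear_need_resched(struct model_thread_info *ti)
{
    assert(ti != NULL);
    ti->flags &= ~(1UL << MODEL_TIF_NEED_RESCHED);
}

/* Test whether rescheduling has been requested. */
static bool model_need_resched(const struct model_thread_info *ti)
{
    assert(ti != NULL);
    return ((ti->flags >> MODEL_TIF_NEED_RESCHED) & 1UL) != 0;
}
```

Packing the flag into a shared flags word lets the return-to-user and interrupt-return paths test several pending conditions with a single load.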

9. Avoid kernel preemption

After a process calls schedule(), it can run again only in one of two ways: 1. its state is TASK_RUNNING and it is on the run queue, so it will certainly get another chance to run; 2. it is on a sleep (wait) queue, so it will run once its condition is met and it is woken. So if a process is preempted and ends up on neither the run queue nor a wait queue, how does it ever run again? It does not. To avoid this, a preempted process whose state is not TASK_RUNNING must not be removed from the run queue. That is the job of the following code in schedule():

The code is as follows:

    /* Dequeue only on a voluntary sleep, never on a preemption. */
    if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
        switch_count = &prev->nvcsw;
        if (unlikely((prev->state & TASK_INTERRUPTIBLE) &&
                     unlikely(signal_pending(prev))))
            prev->state = TASK_RUNNING;   /* a signal is pending: stay runnable */
        else {
            if (prev->state == TASK_UNINTERRUPTIBLE)
                rq->nr_uninterruptible++;
            deactivate_task(prev, rq);    /* remove from the run queue */
        }
    }

Some people may ask how a process that is not TASK_RUNNING can be preempted at all. The question is hard to answer briefly, but remember that a process's state is independent of which queue it is on, and there is always a window between setting the process state and the point where preemption can strike. Consider the following code:

The code is as follows:

    for (;;) {                                                          \
        /* 1 */ prepare_to_wait(&wq, &__wait, TASK_UNINTERRUPTIBLE);    \
        /* 2 */ if (condition)                                          \
        /* 3 */         break;                                          \
        /* 4 */ schedule();                                             \
    }

Suppose the process is preempted right after step 1: its state has already been set to TASK_UNINTERRUPTIBLE and it is just about to test whether the condition holds. Without PREEMPT_ACTIVE, schedule() would remove it from the run queue and it would go to sleep. If there was only ever going to be one wake-up, the process would never be woken.

With PREEMPT_ACTIVE, the process stays on the run queue and later resumes. If the condition is still not met, it calls schedule() itself at step 4 and is removed from the run queue there, which is an ordinary sleep and no longer any concern of preemption. Removing a process twice would be an error: in dequeue_task, array->queue would be empty, and the second real removal would fail with a null pointer dereference. (This does not actually happen, because a process that returns from schedule() is always in the TASK_RUNNING state; it is only an example.) The rule is therefore that a process must really be on the run queue at the moment it is removed from it.

In effect, PREEMPT_ACTIVE prevents a process that is in a non-TASK_RUNNING state, and not yet on any wait queue, from being taken off the run queue. In short, a process must always either be on a queue or be wakeable. A preempted process cannot be woken, so if it is not on the run queue it will never run again. How, then, does PREEMPT_ACTIVE make sure that preempted processes are not removed from the run queue? This is implemented in preempt_schedule:

The code is as follows:

    asmlinkage void __sched preempt_schedule(void)
    {
        struct thread_info *ti = current_thread_info();

        /* Preemption is forbidden, or interrupts are off: do nothing. */
        if (likely(ti->preempt_count || irqs_disabled()))
            return;

        do {
            /*
             * Set PREEMPT_ACTIVE until the matching sub_preempt_count()
             * below. The task cannot be preempted again in this window,
             * and there would be no point: any further preemption can
             * wait until the bit has been cleared.
             */
            add_preempt_count(PREEMPT_ACTIVE);
            schedule();
            sub_preempt_count(PREEMPT_ACTIVE);   /* clear it after preemption */
            barrier();
        } while (unlikely(test_thread_flag(TIF_NEED_RESCHED)));
    }
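How PREEMPT_ACTIVE changes the dequeue decision can be checked with a small user-space model. This is a hedged sketch, not the kernel's implementation: the `m_`/`M_` names and the constant values are hypothetical, and the model leaves out the pending-signal case and the interrupts-disabled check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical constants for a user-space model only. */
#define M_TASK_RUNNING          0
#define M_TASK_UNINTERRUPTIBLE  2
#define M_PREEMPT_ACTIVE        0x10000000

struct m_task {
    int  state;          /* M_TASK_RUNNING or a sleeping state */
    int  preempt_count;  /* may carry the M_PREEMPT_ACTIVE bit */
    bool on_runqueue;    /* true while the task is on the run queue */
};

/* Models only the dequeue decision made at the top of schedule(). */
static void m_schedule(struct m_task *t)
{
    assert(t->on_runqueue);   /* only a queued task can be running this */
    if (t->state != M_TASK_RUNNING &&
        !(t->preempt_count & M_PREEMPT_ACTIVE))
        t->on_runqueue = false;   /* voluntary sleep: leave the run queue */
}

/* Models preempt_schedule(): flag the call as a preemption. */
static void m_preempt_schedule(struct m_task *t)
{
    if (t->preempt_count != 0)
        return;                   /* preemption currently forbidden */
    t->preempt_count += M_PREEMPT_ACTIVE;
    m_schedule(t);
    t->preempt_count -= M_PREEMPT_ACTIVE;
}
```

A task in `M_TASK_UNINTERRUPTIBLE` stays on the run queue when it goes through `m_preempt_schedule`, but leaves it when it calls `m_schedule` directly, mirroring the difference between being preempted and going to sleep voluntarily.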

In addition, in earlier kernels the interrupt-return path in entry.S also set PREEMPT_ACTIVE before preempting on a return to kernel space.

That raises another question: why is wait_event implemented this way, and why does it need a loop? My answer: the process can be woken because it has joined a wait queue, but it is not safe to assume the condition holds as soon as schedule() returns, because a wake-up does not necessarily mean the condition is satisfied. If two processes are woken at the same time, one of them may well find that the condition no longer holds. If that process were preempted at this point without being put back on the wait queue, it would have no chance of being woken. PREEMPT_ACTIVE would keep it on the run queue, but that defeats the intent of the code: the process is meant to run because it was woken, whereas now it would run purely according to priority, and because it is not on the wait queue, it would not be woken even when the condition becomes true.

The rule is actually simple. The condition must be tested after the process has been added to the wait queue, because then no wake-up notification can be missed. If it were done the other way round, testing first and joining the queue afterwards, another process could wake the queue in between and this process would miss the wake-up. The loop exists because more than one process may be woken, and they then compete. The loop handles that competition and guarantees that every process leaving it has seen the condition evaluate to true.

On the subject of the TASK_RUNNING state, someone asked why the page fault path sets the process state to TASK_RUNNING: was it not already TASK_RUNNING before the fault? In most cases it was, but the kernel cannot be sure. handle_mm_fault sets the state to TASK_RUNNING so that if the process sleeps during page fault handling, it can be woken again. For example, select calls copy_from_user after the process state has been set to a non-TASK_RUNNING value, and that copy can cause a page fault. If the state were not reset to TASK_RUNNING and the process called schedule() inside the fault handler, it would be thrown off the run queue and never come back. To prevent this, every place that may call schedule() has to take the state into account, either by guarding it, as PREEMPT_ACTIVE does, or, as handle_mm_fault does, by making sure the process enters schedule() in the TASK_RUNNING state. I wonder, though, whether this assignment could be removed now: even if the page fault path did not set the state, any code there that really has to schedule would set the state itself beforehand.

To summarize the role of PREEMPT_ACTIVE: it prevents a process that is already in a non-running state from being removed from the run queue when it is preempted before it has joined a wait queue. Such a process would never come back. The case is rare, because the usual practice is to put the process on the wait queue first and set its state afterwards.
