How to understand Linux kernel process context switching

2025-07-02 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to understand Linux kernel process context switching. The examples use the arm64 architecture.

1. The concept of process context

A process context is a static description of the entire course of a process's execution. The contents of the registers and stacks that relate to instructions and data already executed are called the preceding context; the contents that relate to the instruction currently executing are called the current context (the body); and the contents that relate to instructions and data still to be executed are called the following context.

In the Linux kernel, the process context consists of the process's virtual address space and its hardware context.

The hardware context of a process is a set of registers of the current CPU. On arm64 it is described by the cpu_context member of the thread member of the task_struct structure, and includes x19-x28, sp, pc, etc.
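To make the layout concrete, here is a minimal C sketch of the saved hardware context. The field order follows what the cpu_switch_to assembly later in this article stores (x19-x28, then fp, sp, and the return address). The names toy_thread_struct, toy_task_struct, and cpu_context_register_count are illustrative assumptions, not kernel definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callee-saved state that survives a context switch on arm64. */
struct cpu_context {
    uint64_t x19;
    uint64_t x20;
    uint64_t x21;
    uint64_t x22;
    uint64_t x23;
    uint64_t x24;
    uint64_t x25;
    uint64_t x26;
    uint64_t x27;
    uint64_t x28;
    uint64_t fp;   /* x29, the frame pointer */
    uint64_t sp;   /* kernel stack pointer */
    uint64_t pc;   /* address at which the task resumes */
};

/* Hypothetical stand-ins for the kernel's thread_struct and task_struct. */
struct toy_thread_struct {
    struct cpu_context cpu_context;
};

struct toy_task_struct {
    int pid;
    struct toy_thread_struct thread;
};

/* Number of 64-bit registers held in the saved context. */
size_t cpu_context_register_count(void)
{
    assert(sizeof(struct cpu_context) % sizeof(uint64_t) == 0);
    return sizeof(struct cpu_context) / sizeof(uint64_t);
}
```

Thirteen registers are saved in total, which matches the six stp/ldp pairs plus one single str/ldr seen in cpu_switch_to.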

The following is a sample diagram for storing the hardware context:

2. Detailed process of context switching

A process context switch consists of two main steps: the process address space switch and the processor state switch. The address space switch mainly concerns user processes, while the processor state switch applies to every schedulable unit. Let's look at these two steps:

    __schedule                        // kernel/sched/core.c
      -> context_switch
        -> switch_mm_irqs_off         // process address space switch
        -> switch_to                  // processor state switch

2.1 Process address space switching

The process address space is the virtual address space owned by a process. It is an illusion that the Linux kernel maintains through data structures, so that each process believes it has the whole of memory to itself. Every instruction and data access made by the CPU is ultimately translated to a real physical address; for a process, physical pages are allocated and page table mappings are established on demand through page fault exceptions. The process address space holds the instructions and data the process is running, so when the scheduler switches from another process back to this one, the address space must be switched as well, to guarantee that the virtual addresses the process accesses are its own.

The process address space is described by the mm_struct structure, which the process descriptor task_struct (what we usually call the process control block, PCB) refers to through its mm member. The mm_struct structure organizes and manages the process's vmas. One of its members, pgd, is particularly important: the key step in address space switching is installing pgd.

pgd holds the virtual address of the process's page global directory (this article touches on some page table concepts that are not its focus; consult other references for details, and the process page table will be explained in a later article). Remember that it is a virtual address that is stored. So when is pgd set? The answer is at fork: when a process is created, its mm_struct must be allocated and set up; at that point a page is allocated for the process's page global directory, and the start address of that page is assigned to pgd.

Let's look at how the process address space is switched; the result may surprise you (ignore the ASID mechanism for now, it is explained later):

The code path is as follows:

    context_switch                    // kernel/sched/core.c
      -> switch_mm_irqs_off
        -> switch_mm
          -> __switch_mm
            -> check_and_switch_context
              -> cpu_switch_mm
                -> cpu_do_switch_mm(virt_to_phys(pgd), mm)   // arch/arm64/include/asm/mmu_context.h

    arch/arm64/mm/proc.S

    158 /*
    159  *	cpu_do_switch_mm(pgd_phys, tsk)
    160  *
    161  *	Set the translation table base pointer to be pgd_phys.
    162  *
    163  *	- pgd_phys - physical address of new TTB
    164  */
    165 ENTRY(cpu_do_switch_mm)
    166 	mrs	x2, ttbr1_el1
    167 	mmid	x1, x1				// get mm->context.id
    168 	phys_to_ttbr x3, x0
    169
    170 alternative_if ARM64_HAS_CNP
    171 	cbz	x1, 1f				// skip CNP for reserved ASID
    172 	orr	x3, x3, #TTBR_CNP_BIT
    173 1:
    174 alternative_else_nop_endif
    175 #ifdef CONFIG_ARM64_SW_TTBR0_PAN
    176 	bfi	x3, x1, #48, #16		// set the ASID field in TTBR0
    177 #endif
    178 	bfi	x2, x1, #48, #16		// set the ASID
    179 	msr	ttbr1_el1, x2			// in TTBR1 (since TCR.A1 is set)
    180 	isb
    181 	msr	ttbr0_el1, x3			// now update TTBR0
    182 	isb
    183 	b	post_ttbr_update_workaround	// Back to C code...
    184 ENDPROC(cpu_do_switch_mm)

The core of the code is line 181 (msr ttbr0_el1, x3): the pgd of the process, converted from a virtual address to a physical address by virt_to_phys, is finally stored in ttbr0_el1, the page table base address register for user space. When a user-space address is accessed, the MMU uses this register to walk the page table and obtain the physical address (ttbr1_el1 is the page table base address register for kernel space; it is used when kernel-space addresses are accessed and is shared by all processes, so its page table base needs no switching). Once this step is done, the address space switch, that is, the switch of the process's virtual address space, is complete.
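The register values written by cpu_do_switch_mm can be sketched in C. This is a simplified model under stated assumptions: the CNP and CONFIG_ARM64_SW_TTBR0_PAN branches are left out, phys_to_ttbr is treated as the identity, and the names ttbr_set_asid, toy_ttbr_pair, and toy_switch_mm_regs are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define TTBR_ASID_SHIFT 48
#define TTBR_ASID_WIDTH 16

/* C equivalent of "bfi dst, asid, #48, #16": replace bits [63:48] with the low 16 bits of asid. */
uint64_t ttbr_set_asid(uint64_t ttbr, uint64_t asid)
{
    uint64_t field_mask = ((UINT64_C(1) << TTBR_ASID_WIDTH) - 1) << TTBR_ASID_SHIFT;

    return (ttbr & ~field_mask) | ((asid << TTBR_ASID_SHIFT) & field_mask);
}

struct toy_ttbr_pair {
    uint64_t ttbr0;   /* user-space page table base */
    uint64_t ttbr1;   /* kernel page table base, also carries the ASID */
};

/* Compute the values that the two msr instructions would write. */
struct toy_ttbr_pair toy_switch_mm_regs(uint64_t old_ttbr1, uint64_t pgd_phys, uint64_t asid)
{
    struct toy_ttbr_pair regs;

    /* The physical address must not spill into the ASID field. */
    assert((pgd_phys >> TTBR_ASID_SHIFT) == 0);

    regs.ttbr1 = ttbr_set_asid(old_ttbr1, asid);   /* lines 178-179 */
    regs.ttbr0 = pgd_phys;                         /* line 181 */
    return regs;
}
```

Only ttbr0 receives a new page table base. ttbr1 keeps its kernel page table base and changes only in the ASID field.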

The kernel's handling is simple and elegant: it only sets the page table base address register, that is, it writes the physical address of the page global directory of the process about to run into that register, and with that the address space switch is done. Some readers may not see why this is enough.

Consider what happens when a process accesses a user-space virtual address. The CPU's MMU takes the physical base address of the page global directory from the page table base address register, uses it together with the virtual address to walk the page table, and finally obtains the physical address to access (if the TLB hits, of course, the page table walk is skipped). This happens on every user-space virtual address access (shared kernel space is not considered here). Because the page table base address register holds the physical address of the page global directory of the currently executing process, the process walks its own set of page tables and obtains its own physical addresses. (In practice, page faults keep occurring as the process touches instructions and data in its virtual address space; the page fault handler allocates a physical page for the process and fills the page frame number and page attributes into the process's own page table entry.) The instructions and data of other processes are never reached. This is why multiple processes can access the same virtual address without conflict, and why address spaces are isolated from one another (shared memory excepted).

The TLB also has to be dealt with during an address space switch, so that the current process does not hit TLB entries left behind by the previous process during virtual address translation. The straightforward approach is to invalidate the entire TLB, but that costs a great deal of performance: a newly switched-in process faces an empty TLB, so TLB misses are very likely and the multi-level page table must be walked again. arm64 therefore provides a non-global (nG) bit in TLB entries to distinguish kernel entries from process entries, and uses the ASID to distinguish the entries of different processes, so that the TLB need not be flushed when the address space is switched. The ASID mechanism is covered later.

Note also that only the user address space is switched. The kernel address space is shared and needs no switching, which is why switching to a kernel thread, which has no user address space of its own, requires no address space switch.
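The isolation argument can be illustrated with a toy model in C. It assumes a single-level page table of 16 entries instead of arm64's multi-level tables, and every name in it (toy_pgd, toy_ttbr0, toy_switch_mm, toy_translate, TOY_FAULT) is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_NR_PTES    16
#define TOY_FAULT      UINT64_MAX   /* returned where real hardware would raise a page fault */

/* A toy single-level page table: entry i maps virtual page i; 0 means unmapped. */
struct toy_pgd {
    uint64_t pfn[TOY_NR_PTES];
};

/* Stands in for ttbr0_el1: the table used for user-space translation. */
static const struct toy_pgd *toy_ttbr0;

/* The whole address space switch: point the base register at the next table. */
void toy_switch_mm(const struct toy_pgd *next_pgd)
{
    assert(next_pgd != NULL);
    toy_ttbr0 = next_pgd;
}

/* Translate a virtual address through whichever table is currently installed. */
uint64_t toy_translate(uint64_t va)
{
    uint64_t index = va >> TOY_PAGE_SHIFT;
    uint64_t offset = va & ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1);

    assert(toy_ttbr0 != NULL);
    if (index >= TOY_NR_PTES || toy_ttbr0->pfn[index] == 0)
        return TOY_FAULT;
    return (toy_ttbr0->pfn[index] << TOY_PAGE_SHIFT) | offset;
}
```

With two tables that map virtual page 1 to different frames, the same virtual address 0x1234 resolves to different physical addresses depending only on which table toy_switch_mm installed last.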

The following is an example diagram of process address space switching:

2.2 Processor state (hardware context) switching

The address space switch above only ensures that the process will access its own address space when it accesses instructions and data. (The context switch itself runs in kernel space, executing kernel code on kernel data; only after returning to user space does the process get to execute user-space instructions and access user-space data, and by then the address space has been prepared for it.) However, the kernel stack in use still belongs to the previous process, and the current execution flow is still that of the previous process, so the processor state must be switched as well.

The switching code in arm64 is as follows:

    switch_to
      -> __switch_to
           ...                        // switch of floating point registers, etc.
        -> cpu_switch_to(prev, next)

    arch/arm64/kernel/entry.S:

    /*
     * Register switch for AArch64. The callee-saved registers need to be saved
     * and restored. On entry:
     *   x0 = previous task_struct (must be preserved across the switch)
     *   x1 = next task_struct
     * Previous and next are guaranteed not to be the same.
     *
     */
    ENTRY(cpu_switch_to)
    	mov	x10, #THREAD_CPU_CONTEXT
    	add	x8, x0, x10
    	mov	x9, sp
    	stp	x19, x20, [x8], #16		// store callee-saved registers
    	stp	x21, x22, [x8], #16
    	stp	x23, x24, [x8], #16
    	stp	x25, x26, [x8], #16
    	stp	x27, x28, [x8], #16
    	stp	x29, x9, [x8], #16
    	str	lr, [x8]
    	add	x8, x1, x10
    	ldp	x19, x20, [x8], #16		// restore callee-saved registers
    	ldp	x21, x22, [x8], #16
    	ldp	x23, x24, [x8], #16
    	ldp	x25, x26, [x8], #16
    	ldp	x27, x28, [x8], #16
    	ldp	x29, x9, [x8], #16
    	ldr	lr, [x8]
    	mov	sp, x9
    	msr	sp_el0, x1
    	ret
    ENDPROC(cpu_switch_to)

x19-x28 are the callee-saved registers of the arm64 architecture. As the code shows, during a processor state switch the x19-x28, fp, sp, and pc of the previous process (prev) are saved into the cpu_context of its process descriptor (the saved pc is the value of lr, the return address), and then the x19-x28, fp, sp, and pc stored in the cpu_context of the descriptor of the process about to run (next) are restored into the corresponding registers. The address of the next process's descriptor, task_struct, is stored in sp_el0, which is how current finds the current process. This completes the processor state switch.

In short, a processor state switch saves the values of registers such as sp and pc of the previous process to one block of memory, and restores the values of registers such as sp and pc of the process about to run from another block of memory. Restoring sp switches the kernel stack, and restoring pc switches the instruction execution flow. The memory used for saving and restoring must be locatable from the process, and that memory is the cpu_context structure (process switching is done in kernel space).

When user space enters kernel space through an exception or interrupt, the interrupted context must be saved, that is, the values of all general-purpose registers at the moment of the exception or interrupt. The kernel saves this context on the process's own kernel stack and describes it with the pt_regs structure. When exception or interrupt handling is finished and control is about to return to user space, the saved context is restored first, and the user program continues to execute.

So at a process switch, the current process is interrupted by the clock interrupt, the interrupted context (sp, lr, and so on) is saved on the process's kernel stack, and the kernel then switches to the next process. When the original process is switched back in, the saved context is restored on return to user space and the process continues to run: it executes the instruction that follows the interruption point and keeps using its own user-mode sp. All of this is transparent to the user process.
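The save-then-restore idea can be modeled in plain C. This is only a sketch: the real switch has to be done in assembly because it replaces the live stack pointer, and the types and names here (toy_context, toy_task, toy_cpu, toy_cpu_switch_to) are assumptions made for illustration. Only x19 is kept as a representative of the callee-saved registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced saved context: one callee-saved register plus sp and pc. */
struct toy_context {
    uint64_t x19;
    uint64_t sp;
    uint64_t pc;
};

struct toy_task {
    int pid;
    struct toy_context ctx;   /* plays the role of thread.cpu_context */
};

/* The "live" register file of one CPU. */
struct toy_cpu {
    uint64_t x19;
    uint64_t sp;
    uint64_t pc;
    struct toy_task *current;   /* plays the role of sp_el0 */
};

/* Save the live registers into prev, load them from next, and update current. */
struct toy_task *toy_cpu_switch_to(struct toy_cpu *cpu, struct toy_task *prev, struct toy_task *next)
{
    assert(cpu != NULL && prev != NULL && next != NULL);
    assert(prev != next);   /* previous and next are guaranteed to differ */

    prev->ctx.x19 = cpu->x19;   /* save into prev's memory block */
    prev->ctx.sp = cpu->sp;
    prev->ctx.pc = cpu->pc;

    cpu->x19 = next->ctx.x19;   /* restore from next's memory block */
    cpu->sp = next->ctx.sp;     /* switches the kernel stack */
    cpu->pc = next->ctx.pc;     /* switches the execution flow */

    cpu->current = next;
    return prev;
}
```

Switching from task A to task B and back again leaves the CPU with exactly the sp and pc that A had before the first switch.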
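As a rough illustration of where the saved user context lives, the sketch below places a simplified pt_regs frame at the very top of a task's kernel stack. It assumes a 16 KB kernel stack and a reduced frame layout (31 general-purpose registers, sp, pc, pstate); the kernel's real pt_regs has more fields. The names toy_pt_regs, toy_alloc_stack, and toy_task_pt_regs are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_THREAD_SIZE 16384   /* assumed kernel stack size */

/* Simplified frame saved on exception entry (assumption: the real struct is larger). */
struct toy_pt_regs {
    uint64_t regs[31];   /* x0-x30 */
    uint64_t sp;         /* user stack pointer */
    uint64_t pc;         /* user instruction to resume at */
    uint64_t pstate;
};

/* Allocate a zeroed toy kernel stack; returns NULL on failure. */
unsigned char *toy_alloc_stack(void)
{
    return calloc(1, TOY_THREAD_SIZE);
}

/* The frame occupies the highest addresses of the stack region. */
struct toy_pt_regs *toy_task_pt_regs(unsigned char *stack_base)
{
    assert(stack_base != NULL);
    return (struct toy_pt_regs *)(stack_base + TOY_THREAD_SIZE) - 1;
}
```

Because the frame sits at a fixed position relative to the stack, the return-to-user path can find it again after the task has been switched out and back in.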

The following is an example of hardware context switching:

3. ASID mechanism

As mentioned earlier, the TLB may hold entries that belong to other processes, so without further help the TLB has to be flushed at every process switch (a flush invalidates all TLB entries, after which address translation must walk the multi-level page table, find the page table entry, and reload it into the TLB). With the ASID mechanism, a TLB hit is determined by the virtual address together with the ASID (and, of course, the nG bit), which reduces how often the TLB must be flushed at a process switch.

Next we explain the ASID mechanism. The ASID (Address Space Identifier) is used to distinguish the page table entries of different processes. arm64 supports two ASID lengths, 8 bits or 16 bits; 8 bits is used for the explanation here.

If the ASID is 8 bits long, there are 256 ASID values. Because 0 is reserved, the allocatable range is 1-255, so 255 processes can be identified. Once there are more than 255 processes, two processes would end up with the same ASID, so the kernel adds an ASID version number.

The kernel handles this as follows (see arch/arm64/mm/context.c):
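The hit rule described above can be written as a small C predicate. This is a conceptual sketch of one TLB entry, not a description of any particular hardware implementation; toy_tlb_entry and toy_tlb_hit are invented names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_tlb_entry {
    bool valid;
    bool non_global;   /* nG bit: set for per-process (user) mappings */
    uint16_t asid;     /* address space the entry belongs to */
    uint64_t vpn;      /* virtual page number */
    uint64_t pfn;      /* physical frame number */
};

/* An entry hits if the page matches and it is either global or tagged with the current ASID. */
bool toy_tlb_hit(const struct toy_tlb_entry *entry, uint64_t vpn, uint16_t current_asid)
{
    assert(entry != NULL);
    if (!entry->valid || entry->vpn != vpn)
        return false;
    if (!entry->non_global)
        return true;   /* global (kernel) entries match under every ASID */
    return entry->asid == current_asid;
}
```

An entry left behind by a process with ASID 5 is simply ignored while a process with ASID 6 runs, so it does not have to be flushed at the switch.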

1) The kernel assigns each process a 64-bit software ASID, whose low 8 bits are the hardware ASID and whose high 56 bits are the ASID version number. The software ASID is stored in the id member of the context structure in the process's mm_struct, and is initialized to 0 when the process is created.

2) The kernel has a 64-bit global variable asid_generation, whose high 56 bits hold the ASID version number; it identifies the batch of ASIDs currently being allocated.

3) When the scheduler switches from process prev to process next and next is not a kernel thread, the address space switch calls check_and_switch_context. This function checks whether the ASID version number of next equals the global ASID version number (that is, whether next belongs to the current batch). If it does, next needs no new ASID; if it does not, an ASID must be assigned.

4) The kernel uses the asid_map bitmap to manage hardware ASID allocation; asid_bits records the ASID length in use; the per-CPU variable active_asids holds the hardware ASID currently in use on each CPU; the per-CPU variable reserved_asids holds reserved ASIDs; and the tlb_flush_pending bitmap records the set of CPUs whose TLBs need to be flushed.
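The split of the 64-bit software ASID into a version number and a hardware ASID can be sketched as follows, assuming the 8-bit ASID length used in this section. The helper names (toy_asid_make, toy_asid_hw, toy_asid_version, toy_asid_same_batch) are illustrative and do not appear in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ASID_BITS 8
#define TOY_HW_ASID_MASK ((UINT64_C(1) << TOY_ASID_BITS) - 1)

/* Combine a version number (high 56 bits) and a hardware ASID (low 8 bits). */
uint64_t toy_asid_make(uint64_t version, uint64_t hw_asid)
{
    assert(hw_asid <= TOY_HW_ASID_MASK);
    return (version << TOY_ASID_BITS) | hw_asid;
}

/* Low 8 bits: the value the hardware actually uses. */
uint64_t toy_asid_hw(uint64_t sw_asid)
{
    return sw_asid & TOY_HW_ASID_MASK;
}

/* High 56 bits: the batch in which the hardware ASID was allocated. */
uint64_t toy_asid_version(uint64_t sw_asid)
{
    return sw_asid >> TOY_ASID_BITS;
}

/* True when the software ASID belongs to the batch named by the global generation. */
bool toy_asid_same_batch(uint64_t sw_asid, uint64_t generation)
{
    return ((sw_asid ^ generation) >> TOY_ASID_BITS) == 0;
}
```

A freshly created process has software ASID 0, so as long as the global version number is nonzero it never counts as being in the current batch and always goes through allocation on its first switch-in.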

The hardware ASID allocation strategy is as follows:

(1) If the ASID version number of the process equals the current global ASID version number (same batch), no ASID needs to be reassigned.

(2) If the ASID version number of the process differs from the current global ASID version number (different batch) and the process's original hardware ASID has already been taken by another process in the current batch, a new hardware ASID is allocated, and the current global ASID version number is combined with the newly allocated hardware ASID to form the process's software ASID.

(3) If the ASID version number of the process differs from the current global ASID version number (different batch) and the process's original hardware ASID has not been taken by another process, no new hardware ASID is needed. Only the version number in the software ASID is updated: the current global ASID version number is combined with the process's original hardware ASID to form its software ASID.

(4) If the ASID version number of the process differs from the current global ASID version number (different batch) and, when a hardware ASID has to be allocated, all hardware ASIDs turn out to be taken (the asid_map bitmap is all 1s), the global ASID version number is incremented, the TLBs of all CPUs are flushed, the asid_map bitmap is cleared, and then a hardware ASID is allocated. The current global ASID version number is combined with the newly allocated hardware ASID and written to the process's software ASID.
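The four strategies can be put together in a simplified, single-CPU model. This is a sketch under stated assumptions, not the kernel's allocator: it ignores reserved_asids and per-CPU state, counts TLB flushes instead of performing them, and always searches for a free hardware ASID from 1 upward, so the concrete numbers it hands out may differ from those in the figures. All toy_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_ASID_BITS     8
#define TOY_NUM_ASIDS     (1u << TOY_ASID_BITS)
#define TOY_FIRST_VERSION (UINT64_C(1) << TOY_ASID_BITS)

struct toy_asid_state {
    uint64_t generation;        /* version number in the high bits, low 8 bits zero */
    bool map[TOY_NUM_ASIDS];    /* stands in for the asid_map bitmap */
    unsigned int flush_count;   /* how many times every TLB had to be flushed */
};

void toy_asid_init(struct toy_asid_state *s)
{
    memset(s, 0, sizeof(*s));
    s->generation = TOY_FIRST_VERSION;
}

/* Slow path: the process is not in the current batch. */
static uint64_t toy_new_context(struct toy_asid_state *s, uint64_t sw_asid)
{
    uint64_t hw = sw_asid & (TOY_NUM_ASIDS - 1);

    /* Strategy (3): the old hardware ASID is still free in this batch, keep it. */
    if (hw != 0 && !s->map[hw]) {
        s->map[hw] = true;
        return s->generation | hw;
    }

    /* Strategy (2): take any free hardware ASID; 0 is reserved. */
    for (hw = 1; hw < TOY_NUM_ASIDS; hw++) {
        if (!s->map[hw]) {
            s->map[hw] = true;
            return s->generation | hw;
        }
    }

    /* Strategy (4): exhausted. Start a new batch, clear the map, flush all TLBs. */
    s->generation += TOY_FIRST_VERSION;
    memset(s->map, 0, sizeof(s->map));
    s->flush_count++;
    s->map[1] = true;
    return s->generation | 1;
}

/* Returns the software ASID the process should use after being switched in. */
uint64_t toy_check_and_switch_context(struct toy_asid_state *s, uint64_t sw_asid)
{
    assert(s != NULL);

    /* Strategy (1): same batch, nothing to do. */
    if (sw_asid != 0 && ((sw_asid ^ s->generation) >> TOY_ASID_BITS) == 0)
        return sw_asid;
    return toy_new_context(s, sw_asid);
}
```

Calling toy_check_and_switch_context with a software ASID of 0 models the first switch to a newly created process. After 255 such calls, the next new process triggers the rollover and is the only event that increments flush_count.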

Let's look at the allocation process of ASID with an example:

As shown below:

Assume the figure contains 255 processes, from process A to process D, so that the hardware ASIDs have just been used up, and that all switches from A to D happen within the same batch of ASID version numbers.

In this phase, each newly created process is eventually switched to. As long as there are no more than 255 processes, a free hardware ASID is assigned to the new process. The next time the scheduler switches to it, its ASID version number equals the current global ASID version number, so no ASID needs to be reassigned and, of course, the TLB need not be flushed.

Note: "ASID" here refers to the hardware ASID, as distinct from the ASID version number.

Case 1 - the ASID version number is unchanged; strategy (1) applies: when switching from process C to process D, the kernel finds that D's ASID version number equals the current global ASID version number, so no ASID needs to be assigned (the fast path, switch_mm_fastpath, is taken to set ttbrx_el1).

Case 2 - all hardware ASIDs have been allocated; strategy (4) applies: suppose that by the time process D has run, the ASIDs are exhausted (all 255 processes in the system hold a hardware ASID). The newly created process E is now picked by the scheduler. Because the software ASID of a new process is initialized to 0, it differs from the current global ASID version number (it is not in the same batch), so new_context is called to assign an ASID. No ASID is available, so the global ASID version number is incremented by 1 (an ASID rollover) and becomes 801. asid_map is then cleared, all bits of tlb_flush_pending are set so that every CPU flushes its TLB, and a hardware ASID is allocated for process E again. This time it is assigned hardware ASID 1 (with ASID version number 801).

Case 3 - the ASID version number has changed but the process's hardware ASID can be reused; strategy (3) applies: suppose the switch is from E to process B. B was assigned hardware ASID 5 in the batch with global ASID version number 800, but B's ASID version number differs from the current global ASID version number 801, so new_context is needed to assign an ASID. During allocation, bit 5 of asid_map is found to be clear, meaning no other process has been given ASID 5, so B keeps its original hardware ASID 5.

Case 4 - the ASID version number has changed and another process already holds the same hardware ASID; strategy (2) applies: suppose the switch is from process B to process A. A was assigned hardware ASID 1 in the batch with global ASID version number 800, but A's ASID version number differs from the current global ASID version number 801, so new_context is needed to assign an ASID. During allocation, bit 1 of asid_map is found to be set, meaning another process already holds ASID 1, so the next free ASID must be found in asid_map, and a new ASID of 6 is assigned.

Now suppose the switch is from A to E. E's ASID version number equals the global ASID version number (same batch), so as in case 1 no ASID is assigned. The processes still in ASID version batch 800 must each be given an ASID again when they next run: some keep their original hardware ASID and some get a new one, but all have their ASID version number updated to the current global value, 801. As hardware ASIDs continue to be handed out, batch 801 will eventually be exhausted as well, which is case 2 again, where the TLBs of all CPUs are flushed.

As you can see, with the ASID mechanism the TLBs of all CPUs are flushed only when the hardware ASIDs are exhausted (for example, all in use by 255 processes), which greatly improves system performance (without the ASID mechanism, every process switch that requires an address space switch would have to flush the TLB).

4. The difference between ordinary user process, ordinary user thread and kernel thread switching

The kernel follows a rule when switching address spaces, based on the mm member (a pointer to mm_struct) of the process descriptor:

1) If mm is NULL, the task being switched to is a kernel thread, and no address space switch is needed (all tasks share the kernel address space).

2) A kernel thread borrows the mm of the previous user process and stores it in its own active_mm (its own mm is NULL). At a process switch, the active_mm of the previous task is compared with the mm of the task being switched to.

3) If the previous task and the task being switched to have the same mm, that is, they are threads sharing one address space, no address space switch is needed.

-> Every switch between tasks (processes or threads) requires a processor state switch.

-> Switching between ordinary user processes requires an address space switch.

-> Switching between threads in the same thread group requires no address space switch, because they share one address space.

-> Switching to a kernel thread requires no address space switch; the kernel thread only borrows the mm_struct of the previous process.

Consider the following scenario:

Convention: we refer to processes and threads collectively as tasks. U denotes a user task (process or thread), K denotes a kernel thread, and tasks whose names differ only in the trailing number are threads in the same thread group.

The tasks are: Ua1, Ua2, Ub, Uc, Ka, Kb (e.g. Ua1 is a user process, Ua2 is a user thread in the same thread group as Ua1, Ub is an ordinary user process, and Ka is an ordinary kernel thread).

If the scheduling order is as follows:

Uc-> Ua1-> Ua2-> Ub-> Ka-> Kb-> Ub

Uc -> Ua1: these are different processes, so the address space must be switched.

Ua1 -> Ua2: these are threads in the same thread group and share one address space. The address space was already switched when Ua1 was switched in, so no address space switch is needed.

Ua2 -> Ub: these are different processes, so the address space must be switched.

Ub -> Ka: the target is a kernel thread, so no address space switch is needed.

Ka -> Kb: this is a switch between two kernel threads, so no address space switch is needed.

Kb -> Ub: this is a switch from a kernel thread to a user process. Both Ka and Kb borrowed Ub's active_mm, and Ub's active_mm equals Ub's mm, so Kb's active_mm is the same as Ub's mm at this point and the address space is not switched.
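The rule and the scenario above can be replayed with a small C model. It captures only the mm / active_mm bookkeeping described in this section; the types and the function toy_switch_address_space are hypothetical names used for illustration, and the real context_switch does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_mm {
    int id;
};

struct toy_sched_task {
    const char *name;
    struct toy_mm *mm;          /* NULL for kernel threads */
    struct toy_mm *active_mm;   /* address space actually in use */
};

/*
 * Update active_mm for the task being switched in and report whether the
 * page table base register would have to be rewritten.
 */
bool toy_switch_address_space(struct toy_sched_task *prev, struct toy_sched_task *next)
{
    struct toy_mm *old_mm;

    assert(prev != NULL && next != NULL);
    old_mm = prev->active_mm;

    if (next->mm == NULL) {
        /* Kernel thread: borrow the previous address space, no switch. */
        next->active_mm = old_mm;
        return false;
    }

    next->active_mm = next->mm;
    /* Same mm (threads of one group, or a borrowed mm coming back): no switch. */
    return old_mm != next->mm;
}
```

Replaying Uc -> Ua1 -> Ua2 -> Ub -> Ka -> Kb -> Ub with this model reports a switch only for Uc -> Ua1 and Ua2 -> Ub, in agreement with the walkthrough above.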

The following is an example of multitasking address space switching:

5. A panoramic view of process switching

Let's take the following scenario as an example:

Both processes are ordinary user processes, and the switch is from process A to process B. For simplicity, other preemption points are not considered: we assume that processes A and B only loop over some basic operations and never make system calls, and we consider only the case where a process is interrupted by the clock interrupt and is preempted just before returning to user space.

The following is a panoramic view of the process switch:

It is clear in the view that there are three key points that need to be highlighted:

1. When an interrupt occurs, all the general registers are saved to the kernel stack of the process, using the struct pt_regs structure.

2. The address space switch stores the base address, pgd, of the process's own page global directory in ttbr0_el1, which the MMU uses as the starting point of its page table walk.

3. When the hardware context is switched, the callee-saved registers, pc, and sp are saved to the struct cpu_context structure. When the process is scheduled again, execution resumes, through the pc saved in cpu_context, at the instruction that follows the cpu_switch_to call. Because the sp saved in cpu_context puts the process back on its own kernel stack, the kernel stack unwinds through a series of returns until the general-purpose register values originally stored in pt_regs are restored to the registers. When the process then returns to user space, it continues from the instruction that follows the point of interruption, the user stack is back where it was before the interrupt, and address translation (VA to PA) for the instructions and data the process accesses starts from its own pgd. To the user process it is as if nothing happened; the switch is almost seamless.
