How to implement Percpu variable in Linux Kernel

2025-07-08 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains in detail how percpu variables are implemented in the Linux kernel. Hopefully it will be a useful reference for interested readers.

A thread-local variable is a variable of which each thread has its own copy. Access to it is isolated per thread, so threads cannot affect each other through it and the usual multithreading problems do not arise.

Used correctly, thread-local variables can greatly simplify multithreaded development, which is why C, C++, Rust, Java and C# all have built-in support for them.

What is less well known is that the Linux kernel has a similar mechanism for a similar purpose: percpu variables.

A percpu variable, as the name implies, is a variable of which each CPU has its own copy. It can be used to store data that is specific to a CPU, such as the CPU's id or the thread currently running on that CPU. Because this mechanism solves certain problems very easily, it is widely used in kernel programming.

Naturally, you will be asking how this is implemented.

Leaving the details aside for now, let's look at a picture to understand the implementation as a whole.

From the figure above, we can see that many percpu variables are defined with DEFINE_PER_CPU in various source files. Following the definitions in vmlinux.lds.S, the linker gathers these variables and places them in a section called .data..percpu in the final vmlinux file.

The addresses of these variables are also treated specially: they start at zero and increase from there, so the address of a variable is simply its offset within the .data..percpu section of vmlinux. Given this offset and the starting address of a CPU's percpu memory block, it is easy to calculate the runtime memory address of that CPU's copy of the variable.

When the Linux kernel starts, it first loads the vmlinux file into memory. It then allocates, for each CPU, a memory area to hold percpu variables, copies the contents of the .data..percpu section of vmlinux into the static area of each CPU's percpu memory block, and finally puts the starting address of each percpu memory block into the gs register of the corresponding CPU.

At this point, the initialization of percpu variables is complete.

To access a percpu variable, we only need to add the address in the gs register to the address of the percpu variable. The result is the real memory address of that variable on the current CPU.

With this address, we can easily manipulate the percpu variable.
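The address arithmetic described above can be modeled in user space. The following is only a minimal sketch, not kernel code: the struct percpu_section, the array cpu_block and the helpers percpu_addr and fill_irq_counts are hypothetical names. offsetof stands in for the zero-based link-time address of a variable, and the address of each array element stands in for the value held in a CPU's gs register.

```c
#include <stddef.h>

#define NR_CPUS 4  /* hypothetical CPU count for this model */

/* Stand-in for the .data..percpu section: member offsets start at 0,
 * just like the link-time addresses of real percpu variables. */
struct percpu_section {
    int  cpu_number;
    long irq_count;
};

/* One private copy of the section per simulated CPU. */
static struct percpu_section cpu_block[NR_CPUS];

/* Runtime address = base of this CPU's block + offset of the variable. */
static void *percpu_addr(int cpu, size_t offset)
{
    char *base = (char *)&cpu_block[cpu];  /* plays the role of the gs base */
    return base + offset;
}

/* Store a different value in every CPU's copy of irq_count. */
static void fill_irq_counts(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        long *p = percpu_addr(cpu, offsetof(struct percpu_section, irq_count));
        *p = 100 + cpu;
    }
}
```

After fill_irq_counts() runs, each simulated CPU holds its own value at the same offset, which is the property the real mechanism relies on.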

The figure above focuses on the percpu variables that are known when the kernel is compiled. These variables are static and their number does not grow or shrink at run time, so they are copied directly into the static area of each percpu memory block when the kernel is initialized.

Besides these static percpu variables, there are two other kinds.

The first kind is the static percpu variable of a kernel module. It is known at compile time, but because kernel modules are loaded dynamically, it is not entirely static. The kernel sets aside a separate area in the percpu memory block for such variables, called the reserved area. When a kernel module is loaded into memory, its static percpu variables are allocated in this area.

The second kind is the purely dynamic percpu variable, which is allocated at run time and uses the dynamic area in the figure above.

The size of the static area is determined at compile time and is fixed. The size of the reserved area is also fixed, although it is only an estimate. The dynamic area can grow at run time.
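To make the three areas concrete, here is a small sketch of how their starting offsets inside one CPU's block could be computed. It assumes the areas sit back to back in the order static, reserved, dynamic. The struct percpu_layout and the three helpers are hypothetical names for illustration, not kernel definitions.

```c
#include <stddef.h>

/* Sizes, in bytes, of the three areas inside one CPU's percpu block. */
struct percpu_layout {
    size_t static_size;    /* fixed: size of the .data..percpu section */
    size_t reserved_size;  /* fixed estimate: module percpu variables */
    size_t dyn_size;       /* initial size only: this area can grow */
};

/* The reserved area begins right after the static area. */
static size_t reserved_start(const struct percpu_layout *l)
{
    return l->static_size;
}

/* The dynamic area begins right after the reserved area. */
static size_t dynamic_start(const struct percpu_layout *l)
{
    return l->static_size + l->reserved_size;
}

/* Total initial size of one CPU's block. */
static size_t unit_size(const struct percpu_layout *l)
{
    return l->static_size + l->reserved_size + l->dyn_size;
}
```

For example, with a 170 KiB static area, an 8 KiB reserved area and a 28 KiB dynamic area (illustrative numbers only), the dynamic area would begin 178 KiB into the block.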

Although these three kinds of percpu variables are allocated differently, their internal mechanisms are essentially the same, so here we only discuss static percpu variables in the kernel. Readers who are interested in the other two kinds can study them in the kernel source code.

Let's use a concrete example to see how a percpu variable is implemented.

The current in the figure above retrieves the current thread object. It is actually a macro, defined as follows:

As you can see from the above, the current thread object obtained through current is actually a percpu variable named current_task.

In the get_current function, the current_task that belongs to the current CPU is obtained through this_cpu_read_stable.

this_cpu_read_stable is itself a macro, and when fully expanded it looks like this:

Before going through the meaning of each statement in the expansion, let's digress for a moment.

Anyone who has read the Linux kernel source code knows that macros are used very heavily there and can be quite complex. If we are not confident that we have expanded a macro correctly by hand, we can use the method introduced below. With it, you can easily get the macro-expanded result of any file.

We know that building a program is divided into preprocessing, compilation, assembly and linking stages, and that macro expansion happens in the preprocessing stage.

After each stage is completed, a temporary file is usually generated for the next stage to use. These temporary files are not saved to disk by default, but we can tell gcc to keep them by passing certain parameters, so that we can inspect what each stage produced.

Following this idea, if we add these parameters when compiling a file such as net/socket.c above, we get these temporary files and can see what the macro expansion looks like after preprocessing.

However, if you only want to see the macro expansion of a single file, saving the temporary files of every source file in the whole kernel is slow and wasteful. Is there a way to compile separately only the file whose macro expansion we want to see?

There is.

The method is also very simple. We only need to know the command that was used to compile the file. When we want to view the macro expansion of that file, we run the same command again with a few extra parameters, and we get the temporary files for each stage of that file's compilation.

So how do you find the command used to compile each source file?

The kernel build has actually done this for us.

When we compile the kernel, the command used for each file is saved to a corresponding temporary file. For example, the compilation command for the net/socket.c file above is saved in the following file:

The compilation command for net/socket.c is the first line in the figure above, from gcc to the end of the line.

This compilation command is quite complicated, but we don't need to worry about that. We only need to know that with this command we can compile net/socket.c into net/socket.o.

Now we add the -save-temps=obj parameter to the command to tell gcc to keep the temporary files of each stage during compilation. The specific steps are as follows:

As you can see from the above, after adding the -save-temps=obj parameter, the compilation generates two additional files, and net/socket.i is the file produced by gcc's preprocessing stage.
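The same technique can be tried on a small standalone file before applying it to the kernel. The file below is a hypothetical demo.c: the macros DEMO_OP and demo_read_stable and the function names are invented for illustration and only imitate the layered style of the kernel's macros. Compiling it with something like gcc -save-temps=obj -c demo.c -o demo.o should leave a demo.i next to the object file, in which the nested macros are fully expanded. In a kernel build tree, the recorded command normally lives in a hidden .cmd file next to the object, for example net/.socket.o.cmd.

```c
/* demo.c: a tiny file with nested macros, in the spirit of
 * this_cpu_read_stable. All names here are hypothetical. */
#define DEMO_OP(op, var)       demo_##op(&(var))
#define demo_read_stable(var)  DEMO_OP(load, var)

static int demo_counter = 42;

/* The function the macros eventually resolve to. */
static inline int demo_load(const int *p)
{
    return *p;
}

/* After preprocessing, the body reads: return demo_load(&(demo_counter)); */
int get_demo_counter(void)
{
    return demo_read_stable(demo_counter);
}
```

Searching demo.i for get_demo_counter shows the expanded body, which can then be compared with a manual expansion.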

Open net/socket.i and find the get_current function we need:

Look at the selected part of the picture above: its content is exactly the same as the result of our own manual macro expansion.

That's a good method, isn't it?

Of course, we can further confirm what the expanded macro really does by disassembling the object file:

As you can see from the above, the expanded macro boils down to a single mov instruction, in which the address of the current_task variable is 0x16d00.

This instruction adds the address in the gs register to the address of current_task, and then moves the value in the memory location at the resulting address into rax.

This is consistent with the implementation mechanism of percpu mentioned above.

OK, let's return to where we left off and look at the meaning of the statements produced by the macro expansion in the get_current function.

As mentioned above, the expansion of this_cpu_read_stable in get_current is mainly an asm statement. Some readers may not be familiar with this statement. It is not part of the C language standard but a gcc extension to it. With an asm statement, we can embed assembly instructions directly in C code.

For the detailed syntax rules, you can refer to the following link:

https://gcc.gnu.org/onlinedocs/gcc/Using-Assembly-Language-with-C.html#Using-Assembly-Language-with-C

Readers who don't care about the details can skip the syntax. It is enough to know that the asm statement takes the address of current_task, adds it to the base address held in the gs segment register to get a final address, and then uses a mov instruction to place the value stored at that final address into the pfo_val__ variable.

After the instruction is executed, the value stored in pfo_val__ is the address of the struct task_struct of the thread that the current CPU is executing. In other words, pfo_val__ is a pointer to the currently executing thread object.
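For readers who have never written an asm statement, here is a minimal user-space sketch of the same shape: one mov that loads a value from memory into an output variable. The function load_u64 is a hypothetical name. Unlike the kernel version, the memory operand here carries no gs segment prefix, and the code falls back to a plain C load when it is not built for x86-64.

```c
#include <stdint.h>

/* Load *p with a single mov on x86-64; use plain C elsewhere. */
static inline uint64_t load_u64(const uint64_t *p)
{
    uint64_t val;
#if defined(__x86_64__) && defined(__GNUC__)
    /* AT&T operand order: source first, destination second.
     * %1 is the memory operand *p, %0 is the output register. */
    __asm__("movq %1, %0" : "=r"(val) : "m"(*p));
#else
    val = *p;
#endif
    return val;
}
```

The part after the first colon lists outputs ("=r" asks for any general-purpose register), and the part after the second colon lists inputs ("m" asks for a memory operand).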

So why does this give us a pointer to the thread object that the current CPU is executing?

We have in fact already covered this. The key point is that the gs register holds the starting address of the current CPU's percpu memory block, while the address of current_task is the position of the current_task variable within any percpu memory block. Adding the two gives the address of the current CPU's copy of current_task, and the value stored there is the pointer to the thread that CPU is running.

That is the theory. Now let's confirm it from the source code.

First, let's look at the definition of the current_task variable:

DEFINE_PER_CPU is also a macro, which is expanded as follows:

In the variable definition produced by the macro expansion, the most important part is that the variable's section is specified as .data..percpu.

Let's see where the section is used:

As you can see from the image above, the section is used in the PERCPU_INPUT macro, and the PERCPU_INPUT macro is in turn used by the PERCPU_VADDR macro below it.

Let's take a look at where the PERCPU_VADDR macro is used:

The PERCPU_VADDR macro is used in the vmlinux.lds.S file.

vmlinux.lds.S is a linker script. In the link phase, the linker follows the definitions in vmlinux.lds.S and gathers kernel variables and functions that share a section into the corresponding section of the final output file vmlinux.

For example, the PERCPU_VADDR macro above means that all variables in the input files that belong to a .data..percpu section are collected and placed, one after another, into the .data..percpu section of the output file vmlinux.

Note in the figure above that when PERCPU_VADDR is invoked, the vaddr parameter passed in is 0, which means that the addresses of the variables stored in the .data..percpu section of vmlinux start at 0 and increase from there.

As we said before, such an address gives the position of the variable within the .data..percpu section, and therefore also its position within each CPU's percpu memory block at run time.

That the addresses of the variables in the .data..percpu section of vmlinux start at 0 can be confirmed from the value of __per_cpu_start:

Another thing to note is that the value of __per_cpu_load is an ordinary kernel link address. It gives the memory location of the .data..percpu section once vmlinux has been loaded into memory:

To sum up, the PERCPU_VADDR macro collects all variables in the input files that belong to a .data..percpu section and places them, one after another, into the .data..percpu section of the output file vmlinux. Variable addresses in that section start at 0, so the address of each variable is its position within the section.

In addition, three address values are defined in the PERCPU_VADDR macro:

__per_cpu_load is the memory location of the .data..percpu section once vmlinux has been loaded into memory. The value of __per_cpu_start is 0. The value of __per_cpu_end is the end address of the .data..percpu section in vmlinux.

In this way, __per_cpu_load tells us where the .data..percpu section is located once vmlinux is loaded into memory, and __per_cpu_end - __per_cpu_start tells us the size of the .data..percpu section.
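A rough user-space analogue of this arrangement can be built with a section attribute. The sketch below assumes an ELF toolchain (GNU ld or lld), which provides __start_ and __stop_ symbols for any section whose name is a valid C identifier. The section name demo_percpu, the macro DEMO_DEFINE_PER_CPU and the helper functions are all hypothetical. Unlike the kernel's section, this one does not start at address 0, so offsets are obtained by subtracting the start symbol.

```c
#include <stddef.h>

/* Hypothetical helper: place a variable in the "demo_percpu" section. */
#define DEMO_DEFINE_PER_CPU(type, name) \
    __attribute__((section("demo_percpu"), used)) type name

DEMO_DEFINE_PER_CPU(int, demo_cpu_number) = 7;
DEMO_DEFINE_PER_CPU(long, demo_irq_count) = 0;

/* Provided by the linker: first byte of the section and one past its end. */
extern char __start_demo_percpu[];
extern char __stop_demo_percpu[];

/* Size of the section, in the spirit of __per_cpu_end - __per_cpu_start. */
static size_t demo_section_size(void)
{
    return (size_t)(__stop_demo_percpu - __start_demo_percpu);
}

/* Position of a variable inside the section. */
static size_t demo_offset_of(const void *var)
{
    return (size_t)((const char *)var - __start_demo_percpu);
}
```

Every variable defined through the macro ends up inside the range between the two linker symbols, wherever in the source tree it was defined.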

As you can see from the above, the percpu variables in the kernel take up about 170 KiB of memory.

With that, all the groundwork for percpu variables is in place. Now let's see how the kernel uses this information during startup to allocate a percpu memory block for each CPU, initialize the data in those blocks, and set the block addresses in the gs registers.

By searching for __per_cpu_load, __per_cpu_start and __per_cpu_end, we can see that this memory allocation is done in the setup_per_cpu_areas function:

The file path and general shape of this function are shown in the figure above. For ease of reading, I have removed a lot of code that is not needed here.

Because the logic of this function is very complex, we will not explain every line in detail but only look at the key parts.

The main job of this function and its helpers is to allocate a separate percpu memory block for each CPU:

Then the .data..percpu section of vmlinux is copied into the percpu memory block of each CPU:

Here ai->static_size is the value of __per_cpu_end minus __per_cpu_start.

Finally, the starting address of each CPU's percpu memory block is set in the gs register of that CPU:
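The three steps can be imitated in user space as follows. This is only a sketch under simplifying assumptions, not the kernel's code: percpu_load_image stands in for the section located by __per_cpu_load, the percpu_base array stands in for the per-CPU gs base, and setup_percpu_model and teardown_percpu_model are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 2  /* hypothetical CPU count for this model */

/* Stand-in for the .data..percpu image inside the loaded vmlinux. */
static const unsigned char percpu_load_image[16] = { 1, 2, 3, 4 };

/* Stand-in for each CPU's gs base: the address of its private block. */
static unsigned char *percpu_base[NR_CPUS];

/* Release every block that was allocated. */
static void teardown_percpu_model(void)
{
    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        free(percpu_base[cpu]);
        percpu_base[cpu] = NULL;
    }
}

/* Allocate a block per CPU, copy the static image into it, record the base.
 * Returns 0 on success and -1 on bad sizes or allocation failure. */
static int setup_percpu_model(size_t static_size, size_t unit_size)
{
    if (static_size > sizeof(percpu_load_image) || static_size > unit_size)
        return -1;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        unsigned char *unit = calloc(1, unit_size);  /* step 1: allocate */
        if (!unit) {
            teardown_percpu_model();
            return -1;
        }
        memcpy(unit, percpu_load_image, static_size); /* step 2: copy */
        percpu_base[cpu] = unit;                      /* step 3: record base */
    }
    return 0;
}
```

Because every CPU receives its own copy of the image, a later write through percpu_base[0] changes neither the copy behind percpu_base[1] nor the original image.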

What deserves attention in the figure above is how the gs register is set. In x86-64 mode, the segment registers CS, DS, ES and SS are essentially unused. FS and GS are still used, but loading them with a traditional mov instruction only sets a 32-bit base address. To set a full 64-bit base address, you must write the corresponding MSR.

This is described in detail in the official AMD documentation:
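As an illustration of what writing the MSR involves, the sketch below prepares the register values that a wrmsr instruction expects: the MSR index in ECX and the 64-bit value split across EDX (high half) and EAX (low half). It only computes the values and does not execute wrmsr, which is a privileged instruction. The struct wrmsr_args and the function make_gs_base_write are hypothetical names, while 0xC0000101 is the architectural index of the GS base MSR.

```c
#include <stdint.h>

#define MSR_GS_BASE 0xC0000101u  /* architectural MSR index for GS.base */

/* Register values that a wrmsr instruction would consume. */
struct wrmsr_args {
    uint32_t ecx;  /* which MSR to write */
    uint32_t eax;  /* low 32 bits of the value */
    uint32_t edx;  /* high 32 bits of the value */
};

/* Split a 64-bit base address into the EDX:EAX pair for wrmsr. */
static struct wrmsr_args make_gs_base_write(uint64_t base)
{
    struct wrmsr_args a;
    a.ecx = MSR_GS_BASE;
    a.eax = (uint32_t)(base & 0xFFFFFFFFu);
    a.edx = (uint32_t)(base >> 32);
    return a;
}
```

This shows why the MSR route can carry a full 64-bit address: the value travels in two 32-bit registers rather than through a segment selector.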

Now that the gs register has been set, let's go back and think about how the kernel obtains the value of the current CPU's current_task variable:

mov %gs:0x16d00,%rax

Now you fully understand the meaning of this line of code.

That's all for the percpu part, but one thing remains to be said: how current_task on each CPU comes to hold the thread that is currently running on that CPU.

We know that a CPU can run multiple threads. For the percpu variable current_task to point to the thread currently running on a CPU, current_task must be updated whenever that CPU switches threads:
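The idea can be modeled in a few lines of user-space C. This is a sketch only: struct cpu_model, the variable this_cpu and the function switch_to_model are hypothetical, and an array index replaces the gs-relative access that the kernel uses.

```c
#include <stddef.h>

struct task_struct {
    int pid;
};

/* Model of one CPU: its private copy of the current_task percpu variable. */
struct cpu_model {
    struct task_struct *current_task;
};

static struct cpu_model cpus[2];
static int this_cpu;  /* hypothetical: index of the CPU we are "running on" */

/* Read this CPU's copy of current_task. */
static struct task_struct *get_current(void)
{
    return cpus[this_cpu].current_task;
}
#define current get_current()

/* On a thread switch, only this CPU's copy of current_task is updated. */
static void switch_to_model(struct task_struct *next)
{
    cpus[this_cpu].current_task = next;
}
```

Switching threads on one simulated CPU leaves the other CPU's current untouched, so current always answers the question for the CPU that asks it.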

That concludes this look at how percpu variables are implemented in the Linux kernel. Hopefully it has been of some help.
