Introduction to copy_{to,from}_user() in Linux

2025-07-15 Update. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/03 Report--

In this issue, the editor introduces the copy_{to,from}_user() functions in Linux and analyzes them from a professional point of view. I hope you get something out of this article.

Introduction

We should all be familiar with the copy_{to,from}_user() interface. Every basic Linux book describes its role: it is the bridge between kernel space and user space, and all data exchange between the two should go through interfaces like it. There is therefore no excuse for not knowing what it does. Still, I have had the following questions.

Why do we need copy_{to,from}_user(), and what exactly does it do for us behind the scenes?

What is the difference between copy_{to,from}_user() and memcpy()? Is it possible to use memcpy() directly?

Is replacing copy_{to,from}_user() with memcpy() necessarily a problem?

Looking back, I realize how confused I was at the time. I have thought about each of these questions more than once, and each time I came to a different conclusion, because I did not fully understand the topic from the beginning. Let us return to this weighty topic and think it through again.

A hundred schools of thought contend

Faced with these questions, the first stop is of course Baidu. There are many blog posts on the subject, which shows that it has puzzled a large number of Linux enthusiasts. From what I read, the views fall into two main categories:

1. copy_{to,from}_user() validates the incoming address, which memcpy() does not.

For example, it checks whether the address belongs to the user-space address range. In theory, kernel space can directly use pointers passed from user space, and a data copy can be done directly with memcpy(). In fact, on architectures without an MMU, copy_{to,from}_user() is ultimately implemented with memcpy().

But on most platforms, which have an MMU, the situation is different: a pointer passed from user space lives in the virtual address space, and the virtual address it points to may not yet be mapped to a physical page. In user space that is harmless: the page fault is transparently fixed by the kernel (which commits a new physical page for the faulting address), and the faulting instruction resumes as if nothing had happened. But that is only how page faults taken in user space behave. A page fault taken in kernel space must be fixed up explicitly, which is a consequence of how the kernel's page fault handler is designed.

The idea is that when kernel-mode code tries to access a user-space address that is not yet backed by a physical page, the kernel must be vigilant; it cannot be as oblivious as user space.

2. If we are sure that the pointer passed from user space is correct, we can use memcpy() instead of copy_{to,from}_user(). Experiments show that a program using memcpy() runs without problems. The two are therefore interchangeable as long as the user-space pointer is known to be safe.

Most blogs hold the first view, and it seems to be widely accepted. However, people who value practice have arrived at the second view; after all, true knowledge comes from practice. Is the truth in the hands of a few, or are the eyes of the masses discerning? I do not reject either view, and I cannot guarantee which one is correct, because I believe that even a once unassailable theory may stop being correct with the passage of time or a change of circumstances; Newton's classical mechanics is an example (though that comparison may be going a little too far). In plain language: the Linux code keeps changing over time. Perhaps the views above were once correct, and they may still be correct now. The following analysis is my own opinion, and you should treat it with the same skepticism. Let me throw out a brick to attract jade.

A modest spur to induce others to come forward with valuable contributions

First, let's look at the prototypes of memcpy() and copy_{to,from}_user(). The parameters barely differ: a destination address, a source address, and the number of bytes to copy.

    static __always_inline unsigned long __must_check
    copy_to_user(void __user *to, const void *from, unsigned long n);

    static __always_inline unsigned long __must_check
    copy_from_user(void *to, const void __user *from, unsigned long n);

    void *memcpy(void *dest, const void *src, size_t len);

However, there is one thing we must know: memcpy() does not check the validity of the incoming addresses, whereas copy_{to,from}_user() performs checks along the following lines (simplified; refer to the code for the details).

When copying data from user space to kernel space, the user-space address from, and from plus the copy length n, must both lie within the user-space address range.

When copying data from kernel space to user space, the validity of the address must of course be checked as well, for example whether the access is out of bounds or whether it touches data in a code segment, and so on. In short, every illegal operation must be stopped immediately.
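The range check described above can be sketched in plain C. This is a minimal user-space model and not the kernel's access_ok(): the limit USER_LIMIT (a 48-bit user address range is assumed) and the helper name user_range_ok() are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical top of the user address range (48-bit VA assumed). */
#define USER_LIMIT 0x0001000000000000ULL

/*
 * Return true when [addr, addr + n) lies entirely below USER_LIMIT.
 * Written as two comparisons so that addr + n can never wrap around.
 */
static bool user_range_ok(uint64_t addr, uint64_t n)
{
    return n <= USER_LIMIT && addr <= USER_LIMIT - n;
}
```

With these assumptions, user_range_ok(0x400000, 4096) accepts an ordinary user buffer, while a kernel address such as 0xffff000000000000 is rejected, as is a range that starts in user space but runs past the limit.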

After this simple comparison, let's look at the other differences and discuss the two views above, starting with the second. I still believe that true knowledge comes from practice. In my tests, the results fall into two cases.

In the first case, the memcpy() test shows no problem and the code runs normally. The test code is as follows (only the read function of the file_operations for a proc file is shown):

    static ssize_t test_read(struct file *file, char __user *buf,
                             size_t len, loff_t *offset)
    {
        memcpy(buf, "test\n", 5);    /* copy_to_user(buf, "test\n", 5) */
        return 5;
    }

We read the file with the cat command. cat calls test_read() through the read system call, passing a buf of 4 KB.

The test went well and the result was gratifying: the "test" string was read successfully. There seems to be nothing wrong with the second view. However, we need to keep verifying, because the first view states that "a page fault in kernel space must be fixed up explicitly."

So we also need to verify the case where buf has been allocated virtual address space in user space but has not yet been mapped to physical memory, so that a page fault is taken in kernel mode. To do that we would have to construct this condition, find a suitable buf, and test it. I did not run this test myself, because a test conclusion already exists (and mainly because I am lazy and constructing the condition is troublesome).

The test was done by a friend of mine, Ackerman Daniel, known as Mr. Song's "teaching assistant". He ran this experiment and concluded that the code works properly even when buf has no mapping to physical memory yet. The page fault is taken in kernel mode and repaired there (physical memory is allocated, the page tables are populated, and the mapping is established). I also analyzed it from the code, and the conclusion is the same.

From this analysis, it seems memcpy() can also be used normally; for security reasons, however, interfaces such as copy_{to,from}_user() are still recommended.

In the second case, the test code above does not work and triggers a kernel oops. The kernel configuration for this test differs from the previous one: CONFIG_ARM64_SW_TTBR0_PAN or CONFIG_ARM64_PAN is enabled (on the ARM64 platform). Both options prevent kernel mode from directly accessing the user address space. The difference is that CONFIG_ARM64_SW_TTBR0_PAN implements this by software emulation, while CONFIG_ARM64_PAN relies on hardware (an ARMv8.1 extension). We use CONFIG_ARM64_SW_TTBR0_PAN as the object of analysis, since only the software emulation has code to analyze. By the way, if the hardware does not support PAN, enabling CONFIG_ARM64_PAN has no effect and only the software emulation can be used. With either option active, access to a user-space address has to go through an interface like copy_{to,from}_user(); otherwise the result is a kernel oops.

With CONFIG_ARM64_SW_TTBR0_PAN turned on, running the code above results in a kernel oops, because kernel mode accesses the user-space address directly. In this case we cannot use memcpy(); we have no choice but to use copy_{to,from}_user().

Why do we need PAN (Privileged Access Never)? The reason may be that the exchange of data between user space and kernel space easily introduces security problems, so kernel space is not allowed to access user space casually; if it has to, PAN must first be turned off through a specific interface. PAN also standardizes the interface for data exchange between kernel mode and user mode. With PAN enabled, kernel and driver developers are forced to use safe interfaces such as copy_{to,from}_user(), which improves the security of the system. For a non-standard operation such as a direct memcpy(), the kernel answers with an oops.

Non-standard programming introduces security vulnerabilities. For example, the Linux kernel vulnerability CVE-2017-5123 allows privilege escalation. It was caused by a missing access_ok() check on the validity of an address passed in by the user. To avoid introducing security problems into our own code, we should be extra careful with data exchanged between kernel space and user space.

Getting to the bottom of it

Having mentioned the CONFIG_ARM64_SW_TTBR0_PAN option, I also want to know the design behind it. Because of the hardware design of ARM64, there are two page table base address registers, ttbr0_el1 and ttbr1_el1. The processor decides whether an accessed address belongs to user space or kernel space based on the upper 16 bits of the 64-bit address: for a user-space address it uses ttbr0_el1, otherwise ttbr1_el1. Therefore, on a process switch on ARM64 only the value of ttbr0_el1 has to change; ttbr1_el1 can stay the same because all processes share the same kernel address space.
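The address-based choice between the two registers can be modeled as follows. This is only a sketch that assumes a 48-bit virtual address configuration and ignores tagged addresses; the enum and the function name pick_ttbr() are hypothetical and exist only for this example.

```c
#include <stdint.h>

/* Which translation table base register a virtual address selects. */
enum ttbr { TTBR0_EL1, TTBR1_EL1, TTBR_FAULT };

/*
 * Model of the selection with 48-bit virtual addresses (an assumption):
 * top 16 bits all zero -> user space, walked through ttbr0_el1;
 * top 16 bits all one  -> kernel space, walked through ttbr1_el1;
 * anything else is not a valid address in this configuration.
 */
static enum ttbr pick_ttbr(uint64_t va)
{
    uint64_t top = va >> 48;

    if (top == 0x0000)
        return TTBR0_EL1;
    if (top == 0xffff)
        return TTBR1_EL1;
    return TTBR_FAULT;
}
```

Under this model, a typical user address such as 0x0000007fb7e00000 selects ttbr0_el1 and a kernel address such as 0xffff000008080000 selects ttbr1_el1, which is why only ttbr0_el1 has to change on a process switch.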

When the process switches to kernel mode (interrupt, exception, system call, and so on), how can kernel-mode access to the user address space be prevented? It is not hard to see that ttbr0_el1 can be changed to point to an illegal mapping. A special page table is prepared for this purpose. It occupies 4 KB of memory and is filled with zeros. When the process switches to kernel mode, ttbr0_el1 is set to the address of this page table, which makes every access to a user-space address illegal, because every entry in the table is invalid. The memory for this special page table is allocated in the linker script.

    #define RESERVED_TTBR0_SIZE    (PAGE_SIZE)

    SECTIONS
    {
        reserved_ttbr0 = .;
        . += RESERVED_TTBR0_SIZE;
        swapper_pg_dir = .;
        . += SWAPPER_DIR_SIZE;
        swapper_pg_end = .;
    }

This special page table sits next to the kernel page table: it lies just 4 KB before swapper_pg_dir. The 4 KB of memory starting at reserved_ttbr0 is zeroed.

When we enter kernel mode, ttbr0_el1 is switched with __uaccess_ttbr0_disable to turn off access to user-space addresses, and access is turned back on with __uaccess_ttbr0_enable when needed. Neither macro is complex, so we take __uaccess_ttbr0_disable as the example. It is defined as follows:

    .macro  __uaccess_ttbr0_disable, tmp1
    mrs     \tmp1, ttbr1_el1                    // swapper_pg_dir (1)
    bic     \tmp1, \tmp1, #TTBR_ASID_MASK
    sub     \tmp1, \tmp1, #RESERVED_TTBR0_SIZE  // reserved_ttbr0 just before
                                                // swapper_pg_dir (2)
    msr     ttbr0_el1, \tmp1                    // set reserved TTBR0_EL1 (3)
    isb
    add     \tmp1, \tmp1, #RESERVED_TTBR0_SIZE
    msr     ttbr1_el1, \tmp1                    // set reserved ASID
    isb
    .endm

(1) ttbr1_el1 stores the base address of the kernel page table, so its value is swapper_pg_dir.

(2) swapper_pg_dir minus RESERVED_TTBR0_SIZE is the special page table described above.

(3) Pointing ttbr0_el1 at the base address of this special page table ensures that every subsequent access to a user address is illegal.
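Why a zero-filled page makes every user access fault can be illustrated with a toy model. The sketch below relies on the ARM64 convention that a translation table descriptor with bit 0 clear is invalid; a 4 KB table of 8-byte descriptors has 512 entries. The names reserved_ttbr0_model, lookup_faults() and all_lookups_fault() are made up for this example and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES_PER_TABLE 512          /* 4 KB / 8-byte descriptors */
#define DESC_VALID        0x1ULL       /* bit 0 set: descriptor is valid */

/* Stand-in for the zero-filled reserved_ttbr0 page (static, so all zero). */
static uint64_t reserved_ttbr0_model[ENTRIES_PER_TABLE];

/* A first-level lookup faults when the selected descriptor is invalid. */
static bool lookup_faults(const uint64_t *table, unsigned int index)
{
    return (table[index % ENTRIES_PER_TABLE] & DESC_VALID) == 0;
}

/* True when no index in the table can be translated. */
static bool all_lookups_fault(const uint64_t *table)
{
    for (unsigned int i = 0; i < ENTRIES_PER_TABLE; i++)
        if (!lookup_faults(table, i))
            return false;
    return true;
}
```

In this model all_lookups_fault(reserved_ttbr0_model) holds, whereas a table with even one valid descriptor would let some user address through.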

The C implementation corresponding to __uaccess_ttbr0_disable can be referenced here.

How is kernel-mode access to user-space addresses allowed again? That is also very simple: it is the reverse operation of __uaccess_ttbr0_disable, which gives ttbr0_el1 a legal page table base address. There is no need to go through it here.

What we need to know now is that with CONFIG_ARM64_SW_TTBR0_PAN configured, the copy_{to,from}_user() interface allows kernel-mode access to user space before the copy and turns it off again after the copy. Using copy_{to,from}_user() is therefore the orthodox way, mainly in terms of the safety check and the safe access handling. This is the first thing copy_{to,from}_user() offers beyond memcpy(); another important feature is introduced later.

Now we can answer the question left over from the previous section: how can memcpy() still be used? Allow kernel access to user-space addresses with uaccess_enable_not_uao() before the memcpy() call, call memcpy(), and finally turn kernel access to user space off again with uaccess_disable_not_uao().

Prepare for the rainy day

All of the tests above passed a legal user-space address. What is a legal user-space address?

Any address within the virtual address ranges the process has requested through system calls is a legal address (whether or not a physical page has been assigned and mapped). Since we are writing an interface, we must also consider its robustness: we cannot assume that every parameter passed by the user is legal. We should anticipate illegal parameters and prepare for them in advance.

We first take the memcpy() test case and pass it an arbitrary illegal address. The test triggers a kernel oops. We then repeat the test with copy_{to,from}_user() in place of memcpy().

This time read() only returns an error and no kernel oops is triggered. This is what we want; after all, an application should not be able to trigger a kernel oops. How is this mechanism implemented?

We take copy_to_user () as an example. The function call process is as follows:

    copy_to_user() -> _copy_to_user() -> raw_copy_to_user() -> __arch_copy_to_user()

__arch_copy_to_user() is implemented in assembly on the ARM64 platform, and this is the critical part.

    end     .req    x5
    ENTRY(__arch_copy_to_user)
            uaccess_enable_not_uao x3, x4, x5
            add     end, x0, x2
    #include "copy_template.S"
            uaccess_disable_not_uao x3, x4
            mov     x0, #0
            ret
    ENDPROC(__arch_copy_to_user)

            .section .fixup,"ax"
            .align  2
    9998:   sub     x0, end, dst            // bytes not copied
            ret
            .previous

uaccess_enable_not_uao and uaccess_disable_not_uao are the switches mentioned above that control kernel-mode access to user space.

The copy_template.S file is the assembly implementation of memcpy(), as the implementation of memcpy() shown later makes clear.

.section .fixup,"ax" defines a section called ".fixup" with the flags ax ('a' for allocatable, 'x' for executable). The instruction at label 9998 handles the aftermath, the "planning ahead" part. Remember what the return value of copy_{to,from}_user() means? 0 indicates that the copy succeeded; otherwise it is the number of bytes that were not copied. This line of code calculates that number. A page fault is bound to be triggered when we access an illegal user-space address. That page fault is taken in kernel mode and returns without the fault being repaired, so execution must not resume at the address where the exception occurred. The system therefore has two choices: the first is a kernel oops plus a SIGSEGV signal to the current process; the second is not to return to the faulting address but to return to an address prepared for the repair. With memcpy() there is only the first choice, but copy_{to,from}_user() has the second. The .fixup section exists to implement this repair. When an illegal user-space address is accessed during the copy, the address that do_page_fault() returns to becomes label 9998, where the number of bytes not yet copied is calculated and the program continues to execute.
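The return convention can be illustrated with a small user-space model. This is not kernel code: the function model_copy_to_user() and its valid parameter are invented for this sketch. The destination counts as "mapped" only for its first valid bytes, and reaching that boundary stands in for the fault on an illegal user address. The buffer passed in is assumed to really be at least n bytes long, so the pointer arithmetic stays in bounds.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy model of the copy_to_user() return convention.
 * Returns the number of bytes NOT copied (0 means complete success).
 */
static unsigned long model_copy_to_user(char *dst, size_t valid,
                                        const char *src, unsigned long n)
{
    char *end = dst + n;                    /* like "add end, x0, x2" */
    unsigned long ok = n < valid ? n : valid;

    memcpy(dst, src, ok);                   /* the part that succeeds */
    dst += ok;                              /* dst stops where the fault hits */

    return (unsigned long)(end - dst);      /* like "sub x0, end, dst" at 9998 */
}
```

Copying 5 bytes into a destination that is valid for only 2 bytes returns 3 in this model, mirroring how the fixup code reports the bytes that remain.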

Putting this together with the earlier analysis, __arch_copy_to_user() is approximately equivalent to the following.

    uaccess_enable_not_uao();
    memcpy(ubuf, kbuf, size);      ==     __arch_copy_to_user(ubuf, kbuf, size);
    uaccess_disable_not_uao();

A short digression first, to explain why copy_template.S is memcpy(). memcpy() is implemented in assembly on the ARM64 platform, in the file arch/arm64/lib/memcpy.S.

            .weak memcpy
    ENTRY(__memcpy)
    ENTRY(memcpy)
    #include "copy_template.S"
            ret
    ENDPIPROC(memcpy)
    ENDPROC(__memcpy)

So obviously, memcpy() and __memcpy() have the same definition. memcpy() is declared weak, so it can be overridden (a slight digression). A further aside: why use assembly? Why not use the memcpy() in lib/string.c? To optimize execution speed, of course. The memcpy() in lib/string.c copies byte by byte (even the best hardware can be ruined by crude code).

Today's processors are basically 32-bit or 64-bit and can copy 4, 8, or even 16 bytes at a time (subject to address alignment), which significantly improves execution speed. That is why the ARM64 platform implements memcpy() in assembly. For more on this topic, see the blog post "memcpy Optimization and implementation of ARM64".
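The difference between the two strategies can be sketched in portable C. This is an illustration only and not the kernel's assembly: copy_bytes() and copy_words() are hypothetical names. The word-wise version moves 8 bytes per iteration through fixed-size memcpy() calls, which is the portable way to express an unaligned 8-byte load and store; compilers usually turn each into a single instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time copy, in the spirit of the generic C fallback. */
static void *copy_bytes(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dest;
}

/* Copy 8 bytes per iteration, then finish the tail byte by byte. */
static void *copy_words(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    while (n >= sizeof(uint64_t)) {
        uint64_t w;

        memcpy(&w, s, sizeof(w));   /* one 8-byte load */
        memcpy(d, &w, sizeof(w));   /* one 8-byte store */
        d += sizeof(w);
        s += sizeof(w);
        n -= sizeof(w);
    }
    copy_bytes(d, s, n);            /* remaining 0..7 bytes */
    return dest;
}
```

Both functions produce the same result; the word-wise loop simply runs about one eighth as many iterations for large buffers.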

Back to the topic, and to repeat: when kernel mode accesses a user-space address and a page fault is triggered, the kernel fixes the fault as if nothing had happened as long as the user-space address is legal (it allocates physical memory and establishes the page table mapping). If the address is an illegal user-space address, the second way is chosen in an attempt at self-rescue. This way relies on the .fixup and __ex_table sections.

If that fails too, the only option left is to send a SIGSEGV signal to the current process: a kernel oops in the mild case, a panic in the severe case (depending on the kernel configuration option CONFIG_PANIC_ON_OOPS). When the kernel accesses an illegal user-space address, do_page_fault() eventually jumps to __do_kernel_fault() at the no_context label.

    static void __do_kernel_fault(unsigned long addr, unsigned int esr,
                                  struct pt_regs *regs)
    {
        /*
         * Are we prepared to handle this kernel fault?
         * We are almost certainly not prepared to handle instruction faults.
         */
        if (!is_el1_instruction_abort(esr) && fixup_exception(regs))
            return;
        /* ... */
    }

fixup_exception() goes on to call search_exception_tables(), which searches the __ex_table section. The __ex_table section stores the exception table; each entry stores the address of an instruction that may fault and the address of its corresponding fixup.

For example, the address of the instruction at label 9998 above (sub x0, end, dst) will be found, and the return address of do_page_fault() is changed to it, which achieves the jump to the repair code. The search takes the faulting address addr and checks whether the __ex_table section (the exception table) has a matching entry. If it does, the fault can be repaired. Because 32-bit and 64-bit processors differ here, let's start with how the exception table works on 32-bit processors.

The start and end addresses of the __ex_table section are __start___ex_table and __stop___ex_table (defined in include/asm-generic/vmlinux.lds.h). This memory can be thought of as an array whose elements are of type struct exception_table_entry; each records the address where an exception may occur and its corresponding fixup address.

                           exception table
    __start___ex_table --> +-------+
                           | entry |
                           +-------+
                           | entry |
                           +-------+
                           |  ...  |
                           +-------+
                           | entry |
                           +-------+
                           | entry |
    __stop___ex_table  --> +-------+

On 32-bit processors, struct exception_table_entry is defined as follows:

    struct exception_table_entry {
        unsigned long insn, fixup;
    };

Note that on 32-bit processors unsigned long is 4 bytes. insn and fixup store the address where the exception occurs and the corresponding fixup address, respectively. Looking up the fixup address for an exception address ex_addr (returning 0 if none is found) works as follows:

    unsigned long search_fixup_addr32(unsigned long ex_addr)
    {
        const struct exception_table_entry *e;

        for (e = __start___ex_table; e < __stop___ex_table; e++)
            if (ex_addr == e->insn)
                return e->fixup;

        return 0;
    }

On 32-bit processors, creating an exception table entry is relatively simple. Every instruction in the copy_{to,from}_user() assembly code that accesses a user-space address gets an entry: insn stores the address of that instruction, and fixup stores the address of the fixup instruction.

With 64-bit processors, continuing with this approach would make the exception table take twice as much memory as on 32-bit processors (because storing an address takes 8 bytes). So the kernel implements it differently. On 64-bit processors, struct exception_table_entry is defined as follows:

    struct exception_table_entry {
        int insn, fixup;
    };

Each exception table entry takes the same amount of memory as on a 32-bit processor, so the footprint stays the same. But the meaning of insn and fixup has changed: they now store the offset of the exception address and of the fixup address relative to the address of the structure member itself (a bit of a mouthful). Looking up the fixup address for an exception address ex_addr (returning 0 if none is found) now works as follows:

    unsigned long search_fixup_addr64(unsigned long ex_addr)
    {
        const struct exception_table_entry *e;

        for (e = __start___ex_table; e < __stop___ex_table; e++)
            if (ex_addr == (unsigned long)&e->insn + e->insn)
                return (unsigned long)&e->fixup + e->fixup;

        return 0;
    }

Our focus is therefore on how an exception_table_entry is built. For every memory access to a user-space address, an exception table entry has to be created and placed in the __ex_table section. Take the following assembly instructions as an example (the addresses are made up, so there is no need to worry about whether they are realistic; understanding the principle is what matters).

    0xffff000000000000: ldr x1, [x0]
    0xffff000000000004: add x1, x1, #0x10
    0xffff000000000008: ldr x2, [x0, #0x10]
    /* ... */
    0xffff000040000000: mov x0, #0xfffffffffffffff2    // -14
    0xffff000040000004: ret

Suppose x0 holds a user-space address, so an exception table entry is needed for the instruction at 0xffff000000000000, and when x0 is an illegal user-space address we expect execution to jump to the fixup address 0xffff000040000000. To keep the arithmetic simple, assume this is the first entry created and that __start___ex_table is 0xffff000080000000. The insn and fixup members of the first entry are then 0x80000000 and 0xbffffffc (both negative as signed 32-bit values). An entry is created for every instruction in the copy_{to,from}_user() assembly code that accesses a user-space address, so the instruction at 0xffff000000000008 needs an exception table entry as well.
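The two offsets can be checked with a few lines of C. This is a sketch of the arithmetic only: struct rel_entry and the helpers rel_offset(), make_entry() and resolve() are names invented for the example, and the conversion of an out-of-range value to int32_t relies on the usual two's-complement wraparound that common compilers provide.

```c
#include <stdint.h>

/* 64-bit style entry: two 32-bit offsets relative to their own fields. */
struct rel_entry {
    int32_t insn;
    int32_t fixup;
};

/* Offset to store in a field located at field_addr so it refers to target. */
static int32_t rel_offset(uint64_t target, uint64_t field_addr)
{
    return (int32_t)(uint32_t)(target - field_addr);
}

/* Build the entry that would live at entry_addr in the exception table. */
static struct rel_entry make_entry(uint64_t entry_addr, uint64_t insn_addr,
                                   uint64_t fixup_addr)
{
    struct rel_entry e;

    e.insn  = rel_offset(insn_addr, entry_addr);       /* insn field at +0 */
    e.fixup = rel_offset(fixup_addr, entry_addr + 4);  /* fixup field at +4 */
    return e;
}

/* Recover an absolute address from a field address and its stored offset. */
static uint64_t resolve(uint64_t field_addr, int32_t off)
{
    return field_addr + (uint64_t)(int64_t)off;
}
```

For the addresses in the text, insn is 0xffff000000000000 - 0xffff000080000000 = -0x80000000 (bit pattern 0x80000000), and fixup is 0xffff000040000000 - 0xffff000080000004 = -0x40000004 (bit pattern 0xbffffffc). The fixup field sits 4 bytes after the start of the entry, which is why its offset is not simply -0x40000000.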

So what happens when kernel mode accesses an illegal user-space address? The analysis above can be summarized as follows:

1. An illegal user-space address is accessed: 0xffff000000000000: ldr x1, [x0]

2. The MMU triggers an exception.

3. The CPU calls do_page_fault().

4. do_page_fault() calls search_exception_tables() (regs->pc == 0xffff000000000000).

5. The __ex_table section is searched for 0xffff000000000000, and the fixup address 0xffff000040000000 is returned.

6. do_page_fault() modifies the return address (regs->pc = 0xffff000040000000) and returns.

7. The program continues to execute and handles the error.

8. The function return value is set to x0 = -EFAULT (-14) and the function returns (ARM64 passes the function return value in x0).
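The steps above can be condensed into a small simulation. This is a user-space model and not the kernel implementation: struct model_regs, struct model_entry and model_fixup_exception() are invented names, and the table stores absolute addresses (the 32-bit layout) purely to keep the example short. The addresses are the made-up ones from the text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct model_regs { uint64_t pc; };             /* only the field we need */
struct model_entry { uint64_t insn, fixup; };   /* absolute addresses */

/* One entry per instruction that touches a user-space address. */
static const struct model_entry model_table[] = {
    { 0xffff000000000000ULL, 0xffff000040000000ULL },
    { 0xffff000000000008ULL, 0xffff000040000000ULL },
};

/*
 * Model of the fixup step: if the faulting pc has an entry, redirect
 * pc to the fixup address and report that the fault was handled.
 */
static bool model_fixup_exception(struct model_regs *regs)
{
    size_t n = sizeof(model_table) / sizeof(model_table[0]);

    for (size_t i = 0; i < n; i++) {
        if (model_table[i].insn == regs->pc) {
            regs->pc = model_table[i].fixup;
            return true;
        }
    }
    return false;   /* no entry: the real kernel would oops here */
}
```

A fault at 0xffff000000000000 is redirected to 0xffff000040000000, while a fault at 0xffff000000000004 (the add instruction, which never touches user memory and so has no entry) is left unhandled.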

Summary

It is time to review: the thinking about copy_{to,from}_user() ends here. Let's summarize and conclude the article.

Whether in kernel mode or user mode, when a virtual address has not yet been mapped to a physical address, page fault handling is almost the same: the kernel allocates physical memory and creates the mapping for us. In this case memcpy() and copy_{to,from}_user() behave similarly.

When kernel mode accesses an illegal user-space address, the kernel looks up a fixup address based on the faulting address. The fault is not repaired by establishing an address mapping but by modifying the return address of do_page_fault(). memcpy() cannot do that.

When CONFIG_ARM64_SW_TTBR0_PAN or CONFIG_ARM64_PAN is enabled (the latter only takes effect with hardware support), we can only use the copy_{to,from}_user() interface. Using memcpy() directly is not possible.

That concludes this introduction to copy_{to,from}_user() in Linux. If you have had similar doubts, I hope the analysis above helps.
