2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article walks through the main points of memory barriers in the Linux kernel. The ideas are simple and practical, so interested readers may wish to follow along and see what memory barriers in the Linux kernel actually do.
Cache coherence
It is tempting to assume that many mechanisms in Linux exist to guarantee cache coherence, but they do not. Cache coherence is almost entirely a hardware affair; software only becomes involved with the caches when executing instructions with the lock prefix. (At least, that is my current understanding.) What software more often has to guarantee is sequential consistency.
Cache coherence means the following: in a multiprocessor system, each CPU has its own L1 cache, so the same memory location may be cached in the L1 caches of two different CPUs. If one CPU modifies its cached copy, we want the other CPU to read the new data on its next access. Fortunately, this complex task is handled entirely by hardware: by implementing a coherence protocol such as MESI, the hardware keeps the caches coherent with little effort on our part. Even with several CPUs writing at once, not just one reader and one writer, a CPU always reads the latest data, whether that data sits in its own cache, in another CPU's cache, or in memory. That is cache coherence.
Sequential consistency
Sequential consistency is a completely different concept from cache coherence, although both are products of processor evolution. Compilers may reorder operations to optimize your code, and processors have long featured multiple issue and out-of-order execution, so the order in which instructions actually execute can differ somewhat from the program order. On a single processor this does not matter: the compiler and processor reorder only in ways that your own code cannot observe, and nobody else is looking. On a multiprocessor it is a different story; the order in which instructions complete on one processor can very much affect code running on other processors. Hence the notion of sequential consistency: the execution order of a thread on one processor should appear the same to threads on the other processors. Neither the processor nor the compiler can solve this alone; software has to intervene.
Memory barrier
The software intervention is simple: insert a memory barrier (memory barrier). Unfortunately the term was coined from the processor's point of view, which makes it hard for us to understand: it tempts us to associate barriers with cache coherence, and even to wonder whether they are the only way for other CPUs to see a modified cache line. They are not. From the processor's point of view, a memory barrier serializes read and write operations; from the software's point of view, it solves the sequential-consistency problem. The compiler wants to reorder your code, and the processor wants to execute out of order; when you insert a memory barrier, you tell the compiler that instructions must not be moved across the barrier, and you tell the processor that instructions after the barrier may execute only once the instructions before it have completed. Of course, a barrier can stop the compiler from meddling, but the processor still has leeway: given multiple issue, out-of-order execution and in-order completion, the barrier only has to guarantee that the memory accesses of earlier instructions complete before the memory accesses of later ones. That is also why memory barriers come in three kinds: read barriers, write barriers and full read-write barriers. For example, early x86 processors guaranteed that writes completed in order, so no write barrier was needed; some later IA-32 processors reorder writes, so write barriers are needed as well.
In fact, besides the dedicated read-write barrier instructions, many instructions carry read-write barrier semantics as a side effect, such as instructions with the lock prefix. Before the dedicated barrier instructions appeared, Linux got by on lock alone.
Where to insert a read or write barrier depends on the software's needs. Barriers cannot give you full sequential consistency, and they do not have to: threads on other processors are not watching your execution order all the time; it is enough that whenever they do look, nothing has happened that your program order forbids. An example of such a forbidden outcome: your thread assigns to variable a and then to variable b, yet a thread on another processor observes b assigned while a is not. (Note that this is not caused by cache incoherence but by the processor completing the writes out of order.) The fix is to add a write barrier between the assignment to a and the assignment to b.
Synchronization between multiprocessors
With SMP, threads run on several processors at the same time, and wherever there are threads there is a need for communication and synchronization. Fortunately, an SMP system has shared memory: all processors see the same memory contents, and although each has an independent L1 cache, the hardware still takes care of coherence. When threads on different processors access the same data, they need critical sections and synchronization. Synchronize with what? On the old UP systems we relied on semaphores, on disabling interrupts, and on read-modify-write instructions. On SMP, disabling interrupts no longer suffices: it is still needed to synchronize threads on the same processor, but it cannot stop another processor. Plain read-modify-write instructions do not work either: between the read and the write of your instruction, another processor may slip in a read or write of its own. The cache coherence protocol is sophisticated, but not sophisticated enough to predict which instruction a given read belongs to. So x86 invented instructions with the lock prefix. When such an instruction executes, every cache line containing the address it reads and writes is invalidated, and the memory bus is locked. If other processors then want to read or write the same address, or any address on the same cache line, they can read neither from their caches (the relevant lines have been invalidated) nor over the memory bus (the whole bus is locked), and the instruction achieves atomic execution. From the P6 processors onward, if the address accessed by a lock-prefixed instruction is already in the cache, the atomic operation can be done without locking the memory bus (though I suspect this is due to the shared L2 cache added to those multiprocessors).
Because the memory bus is locked, any outstanding reads and writes must complete before the lock-prefixed instruction executes, so the instruction also acts as a memory barrier.
Today, synchronization between threads on multiple processors uses spin locks at the top and lock-prefixed read-modify-write instructions underneath. In practice, real synchronization also disables task scheduling on the processor, disables interrupts when needed, and may be wrapped in the outer coat of a semaphore. The spin-lock implementation in Linux has gone through four generations of development, each more efficient and more powerful.
Implementation of memory barriers
#ifdef CONFIG_SMP
#define smp_mb()   mb()
#define smp_rmb()  rmb()
#define smp_wmb()  wmb()
#else
#define smp_mb()   barrier()
#define smp_rmb()  barrier()
#define smp_wmb()  barrier()
#endif
CONFIG_SMP is set when the kernel supports multiprocessors. On a UP (uniprocessor) system, the smp_* barriers all compile down to barrier().
#define barrier() __asm__ __volatile__("": : :"memory")
barrier() tells the compiler that the values of variables in memory may have changed: any copies held in registers are invalid, and the variables must be re-read from memory. On UP this is sufficient for all the memory barriers.
#ifdef CONFIG_X86_32
/*
 * Some non-Intel clones support out of order store. wmb() ceases to be a
 * nop for these.
 */
#define mb()  alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
#else
#define mb()  asm volatile("mfence":::"memory")
#define rmb() asm volatile("lfence":::"memory")
#define wmb() asm volatile("sfence":::"memory")
#endif
On an SMP system, the memory barriers translate into the corresponding mb(), rmb() and wmb(). CONFIG_X86_32 here means a 32-bit x86 system; otherwise it is 64-bit x86. The current Linux kernel keeps 32-bit and 64-bit x86 together in the same x86 directory, hence this configuration option.
As you can see, on 64-bit x86 the three instructions mfence, lfence and sfence are always available, while a 32-bit x86 system may lack them, so the kernel further checks whether the CPU supports these three newer instructions; if not, it falls back to a lock-prefixed instruction to get a memory barrier.
The SFENCE, LFENCE and MFENCE instructions provide an efficient way to order reads and writes of memory between code that produces weakly ordered data and code that consumes it:
SFENCE: serializes the write operations that occur before the SFENCE instruction; it does not affect reads. Every write before the sfence must complete before any write after it.
LFENCE: serializes the read operations that occur before the LFENCE instruction; it does not affect writes. Every read before the lfence must complete before any read after it.
MFENCE: serializes both the reads and the writes that occur before the MFENCE instruction. Every read and write before the mfence must complete before any read or write after it.
As for lock-prefixed memory operations: before locking the memory bus, they wait for all earlier reads and writes to finish, so they are equivalent to mfence, only less efficient.
At this point, I believe you have a deeper understanding of memory barriers in the Linux kernel; you might as well try these things out in practice. For more related content, follow us and keep learning!
© 2024 shulou.com SLNews company. All rights reserved.