What is the difference between mutex and spin lock

2025-01-16 Update. From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com), 06/03

This article explains the difference between a mutex and a spin lock: how each behaves when locking fails, what that behavior costs, and how each is implemented on top of the hardware.

1. Mutex or spin lock: which is lighter and more efficient?

To judge which of them is more efficient, first look at how they behave differently in the same situation. Suppose one thread acquires the lock; every other thread that tries to lock will fail. A thread that fails is handled as follows:

When locking a mutex fails, the thread gives up the CPU to other threads.

When locking a spin lock fails, the thread busy-waits until it gets the lock.

On seeing that the lock already has an owner, a thread waiting on a mutex politely steps aside and is woken by the system when the lock is released, while a thread waiting on a spin lock keeps asking whether the lock is free yet. Writing a while loop that competes for a resource over and over again: isn't that exactly a spin lock? Put that way, it seems obvious at a glance which of the two is lighter and more efficient.

In fact, spin locks are not that bad. There are many scenarios in which they perform better than mutexes, and this article sets out to clear their name. Doing so requires a detailed look at the principle behind each lock and at the engineering trade-offs involved.

2. Mutex lock

A mutex is an exclusive lock. For example, once thread A has locked the mutex, thread A owns it exclusively. Until thread A releases the lock, thread B's attempts to lock will fail, and thread B will give up the CPU to other threads. Since thread B has given up the CPU, its locking call naturally blocks.

This blocking on a failed lock attempt is implemented by the operating system kernel. When locking fails, the kernel puts the thread into the "sleep" state. After the lock is released, the kernel wakes the thread at an appropriate time, and once the thread has acquired the lock it continues executing.

When locking a mutex fails, execution drops from user mode into kernel mode and the kernel switches threads for us. This makes the mutex easier to use, but it carries a performance cost.
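The blocking behavior described above can be sketched with Rust's standard library. This is only an illustration: `std::sync::Mutex` stands in for the operating system mutex discussed here (its internals are platform-specific), and the function name `count_with_mutex` is hypothetical.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Spawns `threads` workers that each add 1 to a shared counter `iters` times.
/// A worker that finds the mutex held is blocked inside `lock()` until it can proceed.
pub fn count_with_mutex(threads: usize, iters: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    // Blocks while another thread holds the lock.
                    let mut guard = counter.lock().unwrap();
                    *guard += 1;
                    // `guard` is dropped at the end of this iteration, releasing the lock.
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

Calling `count_with_mutex(4, 10_000)` returns 40000, because no two workers are ever inside the locked section at the same time.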

What is that cost? Two thread context switches:

When a thread fails to lock, the kernel changes the thread's state from "running" to "sleeping" and switches the CPU to another thread.

Later, when the lock is released, the previously sleeping thread becomes "ready", and the kernel switches the CPU back to it at an appropriate time.

What does a thread context switch involve? When the two threads belong to the same process, they share virtual memory, so those resources stay untouched during the switch; only the thread's private data, registers and other unshared state need to be switched.

A context switch takes somewhere between tens of nanoseconds and a few microseconds. If the locked code runs only briefly (as is common), the context switches can take far longer than the locked code itself. Moreover, the thread's private data was warm in the CPU cache; after being switched out and back in, that data may have gone cold, and the resulting run of cache misses is painful. So if the locked code takes only a few nanoseconds to execute, why not keep the CPU and spin while waiting?

3. The principle of mutex

Everything said about mutexes so far rests on one assumption: once one thread (call him Xiaoming) has taken the lock, nobody else can get it until Xiaoming gives it up. How is that guaranteed?

First consider a single core: can the hardware perform locking as one atomic operation? Yes. This is what the "test and set" instruction does. Because it is a single hardware instruction, the smallest unit of execution, it cannot be interrupted halfway. With an atomic "test and set" instruction, implementing a lock on a single core is solved.

What about a multi-core environment? It is tempting to say that "test and set" still works: it is one instruction, it is atomic, so there is no problem. But is that true? Being a single instruction only guarantees that it is not interrupted while executing on one core. What if two cores execute it at the same time? The hardware still has to read the lock from memory, test it, and write the new state back to memory, so the sequence is not atomic after all, and the problem seems to recurse.

How is it solved on multiple cores? The key point is that two cores can operate on memory in parallel. Seen from memory, "test and set" is a read followed by a write, not one atomic step. If we make sure this memory operation cannot run in parallel, we are back in the single-core case. The hardware provides exactly such a mechanism: locking the memory bus. When "test and set" is performed while the memory bus is locked, only one core can be doing it at any moment, which removes the multi-core problem.
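As a rough illustration of "test and set" on multiple cores, the sketch below uses Rust's `AtomicBool::swap`, which atomically writes `true` and returns the previous value. The helper names `test_and_set` and `count_winners` are hypothetical; the point is only that, however many threads race, exactly one of them observes the flag as free.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Atomically sets the flag to `true` and returns the value it held before.
/// A return value of `false` means the caller found the lock free and now owns it.
pub fn test_and_set(flag: &AtomicBool) -> bool {
    flag.swap(true, Ordering::Acquire)
}

/// Lets `threads` workers race on one flag and counts how many found it free.
pub fn count_winners(threads: usize) -> usize {
    let flag = Arc::new(AtomicBool::new(false));
    let winners = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let flag = Arc::clone(&flag);
            let winners = Arc::clone(&winners);
            thread::spawn(move || {
                // Only the thread that sees the old value `false` counts itself.
                if !test_and_set(&flag) {
                    winners.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    winners.load(Ordering::Relaxed)
}
```

`count_winners(8)` returns 1: the read and the write happen as one indivisible step, so two threads can never both see the flag as free.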

On x86, the CPU provides a way to lock the bus for the duration of one instruction. The chip has a LOCK# pin. If an instruction in an assembly program carries the "LOCK" prefix, the assembled machine code makes the CPU pull the LOCK# pin low while executing that instruction and release it when the instruction completes. The bus stays locked for that time, so other CPUs on the same bus temporarily cannot access memory through it. This guarantees the atomicity of the instruction in a multiprocessor environment. (On newer processors, when the target memory is already cached, the lock is usually taken on the cache line through the cache-coherency protocol rather than on the whole bus.)

The LOCK prefix can be used with the following instructions when the destination operand is in memory:

BTS, BTR, BTC (mem, reg/imm)

XCHG, XADD (reg, mem / mem, reg)

ADD, OR, ADC, SBB (mem, reg/imm)

AND, SUB, XOR (mem, reg/imm)

NOT, NEG, INC, DEC (mem)

CMPXCHG (mem, reg), CMPXCHG8B, CMPXCHG16B (mem)
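From a high-level language these instructions are normally reached through atomic types rather than written by hand. As a sketch, the Rust function below (the name `count_with_fetch_add` is hypothetical) increments a shared counter with `fetch_add`; on x86-64 a compiler typically lowers this call to a LOCK-prefixed read-modify-write instruction, although the exact instruction chosen is up to the compiler.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Increments one shared counter from several threads without any lock.
/// Each `fetch_add` is a single atomic read-modify-write on the counter.
pub fn count_with_fetch_add(threads: u64, iters: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    // No increment is lost, even when threads collide.
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

`count_with_fetch_add(4, 10_000)` returns 40000. A plain, non-atomic increment in the same place could lose updates when two cores read the old value at the same time.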

4. Spin lock

A spin lock is the simplest kind of lock: it spins continuously, burning CPU cycles, until the lock becomes available. Note that on a single-core CPU a spin lock requires a preemptive scheduler (one that uses the timer interrupt to suspend a thread and run others). Without preemption a spin lock cannot be used on a single CPU, because a spinning thread never gives up the CPU.

A spin lock has lower overhead and, on multi-core systems, generally does not actively cause a thread switch, so it suits programming models such as asynchronous code and coroutines that switch between requests in user mode. However, if the locked code takes a long time to execute, the spinning threads occupy the CPU for just as long: the time spent spinning is proportional to the execution time of the locked code. This must be kept clearly in mind.

At the level of use, a spin lock looks much like a mutex, but the implementations are completely different: when locking fails, a mutex responds with a thread switch, while a spin lock responds with busy waiting. Busy waiting can be written as a bare "while" loop, but that is best avoided. The CPU provides the "PAUSE" instruction for use in busy-wait loops.

5. Spin lock principle

A spin lock is just a while loop that keeps trying to acquire the lock, so does its principle really need explaining? Wait: how is the lock state read and updated atomically? With a mutex? If a spin lock needed a mutex underneath, there would be no case left to make for it. Clearly the locks cannot be nested like that.

Each attempt to lock involves two steps:

First, check the state of the lock; if the lock is free, go on to the second step.

Second, mark the lock as held by the current thread.

Performed as one operation, these two steps are called "Compare And Swap", or "CAS" for short. CAS combines them into a single hardware-level instruction, so locking and unlocking complete in user mode without actively causing a thread context switch. This is why a spin lock is faster and cheaper than a mutex.
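The two steps can be sketched as a small CAS-based spin lock in Rust. This is a minimal illustration, not a production lock: the type `SpinLock` and the function `count_with_spin_lock` are hypothetical names, and the loop deliberately busy-waits with nothing in its body. The counter is updated with a separate load and store on purpose, so that only the lock keeps the update correct.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// A minimal spin lock: `false` means free, `true` means held.
pub struct SpinLock {
    held: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { held: AtomicBool::new(false) }
    }

    /// Retries the compare-and-swap until it changes `held` from false to true.
    pub fn lock(&self) {
        while self
            .held
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Busy-wait: the thread stays on the CPU and simply tries again.
        }
    }

    /// Marks the lock as free again.
    pub fn unlock(&self) {
        self.held.store(false, Ordering::Release);
    }
}

/// Increments a counter with a separate load and store, guarded by the spin lock.
pub fn count_with_spin_lock(threads: u64, iters: u64) -> u64 {
    let lock = Arc::new(SpinLock::new());
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    lock.lock();
                    // Without the lock, this read-then-write pair could lose updates.
                    let value = counter.load(Ordering::Relaxed);
                    counter.store(value + 1, Ordering::Relaxed);
                    lock.unlock();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

`count_with_spin_lock(4, 10_000)` returns 40000. Both `lock` and `unlock` are ordinary user-mode atomic operations; the kernel is never asked to put a thread to sleep.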

As mentioned above, a bare while loop is not the recommended way to wait for the lock; Intel CPUs provide the "PAUSE" instruction for this. What is "PAUSE", and how does it address the problem of a mindless while loop occupying the CPU inefficiently?

In fact, a spin lock never gives up the CPU voluntarily, so "PAUSE" cannot stop it from occupying the CPU. What it can do is make the waiting consume less power and make acquiring the lock more efficient.

The "PAUSE" instruction makes the CPU rest for a number of clock cycles, during which it consumes very little power. The length of the rest varies between CPU generations, roughly from tens to hundreds of clock cycles. For a CPU running at 5 GHz, one clock cycle is 0.2 ns.

A longer rest is not always better. For example, in Intel's Skylake architecture the "PAUSE" instruction initially rested for as many as 140 clock cycles. As a result, MySQL ran worse on a CPU that should in theory have been faster than on CPUs from previous years. The rest was reportedly later brought back down to the roughly 10 clock cycles of the previous generation, and database performance returned to the level expected of the newer CPU.
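To put these cycle counts in terms of time, here is a small worked conversion, assuming the 5 GHz clock used in the earlier example. The helper name `cycles_to_ns` is hypothetical.

```rust
/// Converts a number of clock cycles into nanoseconds at a given clock frequency.
/// At `freq_ghz` GHz the CPU completes `freq_ghz` cycles per nanosecond.
pub fn cycles_to_ns(cycles: u64, freq_ghz: f64) -> f64 {
    cycles as f64 / freq_ghz
}
```

At 5 GHz, `cycles_to_ns(10, 5.0)` is about 2 ns and `cycles_to_ns(140, 5.0)` is about 28 ns, so each pass through a spin loop waits roughly 14 times longer with the 140-cycle "PAUSE" before the lock is checked again.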

"PAUSE" has another benefit, which has to do with the pipeline. A loop that tests the lock as fast as possible fills the pipeline with read operations. When another thread then writes to the lock variable, the pipeline has to be reordered, because the CPU must guarantee that every read returns the correct value. Reordering the pipeline is expensive and hurts the performance of lock().

Imagine a worker thread W that holds the lock leaving the critical section and calling unlock, while several waiting threads S spin checking whether the lock is free. W issues one store instruction and the S threads issue a great many load instructions. The processor executes out of order: as long as there is no store, independent loads can be executed in any order, but once the store arrives, the loads that follow it must wait for it and be re-executed in order, which seriously hurts performance. According to Intel, the penalty can be as much as 25 times. The purpose of the "PAUSE" instruction is to reduce the number of loads in flight and thereby the time spent on reordering.
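A spin loop that follows this advice might look like the sketch below. It assumes Rust's `std::hint::spin_loop`, which emits the "PAUSE" instruction on x86 targets; the type `PausingSpinLock` and the function `count_with_pausing_lock` are hypothetical names. While the lock is held, waiters only read the flag and pause between reads, and they retry the atomic swap only once the flag looks free.

```rust
use std::hint;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// A spin lock whose waiters pause between checks instead of hammering the flag.
pub struct PausingSpinLock {
    held: AtomicBool,
}

impl PausingSpinLock {
    pub const fn new() -> Self {
        PausingSpinLock { held: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        loop {
            // Try to take the lock with one atomic swap (test and set).
            if !self.held.swap(true, Ordering::Acquire) {
                return;
            }
            // The lock is held: wait with plain loads, pausing between them.
            while self.held.load(Ordering::Relaxed) {
                hint::spin_loop();
            }
        }
    }

    pub fn unlock(&self) {
        self.held.store(false, Ordering::Release);
    }
}

/// Increments a counter with a separate load and store, guarded by the lock.
pub fn count_with_pausing_lock(threads: u64, iters: u64) -> u64 {
    let lock = Arc::new(PausingSpinLock::new());
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    lock.lock();
                    let value = counter.load(Ordering::Relaxed);
                    counter.store(value + 1, Ordering::Relaxed);
                    lock.unlock();
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

`count_with_pausing_lock(4, 10_000)` returns 40000, the same result as a lock without the pause. The difference lies in how the waiting threads behave: they issue fewer loads while W is about to store, which is the effect the paragraph above attributes to "PAUSE".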

That is the difference between a mutex and a spin lock. Which one to use in a specific situation still needs to be verified in practice.
