
Does the lightweight lock of Synchronized spin?


This article digs into the question "does the lightweight lock of Synchronized spin or not?" Plenty of people get this wrong in practice, so let's walk through it carefully; read to the end and you should come away with something.

Lock upgrading has been covered in countless articles online, and many of them claim that when the lightweight lock's CAS fails, the current thread will spin to try to acquire the lock.

Honestly, I thought so too at first; after all, everyone said so, and it sounded perfectly reasonable.

The reasoning goes: heavyweight locks block threads, so if the locked code executes very quickly, another thread only needs to spin briefly before the lock is free and its CAS succeeds, sparing the cost of blocking the thread and waking it up again.

But when I read the source code, I found this is not the case. The relevant code is in synchronizer.cpp.

After the CAS fails, there is no spin at all: if the CAS succeeds the method returns directly, and if it fails the lock-inflation path below is executed.
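To make the shape of that logic concrete, here is a minimal Java-flavored model of the control flow; the class and method names are mine, and the real code is C++ in synchronizer.cpp. The point is just: one CAS attempt, and on failure, straight to inflation.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch only, not HotSpot's actual code: models the JDK 8 slow_enter
// control flow, where a failed CAS leads directly to inflation.
class SlowEnterModel {
    // Models the object header word; null means "unlocked".
    final AtomicReference<Object> header = new AtomicReference<>(null);

    void slowEnter(Object lockRecord) {
        if (header.compareAndSet(null, lockRecord)) {
            return;              // CAS succeeded: lightweight lock acquired
        }
        inflateAndEnter();       // CAS failed: no spin, inflate immediately
    }

    void inflateAndEnter() {
        // Stand-in for ObjectSynchronizer::inflate + ObjectMonitor::enter.
        synchronized (this) { /* heavyweight path */ }
    }
}
```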

I also went through the lock-inflation code in ObjectSynchronizer::inflate, and I saw no spin operation there either.

So from the source code's point of view, a lightweight-lock CAS failure does not spin; the lock inflates directly into a heavyweight lock.

However, spin does exist in Synchronized, as a performance optimization.

That is, after the lock has inflated into a heavyweight lock, a thread that fails to grab it will spin for a while, waiting for the lock to be released.

Let's look at the source code; the comments alone make it very clear:

After all, blocking a thread, queueing it, and waking it up again is fairly expensive.

Now let's look at TrySpin, which is where adaptive spin lives. The actual function name, TrySpin_VaryDuration, already hints that the spin duration varies.
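As a rough illustration of what "adaptive" means here, this is a small Java sketch in the spirit of TrySpin_VaryDuration; the numbers and names are made up, and only the grow-on-success, shrink-on-failure idea is the point.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative adaptive spin: the spin budget is adjusted by how the
// previous attempts went, so "historical experience" tunes the next one.
class AdaptiveSpin {
    private int budget = 1_000;   // current spin allowance (invented number)

    boolean trySpin(AtomicReference<Thread> owner) {
        for (int i = 0; i < budget; i++) {
            if (owner.compareAndSet(null, Thread.currentThread())) {
                budget = Math.min(budget + 100, 10_000);  // spin worked: spin longer next time
                return true;
            }
        }
        budget = Math.max(budget - 100, 100);             // spin failed: shorten the next attempt
        return false;                                     // caller will enqueue and park
    }
}
```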

At this point the question of Synchronized spin is settled: when contention for the heavyweight lock fails, there is a spin operation; lightweight locks do not do this (at least in the 1.8 source code). If someone disputes it, just throw this article at them.

But while we're here, let me keep going on Synchronized; after all, it comes up constantly.

How deep does this article go into Synchronized?

Deep enough that if an interviewer ever asks what source code you have read...

...after reading this article, you can answer: I have read the JVM source code.

Of course, there is quite a lot of source code. I only went through everything related to Synchronized, and even that was not easy.

Readers who have seen my source-code walkthroughs before know that I draw flow charts to tie things together, so even if the code itself is hard to follow, the overall process will still be clear!

All right, let's go!

Start with the heavyweight lock

Before Java 1.6, Synchronized was purely a heavyweight lock.

That's because it blocks and wakes threads using operating-system calls; on Linux this is commonly implemented with pthread's mutex.

I took a screenshot of the source code that blocks the calling thread, and you can see that a mutex is indeed used.

Where there are system calls there are context switches, that is, switches between user mode and kernel mode, and we know that kind of switch is expensive.

That is why it is called a heavyweight lock, and also why the adaptive spin mentioned above exists: you really don't want to reach this stage!

Now let's look at how the heavyweight lock is implemented.

The Synchronized keyword can modify code blocks, instance methods, and static methods, and in all three cases it essentially acts on an object.

For a code block the lock object is whatever is in the parentheses; for an instance method it is the current instance, i.e. this; and for a static method it is the current class object.

There is a concept called the critical section.

We know that contention arises because there is a shared resource that multiple threads want, so we fence off a region and put the code that operates on the shared resource inside it.

You can think of it this way: to enter this region you must hold the lock, otherwise you cannot get in. This region is called the critical section.

When Synchronized modifies a code block
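The code screenshot from the original post is lost, so here is a minimal reconstruction of the kind of method being discussed (the class name Demo is mine; lockObject follows the text):

```java
// Minimal example of a synchronized code block (names are illustrative).
public class Demo {
    private final Object lockObject = new Object();

    public void lockObject() {
        synchronized (lockObject) {
            System.out.println("inside the critical section");
        }
    }
}

// javap -c Demo typically shows, abridged:
//   monitorenter          // contend for the lock, enter the critical section
//   ...System.out call...
//   monitorexit           // normal-path release
//   ...exception handler range...
//   monitorexit           // release on any exception, so the lock is never leaked
```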

The compiled bytecode then contains monitorenter and monitorexit instructions. I like to read them in critical-section terms: enter means entering the critical section and exit means leaving it, corresponding to locking and unlocking.

Both instructions are tied to the object the block synchronizes on, that is, the lockObject in the code above.

Each object can have a monitor object associated with it, and a thread executing monitorenter is trying to acquire ownership of that monitor; acquiring ownership means acquiring the lock.

We will analyze this monitor in detail below. First, let's look at the generated bytecode.

At the top of the picture is the bytecode compiled from the lockObject method, with the lockObject method itself below it, which makes it easier to read.

From the screenshot, monitorenter executes before the System.out call: this is where the lock is contended, and only after winning the lock can the thread enter the critical section.

After the call there is a monitorexit instruction, indicating that the lock is released as the thread leaves the critical section.

The other monitorexit I marked in the picture exists because the lock must also be released if any exception is thrown; otherwise the lock would never be freed.

From the generated bytecode we can also see why synchronized does not need to be unlocked manually.

Someone is carrying the load for us: the compiler-generated bytecode does it all, exceptions included.

When synchronized modifies a method

The bytecode generated for a synchronized method is not quite the same as for a code block, but it is essentially the same mechanism.

This time there are no monitorenter and monitorexit instructions in the bytecode; instead, the method's access flags are changed.

I am using an IDEA plugin to view the bytecode here, so the literal output looks different, but the flag value is the same: 0x0021, a combination of ACC_PUBLIC and ACC_SYNCHRONIZED.

The principle is to set ACC_SYNCHRONIZED in the method's flags when it is declared synchronized. At runtime the JVM checks this ACC_SYNCHRONIZED flag in the method's metadata, knows the method is synchronized, and therefore contends for the lock on method entry, continuing only once the lock is held.

Then, whether the method exits normally or via an exception, the lock is released, so the essence is the same as the code-block case.
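For comparison, a sketch of the method form; the class and method names are illustrative:

```java
// A synchronized instance method carries no monitorenter/monitorexit;
// for doWork, javap -v shows flags: (0x0021) ACC_PUBLIC, ACC_SYNCHRONIZED.
public class MethodDemo {
    public synchronized void doWork() {
        // behaves like: synchronized (this) { ... }
    }

    public static synchronized void doStaticWork() {
        // behaves like: synchronized (MethodDemo.class) { ... }
    }
}
```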

The lock object is implicit here, as I mentioned above: for an instance method it is this, and for a class (static) method it is the current class object (there is a pitfall in this point, which I analyzed in another article).

I still remember an interview question, reportedly asked at ByteDance: at the bytecode level, what is the difference between synchronized on a method and on a code block?

How about that? Before you know it, you're one step closer to ByteDance.

Let's go further into synchronized.

We already know from the above that synchronized acts on objects, but we glossed over the details, so let's take a closer look now.

In Java, an object's layout consists of the object header, instance data, and alignment padding.

The object header contains the MarkWord, the klass pointer, and (for arrays only) the array length. Our focus is locks, so we only care about the MarkWord.

Let me draw the memory layout of the 64-bit MarkWord in its different states (the monitor part of my drawing is wrong, but I am not going to redraw it; just noting it here).
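Since that drawing cannot be reproduced here, this is the commonly cited 64-bit MarkWord layout for JDK 8 with biased locking enabled (field widths in bits; each row is one state):

```
| unused:25 | hashcode:31 | unused:1 | age:4 | biased:0 | lock:01 |  Unlocked
| thread:54 | epoch:2     | unused:1 | age:4 | biased:1 | lock:01 |  Biased
| ptr_to_lock_record:62                                 | lock:00 |  Lightweight locked
| ptr_to_heavyweight_monitor:62                         | lock:10 |  Heavyweight locked
|                                                       | lock:11 |  GC marked
```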

The MarkWord structure is this complex in order to save memory: the same memory area serves different purposes at different stages of an object's life.

Remember this diagram; every kind of lock operation is strongly tied to the MarkWord.

As the figure shows, under a heavyweight lock the lock bits of the object header are 10, and the rest is a pointer to the monitor object; that is how the lock object and the monitor are associated.

This monitor is implemented in C++ inside HotSpot and is called ObjectMonitor; it is an implementation of the classic monitor construct from concurrency theory.

It looks like the following; I have annotated the meaning of the key fields and clipped the comments from the header file:
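The header-file screenshot is also lost; as a stand-in, here is a Java-flavored sketch of the key fields. The field names match the JDK 8 C++ ObjectMonitor in objectMonitor.hpp, but the types are simplified for illustration:

```java
// Java-flavored stand-in for ObjectMonitor's key fields (real thing is C++).
class ObjectMonitorSketch {
    Object _header;       // displaced mark word of the locked object
    Object _object;       // back-pointer to the locked object
    Thread _owner;        // thread that owns the monitor, or null
    long   _recursions;   // reentry count: 0 means held exactly once
    Object _cxq;          // singly linked list of recently arrived blocked threads
    Object _EntryList;    // doubly linked list of threads ready to contend again
    Object _WaitSet;      // doubly linked list of threads that called wait()
    Thread _succ;         // "heir presumptive": a likely next owner, kept spinning
}
```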

Keep these fields in mind for a while; the source code revolves around them.

Basic principle of synchronized

First, a picture. Combine it with the monitor field notes above and just skim it for now; it is fine if it does not all make sense yet, the goal is a general impression of the flow:

All right, let's move on.

We mentioned the monitorenter instruction earlier; it ends up executing the following code:

We are analyzing the heavyweight lock now, so ignore the biased-locking code; the slow_enter method screenshotted at the beginning of the article will eventually land in the ObjectMonitor::enter method.

You can see the key point: CAS is used to set _owner in the ObjectMonitor to the current thread, and a successful CAS means the lock has been acquired.

Reentrancy is then tracked by incrementing _recursions.

If the CAS fails, the following loop is executed:

The EnterI code was actually screenshotted above already. Here it is again, with the important queueing operation added and some unimportant code removed:

First it tries to acquire the lock once more; failing that, it spins adaptively; failing that, the thread is wrapped in an ObjectWaiter object and pushed onto the _cxq singly linked list; and if it still has not won the lock after all that struggling, it blocks, hence the blocking call below.

You can see that every branch ends by executing Self->_ParkEvent->park(), which is the pthread_mutex_lock call mentioned above.
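Putting the enter path together, here is an illustrative Java model of that flow; it is a sketch of the idea, not the HotSpot code, and all names other than the role of _cxq are invented:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Illustrative model of the EnterI flow: retry the CAS, spin a little,
// CAS-push onto a _cxq-style list, then park until woken.
class EnterModel {
    static final class Waiter {                       // plays the role of ObjectWaiter
        Waiter next;
        final Thread thread = Thread.currentThread();
    }

    final AtomicReference<Thread> owner = new AtomicReference<>();
    final AtomicReference<Waiter> cxq   = new AtomicReference<>();  // models _cxq

    void enter() {
        if (owner.compareAndSet(null, Thread.currentThread())) return;  // try again first
        for (int i = 0; i < 100; i++) {                                 // adaptive-spin stand-in
            if (owner.compareAndSet(null, Thread.currentThread())) return;
        }
        Waiter node = new Waiter();                                     // wrap self in a node
        do {
            node.next = cxq.get();
        } while (!cxq.compareAndSet(node.next, node));                  // CAS-push onto _cxq head
        while (!owner.compareAndSet(null, Thread.currentThread())) {
            LockSupport.park();                                         // models _ParkEvent->park()
        }
        // the real code also unlinks the node from the list on success
    }
}
```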

At this point the lock-contention flow is very clear; let me draw a picture to tie it together.

Next, let's take a look at how to unlock it.

ObjectMonitor::exit is the method called on unlock.

Reentrancy is judged via _recursions: each reentrant acquisition does _recursions++, each unlock does _recursions--, and when it drops to 0 the lock is actually released.
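A companion sketch of the exit bookkeeping, again as a Java model with invented names; successor selection (the QMode business below) is elided:

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Illustrative model: release only when the reentry count hits zero,
// then wake one queued thread to re-contend for the lock.
class ExitModel {
    final AtomicReference<Thread> owner = new AtomicReference<>();
    int recursions;                 // models _recursions; touched only by the owner
    volatile Thread successor;      // stand-in for a node taken from _EntryList/_cxq

    void exit() {
        if (recursions > 0) {
            recursions--;                               // reentrant exit: unwind one level
            return;
        }
        owner.set(null);                                // actually release the monitor
        Thread next = successor;
        if (next != null) LockSupport.unpark(next);     // wake a waiter
    }
}
```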

The releasing thread then wakes up a waiting thread. There are several modes here, controlled by QMode; let's look at them.

If QMode == 2 && _cxq != NULL:

If QMode == 3 && _cxq != NULL (I clipped part of the code):

If QMode == 4 && _cxq != NULL:

If QMode is not 2, it eventually falls through to:

At this point the unlocking process is done! Let me draw another flow chart:

Next, let's look at what happens when wait is called.

Nothing fancy: the current thread is added to the _WaitSet doubly linked list, and then ObjectMonitor::exit is executed to release the lock.

Next, what happens when notify is called.

Again nothing fancy: take a node from the head of _WaitSet, then, depending on the policy, place it at the head or tail of _cxq or _EntryList, and wake it up.

As for notifyAll, I won't analyze it separately; it is the same thing in a loop, waking all of them.
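From the Java side, all of this machinery sits behind the familiar wait/notify pattern. A standard usage example (the Mailbox class is an invented name):

```java
// wait() releases the monitor (ObjectMonitor::exit under the hood) and parks
// the thread on _WaitSet; notify() moves one waiter back toward contention.
class Mailbox {
    private final Object lock = new Object();
    private String message;

    String take() throws InterruptedException {
        synchronized (lock) {
            while (message == null) {
                lock.wait();            // join _WaitSet and release the monitor
            }
            String m = message;
            message = null;
            return m;
        }
    }

    void put(String m) {
        synchronized (lock) {
            message = m;
            lock.notify();              // move one waiter toward _cxq/_EntryList
        }
    }
}
```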

With that, the main operations of synchronized are all covered, and it is fair to say we have studied synchronized in depth.

If we look at this picture again now, it should make much more sense.

Why are there two lists, _cxq and _EntryList, to hold threads?

Because multiple threads may compete for the lock at the same time, a CAS-based singly linked _cxq absorbs that concurrency; then on each wakeup some thread nodes are moved into the doubly linked _EntryList, reducing contention on _cxq.

Introduction of spin

The principle of synchronized should now be broadly clear. We also know that the underlying implementation relies on system calls, which carry a lot of overhead, so how do we optimize?

As the subtitle gives away, the answer is spin, which was mentioned at the beginning of the article and is worth restating here.

Spinning means idling on the CPU, executing meaningless instructions, in order to keep hold of the CPU while waiting for the lock to be released.

Normally a failed acquisition should block and join the queue, but sometimes a thread blocks and the lock is released immediately afterwards, so the freshly blocked thread is woken again right away, and the whole round trip was unnecessary.

So when thread contention is light, spinning for a moment often means the lock can be acquired directly without blocking the thread, which avoids the unnecessary overhead and improves lock performance.

But the spin count is the hard part: under fierce contention spinning just wastes CPU, because the spin is bound to fail and the thread will block afterwards anyway.

So Java introduced adaptive spin, which dynamically adjusts the spin count based on how previous spins went; call it acting on historical experience.

Note again that this is a heavyweight-lock step; do not forget what was established at the beginning of the article.

At this point the principle of the synchronized heavyweight lock should be very clear. A brief summary:

Under the hood, synchronized is implemented with a monitor object, CAS, and a mutex. Inside the monitor there are waiting queues (_cxq and _EntryList) and a condition queue (_WaitSet) that hold the corresponding blocked threads.

Threads that fail to win the lock go into the waiting queues, while a lock holder that calls wait goes into the condition queue. Both unlocking and notify wake up waiting threads from the corresponding queue to compete for the lock again.

Because blocking and waking depend on the underlying operating system, and the system calls involve switching between user mode and kernel mode, the overhead is high, which is why it is called a heavyweight lock.

Hence the adaptive spin mechanism, introduced to improve the performance of the lock.

Now it's time to introduce lightweight locks.

Consider this scenario: multiple threads acquire the same lock, but at different times, so no thread ever actually needs to block; we do not even need the monitor object. Hence the concept of the lightweight lock, which avoids the system calls and reduces overhead.

When lock contention is light this scenario is common, perhaps even the norm, so introducing lightweight locks is well worth it.

Before getting into the principle of lightweight locks, look back at the MarkWord diagram above.

Lightweight locks operate on the MarkWord of the object header.

If the object is currently in the unlocked state, an area called a LockRecord is carved out in the current stack frame of the current thread's stack, and the lock object's MarkWord is copied into the LockRecord. This copy is called the dhw (displaced header word, written by the set_displaced_header method).

Then the lock object's header is pointed at the LockRecord via CAS.

The locking process of lightweight locks:

If the object is already locked and is held by the current thread, a null is stored into the dhw; this is the lock-reentry logic.

Let's take a look at the logic of lightweight lock unlocking:

The logic is very simple: CAS the markword stored in the current stack frame's LockRecord (the dhw) back into the object header.

If the dhw read out is null, this is a reentrant exit and we can return directly; otherwise CAS it back, and if the CAS fails it means there is contention, so the lock inflates!
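Here is an illustrative Java model of the lightweight lock and unlock paths, assuming a simplified markword that carries only the two lock bits; everything below is a sketch with invented names, not HotSpot's code:

```java
import java.util.concurrent.atomic.AtomicLong;

// Model: dhw is saved in a stack-frame LockRecord, the header is CAS-swung
// to the record's address, and a reentry stores null into dhw.
class LightweightModel {
    static final long UNLOCKED_BITS = 0b01;       // low two lock bits mean "unlocked"
    static class LockRecord { Long dhw; }         // models the slot in the stack frame

    final AtomicLong header = new AtomicLong(UNLOCKED_BITS);

    boolean lock(LockRecord rec, long recordAddress) {
        long mark = header.get();
        if ((mark & 0b11) == UNLOCKED_BITS) {
            rec.dhw = mark;                                   // set_displaced_header(mark)
            return header.compareAndSet(mark, recordAddress); // point header at the record
        }
        if (ownedByCurrentThread(mark)) {
            rec.dhw = null;                                   // reentry marker
            return true;
        }
        return false;                                         // contention: caller inflates
    }

    boolean unlock(LockRecord rec, long recordAddress) {
        if (rec.dhw == null) return true;                     // reentrant exit, nothing to restore
        return header.compareAndSet(recordAddress, rec.dhw);  // CAS mark back; failure -> inflate
    }

    boolean ownedByCurrentThread(long mark) {
        return false; // HotSpot checks whether mark points into this thread's stack
    }
}
```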

Let me say a few more words about this lightweight lock.

Every acquisition happens inside some method call, and every method call has a stack frame on the stack. If a lightweight-lock acquisition is a reentry, the dhw in that frame is null; otherwise it holds the lock object's markword.

So on unlock, the value of dhw tells us whether this was a reentrant exit.

Now it's time to introduce the biased lock.

Consider yet another scenario: from start to finish only one thread ever holds the lock, and no other thread competes for it. In that case even the repeated CAS operations are unnecessary, and CAS is not cheap.

So the JVM engineers created the biased lock: the lock is biased toward a particular thread, and that thread can then acquire it directly.

Look at the diagram again; the biased state is the second row.

The principle is not hard: if the lock object supports biased locking, a CAS operation records the current thread's address (which doubles as a unique ID) in the markword and sets the low three bits of the flag to 101.

When a thread then requests the lock, it only needs to check that the low three bits of the markword are 101 and that the markword points to the current thread's address.

One point many articles miss: it must also check whether the epoch value matches the epoch in the lock object's class.

If all of that holds, the current thread holds the biased lock and can return directly.
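As a model of that fast-path check, using the 64-bit layout from the table above (a sketch only; the JVM does this in C++ with its own masks, and the bit positions here follow the JDK 8 markword):

```java
// Illustrative bit checks for the biased-lock fast path.
class BiasedCheck {
    static final long LOCK_MASK = 0b111;   // biased bit + 2 lock bits
    static final long BIASED    = 0b101;   // biased=1, lock bits=01

    static boolean ownsBias(long mark, long threadId, long classEpoch) {
        boolean biasedState = (mark & LOCK_MASK) == BIASED;        // low three bits are 101
        boolean sameThread  = (mark >>> 10) == threadId;           // thread id lives in bits 10..63
        boolean sameEpoch   = ((mark >>> 8) & 0b11) == classEpoch; // 2-bit epoch at bits 8..9
        return biasedState && sameThread && sameEpoch;
    }
}
```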

What's this epoch for?

It can be understood as the generation number of the biased lock.

A biased lock is revoked when competition appears, which in practice means upgrading to a lightweight lock.

When objects of one class have their biased locks revoked too often, for example objects of some class Yes keep getting revoked, and the count reaches a threshold (XX:BiasedLockingBulkRebiasThreshold, default 20), the current generation of bias for that class is abandoned and the class's epoch is incremented by one.

So when the epoch values of the class and of a lock object differ, the current thread may re-bias the lock to itself, because the previous generation of bias has already been abandoned.

However, to make sure a thread that is currently executing while holding such a lock does not lose it, the bulk rebias brings all threads to a safepoint, walks every thread's Java stack to find locked instances of this class, and adds 1 to the epoch value in their mark words.

When the revocation count exceeds another threshold (XX:BiasedLockingBulkRevokeThreshold, default 40), biased locking is disabled for the class entirely, that is, objects of that class can no longer be biased.

At this point, the whole Synchronized process should be relatively clear.

I deliberately presented the lock-upgrade path in reverse, because historically the heavyweight lock came first, and biased locks and lightweight locks were added later as optimizations based on analysis of real workloads.

The details along the way should also be reasonably clear by now; I think this level of understanding is just about enough for Synchronized.

Here is one more diagram, redrawn from the OpenJDK wiki; see whether it reads clearly now:

That is the end of "Does the lightweight lock of Synchronized spin?". Thank you for reading!
