
Thread Synchronization in Multithreading


Multithreading breaks down roughly into two topics. One is asynchronous operation, which can be achieved with dedicated threads, the thread pool, Task, Parallel, PLINQ, and so on, and which involves worker threads and IO threads. The other is thread synchronization, which is the problem I am currently studying and exploring.

Studying "CLR via C#" gives a clear architecture for thread synchronization. Thread synchronization constructs in multithreading fall into two categories: primitive constructs and hybrid constructs. "Primitive" means the simplest constructs available in code. Primitive constructs are further divided into two kinds, user mode and kernel mode. Hybrid constructs, on the other hand, use user mode and kernel mode internally, switching between them according to a certain strategy, because user mode and kernel mode each have their own advantages and disadvantages; hybrid constructs are designed to balance the two. The whole thread synchronization architecture is listed below.

1. Primitive constructs

1.1 User mode

1.1.1 volatile

1.1.2 Interlocked

1.2 Kernel mode

1.2.1 WaitHandle

1.2.2 ManualResetEvent and AutoResetEvent

1.2.3 Semaphore

1.2.4 Mutex

2. Hybrid constructs

2.1 The various *Slim classes

2.2 Monitor

2.3 MethodImplAttribute and SynchronizationAttribute

2.4 ReaderWriterLock

2.5 Barrier (rarely used)

2.6 CountdownEvent (rarely used)

Let's start with the cause of the thread synchronization problem. Suppose there is a variable A in memory whose stored value is 2. When thread 1 executes, it takes the value of A out of memory, stores it in a CPU register, and assigns 3 to A, and at that moment thread 1's time slice happens to end. The CPU then hands a time slice to thread 2, and thread 2 also reads the value of A from memory into a register. But because thread 1 has not yet written the new value of A back to memory, thread 2 still reads the stale value (that is, dirty data) 2, and if thread 2 then makes some judgment based on the value of A, the result can be unexpected.
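A minimal sketch of this kind of shared-variable race (a hypothetical example, not from the original text): two threads increment an unsynchronized counter; each increment is a read-modify-write, so updates get lost and the final total usually comes out below the expected 2,000,000.

using System;
using System.Threading;

class RaceDemo
{
    static int sharedCounter = 0;   // shared variable, no synchronization

    static void Main()
    {
        ThreadStart work = () =>
        {
            for (int i = 0; i < 1000000; i++)
                sharedCounter++;    // read-modify-write: not atomic
        };
        var t1 = new Thread(work);
        var t2 = new Thread(work);
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();

        // Expected 2000000, but lost updates typically make it smaller.
        Console.WriteLine(sharedCounter);
    }
}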

For this resource-sharing problem a variety of approaches are commonly used; they are described below one by one.

First, the user mode of the primitive constructs. The advantage of user mode is that it executes relatively quickly, because the coordination is done by a series of CPU instructions, and the blocking it causes is extremely brief; as far as the operating system is concerned, the thread is always running and has never been blocked. The disadvantages are that only the system kernel can stop such a thread from running, and that, because the thread is spinning rather than blocking, it keeps occupying the CPU, wasting CPU time.

The first user-mode primitive construct is volatile, which is often said to make the CPU read the specified field (that is, a variable) from memory on every access and write it back to memory on every write. In fact, it is really about the compiler's code optimization. First look at the following code:

public class StrangeClass
{
    int mFlag = 0;
    int mValue = 0;

    public void Thread1()
    {
        mValue = 5;
        mFlag = 1;
    }

    public void Thread2()
    {
        if (mFlag == 1)
            Console.WriteLine(mValue);
    }
}

Readers who understand the multithread synchronization problem will know that if two threads execute the two methods above, there are two possible results: 1. nothing is output; 2. 5 is output. However, when the C# compiler compiles to IL, or when the JIT compiles to machine code, code optimization may occur. In the method Thread1, the compiler may decide that the order of the two field assignments does not matter; it reasons only from the standpoint of single-threaded execution and takes no account of multithreading at all, so it may reorder the two lines of code, assigning 1 to mFlag first and 5 to mValue afterwards. That makes a third result possible: 0 is output. Unfortunately, I have not managed to reproduce this result in a test.

The solution to this phenomenon is the volatile construct. Its effect is that whenever you read a field using this construct, the read is guaranteed to happen first in the original code order, and whenever you write such a field, the write is guaranteed to happen last in the original code order.

There are currently three constructs that implement volatile. The first is the pair of static methods Thread.VolatileRead and Thread.VolatileWrite. MSDN describes them as follows:

Thread.VolatileRead reads the field value. This value is the latest value written by any processor on the computer, regardless of the number of processors or the state of the processor cache.

Thread.VolatileWrite immediately writes a value to the field so that the value is visible to all processors in the computer.

On multiprocessor systems, VolatileRead gets the latest value of the memory location written by any processor. This may require flushing the processor cache; VolatileWrite ensures that the value of the written memory location is immediately visible to all processors. This may require flushing the processor cache.

Even on uniprocessor systems, VolatileRead and VolatileWrite ensure that values are read from or written to memory and are not cached (for example, in processor registers). Therefore, you can use them to synchronize access to fields that are updated by another thread or by hardware.

From the text above you cannot see any connection with code optimization, so read on.

The volatile keyword is another implementation of the volatile construct; it is a simplified version of VolatileRead and VolatileWrite. Applying the volatile modifier to a field ensures that every access to that field uses VolatileRead or VolatileWrite. The description of the volatile keyword in MSDN is:

The volatile keyword indicates that a field can be modified by multiple threads executing simultaneously. Fields declared as volatile are not subject to compiler optimizations that assume access by a single thread. This ensures that the field always presents the latest value.

From this we can see that it does relate to code optimization. The introduction above yields two conclusions:

1. Reads and writes of a field that uses the volatile construct operate directly on memory and do not involve CPU registers, so all threads read and write it coherently and there are no dirty reads. A single read is atomic, and a single write is atomic.

2. A field modified (or accessed) with the volatile construct is accessed strictly in the order written in the code: a read is performed at the earliest possible point, and a write at the latest possible point.

The last volatile construct is the Volatile class, added in .NET Framework 4.5, which contains both Read and Write methods and is effectively the equivalent of Thread's VolatileRead and VolatileWrite. This is best explained with the source code: take one of Volatile's Read methods,

and compare it with Thread's VolatileRead method.
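The decompiled bodies are not reproduced here, but in use the two APIs line up like this (a minimal sketch; the class and the _flag field are hypothetical):

using System.Threading;

class VolatileDemo
{
    private int _flag;   // hypothetical shared field

    public void Publish()
    {
        // Both lines perform a volatile write of 1 to _flag.
        Volatile.Write(ref _flag, 1);
        Thread.VolatileWrite(ref _flag, 1);
    }

    public bool IsPublished()
    {
        // Both calls perform a volatile read of _flag.
        return Volatile.Read(ref _flag) == 1
            || Thread.VolatileRead(ref _flag) == 1;
    }
}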

Another user-mode construct is Interlocked, which guarantees that a read and a write happen together as one atomic operation. That is the biggest difference from volatile above: volatile can only guarantee atomicity for a single read or a single write.

Why is Interlocked like this? Take a look at Interlocked's methods:

Add(ref int, int) // calls the external method ExternAdd

CompareExchange(ref Int32, Int32, Int32) // if parameters 1 and 3 are equal, replaces parameter 1 with parameter 2; returns the original value of parameter 1

Decrement(ref Int32) // decrements and returns; calls Add internally

Exchange(ref Int32, Int32) // sets parameter 1 to parameter 2 and returns the original value

Increment(ref Int32) // increments; calls Add internally

Take the method Add(ref int, int) as an example (Increment and Decrement actually call Add internally). It first reads the value of the first parameter, sums it with the second parameter, and writes the result back to the first parameter. The whole process is one atomic operation that includes both the read and the write. As for how that atomicity is guaranteed, you would probably have to check the Rotor source code. In terms of code optimization, Interlocked ensures that all writes before it complete before the Interlocked operation, so the values it uses are up to date, and that any variable reads after it happen after the Interlocked operation, so later code sees the newly changed value.

The CompareExchange method is very important: although Interlocked provides few methods, more can be built on top of it. Here is an example of computing the maximum of two values, following Jeffrey's source code (shown below).
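The code itself, reconstructed from memory of the book's pattern (treat it as a sketch, not a verbatim quote):

using System;
using System.Threading;

public static class InterlockedEx
{
    public static int Maximum(ref int target, int value)
    {
        int currentVal = target, startVal, desiredVal;
        do
        {
            // Capture what we think target holds at the start of this attempt.
            startVal = currentVal;
            desiredVal = Math.Max(startVal, value);

            // Atomically: if target still equals startVal, set it to desiredVal.
            // Returns the value target held before the call.
            currentVal = Interlocked.CompareExchange(ref target, desiredVal, startVal);

            // If another thread changed target in the meantime,
            // currentVal != startVal and we retry against the new value.
        } while (startVal != currentVal);
        return desiredVal;
    }
}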

Looking at the code above: it captures the value of target before entering the loop, records that value at the start of each iteration, and after computing the maximum checks whether target has changed. If it has changed, it records the new value and recomputes the maximum based on it, repeating until target no longer changes. This is consistent with what was said about Interlocked: the writes complete before the Interlocked call, and the Interlocked call reads the latest value.

Primitive kernel mode

Kernel mode relies on the operating system's kernel objects to handle thread synchronization. First its disadvantage: it is comparatively slow, for two reasons. One is that it is implemented with operating system kernel objects and requires coordination inside the operating system. The other is that kernel objects are unmanaged objects: once you have learned about AppDomains you will know that an object accessed from outside its AppDomain is either marshaled by value or marshaled by reference, and it can be observed that this class of unmanaged resources is marshaled by reference, which carries a performance cost. Those two points together make up the disadvantages of kernel mode. But it also has advantages: 1. while waiting for a resource the thread blocks rather than "spins", which saves CPU time, and the wait can be given a timeout value; 2. it can synchronize Windows threads with CLR threads, and can synchronize threads in different processes (I have no experience with the former; as for the latter, I know semaphores provide boundary-value resources across processes); 3. security settings can be applied to deny access to unauthorized accounts (I am not sure what that involves).

The base class of all kernel-mode objects is WaitHandle. The kernel-mode class hierarchy is as follows:

WaitHandle

    EventWaitHandle

        AutoResetEvent

        ManualResetEvent

    Semaphore

    Mutex

WaitHandle inherits from MarshalByRefObject, which marshals unmanaged objects by reference. WaitHandle mainly offers the various Wait methods; a Wait call blocks until a signal is received. WaitOne waits for a signal on the current handle, WaitAny(WaitHandle[] waitHandles) returns when any of the waitHandles is signaled, and WaitAll(WaitHandle[] waitHandles) waits for all of the waitHandles to be signaled. Each of these methods has an overload that lets you set a timeout. The other kernel-mode constructs have similar Wait methods.

EventWaitHandle maintains a Boolean value internally; the Wait method blocks the thread when the Boolean is false and does not release it until the Boolean becomes true. The methods for manipulating this Boolean are Set(), which sets it to true, and Reset(), which sets it to false. It works like a switch: after Reset is called, a thread that calls Wait pauses until Set is called. Its two subclasses are used in a similar way, except that AutoResetEvent automatically calls Reset after Set, so the switch snaps back to the off state immediately, while ManualResetEvent requires a manual call to Reset to turn the switch off. The practical effect is that AutoResetEvent generally lets one thread through each time it is set, while ManualResetEvent may let many threads through before Reset is called manually.
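A small sketch of the difference (hypothetical example): one Set on the ManualResetEvent lets all waiters through, while each Set on the AutoResetEvent lets exactly one through.

using System;
using System.Threading;

class EventDemo
{
    static AutoResetEvent auto = new AutoResetEvent(false);       // starts "off"
    static ManualResetEvent manual = new ManualResetEvent(false); // starts "off"

    static void Main()
    {
        for (int i = 0; i < 3; i++)
        {
            int id = i;
            new Thread(() =>
            {
                manual.WaitOne();                    // all three pass after one Set
                Console.WriteLine("manual " + id);
                auto.WaitOne();                      // only one passes per Set
                Console.WriteLine("auto " + id);
            }).Start();
        }

        Thread.Sleep(500);
        manual.Set();            // releases all three waiters at once
        for (int i = 0; i < 3; i++)
        {
            Thread.Sleep(200);
            auto.Set();          // each Set releases exactly one waiter,
        }                        // then the switch resets itself
    }
}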

Semaphore maintains an integer count internally. When constructing a Semaphore object you specify the maximum count and the initial count. Each call to WaitOne decreases the count by 1, and when the count is used up the calling thread blocks; when Release is called, one or more counts are released, and a corresponding number of blocked threads are let through. This fits the producer/consumer problem: initialize the count to the free capacity of the product queue and have the producer call WaitOne each time it adds a product, so that when the queue is full the producer blocks; when a consumer consumes a product, it calls Release to free one slot in the queue, and at that point, since there is space to store products again, the producer can resume filling the queue.
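A minimal producer/consumer sketch along those lines (hypothetical example; the semaphore counts the free slots of a queue with capacity 5):

using System;
using System.Collections.Generic;
using System.Threading;

class ProducerConsumer
{
    static Queue<int> queue = new Queue<int>();
    // Counts free slots: initial count = max count = queue capacity of 5.
    static Semaphore freeSlots = new Semaphore(5, 5);
    static object sync = new object();   // protects the queue itself

    static void Main()
    {
        var producer = new Thread(() =>
        {
            for (int i = 0; i < 20; i++)
            {
                freeSlots.WaitOne();              // blocks while the queue is full
                lock (sync) queue.Enqueue(i);
            }
        });

        var consumer = new Thread(() =>
        {
            for (int taken = 0; taken < 20; )
            {
                Thread.Sleep(50);                 // consume slower than we produce
                lock (sync)
                {
                    if (queue.Count == 0) continue;
                    Console.WriteLine(queue.Dequeue());
                    taken++;
                }
                freeSlots.Release();              // frees one slot for the producer
            }
        });

        producer.Start(); consumer.Start();
        producer.Join(); consumer.Join();
    }
}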

Mutex is internally a little more complex than the previous two. The similarity is that WaitOne blocks the current thread and ReleaseMutex releases the block. The difference is that WaitOne lets the first calling thread through, and any other thread that calls WaitOne is blocked. The thread that has passed WaitOne may call WaitOne again multiple times, but it must call ReleaseMutex the same number of times to release the mutex; otherwise, because the counts are unequal, other threads will remain blocked. Compared with the previous constructs, this one adds the concepts of thread ownership and recursion, which the previous constructs cannot achieve on their own without additional encapsulation.
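A small sketch of the ownership and recursion behavior (hypothetical example):

using System;
using System.Threading;

class MutexDemo
{
    static Mutex mutex = new Mutex();

    static void Outer()
    {
        mutex.WaitOne();          // acquire; this thread now owns the mutex
        try
        {
            Inner();              // re-entering on the same thread is allowed
        }
        finally
        {
            mutex.ReleaseMutex(); // must balance the outer WaitOne
        }
    }

    static void Inner()
    {
        mutex.WaitOne();          // recursion count goes to 2, no deadlock
        try
        {
            Console.WriteLine("inside " + Thread.CurrentThread.ManagedThreadId);
        }
        finally
        {
            mutex.ReleaseMutex(); // must balance the inner WaitOne
        }
    }

    static void Main()
    {
        var t1 = new Thread(Outer);
        var t2 = new Thread(Outer); // blocks until t1 has fully released
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();
    }
}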

Hybrid constructs

The primitive constructs above are implemented in the simplest ways. User mode is faster than kernel mode, but it can waste CPU time; kernel mode solves that problem, but it brings a performance cost. Each has its advantages and disadvantages, and a hybrid construct combines the strengths of both: following a certain strategy, it uses user mode internally when appropriate and kernel mode in other cases. The price of these layers of decision-making is some memory overhead. There is no perfect construct in multithreaded synchronization; each construct has pros and cons and exists for a reason. Combined with a concrete application scenario, there is always a best available construct; it just depends on whether we can weigh the pros and cons for the specific situation.

As for the classes with the Slim suffix: in the System.Threading namespace you can see several classes ending in Slim: ManualResetEventSlim, SemaphoreSlim, and ReaderWriterLockSlim. Except for the last one, each has a counterpart construct in primitive kernel mode. These three classes are simplified versions of the original constructs, especially the first two: they are used in the same way, but they avoid using operating system kernel objects as much as possible, achieving a lightweight effect. For example, SemaphoreSlim does use the kernel construct ManualResetEvent, but that construct is initialized lazily and only created as a last resort. ReaderWriterLockSlim is introduced later.

Monitor and the lock keyword are the best-known means of multithread synchronization, so let's start with a piece of code.
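A hypothetical stand-in for the pictured method, of the same shape:

class LockDemo
{
    private readonly object syncObj = new object();
    private int value;

    public void Increase()
    {
        lock (syncObj)      // compiles down to Monitor.Enter / Monitor.Exit
        {
            value++;
        }
    }
}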

The method is deliberately simple and meaningless; the point is just to see how the compiler compiles it, by inspecting the IL.

In the IL you will notice a try...finally block and calls to Monitor.Enter and Monitor.Exit. Now change the code, compile again, and compare the IL.


The code looks similar but is not equivalent. The code that is actually equivalent to the lock statement block is as follows.
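Reusing the names from the sketch above, the C# 4.0+ expansion of lock is essentially this:

bool lockTaken = false;
try
{
    Monitor.Enter(syncObj, ref lockTaken);  // sets lockTaken even if an exception follows
    value++;                                // the body of the lock block
}
finally
{
    if (lockTaken)
        Monitor.Exit(syncObj);              // release only if the lock was acquired
}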

So since lock essentially calls Monitor, how does Monitor achieve thread synchronization by locking an object? It turns out that every object on the managed heap has two fixed members: one points to the object's type, the other is a thread synchronization block index. This index points into an array of synchronization blocks, and it is on these blocks that Monitor locks threads. According to Jeffrey (the author of "CLR via C#"), a synchronization block has three fields: the owning thread's Id, a count of waiting threads, and a recursion count. However, from another batch of articles I learned that a synchronization block has more members than these; interested readers can look up the two-part article "Revealing the Synchronization Block Index". When Monitor needs to lock an object obj, it checks whether obj's synchronization block index is a valid index into the array. If it is -1, it finds a free synchronization block in the array and associates it with obj, and the block's owning-thread Id records the current thread's Id. When a thread calls Monitor again, it checks whether the block's owning Id matches the current thread's Id: if it does, the thread passes and the recursion count is incremented by 1; if it does not, the thread is put into a ready queue (which actually lives in the synchronization block) and blocked. When Exit is called, the block checks the recursion count to make sure the owning-thread Id is cleared only when the recursion ends. The waiting-thread count tells it whether any thread is waiting: if so, it takes a thread out of the waiting queue and releases it; otherwise it disassociates from the object, and the synchronization block waits to be used by the next locked object.

Monitor also has the pair of methods Wait and Pulse. Wait makes the thread that holds the lock release it briefly, while the calling thread is blocked and placed in the waiting queue. Only when another thread calls Pulse is a thread moved from the waiting queue to the ready queue, and the next time the lock is released it may acquire the lock again, depending on the situation in the queue.
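A classic use of the pair is a simple blocking handoff (hypothetical sketch):

using System;
using System.Collections.Generic;
using System.Threading;

class BlockingQueue
{
    private readonly Queue<int> items = new Queue<int>();
    private readonly object gate = new object();

    public void Add(int item)
    {
        lock (gate)
        {
            items.Enqueue(item);
            Monitor.Pulse(gate);        // move one waiter from waiting to ready queue
        }
    }

    public int Take()
    {
        lock (gate)
        {
            while (items.Count == 0)
                Monitor.Wait(gate);     // release the lock and wait to be pulsed
            return items.Dequeue();
        }
    }
}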

ReaderWriterLock is a read-write lock. The traditional lock keyword (equivalent to Monitor's Enter and Exit) places a fully mutually exclusive lock on a shared resource: once the resource is locked, no other thread can access it at all.

ReaderWriterLock, by contrast, splits the lock on a mutually exclusive resource into a read lock and a write lock, similar to the shared locks and exclusive locks mentioned in databases. Roughly speaking, a resource under a read lock can be accessed by multiple threads, a resource under a write lock can be accessed by only one thread, and holders of the two different kinds of lock can never access the resource at the same time. Strictly speaking, threads holding read locks can access the resource together as long as they are in the same queue, while read-lock threads in different queues cannot; a write-locked resource has only one queue, and only one thread at a time in the write-lock queue can access the resource. The criterion for whether two read-lock threads are in the same queue is whether any other thread acquired a write lock in the interval between their read-lock acquisitions: if no write lock intervened, the two threads are in the same read-lock queue.

ReaderWriterLockSlim is similar to ReaderWriterLock; it is a newer version of the latter, introduced in .NET Framework 3.5, and is said to optimize recursion and simplify operations. I have not delved into its recursion policy here. The methods they commonly use are roughly listed below.

Common methods of ReaderWriterLock:

Acquire or Release combined with ReaderLock or WriterLock, in all combinations

UpgradeToWriterLock / DowngradeFromWriterLock, for upgrading a read lock to a write lock; the upgrade also moves the thread from the read-lock queue to the write-lock queue, so it may have to wait

ReleaseLock / RestoreLock, which release all locks and restore the lock state

Common methods of ReaderWriterLockSlim, which implements the IDisposable pattern:

TryEnter/Enter/Exit combined with ReadLock/WriteLock/UpgradeableReadLock

(the above is quoted from another note "ReaderWriterLock")
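A small ReaderWriterLockSlim sketch (hypothetical example):

using System.Threading;

class Cache
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private int cached;

    public int Read()
    {
        rwLock.EnterReadLock();        // many readers may hold this at once
        try { return cached; }
        finally { rwLock.ExitReadLock(); }
    }

    public void Write(int value)
    {
        rwLock.EnterWriteLock();       // exclusive: blocks readers and writers
        try { cached = value; }
        finally { rwLock.ExitWriteLock(); }
    }
}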

CountdownEvent is a rarely used hybrid construct. Its behavior is roughly the reverse of Semaphore's: a Semaphore blocks threads when its internal count is used up, while a CountdownEvent keeps waiting threads blocked until its internal count reaches 0. Its methods are as follows:

AddCount // increments the count

Signal // decrements the count

Reset // resets the count to the specified or initial value

Wait // does not block if and only if the count is 0; otherwise the caller blocks
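A typical fork/join sketch (hypothetical example): the main thread waits until three workers have each signaled completion.

using System;
using System.Threading;

class CountdownDemo
{
    static void Main()
    {
        using (var done = new CountdownEvent(3))   // wait for 3 signals
        {
            for (int i = 0; i < 3; i++)
            {
                int id = i;
                new Thread(() =>
                {
                    Console.WriteLine("worker " + id + " finished");
                    done.Signal();                 // decrement the count
                }).Start();
            }

            done.Wait();                           // blocks until the count hits 0
            Console.WriteLine("all workers done");
        }
    }
}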

Barrier is also a rarely used hybrid construct, for coordinating multiple threads through work that proceeds in steps. It internally maintains a count representing the number of participants in the collaboration. When a thread calls SignalAndWait, the number of arrivals at the barrier increases by 1 and the calling thread blocks; not until the arrivals reach the number of participants are all the blocked threads released. If this is still unclear, take a look at the MSDN sample code, sketched below.
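Sketched from memory of that MSDN sample (not a verbatim copy):

using System;
using System.Threading;
using System.Threading.Tasks;

class BarrierDemo
{
    static void Main()
    {
        int count = 0;

        // 3 initial participants; the delegate runs after each completed phase.
        Barrier barrier = new Barrier(3, b =>
        {
            Console.WriteLine("Post-Phase action: count={0}, phase={1}",
                count, b.CurrentPhaseNumber);
            if (b.CurrentPhaseNumber == 2)
                throw new Exception("D'oh!");  // fails the third phase on purpose
        });

        barrier.AddParticipants(2);   // participants: 5
        barrier.RemoveParticipant();  // participants: 4

        Action action = () =>
        {
            Interlocked.Increment(ref count);
            barrier.SignalAndWait();      // phase 0
            Interlocked.Increment(ref count);
            barrier.SignalAndWait();      // phase 1
            Interlocked.Increment(ref count);
            try
            {
                barrier.SignalAndWait();  // phase 2: the post-phase delegate
            }                             // throws, so every participant sees it
            catch (BarrierPostPhaseException bppe)
            {
                Console.WriteLine("Caught: {0}", bppe.Message);
            }
        };

        // The number of parallel actions must match the participant count (4).
        Parallel.Invoke(action, action, action, action);
        barrier.Dispose();
    }
}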

Here the Barrier is initialized with 3 participants, and the delegate is invoked each time a step completes, outputting the count and the step (phase) index. The number of participants is then increased by two and decreased by one. Every participant does the same thing: atomically increment count, then call SignalAndWait to tell the Barrier that the current step is finished and to wait for the next step to begin. On the third step, because the callback method throws an exception, every participant's SignalAndWait call throws. The parallel work is started through Parallel: if the number of jobs running in parallel differed from the number of Barrier participants, SignalAndWait would behave unexpectedly.

Next, let's talk about two attributes. An Attribute may not seem like a synchronization construct, but these two can indeed play a role in thread synchronization.

MethodImplAttribute applies to a method. When given the parameter MethodImplOptions.Synchronized, it locks the entire method body: any thread calling the method blocks if it cannot acquire the lock, and is woken only when the thread owning the lock releases it. For a static method this is equivalent to locking on the type object of the class, that is, lock (typeof (ClassType)); for an instance method it is equivalent to locking on the instance, that is, lock (this). At first I was skeptical of the claim that lock is called internally, so I compiled it and inspected the IL, and found the method body's code unchanged; I looked through some source code and found no clue. Later I discovered that its IL method header differs from an ordinary method's by an extra synchronized flag.
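Usage looks like this (minimal sketch):

using System.Runtime.CompilerServices;

class SyncByAttribute
{
    [MethodImpl(MethodImplOptions.Synchronized)]
    public void InstanceWork()        // behaves like lock (this) around the body
    {
        // ... method body ...
    }

    [MethodImpl(MethodImplOptions.Synchronized)]
    public static void StaticWork()   // behaves like lock (typeof (SyncByAttribute))
    {
        // ... method body ...
    }
}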

So I searched all kinds of material on the Internet, and finally found that the blog posts [1][2] of "junchu25" mention using WinDbg to view the code generated by the JIT.

(Screenshots in the original: one of the JIT-compiled call using the Attribute, one of the call using lock.)

Even Jeffrey does not recommend implementing thread synchronization with this Attribute.

System.Runtime.Remoting.Contexts.SynchronizationAttribute applies to a class. Add the Attribute to a class definition and make the class inherit from ContextBoundObject, and it places the same lock on all methods of the class, which is broader in scope than MethodImplAttribute. When a thread calls any method of such a class, it blocks if it does not acquire the lock. There is a claim that this, too, essentially calls lock, and that claim is even harder to verify; there are few resources on it in China, and it involves AppDomains, thread contexts, and, at the core, the class SynchronizedServerContextSink.

AppDomains deserve an article of their own, but a few words here. I used to think that memory held only the thread stacks and the heap, but that is only a very basic division. The heap is further divided into several AppDomains, each AppDomain contains at least one context, and every object belongs to one context within one AppDomain. An object cannot be accessed directly across AppDomains; it is either marshaled by value (equivalent to deep-copying the object into the calling AppDomain) or marshaled by reference. Marshaling by reference requires the class to inherit from MarshalByRefObject, and calls to an object of such a class do not go to the object itself but through a proxy. Likewise, crossing contexts also requires marshaling. Normally an object is constructed in the default context of the process's default AppDomain, but an instance of a class that carries the SynchronizationAttribute belongs to another context, and because the class inherits the ContextBoundObject base class, it is accessed across contexts through reference marshaling rather than directly. Whether an object is accessed across contexts can be determined with the RemotingServices.IsObjectOutOfContext(obj) method.

SynchronizedServerContextSink is an internal class of mscorlib. When a thread calls a cross-context object, the call is wrapped by SynchronizedServerContextSink into a WorkItem object, another internal class of mscorlib. SynchronizedServerContextSink asks the SynchronizationAttribute, and the Attribute decides, based on whether multiple WorkItem execution requests exist, whether the current WorkItem is executed immediately or appended to a first-in-first-out WorkItem queue, which is a member of the SynchronizationAttribute. When queue members join or leave the queue, or when the Attribute decides whether to execute a WorkItem immediately, a lock must be acquired, and the object locked is that WorkItem queue. Several types interact here, and I have not read through all of it yet, so the process described above may contain mistakes; I will correct it after a clearer analysis. In any case, thread synchronization implemented through this Attribute is intuitively not recommended, mainly because of the performance drain and the broad scope of the lock.
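Usage looks roughly like this (minimal sketch; the class name is hypothetical):

using System;
using System.Runtime.Remoting.Contexts;

[Synchronization]
public class SynchronizedByContext : ContextBoundObject
{
    private int counter;

    // No explicit locking anywhere: the context infrastructure serializes
    // all calls into this instance, so only one thread runs inside at a time.
    public void Increment()
    {
        counter++;
    }

    public int Current()
    {
        return counter;
    }
}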
