Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Example Analysis of JUC Synchronizer Framework AQS for java concurrent packet

2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/01 Report--

Editor to share with you java and JUC synchronizer framework AQS example analysis, I believe that most people do not know much about it, so share this article for your reference, I hope you will learn a lot after reading this article, let's go to know it!

1. Background introduction

Through JCP's JSR166 specification, version 1.5 of Java introduced the j.u.c package, which provides a series of classes that support moderate concurrency. These components are a series of synchronizers (abstract data types (ADT)). These synchronizers mainly maintain the following functions: internal synchronization state management (for example, indicating whether the state of a lock is acquired or released), synchronization state update and check operations. and at least one method causes the calling thread to block when the synchronization state is acquired, and to unblock the thread when other threads change the synchronization state. Practical examples of the above include different forms of mutex exclusive locks, read-write locks, semaphores, barriers, Future, event indicators, and transmission queues.

Almost any synchronizer can be used to implement other forms of synchronizer. For example, you can implement a semaphore with a reentrant lock or a reentrant lock with a semaphore. However, the complexity, overhead and inflexibility of doing so make it a second-rate project at best. And unattractive. If any such construction cannot be intrinsically more concise than other forms, then developers should not randomly choose one of them to build another synchronizer. Instead, JSR166 sets up a small framework, the AQS class. This framework provides a general mechanism for constructing synchronizers and is used by most classes in the j.u.c package, while many users also use it to define their own synchronizers.

The following sections of this paper discuss the requirements of the framework, the main ideas behind the design and implementation, sample usage, and some measurements of performance metrics.

2 requirements 2.1 functionality

Synchronizers generally contain two methods, one is acquire, and the other is release. The acquire operation blocks the calling thread until or unless the synchronization state allows it to continue execution. On the other hand, the release operation changes the synchronization state in some way so that one or more threads blocked by acquire continue to execute.

There is no uniform definition of the synchronizer's API in the j.u.c package. As a result, some classes define generic interfaces (such as Lock), while others define their proprietary versions. Therefore, the names and forms of acquire and release operations vary from class to class. For example: Lock.lock,Semaphore.acquire,CountDownLatch.await and FutureTask.get, in this framework, these methods are acquire operations. However, J.U.C has a consistent agreement between classes to support a range of common usage options. Where meaningful, each synchronizer supports the following operations:

Blocking and non-blocking (such as tryLock) synchronization.

Optional timeout setting that allows callers to give up waiting

Tasks that are cancelled through interrupts are usually divided into two versions, one acquire that can be canceled and the other that cannot.

The implementation of the synchronizer varies depending on whether its state is exclusive or not. In an exclusive state synchronizer, only one thread can pass through a blocking point at a time, while a shared state synchronizer can have multiple threads executing at the same time. General lock implementation classes tend to maintain only exclusive state, but, for example, counting semaphores allow multiple threads to execute at the same time if the number permits. In order for the framework to be widely used, both modes should be supported.

The Condition interface is also defined in the j.u.c package to support await/signal operations in the form of monitoring, which are related to the Lock class in exclusive mode, and the implementation of Condition is inherently closely related to the Lock class associated with it.

2.2 performance targets

The performance of Java built-in locks (methods or code blocks that use synchronized) has always been a concern, and there have been a series of articles describing its construction (for example, references [1], [3]). However, most studies focus on how to minimize the cost of space (because any Java object can be regarded as a lock) and time when a single-core processor is used most of the time in a single-threaded context. None of this is particularly important for synchronizers: programmers use synchronizers only when needed, so there is no need to compress space to avoid waste. and synchronizers are almost exclusively used in multithreaded designs (especially on multi-core processors), where occasional competition is to be expected. Therefore, the conventional JVM lock optimization strategy is mainly aimed at zero-competition scenarios, while other scenarios use the unpredictable "slow path" (slow paths), so the conventional JVM lock optimization strategy is not suitable for typical multithreaded server applications that rely heavily on J.U.C packages.

The main performance goal here is scalability, that is, to ensure its efficiency steadily in most cases, even or especially when the synchronizer is competitive. Ideally, no matter how many threads are trying to pass through the synchronization point, the cost of passing through the synchronization point should be constant. In cases where a thread is allowed to pass through the synchronization point but has not yet passed, minimizing its total time is one of the main goals. However, this must also take into account the balance of resources, including the need for total CPU time, memory load, and thread scheduling overhead. For example, it usually takes less time to acquire a spin lock than to block a lock, but it also wastes CPU clock cycles and causes memory competition, so it is used infrequently.

Achieving these goals of the synchronizer consists of two different types of use. Most applications maximize their overall throughput, fault tolerance, and best ensure that hunger is minimized. However, for programs that control resource allocation, it is more important to maintain the fairness of multithreaded reads and accept poor overall throughput. No framework can decide which approach to choose on behalf of the user, so different fairness strategies should be provided.

No matter how elaborate the internal implementation of the synchronizer is, it can still cause performance bottlenecks in some applications. Therefore, the framework must provide appropriate monitoring tools for users to discover and mitigate these bottlenecks. At least one way to determine how many threads are blocked needs to be provided.

3 design and implementation

The basic idea behind the synchronizer is very simple.

The acquire operation is as follows:

While (synchronization state does not allow acquire) {enqueue current thread if not already queued; possibly block current thread;} dequeue current thread if it was queued

The release operation is as follows:

Update synchronization state;if (state may permit a blocked thread to acquire) unblock one or more queued threads

In order to do this, the following three basic components need to cooperate with each other:

Atomic Management of synchronous State

Blocking and unblocking of threads

Queue management

It is possible to create a framework to implement each of these three components. However, this makes the entire framework both difficult and inefficient. For example, the information stored in the queue node must be consistent with the information needed to unblock, and the signature of the exposed method must depend on the characteristics of the synchronization state.

The core decision of the synchronizer framework is to choose a specific implementation for these three components, while there are a large number of options available in how to use them. This intentionally limits its scope of application, but provides sufficient efficiency so that there is actually no reason to build a new one without using this framework under appropriate circumstances.

3.1 synchronization status

The AQS class uses a single int (32-bit) to hold the synchronization state and exposes getState, setState, and compareAndSet operations to read and update the state. These methods rely on the support of the j.u.c.atomic package, which provides semantics for reading and writing volatile in JSR133, and implements compareAndSetState by using local compare-and-swap or load-linked/store-conditional instructions, so that only when the synchronization state has an expected value will it be set to a new value atomically.

The synchronization state is limited to a 32-bit shaping for practical reasons. Although JSR166 also provides atomic manipulation of 64-bit long fields, these operations are simulated using internal locks on many platforms, which may not make the performance of the synchronizer ideal. Of course, there may be a class that specializes in 64-bit state in the future. However, it is not a good decision to introduce such a class into this package now.

(translator's note:

The java.util.concurrent.locks.AbstractQueuedLongSynchronizer class is already included in JDK1.6

Even an AbstractQueuedSynchronizer version of the synchronization state is maintained in the form of long). Currently, a 32-bit state is sufficient for most applications. In the j.u.c package, there is only one synchronizer class that may need more than 32 bits to maintain state, and that is the CyclicBarrier class, so it uses locks (as do most of the higher-level tools in the package).

Concrete implementation classes based on AQS must define tryAcquire and tryRelease methods based on exposed state-related methods to control acquire and release operations. The tryAcquire method must return true when the synchronization state is satisfied, and the tryRelease method must return true when the new synchronization state allows subsequent acquire. These methods all accept a parameter of type int to pass the desired state. For example, when a thread returns from a conditional wait and then reacquires the lock, in order to re-establish the loop count scenario, the lock can be reentered into the lock. Many synchronizers do not need such a parameter, so just ignore it.

3.2 blocking

Prior to JSR166, blocking threads and unblocking threads were based on Java built-in monitors, and no synchronizer could be created based on Java API. The only choices are Thread.suspend and Thread.resume, but they both have unsolvable race problems, so they can't be used: when a non-blocking thread calls resume before a thread that is about to block calls to suspend, the resume operation will have no effect.

The j.u.c package has a LockSuport class that contains a solution to this problem. The method LockSupport.park blocks the current thread unless / until a LockSupport.unpark method is called (it is also possible for the unpark method to be called in advance). Calls to unpark are not counted, so calling the unpark method multiple times before a park call will only undo a park operation. In addition, they work on each thread rather than every synchronizer. A thread calling a park operation on a new synchronizer may return immediately because there may be "remaining" unpark operations before that. However, in the absence of a unpark operation, the next call to park will block. Although this state can be explicitly eliminated, it is not worth it. It is more efficient to call park multiple times when needed.

This simple mechanism is similar to some uses, such as the thread library of Solaris-9, the "consumable event" in WIN32, and the NPTL thread library in Linux. Therefore, the most common platforms running Java have corresponding effective implementations. However, the current Sun Hotspot JVM reference implementation on Solaris and Linux actually uses a condvar of pthread to accommodate the current runtime design. The park method also supports optional relative or absolute timeout settings, and in combination with JVM's Thread.interrupt, you can unpark a thread through interrupts.

3.3 queue

The key of the whole framework is how to manage the queue of blocked threads, which is a strict FIFO queue, so the framework does not support priority-based synchronization.

At present, there is little controversy in the industry that the best choice of synchronous queue is the non-blocking data structure which is not constructed by underlying locks. There are two main choices: one is a variant of Mellor-Crummey and Scott locks (MCS locks), and the other is a variant of Craig,Landin and Hagersten locks (CLH locks) [5] [8] [10]. CLH locks have always been used only for spin locks. However, in this framework, CLH locks are obviously more appropriate than MCS locks. Because CLH locks make it easier to implement "cancellation" and "timeout" functions, we chose CLH locks as the basis for our implementation. However, the final design is quite different from the original CLH lock, so this will be explained below.

The CLH queue is not really that much like a queue because its enqueuing and dequeuing operations are closely related to its purpose (that is, as a lock). It is a linked list queue accessed through two fields, head and tail, which are atomically updatable, both of which point to an empty node at initialization.

A new node, node, joins the queue through an atomic operation:

Do {pred = tail;} while (! tail.compareAndSet (pred, node))

The "release" state of each node is saved in its precursor node. Therefore, the spin operation of the spin lock is as follows:

While (pred.status! = RELEASED); / / spin

To dequeue after spinning, simply point the head field to the node that has just been locked:

Head = node

The advantage of CLH locks is that their queuing and dequeuing operations are fast, lock-free, and barrier-free (even under competition, a thread always wins a chance to insert and continue to execute), and it is also quick to detect whether a thread is waiting (just test whether head is equal to tail). At the same time, the "release" state is decentralized. Almost every node saves this state, and the current node saves the "release" state of the subsequent drive node, so they are scattered and not concentrated together. To avoid some unnecessary memory competition

In the original version of the CLH lock, nodes were not even linked to each other. In a spin lock, the pred variable can be a local variable. However, Scott and Scherer demonstrate that CLH locks can handle "timeouts" and various forms of "cancellation" by explicitly maintaining the precursor node in the node: if the precursor node of a node is cancelled, the node can slide to use the status field of the previous node.

In order to use CLH queues for blocking synchronizers, additional modifications need to be made to provide an efficient way to locate the successor nodes of a node. In the spin lock, a node only needs to change its state, and its successor node will notice the change in the next spin, so the link between nodes is not necessary. However, in the blocking synchronizer, a node needs to explicitly unpark its successor node.

The node of the AQS queue contains a next link to its successor node. However, since there is no atomic unlocked insert instruction similar to compareAndSet for two-way linked list nodes, the setting of this next link is not part of the atomic insert operation, but is simply assigned after the node is inserted:

Pred.next = node

Next linking is just an optimization. If you find that a subsequent node does not exist (or seems to have been cancelled) through the next field of a node, you can always use the pred field to always check if there is a subsequent node.

The second major change to the CLH queue is to use the status field that each node has to control blocking rather than spin. In the synchronizer framework, threads in the queue can return from the acquire operation only if the thread calls the tryAcquire method in a specific subclass to return true; a single "released" bit is not enough. However, some control is still needed to ensure that when an active thread is at the head of the queue, only if it is allowed to call tryAcquire;, the acquire may fail and then (re) block. In this case, there is no need to read the status identity, because permissions can be determined by checking whether the precursor of the current node is head. Unlike spin locks, head is read to ensure that there is not too much memory competition during replication (there is not enough memory contention reading head to warrant replication.). However, the cancel status must exist in the status field.

The status field of the queue node is also used to avoid unnecessary park and unpark calls. Although these methods are as fast as blocking primitives, they still have avoidable overhead when crossing Java and JVM runtime and operating system boundaries. Before calling park, the thread sets a "signal me" bit, and then checks the synchronization and node status again. A released thread clears its own state. In this way, threads do not have to try to block frequently, especially in lock-related classes, which wastes time waiting for the next eligible thread to apply for the lock, thus exacerbating the impact of other contention. Unless the successor node sets the "wake up" bit, this also prevents the thread that is release from judging its successor node. This, in turn, eliminates these situations: unless Wake up and cancel occur at the same time, you have to traverse multiple nodes to process a next field that appears to be null.

The main difference between the variants of the CLH locks used in the synchronization framework and those in other languages may be that the CLH locks used in the synchronization framework rely on the memory of the garbage collection management node, which avoids some complexity and overhead. However, even if you rely on GC, you still need to set it to null when you determine that the link field is no longer needed. This can often be done in conjunction with the out-of-team operation. Otherwise, useless nodes are still accessible and they cannot be recycled.

Other more in-depth tweaks, including delayed initialization of the initial empty node that the CLH queue needs when it first encounters competition, can be found in the source code documentation for the J2SE1.5 version.

These details aside, the general form of the final implementation of a basic acquire operation is as follows (mutex, non-interruption, no timeout):

If (! tryAcquire (arg)) {node = create and enqueue new node; pred = node's effective predecessor; while (pred is not head node | |! tryAcquire (arg)) {if (pred's signal bit is set) pard () else compareAndSet pred's signal bit to true; pred = node's effective predecessor;} head = node;}

Release operation:

If (tryRelease (arg) & & head node's signal bit is set) {compareAndSet head's bit to false; unpark head's successor, if one exist}

The number of main loops of the acquire operation depends on how tryAcquire is implemented in the concrete implementation class. On the other hand, in the absence of a "cancel" operation, the acquire and release of each component is an O (1) operation, ignoring all operating system thread scheduling that occurs in park.

The main way to support the cancel operation is to check for interrupts or timeouts when the park in the acquire loop returns. A thread that is canceled waiting due to a timeout or interrupt sets its node state and then unpark its successor node. In the case of "cancellation", judging its precursor and successor nodes and resetting the state may require a traversal of O (n) (n is the length of the queue). As a result of the cancel operation, the thread is no longer blocked and the node's link and status fields can be quickly rebuilt.

3.4 conditional queue

The AQS framework provides a ConditionObject class for classes that maintain exclusive synchronization and for classes that implement the Lock interface. A lock object can be associated with any number of conditional objects and can provide typical pipe-style await, signal, and signalAll operations, including those with timeouts, as well as some detection and monitoring methods.

By revising some design decisions, the ConditionObject class effectively combines conditions with other synchronization operations. This class only supports Java-style pipe access rules, in which conditional operations are legal only when the frontline holds a lock and the condition to be operated (condition) belongs to the lock (for discussion of some alternative operations, see [4]). In this way, a ConditionObject associated with a ReentrantLock behaves the same as a built-in pipe (via Object.wait, etc.). The only difference between the two is the name of the method, the additional functionality, and the user can declare multiple conditions for each lock.

ConditionObject uses the same internal queue node as the synchronizer. However, these nodes are maintained in a separate conditional queue. The signal operation is achieved by moving the node from the conditional queue to the lock queue, and there is no need to wake up the thread that needs to be woken up before it is reacquired to the lock.

The basic await operations are as follows:

Create and add new node to conditon queue;release lock;block until node is on lock queue;re-acquire lock

The signal operation is as follows:

Transfer the first node from condition queue to lock queue

Because these operations can only be performed when the lock is held, they can use the sequentially linked list queue operation to maintain the conditional queue (with a nextWaiter field in the node). The transfer operation simply removes the first node from the link in the conditional queue and then inserts it into the lock queue through the CLH insert operation.

The main complexity of implementing these operations is what to do when a conditional wait is cancelled due to timeout or Thread.interrupt. "cancel" and "wake up" will have race problems almost at the same time, and the final result follows the relevant specifications of the built-in process. After the JSR133 revision, it is required that if the interrupt occurs before the signal operation, the await method must throw an InterruptedException after the lock has been reacquired. However, if the interrupt occurs after the signal, the await must return without throwing an exception and set the interrupt state of the thread.

To maintain the proper order, a bit in the queue node status variable records whether the node has been (or is in the process of) being transferred. Both Wake-up and cancel-related code attempts to modify this state with compareAndSet. If a modification of a signal operation fails, the next node in the queue is transferred (if any). If a "cancel" operation fails to modify, you must abort the transfer and wait for the lock to be reacquired. The latter case uses a potentially infinite spin wait. The cancelled wait cannot reacquire the lock before the node is successfully plugged into the lock queue, so it is necessary to spin wait for the CLH queue insertion (that is, the compareAndSet operation) to be successfully executed by the "wake up" thread. Spin is rarely required here, and Thread.yield is used in the spin to indicate that some other thread should be scheduled, ideally the one that executes the signal. Although it is possible to implement a help strategy for "cancel" here to help insert nodes, this is too rare to find a good reason to increase the overhead. In all other cases, this basic mechanism does not require spin or yield, so it maintains reasonable performance on a single processor.

4 usage

The AQS class combines the above functions and is provided to the synchronizer as a base class based on the template method pattern [6]. Subclasses only need to define state-checking and update-related methods that control acquire and release operations. However, it is not appropriate to use a subclass of AQS as a synchronizer ADT, because this class must provide rules for methods to control acquire and release internally, which should not be seen by the user. All synchronizer classes in the java.util.concurrent package declare a private inner class that inherits AbstractQueuedSynchronizer and delegate all synchronization methods to this inner class. In this way, the exposed methods of each synchronizer class can use their own names.

Here is the simplest implementation of the Mutex class, which uses synchronization status 0 for unlocking and 1 for locking. This class does not need to synchronize the parameters in the method, so here we use 0 as the argument when calling, and ignore it in the method implementation.

Class Mutex {class Sync extends AbstractQueuedSynchronizer {public boolean tryAcquire (int ignore) {return compareAndSetState (0,1);} public boolean tryRelease (int ignore) {setState (0); return true;}} private final Sync sync = new Sync (); public void lock () {sync.acquire (0);} public void unlock () {sync.release (0);}}

A more complete version of this example, as well as other usage guides, can be found in the J2SE documentation. There can also be some variants. For example, tryAcquire can use a "test-and-test-and-set" strategy that validates the state before changing the state value.

Surprisingly, performance-sensitive things such as mutexes are also intended to be defined by a combination of delegates and virtual methods. However, this is the object-oriented design structure that modern dynamic compilers have been focusing on. Compilers are good at optimizing this overhead, or at least code that calls synchronizers frequently.

The AbstractQueuedSynchronizer class also provides methods to assist with policy control. For example, the underlying acquire method has timeout and interruptible versions. Although our discussion so far has focused on synchronizers for exclusive modes such as locks, the AbstractQueuedSynchronizer class also contains another set of methods (such as acquireShared). The difference between them is that the tryAcquireShared and tryReleaseShared methods can tell the framework (through their return values) that it can accept more requests, and eventually the framework wakes up multiple threads through a cascading signal (cascading signals).

Although serializing synchronizers (persistent storage or transfer) generally doesn't make much sense, these classes are often used to construct other classes, such as thread-safe collections, which are usually serializable. Both the AbstractQueuedSynchronizer and ConditionObject classes provide methods for serializing synchronization state, but do not serialize potentially blocked threads or other internal temporary bookkeeping variables. Even so, when deserializing, most synchronizer classes simply reset the synchronization state to the initial value, which is consistent with the implicit policy of the built-in lock-always deserialize to an unlocked state. This is equivalent to a null operation, but must still be explicitly supported so that the final field can be deserialized.

4.1 Control of fair scheduling

Although synchronizers are based on FIFO queues, they don't have to be fair. It can be noted that in the basic acquire algorithm (Section 3. 3), tryAcquire is executed before joining the queue. So a new thread is able to "steal" the chance that the first thread in the queue header should have passed through the synchronizer.

Intrusive FIFO policies usually provide higher overall throughput than other technologies. When a competing lock is idle and the next thread that is about to acquire the lock is in the process of unblocking, no thread can acquire the lock, and the time interval can be reduced if the intrusion policy is used. At the same time, this strategy also avoids excessive, inefficient competition caused by allowing only one (first) queued thread to be awakened and then try the acquire operation. In scenarios where the synchronizer is only required to be held for a short period of time, the developer who creates the synchronizer can further highlight the effect of the intrusion by defining the tryAcquire to call himself several times before the control is returned.

The intrudable FIFO synchronizer has only probabilistic fairness attributes. The lock queue header an unblocked thread has an unbiased opportunity to win the competition with the intruding thread, if the competition fails, either reblock or try again. However, if the intruding thread arrives faster than the thread at the head of the queue, the first thread in the queue will be so difficult to win the competition that it will almost always re-block, and its successor node will remain blocked all the time. For temporarily held synchronizers, multiple intrusions and release are likely to occur on multiprocessors during the time the first thread in the queue is unblocked. As mentioned below, the end result is to maintain the high speed of progress of one or more threads while still avoiding hunger at least in some probability.

When there is a need for higher fairness, it is also easy to implement. If strict fairness is required, the programmer can define the tryAcquire method as if the current thread is not the head node of the queue (which can be checked by the getFirstQueuedThread method, which is one of the few detection methods provided by the framework), it fails immediately (returns false).

A faster, but not strictly fair, variant can do this, if the queue is listed empty (the moment of judgment), it still allows tryAcquire to execute successfully. In this case, multiple threads may compete to be the first to acquire the lock when they encounter an empty queue at the same time, so that at least one thread usually does not need to be queued. This strategy is used by all synchronizers in the java.util.concurrent package that support fair mode.

Although fairness settings are useful in practice, they are not guaranteed because Java Language Specification does not provide such a scheduling guarantee. For example, even if it is a strictly fair synchronizer, if a group of threads never need to block to wait for each other, then JVM may decide to run them purely sequentially. In practice, on a single processor, such threads may have been running for a period of time before preemptive context switching. If such a thread is holding a mutex, it will soon be switched back just to release the lock it holds, and then continue to block because it knows that another thread needs the lock, which increases the interval between the synchronizer available but no thread can acquire. The impact of synchronizer fairness setting on multiprocessors may be greater because more interlacing occurs in this environment, so one thread has a better chance of finding that the lock is requested by another thread.

In high competition, fair locks work effectively even though performance may be poor when code bodies that hold locks for a short time are protected. For example, when a fairness lock protects a relatively long code body and / or a relatively long inter-lock interval, in this case, a break-in brings only a small performance advantage, but may greatly increase the risk of unlimited waiting. The synchronizer framework leaves these engineering decisions to the user.

4.2 Synchronizer

The following is an overview of how synchronizers are defined in the java.util.concurrent package:

The ReentrantLock class uses the AQS synchronization state to hold the number of times the lock is held (repeated). When a lock is acquired by a thread, ReentrantLock also records the identity of the thread that currently acquired the lock to check for repeated acquisition, and to detect whether there is an illegal state exception when the wrong thread attempts to unlock the lock. ReentrantLock also uses ConditionObject provided by AQS and exposes other monitoring and monitoring-related methods. ReentrantLock implements the optional fair mode by declaring two different AbstractQueuedSynchronizer implementation classes internally (the one that provides the fair mode disables the intrusion policy), using the corresponding ReentrantLock implementation class according to the settings (i.e. the fair parameter in the ReentrantLock constructor) when creating the AbstractQueuedSynchronizer instance.

The ReentrantReadWriteLock class uses 16 bits in the AQS synchronization state to hold the number of write locks, and the remaining 16 bits to hold the number of read locks. WriteLock is built in the same way as ReentrantLock. ReadLock supports allowing multiple read threads at the same time by using the acquireShared method.

The Semaphore class (count semaphore) uses the AQS synchronization state to hold the current count of the semaphore. The acquireShared method defined in it reduces the count or blocks the thread when the count is non-positive; the tryRelease method increases the count and may unblock the thread when the count is positive.

The CountDownLatch class uses the AQS synchronization state to represent the count. When the count is 0, all acquire operations (acquire operations are said from the point of view of aqs, which corresponds to the await method in CountDownLatch) will pass.

The FutureTask class uses the AQS synchronous state to represent the running state (initialization, running, cancelled, and completed) of an asynchronous computing task. Setting or canceling a FutureTask invokes the release operation of AQS when setting or canceling the set method of FutureTask, and the unblocking of the thread waiting for the calculation result is achieved through the acquire operation of AQS.

The SynchronousQueues class (a CSP (Communicating Sequential Processes) form of delivery) uses internal wait nodes that can be used to coordinate producers and consumers. At the same time, it uses AQS synchronization status to control that when a consumer consumes the current item, it allows a producer to continue production, and vice versa.

Of course, users of java.util.concurrent packages can also define their own synchronizers for custom applications. For example, synchronizers that have been considered but not included in this package include semantic classes that provide various styles of WIN32 events, binary semaphores, centrally managed locks, and tree-based barriers.

5 performance

Although the AQS framework supports other forms of synchronization in addition to mutexes, the performance of locks is the easiest to measure and compare. Even so, there are many different methods of measurement. The experiments here are mainly designed to show the overhead and throughput of locks.

In each test, all threads repeatedly update a pseudo-random number, which is calculated by the nextRandom (int seed) method:

Int t = (seed% 127773) * 16807-(seed / 127773) * 2836x return (t > 0)? T: t + 0x7fffffff

In each iteration, the thread updates the shared generator under a mutex with probability S, otherwise it updates its own local generator without locking. In this way, the time taken by the lock to occupy the area is short, which minimizes the external interference when the thread is preempted while holding the lock. The randomness of this function serves two main purposes: to determine whether locks are needed (this generator is sufficient to meet the requirements here), and to make it impossible for the code in the loop to be easily optimized.

Four types of locks are compared: built-in locks, using synchronized blocks; mutexes, using a simple Mutex class like the example in section 4; and reentrant locks, using ReentrantLock; and fair locks, using ReentrantLock's fair mode. All tests are run in server mode of J2SE1.5 JDK build46 (roughly the same as beta2). Before collecting test data, the test program runs 20 non-competitive tests to eliminate the impact of the JVM "warm-up" process. See: Java Theory and practice: dynamic compilation and performance Measurement. Except for tests in fair mode that ran only 1 million iterations, tests in each thread ran 10 million iterations.

The test was run on four X86 machines and four UltraSparc machines. All x86 machines are running RedHat Linux systems based on NPTL 2.4 kernels and libraries. All UltraSparc machines are running Solaris-9. All systems were under light load during the test. According to the characteristics of the test, the system is not required to be completely idle. ). The name "4p" reflects that dual-core hyperthreaded Xeon is more like a 4-way machine than a 2-way machine. The test data is not normalized here. As shown below, there is no simple relationship between the relative cost of synchronization and the number, type, and speed of processors.

Table 1 platform for testing

Name processor number type speed (Mhz) 1P1Pentium39002P2Pentium314002A2Athlon20004P2HTPentium4/Xeon24001U1UltraSparc26504U4UltraSparc24508U8UltraSparc375024U24UltraSparc37505.1 overhead

The overhead without contention is obtained by running only one thread and subtracting the time of each iteration when the probability S is 1 minus the time of each iteration when the probability S is 0 (the probability of accessing shared memory is 0). When the probability is 0, there is no lock operation, and when the probability is 1, there is a lock operation every time. Therefore, subtracting the time of probability 1 minus the time of probability 0 is the cost of the whole lock operation. ). Table 2 shows the cost of each lock operation in a non-competitive scenario in nanoseconds. The Mutex class is closest to the basic time consuming of the framework, and the extra overhead of the reentrant lock is the time it takes to record the current owner thread and error checking, and for fair locks, the time it takes to check whether the queue is empty at the beginning.

Table 2 also shows the time-consuming tryAcquire compared to the "fast path" (fast path) with built-in locks. The difference here mainly reflects the time-consuming of the different atomic instructions and memory barriers used in each lock and machine. On multiprocessors, these instructions are often completely superior to all other instructions. The main difference between the built-in lock and the synchronizer class is obviously due to the fact that the Hotspot lock uses one compareAndSet for both locking and unlocking, while the synchronizer's acquire operation uses one compareAndSet, but the release operation uses a volatile write (that is, a memory barrier on multiple processors and a reorder limit on all processors). The absolute and relative time consuming of each lock varies from machine to machine.

Table 2 single lock overhead when there is no competition (in nanoseconds)

Machine built-in mutex reentrant fair reentrant 1P18931372P587177812A132131304P116951091171U904058674U122821001158U1608310312324U16184108119

From the other extreme, Table 3 shows the cost of each lock under massive lock contention when running 256 concurrent threads with a probability of 1. When fully saturated, the intrudable FIFO lock is an order of magnitude less expensive than the built-in lock (that is, greater throughput) and two orders of magnitude less than the fair lock. This shows the efficiency of breaking into the FIFO strategy in maintaining thread progress, even if there is great competition.

Table 3 also shows that even in the case of low internal overhead, the performance of fair locks is entirely determined by the time of context switching. The times listed are roughly commensurate with the time it takes for threads to block and unblock on each platform.

In addition, a later experiment (using machine 4p only) shows that for the temporarily held locks used here, the setting of the fairness parameter has little effect on the total difference. Here, the difference between thread termination times is recorded as a coarse-grained discrete quantity. On 4p machines, the average standard deviation of fair lock time measurement is 0.7%, and the average reentrant lock is 6.0%. In contrast, in order to simulate a scenario of holding a lock for a long time, each thread was asked to calculate 16K random numbers while holding the lock. At this point, the total running time is almost the same (fair lock: 9.79s, reentrant lock: 9.72s). The difference in fair mode is still small, with an average standard deviation of 0.1%, while reentrant locks rise to an average of 29.5%.

Table 3 single lock cost at saturation (in nanoseconds)

Machine built-in mutex reentrant fair reentrant 1P521466783272P930108132149672A7487984339104P1146188247153281U879153177413944U2590347368300048U12741571743108424U1983160182322915.2 throughput

Most synchronizers are used between non-competitive and highly competitive. This can be checked in two ways with experiments, by modifying the contention probability of fixed threads, and / or by adding more threads to a set of threads with a fixed probability of contention. To illustrate these effects, tests run with reentrant locks with different contention probabilities and different number of threads. A slowdown metric is used in the drawings.

Here, t is the total elapsed time, b is the baseline time of a thread without contention or synchronization, n is the number of threads, p is the number of processors, and S is the proportion of shared access. The calculated result is the ratio of the actual execution time to the ideal execution time (which is usually not available), which is calculated by using Amdahl's 's law. The ideal time simulates an execution process with no synchronization overhead and no thread blocking due to lock contention. Even so, under very low competition, some test results show a small increase compared to the ideal time, probably due to slight differences in optimization and pipelining between baseline and test.

In the figure, the logarithm with the base of 2 is scaled. For example, a value of 1 means that the actual time is twice the ideal time, and 4 means it is 16 times slower. Using logarithms does not need to rely on a random baseline time (in this case, the time to calculate random numbers), so the trend based on different base numbers should be similar. These tests use contention probabilities ranging from 1amp 128 (identified as "0.008") to 1, with a step size of 2 to the number of threads from 1 to 1024, and a step size of half of the power of 2.

On a single processor (1P and 1U), performance decreases as competition increases, but not as the number of threads increases. When multiprocessors encounter competition, their performance degrades even faster. According to multiprocessor-related charts, although there are only a few threads competing at the beginning of the peak, the corresponding performance is usually the worst. This reflects a performance transition area where intruders and awakened threads are ready to acquire locks, which causes them to frequently force each other to block. Most of the time, the transition region is followed by a smooth area, because there are few free locks, so it is similar to the sequential execution mode on a single processor; on multiprocessor machines, the smooth area is entered earlier. For example, note that under full competition (identified as "1.000"), these values indicate a worse relative speed drop on machines with fewer processors.

Based on these results, further tuning can be made for park/unpark to reduce context switching and related overhead, which will bring a small but significant improvement to the framework. In addition, some of the fluctuations seen here can be avoided by using some form of adaptive spin on multiprocessors for short-held but highly competitive locks, which is of great benefit to synchronizer classes. Although adaptive spinning is difficult to work well across different contexts, you can use this framework to build a custom form lock for specific applications that encounter such usage configurations.

The above is all the content of the article "sample Analysis of JUC Synchronizer Framework AQS of java parallel package". Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report