This article looks at common Java concurrency problems: how the Java memory model defines atomicity, visibility, and ordering; the memory semantics of the volatile keyword; and how atomic operations are implemented on the processor and in Java. I hope it helps clear up any doubts you have about Java concurrency.
Preface
Let's take a look at the following simple Java class, which does not use any synchronization.
final class SetCheck {
    private int a = 0;
    private long b = 0;

    void set() {
        a = 1;
        b = -1;
    }

    boolean check() {
        return (b == 0) || (b == -1 && a == 1);
    }
}
If this were a sequentially executed language, the check method in SetCheck would never return false, even though the compiler, runtime, and hardware may not process the program in the way you expect. During execution, any of the following unexpected behaviors may occur:
The compiler may reorder instructions, so the assignment to b may happen before the assignment to a. If the method is inlined, the compiler may go further and reorder its instructions with surrounding statements.
The processor may reorder the machine instructions corresponding to the statement before executing them, or even execute them concurrently.
The memory system (governed by cache control units) may reorder the writes to the memory cells that hold the variables, and those writes may overlap with other computation and memory operations.
The compiler, processor, and memory system may interleave the machine instructions of the two statements. For example, on a 32-bit machine, the upper half of b might be written first, followed by the write to a, followed by the lower half of b.
The compiler, processor, and memory system may leave the memory cells representing the two variables un-updated until some time after (if ever) a subsequent check call, keeping the values elsewhere (for example in CPU registers) in a way that still produces the intended single-threaded result (check never returns false).
In a sequentially executed language, none of this matters as long as execution obeys serial-like semantics. Simple blocks of sequential code do not depend on these internal execution details, so the behaviors above are free to manipulate the code at will.
This gives compilers and computer hardware fundamental flexibility. It is the basis of many techniques developed over the past few decades (CPU pipelining, multi-level caches, load/store balancing, register allocation, and so on) that have driven huge improvements in processing speed. The serial-like nature of these optimizations means developers do not need to know what happens inside them; as long as developers do not create their own threads, these behaviors have no visible effect.
However, these situations are completely different in concurrent programming. In the above code, when one thread calls the check method, it is entirely possible that another thread is executing the set method. In this case, the check method will expose the optimization process mentioned above.
If any of these behaviors occurs, check may return false. For example, when reading the long variable b, check may obtain a value that is neither 0 nor -1, but a half-written value. Alternatively, out-of-order execution of the statements in set may cause check to read b as -1 while still reading a as 0.
In other words, not only concurrent execution can cause problems, but also some optimization operations (such as instruction reordering) can cause code execution results to differ from the logic in the source code. With the increasing maturity of compiler and runtime technology and the increasing popularity of multiprocessors, this phenomenon has become more and more common.
For developers who have been working with serial programming backgrounds (in fact, almost all programmers), this can lead to surprising results that may never have occurred in serial programming. This may be the root cause of those subtle concurrency programming errors.
In most cases, there is a simple way to avoid the problems caused by code execution optimization in complex concurrent programs: using synchronization. For example, if all methods in the SetCheck class are declared synchronized, you can ensure that the internal processing details do not affect the expected results of the code.
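A minimal sketch of that synchronized variant (this version is implied by the text, not shown in the original):

final class SetCheck {
    private int a = 0;
    private long b = 0;

    // With both methods synchronized on the same lock (this), check()
    // can never observe a half-written b or the reordered effects of set().
    synchronized void set() {
        a = 1;
        b = -1;
    }

    synchronized boolean check() {
        return (b == 0) || (b == -1 && a == 1);
    }
}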
But in some cases you cannot or do not want to use synchronization, or you must reason about code written by others that does not use it. In these cases you can only rely on the minimal guarantees of the semantics defined by the Java memory model. The model allows all of the behaviors listed above but constrains their possible results, and it defines techniques programmers can use to control certain aspects of those semantics.
The Java memory model is part of the Java language specification, described mainly in Chapter 17 of the JLS. Here we only discuss its basic motivation, properties, and programming consequences, and clarify some points that were under-specified in the first edition of the JLS.
We assume that the Java memory model can be seen as an idealized model of a standard SMP machine like the one described in 1.2.4.
In this model, every thread can be thought of as running on its own CPU, although even on multiprocessor machines this is rarely the case in practice. Some features of the model are, however, best understood through this one-to-one mapping of threads to CPUs. For example, because CPU registers cannot be read directly by another CPU, the model must account for the fact that one thread may not know the value of a variable being manipulated by another thread. This applies not only in multiprocessor environments but also on single-CPU machines, where the unpredictable behavior of compilers and processors can create the same effect.
The Java memory model does not specify whether the execution strategies discussed earlier are carried out by compilers, CPUs, cache controllers, or some other mechanism. It does not even describe them in terms of the classes, objects, and methods familiar to developers. Instead, the Java memory model defines only an abstract relationship between threads and memory: each thread has its own working memory (an abstraction of caches and registers) that holds the values of variables the thread currently uses. The model guarantees only certain orderings of instructions and variable operations; most of its rules simply state when a variable's value must be transferred between main memory and a thread's working memory. These rules exist to address three interrelated problems:
Atomicity: which operations must be indivisible. In the Java memory model these rules cover only simple reads and writes of memory cells for instance and static fields and for array elements; they do not cover local variables inside methods.
Visibility: in which cases, the result of execution by one thread is visible to another thread. The results that need to be concerned about here are the fields written and the values seen by reading this field.
Ordering: under what circumstances one thread's operations can appear out of order to other threads. The main ordering issues concern the relative order of reads, writes, and assignment statements.
Atomicity
When synchronization is used correctly, these properties have a simple characterization: changes made within a synchronized method or block are atomic and visible to other synchronized methods and blocks that use the same lock, and synchronized methods or blocks execute in an order consistent with the order specified in the code. Even if instructions inside a block are executed out of order, this has no effect on other threads that synchronize properly.
The situation becomes more complicated when synchronization is absent or used inconsistently. The Java memory model then provides weaker guarantees than most developers expect, and weaker than those of most current JVM implementations. This places an additional obligation on developers to maintain consistent relationships between objects: if an invariant relationship between objects can be seen by multiple threads, every thread that relies on that relationship must preserve it at all times, not just the thread that performs a state change.
The Java memory model guarantees that accesses to fields of any type are atomic, except for long and double fields. This includes fields holding references to other objects. In addition, accesses to volatile long and volatile double are atomic. Although the model does not guarantee atomicity for non-volatile long and double, they are atomic in some environments; for example, non-volatile long accesses are atomic on 64-bit JVMs, operating systems, and CPUs.
When a field that is not a long or double is used in an expression, atomicity guarantees that you obtain either its initial value or a value written by some thread; two or more threads writing the field at the same time cannot produce a garbled value (that is, every bit of the value you read was written by a single thread). However, as you will see below, atomicity does not guarantee that you read the latest value written by any thread, so atomicity alone is usually of limited help in concurrent programming.
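As a small illustration of the long/double caveat (class and field names here are hypothetical, not from the original):

class TornReadExample {
    long x = 0L;                 // non-volatile long: single reads/writes need not be atomic
    // volatile long x = 0L;     // declaring it volatile makes single reads/writes atomic

    void writer() { x = -1L; }   // on some 32-bit JVMs another thread may observe a value
    long reader() { return x; }  // that is neither 0 nor -1, with only half the bits written
}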
Visibility
A write to a field by one thread is guaranteed to be visible to another thread only in the following cases:
A writer thread releases a lock and a reader thread subsequently acquires the same lock. In essence, releasing a lock forces the dirty values in the thread's working memory to be flushed to main memory, and acquiring a lock forces the thread to (re)load the values of the fields. Locks provide mutually exclusive execution of synchronized methods or blocks, and the memory effects of all field accesses are defined at the points where threads acquire and release locks.
Note the dual meaning of synchronization: locks provide advanced synchronization protocols, while the memory system (sometimes through memory barrier instructions) ensures the consistency of values when threads execute synchronization methods or blocks. This shows that compared with sequential programming, concurrent programming is more similar to distributed programming. The second feature of synchronization can be seen as a mechanism: when a thread runs a synchronized method, it will send and / or receive changes to variables made by other threads in the synchronized method. At this point, using a lock is just a difference in syntax from sending a message.
If a field is declared volatile, a thread that writes it must flush the write and make it visible to other threads before performing any subsequent memory access (that is, the field is flushed immediately), and every read of a volatile field reloads its value.
The first time a thread accesses a field of an object, it reads either the field's initial value or a value written by some other thread.
In addition, exposing a reference to a not-yet-constructed object to another thread is a mistake, and starting a new thread inside a constructor is dangerous, especially if the class may be subclassed. Thread.start has the following memory effect: the thread that calls start releases a lock, and the newly started thread acquires it.
If a Runnable superclass calls new Thread(this).start() before the subclass constructor has run, the object may not be fully initialized when run executes. Similarly, if you create and start a new thread T, and T uses an object X that is created after start is called, there is no guarantee that the field values of X will be visible to T unless you synchronize all the code that uses references to X. Where possible, create X before starting T, as the sketch below shows.
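A hedged sketch of the two patterns just described; the class and field names are illustrative, not from the original:

class Worker implements Runnable {
    private int config;

    Worker() {
        new Thread(this).start();   // dangerous: run() may execute before the next line
        config = 42;
    }

    public void run() {
        System.out.println(config); // may print 0 instead of 42
    }
}

// Safer: fully construct the object first, then start the thread.
class SaferWorker implements Runnable {
    private final int config;

    SaferWorker(int config) { this.config = config; }

    public void run() { System.out.println(config); }

    static void launch() {
        new Thread(new SaferWorker(42)).start();  // construction completes before start()
    }
}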
When a thread terminates, all of its writes are flushed to main memory. For example, if one thread waits for another to terminate via Thread.join, it is guaranteed to see all variable writes made by the joined thread.
Note that passing references to objects between different methods of the same thread will never have a memory visibility problem.
The memory model guarantees that, eventually, a write to a given field by one thread will become visible to other threads, but "eventually" can be a long time. Without synchronization between threads, it is hard to guarantee that a field's value is consistent across threads (that is, that a write by the writer thread is immediately visible to a reader thread).
In particular, it is always wrong to spin in a loop waiting for another thread's write to a field if that field is neither volatile nor accessed under synchronization (see the sketch below).
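A minimal illustration, with hypothetical names; the commented-out declaration is the broken variant:

class SpinWait {
    // boolean ready = false;          // WRONG as a wait flag: the loop below may never see the update
    volatile boolean ready = false;    // volatile (or synchronized access) makes the write visible

    void waitForReady() {
        while (!ready) {
            // busy-wait; with a non-volatile, unsynchronized field this loop may spin forever
        }
    }

    void signalReady() {
        ready = true;
    }
}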
In the absence of synchronization, the model also allows inconsistent visibility. For example, a thread may see the latest value of one field of an object but stale values of its other fields. Similarly, it may read the latest value of a reference variable but stale values of the fields of the object that reference points to.
In any case, visibility between threads does not always fail: even without synchronization a thread may well read the latest value of a field. The memory model merely permits the failure; it does not guarantee that visibility problems (threads reading stale values) will occur, it only allows them to.
Memory visibility problems are in fact rare on many current JVM implementations and Java platforms, even those with multiple processors. Threads sharing a single CPU share its cache, aggressive compiler optimizations may be absent, and strong hardware cache coherence often causes updated values to propagate to other threads immediately.
This makes it impractical to test for errors caused by memory visibility, because such errors occur extremely rarely, or only on platforms you have not used, or only on platforms that do not yet exist. Such explanations are typical of inter-thread memory visibility problems. Unsynchronized concurrent programs can fail in many ways, including memory consistency problems.
Ordering
Ordering rules apply in two contexts: within a thread and between threads.
From the point of view of the executing thread itself, instructions behave as if they executed in program order, the as-if-serial semantics long applied in sequential programming languages.
When a thread "observes" other threads executing unsynchronized code concurrently, it may see that code executing in any order. The only useful constraint is that synchronized methods and blocks, and operations on volatile fields, remain relatively ordered.
Again, these are only minimal guarantees. Any particular program or platform may provide stricter ordering, but you cannot rely on it: code that happens to work under those stricter rules may still fail on a JVM with different characteristics, and such failures are very hard to test for.
Note that this within-thread point of view is implicitly adopted in the other semantic discussions in the JLS [1]. For example, evaluation of an arithmetic expression appears to proceed from left to right within the executing thread (JLS 15.6), and this effect need not be observable to other threads.
Execution within a thread appears serial only when a single thread at a time is operating on the variables involved, whether because of synchronization, mutual exclusion [2], or pure coincidence. When multiple threads run unsynchronized code that reads and writes shared fields at the same time, execution forms interleavings in which code runs in arbitrary order, atomicity and visibility may fail, and race conditions arise; thread execution is then no longer serial.
Although the JLS lists certain specific legal and illegal reorderings, beyond those listed cases the only practical guarantee is that the results of execution reflect some interleaving of (possibly reordered) operations. It is therefore rarely worthwhile to try to reason about the exact ordering of unsynchronized code.
The volatile keyword in detail: the memory semantics of volatile in the JMM
Characteristics of volatile variables
When we declare a shared variable volatile, reads and writes of that variable become special. A good way to understand the behavior of volatile is to think of each individual read or write of a volatile variable as being synchronized on the same monitor lock. Let's illustrate with a concrete example; look at the following sample code:
class VolatileFeaturesExample {
    volatile long vl = 0L;      // declare a 64-bit long as volatile

    public void set(long l) {
        vl = l;                 // write of a single volatile variable
    }

    public void getAndIncrement() {
        vl++;                   // compound (multiple) volatile reads/writes
    }

    public long get() {
        return vl;              // read of a single volatile variable
    }
}
Suppose there are multiple threads calling the three methods of the above program, which is semantically equivalent to the following program:
class VolatileFeaturesExample {
    long vl = 0L;                            // ordinary 64-bit long variable

    public synchronized void set(long l) {   // writes of the single variable synchronize on the same monitor
        vl = l;
    }

    public void getAndIncrement() {          // ordinary method call
        long temp = get();                   // call the synchronized read method
        temp += 1L;                          // ordinary write
        set(temp);                           // call the synchronized write method
    }

    public synchronized long get() {         // reads of the single variable synchronize on the same monitor
        return vl;
    }
}
As shown in the sample program above, a single read / write to a volatile variable is synchronized with a read / write to a normal variable using the same monitor lock, and they perform the same way.
The happens-before rule for monitor locks guarantees memory visibility between the thread that releases a monitor and the thread that subsequently acquires it, which means that a read of a volatile variable always sees the last write (by any thread) to that volatile variable.
The semantics of the monitor lock guarantee that critical-section code executes atomically. This means that even a 64-bit long or double variable has atomic reads and writes, as long as it is declared volatile. Compound operations on volatile variables, such as volatile++, are not atomic as a whole. In short, a volatile variable itself has the following characteristics:
Visibility. When reading a volatile variable, you can always see (any thread) the last write to the volatile variable.
Atomicity: reads and writes of any single volatile variable are atomic, but compound operations such as volatile++ are not (see the sketch below).
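A small sketch (illustrative names) of why volatile++ is not atomic, and one way to make the increment atomic:

class VolatileIncrement {
    volatile long count = 0L;

    void unsafeIncrement() {
        count++;   // read count, add 1, write count: two threads can interleave and lose an update
    }

    synchronized void safeIncrement() {
        count++;   // the monitor lock makes the whole read-modify-write atomic
    }
}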
Happens before relationship established by volatile write-read
The above describes the characteristics of volatile variables themselves. For programmers, however, the effect of volatile on memory visibility between threads is more important than these characteristics, and deserves more attention.
Starting with JSR-133, the write-read of the volatile variable enables communication between threads.
From the perspective of memory semantics, volatile has the same effect as monitor lock: volatile write and monitor release have the same memory semantics; volatile read and monitor acquisition have the same memory semantics.
Look at the following sample code that uses the volatile variable:
class VolatileExample {
    int a = 0;
    volatile boolean flag = false;

    public void writer() {
        a = 1;              // 1
        flag = true;        // 2
    }

    public void reader() {
        if (flag) {         // 3
            int i = a;      // 4
            // ...
        }
    }
}
Suppose thread A executes the writer() method and thread B then executes the reader() method. According to the happens-before rules, the happens-before relationships established by this process are as follows:
According to the program order rules, 1 happens before 2; 3 happens before 4.
According to volatile rules, 2 happens before 3.
According to the transitivity rule of happens before, 1 happens before 4.
The graphical representation of the above happens before relationship is as follows:
In the figure above, the two nodes linked by each arrow represent a happens before relationship. Black arrows represent program order rules, orange arrows represent volatile rules, and blue arrows represent happens before guarantees provided by combining these rules.
Here, thread A writes a volatile variable, and thread B reads the same volatile variable. All shared variables visible to thread A before writing the volatile variable will become visible to thread B immediately after thread B reads the same volatile variable.
Volatile write-read memory semantics
The memory semantics written by volatile are as follows:
When you write a volatile variable, JMM flushes the shared variables in the local memory corresponding to the thread to the main memory.
Taking the VolatileExample program above as an example, suppose thread A first executes the writer() method and thread B then executes the reader() method. Initially, flag and a in the local memory of both threads are in their initial state. The following figure is a schematic diagram of the state of the shared variables after thread A performs the volatile write:
As shown in the figure above, after thread A writes the flag variable, the values of the two shared variables that thread A updated in local memory A are flushed to main memory. At that point, the values of the shared variables in local memory A and in main memory are the same.
The memory semantics of volatile reads are as follows:
When reading a volatile variable, JMM sets the local memory for that thread to be invalid. The thread then reads the shared variable from the main memory.
The following is a schematic diagram of the status of the shared variable after thread B reads the same volatile variable:
As shown in the figure above, after reading the flag variable, local memory B has been set to invalid. At this point, thread B must read the shared variable from main memory. The read operation of thread B will cause the values of the shared variables in local memory B to be the same as those in main memory.
If we combine the volatile write and volatile read steps, after thread B reads a volatile variable, the values of all shared variables visible to thread A before writing to the volatile variable will immediately become visible to thread B.
Here is a summary of the memory semantics of volatile write and volatile read:
Thread A writes a volatile variable, which essentially sends a message to a thread that will read the volatile variable next.
Thread B reads a volatile variable, essentially receiving a message sent by a thread that modified the shared variable before writing the volatile variable.
Thread A writes a volatile variable, and then thread B reads the volatile variable, which is essentially thread A sending a message to thread B through main memory.
Implementation of volatile memory semantics
Next, let's look at how JMM implements the memory semantics of volatile write / read.
We mentioned earlier that reordering is divided into compiler reordering and processor reordering. To implement the memory semantics of volatile, the JMM restricts both kinds of reordering. The following is the table of volatile reordering rules that the JMM specifies for compilers:
Can the second operation be reordered with the first?

First operation \ Second operation | Ordinary read/write | Volatile read | Volatile write
Ordinary read/write                |                     |               | NO
Volatile read                      | NO                  | NO            | NO
Volatile write                     |                     | NO            | NO

A cell marked NO means the compiler must not reorder the two operations; an empty cell means reordering is allowed.
For example, the NO in the row for an ordinary read/write means: in program order, when the first operation is a read or write of an ordinary variable and the second operation is a volatile write, the compiler cannot reorder the two operations.
We can see from the above table:
When the second operation is a volatile write, no matter what the first operation is, the two cannot be reordered. This rule ensures that operations before a volatile write are not reordered by the compiler to after the volatile write.
When the first operation is a volatile read, no matter what the second operation is, the two cannot be reordered. This rule ensures that operations after a volatile read are not reordered by the compiler to before the volatile read.
When the first operation is volatile write and the second operation is volatile read, it cannot be reordered.
To implement the memory semantics of volatile, the compiler inserts memory barriers into the instruction sequence when generating bytecode, in order to prevent particular kinds of processor reordering. It is practically impossible for the compiler to find an optimal arrangement that minimizes the total number of barriers, so the JMM adopts a conservative strategy. The JMM barrier insertion strategy under this conservative policy is:
Insert a StoreStore barrier in front of each volatile write operation.
Insert a StoreLoad barrier after each volatile write operation.
Insert a LoadLoad barrier after each volatile read operation.
Insert a LoadStore barrier after each volatile read operation.
The above memory barrier insertion strategy is very conservative, but it can ensure that the correct volatile memory semantics can be obtained in any processor platform and any program.
The following is a schematic diagram of the instruction sequence generated when memory barriers are inserted around a volatile write under the conservative strategy:
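In rough text form (field names are illustrative), the conservative barrier placement around a volatile write from the strategy above looks like this:

class ConservativeWriteSketch {
    int a;
    volatile int v;

    void write() {
        a = 1;                  // ordinary write
        // StoreStore barrier: keeps the ordinary write above before the volatile write
        v = 2;                  // volatile write
        // StoreLoad barrier: keeps the volatile write before any later volatile read/write
    }
}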
The StoreStore barrier ensures that all ordinary writes before it are visible to any processor before the volatile write, because it forces those ordinary writes to be flushed to main memory before the volatile write.
What is interesting here is the StoreLoad barrier after the volatile write. Its purpose is to prevent the volatile write from being reordered with a volatile read or write that may follow. The compiler often cannot determine precisely whether a StoreLoad barrier is needed after a volatile write (for example, when the method returns immediately after the write). To guarantee that volatile memory semantics are implemented correctly, the JMM adopts a conservative choice: insert a StoreLoad barrier either after every volatile write or before every volatile read. For overall efficiency, the JMM chose to insert it after every volatile write, because the common usage pattern is one writer thread writing a volatile variable and many reader threads reading it; when readers greatly outnumber writers, placing the StoreLoad barrier after the write yields a considerable gain in execution efficiency. This illustrates a characteristic of the JMM's implementation: correctness first, then execution efficiency.
The following is a schematic diagram of the instruction sequence generated when memory barriers are inserted around a volatile read under the conservative strategy:
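Again in rough text form (field names are illustrative), the conservative barrier placement around a volatile read looks like this:

class ConservativeReadSketch {
    int a;
    volatile int v;

    void read() {
        int i = v;              // volatile read
        // LoadLoad barrier: keeps the volatile read before later ordinary reads
        // LoadStore barrier: keeps the volatile read before later ordinary writes
        int j = a;              // later ordinary read
        a = i + j;              // later ordinary write
    }
}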
The LoadLoad barrier prevents the processor from reordering the volatile read above it with ordinary reads below it, and the LoadStore barrier prevents the processor from reordering the volatile read with ordinary writes below it.
The above memory barrier insertion strategies for volatile writes and volatile reads are very conservative. In actual execution, the compiler can omit unnecessary barriers according to specific circumstances, as long as the write-read memory semantics of volatile is not changed. Let's illustrate it with a specific example code:
class VolatileBarrierExample {
    int a;
    volatile int v1 = 1;
    volatile int v2 = 2;

    void readAndWrite() {
        int i = v1;       // first volatile read
        int j = v2;       // second volatile read
        a = i + j;        // ordinary write
        v1 = i + 1;       // first volatile write
        v2 = j * 2;       // second volatile write
    }
}
For the readAndWrite () method, the compiler can do the following optimizations when generating bytecode:
Note that the final StoreLoad barrier cannot be omitted, because the method may return immediately after the second volatile write. The compiler cannot determine whether a volatile read or write will follow, so for safety it usually inserts a StoreLoad barrier here.
The optimization above applies to any processor platform. Because different processors have memory models of different "tightness", barrier insertion can be further optimized for a specific processor memory model. Taking the x86 processor as an example, all of the barriers in the figure above except the last StoreLoad barrier are omitted.
Under the earlier conservative strategy, volatile reads and writes can be optimized on the x86 platform as follows:
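In rough text form (illustrative names), the x86 result described in the next paragraph looks like this:

class X86Sketch {
    volatile int v;

    void write(int x) {
        v = x;              // volatile write
        // StoreLoad barrier: the only barrier x86 actually needs
    }

    int read() {
        return v;           // volatile read: no extra barrier instruction needed on x86
    }
}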
As mentioned earlier, x86 processors only reorder write-read operations. X86 does not reorder read-read, read-write, and write-write operations, so the memory barriers corresponding to these three types of operations are omitted in x86 processors. In x86, JMM only needs to insert a StoreLoad barrier after volatile writes to correctly implement volatile write-read memory semantics. This means that in x86 processors, volatile writes are much more expensive than volatile reads (because performing the StoreLoad barrier is more expensive).
Why JSR-133 enhanced the memory semantics of volatile
In the old Java memory model before JSR-133, although reordering between volatile variables was not allowed, reordering between a volatile variable and an ordinary variable was allowed. Under the old model, the VolatileExample program might be reordered to execute in the following order:
In the old memory model, when there was no data dependency between 1 and 2, 1 and 2 could be reordered (similar to 3 and 4). The result is that when thread B executes 4, it may not be possible to see the changes made by writer thread A to the shared variable during execution 1.
Therefore, in the old memory model, a volatile write-read did not have the memory semantics of a monitor release-acquire. To provide a mechanism for inter-thread communication that is lighter-weight than monitor locks, the JSR-133 expert group decided to strengthen the memory semantics of volatile: strictly restrict reordering between volatile variables and ordinary variables by compilers and processors, so that volatile write-read has the same memory semantics as monitor release-acquire. In terms of the compiler reordering rules and the processor memory barrier insertion strategy, any reordering between a volatile variable and an ordinary variable that could break volatile's memory semantics is prohibited.
Because volatile guarantees atomicity only for reads and writes of a single volatile variable, while the mutual exclusion of a monitor lock guarantees atomicity for an entire critical section, monitor locks are more powerful than volatile in functionality, whereas volatile has advantages in scalability and execution performance. If you intend to use volatile instead of a monitor lock in a program, proceed with care.
Detailed explanation of CAS operation
This part is the author's original work; the original text was published at InfoQ: http://www.infoq.com/cn/articles/atomic-operation
Introduction
An atom originally means "the smallest particle that cannot be divided further", and an atomic operation means "an operation, or series of operations, that cannot be interrupted". Implementing atomic operations on multiprocessors is somewhat involved. In this section, let's look at how atomic operations are implemented on Intel processors and in Java.
Term definitions

Cache line (cache line): the smallest unit on which the cache operates.
Compare and swap (Compare and Swap, CAS): a CAS operation takes two values, an old value (the value expected before the operation) and a new value. During the operation it checks whether the current value still equals the old value; if it has not changed, it is swapped to the new value, otherwise nothing is swapped.
CPU pipeline (CPU pipeline): works like an assembly line in a factory. In the CPU, an instruction pipeline consists of five or six circuit units with different functions; an x86 instruction is split into five or six steps that these units execute in turn, so that an instruction can complete within one CPU clock cycle, improving CPU speed.
Memory order violation (memory order violation): usually caused by false sharing, where multiple CPUs modify different parts of the same cache line at the same time, invalidating one CPU's operation; when this happens, the CPU must flush its pipeline.

3 How processors implement atomic operations
32-bit IA-32 processors implement atomic operations between multiple processors based on cache locking or bus locking.
3.1 The processor automatically guarantees atomicity of basic memory operations
First, the processor automatically guarantees the atomicity of basic memory operations. The processor guarantees that reading or writing a single byte of system memory is atomic: while one processor reads a byte, other processors cannot access that byte's memory address. Pentium 6 and newer processors also automatically guarantee that 16-, 32- and 64-bit operations by a single processor within one cache line are atomic. Complex memory operations, however, cannot be guaranteed atomic automatically, such as accesses that cross the bus width, cross multiple cache lines, or cross page tables. For those, the processor provides two mechanisms, bus locking and cache locking, to guarantee atomicity.
3.2 Using bus locks to ensure atomicity
The first mechanism guarantees atomicity with a bus lock. If multiple processors read and modify a shared variable at the same time, the read-modify-write is not atomic, and after the operation the value of the shared variable may differ from what we expect. For example, if i = 1 and two processors each perform i++, we expect the result to be 3, but the result may be 2, as the figure below shows.
(example 1)
The reason is that both processors may read the variable i from their own caches at the same time, each increment it by one, and then write it back to system memory. To make the read-modify-write of a shared variable atomic, we must ensure that while CPU1 reads and modifies the shared variable, CPU2 cannot operate on the cache line that holds that variable's memory address.
Processors solve this with a bus lock, using the LOCK# signal the processor provides. When one processor asserts this signal on the bus, requests from other processors are blocked, so that processor can use the shared memory exclusively.
3.3 Using cache locks to ensure atomicity
The second mechanism guarantees atomicity with a cache lock. Often we only need the operation on one particular memory address to be atomic, but bus locking locks down all communication between the CPUs and memory, so other processors cannot operate on any other memory addresses while the bus is locked; bus locking is therefore expensive. In some situations, recent processors use cache locking instead of bus locking as an optimization.
Frequently used memory is cached in the processor's L1, L2, and L3 caches, so an atomic operation can be carried out directly in the processor's internal cache without asserting a bus lock. "Cache locking" can be used on Pentium 6 and newer processors to implement complex atomic operations. Cache locking means that if the memory area is held in the processor's cache line and is locked during a LOCK operation, then when the locked operation writes back to memory the processor does not assert the LOCK# signal on the bus; instead it modifies the memory address internally and relies on its cache-coherence mechanism to guarantee atomicity, because cache coherence prevents two or more processors from modifying the same cached memory region at the same time and invalidates a cache line when another processor writes back data for that locked cache line. In example 1, when CPU1 modifies i in its cache line using cache locking, CPU2 cannot simultaneously cache the cache line containing i.
However, there are two situations in which the processor does not use cache locking. First, when the data being operated on cannot be cached inside the processor, or the operation spans multiple cache lines, the processor falls back to bus locking. Second, some processors do not support cache locking; on Intel 486 and Pentium processors, bus locking is used even if the locked memory region is within the processor's cache line.
Both mechanisms are exposed through the many LOCK-prefixed instructions that Intel processors provide, for example the bit test-and-modify instructions BTS, BTR, BTC, the exchange instructions XADD and CMPXCHG, and other operand and logic instructions such as ADD and OR. The memory region operated on by these instructions is locked, so other processors cannot access it at the same time.
4 How Java implements atomic operations
In Java, atomic operations can be implemented with locks or with spin CAS (CAS in a loop).
4.1 Using spin CAS to implement atomic operations
CAS operations in the JVM use the CMPXCHG instruction provided by the processor, mentioned in the previous section. The basic idea of spin CAS is to retry the CAS operation in a loop until it succeeds. The following code implements a CAS-based thread-safe counter method, safeCount, and a non-thread-safe counter, count.
package test;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class Counter {

    private AtomicInteger atomicI = new AtomicInteger(0);
    private int i = 0;

    public static void main(String[] args) {
        final Counter cas = new Counter();
        List<Thread> ts = new ArrayList<Thread>();
        long start = System.currentTimeMillis();
        for (int j = 0; j < 100; j++) {
            Thread t = new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < 10000; i++) {
                        cas.count();
                        cas.safeCount();
                    }
                }
            });
            ts.add(t);
        }

        for (Thread t : ts) {
            t.start();
        }
        // wait for all threads to finish executing
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }

        System.out.println(cas.i);
        System.out.println(cas.atomicI.get());
        System.out.println(System.currentTimeMillis() - start);
    }

    /**
     * thread-safe counter implemented with CAS
     */
    private void safeCount() {
        for (;;) {
            int i = atomicI.get();
            boolean suc = atomicI.compareAndSet(i, ++i);
            if (suc) {
                break;
            }
        }
    }

    /**
     * non-thread-safe counter
     */
    private void count() {
        i++;
    }
}

One sample run printed: 992362 (unsafe counter), 1000000 (CAS counter), 75 (elapsed milliseconds).
Starting with Java 1.5, the JDK concurrency package provides classes that support atomic operations, such as AtomicBoolean (atomically updated boolean), AtomicInteger (atomically updated int), and AtomicLong (atomically updated long). These atomic wrapper classes also provide useful utility methods, such as atomically incrementing or decrementing the current value by 1.
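A small usage sketch of these classes (the class and method names below are illustrative, not from the original):

import java.util.concurrent.atomic.AtomicLong;

class HitCounter {
    private final AtomicLong hits = new AtomicLong(0);

    long record() {
        return hits.incrementAndGet();   // atomically adds 1 and returns the new value
    }

    long current() {
        return hits.get();
    }
}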
Some concurrency frameworks in the Java concurrency package also use spin CAS to implement atomic operations, for example the xfer method of the LinkedTransferQueue class. Although CAS solves atomicity efficiently, it still has three major problems: the ABA problem, long spin times with high overhead, and the fact that it can only guarantee atomic operations on a single shared variable.
The ABA problem. CAS checks whether the value has changed before updating it, and updates it only if it has not changed. But if a value was originally A, changed to B, and then changed back to A, a CAS check will find it unchanged even though it actually changed. The solution to the ABA problem is to use a version number: append a version number to the variable and increment it on every update, so A-B-A becomes 1A-2B-3A.
Starting with Java 1.5, the JDK atomic package provides the class AtomicStampedReference to address the ABA problem. Its compareAndSet method first checks whether the current reference equals the expected reference and whether the current stamp equals the expected stamp; only if both are equal does it atomically set the reference and the stamp to the given new values.
public boolean compareAndSet(
    V expectedReference,   // the expected reference
    V newReference,        // the new reference
    int expectedStamp,     // the expected stamp
    int newStamp           // the new stamp
)
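A hedged usage sketch of AtomicStampedReference (the class, field, and method names are illustrative):

import java.util.concurrent.atomic.AtomicStampedReference;

class StampedHolder {
    private final AtomicStampedReference<Integer> ref =
            new AtomicStampedReference<>(100, 0);

    boolean update(int expectedValue, int newValue) {
        int[] stampHolder = new int[1];
        Integer current = ref.get(stampHolder);              // read the reference and its stamp together
        return current == expectedValue
                && ref.compareAndSet(current, newValue,
                        stampHolder[0], stampHolder[0] + 1);  // bump the stamp on every successful update
    }
}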
Long spin time and high overhead. If spin CAS fails for a long time, it imposes a large execution cost on the CPU. If the JVM can use the pause instruction provided by the processor, efficiency improves somewhat. The pause instruction has two effects: first, it delays the pipelined execution of instructions (de-pipelining) so the CPU does not consume too many execution resources; the delay depends on the implementation, and on some processors it is zero. Second, it avoids the CPU pipeline flush caused by a memory order violation when exiting the loop, improving CPU execution efficiency.
Only atomic operations on a single shared variable are guaranteed. When operating on one shared variable we can use spin CAS to guarantee atomicity, but spin CAS cannot guarantee atomicity across multiple shared variables; in that case we can use locks, or use the trick of merging several shared variables into one. For example, with two shared variables i = 2 and j = a, we can merge them into ij = 2a and then CAS on ij. Starting with Java 1.5, the JDK provides the AtomicReference class to guarantee atomicity of reference objects, so multiple variables can be placed in one object and updated with a single CAS, as sketched below.
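A sketch of the "merge several variables into one object" trick using AtomicReference (the Position class here is hypothetical):

import java.util.concurrent.atomic.AtomicReference;

class Position {
    final int x;
    final int y;
    Position(int x, int y) { this.x = x; this.y = y; }
}

class AtomicPosition {
    private final AtomicReference<Position> ref = new AtomicReference<>(new Position(0, 0));

    void move(int dx, int dy) {
        for (;;) {                                     // spin CAS on the whole immutable pair
            Position old = ref.get();
            Position next = new Position(old.x + dx, old.y + dy);
            if (ref.compareAndSet(old, next)) {
                return;                                // both fields updated atomically together
            }
        }
    }
}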
4.2 Using the locking mechanism to implement atomic operations
The locking mechanism guarantees that only the thread holding the lock can operate on the locked memory region. The JVM implements several kinds of locks: biased locks, lightweight locks, and mutex locks. Interestingly, except for biased locks, the JVM implements locking with spin CAS: when a thread wants to enter a synchronized block it uses spin CAS to acquire the lock, and when it exits it uses spin CAS to release the lock. For details, see the article on Synchronized in Java SE 1.6.
This concludes our study of Java concurrency problems. I hope it helps resolve your doubts; combining theory with practice is the best way to learn, so go and try it out.