In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/03 Report--
The main content of this article is to explain "what is the reason why stop the world is needed in the process of GC". Interested friends may wish to take a look. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "what is the reason why you need stop the world in the process of GC?"
Common recovery algorithms
Mark-clear (mark-sweep)
Advantages: low cost, high speed, disadvantages: clean up in place, so you can't avoid the problem of fragmentation.
Tag-copy (mark-copy)
Pros: memory space after GC is a contiguous disadvantage: available memory space is halved
Marking-finishing (mark-compact)
Advantages: no fragmentation problem, memory space available size is not halved disadvantages: low efficiency, high overhead
Reference-count (reference counting)
The first three garbage collection algorithms are indirect, and they all need to traverse the surviving object graph from the known root set in order to determine all the living objects. In reference counting, the viability of objects can be directly determined by the creation or deletion of reference relationships, so that there is no need to find out all the living objects through heap traversal like the tracking collector, and then reverse determine the untraversed garbage objects. Advantages: direct traversal, fast speed disadvantages: unable to solve the circular reference problem why GC should be recycled from generation to generation
For traditional, basic GC implementations, because they are "stop-the-world" throughout the work of GC, it is important to find a way to shorten the length of time GC works at a time. If it takes too long to collect the entire GC heap, why not just collect some of it?
In the case of a small number of objects, traceable garbage collectors (especially replicated garbage collectors) can collect garbage most efficiently. However, the existence of longevity objects will affect the efficiency of recycling, because the collector either marks them repeatedly, tracks them, or repeatedly copies them from one half to another.
According to the actual operation of the program, jvm has two hypotheses about garbage collection (that is, two rules of thumb).
1) weak generational hypothesis: the life cycle of the vast majority of objects is very short, and the vast majority of objects are permanent. 2) strong generational hypothesis: the more objects go through the garbage collection process, the more difficult it is to die.
Nowadays, most jvm are designed according to the theory of generational recycling. The jvm heap is divided into at least two regions, young and old. The new generation stores objects that have just been created, while in the old days, it stores a small number of surviving objects.
In the Cenozoic era, the object is "life and death". Gc generally adopts the replication algorithm, because the outstanding feature of this algorithm is that it only cares about what needs to be replicated, and the reachability analysis only uses tagging and copying few living objects. You don't have to traverse the entire heap because most of it is discarded. But its disadvantage is also obvious, which needs to waste half of the memory space.
Therefore, according to the characteristics of old objects, the label-clean (some algorithms will bring compression) strategy is generally adopted. If the replication algorithm is used in this part, on the one hand, there is no extra space to guarantee it, on the other hand, because of the high survival rate, the cost of replication increases significantly.
An object from birth to death
After an object is generated, it will be allocated on the stack first. If there is no lower allocation on the stack, the Eden area will enter the surivivor area. After a garbage collection, the survivor area will enter another survivor. At the same time, some objects in the Eden area will also enter another survivot, and when they are old enough, they will enter the old area, which is a logical moving process of the whole object.
When will it be distributed on the stack and when will it be allocated in the area of Eden?
1 allocation on stack
Allocation on the stack:
Thread private small objects: small objects, thread private
No escape: it's used in a piece of code, and no one knows it except this code.
Support for scalar substitution: it means to use ordinary attributes and to replace objects with ordinary types is called scalar substitution
Allocation on the stack will be a little faster than allocation on the heap. If there is no allocation on the stack, priority will be given to local allocation, that is, thread local allocation TLAB (Thread local Allocation Buffer): in the Eden area, many threads will allocate objects to it, but when we allocate objects, we will certainly carry out space requisition, and the efficiency of multi-thread synchronization will be reduced, so the TLAB mechanism is designed.
Occupies eden, default is 1%, and takes up 1% of the space in the Eden area. This space is called thread-specific. When allocating objects, it is allocated to this unique space of the thread first.
When multithreading, you can apply for space without competing for eden to improve efficiency.
2 the old age
When does the object enter the old age?
How many times has it been recycled into the old age?
Exceeded the number of times specified by XX:MaxTenuringThreshold (YGC)
Parallel Scavenge entered the old era for 15 times.
CMS has entered the old era for six times.
G1 entered the old era for 15 times.
It is said on the Internet that the number of times can be increased, but this is impossible.
Dynamic age judgment
In order to be able to adapt to the memory conditions of different programs, the virtual machine does not always require that the age of the object must reach MaxTenuringThreshold in order to promote the old age. If the total size of all objects of the same age in the Surivivor space is more than half of the Survivor space, the objects whose age is greater than or equal to that age can directly enter the old age without waiting for the age required in the MaxTenuringThreshold.
Copy back and forth between two Survivor as long as more than 50% of the oldest directly into the old area, that is, it is not necessary to get 15 years old.
If so many objects in S1 are copied into S2 and more than 50 percent of them are copied into S2, the whole object is copied into S2 all of a sudden. After a garbage collection, the total number of objects at this time has exceeded half of S2, and some of the oldest objects directly enter the elderly area. This is called dynamic youth judgment.
The big object went straight into the old age.
The so-called large objects are Java objects that require a large amount of continuous memory space. The most typical large objects are very long strings and arrays. When large objects often occur, garbage collection is triggered in advance to obtain enough continuous memory space when there is still a lot of memory space.
Start first new an object, then allocates it on the stack, if it can be allocated on the stack, the stack pops up directly, and the pop-up ends. If it cannot be allocated on the stack, it determines whether the object is a large object. If it is a large object, it goes directly into the old age. If it ends after FGC, if not, enter the thread local allocation (TLAB), in any case, it will go to the Eden area for GC clearance, if the removal is completed. It ends directly, if it is not cleared, enter S1 to continue GC clearance, if the age is old enough to enter old area, if not old enough to enter S2, then S2 will continue to clear GC, either the age has reached or the dynamic age has reached
MinorGC/YGC: triggered when the younger generation runs out of space
MajorGC/FullGC: triggered when the old era can no longer allocate space, and the new generation can be recycled at the same time.
Problem-cross-band referenc
Objects in the younger generation may refer to objects from the old age, so how to avoid full scanning of the old age when garbage collection is carried out by the younger generation?
Solution: memory set (remembered set) corresponds to implementation card table (CardTable)
GC Roots = Cenozoic GCRoot + RememberedSet
Common garbage collector
New generation collectors: Serial, ParNew, Parallel Scavenge
Old Age Collector: Serial Old, CMS, Parallel Old
New and old collectors: G1, ZGC, Shenandoah
Each garbage collector is not operated independently. The following figure shows that there is a connection between the garbage collector, which can be used in collaboration:
Serial
Single thread serial replication algorithm
Serial Old
The old collector, like Serial, is single-threaded, except that the algorithm uses tag-collation (Mark-Compact).
Parallel Scavenge
The multithreaded version of the Serial collector, the parallel collector. Replication algorithm
Parallel Old
The old collector is the old version of Parallel Scavenge. The algorithm is replaced by Mark-Compact.
ParNew
Similar to Parallel, it is designed to work with cms. Replication algorithm. A new generation of parallel collector.
CMS
Concurrent mark sweep concurrent flag cleanup, with the goal of obtaining the shortest reclaim pause time. Concurrent recyclers in the old days. CMS and tricolor marking algorithm
CMS (Concurrent Mark Sweep) is a landmark garbage collector. Why do you say that? Because before it, GC thread and user thread cannot work at the same time, even if it is Parallel Scavenge, it is only when GC starts multiple threads to collect in parallel, the whole process of GC still has to pause the user thread, that is, Stop The World. The consequence of this is that the Java program will stutter for a while, which reduces the response speed of the application, which can not be accepted by the program running on the server.
Why pause the user thread when GC?
First of all, if the user thread is not paused, it means that garbage will continue to be generated during the period and will never be cleaned up.
Secondly, the operation of the user thread will inevitably lead to changes in the reference relationship of the object, which will lead to two situations: missing and mismarking.
Missing mark
Originally, it is not garbage, but in the process of GC, the user thread modifies its reference relationship, so that the GC Roots is unreachable and becomes garbage. This situation is a little better, nothing more than some floating garbage, next time GC will clean it up.
Wrong mark
It was originally garbage, but in the process of GC, the user thread will repoint the reference to it, and if GC reclaims it, it will cause the program to run wrong.
How does CMS solve these problems? How does it work concurrently between GC thread and user thread?
It can be divided into four main steps
Initial mark
Mark only the objects directly associated with GC Root for a short time, STW
Concurrent tagging
Traversing the entire object graph starting from the objects directly associated with the GC Root, in parallel with the user thread
Relabel
Fixed concurrent markers, tricolor markers. STW
Concurrent cleanup
Using the tag removal algorithm, there is a fragment problem.
Why pause the user thread when GC?
First of all, if the user thread is not paused, it means that garbage will continue to be generated during the period and will never be cleaned up.
Secondly, the operation of the user thread will inevitably lead to changes in the reference relationship of the object, which will lead to two situations: missing and mismarking.
Missing mark
Originally, it is not garbage, but in the process of GC, the user thread modifies its reference relationship, so that the GC Roots is unreachable and becomes garbage. This situation is a little better, nothing more than some floating garbage, next time GC will clean it up.
Wrong mark
It was originally garbage, but in the process of GC, the user thread will repoint the reference to it, and if GC reclaims it, it will cause the program to run wrong. Tricolor marking algorithm
Why can CMS GC threads work with user threads? Using tricolor marking to solve the problem
The process of marking is roughly as follows:
At first, all the objects were white and were not accessed.
Sets the objects directly associated with the GC Roots to gray.
Traverses all references to the gray object, setting the gray object itself to black and the reference to gray.
Repeat step 3 until there are no gray objects.
At the end, the black object survives and the white object is recycled.
The premise that this process is executed correctly is that no other thread changes the reference relationship between objects. However, in the process of concurrent marking, the user thread is still running, so missing and mismarking will occur.
Missing mark
Suppose GC is already traversing object B, and the user thread performs the operation of A.B=null, cutting off the reference from A to B.
After the implementation of A.B=null, B, D, and E can all be recycled, but because B has turned gray, it will still be used as a living object and continue to traverse.
The end result is that the current round of GC will not recycle B, D, E, and save it for the next GC, which is also part of the floating garbage.
In fact, this problem can still be solved through the "write barrier", as long as the write barrier is added when A writes B, the records of B being cut off are recorded, and they can be marked as white when they are re-marked.
Wrong mark
Assuming that the GC thread has traversed to B, the user thread does the following:
B. references to B.During nullplash B to D are cut off. References to A.xxdestroy Dentax raceme A to D are established
When will this situation be established? If an object loses its reference, it can be referenced by other objects.
He is using reachability analysis from B.D = null, but other threads still use D, so he can still quote D.
References from B to D are cut off, and references from A to D are established.
At this point, the GC thread continues to work, because B no longer refers to D, although A refers to D again, but because A has been marked black, GC will no longer traverse A, so D will be marked white and will finally be treated as garbage collection.
You can see that the result of the wrong mark is much more serious than the leaking table. Floating garbage can be cleaned up next time by GC, and recycling the objects that should not be recycled will cause the program to run wrong.
Mislabeling occurs only if the following two situations are met:
As long as any condition is broken, the problem of wrong standard can be solved.
Original snapshot and incremental UPDAT
The original snapshot breaks the first condition: when the reference of the gray object to the white object is broken, the reference relationship is recorded. When the scan is over, scan again with these gray objects as the root. This means that regardless of whether the reference relationship is deleted or not, it will be scanned according to the snapshot of the object graph at the beginning of the scan.
The incremental update breaks the second condition: when a black-to-white reference is established, the new reference relationship is recorded, and then re-scanned with the black objects in these records as the root after the scan is over. It is equivalent to a black object that becomes a gray object once a reference to a white object is established.
Write barrier this write barrier does not refer to the write barrier in concurrent programming. The write barrier here refers to the addition of some processing before and after the attribute assignment, similar to AOP.
The solution adopted by CMS is: write barrier + incremental update to achieve, breaking the second condition.
When a black-to-white reference is established, the reference relationship is recorded through the write barrier, and then re-scanned with the black object in the reference relationship as the root after the scan is over.
Why pause the user thread when GC?
Is there any way to eliminate STW completely? It seems that when it is re-marked, it can not avoid the wrong mark, so it is necessary to STW
|-references from black to white are updated incrementally. |
Wrong mark-- > | |-- > write barrier
| |-references that point to white are all destroyed by the original snapshot |
Weak trichromatic invariant: all white objects referenced by black objects are protected by gray.
Strong tricolor invariant: there is no pointer to execute a white object from a black object.
A solution based on the original snapshot: a weak tricolor invariant that marks the target of An as gray.
Solution based on incremental update: strong trichromatic invariant, marking the target of An as black.
Shortcomings of CMS
Although CMS is a landmark garbage collector that sets a precedent for GC threads and user threads to work at the same time, CMS is never the default garbage collector no matter which version of JDK it is. The reason is that CMS is not perfect and has some shortcomings.
CMS fragmentation problem, when there is no free space, use SeralOld single thread to organize the space, why not Parallel Old?
Garbage collector set
Serial collector: serial + serial old
Parallel collector: Paraller Scanvenge + Paraller Old
Concurrent tag cleanup collector combination: ParNew + CMS + serial old
G1 garbage collector
Introduced after Java 7 update4, Jdk9 defaults to
Logical generation, physical non-generation
G1 divides the Java heap into independent regions of equal size, and the JVM can have up to 2048 Region. The general Region size is equal to the heap size divided by 2048. For example, if the heap size is 4096m, the Region size is 2m
The operation of the G1 collector:
Initial tag (Initial Marking): marks the object to which the GC Roots can be directly associated, and modifies the value of the TAMS pointer so that when the user thread runs concurrently in the next phase, the new object can be correctly allocated in the available Region, which requires a short pause thread, but is done synchronously when borrowing Minor GC, so there is actually no additional pause at this stage
Concurrent marking (Concurrent Marking): start with GC Roots to analyze the reachability of objects in the heap and recursively scan the object graph in the whole heap to find out the objects to be recycled. This stage takes a long time, but can be executed concurrently with user programs.
Final Final Marking: another brief pause is made to the user thread, and the user processes the last small amount of SATB records left over after the end of the concurrent phase
Filter recovery (Live Data Counting and Evacuation): responsible for updating the statistics of Region, sorting the recovery value and cost of each Region, and making a recovery plan according to the expected pause time of the user lock. You can only select any number of Region to constitute the collection, and then assign the surviving objects of the part of the Region that determines the recovery to the empty Region, and then clean up all the space of the entire Region.
According to the expected pause time, the estimated proportion of recycling may not be able to keep up with the allocation of objects. And then trigger Full GC.
A picture allows you to understand JVM's garbage collection algorithm in detail.
At this point, I believe you have a deeper understanding of "what is the reason for the need for stop the world in the process of GC?" you might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.