This article works through a set of common JVM garbage collection questions in a question-and-answer style. The explanations are meant to be short and practical.
Can't tell young GC, old GC, full GC, and mixed GC apart?
Answering this presupposes that you know why the heap is split into generations, which was covered in the previous article; take a look there first if it is not clear.
Now let's answer this question.
In fact, GC is divided into two categories, namely Partial GC and Full GC.
Partial GC collects only part of the heap and is further divided into young GC, old GC, and mixed GC.
Young GC: collects only the young generation.
Old GC: collects only the old generation.
Mixed GC: specific to the G1 collector; it collects the entire young generation plus some old-generation regions.
Full GC collects the entire heap: the young generation, the old generation, and the permanent generation if there is one.
There is also the term Major GC. In "In-Depth Understanding of the Java Virtual Machine" it refers to old-generation collection, i.e. it is equivalent to old GC, but plenty of other material treats it as equivalent to full GC.
Minor GC, in turn, refers to young-generation GC.
What are the trigger conditions for young GC?
The general answer is that a young GC is triggered when the Eden space of the young generation is about to fill up.
Why only "in general"? Because some collectors' full GC implementations run a young GC right before the full GC.
Parallel Scavenge does this, for example, although a parameter (the ScavengeBeforeFullGC flag) can be set so that it skips that young GC.
Other implementations may do something similar, but in the normal case you can treat it as: young GC is triggered when Eden is nearly full.
Eden "filling up" is detected in two situations: there is not enough room to allocate an object, or not enough room to allocate a TLAB.
What are the trigger conditions for full GC?
There are quite a few; let's go through them.
Around a young GC: if statistics from previous collections show that the average amount promoted out of the young generation is larger than the remaining space in the old generation, a full GC is triggered.
If there is a permanent generation and it fills up, a full GC is also triggered.
Insufficient old-generation space: large objects are allocated directly in the old generation, and if it cannot accommodate them, a full GC is triggered.
Promotion (allocation guarantee) failure: the survivor "to" space cannot hold the objects copied from Eden and the "from" space, or objects whose GC age has reached the tenuring threshold must be promoted; if the old generation cannot accommodate them either, a full GC is triggered.
Calling System.gc(), or running commands such as jmap -dump:live, will also trigger a full GC.
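As a small illustration (assuming default JVM settings; ExplicitGcDemo and the allocation sizes are made up for the example), the snippet below provokes a full GC via System.gc(); running it with -verbose:gc or -XX:+PrintGCDetails makes the collection visible, and -XX:+DisableExplicitGC would turn the call into a no-op.

    // Illustrative only: request a full GC explicitly and observe it with -verbose:gc.
    public class ExplicitGcDemo {
        public static void main(String[] args) {
            byte[][] garbage = new byte[64][];
            for (int i = 0; i < garbage.length; i++) {
                garbage[i] = new byte[1 << 20];  // allocate ~64 MB that will soon become garbage
            }
            garbage = null;                      // drop the references
            System.gc();                         // requests a full GC (ignored under -XX:+DisableExplicitGC)
        }
    }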
Do you know what a TLAB is? Let's talk about it.
This starts with how memory allocation requests work.
Generally, a newly created object needs memory from the young generation of the heap, and the heap is shared by all threads. Young-generation memory is contiguous and carved out by a single allocation pointer.
Because memory is compact, creating an object just moves that pointer forward by the object's size. This is called bump-the-pointer allocation.
You can imagine that if many threads allocate objects at the same time, this pointer becomes a hot resource that has to be protected by mutual exclusion, so allocation becomes slow.
Hence the TLAB (Thread Local Allocation Buffer): a region of memory that a thread reserves for its own allocations.
Only the owning thread allocates objects in this area, although every thread can still read the objects stored there.
The idea behind the TLAB is simple: carve out a region per thread, so that each thread allocates object memory on its own plot and does not have to compete for the hot pointer.
When that region is used up, the thread simply requests another one.
The idea is very common; a distributed ID generator works the same way: instead of fetching one ID at a time, it fetches a batch and requests another batch only once the current one is exhausted.
So each thread has its own allocation area, with its own allocation pointer inside its TLAB.
When a TLAB is used up a new one is requested, but the size requested each time is not fixed: it adapts based on the thread's history since it started. A thread that has been allocating heavily gets larger TLABs; a thread that hardly allocates gets smaller ones.
TLABs also waste some space. Suppose only one slot is left in a TLAB but the object being allocated needs two: a new TLAB has to be requested, and the leftover slot in the old one is wasted.
In HotSpot a filler (dummy) object is allocated to plug that leftover space, because the heap must support linear traversal: the traversal reads an object's size from its header and skips ahead by that size to find the next object, so there can be no holes.
Of course, traversal could also be supported by external bookkeeping such as a free list.
Also, a TLAB can only hold small objects; large objects still have to be allocated in the shared Eden space.
So, overall, the TLAB exists to avoid contention during object allocation.
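To make the idea concrete, here is a minimal sketch of the pattern (purely illustrative, with made-up names and sizes; it is not how HotSpot actually implements TLABs): each thread bump-allocates inside its own buffer and only touches the shared, contended counter when it needs a fresh buffer.

    // Illustrative bump-the-pointer allocation with per-thread buffers.
    import java.util.concurrent.atomic.AtomicLong;

    class TlabSketch {
        static final AtomicLong SHARED_EDEN_TOP = new AtomicLong(0);  // the "hot" shared pointer
        static final long TLAB_SIZE = 256 * 1024;

        // each thread starts with its own [top, end] buffer carved out of the shared space
        static final ThreadLocal<long[]> TLAB = ThreadLocal.withInitial(() -> {
            long start = SHARED_EDEN_TOP.getAndAdd(TLAB_SIZE);
            return new long[] { start, start + TLAB_SIZE };
        });

        // allocate `size` bytes (size <= TLAB_SIZE; large objects would bypass the TLAB entirely)
        static long allocate(long size) {
            long[] tlab = TLAB.get();
            if (tlab[0] + size > tlab[1]) {                          // buffer exhausted: fetch a new one,
                long start = SHARED_EDEN_TOP.getAndAdd(TLAB_SIZE);   // wasting whatever was left over
                tlab[0] = start;
                tlab[1] = start + TLAB_SIZE;
            }
            long address = tlab[0];
            tlab[0] += size;                                         // the fast path: just bump the pointer
            return address;
        }
    }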
Do you know what a PLAB is?
The name alone suggests it is very similar to the TLAB: PLAB stands for Promotion Local Allocation Buffer.
It is used when objects are promoted from the young generation to the old generation.
When a YGC runs with multiple parallel GC threads, many objects may need to be promoted to the old generation at once, and the old generation's allocation pointer becomes "hot"; hence the PLAB.
Each GC thread first claims a chunk of space from the old generation's freelist (free list), then allocates within that chunk by bumping a pointer; this reduces contention on the freelist and makes allocation fast.
Roughly: each GC thread first claims a PLAB, then places the objects it promotes inside that chunk.
The idea is the same as the TLAB's.
The real cause of concurrent mode failure
"in-depth understanding of the Java virtual machine": because the CMS collector cannot handle "floating garbage" (FloatingGarbage), it is possible that a "Con-current Mode Failure" failure may result in another full "Stop The World" Full GC.
This passage means that a Full GC is caused by throwing this error.
In fact, it was Full GC that caused the error. Let's take a look at the source code. The version is openjdk-8.
First of all, search for this mistake.
And find out who called report_concurrent_mode_interruption.
It was found in void CMSCollector::acquire_control_and_collect (...) That is called in this method.
Let's take a look at first_state: CollectorState first_state = _ collectorState
The enumeration is already clear, even before the end of cms gc.
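For orientation, the CollectorState enumeration lives in the openjdk-8 CMS source (concurrentMarkSweepGeneration.hpp, in C++); the paraphrase below rewrites it as a Java enum from memory, so treat the exact names and ordering as approximate.

    // Approximate paraphrase of openjdk-8's CMS CollectorState (the original is C++).
    enum CollectorState {
        Resizing,
        Resetting,
        Idling,              // no background CMS cycle in progress
        InitialMarking,
        Marking,
        Precleaning,
        AbortablePreclean,
        FinalMarking,
        Sweeping
    }
    // If memory serves, acquire_control_and_collect reports the concurrent mode interruption when
    // first_state is past Idling, i.e. the background cycle was still somewhere between marking and sweeping.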
acquire_control_and_collect is the method CMS uses to execute a foreground GC.
CMS collections are divided into foreground GC and background GC.
Foreground GC is in fact the Full GC.
So the error is thrown because a full GC happens while the background CMS GC is still in progress.
The usual cause is an allocation rate so high that the heap fills up before the concurrent collection can reclaim enough space, forcing a full GC.
It can also be that the occupancy threshold for initiating the CMS GC is set too high, so the concurrent cycle starts too late.
Why is the full GC that follows a CMS concurrent mode failure single-threaded?
The following answer comes from R大 (RednaxelaFX).
Because there were not enough development resources, so they cut a corner. It's that simple; there is no technical obstacle. Some big companies have made this optimization in their own internal builds.
So how did this corner come to be cut in the first place? CMS GC has had a troubled history. It was originally designed and implemented as a low-latency GC for Sun Labs' Exact VM.
However, Exact VM lost the internal contest with HotSpot VM to become Sun's official JVM, and CMS GC was later ported to HotSpot VM as part of Exact VM's technical legacy.
While that port was still in progress Sun was already starting to decline, and by the time CMS GC was fully ported to HotSpot VM, Sun was on its last legs.
With shrinking development resources and departing developers, the HotSpot VM team of the day could only pick the most important work to do. By then another Sun Labs GC, Garbage-First GC (G1 GC), had emerged.
Compared with CMS, which can suffer from fragmentation after running for a long time, G1 was considered to have more potential, since it incrementally compacts the heap and thereby avoids fragmentation.
So part of the already scarce development resources went into productizing G1 GC, and even that progressed slowly.
After all, only one or two people were working on it. There were simply not enough resources left to polish CMS GC's supporting machinery, and parallelizing its backup full GC kept being postponed.
Some readers will surely ask: doesn't HotSpot VM already have parallel GCs? Several of them, even?
Let's take a look:
ParNew: parallel young gen GC, not responsible for collecting the old gen.
Parallel GC (Parallel Scavenge): parallel young gen GC, similar to ParNew but not compatible with it; also not responsible for collecting the old gen.
Parallel Old GC (PS Compact): parallel full GC, but not compatible with ParNew / CMS.
So... that's the whole story.
HotSpot VM does have parallel GCs, but the first two only collect the young gen during young GC, and of those only ParNew can be used together with CMS.
There is a parallel full GC, Parallel Old, but it is not compatible with CMS GC and therefore cannot serve as its backup full GC.
Why can't some young- and old-generation collectors be combined, for example ParNew and Parallel Old?
There is a well-known diagram of the legal collector pairings drawn by a member of HotSpot's GC team in 2008; G1 was still under development at the time, which is why it appears with a question mark.
The answer is:
"ParNew" is written in a style... "Parallel Old" is not written in the "ParNew" style
HotSpot VM's own implementation of generational collectors has a framework that only implementations within the framework can work with each other.
And there is a developer he does not want to implement according to this framework, he wrote one himself, and the test results were good and then absorbed by HotSpot VM, which led to incompatibility.
I saw a very vivid answer before: just like the EMU can't wear a green car, the electricity and hooks don't match.
How does young-generation GC avoid scanning the whole heap?
In typical generational GCs, a remembered set is used to record where the old generation may hold references into the young generation, so that the whole heap does not have to be scanned.
Remembered sets come in different precisions, for example object precision and card precision; the card-precision variant is called the card table.
The heap is divided into many 512-byte blocks (card pages), and one element of a byte array represents each block; a value of 1 marks the block as dirty, meaning it contains cross-generational references.
HotSpot's implementation is the card table, which is maintained by a post-write barrier, sketched below.
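The original pseudocode is not reproduced here, but a common rendering of the idea looks like the sketch below (Java-flavoured, illustrative names; HotSpot emits this barrier as compiled code): after every reference store, the 512-byte card covering the written field is marked dirty, with no filtering.

    // Minimal card-table sketch with its post-write barrier (illustrative, not HotSpot source).
    class CardTableSketch {
        static final int CARD_SHIFT = 9;   // 2^9 = 512-byte card pages
        static final byte DIRTY = 1;       // 1 = dirty, matching the description above
        final byte[] cards;

        CardTableSketch(long heapBytes) {
            cards = new byte[(int) (heapBytes >>> CARD_SHIFT) + 1];
        }

        // Executed right after a reference store such as obj.field = value:
        // dirty the card covering the written field, without any generation checks.
        void postWriteBarrier(long fieldAddress) {
            cards[(int) (fieldAddress >>> CARD_SHIFT)] = DIRTY;
        }
    }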
CMS only needs references from the old generation to the young generation recorded, yet the write barrier implementation does no conditional filtering at all.
In other words, it does not check whether the object being written is in the old generation, nor whether the stored value points to the young generation; it simply marks the corresponding card as dirty.
Any reference store dirties the card of the object being written; of course, when YGC scans, it only looks at the cards covering the old generation.
This keeps the write barrier as cheap as possible, which matters because reference stores happen extremely frequently.
What is the difference between the remembered set of CMS and that of G1?
CMS's remembered set is implemented as a card table.
A remembered set is usually implemented as a points-out structure: it records cross-generational references from the non-collected area into the collected area, and since its subject is the non-collected area, it "points out" of it.
In CMS there is only the card table covering old-to-young references, used for young-generation GC.
G1 is region-based, so on top of the points-out card table it adds a points-into structure.
A region needs to know which other regions hold pointers into it, and, within those regions, which cards hold the pointers.
In effect, a G1 remembered set is a hash table: the key is the start address of another region, and the value is a collection holding the indexes of that region's relevant cards.
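A minimal sketch of that shape (illustrative only; the real per-region RSet in G1 is considerably more elaborate, with several granularities) might look like this:

    // Illustrative per-region remembered set: which cards of which other regions point into me?
    import java.util.*;

    class RegionRememberedSetSketch {
        // key: start address of a referencing region; value: indexes of its relevant cards
        final Map<Long, Set<Integer>> pointsInto = new HashMap<>();

        void recordIncomingReference(long fromRegionStart, int cardIndex) {
            pointsInto.computeIfAbsent(fromRegionStart, k -> new HashSet<>()).add(cardIndex);
        }
    }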
Because updating the remembered set on every reference-field assignment would be expensive, G1's implementation uses a logging write barrier (described below).
This is again an asynchronous idea: changes are first recorded in a queue, and once the queue passes a certain threshold, a background thread keeps draining it and updating the remembered set.
Why doesn't G1 maintain remembered sets from the young generation to the old generation?
G1 collections are divided into young GC and mixed GC.
A young GC selects all young-generation regions for collection.
A mixed GC selects all young-generation regions plus some old-generation regions with a high collection payoff.
Since the young-generation regions are always inside the collection scope, there is no need to record references going from the young generation into the old generation.
How do CMS and G1 keep concurrent marking correct?
The previous article analyzed the two conditions that must both hold for concurrent marking to miss a live object:
A new reference is inserted from an already-scanned object, i.e. a black object now points to a white object.
The reference from a gray object to that white object is removed.
CMS and G1 break these conditions with incremental update and SATB respectively, keeping marking correct while GC threads run concurrently with the application threads.
CMS uses incremental update, which breaks the first condition: its write barrier turns the newly referenced white object gray, i.e. records it so that it is re-scanned during the remark phase, preventing a missed mark.
G1 uses SATB (snapshot-at-the-beginning), which breaks the second condition: its write barrier records the old reference being overwritten, and those recorded references are scanned again later.
The English name says it all: whatever is reachable at the moment GC begins is treated as alive, as if a snapshot of the object graph had been taken.
Objects newly allocated during the GC are also considered alive. Each region maintains TAMS (top at mark start) pointers: prevTAMS and nextTAMS record where the region's top was at the start of the previous and the current concurrent marking cycle, respectively.
The top pointer is where new objects are allocated in the region, so everything allocated between nextTAMS and top is implicitly treated as alive.
By contrast, CMS with incremental update has to rescan all thread stacks and the entire young generation in the remark phase, because the effective root set has grown since the initial mark; if the young generation holds many objects this rescanning is time-consuming.
Note that this phase is STW, which makes it critical, so CMS also provides the CMSScavengeBeforeRemark parameter to force a YGC before remark, leaving less of the young generation to rescan.
With SATB, G1 only needs to process the old references recorded by SATB in its final marking phase, so it is faster than CMS in this respect; the price is more floating garbage than CMS.
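To summarize the two barriers side by side, here is a Java-flavoured pseudocode sketch (hypothetical helper names, not HotSpot source): incremental update hooks in after the store, SATB hooks in before it.

    // Illustrative only: the two write-barrier flavours that keep concurrent marking correct.
    class ConcurrentMarkingBarriersSketch {
        volatile boolean markingActive;   // true while a concurrent marking cycle is running

        // Incremental update (CMS): post-write barrier; breaks condition 1
        // (a black object gained a reference to a white object) by re-greying it.
        void postWriteBarrierIncrementalUpdate(Object newValue) {
            if (markingActive && newValue != null) {
                recordForRescan(newValue);   // e.g. dirty a card / push onto the mark stack
            }
        }

        // SATB (G1): pre-write barrier; breaks condition 2
        // (the reference from a gray to a white object is about to be overwritten) by logging the old value.
        void preWriteBarrierSatb(Object oldValue) {
            if (markingActive && oldValue != null) {
                satbEnqueue(oldValue);       // keeps the snapshot-at-the-beginning alive
            }
        }

        void recordForRescan(Object o) { /* hypothetical plumbing */ }
        void satbEnqueue(Object o)     { /* hypothetical plumbing */ }
    }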
What is a logging write barrier?
A write barrier is logic executed on every reference assignment, and since assignments happen extremely frequently it eats into application performance; hence the logging write barrier.
The idea is to move part of the barrier's work onto background threads, softening the impact on the application.
Inside the barrier, all that happens is that a log record is appended to a queue; background threads later pull records off the queue and finish the remaining work. It is, again, asynchronous thinking.
Take the SATB write barrier: each Java thread has its own fixed-length SATBMarkQueue, and the barrier only pushes the overwritten (old) references into it; when the queue fills up, it is handed over to the global SATBMarkQueueSet.
A background thread monitors the global set and, once it exceeds a certain threshold, processes the queued references and continues tracing from them.
The write barrier that maintains the remembered set uses the same logging approach; a small sketch of the pattern follows.
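A minimal sketch of the queueing pattern itself (made-up names and sizes; the real SATBMarkQueue is a fixed-size native buffer): the barrier only appends to a thread-local buffer, and full buffers are handed to a global set that a background thread drains.

    // Illustrative logging-barrier plumbing: cheap thread-local append, asynchronous processing.
    import java.util.ArrayList;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    class LoggingBarrierSketch {
        static final int BUFFER_SIZE = 256;
        static final Queue<Object[]> GLOBAL_SET = new ConcurrentLinkedQueue<>();
        static final ThreadLocal<ArrayList<Object>> LOCAL_BUFFER =
                ThreadLocal.withInitial(() -> new ArrayList<>(BUFFER_SIZE));

        // The "barrier": record and return, keeping the mutator's fast path cheap.
        static void log(Object overwrittenReference) {
            ArrayList<Object> buffer = LOCAL_BUFFER.get();
            buffer.add(overwrittenReference);
            if (buffer.size() >= BUFFER_SIZE) {   // full: hand the whole batch to the global set
                GLOBAL_SET.add(buffer.toArray());
                buffer.clear();
            }
        }

        // Background thread: drain batches and do the heavy work (continue tracing, update remembered sets, ...).
        static void drain() {
            Object[] batch;
            while ((batch = GLOBAL_SET.poll()) != null) {
                for (Object reference : batch) {
                    // process the recorded reference
                }
            }
        }
    }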
Briefly describe G1's collection process.
At a high level, a G1 collection has two big stages: concurrent marking and object copying (evacuation).
Concurrent marking is based on SATB and has four phases:
1. Initial marking: STW; scan the root set and mark the objects directly reachable from the roots. G1 records marks in an external bitmap rather than in the object headers.
2. Concurrent marking: runs concurrently with the application threads. Tracing starts from the objects marked in the previous step and recursively scans everything reachable; SATB also records the references that change during this phase.
3. Final marking: STW; processes the references recorded by SATB.
4. Cleanup: STW; the marking bitmap is used to count live objects per region, and regions with no live objects at all are reclaimed wholesale.
The object copy phase (evacuation) is STW.
Based on the marking results, suitable regions are selected to form the collection set (CSet), and the live objects in the CSet are copied to new regions.
G1's bottleneck is this object copy phase, since moving objects is what takes the most time.
Briefly describe CMS's collection process.
You can actually read the phases straight off the CollectorState enumeration shown in an earlier question.
1. Initial mark: STW; scan the root set and mark the objects directly reachable from the roots.
2. Concurrent marking: runs concurrently with the application threads; tracing starts from the directly reachable objects marked in the previous step and recursively scans everything reachable.
3. Concurrent precleaning: concurrent with the application threads; it does part of the remark phase's work in advance, such as scanning dirty cards and objects newly promoted to the old generation, because remark is STW and every bit offloaded helps.
4. Abortable preclean (AbortablePreclean): essentially the same as the previous phase, again offloading remark work.
5. Remark: STW; because references changed during the concurrent phases, the young generation, GC Roots, card tables, and so on are traversed again to correct the marks.
6. Concurrent sweep: concurrent with the application threads; reclaims the garbage.
7. Concurrent reset: concurrent with the application threads; resets CMS's internal state.
CMS's bottleneck is the remark phase, where the rescanning can take a long time.
Does CMS's write barrier maintain incremental update records as well as the card table?
There is in fact only one card table, and one card table alone cannot serve both YGC and CMS's concurrent incremental update.
Every YGC scans and then resets the card table, which would wipe out the incremental-update records.
So a mod-union table was added: during concurrent marking, whenever YGC is about to reset a card-table entry, the corresponding position in the mod-union table is set first.
That way, the CMS remark phase can combine the current card table with the mod-union table to process the incremental updates and avoid missing live objects.
What are the two major goals of GC tuning?
They are minimal pause time and throughput.
Minimal pause time: a GC STW pause stops all application threads, which users experience as stutter, so reducing STW time is key for latency-sensitive applications.
Throughput: for latency-insensitive applications, such as back-end batch computation, throughput is the focus; they care not about the length of any single GC pause but about keeping total pause time low and throughput high.
For example:
Scheme 1: each GC pauses 100 ms, 5 pauses per second.
Scheme 2: each GC pauses 200 ms, twice per second.
Scheme 1 pauses for 100 ms x 5 = 500 ms per second and scheme 2 for 200 ms x 2 = 400 ms per second, so scheme 1 has the lower per-pause latency while scheme 2 leaves more time for useful work, i.e. higher throughput; you generally cannot have both at once.
So before tuning, be clear about which goal the application actually cares about.
How do you tune GC?
This question comes up often in interviews; hold on to the core idea.
Today's collectors are all generational, and the core tuning idea is to let objects be reclaimed in the young generation as much as possible, keep too many objects from being promoted to the old generation, and reduce the allocation of large objects.
You then balance generation sizes, collection frequency, and pause times.
Monitor GC thoroughly: the occupancy of each generation, YGC frequency, Full GC frequency, object allocation rate, and so on.
Then optimize based on what you actually observe.
For example, an inexplicable Full GC may be a third-party library calling System.gc().
Frequent Full GCs may be caused by an unsuitable memory threshold for triggering CMS GC, leaving objects unable to be allocated in time.
Other knobs include the object promotion age threshold, survivor spaces that are too small, and so on. Each case has to be analyzed on its own merits, but the core idea stays the same; a sketch of the typical flags involved follows.
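As a hedged example of where these knobs live, a JDK 8 CMS-style command line might look like the one below; every flag shown is a standard HotSpot option, but the values are placeholders to be derived from your own monitoring data, and MyApp is a stand-in for the real main class.

    java -Xms4g -Xmx4g -Xmn1g \
         -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=10 \
         -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
         -XX:+CMSScavengeBeforeRemark -XX:+DisableExplicitGC \
         -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log \
         MyApp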
That wraps up these JVM garbage collection questions; the best way to consolidate them is to try the flags and tools above in practice.