How to Understand the JVM ZGC Garbage Collector

2025-01-15 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01 Report


If some of the concepts below are unfamiliar, you may want to first read the earlier articles "In-depth understanding of JVM: garbage collectors" and "In-depth understanding of JVM: the Shenandoah garbage collector".

ZGC (Z Garbage Collector) is a garbage collector developed by Oracle with low latency as its primary goal. It uses a region-based memory layout with dynamically sized Regions, does not (for now) divide the heap into generations, and relies on read barriers, colored pointers, and memory multi-mapping to implement a concurrent mark-compact algorithm. Introduced as an experimental feature in JDK 11, its headline characteristics are the ability to manage terabyte-scale heaps (up to 4 TB) and pause times of no more than 10 ms.
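Because ZGC is experimental in JDK 11 (and, in that release, available only on Linux/x64), it must be unlocked explicitly. The small program below is a sketch for checking which collector the running JVM actually uses; the launch line in the comment uses the JDK 11 flags `-XX:+UnlockExperimentalVMOptions -XX:+UseZGC`, while the class name `GcInfo` and the 16 GB heap size are only illustrative choices. It relies on the standard `java.lang.management` API; the bean names it prints differ between collectors and JDK versions, so treat the output as informational.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative launch on JDK 11, where ZGC is experimental:
//   java -XX:+UnlockExperimentalVMOptions -XX:+UseZGC -Xmx16g GcInfo
final class GcInfo {
    /** Names of the garbage collectors the running JVM reports via JMX. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Under ZGC this lists ZGC-specific beans; exact names vary by JDK version.
        System.out.println(collectorNames());
    }
}
```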

Dynamic Region

ZGC Regions come in three size classes, small, medium, and large, as shown in the figure:

Small Region: fixed capacity of 2 MB, used for small objects smaller than 256 KB.

Medium Region: fixed capacity of 32 MB, used for objects of at least 256 KB but smaller than 4 MB.

Large Region: the capacity is not fixed and can vary dynamically, but it must be a multiple of 2 MB; it holds large objects of 4 MB or more. Each large Region holds exactly one large object, so its capacity can be as small as 4 MB, which means a large Region may actually be smaller than a medium Region. ZGC never relocates large Regions, because copying a large object is very expensive.
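The size thresholds above can be expressed as a small classification function. This is an illustrative sketch, not ZGC's source: the class `RegionSizing` and its method names are hypothetical, and it simply encodes the 256 KB and 4 MB cut-offs plus the rule that a large Region is rounded up to a multiple of 2 MB.

```java
final class RegionSizing {
    static final long KB = 1024L, MB = 1024L * KB;
    static final long SMALL_REGION = 2 * MB;    // fixed small Region capacity
    static final long MEDIUM_REGION = 32 * MB;  // fixed medium Region capacity
    static final long SMALL_LIMIT = 256 * KB;   // objects below this go to small Regions
    static final long MEDIUM_LIMIT = 4 * MB;    // objects below this go to medium Regions

    enum Kind { SMALL, MEDIUM, LARGE }

    static Kind kindFor(long objectSize) {
        if (objectSize < SMALL_LIMIT) return Kind.SMALL;
        if (objectSize < MEDIUM_LIMIT) return Kind.MEDIUM;
        return Kind.LARGE;
    }

    /** Capacity of the Region that would hold an object of this size. */
    static long regionCapacityFor(long objectSize) {
        switch (kindFor(objectSize)) {
            case SMALL:
                return SMALL_REGION;
            case MEDIUM:
                return MEDIUM_REGION;
            default:
                // One object per large Region, rounded up to a multiple of 2 MB.
                return ((objectSize + SMALL_REGION - 1) / SMALL_REGION) * SMALL_REGION;
        }
    }
}
```

For example, a 5 MB object lands in its own 6 MB large Region, and a 4 MB object gets a 4 MB Region, which is smaller than a 32 MB medium Region.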

Colored pointer technique

HotSpot collectors store marking information in one of several places:

(1) Directly in the object header (for example, the Serial collector).

(2) In a data structure separate from the objects (for example, G1 and Shenandoah record marks in a structure called a BitMap, about 1/64 the size of the heap).

(3) Directly in the pointer that references the object (ZGC).

A colored pointer is a technique for storing a small amount of extra information directly in the pointer itself. On 64-bit Linux, the high 18 bits of a pointer cannot be used for addressing, but the remaining 46 bits can still address 64 TB (2^46 bytes), and so far few systems need that much memory. ZGC therefore takes the highest 4 of those 46 bits as flag bits, and the remaining 42 bits can address 4 TB (2^42 bytes), as shown in the figure:

The high 18 bits of a 64-bit pointer cannot be used for addressing on Linux, so none of them are used.

Finalizable: whether the object is reachable only through a finalizer (the finalize() method) and not by any other path.

Remapped: whether the object has entered the relocation set (that is, whether it has been moved).

Marked1, Marked0: the tri-color marking state of the object.

The low 42 bits store the object address, supporting at most 4 TB.
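The 64-bit layout described above can be sketched with plain bit masks. The exact positions of the four flag bits are an assumption here (Marked0, Marked1, Remapped, and Finalizable in bits 42 to 45, directly above the 42 address bits), and the class and method names are hypothetical; the point is only to show how flags and address share one `long`.

```java
final class ColoredPointer {
    static final int ADDRESS_BITS = 42;                          // low 42 bits: object address
    static final long ADDRESS_MASK = (1L << ADDRESS_BITS) - 1;
    // Assumed positions of the four flag bits (bits 42..45).
    static final long MARKED0     = 1L << 42;
    static final long MARKED1     = 1L << 43;
    static final long REMAPPED    = 1L << 44;
    static final long FINALIZABLE = 1L << 45;

    /** Strips all flag bits, leaving the 42-bit object address. */
    static long address(long ptr) { return ptr & ADDRESS_MASK; }

    /** True if the given flag bit is set on the pointer. */
    static boolean hasFlag(long ptr, long flag) { return (ptr & flag) != 0; }

    /** Same address, carrying only the given color bit (all other flags cleared). */
    static long recolor(long ptr, long color) { return address(ptr) | color; }

    /** 2^42 bytes = 4 TB of addressable heap. */
    static long maxHeapBytes() { return 1L << ADDRESS_BITS; }
}
```

Because the flags sit above the address bits, masking with `ADDRESS_MASK` recovers the plain address whatever color a pointer carries, and `1L << 42` confirms the 4 TB limit (while `1L << 46` would be 64 TB).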

Tri-color marking

Concurrent reachability analysis uses tri-color marking (Tri-color Marking) to track whether each object has been visited by the collector:

White: the object has not yet been visited by the garbage collector. At the start of reachability analysis every object is white; any object still white when the analysis ends is unreachable.

Black: the object has been visited by the garbage collector and all of its references have been scanned. A black object is known to be live and fully scanned, so if another object references it, there is no need to scan it again. A black object can never point directly to a white object (that is, without going through a gray object).

Gray: the object has been visited by the garbage collector, but at least one of its references has not yet been scanned.
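The three colors can be illustrated with a simple, single-threaded worklist marker. This is a sketch under simplifying assumptions: objects are string IDs in a hypothetical adjacency map, nothing runs concurrently (so the object disappearance problem discussed next cannot arise), and gray objects are kept on an explicit stack.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class TriColorMarker {
    enum Color { WHITE, GRAY, BLACK }

    /**
     * graph maps each object ID to the IDs it references (hypothetical model).
     * Returns the final color of every object; anything still WHITE is unreachable.
     */
    static Map<String, Color> mark(Map<String, List<String>> graph, Collection<String> roots) {
        Map<String, Color> color = new HashMap<>();
        for (String obj : graph.keySet()) {
            color.put(obj, Color.WHITE);              // every object starts white
        }
        Deque<String> gray = new ArrayDeque<>();      // visited, references not yet scanned
        for (String root : roots) {
            if (color.get(root) == Color.WHITE) {
                color.put(root, Color.GRAY);
                gray.push(root);
            }
        }
        while (!gray.isEmpty()) {
            String obj = gray.pop();
            for (String ref : graph.getOrDefault(obj, List.of())) {
                if (color.get(ref) == Color.WHITE) {  // first visit: shade it gray
                    color.put(ref, Color.GRAY);
                    gray.push(ref);
                }
            }
            color.put(obj, Color.BLACK);              // all references scanned
        }
        return color;
    }
}
```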

Conceptually, the scan is a wavefront of gray objects advancing from the black region into the white region. When user threads run concurrently with this scan, however, the "object disappearance" problem can occur (a live object is mistakenly treated as garbage), as shown in the figure.

Object disappearance occurs only when both of the following conditions hold at the same time:

The mutator (user thread) inserts one or more new references from a black object to a white object.

The mutator removes all direct or indirect references from gray objects to that white object.

To prevent object disappearance, it is enough to break either condition. Two solutions are in common use:

Incremental Update: breaks the first condition. When a black object gains a new reference to a white object, the new reference is recorded; after the concurrent scan ends, the black objects in the recorded references are scanned again. Put simply, a black object that gains a new reference to a white object turns gray again.

Snapshot At The Beginning (SATB): breaks the second condition. When a gray object is about to delete a reference to a white object, the reference being deleted is recorded; after the concurrent scan ends, the gray objects in the recorded references are scanned again. Put simply, whether or not references are deleted during the scan, the search proceeds according to a snapshot of the object graph taken when the scan began.

Whether the recorded operation is an insertion or a deletion, the virtual machine implements the recording with write barriers. CMS performs concurrent marking based on incremental update, while G1 and Shenandoah use SATB.
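The two write-barrier strategies can be contrasted in a few lines. This is a simplified sketch with hypothetical names (`WriteBarriers`, an `Obj` with a single reference field); for SATB it follows the common approach of logging the referent that is about to be overwritten so that it is still treated as live, rather than reproducing any particular collector's code.

```java
import java.util.ArrayList;
import java.util.List;

final class WriteBarriers {
    enum Color { WHITE, GRAY, BLACK }

    /** Hypothetical heap object with a single reference field. */
    static final class Obj {
        Color color = Color.WHITE;
        Obj field;
    }

    /** Incremental update: breaks condition 1 (a new black -> white reference). */
    static void storeIncrementalUpdate(Obj holder, Obj value, List<Obj> rescanLog) {
        if (holder.color == Color.BLACK && value != null && value.color == Color.WHITE) {
            holder.color = Color.GRAY;   // the black holder is treated as gray again
            rescanLog.add(holder);       // and re-scanned after the concurrent scan ends
        }
        holder.field = value;
    }

    /** SATB: breaks condition 2 (deleting a path to a white object). */
    static void storeSatb(Obj holder, Obj value, List<Obj> snapshotLog) {
        Obj old = holder.field;
        if (old != null && old.color == Color.WHITE) {
            snapshotLog.add(old);        // the overwritten referent is logged and kept alive
        }
        holder.field = value;
    }
}
```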

Three advantages of colored pointers

(1) Once the live objects of a Region have been moved out, the Region can be released and reused immediately, without waiting for every reference to that Region across the heap to be corrected. In theory, this lets ZGC complete a collection as long as a single free Region exists. Shenandoah, by contrast, cannot release the Regions in its collection set until its reference update phase has finished, so in the extreme case where every object in a Region is alive, it needs 1:1 free space to complete the collection.

(2) Colored pointers greatly reduce the number of memory barriers needed during garbage collection; ZGC uses only read barriers.

(3) Colored pointers are highly extensible: they can serve as an extensible storage structure for recording more data about object marking and relocation, leaving room for further performance improvements in the future.

Memory multi-mapping

ZGC uses memory multi-mapping (Multi-Mapping) to map several different virtual memory addresses to the same physical memory address. Because this is a many-to-one mapping, the address space ZGC sees in virtual memory is larger than the actual heap capacity. If the flag bits of a colored pointer are viewed as dividing the address space into segments, then as long as all of these segments are mapped to the same physical memory, colored pointers can be used for ordinary addressing after the multi-mapping translation, as shown in the figure:

Multi-mapping in ZGC is simply a by-product of its use of colored pointers.
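Multi-mapping can be simulated by translating several colored "views" of an address to one backing array. In ZGC the translation is performed by the operating system's virtual memory system, which maps the same physical memory at several virtual address ranges; here a hypothetical `MultiMapping` class and a `byte[]` stand in for that, and the three view bits (42, 43, and 44 for Marked0, Marked1, and Remapped) are an assumed layout.

```java
final class MultiMapping {
    static final long ADDRESS_MASK = (1L << 42) - 1;
    // Three hypothetical views, one per color: Marked0, Marked1, Remapped.
    static final long[] VIEWS = { 1L << 42, 1L << 43, 1L << 44 };

    private final byte[] physical;  // stand-in for the physical heap memory

    MultiMapping(int heapBytes) { physical = new byte[heapBytes]; }

    /** Translates a colored virtual address to an offset in the shared "physical" memory. */
    int translate(long virtualAddress) {
        long view = virtualAddress & ~ADDRESS_MASK;
        for (long v : VIEWS) {
            if (v == view) {
                return (int) (virtualAddress & ADDRESS_MASK);  // every view hits the same bytes
            }
        }
        throw new IllegalArgumentException("address is not in any mapped view");
    }

    void store(long virtualAddress, byte value) { physical[translate(virtualAddress)] = value; }

    byte load(long virtualAddress) { return physical[translate(virtualAddress)]; }
}
```

A write through the Marked0 view is visible through the Marked1 and Remapped views, which is exactly why changing a pointer's color never requires moving data.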

Read barrier

The read barrier (Load Barrier) runs whenever a reference is loaded from the heap. Here its main job is to check the color bits on the pointer to determine whether the object has been moved. If it has not, the object is accessed directly; if it has, the pointer must "self-heal" (that access is slower, but only once), and after self-healing completes, later accesses are no longer slowed down.

Read and write barriers can be thought of as "AOP" (aspect-oriented programming) hooks around object access.
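The check-then-heal behavior of the load barrier can be sketched as follows. The names are hypothetical, and so is the policy: the sketch assumes that in the current phase a pointer carrying the Remapped bit (bit 44, an assumed layout) has the "good" color, and it models the forwarding information as a simple map from old to new address.

```java
import java.util.HashMap;
import java.util.Map;

final class LoadBarrier {
    static final long ADDRESS_MASK = (1L << 42) - 1;
    static final long REMAPPED = 1L << 44;   // assumed "good" color in this phase

    /** Hypothetical forwarding information: old address -> new address. */
    final Map<Long, Long> forwarding = new HashMap<>();
    int slowPaths = 0;                        // how often the slow path ran

    /** A heap field that holds a colored reference. */
    static final class Slot {
        long ref;
        Slot(long ref) { this.ref = ref; }
    }

    /** Runs on every reference load from the heap. */
    long load(Slot slot) {
        long ref = slot.ref;
        if ((ref & REMAPPED) != 0) {
            return ref;                       // fast path: good color, use as-is
        }
        slowPaths++;                          // slow path: stale color
        long addr = ref & ADDRESS_MASK;
        long healed = forwarding.getOrDefault(addr, addr) | REMAPPED; // follow the move, if any
        slot.ref = healed;                    // self-heal: fix the field in place
        return healed;
    }
}
```

Only the first load through a stale field takes the slow path; because the healed pointer is written back into the field, the second load returns immediately.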

The ZGC collection process

The ZGC collection process can be divided into the following four major phases:

Concurrent Mark: as in G1 and Shenandoah, concurrent marking traverses the object graph for reachability analysis, and its initial and final marking steps also involve brief pauses. The whole marking phase updates only the Marked0 and Marked1 flag bits in the colored pointers.

Concurrent Prepare for Relocate: this phase uses specific query conditions to work out which Regions will be cleaned up in this collection and groups them into a relocation set (Relocation Set). ZGC scans all Regions in every collection, accepting a wider scanning scope in exchange for avoiding the cost of maintaining G1's remembered sets.

Concurrent Relocate: relocation is the core phase of ZGC. It copies the live objects in the relocation set to new Regions and maintains a forwarding table (Forward Table) for each Region in the relocation set, recording the mapping from each old object to its new copy. Thanks to colored pointers, ZGC can tell from a reference alone whether an object is in the relocation set. If a user thread concurrently accesses an object in the relocation set, the access is intercepted by the preset memory barrier (the read barrier) and forwarded immediately to the new copy according to the Region's forwarding table, and the reference itself is updated to point directly to the new object. ZGC calls this the "Self-Healing" capability of pointers.

Thanks to self-healing, ZGC's colored pointers slow down only the first access to an old object, whereas Shenandoah's Brooks forwarding pointers add overhead on every access.

Once all live objects of a Region in the relocation set have been copied, the Region can immediately be released for new allocations, but its forwarding table must be kept rather than freed, because stale references may still be resolved through it.

Concurrent Remap: remapping corrects every reference in the heap that still points to an old object in the relocation set. Because references in ZGC are self-healing, this work is not urgent. ZGC cleverly folds the remapping work into the concurrent marking phase of the next garbage collection cycle, which must traverse all objects anyway, thereby saving a separate traversal of the object graph.
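The four phases can be tied together in a compact sketch. Everything here is hypothetical and simplified: Regions are modeled as lists of live object addresses with a fixed object size, a maximum live ratio stands in for the unspecified "specific query conditions", the mark bit is assumed to alternate between Marked0 and Marked1 from one cycle to the next, and copying the object itself is omitted. It shows the order of events: pick the relocation set, record forwarding entries while relocating, free the Region at once, and resolve stale references later through the retained forwarding table.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ZgcCycleSketch {
    static final long MARKED0 = 1L << 42;   // assumed flag positions
    static final long MARKED1 = 1L << 43;

    /** Hypothetical per-Region bookkeeping. */
    static final class Region {
        final int id;
        final long capacity;
        final List<Long> liveObjects;                        // addresses found live by marking
        final Map<Long, Long> forwarding = new HashMap<>();  // old address -> new address
        boolean free = false;

        Region(int id, long capacity, List<Long> liveObjects) {
            this.id = id;
            this.capacity = capacity;
            this.liveObjects = liveObjects;
        }

        double liveRatio(long objectSize) {
            return (double) liveObjects.size() * objectSize / capacity;
        }
    }

    /** Mark: cycle n uses Marked0 if n is even, Marked1 if odd (assumed alternation). */
    static long markColorFor(int cycle) { return (cycle % 2 == 0) ? MARKED0 : MARKED1; }

    /** Prepare for relocate: examine every Region, keep the emptiest ones. */
    static List<Region> selectRelocationSet(List<Region> regions, long objectSize, double maxLiveRatio) {
        List<Region> set = new ArrayList<>();
        for (Region region : regions) {
            if (region.liveRatio(objectSize) <= maxLiveRatio) set.add(region);
        }
        set.sort(Comparator.comparingDouble(r -> r.liveRatio(objectSize)));  // fewest live bytes first
        return set;
    }

    /** Relocate: record where each live object goes, then free the Region at once. */
    static long relocate(Region from, long toBase, long objectSize) {
        long next = toBase;
        for (long oldAddr : from.liveObjects) {
            from.forwarding.put(oldAddr, next);   // copying the object itself is omitted
            next += objectSize;
        }
        from.free = true;   // reusable now; the forwarding table is kept for stale references
        return next;        // next free address in the destination
    }

    /** Remap: resolve a stale reference (done lazily by a load barrier or the next marking). */
    static long forward(Region from, long oldAddr) {
        Long newAddr = from.forwarding.get(oldAddr);
        return newAddr != null ? newAddr : oldAddr;
    }
}
```

Note that `relocate` frees the source Region immediately while its `forwarding` map stays intact; under the assumed mark-bit alternation, the next cycle's marking can recognize pointers still carrying the previous cycle's color and fix them through `forward`.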

Problems with ZGC

The biggest problem with ZGC is floating garbage.

Floating garbage

ZGC's pause times stay below 10 ms, but a full ZGC cycle takes far longer than that. Suppose a whole cycle runs for 10 minutes: because objects are allocated at a high rate, a large number of new objects are created during that time. These objects can hardly be included in the current collection, so they can only be reclaimed by the next one. Objects that can only be collected by the next GC are floating garbage.

ZGC has no notion of generations and must scan the whole heap in every cycle, so short-lived objects that die soon after allocation are not reclaimed promptly.

Solution

At present, the only remedy is to increase the heap size to give the program more breathing room, but this is only a stopgap. Solving the problem fundamentally requires generational collection: new objects are created in a dedicated area, which is then collected more frequently and more quickly.

Official test data

Pause time

In the pause-time tests, ZGC's pauses are not even in the same order of magnitude as those of the other collectors, as shown in the figure:

Throughput

Throughput is ZGC's "weak spot", yet ZGC, designed with low latency as its primary goal, reaches 99% of the throughput of the throughput-oriented Parallel Scavenge collector and surpasses G1, as shown in the figure:

Advantages and disadvantages

Advantages: low pause times, high throughput, and little extra memory overhead during collection.

Disadvantages: floating garbage

© 2024 shulou.com SLNews company. All rights reserved.