What are the differences between the G1 garbage collector and CMS

Updated 2025-01-16. From: SLTechnology News&Howtos (shulou), Development

Shulou (Shulou.com), 06/03

This article looks at the differences between the G1 garbage collector and CMS: how G1 partitions the heap into Regions, the data structures it relies on, and how its collection cycles compare with those of CMS.

1. Region partition

In the collectors that preceded G1, the heap is divided into an Eden area, Survivor areas, and an Old area. Collecting Eden and Survivor is called young-generation garbage collection, and the young and old generations each occupy one contiguous block of memory. G1 instead divides the heap into many equally sized Regions. The Region size can be set with the -XX:G1HeapRegionSize parameter; it must be a power of 2 between 1 MB and 32 MB. If it is not set, the JVM derives it from the average of the initial and maximum heap sizes, aiming for about 2000 Regions (the HotSpot target is 2048). Once chosen, the Region size does not change after startup.
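The sizing rule above can be sketched in a few lines of Java. This is a minimal illustration, not the HotSpot source: the class and method names are made up, the target of 2048 Regions and the round-down-to-a-power-of-2 step are assumptions taken from the description, and the 1 MB to 32 MB bounds are the ones quoted in the text.

```java
final class RegionSizing {
    static final long MB = 1024L * 1024L;
    static final long MIN_REGION = 1 * MB;    // lower bound quoted in the text
    static final long MAX_REGION = 32 * MB;   // upper bound quoted in the text
    static final long TARGET_REGIONS = 2048;  // assumed target ("about 2000")

    /** Region size derived from the average of initial and maximum heap size. */
    static long regionSize(long initialHeapBytes, long maxHeapBytes) {
        long average = (initialHeapBytes + maxHeapBytes) / 2;
        long raw = Math.max(average / TARGET_REGIONS, MIN_REGION);
        long powerOfTwo = Long.highestOneBit(raw); // round down to a power of 2
        return Math.min(powerOfTwo, MAX_REGION);
    }

    public static void main(String[] args) {
        long gb = 1024 * MB;
        System.out.println(regionSize(4 * gb, 4 * gb) / MB + " MB");
    }
}
```

Under these assumptions a 4 GB heap gets 2 MB Regions, while very small and very large heaps are clamped to 1 MB and 32 MB.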

A Region can play one of the following roles:

Eden regions (young generation, Eden)

Survivor regions (young generation, Survivor)

Old regions (old generation)

Humongous regions (for humongous objects)

Free regions (unassigned, also called available regions)

1) G1 is still generational, but the memory of a generation need not be contiguous, and the number of Regions each generation occupies is not fixed. The number of Eden and Survivor Regions changes from one GC to the next. Explicitly setting the young-generation size (for example with -Xmn) is not recommended, because it overrides the pause-time goal.

2) A Region is not permanently tied to one generation. After a young GC, for example, a former Eden Region becomes a free Region, which may later be used for a humongous object and so become a Humongous (H) Region.

3) A humongous object in G1 is one that takes up more than 50% of a Region. Humongous Regions are dedicated to such objects; if one Region cannot hold the object, it is stored in several contiguous Humongous Regions. Because moving humongous objects hurts GC efficiency, they are reclaimed directly once concurrent marking finds that they are no longer alive. In some cases a young GC also reclaims humongous objects.

4) Regions make efficient use of memory. Viewed as a whole, collection behaves like mark-compact, while between Regions it is a copying algorithm: during GC the live objects are copied into free (unassigned) Regions, so no fragmentation is left behind.

5) Like CMS, G1 can resize (grow) the heap based on calculations made during GC, for example in a Full GC. G1, however, prefers expanding the heap over running a Full GC: when an object cannot be allocated, or a humongous object cannot find enough contiguous Regions, G1 first tries to expand the heap to obtain more free Regions. In principle G1 tracks the time spent in GC (young GC and mixed GC included) and tries to keep it low. If possible, it keeps expanding the heap to meet the needs of object allocation and evacuation.

6) G1 offers a predictable pause time. Using heuristics, it estimates how many Regions the young generation needs and how many Regions should be reclaimed. A young GC is triggered when the number of Eden Regions reaches its upper limit, and it reclaims all Eden and Survivor Regions at once. Live objects are moved to a new Survivor Region or to an Old Region; when the destination Region is full, a free Region is tagged as Survivor or Old to take over.
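The humongous-object rule from point 3 comes down to two small calculations. The following sketch uses hypothetical names and assumes the "more than 50% of a Region" definition exactly as stated above.

```java
final class HumongousCheck {
    /** True if the object takes up more than half of one Region. */
    static boolean isHumongous(long objectBytes, long regionBytes) {
        return objectBytes > regionBytes / 2;
    }

    /** Number of contiguous Humongous Regions needed to hold the object. */
    static long regionsNeeded(long objectBytes, long regionBytes) {
        return (objectBytes + regionBytes - 1) / regionBytes; // round up
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(isHumongous(mb + 1, 2 * mb));
        System.out.println(regionsNeeded(5 * mb, 2 * mb));
    }
}
```

With 2 MB Regions, an object of exactly 1 MB is still allocated normally, while a 5 MB object needs three contiguous Regions. This is one reason a larger Region size can reduce the number of humongous allocations.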

2. Important data structures of G1

TLAB (Thread Local Allocation Buffer)

G1 enables the TLAB optimization by default. Under concurrent allocation, threads would otherwise have to compete for space in the shared Eden with CAS operations. With TLABs, each application (mutator) thread first allocates objects in a memory area of its own inside Eden (part of the Java heap). Because each TLAB belongs exclusively to one thread, there is no lock contention and allocation is faster. If the object to be allocated is humongous, the TLAB is not used.
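The difference between shared and thread-local allocation can be illustrated with a toy model. This is only a sketch of the idea, not JVM code: Eden and Tlab are hypothetical classes, and addresses are plain offsets. The shared Eden needs a CAS loop to hand out a chunk, whereas allocation inside a thread's own chunk is a simple pointer bump.

```java
import java.util.concurrent.atomic.AtomicLong;

final class TlabSketch {
    /** Shared Eden: threads compete for chunks with CAS. */
    static final class Eden {
        private final AtomicLong top = new AtomicLong(0);
        private final long end;

        Eden(long sizeBytes) { this.end = sizeBytes; }

        /** Returns the start offset of a new chunk, or -1 if Eden is exhausted. */
        long carve(long bytes) {
            while (true) {
                long current = top.get();
                if (current + bytes > end) return -1;
                if (top.compareAndSet(current, current + bytes)) return current;
            }
        }
    }

    /** Owned by a single thread: bump-pointer allocation, no synchronization. */
    static final class Tlab {
        private long top;
        private final long end;

        Tlab(long start, long sizeBytes) {
            this.top = start;
            this.end = start + sizeBytes;
        }

        /** Returns the offset of the new object, or -1 if the buffer is full. */
        long allocate(long bytes) {
            if (top + bytes > end) return -1; // caller would request a new TLAB
            long address = top;
            top += bytes;
            return address;
        }
    }
}
```

A thread would call carve once to obtain its buffer and then allocate many objects from it without touching shared state again.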

PLAB (Promotion Local Allocation Buffer)

During a young GC, all live objects in Eden are copied to Survivor Regions, and objects already in Survivor Regions may be promoted to the old generation. The age threshold for promotion can be set with -XX:MaxTenuringThreshold. The copying, whether into Survivor or Old Regions, goes through the PLAB of the GC thread doing the work. Each GC thread has its own PLAB.

Collection Sets (CSets)

The CSet is the set of Regions to be reclaimed in a GC. It can contain Regions of any generation. Live objects in the CSet are moved (copied) during the GC, after which the Regions in the CSet become free Regions.

Remembered Sets (RSets)

Every Region has exactly one RSet. It records references from objects in other Regions to objects in this Region, that is, it is a points-in structure. In a young GC, scanning the RSets of the young Regions finds the old-generation objects that reference them, so the whole old generation need not be scanned. In a mixed GC, the RSets of the Old Regions being collected supply the references from other Old Regions in the same way. This improves GC efficiency. Because every GC scans all young Regions anyway, RSets are only needed for old-to-young and old-to-old references.
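A points-in structure can be pictured as a per-Region map from "who references me" to "where in that Region the reference lives". The sketch below is a simplified model with hypothetical names; real RSets use more compact representations.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** RSet of one Region: which cards in other Regions hold references into it. */
final class RememberedSetSketch {
    // source Region id -> card indexes inside that Region that point into this one
    private final Map<Integer, Set<Integer>> incoming = new HashMap<>();

    /** Called when a reference from another Region into this Region is found. */
    void recordReference(int fromRegion, int fromCard) {
        incoming.computeIfAbsent(fromRegion, key -> new HashSet<>()).add(fromCard);
    }

    /** Regions that must be looked at when this Region is collected. */
    Set<Integer> sourceRegions() {
        return incoming.keySet();
    }

    /** Total number of cards to scan instead of the whole old generation. */
    int cardsToScan() {
        int count = 0;
        for (Set<Integer> cards : incoming.values()) {
            count += cards.size();
        }
        return count;
    }
}
```

When this Region is in the Collection Set, the collector only has to scan the recorded cards to find external references, which is the saving the paragraph above describes.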

Card Table

The Java heap is divided into small areas of equal size (typically 128 to 512 bytes) called Cards, and the Card Table keeps track of all of them. The Card Table is a byte array in which one byte describes one Card. When a reference field inside a Card is written, the Card is marked as a dirty card. Cards that are dirtied very often (hot cards) are kept in the hot card cache. Like the Card Table, the hot card cache is a global structure.
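The mapping from a heap address to its Card is a shift, which is what makes the write barrier cheap. The following sketch assumes 512-byte Cards and uses made-up values for the clean and dirty states; the class and method names are hypothetical.

```java
final class CardTableSketch {
    static final int CARD_SHIFT = 9;          // 2^9 = 512-byte cards (assumed)
    static final byte CLEAN = 0, DIRTY = 1;   // illustrative values only

    private final byte[] cards;               // one byte per card

    CardTableSketch(long heapBytes) {
        long cardSize = 1L << CARD_SHIFT;
        cards = new byte[(int) ((heapBytes + cardSize - 1) >>> CARD_SHIFT)];
    }

    /** Index of the card covering the given heap offset. */
    static int cardIndex(long heapOffset) {
        return (int) (heapOffset >>> CARD_SHIFT);
    }

    /** What a write barrier would do after a reference field is written. */
    void onReferenceWrite(long fieldOffset) {
        cards[cardIndex(fieldOffset)] = DIRTY;
    }

    boolean isDirty(int card) {
        return cards[card] == DIRTY;
    }

    int size() {
        return cards.length;
    }
}
```

A 4096-byte heap is covered by 8 cards here, and a write at offset 1030 dirties card 2 only.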

3. Comparison of G1 and CMS

3.1 CMS process

The CMS collector collects only the old generation and is based on the mark-sweep algorithm. It works in four steps:

Initial mark (CMS initial mark): stop-the-world (STW); marks only the objects directly reachable from the GC Roots

Concurrent mark (CMS concurrent mark): runs concurrently with user threads and marks all reachable objects

Remark (CMS remark): STW; corrects the marks of objects that changed while user threads kept running during concurrent marking

Concurrent sweep (CMS concurrent sweep): runs concurrently with user threads and reclaims the garbage

Only the initial mark and remark steps require a stop-the-world pause. The initial mark is very fast because it only marks objects directly reachable from the GC Roots. Concurrent marking is the GC Roots tracing process. Remark corrects the marking records of the objects that changed because the user program kept running during concurrent marking; its pause is generally somewhat longer than the initial mark pause but much shorter than the concurrent marking time.

3.2 CMS benefits

Concurrent collection, low pause

3.3 CMS shortcomings

1) Very sensitive to CPU resources: the concurrent phases do not stop user threads, but the GC threads take CPU away from them and so slow the application down.

2) Cannot handle floating garbage: user threads keep running during the concurrent sweep and keep producing garbage. Because this garbage appears after marking, it can only be cleaned up in the next GC. It is called floating garbage.

3) Mark-sweep leaves a lot of fragmentation: with too many fragments, allocating large objects becomes a problem. The old generation may have plenty of free space in total yet no contiguous block large enough for the current object, so a Full GC has to be triggered early. To address this, CMS provides the switch -XX:+UseCMSCompactAtFullCollection (enabled by default), which compacts memory when a Full GC is performed. The compaction is not concurrent, however, and leads to longer pauses.
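The fragmentation problem in point 3 is easy to reproduce in a toy model: after a sweep, plenty of memory can be free in total while no single free run is large enough. The sketch below models the old generation as an array of fixed-size slots; all names are made up for the illustration.

```java
final class FragmentationSketch {
    /** Number of free slots; occupied[i] is true if slot i holds a live object. */
    static int totalFree(boolean[] occupied) {
        int free = 0;
        for (boolean slotUsed : occupied) {
            if (!slotUsed) free++;
        }
        return free;
    }

    /** Length of the longest run of contiguous free slots. */
    static int largestFreeRun(boolean[] occupied) {
        int best = 0;
        int run = 0;
        for (boolean slotUsed : occupied) {
            run = slotUsed ? 0 : run + 1;
            best = Math.max(best, run);
        }
        return best;
    }

    /** An object of the given size needs that many contiguous free slots. */
    static boolean canAllocate(boolean[] occupied, int slots) {
        return largestFreeRun(occupied) >= slots;
    }

    public static void main(String[] args) {
        boolean[] swept = {true, false, true, false, true, false, true, false};
        boolean[] compacted = {true, true, true, true, false, false, false, false};
        System.out.println(totalFree(swept) + " free, fits 2 slots: " + canAllocate(swept, 2));
        System.out.println(totalFree(compacted) + " free, fits 2 slots: " + canAllocate(compacted, 2));
    }
}
```

Both heaps have the same amount of free space, but only the compacted one can satisfy a request for two contiguous slots. That is the situation that forces CMS into an early Full GC.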

3.4 G1 Young GC

A young GC reclaims only Eden and Survivor Regions. G1 first stops the application (Stop-The-World) and builds a Collection Set, the set of Regions to be reclaimed. For a young GC it contains all Eden and Survivor Regions of the young generation.

1) Stage one, scan the roots. Roots are, for example, the objects referenced by static variables and the local variables on the call chains of executing methods. The root references, together with the external references recorded in the RSets, are the entry points for finding live objects.

2) Stage two, update the RSets. The cards in the dirty card queue are processed and the RSets updated. After this stage, the RSets accurately reflect the old-generation references to objects in the Regions being collected.

3) Stage three, process the RSets. Identify the Eden objects that old-generation objects point to; those objects are considered live.

4) Stage four, copy objects. The object graph is traversed, and live objects in Eden Regions are copied to an empty Survivor Region. A live object in a Survivor Region whose age is below the threshold has its age incremented by 1; once it reaches the threshold, it is copied to an empty Old Region. If there is not enough Survivor space, some objects from Eden are promoted directly to the old generation.

5) Stage five, process references. Soft, Weak, Phantom, Final, and JNI Weak references are handled. At the end, the Eden Regions are empty and the GC is finished. The objects in the destination Regions are laid out contiguously, so copying also compacts memory and reduces fragmentation.
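The copy decision in stage four can be written as a small function. This is a simplified sketch with hypothetical names: it takes the tenuring threshold (see -XX:MaxTenuringThreshold) and the availability of Survivor space as inputs, and ignores everything else the collector considers.

```java
final class EvacuationSketch {
    enum Target { SURVIVOR, OLD }

    /** Where a live young object is copied during a young GC. */
    static Target destination(int age, int tenuringThreshold, boolean survivorHasSpace) {
        if (age >= tenuringThreshold) {
            return Target.OLD;              // old enough: promote
        }
        // Young enough to stay, unless Survivor space has run out.
        return survivorHasSpace ? Target.SURVIVOR : Target.OLD;
    }

    /** Age after the copy: objects that stay in the young generation get one GC older. */
    static int ageAfterCopy(int age, Target target) {
        return target == Target.SURVIVOR ? age + 1 : age;
    }
}
```

An object fresh out of Eden (age 0) normally lands in a Survivor Region with age 1, while the same object goes straight to the old generation when Survivor space is exhausted.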

3.5 G1 concurrent marking

When heap occupancy reaches the IHOP threshold, -XX:InitiatingHeapOccupancyPercent (default 45%), G1 starts a mixed garbage collection cycle. A mixed GC performs a normal young-generation collection and also reclaims some of the Old Regions identified by the background marking threads. Global concurrent marking runs before the mixed GCs.

1) Initial mark (Initial Mark): marks the GC Roots. It is STW and usually piggybacks on the pause of a young GC. The initial mark also sets the NTAMS (next top at mark start) value of every Region.

2) Root region scan (Root Region Scan): starting from the GC root elements determined in the initial mark, scans the Regions they are in for references into the old generation and marks the referenced objects. This phase runs concurrently with the application threads (no STW pause) and must finish before the next young GC starts.

3) Concurrent mark (Concurrent Mark): traverses the whole heap to find all reachable (live) objects. A Region in which every object turns out to be garbage is reclaimed immediately. This phase runs concurrently with the application threads and may be interrupted by young GCs.

4) Remark (Remark): an STW pause that completes the marking cycle. G1 drains the SATB (snapshot-at-the-beginning) buffers, traces live objects that have not been visited yet, and processes references.

5) Cleanup (Cleanup): the last sub-phase. G1 has an STW pause while it does the accounting and scrubs the RSets. During accounting it identifies completely free Regions and the candidate Regions for mixed GC. Part of the cleanup is concurrent, for example resetting the free Regions and returning them to the free list.

After the cleanup phase, the live objects of the selected Regions are evacuated (with the copying algorithm) to other free Regions, and the vacated Regions become free Regions in turn. Copying mainly serves to eliminate fragmentation inside Regions.
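The IHOP condition that starts this marking cycle is simple arithmetic. The sketch below assumes the threshold is compared against occupancy of the whole heap, as the text describes, and that "reaches" means greater than or equal; the class and method names are hypothetical.

```java
final class IhopSketch {
    /** True once occupancy has reached ihopPercent of the heap capacity. */
    static boolean shouldStartMarking(long occupiedBytes, long heapCapacityBytes, int ihopPercent) {
        // Multiply instead of dividing to avoid rounding errors.
        return occupiedBytes * 100 >= heapCapacityBytes * ihopPercent;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024L * 1024L;
        // With the default of 45, a 4 GB heap starts marking at 1.8 GB occupancy.
        System.out.println(shouldStartMarking(2 * gb, 4 * gb, 45));
    }
}
```

Lowering the percentage starts marking earlier, which leaves more headroom for allocation while marking runs concurrently, at the cost of more frequent cycles.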

3.6 G1 Mixed GC

1) When concurrent marking ends, Old Regions that are 100% garbage have already been reclaimed, and the amount of garbage in the partly filled Old Regions has been calculated. By default these Old Regions are reclaimed over 8 mixed GCs (configurable with -XX:G1MixedGCCountTarget).

2) The Collection Set of a mixed GC contains about one eighth of the candidate Old Regions plus the Eden and Survivor Regions. The algorithm is exactly the same as for a young GC, except that the Collection Set also includes Old Regions. See the young GC process above for the details.

3) Because Old Regions are reclaimed in 8 rounds by default, G1 gives priority to the Regions with the most garbage: the higher the share of garbage in a Region, the sooner it is reclaimed. A threshold, -XX:G1MixedGCLiveThresholdPercent (65% by default according to this article; the default differs between JDK releases), decides whether a Region is reclaimed at all. As the option name indicates, it limits the share of live data: only Regions whose live data stays below the threshold are candidates. If a Region holds too little garbage, it holds a large share of live objects, and copying them takes longer.

4) Mixed GC does not have to run all 8 times. Another threshold, -XX:G1HeapWastePercent (10% by default according to this article), allows that share of the heap to be wasted: if the garbage that could still be reclaimed amounts to less than 10% of the heap, no further mixed GC is run, because it would take a lot of time and reclaim very little memory.
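The selection logic of points 1 to 4 can be sketched as three small functions. This is an illustration under stated assumptions, not G1's actual policy code: all names are hypothetical, the live threshold is applied to the live-data percentage (as the option name G1MixedGCLiveThresholdPercent suggests), and candidates are simply spread evenly over the target number of mixed GCs.

```java
import java.util.ArrayList;
import java.util.List;

final class MixedGcSketch {
    static final class Region {
        final int id;
        final long liveBytes;
        final long sizeBytes;

        Region(int id, long liveBytes, long sizeBytes) {
            this.id = id;
            this.liveBytes = liveBytes;
            this.sizeBytes = sizeBytes;
        }

        long garbageBytes() { return sizeBytes - liveBytes; }
        long livePercent() { return liveBytes * 100 / sizeBytes; }
    }

    /** Old Regions eligible for mixed GC, most garbage first. */
    static List<Region> candidates(List<Region> oldRegions, int liveThresholdPercent) {
        List<Region> result = new ArrayList<>();
        for (Region region : oldRegions) {
            if (region.livePercent() < liveThresholdPercent) {
                result.add(region);
            }
        }
        result.sort((a, b) -> Long.compare(b.garbageBytes(), a.garbageBytes()));
        return result;
    }

    /** Old Regions added to each mixed GC so that countTarget collections cover them all. */
    static int regionsPerMixedGc(int candidateCount, int countTarget) {
        return (candidateCount + countTarget - 1) / countTarget; // round up
    }

    /** Mixed GCs continue only while reclaimable garbage exceeds the allowed waste. */
    static boolean worthContinuing(List<Region> remaining, long heapBytes, int heapWastePercent) {
        long reclaimable = 0;
        for (Region region : remaining) {
            reclaimable += region.garbageBytes();
        }
        return reclaimable * 100 > heapBytes * heapWastePercent;
    }
}
```

With a live threshold of 65, a Region that is 70% live is skipped, and the remaining candidates are ordered so that the emptiest Regions, which are cheapest to evacuate, are collected first.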

3.7 G1 features

1) Parallelism and concurrency: G1 takes full advantage of multi-CPU, multi-core hardware, using multiple CPUs to shorten stop-the-world pauses. Some GC work that other collectors can only do while Java threads are paused, G1 can do concurrently while the Java program keeps running.

2) Generational collection: as described in section 1, G1 still distinguishes young and old generations, although neither has to be contiguous.

3) Space compaction: unlike CMS's mark-sweep algorithm, G1 is based on mark-compact as a whole and on copying locally (between two Regions). Either way, G1 produces no memory fragmentation while running and leaves regular, usable memory after a collection. This helps long-running programs: allocating a large object does not trigger the next GC early for lack of contiguous memory.

4) Predictable pauses: this is another major advantage of G1 over CMS. Both collectors aim to reduce pause times, but G1 can additionally build a predictable pause-time model that lets the user specify that, within any time slice of M milliseconds, no more than N milliseconds may be spent on garbage collection.
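To check which collector a running JVM actually uses, the standard java.lang.management API can list the active collectors and their counters. A minimal sketch; the class name ActiveCollectors is made up for this example.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {
    /** Names of the garbage collectors registered in this JVM. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and accumulated time since JVM start, as reported by the JVM.
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
    }
}
```

Started with -XX:+UseG1GC, a HotSpot JVM typically reports names such as "G1 Young Generation" and "G1 Old Generation"; with -XX:+UseConcMarkSweepGC on a JDK that still ships CMS, the names are typically "ParNew" and "ConcurrentMarkSweep". The exact names depend on the JVM version.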

That covers the main differences between the G1 garbage collector and CMS. Trying the options above on a real application is a good way to deepen the understanding.

© 2024 shulou.com SLNews company. All rights reserved.
