What are the management and principles of GC?

2025-07-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article introduces the management and principles of GC: basic concepts, JVM memory partitions, object allocation, garbage identification, collection algorithms, the common collectors, the two core metrics, and the GC causes worth watching.

Basic concepts of GC

GC: the abbreviation carries three meanings, and which one applies depends on the context:

Garbage Collection: garbage collection technology, noun.

Garbage Collector: garbage collector, noun.

Garbage Collecting: garbage collection action, verb.

Mutator: the role that produces garbage, that is, our application. It allocates and frees memory through the Allocator.

TLAB: short for Thread Local Allocation Buffer. Using CAS, each mutator thread first claims its own block of memory in Eden and allocates objects there. Because a TLAB is exclusive to one Java thread, there is no lock contention inside it and allocation is faster.

Card Table: mainly used to mark the state of card pages; each card table entry corresponds to one card page. When a reference in an object on a card page is written, the write barrier marks the card table entry for that object as dirty. The card table exists to solve the problem of cross-generational references.
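The card table idea can be sketched in a few lines of Java. This is a toy model, not HotSpot's implementation: the class name, the 512-byte card size, and the CLEAN/DIRTY encoding are all assumptions made for illustration, and addresses are plain integers into a simulated old generation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy card table: one byte per fixed-size card page of a simulated old generation. */
final class CardTableSketch {
    static final int CARD_SHIFT = 9;   // assumed card page size: 2^9 = 512 bytes
    static final byte CLEAN = 0;       // illustrative encoding only
    static final byte DIRTY = 1;

    private final byte[] cards;

    CardTableSketch(int heapBytes) {
        // Round up so a partial last page still gets its own card.
        cards = new byte[(heapBytes + (1 << CARD_SHIFT) - 1) >>> CARD_SHIFT];
    }

    /** Maps a simulated address to the index of the card page that contains it. */
    static int cardIndex(int address) {
        return address >>> CARD_SHIFT;
    }

    /** Write barrier: run after a reference field at fieldAddress has been written. */
    void writeBarrier(int fieldAddress) {
        cards[cardIndex(fieldAddress)] = DIRTY;
    }

    /** A young collection would scan only these cards, not the whole old generation. */
    List<Integer> dirtyCards() {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                result.add(i);
            }
        }
        return result;
    }

    /** Resets every card once the dirty pages have been scanned. */
    void clear() {
        Arrays.fill(cards, CLEAN);
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096); // 8 card pages
        table.writeBarrier(100);   // falls in card 0
        table.writeBarrier(1030);  // falls in card 2
        table.writeBarrier(1500);  // also card 2
        System.out.println(table.dirtyCards()); // prints [0, 2]
    }
}
```

The point of the structure is that the cost of finding old-to-young references scales with the number of dirty cards rather than with the size of the old generation.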

JVM memory partition

According to the JCP (Java Community Process) website, at the time of writing the latest Java version is Java 16; the upcoming Java 17 and the current Java 11 and Java 8 are LTS versions, and the JVM specification changes with each iteration. Since this article mainly discusses CMS, the Java 8 memory structure is used here.

GC mainly works in the Heap and MetaSpace areas (the blue section in the figure). For Direct Memory, if DirectByteBuffer is used, GC manages it indirectly through Cleaner#clean when an allocation finds that memory is insufficient. (Although JVM does not directly allocate or reclaim this memory, the references to it are still stored in the JVM heap, so the memory can be released indirectly through those references.)

Any automatic memory management system faces the steps of allocating space for new objects and then collecting garbage object space. Let's take a look at these basics.

Allocating objects

In Java, operations on object addresses mainly go through Unsafe, which calls the allocate and free functions of C. There are two ways to allocate:

Free list: records free addresses in additional storage, turning random IO into sequential IO, at the cost of extra space and extra addressing time at the O(1) level.

Bump pointer: a pointer serves as the boundary between used and free memory. To allocate, simply move the pointer toward the free end by the size of the object. This is more efficient, but it applies only where memory is kept regular, with no fragmentation.
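The difference between the two allocation styles is easier to see side by side. The following is a minimal sketch over integer addresses; the class names, the first-fit policy, and the -1 failure value are assumptions for illustration, and the free list does no coalescing of neighbouring blocks.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class AllocatorSketch {

    /** Bump-pointer allocation over one contiguous region [0, limit). */
    static final class BumpAllocator {
        private int top = 0;          // boundary between used and free memory
        private final int limit;

        BumpAllocator(int limit) {
            this.limit = limit;
        }

        /** Returns the start address of the new object, or -1 if the region is full. */
        int allocate(int size) {
            if (size <= 0 || size > limit - top) {
                return -1;
            }
            int address = top;
            top += size;              // the whole allocation is one pointer move
            return address;
        }
    }

    /** Free-list allocation: first-fit search over recorded free blocks. */
    static final class FreeListAllocator {
        private final List<int[]> freeBlocks = new ArrayList<>(); // each entry is {start, size}

        /** Records a free block (no coalescing in this sketch). */
        void free(int start, int size) {
            freeBlocks.add(new int[] {start, size});
        }

        /** Returns the start address of the new object, or -1 if no block is large enough. */
        int allocate(int size) {
            for (Iterator<int[]> it = freeBlocks.iterator(); it.hasNext();) {
                int[] block = it.next();
                if (block[1] >= size) {
                    int address = block[0];
                    block[0] += size;  // shrink the block from the front
                    block[1] -= size;
                    if (block[1] == 0) {
                        it.remove();
                    }
                    return address;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) {
        BumpAllocator bump = new BumpAllocator(64);
        System.out.println(bump.allocate(16)); // 0
        System.out.println(bump.allocate(16)); // 16
        System.out.println(bump.allocate(40)); // -1, only 32 bytes remain

        FreeListAllocator list = new FreeListAllocator();
        list.free(0, 8);
        list.free(32, 24);
        System.out.println(list.allocate(16)); // 32, the first block is too small
        System.out.println(list.allocate(8));  // 0
    }
}
```

The bump allocator does constant work per request but needs a contiguous free region; the free list tolerates fragmentation but has to search for a fitting block.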

Identifying garbage

Reference counting: each object keeps a count of the references to it. The counter is incremented whenever a new reference is made and decremented when a reference expires; the count is stored in the object header, and an object whose count is greater than 0 is considered live. Although circular references can be handled by the Recycler algorithm, in a multithreaded environment every count change needs expensive synchronization, so performance is low. The approach was used by early programming languages.

Reachability analysis, also known as the reference chain method (Tracing GC): search the object graph starting from the GC Roots; every object that can be reached is a reachable object. Reachability alone is not enough to decide whether an object is alive or dead, and several marking rounds are needed for a more accurate answer; objects outside the connected graph can be reclaimed as garbage. Mainstream Java virtual machines currently use this algorithm.
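A small sketch may make the contrast concrete. Below, the object graph is modelled as a map from object name to the names it references (a representation chosen only for this example), and a worklist traversal from the roots finds the reachable set. Objects C and D reference each other, so plain reference counting would never free them, but tracing treats them as garbage because no root reaches them.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class ReachabilitySketch {

    /** Returns every object reachable from the given roots by following references. */
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new LinkedHashSet<>();
        Deque<String> worklist = new ArrayDeque<>(roots);
        while (!worklist.isEmpty()) {
            String obj = worklist.poll();
            if (!marked.add(obj)) {
                continue; // already visited
            }
            for (String target : refs.getOrDefault(obj, Collections.<String>emptyList())) {
                if (!marked.contains(target)) {
                    worklist.add(target);
                }
            }
        }
        return marked;
    }

    /** Everything in the graph that the traversal did not mark. */
    static Set<String> garbage(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> dead = new TreeSet<>(refs.keySet());
        dead.removeAll(reachable(refs, roots));
        return dead;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("A", Arrays.asList("B"));
        refs.put("B", Collections.<String>emptyList());
        refs.put("C", Arrays.asList("D")); // C and D form an unreachable cycle
        refs.put("D", Arrays.asList("C"));

        List<String> roots = Arrays.asList("A");
        System.out.println(reachable(refs, roots)); // [A, B]
        System.out.println(garbage(refs, roots));   // [C, D]
    }
}
```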

Collection algorithm

Several collection algorithms have emerged since automatic memory management first appeared, and collectors combine them differently depending on the scenario.

Mark-Sweep: collection has two phases. The first is the tracing phase, which traverses the object graph from the GC Roots and marks every object encountered. The second is the sweep phase, in which the collector examines every object in the heap and reclaims all unmarked ones. Implementations use the tricolour abstraction (Tricolour Abstraction) and bitmap marking (BitMap) to improve efficiency. The algorithm is more efficient when many objects survive.

Mark-Compact: the main purpose of this algorithm is to solve the fragmentation problem of non-moving collectors. It also has two phases: the first is similar to Mark-Sweep, and the second rearranges the surviving objects according to the compaction order (Compaction Order). The main implementations are the Two-Finger algorithm, the sliding Lisp2 algorithm, and the Threaded Compaction algorithm.

Copying: the space is divided into two equal halves, From and To, and only one is in use at any time. Each collection copies the surviving objects from one half to the other. There is a recursive algorithm (proposed by Robert R. Fenichel and Jerome C. Yochelson), an iterative algorithm (proposed by Cheney), and approximately depth-first algorithms that address the recursion-stack and cache-line problems of the first two. A copying collector can allocate quickly with a bump pointer, but space utilization is low, and copying is costly when the surviving objects are large.
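As an illustration of the iterative copying approach, here is a minimal sketch in the style of Cheney's algorithm. The heap model is an assumption made for brevity: each object is just an array of reference fields, references are indices into from-space, and -1 stands for null. The copied region of to-space doubles as the work queue, so no recursion stack is needed.

```java
import java.util.Arrays;

final class CheneySketch {
    private final int[][] from;   // from[i] = reference fields of object i (-1 = null)
    private final int[][] to;     // to-space, filled from index 0 upward
    private final int[] forward;  // forwarding address per from-space object, -1 = not copied
    private int free = 0;         // bump pointer into to-space

    CheneySketch(int[][] from) {
        this.from = from;
        this.to = new int[from.length][];
        this.forward = new int[from.length];
        Arrays.fill(forward, -1);
    }

    /** Copies the object on first visit and returns its to-space index. */
    private int copy(int ref) {
        if (ref < 0) {
            return -1;
        }
        if (forward[ref] < 0) {
            to[free] = from[ref].clone(); // fields still point into from-space
            forward[ref] = free++;
        }
        return forward[ref];
    }

    /** Evacuates everything reachable from roots; roots are rewritten in place. */
    int[][] collect(int[] roots) {
        for (int i = 0; i < roots.length; i++) {
            roots[i] = copy(roots[i]);
        }
        // Objects between scan and free are copied but not yet fixed up.
        for (int scan = 0; scan < free; scan++) {
            int[] fields = to[scan];
            for (int f = 0; f < fields.length; f++) {
                fields[f] = copy(fields[f]);
            }
        }
        return Arrays.copyOf(to, free);
    }

    public static void main(String[] args) {
        int[][] fromSpace = {
            {2},     // object 0 -> 2
            {0},     // object 1 -> 0, but nothing references object 1
            {0, 3},  // object 2 -> 0 and 3
            {}       // object 3
        };
        int[] roots = {2};
        int[][] toSpace = new CheneySketch(fromSpace).collect(roots);
        System.out.println(Arrays.deepToString(toSpace)); // [[1, 2], [0], []]
        System.out.println(Arrays.toString(roots));       // [0]
    }
}
```

Surviving objects end up packed at the start of to-space, which is why a copying collector can hand out new memory with a bump pointer afterwards.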

If you look at the time-consuming actions of mark, sweep, compaction, and copying together, the relationship is roughly as follows:

Although both compaction and copying move objects, depending on the algorithm compaction may first have to compute each object's target address, then fix up pointers, and only then move the objects. Copying can do these steps together, so it can be faster. Note also that the cost of GC is not only the time the Collector takes; the Allocator must be counted too.

If memory is guaranteed to be free of fragmentation, allocation can use pointer bumping: moving one pointer completes the allocation, which is very fast. If memory is fragmented, it has to be managed with something like a free list, and allocation is usually slower.

Generational collector

ParNew: a multithreaded collector that uses the copying algorithm and works mainly in the Young area. The number of collection threads can be set with the -XX:ParallelGCThreads parameter. The whole process is STW, and it is often used together with CMS.

CMS: aims for the shortest collection pause time and uses the mark-sweep algorithm. Collection is divided into four major steps, of which initial mark and remark are STW. It is mostly used on the server side of Internet websites and B/S systems. It was deprecated in JDK 9 and removed in JDK 14; see the JEP for details. (Pause time is prioritized.)

Partition collector

G1: a server-side garbage collector used in multiprocessor and high-capacity memory environments to achieve high throughput while meeting the requirements of garbage collection pause time as much as possible.

ZGC: a low-latency garbage collector introduced in JDK 11, suited to memory management and collection for services with large heaps and low-latency requirements. In the SPECjbb 2015 benchmark, its maximum pause time was only 1.68 ms with a 128 GB heap, far better than G1 and CMS.

Shenandoah: developed by a team at Red Hat. Like G1, it is a Region-based garbage collector, but it does not need a Remember Set or Card Table to record cross-Region references, and its pause time is independent of heap size. Its pause time is close to that of ZGC.

Common collector

At present, CMS and G1 collectors are most used, both of which have the concept of generation, and the main memory structure is as follows:

Other collectors

There are many other collectors as well, such as the real-time collectors Metronome, Stopless, Staccato, Chicken, and Clover; the concurrent copying/compacting collectors Sapphire, Compressor, and Pauseless; and mark-compact collectors such as Doligez-Leroy-Gonthier.

Two core indicators of GC

Latency: can also be understood as the maximum pause time, that is, the longest STW during garbage collection. The shorter the better; a higher collection frequency is acceptable to some extent in exchange. This is the main direction of GC development.

Throughput: over the life of the application, GC threads take CPU clock cycles that would otherwise be available to the Mutator, so throughput is the percentage of total running time in which the Mutator does useful work. For example, if the system runs for 100 min and GC takes 1 min, throughput is 99%. Throughput-first collectors can accept longer pauses.
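The throughput definition can be checked against a running JVM with the standard java.lang.management beans. This is a rough sketch: the class and method names are made up for the example, and getCollectionTime() reports approximate accumulated collection time, which for concurrent collectors may include work that did not pause the application, so treat the result as an estimate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcThroughputSketch {

    /** Throughput = share of elapsed time not spent in GC, as a percentage. */
    static double throughputPercent(long totalMillis, long gcMillis) {
        if (totalMillis <= 0) {
            throw new IllegalArgumentException("totalMillis must be positive");
        }
        return 100.0 * (totalMillis - gcMillis) / totalMillis;
    }

    public static void main(String[] args) {
        // The worked example from the text: 100 min of runtime, 1 min of GC.
        System.out.println(throughputPercent(100 * 60_000L, 60_000L)); // 99.0

        // The same calculation for the current JVM.
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long time = gc.getCollectionTime(); // -1 when the collector does not report it
            if (time > 0) {
                gcMillis += time;
            }
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + time);
        }
        long uptime = Math.max(1, ManagementFactory.getRuntimeMXBean().getUptime());
        System.out.println("estimated throughput % = " + throughputPercent(uptime, gcMillis));
    }
}
```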

Summary

At present, the systems of major Internet companies mostly pursue low latency, to avoid degrading user experience through a long GC pause. The metric targets should be set together with the SLA of the application service, and are judged mainly by the following two points:

In short: a single pause should not exceed the application's TP9999, and GC throughput should be no lower than 99.99%. For example, if the TP9999 of service A is 80 ms and the average GC pause is 30 ms, then the maximum pause of the service should not exceed 80 ms, and GC should run no more often than once every 5 min.

If these cannot be met, tuning is needed, or more resources for parallel redundancy. (Pause here and check the minute-level gc.meantime metric on your monitoring platform: if it exceeds 6 ms, the GC throughput of a single machine falls short of four 9s.)
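Both figures above, the 5 min interval and the 6 ms per-minute limit, follow from the 99.99% throughput target. The arithmetic can be sketched as follows; the class and method names are hypothetical, and throughput is expressed in hundredths of a percent so that the calculation stays in exact integer arithmetic.

```java
final class GcBudgetSketch {
    static final long SCALE = 10_000; // 9999 means 99.99%

    /** Smallest interval between collections that still meets the throughput target. */
    static long minIntervalMillis(long avgPauseMillis, long throughputScaled) {
        long gcShare = SCALE - throughputScaled; // share of time GC may use
        if (gcShare <= 0) {
            throw new IllegalArgumentException("throughput target must be below 100%");
        }
        return avgPauseMillis * SCALE / gcShare;
    }

    /** Total GC pause allowed per minute under the throughput target. */
    static long pauseBudgetPerMinuteMillis(long throughputScaled) {
        return 60_000 * (SCALE - throughputScaled) / SCALE;
    }

    public static void main(String[] args) {
        // 30 ms average pause at 99.99% throughput: at most one GC per 300000 ms (5 min).
        System.out.println(minIntervalMillis(30, 9_999));      // 300000
        // 99.99% throughput leaves 6 ms of GC pause per minute.
        System.out.println(pauseBudgetPerMinuteMillis(9_999)); // 6
    }
}
```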

Note: besides these two metrics there are others, such as Footprint (a measure of resource usage) and response speed. Internet services and real-time systems pursue low latency, while many embedded systems pursue a small Footprint.

A few key GC causes deserve attention:

System.gc(): manually triggers a GC.

CMS: some actions in the execution process of CMS GC, focusing on the two STW phases of CMS Initial Mark and CMS Final Remark.

Promotion Failure: the Old area does not have enough space for promoted objects in the Young area (even if the total available memory is large enough).

Concurrent Mode Failure: when CMS GC is running, the space reserved in the Old area is insufficient to allocate to new objects, and the collector will degenerate, seriously affecting the performance of GC. The following example is such a scenario.

GCLocker Initiated GC: if a GC is needed while a thread is executing in a JNI critical section, GCLocker blocks the GC and prevents other threads from entering the JNI critical section; a GC is triggered once the last thread leaves the critical section.
