
How to Achieve GC Optimization in Java Applications

2025-07-19 Update From: SLTechnology News&Howtos



When a Java program's performance falls short of its targets and other optimization methods have been exhausted, it is usually necessary to tune the garbage collector to improve performance further; this is called GC optimization. However, GC algorithms are complex, many parameters affect GC performance, and the right settings depend on the characteristics of each application, all of which make GC optimization considerably harder. Even so, GC tuning is not random, and some general approaches exist. This article introduces these general GC optimization strategies and related practical cases, covering the following:

> Preparation before optimization: a brief review of JVM fundamentals and some general GC optimization strategies.
> Optimization method: the general tuning process: define the optimization goal → optimize → track the results.
> Optimization cases: the GC problems encountered by the author's team and how they were solved.

I. Preparation before optimization

What you need to know before GC optimization

To better understand this article, you need to know the following:

1. Basic knowledge of GC, including but not limited to:
a) How GC works.
b) The meaning of terms such as young generation, old generation, and promotion.
c) How to read GC logs.

2. GC optimization cannot solve every performance problem; it is the last resort of tuning.

If you are not familiar with the topics in point 1, read the next section, "JVM basics review", first; if you already are, skip it and read on.

JVM basics review

JVM memory structure

A brief introduction to JVM memory structure and common garbage collectors.

Garbage collection in today's mainstream virtual machine (HotSpot VM) uses a "generational collection" algorithm. Generational collection rests on the observation that objects have different lifetimes, so objects with different lifetimes can be collected in different ways to make collection more efficient.

HotSpot VM divides memory into different physical areas, which embodies the generational idea. As shown in the figure, JVM memory consists mainly of the young generation, the old generation, and the permanent generation.

① Young generation (Young Generation): most objects are created in the young generation, and many of them are short-lived. After each young generation collection (also called Minor GC), only a few objects survive, so a copying algorithm is used: collection completes at the cost of copying only a small number of objects.

The young generation is generally divided into three areas: one Eden area and two Survivor areas. Most objects are created in the Eden area. When Eden fills up, the surviving objects are copied to one of the two Survivor areas. When that Survivor area fills up, the objects in it that are still alive and do not yet qualify for promotion are copied to the other Survivor area. Each time an object survives a Minor GC, its age increases by 1; once the age reaches the "promotion age threshold", the object is moved to the old generation, a process known as "promotion". The promotion age threshold therefore directly affects how long an object stays in the young generation. In the Serial and ParNew collectors, the threshold is set by the parameter MaxTenuringThreshold, whose default value is 15.
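The aging rule described above can be sketched as a toy model. This is only an illustration, not how HotSpot is implemented: the class and method names are hypothetical, and it ignores other reasons an object may be promoted early (for example, a Survivor area that is too small).

```java
// Toy model of object aging in the young generation (illustrative only).
class TenuringSketch {
    // Simulates an object that stays reachable for `minorGcsWhileReachable`
    // Minor GCs and reports what happens to it under the given threshold.
    static String fate(int minorGcsWhileReachable, int promotionAgeThreshold) {
        int age = 0;
        for (int gc = 1; gc <= minorGcsWhileReachable; gc++) {
            age++; // each survived Minor GC adds 1 to the object's age
            if (age >= promotionAgeThreshold) {
                return "promoted to old generation at age " + age;
            }
        }
        return "died in young generation at age " + age;
    }

    public static void main(String[] args) {
        int threshold = 15; // default MaxTenuringThreshold quoted in the text
        System.out.println(fate(3, threshold));  // short-lived object
        System.out.println(fate(20, threshold)); // long-lived object
    }
}
```

A lower threshold moves long-lived objects out of the Survivor areas sooner, while a higher one gives short-lived objects more chances to die before reaching the old generation.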

② Old generation (Old Generation): objects that survive N collections in the young generation are moved to the old generation, where the object survival rate is high. Old generation collection (also called Major GC) usually uses a "mark-sweep" or "mark-compact" algorithm. Collection of the whole heap, covering both the young and old generations, is called Full GC (in HotSpot VM, apart from CMS, every collector that can collect the old generation collects the entire GC heap, including the young generation, at the same time).

③ Permanent generation (Perm Generation): mainly stores metadata, such as Class and Method information, and has little to do with the Java objects that garbage collection reclaims. Compared with the young and old generations, the sizing of this area has less effect on garbage collection.

Common garbage collectors

Different garbage collectors suit different scenarios. Commonly used garbage collectors:

The Serial collector is a single-threaded collector; it is simple, easy to implement, and efficient.

The ParNew collector is the multithreaded version of Serial; it can make full use of CPU resources and reduce collection time.

The Parallel Scavenge (throughput-first) collector focuses on controlling throughput.

The Concurrent Mark Sweep (CMS) collector aims for the shortest collection pause time and is based on the "mark-sweep" algorithm.
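To see which collectors a running JVM actually uses, the standard java.lang.management API can be queried. The following is a minimal sketch; the class name CollectorInfo is hypothetical, and the collector names printed depend on the JVM version and the options it was started with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Lists the garbage collectors of the current JVM with their cumulative
// collection counts and times.
class CollectorInfo {
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

Sampling these counters periodically gives a rough view of GC frequency and total GC time without parsing GC logs.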

GC log

The log format of each collector is determined by its own implementation; in other words, each collector's log format can differ. To make logs easier to read, however, the virtual machine designers kept certain elements common across all collectors' logs. These common elements are briefly described in "Java GC log".

Basic parameter strategy

The size of each partition has a large effect on GC performance. A good starting point for sizing each partition is to analyze the size of the active data.

The active data size is the amount of heap space occupied by long-lived objects while the application runs in a steady state, that is, the space the old generation occupies after a Full GC. It can be read from the old generation size after Full GC in the GC log. A more accurate method is to collect this figure several times after the program has stabilized and take the average. The proportional relationship between the active data and each partition is as follows (see reference 1):

Space                  Multiple
Total heap size        3-4 times the active data size
Young generation       1-1.5 times the active data size
Old generation         2-3 times the active data size
Permanent generation   1.2-1.5 times the permanent generation usage after Full GC

For example, if the active data size of the old generation obtained from the GC log is 300MB, the partition sizes can be set to:

- Total heap: 1200MB = 300MB × 4
- Young generation: 450MB = 300MB × 1.5
- Old generation: 750MB = 1200MB - 450MB

These settings are only initial values for the heap size; later optimization may adjust them depending on the characteristics and requirements of the application.
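The sizing arithmetic above can be written down as a small calculation. This is a sketch under stated assumptions: it uses the multiples from the worked example (4× for the total heap, 1.5× for the young generation), the sample values passed to averageMb are made up, and mapping the result onto the standard -Xms, -Xmx, and -Xmn options is one possible choice rather than the author's stated configuration.

```java
// Derives initial partition sizes from the measured active data size.
class HeapSizingSketch {
    // Averages several "old generation after Full GC" readings, in MB.
    static long averageMb(long[] samplesMb) {
        long sum = 0;
        for (long s : samplesMb) {
            sum += s;
        }
        return sum / samplesMb.length;
    }

    static long totalHeapMb(long activeMb) { return activeMb * 4; }     // 4 x active data
    static long youngGenMb(long activeMb)  { return activeMb * 3 / 2; } // 1.5 x active data
    static long oldGenMb(long activeMb) {
        return totalHeapMb(activeMb) - youngGenMb(activeMb);            // remainder of the heap
    }

    // Formats the sizes as JVM options (initial heap, maximum heap, young generation).
    static String suggestedFlags(long activeMb) {
        long total = totalHeapMb(activeMb);
        return "-Xms" + total + "m -Xmx" + total + "m -Xmn" + youngGenMb(activeMb) + "m";
    }

    public static void main(String[] args) {
        long activeMb = averageMb(new long[] {290, 300, 310}); // hypothetical readings
        System.out.println("active=" + activeMb + "MB, old=" + oldGenMb(activeMb) + "MB");
        System.out.println(suggestedFlags(activeMb));
    }
}
```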

II. Optimization steps

The general steps of GC optimization can be summarized as: determine the goal, optimize the parameters, and verify the results.

Determine the goal

Clarifying the application's system requirements is the basis of performance optimization. System requirements are what the application must deliver at run time, for example:

- High availability: how many nines of availability must be reached.
- Low latency: within how many milliseconds a request must be answered.
- High throughput: how many transactions are completed per second.

It is important to identify system requirements because of possible conflicts between the above performance metrics. For example, in general, the cost of reducing latency is to reduce throughput or consume more memory, or both.

Since the author's team focuses on high availability and low latency, let's analyze how to quantify the impact of GC time and frequency on response time and availability. Through this quantitative indicator, we can calculate the impact of the current GC situation on services, and also evaluate the benefits of GC optimization on response time, which are important for low latency services.

For example: suppose one GC lasting 25ms occurs in unit time T, the average response time of the API is 50ms, and requests arrive uniformly, as shown in the following figure:
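One way to turn this setup into a number, offered here as a sketch rather than the author's formula, is to count which requests overlap a pause. With uniform arrivals, a request is delayed if it arrived within one response time before the pause began (it is still in flight) or during the pause itself, so roughly (response time + GC time) × N / T of the requests in a window T with N pauses are affected. The window length of 1000ms and the class name below are assumptions for illustration, and the estimate ignores pauses that overlap each other.

```java
// Estimates the share of requests delayed by GC pauses in a time window,
// assuming requests arrive uniformly and pauses do not overlap.
class GcImpactSketch {
    static double affectedFraction(double responseMs, double gcPauseMs,
                                   int gcCount, double windowMs) {
        double fraction = (responseMs + gcPauseMs) * gcCount / windowMs;
        return Math.min(1.0, fraction); // a fraction cannot exceed 100%
    }

    public static void main(String[] args) {
        // 50ms response time and one 25ms pause per (assumed) 1000ms window.
        System.out.println(affectedFraction(50, 25, 1, 1000));
    }
}
```

Under these assumptions, shortening the pause or making pauses less frequent both reduce the affected share, which is why GC time and GC frequency are tracked together.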

Before expansion: the young generation capacity is R. Assuming object A lives for 750ms and the Minor GC interval is 500ms, this Minor GC takes T1 (scanning the young generation R) + T2 (copying object A to a Survivor area S).

After expansion: the young generation capacity is 2R and object A still lives for 750ms, but the Minor GC interval grows to 1000ms. By the time Minor GC runs, object A is no longer alive and does not need to be copied to the Survivor area. The GC time is then 2 × T1 (scanning the young generation 2R), with no T2 copying time.
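The two scenarios can be compared with a small cost model. It is a sketch under simplifying assumptions: object A is taken to be created right after the previous Minor GC, scan time is assumed to grow linearly with capacity, and the values chosen for T1 and T2 are made up purely for illustration.

```java
// Cost model for a single Minor GC: scan time plus copy time for object A
// if it is still alive when the collection runs.
class MinorGcModel {
    static double minorGcTime(double capacityFactor, double t1, double t2,
                              long objectLifetimeMs, long gcIntervalMs) {
        double scanTime = capacityFactor * t1; // scanning k*R costs k*T1
        boolean stillAlive = objectLifetimeMs > gcIntervalMs;
        double copyTime = stillAlive ? t2 : 0; // only live objects are copied
        return scanTime + copyTime;
    }

    public static void main(String[] args) {
        double t1 = 1.0; // hypothetical scan cost for capacity R
        double t2 = 5.0; // hypothetical copy cost for object A
        System.out.println("before expansion: " + minorGcTime(1, t1, t2, 750, 500));
        System.out.println("after expansion:  " + minorGcTime(2, t1, t2, 750, 1000));
    }
}
```

With these illustrative numbers the pause shrinks after expansion because the copy cost disappears; whether that holds for a real service depends on how T2 compares with T1 and on the actual lifetime distribution of its objects.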

After expansion, then, Minor GC gains one T1 (scanning time) but saves T2 (copying time). More importantly, for the virtual machine the cost of copying objects is much higher than the cost of scanning. The time of a single Minor GC therefore depends more on the number of objects that survive the GC than on the size of the Eden area. If the heap holds many short-lived objects, expanding the young generation will not noticeably increase the time of a single Minor GC. The next step is to confirm the lifetime distribution of objects in the service:

After adjustment:

Optimize

Before solving the problem, let us review the four main phases of CMS and what each one does. The following figure shows, in different colors, the objects that can be marked in each phase.

1. Init-mark (initial mark, STW): performs reachability analysis and marks only the objects directly referenced from GC roots, so it is very fast.
2. Concurrent-mark (concurrent mark): starting from the green objects marked in the previous phase, marks all reachable objects.
3. Remark (STW): pauses all user threads, rescans objects in the heap, performs reachability analysis, and marks live objects. Because concurrent marking runs alongside user threads, a user thread may change a field of a live object so that it points to an unmarked object. For example, the red object in the following figure is unreachable when concurrent marking starts but becomes reachable when a reference changes during the concurrent phase. Remark must mark such objects again so that they are not cleaned up in the next phase, which is why it also requires STW. Note in particular that this phase treats objects in the young generation as roots when deciding whether an object is alive.
4. Concurrent sweep: reclaims garbage concurrently.

Suppose only the old generation were scanned, that is, only objects in the old generation were used as roots to decide whether an object is referenced. In the figure above, because the reference is held in the young generation, object A would not be re-marked as reachable in the Remark phase and would be reclaimed by mistake. A reference from a young generation object to an old generation object is called a "cross-generational reference". Because such references exist, the Remark phase must scan the entire heap to decide whether an object is alive, including the gray unreachable objects in the figure.

Gray objects are unreachable but still have to be scanned. Young generation GC and old generation GC run separately, and only a Minor GC uses root tracing to determine whether young generation objects are reachable. In other words, although some objects are already unreachable, they are not identified as unreachable until a Minor GC occurs, so CMS cannot tell which objects are alive and has to scan the whole heap (young generation + old generation). The number of objects in the heap therefore affects the Remark phase time. The GC logs show the same pattern: whenever Remark takes more than 500ms, young generation utilization is above 75%. The problem of reducing Remark time thus becomes the problem of reducing the number of young generation objects.

Young generation objects tend to die young, so if a Minor GC runs before Remark, most of them are reclaimed. CMS takes this approach by adding an abortable concurrent preclean phase (CMS-concurrent-abortable-preclean) before Remark. The main task of this phase is still to mark concurrently whether objects are alive, but the process can be interrupted. The phase starts once more than 2MB of the Eden area is in use; 2MB is the default threshold and can be changed by a parameter. If a Minor GC occurs while this phase is running, the gray objects above are reclaimed and the Remark phase has fewer objects to scan.

In addition, CMS provides the parameter CMSMaxAbortablePrecleanTime, which defaults to 5s, so that this phase does not wait indefinitely for a Minor GC. If abortable preclean runs for more than 5s, the phase is aborted and Remark begins whether or not a Minor GC has occurred. According to red mark 2 in the GC log, abortable concurrent preclean took 5.35s, exceeded the 5s limit, and was interrupted before any Minor GC occurred, so the young generation still held many objects at Remark time.
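The waiting rule can be expressed as a simple decision. The sketch below is a simplified model of the behavior described in the text, not CMS code; the class and method names are hypothetical, and the time until the next Minor GC is treated as a known input.

```java
// Models whether a Minor GC happens before abortable preclean gives up.
class AbortablePrecleanModel {
    // Default of CMSMaxAbortablePrecleanTime as quoted in the text (5s).
    static final long DEFAULT_MAX_PRECLEAN_MS = 5_000;

    static String outcome(long msUntilNextMinorGc, long maxPrecleanMs) {
        if (msUntilNextMinorGc <= maxPrecleanMs) {
            return "Minor GC ran during preclean; Remark scans a nearly empty young generation";
        }
        return "preclean aborted after " + maxPrecleanMs
                + " ms; Remark scans a populated young generation";
    }

    public static void main(String[] args) {
        System.out.println(outcome(2_000, DEFAULT_MAX_PRECLEAN_MS)); // Minor GC arrives in time
        System.out.println(outcome(8_000, DEFAULT_MAX_PRECLEAN_MS)); // limit reached first
    }
}
```

The second outcome corresponds to the case seen in the log, where the limit expired before any Minor GC occurred.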

In this case, CMS provides the CMSScavengeBeforeRemark parameter to ensure that a Minor GC is forced before Remark.
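To confirm that a running service was actually started with this option, the JVM's input arguments can be inspected through the standard RuntimeMXBean. This is a minimal sketch: the class name is hypothetical, it only sees arguments reported by getInputArguments(), and it assumes the last occurrence of the option decides the setting.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Checks whether -XX:+CMSScavengeBeforeRemark appears in the JVM arguments.
class ScavengeFlagCheck {
    static final String FLAG = "CMSScavengeBeforeRemark";

    static boolean isEnabled(List<String> jvmArgs) {
        boolean enabled = false; // treated as off unless explicitly switched on
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+" + FLAG)) {
                enabled = true;
            } else if (arg.equals("-XX:-" + FLAG)) {
                enabled = false;
            }
        }
        return enabled;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println(FLAG + " enabled on command line: " + isEnabled(jvmArgs));
    }
}
```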

Optimization result

After adding the CMSScavengeBeforeRemark parameter, GC pauses with a single execution time above 200ms disappeared. Monitoring shows that GC time now follows business traffic, with no obvious spikes.

The card table works as follows: the old generation space is divided into cards of 512B each. The card table itself is a byte array in which each element corresponds to one card. When a reference from the old generation to the young generation is created, the virtual machine sets the corresponding card table element to an appropriate value. As shown in the figure above, card 3 is marked dirty (the card table also serves to identify which blocks were modified during the concurrent marking phase). Minor GC can then scan the card table to find quickly which cards hold references from the old generation to the young generation. In this way the virtual machine trades space for time and avoids scanning the whole heap.

To sum up, CMS is designed around achieving the shortest pauses, and it goes to great lengths to do so: running application and GC threads concurrently as far as possible, adding the abortable concurrent preclean phase, introducing the card table, and so on. These measures sacrifice some throughput in exchange for shorter collection pauses.
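The card table idea can be sketched in a few lines. This is a simplified illustration, not HotSpot's implementation: the class name, the CLEAN/DIRTY byte values, and the use of plain offsets into the old generation are all assumptions; only the 512-byte card size and the one-byte-per-card layout come from the text.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal card table: one byte per 512-byte card of the old generation.
class CardTableSketch {
    static final int CARD_SIZE = 512; // bytes per card, as in the text
    static final byte CLEAN = 0;      // hypothetical encoding
    static final byte DIRTY = 1;      // hypothetical encoding

    private final byte[] cards;

    CardTableSketch(int oldGenBytes) {
        // Round up so a partially filled last card is still covered.
        cards = new byte[(oldGenBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    int cardCount() {
        return cards.length;
    }

    // Called when a field at this old-generation offset is set to reference
    // a young generation object.
    void recordOldToYoungWrite(int oldGenOffset) {
        cards[oldGenOffset / CARD_SIZE] = DIRTY;
    }

    // What a Minor GC would scan instead of the whole old generation.
    List<Integer> dirtyCards() {
        List<Integer> dirty = new ArrayList<>();
        for (int i = 0; i < cards.length; i++) {
            if (cards[i] == DIRTY) {
                dirty.add(i);
            }
        }
        return dirty;
    }

    public static void main(String[] args) {
        CardTableSketch table = new CardTableSketch(4096); // 8 cards
        table.recordOldToYoungWrite(1600);                 // falls in card 3
        System.out.println("cards=" + table.cardCount() + ", dirty=" + table.dirtyCards());
    }
}
```

The table costs one byte per 512 bytes of old generation, and in return a Minor GC only has to examine the cards marked dirty when looking for old-to-young references.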

Case 3: Stop-The-World GC

Determine the goal

The GC log is shown in the following figure (in the GC log, "Full GC" describes the pause type of a collection: it denotes an STW collection, not necessarily a collection of the old generation). According to the GC log, this Full GC took 1.23s. This online service also requires low latency and high availability. The goal of this optimization is to reduce the pause time of a single STW collection and improve availability.

Optimize

First of all, when might an STW Full GC be triggered?

1. Insufficient Perm space;
2. "promotion failed" or "concurrent mode failure" during a CMS GC (concurrent mode failure usually happens while CMS is in progress: the old generation runs short of space and unused objects must be reclaimed as soon as possible, so all threads are stopped, CMS is terminated, and a Serial Old GC is performed directly);
3. Statistics show that the average size promoted to the old generation by Young GC is larger than the remaining space in the old generation;
4. A Full GC triggered deliberately (by running jmap -histo:live [pid]) to avoid fragmentation.

Next, analyze them one by one:

- Rule out reason 2: either of its two cases would leave a distinctive marker in the log, and none is present.
- Rule out reason 3: according to the GC log, old generation usage was only 20%, and there were no large objects over 2G.
- Rule out reason 4: no such command was executed at the time.
- Reason 1: the log shows that the Perm area grew after the Full GC, so the Full GC is inferred to have been caused by the permanent generation running out of space and expanding.

With the cause found, there are two ways to solve it:

1. Set -XX:PermSize to the same value as -XX:MaxPermSize, which forces the virtual machine to fix the permanent generation capacity at startup and avoids automatic expansion at run time.
2. CMS does not collect the Perm area by default. The parameters CMSPermGenSweepingEnabled and CMSClassUnloadingEnabled allow CMS to reclaim the Perm area when its capacity is insufficient.

Since the service does not generate a large number of dynamic classes, there is little benefit from recovering the Perm area, so we adopt solution 1 to fix the size of the Perm area when starting to avoid dynamic expansion.

Optimization result

After adjusting the parameters, the service will no longer have STW GC caused by the expansion of Perm area.

Summary

For services with high performance requirements, it is recommended to set PermSize and MaxPermSize to the same value (starting with JDK 8 the Perm area is gone and Metaspace is used instead; Metaspace lives in native memory rather than in the JVM heap) and likewise to set Xms and Xmx to the same value, which reduces the performance loss caused by automatic expansion and shrinking of memory. This works because the virtual machine claims all the memory specified in the parameters as private when it starts. Even if part of that memory is not used by user code before expansion, it is marked as virtual memory within the virtual machine and is not handed to other processes.
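A quick check of whether a service follows the Xms = Xmx recommendation can be made from its JVM arguments. The sketch below is an illustration with hypothetical class and method names; it only understands the common k/m/g/t size suffixes and only sees arguments reported by the standard RuntimeMXBean.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Checks whether -Xms and -Xmx are both set and equal.
class FixedHeapCheck {
    // Parses sizes such as "1200m" or "1g" into bytes; returns -1 if empty.
    static long parseSize(String text) {
        if (text.isEmpty()) {
            return -1;
        }
        long multiplier = 1;
        switch (Character.toLowerCase(text.charAt(text.length() - 1))) {
            case 'k': multiplier = 1L << 10; break;
            case 'm': multiplier = 1L << 20; break;
            case 'g': multiplier = 1L << 30; break;
            case 't': multiplier = 1L << 40; break;
            default: break; // no suffix: plain bytes
        }
        String digits = multiplier == 1 ? text : text.substring(0, text.length() - 1);
        return Long.parseLong(digits) * multiplier;
    }

    static boolean heapIsFixed(List<String> jvmArgs) {
        long xms = -1;
        long xmx = -1;
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xms")) {
                xms = parseSize(arg.substring(4));
            } else if (arg.startsWith("-Xmx")) {
                xmx = parseSize(arg.substring(4));
            }
        }
        return xms > 0 && xms == xmx;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("heap size fixed at startup: " + heapIsFixed(jvmArgs));
    }
}
```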

IV. Summary

Drawing on the GC optimization cases above:

1. First, to repeat: before performing GC optimization, make sure there is no room left for optimization in the project's architecture and code. An application with a flawed architecture, or with plenty of code-level optimization still undone, cannot be expected to achieve a qualitative leap in performance through GC optimization.
2. Second, the analysis above shows that the virtual machine already contains many optimizations to keep applications running stably, so do not tune for the sake of tuning; improper tuning may be counterproductive.
3. Finally, GC optimization is a systematic and complex task, and no universal tuning strategy satisfies every performance indicator. GC optimization must rest on a thorough understanding of the various garbage collectors to get twice the result with half the effort.

All the cases in this article come from the practical experience of the integration services of the Beijing Business Security Center (also known as risk control). Thanks also go to the risk control colleagues for their professional and careful review, which improved this article. Corrections and additions to the content are welcome.
