2025-02-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains the garbage collection optimization methods used for a high-throughput Java application. The content is straightforward and easy to follow; please read along to learn how GC can be tuned systematically.
High-performance applications form the backbone of the modern web. LinkedIn runs many internal high-throughput services that serve thousands of user requests per second. Responding to these requests with low latency is essential to an optimal user experience.
For example, one feature members use frequently is the feed: a constantly updated list of professional activity and content. The feed appears throughout LinkedIn, including on company pages, school pages and, most importantly, the home page. The underlying feed data platform indexes updates to the various entities in our economic graph (members, companies, groups, and so on), and it must serve relevant updates with high throughput and low latency.
Figure 1: The LinkedIn feed
To ship these high-throughput, low-latency Java applications as products, developers must ensure consistent performance at every stage of the application development cycle. Determining the right garbage collection (GC) settings is critical to achieving these metrics.
This article walks through a series of steps for identifying requirements and optimizing GC, and targets developers interested in a systematic approach to GC tuning for high throughput and low latency. The method comes from LinkedIn's experience building its next-generation feed data platform. It covers, among other things: the CPU and memory overhead of the Concurrent Mark Sweep (CMS) and G1 garbage collectors; avoiding the continuous GC cycles caused by long-lived objects; tuning GC thread task assignment to improve performance; and the OS settings required for predictable GC pause times.
When is the right time to optimize GC?
GC behavior changes with code-level optimizations and workloads. It is therefore important to tune GC on a near-final code base in which performance optimizations have already been implemented. But it is also necessary to do an initial analysis on an end-to-end prototype that uses stub code and simulated workloads representative of production. This captures the true latency and throughput boundaries of the architecture, which in turn informs the decision to scale up or scale out.
In the prototype phase of the next-generation feed data platform, almost all end-to-end functionality was implemented, and the query load served by the current production infrastructure was simulated. From this we obtained a variety of workload characteristics for measuring application performance and GC behavior over sufficiently long runs.
Steps to optimize GC
The following are the overall steps for optimizing GC to meet high-throughput, low-latency requirements, along with the specific details from the feed data platform prototype. The best performance we saw came from ParNew/CMS, but we also experimented with the G1 garbage collector.
1. Understand the basics of GC
It is important to understand how GC works because there are a large number of parameters to adjust. Oracle's Hotspot JVM memory management white paper is a great way to start learning about the Hotspot JVM GC algorithm. To learn about the G1 garbage collector, please check out this paper.
2. Carefully consider GC requirements
To reduce the GC overhead on application performance, first identify which GC characteristics need optimizing. GC characteristics such as throughput and latency should be tested and observed over a long enough period to ensure the data covers multiple GC cycles in which the number of objects processed by the application varies.
Stop-the-world collectors pause application threads while collecting garbage. The duration and frequency of these pauses should not adversely affect the application's ability to meet its SLA.
Concurrent GC algorithms compete with application threads for CPU cycles. This overhead should not affect application throughput.
Non-compacting GC algorithms cause heap fragmentation, which leads to long stop-the-world full GC pauses.
Garbage collection itself consumes memory. Some GC algorithms have a higher memory footprint. If the application needs a large heap, make sure the GC's memory overhead is not too large.
A clear understanding of GC logs and commonly used JVM parameters is necessary for even simple GC tuning, since GC behavior changes as code complexity grows or workload characteristics change.
We started our experiments on Linux with Hotspot Java7u51, a 32GB heap, a 6GB young generation, and -XX:CMSInitiatingOccupancyFraction=70 (the old-generation occupancy at which CMS is triggered). The large heap was chosen to hold a cache of long-lived objects. Once this cache is populated, the proportion of objects promoted to the old generation drops significantly.
With this initial GC configuration, we saw an 80ms young-generation GC pause every three seconds and a 99.9th percentile application latency of 100ms. Such GC behavior is likely adequate for many applications with less strict latency SLAs. However, our goal was to drive the 99.9th percentile latency as low as possible, and for that GC optimization was essential.
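The 99.9th percentile figures quoted throughout this article come from latency measurements over many requests. As an illustrative sketch (my own, not LinkedIn's actual tooling), a nearest-rank percentile over a sample of per-request latencies can be computed like this:

```java
import java.util.Arrays;

// Illustrative sketch: computing a latency percentile (e.g. p99.9)
// from a sample of per-request latencies using the nearest-rank method.
public class LatencyPercentile {

    // Nearest-rank percentile: the smallest observation such that at least
    // `percentile` percent of the sample is at or below it.
    static double percentile(double[] latenciesMs, double percentile) {
        double[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(percentile / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        double[] samples = new double[1000];
        for (int i = 0; i < samples.length; i++) samples[i] = i + 1; // 1..1000 ms
        System.out.println("p99.9 = " + percentile(samples, 99.9) + " ms");
    }
}
```

In production one would use a streaming estimator (histograms or digests) rather than sorting raw samples, but the definition of the metric is the same.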
3. Understand GC metrics
Measure before you optimize. Study the details of the GC log (using these options: -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime) to get an overall picture of the application's GC characteristics.
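For example, -XX:+PrintGCApplicationStoppedTime emits one line per safepoint pause. A minimal sketch (assuming the HotSpot 7/8 log format; the sample log line is invented for illustration) of extracting those pause durations, e.g. to sum total stop-the-world time:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: parse the pause duration out of HotSpot's
// "Total time for which application threads were stopped: N seconds" lines.
public class StoppedTimeParser {

    private static final Pattern STOPPED = Pattern.compile(
        "Total time for which application threads were stopped: ([0-9.]+) seconds");

    // Returns the pause in seconds, or -1 if the line does not match.
    static double parseStoppedSeconds(String logLine) {
        Matcher m = STOPPED.matcher(logLine);
        return m.find() ? Double.parseDouble(m.group(1)) : -1.0;
    }

    public static void main(String[] args) {
        // Hypothetical log line in the HotSpot 7/8 format:
        String line = "2014-06-02T10:13:31.183+0000: 1.126: "
            + "Total time for which application threads were stopped: 0.0086280 seconds";
        System.out.println("pause = " + parseStoppedSeconds(line) + " s");
    }
}
```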
LinkedIn's internal monitoring and reporting systems, inGraphs and Naarad, generate a variety of useful visualizations of these metrics, such as the percentage of time spent in GC pauses, maximum pause duration, and GC frequency over long periods. Beyond Naarad, there are many open source tools, such as gclogviewer, that create visualizations from GC logs.
At this stage, determine whether the GC frequency and pause durations are affecting the application's ability to meet its latency requirements.
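Besides offline log analysis, GC frequency and cumulative pause time can also be sampled at runtime through the standard java.lang.management API, as in this small sketch (collector names vary by JVM and chosen algorithm, e.g. "ParNew" and "ConcurrentMarkSweep" for the configuration used in this article):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Sketch: sample per-collector GC counts and cumulative pause time
// via the standard management beans.
public class GcStats {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Polling these counters periodically and exporting the deltas is one simple way to feed GC frequency and pause-time graphs into a monitoring system.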
4. Reduce GC frequency
In generational GC algorithms, collection frequency can be reduced by (1) reducing the object allocation/promotion rate and (2) increasing the size of the generations.
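As a back-of-the-envelope illustration of point (2) (my own arithmetic, not from the article): the minor-GC interval is roughly the Eden size divided by the allocation rate, so doubling Eden or halving the allocation rate halves the GC frequency. The allocation rate below is a hypothetical figure chosen to match the three-second interval reported earlier.

```java
// Sketch: estimate the expected time between minor GCs from Eden size
// and the application's allocation rate.
public class MinorGcEstimate {

    // Expected seconds between minor GCs (ignores survivor copying and promotion).
    static double intervalSeconds(double edenBytes, double allocBytesPerSec) {
        return edenBytes / allocBytesPerSec;
    }

    public static void main(String[] args) {
        double eden = 4.8e9; // ~4.8 GB Eden (6 GB young gen with SurvivorRatio=8)
        double rate = 1.6e9; // hypothetical 1.6 GB/s allocation rate
        System.out.println("minor GC every ~" + intervalSeconds(eden, rate) + " s");
    }
}
```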
In the Hotspot JVM, the pause time of a young-generation GC depends on the number of objects that survive the collection, not on the size of the young generation itself. The impact of increasing the young-generation size on application performance needs careful assessment:
Increasing the young-generation size may lead to longer young-generation GC pauses if more data survives and is copied to the survivor spaces, or if more data is promoted to the old generation with each collection.
On the other hand, if the number of surviving objects does not increase significantly after each collection, pause times may not lengthen. In that case, reducing the GC frequency may lower overall application latency and/or increase throughput.
For applications that create mostly short-lived objects, only the parameters mentioned above need controlling. For applications that create long-lived objects, note that promoted objects may not be collected by an old-generation GC cycle for a long time. If the old-generation GC trigger threshold (the old-generation occupancy percentage) is set too low, the application can fall into continuous back-to-back GC cycles. Setting a high trigger threshold avoids this problem.
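The reasoning can be made concrete (my own illustration, with hypothetical numbers): if long-lived data occupies fraction f of the old generation at steady state, a trigger threshold at or below f keeps CMS cycling continuously, so the threshold must sit comfortably above f.

```java
// Sketch: smallest whole-percent CMSInitiatingOccupancyFraction that leaves
// `headroomPercent` of slack above the steady-state live fraction of the
// old generation, capped at 100.
public class CmsThreshold {

    static int minTriggerPercent(double liveBytes, double oldGenBytes, int headroomPercent) {
        int livePercent = (int) Math.ceil(100.0 * liveBytes / oldGenBytes);
        return Math.min(100, livePercent + headroomPercent);
    }

    public static void main(String[] args) {
        // Hypothetical: 20 GB of cached long-lived objects in a 26 GB old generation.
        System.out.println(minTriggerPercent(20e9, 26e9, 10) + "%");
    }
}
```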
Because our application maintains a large in-heap cache of long-lived objects, we set the old-generation GC trigger threshold with -XX:CMSInitiatingOccupancyFraction=92 -XX:+UseCMSInitiatingOccupancyOnly. We also tried increasing the young-generation size to reduce young-collection frequency, but abandoned that change because it increased application latency.
5. Reduce GC pause time
Reducing the young-generation size can shorten young-generation GC pause times because less data is copied to the survivor spaces or promoted. However, as mentioned earlier, we must assess the impact of a smaller young generation, and the resulting higher GC frequency, on overall application throughput and latency. Young-generation GC pause times also depend on the tenuring threshold and the survivor-space size (see step 6).
With CMS, try to minimize heap fragmentation and the long stop-the-world full GC pauses of old-generation collection that come with it. Do this by controlling the object promotion rate and by reducing the value of -XX:CMSInitiatingOccupancyFraction to trigger old-generation GC at a lower threshold. For a detailed discussion of these options and their trade-offs, see the articles on Java garbage collection and Java garbage collection essentials.
We observed that most young-generation objects were collected in the Eden space and almost no objects died in the survivor spaces, so we reduced the tenuring threshold from 8 to 2 (option: -XX:MaxTenuringThreshold=2) to shorten the time young-generation collection spends copying data.
We also noticed that young-generation pause times increased as old-generation occupancy rose, meaning that pressure from the old generation made object promotion take longer. To address this, we increased the total heap size to 40GB and reduced -XX:CMSInitiatingOccupancyFraction to 80 so that old-generation collection starts earlier. Although the threshold was lowered, the larger heap avoids continuous old-generation GC. At this stage, we achieved 70ms young-generation pauses and a 99.9th percentile latency of 80ms.
6. Optimize task assignment for GC worker threads
To further shorten young-generation pauses, we explored options that tune how tasks are bound to GC threads.
The -XX:ParGCCardsPerStrideChunk option controls the task granularity of the GC worker threads and helps get the best performance out of a patch written to optimize the card table scanning time of young-generation collection. Interestingly, young-generation GC times lengthen as the old generation fills. Setting this option to 32768 brought young-generation pause times down to an average of 50ms and the 99.9th percentile application latency to 60ms.
There are other options that map tasks to GC threads: if the OS allows it, -XX:+BindGCTaskThreadsToCPUs binds GC threads to individual CPU cores, and -XX:+UseGCTaskAffinity assigns tasks to GC worker threads using an affinity parameter. However, our application saw no benefit from these options; in fact, some investigation suggests they do not work on Linux systems.
7. Understand the CPU and memory overhead of GC
Concurrent GC usually increases CPU usage. Compared with the well-behaved default CMS settings, we observed that the extra CPU consumed by the G1 collector's concurrent GC work significantly reduced the application's throughput and increased its latency. G1 may also impose more memory overhead on the application than CMS. For low-throughput, non-CPU-bound applications, high GC CPU usage may not be a concern.
Figure 2: CPU usage percentage of ParNew/CMS vs. G1; the nodes with noticeably variable CPU usage ran G1 with the option -XX:G1RSetUpdatingPauseTimePercent=20
Figure 3: Requests served per second by ParNew/CMS vs. G1; the nodes with lower throughput ran G1 with the option -XX:G1RSetUpdatingPauseTimePercent=20
8. Optimize system memory and I/O management for GC
Generally speaking, pathological GC pauses show (1) low user time, high system time and high real (wall-clock) time, or (2) low user time, low system time and high real time. Either pattern means the problem lies in the underlying process or OS settings. Case (1) may indicate that Linux is stealing pages from the JVM, and case (2) may indicate that GC threads started while Linux was flushing the disk cache and got stuck in the kernel waiting for I/O. Refer to this presentation for how to set parameters in these cases.
To avoid a performance penalty at runtime, launch the application with the JVM option -XX:+AlwaysPreTouch, which touches and zeroes all heap pages at startup. Set vm.swappiness to zero so that the OS does not swap pages unless absolutely necessary.
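The mechanism behind -XX:+AlwaysPreTouch can be sketched in application code (purely illustrative, assuming a 4KB page size rather than querying the OS): writing one byte per page of a direct buffer forces the kernel to back every page with real memory up front, paying the page-fault cost at startup instead of during latency-sensitive request handling.

```java
import java.nio.ByteBuffer;

// Sketch: pre-touch every OS page of a direct buffer at startup, analogous
// to what -XX:+AlwaysPreTouch does for the Java heap.
public class PreTouch {

    static final int PAGE_SIZE = 4096; // common Linux page size; assumed, not queried

    static void preTouch(ByteBuffer buf) {
        for (int i = 0; i < buf.capacity(); i += PAGE_SIZE) {
            buf.put(i, (byte) 0); // touch one byte per page
        }
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(64 * 1024 * 1024); // 64 MB
        preTouch(direct);
        System.out.println("pre-touched " + (direct.capacity() / PAGE_SIZE) + " pages");
    }
}
```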
You could use mlock to pin the JVM's pages in memory so that the OS never swaps them out. However, if the system exhausts its memory and swap space, the OS reclaims memory by killing processes. Typically, the Linux kernel picks a process with a high resident memory footprint that has not been running long (the OOM-killer's process-selection workflow); for us, that would most likely be our application. A service that degrades gracefully is better than one that fails suddenly, which signals poor operability, so instead of mlock we use vm.swappiness to avoid a possible swapping penalty.
GC optimization for the LinkedIn feed data platform
For this platform prototype, we optimized garbage collection with two Hotspot JVM algorithm combinations:
ParNew for the young generation and CMS for the old generation.
G1 for both the young and old generations. G1 aims to deliver stable, predictable pause times under half a second for heaps of 6GB or larger. But in our experiments with G1, despite tuning various parameters, we could not get GC performance or pause times as predictable as with ParNew/CMS. We also looked into a bug [3] related to a memory leak when using G1, but could not determine the root cause.
With ParNew/CMS, the application had a 40-60ms young-generation pause every three seconds and one CMS cycle per hour. The JVM options were as follows:
// JVM sizing options
-server -Xms40g -Xmx40g -XX:MaxDirectMemorySize=4096m -XX:PermSize=256m -XX:MaxPermSize=256m
// Young generation options
-XX:NewSize=6g -XX:MaxNewSize=6g -XX:+UseParNewGC -XX:MaxTenuringThreshold=2 -XX:SurvivorRatio=8 -XX:+UnlockDiagnosticVMOptions -XX:ParGCCardsPerStrideChunk=32768
// Old generation options
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+CMSClassUnloadingEnabled -XX:CMSInitiatingOccupancyFraction=80 -XX:+UseCMSInitiatingOccupancyOnly
// Other options
-XX:+AlwaysPreTouch -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:-OmitStackTraceInFastThrow
With these options, the 99.9th percentile application latency dropped to 60ms while serving thousands of read requests per second.
Thank you for reading. That concludes this look at garbage collection optimization methods for Java applications; after studying this article you should have a deeper understanding of the topic, though the specific settings need to be verified against your own workload in practice.