This article introduces the basic principles of JVM garbage collection (GC) tuning. The content is detailed and easy to follow, the steps are simple and quick to apply, and it should serve as a useful reference; after reading it you will have a good grasp of the fundamentals of GC tuning. Let's take a look.
Note on terminology: in this article, "capacity" means system capacity, i.e., the hardware configuration (CPU, memory, and other resources) available to the system.
GC tuning (Tuning Garbage Collection) follows the same principles as any other performance tuning. Beginners are easily confused by the more than 200 GC-related parameters and end up randomly adjusting a few of them, or modifying a few lines of code, just to see what happens. Instead, you can make sure your tuning is heading in the right direction by following these steps:
List performance tuning metrics (State your performance goals)
Perform the test (Run tests)
Check results (Measure the results)
Compare with the goal (Compare the results with the goals)
If the target is not met, modify the configuration parameters and continue testing (go back to running tests)
The first step is to set clear GC performance goals. Three dimensions are common to all performance monitoring and management:
Latency (delay)
Throughput (Throughput)
Capacity (system capacity)
Let's first explain the basic concepts and then demonstrate how to use these metrics. If you are familiar with concepts such as latency, throughput, and system capacity, you can skip this section.
Core concept (Core Concepts)
Let's start with the assembly line of a factory. Workers on the line assemble ready-made components into bicycles. Observing the line, we find that it takes 4 hours for a bicycle to travel from the first component to the end of the production line.
Continuing to observe, we also find that a finished bicycle comes off the line every minute, 24 hours a day. Simplifying the model and ignoring maintenance windows, we conclude that the assembly line can assemble 60 bicycles per hour.
Note: a "time window" (window period) is simply a bounded period of time during which something is allowed or required to happen, much like the opening hours of a ticket window at a station.
These two measurements give us the key performance characteristics of the production line: latency and throughput:
Latency of the production line: 4 hours
Throughput of the production line: 60 bicycles per hour
Note that latency can be measured in whatever time unit the situation requires, from nanoseconds (nanosecond) to millennia (millennia). The throughput of a system is the number of operations completed per unit of time, where an "operation" is whatever is relevant to the particular system. In this case the chosen unit of time is hours and the operation is assembling a bicycle.
Now that we have the concepts of latency and throughput, let's actually tune the factory. Demand for bicycles has been stable for a while: the line assembles bicycles with a latency of 4 hours and throughput has held steady at 60 bicycles per hour for months. Now suppose the sales team is suddenly very successful and demand for bicycles doubles. Customers no longer need 60 × 24 = 1,440 bicycles per day, but 2 × 1,440 = 2,880 per day. The boss is not satisfied with the factory's production capacity and wants to change something to increase it.
It is easy for the general manager to reach the right conclusion: the latency of the system is not what concerns him; he cares about the total number of bicycles produced per day. Having reached this conclusion, and assuming the factory has sufficient funds, the immediate measure is to increase throughput in order to raise production capacity.
The result is that the factory soon has two identical production lines, each able to produce a finished bicycle every minute. As you would expect, the number of bicycles produced per day doubles, to 2,880 per day. It is important to note that the assembly time of an individual bicycle has not been reduced at all: it still takes 4 hours from start to finish.
Coincidentally, this performance optimization increases throughput and capacity at the same time. Generally speaking, we will first measure the current system performance, and then set new goals, only optimize some aspect of the system to meet the performance indicators.
A very important decision has been made here-to increase throughput rather than reduce latency. While increasing the throughput, we also need to increase the system capacity. Compared with the original situation, two assembly lines are now needed to produce the required bicycles. In this case, increasing the throughput of the system is not free and requires horizontal scaling to meet the increased throughput requirements.
When dealing with a performance problem, you should also keep in mind a seemingly unrelated alternative solution: if the time between finished bicycles leaving the line could be reduced from 1 minute to 30 seconds, throughput would also double.
In other words, either latency is reduced, or someone pays for more capacity. There is a similar saying in software engineering: behind every performance problem there are two different solutions: throw more machines at the problem, or spend effort improving the underperforming code.
Latency (delay)
Latency metrics for GC are determined by general latency requirements. Latency metrics are usually described as follows:
All transactions must be answered within 10 seconds
90% of order payment operations must be completed within 3 seconds.
Recommended products must be displayed in front of users within 100 ms
When dealing with such performance goals, you need to ensure that GC pauses do not consume too much of the latency budget during a transaction, otherwise the goal will not be met. What "too much" means has to be determined case by case, taking other contributors to latency into account, such as round-trips to external data sources, lock contention (lock contention), and other safepoint pauses.
Assume that the performance requirement is that 90% of transactions must complete within 1,000 ms and no transaction may take longer than 10 seconds. As a common rule of thumb, assume that GC pauses may consume no more than 10% of that budget. In other words, 90% of GC pauses must finish within 100 ms, and no GC pause may exceed 1,000 ms. For simplicity, we ignore the possibility of multiple GC pauses occurring within the same transaction.
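To make the arithmetic explicit, here is a minimal sketch (not from the original article) that derives the GC pause limits from the latency requirements above; the 10% share is the rule-of-thumb assumption stated in the text:

public class PauseBudget {
    public static void main(String[] args) {
        double gcShare = 0.10;          // assumed rule of thumb: GC may use 10% of the latency budget
        long p90TransactionMs = 1000;   // 90% of transactions must finish within 1000 ms
        long maxTransactionMs = 10_000; // no transaction may take longer than 10 seconds

        System.out.println("90% of GC pauses must finish within "
                + (long) (p90TransactionMs * gcShare) + " ms");
        System.out.println("No GC pause may exceed "
                + (long) (maxTransactionMs * gcShare) + " ms");
    }
}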
With a formal requirement in place, the next step is to check the pause times. Many tools can help here (they are described in more detail in the later chapter on GC tuning tools); in this article we check GC pause times by reading the GC log. The relevant information is scattered across different log fragments, for example:
*-06-04T13:34:16.974-0200: 2.578: [Full GC (Ergonomics) [PSYoungGen: 93677K->70109K(254976K)] [ParOldGen: 499597K->511230K(761856K)] 593275K->581339K(1016832K), [Metaspace: 2936K->2936K(1056768K)], 0.0713174 secs] [Times: user=0.21 sys=0.02, real=0.07 secs]
This shows a single GC pause, triggered at *-06-04T13:34:16, which corresponds to 2.578 seconds (2,578 ms) after the JVM started.
This event paused the application threads for 0.0713174 seconds. Although about 210 ms of CPU time was consumed (the user time in the log), the machine has multiple cores, so the number that matters to us is the total time the application threads were stopped; the parallel GC was used here, so the pause was roughly 70 ms. This GC pause is below the 100 ms threshold and therefore meets the requirement.
To continue the analysis, extract the pause data for all GC events from the log and aggregate it to see whether the requirement is met.
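As an illustration, here is a minimal sketch (not part of the original article) that scans a GC log in the JDK 8 -XX:+PrintGCDetails format used above, extracts the stop-the-world pause durations from the real= field, and reports the longest one; the log file name is an assumption:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaxGcPause {
    public static void main(String[] args) throws Exception {
        // matches the "real=0.07 secs" part of each GC event line
        Pattern real = Pattern.compile("real=(\\d+\\.\\d+) secs");
        double maxSec = 0;
        for (String line : Files.readAllLines(Paths.get("Producer_gc.log"))) {
            Matcher m = real.matcher(line);
            if (m.find()) {
                maxSec = Math.max(maxSec, Double.parseDouble(m.group(1)));
            }
        }
        System.out.printf("Longest stop-the-world pause: %.0f ms%n", maxSec * 1000);
    }
}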
Throughput (Throughput)
Throughput requirements are quite different from latency requirements, but both are derived from the application's general requirements. General throughput requirements (generic requirements for throughput) look something like this:
The solution must process 1 million orders per day
The solution must support 1,000 logged-in users performing one of the operations A, B, or C within 5-10 seconds
The weekly batch statistics over all customers must take no more than 6 hours, within the time window from Sunday 12:00 midnight to 6:00 the next morning.
As you can see, the throughput requirement is not for a single operation, but for how many operations the system must complete in a given time. Similar to latency requirements, GC tuning requires determining the total time consumed by the GC behavior. The amount of time that each system can accept varies, and in general, the total time taken by GC cannot exceed 10%.
Now suppose the requirement is to process 1000 transactions per minute. At the same time, the total GC pause time per minute cannot exceed 6 seconds (that is, 10%).
With formal requirements, the next step is to obtain relevant information. Still extracting data from the GC log, you can see information like this:
*-06-04T13:34:16.974-0200: 2.578: [Full GC (Ergonomics) [PSYoungGen: 93677K->70109K(254976K)] [ParOldGen: 499597K->511230K(761856K)] 593275K->581339K(1016832K), [Metaspace: 2936K->2936K(1056768K)], 0.0713174 secs] [Times: user=0.21 sys=0.02, real=0.07 secs]
At this point we are interested in the user and system (sys) times rather than the real time. The CPU time we care about here is 0.23 s (user 0.21 s + sys 0.02 s), which is the CPU consumed by the GC pause. Importantly, the system runs on a multi-core machine, so the actual stop-the-world pause was 0.0713174 seconds, and that is the number used in the calculation below.
Once the useful information has been extracted, all that remains is to add up the GC pause time for each minute and check whether the requirement is met: the total pause time per minute must not exceed 6,000 milliseconds (6 seconds).
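As a rough illustration, the following sketch (not from the original article) groups the stop-the-world pauses by the minute of JVM uptime in which they occurred and sums them, so each minute can be compared against the 6-second budget; the log file name and format assumptions are the same as in the earlier sketch:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcPausePerMinute {
    public static void main(String[] args) throws Exception {
        // matches the uptime timestamp (e.g. "2.578:") and the "real=0.07 secs" pause on the same line
        Pattern p = Pattern.compile("(\\d+\\.\\d+): \\[.*real=(\\d+\\.\\d+) secs");
        Map<Long, Double> pausePerMinute = new TreeMap<>();
        for (String line : Files.readAllLines(Paths.get("Producer_gc.log"))) {
            Matcher m = p.matcher(line);
            if (m.find()) {
                long minute = (long) (Double.parseDouble(m.group(1)) / 60); // JVM uptime, bucketed by minute
                pausePerMinute.merge(minute, Double.parseDouble(m.group(2)), Double::sum);
            }
        }
        pausePerMinute.forEach((minute, secs) ->
                System.out.printf("minute %d: %.2f s paused (budget: 6 s)%n", minute, secs));
    }
}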
Capacity (system capacity)
System capacity (Capacity) requirements are additional constraints on the hardware environment when throughput and latency targets are met. Most of these requirements come from computing resources or budget reasons. For example:
The system must be able to deploy to Android devices with less than 512 MB memory
The system must be deployed on an Amazon EC2 instance and the configuration must not exceed c3.xlarge (4-core 8GB).
Monthly Amazon EC2 bill must not exceed $12000
Therefore, the system capacity must be considered on the basis of meeting the latency and throughput requirements. It can be said that if there are unlimited computing resources to squander, then any latency and throughput metrics are not a problem, but the reality is that budget and other constraints limit the resources available.
Related examples
After introducing the three dimensions of performance tuning, let's do some practical work to achieve the GC performance target.
Take a look at the following code:
// imports (omitted in the original listing)
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Producer implements Runnable {

    private static ScheduledExecutorService executorService = Executors.newScheduledThreadPool(2);

    private Deque<byte[]> deque;
    private int objectSize;
    private int queueSize;

    public Producer(int objectSize, int ttl) {
        this.deque = new ArrayDeque<byte[]>();
        this.objectSize = objectSize;
        this.queueSize = ttl * 1000;
    }

    @Override
    public void run() {
        // each run creates 100 objects; objects beyond the ttl-based queue size are dropped
        for (int i = 0; i < 100; i++) {
            deque.add(new byte[objectSize]);
            if (deque.size() > queueSize) {
                deque.poll();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        executorService.scheduleAtFixedRate(new Producer(1024 * 1024 / 1000, 5), 0, 100, TimeUnit.MILLISECONDS);
        // the ttl argument of this second producer was lost in the original listing; 120 is assumed here
        executorService.scheduleAtFixedRate(new Producer(50 * 1024 * 1024 / 1000, 120), 0, 100, TimeUnit.MILLISECONDS);
        TimeUnit.MINUTES.sleep(10);
        executorService.shutdownNow();
    }
}
This program submits two jobs to run every 100 ms. Each job simulates objects with a specific lifecycle: objects are created, kept for a predetermined amount of time, and then the references are dropped so that the GC can automatically reclaim the memory they occupied.
When running the sample program, turn on GC logging with the following JVM parameters:
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
You should also add the JVM parameter -Xloggc to specify where the GC log is written, like this:
-Xloggc:C:\Producer_gc.log
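Putting the flags together, a full command line for one of the test runs might look like this (assuming the Producer class has been compiled and is on the classpath; the log path is just an example):

java -Xmx12g -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Xloggc:C:\Producer_gc.log Producer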
You can see the behavior of GC in the log file, like this:
*-06-04T13:34:16.119-0200: 1.723: [GC (Allocation Failure) [PSYoungGen: 114016K->73191K(234496K)] 421540K->421269K(745984K), 0.0858176 secs] [Times: user=0.04 sys=0.06, real=0.09 secs]
*-06-04T13:34:16.738-0200: 2.342: [GC (Allocation Failure) [PSYoungGen: 234462K->93677K(254976K)] 582540K->593275K(766464K), 0.2357086 secs] [Times: user=0.11 sys=0.14, real=0.24 secs]
*-06-04T13:34:16.974-0200: 2.578: [Full GC (Ergonomics) [PSYoungGen: 93677K->70109K(254976K)] [ParOldGen: 499597K->511230K(761856K)] 593275K->581339K(1016832K), [Metaspace: 2936K->2936K(1056768K)], 0.0713174 secs] [Times: user=0.21 sys=0.02, real=0.07 secs]
Based on the information in the log, you can improve performance through three optimization goals:
Ensure that in the worst case, the GC pause time does not exceed the predetermined threshold
Ensure that the total thread pause time does not exceed the predetermined threshold
Reduce hardware configuration and costs while ensuring that latency and throughput targets are met.
To this end, run the code for 10 minutes with three different configurations, and get three different results, summarized as follows:
Heap size   GC algorithm              Useful work   Longest pause
-Xmx12g     -XX:+UseConcMarkSweepGC   89.8%         560 ms
-Xmx12g     -XX:+UseParallelGC        91.5%         1104 ms
-Xmx8g      -XX:+UseConcMarkSweepGC   66.3%         1610 ms
Using different GC algorithms and different memory configurations, run the same code to measure the relationship between GC pause time and latency and throughput. The details and results of the experiment are described in detail in later chapters.
Note that in order to keep it as simple as possible, only a few input parameters have been changed in the example, and this experiment is not tested under different CPU numbers or different heap layouts.
Tuning for Latency (tuning delay indicator)
Suppose there is a requirement that each job must be processed within 1,000 ms. We know that the actual job processing takes only 100 ms, so, simplifying, we can derive the latency requirement for GC pauses by subtracting the two: GC pauses may not exceed 900 ms. Answering this question only requires parsing the GC log files and finding the single longest GC pause.
Let's take a look at the three configurations used in the test:
Heap size   GC algorithm              Useful work   Longest pause
-Xmx12g     -XX:+UseConcMarkSweepGC   89.8%         560 ms
-Xmx12g     -XX:+UseParallelGC        91.5%         1104 ms
-Xmx8g      -XX:+UseConcMarkSweepGC   66.3%         1610 ms
As you can see, one of the configurations meets the requirements. The running parameters are:
java -Xmx12g -XX:+UseConcMarkSweepGC Producer
In the corresponding GC log, the maximum pause time is 560 ms, which satisfies the latency target of 900 ms. If the throughput and system capacity requirements are also still met, the GC tuning goal has been achieved and the tuning is done.
Tuning for Throughput (Throughput tuning)
Assume that the throughput target is 13,000,000 operations completed per hour. Using the same three configurations as above, one of them meets this requirement:
Heap size   GC algorithm              Useful work   Longest pause
-Xmx12g     -XX:+UseConcMarkSweepGC   89.8%         560 ms
-Xmx12g     -XX:+UseParallelGC        91.5%         1104 ms
-Xmx8g      -XX:+UseConcMarkSweepGC   66.3%         1610 ms
The command line parameters for this configuration are:
java -Xmx12g -XX:+UseParallelGC Producer
As you can see, with this configuration GC takes up 8.5% of the CPU time, and the remaining 91.5% is useful computing time. For simplicity, we ignore the other safepoint pauses in this example. Now consider the following:
Each job takes 100 ms of CPU time on a single core, and each job performs 100 operations (the 100 iterations of the loop in the code above)
Therefore one core can execute 600 jobs, i.e. 60,000 operations, per minute
One core can therefore execute 3,600,000 operations per hour
With four CPU cores, the theoretical maximum is 4 × 3,600,000 = 14,400,000 operations per hour
Adjusting this theoretical maximum by the useful-work ratio of 91.5% gives roughly 14,400,000 × 0.915 ≈ 13,176,000 operations per hour, which meets the requirement of 13,000,000.
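As a quick sanity check of the arithmetic above, here is a minimal sketch (not from the original article) that reproduces the calculation; the numbers are taken directly from the text and the table:

public class ThroughputEstimate {
    public static void main(String[] args) {
        int cores = 4;
        long operationsPerJob = 100;   // each job performs 100 operations (the loop in Producer)
        long jobMillis = 100;          // one job takes 100 ms on a single core
        long jobsPerHourPerCore = 3_600_000L / jobMillis;                           // 36,000 jobs
        long theoreticalOpsPerHour = cores * jobsPerHourPerCore * operationsPerJob; // 14,400,000
        double usefulWorkRatio = 0.915;                                             // ParallelGC run
        long achievableOpsPerHour = (long) (theoreticalOpsPerHour * usefulWorkRatio);

        System.out.println("Theoretical maximum: " + theoreticalOpsPerHour + " operations/hour");
        System.out.println("With GC overhead:    " + achievableOpsPerHour + " operations/hour (target: 13,000,000)");
    }
}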
It is worth mentioning that this configuration is a problem if the latency target must also be met: in the worst case the GC pause time is 1,104 ms, almost twice the maximum pause of the previous configuration.
Tuning for Capacity (tuning system capacity)
Suppose the software must be deployed on commodity-class server hardware with 4 cores and 10 GB of RAM. The system capacity requirement then becomes: the maximum heap size must not exceed 8 GB. With this requirement, we need to switch to the third configuration for testing:
Heap size   GC algorithm              Useful work   Longest pause
-Xmx12g     -XX:+UseConcMarkSweepGC   89.8%         560 ms
-Xmx12g     -XX:+UseParallelGC        91.5%         1104 ms
-Xmx8g      -XX:+UseConcMarkSweepGC   66.3%         1610 ms
The program can be executed with the following parameters:
java -Xmx8g -XX:+UseConcMarkSweepGC Producer
The test result is a significant increase in latency and a significant reduction in throughput:
Now GC takes up considerably more CPU resources; with this configuration only 66.3% of the CPU time is useful work. As a result, throughput drops from the best case of 13,176,000 operations per hour to roughly 9,547,200 operations per hour (14,400,000 × 66.3%).
In the worst case, the pause time grows to 1,610 ms instead of 560 ms.
Having looked at these three dimensions, you should understand that GC tuning is not a matter of generic "performance" optimization: you need to consider, measure, and tune latency and throughput as separate dimensions, while also respecting the system capacity constraints.
This concludes this introduction to the basic principles of JVM garbage collection (GC) tuning. Thank you for reading!