How to explore and practice the new generation of garbage collector ZGC


2025-07-08 Update From: SLTechnology News&Howtos


This article discusses the exploration and practice of ZGC, the new generation of garbage collector, which many readers may not know well. The editor has summarized the following content in the hope that you will gain something from it.

ZGC (The Z Garbage Collector) is a low-latency garbage collector introduced in JDK 11. Its design goals include:

Pause time does not exceed 10ms.

Pause time does not increase with the size of the heap or the size of the live-object set.

Support for heaps from 8MB to 4TB (with 16TB support planned).

These design goals show that ZGC suits services that need large heaps and low latency. The rest of this article describes how ZGC is applied in low-latency scenarios and how well it performs there.

The pain of GC

Many low-latency, high-availability Java services are plagued by GC pauses. A GC pause is a Stop-The-World (STW) period during garbage collection: all application threads stop and wait for the pause to end. Take Meituan's risk control service as an example. Some upstream businesses require the risk control service to return results within 65ms with 99.99% availability, but because of GC pauses we failed to meet that availability goal. At the time the service used the CMS garbage collector: a single Young GC took 40ms and occurred 10 times a minute, and the average response time of the interface was 30ms. By calculation, (40ms + 30ms) * 10 / 60000ms = 1.17% of requests see their response time increase by somewhere between 0 and 40ms, and 30ms * 10 / 60000ms = 0.5% of requests see it increase by the full 40ms. GC pauses therefore have a large impact on response time. To reduce that impact, we tuned the service to shorten individual GC pauses and to lower GC frequency, and we also tested the G1 garbage collector, but none of these three measures reduced the impact of GC on service availability.
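The arithmetic above can be reproduced in a few lines of Java. This is a minimal sketch that assumes requests arrive uniformly over time; the class and method names are illustrative and do not come from any Meituan tool.

```java
import java.util.Locale;

final class GcPauseImpact {
    /** Share of requests whose processing window overlaps a GC pause. */
    static double affectedShare(double pauseMs, double avgResponseMs, int gcPerMinute) {
        return (pauseMs + avgResponseMs) * gcPerMinute / 60_000.0;
    }

    /** Share of requests that are delayed by the full pause length. */
    static double fullyDelayedShare(double avgResponseMs, int gcPerMinute) {
        return avgResponseMs * gcPerMinute / 60_000.0;
    }

    public static void main(String[] args) {
        // Figures from the text: 40ms Young GC, 30ms average response, 10 GCs per minute.
        System.out.println(String.format(Locale.ROOT, "affected: %.2f%%",
                100 * affectedShare(40, 30, 10)));
        System.out.println(String.format(Locale.ROOT, "fully delayed: %.2f%%",
                100 * fullyDelayedShare(30, 10)));
    }
}
```

With the figures from the text, this prints 1.17% for affected requests and 0.50% for requests delayed by the full 40ms.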

CMS and G1 pause time bottleneck

Before introducing ZGC, let's review the GC process of CMS and G1 and where their pause times bottleneck. The young-generation collector of CMS, G1, and ZGC are all based on the mark-copy algorithm, but differences in how they implement it lead to huge differences in performance.

The mark-copy algorithm is used by the young generation of CMS (ParNew is the default young-generation collector for CMS) and by the G1 garbage collector. It can be divided into three phases:

Marking phase: starting from the GC Roots, mark the live objects.

Transfer phase: copy the live objects to new memory addresses.

Relocation phase: because the transfer changes object addresses, all pointers to an object's old address are updated to the object's new address.
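The three phases can be illustrated with a toy model. The sketch below is an assumption-laden simplification: the "heap" is a map from integer addresses to objects, and nothing here reflects how HotSpot actually lays out or scans memory. All names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class MarkCopySketch {
    /** A toy heap object: a name plus outgoing references stored as addresses. */
    static final class Obj {
        final String name;
        final List<Integer> refs = new ArrayList<>();
        Obj(String name) { this.name = name; }
    }

    /** Runs one collection and returns the new space; roots are rewritten in place. */
    static Map<Integer, Obj> collect(Map<Integer, Obj> fromSpace, List<Integer> roots, int toSpaceBase) {
        // Phase 1 (marking): traverse from the GC Roots and record every reachable address.
        Set<Integer> live = new LinkedHashSet<>();
        Deque<Integer> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            int addr = pending.pop();
            if (live.add(addr)) {
                pending.addAll(fromSpace.get(addr).refs);
            }
        }

        // Phase 2 (transfer): copy each live object and remember old address -> new address.
        Map<Integer, Integer> forwarding = new HashMap<>();
        Map<Integer, Obj> toSpace = new HashMap<>();
        int next = toSpaceBase;
        for (int oldAddr : live) {
            Obj source = fromSpace.get(oldAddr);
            Obj copy = new Obj(source.name);
            copy.refs.addAll(source.refs);
            forwarding.put(oldAddr, next);
            toSpace.put(next, copy);
            next++;
        }

        // Phase 3 (relocation): rewrite every pointer from the old address to the new one.
        for (Obj obj : toSpace.values()) {
            obj.refs.replaceAll(forwarding::get);
        }
        roots.replaceAll(forwarding::get);
        return toSpace;
    }
}
```

Unreachable objects are never copied, so they disappear when the old space is discarded. In a stop-the-world collector all three phases run while application threads are paused, which is why the cost of phase 2 matters so much in the analysis that follows.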

Taking G1 as an example, the main bottleneck in its pause time can be found by walking through the mark-copy process in G1 (used by both Young GC and Mixed GC). The G1 garbage collection cycle is shown in the following figure:

The Mixed GC process of G1 can be divided into three stages: marking, cleanup, and copying.

Marking stage pause analysis

Initial marking phase: marks all direct children of the GC Roots. This phase is STW. Because the number of GC Roots is small, it usually takes very little time.

Concurrent marking phase: performs reachability analysis on the objects in the heap, starting from the GC Roots, to find the live objects. This phase is concurrent, that is, application threads and GC threads can be active at the same time. Concurrent marking takes a lot of time, but because it is not STW, we do not care much about how long it takes.

Remark phase: re-marks objects that changed during the concurrent marking phase. This phase is STW.

Cleanup stage pause analysis

The cleanup phase counts the regions that contain live objects and the regions that do not. During this phase, garbage objects are not reclaimed and live objects are not copied. This phase is STW.

Copy stage pause analysis

The transfer phase of the copying algorithm must allocate new memory and copy the object's member variables. The transfer phase is STW. Memory allocation usually takes very little time, but copying member variables can take a long time, because copy time is proportional to the number of surviving objects and to object complexity: the more complex the object, the longer the copy takes.

Of the four STW phases, initial marking is quick because it marks only the GC Roots. Remark is quick because few objects are involved. Cleanup is quick because the number of memory regions is small. The transfer phase takes a long time because it must process every live object. The bottleneck in G1 pause time is therefore the STW transfer phase of mark-copy. Why can't the transfer phase run concurrently, as the marking phase does? The main reason is that G1 does not solve the problem of accurately locating an object's address while the object is being transferred.

In the Young GC of both G1 and CMS, the entire mark-copy process is STW, so it is not described in detail here.

ZGC principles

Fully concurrent ZGC

Like ParNew in CMS and like G1, ZGC uses the mark-copy algorithm, but with significant improvements: ZGC is almost fully concurrent in the marking, transfer, and relocation phases, which is the key reason its pause time stays under the 10ms goal.

The ZGC garbage collection cycle is shown in the following figure:

ZGC has only three STW phases: initial mark, remark, and initial transfer. Initial mark and initial transfer only need to scan the GC Roots, so their processing time is proportional to the number of GC Roots and is generally very short. The STW time of the remark phase is also very short, at most 1ms; if it exceeds 1ms, ZGC enters the concurrent marking phase again. In other words, almost all pauses in ZGC depend only on the size of the GC Roots set, and pause time does not grow with the size of the heap or of the live-object set. In G1, by contrast, the transfer phase is entirely STW, and pause time grows with the size of the live-object set.

Key technologies of ZGC

Using colored pointers and read barriers, ZGC solves the problem of accessing objects accurately while they are being transferred, which makes concurrent transfer possible. The general principle is as follows. "Concurrent" in concurrent transfer means that application threads keep accessing objects while GC threads are transferring them. If an object has been transferred but a reference to it has not yet been updated, an application thread could access the old address and cause an error. In ZGC, an application thread accessing an object triggers the read barrier; if the barrier finds that the object has been moved, it updates the pointer being read to the object's new address, so the application thread always accesses the new address. How does the JVM tell that an object has been moved? It uses the address in the object reference itself, that is, the colored pointer. The following sections describe colored pointers and read barriers in detail.

Colored pointers

Colored pointers are a technique for storing information in the pointer itself.

ZGC supports only 64-bit systems and divides the 64-bit virtual address space into multiple subspaces, as shown in the following figure:

Among them, [0~4TB) corresponds to the Java heap, [4TB ~ 8TB) is called M0 address space, [8TB ~ 12TB) is called M1 address space, [12TB ~ 16TB) is reserved unused, and [16TB ~ 20TB) is called Remapped space.

When an application creates an object, it first requests a virtual address in the heap space, but that virtual address is not mapped to a real physical address. ZGC also requests a virtual address for the object in each of the M0, M1, and Remapped address spaces. These three virtual addresses map to the same physical address, but only one of the three spaces is valid at any one time. ZGC sets up three virtual address spaces because it trades space for time to reduce GC pause time. The space being traded is virtual address space, not real physical memory. How the three spaces are switched is described in detail later.

Corresponding to the above address space partition, ZGC actually uses only bits 0-41 of the 64-bit address space, while bits 42-45 store metadata, and bits 47-63 are fixed at 0.

ZGC stores object liveness information in bits 42-45 of the pointer, which is completely different from traditional garbage collectors, which keep that information in the object header.
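The address ranges listed above imply which bit selects which view: 4TB is 2^42, 8TB is 2^43, and 16TB is 2^44. The following sketch decodes a pointer under that reading. It is an illustration derived from the ranges in this article, not a copy of HotSpot's definitions, and the names are hypothetical.

```java
final class ColoredPointerSketch {
    static final long OFFSET_MASK   = (1L << 42) - 1; // bits 0-41: object address
    static final long METADATA_MASK = 0xFL << 42;     // bits 42-45: metadata
    static final long MARKED0  = 1L << 42;            // [4TB, 8TB): M0 view
    static final long MARKED1  = 1L << 43;            // [8TB, 12TB): M1 view
    static final long REMAPPED = 1L << 44;            // [16TB, 20TB): Remapped view

    /** Strips the metadata bits and returns the plain heap address. */
    static long offset(long pointer) {
        return pointer & OFFSET_MASK;
    }

    /** Names the address view a pointer currently belongs to. */
    static String view(long pointer) {
        long metadata = pointer & METADATA_MASK;
        if (metadata == MARKED0) return "M0";
        if (metadata == MARKED1) return "M1";
        if (metadata == REMAPPED) return "Remapped";
        return "other";
    }

    /** Re-colors a pointer: same object address, different view. */
    static long withView(long pointer, long viewBit) {
        return offset(pointer) | viewBit;
    }
}
```

Re-coloring changes only the metadata bits, so the three colored addresses of one object differ in view but share the same offset, which matches the statement that all three virtual addresses map to the same physical memory.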

Read barrier

A read barrier is a small piece of code that the JVM inserts into application code. It is executed when an application thread reads an object reference from the heap. Note that only reading an object reference from the heap triggers this code.

Example of read barrier:

Object o = obj.FieldA;   // reads a reference from the heap: a barrier is needed
Object p = o;            // no barrier: not a read of a reference from the heap
o.dosomething();         // no barrier: not a read of a reference from the heap
int i = obj.FieldB;      // no barrier: not an object reference

What the read barrier code does in ZGC: during object marking and transfer, it checks whether an object's reference address satisfies a condition and takes the corresponding action.
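The behavior described for the transfer phase can be sketched as follows. This is a conceptual model only: the real barrier is machine code emitted by the JIT, and the forwarding map, the Slot class, and the choice of Remapped as the "good" view are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class LoadBarrierSketch {
    static final long OFFSET_MASK = (1L << 42) - 1; // bits 0-41: object address
    static final long REMAPPED = 1L << 44;          // view treated as "good" during transfer

    /** Filled by the (simulated) GC while it moves objects: old address -> new address. */
    final Map<Long, Long> forwarding = new HashMap<>();

    /** A heap slot that holds one colored reference, like a field of an object. */
    static final class Slot {
        long ref;
        Slot(long ref) { this.ref = ref; }
    }

    /** Loads a reference from a heap slot, as `Object o = obj.FieldA` would. */
    long load(Slot slot) {
        long ref = slot.ref;
        if ((ref & REMAPPED) != 0) {
            return ref; // fast path: the pointer already carries the good view
        }
        long oldAddress = ref & OFFSET_MASK;
        // If the object was moved, use its new address; otherwise keep the old one.
        long newAddress = forwarding.getOrDefault(oldAddress, oldAddress);
        long healed = newAddress | REMAPPED;
        slot.ref = healed; // fix the slot so later loads take the fast path
        return healed;
    }
}
```

Writing the corrected pointer back into the slot means the slow path runs at most once per stale reference, which keeps the per-read cost small.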

ZGC concurrent processing demonstration

Next, the switching process of address view in a garbage collection cycle of ZGC is described in detail:

Initialization: after ZGC initializes, the address view of the entire memory space is set to Remapped. The program runs normally and allocates objects; once certain conditions are met, garbage collection starts and enters the marking phase.

Concurrent marking phase: the first time the marking phase is entered, the view is M0. If an object is accessed by a GC marking thread or an application thread, its address view is changed from Remapped to M0. At the end of the marking phase, an object's address is therefore in either the M0 view or the Remapped view. An object in the M0 view is live; an object in the Remapped view is not.

Concurrent transfer phase: when marking ends, the transfer phase begins and the address view is set to Remapped again. If an object is accessed by a GC transfer thread or an application thread, its address view is changed from M0 to Remapped.

The marking phase actually has two address views, M0 and M1; the process above uses only one of them. Two views exist so that the previous marking can be told apart from the current one: when the concurrent marking phase is entered for the second time, the address view is switched to M1 instead of M0.
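The view switching described above can be modeled as a tiny state machine. This sketch assumes that M0 and M1 keep alternating on every later cycle; the text only states this explicitly for the first two cycles. The class name is hypothetical.

```java
final class AddressViewCycle {
    enum View { REMAPPED, M0, M1 }

    private View current = View.REMAPPED; // after initialization the whole heap is Remapped
    private View nextMark = View.M0;      // the first marking phase uses M0

    /** Enters a marking phase: switches to M0 or M1, alternating between cycles. */
    View startMarking() {
        current = nextMark;
        nextMark = (nextMark == View.M0) ? View.M1 : View.M0;
        return current;
    }

    /** Enters the transfer phase: the view goes back to Remapped. */
    View startTransfer() {
        current = View.REMAPPED;
        return current;
    }

    View current() {
        return current;
    }
}
```

Because consecutive cycles use different marking views, a pointer still colored with the previous cycle's view can be recognized as not yet marked in the current cycle.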

Colored pointers and read barriers are used not only in the concurrent transfer phase but also in the concurrent marking phase. To mark an object, a traditional garbage collector must make a memory access and store the liveness information in the object header; ZGC only needs to set bits 42-45 of the pointer, and because that is a register access, it is faster than accessing memory.

ZGC tuning practice

ZGC is not a "silver bullet" and needs to be tuned to the specific characteristics of each service. Little practical experience can be found online, so the tuning theory had to be explored on our own; we spent a lot of time at this stage and eventually achieved the desired performance. One purpose of this article is to list common problems encountered with ZGC, to help you use ZGC to improve service availability.

Tuning basics

Understand important configuration parameters of ZGC

Take the ZGC parameter configuration of our service in the production environment as an example to illustrate the role of each parameter:

Sample configuration of important parameters:

-Xms10G -Xmx10G
-XX:ReservedCodeCacheSize=256m -XX:InitialCodeCacheSize=256m
-XX:+UnlockExperimentalVMOptions -XX:+UseZGC
-XX:ConcGCThreads=2 -XX:ParallelGCThreads=6
-XX:ZCollectionInterval=120 -XX:ZAllocationSpikeTolerance=5
-XX:+UnlockDiagnosticVMOptions -XX:-ZProactive
-Xlog:safepoint,classhisto*=trace,age*,gc*=info:file=/opt/logs/logs/gc-%t.log:time,tid,tags:filecount=5,filesize=50m

-Xms -Xmx: the minimum and maximum heap size. Both are set to 10G here, so the program's heap stays fixed at 10G.

-XX:ReservedCodeCacheSize -XX:InitialCodeCacheSize: the size of the CodeCache, which holds all JIT-compiled code. For a typical service, 64m or 128m is sufficient. Our service has some particular characteristics, so the setting is larger; this is described in detail later.

-XX:+UnlockExperimentalVMOptions -XX:+UseZGC: enables ZGC.

-XX:ConcGCThreads: the number of threads that collect garbage concurrently. The default is 12.5% of the total number of cores, so on an 8-core CPU the default is 1. Raising it makes GC faster, but the GC threads take CPU from the running program and throughput suffers.

-XX:ParallelGCThreads: the number of threads used in STW phases. The default is 60% of the total number of cores.

-XX:ZCollectionInterval: the fixed time interval, in seconds, at which ZGC triggers a collection.

-XX:ZAllocationSpikeTolerance: the correction coefficient of ZGC's adaptive trigger algorithm. The default is 2. The higher the value, the earlier ZGC is triggered.

-XX:+UnlockDiagnosticVMOptions -XX:-ZProactive: whether proactive collection is enabled. It is enabled by default; the configuration here disables it.

-Xlog: sets the content, format, location, and per-file size of the GC log.
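When tuning, it helps to confirm which of these options a running service was actually started with. The sketch below reads the JVM's input arguments through the standard java.lang.management API. The helper class and its method names are hypothetical, and it only sees options passed explicitly on the command line, not defaults chosen by the JVM.

```java
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.Optional;

final class JvmFlagInspector {
    /** Value of a -XX:Name=value option, if present; the last occurrence is kept. */
    static Optional<String> xxValue(List<String> jvmArgs, String name) {
        String prefix = "-XX:" + name + "=";
        String found = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith(prefix)) {
                found = arg.substring(prefix.length());
            }
        }
        return Optional.ofNullable(found);
    }

    /** State of a -XX:+Name or -XX:-Name switch, if present; the last occurrence is kept. */
    static Optional<Boolean> xxSwitch(List<String> jvmArgs, String name) {
        Boolean found = null;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+" + name)) {
                found = Boolean.TRUE;
            } else if (arg.equals("-XX:-" + name)) {
                found = Boolean.FALSE;
            }
        }
        return Optional.ofNullable(found);
    }

    public static void main(String[] args) {
        // Arguments the running JVM was started with.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("UseZGC: " + xxSwitch(jvmArgs, "UseZGC"));
        System.out.println("ConcGCThreads: " + xxValue(jvmArgs, "ConcGCThreads"));
        System.out.println("ZCollectionInterval: " + xxValue(jvmArgs, "ZCollectionInterval"));
    }
}
```

An empty Optional means the option was not passed explicitly, in which case the JVM's own default applies.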

Understand the timing of ZGC trigger

The GC trigger mechanism of ZGC is very different from that of CMS and G1. The core feature of ZGC is concurrency, so new objects keep being created while a GC cycle is running. Making sure that these new objects do not fill the heap before the cycle completes is the first goal of ZGC parameter tuning. In ZGC, if garbage cannot be collected in time and the heap fills up, running threads are paused, and the pause can last for seconds.

ZGC has a variety of GC trigger mechanisms, which are summarized as follows:

Blocking memory allocation request: when garbage cannot be collected in time and is about to fill the heap, some threads block. This kind of trigger should be avoided. The keyword in the log is "Allocation Stall".

Adaptive algorithm based on allocation rate: the most important GC trigger. In short, ZGC uses the recent object allocation rate and GC duration to calculate the memory occupancy threshold at which the next GC should start. The detailed theory can be found in Peng Chenghan's book, Design and implementation of a new generation of garbage collector ZGC. The threshold is controlled by the ZAllocationSpikeTolerance parameter, which defaults to 2; the higher the value, the earlier GC is triggered. We solved some problems by adjusting this parameter. The keyword in the log is "Allocation Rate".

Fixed time interval: controlled by ZCollectionInterval and suited to sudden traffic surges. When traffic changes smoothly, the adaptive algorithm may trigger GC only when heap utilization exceeds 95%. When traffic surges, the adaptive algorithm may trigger too late, causing some threads to block. We adjust this parameter to handle traffic-surge scenarios such as scheduled promotions and flash sales. The keyword in the log is "Timer".

Proactive rule: similar to the fixed interval rule, except that the interval is not fixed but calculated by ZGC itself. Because our service already uses the fixed-interval trigger, we disable this rule with -ZProactive to avoid GC running too frequently and affecting service availability. The keyword in the log is "Proactive".

Warm-up rule: applies when the service has just started and generally needs no attention. The keyword in the log is "Warmup".

External trigger: triggered by an explicit System.gc() call in the code. The keyword in the log is "System.gc()".

Metadata allocation trigger: fires when the metadata area is running short and generally needs no attention. The keyword in the log is "Metadata GC Threshold".
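The two triggers that matter most for tuning, the allocation-rate rule and the timer rule, can be sketched as simple predicates. This is a deliberately simplified model based on the description above: the real ZGC heuristic also applies statistical safety margins, and every name and formula detail here is an assumption for illustration.

```java
final class GcTriggerSketch {
    /**
     * Allocation-rate rule: start a GC when, at the tolerated worst-case
     * allocation rate, the heap would fill before a GC cycle could finish.
     */
    static boolean allocationRateRule(long freeBytes, double allocBytesPerSec,
                                      double spikeTolerance, double gcDurationSec) {
        double worstCaseRate = allocBytesPerSec * spikeTolerance;
        if (worstCaseRate <= 0) {
            return false; // nothing is being allocated, so no urgency
        }
        double secondsUntilFull = freeBytes / worstCaseRate;
        return secondsUntilFull <= gcDurationSec;
    }

    /** Timer rule: start a GC when the configured interval has elapsed (0 disables it). */
    static boolean timerRule(double secondsSinceLastGc, double collectionIntervalSec) {
        return collectionIntervalSec > 0 && secondsSinceLastGc >= collectionIntervalSec;
    }
}
```

With 2 GB free, 500 MB/s allocated, and a 1.5 second GC cycle, a tolerance of 2 gives 2.0 seconds of headroom and no trigger, while a tolerance of 5 gives 0.8 seconds and triggers immediately. This is the sense in which a higher ZAllocationSpikeTolerance makes GC start earlier.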

Understand the ZGC log

The figure shows a complete GC cycle, with the points worth noting marked.

Note: safepoint entries have been filtered out of this log. Normally, a GC cycle is interspersed with operations that enter a safepoint.

Each line in the GC log describes one piece of information about the GC cycle. The key information is as follows:

Start: GC starts, and the log states what triggered it. The trigger in the figure above is the adaptive algorithm.

Phase-Pause Mark Start: initial mark; STW.

Phase-Pause Mark End: remark; STW.

Phase-Pause Relocate Start: initial transfer; STW.

Heap information: records heap size changes before and after Mark and Relocate during the GC cycle. High and Low record the maximum and minimum values. We generally watch the Used value under High: if it reaches 100%, memory allocation certainly ran short during the cycle, and the GC trigger timing needs adjusting so that GC starts earlier or runs faster.

GC statistics: garbage collection statistics can be printed periodically, covering the last 10 seconds, the last 10 minutes, the last 10 hours, and the whole time since startup. These statistics help detect and locate outliers.

The log contains a lot of information. The key points are underlined in red in the figure and are fairly easy to understand; more detailed explanations can be found online.

Understand the cause of ZGC pause

In practice, we found six scenarios in total that pause the program:

Initial mark during GC: Pause Mark Start in the log.

Remark during GC: Pause Mark End in the log.

Initial transfer during GC: Pause Relocate Start in the log.

Memory allocation stall: when memory is insufficient, the thread blocks and waits for GC to complete. The keyword is "Allocation Stall".

Safepoint: GC can proceed only after all threads have reached a safepoint, and ZGC enters a safepoint periodically to determine whether a GC is needed. Threads that reach the safepoint first must wait until all threads are suspended.

Dumping threads or memory: for example, the jstack and jmap commands.
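A quick way to see which of the GC-related pause sources dominates is to count the keywords named above in a GC log. The sketch below does only substring matching and makes no assumption about the rest of the log line format; the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PauseKeywordCounter {
    /** Pause-related keywords taken from the list above. */
    static final List<String> KEYWORDS = Arrays.asList(
            "Pause Mark Start", "Pause Mark End", "Pause Relocate Start", "Allocation Stall");

    /** Counts how many lines mention each keyword. */
    static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : KEYWORDS) {
            counts.put(keyword, 0);
        }
        for (String line : lines) {
            for (String keyword : KEYWORDS) {
                if (line.contains(keyword)) {
                    counts.merge(keyword, 1, Integer::sum);
                }
            }
        }
        return counts;
    }
}
```

The lines can come from java.nio.file.Files.readAllLines on the GC log file. A non-zero count for "Allocation Stall" is the signal to look at first, since those pauses can last for seconds.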

Tuning case

The service we maintain is called Zeus. It is Meituan's rule platform and is often used for rule management in risk control scenarios. Rules run on the open source expression execution engine Aviator. Internally, Aviator turns each expression into a Java class and implements the expression logic by calling that class's interface.

The Zeus service has more than ten thousand rules, and each machine handles millions of requests per day. Under these conditions, the classes and methods generated by Aviator produce a large number of ClassLoaders and a large CodeCache, which became GC performance bottlenecks under ZGC. Two types of tuning cases follow.

Memory allocation stalls, with system pauses reaching seconds

Case 1: latency spikes when traffic surges during a flash sale

Log information: comparing the GC log with the business log at the time of the spike shows that the JVM paused for a long time and that the GC log contains many "Allocation Stall" entries.

Analysis: this kind of case usually occurs when the adaptive algorithm is the main GC trigger. ZGC is a concurrent garbage collector: GC threads and application threads run at the same time, so new objects are created during a GC cycle. If those new objects fill the heap before the cycle completes, application threads can block because their memory requests fail. When the flash sale starts, a flood of requests enters the system, but the GC interval calculated by the adaptive algorithm is long, so GC is not triggered in time, memory allocation stalls, and the service pauses.

Solution:

(1) Enable the fixed-interval GC trigger with -XX:ZCollectionInterval, for example set to 5 seconds or even less.

(2) Increase the correction coefficient -XX:ZAllocationSpikeTolerance so that GC triggers earlier. ZGC uses a normal distribution model to predict the memory allocation rate, and the model's correction coefficient ZAllocationSpikeTolerance defaults to 2. The higher the value, the earlier GC is triggered. All Zeus clusters currently use 5.

Case 2: during a stress test, latency spikes appear once traffic has gradually risen to a certain level

Log information: GC runs about once per second on average, with almost no gap between two consecutive GCs.

Analysis: GC is triggered in time, but marking and reclaiming memory is too slow, which causes memory allocation stalls and pauses.

Solution: increase -XX:ConcGCThreads to speed up concurrent marking and reclamation. ConcGCThreads defaults to 1/8 of the number of cores, so on an 8-core machine the default is 1. This parameter affects system throughput; if the GC interval is longer than the GC cycle, adjusting it is not recommended.
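The rule of thumb in Case 2 can be written down as two small helpers. This is a sketch of the reasoning in the text, not the JVM's actual heuristic: the 1/8 ratio comes from the article, while the floor of 1 and the method names are assumptions.

```java
final class ConcGcThreadsSketch {
    /** Default per the text: 1/8 of the cores. The minimum of 1 is an assumption. */
    static int defaultConcGcThreads(int cores) {
        return Math.max(1, cores / 8);
    }

    /**
     * Rule of thumb from the text: raising ConcGCThreads is only worth
     * considering when GC cycles run back to back, that is, when the interval
     * between GC starts is no longer than one GC cycle.
     */
    static boolean worthRaising(double gcIntervalSec, double gcCycleSec) {
        return gcIntervalSec <= gcCycleSec;
    }
}
```

In Case 2, a GC started about once per second with almost no gap between cycles, so the interval was roughly equal to the cycle length and raising the thread count was justified despite the throughput cost.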

Large number of GC Roots, with long single GC pauses

Case 3: a single GC pause takes 30ms, far from the expected pause time of 10ms.

Log information: in the ZGC log statistics, "Pause Roots ClassLoaderDataGraph" takes a long time.

Analysis: a memory dump showed tens of thousands of ClassLoader instances in the system. ClassLoaders are part of the GC Roots, and ZGC pause time is proportional to the number of GC Roots: the more GC Roots, the longer the pause. Further analysis of the ClassLoader class names showed that these ClassLoaders were created by the Aviator component. The Aviator source code showed that Aviator created a new ClassLoader every time it generated a class for an expression, which led to the huge number of ClassLoaders. Later versions of Aviator fix this by using a single ClassLoader to generate the classes for all expressions.

Solution: upgrade the Aviator component to avoid creating excess ClassLoaders.

Case 4: after the service starts, a single GC takes longer and longer the longer the service runs, and returns to normal after a restart.

Log information: in the ZGC log statistics, the time spent in "Pause Roots CodeCache" grows gradually with the running time of the service.

Analysis: the CodeCache stores the JIT compilation results of hot Java code, and the CodeCache is also part of the GC Roots. Adding the -XX:+PrintCodeCacheOnCompilation parameter and printing the optimized methods in the CodeCache revealed a large amount of Aviator expression code. Tracing to the root cause: each expression is a method in a class. As running time and execution counts grow, these methods are optimized by the JIT and compiled into the CodeCache, so the CodeCache keeps getting larger.
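CodeCache growth of this kind can also be watched from inside the service with the standard memory pool MXBeans. The sketch below sums the usage of pools whose names look like code cache pools. The name patterns are an assumption: pool names vary between JVM versions and between segmented and non-segmented code caches, so the filter may need adjusting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class CodeCacheUsage {
    /** Total bytes used by memory pools that appear to hold JIT-compiled code. */
    static long usedBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumed name patterns for code cache pools; adjust for your JVM.
            boolean looksLikeCodeCache = name.contains("CodeHeap")
                    || name.contains("CodeCache")
                    || name.contains("Code Cache");
            MemoryUsage usage = pool.getUsage();
            if (looksLikeCodeCache && usage != null) {
                total += usage.getUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("CodeCache used bytes: " + usedBytes());
    }
}
```

Reporting this value periodically to a monitoring system makes it easy to correlate CodeCache size with the "Pause Roots CodeCache" time in the GC log.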

Solution: the JIT has some parameters that adjust the conditions for JIT compilation, but they did not fit our problem. We finally solved it through business optimization: removing Aviator expressions that did not need to be executed, which kept a large number of Aviator methods out of the CodeCache.

It is worth mentioning that we did not wait until all these problems were resolved before deploying to all clusters. Even with the various latency spikes at the beginning, ZGC with all its problems affected service availability less than CMS had. So it took only about 2 weeks from the start of preparation to full deployment of ZGC. Over the following 3 months, we worked on these problems alongside business requirements and solved them one by one, which brought ZGC to better performance in every cluster.

Results of upgrading to ZGC

Latency reduced

TP (Top Percentile) is a measure of system latency: TP999 is the smallest time within which 99.9% of requests are answered, and TP99 is the smallest time within which 99% of requests are answered.
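The definition above corresponds to a nearest-rank percentile, which can be sketched as follows. The per-mille parameter (990 for TP99, 999 for TP999) and the class name are choices made here to keep the arithmetic in integers; monitoring systems may compute percentiles differently, for example from histograms.

```java
import java.util.Arrays;

final class TopPercentile {
    /**
     * Smallest sample such that at least perMille/1000 of all samples are
     * less than or equal to it (nearest-rank method).
     */
    static long tp(long[] latenciesMs, int perMille) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (perMille < 1 || perMille > 1000) {
            throw new IllegalArgumentException("perMille must be in 1..1000");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Integer ceiling of perMille / 1000 * n.
        long rank = ((long) perMille * sorted.length + 999) / 1000;
        return sorted[(int) rank - 1];
    }
}
```

For 1000 samples with latencies 1ms through 1000ms, tp(samples, 990) returns 990 and tp(samples, 999) returns 999.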

Across the different Zeus clusters, ZGC brought the largest gains in low-latency scenarios (TP999 < 200ms): TP999 dropped by 12~142ms, a decrease of 18%~74%, and TP99 dropped by 5~28ms, a decrease of 10%~47%. Ultra-low-latency (TP999 < 20ms) and high-latency (TP999 > 200ms) services gained little, because the response-time bottleneck of those services is not GC but the performance of external dependencies.

Throughput decreased

ZGC may not be appropriate for scenarios where throughput is the priority. For example, an offline cluster in Zeus originally used CMS, and after it was upgraded to ZGC the system throughput dropped significantly. There are two reasons. First, ZGC is a single-generation garbage collector, while CMS is generational; a single-generation collector processes more objects in each cycle and consumes more CPU. Second, ZGC uses read barriers, which consume additional computing resources.

Summary

As the next generation garbage collector, ZGC performs excellently. The ZGC garbage collection process is almost entirely concurrent, and the actual STW pause time is very short, less than 10ms. This is thanks to colored pointers and read barriers.

In upgrading to JDK 11 with ZGC, Zeus achieved its upgrade goal by classifying the risks and problems and then resolving them one by one. GC pauses now almost never affect system availability.

Finally, we recommend upgrading to ZGC. The Zeus system ran into more problems than most because of its business characteristics, while other risk control teams upgraded very smoothly. You are welcome to join the "ZGC user communication" group.

References

ZGC official website

Peng Chenghan. Design and implementation of a new generation of garbage collector ZGC. Machinery Industry Press, 2019.

Talking about GC Optimization of Java Application from actual cases

Some key Technologies of Java Hotspot G1 GC

Appendix: how to adopt a new technology

When upgrading to JDK 11 and adopting ZGC in production, what people care about most may not be how well it performs, but whether it is reliable and stable, given that few people use the new version and there is little practical experience online. The second concern is whether the upgrade cost will be high: if the upgrade fails, the time is wasted. So before adopting a new technology, the first step is to assess the benefits, costs, and risks.

Evaluate the benefits

For a project like the JDK, which attracts worldwide attention, the new technologies introduced by major version upgrades have generally been validated in theory. What we need to do is determine whether the bottleneck of the current system is something the new JDK version can solve, and avoid taking action before the problem is clearly diagnosed. Once the benefits are evaluated, assess the costs and risks. If the benefits are very large or very small, the other two factors carry much less weight.

Take the case from the beginning of this article as an example. Assume the number of GCs stays the same (10 per minute) and a single GC drops from 40ms to 10ms. Then 100ms / 60000ms = 0.17% of each minute is spent in GC, and requests that hit a GC pause are delayed by only 10ms, so both the number of affected requests and the delay added by GC are reduced.

Evaluate the cost

This mainly refers to the manpower needed for the upgrade. Estimating it is a fairly mature practice: identify the change points from the new technology's user manual. It is not much different from other projects, so we won't go into detail.

In our practice, it took two weeks to complete online deployment and reach safe, stable operation. Iteration continued for another 3 months, during which ZGC was further optimized and adapted to the business scenarios.

Assess the risk

The risks of upgrading JDK can be divided into three categories:

Compatibility risk: a Java program depends on many JAR packages, so will the program still run after the JDK upgrade? For example, when our service was upgraded from JDK 7 to JDK 11, we had to resolve incompatibilities in many JAR packages.

Functional risk: once the program runs, will changes in some component's logic affect the logic of existing features?

Performance risk: if the features work, is the performance stable enough to run reliably online?

After classification, the response to each type of risk becomes an ordinary testing problem and is no longer an unknown risk. Risk means uncertainty; if something uncertain can be turned into something certain, the risk has been eliminated.

Upgrading to JDK 11

JDK 11 was selected because it is the first version to support ZGC, and because it is a Long Term Support (LTS) version that will be maintained for at least three years, whereas ordinary versions (such as JDK 12, JDK 13, and JDK 14) have only a six-month maintenance cycle and are not recommended.

Local test environment installation

JDK 11 can be downloaded from two sources, OpenJDK and OracleJDK. The main difference between them is whether they are free or paid in the long term; in the short term both are free. Note that ZGC in JDK 11 does not support Mac OS; on Mac OS, JDK 11 can only be used with other garbage collectors, such as G1.

Production environment installation

Upgrading to JDK 11 means more than upgrading the JDK version of your own project: the tooling for compilation, release, deployment, running, monitoring, and performance and memory analysis must support it as well. Meituan's internal practice:

Compilation and packaging: Meituan's release system supports selecting JDK 11 for compilation and packaging.

Online operation and full deployment: JDK 11 must be installed on the online machines. There are 3 ways:

1. Newly requested virtual machines come with JDK 11 installed by default: use this approach when trialing JDK 11. For full deployment, if too many new machines are requested, machine resources may run short.

2. Install JDK 11 on existing virtual machines with a hand-written script: not recommended, because it involves business engineers too deeply in operations work.

3. Use the image deployment feature provided by containers and install JDK 11 when building the image: the recommended method, which requires no new resources.

Monitoring indicators: mainly GC time and frequency. ZGC data is collected through Meituan's CAT monitoring system (CAT is open source).

Performance and memory analysis: diagnosing performance problems online requires profiling tools. Meituan's performance diagnosis and optimization platform Scalpel supports performance and memory analysis on JDK 11. If your company has no such tool, JProfiler is recommended.

Resolve component compatibility

Our project contains more than 200,000 lines of code, had to be upgraded from JDK 7 to JDK 11, and has many dependencies. Although the upgrade looked complicated, resolving the compatibility issues actually took only two days. The process was as follows:

1. To compile, modify the build configuration in the pom file according to the reported errors. There are two main kinds of error:

a. Classes that have been removed: for example, for "sun.misc.BASE64Encoder", find the replacement class java.util.Base64.

b. Dependency versions that are not compatible with JDK 11: find the dependency in question and look up its latest version, which generally supports JDK 11.

2. After compilation succeeds, start and run the service. Dependency version problems may still appear at this point; handle them the same way as at compile time.
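The class replacement mentioned in step 1a usually amounts to a small wrapper around java.util.Base64, which has been part of the JDK since Java 8. The sketch below shows one way to do it; the wrapper class and the choice of UTF-8 are assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class Base64Migration {
    /** Replacement for code that previously called sun.misc.BASE64Encoder. */
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    /** Replacement for code that previously called sun.misc.BASE64Decoder. */
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

One point to check during migration: the basic encoder used here never inserts line breaks. If existing consumers expect line-wrapped output, Base64.getMimeEncoder() and Base64.getMimeDecoder() may be the closer match, so compare the output on a long input before switching.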

Upgrade modified dependencies:

javax.annotation:javax.annotation-api:1.3.2
javax.validation:validation-api:2.0.1.Final
org.projectlombok:lombok:1.18.4
org.hibernate.validator:hibernate-validator-parent:6.0.16.Final
com.sankuai.inf:patriot-sdk:1.2.1
org.apache.commons:commons-lang3:3.9
commons-lang:commons-lang:2.6
io.netty:netty-all:4.1.39.Final
junit:junit:4.12

JDK 11 has been out for two years, and common dependencies have compatible versions. Company-internal components, however, may not be compatible with JDK 11 and need to be upgraded. If it is difficult for the owning team to upgrade, consider splitting the functionality: deploy the features that depend on those components separately and keep them on the older JDK. As the excellent performance of JDK 11 becomes known, we believe more teams will use it to solve GC problems, and the more users there are, the stronger the motivation to upgrade each component.

Verify functional correctness

Functional correctness is ensured through thorough unit, integration, and regression testing.
