What garbage collectors does JVM have? 07/13 Update SLTechnology News&Howtos

2025-07-13 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article describes the garbage collectors available in the JVM and is shared here as a practical reference.

If collection algorithms are the methodology of memory reclamation, then garbage collectors are its concrete implementation. The Java Virtual Machine Specification does not stipulate how a garbage collector should be implemented, so the collectors provided by different vendors and different versions of virtual machines may differ greatly. They generally provide parameters that let users combine the collectors used for each generation according to their application's characteristics and requirements. The collectors discussed here are those of the HotSpot virtual machine as of JDK 1.7 Update 14 (the version in which the G1 collector was officially offered for commercial use; before that, G1 was still experimental). All the collectors included in this virtual machine are shown in the following figure:

The figure above shows seven collectors that act on different generations: Serial, ParNew, and Parallel Scavenge for the new generation; CMS, Serial Old, and Parallel Old for the old generation; and G1, which spans both. If two collectors are connected by a line, they can be used together. The region in which a collector is drawn indicates whether it is a new generation collector or an old generation collector. HotSpot implements so many collectors precisely because no perfect collector exists yet; the task is to choose the one best suited to a specific application.
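To see which of these collectors a given JVM is actually running with, the standard `java.lang.management` API can be queried. The following is a minimal sketch; the class name `ListCollectors` is made up for illustration, and the collector and memory pool names it prints are whatever the running JVM reports, so they vary with the JDK version and the GC flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original article.
final class ListCollectors {
    // Returns one line per collector that the running JVM has registered.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getMemoryPoolNames() lists the heap areas this collector manages.
            lines.add(gc.getName() + " manages " + String.join(", ", gc.getMemoryPoolNames()));
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

Running it under different collector flags is a quick way to confirm which new generation and old generation collectors have been paired.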

Related concepts

Parallelism and concurrency

Parallel: multiple garbage collection threads work in parallel while user threads remain in a waiting state.

Concurrent: user threads and garbage collection threads execute at the same time (not necessarily in parallel; they may be interleaved). The user program keeps running while the garbage collector runs on another CPU.

Throughput

Throughput is the ratio of the CPU time spent running user code to the total CPU time consumed, that is:

Throughput = time to run user code / (time to run user code + garbage collection time).

Assuming that the virtual machine runs for a total of 100 minutes, of which garbage collection takes 1 minute, the throughput is 99%.
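The arithmetic above can be written out directly. This is only an illustrative sketch of the formula; the class and method names are invented for the example.

```java
// Hypothetical helper class illustrating the throughput formula from the text.
final class GcThroughput {
    // Throughput = user code time / (user code time + garbage collection time).
    static double throughput(double userMinutes, double gcMinutes) {
        return userMinutes / (userMinutes + gcMinutes);
    }

    public static void main(String[] args) {
        // 100 minutes in total, 1 of them spent in GC, leaves 99 minutes of user code.
        System.out.printf("%.0f%%%n", throughput(99, 1) * 100); // prints 99%
    }
}
```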

Minor GC and Full GC

New generation GC (Minor GC): a garbage collection that takes place in the new generation. Because most Java objects are short-lived, Minor GC is very frequent and generally fast. See the previous article for the underlying principles.

Old generation GC (Major GC / Full GC): a garbage collection that takes place in the old generation. A Major GC is often accompanied by at least one Minor GC (but not always; the collection strategy of the Parallel Scavenge collector includes a policy that goes straight to a Major GC). A Major GC is generally more than 10 times slower than a Minor GC.

New generation collectors

Serial collector

The Serial collector is the most basic and oldest collector. It is a new generation collector that uses the copying algorithm, and before JDK 1.3.1 it was the only choice for new generation collection in the virtual machine. It is a single-threaded collector: it uses only one CPU or one collection thread to do its work and, more importantly, it must pause all other worker threads until the collection ends ("Stop The World"). This pause is initiated and completed automatically by the virtual machine in the background, stopping all of the user's worker threads without the user seeing or controlling it, which many applications find hard to accept.

The following figure shows how the Serial collector runs (with Serial Old as the old generation collector):

To eliminate or reduce the pauses that memory reclamation imposes on worker threads, the HotSpot development team has built a variety of other excellent collectors since JDK 1.3, which are described later. Their arrival does not mean that the Serial collector is obsolete. In fact, it is still the default new generation collector for the HotSpot virtual machine running in Client mode. It also has an advantage over the others: it is simple and efficient (compared with other collectors running on a single thread). In an environment limited to a single CPU, the Serial collector has no thread interaction overhead and can concentrate entirely on garbage collection, so it naturally achieves the highest single-thread collection efficiency.

In desktop application scenarios, the memory allocated to the virtual machine is generally not very large. Collecting a new generation of tens of megabytes or even one or two hundred megabytes (this is only the memory used by the new generation; desktop applications rarely grow much larger) can keep the pause within tens of milliseconds, or a little over one hundred milliseconds at most. As long as it does not happen frequently, such a pause is acceptable. Therefore, the Serial collector is a good choice for virtual machines running in Client mode.

ParNew collector

The ParNew collector is the multithreaded version of the Serial collector, and it is also a new generation collector. Apart from using multiple threads for garbage collection, the rest of its behavior is exactly the same as that of the Serial collector, including all the control parameters available to Serial, the collection algorithm (copying), Stop The World, object allocation rules, and collection strategy. The two collectors share a considerable amount of code.

The working process of the ParNew collector is shown below (with Serial Old as the old generation collector):

Apart from multithreaded collection, ParNew offers little innovation over the Serial collector, but it is the preferred new generation collector for many virtual machines running in Server mode. One important reason, unrelated to performance, is that apart from Serial it is currently the only collector that can work with the CMS (Concurrent Mark Sweep) collector. CMS, introduced in JDK 1.5, was a landmark collector; its details are covered later.

In a single-CPU environment ParNew will never outperform the Serial collector, and because of thread interaction overhead it is not even guaranteed to beat Serial in a two-CPU environment implemented with hyper-threading. As the number of CPUs grows, however, it makes effective use of system resources during GC. By default it starts as many collection threads as there are CPUs; when there are very many CPUs, the -XX:ParallelGCThreads parameter can be used to limit the number of garbage collection threads.

Parallel Scavenge collector

The Parallel Scavenge collector is also a parallel, multithreaded new generation collector that uses the copying algorithm. What distinguishes it is its focus. Collectors such as CMS try to shorten the pause imposed on user threads during garbage collection as much as possible, while the goal of Parallel Scavenge is to achieve a controllable throughput.

The shorter the pause time, the more suitable for programs that need to interact with users, and good response speed can improve the user experience. On the other hand, high throughput can make efficient use of CPU time to complete the operation task of the program as soon as possible, which is mainly suitable for tasks that operate in the background without too much interaction.

In addition to providing parameters that control throughput precisely, the Parallel Scavenge collector also provides the switch parameter -XX:+UseAdaptiveSizePolicy. When it is turned on, there is no need to manually specify details such as the size of the new generation (-Xmn), the ratio of Eden to Survivor (-XX:SurvivorRatio), or the size above which objects are allocated directly in the old generation (-XX:PretenureSizeThreshold). The virtual machine collects performance monitoring information as the system runs and dynamically adjusts these parameters to provide the most suitable pause time or the maximum throughput. This is called the GC adaptive tuning strategy (GC Ergonomics). The adaptive tuning strategy is also an important difference between the Parallel Scavenge collector and the ParNew collector.
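One way to check how such flags are currently set is to ask the JVM itself. The sketch below assumes a HotSpot-based JVM that exposes `com.sun.management.HotSpotDiagnosticMXBean`; the class name `ShowGcFlags` and the fallback string are illustrative choices, and on other virtual machines, or for flags the JVM does not know, the method simply reports "unavailable".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class; relies on a HotSpot-specific management interface.
final class ShowGcFlags {
    // Returns the current value of a HotSpot flag, or "unavailable" if it cannot be read.
    static String flagValue(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (RuntimeException e) {
            // Unknown flag, or a JVM without the HotSpot diagnostic bean.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"UseAdaptiveSizePolicy", "SurvivorRatio", "ParallelGCThreads"}) {
            System.out.println(flag + " = " + flagValue(flag));
        }
    }
}
```

The printed values depend on the JDK version, the selected collector, and the command-line options, so no particular output should be expected.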

It is also worth noting that the Parallel Scavenge collector cannot work with the CMS collector. Before Parallel Old was introduced in JDK 1.6, choosing Parallel Scavenge for the new generation meant that only the Serial Old collector could be used for the old generation.

Old generation collectors

Serial Old collector

Serial Old is the old generation version of the Serial collector. It is also a single-threaded collector and uses the mark-compact (Mark-Compact) algorithm.

The main purpose of this collector is likewise to be used by virtual machines in Client mode. In Server mode, it has two main uses:

Paired with the Parallel Scavenge collector in JDK 1.5 and earlier (before Parallel Old existed).

As a fallback for the CMS collector when a Concurrent Mode Failure occurs during concurrent collection.

Its workflow is the same as that of the Serial collector. Here again is the diagram of the Serial / Serial Old collectors running together:

Parallel Old collector

The Parallel Old collector is the old generation version of the Parallel Scavenge collector, using multiple threads and the mark-compact algorithm. As mentioned earlier, it only became available in JDK 1.6. Before that, choosing Parallel Scavenge for the new generation left no option for the old generation except Serial Old. With Parallel Old, the "throughput first" collector finally has a pairing worthy of the name: in situations that emphasize throughput and are sensitive to CPU resources, the combination of Parallel Scavenge and Parallel Old can be considered first. The workflow of the Parallel Old collector is the same as that of Parallel Scavenge. Here is the diagram of the Parallel Scavenge / Parallel Old collectors running together:

CMS collector

The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause. It is well suited to Java applications that run on the server side of Internet sites or B/S (browser/server) systems, which attach great importance to service response time. As the name ("Mark Sweep") suggests, it is based on the mark-sweep algorithm.

The entire process of CMS collector work is divided into the following four steps:

Initial mark (CMS initial mark): only marks the objects directly reachable from GC Roots. It is very fast and requires "Stop The World".

Concurrent mark (CMS concurrent mark): performs GC Roots tracing. It is the longest phase of the whole process.

Remark (CMS remark): corrects the marking records of objects whose marks changed because the user program kept running during concurrent marking. The pause in this phase is generally slightly longer than in the initial mark phase but much shorter than the concurrent mark time. "Stop The World" is also required at this stage.

Concurrent sweep (CMS concurrent sweep)

Because the collector threads in the two longest phases, concurrent mark and concurrent sweep, can work alongside user threads, the memory reclamation of the CMS collector is, on the whole, executed concurrently with user threads. The following figure clearly shows which steps of the CMS collector run concurrently and which require a pause:

Advantages

CMS is an excellent collector. Its main advantages are already reflected in its name: concurrent collection and low pauses. For this reason the CMS collector is also known as the Concurrent Low Pause Collector.

Disadvantages

Very sensitive to CPU resources: in fact, any program designed for concurrency is sensitive to CPU resources. During the concurrent phases CMS does not stop user threads, but it slows the application down and lowers total throughput because it occupies some threads (or CPU resources). By default CMS starts (number of CPUs + 3) / 4 collection threads, so with four or more CPUs the collector threads take no less than 25% of the CPU resources during concurrent collection, and that share falls as the number of CPUs grows. With fewer than four CPUs (for example, two), the impact on user programs can be large: if the CPU load is already high and half of the computing power must be given to collector threads, the execution speed of user programs may suddenly drop by 50%, which is unacceptable.
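The effect of the (number of CPUs + 3) / 4 rule is easier to see with a few concrete values. The sketch below applies the formula exactly as quoted in the text, assuming integer division; the class and method names are invented for the example and are not HotSpot code.

```java
// Hypothetical helper class that evaluates the thread-count formula quoted in the text.
final class CmsThreadShare {
    // Default number of CMS collection threads, assuming integer division.
    static int cmsThreads(int cpus) {
        return (cpus + 3) / 4;
    }

    // Fraction of the CPUs occupied by those threads during the concurrent phases.
    static double cpuShare(int cpus) {
        return (double) cmsThreads(cpus) / cpus;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {2, 4, 8, 16}) {
            System.out.println(cpus + " CPUs -> " + cmsThreads(cpus)
                + " thread(s), share " + cpuShare(cpus));
        }
    }
}
```

With two CPUs the single collector thread takes half of the machine, while with four or more the share stays at or above a quarter, which matches the figures given above.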

Unable to handle floating garbage: a "Concurrent Mode Failure" may occur and lead to another Full GC. Because user threads are still running during the CMS concurrent sweep phase, new garbage is naturally produced as the program runs. This garbage appears after the marking process, so CMS cannot dispose of it in the current collection and has to leave it for the next GC. This garbage is called "floating garbage". Also because user threads keep running during collection, enough memory must be reserved for them. CMS therefore cannot wait until the old generation is almost full before collecting, as other collectors do; it must keep part of the space free for the program to use during concurrent collection.

Space fragmentation caused by the mark-sweep algorithm: CMS is based on the mark-sweep algorithm, which means that a large amount of fragmented space is left at the end of a collection. Too much fragmentation causes serious trouble when allocating large objects: the old generation often has plenty of free space left, yet no contiguous block is large enough for the object being allocated.
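The fragmentation problem can be illustrated with a toy model in which free memory is just a list of free block sizes. This is a simplified sketch, not how CMS manages its free lists; the class name, the block sizes, and the request size are all made up.

```java
// Hypothetical toy model of a fragmented old generation.
final class FragmentationDemo {
    // Total free space across all free blocks.
    static int totalFree(int[] freeBlockSizes) {
        int sum = 0;
        for (int size : freeBlockSizes) {
            sum += size;
        }
        return sum;
    }

    // An allocation succeeds only if a single contiguous block can hold it.
    static boolean canAllocate(int[] freeBlockSizes, int request) {
        for (int size : freeBlockSizes) {
            if (size >= request) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] freeBlocksKb = {64, 32, 48, 16}; // 160 KB free in total
        System.out.println("total free: " + totalFree(freeBlocksKb));                  // prints total free: 160
        System.out.println("can allocate 100 KB: " + canAllocate(freeBlocksKb, 100)); // prints can allocate 100 KB: false
    }
}
```

Although 160 KB is free, a 100 KB object cannot be placed because the largest contiguous block is only 64 KB.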

G1 collector

The G1 (Garbage-First) collector is one of the most cutting-edge results of collector technology today. It is a garbage collector for server-side applications, and the mission the HotSpot development team gave it is to eventually replace the CMS collector released in JDK 1.5. Compared with other collectors, G1 has the following characteristics:

Parallelism and concurrency: G1 can make full use of multi-CPU, multi-core hardware, using multiple CPUs to shorten "Stop The World" pauses. Some GC actions that require other collectors to pause Java threads can, in G1, still run concurrently with the Java program.

Generational collection: as with other collectors, the concept of generations is preserved in G1. Although G1 can manage the entire GC heap on its own without cooperating with other collectors, it handles newly created objects and old objects that have survived several GCs in different ways to get better collection results.

Space compaction: viewed as a whole, G1 is a collector based on the "mark-compact" algorithm; viewed locally (between two Regions), it is based on the "copying" algorithm. Either way, G1 produces no memory fragmentation while running and leaves regular, usable memory after each collection. This helps programs run for a long time: allocating a large object will not trigger the next GC early because no contiguous memory can be found.

Predictable pauses: this is a major advantage of G1 over CMS. Reducing pause time is a concern shared by G1 and CMS, but G1 can also build a predictable pause time model that lets users explicitly specify that, within any time slice of M milliseconds, no more than N milliseconds may be spent on garbage collection. This is almost a characteristic of a real-time Java (RTSJ) garbage collector.

Across the entire heap memory

The collection scope of collectors before G1 is the entire new generation or the entire old generation; with G1 this is no longer the case. When G1 is used, the memory layout of the Java heap is very different from that of other collectors. It divides the entire Java heap into a number of independent, equal-sized areas (Regions). Although the concepts of new generation and old generation are retained, the two are no longer physically separated; each is a collection of Regions (which need not be contiguous).

Establish a predictable time model

The G1 collector can build a predictable pause time model because it can avoid, in a planned way, collecting garbage across the whole Java heap. G1 tracks the value of the garbage accumulated in each Region (the amount of space that reclaiming it would free, and the empirical time the reclamation would take), maintains a priority list in the background, and each time, according to the allowed collection time, reclaims the most valuable Regions first (this is where the name Garbage-First comes from). Dividing memory into Regions and reclaiming them by priority lets the G1 collector achieve the highest possible collection efficiency in a limited time.
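The idea of reclaiming the most valuable Regions first within an allowed time can be sketched with a priority queue. This is a deliberately simplified greedy model, not G1's actual prediction logic: the `Region` class, the "space freed per millisecond" value metric, and all the numbers are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of value-ordered Region selection under a pause budget.
final class GarbageFirstSketch {
    // Assumed per-Region estimate: space that collection would free and its predicted cost.
    static final class Region {
        final String name;
        final long reclaimableMb;
        final double predictedMillis;

        Region(String name, long reclaimableMb, double predictedMillis) {
            this.name = name;
            this.reclaimableMb = reclaimableMb;
            this.predictedMillis = predictedMillis;
        }

        // Assumed value metric: space reclaimed per millisecond of pause.
        double value() {
            return reclaimableMb / predictedMillis;
        }
    }

    // Greedily picks the most valuable Regions whose predicted cost still fits the budget.
    static List<String> chooseCollectionSet(List<Region> regions, double pauseBudgetMillis) {
        Comparator<Region> mostValuableFirst = (a, b) -> Double.compare(b.value(), a.value());
        PriorityQueue<Region> queue = new PriorityQueue<>(mostValuableFirst);
        queue.addAll(regions);

        List<String> chosen = new ArrayList<>();
        double remaining = pauseBudgetMillis;
        while (!queue.isEmpty()) {
            Region r = queue.poll();
            if (r.predictedMillis <= remaining) {
                chosen.add(r.name);
                remaining -= r.predictedMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
            new Region("A", 80, 10), new Region("B", 30, 2),
            new Region("C", 50, 25), new Region("D", 20, 4));
        System.out.println(chooseCollectionSet(regions, 15)); // prints [B, A]
    }
}
```

With a 15 ms budget, B and A are taken because they free the most space per millisecond, while D and C no longer fit and are left for a later collection.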

Avoid full heap scanning-Remembered Set

G1 divides the Java heap into multiple Regions, "breaking the whole into parts". However, a Region cannot be isolated: an object allocated in one Region can have a reference relationship with any object in the entire Java heap. Without further help, reachability analysis to decide whether an object is alive would have to scan the entire Java heap to be accurate, which would obviously do great harm to GC efficiency.

To avoid scanning the whole heap, the virtual machine maintains a corresponding Remembered Set for each Region in G1. When the virtual machine detects that the program is writing to data of Reference type, a write barrier temporarily interrupts the write to check whether the object being referenced lies in a different Region (in generational terms, whether an object in the old generation references an object in the new generation). If so, the reference information is recorded, via the CardTable, in the Remembered Set of the Region that the referenced object belongs to. During memory reclamation, adding the Remembered Set to the enumeration range of the GC roots guarantees that nothing is missed without scanning the whole heap.
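The write barrier and Remembered Set mechanism can be modeled in a few lines. The sketch below is a simplification under stated assumptions: it records the referencing object itself rather than going through a card table, it never removes stale entries, and the class `RememberedSetSketch`, the `Obj` type, and the integer Region ids are all hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of per-Region remembered sets maintained by a write barrier.
final class RememberedSetSketch {
    // Assumed heap object: knows its Region and holds a single reference field.
    static final class Obj {
        final int region;
        Obj field;

        Obj(int region) {
            this.region = region;
        }
    }

    // One remembered set per Region: objects outside the Region that point into it.
    private final Map<Integer, Set<Obj>> rememberedSets = new HashMap<>();

    // Stands in for the write barrier: runs on every reference store.
    void writeField(Obj holder, Obj target) {
        holder.field = target;
        if (target != null && holder.region != target.region) {
            // Cross-Region reference: record it in the set of the referenced object's Region.
            rememberedSets.computeIfAbsent(target.region, k -> new HashSet<>()).add(holder);
        }
    }

    // Extra roots to scan when collecting only the given Region.
    Set<Obj> extraRootsFor(int region) {
        return rememberedSets.getOrDefault(region, Collections.emptySet());
    }

    public static void main(String[] args) {
        RememberedSetSketch heap = new RememberedSetSketch();
        Obj old = new Obj(7);
        Obj young = new Obj(1);
        heap.writeField(old, young);
        System.out.println(heap.extraRootsFor(1).contains(old)); // prints true
    }
}
```

When Region 1 is collected, only its own roots plus `extraRootsFor(1)` need to be examined, so the object in Region 7 is found without scanning the rest of the heap.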

Without counting the operation of maintaining the Remembered Set, the operation of the G1 collector can be roughly divided into the following steps:

Initial marking (Initial Marking): only marks the objects directly reachable from GC Roots and modifies the value of TAMS (Next Top at Mark Start) so that, when the user program runs concurrently in the next phase, new objects are created in the correct available Region. This phase pauses user threads, but only briefly.

Concurrent marking (Concurrent Marking): starts from the GC Roots, analyzes the reachability of objects in the heap, and finds the surviving objects. This phase takes a long time but can run concurrently with the user program.

Final marking (Final Marking): corrects the marking records that changed because the user program kept running during concurrent marking. The virtual machine records object changes during that period in each thread's Remembered Set Logs, and the final marking phase merges the Remembered Set Logs data into the Remembered Set. This phase pauses user threads but can run in parallel.

Screening and evacuation (Live Data Counting and Evacuation): first sorts the Regions by reclamation value and cost, then draws up a collection plan according to the GC pause time the user expects. This phase could also run concurrently with the user program, but because only some Regions are reclaimed and the time is under the user's control, pausing user threads greatly improves collection efficiency.

From the following figure, you can clearly see the concurrency and pause stages in the operation steps of the G1 collector (at Safepoint):

