Detailed explanation of JVM garbage collector

2025-04-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains JVM garbage collection: how the virtual machine decides that an object is dead, the main collection algorithms, the HotSpot garbage collectors, and the memory allocation and reclamation strategy.

1 Overview

Garbage collection has to answer three questions:

Which memory needs to be reclaimed?

When should it be reclaimed?

How should it be reclaimed?

When we need to troubleshoot memory overflow problems, or when garbage collection becomes the bottleneck that keeps a system from reaching higher concurrency, we need to monitor and tune these "automated" technologies.

2 Is the object dead?

Almost all object instances live in the heap, so the first step before garbage collection is to determine which objects are dead (that is, objects that can no longer be used by any means).

2.1 Reference counting method

Add a reference counter to each object. The counter is incremented by 1 whenever a new reference to the object is created and decremented by 1 whenever a reference becomes invalid. An object whose counter is 0 can no longer be used.

This method is simple and efficient, but mainstream Java virtual machines do not use it to manage memory. The main reason is that it cannot easily handle circular references between objects.
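The circular reference problem can be illustrated with a toy model. The sketch below is not how any JVM works; the Node class, its refCount field and the assign helper are hypothetical names invented for the illustration. It builds two objects that point at each other, drops the outside references, and shows that both counters stay at 1 even though nothing can reach the objects any more.

```java
// Toy model of reference counting; NOT how HotSpot manages memory.
class RefCountDemo {
    // A simulated object that carries its own reference counter.
    static class Node {
        int refCount = 0;
        Node field; // a single outgoing reference
    }

    // Point holder.field at target and keep both counters up to date.
    static void assign(Node holder, Node target) {
        if (holder.field != null) holder.field.refCount--;
        holder.field = target;
        if (target != null) target.refCount++;
    }

    // Build a two-object cycle, drop the external references,
    // and return the counters that remain.
    static int[] cycleCountsAfterRelease() {
        Node a = new Node();
        Node b = new Node();
        a.refCount++; // simulated local variable referencing a
        b.refCount++; // simulated local variable referencing b
        assign(a, b); // a.field = b
        assign(b, a); // b.field = a
        a.refCount--; // the simulated local variables go out of scope
        b.refCount--;
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCountsAfterRelease();
        System.out.println("a=" + counts[0] + ", b=" + counts[1]); // prints "a=1, b=1"
    }
}
```

Neither counter reaches 0, so a collector that relied on counters alone would never reclaim the pair.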

2.2 Reachability analysis algorithm

The basic idea of this algorithm is to take a set of objects called "GC Roots" as starting points and search downward from these nodes. The paths traversed by the search are called reference chains. When no reference chain connects an object to any GC Root, the object is proven to be unusable.
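As a rough sketch of the idea, the search from the roots can be written as a breadth-first traversal over an object graph. The graph here is a plain map from object names to the names they reference, which is an assumption made for the example and not how a JVM represents its heap.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy reachability analysis over a named object graph.
class ReachabilityDemo {
    // Return every object reachable from the roots by following reference chains.
    static Set<String> reachable(List<String> roots, Map<String, List<String>> refs) {
        Set<String> seen = new HashSet<>(roots);
        Deque<String> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            String current = work.poll();
            for (String next : refs.getOrDefault(current, List.of())) {
                if (seen.add(next)) work.add(next); // visit each object once
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
            "root", List.of("a"),
            "a", List.of("b"),
            "c", List.of("d"),  // c and d reference each other...
            "d", List.of("c")); // ...but no root leads to them
        System.out.println(reachable(List.of("root"), refs));
    }
}
```

The objects c and d form a cycle, yet they are absent from the result because no chain from the root reaches them. This is why reachability analysis does not suffer from the circular reference problem.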

2.3 References revisited

After JDK 1.2, Java expanded the concept of a reference and divided it into four kinds: strong reference, soft reference, weak reference and phantom reference (in decreasing order of strength).

1. Strong reference

Most references we use are strong references; they are the most common kind. An object with a strong reference is like an essential household item: the garbage collector will never reclaim it. When memory runs short, the Java virtual machine would rather throw an OutOfMemoryError and let the program terminate abnormally than free memory by arbitrarily reclaiming strongly referenced objects.

2. Soft reference (SoftReference)

If an object has only soft references, it is like a household item you could do without. While there is enough memory, the garbage collector will not reclaim it; when memory runs short, the collector reclaims these objects. Until that happens, the program can keep using the object. Soft references can be used to implement memory-sensitive caches.

A soft reference can be used together with a reference queue (ReferenceQueue). If the object referenced by the soft reference is garbage collected, the Java virtual machine adds the soft reference to the reference queue associated with it.

3. Weak reference (WeakReference)

If an object has only weak references, it is also like a dispensable household item. The difference from soft references is that objects with only weak references have a shorter life cycle: when the garbage collector thread scans the memory area it manages and finds an object with only weak references, it reclaims the object whether or not memory is currently sufficient. However, because the garbage collector runs as a low-priority thread, such objects will not necessarily be found quickly.

A weak reference can be used together with a reference queue (ReferenceQueue). If the object referenced by the weak reference is garbage collected, the Java virtual machine adds the weak reference to the reference queue associated with it.

4. Phantom reference (PhantomReference)

As the name suggests, a phantom reference (sometimes translated as "virtual reference") exists in name only. Unlike the other reference types, it does not affect the life cycle of the object. If an object holds only phantom references, it can be garbage collected at any time, just as if it had no references at all.

Phantom references are mainly used to track the activity of objects being garbage collected.

One difference between phantom references and soft or weak references is that a phantom reference must be used together with a reference queue (ReferenceQueue). When the garbage collector is about to reclaim an object and finds that it still has a phantom reference, it adds the phantom reference to the associated reference queue before reclaiming the object's memory. The program can learn whether the referenced object is about to be garbage collected by checking whether the phantom reference has been added to the queue. If it has, the program can take any necessary action before the object's memory is reclaimed.

Weak and phantom references are rarely used in everyday programming, whereas soft references are used more often. Because softly referenced objects can be reclaimed when memory runs short, they let the JVM free memory sooner, which helps keep the system stable and prevents problems such as OutOfMemoryError.
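The reference classes described above live in the java.lang.ref package. The following sketch only shows the parts of their behavior that do not depend on when a collection actually runs; the class and method names (ReferenceKindsDemo and its helpers) are made up for the illustration. Whether and when a referent is cleared after System.gc() is up to the JVM, so main prints what it observes and guarantees nothing.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKindsDemo {
    // While a strong reference exists, soft and weak references still return the referent.
    static boolean referentVisibleWhileStronglyHeld() {
        Object strong = new Object();
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        return soft.get() == strong && weak.get() == strong;
    }

    // A phantom reference never exposes its referent: get() always returns null.
    static boolean phantomGetIsNull() {
        Object strong = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);
        return phantom.get() == null;
    }

    public static void main(String[] args) {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        // The referent has no strong reference, so it is eligible for collection.
        WeakReference<Object> weak = new WeakReference<>(new Object(), queue);
        System.gc(); // only a hint; the JVM may or may not collect here
        System.out.println("weak referent after gc hint: " + weak.get());
        System.out.println("reference found in queue: " + queue.poll());
    }
}
```

A memory-sensitive cache would typically hold its values through SoftReference and call get() on each lookup, reloading the value when get() returns null.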

2.4 To live or to die

Even an object found unreachable by reachability analysis is not necessarily dead. It is temporarily in a "reprieve" stage, and declaring it dead takes at least two marking passes. An object found unreachable is marked for the first time and then screened; the screening criterion is whether it is necessary to execute the object's finalize method. If the object does not override finalize, or its finalize method has already been called by the virtual machine, the virtual machine treats execution as unnecessary. Objects that do need finalize executed are placed in a queue and later marked a second time; they are then actually reclaimed unless, during finalize, the object has re-associated itself with some object on a reference chain.

2.5 Reclaiming the method area

Garbage collection in the method area (the permanent generation in the HotSpot virtual machine) mainly reclaims two things: obsolete constants and useless classes.

Determining whether a constant is an "obsolete constant" is relatively simple, while the conditions for a "useless class" are much stricter. A class must meet all three of the following conditions to be considered a "useless class":

All instances of the class have been recycled, that is, no instances of the class exist in the Java heap.

The ClassLoader that loaded the class has been recycled.

The corresponding java.lang.Class object of this class is not referenced anywhere, and its methods cannot be accessed anywhere through reflection.

3 Garbage collection algorithms

3.1 Mark-sweep algorithm

The algorithm has two phases, "mark" and "sweep": first all objects that need to be reclaimed are marked, and once marking is complete all marked objects are reclaimed together. It is the most basic collection algorithm and has two obvious problems: (1) efficiency, and (2) space, because sweeping leaves a large number of discontinuous memory fragments.
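The space problem is easy to see in a toy model. The sketch below treats the heap as an array of slots and assumes the mark phase has already produced the set of surviving objects (marking the garbage instead, as the text describes, would be equivalent here). All names are hypothetical; a real collector works on raw memory, not on strings.

```java
import java.util.Arrays;
import java.util.Set;

// Toy model of the clearing phase and the fragmentation it leaves behind.
class MarkSweepDemo {
    // Free every slot whose object did not survive marking. Objects never move.
    static String[] sweep(String[] heap, Set<String> survivors) {
        String[] after = heap.clone();
        for (int i = 0; i < after.length; i++) {
            if (after[i] != null && !survivors.contains(after[i])) after[i] = null;
        }
        return after;
    }

    // Count separate runs of free slots; more than one run means fragmentation.
    static int freeRuns(String[] heap) {
        int runs = 0;
        for (int i = 0; i < heap.length; i++) {
            boolean free = heap[i] == null;
            boolean previousFree = i > 0 && heap[i - 1] == null;
            if (free && !previousFree) runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e", "f" };
        String[] after = sweep(heap, Set.of("a", "c", "e"));
        // prints "[a, null, c, null, e, null] free runs: 3"
        System.out.println(Arrays.toString(after) + " free runs: " + freeRuns(after));
    }
}
```

Three slots are free, but they are scattered, so an object needing two contiguous slots could not be allocated without another collection.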

3.2 Copying algorithm

The "copying" collection algorithm appeared in order to solve the efficiency problem. It divides memory into two blocks of equal size and uses only one of them at a time. When that block is used up, the surviving objects are copied to the other block and the used block is then cleaned up in one step. Each collection therefore works on only half of the memory, at the cost of halving the usable space.
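A minimal sketch of the two-half scheme, again on a toy heap of slots. The set of live objects is passed in as an assumption, and the names from, to and next are illustrative only.

```java
import java.util.Arrays;
import java.util.Set;

// Toy model of a two-half ("semispace") collection.
class CopyingDemo {
    // Copy survivors from the half in use into the empty half, packing them
    // contiguously. Afterwards the whole "from" half can be reused at once.
    static String[] copySurvivors(String[] from, Set<String> live) {
        String[] to = new String[from.length];
        int next = 0; // bump pointer into the target half
        for (String obj : from) {
            if (obj != null && live.contains(obj)) to[next++] = obj;
        }
        return to;
    }

    public static void main(String[] args) {
        String[] from = { "a", "b", "c", "d" };
        String[] to = copySurvivors(from, Set.of("a", "c"));
        System.out.println(Arrays.toString(to)); // prints "[a, c, null, null]"
    }
}
```

The survivors end up contiguous, so later allocation only has to advance a pointer. The cost of a collection depends on the number of survivors, not on the amount of garbage.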

3.3 Mark-compact algorithm

This algorithm was proposed to suit the characteristics of the old generation. The marking process is the same as in the "mark-sweep" algorithm, but the next step does not reclaim the dead objects directly. Instead, all surviving objects are moved toward one end of the memory, and the memory beyond the end boundary is then cleaned up directly.
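The difference from the previous two schemes is that survivors are slid together inside the same memory area. A hedged sketch on a toy slot array follows; the live set is given as input and every name is invented for the example.

```java
import java.util.Arrays;
import java.util.Set;

// Toy model of in-place compaction.
class MarkCompactDemo {
    // Slide survivors toward the start of the heap, then clear everything past
    // the boundary. Returns the index of the first free slot.
    static int compact(String[] heap, Set<String> live) {
        int boundary = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live.contains(heap[i])) heap[boundary++] = heap[i];
        }
        for (int i = boundary; i < heap.length; i++) heap[i] = null;
        return boundary;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e", "f" };
        int boundary = compact(heap, Set.of("a", "c", "e"));
        // prints "[a, c, e, null, null, null] first free slot: 3"
        System.out.println(Arrays.toString(heap) + " first free slot: " + boundary);
    }
}
```

No second memory area is needed and the free space is one contiguous run, but every surviving object may have to move.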

3.4 Generational collection algorithm

At present, the garbage collectors of virtual machines all use the generational collection algorithm. It introduces no new idea; it simply divides memory into several blocks according to how long objects survive. Generally the Java heap is divided into the young generation and the old generation, so that the most suitable collection algorithm can be chosen for each.

For example, in the young generation a large number of objects die in every collection, so the copying algorithm is a good choice: each collection only pays the cost of copying the few survivors. In the old generation the survival rate of objects is high, so the "mark-sweep" or "mark-compact" algorithm is used instead.

Extended interview question: why does HotSpot divide the heap into a young generation and an old generation?

Answer based on the introduction to the generational collection algorithm above.

4 Garbage collectors

If collection algorithms are the methodology of memory reclamation, then garbage collectors are its concrete implementation.

Although the collectors are compared below, no single best one is picked. So far there is no best garbage collector, let alone a universal one; all we can do is choose the collector that suits the specific application scenario. Think about it: if one perfect collector worked for every scenario, the HotSpot virtual machine would not implement so many different collectors.
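To see which collectors a running JVM has actually selected, the standard java.lang.management API can be queried. This is a small sketch; the class name ActiveCollectorsDemo is made up, and the names printed depend on the JVM version and startup flags, so no particular output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class ActiveCollectorsDemo {
    // Names of the collectors registered in the current JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(bean.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Collection count and accumulated time so far for each collector.
            System.out.println(bean.getName()
                + ": collections=" + bean.getCollectionCount()
                + ", timeMillis=" + bean.getCollectionTime());
        }
    }
}
```

Running the same program with different collector flags is a quick way to check which young and old generation collectors are paired in a given configuration.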

4.1 Serial Collector

The Serial collector is the most basic and oldest garbage collector. As the name suggests, it is a single-threaded collector. "Single-threaded" means not only that it uses just one garbage collection thread to do its work, but more importantly that it must pause all other worker threads ("Stop The World") until the collection is complete.

The designers of virtual machines are well aware of the poor user experience that Stop The World causes, so pause times have kept shrinking in later collector designs (pauses still exist, and the search for a better collector continues).

Does the Serial collector have any advantage over the other collectors? It does: it is simple and efficient (compared with the other collectors running on a single thread). Because the Serial collector has no thread interaction overhead, it naturally achieves high single-threaded collection efficiency. It is a good choice for virtual machines running in Client mode.

4.2 ParNew Collector

The ParNew collector is essentially a multithreaded version of the Serial collector. Apart from using multiple threads for garbage collection, its behavior (control parameters, collection algorithm, collection strategy and so on) is exactly the same as that of the Serial collector.

It is the first choice for many virtual machines running in Server mode. Apart from the Serial collector, it is the only young generation collector that works with the CMS collector (a truly concurrent collector, described later).

A note on the concepts of parallelism and concurrency:

Parallel: multiple garbage collection threads work in parallel while the user threads are still waiting.

Concurrent: the user threads and the garbage collection thread execute at the same time (not necessarily in parallel; they may alternate). The user program keeps running while the garbage collector runs on another CPU.

4.3 Parallel Scavenge Collector

The Parallel Scavenge collector is a young generation collector. It uses the copying algorithm and is a parallel, multithreaded collector. So what is special about it?

The focus of the Parallel Scavenge collector is throughput (efficient use of the CPU), whereas collectors such as CMS focus on the pause time of user threads (improving the user experience). Throughput is the ratio of the CPU time spent running user code to the total CPU time consumed. The Parallel Scavenge collector provides many parameters that let users find the most suitable pause time or the maximum throughput. If you do not know much about how the collector works and manual tuning is difficult, leaving memory management tuning to the virtual machine is also a good choice.
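The throughput definition above is simple arithmetic. The sketch below also relates it to the HotSpot flag -XX:GCTimeRatio, which is not named in the original text: with -XX:GCTimeRatio=N the collector aims to spend at most 1/(1+N) of the total time in garbage collection. The other commonly cited flags for this collector are -XX:MaxGCPauseMillis and -XX:+UseAdaptiveSizePolicy. The class and method names are made up for the example.

```java
class ThroughputDemo {
    // Throughput = time running user code / (time running user code + GC time).
    static double throughput(double userMillis, double gcMillis) {
        return userMillis / (userMillis + gcMillis);
    }

    // Target share of time spent in GC for a given -XX:GCTimeRatio=N: 1 / (1 + N).
    static double gcTimeShare(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    public static void main(String[] args) {
        // 99 minutes of user code and 1 minute of GC give a throughput of 99%.
        System.out.println(throughput(99 * 60_000.0, 60_000.0));
        // N = 19 allows up to 5% of the time for GC, N = 99 allows 1%.
        System.out.println(gcTimeShare(19) + " " + gcTimeShare(99));
    }
}
```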

4.4 Serial Old Collector

The old generation version of the Serial collector; it is also single-threaded. It has two main uses: one is to pair with the Parallel Scavenge collector in JDK 1.5 and earlier, and the other is to serve as the fallback for the CMS collector.

4.5 Parallel Old Collector

The old generation version of the Parallel Scavenge collector. It uses multiple threads and the "mark-compact" algorithm. Where throughput matters and CPU resources are limited, the combination of Parallel Scavenge and Parallel Old deserves priority.

4.6 CMS Collector

The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause. It is well suited to applications that care about user experience.

As the words "Mark Sweep" in the name indicate, the CMS collector is implemented with the "mark-sweep" algorithm, and its operation is somewhat more complex than that of the previous collectors. The whole process has four steps:

Initial mark: pauses all other threads and records the objects directly connected to the roots; this is very fast.

Concurrent mark: GC and user threads run at the same time, and a closure structure records the reachable objects. At the end of this phase, however, the closure structure is not guaranteed to contain all currently reachable objects: because user threads may keep updating reference fields, the GC thread cannot keep the reachability analysis up to date in real time. The algorithm therefore keeps track of where reference updates occur.

Remark: corrects the mark records of objects whose marking changed because the user program kept running during concurrent marking. The pause in this phase is generally slightly longer than in the initial mark phase and much shorter than the concurrent mark phase.

Concurrent sweep: the user threads keep running while the GC thread sweeps the regions identified as garbage.

CMS is an excellent garbage collector whose main advantages are concurrent collection and low pauses. But it has three obvious disadvantages:

It is sensitive to CPU resources.

It cannot handle floating garbage.

The "mark-sweep" algorithm it uses leaves a large amount of memory fragmentation at the end of a collection.

4.7 G1 Collector

The earlier garbage collectors (Serial, Parallel and CMS) all divide heap memory into three fixed-size parts: the young generation, the old generation and the permanent generation.

G1 (Garbage-First) is a server-oriented garbage collector for machines with multiple processors and large memory. It meets GC pause time targets with very high probability while also delivering high throughput.

It was regarded as an important evolutionary feature of the HotSpot virtual machine in JDK 1.7. It has the following characteristics:

Parallelism and concurrency: G1 makes full use of multi-CPU, multi-core hardware, using multiple CPUs (or CPU cores) to shorten Stop-The-World pauses. Some GC actions that would require other collectors to pause Java threads can be performed by G1 concurrently while the Java program keeps running.

Generational collection: although G1 can manage the entire GC heap on its own without cooperating with other collectors, it still retains the concept of generations.

Space compaction: unlike the "mark-sweep" algorithm of CMS, G1 as a whole is based on the "mark-compact" algorithm, while locally it is based on the copying algorithm.

Predictable pauses: this is another major advantage of G1 over CMS. Reducing pause time is a goal shared by G1 and CMS, but in addition to pursuing low pauses, G1 builds a predictable pause time model that lets users specify that, within a time slice of M milliseconds, no more than N milliseconds may be spent on garbage collection.

The G1 collector maintains a priority list in the background and, based on the allowed collection time, collects the Region with the greatest reclamation value first (which is the origin of the name Garbage-First). Dividing memory into Regions and collecting them by priority ensures that the G1 collector collects as efficiently as possible within a limited time.
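The idea of picking the most valuable Regions within a time budget can be sketched as a greedy selection. This is a deliberate simplification and not G1's actual policy: the Region class, its garbage and cost estimates, and the pause budget are all hypothetical inputs chosen for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of value-based Region selection under a pause budget.
class RegionSelectionDemo {
    // Hypothetical per-Region summary: reclaimable bytes and estimated cost to collect.
    static class Region {
        final String name;
        final long garbageBytes;
        final double costMillis;
        Region(String name, long garbageBytes, double costMillis) {
            this.name = name;
            this.garbageBytes = garbageBytes;
            this.costMillis = costMillis;
        }
        // Reclamation value: bytes freed per millisecond of pause.
        double value() { return garbageBytes / costMillis; }
    }

    // Take Regions in order of decreasing value while the pause budget allows.
    static List<String> choose(List<Region> regions, double pauseBudgetMillis) {
        List<Region> sorted = new ArrayList<>(regions);
        sorted.sort((a, b) -> Double.compare(b.value(), a.value()));
        List<String> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : sorted) {
            if (spent + r.costMillis <= pauseBudgetMillis) {
                chosen.add(r.name);
                spent += r.costMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
            new Region("r1", 800, 4), new Region("r2", 300, 1),
            new Region("r3", 900, 9), new Region("r4", 100, 2));
        System.out.println(choose(regions, 6)); // prints "[r2, r1]"
    }
}
```

With a budget of 6 ms, r2 and r1 are taken (5 ms in total), while r3 and r4 would exceed the budget and are left for a later collection.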

The operation of the G1 collector is roughly divided into the following steps:

Initial marking

Concurrent marking

Final marking

Screening and reclamation

These steps have much in common with CMS. The initial marking phase only marks the objects directly reachable from GC Roots and modifies the value of TAMS (Top at Mark Start), so that when the user program runs concurrently in the next phase, new objects are created in the correct available Regions. This phase pauses the threads but takes very little time. The concurrent marking phase starts from the GC Roots, analyzes the reachability of objects in the heap and finds the surviving objects. This phase takes longer but can run concurrently with the user program. The final marking phase corrects the part of the mark record that changed because the user program kept running during concurrent marking. The virtual machine records the object changes made during that period in each thread's Remembered Set Logs, and the final marking phase merges the Remembered Set Logs data into the Remembered Set. This phase pauses the threads but can be executed in parallel. Finally, the screening and reclamation phase sorts the Regions by reclamation value and cost and draws up a reclamation plan according to the GC pause time the user expects.

5 Memory allocation and reclamation strategy

5.1 Objects are allocated in Eden first

In most cases, objects are allocated in the Eden area of the young generation. When Eden does not have enough space for an allocation, the virtual machine triggers a Minor GC.

What is the difference between Minor GC and Full GC?

Young generation GC (Minor GC): a garbage collection that takes place in the young generation. Minor GCs are very frequent and generally fast.

Old generation GC (Major GC/Full GC): a GC that takes place in the old generation. A Major GC is often accompanied by at least one Minor GC (though not always). A Major GC is generally more than 10 times slower than a Minor GC.

5.2 Large objects go directly into the old generation

Large objects are objects that need a lot of contiguous memory, such as very long strings and arrays.

5.3 Long-lived objects enter the old generation

Since the virtual machine manages memory with generational collection, it must be able to tell which objects belong in the young generation and which belong in the old generation. To do this, the virtual machine gives each object an age counter.

5.4 Dynamic object age determination

To adapt better to the memory behavior of different programs, the virtual machine does not always require an object's age to reach a fixed value before it can enter the old generation. If the total size of all objects of the same age in the Survivor space exceeds half of the Survivor space, objects whose age is greater than or equal to that age go directly into the old generation without waiting to reach the required age.
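The rule can be sketched as a small function, written to follow the rule exactly as stated in the paragraph above. The array layout, the parameter names and the maxThreshold argument (standing in for the configured promotion age, for example the HotSpot flag -XX:MaxTenuringThreshold) are assumptions for the illustration. HotSpot's real computation differs in detail, since it accumulates sizes across ages, so treat this as an illustration of the idea only.

```java
// Toy version of the dynamic age rule, following the text above literally.
class TenuringThresholdDemo {
    // sizeByAge[i] = total bytes of Survivor objects whose age is i (index 0 unused).
    // Returns the age at or above which objects are promoted to the old generation.
    static int effectiveThreshold(long[] sizeByAge, long survivorCapacity, int maxThreshold) {
        for (int age = 1; age < sizeByAge.length && age < maxThreshold; age++) {
            // Objects of this single age occupy more than half of the Survivor space.
            if (sizeByAge[age] * 2 > survivorCapacity) return age;
        }
        return maxThreshold; // no age dominates, so the configured value applies
    }

    public static void main(String[] args) {
        // Age 2 objects take 60 of 100 bytes: objects aged 2 or more are promoted.
        System.out.println(effectiveThreshold(new long[] { 0, 10, 60, 5 }, 100, 15)); // prints "2"
        // No age exceeds half of the space: the configured threshold of 15 applies.
        System.out.println(effectiveThreshold(new long[] { 0, 10, 20, 5 }, 100, 15)); // prints "15"
    }
}
```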
