Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What are the differences between garbage location, garbage collection algorithm and garbage disposal optimized by JVM?

2025-04-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/03 Report--

This article mainly talks about "what are the differences between JVM-optimized garbage location, garbage collection algorithm and garbage processor". Interested friends may wish to take a look. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "what are the differences between JVM-optimized garbage location, garbage collection algorithms, and garbage handlers?"

What is rubbish?

Garbage, mainly refers to the objects on the heap, so how to make sure that these objects can be recycled?

The general idea is that if an object can never be accessed, then it is garbage and can be recycled. How can you be sure that the object will never be used?

Citation counting method

Add a reference counter to the object that increments the counter value whenever there is a reference; when the reference expires, the counter value is subtracted by one; an object with a counter of zero at any time can no longer be used. However, in the field of Java, at least the mainstream Java virtual machines do not choose the reference counting algorithm to manage memory. The main reason is that this seemingly simple algorithm has many exceptions to consider and must cooperate with a lot of extra processing to ensure that it works correctly. For example, simple reference counting is very difficult to solve the problem of circular references between objects.

As shown in the figure, the reference of each object is 1, which constitutes a circular reference, but it cannot be accessed by other objects, these two objects no longer have any references, and the reference counting algorithm cannot recycle them.

Code validation:

Package com.courage; public class ReferenceCountingGC {public Object instance = null; private static final int _ 1MB = 1024 * 1024; / * the only meaning of this member attribute is to occupy memory so that you can see clearly in the GC log whether * / private byte [] bigSize = new byte [5* _ 1MB]; public static void testGC () {/ / 5 M ReferenceCountingGC objA = new ReferenceCountingGC () / / 5 M ReferenceCountingGC objB = new ReferenceCountingGC (); objA.instance = objB; objB.instance = objA; objA = null; objB = null; / / suppose that GC,objA and objB can be recycled in this line? System.gc ();} public static void main (String [] args) {testGC ();}}

Execution result:

[0.004s] [warning] [gc]-XX:+PrintGCDetails is deprecated. Will use-Xlog:gc* instead. [0.012s] [info] [gc,heap] Heap region size: 1M [0.015s] [info] [gc] Using G1 [0.015s] [info] [gc,heap,coops] Heap address: 0x0000000701000000, size: 4080 MB, Compressed Oops mode: Zero based, Oop shift amount: 3. [0.119s] [gc,metaspace] GC (0) Metaspace: 805K-> 805K (1056768K) [0.119s] [info] [gc] GC (0) Pause Full (System.gc ()) 14m-> 0m (8m) 2.886ms [0.119s] [info] [gc,cpu] GC (0) User=0.03s Sys=0.00s Real=0.00s [0.120s] [info] [gc,heap,exit] Heap.

For the sake of space, I omitted part of the print. It can be seen that the memory footprint after System.gc () is 14m-> 0m, which releases the 10m of the object. That is, JVM does not use reference counting to mark garbage.

Root reachability algorithm

The basic idea of this algorithm is to use a series of root objects called "GC Roots" as the starting node set, from these nodes, search downward according to the reference relationship, the path of the search process is called "reference chain" (Reference Chain). If there is no reference chain connection between an object and GC Roots, or in terms of graph theory, it is from GC Roots to the time when the object is unreachable. It proves that this object can no longer be used.

In the Java technology system, the fixed objects that can be used as GC Roots include the following:

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

Objects referenced in the virtual machine stack (local variables in stack frames), such as those used in the method stack where each thread is called

Parameters, local variables, temporary variables, and other objects referenced by class static properties in the method area, such as reference type static variables of the Java class.

Objects referenced by constants in the method area, such as references in the string constant pool (String Table).

Objects referenced by JNI (commonly known as Native methods) in the local method stack.

References within the Java virtual machine, such as Class objects corresponding to basic data types, and some resident exception objects (such as

NullPointExcepiton, OutOfMemoryError, etc., and system class loaders.

All objects held by synchronization locks (synchronized keyword).

JMXBean that reflects the internal situation of the Java virtual machine, callbacks registered in JVMTI, local code caching, and so on.

Garbage collection algorithm

This paper introduces three common garbage collection algorithms (mark-sweep,mark-compact,mark-copy), which are the algorithm basis of various garbage collectors in java virtual machine.

The idea of garbage collection algorithm

At present, most of the garbage collectors of commercial virtual machines are designed according to the theory of "generation collection" (Generational Collection). The name of generation collection is theory, which is essentially a set of rules of thumb that accord with the actual situation of most programs, and it is based on two generational hypotheses:

1) weak generational hypothesis (Weak Generational Hypothesis): the vast majority of objects are permanent.

2) the strong generation hypothesis (Strong Generational Hypothesis): the more times the object goes through the garbage collection process, the more difficult it is for the object to die.

These two generational hypotheses work together to establish a consistent design principle for many commonly used garbage collectors: the collector should divide the Java heap into different areas, and then allocate the recycled objects to different areas according to their age (that is, the number of times the object survives the garbage collection process) for storage. Obviously, if most objects in an area are dying out, it is difficult to survive the garbage collection process. So put them together and focus on how to keep a small amount of survival instead of marking a large number of objects that will be recycled each time, and a large amount of space can be recycled at a lower cost.

If the rest are hard-to-die objects, put them together, and the virtual machine can use a lower frequency to recycle this area, which takes into account both the time cost of garbage collection and the efficient use of memory space.

Mark-clear algorithm Mark-Sweep

The algorithm is divided into two stages: "marking" and "clearing": first, all the objects that need to be recycled are marked, and after the marking is completed, all the marked objects are recycled uniformly, or conversely, the living objects are tagged and all the unmarked objects are recycled uniformly.

It has two main disadvantages:

The first is that the execution efficiency is unstable. If there are a large number of objects in the Java heap, and most of them need to be recycled, a large number of marking and clearing actions must be carried out, resulting in a decrease in the execution efficiency of both marking and clearing processes as the number of objects increases.

The second is the fragmentation of memory space, which will produce a large number of discontinuous memory fragments after marking and clearing. Too much space debris may result in not finding enough continuous memory later when large objects need to be allocated during the running of the program, which will have to trigger another garbage collection action in advance.

Tag-copy Mark-Copy

The tag-copy algorithm is often referred to as the replication algorithm, which divides the available memory into two equal chunks of capacity, using only one of them at a time. When this piece of memory is used up, copy the surviving objects to another piece, and then clean up the used memory space at once.

If most of the objects in memory are alive, this algorithm will incur a lot of overhead of inter-memory replication, but for most objects that can be recycled, the algorithm needs to copy a small number of living objects. and each time the memory is reclaimed for the whole half of the area, so when allocating memory, there is no need to consider the complexity of space debris, as long as you move the top pointer and allocate it sequentially. This is easy to implement and efficient to run, but its drawbacks are also obvious. the cost of this replication recovery algorithm is to reduce the available memory to half of the original memory, which wastes too much space.

Tag-compress Mark-Compact

The tag-copy algorithm will carry out more replication operations when the object survival rate is high, and the efficiency will be reduced. More crucially, if you do not want to waste 50% of the space, you need additional space for allocation guarantees to deal with the extreme situation in which all objects in the used memory are 100% alive, so this algorithm generally could not be used directly in the old days.

The marking process in the tag-compression algorithm is still the same as the mark-clear algorithm, but the next step is not to clean up recyclable objects directly, but to move all surviving objects to one end of the memory space. and then clean up the memory beyond the boundary:

The essential difference between the marking-clearing algorithm and the marking-finishing algorithm is that the former is a non-mobile recycling algorithm, while the latter is mobile. Whether or not to move recycled objects is a risk decision with both advantages and disadvantages: if moving surviving objects, especially in the old days, where there were a large number of living areas for each collection, moving surviving objects and updating all places referencing them would be an extremely heavy operation, and such object movement would have to pause the user's application (STW problem) all the way.

Garbage disposal

Based on the above three garbage collection algorithms, seven kinds of garbage collectors are derived:

Serial collector

This collector is a single-threaded collector, but its "single-threaded" meaning does not only mean that it will use only one processor or one collection thread to complete garbage collection. More importantly, it emphasizes that when it carries out garbage collection, it must pause all other worker threads until its collection is complete.

So far, it is still the default new generation collector for HotSpot virtual machines running in client mode. It has advantages over other collectors, that is, it is simple and efficient (compared with the single thread of other collectors). For environments with limited memory resources, it has the lowest additional memory consumption (Memory Footprint) of all collectors. For an environment with a single core processor or a small number of processor cores, the Serial collector has no overhead of thread interaction, so concentrating on garbage collection can naturally achieve the highest single-thread collection efficiency. The Serial collector is a good choice for virtual machines running in client mode.

ParNew collector

The ParNew collector is essentially a multithreaded parallel version of the Serial collector, except for garbage collection using multiple threads at the same time

The rest of the behavior includes all the control parameters available to the Serial collector (for example:-XX:SurvivorRatio,-XX:

PretenureSizeThreshold,-XX:HandlePromotionFailure, etc.), collection algorithm, Stop The World, object allocation specification

Then, the recycling strategy is completely consistent with the Serial collector, and a considerable amount of code is shared between the two collectors in implementation.

Apart from supporting multithreaded parallel collection, the ParNew collector does not have much innovation compared with the Serial collector, but it

However, it is the preferred new generation collection for many HotSpot virtual machines running in server mode, especially in legacy systems before JDK 7.

One of the important reasons that has nothing to do with function or performance is:

Apart from the Serial collector, it is currently the only one that can work with CMS

The collector cooperates, on the other hand, the emergence of CMS consolidates the position of ParNew.

The ParNew collector will never work better than the Serial collector in a single-core processor environment, even due to the existence of threads

The overhead of interaction, the collector is not 100% guaranteed to exceed the Serial collector in a pseudo-dual-core processor environment implemented by hyperthreading (Hyper-Threading) technology. Of course, with the increase in the number of processor cores that can be used, ParNew for garbage collection

The efficient use of system resources is still very beneficial.

Parallel Scavenge collector

Parallel Scavenge collector is also a new generation collector, it is also a collector based on tag-copy algorithm, and it is also a multithreaded collector that can collect in parallel. Many of the features of Parallel Scavenge are very similar to ParNew on the surface, so what's so special about it?

The characteristic of Parallel Scavenge collector is that its focus is different from other collectors. Collectors such as CMS focus on shortening the pause time of user threads during garbage collection as much as possible, while the goal of Parallel Scavenge collector is to achieve a controllable throughput (Throughput). The so-called throughput is the ratio of the time spent by the processor running user code to the total time consumed by the processor, namely:

If the virtual machine completes a task, the user code plus garbage collection takes a total of 100 minutes, and garbage collection takes 1 minute, then the throughput is 99%. The shorter the pause time, the more suitable for programs that need to interact with users or ensure the quality of service response, and good response speed can improve the user experience, while high throughput can make the most efficient use of processor resources. complete the operation task of the program as soon as possible, which is mainly suitable for analysis tasks that operate in the background without too much interaction.

Because of its close relationship with throughput, Parallel Scavenge collectors are often referred to as "throughput first collectors".

Serial Old collector

Serial Old, an older version of the Serial collector, is also a single-threaded collector that uses the tag-collation algorithm. The main significance of this collector is also for HotSpot virtual machines in client mode. If you are in server mode, it may also have two uses: one is to be used in conjunction with the Parallel Scavenge collector in JDK 5 and previous versions, and the other is to serve as a backup scenario in the event of a failure of the CMS collector, when Concurrent Mode Failure occurs in concurrent collections.

Parallel Old collector

Parallel Old is an old version of the Parallel Scavenge collector, which supports multithreaded concurrent collection and is implemented based on the tag-collation algorithm. This collector was not available until JDK 6. Until then, the new generation of Parallel Scavenge collectors had been in a rather awkward state, because if the new generation chose the Parallel Scavenge collector, the old generation had no choice but the Serial Old (PSMarkSweep) collector, and other well-performing old collectors, such as CMS, could not work with it.

Due to the "drag" of Serial Old collectors on server-side application performance in the old days, using Parallel Scavenge collectors may not achieve the effect of maximizing throughput as a whole.

Similarly, because the parallel processing power of server multiprocessors can not be fully utilized in the old collection of single thread, the total throughput of this combination may not even be better than the combination of ParNew and CMS in the old running environment with large memory space and advanced hardware specifications. Until the advent of the Parallel Old collector, the "throughput first" collector finally has a more veritable combination, and the combination of Parallel Scavenge plus Parallel Old collector can be given priority in situations where throughput or processor resources are scarce.

CMS collector

CMS (Concurrent Mark Sweep) collector is a kind of collector whose goal is to obtain the shortest recovery pause time. At present, a large number of Java applications are concentrated on the Internet website or the server of the browser-based BPX S system. These applications usually pay more attention to the response speed of the service, hoping that the system pause time as short as possible, in order to bring a good interactive experience to users. The CMS collector meets the needs of such applications very well.

From the name (including "Mark Sweep"), you can see that the CMS collector is implemented based on the tag-cleanup algorithm, and its operation process is more complex than the previous collectors. The whole process is divided into four steps, including:

1) initial tag (CMS initial mark)

2) concurrent marking (CMS concurrent mark)

3) relabeling (CMS remark)

4) concurrent cleanup (CMS concurrent sweep)

The two steps of initial tagging and relabeling still require "Stop The World". The initial tag only marks the objects to which GCRoots can be directly associated, which is very fast; the concurrent marking phase is the process of traversing the entire object graph from the directly associated objects of GCRoots, which takes a long time but does not need to pause the user thread and can be run concurrently with the garbage collection thread. The purpose of the relabeling phase is to correct the marking records of the objects whose markup changes are caused by the continued operation of the user program during the concurrent tagging period. the pause time in this stage is usually slightly longer than that in the initial tagging phase. but it's also much shorter than the concurrent marking phase. Finally, there is the concurrent cleanup phase, which cleans up and deletes the dead objects judged by the marking phase. Since there is no need to move the living objects, this stage can also be concurrent with the user thread. Because the garbage collector thread can work with the user thread during the longest concurrent marking and concurrent cleanup phase of the process, overall, the memory collection process of the CMS collector is executed concurrently with the user thread.

Advantages: concurrent collection, low pause

Disadvantages: 1. Very sensitive to processor resources

two。 Unable to handle floating garbage (Floating Garbage)

3. Space debris

Garbage First collector

Garbage First (G1 for short) collector is a milestone in the development history of garbage collector technology. It creates the design idea of collector for local collection and memory layout based on Region. G1 is a garbage collector mainly for server-side applications.

Before the advent of the G1 collector, all other collectors, including CMS, targeted either the entire Minor GC, the entire Major GC, or the entire Java Full GC. G1 jumped out of this cage, it can face any part of heap memory to form Collection Set (generally referred to as CSet) for recycling, the measure is no longer which generation it belongs to, but which memory stores the largest amount of garbage and the largest recovery benefits, this is the Mixed GC mode of the G1 collector. The Region-based heap memory layout created by G1 is the key to achieving this goal. Although G1 is still designed according to the theory of generational collection, the layout of its heap memory is very different from that of other collectors: G1 no longer insists on the division of fixed size and a fixed number of generational regions, but divides the continuous Java heap into multiple independent regions of equal size (Region). Each Region can act as the new generation Eden space, Survivor space, or old space as needed. The collector can use different strategies to deal with Region that play different roles, so that both newly created objects and old objects that have survived for a period of time can achieve good collection results.

There is also a special class of Humongous areas in Region that are dedicated to storing large objects. G1 believes that as long as the size of an object is more than half of the capacity of a Region, it can be determined to be a large object. The size of each Region can be set by the parameter-XX:G1HeapRegionSize, the value range is 1MB~32MB, and should be 2 to the N power. For super-large objects that exceed the entire Region capacity, they will be stored in N consecutive Humongous Region, and most of the behavior of G1 treats Humongous Region as part of the old days, as shown in figure 3-12.

Although G1 still retains the concept of the new generation and the old age, the new generation and the old age are no longer fixed, they are a dynamic collection of a series of regions (which do not need to be continuous). The G1 collector can build a predictable pause time model because it takes Region as the minimum unit for a single collection, that is, each time the memory space collected is an integral multiple of the Region size, so that full-area garbage collection in the entire Java heap can be systematically avoided. A more specific idea is to let the G1 collector track the "value" of garbage accumulation in each Region, that is, the amount of space obtained by recycling and the empirical value of the time required for recycling, and then maintain a priority list in the background. Each time according to the allowed collection pause time set by the user (specified by the parameter-XX:MaxGCPauseMillis, the default is 200ms), priority is given to those Region with the largest recovery value. This is the origin of the name "Garbage First". This method of using Region to divide memory space and priority area recovery ensures that the G1 collector can achieve the highest collection efficiency in a limited time.

Summary of garbage disposal

At present, it is the combination of garbage collectors in the new generation and the old era:

At this point, I believe that everyone has a deeper understanding of the "JVM-optimized garbage location, garbage collection algorithm, garbage processor differences", might as well to the actual operation of it! Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report