Detailed introduction to the Java virtual machine (part ⑤)-garbage collection

2025-01-17 Update From: SLTechnology News&Howtos

Garbage collection mainly targets the heap and the method area. The program counter, the virtual machine stack, and the native method stack are private to each thread: they exist only for the thread's lifetime and disappear when the thread ends, so these three areas do not need garbage collection.

I. Determining whether an object can be reclaimed

1. Reference counting algorithm

Add a reference counter to each object. The counter is incremented when a new reference to the object is created and decremented when a reference is dropped. An object whose count is 0 can be reclaimed.

When two objects reference each other in a cycle, their counters never reach 0, so they can never be reclaimed. Because of circular references, the Java virtual machine does not use the reference counting algorithm.

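public class Test {
    public Object instance = null;

    public static void main(String[] args) {
        Test a = new Test();
        Test b = new Test();
        a.instance = b;   // a's object now references b's object
        b.instance = a;   // and vice versa: a reference cycle
        a = null;
        b = null;         // only the cycle keeps the two objects referenced
        doSomething();
    }

    // Placeholder for whatever the program does next.
    static void doSomething() {
    }
}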

In the code above, the objects referenced by a and b hold references to each other. After a and b are set to null, each object's counter is still 1 because of the other object's reference, so under reference counting the two Test objects could never be reclaimed.

Advantages: efficient to execute, with little impact on the running program. Disadvantages: cannot detect circular references, which leads to memory leaks.
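The counting rule described above can be traced by hand. The following is a minimal sketch of a toy counter, not of anything the JVM does; the names RefCountSketch, Node, and setField are made up for illustration. It replays the Test example and shows that both counts stay at 1 after the local variables are cleared.

```java
// Toy model of reference counting; this is NOT how HotSpot manages memory.
final class RefCountSketch {
    static final class Node {
        int refCount;    // number of references currently pointing at this node
        Node instance;   // outgoing reference, like Test.instance
    }

    // Point from.instance at 'to', adjusting the counts of the old and new targets.
    static void setField(Node from, Node to) {
        if (from.instance != null) {
            from.instance.refCount--;
        }
        from.instance = to;
        if (to != null) {
            to.refCount++;
        }
    }

    // Returns the counts of both objects after the local variables are cleared.
    static int[] countsAfterDroppingLocals() {
        Node a = new Node();
        a.refCount++;        // the local variable 'a' holds one reference
        Node b = new Node();
        b.refCount++;        // the local variable 'b' holds one reference
        setField(a, b);      // a.instance = b
        setField(b, a);      // b.instance = a
        a.refCount--;        // models a = null
        b.refCount--;        // models b = null
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = countsAfterDroppingLocals();
        System.out.println(counts[0] + " " + counts[1]);   // prints "1 1"
    }
}
```

Neither count reaches 0, so a collector that relies on counts alone would keep both objects forever.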

2. Reachability analysis algorithm

Whether an object can be reclaimed is determined by whether it is reachable through a chain of references.

The search starts from the GC Roots: objects that are reachable are alive, and objects that are unreachable can be reclaimed.

The Java virtual machine uses this algorithm to determine whether an object can be reclaimed. GC Roots generally include the following:

Objects referenced from the local variable table in the virtual machine stack (the local variable table of each stack frame)
Objects referenced by JNI (native methods) in the native method stack
Objects referenced by static class fields in the method area
Objects referenced by constants in the method area
Objects referenced by active threads
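As a rough illustration of the search from GC Roots, the sketch below marks everything reachable from a root list in a small hand-built object graph. The Obj class and the mark method are hypothetical stand-ins; a real collector walks actual heap objects rather than a list of named nodes.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class ReachabilitySketch {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();   // outgoing references
        Obj(String name) { this.name = name; }
    }

    // Returns every object reachable from the roots (iterative depth-first search).
    static Set<Obj> mark(List<Obj> gcRoots) {
        Set<Obj> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Obj> pending = new ArrayDeque<>(gcRoots);
        while (!pending.isEmpty()) {
            Obj current = pending.pop();
            if (marked.add(current)) {          // first visit: follow its references
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    // Builds root -> live plus an unrooted cycle a <-> b, then reports who is marked.
    static boolean[] demo() {
        Obj root = new Obj("root");
        Obj live = new Obj("live");
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        root.refs.add(live);
        a.refs.add(b);
        b.refs.add(a);                          // cycle with no path from any root
        Set<Obj> marked = mark(Collections.singletonList(root));
        return new boolean[] { marked.contains(live), marked.contains(a), marked.contains(b) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));   // prints [true, false, false]
    }
}
```

Unlike reference counting, the cycle between a and b is not marked, because no chain of references leads to it from a root.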

3. Reclaiming the method area

The method area mainly holds permanent-generation objects, which are reclaimed at a much lower rate than young-generation objects, so collecting the method area is not very cost-effective.

Collection there mainly consists of reclaiming the constant pool and unloading classes.

In scenarios that make heavy use of reflection and dynamic proxies, the virtual machine needs to be able to unload classes in order to avoid running out of memory.

Class unloading has strict conditions. All three of the following must hold, and even then the class is not guaranteed to be unloaded:

All instances of the class have been reclaimed, so no instance of the class exists in the heap.
The ClassLoader that loaded the class has been reclaimed.
The Class object for the class is not referenced anywhere, so the class's methods cannot be accessed through reflection.

4. finalize()

finalize() resembles a C++ destructor and is meant for closing external resources. However, try-finally and similar constructs do the job better, and finalize() is expensive to run, non-deterministic, and gives no guarantee about the order in which objects are finalized, so it is best not to use it.
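One of the "similar constructs" is try-with-resources, which closes a resource at a known point instead of whenever a collector happens to run. The sketch below is only an illustration: Handle is a hypothetical resource whose closed flag stands in for releasing a file or socket.

```java
final class ResourceSketch {
    // Hypothetical external resource; 'closed' stands in for releasing a real handle.
    static final class Handle implements AutoCloseable {
        boolean closed = false;

        void use() {
            if (closed) {
                throw new IllegalStateException("handle already closed");
            }
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Uses the handle, then reports whether it was closed when the try block exited.
    static boolean useAndClose() {
        Handle handle = new Handle();
        try (Handle h = handle) {
            h.use();
        }   // close() runs here, deterministically, even if use() had thrown
        return handle.closed;
    }

    public static void main(String[] args) {
        System.out.println(useAndClose());   // prints true
    }
}
```

The cleanup happens on the calling thread as soon as the block exits, which is exactly the ordering guarantee that finalize() cannot give.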

When an object becomes reclaimable and its finalize() method needs to be executed, the object can rescue itself inside that method by making itself referenced again. Self-rescue works only once: if finalize() has already been called for the object, it will not be called again the next time the object is about to be reclaimed.

Does the finalize() method of Object do the same thing as a C++ destructor?

No. A C++ destructor is called at a deterministic point, whereas the timing of finalize() is not deterministic.
Before the garbage collector declares an object dead, the object goes through at least two marking passes. If reachability analysis finds no reference chain connecting the object to a GC Root, the object is marked for the first time and checked to see whether its finalize() method needs to be executed.
If the object overrides finalize() and is not referenced, it is placed in the F-Queue, and a low-priority Finalizer thread created by the virtual machine later triggers its finalize() method.
Because that thread has low priority, its execution may be cut short at any time.
finalize() gives the object one last chance to be reborn.

II. Reference types

Whether liveness is decided by counting references or by reachability analysis, whether an object can be reclaimed depends on references.

Java provides four reference types of different strengths.

1. Strong reference

An object that is reachable through a strong reference is not reclaimed.

Creating an object with new produces a strong reference.

Object obj = new Object();

Even when it is about to throw an OutOfMemoryError and terminate the program, the virtual machine will not reclaim a strongly referenced object. Such an object can be reclaimed only after the strong reference is cleared, for example by setting the variable to null.

2. Soft reference

Describes objects that are useful but not essential.

Objects reachable only through soft references are reclaimed only when memory is running low. Soft references can be used to implement memory-sensitive caches.

Use the SoftReference class to create a soft reference.

Object obj = new Object();
SoftReference<Object> sf = new SoftReference<Object>(obj);
obj = null;  // the object is now reachable only through the soft reference
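A memory-sensitive cache of the kind mentioned above might look like the following minimal sketch. SoftCacheSketch and its loader parameter are names chosen for this example; the class is not thread-safe and never removes cleared entries from the map, so it shows the idea only.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal single-threaded cache whose values the GC may drop under memory pressure.
final class SoftCacheSketch<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader;   // recomputes a value on a cache miss

    SoftCacheSketch(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();   // null if never cached or already cleared
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCacheSketch<Integer, byte[]> cache = new SoftCacheSketch<>(size -> new byte[size]);
        byte[] first = cache.get(1024);
        byte[] second = cache.get(1024);
        // 'first' is still strongly referenced here, so the cached entry cannot have been cleared.
        System.out.println(first == second);   // prints true
    }
}
```

Callers always get a usable value; the only cost of a cleared entry is that the loader runs again.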

3. Weak reference

Describes non-essential objects; weaker than a soft reference. Suitable for objects that are used only occasionally and should not stand in the way of garbage collection.

An object reachable only through weak references is always reclaimed at the next garbage collection; that is, it survives only until the next collection occurs.

Use the WeakReference class to create a weak reference.

Object obj = new Object();
WeakReference<Object> wf = new WeakReference<Object>(obj);
obj = null;

4. Phantom reference

Also known as a ghost reference or virtual reference. Whether an object has a phantom reference has no effect on its lifetime, and the object cannot be obtained through a phantom reference.

A phantom reference does not determine the object's life cycle; the object can be reclaimed by the garbage collector at any time. A phantom reference must be used together with a ReferenceQueue.

The only purpose of a phantom reference is to act as a sentinel: the program receives a notification when the object is reclaimed. Specifically, the program learns that the GC has reclaimed the referent by checking whether the phantom reference has been added to its ReferenceQueue.

Use the PhantomReference class to create a phantom reference.

Object obj = new Object();
ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
PhantomReference<Object> pf = new PhantomReference<Object>(obj, queue);
obj = null;

Reference queue (ReferenceQueue): when the GC (garbage collection thread) is about to reclaim an object and finds that only soft, weak, or phantom references point to it, it adds those reference objects to the ReferenceQueue they were registered with. If a soft, weak, or phantom reference object is itself in the reference queue, the object it pointed to has been, or is about to be, reclaimed. The queue has no storage structure of its own; its contents are linked together through fields of the reference objects themselves.
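The queue behaviour can be observed with a small experiment. The sketch below uses only java.lang.ref classes; ReferenceQueueSketch and its method names are made up here. Because System.gc() is merely a request, weakRefEnqueued may return false on some JVMs or settings, so no output is claimed for it. The phantom-reference check is deterministic.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

final class ReferenceQueueSketch {
    // Waits up to timeoutMillis for the GC to enqueue a weak reference whose referent was dropped.
    static boolean weakRefEnqueued(long timeoutMillis) throws InterruptedException {
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        Object obj = new Object();
        WeakReference<Object> ref = new WeakReference<>(obj, queue);
        obj = null;                                    // only the weak reference remains
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            System.gc();                               // a request only; the JVM may ignore it
            Reference<?> polled = queue.remove(50);    // block for at most 50 ms
            if (polled == ref) {
                return true;
            }
        }
        return false;
    }

    // get() on a phantom reference returns null even while the referent is strongly reachable.
    static boolean phantomGetIsAlwaysNull() {
        Object obj = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> pf = new PhantomReference<>(obj, queue);
        return pf.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("phantom get() is null: " + phantomGetIsAlwaysNull());
        System.out.println("weak reference enqueued: " + weakRefEnqueued(2000));
    }
}
```

The program never inspects the referent itself; it only watches for the reference object to show up on the queue, which is the sentinel role described above.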

III. Garbage collection algorithms

1. Mark-sweep

In the mark phase, the collector scans from the root set and checks whether each object is live; if it is, the collector sets a mark in the object header.

In the sweep phase, unmarked objects are reclaimed and the mark bits of surviving objects are cleared. The collector also checks whether a reclaimed chunk is contiguous with the previous free chunk and, if so, merges the two. Reclaiming an object means treating its memory as a chunk and linking it into a singly linked list called the free list. Afterwards, allocation only needs to traverse the free list to find a chunk.

When allocating, the program searches the free list for a chunk (block) whose size is greater than or equal to the size of the new object (size). If the chunk found is exactly size, it is returned directly; if it is larger than size, it is split into two parts of sizes size and (block - size): the part of size size is returned, and the part of size (block - size) goes back on the free list.
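The free-list bookkeeping described above can be sketched as follows. This is a simplified model over integer addresses, with hypothetical names (FreeListSketch, Chunk); it uses first-fit search and, as an assumption beyond the text, merges a freed chunk with free neighbours on both sides rather than only the preceding one.

```java
import java.util.LinkedList;
import java.util.ListIterator;

final class FreeListSketch {
    // A free chunk covering addresses [start, start + size).
    static final class Chunk {
        final int start;
        final int size;
        Chunk(int start, int size) { this.start = start; this.size = size; }
    }

    private final LinkedList<Chunk> freeList = new LinkedList<>();   // kept sorted by address

    FreeListSketch(int heapSize) {
        freeList.add(new Chunk(0, heapSize));
    }

    // First-fit allocation: returns the start address, or -1 if no chunk is large enough.
    int allocate(int size) {
        ListIterator<Chunk> it = freeList.listIterator();
        while (it.hasNext()) {
            Chunk c = it.next();
            if (c.size == size) {                 // exact fit: hand out the whole chunk
                it.remove();
                return c.start;
            }
            if (c.size > size) {                  // split: the remainder stays on the free list
                it.set(new Chunk(c.start + size, c.size - size));
                return c.start;
            }
        }
        return -1;
    }

    // Sweep step: put a dead object's memory back, merging with adjacent free chunks.
    void free(int start, int size) {
        ListIterator<Chunk> it = freeList.listIterator();
        while (it.hasNext()) {
            if (it.next().start > start) {
                it.previous();                    // step back to the insertion point
                break;
            }
        }
        if (it.hasPrevious()) {
            Chunk prev = it.previous();
            if (prev.start + prev.size == start) {    // contiguous with the preceding chunk
                it.remove();
                start = prev.start;
                size += prev.size;
            } else {
                it.next();
            }
        }
        if (it.hasNext()) {
            Chunk next = it.next();
            if (start + size == next.start) {         // contiguous with the following chunk
                it.remove();
                size += next.size;
            } else {
                it.previous();
            }
        }
        it.add(new Chunk(start, size));
    }

    // Returns { result of allocating 50 while fragmented, result after the middle chunk is freed }.
    static int[] demo() {
        FreeListSketch heap = new FreeListSketch(100);
        int a = heap.allocate(30);
        int b = heap.allocate(30);
        int c = heap.allocate(30);
        heap.free(a, 30);
        heap.free(c, 30);
        int fragmented = heap.allocate(50);   // 70 units are free, but no chunk holds 50
        heap.free(b, 30);
        int merged = heap.allocate(50);       // all chunks have merged into one
        return new int[] { fragmented, merged };
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1]);   // prints "-1 0"
    }
}
```

The first allocation of 50 fails even though 70 units are free, which is the fragmentation problem that mark-sweep is known for.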

Disadvantages:

Both marking and sweeping are inefficient.
Sweeping leaves a large number of non-contiguous memory fragments, which can make it impossible to allocate memory for large objects.

2. Mark-compact

Move all surviving objects toward one end of the memory, then directly reclaim everything beyond the end boundary.

Advantages:

No memory fragmentation is produced.

Disadvantages:

A large number of objects have to be moved, so processing is relatively slow.
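The sliding step of mark-compact can be illustrated on an array that stands in for the heap. In this hedged sketch (CompactSketch is an invented name), each slot holds an object id and 0 means a dead or free slot; a real collector would also have to update every reference to a moved object, which is omitted here.

```java
import java.util.Arrays;

final class CompactSketch {
    // Slides live entries (non-zero ids) toward index 0 and returns the new boundary.
    static int compact(int[] heap) {
        int free = 0;                              // next slot to fill at the compacted end
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != 0) {
                heap[free++] = heap[i];            // move the survivor down, preserving order
            }
        }
        Arrays.fill(heap, free, heap.length, 0);   // everything past the boundary is now free
        return free;
    }

    public static void main(String[] args) {
        int[] heap = { 1, 0, 2, 0, 0, 3 };
        int boundary = compact(heap);
        System.out.println(Arrays.toString(heap) + " boundary=" + boundary);
        // prints "[1, 2, 3, 0, 0, 0] boundary=3"
    }
}
```

After compaction, allocation can simply bump a pointer from the boundary, at the cost of having touched every survivor.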

3. Copying

Divide the memory into two equal halves and use only one at a time. When the half in use fills up, copy the surviving objects to the other half, then reclaim the used half in one pass.

The main disadvantage is that only half of the memory is usable.

Today's commercial virtual machines use this algorithm to collect the young generation. Instead of two equal halves, however, the space is divided into one larger Eden space and two smaller Survivor spaces, and Eden plus one Survivor is in use at any time. During collection, all surviving objects in Eden and the in-use Survivor are copied to the other Survivor, and then Eden and the used Survivor are cleared.

In the HotSpot virtual machine the default Eden-to-Survivor size ratio is 8:1, which gives a memory utilization of 90%. If more than 10% of the objects survive a collection, a single Survivor space is not large enough. In that case the old generation provides an allocation guarantee: objects that do not fit are placed in borrowed old-generation space.
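The 90% figure follows from simple arithmetic: with a ratio of 8:1 the young generation is split into 8 + 1 + 1 = 10 parts, and Eden plus one Survivor (9 parts) is usable at any time. The sketch below works this out for an assumed young generation of 100 units; YoungGenLayoutSketch is an illustrative name, and the real JVM also applies alignment and adaptive sizing that are ignored here.

```java
import java.util.Arrays;

final class YoungGenLayoutSketch {
    // Splits the young generation into { Eden, Survivor, Survivor } for Eden:Survivor = ratio:1.
    static long[] layout(long youngSize, int ratio) {
        long survivor = youngSize / (ratio + 2);   // ratio parts Eden + 2 parts Survivor
        long eden = youngSize - 2 * survivor;
        return new long[] { eden, survivor, survivor };
    }

    // Fraction of the young generation usable at once (Eden plus one Survivor).
    static double usableFraction(int ratio) {
        return (ratio + 1) / (double) (ratio + 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(layout(100, 8)));   // prints [80, 10, 10]
        System.out.println(usableFraction(8));                 // prints 0.9
    }
}
```

In HotSpot the ratio is controlled by the -XX:SurvivorRatio option, whose default of 8 corresponds to the 8:1 split above.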

4. Generational collection

Stop-the-World

The JVM suspends the application because it is about to run a GC. This happens with every GC algorithm; most GC tuning improves program performance by reducing the time spent in stop-the-world pauses.

Safepoint

A point at which object reference relationships will not change while the analysis runs.

Places where safepoints are generated: method calls, loop jumps, exception jumps, and so on.

Today's commercial virtual machines use the generational collection algorithm, which divides the memory into several regions according to object lifetime and applies a suitable collection algorithm to each region.

Generally the heap is divided into the young generation and the old generation.

Young generation: copying algorithm
Old generation: mark-sweep or mark-compact algorithm

IV. Garbage collectors

[Figure: the seven garbage collectors in the HotSpot virtual machine; a line connecting two collectors means that they can be used together.]

Single-threaded and multithreaded: a single-threaded collector uses only one thread for garbage collection, while a multithreaded collector uses several.
Serial and parallel: serial means that the collector and the user program take turns, so the user program must be paused while garbage collection runs; parallel means that the collector and the user program run at the same time. All collectors except CMS and G1 run serially in this sense.
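To check which collectors a running JVM has actually selected, the standard java.lang.management API can be queried. The sketch below lists each collector's name with its collection count and accumulated time; CollectorInfoSketch is an invented class name, and the names printed depend on the JVM version and flags, so no output is shown.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class CollectorInfoSketch {
    // Names of the collectors the current JVM is running with.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount/getCollectionTime return -1 when the value is undefined.
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", timeMillis=" + gc.getCollectionTime());
        }
    }
}
```

Running the same program with different -XX:+Use...GC flags is a quick way to see which young-generation and old-generation collectors are paired.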

1. Serial collector (-XX:+UseSerialGC)

As the name suggests, Serial executes serially.

It is a single-threaded collector that uses only one thread for garbage collection.

Its advantage is that it is simple and efficient: on a single CPU it has no thread-interaction overhead, so it achieves the highest single-threaded collection efficiency.

It is the default young-generation collector in Client mode, where memory is generally not very large. Collecting one or two hundred megabytes of garbage keeps the pause to a little over 100 milliseconds, which is acceptable as long as collections are not too frequent.

2. ParNew collector (-XX:+UseParNewGC)

It is the multithreaded version of the Serial collector.

It is the young-generation collector of choice in Server mode. Besides performance, the main reason is that, apart from Serial, it is the only young-generation collector that can work with the CMS collector.

3. Parallel Scavenge collector (-XX:+UseParallelGC)

Like ParNew, it is a multithreaded collector.

Other collectors aim to minimize the time user threads are paused during garbage collection, whereas its goal is to reach a controllable throughput, so it is called the "throughput-first" collector. Throughput here means the ratio of the CPU time spent running user code to the total CPU time.
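The throughput definition is a plain ratio and can be checked with one line of arithmetic. In the sketch below the figures (99 minutes of user code, 1 minute of GC) are made-up sample values, and ThroughputSketch is an illustrative name.

```java
final class ThroughputSketch {
    // Throughput = time running user code / (time running user code + time spent in GC).
    static double throughput(double userTime, double gcTime) {
        return userTime / (userTime + gcTime);
    }

    public static void main(String[] args) {
        // 99 minutes of user code and 1 minute of GC out of 100 minutes in total.
        System.out.println(throughput(99, 1));   // prints 0.99
    }
}
```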

Shorter pauses suit programs that interact with users, where fast response improves the user experience. High throughput uses CPU time efficiently and finishes the program's computation as early as possible, which suits background tasks with little interaction.

Shorter pauses are bought at the expense of throughput and young-generation space: the young generation becomes smaller and garbage collection more frequent, which lowers throughput.

The GC adaptive sizing strategy (GC Ergonomics) can be turned on with a switch parameter (-XX:+UseAdaptiveSizePolicy), after which there is no need to manually specify details such as the young-generation size (-Xmn), the Eden-to-Survivor ratio, or the age at which objects are promoted to the old generation. The virtual machine collects performance monitoring data from the running system and adjusts these parameters dynamically to provide the most suitable pause time or the maximum throughput.

4. Serial Old collector (-XX:+UseSerialGC)

It is the old-generation version of the Serial collector and is likewise intended for the virtual machine in Client mode. When used in Server mode, it has two main uses:

In JDK 1.5 and earlier (before Parallel Old existed), it is paired with the Parallel Scavenge collector.
It serves as the fallback for the CMS collector when a Concurrent Mode Failure occurs during a concurrent collection.

5. Parallel Old collector (-XX:+UseParallelOldGC)

It is the old-generation version of the Parallel Scavenge collector.

Where throughput matters and CPU resources are limited, the combination of Parallel Scavenge and Parallel Old can be considered first.

6. CMS collector (-XX:+UseConcMarkSweepGC)

CMS stands for Concurrent Mark Sweep; Mark Sweep refers to the mark-sweep algorithm.

A collection consists of six phases:

Initial mark: only marks the objects directly reachable from the GC Roots. It is fast and requires a pause.
Concurrent mark: performs GC Roots tracing. It takes the longest of all phases and requires no pause.
Concurrent preclean: finds objects that were promoted from the young generation to the old generation during the concurrent mark phase.
Remark: corrects the marks of objects whose state changed because the user program kept running during concurrent marking. It requires a pause.
Concurrent sweep: reclaims garbage objects without a pause.
Concurrent reset: resets the CMS collector's data structures and waits for the next garbage collection.

During the two longest phases, concurrent mark and concurrent sweep, the collector threads work alongside the user threads without pausing them.

It has the following disadvantages:

Low throughput: short pauses come at the expense of throughput, so CPU utilization is not high enough.
Floating garbage cannot be handled, and a Concurrent Mode Failure may occur. Floating garbage is garbage produced while user threads keep running during the concurrent sweep phase; it can only be collected at the next GC. Because of floating garbage, some memory must be held in reserve, so CMS cannot wait until the old generation is almost full before collecting, as other collectors do. If the reserved memory cannot hold the floating garbage, a Concurrent Mode Failure occurs and the virtual machine temporarily falls back to Serial Old in place of CMS.
Fragmentation from the mark-sweep algorithm: the old generation often has plenty of free space left but no contiguous block large enough for the object being allocated, so a Full GC has to be triggered early.

7. G1 collector (-XX:+UseG1GC)

G1 (Garbage-First) is a garbage collector aimed at server applications; it performs well on machines with multiple CPUs and large memory. The mission the HotSpot development team gave it is to replace the CMS collector in the future.

The heap is divided into the young generation and the old generation. Other collectors each cover the whole young generation or the whole old generation, whereas G1 can collect the young and old generations together.

G1 divides the heap into multiple independent regions (Region) of equal size; the young and old generations are no longer physically separated.

Introducing Regions splits what used to be one block of memory into many small spaces, each of which can be garbage collected separately. This partitioning is very flexible and makes a predictable pause-time model possible. G1 records, for each Region, the time a collection takes and the space it frees (both values are estimated from past collections) and maintains a priority list; for each collection it reclaims the most valuable Regions first, within the allowed collection time.
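The idea of reclaiming the most valuable Regions first within a time budget can be pictured as a greedy selection. The sketch below is a deliberate simplification and not G1's actual policy: it assumes that "value" means estimated reclaimable bytes per millisecond of collection time, and the class, field, and region names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RegionSelectionSketch {
    static final class Region {
        final String name;
        final long reclaimableBytes;    // estimated garbage in the region
        final double estimatedMillis;   // estimated time needed to collect it

        Region(String name, long reclaimableBytes, double estimatedMillis) {
            this.name = name;
            this.reclaimableBytes = reclaimableBytes;
            this.estimatedMillis = estimatedMillis;
        }

        // Assumed measure of value: bytes reclaimed per millisecond spent.
        double value() {
            return reclaimableBytes / estimatedMillis;
        }
    }

    // Greedily picks the highest-value regions whose estimated times fit the pause budget.
    static List<String> chooseCollectionSet(List<Region> regions, double pauseBudgetMillis) {
        List<Region> byValue = new ArrayList<>(regions);
        Comparator<Region> ascending = Comparator.comparingDouble(Region::value);
        byValue.sort(ascending.reversed());          // most valuable first
        List<String> chosen = new ArrayList<>();
        double spent = 0;
        for (Region r : byValue) {
            if (spent + r.estimatedMillis <= pauseBudgetMillis) {
                spent += r.estimatedMillis;
                chosen.add(r.name);
            }
        }
        return chosen;
    }

    static List<String> demo() {
        List<Region> regions = Arrays.asList(
                new Region("A", 80, 4.0),    // value 20
                new Region("B", 90, 10.0),   // value 9
                new Region("C", 30, 1.0),    // value 30
                new Region("D", 50, 5.0));   // value 10
        return chooseCollectionSet(regions, 10.0);
    }

    public static void main(String[] args) {
        System.out.println(demo());   // prints [C, A, D]
    }
}
```

Region B holds the most garbage but is skipped because it would not fit in the remaining budget, which shows why a per-Region cost estimate is needed to keep pauses predictable.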

Each Region has a Remembered Set that records the Regions containing objects that reference objects in this Region. With the Remembered Set, reachability analysis does not need to scan the whole heap.

Leaving aside the work of maintaining the Remembered Set, the operation of the G1 collector can be roughly divided into the following steps:

Initial mark
Concurrent mark
Final mark: to correct the marks that changed because the user program kept running during concurrent marking, the virtual machine records object changes from that period in each thread's Remembered Set Logs; the final mark phase merges the Remembered Set Logs into the Remembered Set. This phase pauses user threads, but the marking itself can run in parallel.
Screening and reclamation: first ranks the Regions by reclamation value and cost, then builds a collection plan from the GC pause time the user expects. This phase could in fact run concurrently with the user program, but because only some Regions are reclaimed and the time is user-controllable, pausing the user threads greatly improves collection efficiency.

It has the following characteristics:

Parallel and concurrent
Generational collection
Space integration: viewed as a whole it is a collector based on the mark-compact algorithm, and locally (between two Regions) it is based on the copying algorithm, which means it produces no memory fragmentation while running.
Predictable pauses: the user can specify that, within any time slice of M milliseconds, no more than N milliseconds may be spent on GC.

V. Reducing pause time

1. Use the CMS collector

Garbage collection with the CMS collector has four main steps:

Initial mark
Concurrent mark
Remark
Concurrent sweep

Initial mark and remark require a stop-the-world pause, but during the two most time-consuming steps, concurrent mark and concurrent sweep, the GC threads work alongside the user threads. Overall, CMS runs concurrently with the user threads.

2. Incremental algorithm

Basic idea: processing all the garbage in one go causes a long pause, so the GC thread and the user threads take turns instead. Each time, the GC thread collects only a small area of memory and then hands control back to the user threads; this repeats until the collection is complete.

Problem: the constant thread and context switching lowers overall system throughput.
