Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What are the GC algorithms and types in the Java virtual machine

2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/02 Report--

This article mainly introduces what are the algorithms and types of GC in the Java virtual machine, which has a certain reference value, interested friends can refer to, I hope you can learn a lot after reading this article, let the editor take you to understand.

I. the concept of GC:

GC:Garbage Collection garbage collection

In 1960, Lisp used GC.

In Java, the objects of GC are Java heap and method zone (i.e., * * zone).

Let's explain the above three sentences one by one:

(1) GC:Garbage Collection garbage collection. The so-called garbage here refers to some useless objects produced during the operation of the system, which occupy a certain amount of memory space and may lead to OOM if they are not released for a long time.

It is up to the programmer to apply for, manage, and free up memory space, so there is no concept of GC. In Java, there is a special thread for garbage collection to monitor, scan and automatically release some useless memory, which is a basic idea of garbage collection to prevent artificial memory leaks introduced by programmers.

(2) in fact, GC is older than Java. Lisp, which was born in MIT in 1960, is the language that really uses dynamic memory allocation and garbage collection technology. When Lisp was still in its infancy, people were thinking about three things that GC needed to accomplish:

What memory needs to be reclaimed?

When will it be recycled?

How to recycle?

(3) the program counter, the virtual machine stack and the local method stack in the memory area are born and destroyed with the thread; the stack frames in the stack methodically perform the operations of getting out and entering the stack with the entry and exit of the method, and the amount of memory allocated in each stack frame is basically known when the class structure is determined. There is no need to think too much about recycling in these areas, because when the method ends or the thread ends, the memory is naturally reclaimed.

While Java heap and method area are different, multiple implementation classes in an interface may need different memory, and multiple branches of a method may need different memory. Only when the program is running can we know which objects will be created. The allocation and recycling of this part of memory are dynamic. GC also focuses on this part of memory. In later articles, when it comes to "memory" allocation and recycling, it only refers to part of the memory.

Second, citation counting algorithm: (old garbage collection algorithm. Unable to handle circular references, not adopted by Java)

1. The concept of reference counting algorithm:

Add a reference counter to the object, which increases the counter value by 1 whenever there is a reference; when the reference expires, the counter value is subtracted by 1; an object with a counter of 0 at any time can no longer be used.

2. Examples of users:

The implementation of the citation counting algorithm is simple, and the judgment efficiency is high. In most cases, it is a good algorithm. It's used in a lot of places. For example:

Microsoft's COM Technology: Computer Object Model

FlashPlayer using ActionScript3

Python

However, the mainstream java virtual machine does not choose the reference counting algorithm to manage memory, the main reason is that it is difficult to solve the problem of circular references between objects.

3. The problem of citation counting algorithm:

Reference and de-reference adjoint addition and subtraction, affecting performance

Fatal flaw: objects that are referenced in a loop cannot be recycled

In the three figures above, for the rightmost one, none of the counters referenced by the loop are 0, but they are unreachable to the root object, but cannot be released.

An example of code referenced by a circular reference:

Public class Object {Object field = null; public static void main (String [] args) {Thread thread = new Thread (new Runnable () {public void run () {Object objectA = new Object (); Object objectB = new Object (); / / location 1 objectA.field = objectB; objectB.field = objectA / / location 2 / / to do something objectA = null; objectB = null;// location 3}}); thread.start (); while (true);}}

The above code seems to be a little deliberate, but in fact, it often occurs in the actual programming process, such as two one-to-one database objects, each keeping references to each other. * A * * loop is just to keep the JVM from quitting, it doesn't make any practical sense.

Code interpretation:

Three numbers 1, 2, and 3 are marked in the code, and when the statement in position 1 is executed, the reference count of both objects is 1. When the statement in position 2 is executed, the reference count of both objects becomes 2. When the statement at position 3 is executed, that is, after both are classified as null values, the reference count for both is still 1. According to the recycling rules of the reference counting algorithm, it will not be recycled when the reference count is not returned to zero.

For the GC we use today, when the thread thread finishes running, the objectA and objectB are all objects to be recycled. If our GC uses the reference counting algorithm mentioned above, these two objects will never be recycled, even if we show that the object is null after use.

Third, root search algorithm:

1. The concept of root search algorithm:

Because of the shortcomings of the reference counting algorithm, JVM generally uses a new algorithm called root search algorithm. Its treatment is to set up several kinds of root objects, and when any one of the root objects is unreachable to a certain object, it is considered that the object can be recycled.

As shown in the figure above, ObjectD and ObjectE are related to each other, but because GC roots is unreachable to these two objects, D and E will eventually be treated as GC objects. If the above figure uses the reference counting method, none of the five objects will be recycled.

2. Accessibility analysis:

We just mentioned that there are several root objects that can be recycled when any one of the root objects is unreachable to a certain object. When we introduce the tag-cleaning algorithm / tag finishing algorithm later, we will always emphasize that all reachable objects are marked once, starting from the root node. What is reachable? The explanation here is as follows:

Accessibility analysis:

The search starts from the object at the root (GC Roots) as the starting point, and the search path is called "reference chain". When an object is not connected to GC Roots by any reference chain (in the concept of graph theory, it is unreachable from GC Roots to this object), it is proved that the object is unavailable.

3. Root (GC Roots):

Speaking of GC roots (GC root), in the Java language, there are several kinds of objects that can be treated as GC roots:

1. Objects referenced in the stack (the local variable table in the stack frame).

2. Static members in the method area.

3. Objects referenced by constants in the method area (global variables)

4. Objects referenced by JNI (commonly known as Native methods) in the local method stack.

Note: both * * and the fourth refer to the local variable scale of the method, the meaning of the second is relatively clear, and the third mainly refers to the constant value declared as final.

On the basis of root search algorithm, in the implementation of modern virtual machine, there are mainly three kinds of garbage collection algorithms, which are mark-clear algorithm, replication algorithm and mark-finishing algorithm. All three algorithms extend the root search algorithm, but they are very easy to understand.

Fourth, mark-clear algorithm:

1. The concept of tag removal algorithm:

Tag-removal algorithm is the ideological basis of modern garbage collection algorithm. The marking-clearing algorithm divides garbage collection into two phases: the marking phase and the clearing phase. One possible implementation is that in the marking phase, all reachable objects starting from the root node are marked first through the root node. Therefore, objects that are not marked are junk objects that are not referenced; then, during the cleanup phase, all objects that are not marked are cleared.

2. Detailed description of the tag-removal algorithm:

What it does is that when the effective memory space (available memory) in the heap is exhausted, the entire program (also known as stop the world) is stopped, and then two tasks are done, the * item is marked, and the second item is cleared.

Tagging: the process of tagging is to traverse all the GC Roots and then mark all GC Roots reachable objects as living objects.

Cleanup: the cleanup process traverses all objects in the heap and removes all unmarked objects.

That is, when the available memory is exhausted while the program is running, the GC thread will be triggered and the program will be paused, then the objects that are still alive will be marked again, and finally all the unmarked objects in the heap will be cleared, and then the program will resume running.

Take a look at the following picture:

The above figure represents the status of all objects while the program is running, and their flag bits are all 0 (that is, unmarked, the following default 0 is unmarked, 1 is marked). Assuming that the effective memory space is exhausted, JVM will stop the application from running and start the GC thread, and then start tagging. According to the root search algorithm, the status of the object is as follows:

As you can see in the above figure, according to the root search algorithm, all objects reachable from the root object are marked as living objects, and the * phase marking has been completed. Next, the second phase of cleanup is about to be performed, and after the cleanup, the remaining objects and their status are shown in the following figure:

As you can see in the image above, objects that are not marked will be recycled and cleared, while objects that are marked will be left behind, and the tag bits will be returned to 0. Needless to say, wake up the stopped program thread and let the program continue to run.

Question: why do you have to stop the program?

A:

In fact, this is not difficult to understand, assuming that our program is running with GC threads, imagine such a scenario.

Suppose we have just marked the rightmost object in the diagram and marked it as A for a while. As a result, a new object B is new in the program, and the An object can reach the B object. However, because the An object has been marked at the end, the tag bit of the B object is still 0 at this time, because it missed the marking phase. So when it comes to the cleanup phase, the new object B will be painstakingly cleared. In this way, it is not hard to imagine that GC threads will cause the program to fail to work properly.

Of course, the above results are unacceptable. We just new an object, but after a GC, it suddenly becomes null. How can we play with this?

3. Shortcomings of the tag-removal algorithm:

(1) first of all, its disadvantage is that it is relatively inefficient (recursive and full-heap object traversal), resulting in a long time for stop the world, especially for interactive applications. Just imagine, if you play a website, the website will hang for five minutes an hour. Will you still play it?

(2) the second major disadvantage is that the free memory cleared in this way is discontinuous, which is not difficult to understand. Our dead objects appear in every corner of memory at once, and now after they are cleared, the layout of memory will naturally be in a mess. To cope with this, JVM has to maintain a free list of memory, which is another kind of overhead. And when allocating array objects, finding contiguous memory space can be difficult to find.

5. Replication algorithm: (new generation of GC)

The concept of replication algorithm:

The original memory space is divided into two blocks, and only one of them is used at a time. During garbage collection, the living objects in the memory being used are copied to the unused memory blocks, and then all objects in the memory blocks in use are cleared. Swap the roles of two memory to complete garbage collection.

Compared with the mark-removal algorithm, the replication algorithm is a relatively efficient recovery method.

Not suitable for situations with a large number of living objects, such as the old age (replication algorithm is suitable for the new generation of GC)

The problem with the replication algorithm is the waste of space.

The replication algorithm only reclaims the memory of the whole half area every time, and does not have to consider the complex situations such as memory fragments when allocating memory. As long as the pointer at the top of the stack is moved and the memory is allocated sequentially, the implementation is simple and efficient. It's just that the cost of this algorithm is to reduce the memory to half of the original, which is too deadly.

Therefore, it is not difficult to see from the above description that if the replication algorithm is to be used, at least the survival rate of the object must be very low, and most importantly, we must overcome the waste of 50% memory.

Today's commercial virtual machines use this collection algorithm to recover the new generation, and 98% of the objects in the new generation are "life and death", so there is no need to divide the memory space according to the proportion of 1:1. Instead, the memory is divided into a larger Eden space and two smaller Survivor spaces, using Eden and a piece of Survivor each time. When recycling, copy the surviving objects in Eden and Survivor to another Survivor space at one time, and * clean up the Eden and the Survivor space you just used. The default size ratio of Eden to Survivor for HotSpot virtual machines is 8:1, which means that the memory space available in each Cenozoic generation is 90% (80% to 10%) of the entire Cenozoic capacity, and only 10% of the space is wasted.

Of course, 98% of the objects can be recycled only in general scenarios, and we have no way to guarantee that no more than 10% of the objects survive each time. When there is not enough Survivor space, we need to rely on the old age for allocation guarantee, so large objects go directly into the old era. The whole process is shown in the following figure:

In the picture above, the position of the green arrow represents a large object, which goes directly into the old age.

According to the above replication algorithm, now let's take a look at the following number of gc logs, which should be able to understand:

In the GC log above, the free space for the new generation is 13824K (1536K for 12288K+from space in the eden zone). According to the address of memory, the total space of Cenozoic is 15m, and the space of 15m is = 13824K + 1536K of to space.

6. Mark-collation algorithm: (GC in the old days)

Introduce:

If more replication operations are carried out when the object survival rate is high, the efficiency will become lower. More crucially, if you do not want to waste 50% of the space, you need additional space for allocation guarantees to deal with the extreme situation where all objects in the used memory are 100% alive, so it was generally impossible to select this algorithm directly in the old days.

Concept:

The tag-compression algorithm is suitable for situations where there are many living objects, such as the old age. It makes some optimizations on the basis of the mark-clear algorithm. Like the tag-clear algorithm, the tag-compression algorithm first needs to mark all reachable objects from the root node, but then it does not simply clean up the untagged objects. instead, it compresses all living objects to one end of memory, and then cleans up all the space outside the boundary.

Tag: its * phases are exactly the same as the tag / cleanup algorithm, traversing the GC Roots, and then marking the surviving objects.

Tidy up: move all surviving objects and arrange them in order of memory address, and then recycle all memory after the end memory address. Therefore, the second stage is called the finishing stage.

As you can see in the image above, the marked living objects will be sorted out and arranged by memory address, while the unmarked memory will be cleaned up. In this way, when we need to allocate memory to a new object, the JVM only needs to hold a starting address of memory, which is obviously a lot less overhead than maintaining a free list.

The marking / sorting algorithm can not only make up for the scattered memory area in the marking / clearing algorithm, but also eliminate the high cost of halving the memory in the replication algorithm.

However, the only disadvantage of the marking / finishing algorithm is that it is not efficient.

You should not only mark all living objects, but also organize the reference addresses of all living objects. In terms of efficiency, the marking / finishing algorithm is lower than the replication algorithm.

Summary of tag-clearing algorithm, replication algorithm, tag finishing algorithm:

All three algorithms are based on the root search algorithm to determine whether an object should be reclaimed, and the theoretical basis that the root search algorithm can work properly is the relevant content of the variable scope in the grammar. Therefore, in order to prevent memory leaks, the most fundamental way is to master the scope of variables, rather than using CAccord memory + memory management.

When GC threads are turned on, or when the GC process starts, they all pause the application (stop the world).

The difference between them is as follows: (> means that the former is better than the latter, = that the effect of the two is the same)

(1) efficiency: replication algorithm > tagging / sorting algorithm > tagging / clearing algorithm (the efficiency here is only a simple comparison of time complexity, which is not necessarily the case).

(2) memory uniformity: copy algorithm = marking / sorting algorithm > marking / clearing algorithm.

(3) memory utilization: marking / sorting algorithm = marking / clearing algorithm > replication algorithm.

Note 1: we can see that the marking / clearing algorithm is relatively backward, but the latter two algorithms are established on this basis.

Note 2: you can't have both time and space.

7. Generation-by-generation collection algorithm: (new generation GC+, old age GC)

At present, the GC of commercial virtual machines adopts the "generational collection algorithm", which is not a new idea, but only divides the memory into several blocks according to the different survival periods of objects. Generally speaking, the Java heap is divided into the new generation and the old age: the short-lived object is classified as the new generation, and the long-lived object is classified as the old age.

A small number of objects survive and are suitable for replication algorithms: in the new generation, a large number of objects are found to die and only a small number of objects survive in each GC, so choose the replication algorithm and only pay a small amount of replication cost to complete the GC.

A large number of objects survive and are suitable for tag-clean / mark-clean: in the old days, because the object had a high survival rate and no extra space to allocate guarantee, it was necessary to use the "tag-clean" / "mark-clean" algorithm for GC.

Note: a small part of the objects in the old era are because they are guaranteed by the old age when the new generation is recycled, and most of the objects enter the old age because GC has not been recycled many times.

8. Accessibility:

All algorithms need to be able to identify a junk object, so they need to give a definition of accessibility.

Palpable:

This object can be reached from the root node.

It is actually scanning from the root node, and as long as the object is in the reference chain, it is reachable.

Resurrectable:

Once all references are released, they are in a resurrectable state

Because the object may be resurrected in finalize ()

Untouchable:

After finalize (), it may enter an unreachable state

An untouchable object cannot be resurrected.

To be recycled.

The code example of resurrecting an object with the finalize method:

Package test03; / * Created by smyhvae on 2015-8-19. * / public class CanReliveObj {public static CanReliveObj obj; / / when GC is executed, the finalize method is executed, and @ Override protected void finalize () throws Throwable {super.finalize (); System.out.println ("CanReliveObj finalize called"); obj = this is executed only once / when GC is executed, the finalize method is executed, and then the purpose of this line of code is to revive the object of null and then become reachable} @ Override public String toString () {return "I am CanReliveObj";} public static void main (String [] args) throws InterruptedException {obj = new CanReliveObj (); obj = null / / resurrectable System.out.println ("* times gc"); System.gc (); Thread.sleep (1000); if (obj = = null) {System.out.println ("obj is null");} else {System.out.println ("obj available");} obj = null / / inactivating System.out.println ("second gc"); System.gc (); Thread.sleep (1000); if (obj = = null) {System.out.println ("obj is null");} else {System.out.println ("obj available");}

We need to pay attention to the comments on line 14. At first, we set obj to null on line 25, and then execute GC once. We thought that obj would be recycled, but it didn't, because the finalize method of line 11 was called during GC, and then obj was resurrected on line 14. Then, on line 34, set obj to null, and then execute GC once, at which point the obj is recycled because the finalize method is only executed once.

Summary of the use of finalize method:

Lesson: avoid using finalize (), because carelessness can lead to errors.

Low priority, when to be called, uncertain

When GC uncertainty occurs, we naturally don't know when the finalize method will be executed.

If you want to use finalize to release resources, we can use try-catch-finally instead

9. Stop-The-World:

1. Stop-The-World concept:

A global pause in Java.

Global pause, all Java code stops, native code can be executed, but cannot interact with JVM

In most cases, it is caused by GC.

In a few cases, it is caused by other situations, such as Dump threads, deadlock checking, heap Dump.

2. Why is there a global pause during GC?

Analogy: at the party, suddenly GC came to clean the room, the party was very messy, and new rubbish was generated, the room can never be cleaned, only when everyone stops moving can the room be cleaned up.

Moreover, if there is no global pause, it will put a great burden on GC threads, the difficulty of the GC algorithm will also increase, and it is difficult for GC to judge what is garbage.

3. The harm of Stop-The-World:

The service stopped for a long time and there was no response

When you encounter a HA system, it may cause the master / slave switch, which will seriously harm the production environment.

Remarks: HA:High Available, high availability cluster.

For example, the host and slave are now working. If the host is causing a long pause in GC, the slave will detect that the host is not working, so the slave starts to work; but the host does not work only temporarily, and when the GC ends, the host starts to work again, so the host and slave will work at the same time. It is actually very dangerous for the host and standby to work at the same time, which may lead to inconsistent applications, failure to provide normal services, and so on, thus affecting the production environment.

Code example:

(1) the code for printing the log: (one for every 100ms)

Public static class PrintThread extends Thread {

Public static final long starttime=System.currentTimeMillis ()

@ Override

Public void run () {

Try {

While (true) {

Long t=System.currentTimeMillis ()-starttime

System.out.println ("time:" + t)

Thread.sleep (100)

}

} catch (Exception e) {

}

}

}

In the code above, is the code responsible for printing the log, printing one every 100ms, and calculating the printing time.

(2) the code of the worker thread: (the worker thread is specially used to consume memory)

Public static class MyThread extends Thread {HashMap map=new HashMap (); @ Override public void run () {try {while (true) {if (map.size () * 512 map.size 1024 > = 450) {/ / if the memory consumption consumed by map is greater than 450, clear the memory System.out.println ("= prepare for cleanup =:" + memory ()) Map.clear ();} for (int itemositani

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report