What is JVM climbing over the wall of memory management?

2025-07-04 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article introduces how the JVM manages memory, the "wall of memory management" that Java programmers rarely have to climb themselves. It walks through the runtime data areas and the garbage collection mechanism, and should help you reason about memory problems in practice.

JVM runtime data areas

While executing a Java program, the Java virtual machine divides the memory it manages into several different data areas. Some areas exist for the whole life of the virtual machine process, while others are created and destroyed as user threads start and end.

Thread private memory:

Because JVM multithreading is implemented by switching threads in turn and allocating processor execution time among them, at any given moment a processor (or one core of a multi-core processor) executes the instructions of only one thread.

Therefore, in order to resume at the correct execution position after a thread switch, each thread needs its own program counter. The counters of different threads are stored independently and do not affect each other. We call this kind of memory area "thread private" memory.

Program counter

The program counter is a small piece of memory that can be seen as a line number indicator for the bytecode executed by the current thread. It is the indicator of program control flow: basic functions such as branching, looping, jumping, exception handling, and thread resumption all rely on this counter.

The bytecode interpreter works by changing the value of this counter to select the next bytecode instruction to be executed.

If the thread is executing a Java method, this counter records the address of the executing virtual machine bytecode instruction.

If the thread is executing a native method, the counter value is undefined.

Java virtual machine stack

The Java virtual machine stack describes the thread memory model for Java method execution, which is also a thread private memory area with the same life cycle as threads.

Stack frame

Each time a method is executed, the Java virtual machine creates a stack frame to store the local variable table, the operand stack, dynamic linking information, the method return address, and so on. The span from a method being called to its completion corresponds to one stack frame being pushed onto the virtual machine stack and later popped off it.

1. Local variable table

The local variable table stores the values whose types are known at compile time: primitive data types, object references, and the returnAddress type (which points to the address of a bytecode instruction).

Storage in the local variable table is expressed in local variable slots. The space required by the local variable table is determined at compile time: when a method is entered, the amount of local variable space to allocate in its stack frame is already fully known, and the size of the table does not change while the method runs. "Size" here means the number of variable slots; how much memory one slot occupies is decided by the specific virtual machine implementation.

2. Exceptional conditions

1. StackOverflowError: the stack depth requested by the thread is greater than the depth allowed by the virtual machine.

2. OutOfMemoryError: if the capacity of the Java virtual machine stack can be expanded dynamically, this error is thrown when enough memory cannot be obtained during expansion. The HotSpot virtual machine does not allow the stack to be expanded, so it never throws OutOfMemoryError because of stack expansion: once a thread has successfully obtained its stack space, no OOM will occur. If the request for stack space fails when the thread is created, however, an OOM is still thrown.
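The first condition is easy to reproduce. The following is a minimal sketch, not taken from the original article: the class and method names are made up for illustration. It recurses without a base case until the JVM throws StackOverflowError and reports how many frames were pushed.

```java
// Illustrative sketch: StackDepthDemo and its methods are hypothetical names.
class StackDepthDemo {
    private static int depth = 0;

    // No base case: every call pushes one more stack frame.
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Recurses until StackOverflowError is thrown and returns the depth reached. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The requested stack depth exceeded what the virtual machine allows.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Stack overflowed at depth " + measureDepth());
    }
}
```

The depth reached depends on the frame size and on the thread stack size, which HotSpot lets you set with the -Xss option, so the printed number varies between machines and runs.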

Native method stack

The native method stack plays a role very similar to that of the virtual machine stack. The difference is that the virtual machine stack serves Java methods (bytecode), while the native method stack serves the native methods used by the virtual machine.

The HotSpot virtual machine simply combines the native method stack with the virtual machine stack.

Java heap

The Java heap is the largest block of memory managed by the virtual machine and is shared by all threads. It is the area managed by the garbage collector, so it is often called the GC heap.

The Java heap is created when the virtual machine starts. Its sole purpose is to hold object instances: "almost" all object instances in the Java world are allocated here.

From the perspective of memory reclamation, because most modern garbage collectors are designed around the theory of generational collection, terms such as "young generation", "old generation", "permanent generation", "Eden space", "From Survivor space" and "To Survivor space" often appear in discussions of the Java heap.

In the past (with the arrival of the G1 collector as the dividing line), the garbage collectors inside HotSpot, the absolute mainstream virtual machine in the industry, were all based on the "classic generational" design, which required a young-generation collector to be paired with an old-generation collector. In that context, the terms above caused little ambiguity. Garbage collector technology today is not what it was ten years ago, though, and HotSpot now includes new collectors that do not use a generational design, so many of those statements need to be qualified.

Thread-local allocation buffer (TLAB)

From the perspective of memory allocation, the Java heap shared by all threads can be carved into multiple thread-private allocation buffers (Thread Local Allocation Buffer, TLAB).

However the heap is divided, what it stores does not change: every region can hold only object instances. The purpose of subdividing the Java heap is simply to reclaim memory better or to allocate memory faster.

Sizing the Java heap

The Java heap can be implemented as either fixed-size or expandable, but current mainstream Java virtual machines implement it as expandable (controlled by the -Xmx and -Xms parameters). If the heap has no memory left to complete an instance allocation and can no longer be expanded, the Java virtual machine throws an OutOfMemoryError.
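To see the effect of these parameters from inside a program, the standard java.lang.Runtime API can be queried. This is a small sketch with made-up class and method names: maxMemory() roughly corresponds to the -Xmx limit, and totalMemory() is the amount currently committed, which starts near -Xms and may grow.

```java
// Illustrative sketch: HeapSizeDemo is a hypothetical name; the Runtime calls are standard.
class HeapSizeDemo {
    /** Returns {max, total, free} heap sizes in bytes as reported by the JVM. */
    static long[] heapSizes() {
        Runtime rt = Runtime.getRuntime();
        return new long[] { rt.maxMemory(), rt.totalMemory(), rt.freeMemory() };
    }

    public static void main(String[] args) {
        long[] s = heapSizes();
        long mib = 1024 * 1024;
        System.out.println("max   (upper bound, see -Xmx): " + s[0] / mib + " MiB");
        System.out.println("total (currently committed):   " + s[1] / mib + " MiB");
        System.out.println("free  (within committed):      " + s[2] / mib + " MiB");
    }
}
```

Running it once with, for example, java -Xms64m -Xmx256m HeapSizeDemo and once without options shows how the reported values follow the settings.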

Method area

The method area is also known as "non-heap". It stores data for types that have been loaded by the virtual machine, such as type information, constants, static variables, and the code cache produced by the just-in-time compiler. It is an area of memory shared by all threads.

Many people prefer to call the method area the "permanent generation", or confuse the two. In essence they are not equivalent. It was only that the HotSpot design team at the time chose to extend the collector's generational design to the method area, in other words to implement the method area with a permanent generation. This let HotSpot's garbage collector manage this memory in the same way as the Java heap, so no separate memory management code had to be written for the method area.

Garbage collection is indeed relatively rare in this area compared with the Java heap, but data does not stay forever once it enters the method area. Collection here mainly targets the constant pool and the unloading of types. Generally speaking, the results are hard to make satisfactory, especially for type unloading, whose conditions are very strict. Even so, collecting this area is sometimes necessary.

Runtime constant pool

The runtime constant pool is part of the method area. Besides the version, fields, methods, interfaces and other descriptive information of a class, a Class file also contains a constant pool table, which stores the literals and symbolic references generated at compile time. After the class is loaded, this content is stored in the runtime constant pool of the method area.

Another important feature of the runtime constant pool, compared with the Class file constant pool, is that it is dynamic. The Java language does not require constants to be generated only at compile time: content that was not preset in the Class file constant pool can still enter the runtime constant pool, and new constants can be added to the pool while the program runs. The feature developers use most often for this is the intern() method of the String class.

A closer look at String#intern

In Java, String objects declared directly with double quotes are stored in the constant pool. For String objects that are not declared with double quotes, you can use the intern method provided by String.

The intern method works as follows: if the string constant pool already contains a string equal to this String object, a reference to the pooled String is returned; otherwise, this String object is added to the pool and a reference to this String object is returned.
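A short sketch of this behaviour, using a made-up class name and an arbitrary literal: a string created with new is a distinct object from the pooled literal, while intern() returns the pooled reference.

```java
// Illustrative sketch: InternDemo and the literal "jvm-heap" are arbitrary choices.
class InternDemo {
    /** Returns {literal == built, literal == built.intern(), literal.equals(built)}. */
    static boolean[] compare() {
        String literal = "jvm-heap";            // literal: placed in the string constant pool
        String built = new String("jvm-heap");  // explicit new: a separate object on the heap
        return new boolean[] {
            literal == built,          // different objects, so reference comparison is false
            literal == built.intern(), // intern() returns the pooled instance, so true
            literal.equals(built)      // same character content, so true
        };
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("literal == built          : " + r[0]);
        System.out.println("literal == built.intern() : " + r[1]);
        System.out.println("literal.equals(built)     : " + r[2]);
    }
}
```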

Summary

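To sum up the JVM runtime data areas described above: the program counter, the Java virtual machine stack, and the native method stack are private to each thread, while the Java heap and the method area (which contains the runtime constant pool) are shared by all threads.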
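The shared areas can also be inspected on a running JVM through the standard java.lang.management API. The sketch below, whose class and method names are made up, lists the memory pools the JVM reports together with their type (heap or non-heap). Pool names depend on the JVM version and the collector in use, and thread-private areas such as stacks are not listed as pools.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: MemoryPoolsDemo is a hypothetical name; the MXBean calls are standard.
class MemoryPoolsDemo {
    /** Returns one "type : name" line per memory pool reported by the running JVM. */
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            lines.add(pool.getType() + " : " + pool.getName());
        }
        return lines;
    }

    public static void main(String[] args) {
        describePools().forEach(System.out::println);
    }
}
```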

JVM garbage collection mechanism

The program counter, the virtual machine stack, and the native method stack are all thread-private areas, which are created and destroyed together with their thread. There is little need to think about reclaiming these areas: when a method or a thread ends, the memory is reclaimed along with it.

For example, stack frames are pushed and popped in an orderly way as methods are entered and exited. The amount of memory allocated in each stack frame is essentially known once the class structure is determined (the just-in-time compiler may optimize at run time, but in a discussion based on the conceptual model it can be treated as known at compile time), so memory allocation and reclamation in these areas are deterministic.

The Java heap and the method area, however, involve significant uncertainty:

1. Multiple implementation classes of an interface may require different memory, and different conditional branches executed by a method may require different memory. Only while running can we know which objects the program will create and how many objects will be created. The allocation and recycling of this part of memory is dynamic. The garbage collector is concerned with how this part of memory should be managed.

2. Garbage collection in the method area covers two main things: obsolete constants and types that are no longer used. Reclaiming obsolete constants is very similar to reclaiming objects in the Java heap.

For example, suppose no string object refers to a given constant in the constant pool, and nothing else in the virtual machine refers to that literal. If memory collection occurs at this point and the garbage collector decides it is necessary, the constant is removed from the constant pool. Symbolic references to classes (interfaces), methods, and fields in the constant pool are handled similarly.

The cost-effectiveness of method area garbage collection is usually rather low. In the Java heap, especially in the young generation, one garbage collection of a typical application can usually reclaim 70% to 99% of the memory space. Method area collection, by contrast, is subject to strict conditions, and its yield is often much lower than that.

Determining whether an object is alive

The garbage collector reclaims dead objects, so the first thing to do before collecting is to determine which objects are still alive. There are two mainstream algorithms for this: the reference counting algorithm and the reachability analysis algorithm.

Reference counting algorithm

Add a reference counter to each object. Whenever a reference to the object is created, the counter is incremented by one; when a reference expires, the counter is decremented by one. An object whose counter is zero can no longer be used.

The drawback of this algorithm is that it cannot reclaim objects that refer to each other: because of the mutual references, their counters never reach zero, so the reference counting algorithm cannot collect them.

Although the reference counting algorithm (Reference Counting) uses some extra memory for the counters, its principle is simple and its decisions are very efficient, so in most cases it is a good algorithm. There are well-known applications of it, such as Microsoft COM (Component Object Model) technology, FlashPlayer with ActionScript 3, the Python language, and Squirrel, which is widely used in game scripting. In the Java world, however, none of the mainstream Java virtual machines uses reference counting to manage memory.
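The circular-reference problem can be shown with a toy model. The sketch below is not how any real runtime implements reference counting: the Counted class and the addRef/release helpers are made up for illustration, and the counters are maintained by hand.

```java
// Toy model only: Counted, addRef and release are hypothetical helpers.
class RefCountDemo {
    static final class Counted {
        int refCount;   // number of references currently pointing at this object
        Counted field;  // one outgoing reference to another object
    }

    static void addRef(Counted o) { o.refCount++; }
    static void release(Counted o) { o.refCount--; }

    /** Builds a two-object cycle, drops all outside references, returns both counters. */
    static int[] cycleCounts() {
        Counted a = new Counted();
        Counted b = new Counted();
        addRef(a);               // an outside variable refers to a
        addRef(b);               // an outside variable refers to b
        a.field = b; addRef(b);  // a -> b
        b.field = a; addRef(a);  // b -> a
        release(a);              // the outside reference to a goes away
        release(b);              // the outside reference to b goes away
        return new int[] { a.refCount, b.refCount };
    }

    public static void main(String[] args) {
        int[] counts = cycleCounts();
        System.out.println("a=" + counts[0] + ", b=" + counts[1]);
    }
}
```

Both counters end at 1 even though nothing outside the pair can reach either object, which is exactly the case a pure reference counter cannot reclaim.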

Reachability analysis algorithm

At present, the memory management subsystems of mainstream commercial programming languages (for example Java and Lisp) use the reachability analysis (Reachability Analysis) algorithm to determine whether an object is alive.

The basic idea is to take a set of root objects called "GC Roots" as the starting nodes and search downward from them along reference relationships. The path traversed by the search is called a "reference chain" (Reference Chain). If no reference chain connects an object to the GC Roots, or in graph-theory terms the object is unreachable from the GC Roots, then the object can no longer be used.

There are many kinds of GC Root objects, and the common ones are:

Objects referenced in the virtual machine stack (the local variable table of a stack frame), such as the parameters, local variables, and temporary variables used by the methods on each thread's call stack.

Objects referenced by static fields of classes in the method area, such as reference-type static variables of a Java class.

Objects referenced by constants in the method area, such as references in the string constant pool (String Table).

Objects referenced by JNI (that is, by native methods) in the native method stack.

All objects held by synchronization locks (the synchronized keyword).
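The search from GC Roots can be sketched as a plain graph traversal. This is a toy model under stated assumptions: objects are represented by strings, references by an adjacency map, and the names (ReachabilityDemo, sampleGraph) are made up. A real collector walks actual object references instead.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: objects are names, references are edges in a map.
class ReachabilityDemo {
    /** Returns every object reachable from the roots by following references. */
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> marked = new HashSet<>(roots);
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            for (String target : refs.getOrDefault(current, List.of())) {
                if (marked.add(target)) {   // first visit: mark it and follow it later
                    pending.add(target);
                }
            }
        }
        return marked;
    }

    /** root -> a -> b is a reference chain; c and d only refer to each other. */
    static Map<String, List<String>> sampleGraph() {
        return Map.of(
            "root", List.of("a"),
            "a", List.of("b"),
            "c", List.of("d"),
            "d", List.of("c"));
    }

    public static void main(String[] args) {
        System.out.println(reachable(sampleGraph(), List.of("root")));
    }
}
```

In the sample graph, c and d form a cycle but are not connected to the root, so they are absent from the result. This is why reachability analysis handles the mutual-reference case that defeats reference counting.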

Kinds of references

Whether the reference counting algorithm counts an object's references or the reachability analysis algorithm checks whether the object's reference chain is reachable, deciding whether an object is alive always comes down to "references".

From strongest to weakest, the reference types are:

Strong reference: the most common kind of reference assignment in program code, that is, a relationship like "Object obj = new Object()". As long as the strong reference exists, the garbage collector will never reclaim the referenced object.

Soft reference: describes objects that are useful but not essential. Before the system throws an out-of-memory error, objects that are only softly referenced are included in the collection scope for a second collection. If that collection still does not free enough memory, the out-of-memory error is thrown.

Weak reference: also describes non-essential objects, but is weaker than a soft reference. Objects that are only weakly referenced survive only until the next garbage collection: when the collector runs, they are reclaimed regardless of whether memory is sufficient.

Phantom reference: also called a "ghost reference", this is the weakest kind of reference. Whether an object has a phantom reference has no effect at all on its lifetime, and an object instance cannot be obtained through a phantom reference. The only purpose of attaching a phantom reference to an object is to receive a system notification when the object is reclaimed by the collector.
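These reference kinds correspond to the standard classes in java.lang.ref. The sketch below uses a made-up class name and shows only the parts that are deterministic: while a strong reference exists, soft and weak references still return the object, and a phantom reference never returns it. When soft or weak references are actually cleared depends on the collector and on memory pressure, so the sketch does not try to force that.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Illustrative sketch: ReferenceKindsDemo is a hypothetical name; java.lang.ref is standard.
class ReferenceKindsDemo {
    /** Returns {soft still resolves, weak still resolves, phantom.get() is null}. */
    static boolean[] observe() {
        Object strong = new Object();                       // strong reference
        SoftReference<Object> soft = new SoftReference<>(strong);
        WeakReference<Object> weak = new WeakReference<>(strong);
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(strong, queue);

        boolean[] result = {
            soft.get() == strong,     // object is strongly reachable, so not cleared
            weak.get() == strong,     // same reasoning
            phantom.get() == null     // a phantom reference never hands out its referent
        };
        Reference.reachabilityFence(strong); // keep the strong reference alive up to here
        return result;
    }

    public static void main(String[] args) {
        boolean[] r = observe();
        System.out.println("soft resolves: " + r[0] + ", weak resolves: " + r[1]
            + ", phantom.get() == null: " + r[2]);
    }
}
```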

Garbage collection algorithms

Mark-sweep algorithm

The algorithm has two phases, "mark" and "sweep". First, all objects that need to be reclaimed are marked; once marking is complete, all marked objects are reclaimed together. Alternatively, the live objects are marked and all unmarked objects are reclaimed together.

Disadvantages:

Unstable execution efficiency. If the Java heap contains a large number of objects and most of them need to be reclaimed, a large number of mark and sweep actions are required, so the efficiency of both phases drops as the number of objects grows.

Memory fragmentation. Marking and sweeping leaves a large number of non-contiguous memory fragments. With too much fragmentation, the program may be unable to find enough contiguous memory when it later needs to allocate a large object, and has to trigger another garbage collection early.
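The fragmentation problem can be made concrete with a toy heap of equal-sized slots. The sketch below assumes the marking phase has already produced a marked array. All names are made up, and real collectors work on variable-sized objects rather than slots.

```java
// Toy model only: the heap is an array of slots and marking is given as input.
class MarkSweepDemo {
    /**
     * Sweeps the heap in place: occupied slots that are not marked become free.
     * Returns {total free slots, longest run of contiguous free slots}.
     */
    static int[] sweep(boolean[] occupied, boolean[] marked) {
        int totalFree = 0;
        int longestRun = 0;
        int run = 0;
        for (int i = 0; i < occupied.length; i++) {
            if (occupied[i] && !marked[i]) {
                occupied[i] = false;   // reclaim the dead object where it lies
            }
            if (!occupied[i]) {
                totalFree++;
                run++;
                longestRun = Math.max(longestRun, run);
            } else {
                run = 0;               // a live object interrupts the free run
            }
        }
        return new int[] { totalFree, longestRun };
    }

    public static void main(String[] args) {
        boolean[] occupied = { true, true, true, true, true, true, true, true };
        boolean[] marked   = { true, false, true, false, true, false, true, false };
        int[] r = sweep(occupied, marked);
        System.out.println("free slots: " + r[0] + ", longest contiguous run: " + r[1]);
    }
}
```

With every second object dead, half of the heap is free after the sweep, yet no two free slots are adjacent, so an object needing two contiguous slots still cannot be placed.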

Mark-copy algorithm

It divides the available memory into two halves of equal capacity and uses only one of them at a time. When that half is used up, the surviving objects are copied to the other half, and the used half is then cleaned up in one go.
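A minimal sketch of this half-and-half copying, again as a toy model with made-up names: objects are strings, liveness is given as input, and the two halves are two arrays of equal length.

```java
import java.util.Arrays;

// Toy model only: two equally sized arrays stand in for the two halves of memory.
class SemispaceCopyDemo {
    /** Copies live entries to the start of toSpace, clears fromSpace, returns the count copied. */
    static int copyLive(String[] fromSpace, boolean[] live, String[] toSpace) {
        int next = 0;
        for (int i = 0; i < fromSpace.length; i++) {
            if (fromSpace[i] != null && live[i]) {
                toSpace[next++] = fromSpace[i];  // survivors end up packed together
            }
        }
        Arrays.fill(fromSpace, null);            // the whole used half is reclaimed at once
        return next;
    }

    public static void main(String[] args) {
        String[] from = { "a", "b", "c", "d" };
        boolean[] live = { true, false, false, true };
        String[] to = new String[from.length];
        int copied = copyLive(from, live, to);
        System.out.println(copied + " survivors: " + Arrays.toString(to));
    }
}
```

The cost of the copy is proportional to the number of survivors, not to the size of the half being cleaned, which is why the approach suits areas where few objects survive.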

Advantages:

It solves the fragmentation problem of the mark-sweep algorithm. A whole half of the memory is reclaimed each time, so there is no need to worry about memory fragments.

Disadvantages:

The available memory is reduced to half of its original size, which is rather wasteful.

If most of the objects in memory are alive, this algorithm incurs a large amount of copying overhead.

At present, most commercial Java virtual machines prefer this collection algorithm for reclaiming the young generation.

98% of the objects in the young generation do not survive the first round of collection, so there is no need to split the young generation's memory 1:1. Young-generation collectors in HotSpot such as Serial and ParNew lay out the young generation using a strategy known as Appel-style collection. It divides the young generation into one larger Eden space and two smaller Survivor spaces, and each allocation uses only Eden and one of the Survivor spaces. When garbage collection occurs, the objects still alive in Eden and the used Survivor space are copied in one go to the other Survivor space, and then Eden and the used Survivor space are cleaned up directly. The default Eden to Survivor size ratio in HotSpot is 8:1, which means the usable memory in the young generation is 90% of its total capacity (80% for Eden plus 10% for one Survivor space), and only one Survivor space, that is 10% of the young generation, is "wasted".
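The 8:1 arithmetic can be written out as a small helper. This is only a back-of-the-envelope sketch with made-up names. In HotSpot the ratio is controlled by the -XX:SurvivorRatio option, and the real sizes are additionally subject to alignment, which the sketch ignores.

```java
// Back-of-the-envelope sketch: YoungGenLayoutDemo is a hypothetical name and ignores alignment.
class YoungGenLayoutDemo {
    /**
     * For an Eden:Survivor ratio of survivorRatio:1, returns
     * {eden, one survivor space, usable (eden + one survivor)} in the unit of youngGen.
     */
    static long[] layout(long youngGen, int survivorRatio) {
        long survivor = youngGen / (survivorRatio + 2); // Eden plus two Survivor spaces
        long eden = youngGen - 2 * survivor;
        return new long[] { eden, survivor, eden + survivor };
    }

    public static void main(String[] args) {
        long[] r = layout(100, 8); // a 100 MB young generation with the default ratio of 8
        System.out.println("eden=" + r[0] + " MB, survivor=" + r[1] + " MB, usable=" + r[2] + " MB");
    }
}
```

For a 100 MB young generation with ratio 8, this gives 80 MB of Eden, 10 MB per Survivor space, and 90 MB usable, matching the percentages above.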

Mark-compact algorithm

The algorithm moves all live objects toward one end of the memory space and then directly cleans up the memory beyond the boundary.

Advantages: it does not leave behind the memory fragmentation that mark-sweep produces.

Disadvantages: as with the copying algorithm, when the object survival rate is high there are many move operations, and efficiency drops.
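The "move to one end, then clean past the boundary" step can be sketched on the same kind of toy heap. The names are made up and liveness is given as input. A real collector must also update every reference to each moved object, which this sketch leaves out.

```java
import java.util.Arrays;

// Toy model only: reference updating, the expensive part in practice, is not modelled.
class MarkCompactDemo {
    /** Slides live entries to the front of the heap in place; returns the free-space boundary. */
    static int compact(String[] heap, boolean[] live) {
        int next = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && live[i]) {
                heap[next] = heap[i];  // move the survivor toward the start
                next++;
            }
        }
        for (int i = next; i < heap.length; i++) {
            heap[i] = null;            // everything past the boundary is free
        }
        return next;
    }

    public static void main(String[] args) {
        String[] heap = { "a", "b", "c", "d", "e" };
        boolean[] live = { true, false, true, false, true };
        int boundary = compact(heap, live);
        System.out.println("boundary=" + boundary + ", heap=" + Arrays.toString(heap));
    }
}
```

After compaction the free space is one contiguous block starting at the boundary, at the price of moving every survivor.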

Choosing between mark-sweep and mark-compact is a trade-off:

Mark-compact moves live objects. Especially in the old generation, where a large number of objects survive each collection, moving the live objects and updating every place that references them is an extremely heavy operation, and this kind of object movement has to pause the user application (Stop The World).

If, like the mark-sweep algorithm, live objects are never moved or compacted, the space fragmentation caused by live objects scattered around the heap can only be handled by more complex memory allocators and memory accessors, for example by using partitioned free lists for allocation. (In the same way, storing a large file on a hard disk does not require physically contiguous space: the partition table makes it possible to store and access data on a fragmented disk.) But memory access is the most frequent operation in user programs, and any extra burden at this step directly affects application throughput.

Given these two points, both moving and not moving objects have drawbacks: moving makes memory reclamation more complex, while not moving makes memory allocation more complex. In terms of garbage collection pause time, not moving objects gives shorter pauses, or even none; in terms of whole-program throughput, moving objects is more cost-effective.

Even though not moving objects makes the collector itself more efficient, memory allocation and access happen far more often than garbage collection, so the extra cost there means total throughput still drops.

In HotSpot, the throughput-focused Parallel collectors (Parallel Scavenge together with its old-generation partner, Parallel Old) compact the old generation with a mark-compact algorithm, while the latency-focused CMS collector is based on the mark-sweep algorithm.

There is also a compromise that balances the drawbacks of the two. The virtual machine uses mark-sweep most of the time and temporarily tolerates memory fragmentation; once fragmentation becomes severe enough to affect object allocation, it runs one collection with mark-compact to obtain regular memory space again. The mark-sweep based CMS collector, for example, does this when it faces too much fragmentation.

Generational collection

At present, most garbage collectors in commercial virtual machines are designed according to the theory of "generational collection".

Several commonly used garbage collectors share one design principle: the collector divides the Java heap into different areas and places objects in them according to their age (the number of garbage collections an object has survived).

The advantages of this are:

If most objects in an area are short-lived and unlikely to survive a collection, then putting them together lets each collection concentrate on the few objects that survive rather than marking the many that will be reclaimed, so a large amount of space can be reclaimed at low cost.

If the remaining objects are hard to kill, putting them together lets the virtual machine collect that area less frequently, which balances the time cost of garbage collection against the efficient use of memory.

Once the Java heap is divided into different areas, the garbage collector can collect just one area or some of the areas at a time. This leads to the distinction between collection types such as Minor GC, Major GC, and Full GC. It also makes it possible to choose, for each area, a garbage collection algorithm that matches the survival characteristics of the objects stored there.

Terminology for collection types:

Young-generation collection (Minor GC/Young GC): garbage collection that targets only the young generation.

Old-generation collection (Major GC/Old GC): garbage collection that targets only the old generation. Note that the term "Major GC" is somewhat ambiguous today: different materials use it differently, so readers need to tell from context whether it means an old-generation collection or a whole-heap collection.

Whole-heap collection (Full GC): garbage collection of the entire Java heap and the method area.
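Which collectors a running JVM actually uses, and how often each has run, can be read through the standard GarbageCollectorMXBean interface. The sketch below uses made-up class and method names. The collector names it prints depend on the JVM and on the collector selected at startup, and typically there is one bean for young-generation collections and one for old-generation or full collections.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: GcBeansDemo is a hypothetical name; the MXBean calls are standard.
class GcBeansDemo {
    /** Returns one line per collector: its name and how many collections it has performed. */
    static List<String> collectorSummary() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() returns -1 if the count is undefined for this collector.
            lines.add(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
        return lines;
    }

    public static void main(String[] args) {
        collectorSummary().forEach(System.out::println);
    }
}
```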

The Java heap is divided into the young generation and the old generation. In the young generation, a large number of objects die in each garbage collection, and the few objects that survive each collection are gradually promoted to the old generation.

Note: these divisions are only a common characteristic or design style of some garbage collectors, not the inherent memory layout of any particular JVM implementation, and even less a further subdivision of the Java heap prescribed by the Java Virtual Machine Specification. The garbage collectors inside HotSpot, the absolute mainstream virtual machine in the industry, used to be based entirely on the "classic generational" design, which required a young-generation collector to be paired with an old-generation collector. Today, however, HotSpot also has new collectors that do not use a generational design.

Memory reclamation strategy

The strategy described below is based on the classic generational design:

1. Allocation and collection in the young generation

In most cases, objects are allocated in the Eden space of the young generation. When Eden does not have enough space for an allocation, the virtual machine initiates a Minor GC. The young generation is divided into one larger Eden space and two smaller Survivor spaces, and each allocation uses only Eden and one of the Survivor spaces. When garbage collection occurs, the objects still alive in Eden and the used Survivor space are copied in one go to the other Survivor space, and then Eden and the used Survivor space are cleaned up directly. The default Eden to Survivor size ratio in HotSpot is 8:1.

2. Large objects go straight into the old generation

Large objects are Java objects that need a lot of contiguous memory. The most typical examples are very long strings and arrays with a large number of elements.

Why do this? The goal is to avoid copying such objects back and forth between Eden and the two Survivor spaces, which would cause a large number of memory copy operations.

Large objects are bad news for the virtual machine's memory allocation, and the only thing worse than one large object is a group of short-lived large objects that die almost as soon as they are created. Programs should take care to avoid them. The reason is twofold: when allocating, a large object can easily trigger an early garbage collection, even though plenty of memory is still free, just to obtain enough contiguous space for it; and when objects are copied, a large object means high copying overhead.

3. Long-lived objects enter the old generation

If an object is still alive after its first Minor GC and fits into a Survivor space, it is moved to the Survivor space and its age is set to 1. Each time the object survives another Minor GC in the Survivor space, its age increases by 1, and when its age reaches a threshold (15 by default), it is promoted to the old generation. The promotion threshold can be set with the -XX:MaxTenuringThreshold parameter.
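The age-based promotion rule can be simulated in a few lines. This is a deliberately simplified model with made-up names: each object is described only by how many Minor GCs it stays reachable for, and promotion happens exactly when the age reaches the threshold. Survivor-space capacity, large-object handling, and any other promotion rules a real collector applies are ignored.

```java
// Simplified simulation: TenuringDemo is a hypothetical name and models only the age rule.
class TenuringDemo {
    /**
     * lifetimes[i] is the number of Minor GCs object i stays reachable for.
     * Returns {promoted to old generation, still in young generation, collected}
     * after the given number of Minor GCs.
     */
    static int[] simulate(int[] lifetimes, int threshold, int minorGcs) {
        int promoted = 0;
        int collected = 0;
        int stillYoung = 0;
        for (int lifetime : lifetimes) {
            int age = 0;
            boolean settled = false;
            for (int gc = 1; gc <= minorGcs && !settled; gc++) {
                if (lifetime < gc) {
                    collected++;             // unreachable at this GC: reclaimed
                    settled = true;
                } else if (++age >= threshold) {
                    promoted++;              // survived enough GCs: moved to the old generation
                    settled = true;
                }
            }
            if (!settled) {
                stillYoung++;                // alive, but not yet old enough to be promoted
            }
        }
        return new int[] { promoted, stillYoung, collected };
    }

    public static void main(String[] args) {
        int[] lifetimes = { 0, 1, 3, 20, 20 };
        int[] r = simulate(lifetimes, 15, 15);
        System.out.println("promoted=" + r[0] + ", young=" + r[1] + ", collected=" + r[2]);
    }
}
```

With the default threshold of 15 and 15 Minor GCs, the three short-lived objects are collected and the two long-lived ones are promoted. With only 3 Minor GCs, nothing has been promoted yet.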
