How to analyze the JVM principle

2025-07-11 Update From: SLTechnology News&Howtos

This article explains how the JVM works: what it is, how it relates to the JRE and JDK, how its runtime memory is laid out, and how garbage collection operates.

1: What is the JVM

JVM stands for Java Virtual Machine. The JVM is a specification for a computing device: an abstract computer that is realized by simulating computer functions in software on a real machine. The Java virtual machine defines a set of bytecode instructions, a set of registers, a stack, a garbage-collected heap, and a method area. The JVM hides the details of the underlying operating system platform, so a Java program only needs to be compiled into object code (bytecode) that runs on the Java virtual machine, and it can then run on many platforms without modification. When the JVM executes bytecode, it ultimately translates the bytecode into machine instructions for the specific platform and executes those.

2: What is the relationship between the JRE, JDK, and JVM

The JRE (Java Runtime Environment) is the Java platform on which Java programs run. Users who only need to run existing Java programs only need to install the JRE.

The JDK (Java Development Kit) is the development kit that programmers use to compile and debug Java programs. The JDK's tools are themselves Java programs and need a JRE to run. To keep the JDK self-contained, a JRE is included in the JDK installation, so (in JDK 8 and earlier) the JDK installation directory contains a directory named jre that holds the JRE files.

The JVM (Java Virtual Machine) is part of the JRE. It is an abstract computer, realized by simulating computer functions on a real machine. The JVM has its own complete hardware architecture, such as a processor, stack, and registers, along with a corresponding instruction set. The most important feature of the Java language is that it runs across platforms, and the JVM is what makes this operating-system-independent, cross-platform execution possible.
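
To see which runtime and which virtual machine are actually executing a program, the standard system properties can be printed. This is only a small sketch; the class name RuntimeInfo and the helper property are illustrative names, not part of any JDK API.

```java
// Prints which Java runtime and virtual machine are executing this program.
class RuntimeInfo {
    // Returns the value of a system property, or "unknown" if it is not set.
    static String property(String key) {
        return System.getProperty(key, "unknown");
    }

    public static void main(String[] args) {
        System.out.println("Runtime home : " + property("java.home"));
        System.out.println("Java version : " + property("java.version"));
        System.out.println("VM name      : " + property("java.vm.name"));
        System.out.println("VM vendor    : " + property("java.vm.vendor"));
        System.out.println("VM spec      : " + property("java.vm.specification.version"));
    }
}
```

The values differ from one installation to another, which is the point: the same bytecode runs on whichever JVM implementation the platform provides.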

3: JVM principle

The JVM is the core and foundation of Java: a virtual processor that sits between the Java compiler and the operating system platform. It is an abstract computer implemented in software on top of the underlying operating system and hardware, and it executes Java bytecode programs.

The Java compiler only needs to target the JVM and generate bytecode files that the JVM understands. A Java source file is compiled into a bytecode program, and the JVM translates each instruction into machine code for the platform it is running on.

4: The architecture of the JVM

Class loader (ClassLoader): loads .class files

Execution engine: executes bytecode or native methods

Runtime data areas: method area, heap, Java stack, PC register, native method stack
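
The class loader component can be observed from inside a program. The sketch below walks the loader chain of a class up to the bootstrap loader, which the Java API represents as null. LoaderChain and chainOf are hypothetical names chosen for this example, and the loader class names printed depend on the JDK version.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Lists the class loaders responsible for a class, from the defining
    // loader up through its parents to the bootstrap loader.
    static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap"); // the bootstrap loader is represented by null
        return chain;
    }

    public static void main(String[] args) {
        System.out.println("String      : " + chainOf(String.class));
        System.out.println("LoaderChain : " + chainOf(LoaderChain.class));
    }
}
```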

5: JVM runtime data areas

Block 1: PC register

The PC register stores the address of the JVM instruction that the thread is currently executing; each thread has its own. If the current method is native, the value of the PC register is undefined.

Block 2: JVM stack

The JVM stack is thread-private: each thread gets its own JVM stack when it is created. The JVM stack holds stack frames, which contain the local variables of primitive types in the current thread (Java defines eight primitive types: boolean, char, byte, short, int, long, float, double) and partial return results. For non-primitive objects, the JVM stack stores only a reference that points to the object on the heap.
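
That every thread has its own stack of frames can be made visible by recursing until the stack is exhausted. The following is a rough sketch: StackDepth and maxDepth are illustrative names, the 256 KB stack size is an arbitrary value that the JVM may treat only as a hint, and the depths reached vary by JVM and platform.

```java
class StackDepth {
    private int depth;

    // Each call pushes one more stack frame onto the calling thread's JVM stack.
    private void recurse() {
        depth++;
        recurse();
    }

    // Recurses until the thread's stack is exhausted and returns the depth reached.
    static int maxDepth() {
        StackDepth probe = new StackDepth();
        try {
            probe.recurse();
        } catch (StackOverflowError expected) {
            // This thread's JVM stack is full; unwinding releases the frames.
        }
        return probe.depth;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("main thread depth: " + maxDepth());
        // A second thread has its own, separate stack with a requested size.
        Thread small = new Thread(null,
                () -> System.out.println("small-stack thread depth: " + maxDepth()),
                "small-stack", 256 * 1024);
        small.start();
        small.join();
    }
}
```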

Block 3: heap (Heap)

The heap is the area the JVM uses to store object instances and array values. The memory for every object created with new in Java can be considered to be allocated here, and the memory of objects in the heap is reclaimed by GC.

(1) The heap is shared by all threads in the JVM, so allocating object memory on it requires locking, which makes creating objects with new relatively expensive.

(2) To make object allocation more efficient, the Sun HotSpot JVM gives each thread its own allocation space, the TLAB (Thread Local Allocation Buffer), whose size the JVM calculates from runtime conditions. Allocating objects in a TLAB needs no lock, so the JVM allocates in the TLAB whenever it can. In that case, allocating object memory in the JVM is roughly as efficient as in C. If an object is too large, however, it is still allocated directly in the shared heap space.

(3) TLABs exist only in the Eden space of the new generation, so when writing Java programs it is usually more efficient to allocate several small objects than one large object.

(4) Newly created objects are stored in the new generation (Young Generation). Data in the Young Generation that survives one or more GCs is moved to the Old Generation. New objects are normally created in the Eden space.
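
The heap and its generations can be inspected at run time through the java.lang.management API. This is a minimal sketch; HeapPools and poolNames are names made up for the example, and the pool names printed (for instance an Eden, Survivor, or Old Gen pool) depend on which collector the JVM is using.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class HeapPools {
    // Returns the names of the memory pools of the given type.
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("max heap (bytes): " + rt.maxMemory());
        System.out.println("heap pools      : " + poolNames(MemoryType.HEAP));
        System.out.println("non-heap pools  : " + poolNames(MemoryType.NON_HEAP));
    }
}
```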

Block 4: method area (Method Area)

(1) In the Sun JDK, this area corresponds to the Permanent Generation.

(2) The method area stores information about loaded classes (name, modifiers, and so on), the static variables of each class, constants declared final, field information, and method information. When a program obtains information through methods such as getName and isInterface on a Class object, the data comes from the method area. The method area is shared globally, and under certain conditions it is also garbage collected. When the method area needs more memory than it is allowed, an OutOfMemoryError is thrown.
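
The class information mentioned above is what the reflection API exposes. A short sketch, with ClassMetadata and describe as illustrative names:

```java
import java.lang.reflect.Modifier;

class ClassMetadata {
    static final int ANSWER = 42; // a static final constant belonging to this class

    // Summarizes class-level information that the JVM keeps for a loaded class.
    static String describe(Class<?> type) {
        return type.getName()
                + " interface=" + type.isInterface()
                + " modifiers=" + Modifier.toString(type.getModifiers())
                + " declaredMethods=" + type.getDeclaredMethods().length;
    }

    public static void main(String[] args) {
        System.out.println(describe(Runnable.class));
        System.out.println(describe(String.class));
        System.out.println(describe(ClassMetadata.class));
    }
}
```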

Block 5: runtime constant pool (Runtime Constant Pool)

It stores the constants fixed in each class along with the symbolic references to methods and fields. Its space is allocated from the method area.
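
String literals are one visible effect of constants being resolved once per class. The sketch below compares references rather than contents; InternDemo and its method names are illustrative. Where the pooled String objects themselves live depends on the JVM version (HotSpot keeps them in the heap since JDK 7), so this shows the sharing behavior, not the memory location.

```java
class InternDemo {
    // Two identical literals resolve to the same pooled String object.
    static boolean sameLiteral() {
        String a = "jvm";
        String b = "jvm";
        return a == b;
    }

    // new String(...) always creates a distinct object.
    static boolean sameAsNew() {
        String literal = "jvm";
        String fresh = new String("jvm");
        return literal == fresh;
    }

    // intern() returns the pooled instance with the same contents.
    static boolean sameAfterIntern() {
        String literal = "jvm";
        String fresh = new String("jvm");
        return literal == fresh.intern();
    }

    public static void main(String[] args) {
        System.out.println(sameLiteral());     // true
        System.out.println(sameAsNew());       // false
        System.out.println(sameAfterIntern()); // true
    }
}
```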

Block 6: native method stack (Native Method Stacks)

The JVM uses the native method stack to support the execution of native methods; this area stores the state of each native method call.

6: The algorithms for determining "dead" objects

The program counter, the Java virtual machine stack, and the native method stack are all private to a thread, so the memory they occupy is created with the thread and reclaimed when the thread ends. The Java heap and the method area, by contrast, are shared among threads and are the focus of GC.

Almost all objects live in the heap. Before collecting, the GC must determine which objects are still alive and cannot be reclaimed, and which are dead and can be.

There are two algorithms for determining whether an object is alive:

1.) Reference counting: add a reference counter to each object. Whenever something references the object, the counter increases by 1; when a reference expires, the counter decreases by 1; when the counter reaches 0, the object is dead and can be reclaimed. However, this approach has difficulty handling circular references between objects.

2.) Reachability analysis: starting from a set of objects called "GC Roots", search along references; the paths traversed are called reference chains. When no reference chain connects an object to any GC Root (that is, the object is unreachable from the GC Roots), the object is dead and can be reclaimed. In Java, the objects that can serve as GC Roots include objects referenced from the virtual machine stack, objects referenced by native methods in the native method stack, objects referenced by static fields in the method area, and objects referenced by constants in the method area.

The mainstream implementations of mainstream commercial programming languages (Java among them) use reachability analysis to determine whether an object is alive.
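
The difference between the two algorithms shows up on a small object graph with a cycle. The sketch below is a toy model, not how a JVM stores objects: objects are plain strings, references are a map, and Reachability, reachable, refCounts, and sampleGraph are names invented for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class Reachability {
    // Reachability analysis: follow references outward from the roots.
    static Set<String> reachable(Map<String, List<String>> refs, Collection<String> roots) {
        Set<String> live = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String obj = pending.pop();
            if (live.add(obj)) {
                pending.addAll(refs.getOrDefault(obj, Collections.<String>emptyList()));
            }
        }
        return live;
    }

    // Reference counting: count the incoming references of every object.
    static Map<String, Integer> refCounts(Map<String, List<String>> refs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> targets : refs.values()) {
            for (String target : targets) {
                counts.merge(target, 1, Integer::sum);
            }
        }
        return counts;
    }

    static Map<String, List<String>> sampleGraph() {
        Map<String, List<String>> refs = new HashMap<>();
        refs.put("root", Arrays.asList("a"));
        refs.put("a", Arrays.asList("b"));
        refs.put("x", Arrays.asList("y")); // x and y reference each other,
        refs.put("y", Arrays.asList("x")); // but nothing live references them
        return refs;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = sampleGraph();
        System.out.println("reachable from root: " + reachable(refs, Arrays.asList("root")));
        System.out.println("reference count of x: " + refCounts(refs).get("x"));
    }
}
```

In this graph x and y each keep a count of 1 because of the cycle, so a pure reference counter would never free them, while the search from the root never reaches them and treats both as dead.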

7: JVM garbage collection

The basic principle of GC (Garbage Collection) is to reclaim objects in memory that are no longer used; the component that performs the reclamation is called a collector. Because GC consumes resources and time, Java, after analyzing the life cycle characteristics of objects, collects objects by generation (new and old) so as to keep the pauses that GC causes in applications as short as possible.

(1) A collection of the new generation is called a Minor GC.

(2) A collection of the old generation is called a Full GC.

(3) A GC triggered by explicitly calling System.gc() in the program is a Full GC.
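
How often the collectors have run can be read from the GarbageCollectorMXBean objects of the running JVM. A minimal sketch, with GcStats and snapshot as illustrative names; the collector names reported depend on the collector in use, and System.gc() is only a request that the JVM may ignore.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcStats {
    // One line per collector: its name, collection count, and accumulated time.
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": count=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println("before: " + snapshot());
        for (int i = 0; i < 100_000; i++) {
            byte[] garbage = new byte[1024]; // short-lived objects for the new generation
        }
        System.gc(); // asks for a full collection
        System.out.println("after : " + snapshot());
    }
}
```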

The GC reclaims objects differently depending on the type of reference to them. The JVM divides object references into four types:

(1) Strong reference: the default. An object is reclaimed by GC only when no strong references to it remain.

(2) Soft reference: a reference type Java provides that suits caching scenarios. The object is reclaimed only when memory is running low.

(3) Weak reference: the object is always reclaimed the next time GC runs.

(4) Phantom reference: used only to learn whether an object has been reclaimed by GC.
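
The four kinds correspond to ordinary variables plus the classes in java.lang.ref. The sketch below shows only behavior that is deterministic; ReferenceKinds and its method names are illustrative. What main prints for the soft and weak references after System.gc() is not guaranteed, because the call is only a request.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

class ReferenceKinds {
    // While a strong reference exists, a weak reference still yields the object.
    static boolean weakVisibleWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc(); // the object stays strongly reachable regardless
        return weak.get() == strong;
    }

    // A phantom reference never hands the referent back to the program.
    static boolean phantomGetIsAlwaysNull() {
        Object referent = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        PhantomReference<Object> phantom = new PhantomReference<>(referent, queue);
        return phantom.get() == null;
    }

    public static void main(String[] args) {
        SoftReference<byte[]> cache = new SoftReference<>(new byte[1024]);
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();
        System.out.println("soft referent still present: " + (cache.get() != null));
        System.out.println("weak referent still present: " + (weak.get() != null));
        System.out.println("weak visible while held    : " + weakVisibleWhileStronglyHeld());
        System.out.println("phantom get() is null      : " + phantomGetIsAlwaysNull());
    }
}
```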

8: Garbage collection algorithms

1. Mark-sweep algorithm

The most basic algorithm, which has two phases: first mark all objects that need to be reclaimed, then reclaim all marked objects together once marking is complete.

It has two shortcomings. One is efficiency: neither marking nor sweeping is efficient. The other is space: sweeping leaves a large number of discontiguous memory fragments (much like fragmentation on a disk). With too much fragmentation, the JVM may be unable to find enough contiguous memory when a large object has to be allocated, and must trigger another garbage collection early.

2. Copying algorithm

The copying algorithm was introduced to solve the efficiency problem. It divides the available memory into two blocks of equal capacity and uses only one of them at a time. When that block is used up, the surviving objects are copied to the other block, and the used block is then cleared in one pass. This eliminates memory fragmentation, but at the cost of halving the usable memory.
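
The copying algorithm can be sketched with two arrays standing in for the two halves of memory. This is a toy model under simplifying assumptions: objects are strings of equal size, the set of live objects is handed in rather than discovered by marking, and SemiSpace and its methods are names made up for the example.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SemiSpace {
    private String[] from; // the half currently used for allocation
    private String[] to;   // the idle half
    private int top;       // next free slot in the active half

    SemiSpace(int halfCapacity) {
        from = new String[halfCapacity];
        to = new String[halfCapacity];
    }

    // Bump-pointer allocation; returns false when the active half is full.
    boolean allocate(String obj) {
        if (top == from.length) {
            return false;
        }
        from[top++] = obj;
        return true;
    }

    // Copies the live objects into the idle half, wipes the old half, and swaps.
    void collect(Set<String> live) {
        int copied = 0;
        for (int i = 0; i < top; i++) {
            if (live.contains(from[i])) {
                to[copied++] = from[i];
            }
        }
        Arrays.fill(from, null);
        String[] swap = from;
        from = to;
        to = swap;
        top = copied;
    }

    int used() {
        return top;
    }

    String[] activeHalf() {
        return Arrays.copyOf(from, top);
    }

    public static void main(String[] args) {
        SemiSpace heap = new SemiSpace(4);
        for (String name : new String[] {"a", "b", "c", "d"}) {
            heap.allocate(name);
        }
        System.out.println("full, allocate e: " + heap.allocate("e")); // false
        heap.collect(new HashSet<>(Arrays.asList("b", "d")));
        System.out.println("after collect: " + Arrays.toString(heap.activeHalf())); // [b, d]
        System.out.println("allocate e again: " + heap.allocate("e")); // true
    }
}
```

The survivors end up packed at the start of the new half, so there is no fragmentation, but only half of the total capacity is ever available for allocation.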

3. Mark-compact algorithm

When the object survival rate is high, the copying algorithm must perform many copy operations and becomes less efficient. Hence the mark-compact algorithm: its marking phase is the same as in mark-sweep, but instead of reclaiming the dead objects in place, it moves all surviving objects toward one end and then clears the memory beyond the end boundary.
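
The difference between sweeping and compacting is easiest to see side by side on the same heap. In this toy sketch each array slot holds one object, the live set is given instead of being computed by a marking phase, and SweepVsCompact with its methods are illustrative names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class SweepVsCompact {
    // Mark-sweep: dead slots are cleared in place, which leaves holes.
    static String[] sweep(String[] heap, Set<String> live) {
        String[] result = heap.clone();
        for (int i = 0; i < result.length; i++) {
            if (result[i] != null && !live.contains(result[i])) {
                result[i] = null;
            }
        }
        return result;
    }

    // Mark-compact: survivors slide to one end, so free space is contiguous.
    static String[] compact(String[] heap, Set<String> live) {
        String[] result = new String[heap.length];
        int next = 0;
        for (String obj : heap) {
            if (obj != null && live.contains(obj)) {
                result[next++] = obj;
            }
        }
        return result;
    }

    // Length of the longest run of free (null) slots.
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        String[] heap = {"a", "b", "c", "d", "e", "f"};
        Set<String> live = new HashSet<>(Arrays.asList("a", "c", "e"));
        String[] swept = sweep(heap, live);
        String[] compacted = compact(heap, live);
        System.out.println(Arrays.toString(swept) + " largest free run: " + largestFreeRun(swept));
        System.out.println(Arrays.toString(compacted) + " largest free run: " + largestFreeRun(compacted));
    }
}
```

Both results have three free slots, but after sweeping the largest contiguous run is 1 slot, while after compacting it is 3, which is why a large object may fit only in the compacted heap.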

4. Generational collection algorithm

Current commercial virtual machines all use generational collection. It introduces no new ideas; it simply divides the heap into two generations according to object lifetime, the new generation and the old generation. The method area is known as the permanent generation (newer versions have dropped the permanent generation in favor of metaspace: the permanent generation used JVM-managed memory, while metaspace uses native memory directly).

This allows each generation to use the collection algorithm that best suits its characteristics.

Objects in the new generation are short-lived: each GC finds that most objects have died and only a few survive, so the copying algorithm is used. The new generation is subdivided into an Eden space and two Survivor spaces (Survivor from and Survivor to), with a default size ratio of 8:1:1.
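
The 8:1:1 ratio translates into sizes as follows. This is plain arithmetic under the assumption that the ratio is applied exactly; a real JVM also aligns and may adapt these sizes. YoungGenLayout and split are names chosen for the example.

```java
class YoungGenLayout {
    // Splits a new generation of the given size into Eden and two Survivor
    // spaces. survivorRatio is Eden : one Survivor space (8 means 8:1:1).
    static long[] split(long youngBytes, int survivorRatio) {
        long survivor = youngBytes / (survivorRatio + 2);
        long eden = youngBytes - 2 * survivor;
        return new long[] {eden, survivor, survivor};
    }

    public static void main(String[] args) {
        long[] sizes = split(10L * 1024 * 1024, 8); // a 10 MB new generation
        System.out.println("Eden          : " + sizes[0]); // 8388608
        System.out.println("Survivor from : " + sizes[1]); // 1048576
        System.out.println("Survivor to   : " + sizes[2]); // 1048576
    }
}
```

Because one Survivor space is always kept empty as the copy target, 90% of the new generation (Eden plus one Survivor) is usable at any time, instead of the 50% of a plain two-block copying scheme.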

Objects in the old generation have a high survival rate and there is no extra space to act as an allocation guarantee, so the mark-sweep or mark-compact algorithm is used.

Newly created objects first go into Eden. When Eden is full, a Minor GC (new generation GC) is triggered: the surviving objects in Eden and Survivor from are copied into Survivor to, and Eden and Survivor from are then emptied. The two Survivor spaces then swap roles: the old Survivor from becomes the new Survivor to, and the old Survivor to becomes the new Survivor from. If Survivor to cannot hold all the surviving objects during copying, the objects are copied into the old generation under the allocation guarantee (similar to a guarantor for a bank loan), and if the old generation cannot hold them either, a Full GC is triggered.

Large objects go directly to the old generation: the JVM parameter -XX:PretenureSizeThreshold makes objects larger than the configured value go straight into the old generation, which avoids copying large amounts of memory between the Eden and Survivor spaces.

Long-lived objects enter the old generation: the JVM keeps an age counter for each object. If an object is born in Eden, survives its first Minor GC, and fits in Survivor, it is moved to Survivor and its age is set to 1. Each Minor GC it survives after that increases its age by 1, and when it reaches a certain age (15 by default, configurable with -XX:MaxTenuringThreshold) it is promoted to the old generation. However, the JVM does not always require an object to reach the maximum age before promotion. If the total size of all objects of the same age (say, age x) in the Survivor space exceeds half of the Survivor space, all objects of age x or older go directly into the old generation without waiting to reach the maximum age.
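
The age rule in the last paragraph can be written down as a small function. This sketch implements the rule exactly as worded above and nothing more; HotSpot's real calculation differs in detail, and TenuringRule and promotionAge are hypothetical names.

```java
class TenuringRule {
    // sizeByAge[i] is the total size of Survivor objects whose age is i + 1.
    // Returns the age at or above which objects are promoted.
    static int promotionAge(long[] sizeByAge, long survivorCapacity, int maxAge) {
        for (int i = 0; i < sizeByAge.length && i + 1 < maxAge; i++) {
            if (sizeByAge[i] > survivorCapacity / 2) {
                return i + 1; // this age group alone fills more than half of Survivor
            }
        }
        return maxAge; // otherwise objects wait for the configured maximum age
    }

    public static void main(String[] args) {
        // Ages 1..3 occupy 10, 60, and 5 units of a 100-unit Survivor space.
        System.out.println(promotionAge(new long[] {10, 60, 5}, 100, 15)); // 2
        // No single age group exceeds half of Survivor.
        System.out.println(promotionAge(new long[] {10, 20, 5}, 100, 15)); // 15
    }
}
```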

9: Garbage collectors

Garbage collection algorithms are the methodology, and garbage collectors are the concrete implementations. The JVM specification does not stipulate how a garbage collector must be implemented, so different vendors and different versions of virtual machines provide different collectors. Here we look only at the HotSpot virtual machine.

As of JDK 7/8, the collectors of the HotSpot virtual machine are as follows:

1. Serial collector

The Serial collector is the most basic and oldest collector, and it was once the only choice for collecting the new generation. It is single-threaded: it uses only one CPU or one collection thread to do the garbage collection, and while it collects, it must suspend all other worker threads until it finishes, that is, "Stop the World". Stopping all user threads is unacceptable for many applications. Imagine being forced by someone else to stop in the middle of whatever you are doing.

Nevertheless, it is still the default new generation collector for virtual machines running in Client mode. It is simple and efficient (compared with a single thread of other collectors, because it has no thread-interaction overhead).

2. ParNew collector

The ParNew collector is the multithreaded version of the Serial collector. Apart from using multiple threads, its behavior (collection algorithm, Stop the World, object allocation rules, collection strategy, and so on) is the same as that of the Serial collector.

It is the preferred new generation collector for many JVMs running in Server mode. One important reason is that, apart from Serial, it is the only collector that can work with the old generation CMS collector.

3. Parallel Scavenge collector

A new generation collector that is parallel and multithreaded. Its goal is to achieve a controllable throughput, where throughput is the ratio of CPU time spent running user code to total CPU time consumed: throughput = time running user code / (time running user code + garbage collection time). High throughput uses CPU time efficiently and completes the program's computation as soon as possible, which suits background tasks that do not need much interaction.
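
As a worked instance of the throughput formula, with made-up timings and Throughput.of as an illustrative helper:

```java
class Throughput {
    // throughput = user code time / (user code time + garbage collection time)
    static double of(double userCodeMillis, double gcMillis) {
        return userCodeMillis / (userCodeMillis + gcMillis);
    }

    public static void main(String[] args) {
        // 99 seconds of user code and 1 second of garbage collection.
        System.out.println(of(99_000, 1_000)); // 0.99
    }
}
```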

4. Serial Old collector

The old generation version of the Serial collector: single-threaded, using the mark-compact algorithm, and mainly intended for virtual machines in Client mode.

It can also be used in Server mode:

In JDK 1.5 and earlier, it is paired with the Parallel Scavenge collector.

It serves as the fallback for CMS, used when a Concurrent Mode Failure occurs in CMS.

5. Parallel Old collector

The old generation version of Parallel Scavenge: multithreaded, using the mark-compact algorithm, and first available in JDK 1.6. Before that, Parallel Scavenge could only be paired with Serial Old, and the poor performance of Serial Old kept Parallel Scavenge from showing its full advantage.

With the arrival of the Parallel Old collector, the throughput-first collector finally had a fitting partner. The Parallel Scavenge/Parallel Old combination is suitable where throughput matters and CPU resources are at a premium.

6. CMS collector

The CMS (Concurrent Mark Sweep) collector aims for the shortest possible collection pause. Short pauses mean a better user experience.

It is based on the mark-sweep algorithm and offers concurrent collection and low pauses. Its operation is more complex and is divided into four steps:

1) Initial mark: marks only the objects that GC Roots reference directly. It is fast, but requires "Stop The World".

2) Concurrent mark: traces the reference chains from GC Roots; it runs concurrently with user threads.

3) Remark: corrects the marks of objects whose state changed because user threads kept running during the concurrent mark phase. It takes longer than the initial mark but far less time than the concurrent mark, and requires "Stop The World".

4) Concurrent sweep: reclaims the objects marked as garbage; it runs concurrently with user threads.

Because concurrent mark and concurrent sweep, the most time-consuming steps of the whole process, run alongside user threads, the CMS collector's memory reclamation can on the whole be regarded as executing concurrently with user threads.

The CMS collector has three disadvantages:

1) It is very sensitive to CPU resources.

Although concurrent collection does not pause user threads, it still takes up part of the CPU resources, which slows the application down and reduces overall throughput.

By default, CMS starts (number of CPUs + 3) / 4 collection threads. With 4 or more CPUs, the collection threads take no less than 25% of CPU resources; with fewer than 4 CPUs the share is higher still (50% with 2 CPUs), and the impact on user programs may be unacceptable.

2) It cannot handle floating garbage, and a "Concurrent Mode Failure" may occur. Floating garbage is the new garbage that user threads produce while concurrent sweeping is in progress.

Because user threads keep running during concurrent collection, memory must be kept in reserve for them, so CMS cannot wait until the old generation is almost full before collecting, as other collectors do. If the memory CMS has reserved cannot meet the program's needs, a "Concurrent Mode Failure" occurs. The JVM then falls back on its backup plan: it temporarily enables the Serial Old collector, which results in another Full GC.

3) It produces a large number of memory fragments. CMS is based on the mark-sweep algorithm and does not compact after sweeping, so it leaves many discontiguous memory fragments. When a large object has to be allocated and not enough contiguous memory can be found, another Full GC must be triggered early.
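
The CPU share behind the first disadvantage follows directly from the thread-count formula quoted above. The sketch assumes integer division, as in the formula, and takes the formula from this article; actual defaults can differ between JVM versions. CmsThreads and its methods are illustrative names.

```java
class CmsThreads {
    // Default number of CMS collection threads, per the formula (CPUs + 3) / 4.
    static int defaultThreads(int cpus) {
        return (cpus + 3) / 4; // integer division
    }

    // Fraction of the CPUs occupied by those collection threads.
    static double cpuShare(int cpus) {
        return (double) defaultThreads(cpus) / cpus;
    }

    public static void main(String[] args) {
        for (int cpus : new int[] {1, 2, 4, 8, 16}) {
            System.out.println(cpus + " CPUs -> " + defaultThreads(cpus)
                    + " thread(s), share " + cpuShare(cpus));
        }
    }
}
```

With 1 or 2 CPUs a single collection thread occupies 100% or 50% of the machine; from 4 CPUs upward the share never drops below 25%.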

7. G1 collector

G1 (Garbage-First) is a collector that became commercially available in JDK 7u4. G1 is a garbage collector for server applications, and its mission is to replace the CMS collector in the future.

G1 collector features:

Parallelism and concurrency: it makes full use of multi-CPU, multi-core hardware to shorten pause times, and it can run concurrently with user threads.

Generational collection: G1 manages the entire heap on its own, without the cooperation of other collectors, and treats new objects differently from objects that have already survived for some time.

Spatial compaction: viewed as a whole, G1 uses the mark-compact algorithm; locally (between two Regions) it uses the copying algorithm. It produces no memory fragmentation, so a GC is not triggered early because a large object cannot find enough contiguous space. In this it is better than CMS.

Predictable pauses: besides pursuing low pauses, G1 builds a predictable pause time model that lets the user specify that, within a time slice of M milliseconds, no more than N milliseconds are spent on garbage collection. In this too it is better than CMS.

Why can pauses be predictable?

Because G1 can, in a planned way, avoid collecting the whole of the Java heap at once.

The G1 collector divides memory into independent areas of equal size (Regions). It keeps the concepts of new generation and old generation, but they are no longer physically separated.

G1 tracks how much each Region is worth collecting and maintains a priority list in the background.

Each time, within the allowed collection time, it reclaims the most valuable Regions first (the origin of the name Garbage-First).

This makes collection as efficient as possible within a limited time.
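
The garbage-first idea can be illustrated with a greedy selection over Regions. This is only a toy model: it assumes each Region comes with a known amount of garbage and a known collection cost, defines value as garbage per millisecond, and is not G1's actual prediction model. GarbageFirst, Region, and chooseCollectionSet are names invented for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GarbageFirst {
    static final class Region {
        final String name;
        final long garbageBytes; // space that collecting the Region would free
        final double costMillis; // estimated time needed to collect it

        Region(String name, long garbageBytes, double costMillis) {
            this.name = name;
            this.garbageBytes = garbageBytes;
            this.costMillis = costMillis;
        }

        double value() {
            return garbageBytes / costMillis;
        }
    }

    // Takes Regions in order of decreasing value while the pause budget allows.
    static List<String> chooseCollectionSet(List<Region> regions, double pauseBudgetMillis) {
        List<Region> byValue = new ArrayList<>(regions);
        byValue.sort((a, b) -> Double.compare(b.value(), a.value()));
        List<String> chosen = new ArrayList<>();
        double spent = 0;
        for (Region region : byValue) {
            if (spent + region.costMillis <= pauseBudgetMillis) {
                chosen.add(region.name);
                spent += region.costMillis;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region("A", 800, 4),
                new Region("B", 900, 10),
                new Region("C", 300, 1),
                new Region("D", 100, 5));
        System.out.println(chooseCollectionSet(regions, 6)); // [C, A]
    }
}
```

Region B holds the most garbage, but collecting it alone would exceed the 6 ms budget, so the cheaper, higher-value Regions C and A are chosen instead.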

What if an object is referenced by an object in another Region?

To judge whether an object is alive, must the whole Java heap be scanned to be sure? Other generational collectors face the same problem (it is just more pronounced in G1): when collecting the new generation, must the old generation be scanned too?

In G1 as in other generational collectors, the JVM uses Remembered Sets to avoid scanning the whole heap:

Each Region has a corresponding Remembered Set.

Each time a reference-type field is written, a Write Barrier briefly interrupts the write.

It then checks whether the reference being written points to an object in a different Region from the field being written (in other collectors: whether an old generation object references a new generation object).

If so, the reference information is recorded, through the CardTable, in the Remembered Set of the Region that contains the referenced object.

During garbage collection, adding the Remembered Sets to the enumeration range of the GC roots ensures that nothing is missed without a full-heap scan.
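
The write barrier and Remembered Set steps above can be modeled in a few lines. This is a simplified sketch: the barrier is an ordinary method call, there is no card table, stale entries are never removed, and RememberedSets, Obj, writeField, and incoming are hypothetical names.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RememberedSets {
    static final class Obj {
        final String name;
        final int region;
        Obj field; // a single reference-type field

        Obj(String name, int region) {
            this.name = name;
            this.region = region;
        }
    }

    // Region id -> names of objects in other Regions that point into it.
    private final Map<Integer, Set<String>> rememberedSets = new HashMap<>();

    // Stands in for a reference store guarded by a write barrier.
    void writeField(Obj holder, Obj target) {
        holder.field = target;
        if (target != null && holder.region != target.region) {
            // Cross-Region reference: record it with the target's Region.
            rememberedSets.computeIfAbsent(target.region, r -> new HashSet<>())
                    .add(holder.name);
        }
    }

    Set<String> incoming(int region) {
        return rememberedSets.getOrDefault(region, new HashSet<>());
    }

    public static void main(String[] args) {
        RememberedSets heap = new RememberedSets();
        Obj a = new Obj("a", 0);
        Obj b = new Obj("b", 1);
        Obj c = new Obj("c", 1);
        heap.writeField(a, b); // Region 0 -> Region 1: recorded
        heap.writeField(b, c); // inside Region 1: not recorded
        System.out.println("into Region 1: " + heap.incoming(1)); // [a]
        System.out.println("into Region 0: " + heap.incoming(0)); // []
    }
}
```

To collect Region 1 alone, the collector would treat the entries of its Remembered Set (here the object a) as additional roots instead of scanning Region 0.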

Leaving aside the maintenance of Remembered Sets, the collection process can be divided into four steps (similar to CMS):

1) Initial mark: marks only the objects that GC Roots reference directly and updates the value of TAMS (Next Top at Mark Start) so that user programs running concurrently in the next phase create new objects in the correct available Regions. It requires "Stop The World".

2) Concurrent mark: performs reachability analysis from GC Roots to find the surviving objects. It takes a long time but runs concurrently with user threads.

3) Final mark: corrects the marks of objects whose state changed because user threads kept running during concurrent marking. During concurrent marking the virtual machine records object changes in each thread's Remembered Set Logs, and the final mark phase merges the Remembered Set Logs into the Remembered Sets. It takes longer than the initial mark but far less time than the concurrent mark, and requires "Stop The World".

4) Screening and reclamation: first ranks the reclamation value and cost of each Region, then draws up a reclamation plan based on the GC pause time the user expects, and finally reclaims the garbage in the most valuable Regions according to the plan. Reclamation uses the copying algorithm: live objects are copied from one or more Regions into an empty Region on the heap, compacting and freeing memory along the way. This step can be carried out in parallel by multiple GC threads, which reduces pause time and increases throughput.
