How to analyze the architecture of JVM 07/01 Update SLTechnology News&Howtos


2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

In this article, the editor walks through how to analyze the JVM architecture and covers each knowledge point in detail. We hope it helps readers who want to understand how the JVM is put together; follow along to learn more about "how to analyze the JVM architecture".

Virtual machine

What is a virtual machine? A virtual machine is software that emulates and executes a particular instruction set architecture (ISA); it is an abstraction over the operating system and the hardware. The software model is shown in the following figure:

This abstraction of the computer system resembles programming to interfaces (or the dependency inversion principle) in object-oriented programming (OOP): a layer of abstraction captures what the underlying implementations have in common, and each underlying implementation fulfills that abstraction in its own way, so the abstraction isolates everything above it from the implementation details. The virtual machine specification defines the functions a virtual machine must provide (the interface), and each combination of operating system and hardware uses its own facilities to implement those functions (the implementation). Because Java programs run on a virtual machine, Java is highly portable across platforms.
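To make the analogy concrete, here is a minimal Java sketch. The names Platform, LinuxX64, WindowsX64 and PortableProgram are hypothetical and only illustrate the pattern: the interface plays the role of the specification, each class plays the role of a platform-specific implementation, and the program is written only against the interface.

```java
import java.util.List;

// Hypothetical "specification": what every implementation must provide.
interface Platform {
    String name();
    long add(long a, long b); // semantics fixed by the spec, not by the platform
}

// Two hypothetical "implementations" of the same specification.
final class LinuxX64 implements Platform {
    public String name() { return "linux-x64"; }
    public long add(long a, long b) { return a + b; }
}

final class WindowsX64 implements Platform {
    public String name() { return "windows-x64"; }
    public long add(long a, long b) { return Long.sum(a, b); }
}

final class PortableProgram {
    // The "program" depends only on the abstraction, never on a concrete platform.
    static long run(Platform p) {
        return p.add(40, 2);
    }

    public static void main(String[] args) {
        for (Platform p : List.of(new LinuxX64(), new WindowsX64())) {
            System.out.println(p.name() + " -> " + run(p)); // same result on every platform
        }
    }
}
```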

Java virtual machine

The Java virtual machine (JVM) is defined by the Java Virtual Machine Specification and executes the bytecode instruction set. Each bytecode instruction consists of a one-byte opcode followed by zero or more operands. The specification defines exactly what each instruction does and how many operands it takes. The JVM runs class files, which contain the bytecode instruction stream together with the class definition, so the specification also defines the class file format, down to the individual byte. The two essential elements of a JVM are therefore the bytecode instruction set and the class file format: an implementer can build a JVM by reading every bytecode instruction in a class file correctly and carrying out each instruction's effect as the specification requires.
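The class file format really is specified byte by byte. As a small illustration, the sketch below reads the first three fields every class file starts with: the u4 magic number 0xCAFEBABE, then the u2 minor_version and the u2 major_version, all stored big-endian. The class name ClassFileHeader and its methods are our own, and the sample bytes are hand-built rather than taken from a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class ClassFileHeader {
    // Every class file starts with this magic number.
    static final int MAGIC = 0xCAFEBABE;

    final int minor;
    final int major;

    private ClassFileHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    // Reads u4 magic, u2 minor_version and u2 major_version (big-endian).
    static ClassFileHeader read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();
        if (magic != MAGIC) {
            throw new IOException("not a class file, magic = 0x" + Integer.toHexString(magic));
        }
        int minor = data.readUnsignedShort();
        int major = data.readUnsignedShort();
        return new ClassFileHeader(minor, major);
    }

    // Convenience wrapper for bytes already in memory.
    static ClassFileHeader parse(byte[] bytes) {
        try {
            return read(new ByteArrayInputStream(bytes));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hand-built header: magic, minor 0, major 52 (0x34), the version of Java 8 class files.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        ClassFileHeader h = parse(header);
        System.out.println("major=" + h.major + ", minor=" + h.minor); // prints major=52, minor=0
    }
}
```

To inspect a real file, pass a java.io.FileInputStream opened on a .class file to read(); class files compiled for Java 8, for example, report major version 52.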

The commonly used commercial JVMs are Sun HotSpot, BEA JRockit and IBM J9. Since Oracle acquired both BEA and Sun, it owns two of the world's most popular JVMs, and at the time of writing there were rumors that Oracle would merge the two in Java 8, combining the strengths of each into a more sophisticated JVM. HotSpot executes code with a mix of interpretation and just-in-time (JIT) compilation: while interpreting bytecode, it detects hot code, compiles that code into native code, and from then on runs the native code directly instead of interpreting it, which substantially improves performance. JRockit targets server applications, so it does not prioritize startup speed; it JIT-compiles all code into native code before executing it, and its garbage collector is highly efficient. J9 is positioned similarly to HotSpot, targeting both desktop and server applications, and is used mainly in IBM's own Java products.

Java language and Java virtual machine

We know that Java source code (a .java file) is compiled by javac into a .class file. Class files run on the JVM, which executes the bytecode instructions they contain through a bytecode interpreter or a just-in-time (JIT) compiler. The JVM itself runs on the operating system, which in turn relies on the hardware's instruction set to run all kinds of software.

So Java runs on top of the JVM, but the Java language and the JVM are not inherently tied together. With a suitable compiler, Java could target any platform; it can even be compiled to native code that runs directly on the operating system. For example, GCJ (the GNU Compiler for Java) on Linux compiles Java into native code for direct execution. Likewise, Java is not the only language that runs on the JVM: any language with a compiler that emits JVM bytecode can run there. JRuby, Jython, Groovy and other JVM languages are turned into .class files by their own compilers or interpreters and then run on the JVM. The JVM does not care whether a .class file came from Java, JRuby or Jython, as long as the file is well-formed and passes class file verification. Because the .class format hides the differences between source languages, code written in Java, Groovy, JRuby and so on can call each other.

JVM lifecycle

When a Java program starts, a virtual machine instance is created, and when the program exits, that instance dies. A JVM instance runs a Java program starting from a main() method, which must be public and static, return void, and take a single array of strings as its parameter. The main() method of the program's initial class runs on the program's initial thread, and other application threads are started from there.

There are two kinds of threads inside the JVM: daemon and non-daemon. Daemon threads are typically used by the virtual machine itself, for example for garbage collection, although an application can also mark its own threads as daemons. When all non-daemon threads of the program have terminated, the JVM instance exits automatically.
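A short sketch of the daemon rule, using only the standard java.lang.Thread API (the class and thread names are illustrative): the worker thread loops forever, but because it is marked as a daemon, the JVM still exits as soon as main returns.

```java
final class DaemonDemo {
    // Starts a background thread that would otherwise run forever.
    static Thread startBackgroundWorker() {
        Thread worker = new Thread(() -> {
            while (true) {
                try {
                    Thread.sleep(1_000);
                } catch (InterruptedException e) {
                    return; // stop if interrupted
                }
            }
        }, "background-worker");
        worker.setDaemon(true); // must be called before start()
        worker.start();
        return worker;
    }

    public static void main(String[] args) {
        // main() runs on the initial, non-daemon thread.
        System.out.println("running on: " + Thread.currentThread().getName());
        Thread worker = startBackgroundWorker();
        System.out.println("worker is daemon: " + worker.isDaemon());
        // When main returns, no non-daemon threads remain, so the JVM exits
        // even though the daemon worker is still looping.
    }
}
```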

Java virtual machine architecture

The JVM consists of a class loader subsystem, runtime data areas, an execution engine, and a native method interface.

Class loader subsystem

The class loader subsystem locates the binary representation of a class, then parses and loads it into the virtual machine, where it becomes the VM's internal type-information data structures. The class loader subsystem is also responsible for security, and it is the basis for the JVM's dynamic linking and dynamic loading. Turning binary data into type-information data structures takes several steps. First, the class loader is the first line of defense of the JVM security sandbox, keeping untrusted classes from damaging the virtual machine: every class file must pass four rounds of verification before it can be used. After verification, class loader namespaces and runtime packages prevent an untrusted class from masquerading as a trusted one. Once the class loader has built the class's data structures in the method area, it creates a Class object on the heap as the interface for accessing them. Finally, loading a class also involves initializing its static data, that is, running the class's <clinit> method. Together, these steps make up loading, linking, and initializing a class.
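Loading and initialization are separate steps, and the difference can be observed from Java code. In the sketch below (class names hypothetical), Class.forName(name, false, loader) loads the nested class Lazy without initializing it; the first call to one of its static methods then runs its static initializer, which javac compiles into <clinit>.

```java
final class ClassInitDemo {
    // Set by Lazy's static initializer so we can observe when initialization happens.
    static boolean lazyInitialized = false;

    static final class Lazy {
        static {
            // javac compiles static initializers into the class's <clinit> method.
            lazyInitialized = true;
        }

        static int value() {
            return 42;
        }
    }

    // Loads Lazy without running its static initializer.
    static Class<?> loadWithoutInit() throws ClassNotFoundException {
        return Class.forName(Lazy.class.getName(), false, ClassInitDemo.class.getClassLoader());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> c = loadWithoutInit();
        System.out.println("loaded " + c.getName() + ", initialized: " + lazyInitialized); // false
        int v = Lazy.value(); // first static method call triggers initialization
        System.out.println("value " + v + ", initialized: " + lazyInitialized);            // true
    }
}
```

Note that a class literal such as Lazy.class also does not trigger initialization; only an "active use" such as calling a static method, reading a non-constant static field, or creating an instance does.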

Runtime data area

The runtime data areas describe how the JVM organizes memory at run time. They are logically divided into several areas, and each area's lifecycle depends on whether it is shared between threads. They are:

Heap

Stores object and array instances, that is, the objects created with new at run time. The heap lives as long as the JVM and is shared by all threads. Because threads allocate in it concurrently, allocation must be made thread-safe, which can be done in two ways. The first is locking, so that allocations are mutually exclusive. The second is the Thread Local Allocation Buffer (TLAB): each thread is given a private region of the heap in which it allocates without contending with other threads. Only the allocation is private; objects allocated in a TLAB are ordinary heap objects that other threads can access once they hold a reference. The JVM specification says that the heap throws an OutOfMemoryError when a memory request cannot be satisfied.
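The TLAB idea can be sketched as bump-pointer allocation: a thread takes a chunk of the shared heap with one synchronized (here atomic) operation, then hands out addresses from that chunk by simply advancing a private pointer. This is a toy model under stated assumptions: "addresses" are plain long offsets into an imaginary heap, the sizes are made up, and leftover space in a retired buffer is simply wasted. It is not HotSpot's actual code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of TLAB allocation; all sizes are hypothetical.
final class TlabSketch {
    static final long HEAP_SIZE = 1L << 20;  // 1 MiB imaginary shared heap
    static final long TLAB_SIZE = 4L * 1024; // 4 KiB per thread-local buffer

    // Shared top of the heap: handing out a new buffer needs synchronization (an atomic add here).
    private static final AtomicLong heapTop = new AtomicLong();

    // A thread's private buffer, covering [top, end).
    private static final class Tlab {
        long top;
        long end;
    }

    private static final ThreadLocal<Tlab> TLAB = ThreadLocal.withInitial(Tlab::new);

    // Returns the start offset of a newly allocated object of the given size.
    static long allocate(long size) {
        Tlab t = TLAB.get();
        if (t.top + size > t.end) {
            refill(t, Math.max(size, TLAB_SIZE)); // slow path; the rest of the old buffer is wasted
        }
        long address = t.top; // fast path: no lock, only this thread touches t
        t.top += size;
        return address;
    }

    // Slow path: carve a fresh buffer out of the shared heap.
    private static void refill(Tlab t, long size) {
        long start = heapTop.getAndAdd(size);
        if (start + size > HEAP_SIZE) {
            throw new OutOfMemoryError("imaginary heap exhausted");
        }
        t.top = start;
        t.end = start + size;
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            long first = allocate(16);
            long second = allocate(32);
            System.out.println(Thread.currentThread().getName() + ": " + first + ", " + second);
        };
        Thread t1 = new Thread(work, "t1");
        Thread t2 = new Thread(work, "t2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

In HotSpot, TLAB allocation is enabled by default and can be switched with the -XX:+UseTLAB and -XX:-UseTLAB options.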

Method area

Stores type information and the runtime constant pool (Runtime Constant Pool). For each class loaded by a class loader, the method area holds a data structure describing that class, including the class name, the direct superclass, the list of implemented interfaces, the field list, the method list, and so on. The runtime constant pool is the run-time form of the constant pool table (Constant Pool List) in the class file; it holds constants of the primitive types and of type String, as well as symbolic references to other classes, methods, and fields. The method area lives as long as the JVM and is shared by all threads, so concurrent access must be handled safely. The JVM specification says that the method area throws an OutOfMemoryError when a memory request cannot be satisfied.
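The Class object created on the heap is the Java-level handle to this type information. The sketch below uses standard reflection (getName, getSuperclass, getInterfaces, getDeclaredFields, getDeclaredMethods) to print the items listed above for a loaded class. Reflection is a view of that data, not the method area's internal layout, and the class name TypeInfoDemo is our own.

```java
import java.util.Arrays;

final class TypeInfoDemo {
    // Summarizes the type information the JVM keeps for a loaded class.
    static String describe(Class<?> c) {
        Class<?> superclass = c.getSuperclass(); // null for Object, interfaces and primitives
        return "class: " + c.getName() + '\n'
                + "direct superclass: " + (superclass == null ? "none" : superclass.getName()) + '\n'
                + "interfaces: " + Arrays.toString(c.getInterfaces()) + '\n'
                + "declared fields: " + c.getDeclaredFields().length + '\n'
                + "declared methods: " + c.getDeclaredMethods().length;
    }

    public static void main(String[] args) {
        System.out.println(describe(java.util.ArrayList.class));
    }
}
```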

PC (Program Counter)

Private to each thread, with the same lifecycle as the thread; it models the program counter of a CPU. If the thread is executing a Java method, its PC holds the address of the bytecode instruction currently being executed; if the thread is executing a native method, the PC's value is undefined. As Java methods are called and return, the PC is updated to track the instruction being executed in the current method (Current Method). The PC register is the only runtime data area for which the JVM specification lists no error conditions.

JVM stack

Private to each thread, with the same lifecycle as the thread; it models the call stack of traditional languages such as C. The JVM stack holds stack frames (Frame), which are used for method calls and returns, local variables, and intermediate results of computations. The JVM specification says the stack can throw two kinds of errors: (1) StackOverflowError, thrown when a thread needs a deeper stack than is permitted; (2) OutOfMemoryError, thrown when there is not enough memory to grow the stack for a new frame or to create the stack for a new thread.
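The StackOverflowError case is easy to reproduce. This sketch (class name hypothetical) recurses until the thread's JVM stack runs out of room and reports how many frames fit. The number varies with the stack size (set with the -Xss option), the JVM, and JIT decisions, so no particular value should be expected.

```java
final class StackDepthDemo {
    private static int depth = 0;

    // Each call pushes a new frame onto the current thread's JVM stack.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns how many nested calls succeeded before StackOverflowError was thrown.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames have been unwound by the time we get here, so it is safe to continue.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("max depth: " + measureDepth()); // varies between runs and settings
    }
}
```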

Each stack frame on the JVM stack corresponds to one method invocation. At any moment a thread executes exactly one method (the Current Method), whose frame sits at the top of the JVM stack (the current frame, Current Frame). When a method is invoked, a new frame is created and pushed onto the JVM stack; when the method returns or throws an exception, its frame is popped. Each frame holds a local variable table (Local Variable Table), an operand stack (Operand Stack), and other frame information. Their sizes are fixed at compile time: the compiler records the number of local variable slots and the maximum operand stack depth (max_locals and max_stack) in the Code attribute of the method's method_info structure in the class file. The local variable table works like an array holding the method's parameters and local variables. Because the JVM uses a stack-based instruction set architecture rather than a register-based one, all computation happens on the operand stack (for example, arithmetic, method calls, memory access, and so on).
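To show what "all computation happens on the operand stack" means, here is a toy stack machine in Java. Its opcode names mirror real JVM instructions (iload, iadd, imul, ireturn), but the numeric opcode values and the encoding are made up, and there is only a single frame. For comparison, javac compiles static int add(int a, int b) { return a + b; } to iload_0, iload_1, iadd, ireturn, which you can see with javap -c.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stack machine in the spirit of JVM bytecode; opcode values and encoding are made up.
final class ToyStackMachine {
    static final int ILOAD = 0;   // operand: index into the local variable table
    static final int IADD = 1;
    static final int IMUL = 2;
    static final int IRETURN = 3;

    // Runs one "method": code is the instruction stream, locals is the local variable table.
    static int execute(int[] code, int[] locals) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        int pc = 0; // index of the instruction being executed
        while (true) {
            int opcode = code[pc];
            switch (opcode) {
                case ILOAD:
                    operandStack.push(locals[code[pc + 1]]); // copy a local onto the operand stack
                    pc += 2;                                // opcode plus one operand
                    break;
                case IADD: {
                    int right = operandStack.pop();
                    int left = operandStack.pop();
                    operandStack.push(left + right);
                    pc += 1;
                    break;
                }
                case IMUL: {
                    int right = operandStack.pop();
                    int left = operandStack.pop();
                    operandStack.push(left * right);
                    pc += 1;
                    break;
                }
                case IRETURN:
                    return operandStack.pop(); // the result is the value on top of the stack
                default:
                    throw new IllegalStateException("unknown opcode " + opcode + " at pc " + pc);
            }
        }
    }

    public static void main(String[] args) {
        // (a + b) * c, with a = 2, b = 3, c = 4 in local variable slots 0, 1 and 2
        int[] code = {ILOAD, 0, ILOAD, 1, IADD, ILOAD, 2, IMUL, IRETURN};
        System.out.println(execute(code, new int[] {2, 3, 4})); // prints 20
    }
}
```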

Native method stack

Supports calls to native methods; it can throw the same errors as the JVM stack.

Execution engine

The execution engine executes JVM bytecode instructions. It is mainly implemented in one of two ways:

(1) translating the bytecode, at load time or during execution, into the instructions of another virtual machine;

(2) translating the bytecode, at load time or during execution, into the native instruction set of the host CPU. These two approaches correspond to interpretation and just-in-time compilation, respectively. For example, the execution engine in HotSpot VM is tiered, combining an interpreter with compilers:

(1) Interpretation: the interpreter executes bytecode and, per method, collects information (such as invocation counts) to detect "hot" code, which is then handed to the C1 compiler.

(2) C1 compilation: hot code is compiled into native code with relatively simple, fast optimizations. Runtime profiling continues, and code that keeps running frequently is recompiled by C2.

(3) C2 compilation: the C2 compiler applies aggressive, speculative optimizations. If an assumption behind one of these optimizations turns out to be wrong, the JVM deoptimizes and falls back to interpreting the bytecode.
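The tiered flow can be modeled as a small state machine driven by an invocation counter. The thresholds, tier names, and reset-on-deoptimization behavior below are hypothetical simplifications; real HotSpot also counts loop back-edges and uses profile data. On a real JVM, the -XX:+PrintCompilation option prints methods as they are compiled.

```java
// Toy model of tiered execution for one method; thresholds are hypothetical.
final class TieredPolicySketch {
    enum Tier { INTERPRETED, C1, C2 }

    static final int C1_THRESHOLD = 200;
    static final int C2_THRESHOLD = 5_000;

    private int invocations = 0;
    private Tier tier = Tier.INTERPRETED;

    // Called on every invocation of the method; returns the tier used for this call.
    Tier onInvoke() {
        invocations++;
        if (tier == Tier.INTERPRETED && invocations >= C1_THRESHOLD) {
            tier = Tier.C1; // hot: compile with quick, simple optimizations
        } else if (tier == Tier.C1 && invocations >= C2_THRESHOLD) {
            tier = Tier.C2; // very hot: recompile with aggressive optimizations
        }
        return tier;
    }

    // An optimistic assumption failed: fall back to the interpreter and restart profiling (toy behavior).
    void deoptimize() {
        tier = Tier.INTERPRETED;
        invocations = 0;
    }

    Tier tier() {
        return tier;
    }

    public static void main(String[] args) {
        TieredPolicySketch method = new TieredPolicySketch();
        Tier last = Tier.INTERPRETED;
        for (int call = 1; call <= 6_000; call++) {
            Tier now = method.onInvoke();
            if (now != last) {
                System.out.println("call " + call + ": " + last + " -> " + now);
                last = now;
            }
        }
    }
}
```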

Automatic memory management

Automatic memory management handles the allocation and release of memory in the runtime data areas. Unlike C and C++, Java does not require programmers to manage memory manually (an object created with new never needs an explicit delete), so the JVM takes on this job. The core of memory management is that when a memory request (creating an object, loading and initializing a class, creating the stack for a new thread, and so on) cannot be satisfied, the JVM automatically reclaims the memory occupied by objects that are no longer alive; this is the familiar garbage collection. During collection, the JVM also has to deal with fragmentation of free space in order to improve memory utilization. Collection has two key parts: identifying live objects and the algorithm used to reclaim memory.

There are two main ways to identify live objects: reference counting and root search (reachability analysis).

(1) Reference counting is a very common technique, used for example by scripting languages such as Python (CPython). Each object keeps a counter of how many references point to it; during garbage collection, objects whose count is 0 are "dead" and can be reclaimed. One drawback of reference counting is that it cannot handle circular references (A -> B, B -> A).

(2) Root search is the algorithm HotSpot uses to identify live objects. The search starts from a set of roots, namely the references held in the JVM stacks (local variables) and in the method area (such as static fields), and traverses everything reachable from them. Objects reached by the traversal are alive; objects that are never reached are garbage. Because liveness depends only on reachability from the roots, a cycle of objects that cannot be reached from any root is still collected.
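The difference between the two strategies shows up clearly on a cycle. The toy object graph below (all names hypothetical, not JVM internals) applies reference counting and root search to the same two objects a and b, which still point to each other after all outside references are gone.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Toy object graph used to compare the two liveness strategies.
final class LivenessSketch {
    static final class Obj {
        final String name;
        final List<Obj> fields = new ArrayList<>(); // outgoing references
        int refCount = 0;                            // used only by reference counting
        Obj(String name) { this.name = name; }
    }

    // (1) Reference counting
    static void addRef(Obj o) { o.refCount++; }

    static void link(Obj from, Obj to) {
        from.fields.add(to);
        addRef(to);
    }

    // Drops one reference; an object whose count reaches 0 is freed and releases its own fields.
    static void release(Obj o) {
        if (--o.refCount == 0) {
            for (Obj child : o.fields) release(child);
            o.fields.clear();
        }
    }

    // (2) Root search: everything reachable from the roots is alive.
    static Set<Obj> mark(Collection<Obj> roots) {
        Set<Obj> marked = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (marked.add(o)) {
                work.addAll(o.fields);
            }
        }
        return marked;
    }

    public static void main(String[] args) {
        Obj a = new Obj("a");
        Obj b = new Obj("b");
        addRef(a);   // a local variable refers to a
        addRef(b);   // a local variable refers to b
        link(a, b);  // a -> b
        link(b, a);  // b -> a: a cycle
        release(a);  // the local variables go away...
        release(b);
        // ...but each object is still referenced by the other, so neither count reaches 0.
        System.out.println("refCounts: a=" + a.refCount + ", b=" + b.refCount); // prints refCounts: a=1, b=1

        // With root search, no root refers to a or b, so the cycle is unreachable.
        Obj root = new Obj("root");
        Set<Obj> live = mark(List.of(root));
        System.out.println("a live? " + live.contains(a) + ", b live? " + live.contains(b)); // prints a live? false, b live? false
    }
}
```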

The main algorithms for reclaiming memory are:

(1) Copying: memory is split into two halves, and only one half is used at a time. During collection, all live objects are copied one after another into the other half (which avoids fragmentation), and from then on only that half is used. Copying back and forth between the two halves costs copying time and space (only half the memory is usable at a time), but it solves fragmentation very well. It suits workloads where objects are created frequently and die young.

(2) Mark-sweep: live objects are marked first, and the memory of "dead" objects is then freed in place, which leaves a large amount of memory fragmentation.

(3) Mark-compact: the marking phase is the same as in mark-sweep. After the memory of "dead" objects is freed, the live objects are moved so that they lie contiguously in memory, which avoids fragmentation. In contrast to copying, mark-compact suits workloads where objects are created infrequently and live long.

(4) Generational collection: memory is divided into regions according to object lifetime, and each region uses a different collection algorithm. Most commercial virtual machines currently use this approach. For example, HotSpot divides memory into the young (New), old (Old), and permanent (Perm) generations (Java 8 later replaced the permanent generation with Metaspace). The young generation uses the copying algorithm, and the old and permanent generations use mark-compact. Objects are first allocated in the young generation; when it cannot satisfy an allocation, a young-generation collection (Young GC, or Minor GC) is triggered. A Young GC moves some young objects to the old generation, either because the young generation cannot hold all surviving objects or because an object is old enough to be promoted (each object's header records its age, the number of collections it has survived). When objects promoted into the old generation no longer fit there, a collection of both the young and old generations (Full GC, or Major GC) is triggered.
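The generational policy in (4) can be sketched as follows: allocation goes to a bounded young generation; a full young generation triggers a minor GC that copies only live objects; survivors age by one each time; and objects are promoted to the old generation when they reach a tenuring threshold or when the survivor space is already full. The capacities, the threshold, and the explicit live flag are all hypothetical; a real collector determines liveness by marking from roots.

```java
import java.util.ArrayList;
import java.util.List;

// Toy generational heap; capacities, threshold and the "live" flag are hypothetical.
final class GenerationalSketch {
    static final class Obj {
        final String name;
        int age = 0;         // number of minor GCs survived
        boolean live = true; // stands in for "reachable from roots"
        Obj(String name) { this.name = name; }
    }

    static final int TENURING_THRESHOLD = 2;

    final int youngCapacity;
    final int survivorCapacity; // must be smaller than youngCapacity
    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();
    int minorGcs = 0;

    GenerationalSketch(int youngCapacity, int survivorCapacity) {
        if (survivorCapacity >= youngCapacity) {
            throw new IllegalArgumentException("survivor space must be smaller than the young generation");
        }
        this.youngCapacity = youngCapacity;
        this.survivorCapacity = survivorCapacity;
    }

    // New objects go to the young generation; if it is full, run a minor GC first.
    Obj allocate(String name) {
        if (young.size() >= youngCapacity) {
            minorGc();
        }
        Obj o = new Obj(name);
        young.add(o);
        return o;
    }

    // Copying step: only live objects are carried over; dead ones are dropped without extra work.
    void minorGc() {
        minorGcs++;
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!o.live) {
                continue;
            }
            o.age++;
            if (o.age >= TENURING_THRESHOLD || survivors.size() >= survivorCapacity) {
                old.add(o); // promoted: old enough, or no room left in the survivor space
            } else {
                survivors.add(o);
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    public static void main(String[] args) {
        GenerationalSketch heap = new GenerationalSketch(4, 2);
        Obj a = heap.allocate("a"); // long-lived object
        for (String n : new String[] {"b", "c", "d"}) {
            heap.allocate(n).live = false; // these die young
        }
        heap.allocate("e").live = false; // young gen was full: minor GC #1, a survives with age 1
        heap.allocate("f").live = false;
        heap.allocate("g").live = false;
        heap.allocate("h");              // full again: minor GC #2, a reaches age 2 and is promoted
        System.out.println("minor GCs: " + heap.minorGcs + ", promoted: " + heap.old.get(0).name
                + " (age " + a.age + ")"); // prints minor GCs: 2, promoted: a (age 2)
    }
}
```

HotSpot exposes the real tenuring threshold through the -XX:MaxTenuringThreshold option.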

The JVM runs several background threads for automatic memory management. To the CPU these are no different from user threads, and they consume system resources in the same way. Garbage collection requires "Stop the World" pauses, during which all user threads are suspended, so JVM garbage collection can be a weakness for systems with strict real-time requirements. To address this, Sun's JDK 1.5 provides the CMS (Concurrent Mark Sweep) collector, which runs much of its work concurrently with user threads to shorten GC pauses and improve the JVM's responsiveness. GC tuning is a key part of running JVM applications; the main goals are to reduce both the number of collections and the duration of each one. This will be covered in detail in a later article on JVM memory management.
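To measure the two tuning targets mentioned above, the number of collections and the time spent in them, the standard java.lang.management API exposes one GarbageCollectorMXBean per collector. Below is a minimal sketch (the class name GcStats is our own). Collector names depend on which GC the JVM is using, and getCollectionCount() or getCollectionTime() may return -1 if the value is undefined.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class GcStats {
    // One line per collector: cumulative collection count and time in milliseconds.
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
        return lines;
    }

    public static void main(String[] args) {
        // Allocate short-lived garbage; this may trigger young-generation collections
        // (the JIT may also optimize some of these allocations away).
        long sink = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sink += new byte[128].length;
        }
        System.out.println("allocated bytes: " + sink);
        snapshot().forEach(System.out::println);
    }
}
```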

How the JVM executes a program

Running "java Main" on the command line starts a JVM instance, whose state can be observed with JVM tools such as jps and jstat. Let's use running com.ntes.money.Main as an example to walk through how the JVM executes a program.

When the command "java -Xmx12m -Xms12m -Dname=value com.ntes.money.Main" is executed on the command line, the JVM goes through the following steps:

1) Load the JVM, mainly by loading its dynamic link library: jvm.dll on Windows and libjvm.so on Linux.

2) Set the JVM startup parameters; for example, -Xmx12m -Xms12m in the command above set the maximum and initial heap size to 12 MB.

3) Initialize the JVM.

4) Invoke the class loader subsystem to load com.ntes.money.Main. Because this is an application class, it is loaded by the system's default class loader (the application, or classpath, class loader) following the parent delegation model. The fully qualified class name is first converted into the file path com/ntes/money/Main.class; the binary data in Main.class is then read, parsed, and loaded, producing the data structures for the Main class in the method area. Loading can fail here for two reasons. First, the file com/ntes/money/Main.class does not exist, in which case a ClassNotFoundException is thrown. Second, the file com/ntes/money/Main.class exists but does not contain the Main class (it defines another class, such as Main1 or Main2); in this case a NoClassDefFoundError is thrown, so the class still cannot be loaded.

5) In the method-area data structure for com.ntes.money.Main, look up the main method by its name, descriptor, and access flags, that is, public static void main(String[]), whose descriptor is ([Ljava/lang/String;)V. If no matching main method is found, older JVMs throw NoSuchMethodError: main, while newer launchers print an error saying that the main method was not found.

6) Execute the main method through the native method interface (JNI).
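Steps 4 to 6 can be imitated at the Java level with reflection. This is a rough sketch, not how the real launcher is written (the launcher does this in native code through JNI). The class MiniLauncher is hypothetical, and it loads classes through its own class loader, which for classes on the classpath is the application class loader.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

// Rough Java-level imitation of steps 4 to 6.
final class MiniLauncher {
    // Step 4: load the class (the application class loader delegates to its parents first).
    // Step 5: look up public static void main(String[]).
    static Method findMain(String className) throws ClassNotFoundException, NoSuchMethodException {
        ClassLoader loader = MiniLauncher.class.getClassLoader();
        Class<?> mainClass = Class.forName(className, false, loader); // ClassNotFoundException if missing
        Method main = mainClass.getMethod("main", String[].class);    // public methods only
        if (!Modifier.isStatic(main.getModifiers()) || main.getReturnType() != void.class) {
            throw new NoSuchMethodException("main must be static and return void in " + className);
        }
        return main;
    }

    // Step 6: invoke main. It is static, so the receiver is null; the cast passes
    // the String[] as one argument instead of spreading it as varargs.
    static void launch(String className, String[] args) throws Exception {
        findMain(className).invoke(null, (Object) args);
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: MiniLauncher <fully.qualified.MainClass> [args...]");
            return;
        }
        launch(args[0], Arrays.copyOfRange(args, 1, args.length));
    }
}
```

For example, java MiniLauncher com.ntes.money.Main arg1 would load com.ntes.money.Main and run its main method with arg1, assuming that class is on the classpath.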

Thank you for reading. That is all for "How to analyze the JVM architecture". Now that you have the basics, try putting them into practice.

