How to deeply analyze JVM

2025-07-08 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/03 Report--

Many people who are new to the JVM are at a loss when it comes to analyzing it in depth. This article therefore walks through the key concepts; I hope it helps you work through the problem.

First of all, two concepts are clarified here: the JVM instance and the JVM execution engine instance. The JVM instance corresponds to an independent Java program, while the JVM execution engine instance corresponds to the thread that belongs to the user's running program; that is, the JVM instance is at the process level and the execution engine is at the thread level.

What is JVM? - Life cycle of JVM

The birth of a JVM instance: when you start a Java program, a JVM instance is created. Any class with a public static void main(String[] args) method can serve as the starting point of a JVM instance. So how does the JVM know whether to start from the main of classA or the main of classB? You have to tell it the class name explicitly, which is why we run commands such as java classA hello world. Here java tells the operating system to run the Java virtual machine of the Sun Java 2 SDK, and classA names the class the JVM should start with.

The run of the JVM instance: main() is the starting point of the program's initial thread, and all other threads are started from that thread. There are two kinds of threads inside the JVM: daemon threads and non-daemon threads. The thread that runs main() is a non-daemon thread. Daemon threads are usually used by the JVM itself, although a Java program can also mark the threads it creates as daemon threads.

The demise of the JVM instance: the JVM exits when all non-daemon threads in the program have terminated. If the security manager allows it, the program can also exit by calling the exit() method of the Runtime class or System.exit().
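The daemon and non-daemon distinction can be tried out with a few lines of Java. This is only a minimal sketch: DaemonDemo and newWorker are hypothetical names made up for the illustration, and the 100 ms sleep merely stands in for real work.

```java
final class DaemonDemo {
    /** Creates (but does not start) a worker thread with the given daemon flag. */
    static Thread newWorker(boolean daemon) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(100); // stand-in for real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(daemon); // must be set before start()
        return t;
    }

    public static void main(String[] args) {
        System.out.println("main thread is daemon: " + Thread.currentThread().isDaemon());
        newWorker(true).start();
        // main returns here; the JVM does not wait for the daemon worker.
        // With newWorker(false) the JVM would stay alive until the worker ends.
    }
}
```

Switching the argument of newWorker in main() from true to false is a simple way to see the JVM wait for a non-daemon thread before exiting.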

What is JVM? - Architecture of JVM

Roughly speaking, the internal architecture of the JVM is divided into three parts: the class loader (ClassLoader) subsystem, the runtime data area, and the execution engine. The following introduces the class loader first, then the execution engine, and finally the runtime data area.

1. The class loader, as its name implies, loads .class files. The JVM has two kinds of class loaders: the bootstrap class loader and user-defined class loaders. The bootstrap class loader is part of the JVM implementation; user-defined class loaders are part of the Java program and must be subclasses of the ClassLoader class. (The situation described below is for Sun JDK 1.2.)

Bootstrap class loader: looks for classes only in the installation path of the system classes (the class files of the Java API).

User-defined class loaders:

System class loader: created at JVM startup; looks for classes in the directories listed in CLASSPATH.

Other user-defined class loaders: several methods of the ClassLoader class need to be introduced first, because they are critical to understanding how a custom class loader loads .class files.

protected final Class defineClass(String name, byte data[], int offset, int length)
protected final Class defineClass(String name, byte data[], int offset, int length, ProtectionDomain protectionDomain)
protected final Class findSystemClass(String name)
protected final void resolveClass(Class c)

defineClass imports the binary data of a class file (a new type) into the method area. This is how a user-defined class loader takes responsibility for loading a class itself.

findSystemClass takes the fully qualified name of a type, loads it through the system class loader or the bootstrap class loader, and returns the Class object.

resolveClass makes the class loader link the class, which covers verification, preparation (allocating memory and setting default initial values), and resolving the symbolic references in the type into direct references.

This brings in the topic of Java namespaces. When a class loaded by one class loader refers to other classes, the JVM asks that same class loader to load them. Classes in the same namespace can access each other, but classes loaded by different class loaders cannot see each other unless the loaders cooperate (for example, by delegating to a common parent). The result is effective isolation.
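As a rough illustration of defineClass, resolveClass and namespaces, the sketch below defines the same class bytes in two separate loaders and checks that the results are distinct types. BytesLoader, NamespaceDemo and the class name Generated are hypothetical names made up for this example. The class file is assembled by hand (an empty public class, class file version 52) only so that the example needs no external file; a real loader would normally read the bytes from disk or the network.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class BytesLoader extends ClassLoader {
    /** Defines a class in this loader's namespace directly from bytes. */
    Class<?> define(String name, byte[] data) {
        Class<?> c = defineClass(name, data, 0, data.length); // bytes -> new type
        resolveClass(c);                                      // link the type
        return c;
    }
}

final class NamespaceDemo {
    /** Hand-assembles the class file of an empty public class (assumed version 52). */
    static byte[] emptyClassBytes(String name) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf); // writes big-endian
            out.writeInt(0xCAFEBABE);                           // magic number
            out.writeShort(0);                                  // minor version
            out.writeShort(52);                                 // major version
            out.writeShort(5);                                  // constant pool count = entries + 1
            out.writeByte(1); out.writeUTF(name);               // #1 Utf8: this class name
            out.writeByte(7); out.writeShort(1);                // #2 Class -> #1
            out.writeByte(1); out.writeUTF("java/lang/Object"); // #3 Utf8: superclass name
            out.writeByte(7); out.writeShort(3);                // #4 Class -> #3
            out.writeShort(0x0021);                             // ACC_PUBLIC | ACC_SUPER
            out.writeShort(2);                                  // this_class
            out.writeShort(4);                                  // super_class
            out.writeShort(0);                                  // interfaces count
            out.writeShort(0);                                  // fields count
            out.writeShort(0);                                  // methods count
            out.writeShort(0);                                  // attributes count
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True when two loaders produce same-named but distinct types. */
    static boolean sameNameDistinctTypes() {
        byte[] data = emptyClassBytes("Generated");
        Class<?> a = new BytesLoader().define("Generated", data);
        Class<?> b = new BytesLoader().define("Generated", data);
        return a.getName().equals(b.getName()) && a != b;
    }
}
```

Each BytesLoader instance forms its own namespace, so the two Class objects share a name but are different types as far as the JVM is concerned.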

2. Execution engine: at any moment it is executing either bytecode or a native method.

To talk about the execution engine, we have to start with its instruction set. Each instruction consists of a single-byte opcode followed by zero or more operands.
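To make the opcode-plus-operands format concrete, here is a toy interpreter for a handful of integer instructions. It is a sketch for illustration only, not how any real JVM is implemented: TinyStackMachine and run are hypothetical names, and only five opcodes are supported. The opcode values themselves are the ones defined by the JVM specification.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class TinyStackMachine {
    // Opcode values as defined by the JVM specification
    static final int ICONST_2 = 0x05; // push the int constant 2
    static final int BIPUSH   = 0x10; // push the following operand byte as an int
    static final int IADD     = 0x60; // pop two ints, push their sum
    static final int IMUL     = 0x68; // pop two ints, push their product
    static final int IRETURN  = 0xAC; // return the int on top of the stack

    /** Interprets a tiny subset of bytecode using an operand stack. */
    static int run(byte[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0; // index of the next byte to decode
        while (true) {
            int opcode = code[pc++] & 0xFF; // every instruction starts with a one-byte opcode
            switch (opcode) {
                case ICONST_2: stack.push(2); break;                         // no operands
                case BIPUSH:   stack.push((int) code[pc++]); break;          // one operand byte
                case IADD:     stack.push(stack.pop() + stack.pop()); break;
                case IMUL:     stack.push(stack.pop() * stack.pop()); break;
                case IRETURN:  return stack.pop();
                default: throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
    }
}
```

For example, TinyStackMachine.run(new byte[] {0x10, 40, 0x05, 0x60, (byte) 0xAC}) executes bipush 40, iconst_2, iadd, ireturn and yields 42. Note that no instruction names a register; all intermediate values live on the operand stack.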

(1) The instruction set is designed around a stack rather than around registers. Here is how that design meets the requirements of the Java architecture:

Platform independence: a stack-centered design makes it easier to implement Java on machines with few registers. Compilers generally use a stack to pass intermediate compilation results to a link-time optimizer, so a stack-based instruction set fits well with an execution engine that performs run-time optimization through just-in-time compilation or adaptive optimization. Put simply, compilation and execution share one data structure, which makes optimization easier.

Network mobility: a stack-centered instruction set keeps class files compact.

Security: most opcodes in the instruction set indicate the type of data they operate on. This lets data-flow analysis verify the bytecode once, at load time, rather than every time an instruction executes, which also helps execution speed.

(2) Implementation techniques

The main execution techniques are interpretation, just-in-time (JIT) compilation, adaptive optimization, and direct execution on a chip. Interpretation belongs to the first generation of JVMs and JIT compilation to the second. Adaptive optimization (currently used by Sun's HotSpot JVM) draws on the experience of both generations and combines the two.

Adaptive optimization: the JVM starts by interpreting all code while monitoring its execution. For frequently called methods it starts a background thread that compiles them to native code and optimizes them carefully. If a method is no longer used frequently, its compiled code is discarded and the method is interpreted again.

3. Runtime data area: it mainly consists of the method area, the heap, the Java stacks, the PC registers, and the native method stacks.

(1) The method area and the heap are shared by all threads.

Heap: stores all objects the program creates at run time.

Method area: when the JVM's class loader loads a .class file and parses it, the parsed type information is placed in the method area.

(2) The Java stack and the PC register are private to each thread and are created when the thread is created.

(3) Native method stack: stores the state of native method calls.

The above generally introduces the main contents of the runtime data area, and the following is described in detail. To introduce the data area, you have to explain the data types in JVM.

Data types in the JVM: the basic unit of data in the JVM is the word, and the size of a word is chosen by each JVM implementation.

Data types are divided into basic (primitive) types and reference types.

(1) Basic types: the numeric types (all Java primitive types except boolean), boolean (represented as an int inside the JVM, with 0 for false and non-zero for true), and returnAddress (a type internal to the JVM, used to implement the finally clause).

(2) Reference types: array types, class types, and interface types.

Now that we have seen how data is represented in the JVM, let's look at the JVM's data areas.

First, let's take a look at the method area:

As mentioned above, the method area stores the type information the JVM extracts from class files. How is that information stored? Java class files use big-endian byte order: the high-order byte is stored at the lower memory address. For example, for the value 0x1234, 0x12 is the high-order byte and 0x34 is the low-order byte, so Java stores 0x12 at the lower address and 0x34 at the higher address; x86 stores them the other way around. This is the storage format of data in class files. Once the data has been loaded into the method area, however, the JVM may store it in any way it likes.
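The two byte orders can be compared directly with java.nio.ByteBuffer. This is a small sketch; EndianDemo and encode are hypothetical names chosen for the illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class EndianDemo {
    /** Encodes a 16-bit value in the requested byte order. */
    static byte[] encode(short value, ByteOrder order) {
        return ByteBuffer.allocate(2).order(order).putShort(value).array();
    }

    public static void main(String[] args) {
        byte[] big = encode((short) 0x1234, ByteOrder.BIG_ENDIAN);
        byte[] little = encode((short) 0x1234, ByteOrder.LITTLE_ENDIAN);
        System.out.printf("big-endian:    %02x %02x%n", big[0], big[1]);       // 12 34
        System.out.printf("little-endian: %02x %02x%n", little[0], little[1]); // 34 12
    }
}
```

Index 0 of each array plays the role of the lower address, so the big-endian encoding puts 0x12 first, which is the layout class files use.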

Type information: the fully qualified name of the type, the fully qualified name of its direct superclass, whether it is a class or an interface, its modifiers (public and so on), and the list of all its direct superinterfaces. The Class object provides a window onto this information (you can obtain it through Class.forName() or instance.getClass()). The Class methods below make this clear:

getName(), getSuperclass(), isInterface(), getInterfaces(), getClassLoader()
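A short sketch of these Class methods in use follows. TypeInfoDemo and describe are hypothetical names; the summary format is simply an assumption made for readability.

```java
final class TypeInfoDemo {
    /** Summarizes type information that the Class object exposes. */
    static String describe(Class<?> type) {
        Class<?> parent = type.getSuperclass(); // null for interfaces and for Object
        return type.getName()
                + " super=" + (parent == null ? "none" : parent.getName())
                + " interface=" + type.isInterface()
                + " interfaces=" + type.getInterfaces().length;
    }

    public static void main(String[] args) {
        System.out.println(describe(Integer.class));
        System.out.println(describe(Runnable.class));
        // The loader that defined this type; classes defined by the
        // bootstrap loader typically report null here instead.
        System.out.println(TypeInfoDemo.class.getClassLoader());
    }
}
```

Everything printed here comes from the type information the JVM keeps for each loaded type.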

Static variables are stored as part of the type information.

Reference to the ClassLoader: used to load the other classes this class refers to during dynamic linking.

Reference to the Class object: required, as described above.

The constant pool of the type: contains literal constants (String, integer, and floating-point constants) and symbolic references to other types, fields, and methods. (Note: the constant pool is not just a place to store constants; the symbolic references may refer to the variables we work with in everyday programming.) Because of these symbolic references, the constant pool plays an important part in the dynamic linking of Java programs.

Field information: information about each field declared in the type.

Method information: information about each method declared in the type.

Compile-time constants: class variables that are declared final and initialized with a value known at compile time. Every class that uses such a constant gets its own copy, either in its constant pool or embedded in its bytecode stream.

Method table: an array of direct references to all instance methods that may be invoked on an instance of the class, including those inherited from superclasses.

In addition, for every method that is not abstract or native, the method area stores the method's bytecode, the sizes of the operand stack and local variable area of the method's stack frame, and the exception table.

For example:

class Lava {
    private int speed = 5;
    void flow() {}
}

class Volcano {
    public static void main(String[] args) {
        Lava lava = new Lava();
        lava.flow();
    }
}

Run the command java Volcano

(1) The JVM finds Volcano.class, loads it, and extracts the type information into the method area. The JVM then runs main() by executing its bytecode in the method area, keeping a pointer to the constant pool of the Volcano class the whole time.

(2) The first instruction in main() tells the JVM to allocate memory for the class listed in the first entry of the constant pool (which shows again that the constant pool does not only hold constants). The JVM looks up that entry and finds a symbolic reference to the Lava class. It checks the method area to see whether Lava has been loaded; since it has not, the JVM looks for Lava.class and writes its type information into the method area. It then replaces the symbolic reference in Volcano's constant pool with a pointer to the Lava class data in the method area, that is, it replaces the symbolic reference with a direct reference.

(3) The JVM now carries out the new operation for Lava. It follows the first entry of Volcano's constant pool to the Lava data in the method area, works out how much heap space a Lava object needs, allocates that space on the heap, sets the speed variable to its default value 0, and pushes the reference to the new Lava object onto the stack.

(4) The flow() method is called on lava.
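The on-demand behavior in steps (2) and (3) can be observed indirectly. The sketch below records when the static initializer of a nested Lava class runs; LazyLoadDemo, EVENTS and run() are hypothetical names made up for this example. Note that it observes class initialization, which happens on first active use; exactly when the class file itself is loaded is left to the JVM implementation.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyLoadDemo {
    static final List<String> EVENTS = new ArrayList<>();

    static final class Lava {
        static {
            EVENTS.add("Lava initialized"); // runs once, on first active use of Lava
        }

        private int speed = 5;

        void flow() {
            EVENTS.add("flow, speed=" + speed);
        }
    }

    /** Records what happens around the first `new Lava()`. */
    static List<String> run() {
        EVENTS.clear();
        EVENTS.add("before new");
        Lava lava = new Lava(); // triggers initialization of Lava if not yet done
        lava.flow();
        return new ArrayList<>(EVENTS);
    }
}
```

On the first call, run() returns the three events in the order "before new", "Lava initialized", "flow, speed=5", which shows that Lava was not initialized until the new expression was reached. Later calls omit the middle entry because a class is initialized only once.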

All right, now that we have an overview of the method area, let's take a look at the heap.

Heap implementation of Java objects:

A Java object mainly consists of its instance variables (those declared by its own class and by its superclasses), a pointer to the class data in the method area, a pointer to the method table, an object lock (optional), a wait set (optional), and GC-related data (optional). The GC data depends on the GC algorithm; a mark-and-sweep collector, for example, needs to record whether the object is referenced and whether its finalize() method has been called.

So why do Java objects have pointers to class data? Let's consider it from several aspects.

First: when a program casts an object reference to another type, how does the JVM check whether the conversion is allowed? It needs the class data.

Second: dynamic binding needs the run-time type of the object, not the declared type of the reference.

One question remains open here: why does the class data record the actual type rather than the reference type? I will set this question aside for now; it should become clear in later reading notes.

Pointer to the method table: this is similar to the VTBL (virtual table) in C++ and makes method calls more efficient.
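Both uses of the class data, cast checking and dynamic binding, show up in ordinary Java code. In this minimal sketch, RuntimeTypeDemo, Rock, dispatch and canCastToLava are hypothetical names invented for the illustration.

```java
final class RuntimeTypeDemo {
    static class Rock {
        String name() { return "rock"; }
    }

    static class Lava extends Rock {
        @Override
        String name() { return "lava"; }
    }

    /** The method that runs is chosen by the object's run-time type, not by the type Rock. */
    static String dispatch(Rock r) {
        return r.name();
    }

    /** The cast is checked at run time against the object's actual class. */
    static boolean canCastToLava(Rock r) {
        try {
            Lava lava = (Lava) r;
            return lava != null;
        } catch (ClassCastException e) {
            return false;
        }
    }
}
```

Calling dispatch(new Lava()) returns "lava" even though the parameter is declared as Rock, and canCastToLava(new Rock()) returns false because the run-time check rejects the cast.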

Object lock: used to give multiple threads mutually exclusive access to shared data.

Wait set: used to let multiple threads coordinate their work toward a common goal (see the wait(), notify(), and notifyAll() methods of the Object class).
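The object lock and the wait set work together in the classic guarded-wait pattern. The following is a minimal sketch of a one-slot handoff between two threads; Handoff, put, take and demo are hypothetical names, and 42 is just a sample value.

```java
final class Handoff {
    private Integer slot; // guarded by this object's lock

    /** Blocks until the slot is empty, then stores the value. */
    synchronized void put(int value) throws InterruptedException {
        while (slot != null) {
            wait(); // releases the lock and joins this object's wait set
        }
        slot = value;
        notifyAll(); // wakes the threads in the wait set
    }

    /** Blocks until a value is available, then removes and returns it. */
    synchronized int take() throws InterruptedException {
        while (slot == null) {
            wait();
        }
        int value = slot;
        slot = null;
        notifyAll();
        return value;
    }

    /** Hands one value from a producer thread to the calling thread. */
    static int demo() {
        Handoff handoff = new Handoff();
        Thread producer = new Thread(() -> {
            try {
                handoff.put(42);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        try {
            int got = handoff.take();
            producer.join();
            return got;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The synchronized keyword uses the object lock for mutual exclusion, while wait() and notifyAll() move threads into and out of the wait set. The while loops guard against waking up before the condition actually holds.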

Heap implementation of Java arrays: an array also has a Class instance associated with its class, and arrays with the same dimension and element type are instances of the same class. Array class names look like this: [[Ljava/lang/Object; for Object[][], [I for int[], and [[B for byte[][].
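These array class names can be checked from Java code. ArrayClassDemo and nameOf are hypothetical names for this sketch. One detail to keep in mind: Class.getName() prints the element class with dots (java.lang.Object), whereas the internal form used inside class files uses slashes.

```java
final class ArrayClassDemo {
    /** Returns the name of the run-time class of the given array. */
    static String nameOf(Object array) {
        return array.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(nameOf(new Object[0][0])); // [[Ljava.lang.Object;
        System.out.println(nameOf(new int[0]));       // [I
        System.out.println(nameOf(new byte[0][0]));   // [[B
        // Arrays with the same dimension and element type share one Class,
        // regardless of their length.
        System.out.println(new int[3].getClass() == new int[7].getClass()); // true
    }
}
```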

Now that the heap has been roughly introduced, let's look at the program counter and the Java stack.

Program counter: each thread has its own, created when the thread starts.

If the thread is executing a Java method, the PC holds the address of the next instruction to execute.

If the thread is executing a native method, the value of the PC is undefined.

Java stack: the Java stack stores a thread's running state in frames, and it supports only two operations: pushing a frame and popping a frame.

Each frame represents one method invocation. A Java method can complete in two ways, by returning normally or by throwing an exception; either way the method's frame is popped from the stack and its memory is released.

The composition of a frame: the local variable area (holding method parameters and local variables; for an instance method the this reference comes first, method parameters follow strictly in declaration order, and local variables may be placed in any order), the operand stack, and the frame data area (which supports constant pool resolution, normal method return, and exception handling).
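Frame pushing and popping can be observed from inside a program by counting stack trace elements. This is only a rough sketch: FrameDemo and its methods are hypothetical names, and it assumes the JVM reports every frame in the stack trace, which is the usual behavior but not strictly guaranteed.

```java
final class FrameDemo {
    /** Number of frames on the current thread's Java stack, including this one. */
    static int depth() {
        return new Throwable().getStackTrace().length;
    }

    static int callee() {
        return depth();
    }

    static void fail() {
        throw new IllegalStateException("abrupt completion");
    }

    /** Returns {depth seen from here, depth seen from one call deeper}. */
    static int[] measureCall() {
        int here = depth();
        int deeper = callee(); // one extra frame for callee()
        return new int[] {here, deeper};
    }

    /** Returns {depth before, depth after} a call that ends by throwing. */
    static int[] measureAroundException() {
        int before = depth();
        try {
            fail();
        } catch (IllegalStateException e) {
            // the frame of fail() has already been popped here
        }
        int after = depth();
        return new int[] {before, after};
    }
}
```

measureCall() shows that each invocation adds exactly one frame, and measureAroundException() shows that a method ending with an exception releases its frame just as a normal return does.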

Native method stack: its form depends on how native methods are implemented. For example, if a JVM's native method interface uses the C linkage model, the native method stack is a C stack. When a thread calls a native method, it enters territory that the JVM no longer restricts; in other words, the JVM can use native methods to extend itself dynamically.

I believe you all understand what JVM is.
