How to parse the JVM class loading mechanism

2025-07-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

Many newcomers are unclear about how the JVM class loading mechanism works. This article walks through it in detail, and hopefully you will find it useful.

Overview

The virtual machine loads the data that describes a class from the Class file into memory, verifies it, converts and resolves it, and initializes it, finally producing a Java type that the virtual machine can use directly. This is the class loading mechanism of the virtual machine.

Unlike languages that link at compile time, Java loads, links, and initializes types while the program is running. For example, import java.util.* covers many classes, but at run time the virtual machine loads only the classes the program actually needs. This strategy adds a little overhead when a class is loaded, but it gives Java applications a high degree of flexibility.

The timing of class loading

A class (or an interface, here and below) has a life cycle that runs from the moment it is loaded into virtual machine memory until it is unloaded. (The life cycle here covers only what the class goes through at run time, regardless of how it is stored on a storage medium.) It consists of seven stages: Loading, Verification, Preparation, Resolution, Initialization, Using, and Unloading. Verification, preparation, and resolution are collectively called linking. The seven stages can be described in the following figure:

[Figure: Loading, then Linking (Verification, Preparation, Resolution), then Initialization, Using, and Unloading]

As the figure shows, the stages begin in order. The order of loading, verification, preparation, initialization, and unloading is fixed, so the class loading process must start these stages step by step in that order. Resolution is the exception: in some cases it does not start until after initialization, a design that supports dynamic binding in the Java language. Note also that the stages only start in this order. They do not run strictly one after another; they are usually interleaved, with one stage invoking or activating another while it is still in progress.

So when must the first stage, loading, begin? This involves two concepts: active references and passive references. According to the Java Virtual Machine Specification, only the following five cases count as active references, and each of them triggers initialization (so loading, verification, and preparation must already have started):

When the virtual machine encounters one of the four bytecode instructions new, getstatic, putstatic, or invokestatic and the class has not been initialized, its initialization is triggered first. The most common Java code that produces these instructions is instantiating an object with the new keyword, reading or setting a static field of a class, and calling a static method of a class.

When a reflective call is made on a class and the class has not been initialized, its initialization is triggered first.

When a class is initialized and its parent class has not been initialized, the initialization of the parent class is triggered first.

When the virtual machine starts, it first initializes the main class, that is, the class that contains the main() method to be executed.

When JDK 1.7's dynamic language support is used, if a java.lang.invoke.MethodHandle instance finally resolves to a method handle of kind REF_getStatic, REF_putStatic, or REF_invokeStatic, and the class that the handle refers to has not been initialized, its initialization is triggered first. (A method handle is a typed, directly executable reference to an underlying method or field.)
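The first three cases are easy to observe from ordinary Java code. The following is a minimal sketch, not a definitive test of the specification: all class names are invented for illustration, and the reflection case is approximated with Class.forName(String), which initializes the class it returns. Each static block records its class name, so the returned list shows which classes were initialized and in what order.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; every class name below is invented for this example.
final class ActiveReferenceDemo {
    // Records which static initializers ran, in order.
    static final List<String> LOG = new ArrayList<>();

    static class ByNew { static { LOG.add("ByNew"); } }

    static class ByStaticField {
        static int counter = 1;              // not final, so a read compiles to getstatic
        static { LOG.add("ByStaticField"); }
    }

    static class ByStaticMethod {
        static { LOG.add("ByStaticMethod"); }
        static void touch() { }
    }

    static class ByReflection { static { LOG.add("ByReflection"); } }

    static class Base { static { LOG.add("Base"); } }

    static class Derived extends Base { static { LOG.add("Derived"); } }

    // Performs each kind of active reference and returns the initialization order.
    static List<String> run() {
        new ByNew();                          // new
        int ignored = ByStaticField.counter;  // getstatic
        ByStaticMethod.touch();               // invokestatic
        try {
            // Class.forName(String) loads and initializes the named class.
            Class.forName(ActiveReferenceDemo.class.getName() + "$ByReflection");
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        new Derived();                        // the parent Base is initialized before Derived
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because a class is initialized at most once, calling run() a second time returns the same list rather than appending to it.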

Every other way of referencing a class is a passive reference and does not trigger initialization. Classic examples of passive references are:

Referencing a static field of the parent class through a subclass does not initialize the subclass. For a static field, only the class that directly defines the field is initialized. The subclass does not define the field, so it is not initialized.

Referencing a class through an array definition does not trigger initialization of that class, for example Clazz[] arr = new Clazz[10];.

Referencing a constant does not trigger initialization of the class that defines it. At compile time the constant is stored in the constant pool of the class that uses it, so the using class holds no real reference to the defining class.

The Java Virtual Machine Specification is strict on this point: the five active-reference scenarios above are the only ones ("there are only" five) that trigger class initialization.
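The three passive references above can be checked with a short program. This is a sketch under the assumption that the code is compiled with javac and run on a standard JVM; the class names are invented for illustration. Only the class that actually declares the static field should appear in the log.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; every class name below is invented for this example.
final class PassiveReferenceDemo {
    // Records which static initializers ran.
    static final List<String> LOG = new ArrayList<>();

    static class SuperClass {
        static int value = 123;
        static { LOG.add("SuperClass"); }
    }

    static class SubClass extends SuperClass {
        static { LOG.add("SubClass"); }
    }

    static class Element {
        static { LOG.add("Element"); }
    }

    static class ConstHolder {
        static final String HELLO = "hello world";   // compile-time constant
        static { LOG.add("ConstHolder"); }
    }

    static List<String> run() {
        int v = SubClass.value;            // initializes SuperClass only, not SubClass
        Element[] arr = new Element[10];   // creates an array class; Element is not initialized
        String s = ConstHolder.HELLO;      // constant is inlined; ConstHolder is not initialized
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints [SuperClass]
    }
}
```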

Class loading process

Loading

The virtual machine needs to do the following three things during the loading phase:

Get the binary byte stream that defines the class through its fully qualified name.

Convert the static storage structure represented by this byte stream into the run-time data structure of the method area.

Generate a java.lang.Class object representing the class in memory, to serve as the access point for the class's data in the method area.

The Java Virtual Machine Specification does not spell these three things out in detail. For example, it does not say where the binary byte stream comes from or how it is obtained. In practice, the byte stream can be read from a ZIP or JAR package, fetched over the network, or generated at run time (the best-known use of this is the dynamic proxy support in java.lang.reflect.Proxy).
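The third step, producing the java.lang.Class object, can be observed separately from initialization. The sketch below uses the three-argument Class.forName(name, initialize, loader) with initialize set to false to obtain a Class object without running the static initializer. The class names and the caching of the result are choices made for this example only.

```java
// Illustrative sketch; the class names are invented for this example.
final class LoadWithoutInitDemo {
    static boolean lazyInitialized = false;
    private static boolean[] cached;   // keeps run() repeatable

    static class Lazy {
        static { lazyInitialized = true; }
    }

    // Returns { initialized after loading only, initialized after forName(..., true, ...) }.
    static synchronized boolean[] run() {
        if (cached == null) {
            String name = LoadWithoutInitDemo.class.getName() + "$Lazy";
            ClassLoader loader = LoadWithoutInitDemo.class.getClassLoader();
            try {
                Class.forName(name, false, loader);   // obtain the Class object, no initialization
                boolean afterLoad = lazyInitialized;
                Class.forName(name, true, loader);    // same class, now initialized
                boolean afterInit = lazyInitialized;
                cached = new boolean[] { afterLoad, afterInit };
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
        return cached.clone();
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("after load: " + r[0] + ", after init: " + r[1]);
    }
}
```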

Class loading differs between non-array types and array types. A non-array class can be loaded by the bootstrap class loader provided by the system or by a custom class loader, which makes this case the more flexible one. An array class is not created by a class loader; the Java virtual machine creates it directly. Does that mean array classes have nothing to do with class loaders? No. The type left after removing all dimensions of the array is ultimately loaded by a class loader, so array classes and class loaders are still closely related.

An array class is created according to the following rules:

If the component type of the array (the type of the array with one dimension removed; for a two-dimensional array this is a one-dimensional array) is a reference type, the component type is loaded recursively with the loading process described above.

If the component type of the array class is not a reference type (for example, a primitive type such as int), the Java virtual machine marks the array class as associated with the bootstrap class loader.

The visibility of an array class is the same as that of its component type. If the component type is not a reference type, the visibility of the array class defaults to public.
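These rules are visible through the reflection API. The following sketch relies on documented behavior of Class.getClassLoader() (it returns the loader of the element type for an array class, and null for primitive element types, which stands for the bootstrap loader) and Class.getComponentType(). The Element class is invented for illustration.

```java
import java.lang.reflect.Modifier;

// Illustrative sketch; Element is an invented class.
final class ArrayClassDemo {
    static class Element { }

    // int[] has a primitive component type, so it is tied to the bootstrap loader (null).
    static boolean primitiveArrayUsesBootstrapLoader() {
        return int[].class.getClassLoader() == null;
    }

    // An array of a reference type reports the loader of its element type.
    static boolean referenceArraySharesElementLoader() {
        return Element[][].class.getClassLoader() == Element.class.getClassLoader();
    }

    // The component type of a two-dimensional array is a one-dimensional array.
    static boolean componentTypeDropsOneDimension() {
        return Element[][].class.getComponentType() == Element[].class;
    }

    // Arrays of primitives are public.
    static boolean primitiveArrayIsPublic() {
        return Modifier.isPublic(int[].class.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(primitiveArrayUsesBootstrapLoader());
        System.out.println(referenceArraySharesElementLoader());
        System.out.println(componentTypeDropsOneDimension());
        System.out.println(primitiveArrayIsPublic());
    }
}
```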

As mentioned earlier, the loading phase and the linking phase are interleaved, so linking may begin before loading is complete. Even so, the two phases still start in a fixed order: loading first, then linking.

Verification

The purpose of the verification phase is to ensure that the information contained in the Class byte stream meets the requirements of the current virtual machine and does not compromise the security of the virtual machine.

We know that the Java language is relatively safe. This shows in two ways: language features such as the removal of pointers, which prevents direct manipulation of memory, and the sandbox mechanism, which confines the program to the sandbox and blocks operations that reach outside it. Note, however, that the Class files the Java virtual machine processes are not necessarily compiled from Java code. They may come from other languages, and a Class file can even be written by hand in a hexadecimal editor (provided, of course, that it follows the specification). The safety of Class files from such sources cannot be guaranteed, so a virtual machine that blindly trusted every Class file it loaded could easily be harmed.

The verification phase performs four main checks: file format verification, metadata verification, bytecode verification, and symbolic reference verification. (It helps to read this together with the previous article on the Class file structure.)

File format verification

The file format here is the Class file format. This step checks that the loaded byte stream (what the virtual machine receives is a stream of bytes, not a file as such) conforms to the Class file specification and can be processed by the current virtual machine. The earlier description of the Class file structure defines the meaning of every byte in the file, so the check can verify, for example, that the first four bytes are the magic number 0xCAFEBABE.

HotSpot performs many more file format checks than this. Only a byte stream that passes file format verification is stored in the method area, so the later verification stages all operate on the data in the method area rather than on the byte stream.
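The magic number check is simple enough to sketch by hand. The code below is only an illustration of the idea, not how HotSpot implements it: the class and method names are invented, and main assumes that String.class can be read as a resource on the running JDK. It reads the big-endian magic number from the first four bytes and the major version from bytes 6 and 7.

```java
import java.io.IOException;
import java.io.InputStream;

// Illustrative sketch; this is not the virtual machine's own verifier.
final class MagicNumberCheck {
    static final int CLASS_MAGIC = 0xCAFEBABE;

    // True if the first four bytes are the Class file magic number.
    static boolean hasClassMagic(byte[] header) {
        if (header == null || header.length < 4) {
            return false;
        }
        int magic = ((header[0] & 0xFF) << 24) | ((header[1] & 0xFF) << 16)
                  | ((header[2] & 0xFF) << 8) | (header[3] & 0xFF);
        return magic == CLASS_MAGIC;
    }

    // Bytes 4-5 hold minor_version and bytes 6-7 hold major_version, both big-endian.
    static int majorVersion(byte[] header) {
        if (header == null || header.length < 8) {
            throw new IllegalArgumentException("need at least 8 bytes");
        }
        return ((header[6] & 0xFF) << 8) | (header[7] & 0xFF);
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = String.class.getResourceAsStream("String.class")) {
            if (in == null) {
                System.out.println("String.class is not readable as a resource");
                return;
            }
            byte[] header = new byte[8];
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            System.out.println("magic ok: " + hasClassMagic(header)
                    + ", major version: " + majorVersion(header));
        }
    }
}
```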

Metadata validation

Metadata can be understood as data that describes data. Here it means the information in the Class file that describes the class itself, such as its parent class, its interfaces, its fields, and its methods. The main purpose of metadata verification is to check this information semantically and make sure that none of it violates the Java language specification.

The specific verification information includes the following aspects:

Whether this class has a parent class (all classes except java.lang.Object should have a parent class)

Whether this class inherits from a class that must not be inherited (a class modified by final)

Whether the class, if it is not abstract, implements all the methods required by its parent class or interfaces

Whether the fields and methods in the class contradict the parent class (for example, whether the final field of the parent class is overridden)

Bytecode verification

This stage analyzes the method bodies of the class. A method body that passes bytecode verification is not necessarily free of problems, but one that fails certainly has a problem. The full bytecode verification process is very complex, so an optimization was added to the virtual machine after JDK 1.6. It uses the StackMapTable attribute, described in the article on the Class file structure. This attribute records the expected types at key points in a method, so the verifier only needs to check that the recorded types match instead of inferring them.

Symbolic reference verification

This is the final stage of verification. A symbolic reference is a logical symbol in the Class file, while a direct reference points to an address in the method area. Symbolic references are converted to direct references in the resolution phase, and this check runs as part of that conversion. Symbolic reference verification mainly checks that information outside the class itself matches, for example whether the class named by the fully qualified name in a symbolic reference can be found.

A symbolic reference (Symbolic Reference) uses a set of symbols to describe the referenced target. It can be any form of literal, as long as it locates the target without ambiguity (it is purely a literal; no memory is involved). Symbolic references are independent of the memory layout of the virtual machine implementation, and the referenced target does not have to be loaded in memory. Memory layouts can vary between virtual machine implementations, but the symbolic references they accept must be the same, because the literal form of symbolic references is clearly defined in the Class file format of the Java Virtual Machine Specification.

A direct reference (Direct Reference) can be a pointer to the target, a relative offset, or a handle that locates the target indirectly (it can be understood as a memory address). A direct reference depends on the memory layout of the virtual machine implementation, so the same symbolic reference generally translates to different direct references on different virtual machines. If a direct reference exists, its target must already be in memory.

The purpose of symbolic reference verification is to ensure that resolution can proceed normally. If a class fails it, a subclass of java.lang.IncompatibleClassChangeError is thrown, such as java.lang.IllegalAccessError, java.lang.NoSuchFieldError, or java.lang.NoSuchMethodError.
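Since symbolic references are plain string literals, their shape can be printed from Java code. The sketch below uses java.lang.invoke.MethodType to produce a method descriptor in the format the Class file uses, and Class.getName() for an array type. Note that getName() separates package names with dots, whereas descriptors inside a Class file use slashes. The class and method names are invented for illustration.

```java
import java.lang.invoke.MethodType;

// Illustrative sketch; the class and method names are invented.
final class DescriptorDemo {
    // Descriptor of a method shaped like main: takes String[] and returns void.
    static String mainMethodDescriptor() {
        return MethodType.methodType(void.class, String[].class).toMethodDescriptorString();
    }

    // Run-time name of the Integer[] array class (dots instead of slashes).
    static String integerArrayName() {
        return Integer[].class.getName();
    }

    public static void main(String[] args) {
        System.out.println(mainMethodDescriptor());   // prints ([Ljava/lang/String;)V
        System.out.println(integerArrayName());       // prints [Ljava.lang.Integer;
    }
}
```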

Preparation

After the verification phase is completed, the preparation phase begins. The preparation phase formally allocates memory for class variables and sets their initial values.

Note that at this point memory is allocated only for class variables (variables modified by static), not for instance variables. Instance variables are allocated in the Java heap together with the object when the object is instantiated. The initial value here is the default value of the data type, that is, zero.

For example, given public static int value = 123;, the value of value after the preparation phase is 0 rather than 123, because no Java method has been executed yet. The putstatic instruction that assigns 123 to value is compiled into the class constructor <clinit>() method, so the assignment does not happen until the initialization phase.

Special case: if the field attribute table of a class field contains a ConstantValue attribute, the variable is initialized in the preparation phase to the value that the attribute specifies. For example, when public static final int value = 123; is compiled, javac generates a ConstantValue attribute for value, and in the preparation phase the virtual machine sets value to 123 according to that attribute.
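The zero value left by the preparation phase can be observed by reading a class variable before its own initializer has run. The sketch below does this through a helper method, because a direct forward reference would not compile. The names are invented for illustration, and note one limitation: javac also inlines compile-time constants at the use site, so the constant case shows the observable result rather than proving which mechanism produced it.

```java
// Illustrative sketch; the class, field, and method names are invented.
final class PreparationDemo {
    // Initializers run in textual order inside <clinit>(), so these two assignments
    // execute before "value = 123" below has run.
    static int valueSeenEarly = readValue();
    static int constantSeenEarly = readConstant();

    static int value = 123;            // assigned by putstatic in <clinit>()
    static final int CONSTANT = 123;   // compile-time constant, gets a ConstantValue attribute

    static int readValue() {
        return value;
    }

    static int readConstant() {
        return CONSTANT;               // javac also inlines the constant here
    }

    public static void main(String[] args) {
        System.out.println(valueSeenEarly);      // prints 0
        System.out.println(constantSeenEarly);   // prints 123
        System.out.println(value);               // prints 123
    }
}
```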

Resolution

The resolution phase is the process of replacing symbolic references in the constant pool with direct references (the difference between the two was described earlier). A symbolic reference must be resolved before it is used, but virtual machine implementations can decide when: either resolve the symbolic references in the constant pool when the class is loaded by the class loader (that is, before initialization), or wait until a symbolic reference is about to be used (that is, after initialization).

That settles the timing of resolution, but there is another question: the same symbolic reference may be resolved many times. Except for the invokedynamic instruction, the virtual machine can cache the result of the first resolution (by recording the direct reference in the run-time constant pool and marking the constant as resolved), which avoids resolving the same symbolic reference repeatedly.

Resolution mainly applies to seven kinds of symbolic references: class or interface, field, class method, interface method, method type, method handle, and call site specifier. This article explains the resolution process for the first four.

Class or interface resolution

Resolving a symbolic reference to a class or interface into a direct reference takes the following three steps:

If the symbolic reference is not an array type, the virtual machine passes the fully qualified name that the symbol represents to the class loader of the class that contains the symbolic reference, and that loader loads the class. Because loading involves verification, this may trigger the loading of other related classes.

If the symbolic reference is an array type whose element type is an object, its descriptor in the constant pool has a form such as "[Ljava/lang/Integer;" (for the concept of descriptors, see [In-depth understanding of JVM]: Class file structure). The element type is loaded according to the rule above, and the virtual machine then generates an array object representing the array.

If the steps above complete without an exception, the symbolic reference now has a direct reference in the virtual machine. Before resolution finishes, however, the symbolic reference must still be checked to confirm that the class using it has access rights. If it does not, a java.lang.IllegalAccessError exception is thrown.

Field resolution

Resolving a field first requires resolving the class or interface that the field belongs to, because field resolution can continue only after a correct direct reference to that class has been obtained. Calling the class or interface that the field belongs to C, field resolution proceeds in the following steps:

If C itself contains a field whose simple name and field descriptor both match the target, a direct reference to that field is returned and the lookup ends.

Otherwise, if C implements interfaces, each interface and its parent interfaces are searched recursively, following the inheritance relationship from the bottom up. If an interface contains a field whose simple name and field descriptor both match the target, a direct reference to that field is returned and the lookup ends.

Otherwise, if C is not java.lang.Object, its parent classes are searched recursively, following the inheritance relationship from the bottom up. If a parent class contains a field whose simple name and field descriptor both match the target, a direct reference to that field is returned and the lookup ends.

Otherwise, the lookup fails and a java.lang.NoSuchFieldError exception is thrown. If a direct reference is returned, access permissions are then checked, and a java.lang.IllegalAccessError exception is thrown if the field is not accessible.
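The lookup order can be illustrated with three fields that are each found at a different step. This is a sketch with invented names. The interface field is initialized through a method call so that it is not a compile-time constant; otherwise javac would inline it and no run-time lookup would take place. The field names are deliberately distinct, because javac rejects a reference that is ambiguous between a parent class and an interface.

```java
// Illustrative sketch; every name below is invented for this example.
final class FieldResolutionDemo {
    interface Tagged {
        // The method call keeps TAG from being a compile-time constant.
        String TAG = "Tagged".intern();
    }

    static class Base {
        static String origin = "Base";
        static String label = "Base";
    }

    static class Leaf extends Base implements Tagged {
        static String label = "Leaf";   // hides Base.label
    }

    static String[] resolve() {
        return new String[] {
            Leaf.label,    // found in Leaf itself
            Leaf.TAG,      // found in the interface Tagged
            Leaf.origin    // found in the parent class Base
        };
    }

    public static void main(String[] args) {
        System.out.println(String.join(", ", resolve()));   // prints Leaf, Tagged, Base
    }
}
```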

Class method resolution

Resolving a class method also requires first resolving the class that the method belongs to. Once that succeeds, the following steps are performed:

Symbolic references to class methods and to interface methods are defined separately. If the class_index entry of a class method reference turns out to refer to an interface, a java.lang.IncompatibleClassChangeError exception is thrown.

If class_index does refer to a class, the class is searched for a method whose simple name and descriptor both match the target. If one is found, a direct reference to that method is returned and the lookup ends.

Otherwise, the parent classes of the class are searched recursively for a method whose simple name and descriptor both match the target. If one is found, a direct reference to that method is returned and the lookup ends.

Otherwise, the interfaces implemented by the class and their parent interfaces are searched recursively. If a match is found there, the class must be abstract (it does not implement the method), so the lookup ends and a java.lang.AbstractMethodError exception is thrown.

Otherwise, the lookup fails and a java.lang.NoSuchMethodError exception is thrown. If a direct reference is returned, access permissions are then checked, and a java.lang.IllegalAccessError exception is thrown if the method is not accessible.

Interface method resolution

As with class methods, resolving an interface method first requires resolving the class or interface that the method belongs to. If that succeeds, the following steps are performed:

If the class_index entry of an interface method reference turns out to refer to a class rather than an interface, a java.lang.IncompatibleClassChangeError exception is thrown.

Otherwise, the interface that the method belongs to is searched for a method whose simple name and descriptor both match the target. If one is found, a direct reference to that method is returned.

Otherwise, the parent interfaces are searched recursively, up to and including java.lang.Object. If a match is found, a direct reference to the method is returned. Otherwise, the lookup fails.

Before JDK 9, which introduced private interface methods, all interface methods were public, so interface method resolution raised no access problems.

Initialization

In the initialization stage, the virtual machine begins to actually execute the Java program code. Class variables were already given a value in the preparation phase, but only the default zero value; the values that the program specifies have not yet been assigned. That assignment happens in the initialization phase, so the initialization phase can also be described as the process of executing the class constructor method <clinit>(). So how is the <clinit>() method generated?

The compiler generates <clinit>() by automatically collecting all class variable assignments and the statements in static statement blocks and merging them. The order of collection is the order in which the statements appear in the source file. A static statement block can only access variables defined before it. A variable defined after the block can be assigned inside the block but cannot be read there. Sample code:

    public class Test {
        static {
            i = 0;                    // assigning to the variable compiles normally
            System.out.println(i);    // the compiler reports "illegal forward reference" here
        }
        static int i = 1;
    }

The <clinit>() method differs from a class's constructor in that it does not need to call the parent class's method explicitly. The virtual machine ensures that the parent class's <clinit>() method has finished before the subclass's <clinit>() method is executed.

Because the <clinit>() method of the parent class executes first, the static statement blocks of the parent class run before the variable assignments of the subclass. In the following example, the output is 2 instead of 1.

    class Parent {
        public static int A = 1;
        static {
            A = 2;    // runs in Parent's <clinit>(), after A = 1
        }
    }

    class Sub extends Parent {
        public static int B = A;    // Parent is already initialized, so A is 2
    }

    public class Test {
        public static void main(String[] args) {
            System.out.println(Sub.B);    // prints 2
        }
    }

The <clinit>() method is not mandatory for a class or interface. If a class has no static statement block and no assignment to a class variable, the compiler need not generate a <clinit>() method for it.

Static statement blocks cannot be used in an interface, but an interface can still have initializing assignments to its variables, so an interface also gets a <clinit>() method. Unlike a class, however, an interface's <clinit>() method does not require the <clinit>() method of the parent interface to run first. The parent interface is initialized only when a variable defined in it is used. Likewise, initializing a class that implements the interface does not execute the interface's <clinit>() method.
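The last point can be checked by recording initialization order. In this sketch (all names invented for illustration), the interface field is initialized through a method call so that the interface really needs a <clinit>() method, and the interface declares no default methods. If initializing the implementing class also initialized the interface, "Config" would be logged before "Impl".

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch; every name below is invented for this example.
final class InterfaceInitDemo {
    static final List<String> LOG = new ArrayList<>();

    // Records who was initialized and returns a non-constant value.
    static int mark(String who) {
        LOG.add(who);
        return LOG.size();
    }

    interface Config {
        int TOKEN = mark("Config");   // not a compile-time constant
    }

    static class Impl implements Config {
        static { mark("Impl"); }
        static void touch() { }
    }

    static List<String> run() {
        Impl.touch();                 // initializes Impl, but not Config
        int ignored = Config.TOKEN;   // using the field initializes Config
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run());    // prints [Impl, Config]
    }
}
```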

The virtual machine ensures that the <clinit>() method of a class is properly locked and synchronized in a multithreaded environment. If several threads try to initialize a class at the same time, only one of them executes the class's <clinit>() method; the others block and wait until that thread has finished. A time-consuming operation in a class's <clinit>() method can therefore leave multiple threads blocked.
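A small experiment makes this guarantee concrete. In the sketch below (names invented, thread count and sleep time chosen arbitrarily), several threads touch the same uninitialized class at once while its static block sleeps briefly. The counter shows how many times the static block actually ran.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch; every name below is invented for this example.
final class ConcurrentInitDemo {
    static final AtomicInteger INIT_RUNS = new AtomicInteger();

    static class Slow {
        static {
            INIT_RUNS.incrementAndGet();
            try {
                Thread.sleep(100);    // simulate a time-consuming initializer
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        static void touch() { }
    }

    // Starts the given number of threads that all trigger initialization of Slow,
    // then returns how many times the static initializer ran.
    static int run(int threads) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                Slow.touch();         // blocks while another thread runs <clinit>()
            });
            workers[i].start();
        }
        start.countDown();
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return INIT_RUNS.get();
    }

    public static void main(String[] args) {
        System.out.println(run(8));   // prints 1
    }
}
```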

Note: during resolution and initialization, the virtual machine searches parent classes recursively, working up the inheritance relationship from the bottom. This behavior helps explain the initialization order of parent classes and subclasses (another factor is the memory layout and allocation policy of the Java HotSpot virtual machine).
