How to understand the java virtual machine execution subsystem

2025-01-19 Update From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01

This article explains how to understand the Java virtual machine execution subsystem. It covers the Class file structure, the class loading mechanism, and the bytecode execution engine.

Class file structure

Bytecode (ByteCode) is the program storage format used uniformly by the virtual machines of all platforms, and it is the cornerstone of platform independence.

Figure: Language independence provided by the Java virtual machine

Virtual machine class loading mechanism

The virtual machine loads the data that describes a class from the Class file into memory, then verifies, converts, resolves, and initializes that data, finally producing a Java type that the virtual machine can use directly. This is the class loading mechanism of the virtual machine.

The timing of class loading

In the Java language, types are loaded, linked, and initialized while the program is running. Although this strategy adds a small performance overhead when a class is loaded, it gives Java applications a high degree of flexibility. The language features that make Java inherently extensible rely on runtime dynamic loading and dynamic linking. For example, a program written against an interface can wait until run time to specify the concrete implementation class, and with Java's predefined and custom class loaders a local application can load a binary stream from the network or elsewhere at run time as part of its program code. This way of assembling applications is now widely used in Java programs: from the most basic JSP to the relatively complex OSGi technology, all rely on the runtime class loading of the Java language.
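
The idea of choosing an implementation class only at run time can be sketched as follows. This is a minimal illustration, not code from the article: the class name RuntimeImplementationDemo and the helper newList are hypothetical, and java.util.LinkedList is just a convenient implementation that every JDK ships.

```java
import java.util.List;

class RuntimeImplementationDemo {
    /** Creates a List implementation whose class name is only known at run time. */
    @SuppressWarnings("unchecked")
    static List<String> newList(String className) {
        try {
            // Class.forName loads, links, and initializes the named class on demand.
            Class<?> type = Class.forName(className);
            return (List<String>) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("cannot create " + className, e);
        }
    }

    public static void main(String[] args) {
        // The implementation is picked from the command line, not at compile time.
        String name = args.length > 0 ? args[0] : "java.util.LinkedList";
        List<String> list = newList(name);
        list.add("loaded at run time");
        System.out.println(list.getClass().getName() + " " + list);
    }
}
```

The calling code depends only on the List interface; which class is loaded is decided by a string supplied when the program runs.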

The whole life cycle of a class, from being loaded into virtual machine memory to being unloaded from it, includes seven stages: Loading, Verification, Preparation, Resolution, Initialization, Using, and Unloading. Verification, preparation, and resolution are collectively referred to as Linking. The sequence of these seven stages is shown in the figure:

Figure: The life cycle of a class

The order of the five stages of loading, verification, preparation, initialization, and unloading is fixed, and class loading must begin these stages in that order. The resolution phase is the exception: in some cases it can start after the initialization phase, in order to support the runtime binding (also known as dynamic or late binding) of the Java language.

When must the first phase of class loading, loading, begin? The Java virtual machine specification imposes no mandatory constraint, so this is left to the specific virtual machine implementation. For the initialization phase, however, the specification strictly stipulates that there are exactly five cases in which a class must be initialized immediately (loading, verification, and preparation naturally have to start before that):

1) When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is encountered and the class has not been initialized, its initialization must be triggered first. The most common Java code that produces these instructions is instantiating an object with the new keyword, reading or setting a static field of a class (except a static field that is declared final and whose value was placed in the constant pool at compile time), and calling a static method of a class.

2) When a reflective call is made on a class using the methods of the java.lang.reflect package and the class has not been initialized, its initialization must be triggered first.

3) When a class is being initialized and its parent class has not yet been initialized, the initialization of the parent class must be triggered first.

4) When the virtual machine starts, the user specifies a main class to execute (the class that contains the main() method), and the virtual machine initializes this main class first.

5) When the dynamic language support of JDK 1.7 is used, if a java.lang.invoke.MethodHandle instance finally resolves to a method handle of kind REF_getStatic, REF_putStatic, or REF_invokeStatic, and the class corresponding to that method handle has not been initialized, its initialization must be triggered first.
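
The first and third cases can be observed with a small experiment. The sketch below is illustrative only: the classes Parent and Child and the LOG list are hypothetical names, and the behavior shown assumes a standard javac and HotSpot-style virtual machine. Each static initializer records its own execution, so the log shows which accesses actually trigger initialization.

```java
import java.util.ArrayList;
import java.util.List;

class InitTriggerDemo {
    // Records the order in which the static initializers below run.
    static final List<String> LOG = new ArrayList<>();
    private static List<String> cached;

    static class Parent {
        static int parentField = 1;
        static { LOG.add("Parent"); }
    }

    static class Child extends Parent {
        static final String CONSTANT = "hello"; // compile-time constant, inlined by javac
        static int childField = 2;
        static { LOG.add("Child"); }
    }

    /** Performs three accesses and snapshots the log after each one (computed once). */
    static synchronized List<String> observe() {
        if (cached == null) {
            List<String> seen = new ArrayList<>();
            String c = Child.CONSTANT;      // no getstatic is emitted, nothing is initialized
            seen.add("constant -> " + LOG);
            int p = Child.parentField;      // field is declared in Parent: only Parent initializes
            seen.add("parent field via Child -> " + LOG);
            int f = Child.childField;       // getstatic on Child's own field: Child initializes
            seen.add("child field -> " + LOG);
            cached = seen;
        }
        return cached;
    }

    public static void main(String[] args) {
        observe().forEach(System.out::println);
    }
}
```

Reading the constant leaves the log empty, reading the inherited field initializes only the declaring class, and reading the child's own static field finally initializes Child.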

Class loading process

Loading

Loading is one stage of the class loading process. During the loading phase, the virtual machine needs to do the following three things:

1) Obtain the binary byte stream that defines the class through the fully qualified name of the class.

2) Convert the static storage structure represented by this byte stream into the run-time data structure of the method area.

3) Generate a java.lang.Class object representing the class in memory, as the access entry for the various data of this class in the method area.

The binary byte stream that represents a type, located through the type's fully qualified name, commonly comes from one of the following sources:

1) Read from a ZIP archive, which became the basis of the later JAR, EAR, and WAR formats.

2) Obtained from the network. The most typical application of this scenario is the Applet.

3) Generated by computation at run time. The most common use of this scenario is dynamic proxy technology.

4) Generated from other files, for example the Class files generated from JSP files.
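
The third source, run-time generation, can be illustrated with the JDK's own dynamic proxy support. This is a minimal sketch: the Greeter interface and the helper newLoggingGreeter are hypothetical, while Proxy.newProxyInstance and InvocationHandler are the standard java.lang.reflect API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class DynamicProxyDemo {
    interface Greeter {
        String greet(String name);
    }

    /** Returns a Greeter whose implementing class is generated by the JDK at run time. */
    static Greeter newLoggingGreeter(StringBuilder log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.append(method.getName());  // record which interface method was called
            return "hello, " + args[0];
        };
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(), new Class<?>[] {Greeter.class}, handler);
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        Greeter greeter = newLoggingGreeter(log);
        System.out.println(greeter.greet("JVM"));
        // No source file or Class file on disk exists for the proxy class.
        System.out.println(Proxy.isProxyClass(greeter.getClass()));
    }
}
```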

Compared with the other stages of class loading, the loading phase of a non-array class (array classes are special and are created directly by the virtual machine), or more precisely the action of obtaining the class's binary byte stream during loading, is the one developers can control most. The loading phase can be carried out either by the bootstrap class loader provided by the system or by a user-defined class loader, and developers can control how the byte stream is obtained by defining their own class loader (that is, by overriding a class loader's loadClass() method).

After the loading phase completes, the binary byte stream from outside the virtual machine is stored in the method area in the format the virtual machine requires. The data storage format of the method area is defined by the virtual machine implementation; the virtual machine specification does not prescribe a specific data structure for this area. The virtual machine then instantiates an object of the java.lang.Class class in memory. The specification does not explicitly require this object to live in the Java heap; for the HotSpot virtual machine, the Class object is special in that, although it is an object, it is stored in the method area. This object serves as the external interface through which the program accesses the type data in the method area.
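
A user-defined class loader that controls where the byte stream comes from might look like the sketch below. Several things here are assumptions made for illustration: the names InMemoryLoader and GeneratedAtRuntime are hypothetical, the class bytes are assembled by hand only so that the example needs no external file, and the sketch overrides findClass(), the hook that the default loadClass() calls after its parent fails, rather than loadClass() itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class InMemoryClassLoaderDemo {
    /** Builds the Class file bytes of a minimal public class with no fields or methods. */
    static byte[] emptyClassBytes(String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);                            // magic number
            out.writeShort(0);                                   // minor version
            out.writeShort(52);                                  // major version 52 (Java 8)
            out.writeShort(5);                                   // constant pool count (entries + 1)
            out.writeByte(7); out.writeShort(2);                 // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(name);                // #2 Utf8: this class name
            out.writeByte(7); out.writeShort(4);                 // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 Utf8: super class name
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                   // this_class = #1
            out.writeShort(3);                                   // super_class = #3
            out.writeShort(0);                                   // interfaces count
            out.writeShort(0);                                   // fields count
            out.writeShort(0);                                   // methods count
            out.writeShort(0);                                   // attributes count
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical loader that serves exactly one class from memory. */
    static class InMemoryLoader extends ClassLoader {
        private final String className;

        InMemoryLoader(String className) {
            super(InMemoryLoader.class.getClassLoader());
            this.className = className;
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            if (!name.equals(className)) {
                throw new ClassNotFoundException(name);
            }
            byte[] b = emptyClassBytes(name);         // step 1: obtain the byte stream
            return defineClass(name, b, 0, b.length); // steps 2 and 3 are done by the VM
        }
    }

    static Class<?> load(String name) {
        try {
            return new InMemoryLoader(name).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> type = load("GeneratedAtRuntime");
        System.out.println(type.getName() + " loaded by " + type.getClassLoader().getClass().getName());
    }
}
```

Only the first of the three loading steps is under the loader's control here; converting the bytes into method-area structures and creating the java.lang.Class object happen inside defineClass().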

Verification

Verification is the first step in the linking phase, and the main purpose of this step is to ensure that the information contained in the byte stream of the class file meets the requirements of the current virtual machine and does not compromise the security of the virtual machine itself.

The verification phase mainly consists of four checks: file format verification, metadata verification, bytecode verification, and symbolic reference verification.

1. File format verification

Verify that the byte stream conforms to the Class file format specification, for example whether it begins with the magic number 0xCAFEBABE and whether the major and minor version numbers are within the range that the current virtual machine can handle.

2. Metadata verification

In this stage the information described by the bytecode is analyzed semantically to ensure that it conforms to the Java language specification. Verification points may include whether the class has a parent class (every class except java.lang.Object should have one), whether the class extends a class that must not be extended (one declared final), and, if the class is not abstract, whether it implements all the methods required by its parent class or interfaces.

3. Bytecode verification

This stage analyzes data flow and control flow. It verifies the method bodies of the class, and its task is to ensure that the methods of the verified class do nothing at run time that endangers the security of the virtual machine. For example, it ensures that the type conversions in a method body are valid (a subclass object may be assigned to a variable of the parent class type, but a parent class object may not be assigned to a variable of the subclass type) and that jump instructions never jump to a bytecode instruction outside the method body.

4. Symbolic reference verification

Perform matching checks on information outside the class itself (the various symbolic references in the constant pool).
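
The two header checks mentioned above can be reproduced by hand. The following sketch reads the first eight bytes of a Class file from the class path; the class name ClassFileHeader and its read helper are hypothetical, and the example assumes the JDK's own java/lang/Object.class is reachable through ClassLoader.getSystemResourceAsStream.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {
    final int magic;
    final int minor;
    final int major;

    ClassFileHeader(int magic, int minor, int major) {
        this.magic = magic;
        this.minor = minor;
        this.major = major;
    }

    /** Reads the header of a class given in internal form, for example "java/lang/Object". */
    static ClassFileHeader read(String internalName) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(internalName + ".class")) {
            if (in == null) {
                throw new IllegalArgumentException("class file not found: " + internalName);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();            // u4 magic
            int minor = data.readUnsignedShort();  // u2 minor_version
            int major = data.readUnsignedShort();  // u2 major_version
            return new ClassFileHeader(magic, minor, major);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The magic-number check performed during file format verification. */
    boolean hasValidMagic() {
        return magic == 0xCAFEBABE;
    }

    public static void main(String[] args) {
        ClassFileHeader header = read("java/lang/Object");
        System.out.println(String.format("magic=0x%08X major=%d minor=%d",
                header.magic, header.major, header.minor));
    }
}
```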

Preparation

The preparation phase formally allocates memory for class variables and sets their initial values; all of this memory is allocated in the method area. Two points in this stage are easy to confuse. First, memory is allocated only for class variables (variables declared static), not for instance variables, which are allocated in the Java heap together with the object when the object is instantiated. Second, the initial value mentioned here is "usually" the zero value of the data type. Suppose a class variable is defined as:

public static int value = 12;

Then after the preparation phase the variable value has an initial value of 0 rather than 12, because no Java method has been executed yet. The putstatic instruction that assigns 12 to value is compiled into the class constructor <clinit>() method, so the assignment of 12 is not executed until the initialization phase.

That is the "usual case". There are also special cases: if the field attribute table of a class field contains a ConstantValue attribute, the variable is initialized in the preparation phase to the value specified by ConstantValue. Suppose the class variable value above is instead defined as:

public static final int value = 123;

At compile time javac generates a ConstantValue attribute for value, and in the preparation phase the virtual machine sets value to 123 according to that attribute.
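
The zero value left by the preparation phase can be made visible by reading a static field before its own initializer has run. The sketch below is an illustration under stated assumptions: the class PreparationDemo and its field names are hypothetical, and the constant case is observed through javac's inlining of compile-time constants rather than by inspecting the ConstantValue attribute directly.

```java
class PreparationDemo {
    // These three initializers run first inside <clinit>, before the fields below are assigned.
    static final int earlyValue = readValue();        // sees the zero value
    static final int earlyConstant = readConstant();  // sees 123
    static final int earlyComputed = readComputed();  // sees the zero value

    static int value = 12;                                // assigned by putstatic in <clinit>
    static final int CONSTANT = 123;                      // compile-time constant
    static final int COMPUTED = Integer.parseInt("123");  // final, but not a compile-time constant

    static int readValue() { return value; }
    static int readConstant() { return CONSTANT; }
    static int readComputed() { return COMPUTED; }

    public static void main(String[] args) {
        System.out.println("value seen early: " + earlyValue + ", now: " + value);
        System.out.println("constant seen early: " + earlyConstant);
        System.out.println("computed seen early: " + earlyComputed + ", now: " + COMPUTED);
    }
}
```

Declaring a field final is not enough on its own: COMPUTED has no constant initializer, so it is assigned in <clinit> like an ordinary static field and is still zero when it is read early.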

Resolution

The resolution phase is the process in which the virtual machine replaces the symbolic references in the constant pool with direct references.

Symbolic reference: a symbolic reference uses a set of symbols to describe the referenced target. The symbols can be literals of any form, as long as they locate the target unambiguously. Symbolic references are independent of the memory layout of the virtual machine implementation, and the referenced target does not have to be loaded into memory yet.

Direct reference: a direct reference can be a pointer directly to the target, a relative offset, or a handle that locates the target indirectly. Direct references depend on the memory layout of the virtual machine implementation, so the direct references that different virtual machine instances produce from the same symbolic reference are generally not the same. If a direct reference exists, the target of the reference must already be present in memory.

Initialization

The initialization phase is the last step of the class loading process. In the preparation phase the class variables were given the initial values the system requires; in the initialization phase, class variables and other resources are initialized according to the plan the programmer expressed in the program code. Put another way, the initialization phase is the process of executing the class constructor <clinit>() method. As described earlier for the timing of class loading, initialization is triggered in cases such as the following four:

1. When one of the four bytecode instructions new, getstatic, putstatic, or invokestatic is encountered and the class has not been initialized, it must be initialized first. The most common Java code that produces these instructions is instantiating an object with the new keyword, reading or setting a static field of a class (except a static field that is declared final and whose value the compiler has already placed in the constant pool), and calling a static method of the class.

2. When a reflective call is made on a class using the methods of the java.lang.reflect package.

3. When a class is being initialized and its parent class has not been initialized, the parent class must be initialized first.

4. When the JVM starts, the user specifies a main class to execute (the class that contains the main method), and the virtual machine initializes this class first.

Returning to the preparation example, public static int value = 12; leaves value at 0 when the preparation phase completes. The class constructor <clinit>() method is invoked in the initialization phase, and value is 12 once that phase completes.
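
Two properties of the class constructor can be checked with a few lines: static assignments and static blocks run in source order, and the parent's class constructor runs before the child's. The class names below are hypothetical and the sketch only illustrates these two points.

```java
class ClinitOrderDemo {
    static class Parent {
        static int a = 1;
        static {
            a = 2;  // runs after the assignment above, in source order
        }
    }

    static class Sub extends Parent {
        // Parent is fully initialized before this assignment executes.
        static int b = a;
    }

    static int subB() {
        return Sub.b;  // getstatic on Sub triggers Parent's, then Sub's, initialization
    }

    public static void main(String[] args) {
        System.out.println(subB());  // prints 2
    }
}
```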

Class loader

The virtual machine design team deliberately placed the action "obtain the binary byte stream that describes a class through its fully qualified name" outside the Java virtual machine, so that an application can decide for itself how to obtain the classes it needs. The code module that implements this action is called a class loader.

Class and class loader

For any class, its uniqueness within the JVM is established jointly by the class itself and the class loader that loads it. That is, two classes are equal only if they come from the same Class file and are loaded by the same class loader. If the same Class file is loaded by different class loaders, the resulting classes are different, which affects type checks such as instanceof on their objects.
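
The effect of the loader on class identity can be demonstrated by defining the same bytes through two loader instances. Everything named here is hypothetical (DefiningLoader, the class name Twin), and the Class file bytes are assembled by hand only to keep the sketch self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ClassIdentityDemo {
    /** Builds the Class file bytes of a minimal public class with no fields or methods. */
    static byte[] emptyClassBytes(String name) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);                            // magic number
            out.writeShort(0);                                   // minor version
            out.writeShort(52);                                  // major version 52 (Java 8)
            out.writeShort(5);                                   // constant pool count
            out.writeByte(7); out.writeShort(2);                 // #1 Class, name at #2
            out.writeByte(1); out.writeUTF(name);                // #2 Utf8: this class name
            out.writeByte(7); out.writeShort(4);                 // #3 Class, name at #4
            out.writeByte(1); out.writeUTF("java/lang/Object");  // #4 Utf8: super class name
            out.writeShort(0x0021);                              // ACC_PUBLIC | ACC_SUPER
            out.writeShort(1);                                   // this_class
            out.writeShort(3);                                   // super_class
            out.writeShort(0);                                   // interfaces
            out.writeShort(0);                                   // fields
            out.writeShort(0);                                   // methods
            out.writeShort(0);                                   // attributes
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Hypothetical loader that defines any class its parent cannot find. */
    static class DefiningLoader extends ClassLoader {
        @Override
        protected Class<?> findClass(String name) {
            byte[] b = emptyClassBytes(name);
            return defineClass(name, b, 0, b.length);
        }
    }

    /** Loads the named class through a brand-new loader instance. */
    static Class<?> loadWithNewLoader(String name) {
        try {
            return new DefiningLoader().loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> first = loadWithNewLoader("Twin");
        Class<?> second = loadWithNewLoader("Twin");
        System.out.println(first.getName().equals(second.getName()));  // same name
        System.out.println(first == second);                           // different classes
        System.out.println(first.isAssignableFrom(second));            // not type-compatible
    }
}
```

Both Class objects are built from identical bytes and carry the same name, yet the virtual machine treats them as unrelated types because their defining loaders differ.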

Parent delegation model

From the perspective of the virtual machine, there are only two kinds of class loaders. One is the bootstrap class loader (Bootstrap ClassLoader), which is implemented in C++ and is part of the virtual machine itself. The other kind is every other class loader; these are implemented in Java, exist outside the virtual machine, and all inherit from the abstract class java.lang.ClassLoader.

From the perspective of Java developers, most Java programs use the following three class loaders provided by the system:

1) Bootstrap class loader (Bootstrap ClassLoader): responsible for loading into virtual machine memory the class libraries that are stored in the %JAVA_HOME%\lib directory or in the path specified by the -Xbootclasspath parameter and that the virtual machine recognizes. Libraries are recognized by file name only, such as rt.jar; a library whose name does not match is not loaded even if it is placed in that directory. The bootstrap class loader cannot be referenced directly by Java programs.

2) Extension class loader (Extension ClassLoader): implemented by sun.misc.Launcher$ExtClassLoader, it is responsible for loading all class libraries in the %JAVA_HOME%\lib\ext directory or in the path specified by the java.ext.dirs system variable. Developers can use the extension class loader directly.

3) Application class loader (Application ClassLoader): implemented by sun.misc.Launcher$AppClassLoader, it is responsible for loading the class libraries specified on the user class path (classpath). Because it is the return value of the getSystemClassLoader() method of ClassLoader, it is generally known as the system class loader. Developers can use the application class loader directly. If the program defines no class loader of its own, this is the default class loader of the program.

Our applications are loaded by these three class loaders working together. In addition, developers can add their own:

4) Custom class loader (must extend ClassLoader).

Figure: The class loader parent delegation model

If a class loader receives a class loading request, it does not try to load the class itself first. Instead it delegates the request to its parent class loader, and the same holds for the loader at every level, so every request is eventually passed up to the top-level bootstrap class loader. Only when the parent loader reports that it cannot complete the request does the child loader try to load the class itself. The parent delegation model is very important for the stable operation of Java programs. For example, if you write a Java class with the same name as an existing class in the rt.jar class library, you will find that it compiles normally but can never be loaded and run.
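
The delegation order described above can be written out explicitly. The loader below is a simplified sketch of that logic, not the JDK's actual implementation: TracingLoader and its findClassCalls counter are hypothetical, and the parent is assumed to be the system class loader so that getParent() is never null.

```java
class DelegationDemo {
    /** Hypothetical loader that spells out parent-first delegation and counts fallbacks. */
    static class TracingLoader extends ClassLoader {
        int findClassCalls = 0;

        TracingLoader() {
            super(ClassLoader.getSystemClassLoader());
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);        // 1. already loaded by this loader?
                if (c == null) {
                    try {
                        c = getParent().loadClass(name);   // 2. delegate to the parent first
                    } catch (ClassNotFoundException parentFailed) {
                        c = findClass(name);               // 3. only then try it ourselves
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            findClassCalls++;  // this loader has no classes of its own in this sketch
            throw new ClassNotFoundException(name);
        }
    }

    /** True if java.lang.String is served by the parent chain without calling findClass. */
    static boolean stringComesFromParent() {
        TracingLoader loader = new TracingLoader();
        try {
            return loader.loadClass("java.lang.String") == String.class
                    && loader.findClassCalls == 0;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** Number of findClass calls made when no loader in the chain knows the type. */
    static int findClassCallsForMissingType() {
        TracingLoader loader = new TracingLoader();
        try {
            loader.loadClass("no.such.Type");
        } catch (ClassNotFoundException expected) {
            // the parent chain and this loader both failed, as intended
        }
        return loader.findClassCalls;
    }

    public static void main(String[] args) {
        System.out.println(stringComesFromParent());
        System.out.println(findClassCallsForMissingType());
    }
}
```

Because the parent is always asked first, a request for java.lang.String is always answered by the bootstrap class loader, which is why a user class with the same name as a core class is never picked up.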

Breaking the parent delegation model

Virtual machine bytecode execution engine

The execution engine is one of the core components of the Java virtual machine. A virtual machine is a concept defined relative to a physical machine. Both can execute code; the difference is that the execution engine of a physical machine is built directly on the processor, hardware, instruction set, and operating system, while the execution engine of a virtual machine is implemented by the virtual machine itself. It can therefore define its own instruction set and execution engine architecture, and it can execute instruction set formats that the hardware does not support directly.

The Java virtual machine specification defines a conceptual model of the bytecode execution engine, and this model serves as the unified facade (Facade) of the execution engines of the various virtual machines. In different implementations, the execution engine may interpret Java code (execute it through an interpreter), compile it (generate native code through a just-in-time compiler), or do both, and it may even contain several compilers of different levels.

Method invocation

Resolution

Method invocation is not the same as method execution. The only task of the method invocation phase is to determine the version of the called method (that is, which method to call); it does not yet involve the actual execution inside the method. Compiling to a Class file does not include the linking step of traditional compilation: every method call is stored in the Class file as a symbolic reference, not as the entry address of the method in the actual runtime memory layout (the equivalent of the direct reference described earlier). This gives Java a powerful ability to extend dynamically, but it also makes Java method invocation more complicated, because the direct reference of the target method can only be determined during class loading or even at run time.

During the resolution phase of class loading, some of these symbolic references are converted to direct references. The precondition is that the method has a determinable version before the program actually runs and that this version cannot change at run time. In other words, the call target must be fixed once the program code is written and compiled. Calls to such methods are called resolution (Resolution).

The methods in the Java language that are "knowable at compile time, immutable at run time" are mainly static methods and private methods. The former are bound directly to the type, and the latter cannot be accessed from outside. Neither kind can be overridden with another version through inheritance or any other means, so both are suitable for resolution during class loading.

Correspondingly, the Java virtual machine provides five bytecode instructions for method invocation:

invokestatic: calls static methods.

invokespecial: calls instance constructor <init> methods, private methods, and parent class methods.

invokevirtual: calls all virtual methods.

invokeinterface: calls interface methods; the object that implements the interface is determined at run time.

invokedynamic: the method referenced by the call site qualifier is resolved dynamically at run time, and then that method is executed.

Any method that can be called by invokestatic or invokespecial has a unique calling version that can be determined in the resolution phase. Four kinds of methods meet this condition: static methods, private methods, instance constructors, and parent class methods. Their symbolic references are resolved to direct references when the class is loaded. These methods are called non-virtual methods. Methods declared final are also non-virtual, because they cannot be overridden, even though they are called with invokevirtual. All other methods are called virtual methods.

A resolution call is always a static process: it is fully determined at compile time, and all the symbolic references involved are converted to determinable direct references during the resolution phase of class loading rather than being deferred to run time. This is completely different from dispatch, which is discussed next.
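
The following sketch shows ordinary Java code for each kind of call, with comments naming the instruction javac typically emits. All class and method names are hypothetical. The comments are a guide rather than a guarantee: the exact bytecode depends on the compiler version (for instance, javac targeting Java 11 or later may emit invokevirtual for private methods), and it can be checked with javap -c.

```java
import java.util.function.Supplier;

class InvokeInstructionsDemo {
    interface Shape {
        String name();
    }

    static class Base {
        String describe() {
            return "base";
        }
    }

    static class Circle extends Base implements Shape {
        @Override
        public String name() {
            return "circle";
        }

        @Override
        String describe() {
            return super.describe() + "+circle";  // parent method call: invokespecial
        }
    }

    static String label(String text) {
        return "[" + text + "]";
    }

    static String run() {
        Circle circle = new Circle();            // new, then invokespecial on Circle.<init>
        Shape shape = circle;
        Supplier<String> lazy = () -> "lambda";  // invokedynamic creates the Supplier
        return label(                            // invokestatic
                circle.describe()                // invokevirtual
                + "," + shape.name()             // invokeinterface
                + "," + lazy.get());             // invokeinterface on Supplier
    }

    public static void main(String[] args) {
        System.out.println(run());  // prints [base+circle,circle,lambda]
    }
}
```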

Dispatch

As an object-oriented programming language, Java has the three basic characteristics of object orientation: inheritance, encapsulation, and polymorphism. The following explains how some of the most basic manifestations of polymorphism, such as overriding and overloading, are implemented in the Java virtual machine.

Static dispatch

Any dispatch that relies on the static type to locate the version of the method to execute is called static dispatch; its typical application is method overloading. When choosing among overloads, the virtual machine (more precisely, the compiler) decides by the static type of the argument rather than its actual type. The static type is known at compile time, so the javac compiler chooses the overloaded version at compile time based on the static types of the arguments.
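
A short example makes the rule concrete. The class names Human, Man, and Woman are hypothetical; the point is that the overload is chosen from the declared type of the variable, not from the object it refers to.

```java
class StaticDispatchDemo {
    static abstract class Human {}
    static class Man extends Human {}
    static class Woman extends Human {}

    static String sayHello(Human guy) { return "hello, guy!"; }
    static String sayHello(Man guy) { return "hello, gentleman!"; }
    static String sayHello(Woman guy) { return "hello, lady!"; }

    /** The static type of the argument is Human, so the Human overload is chosen. */
    static String greetThroughHumanReference() {
        Human man = new Man();
        return sayHello(man);
    }

    /** A cast changes the static type, and with it the overload chosen at compile time. */
    static String greetAfterCast() {
        Human man = new Man();
        return sayHello((Man) man);
    }

    public static void main(String[] args) {
        System.out.println(greetThroughHumanReference());  // prints hello, guy!
        System.out.println(greetAfterCast());              // prints hello, gentleman!
    }
}
```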

Dynamic dispatch

Dispatch that locates the version of the method to execute at run time according to the actual type is called dynamic dispatch; its typical application is method overriding (Override).
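
For contrast with overloading, here is the overriding case. Again the class names are hypothetical. The call site in greet is compiled once, yet the method that runs depends on the actual type of the receiver.

```java
class DynamicDispatchDemo {
    static abstract class Human {
        abstract String sayHello();
    }

    static class Man extends Human {
        @Override
        String sayHello() { return "man says hello"; }
    }

    static class Woman extends Human {
        @Override
        String sayHello() { return "woman says hello"; }
    }

    /** One invokevirtual call site; the target is selected by the receiver's actual type. */
    static String greet(Human human) {
        return human.sayHello();
    }

    public static void main(String[] args) {
        System.out.println(greet(new Man()));    // prints man says hello
        System.out.println(greet(new Woman()));  // prints woman says hello
    }
}
```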

Single dispatch and multiple dispatch

The receiver of a method and the parameters of the method are collectively called the cases (quantities) of the method. Depending on how many cases the dispatch is based on, dispatch is divided into single dispatch and multiple dispatch: single dispatch selects the target method based on one case, and multiple dispatch selects it based on more than one.

In static dispatch, the target method is selected based on two things: the static type of the receiver, and the types and number of the method parameters. Because the choice is based on two cases, static dispatch in the Java language is multiple dispatch.

In dynamic dispatch, the compiler has already determined the signature of the target method, so the only thing that still affects the choice is the actual type of the method's receiver. Because the choice is based on a single case, dynamic dispatch in the Java language is single dispatch.
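
Both steps can be seen in one call. In the sketch below (all names hypothetical), the compiler first fixes the signature from the static receiver type Father and the argument type, and the virtual machine then picks the implementation from the actual receiver type alone.

```java
class DispatchCasesDemo {
    static class Tea {}
    static class Coffee {}

    static class Father {
        String choose(Tea arg) { return "father chooses tea"; }
        String choose(Coffee arg) { return "father chooses coffee"; }
    }

    static class Son extends Father {
        @Override
        String choose(Tea arg) { return "son chooses tea"; }
        @Override
        String choose(Coffee arg) { return "son chooses coffee"; }
    }

    static String[] run() {
        Father father = new Father();
        Father son = new Son();
        return new String[] {
            // Compile time: receiver type Father plus argument type Coffee fix choose(Coffee).
            father.choose(new Coffee()),
            // Run time: only the actual receiver type (Son) selects the overriding version.
            son.choose(new Tea())
        };
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```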

Implementation of dynamic dispatch in the virtual machine

Dynamic dispatch is a very frequent action, and its version selection requires searching for the appropriate target method in the method metadata of the class at run time. For performance reasons, most virtual machine implementations do not actually search that often. The most common optimization is to build a virtual method table (Virtual Method Table, also known as vtable) for the class in the method area; correspondingly, an interface method table (Interface Method Table, or itable) is used when invokeinterface is executed. Table indexes are then used instead of metadata lookups to improve performance.

The virtual method table stores the actual entry address of each method. If a method is not overridden in a subclass, its entry in the subclass's virtual method table is the same as the entry for that method in the parent class and points to the parent class's implementation. If the method is overridden in the subclass, the entry in the subclass's method table is replaced with the address of the subclass's implementation. The method table is generally initialized in the linking phase of class loading: after the initial values of the class's variables have been prepared, the virtual machine also initializes the method table of the class.
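
The table layout can be modeled in a few lines of Java. This is only a toy model meant to show the idea of shared and replaced slots; it is not how HotSpot or any other virtual machine actually lays out its method tables, and every name in it is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class VTableSketch {
    // Slot numbers are fixed per method, so parent and child use the same index.
    static final int SLOT_NAME = 0;
    static final int SLOT_SOUND = 1;

    static final List<Supplier<String>> ANIMAL_TABLE = new ArrayList<>();
    static final List<Supplier<String>> DOG_TABLE;

    static {
        ANIMAL_TABLE.add(() -> "animal");    // slot 0: name()
        ANIMAL_TABLE.add(() -> "...");       // slot 1: sound()

        // The child table starts as a copy of the parent's entries.
        DOG_TABLE = new ArrayList<>(ANIMAL_TABLE);
        // Only the overridden method gets a new entry.
        DOG_TABLE.set(SLOT_SOUND, () -> "woof");
    }

    /** A call is an index into the receiver's table instead of a metadata search. */
    static String invoke(List<Supplier<String>> table, int slot) {
        return table.get(slot).get();
    }

    public static void main(String[] args) {
        System.out.println(invoke(DOG_TABLE, SLOT_NAME));   // prints animal
        System.out.println(invoke(DOG_TABLE, SLOT_SOUND));  // prints woof
    }
}
```

The entry for the method that is not overridden is the very same object in both tables, which mirrors the statement that the subclass entry points to the parent's implementation.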

That covers the main parts of the Java virtual machine execution subsystem: the Class file structure, the class loading mechanism, and the bytecode execution engine. Trying these mechanisms out in practice is a good way to consolidate them.
