
Explore the Java virtual machine stack

2025-03-28 Update From: SLTechnology News&Howtos


Preface

The runtime memory of the Java virtual machine is divided into two parts: thread-shared areas, which include the Java heap and the method area, and thread-private areas, which include the virtual machine stack, the native method stack, and the program counter, which takes up only a little memory. In this article I take a first look at the Java virtual machine stack.

Anyone familiar with Java has probably heard that the JVM is stack-based. But what exactly does this "stack" refer to? Is it the virtual machine stack? To answer this question, we need to start with the structure of the virtual machine stack.

Virtual machine stack

What is a virtual machine stack

The elements of the virtual machine stack are stack frames. When a method is called, a stack frame representing that method is pushed onto the stack; when the method returns, the frame is popped. The order of the frames in the virtual machine stack is therefore the order of the method calls.

What is a stack frame? A stack frame can be thought of as the working space of a method. It consists mainly of two parts: the local variable table, which stores the method's parameters and the local variables defined in the method, and the operand stack, which stores operands.

A Java program is compiled into bytecode instructions, which resemble assembly but differ in one important way. The operands of assembly instructions live in data segments and registers and are found through memory or register addressing, whereas the operands of Java bytecode instructions live on the operand stack. When an instruction with n operands executes, it pops n operands from the top of the stack and then pushes its result, if any, back onto the stack. So when we say that the JVM execution engine is stack-based, the "stack" in question is the operand stack.

As a simple example, compare how assembly and Java bytecode compute 1 + 2. In assembly:

    mov ax, 1   ; put 1 in register ax
    add ax, 2   ; add 2 to the contents of ax and store the result in ax

The corresponding JVM bytecode looks like this:

    iconst_1   // push the integer 1 onto the operand stack
    iconst_2   // push the integer 2 onto the operand stack
    iadd       // pop the two values at the top of the stack, add them, and push the result

Because the operand stack is just memory, bytecode instructions do not depend on the registers or machine instructions of any particular machine, which is what makes bytecode platform-independent.
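To make the push and pop behaviour concrete, here is a minimal sketch of an operand stack in Java. It is only a toy model for illustration, not how a real JVM is implemented, and the class and method names (OperandStackDemo, iconst, iadd, onePlusTwo) are made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of an operand stack; names are hypothetical.
class OperandStackDemo {
    private final Deque<Integer> stack = new ArrayDeque<>();

    // Models iconst_<i>: push a small int constant.
    void iconst(int value) {
        stack.push(value);
    }

    // Models iadd: pop two ints and push their sum.
    void iadd() {
        int right = stack.pop();
        int left = stack.pop();
        stack.push(left + right);
    }

    // Value currently at the top of the stack.
    int top() {
        return stack.peek();
    }

    // Number of values currently on the stack.
    int depth() {
        return stack.size();
    }

    // Replays iconst_1, iconst_2, iadd and returns the value left on top.
    static int onePlusTwo() {
        OperandStackDemo s = new OperandStackDemo();
        s.iconst(1);
        s.iconst(2);
        s.iadd();
        return s.top();
    }

    public static void main(String[] args) {
        System.out.println(onePlusTwo()); // prints 3
    }
}
```

Note that no register names appear anywhere: each instruction only says what to do with the top of the stack.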

Note that, apart from a few instructions such as iinc, variables in the local variable table cannot be operated on directly; to use one, it must first be loaded onto the operand stack. For example, suppose a method void foo() contains the code int a = 1 + 2; int b = a + 3; It compiles to bytecode like this:

    iconst_1   // push the integer 1 onto the operand stack
    iconst_2   // push the integer 2 onto the operand stack
    iadd       // pop the top two values, add them, and push the result; in practice the compiler folds these first three instructions into a single iconst_3
    istore_1   // pop the top of the stack into slot 1 of the local variable table, the slot for a
    iload_1    // push the value stored in slot 1 (which is 3) onto the operand stack
    iconst_3   // push the integer 3 onto the operand stack
    iadd       // pop the top two values, add them, and push the result
    istore_2   // pop the top of the stack into slot 2 of the local variable table, the slot for b
    return     // return from the method to the call site
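The interplay between the local variable table and the operand stack can be sketched by replaying the instruction sequence by hand. The FrameDemo class below is a hypothetical toy model: an int array stands in for the local variable table, and the folded form iconst_3 is used for 1 + 2.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy stack frame: a local variable table plus an operand stack.
// All names here are made up for illustration.
class FrameDemo {
    final int[] locals;
    final Deque<Integer> operands = new ArrayDeque<>();

    FrameDemo(int maxLocals) {
        locals = new int[maxLocals];
    }

    void iconst(int value) { operands.push(value); }           // push a constant
    void istore(int index) { locals[index] = operands.pop(); } // stack top -> slot
    void iload(int index)  { operands.push(locals[index]); }   // slot -> stack top

    void iadd() {                                              // pop two, push sum
        int right = operands.pop();
        int left = operands.pop();
        operands.push(left + right);
    }

    // Replays the bytecode of foo(): int a = 1 + 2; int b = a + 3;
    static FrameDemo runFoo() {
        FrameDemo frame = new FrameDemo(3); // slot 0: this (unused here), slot 1: a, slot 2: b
        frame.iconst(3);  // iconst_3 (1 + 2 already folded)
        frame.istore(1);  // istore_1: a = 3
        frame.iload(1);   // iload_1
        frame.iconst(3);  // iconst_3
        frame.iadd();     // iadd
        frame.istore(2);  // istore_2: b = 6
        return frame;
    }

    public static void main(String[] args) {
        FrameDemo frame = runFoo();
        System.out.println("a=" + frame.locals[1] + " b=" + frame.locals[2]);
    }
}
```

After the last store the operand stack is empty again, which is typical at statement boundaries.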

It is important to note that the maximum sizes of the local variable table and the operand stack are determined at compile time (they are recorded as max_locals and max_stack in the method's Code attribute) and do not change at run time. In addition, space in the local variable table can be reused: once execution has passed the end of the scope of a variable a, a newly defined local variable b can take over the slot that a occupied.
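A small method with a nested block is enough to observe slot reuse. The sketch below is a made-up example (SlotReuseDemo and sum are not from the original article); the slot numbers in the comments are what javac would typically be expected to assign, not something the code itself can verify.

```java
// Hypothetical example for inspecting slot reuse with javap.
class SlotReuseDemo {
    static int sum() {
        int total = 0;     // slot 0 (static method, so there is no this)
        {
            int a = 1;     // expected in slot 1 while a is in scope
            total += a;
        }
        int b = 2;         // a is out of scope here; javac typically reuses slot 1 for b
        total += b;
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum());
    }
}
```

To check the assumption, compile the class and run javap -v on it: the Code section of each method starts with a line of the form stack=..., locals=..., args_size=..., which gives the sizes fixed at compile time.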

The following diagram, borrowed from another author, gives an intuitive picture of the virtual machine stack (in the small labels, Stack is the virtual machine stack, Frame is a stack frame, Local variables is the local variable table, and Operand Stack is the operand stack):

Questions raised by the virtual machine stack

After reading the bytecode above, you may have some questions: What is a slot? What do those instructions mean? Why does the slot index of a not start from zero, even though a is the first variable defined?

Let's solve these problems one by one.

What is a slot?

First, what is a slot? A slot is the unit of space in the local variable table. According to the JVM specification, one slot holds a value of type boolean, byte, char, short, int, float, reference, or returnAddress, and two consecutive slots hold a 64-bit value of type long or double. The specification does not fix the size of a reference, which may be 32 or 64 bits depending on the implementation, but a reference always occupies a single slot.
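The slot rule is easy to express in code. The sketch below follows the JVM specification's rule (section 2.6.1) that only long and double take two slots and every other type, references and arrays included, takes one. The class SlotWidth and its methods are hypothetical helpers; the inputs are class-file field descriptors such as I, J, or Ljava/lang/String;.

```java
// Hypothetical helper: how many local variable slots does a type need?
class SlotWidth {
    // descriptor is a field descriptor as used in class files,
    // e.g. "I" (int), "J" (long), "D" (double), "Ljava/lang/Object;", "[J".
    static int slotsFor(String descriptor) {
        switch (descriptor.charAt(0)) {
            case 'J': // long
            case 'D': // double
                return 2;
            default:  // boolean, byte, char, short, int, float, references, arrays
                return 1;
        }
    }

    // Total slots needed for a sequence of values.
    static int totalSlots(String... descriptors) {
        int total = 0;
        for (String descriptor : descriptors) {
            total += slotsFor(descriptor);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalSlots("I", "J", "Ljava/lang/String;", "D"));
    }
}
```

An array of long ("[J") still needs only one slot, because the local variable holds a reference to the array rather than its elements.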

JVM bytecode instructions

The second question: what do those instructions mean?

Instruction format

First we need to understand the format of JVM instructions. Each instruction is identified by a one-byte opcode. For example, iconst_1 is an instruction that takes up a single byte, so there can be no more than 256 instructions; in fact, a little over 200 are currently defined. Although the opcode is one byte, an instruction can also carry its own operands. Take putstatic, which assigns a value to a static field. Which field? The opcode alone cannot say, so the field is specified by operands: the two bytes immediately after the putstatic opcode form an index into the runtime constant pool, and the entry at that index is a symbolic reference to the static field. Because the symbolic reference contains the field's basic information, such as its owning class, simple name, and descriptor, putstatic knows which field of which class to assign.
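As a rough illustration of the putstatic encoding, the sketch below reads the two-byte constant pool index that follows the opcode. The opcode value 0xb3 comes from the JVM specification; the class PutstaticDecoder, its method name, and the sample byte arrays are assumptions made up for this example.

```java
// Hypothetical decoder for the embedded operand of putstatic.
class PutstaticDecoder {
    static final int PUTSTATIC = 0xb3; // opcode of putstatic (179)

    // Returns the runtime constant pool index encoded in the two bytes
    // after the putstatic opcode located at position pc.
    static int constantPoolIndex(byte[] code, int pc) {
        if ((code[pc] & 0xff) != PUTSTATIC) {
            throw new IllegalArgumentException("no putstatic at pc " + pc);
        }
        int indexByte1 = code[pc + 1] & 0xff; // high byte, read as unsigned
        int indexByte2 = code[pc + 2] & 0xff; // low byte, read as unsigned
        return (indexByte1 << 8) | indexByte2;
    }

    public static void main(String[] args) {
        byte[] code = {(byte) 0xb3, 0x00, 0x02}; // putstatic #2
        System.out.println(constantPoolIndex(code, 0));
    }
}
```

The masking with 0xff matters because Java bytes are signed, while the index bytes are unsigned.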

An instruction can have two kinds of operands: those embedded in the instruction stream, usually the bytes immediately following the opcode byte, and those stored on the operand stack. To distinguish them, we call the former embedded operands and the latter in-stack operands. The difference is that an embedded operand is fixed at compile time and does not change at run time; like the opcode, it is stored in the Code attribute of the method in the class file. An in-stack operand is determined at run time, that is, it is produced dynamically while the program executes.

For example, putstatic has one embedded operand, the two-byte index mentioned earlier, which immediately follows the putstatic opcode, and one in-stack operand at the top of the operand stack, which is the value to assign to the static field. The type of the in-stack operand is determined by the type of the static field. If the field is of type boolean, byte, char, short, or int, the operand must be an int, which occupies one entry on the operand stack. If the field is of type float, long, or double, the operand has that same type and occupies one, two, or two entries respectively. If the field is of a reference type, the operand must also be a reference, which occupies one entry. The JVM specification does not fix the number of bytes in an entry; it only says that long and double count as two units of stack depth and every other type as one.

Here is another example. iconst_<i> denotes a family of instructions that push the int constant i onto the operand stack, where i is one of m1, 0, 1, 2, 3, 4, 5 and m1 stands for -1. Note that i is neither an embedded operand nor an in-stack operand: iconst_1, iconst_2, and iconst_3 are each a complete one-byte instruction. We can think of i as an "implicit operand", that is, the opcode itself encodes the value. If the integer falls outside the range [-1, 5], iconst_<i> cannot represent it, because a one-byte instruction cannot encode every integer. In that case the bipush instruction is used: it has a one-byte embedded operand holding the value, which is sign-extended to a 32-bit int before being pushed. Values that do not fit in one byte but do fit in two use sipush, which has a two-byte embedded operand. Anything larger requires ldc, whose one-byte embedded operand is an index into the runtime constant pool pointing to a CONSTANT_Integer_info constant (ldc_w takes a two-byte index); since the value itself lives in the constant pool, any int can be loaded this way.
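The selection rule for pushing an int constant can be sketched as a small function. It also includes sipush, the JVM instruction for values that fit in two bytes. The class IntPush and its methods are hypothetical, and the returned strings are only mnemonics for illustration; in particular, a real ldc carries a constant pool index rather than the value.

```java
// Hypothetical helper: which instruction pushes a given int constant?
class IntPush {
    static String instructionFor(int value) {
        if (value >= -1 && value <= 5) {
            // One-byte instructions with the value implied by the opcode.
            return value == -1 ? "iconst_m1" : "iconst_" + value;
        }
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "bipush " + value;   // one-byte embedded operand
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "sipush " + value;   // two-byte embedded operand
        }
        // Shown with the value for readability; the real operand is a
        // constant pool index pointing at a CONSTANT_Integer_info entry.
        return "ldc " + value;
    }

    // Sign extension as performed for bipush: the operand byte 0xff becomes -1.
    static int signExtend(int operandByte) {
        return (byte) operandByte;
    }

    public static void main(String[] args) {
        for (int v : new int[] {-1, 5, 6, 128, 40000}) {
            System.out.println(v + " -> " + instructionFor(v));
        }
    }
}
```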

Reading the instruction documentation

It is impossible to explain every instruction here, so let me show how to read the bytecode instruction reference on Oracle's official website. The address of the document is: https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-6.html

Let's take the astore instruction as an example. The documentation describes it as follows:

astore instruction

Explanation of each part:

The bold text on the first line is the name of the instruction.

Operation is what the instruction does: store a reference into a local variable.

Format is the layout of the instruction: the first byte is the opcode, named astore, and the second byte is the instruction's embedded operand, named index.

Forms gives the numeric code of the instruction in decimal (and hexadecimal): for astore it is 58 (0x3a).

Operand Stack is the state of the operand stack before and after the instruction executes: the first line shows the stack before execution, the second line shows it after, and the arrow points toward the top of the stack. Before astore executes, the top of the stack holds the object reference objectref, which is the in-stack operand of astore. After execution, objectref has been popped and stored in the local variable table.

Description describes the instruction: index is an unsigned byte that must be an index into the local variable table of the current stack frame. The value at the top of the operand stack must be of type returnAddress (the address of a bytecode instruction, as used by jsr and ret) or reference (an object reference). It is popped, and its value is stored in the slot at index index in the local variable table.

Notes contains remarks: when the finally clause of the Java language is implemented, astore is used with an operand of type returnAddress, whereas aload, the counterpart of astore that pushes a reference from the local variable table onto the stack, cannot load a value of type returnAddress, only a reference. This asymmetry between aload and astore is intentional. The astore instruction can also be combined with the wide instruction to access a local variable using a two-byte unsigned index.
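The Format and Notes entries can be turned into a small decoding sketch: a plain astore is followed by one unsigned index byte, while wide astore is followed by a two-byte unsigned index. The opcode values 0x3a (astore) and 0xc4 (wide) are from the JVM specification; the class AstoreDecoder and the sample byte arrays are assumptions for illustration.

```java
// Hypothetical decoder for the local variable index of astore.
class AstoreDecoder {
    static final int ASTORE = 0x3a; // 58
    static final int WIDE = 0xc4;   // 196, prefix that widens the index to two bytes

    // Returns the local variable index targeted by the astore at position pc,
    // handling both the plain form and the wide-prefixed form.
    static int localIndex(byte[] code, int pc) {
        int opcode = code[pc] & 0xff;
        if (opcode == ASTORE) {
            return code[pc + 1] & 0xff; // unsigned byte: 0..255
        }
        if (opcode == WIDE && (code[pc + 1] & 0xff) == ASTORE) {
            int indexByte1 = code[pc + 2] & 0xff;
            int indexByte2 = code[pc + 3] & 0xff;
            return (indexByte1 << 8) | indexByte2; // unsigned: 0..65535
        }
        throw new IllegalArgumentException("no astore at pc " + pc);
    }

    public static void main(String[] args) {
        System.out.println(localIndex(new byte[] {0x3a, 0x04}, 0));
        System.out.println(localIndex(new byte[] {(byte) 0xc4, 0x3a, 0x01, 0x00}, 0));
    }
}
```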

The first variable in the local variable table

At the Java language level, the difference between static methods and instance methods is usually described as whether they are shared by all objects. From the JVM's point of view, however, every method, static or instance, is shared by all objects; only instance variables are private to each object. For the JVM, the essential difference is whether the method must be associated with a specific object: a static method can be called through the class name and needs no object, while an instance method must be called on an object.

So how is an instance method tied to a specific object? Quite simply, the compiler passes the method's receiver to the instance method as an implicit parameter, which has a familiar name inside the method: this. An instance method can access the instance variables and other instance methods of its class precisely because it has this implicit parameter. For example, suppose a method b in class A needs to access the instance variable x. Because instance variables are private to each object, a static b would have no reference to any particular object and so could not know whose x to access; an instance b knows through the implicit parameter that the variable to access is this.x.

Why, then, can a static method not call an instance method of its own class directly? The essential reason is that it has no this reference. Calling an instance method requires passing in the implicit parameter. An instance method already holds this and can pass it on to another instance method; a static method has no this to pass as the receiver, so it cannot make the call.
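The idea that this is just a hidden first parameter can be illustrated at the source level. In the sketch below, Counter, plus, plusStatic, and ReceiverDemo are made-up names. The unbound method reference Counter::plus yields a function that takes the receiver as an explicit first argument, and plusStatic is a hand-written static equivalent.

```java
import java.util.function.BiFunction;

// Hypothetical class used to illustrate the implicit receiver.
class Counter {
    private final int x;

    Counter(int x) {
        this.x = x;
    }

    // Instance method: reaches x through the implicit parameter this.
    int plus(int delta) {
        return this.x + delta;
    }

    // Static equivalent: the receiver has to be passed explicitly.
    static int plusStatic(Counter self, int delta) {
        return self.x + delta;
    }
}

class ReceiverDemo {
    // An unbound method reference turns the receiver into the first argument.
    static int viaMethodReference(Counter counter, int delta) {
        BiFunction<Counter, Integer, Integer> plus = Counter::plus;
        return plus.apply(counter, delta);
    }

    public static void main(String[] args) {
        Counter counter = new Counter(10);
        System.out.println(counter.plus(5));
        System.out.println(Counter.plusStatic(counter, 5));
        System.out.println(viaMethodReference(counter, 5));
    }
}
```

All three calls compute the same value; they differ only in whether the receiver is passed implicitly or explicitly.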

With that in mind, the third question is easy to answer. The method we defined, void foo(), is an instance method, so the implicit parameter this, which points to a specific object, is stored in the first position of the local variable table, that is, in the slot with index 0. Because its scope runs from the beginning of the method to the end, its slot is never taken over by another variable. As a result, the variables we define in the method can only be placed after it. Note also that if the method has explicit parameters, they are stored in the local variable table immediately after this, and because their scope is also the entire method body, the local variables defined in the method can only come after the parameters. In general, the order of variables in the local variable table is: this (for an instance method) => parameters (if any) => local variables defined in the method (if any).
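The ordering rule can be sketched as a small index calculator. LocalLayout and layout are hypothetical names; the method takes name and descriptor pairs in declaration order (parameters first, then locals) and, as a simplifying assumption, ignores scope-based slot reuse. It gives long and double two slots each, as the JVM specification requires.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that assigns local variable table indices.
class LocalLayout {
    // namesAndDescriptors alternates name, descriptor, name, descriptor, ...
    // in declaration order. Assumes no slot reuse between scopes.
    static Map<String, Integer> layout(boolean isStatic, String... namesAndDescriptors) {
        Map<String, Integer> slots = new LinkedHashMap<>();
        int next = 0;
        if (!isStatic) {
            slots.put("this", next++); // the implicit receiver always gets slot 0
        }
        for (int i = 0; i + 1 < namesAndDescriptors.length; i += 2) {
            String name = namesAndDescriptors[i];
            char kind = namesAndDescriptors[i + 1].charAt(0);
            slots.put(name, next);
            next += (kind == 'J' || kind == 'D') ? 2 : 1; // long/double take two slots
        }
        return slots;
    }

    public static void main(String[] args) {
        // void foo() { int a = ...; int b = ...; }
        System.out.println(layout(false, "a", "I", "b", "I"));
    }
}
```

For the foo() example this puts this in slot 0, a in slot 1, and b in slot 2, matching the istore_1 and istore_2 instructions in the listing; for a static method the first parameter or local would start at slot 0 instead.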

Thank you for reading.

That's all for the virtual machine stack. The Java virtual machine is a complete body of knowledge, so understanding the virtual machine stack alone is not enough. This article does not cover other JVM topics in detail, such as the memory model, the runtime constant pool, and the class loading model. I hope it sparks your interest in learning about the JVM; it also serves as my personal study notes and summary. Later I may write summaries of other aspects of the JVM to share with you. My knowledge and understanding are limited, so if anything here is wrong, please point it out in the comments. Thank you for reading!
