
What is the structure of Java bytecode

2025-07-04 Update. From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

This article explains the structure of Java bytecode: the layout of a .class file, the bytecode operation set and the operand stack, and how bytecode can be enhanced with ASM.

1. Bytecode

1.1 What is bytecode?

Java can be "compiled once, run everywhere" because a JVM is provided for each operating system and platform, and because the compiler produces bytecode (.class files) in the same fixed format on every platform for the JVM to use. This shows how important bytecode is to the Java ecosystem. The name comes from how the file is read: a .class file is a sequence of bytes, usually displayed in hexadecimal, and the JVM reads it one byte (two hexadecimal digits) at a time. Source code is normally compiled into bytecode files with the javac command; Figure 1 shows the path of a .java file from compilation to execution.

Figure 1 Java running diagram

For developers, understanding bytecode gives a more accurate and intuitive grasp of the deeper mechanics of the Java language. For example, you can see directly in the bytecode how the volatile keyword takes effect. Bytecode enhancement is also widely used in Spring AOP, various ORM frameworks, and hot deployment, so understanding its principles is very helpful. Moreover, because of the JVM specification, anything that ultimately produces specification-conforming bytecode can run on the JVM. This gives other languages running on the JVM (such as Scala, Groovy, and Kotlin) room to add features that Java lacks or to implement various kinds of syntactic sugar. Once you understand bytecode, you can learn these languages by tracing them back to their source: looking at their design from the bytecode perspective makes them easier to learn.

This article focuses on bytecode enhancement. It starts from bytecode and works up layer by layer: from the JVM bytecode operation set, to the Java frameworks that manipulate bytecode, and then to the principles and applications of the frameworks we are familiar with.

1.2 Bytecode structure

Compiling a .java file with javac produces a .class file. For example, take the simple ByteCodeDemo class shown on the left side of Figure 2 below:

Figure 2 sample code (left) and the corresponding bytecode (right)

Compilation generates a ByteCodeDemo.class file. Opened directly, it is a pile of hexadecimal numbers; grouped by byte, it looks like the right side of Figure 2. As mentioned above, the JVM specification constrains bytecode, so what structure hides in this seemingly messy hexadecimal? The JVM specification requires every bytecode file to consist of ten parts in a fixed order. The overall structure is shown in Figure 3, and the ten parts are introduced one by one below:

Fig. 3 bytecode structure specified by JVM

(1) Magic number (Magic Number)

Every .class file begins with a four-byte magic number whose fixed value is 0xCAFEBABE. Because it sits at the very start of the file, the JVM can use it to decide whether the file may be a .class file, and only then continue with the later steps. Interestingly, the value was chosen by James Gosling, the father of Java, and reads as "CafeBabe" (coffee baby), while the icon of Java is a cup of coffee.

(2) Version number

The version number is the four bytes after the magic number. The first two bytes are the minor version number (Minor Version) and the last two bytes are the major version number (Major Version). In Figure 2 above, the version bytes are "00 00 00 34": the minor version is 0 and the major version is 0x34, which is 52 in decimal. According to the Oracle official website, major version 52 corresponds to Java 1.8, so the Java version that compiled the file is 1.8.0.

(3) Constant pool (Constant Pool)

The bytes immediately after the major version number are the entry to the constant pool. The constant pool stores two kinds of constants: literals and symbolic references. Literals are constant values such as those declared final in the code; symbolic references include the fully qualified names of classes and interfaces, field names and descriptors, and method names and descriptors. The constant pool as a whole has two parts, the constant pool counter and the constant pool data area, as shown in Figure 4 below.

Fig. 4 structure of constant pool

Constant pool counter (constant_pool_count): because the number of constants is not fixed, two bytes are set aside to hold the constant pool count. The first 10 bytes of the bytecode of the sample code in Figure 2 are shown in Figure 5 below. The count is 0x0024, which is 36 in decimal; since index 0 is not used, this class file contains 35 constants.

Figure 5 the first ten bytes and their meaning
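The layout of these first ten bytes can be sketched in a few lines of Java. This is a minimal illustration, not how the JVM itself parses the file: the class name ClassHeaderDemo and its Header type are made up for this example, and the input is the ten bytes quoted in the text (CA FE BA BE, 00 00, 00 34, 00 24) rather than a real file.

```java
import java.nio.ByteBuffer;

final class ClassHeaderDemo {

    /** Hypothetical holder for the fields found in the first ten bytes. */
    static final class Header {
        final int magic;
        final int minorVersion;
        final int majorVersion;
        final int constantPoolCount;

        Header(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
            this.magic = magic;
            this.minorVersion = minorVersion;
            this.majorVersion = majorVersion;
            this.constantPoolCount = constantPoolCount;
        }
    }

    static Header parse(byte[] firstTenBytes) {
        // ByteBuffer is big-endian by default, which matches the class file format
        ByteBuffer buf = ByteBuffer.wrap(firstTenBytes);
        int magic = buf.getInt();                      // 4 bytes: magic number
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = buf.getShort() & 0xFFFF;           // 2 bytes: minor version
        int major = buf.getShort() & 0xFFFF;           // 2 bytes: major version
        int count = buf.getShort() & 0xFFFF;           // 2 bytes: constant_pool_count
        return new Header(magic, minor, major, count);
    }

    public static void main(String[] args) {
        byte[] firstTen = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
            0x00, 0x00,                                         // minor version
            0x00, 0x34,                                         // major version (52)
            0x00, 0x24                                          // constant_pool_count (36)
        };
        Header h = parse(firstTen);
        // Index 0 of the constant pool is unused, so the number of constants is count - 1
        System.out.printf("magic=0x%08X minor=%d major=%d constants=%d%n",
                h.magic, h.minorVersion, h.majorVersion, h.constantPoolCount - 1);
    }
}
```

Running main prints the major version 52 and 35 constants, matching the values read off Figure 5.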

Constant pool data area: the data area consists of (constant_pool_count - 1) cp_info structures, each of which corresponds to one constant. There are 14 types of cp_info in the Java 8 class file format (shown in Figure 6 below), each with a fixed structure.

Figure 6 various types of cp_info

Take CONSTANT_Utf8_info as an example; its structure is shown on the left side of Figure 7. The first byte is the tag, whose value comes from the Tag column of the corresponding entry in Figure 6 above. For a Utf8 entry the tag is "01". The next two bytes give the length of the string in bytes (Length), and the following Length bytes hold the string data itself. The right side of Figure 7 below shows one cp_info structure extracted from the bytecode in Figure 2. Decoded, it says that the constant is a UTF-8 string with a length of one byte and the data "a".

Figure 7 structure of CONSTANT_Utf8_info (left) and example (right)
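The tag, length, data layout can be decoded by hand as a quick check. The sketch below is only an illustration: Utf8ConstantDemo and parseUtf8Info are hypothetical names, and the four input bytes (01 00 01 61) are the example entry described in the text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class Utf8ConstantDemo {

    static final int CONSTANT_UTF8_TAG = 1;

    /** Decodes a single Utf8 constant pool entry: tag (1 byte), length (2 bytes), data. */
    static String parseUtf8Info(byte[] entry) {
        ByteBuffer buf = ByteBuffer.wrap(entry);   // big-endian, like the class file
        int tag = buf.get() & 0xFF;
        if (tag != CONSTANT_UTF8_TAG) {
            throw new IllegalArgumentException("not a Utf8 entry, tag=" + tag);
        }
        int length = buf.getShort() & 0xFFFF;
        byte[] data = new byte[length];
        buf.get(data);
        // Class files use "modified UTF-8"; for plain ASCII text it is
        // byte-for-byte identical to standard UTF-8, which is all this sketch handles
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] entry = {0x01, 0x00, 0x01, 0x61};   // tag=1, length=1, data='a'
        System.out.println(parseUtf8Info(entry));
    }
}
```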

The other cp_info types are not covered in detail in this article. Their overall structure is much the same: a tag identifies the type, and the next n bytes describe the length and/or the data. To see the result up front, you can view the complete decompiled constant pool with the command javap -verbose ByteCodeDemo, as shown in Figure 8 below. The decompiled output clearly lists the type and value of each cp_info structure.

Figure 8 decompilation result of constant pool

(4) Access flags

The two bytes after the end of the constant pool describe whether the Class is a class or an interface and whether it carries modifiers such as public, abstract, and final. The JVM specification defines the access flags (Access_Flag) listed in Figure 9 below. Note that the JVM does not enumerate every combination of access flags; combinations are formed with a bitwise OR. For example, if a class is declared public final, the value of its access flags is ACC_PUBLIC | ACC_FINAL, that is, 0x0001 | 0x0010 = 0x0011.

Figure 9 access flag
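The bitwise OR combination can be tried out directly. The following is a small sketch with a made-up class name (AccessFlagsDemo); the two flag values are the ones quoted in the text, and java.lang.reflect.Modifier is used only to render the combined value as keywords.

```java
import java.lang.reflect.Modifier;

final class AccessFlagsDemo {

    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_FINAL = 0x0010;
    static final int ACC_ABSTRACT = 0x0400;

    /** Combines individual flags into one access flags value with bitwise OR. */
    static int combine(int... flags) {
        int result = 0;
        for (int flag : flags) {
            result |= flag;
        }
        return result;
    }

    /** Tests a single flag with bitwise AND. */
    static boolean has(int accessFlags, int flag) {
        return (accessFlags & flag) != 0;
    }

    public static void main(String[] args) {
        int flags = combine(ACC_PUBLIC, ACC_FINAL);
        System.out.printf("0x%04X%n", flags);             // 0x0011
        System.out.println(Modifier.toString(flags));     // public final
        System.out.println(has(flags, ACC_ABSTRACT));     // false
    }
}
```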

(5) Current class name

The two bytes after the access flags describe the fully qualified name of the current class. Their value is an index into the constant pool, where the fully qualified name of the class can be looked up.

(6) Parent class name

The two bytes after the current class name describe the fully qualified name of the parent class. As above, they hold an index into the constant pool.

(7) Interface information

After the parent class name comes a two-byte interface counter, which gives the number of interfaces directly implemented by this class. It is followed by n two-byte entries, each a constant pool index for one interface name.

(8) Field table

The field table describes the variables declared in classes and interfaces, including class-level variables and instance variables, but not local variables declared inside methods. The field table also has two parts: the first part is two bytes giving the number of fields; the second part is the details of each field (field_info). The structure of the field table is shown in the following figure:

Figure 10 field table structure

Take the field table of the bytecode in Figure 2 as an example, as shown in Figure 11 below. The field's access flags are 0x0002, which Figure 9 maps to private. Following the indexes into the constant pool in Figure 8 yields the field name "a" and the descriptor "I" (for int). Together, these uniquely identify the variable private int a declared in the class.

Figure 11 field example
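The same three pieces of information (access flags, name, type) can be read back at run time through reflection, which is a convenient way to cross-check what the field table says. This is only a sketch: FieldInfoDemo and its nested Sample class are hypothetical stand-ins for the ByteCodeDemo class, which is not reproduced here.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class FieldInfoDemo {

    /** Stand-in for a class that declares "private int a". */
    static final class Sample {
        private int a;
    }

    /** Returns the raw access flags of a declared field, e.g. 0x0002 for private. */
    static int accessFlags(Class<?> owner, String fieldName) {
        return lookup(owner, fieldName).getModifiers();
    }

    /** Renders a declared field as "modifiers type name". */
    static String describe(Class<?> owner, String fieldName) {
        Field f = lookup(owner, fieldName);
        return Modifier.toString(f.getModifiers()) + " " + f.getType().getName() + " " + f.getName();
    }

    private static Field lookup(Class<?> owner, String fieldName) {
        try {
            return owner.getDeclaredField(fieldName);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.printf("0x%04X%n", accessFlags(Sample.class, "a"));
        System.out.println(describe(Sample.class, "a"));
    }
}
```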

(9) Method table

The method table follows the field table and also has two parts: the first part is two bytes giving the number of methods; the second part is the details of each method. These details are more complex and include the method's access flags, name, descriptor, and attributes, as shown in the following figure:

Figure 12 method table structure

The method's access flags can again be looked up in Figure 9, and both the method name and the method descriptor are indexes into the constant pool, where the actual values can be found. The method attributes are more complex, so we decompile them into human-readable form with javap -verbose, as shown in Figure 13. The attributes consist of the following three parts:

"Code area": the JVM instruction opcodes that correspond to the source code. The Code area is the part that bytecode enhancement mainly operates on.

"LineNumberTable": the line number table, which maps the opcodes in the Code area to line numbers in the source code. It is used when debugging (to know how many JVM instructions must run when the source advances by one line).

"LocalVariableTable": the local variable table, which contains this and the local variables. this can be used inside every method because the JVM implicitly passes it in as the first parameter. Of course, this applies only to non-static methods.

Figure 13 the decompiled method table

(10) Attributes table

This is the last part of the bytecode file. It stores basic information about the attributes defined by the class or interface in the file.

1.3 Bytecode operation set

In Figure 13 above, the red numbers 0 to 17 in the Code area mark the opcodes that the method's source code in the .java file compiles to and that the JVM actually executes. To make them readable, the decompiled output shows the mnemonic for each hexadecimal opcode. The mapping between opcodes and mnemonics, and what each opcode does, can be found in the official Oracle documentation and looked up when needed. For example, the first mnemonic in the figure above is iconst_2, whose opcode in Figure 2 is 0x05; it pushes the int value 2 onto the operand stack. Working through the mnemonics at 0 to 17 in the same way gives the complete implementation of the add() method.

1.4 Operand stack and bytecode

The JVM instruction set is stack-based rather than register-based. A stack-based instruction set is portable across platforms (register-based instruction sets are usually tied to the hardware), but the drawback is that it needs more instructions to complete the same operation, because a stack is only a FILO structure and values have to be pushed and popped frequently. In addition, the stack lives in memory while registers are inside the CPU, so stack-based execution is considerably slower. This is a price paid for being cross-platform.
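The push-and-pop style of a stack-based instruction set can be made concrete with a toy interpreter. This is a sketch for illustration only, nowhere near a real JVM: OperandStackDemo is a made-up name, the four-instruction program is a made-up example (it is not the add() method from Figure 13), and only the opcode values themselves are the real JVM ones.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class OperandStackDemo {

    // Real JVM opcode values for the few instructions this toy supports
    static final int ICONST_1 = 0x04;
    static final int ICONST_2 = 0x05;
    static final int IADD = 0x60;
    static final int IRETURN = 0xAC;

    /** Executes a straight-line sequence of opcodes against an operand stack. */
    static int run(int[] code) {
        Deque<Integer> stack = new ArrayDeque<>();
        for (int opcode : code) {
            switch (opcode) {
                case ICONST_1:
                    stack.push(1);                    // push the int constant 1
                    break;
                case ICONST_2:
                    stack.push(2);                    // push the int constant 2
                    break;
                case IADD: {
                    int right = stack.pop();          // operands come off the stack...
                    int left = stack.pop();
                    stack.push(left + right);         // ...and the result goes back on
                    break;
                }
                case IRETURN:
                    return stack.pop();               // the return value is the top of the stack
                default:
                    throw new IllegalArgumentException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
        throw new IllegalStateException("code ended without ireturn");
    }

    public static void main(String[] args) {
        // Computing 1 + 2 takes four stack instructions
        System.out.println(run(new int[] {ICONST_1, ICONST_2, IADD, IRETURN}));
    }
}
```

Even this tiny addition needs two pushes before the add can run, which is the "more instructions for the same operation" cost described above.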

The opcodes, or operation set, mentioned above actually manipulate the JVM's operand stack. To show more intuitively how opcodes drive the operand stack, and what the constant pool and the local variable table are for, the execution of the add() method on the operand stack has been made into a GIF. As shown in Figure 14 below, only the referenced part of the constant pool is included. The animation starts at the instruction iconst_2 and ends at ireturn, which corresponds to instructions 0 to 17 of the Code area in Figure 13:

Figure 14 operand stack control diagram

1.5 Tools for viewing bytecode

Running the javap command every time you want to look at decompiled bytecode is tedious. A recommended IDEA plug-in is jclasslib; the effect is shown in Figure 15. After the code is compiled, choose "Show Bytecode With jclasslib" from the "View" menu to see the class information, constant pool, method area, and other details of the current bytecode file.

Figure 15 jclasslib view bytecode

2. Bytecode enhancement

The first chapter focused on the structure of bytecode, which lays the foundation for understanding how bytecode enhancement is implemented. Bytecode enhancement is a class of techniques that modify existing bytecode or dynamically generate entirely new bytecode files. Next, we analyze it in depth, starting with the most direct way of manipulating bytecode.

Fig. 16 bytecode enhancement technique

2.1 ASM

When bytecode needs to be manipulated by hand, you can use ASM. It can generate .class bytecode files directly, or dynamically modify a class's behavior before the class is loaded into the JVM (as shown in Figure 17 below). The application scenarios of ASM include AOP (Cglib is based on ASM), hot deployment, modifying classes in other jar packages, and so on. Naturally, working at such a low level also makes it harder to use. This article next introduces ASM's two APIs and uses ASM to implement a rough AOP. Before that, to understand the ASM workflow more quickly, readers are strongly advised to get familiar with the visitor pattern first. Put simply, the visitor pattern is mainly used to modify or operate on data whose structure is stable. As the first chapter showed, the structure of a bytecode file is fixed by the JVM, so the visitor pattern is a very good fit for modifying bytecode files.

Figure 17 ASM modified bytecode

2.1.1 ASM API

2.1.1.1 Core API

The ASM Core API can be compared to the SAX approach to parsing XML files: it processes a bytecode file as a stream, without reading the whole class structure into memory. The advantage is that it saves a lot of memory; the drawback is that it is harder to program against. For performance reasons, however, the Core API is what is generally used. It has several key classes:

ClassReader: used to read compiled .class files.

ClassWriter: used to rebuild compiled classes, such as modifying class names, properties, and methods, or to generate bytecode files for new classes.

Various Visitor classes: as mentioned above, the Core API processes bytecode from top to bottom, and there is a different Visitor for each region of the bytecode file, such as MethodVisitor for visiting methods, FieldVisitor for visiting class variables, and AnnotationVisitor for visiting annotations. To implement AOP, the one to focus on is MethodVisitor.
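The reader, visitor, writer flow can be sketched without ASM itself. Everything in the block below is a hypothetical stand-in and not part of the ASM API: InsnVisitor plays the role of a method visitor, replay plays the role of the reader that streams events, Recorder plays the role of the writer, and StartEndAdapter is a visitor placed in between that injects extra events. Instructions are represented as plain strings to keep the example small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class VisitorChainDemo {

    /** Hypothetical stand-in for a method visitor: receives events in stream order. */
    interface InsnVisitor {
        void visitCode();
        void visitInsn(String mnemonic);
    }

    /** Plays the role of the writer: records every instruction it is handed. */
    static final class Recorder implements InsnVisitor {
        final List<String> out = new ArrayList<>();

        @Override
        public void visitCode() {
            // nothing to record at the start of the method body
        }

        @Override
        public void visitInsn(String mnemonic) {
            out.add(mnemonic);
        }
    }

    /** Sits between reader and writer, forwarding events and injecting extra ones. */
    static final class StartEndAdapter implements InsnVisitor {
        private final InsnVisitor next;

        StartEndAdapter(InsnVisitor next) {
            this.next = next;
        }

        @Override
        public void visitCode() {
            next.visitCode();
            next.visitInsn("print start");        // injected at the start of the body
        }

        @Override
        public void visitInsn(String mnemonic) {
            if (mnemonic.equals("return")) {
                next.visitInsn("print end");      // injected just before the method returns
            }
            next.visitInsn(mnemonic);
        }
    }

    /** Plays the role of the reader: streams a method body to the visitor, one event at a time. */
    static void replay(List<String> body, InsnVisitor visitor) {
        visitor.visitCode();
        for (String insn : body) {
            visitor.visitInsn(insn);
        }
    }

    static List<String> enhance(List<String> body) {
        Recorder recorder = new Recorder();
        replay(body, new StartEndAdapter(recorder));
        return recorder.out;
    }

    public static void main(String[] args) {
        System.out.println(enhance(Arrays.asList("print process", "return")));
    }
}
```

The point of the design is that no stage ever holds the whole method in memory: each event is forwarded as soon as it arrives, which is what makes the streaming style cheap and also what makes it harder to program against.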

2.1.1.2 Tree API

The ASM Tree API can be compared to the DOM approach to parsing XML files: it reads the structure of the entire class into memory. The drawback is that it consumes a lot of memory, but programming against it is relatively simple. Unlike the Core API, the Tree API maps each region of the bytecode to a Node class, which is easy to understand by analogy with DOM nodes.

2.1.2 Implementing AOP directly with ASM

We use ASM's Core API to enhance a class. We do not dwell on AOP terminology such as aspects and advice here; we simply add logic before and after a method call, which keeps things easy to follow. First, define the Base class that needs to be enhanced. It contains only a process() method, which prints one line, "process". After the enhancement, we expect "start" to be printed before the method body runs and "end" after it.

public class Base {
    public void process() {
        System.out.println("process");
    }
}

To implement AOP with ASM, two classes are needed. One is the MyClassVisitor class, which visits and modifies the bytecode. The other is the Generator class, which creates the ClassReader and ClassWriter: the ClassReader reads the bytecode and hands it to MyClassVisitor for processing, and once processing is complete, the ClassWriter produces the new bytecode, which replaces the old bytecode. The Generator class is relatively simple, so we look at its implementation first, shown below, and then concentrate on the MyClassVisitor class.

import java.io.File;
import java.io.FileOutputStream;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;

public class Generator {
    public static void main(String[] args) throws Exception {
        // Read: load the bytecode of the original class
        ClassReader classReader = new ClassReader("meituan/bytecode/asm/Base");
        ClassWriter classWriter = new ClassWriter(ClassWriter.COMPUTE_MAXS);
        // Process: stream the class through MyClassVisitor into the writer
        ClassVisitor classVisitor = new MyClassVisitor(classWriter);
        classReader.accept(classVisitor, ClassReader.SKIP_DEBUG);
        byte[] data = classWriter.toByteArray();
        // Output: overwrite the compiled class file with the enhanced bytecode
        File f = new File("operation-server/target/classes/meituan/bytecode/asm/Base.class");
        try (FileOutputStream fout = new FileOutputStream(f)) {
            fout.write(data);
        }
        System.out.println("now generator cc success!");
    }
}

MyClassVisitor extends ClassVisitor and is used to visit the bytecode. It contains an inner class, MyMethodVisitor, which extends MethodVisitor and is used to visit the methods of the class. The overall code is as follows:

import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class MyClassVisitor extends ClassVisitor implements Opcodes {
    public MyClassVisitor(ClassVisitor cv) {
        super(ASM5, cv);
    }

    @Override
    public void visit(int version, int access, String name, String signature,
                      String superName, String[] interfaces) {
        cv.visit(version, access, name, signature, superName, interfaces);
    }

    @Override
    public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) {
        MethodVisitor mv = cv.visitMethod(access, name, desc, signature, exceptions);
        // The Base class has two methods: the no-argument constructor and process().
        // The constructor (named <init> in bytecode) is not enhanced here.
        if (!name.equals("<init>") && mv != null) {
            mv = new MyMethodVisitor(mv);
        }
        return mv;
    }

    class MyMethodVisitor extends MethodVisitor implements Opcodes {
        public MyMethodVisitor(MethodVisitor mv) {
            super(Opcodes.ASM5, mv);
        }

        @Override
        public void visitCode() {
            super.visitCode();
            // Emit System.out.println("start") at the beginning of the method body
            mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
            mv.visitLdcInsn("start");
            mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
        }

        @Override
        public void visitInsn(int opcode) {
            if ((opcode >= Opcodes.IRETURN && opcode
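The listing breaks off in the middle of this condition. In ASM-based code, a check that begins this way is commonly used to detect the instructions that leave a method, so that extra logic (here, printing "end") can be emitted just before them. The sketch below shows that range check in isolation. It is an assumption about where the truncated line was heading, not a reconstruction of the original code; the class name MethodExitOpcodes is made up, and the opcode values are hard-coded from the JVM instruction set so the example runs without ASM on the classpath.

```java
final class MethodExitOpcodes {

    // JVM opcode values: the six return instructions are contiguous
    static final int IRETURN = 172; // 0xAC, return an int
    static final int RETURN = 177;  // 0xB1, return void (LRETURN..ARETURN lie in between)
    static final int ATHROW = 191;  // 0xBF, throw an exception

    /** True if the opcode leaves the current method, by returning or by throwing. */
    static boolean isMethodExit(int opcode) {
        return (opcode >= IRETURN && opcode <= RETURN) || opcode == ATHROW;
    }

    public static void main(String[] args) {
        System.out.println(isMethodExit(RETURN)); // true
        System.out.println(isMethodExit(0x60));   // false: 0x60 is iadd
    }
}
```

In a visitor like MyMethodVisitor, a check of this kind would typically guard the code that emits the "end" output, followed by forwarding the original opcode to the wrapped visitor.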
