Shulou (Shulou.com), updated 2025-01-17
This article walks through the file structure of a Java Class file, first item by item and then with a worked example.
Today I am reorganizing my old notes from Evernote and sharing them as a reference for readers who are interested in the structure of Java class files.
Anyone learning Java knows that Java has promoted platform independence from the very beginning, under the slogan "write once, run everywhere". The Java platform also offers a second kind of independence: language independence. Both rest on the Class file format and its bytecode. Java has in fact had two specifications from the start: the Java Language Specification and the Java Virtual Machine Specification. The language specification defines only the rules and constraints of the Java language, while the virtual machine specification is the one designed with cross-platform execution in mind. Today we take a practical example and see what the bytes of a Class file look like. We first explain what a Class file is made of in general, and then analyze the file compiled from an actual Java class.
Before we continue, we need to be clear about the following points:
1) A Class file is a stream of 8-bit bytes. The data items are laid out strictly in the prescribed order, with no gaps between them. Items larger than one byte are stored in big-endian order, that is, the high-order byte is stored at the lower address and the low-order byte at the higher address. This is part of what makes Class files portable: PowerPC processors use big-endian byte order while x86 processors use little-endian, so the virtual machine specification has to fix one byte order for Class files on every processor architecture.
2) The Class file format stores data in a structure similar to a C struct. There are two kinds of data items: unsigned numbers and tables. Unsigned numbers express numbers, index references, and strings; u1, u2, u4, and u8 denote unsigned numbers of 1, 2, 4, and 8 bytes respectively. A table is a composite structure made up of unsigned numbers and other tables. If the difference between unsigned numbers and tables is not yet clear, do not worry; the example below makes it concrete.
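The two points above can be sketched in a few lines of Java. This is only an illustration: the class name BigEndian and its methods u1, u2, and u4 are made up for this article and are not part of any JDK API.

```java
// Minimal sketch: reading u1, u2 and u4 items from a byte array in big-endian order.
// The class and method names are hypothetical helpers for illustration only.
final class BigEndian {
    private BigEndian() {}

    /** Reads an unsigned 1-byte value. */
    static int u1(byte[] data, int offset) {
        return data[offset] & 0xFF; // the mask removes Java's sign extension
    }

    /** Reads an unsigned 2-byte value; the high-order byte comes first. */
    static int u2(byte[] data, int offset) {
        return (u1(data, offset) << 8) | u1(data, offset + 1);
    }

    /** Reads an unsigned 4-byte value into a long so that it never becomes negative. */
    static long u4(byte[] data, int offset) {
        return ((long) u2(data, offset) << 16) | u2(data, offset + 2);
    }

    public static void main(String[] args) {
        byte[] magic = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        System.out.println(Long.toHexString(u4(magic, 0))); // prints cafebabe
    }
}
```

Because the byte order is fixed by the specification, the same shifts give the same result on x86 and on PowerPC.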
With the above two points clear, let's take a look at what data is contained in the strictly ordered byte streams in the Class file:
(the image above is from The Java Virtual Machine Specification Java SE 7 Edition)
When looking at the figure above, there is one thing to keep in mind. Take cp_info, which represents the constant pool. The figure writes constant_pool[constant_pool_count-1] to say that the constant pool holds constant_pool_count-1 constants. It is written as an array, but do not assume that every constant in the pool has the same length. The array notation is only a convenient way to describe the layout; unlike an int array in a programming language, where every element has the same size, the entries here vary in length. With that settled, let's go back and see what each item in the figure above represents.
1) u4 magic is the magic number, which takes up 4 bytes. What is it for? It identifies the file as a Class file rather than, say, a JPG image or an AVI movie. The magic number of a Class file is 0xCAFEBABE.
2) u2 minor_version is the minor version number of the Class file, an unsigned number of type u2.
3) u2 major_version is the major version number of the Class file, also an unsigned number of type u2. Together, major_version and minor_version tell a virtual machine whether it can accept this Class file. Different versions of the Java compiler produce Class files with different version numbers. A newer virtual machine supports Class files produced by older compilers, but not the other way around: the Java SE 6.0 virtual machine accepts Class files compiled by the Java SE 5.0 compiler, while the Java SE 5.0 virtual machine rejects Class files compiled for Java SE 6.0.
4) u2 constant_pool_count is the constant pool count. Here we need to look at what the constant pool is, and it should not be confused with the runtime constant pool in the JVM memory model. The constant pool of a Class file mainly stores literals and symbolic references. Literals include strings, the values of final constants, and the initial values of fields; symbolic references include the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods. Names are easy to understand; descriptors are explained later with the field table and the method table. As for the JVM memory model, we all know that it has a heap, stacks, a method area, and program counters, and that part of the method area is the runtime constant pool. The runtime constant pool holds the literals and symbolic references generated by the compiler, but it is dynamic: other constants can be added to it at run time. The best-known example is String's intern method.
5) cp_info is the constant pool itself, which holds the literals and symbolic references mentioned above. The Java Virtual Machine Specification Java SE 7 Edition defines 14 kinds of constant. Each constant is a table, and each begins with a common u1 tag that identifies its kind.
The list below briefly describes the 11 kinds that existed before Java SE 7 (the other three, CONSTANT_MethodHandle_info, CONSTANT_MethodType_info, and CONSTANT_InvokeDynamic_info, have tags 15, 16, and 18). The details are filled in by the examples later.
CONSTANT_Utf8_info: tag 1, a UTF-8 encoded string
CONSTANT_Integer_info: tag 3, an int literal
CONSTANT_Float_info: tag 4, a float literal
CONSTANT_Long_info: tag 5, a long literal
CONSTANT_Double_info: tag 6, a double literal
CONSTANT_Class_info: tag 7, a symbolic reference to a class or interface
CONSTANT_String_info: tag 8, a string literal
CONSTANT_Fieldref_info: tag 9, a symbolic reference to a field
CONSTANT_Methodref_info: tag 10, a symbolic reference to a method of a class
CONSTANT_InterfaceMethodref_info: tag 11, a symbolic reference to a method of an interface
CONSTANT_NameAndType_info: tag 12, the name and descriptor of a field or method
6) u2 access_flags holds the access information of the class or interface, as shown in the following figure:
7) u2 this_class is the constant pool index of this class; it points to a CONSTANT_Class_info constant in the constant pool.
8) u2 super_class is the constant pool index of the superclass; it also points to a CONSTANT_Class_info constant in the constant pool.
9) u2 interfaces_count is the number of interfaces.
10) u2 interfaces[interfaces_count] is the interface table; each item is the index of a CONSTANT_Class_info constant in the constant pool.
11) u2 fields_count is the number of fields of the class, counting both instance variables and class variables.
12) field_info fields[fields_count] is the field table, whose structure is shown in the following figure:
In the figure above, access_flags holds the access flags of the field, such as whether it is public, private, or protected. name_index gives the field name and points to a CONSTANT_Utf8_info constant in the constant pool; descriptor_index gives the field descriptor and also points to a CONSTANT_Utf8_info constant. attributes_count is the number of attribute tables attached to the field. An attribute table is an extensible structure that describes additional properties of fields, methods, and classes; the set of attributes supported differs between versions of the Java virtual machine.
13) u2 methods_count is the number of entries in the method table.
14) method_info is the method table. Its structure is shown in the following figure:
Here access_flags holds the access flags of the method, name_index is the index of its name, and descriptor_index is the index of its descriptor. attributes_count and attribute_info work as in the field table, except that methods and fields carry different attributes: the Code attribute, which holds the code of a method, appears in the method table but never in the field table. Which attributes a Class file can contain is covered later, when we reach the attribute tables.
15) attributes_count is the number of attribute tables. Two things about attribute tables need to be clear:
Attribute tables appear at the end of the Class file structure, in field tables, in method tables, and inside the Code attribute; in other words, an attribute can itself contain attributes.
The length of an attribute table is not fixed; different attributes have different lengths.
Having gone through each item of the Class file structure, let's work through a practical example.
package com.ejushang.TestClass;

public class TestClass implements Super {
    private static final int staticVar = 0;
    private int instanceVar = 0;

    public int instanceMethod(int param) {
        return param + 1;
    }
}

interface Super {
}
The binary content of TestClass.class, compiled from TestClass.java with the javac of jdk1.6.0_37, is shown in the following figure:
Let's parse the byte stream in the figure above according to the Class file structure described earlier.
1) Magic number
From the Class file structure we know that the first 4 bytes are the magic number. In the figure above, the bytes at addresses 00000000h-00000003h are the magic number, and we can see that its value is 0xCAFEBABE.
2) Major and minor version numbers
The next 4 bytes are the minor and major version numbers. In the figure above, the bytes at 00000004h-00000005h are 0x0000, so the minor_version of the Class file is 0x0000; the bytes at 00000006h-00000007h are 0x0032, so its major_version is 0x0032 (decimal 50). These are exactly the version numbers produced by jdk1.6.0 when compiling without the -target parameter.
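The first 8 bytes analyzed so far can be checked with a short sketch like the one below. The class ClassHeader is a hypothetical helper written for this article. The javaRelease method relies on an assumption that is not stated in the text above: for major versions 49 and higher, subtracting 44 gives the Java SE release number (49 for 5, 50 for 6, 51 for 7).

```java
// Minimal sketch: parsing u4 magic, u2 minor_version and u2 major_version.
// ClassHeader is a hypothetical helper, not a JDK class.
final class ClassHeader {
    final int minorVersion;
    final int majorVersion;

    private ClassHeader(int minorVersion, int majorVersion) {
        this.minorVersion = minorVersion;
        this.majorVersion = majorVersion;
    }

    /** Parses the first 8 bytes of a Class file. */
    static ClassHeader parse(byte[] data) {
        if (data.length < 8) {
            throw new IllegalArgumentException("need at least 8 bytes");
        }
        long magic = ((data[0] & 0xFFL) << 24) | ((data[1] & 0xFFL) << 16)
                   | ((data[2] & 0xFFL) << 8) | (data[3] & 0xFFL);
        if (magic != 0xCAFEBABEL) {
            throw new IllegalArgumentException("not a Class file");
        }
        int minor = ((data[4] & 0xFF) << 8) | (data[5] & 0xFF);
        int major = ((data[6] & 0xFF) << 8) | (data[7] & 0xFF);
        return new ClassHeader(minor, major);
    }

    /** Assumption: for major_version >= 49 this is the Java SE release (50 -> 6). */
    int javaRelease() {
        return majorVersion - 44;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x32};
        ClassHeader h = parse(header);
        // prints: major_version 50 -> Java SE 6
        System.out.println("major_version " + h.majorVersion + " -> Java SE " + h.javaRelease());
    }
}
```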
3) Constant pool count
The next two bytes, at 00000008h-00000009h, are the constant pool count. From the figure above the value is 0x0018, which is 24 in decimal. Note, however, that the number of constants actually stored is constant_pool_count-1, here 23, with indexes 1 to 23. One is subtracted because index 0 is reserved: a data item in the Class file uses index 0 to say that it refers to no constant in the pool.
4) Constant pool
We mentioned that the constant pool contains constants of different kinds. Let's look at the first constant of TestClass.class. Every constant starts with a tag of type u1 that identifies its kind. The byte at 0000000ah in the figure above is 0x0A, which is 10 in decimal. From the list of constant kinds above, a tag of 10 means CONSTANT_Methodref_info, whose structure is shown below:
Here class_index points to a CONSTANT_Class_info constant in the constant pool. From the binary content of TestClass, the value of class_index is 0x0004 (addresses 0000000bh-0000000ch), so it points to the fourth constant.
name_and_type_index points to a CONSTANT_NameAndType_info constant in the constant pool. From the figure above, its value is 0x0013, so it points to the 19th constant in the constant pool.
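The five bytes of this first constant (0A 00 04 00 13, taken from the values quoted above) can be decoded with a sketch like the following. MethodrefEntry is a made-up class for illustration only.

```java
// Minimal sketch: decoding one CONSTANT_Methodref_info entry
// (u1 tag, u2 class_index, u2 name_and_type_index).
// MethodrefEntry is a hypothetical helper, not a JDK class.
final class MethodrefEntry {
    static final int TAG_METHODREF = 10;

    final int classIndex;
    final int nameAndTypeIndex;

    private MethodrefEntry(int classIndex, int nameAndTypeIndex) {
        this.classIndex = classIndex;
        this.nameAndTypeIndex = nameAndTypeIndex;
    }

    static MethodrefEntry parse(byte[] data, int offset) {
        int tag = data[offset] & 0xFF;
        if (tag != TAG_METHODREF) {
            throw new IllegalArgumentException("tag " + tag + " is not CONSTANT_Methodref_info");
        }
        return new MethodrefEntry(u2(data, offset + 1), u2(data, offset + 3));
    }

    /** Reads a big-endian unsigned 2-byte value. */
    private static int u2(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] firstConstant = {0x0A, 0x00, 0x04, 0x00, 0x13};
        MethodrefEntry entry = parse(firstConstant, 0);
        // prints: class_index=4, name_and_type_index=19
        System.out.println("class_index=" + entry.classIndex
                + ", name_and_type_index=" + entry.nameAndTypeIndex);
    }
}
```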
All the other constants in the constant pool can be found in the same way. However, the JDK provides a convenient tool for viewing them: javap -verbose TestClass prints every constant in the pool. The screenshot is as follows:
From the figure above we can see that the constant pool of TestClass has 24 slots once index 0 is counted, that is, 23 real constants. Do not forget index 0: it is reserved to indicate that a data item in the Class file refers to no constant in the pool. From the analysis above we know what the first constant, a method reference, represents: the fourth constant, pointed to by class_index, is java/lang/Object, and the 19th constant, pointed to by name_and_type_index, is <init>:()V. So the first constant refers to the instance constructor <init> of java/lang/Object, which the compiler-generated constructor of TestClass calls. The other constants can be analyzed in the same way. With the constant pool done, we move on to access_flags.
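Finding "all the constants in the same way" means stepping through entries of different sizes. The sketch below shows one way to do that for the 11 constant kinds listed earlier. ConstantPoolWalker is a hypothetical helper; the entry sizes come from the JVM specification rather than from the text above, and the sketch also assumes the specification's rule that a long or double constant takes up two pool indexes. The sample bytes in main are invented for the demonstration and are not taken from TestClass.class.

```java
// Minimal sketch: walking a constant pool whose entries differ in length.
// ConstantPoolWalker is a hypothetical helper; tags 15, 16 and 18 are not handled.
final class ConstantPoolWalker {
    private ConstantPoolWalker() {}

    /** Size in bytes of the entry that starts at offset, including its tag byte. */
    static int entrySize(byte[] data, int offset) {
        int tag = data[offset] & 0xFF;
        switch (tag) {
            case 1:   // Utf8: u1 tag, u2 length, then length bytes
                return 3 + (((data[offset + 1] & 0xFF) << 8) | (data[offset + 2] & 0xFF));
            case 3: case 4:   // Integer, Float: u1 tag, u4 bytes
                return 5;
            case 5: case 6:   // Long, Double: u1 tag, u4 high_bytes, u4 low_bytes
                return 9;
            case 7: case 8:   // Class, String: u1 tag, u2 index
                return 3;
            case 9: case 10: case 11: case 12:   // the *ref kinds and NameAndType: u1 tag, two u2 indexes
                return 5;
            default:
                throw new IllegalArgumentException("unsupported tag " + tag);
        }
    }

    /** Returns the tag stored at each pool index; index 0 and unused slots stay 0. */
    static int[] tags(byte[] data, int offset, int constantPoolCount) {
        int[] tags = new int[constantPoolCount];
        for (int index = 1; index < constantPoolCount; index++) {
            int tag = data[offset] & 0xFF;
            tags[index] = tag;
            offset += entrySize(data, offset);
            if (tag == 5 || tag == 6) {
                index++; // assumption from the JVM specification: long and double use two indexes
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        byte[] pool = {
            1, 0, 1, 0x49,               // #1 Utf8 "I"
            3, 0, 0, 0, 0,               // #2 Integer 0
            5, 0, 0, 0, 0, 0, 0, 0, 1    // #3 Long 1 (index 4 is unusable)
        };
        System.out.println(java.util.Arrays.toString(tags(pool, 0, 5)));
    }
}
```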
5) u2 access_flags holds the access information of the class or interface, such as whether the Class file represents a class or an interface and whether it is public, abstract, final, and so on. The meaning of each flag was covered earlier, so let's look at the flags of TestClass. They are at 0000010dh-0000010eh and their value is 0x0021. From the flag values listed earlier, 0x0021 = 0x0001 | 0x0020, that is, ACC_PUBLIC and ACC_SUPER are set. ACC_PUBLIC is self-explanatory; ACC_SUPER is a flag carried by every class compiled by JDK 1.0.2 or later.
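The decomposition of 0x0021 into individual flags can be written out as a small sketch. ClassAccessFlags is a hypothetical helper that covers only some of the class-level flags; the constant values are those of the JVM specification.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch: decoding the access_flags of a class.
// ClassAccessFlags is a hypothetical helper and lists only a subset of the flags.
final class ClassAccessFlags {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_FINAL = 0x0010;
    static final int ACC_SUPER = 0x0020;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;

    private ClassAccessFlags() {}

    /** Returns the names of the flags that are set, in ascending order of value. */
    static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & ACC_PUBLIC) != 0) names.add("ACC_PUBLIC");
        if ((flags & ACC_FINAL) != 0) names.add("ACC_FINAL");
        if ((flags & ACC_SUPER) != 0) names.add("ACC_SUPER");
        if ((flags & ACC_INTERFACE) != 0) names.add("ACC_INTERFACE");
        if ((flags & ACC_ABSTRACT) != 0) names.add("ACC_ABSTRACT");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(decode(0x0021)); // prints [ACC_PUBLIC, ACC_SUPER]
    }
}
```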
6) u2 this_class is the class index, which identifies the fully qualified name of the class. The class index is shown in the following figure:
From the figure above, the class index is 0x0003, the third constant of the constant pool. From the javap output we know that the third constant is of type CONSTANT_Class_info, and through it we find that the fully qualified name of the class is com/ejushang/TestClass/TestClass.
7) u2 super_class is the index of the parent class of the current class; it points to a CONSTANT_Class_info constant in the constant pool. The parent class index is shown below, and its value is 0x0004. Looking at the fourth constant of the constant pool, we find that the fully qualified name of the parent class of TestClass is java/lang/Object.
8) interfaces_count and interfaces[interfaces_count] give the number of interfaces and the interfaces themselves. Both are shown below for TestClass: 0x0001 means that there is one interface, and 0x0005 is the constant pool index of that interface. The fifth constant of the constant pool is of type CONSTANT_Class_info, and its value is com/ejushang/TestClass/Super.
9) fields_count and field_info. fields_count is the number of field_info tables in the class, and each field_info describes one instance variable or class variable of the class. Note that field_info does not include fields inherited from the parent class. The structure of field_info is shown in the following figure:
access_flags holds the access flags of the field, such as public, private, protected, static, and final. The values of access_flags are shown below:
name_index and descriptor_index are constant pool indexes giving the name and the descriptor of the field. The name is easy to understand, but what is a field descriptor? The JVM specification defines field descriptors as shown in the following figure:
Pay attention to the last line of the figure above, which covers arrays: each dimension of an array adds one [ in front of the element type. The descriptor of String[][] is therefore [[Ljava/lang/String; and the descriptor of int[] is [I. The attributes_count and attribute_info that follow give the number of attribute tables and the attribute tables themselves. Let's take the TestClass above as an example and look at its field table.
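To make the descriptor rules concrete, here is a minimal sketch that turns a field descriptor back into a Java type name. FieldDescriptors and toJavaType are names invented for this illustration; the letters for the primitive types are the ones defined by the JVM specification.

```java
// Minimal sketch: converting a field descriptor into a readable Java type.
// FieldDescriptors is a hypothetical helper, not a JDK class.
final class FieldDescriptors {
    private FieldDescriptors() {}

    static String toJavaType(String descriptor) {
        int dims = 0;
        while (dims < descriptor.length() && descriptor.charAt(dims) == '[') {
            dims++; // one '[' per array dimension
        }
        String element = descriptor.substring(dims);
        String base;
        if (element.length() == 1) {
            switch (element.charAt(0)) {
                case 'B': base = "byte"; break;
                case 'C': base = "char"; break;
                case 'D': base = "double"; break;
                case 'F': base = "float"; break;
                case 'I': base = "int"; break;
                case 'J': base = "long"; break;
                case 'S': base = "short"; break;
                case 'Z': base = "boolean"; break;
                default: throw new IllegalArgumentException("bad descriptor: " + descriptor);
            }
        } else if (element.length() >= 3 && element.startsWith("L") && element.endsWith(";")) {
            // object type: L<fully qualified name with slashes>;
            base = element.substring(1, element.length() - 1).replace('/', '.');
        } else {
            throw new IllegalArgumentException("bad descriptor: " + descriptor);
        }
        StringBuilder type = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            type.append("[]");
        }
        return type.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJavaType("I"));                    // prints int
        System.out.println(toJavaType("[[Ljava/lang/String;")); // prints java.lang.String[][]
    }
}
```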
First, the number of fields. The number of fields in TestClass is shown below:
From the figure above, TestClass has two fields, which matches the source code of TestClass. Next, let's look at the first field, which we know should be private static final int staticVar. Its binary representation in the Class file is as follows:
0x001A is the access flags. From the access_flags table this is ACC_PRIVATE, ACC_STATIC, and ACC_FINAL (0x0002 | 0x0008 | 0x0010). The following 0x0006 and 0x0007 are the indexes of the sixth and seventh constants in the constant pool, whose values are staticVar and I: staticVar is the field name and I is the field descriptor. According to the descriptor rules above, I describes a variable of type int. Next, 0x0001 is the number of attribute tables of the staticVar field, so the field has one attribute. 0x0008 points to the eighth constant in the constant pool, which shows that the attribute is ConstantValue. The format of the ConstantValue attribute is as follows:
attribute_name_index is the constant pool index of the attribute name, here ConstantValue. The attribute_length of ConstantValue is always 2. constantvalue_index is a reference into the constant pool, here 0x0009; the ninth constant is of type CONSTANT_Integer_info and has the value 0.
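The 16 bytes of the staticVar entry described above (00 1A 00 06 00 07 00 01 followed by the ConstantValue attribute 00 08 00 00 00 02 00 09) can be decoded with a sketch like this one. FieldInfoSketch is a hypothetical helper. It does not resolve names through the constant pool; it only records the attribute name indexes and uses attribute_length to step over each attribute body.

```java
// Minimal sketch: decoding one field_info entry and stepping over its attributes.
// FieldInfoSketch is a hypothetical helper, not a JDK class.
final class FieldInfoSketch {
    final int accessFlags;
    final int nameIndex;
    final int descriptorIndex;
    final int[] attributeNameIndexes;
    final int length; // total number of bytes the entry occupies

    private FieldInfoSketch(int accessFlags, int nameIndex, int descriptorIndex,
                            int[] attributeNameIndexes, int length) {
        this.accessFlags = accessFlags;
        this.nameIndex = nameIndex;
        this.descriptorIndex = descriptorIndex;
        this.attributeNameIndexes = attributeNameIndexes;
        this.length = length;
    }

    static FieldInfoSketch parse(byte[] data, int offset) {
        int start = offset;
        int flags = u2(data, offset);
        int name = u2(data, offset + 2);
        int descriptor = u2(data, offset + 4);
        int attributesCount = u2(data, offset + 6);
        offset += 8;
        int[] attributeNames = new int[attributesCount];
        for (int i = 0; i < attributesCount; i++) {
            attributeNames[i] = u2(data, offset);
            // u4 attribute_length; an int is enough for this sketch
            int attributeLength = (u2(data, offset + 2) << 16) | u2(data, offset + 4);
            offset += 6 + attributeLength; // 6 header bytes plus the attribute body
        }
        return new FieldInfoSketch(flags, name, descriptor, attributeNames, offset - start);
    }

    /** Reads a big-endian unsigned 2-byte value. */
    private static int u2(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] staticVar = {0x00, 0x1A, 0x00, 0x06, 0x00, 0x07, 0x00, 0x01,
                            0x00, 0x08, 0x00, 0x00, 0x00, 0x02, 0x00, 0x09};
        FieldInfoSketch field = parse(staticVar, 0);
        System.out.println("name_index=" + field.nameIndex
                + ", descriptor_index=" + field.descriptorIndex
                + ", bytes=" + field.length);
    }
}
```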
Having covered private static final int staticVar = 0, let's go on to private int instanceVar = 0 of TestClass. The binary representation of instanceVar is shown in the following figure:
0x0002 means that the access flag is ACC_PRIVATE. 0x000A is the name index; it points to the 10th constant in the constant pool, which tells us that the field name is instanceVar. 0x0007 is the descriptor index; it points to the seventh constant in the constant pool, which is I, so instanceVar is of type int. The final 0x0000 means that the field has no attribute tables.
10) methods_count and method_info. methods_count is the number of methods, and method_info is the method table, whose structure is shown in the following figure:
From the figure above you can see that method_info and field_info have very similar structures. All the flags and values of access_flags in the method table are shown in the following figure:
name_index and descriptor_index give the name and the descriptor of the method; both are indexes into the constant pool. The method descriptor needs some explanation. Its structure is (parameter list)return type. For example, the descriptor of public int instanceMethod(int param) is (I)I: a method that takes one parameter of type int and returns int. Then come the attribute count and the attribute tables. Both the method table and the field table have them, but the attributes they contain differ. Next, let's look at the binary representation of the method table in TestClass. First, the number of methods. The screenshot is as follows:
From the figure above, the number of methods is 0x0002, so there are two methods. Let's analyze the first one, starting with its access_flags, name_index, and descriptor_index. The screenshot is as follows:
From the figure above, access_flags is 0x0001, which according to the access_flags table means ACC_PUBLIC. name_index is 0x000B; the 11th constant in the constant pool tells us that the method name is <init>. 0x000C points to the 12th constant in the constant pool, whose value is ()V, meaning that the method takes no parameters and returns void. This is the instance constructor generated automatically by the compiler. The following 0x0001 means that the method has one attribute. The attribute screenshot is as follows:
From the figure above, the constant that 0x000D points to is Code, so this is the Code attribute of the method. This is where the code of a method lives: in the Code attribute in the attribute table of its method_info. Next, let's analyze the Code attribute, whose structure is shown in the following figure:
attribute_name_index points to the constant whose value is Code, and attribute_length is the length of the Code attribute (note that this length does not include the 6 bytes taken by attribute_name_index and attribute_length themselves).
max_stack is the maximum depth of the operand stack; at run time the virtual machine uses it to size the operand stack in the stack frame. max_locals is the storage space needed by the local variable table.
The unit of max_locals is the slot, the smallest unit in which the virtual machine allocates memory for local variables. At run time, data types of 32 bits or fewer, such as byte, char, and int, occupy one slot, while the 64-bit types long and double occupy two. Note also that max_locals is not the sum of the space needed by all local variables, because slots can be reused: once a local variable goes out of scope, its slot can be given to another variable.
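The slot rule can be illustrated by counting the slots that the arguments of a method need, based on its descriptor. LocalSlots is a hypothetical helper. The sketch assumes, following the JVM specification, that an instance method receives this in slot 0 and that every reference, including an array, takes one slot. The result is only a lower bound for max_locals, since local variables declared in the method body need slots as well.

```java
// Minimal sketch: counting the local variable slots taken by method arguments.
// LocalSlots is a hypothetical helper, not a JDK class.
final class LocalSlots {
    private LocalSlots() {}

    static int argumentSlots(String methodDescriptor, boolean isStatic) {
        if (methodDescriptor.isEmpty() || methodDescriptor.charAt(0) != '(') {
            throw new IllegalArgumentException("not a method descriptor: " + methodDescriptor);
        }
        int slots = isStatic ? 0 : 1; // assumption: instance methods receive `this` in slot 0
        int i = 1;                    // skip '('
        while (methodDescriptor.charAt(i) != ')') {
            char c = methodDescriptor.charAt(i);
            if (c == 'J' || c == 'D') {
                slots += 2;           // long and double take two slots
                i++;
            } else {
                while (methodDescriptor.charAt(i) == '[') {
                    i++;              // an array of any type is a single reference
                }
                if (methodDescriptor.charAt(i) == 'L') {
                    i = methodDescriptor.indexOf(';', i); // jump to the end of the class name
                }
                slots += 1;
                i++;
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(argumentSlots("()V", false));  // prints 1 (only `this`)
        System.out.println(argumentSlots("(I)I", false)); // prints 2 (`this` and param)
    }
}
```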
code_length is the length of the bytecode in bytes, and code holds the bytecode instructions themselves. From the figure above, code has type u1; a u1 value ranges from 0x00 to 0xFF, that is, 0 to 255 in decimal, so there can be at most 256 opcodes. The virtual machine specification currently defines more than 200 instructions.
exception_table_length and exception_table describe the exception handlers of the method.
attributes_count and attribute_info give the number of attributes inside the Code attribute and the attributes themselves. This shows how flexible attribute tables are in the Class file structure: they can appear in the Class file itself, in method tables, in field tables, and inside the Code attribute.
Let's continue with the example. From the screenshot of the Code attribute of the <init> method above, attribute_length is 0x00000026, max_stack is 0x0002, max_locals is 0x0001, and code_length is 0x0000000A, so the bytes at 00000149h-00000152h are the bytecode. exception_table_length is 0x0000 and attributes_count is 0x0001. The bytes at 00000157h-00000158h are 0x000E, the constant pool index of the attribute name; the 14th constant is LineNumberTable. LineNumberTable records the correspondence between line numbers in the Java source code and offsets in the bytecode. It is not needed at run time. If it is suppressed with the compiler option -g:none, the main consequences are that stack traces cannot show the line number where an exception occurred and that breakpoints cannot be set by source line when debugging. The structure of LineNumberTable is shown in the following figure:
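The numbers above can be cross-checked with a little arithmetic. The sketch below adds up the parts of the Code attribute that attribute_length covers and derives how many bytes are left for the nested attributes. CodeAttributeLength is a hypothetical helper; the 8-byte size of an exception table entry (four u2 values) comes from the JVM specification, not from the text above.

```java
// Minimal sketch: checking attribute_length of a Code attribute.
// CodeAttributeLength is a hypothetical helper, not a JDK class.
final class CodeAttributeLength {
    // start_pc, end_pc, handler_pc and catch_type are four u2 values
    static final int EXCEPTION_ENTRY_SIZE = 8;

    private CodeAttributeLength() {}

    /** Bytes covered by attribute_length before the nested attributes start. */
    static int bytesBeforeNestedAttributes(int codeLength, int exceptionTableLength) {
        return 2                                           // max_stack
             + 2                                           // max_locals
             + 4                                           // code_length
             + codeLength                                  // code
             + 2                                           // exception_table_length
             + EXCEPTION_ENTRY_SIZE * exceptionTableLength // exception_table
             + 2;                                          // attributes_count
    }

    /** Bytes left for the attributes nested inside the Code attribute. */
    static int nestedAttributeBytes(int attributeLength, int codeLength, int exceptionTableLength) {
        return attributeLength - bytesBeforeNestedAttributes(codeLength, exceptionTableLength);
    }

    public static void main(String[] args) {
        // values of the <init> method: attribute_length 0x26, code_length 0x0A, no exception handlers
        System.out.println(nestedAttributeBytes(0x26, 0x0A, 0)); // prints 16
    }
}
```

With the values of the <init> method, 16 of the 38 bytes remain for the single nested attribute, LineNumberTable, including its own 6-byte header.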
As before, attribute_name_index is a constant pool index and attribute_length is the attribute length. In each entry of the table, start_pc is the bytecode offset and line_number is the source line number. The byte stream of the LineNumberTable attribute in this example is shown in the following figure:
Having analyzed the first method of TestClass, we can analyze the second in the same way. The screenshot is as follows:
Here access_flags is 0x0001, name_index is 0x000F, and descriptor_index is 0x0010. From the constant pool we find that this is the method public int instanceMethod(int param). In the same way as above, we find the Code attribute of instanceMethod, shown in the following figure:
Finally, let's analyze the attributes of the Class file itself. The bytes at 00000191h-00000199h are the attribute table of the Class file. 0x0011 is the attribute name index; the constant pool shows that the attribute name is SourceFile. The structure of SourceFile is shown in the following figure:
attribute_length is the length of the attribute, and sourcefile_index points to the constant whose value is the name of the source file. In this example, the screenshot of the SourceFile attribute is as follows:
attribute_length is 0x00000002, meaning a length of 2 bytes, and sourcefile_index is 0x0012. The 18th constant of the constant pool shows that the name of the source file is TestClass.java.
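As a last illustration, the 8 bytes of this attribute (00 11 00 00 00 02 00 12, taken from the values quoted above) can be decoded as follows. SourceFileAttribute is a hypothetical helper written for this article.

```java
// Minimal sketch: decoding a SourceFile attribute
// (u2 attribute_name_index, u4 attribute_length, u2 sourcefile_index).
// SourceFileAttribute is a hypothetical helper, not a JDK class.
final class SourceFileAttribute {
    final int attributeNameIndex;
    final long attributeLength;
    final int sourceFileIndex;

    private SourceFileAttribute(int attributeNameIndex, long attributeLength, int sourceFileIndex) {
        this.attributeNameIndex = attributeNameIndex;
        this.attributeLength = attributeLength;
        this.sourceFileIndex = sourceFileIndex;
    }

    static SourceFileAttribute parse(byte[] data, int offset) {
        int nameIndex = u2(data, offset);
        long length = ((long) u2(data, offset + 2) << 16) | u2(data, offset + 4);
        if (length != 2) {
            throw new IllegalArgumentException("SourceFile must have attribute_length 2, got " + length);
        }
        return new SourceFileAttribute(nameIndex, length, u2(data, offset + 6));
    }

    /** Reads a big-endian unsigned 2-byte value. */
    private static int u2(byte[] data, int offset) {
        return ((data[offset] & 0xFF) << 8) | (data[offset + 1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x00, 0x11, 0x00, 0x00, 0x00, 0x02, 0x00, 0x12};
        SourceFileAttribute attribute = parse(bytes, 0);
        // prints: attribute_name_index=17, sourcefile_index=18
        System.out.println("attribute_name_index=" + attribute.attributeNameIndex
                + ", sourcefile_index=" + attribute.sourceFileIndex);
    }
}
```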