
You call this a class?

2025-03-31 Update From: SLTechnology News&Howtos

This article comes from the WeChat official account "low concurrency programming" (ID: dibingfa). Author: flash.

I am a .java file called FlashObject.java. Just call me Scum.

```java
public class FlashObject {

    private String name;
    private int age;

    public String getName() {
        return name;
    }

    public int add(int a, int b) {
        return a + b;
    }
}
```

I am about to be loaded and run by the boss, the JVM (Java Virtual Machine).

Lao Xu: Scum, I'm about to load you. Please slim down first so you don't take up too much space.

Scum: OK, no problem. Give me ten seconds.

```java
public class FlashObject{private String name;private int age;public String getName(){return name;}public int add(int a,int b){return a+b;}}
```

Scum: Lao Xu, I've slimmed down. Take a look.

Lao Xu: ... Are you out of your mind?

Scum: What's wrong? I got rid of all the useless spaces and line breaks. I've lost a lot of weight.

Lao Xu: All right, given your IQ, I'll explain it to you. You are still a text file. Slimming down means defining a compact data structure that represents the information in your Java file, and then telling me what each byte of that structure means.

Scum: Oh, that's right.

Lao Xu: Right. On the one hand, that makes loading easier for me. On the other hand, my virtual machine doesn't serve only the Java language: many languages can eventually be compiled into a form my virtual machine recognizes, so you have to design a general format.

Scum: Mm-hmm, I get it this time!

1 Class

My class name is FlashObject.

I need somewhere to store it, so I put it at the beginning.

Each small square here is 1 byte, that is, 8 bits. An English letter in ASCII takes 1 byte, so it occupies one square; I won't explain this again later. Then it occurred to me that this class should also have a parent class.

Although it isn't written in this .java file, the class still has a default parent class: Object.

Of course, we have to record the fully qualified class name.

java/lang/Object

Where should I write it down? Right after the class name.

Ah, no. My class name and parent class name are both variable-length, so if they sit right next to each other, who knows where one ends and the other begins?

No, I have to put a length in front of each name.
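As a rough illustration of the length-prefix idea, here is a minimal Java sketch. The class `LengthPrefixedNames` and its methods are hypothetical, not part of any real API. It assumes ASCII names and a 2-byte big-endian length in front of each name (class files are big-endian, and `ByteBuffer` is big-endian by default).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a real API): each name is stored as
// [u2 length][ASCII bytes], so a reader knows exactly where each name ends.
class LengthPrefixedNames {

    static byte[] encode(String... names) {
        int size = 0;
        for (String name : names) {
            size += 2 + name.getBytes(StandardCharsets.US_ASCII).length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size); // big-endian by default
        for (String name : names) {
            byte[] ascii = name.getBytes(StandardCharsets.US_ASCII);
            buf.putShort((short) ascii.length); // 2-byte length first
            buf.put(ascii);                     // then 1 byte per letter
        }
        return buf.array();
    }

    static String[] decode(byte[] data, int count) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        String[] names = new String[count];
        for (int i = 0; i < count; i++) {
            byte[] ascii = new byte[buf.getShort() & 0xFFFF]; // read the unsigned length
            buf.get(ascii);                                   // then exactly that many bytes
            names[i] = new String(ascii, StandardCharsets.US_ASCII);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] data = encode("FlashObject", "java/lang/Object");
        System.out.println(data.length);                        // 31
        System.out.println(String.join(", ", decode(data, 2))); // FlashObject, java/lang/Object
    }
}
```

Without the two length bytes, a reader would only see `FlashObjectjava/lang/Object` and would have no way to split it.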

Besides the parent class, there are also interface names! Although this class doesn't implement any interface, we still have to define a place for them.

Interfaces are slightly different from the class name and parent class name, because there may be more than one.

But that doesn't matter: two bytes indicate the number of interfaces, and then the interface names are arranged next to each other, as above.

Yeah, perfect.
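Here is a sketch of the draft layout at this point: class name, parent class name, a 2-byte interface count, then the interface names, each with its length in front. `DraftClassHeader` is a hypothetical helper, and this is the intermediate design from the story, not the final class file layout.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical draft layout (before the constant pool exists):
//   [u2 len][class name] [u2 len][parent name] [u2 count] { [u2 len][interface name] } x count
class DraftClassHeader {

    private static void putName(ByteBuffer buf, String name) {
        byte[] ascii = name.getBytes(StandardCharsets.US_ASCII);
        buf.putShort((short) ascii.length);
        buf.put(ascii);
    }

    private static String getName(ByteBuffer buf) {
        byte[] ascii = new byte[buf.getShort() & 0xFFFF];
        buf.get(ascii);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    static byte[] encode(String className, String parent, List<String> interfaces) {
        ByteBuffer buf = ByteBuffer.allocate(4096); // generous fixed buffer for this sketch
        putName(buf, className);
        putName(buf, parent);
        buf.putShort((short) interfaces.size()); // 2 bytes: how many interface names follow
        for (String itf : interfaces) {
            putName(buf, itf);
        }
        return Arrays.copyOf(buf.array(), buf.position()); // keep only the bytes written
    }

    static List<String> decodeInterfaces(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        getName(buf); // class name (skipped here)
        getName(buf); // parent class name (skipped here)
        int count = buf.getShort() & 0xFFFF;
        List<String> result = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            result.add(getName(buf));
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = encode("FlashObject", "java/lang/Object", new ArrayList<>());
        System.out.println(data.length + " bytes, interfaces: " + decodeInterfaces(data)); // 33 bytes, interfaces: []
    }
}
```

With no interfaces, the count is simply two zero bytes, so the reader stops right there.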

2 Constant pool

Gradually, I found that more and more places need string names.

Besides the class name, parent class name, and interface names, there are also variable names, method names, variable type names, method parameter type names, return type names, and so on.

On the one hand, if each of these were written out in full this way, the file format would become messy and many structures would get long.

On the other hand, many strings are duplicated. For example, the type String of the variable name and the return type String of the method getName are the same string written twice, which wastes space.

So I decided to scrap the previous scheme and design a new structure that stores all these strings in one place. I named it the constant pool.

Each string has a corresponding index that follows from its position, so no extra field is needed.

This way, the class, parent class, and interfaces can all point to an index, so their length can be fixed.
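A minimal sketch of this string-only constant pool, assuming indexes start at 1 as they do in the real class file format. `StringPool` is a hypothetical name. Adding a string that is already present returns its existing index, which is how the repeated `String` type name ends up stored only once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical string-only constant pool: each distinct string is stored once,
// and its index is simply its 1-based position in the pool.
class StringPool {
    private final List<String> entries = new ArrayList<>();
    private final Map<String, Integer> indexOf = new HashMap<>();

    // Returns the existing index for a repeated string instead of storing a copy.
    int add(String s) {
        Integer existing = indexOf.get(s);
        if (existing != null) {
            return existing;
        }
        entries.add(s);
        int index = entries.size(); // 1-based position
        indexOf.put(s, index);
        return index;
    }

    String get(int index) {
        return entries.get(index - 1);
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        StringPool pool = new StringPool();
        System.out.println(pool.add("FlashObject"));      // 1
        System.out.println(pool.add("java/lang/Object")); // 2
        System.out.println(pool.add("java/lang/String")); // 3: type of the variable name
        System.out.println(pool.add("java/lang/String")); // 3 again: return type of getName
    }
}
```

The class, parent class, and interfaces now only need to store a 2-byte index each, whatever the length of the name.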

Of course, for now this constant pool only holds strings.

It's not hard to imagine that there may also be integer and floating-point constants, or even reference types that point to another index in the constant pool, a bit like a pointer.

With so many types, there has to be somewhere to record the type information, so it seems we need to change the previous design.

This way, our constant pool can store not only simple string constants but also values of different data structures, depending on their type.

Of course, the overall structure of the constant pool stays the same; only the entries inside now come in a variety of structures.

Likewise, the rest of our design is unaffected by this small change to the constant pool.
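Here is a sketch of a tagged constant pool in which every entry starts with a 1-byte tag that says how to read the rest. The tag numbers (1 for Utf8, 3 for Integer, 7 for Class) are the ones the JVM specification uses. The `TypedPool` builder is hypothetical and only supports these three kinds. It also shows the "pointer" idea: a Class entry stores only the index of a Utf8 entry that holds the name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical builder for a tagged constant pool; tag values follow the JVM spec.
class TypedPool {
    static final int CONSTANT_Utf8 = 1;    // u2 length + bytes
    static final int CONSTANT_Integer = 3; // 4-byte value
    static final int CONSTANT_Class = 7;   // u2 index of a Utf8 entry holding the name

    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int count = 0; // entries so far; indexes start at 1

    private void u1(int v) { out.write(v); }                 // writes the low 8 bits
    private void u2(int v) { u1(v >>> 8); u1(v); }            // big-endian
    private void u4(int v) { u2(v >>> 16); u2(v & 0xFFFF); }  // big-endian

    int utf8(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8); // matches the spec's encoding for ASCII text
        u1(CONSTANT_Utf8);
        u2(b.length);
        out.write(b, 0, b.length);
        return ++count;
    }

    int integer(int value) {
        u1(CONSTANT_Integer);
        u4(value);
        return ++count;
    }

    // A Class entry holds no text itself: it points at a Utf8 entry, like a pointer.
    int classRef(String internalName) {
        int nameIndex = utf8(internalName);
        u1(CONSTANT_Class);
        u2(nameIndex);
        return ++count;
    }

    byte[] bytes() {
        return out.toByteArray();
    }

    public static void main(String[] args) {
        TypedPool pool = new TypedPool();
        int cls = pool.classRef("FlashObject"); // #1 Utf8 "FlashObject", #2 Class -> #1
        System.out.println("Class entry index: " + cls); // 2
        StringBuilder hex = new StringBuilder();
        for (byte b : pool.bytes()) {
            hex.append(String.format("%02X ", b)); // print the pool as hex bytes
        }
        System.out.println(hex.toString().trim());
    }
}
```

Because the tag comes first, a reader always knows the size and meaning of the entry it is about to read.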

OK, let's summarize our current overall plan.

The constant pool comes first: all the constants we need are put here and referenced by index.

Then comes the information about the class itself: the current class, the parent class, and the interfaces.

It seems that the slimming work required by Lao Xu has begun to take shape.

3 Variables

Now that the class's own information has a proper place, next we store the variables.

There may be several variables, so the structure follows our earlier approach: first the number of variables, then the data structure of each variable.

What data structure to use for a variable, and whether it has a fixed length, is what we'll design next.

Let's take out one of the variables and see what it contains.

private String name

Clearly, private is the variable's tag (modifier), String is the variable type, and name is the variable name.

Let's look at the tag part first.

Besides private, there are public, protected, static, final, volatile, transient, and so on. Some can be combined, such as

public static final String name

and some can't, such as

public private String name; // error

We use a bitmap in which each tag is one bit (for example, public is the first bit, private the second, static the fourth, and final the fifth), so every combination yields a different value.

We define the value of each tag as follows.

| Tag | Value |
|---|---|
| public | 0x0001 |
| private | 0x0002 |
| protected | 0x0004 |
| static | 0x0008 |
| final | 0x0010 |
| volatile | 0x0040 |
| transient | 0x0080 |

A combination of tags is represented by adding their values; for example, public static is 0x0001 + 0x0008 = 0x0009.

With this assignment, different combinations never produce the same value, and the tags can easily be recovered from the value.

That's good. That's it.
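The bitmap idea fits in a few lines of Java. `FieldFlags` is a hypothetical helper that uses the values from the table above. Combining uses bitwise OR, which equals addition here because no two tags share a bit; decoding tests each bit in turn.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: each variable tag is one bit, with values from the table above.
class FieldFlags {
    static final Map<String, Integer> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put("public", 0x0001);
        FLAGS.put("private", 0x0002);
        FLAGS.put("protected", 0x0004);
        FLAGS.put("static", 0x0008);
        FLAGS.put("final", 0x0010);
        FLAGS.put("volatile", 0x0040);
        FLAGS.put("transient", 0x0080);
    }

    // Combine tags with bitwise OR (the same as adding, since no two tags share a bit).
    static int combine(String... tags) {
        int value = 0;
        for (String tag : tags) {
            Integer bit = FLAGS.get(tag);
            if (bit == null) {
                throw new IllegalArgumentException("unknown tag: " + tag);
            }
            value |= bit;
        }
        return value;
    }

    // Recover the tags by testing each bit, in table order.
    static List<String> decode(int value) {
        List<String> tags = new ArrayList<>();
        for (Map.Entry<String, Integer> e : FLAGS.entrySet()) {
            if ((value & e.getValue()) != 0) {
                tags.add(e.getKey());
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        int flags = combine("public", "static", "final");
        System.out.printf("0x%04X -> %s%n", flags, decode(flags)); // 0x0019 -> [public, static, final]
    }
}
```

For what it's worth, the JDK's `java.lang.reflect.Modifier` constants use the same numbers for these modifiers, so `Modifier.toString(0x0009)` returns `"public static"`.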

Oh, by the way, the class itself also has tags such as public and final. I forgot them when recording the class information just now, so let me add them before I forget!

Now let's look at the type part.

The type here is String, which is a class type, one kind of reference type.

private String name

Besides class types, there are the eight primitive types, and array types among the reference types.

To save space, we represent each type with as few symbols as possible.

| Symbol | Type |
|---|---|
| B | byte |
| C | char |
| D | double |
| F | float |
| I | int |
| J | long |
| S | short |
| Z | boolean |
| LClassName; | class |
| [ | array |

The primitive types and the array marker each take just one character, that is, 1 byte. An array type is written as [ followed by its element type; for example, int[] is [I.

A class type takes 2 bytes for the L and the ;, plus the bytes of its fully qualified class name.

For example, the String type here is written as

Ljava/lang/String;

Note, however, that this symbol string can also be stored in the constant pool, so the type part of our variable structure only needs a constant pool index.
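Here is a sketch of turning a Java type into its descriptor symbol, following the table above plus the trailing `;` for class types. `Descriptors.of` is a hypothetical helper. It relies only on standard `java.lang.Class` methods (`isArray`, `getComponentType`, `getName`).

```java
// Hypothetical helper: maps a Java type to its descriptor symbol.
class Descriptors {
    static String of(Class<?> type) {
        if (type.isArray()) {
            return "[" + of(type.getComponentType()); // '[' followed by the element type
        }
        if (type == byte.class) return "B";
        if (type == char.class) return "C";
        if (type == double.class) return "D";
        if (type == float.class) return "F";
        if (type == int.class) return "I";
        if (type == long.class) return "J";
        if (type == short.class) return "S";
        if (type == boolean.class) return "Z";
        if (type == void.class) {
            throw new IllegalArgumentException("void is not a variable type");
        }
        // Class type: 'L' + fully qualified name with '/' separators + ';'
        return "L" + type.getName().replace('.', '/') + ";";
    }

    public static void main(String[] args) {
        System.out.println(of(String.class)); // Ljava/lang/String;
        System.out.println(of(int[].class));  // [I
    }
}
```

The String descriptor takes 18 bytes (1 for `L`, 16 for `java/lang/String`, 1 for `;`), which is one more reason to store it once in the constant pool and refer to it by a 2-byte index.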

OK, the second part is done.

Now the name part.

There isn't much to say about the name part; I'm sure you can guess it: it is also just a constant pool index.

So: a two-byte tag, a two-byte type descriptor index, and a two-byte variable name index. This is the data structure of one variable.

Let's put it into our overall structure.

Got it!
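To make the 6-byte layout concrete, here is a sketch that packs one variable entry. `FieldEntry` is hypothetical, and the index values are made-up constant pool positions. One difference worth knowing: the real `field_info` in the JVM specification stores `name_index` before `descriptor_index` and ends with an attribute count plus attributes; this sketch follows the simplified order designed above.

```java
import java.nio.ByteBuffer;

// Hypothetical variable entry following the design above:
// u2 tag (access flags), u2 type descriptor index, u2 name index.
class FieldEntry {
    final int accessFlags;     // e.g. 0x0002 for private
    final int descriptorIndex; // constant pool index of e.g. "Ljava/lang/String;"
    final int nameIndex;       // constant pool index of e.g. "name"

    FieldEntry(int accessFlags, int descriptorIndex, int nameIndex) {
        this.accessFlags = accessFlags;
        this.descriptorIndex = descriptorIndex;
        this.nameIndex = nameIndex;
    }

    byte[] toBytes() {
        return ByteBuffer.allocate(6) // always 6 bytes: a fixed-length entry
                .putShort((short) accessFlags)
                .putShort((short) descriptorIndex)
                .putShort((short) nameIndex)
                .array();
    }

    static FieldEntry fromBytes(byte[] b) {
        ByteBuffer buf = ByteBuffer.wrap(b);
        // Arguments are evaluated left to right, matching the field order.
        return new FieldEntry(buf.getShort() & 0xFFFF, buf.getShort() & 0xFFFF, buf.getShort() & 0xFFFF);
    }
}
```

Because every entry has the same length, a reader can find the n-th variable by simple arithmetic once it knows where the list starts.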

4 Methods

There may also be many methods. At present I only have two, so let's use the add method for the analysis.

public int add(int a, int b) { return a + b; }

More precisely, I also have a constructor that isn't written out.

In short, there may be many methods.

However, with the experience of designing variables, the data structure for a method soon took shape.

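The tag part uses the same values as the variable tags, except that volatile and transient don't apply to methods, and a few method-only tags are added.

| Tag | Value |
|---|---|
| public | 0x0001 |
| private | 0x0002 |
| protected | 0x0004 |
| static | 0x0008 |
| final | 0x0010 |
| synchronized | 0x0020 |
| native | 0x0100 |
| abstract | 0x0400 |

(In the JVM specification, bits 0x0040 and 0x0080 mean bridge and varargs for methods, not volatile and transient.)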

The method descriptor describes the method's parameter types and return type, for example our

int add(int a, int b)

The type symbols for the parameters and the return value are exactly the same as the variable type symbols above, except that there is one extra type, void.

| Symbol | Type |
|---|---|
| B | byte |
| C | char |
| D | double |
| F | float |
| I | int |
| J | long |
| S | short |
| Z | boolean |
| LClassName; | class |
| [ | array |
| V | void |

Because there can be several parameters, we need an overall format. The whole descriptor looks like this:

(parameter1Type parameter2Type ...)returnType

with the parameter types written back to back and no separators. For our

int add(int a, int b)

the descriptor is

(II)I

Very concise, isn't it? Again, this is a string and can be stored in the constant pool, so I won't repeat that.

(The parameter names a and b don't need to be saved. When the converted bytecode runs in the virtual machine, only each parameter's position in the local variable table matters, not what it is called.)
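Here is a sketch of building a method descriptor from the parameter and return types. The helper `MethodDescriptors.method` is hypothetical. The JDK's real `java.lang.invoke.MethodType.toMethodDescriptorString()` appears in `main` only to cross-check the result.

```java
import java.lang.invoke.MethodType;

// Hypothetical helper: builds "(parameter types)return type".
class MethodDescriptors {

    static String type(Class<?> t) {
        if (t.isArray()) return "[" + type(t.getComponentType()); // '[' + element type
        if (t == byte.class) return "B";
        if (t == char.class) return "C";
        if (t == double.class) return "D";
        if (t == float.class) return "F";
        if (t == int.class) return "I";
        if (t == long.class) return "J";
        if (t == short.class) return "S";
        if (t == boolean.class) return "Z";
        if (t == void.class) return "V"; // only valid as a return type
        return "L" + t.getName().replace('.', '/') + ";";
    }

    static String method(Class<?> returnType, Class<?>... params) {
        StringBuilder sb = new StringBuilder("(");
        for (Class<?> p : params) {
            if (p == void.class) {
                throw new IllegalArgumentException("void cannot be a parameter type");
            }
            sb.append(type(p)); // parameter types back to back, no separators
        }
        return sb.append(')').append(type(returnType)).toString();
    }

    public static void main(String[] args) {
        System.out.println(method(int.class, int.class, int.class)); // (II)I
        // Cross-check against the JDK's own descriptor generator:
        System.out.println(MethodType.methodType(int.class, int.class, int.class)
                .toMethodDescriptorString()); // (II)I
    }
}
```

Note that nothing about the names `a` and `b` appears in the result, only their types.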

The method name is all too familiar by now: put it in the constant pool!

OK, the first three parts are done. The last one is interesting.

Code, exceptions, annotations, and so on: as you can see, there is quite a lot of information to record.

For example, suppose I write this:

@RequestMapping() public String function(String a) throws Exception { return a; }

Then there is the code, the exceptions, the annotations, and other information to record.

But apart from the code, these parts don't appear in every method. If we defined a slot for every one of them, wouldn't that waste space? What should we do?

Following the example of the constant pool, we call these parts the method's "attributes". A method may have several attributes, and the structure is designed as follows.

This way, a method only includes the attributes it actually has; if it doesn't need an attribute, no space is wasted. Perfect!
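Here is a sketch of such an attribute list: a 2-byte count, then for each attribute a 2-byte name index, a 4-byte length, and the payload. This matches the shape of `attribute_info` in the JVM specification; the `Attributes` helper and the name indexes used here are hypothetical. Because each attribute carries its own length, a reader can skip any attribute it doesn't care about.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a method's attribute list:
//   [u2 count] then, per attribute, [u2 name index][u4 length][length bytes]
class Attributes {

    static byte[] write(Map<Integer, byte[]> attrs) {
        int size = 2;
        for (byte[] info : attrs.values()) {
            size += 2 + 4 + info.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putShort((short) attrs.size());
        for (Map.Entry<Integer, byte[]> e : attrs.entrySet()) {
            buf.putShort(e.getKey().shortValue()); // u2 attribute name index
            buf.putInt(e.getValue().length);       // u4 attribute length
            buf.put(e.getValue());                 // the payload itself
        }
        return buf.array();
    }

    // Returns the payload of the attribute with the given name index, or null if absent.
    static byte[] find(byte[] data, int wantedNameIndex) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int count = buf.getShort() & 0xFFFF;
        for (int i = 0; i < count; i++) {
            int nameIndex = buf.getShort() & 0xFFFF;
            int length = buf.getInt();
            if (nameIndex == wantedNameIndex) {
                byte[] info = new byte[length];
                buf.get(info);
                return info;
            }
            buf.position(buf.position() + length); // skip attributes we don't need
        }
        return null;
    }

    public static void main(String[] args) {
        Map<Integer, byte[]> attrs = new LinkedHashMap<>();
        attrs.put(10, new byte[] {0x1B, 0x1C, 0x60, (byte) 0xAC}); // pretend #10 is "Code"
        byte[] data = write(attrs);
        System.out.println(data.length);           // 2 + 2 + 4 + 4 = 12
        System.out.println(find(data, 10).length); // 4
    }
}
```

A method with no exceptions or annotations simply has fewer entries, so the count and the list shrink together.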

Looking back at our method:

public int add(int a, int b) { return a + b; }

The method signature part is already solved; only the code remains:

return a + b;

How do I store this?

I've heard Lao Xu say that the JVM recognizes something called bytecode, so I'll convert the code written in Java into bytecode.

This part is very complicated, so I'll skip the process. After some effort, I converted this simple line of code into bytecode:

1B 1C 60 AC

There are four bytes in total.
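What do these four bytes mean? Here is a toy Java sketch that handles only the four opcodes involved. Per the JVM instruction set, 0x1B is `iload_1`, 0x1C is `iload_2`, 0x60 is `iadd`, and 0xAC is `ireturn`. `TinyDisassembler` is hypothetical and does not handle instructions that take operands. Its `run` method simulates the operand stack to show that the bytes really compute `a + b`, with `a` and `b` in local variable slots 1 and 2 (slot 0 holds `this` in an instance method).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy sketch covering only the opcodes in "1B 1C 60 AC";
// all four are single-byte instructions with no operands.
class TinyDisassembler {

    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        for (byte op : code) {
            switch (op & 0xFF) {
                case 0x1B: out.add("iload_1"); break; // push int from local variable slot 1
                case 0x1C: out.add("iload_2"); break; // push int from local variable slot 2
                case 0x60: out.add("iadd"); break;    // pop two ints, push their sum
                case 0xAC: out.add("ireturn"); break; // return the int on top of the stack
                default: throw new IllegalArgumentException(
                        String.format("unsupported opcode 0x%02X", op & 0xFF));
            }
        }
        return out;
    }

    // Simulates add(a, b): slot 0 would hold 'this', so a and b sit in slots 1 and 2.
    static int run(byte[] code, int a, int b) {
        int[] locals = {0, a, b}; // slot 0 is a placeholder for 'this'
        Deque<Integer> stack = new ArrayDeque<>();
        for (byte op : code) {
            switch (op & 0xFF) {
                case 0x1B: stack.push(locals[1]); break;
                case 0x1C: stack.push(locals[2]); break;
                case 0x60: stack.push(stack.pop() + stack.pop()); break;
                case 0xAC: return stack.pop();
                default: throw new IllegalArgumentException(
                        String.format("unsupported opcode 0x%02X", op & 0xFF));
            }
        }
        throw new IllegalStateException("no ireturn reached");
    }

    public static void main(String[] args) {
        byte[] code = {0x1B, 0x1C, 0x60, (byte) 0xAC};
        System.out.println(disassemble(code)); // [iload_1, iload_2, iadd, ireturn]
        System.out.println(run(code, 2, 3));   // 5
    }
}
```

If you compile `FlashObject` yourself, `javap -c FlashObject` should list the same four instructions for `add`.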

I put these four bytes into the Code attribute just mentioned.

OK, done.

Looking back, this completes the method part from before.

Then we add this structure to our overall structure.

Perfect!

5 Class file

I transformed myself into this structure and took the final design draft to Lao Xu.

Lao Xu: Mm-hmm! Not bad!

Scum: Of course. I studied it for a long time.

Lao Xu: But let me change it a bit and add something at the beginning.

Scum: Lao Xu, what are you adding here?

Lao Xu: Anyone can see you have no experience.

A magic number is generally used to identify a file's format. The file name extension is unreliable, so files with a defined format usually start with a magic number.

The next two fields identify the version number. Different versions may have different data structures and supported features, which will be useful in the future!

Scum: I see. You really have seen a lot. But you said it identifies the format of this file: what kind of file am I?

Lao Xu: You silly thing, just call it a class file!

FlashObject.class
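The header Lao Xu added can be checked with a short sketch. In the real format, the magic number is `0xCAFEBABE`, followed by a 2-byte minor version and a 2-byte major version; major version 52 means Java SE 8. `ClassHeader` is a hypothetical helper.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: validates the magic number and reads the version fields.
class ClassHeader {
    static final int MAGIC = 0xCAFEBABE; // the real class file magic number

    final int minor;
    final int major;

    private ClassHeader(int minor, int major) {
        this.minor = minor;
        this.major = major;
    }

    static ClassHeader read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // class files are big-endian, ByteBuffer's default
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a class file: bad magic number");
        }
        int minor = buf.getShort() & 0xFFFF; // u2 minor_version
        int major = buf.getShort() & 0xFFFF; // u2 major_version
        return new ClassHeader(minor, major);
    }

    // For Java 5 and later, major version = Java version + 44 (49 -> 5, 52 -> 8).
    static int javaVersion(int major) {
        return major - 44;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52};
        ClassHeader h = read(header);
        System.out.println("major " + h.major + " = Java " + javaVersion(h.major)); // major 52 = Java 8
    }
}
```

A file that doesn't start with these four bytes is rejected before anything else is parsed, which is exactly what the magic number is for.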

Postscript

According to the Java Virtual Machine Specification, Java SE 8 Edition, the standard structure of a class file looks like this:

```
ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}
```

Our design is almost the same.

Only the last two items are not covered by our design, and they aren't the key points anyway.
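To see how the structure above is read in practice, here is a sketch that walks the constant pool and resolves `this_class` and `super_class` to names. It assumes the Java SE 8 set of constant tags and keeps only Utf8 and Class entries, skipping the others by their fixed sizes; Long and Double entries occupy two pool slots, as the specification requires. Utf8 data is decoded as ordinary UTF-8, which matches the specification's modified UTF-8 for plain ASCII names. `ConstantPoolWalker` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: walk the constant pool (Java SE 8 tags) and resolve this/super class names.
class ConstantPoolWalker {

    static String[] classAndSuper(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile);
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("bad magic number");
        }
        buf.getShort(); // minor_version
        buf.getShort(); // major_version
        int count = buf.getShort() & 0xFFFF; // constant_pool_count; valid indexes are 1..count-1

        Map<Integer, String> utf8 = new HashMap<>();
        Map<Integer, Integer> classNameIndex = new HashMap<>();
        for (int i = 1; i < count; i++) {
            int tag = buf.get() & 0xFF;
            switch (tag) {
                case 1: { // CONSTANT_Utf8: u2 length + bytes
                    byte[] b = new byte[buf.getShort() & 0xFFFF];
                    buf.get(b);
                    utf8.put(i, new String(b, StandardCharsets.UTF_8));
                    break;
                }
                case 7: // CONSTANT_Class: u2 name_index
                    classNameIndex.put(i, buf.getShort() & 0xFFFF);
                    break;
                case 8: case 16: // String, MethodType: one u2
                    skip(buf, 2);
                    break;
                case 15: // MethodHandle: u1 reference_kind + u2 reference_index
                    skip(buf, 3);
                    break;
                case 3: case 4: case 9: case 10: case 11: case 12: case 18: // 4-byte payloads
                    skip(buf, 4);
                    break;
                case 5: case 6: // Long, Double: 8 bytes, and they take two pool slots
                    skip(buf, 8);
                    i++;
                    break;
                default:
                    throw new IllegalArgumentException("unknown constant tag " + tag + " at index " + i);
            }
        }
        buf.getShort(); // access_flags
        int thisClass = buf.getShort() & 0xFFFF;
        int superClass = buf.getShort() & 0xFFFF;
        return new String[] {
            utf8.get(classNameIndex.get(thisClass)),
            superClass == 0 ? null : utf8.get(classNameIndex.get(superClass)) // 0 only for java/lang/Object
        };
    }

    private static void skip(ByteBuffer buf, int n) {
        buf.position(buf.position() + n);
    }
}
```

To try it on a real file, pass it the bytes from `java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("FlashObject.class"))`; for `FlashObject` it should report `FlashObject` and `java/lang/Object`.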

The constant pool has the following entry types:

| Constant type | Value |
|---|---|
| CONSTANT_Class | 7 |
| CONSTANT_Fieldref | 9 |
| CONSTANT_Methodref | 10 |
| CONSTANT_InterfaceMethodref | 11 |
| CONSTANT_String | 8 |
| CONSTANT_Integer | 3 |
| CONSTANT_Float | 4 |
| CONSTANT_Long | 5 |
| CONSTANT_Double | 6 |
| CONSTANT_NameAndType | 12 |
| CONSTANT_Utf8 | 1 |
| CONSTANT_MethodHandle | 15 |
| CONSTANT_MethodType | 16 |
| CONSTANT_InvokeDynamic | 18 |

If you want to know the full details of the class file, the best way is to read the official documentation, Chapter 4 of the Java Virtual Machine Specification.

Chapter 4. The class File Format

The following link goes straight to it:

https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-4.html#jvms-4.3.2

Don't assume the official documentation is obscure; this part is quite clear. Most blogs barely explain the format and don't present it vividly, so you're better off reading the official documentation directly.

Another good way is to look directly at a parsed view of a class file's binary structure. I recommend a tool here:

Classpy

Opening a class file with this tool looks like this.

The tree parsed on the left maps directly onto the binary content of the class file on the right, which makes the tool very easy to use.

Finally, I hope you can find time to use this tool to analyze a complex class file; it will help a lot. I wish you all success in learning the class file format.

End ~
