Analyzing the Running Mechanism of a Web Program from Hello World

2025-02-24 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

The hidden process of the development platform

Every language has its own development platform, and most of our programs are born there. Turning source code into an executable file actually takes many complex steps, but modern development platforms handle all of them on their own. That brings convenience, and it also hides a great many implementation details. As a result, most programmers only write code, while the development platform silently performs the rest of the transformation.

According to my understanding, the process from source code to executable file can be divided into the following stages:

1. Translate the source code into machine language and organize the resulting machine code according to certain rules. Let's call the result file A.

2. Link file A with the files B that A needs in order to run (such as library functions) to form file A+.

3. Load file A+ into memory and run it.

(In fact, reference books and other materials may list more steps than these; I summarize them as three here to keep things simple.)

These steps are indispensable for producing an executable file. Now you can see how you have been "deceived" by the development platform. The following sections clear the fog and show the true face of your development platform.

Object file

There is a classic saying in the computer field:

"Any problem in computer science can be solved by another layer of indirection."

That is, any problem in computer science can be solved by adding an intermediate layer.

For example, to convert A to B, we can first convert A to an intermediate file A+, and then convert A+ to the file B that we need. (Polya also describes this method in "How to Solve It": when solving a problem, you can simplify it by adding an intermediate layer.)

The process from source code to executable file can be understood in the same way: the problem is solved by (repeatedly) adding intermediate layers between the two. As above, the source program is first converted into the intermediate file A, and the intermediate file is then converted into the final file we need.

This is the way of thinking behind how these files are handled.

In fact, file A above has a more professional name: the object file. It is not an executable program; it has to be linked with other object files and loaded before it can run. For a source program, the first thing the development platform does is translate it into machine language, and the most important part of that work is compilation. Many people already know that compilation translates source code into machine language (which is really just a pile of binary code). Compilation is an important topic, but it is not the focus of this article; interested readers can search for it.

Object file format:

Now let's look at how the object file mentioned above is organized (that is, its storage structure).

Origin:

Imagine how you would organize this binary data if you were designing the format. Just as items on a desk are sorted and placed neatly, the translated binary data is sorted by kind for ease of management: code is kept with code and data with data. The binary is thus divided into different blocks, and each such block is called a segment.

Standard:

As with many things in computer science, a standard was set for this storage layout to make communication between people and compatibility between programs easier, and COFF (Common Object File Format) was born. Today the object file formats of Windows (PE/COFF), Linux (ELF), and other mainstream operating systems are broadly similar to COFF and can be regarded as variations of it.

a.out:

a.out is the default name of the compiler's output file. That is, if you do not name the output when compiling, a file called a.out is generated. I won't dig into why this name was chosen; those who are interested can search for it.

The following figure gives a more intuitive view of the object file:

The figure above shows a typical object file structure. Real files may differ, but they are all derived from this basic layout.

ELF header: the first segment in the figure above. It is the header of the object file and contains basic information about it, such as the file version, the target machine type, and the program entry address.

Text segment: mainly holds the code of the program.

Data segment: holds the data of the program, such as variables.

Relocation segment:

The relocation segment contains relocation information and covers both text relocation and data relocation. Code often references external functions or variables. Because they are only referenced, these functions and variables do not exist in the object file itself, and their actual addresses have to be filled in before they can be used (this happens at link time). The relocation tables provide the information needed to find those actual addresses. With that in mind, text relocation and data relocation are not hard to understand.

Symbol table: the symbol table contains all the symbol information in the source code, including every variable name, function name, and so on. It records information about each symbol. For example, for a symbol "student" in the code, the symbol table holds the section in which the symbol is located, its attributes (such as read and write permissions), and other related information.

In fact, the symbol table can be said to originate in the lexical analysis stage of compilation, where each symbol in the code and its attributes are recorded in the symbol table.

String table: similar in function to the symbol table; it stores string information.

One more point: the object file is stored in binary form, so it is itself a binary file.

A real object file is more complex than this model, but the idea is the same: content is stored by type, together with some segments that describe the object file and the information needed for linking.

Dissecting a.out

Hello World

Enough talk. Let's now look at the compiled object file of hello world, written in C.

Simple hello world source code:

    /* hello.c */
    #include <stdio.h>

    int main()
    {
        int a = 5;
        printf("hellow world\n");
    }

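To give the data segment something to hold, "int a = 5" is added here. (Strictly speaking, a local variable like this one lives on the stack; only an initialized global or static variable is placed in the data segment.)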

If you are using VC, you can just click Run to see the result. To see exactly how things are handled internally, we compile with GCC instead.

Run

gcc hello.c

If you look at the directory again, there is a new object file, a.out.

What we want to do now is see what is inside a.out. Some readers may think of opening it in vim as if it were text; I was that naive once too. But a.out does not reveal itself so easily, so vim will not do. "Most of the problems we run into have already been encountered and solved by our predecessors." Indeed, there is a very powerful tool called objdump, with which we can examine every detail of the object file. Another tool, readelf, is also very useful and is introduced later. Both tools are usually included with Linux, and you can search for more about them on your own.

Note: the code here is compiled with GCC under Linux, and objdump and readelf are used to view the object file. I include all the results as pictures, so readers who have never touched Linux can still follow along. I use Ubuntu, and it works well.

The following shows the organization of a.out (the starting address, the size of each segment, and so on):

The command to view the object file is objdump -h a.out

Just as in the object file format described above, the content is classified and stored by type. The object file is divided into six sections.

From left to right, the first column (Idx Name) is the name of the section, the second column (Size) is its size, VMA is the virtual address, LMA is the load address, and File off is the offset within the file, that is, the distance from the beginning of the file. The last column, Algn, is the alignment of the section; ignore it for now.

"text" section: the code segment.

"data" section: the data segment mentioned above, which stores the data in the source code, usually initialized data.

"bss" section: also a data segment. It stores uninitialized data, which is kept separate because it needs no space in the file; its space is allocated only when the program is loaded.

"rodata" section: the read-only data segment; the data stored in it is read-only.

The "comment" section stores the compiler version information.

The remaining two sections have no practical significance for our discussion, so we will not introduce them. Just think of them as holding information related to linking, compiling, and installation.

Note:

The object file layout here lists only the main parts. In practice there are further tables that are not shown. If you are also using Linux, you can use objdump -x to list the sections in more detail.

Going deeper into a.out

The part above used an example to describe the typical segments of an object file, mainly their properties such as size.

So what exactly do these sections contain? What is actually stored in the "text" section? We turn to objdump again.

objdump -s a.out: the -s option shows the contents of the object file in hexadecimal.

The results are as follows:

As shown in the figure above, the hexadecimal content of each section is listed. The output has two columns: the left column is the hexadecimal representation, and the right column shows the corresponding characters. You can clearly see "hellow world" in the read-only "rodata" section. Embarrassingly, the "hello" in the program is misspelled with an extra "w". Taking the screenshot again is troublesome, so please excuse it.

You can also look up the ASCII values of "hellow world"; the corresponding hexadecimal digits are exactly that content. The "comment" section shown above contains the compiler version information: the name of the GCC compiler followed by its version number.

a.out disassembly

Compilation always turns the source text into assembly first and then translates it into machine language (adding an intermediate layer again). Having looked at a.out this much, it is worth studying its assembly form.

objdump -d a.out lists the disassembly of the file. Only the main part is shown here, that is, the main function. In fact, more work is done before main starts and after it returns, such as initializing the execution environment and releasing the space the function occupied.

In the figure above, the hexadecimal form of the code is on the left and the assembly form is on the right. Readers familiar with assembly should be able to understand most of it, so I won't go through it here.

a.out file header

When introducing the object file format, we mentioned the file header, which contains basic information about the object file, such as the file version, the target machine type, and the program entry address.

The following figure shows a file header:

You can view it with readelf -h. (What the figure shows is hello.o, which is hello.c compiled but not linked. It is mostly the same as what you see for a.out.)

The output has two columns: the left column is the attribute and the right column is its value. The first row is often called the magic number. It is followed by a series of numbers whose specific meaning I won't go into; you can search for it.

Next comes further information about the object file. As it has little to do with our topic, we will not discuss it here.

The content above used a concrete example to describe the internal organization of the object file. The object file is only an intermediate product on the way to an executable, and how the program runs has not been discussed yet. How the object file becomes an executable file and how the executable is run are covered in the following sections.

A simple understanding of linking

Put simply, linking means combining several object files. If program A references a function defined in file B, then for A to run properly the function from B has to be combined with the code of A. The process of merging A and B into one file is linking. A dedicated program, called the linker, does this work. It takes a number of input object files and combines them into one output file. These object files often reference each other's data and functions.

The disassembly of hello world that we saw above is of a file that has not been linked, so the address of an external function is not known where it is referenced, as shown below:

In the figure above, the call instruction calls the printf() function. Because printf() is not in this file at this point, its address cannot be determined, and it is represented by "ff ff ff" in hexadecimal. After linking, the address becomes the actual address of the function, because linking has brought the function into the file.

Classification of linking: depending on when the data or functions that A depends on are merged, linking can be divided into static linking and dynamic linking.

Static linking:

The link is completed before the program is executed; the file cannot run until linking is done. This has an obvious drawback, for example with library functions. If file A and file B both use a library function, each of the linked files contains its own copy of it. When A and B run at the same time, there are two copies of the library function in memory, which wastes storage space. The waste becomes more evident as scale grows. Static linking is also harder to upgrade. To solve these problems, many programs now use dynamic linking.

Dynamic linking: unlike static linking, dynamic linking is performed only when the program is loaded and executed. In the example above, if both A and B use the library function Fun(), only one copy of Fun() is needed in memory while A and B run.

There is much more to know about linking, which will be discussed in dedicated articles later, so let's leave it here.

A simple explanation of loading

We know that a program must be loaded into memory before it can run. In the past, the whole program was loaded into physical memory. Today virtual memory is generally used: each process has its own complete address space, which gives the impression that each process can use the entire memory. A memory manager then maps virtual addresses to actual physical memory addresses.

As described above, a program's addresses can be divided into virtual addresses and physical addresses. A virtual address is the program's address within its virtual address space, and the physical address is the actual address at which it was loaded.
