2025-03-26 Update, from SLTechnology News&Howtos, Shulou (Shulou.com)
This article shares an example analysis of the Linux program compilation process. Many readers may not be familiar with the details, so it is offered here for reference. Let's take a look.
This article shows how a program written in the high-level C/C++ language is converted into binary code that the processor can execute. The process consists of four steps:
Preprocessing
Compilation
Assembly
Linking
Introduction to the GCC toolchain
GCC is the abbreviation of GNU Compiler Collection, a commonly used set of compilation tools on Linux systems. The GCC toolchain includes GCC, Binutils, the C runtime library, and so on.
GCC
GCC, originally short for GNU C Compiler, is the compiler itself. The conversion of a program written in C/C++ into binary code that the processor can execute, as described in this article, is carried out by the compiler.
Binutils
A set of tools for working with binary programs, including addr2line, ar, objcopy, objdump, as, ld, ldd, readelf, size, and others. These tools are indispensable for development and debugging and are briefly introduced below:
addr2line: converts a program address into the corresponding source file and line of code, and can also report the enclosing function. This helps locate the relevant source code during debugging.
as: mainly used for assembling. Assembly is described in detail below.
ld: mainly used for linking. Linking is described in detail below.
ar: mainly used to create static libraries. For beginners, the concepts of dynamic and static libraries are introduced here:
Multiple .o object files can be combined into a single library file. There are two types of libraries: static libraries and dynamic (shared) libraries.
On Windows, static libraries are files with the suffix .lib and shared libraries are files with the suffix .dll. On Linux, static libraries are files with the suffix .a and shared libraries are files with the suffix .so.
The difference between static and dynamic libraries is when their code is brought into the program. The code of a static library is copied into the executable at link time, so the executable is larger. The code of a shared library is loaded into memory only when the executable runs, and the build records just a reference to it, so the executable is smaller. On Linux systems, you can use the ldd command to view the shared libraries that an executable depends on.
If several programs run at the same time on a system and they use the same libraries, using dynamic libraries saves memory.
ldd: shows the shared libraries that an executable depends on. Strictly speaking, ldd ships with glibc rather than Binutils, but it is commonly used alongside these tools.
objcopy: converts an object file from one format to another, such as .bin to .elf or .elf to .bin.
objdump: its main function is disassembly. Disassembly is described in detail below.
readelf: displays information about an ELF file. See below for more information.
size: lists the size of each part of an executable, such as the code (text) segment and the data segment, together with the total size. See below for a specific example of using size.
C runtime
The C language standard mainly consists of two parts: one describes the syntax of C, and the other describes the C standard library. The C standard library defines a set of standard header files, each of which contains related functions, variables, type declarations and macro definitions. For example, the common printf function is a C standard library function, and its prototype is declared in the stdio.h header file.
The C language standard only defines the prototypes of the C standard library functions and does not provide an implementation. Therefore, a C compiler usually needs the support of a C runtime library (C Run Time Library, CRT), often simply called the C runtime. Similarly, C++ defines its own standard and provides its own support library, called the C++ runtime library.
Preparatory work
Because the GCC toolchain is mainly used in the Linux environment, this article also uses a Linux system as the working environment. To demonstrate the whole compilation process, this section first prepares a simple Hello program written in C as an example. Its source code is as follows:

    #include <stdio.h>

    // This program simply prints the string "Hello World!".
    int main(void)
    {
        printf("Hello World!\n");
        return 0;
    }
Compilation process
1. Preprocessing
Preprocessing mainly includes the following steps:
Remove all #define directives, expand all macro definitions, and handle all conditional compilation directives, such as #if, #ifdef, #elif, #else and #endif.
Process the #include directives by inserting the contents of the included file at the location of the directive.
Delete all comments, both "//" and "/* */".
Add line numbers and file identifiers so that the compiler can produce line numbers for debugging and for error and warning messages.
Keep all #pragma directives, because they are needed later by the compiler.
The command for preprocessing with gcc is as follows:

    $ gcc -E hello.c -o hello.i
    # Preprocess the source file hello.c to generate hello.i
    # The -E option makes GCC stop right after preprocessing
The hello.i file can be opened as a plain text file. A snippet of it is shown below:

    // hello.i code snippet
    extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
    # 942 "/usr/include/stdio.h" 3 4

    # 2 "hello.c" 2
    # 3 "hello.c"
    int main(void)
    {
        printf("Hello World!\n");
        return 0;
    }
2. Compilation
Compilation takes the preprocessed file through lexical analysis, syntax analysis, semantic analysis and optimization, and generates the corresponding assembly code.
The command to compile with gcc is as follows:

    $ gcc -S hello.i -o hello.s
    # Compile the preprocessed hello.i file to generate the assembly file hello.s
    # The -S option makes GCC stop after compilation and emit assembly code
A snippet of the assembly file hello.s generated by the above command is shown below. All of it is assembly code.

    // hello.s code snippet
    main:
    .LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $.LC0, %edi
        call    puts
        movl    $0, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
3. Assembly
The assembly stage processes the assembly code to generate instructions that the processor can recognize and saves them in an object file with the suffix .o. Since almost every assembly statement corresponds to one processor instruction, assembly is simple compared with compilation: the assembler as from Binutils translates the statements one by one according to the mapping between assembly instructions and processor instructions.
When a program consists of multiple source files, each file must be assembled into its own .o object file before the linking step can begin. Note: an object file is already part of the final program, but it cannot be executed until it has been linked.
The commands to assemble with gcc are as follows:

    $ gcc -c hello.s -o hello.o
    # Assemble the compiled hello.s file to generate the object file hello.o
    # The -c option makes GCC stop after assembly and generate the object file
    # Alternatively, call as directly:
    $ as hello.s -o hello.o
    # Use as from Binutils to assemble hello.s and generate the object file

Note: the hello.o object file is a relocatable file in ELF (Executable and Linkable Format) format.
4. Linking
Linking is divided into static linking and dynamic linking. The main points are as follows:
Static linking adds the static library code directly to the executable at link time, so the executable is larger. The linker copies the code of each function from where it is defined (in another object file or in a static library) into the final executable. To create an executable, the main tasks the linker must accomplish are symbol resolution (matching each reference to a symbol in the object files with its definition) and relocation (assigning memory addresses to the symbol definitions and then updating all references to those symbols).
Dynamic linking adds only some descriptive information at link time. The corresponding dynamic library is loaded into memory from the system when the program is executed.
On Linux systems, the order in which gcc searches for libraries at link time is usually: first the paths specified by the -L option of the gcc command, then the paths specified by the environment variable LIBRARY_PATH, and then the default paths /lib, /usr/lib and /usr/local/lib.
On Linux systems, the order in which dynamic libraries are searched for when a binary is executed is usually: first the dynamic library search path specified when the object code was built, then the paths specified by the environment variable LD_LIBRARY_PATH, then the paths listed in the configuration file /etc/ld.so.conf, and then the default paths /lib and /usr/lib.
On Linux systems, you can use the ldd command to view the shared libraries that an executable depends on.
Because the search paths for dynamic and static libraries may coincide, a path can contain a static library and a dynamic library with the same name, for example libtest.a and libtest.so. In that case gcc prefers the dynamic library by default and links against libtest.so. If you want gcc to link against libtest.a instead, you can specify the gcc option -static, which forces static libraries to be used for linking. Take Hello World as an example:
If you use the command "gcc hello.c -o hello", dynamic libraries are used for linking. The size of the generated ELF executable (viewed with the size command from Binutils) and the dynamic libraries it links to (viewed with the ldd command) are as follows:

    $ gcc hello.c -o hello
    $ size hello
    # Use size to view the sizes
       text    data     bss     dec     hex filename
       1183     552       8    1743     6cf hello
    $ ldd hello
    # The executable links to several dynamic libraries, mainly the Linux glibc dynamic library
        linux-vdso.so.1 =>  (0x00007fffefd7c000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fadcdd82000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fadce14c000)
If you use the command "gcc -static hello.c -o hello", static libraries are used for linking. The size of the generated ELF executable (viewed with the size command from Binutils) and the dynamic libraries it links to (viewed with the ldd command) are as follows:

    $ gcc -static hello.c -o hello
    $ size hello
    # Use size to view the sizes
       text    data     bss     dec     hex filename
     823726    7284    6360  837370   cc6fa hello
    # The text (code) size has become far larger
    $ ldd hello
        not a dynamic executable
    # Indicates that no dynamic libraries are linked
The final file generated by the linker is an executable file in ELF format. An ELF executable is usually organized into different sections, such as .text, .data, .rodata, .bss and so on.
Analyze ELF files
1. Sections of an ELF file
The ELF file format is shown in the following figure; the parts located between the ELF Header and the Section Header Table are sections. A typical ELF file contains the following sections:
.text: the instructions (code) of the compiled program.
.rodata: ro stands for read only, that is, read-only data (such as const constants).
.data: initialized global variables and static local variables of the C program.
.bss: uninitialized global variables and static local variables of the C program.
.debug: the debugging symbol table. The debugger uses the information in this section to help with debugging.
You can use readelf -S to view information about each section, as follows:

    $ readelf -S hello
    There are 31 section headers, starting at offset 0x19d8:

    Section Headers:
      [Nr] Name   Type      Address           Offset    Size  EntSize  Flags  Link  Info  Align
      [ 0]        NULL      0000000000000000  00000000  ...
      ...
      [11] .init  PROGBITS  00000000004003c8  000003c8  0000000000001a 00000000000000  AX  0  0  4
      ...
      [14] .text  PROGBITS  0000000000400430  00000430  0000000000180000000000000000  AX  0  0  16
      [15] .fini  PROGBITS  00000000004005b4  000005b4  ...
2. Disassembling ELF
Since an ELF file cannot be opened as a plain text file, you need to disassemble it if you want to view the instructions and data it contains.
Disassemble it using objdump -D as follows:

    $ objdump -D hello
    ...
    0000000000400526 <main>:    // PC address of the main label
    // PC address: instruction encoding      instruction in assembly format
      400526:  55                    push   %rbp
      400527:  48 89 e5              mov    %rsp,%rbp
      40052a:  bf c4 05 40 00        mov    $0x4005c4,%edi
      40052f:  e8 cc fe ff ff        callq  400400
      400534:  b8 00 00 00 00        mov    $0x0,%eax
      400539:  5d                    pop    %rbp
      40053a:  c3                    retq
      40053b:  0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
    ...
Disassemble it using objdump -S to show the C source code interleaved with the instructions:

    $ gcc -o hello -g hello.c
    # Add the -g option
    $ objdump -S hello
    ...
    0000000000400526 <main>:
    #include <stdio.h>

    int main(void)
    {
      400526:  55                    push   %rbp
      400527:  48 89 e5              mov    %rsp,%rbp
        printf("Hello World!\n");
      40052a:  bf c4 05 40 00        mov    $0x4005c4,%edi
      40052f:  e8 cc fe ff ff        callq  400400
        return 0;
      400534:  b8 00 00 00 00        mov    $0x0,%eax
    }
      400539:  5d                    pop    %rbp
      40053a:  c3                    retq
      40053b:  0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
    ...

That is all for this article, "Example Analysis of the Linux Program Compilation Process". Thank you for reading! Hopefully it has given you a clearer understanding of the topic.
© 2024 shulou.com SLNews company. All rights reserved.