How to master binary files

2025-01-18 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/03

This article introduces how to work with binary files: what object files and executables contain, and how the two differ. Many people run into these questions in practice, so the sections below walk through them one at a time.

Before discussing processes, let's look at binary executables and object files. A process only exists once a program is running, so understanding how a program is structured goes a long way toward understanding the operating system. The file a compiler generates from source code is an object file, which raises three questions: What is in an object file? What is in an executable file? What is the difference between an executable file and an object file?

This document uses the following simple C file throughout:

    #include <stdio.h>

    int main()
    {
        printf("hello");
        return 0;
    }

1. Compilation process

Source code must be compiled before it becomes an executable. The process has four stages: preprocessing, compilation, assembly and linking.

Preprocessing: handles the preprocessing directives in the source code, mainly "#include", "#define" and so on, by replacing each directive with the content it stands for. For example, "#include" pulls in the entire header file.

Compilation: the preprocessed file from the previous step goes through lexical analysis, syntax analysis, semantic analysis and optimization to produce assembly code. This is the most difficult and complex part of the whole process. In current versions of GCC, preprocessing and compilation are merged into one step.

Assembly: the assembler translates assembly code into machine instructions. This is largely a table lookup, because assembly instructions correspond almost one-to-one to machine instructions.

Linking: the independently compiled modules refer to each other, and linking combines them so that those references resolve correctly and the result is a complete executable file. Linking is either dynamic or static, which will be explained in the next article.
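To make the preprocessing stage concrete, here is a minimal sketch in C. The macro SQUARE and the function square_of are hypothetical names chosen for illustration and are not part of hello.c.

```c
/* Hypothetical macro for illustration: before the compiler proper sees
 * this file, the preprocessor replaces every SQUARE(x) with the text
 * ((x) * (x)). No function call is involved. */
#define SQUARE(x) ((x) * (x))

/* After preprocessing, the body below reads: return ((n) * (n)); */
int square_of(int n)
{
    return SQUARE(n);
}
```

Running the preprocessing stage alone (gcc -E) on a file containing this code should show the expanded form of square_of, with the #define line itself gone.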

The entire compilation process using the above C file is as follows:

Preprocessing:

    gcc -E hello.c -o hello.i

Compilation:

    gcc -S hello.i -o hello.s

Assembly:

    gcc -c hello.s -o hello.o

Linking:

    ld -static /usr/lib64/crt1.o /usr/lib64/crti.o \
        /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtbeginT.o \
        -L /usr/lib/gcc/x86_64-redhat-linux/4.8.5 -L /usr/lib -L . \
        hello.o --start-group -lgcc -lgcc_eh -lc --end-group \
        /usr/lib/gcc/x86_64-redhat-linux/4.8.5/crtend.o /usr/lib64/crtn.o

If you don't want to go to that much trouble, a single command does it all:

    gcc hello.c -o hello

2. What is an executable file

Executable files are files that the operating system can load and execute. They differ from system to system. For example, an executable on Windows typically has an .exe suffix, while an executable on Linux can have any suffix (or none) but must have the "executable" permission set. Note that even under the same operating system, executables are not interchangeable across CPU architectures. For example, on Linux an executable built for ARM cannot run directly on x86, because executables contain binary-encoded CPU instructions and each architecture has its own instruction set. This is where cross-compilation comes in: generating executables for one platform on another, such as building ARM executables on an x86 machine.
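The target architecture is recorded in the file itself. As a rough sketch for ELF files, the function below reads the 2-byte e_machine field, which follows the 16-byte identification block and the 2-byte type field at the start of the header. The function name elf_machine_name is made up for this example, and only a handful of well-known machine codes are handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns a short name for the CPU architecture
 * recorded in an ELF header, or NULL if the buffer is too short or does
 * not start with the ELF magic. hdr must hold the first 20 bytes. */
const char *elf_machine_name(const unsigned char *hdr, size_t len)
{
    if (len < 20 || hdr[0] != 0x7f || hdr[1] != 'E' ||
        hdr[2] != 'L' || hdr[3] != 'F')
        return NULL;

    /* Byte 5 (EI_DATA) gives the byte order of multi-byte fields:
     * 1 = little-endian, 2 = big-endian. */
    uint16_t machine;
    if (hdr[5] == 2)
        machine = (uint16_t)((hdr[18] << 8) | hdr[19]);
    else
        machine = (uint16_t)(hdr[18] | (hdr[19] << 8));

    switch (machine) {
    case 3:   return "x86";      /* EM_386 */
    case 40:  return "ARM";      /* EM_ARM */
    case 62:  return "x86-64";   /* EM_X86_64 */
    case 183: return "AArch64";  /* EM_AARCH64 */
    default:  return "other";
    }
}
```

A caller would read the first 20 bytes of a file with fread and pass them in. A loader performs a comparable check before it agrees to run a binary.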

3. Executable file type

The executable file format is PE (Portable Executable) on Windows and ELF (Executable and Linkable Format) on Linux.

On Linux, the file command shows the format of a file. For an ordinary source file it reports:

    [root@centos7-dev hello]# file hello.c
    hello.c: C source, ASCII text

For the object file:

    [root@centos7-dev hello]# file hello.o
    hello.o: ELF 64-bit LSB relocatable, x86-64, version 1 (SYSV), not stripped

For the executable:

    [root@centos7-dev hello]# file hello
    hello: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=6115b831b9be5d023a87ce84ecd72d44cbfa1548, not stripped

Executables are not the only files stored in these two formats. Object files (.obj / .o) and link libraries (.dll / .so) use them too, and even a core dump on Linux is stored in the executable file format. Since an object file has the same storage structure as an executable, what is the difference between the two? In one sentence: an object file is an executable that has not yet been linked.
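The words "relocatable" and "executable" in the file output come from a type field in the ELF header. The sketch below shows one way to read it, assuming the standard ELF layout in which the 2-byte e_type field sits right after the 16-byte identification block. The function name elf_type_name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: classifies an ELF file by its e_type field.
 * Returns NULL if the buffer is too short or is not an ELF header.
 * hdr must hold at least the first 18 bytes of the file. */
const char *elf_type_name(const unsigned char *hdr, size_t len)
{
    if (len < 18 || hdr[0] != 0x7f || hdr[1] != 'E' ||
        hdr[2] != 'L' || hdr[3] != 'F')
        return NULL;

    /* Byte 5 (EI_DATA): 1 = little-endian, 2 = big-endian. */
    uint16_t type;
    if (hdr[5] == 2)
        type = (uint16_t)((hdr[16] << 8) | hdr[17]);
    else
        type = (uint16_t)(hdr[16] | (hdr[17] << 8));

    switch (type) {
    case 1:  return "relocatable";    /* ET_REL: object file (.o) */
    case 2:  return "executable";     /* ET_EXEC */
    case 3:  return "shared object";  /* ET_DYN: .so libraries */
    case 4:  return "core";           /* ET_CORE: core dump */
    default: return "unknown";
    }
}
```

Given the first bytes of hello.o this should report "relocatable", and for the hello binary shown above it should report "executable", matching what the file command prints.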

4. About linking

Why is there a linking step?

Let's go back to the era of punched paper tape. When computers first appeared, programs were written on paper tape. Early programs were a single tape each, simple and self-contained. As time went on, there were more and more programs and more and more tapes. At some point a programmer who wanted to save effort needed program A, which he was writing, to use a function from program B, which he had written earlier. So he cut that code out of tape B and spliced it into tape A. This was the earliest form of linking, and arguably the earliest static linking.

Modern software projects are large, often with thousands of modules and files. These modules are compiled independently yet depend on each other, and calls between them are common. How does such a huge project get compiled into one executable? First, each source file is compiled separately into an object file. Then the object files are combined into a single executable. That combining step is linking.

Without the linking step, the object files of those thousands of modules would be useless, because none of them can be executed on its own. One might object that the whole project could be written as a single file, which would compile without any need to link modules together. That is true, but it is not a workable way to do collaborative development.
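As a small illustration of modules that reference each other, the sketch below shows two hypothetical source files (math_utils.c and app.c, names invented here) in a single listing. In a real project each part would be compiled to its own object file, and the linker would resolve the reference to add.

```c
/* ---- math_utils.c (hypothetical module B) ---- */
int add(int a, int b)
{
    return a + b;
}

/* ---- app.c (hypothetical module A) ---- */
/* Declaration only: it tells the compiler that add exists somewhere.
 * When app.c is compiled on its own, the call below is left as an
 * unresolved symbol in the object file for the linker to fill in. */
extern int add(int a, int b);

int sum_of_three(int a, int b, int c)
{
    return add(add(a, b), c);
}
```

Compiled separately with gcc -c, neither object file is runnable by itself. Only after linking does the call in sum_of_three point at the actual code of add.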

5. ELF structure

Because I am not familiar with the executable file format on Windows, I will analyze the Linux format, ELF, here. An ELF file contains compiled machine instructions and data, symbol tables, debugging information, strings and so on. Each kind of information is stored in its own block, generally called a "section".

First we need to know which sections an ELF file can contain. The following reference list describes them:

.bss: Uninitialized data that forms part of the program's memory image. By definition, the system initializes this data to zero when the program starts running. As the section type SHT_NOBITS indicates, this section takes up no file space.
.comment: Comment information, usually provided by components of the compilation system.
.data, .data1: Initialized data that forms part of the program's memory image.
.dynamic: Dynamic linking information.
.dynstr: Strings required for dynamic linking, usually the strings that represent the names associated with symbol table entries.
.dynsym: Dynamic linking symbol table.
.eh_frame_hdr, .eh_frame: Call frame information used to unwind the stack.
.fini: Executable instructions that make up a single termination function for the executable or shared object file containing this section.
.fini_array: An array of function pointers that makes up a single termination array for the executable or shared object file containing this section.
.got: Global offset table.
.hash: Symbol hash table.
.init: Executable instructions that make up a single initialization function for the executable or shared object file containing this section.
.init_array: An array of function pointers that makes up a single initialization array for the executable or shared object file containing this section.
.interp: The path name of the program interpreter.
.lbss: x64-specific uninitialized data. Similar to .bss, but used for sections larger than 2 GB.
.ldata, .ldata1: x64-specific initialized data. Similar to .data, but used for sections larger than 2 GB.
.lrodata, .lrodata1: x64-specific read-only data. Similar to .rodata, but used for sections larger than 2 GB.
.note: Information in the format described in the note section.
.plt: Procedure linkage table.
.preinit_array: An array of function pointers that makes up a single pre-initialization array for the executable or shared object file containing this section.
.rela: Relocations that do not apply to a specific section. One use of this section is register relocation.
.relname, .relaname: Relocation information, as described in the relocation section. If the file has a loadable segment that includes relocation, the section's attributes include the SHF_ALLOC bit. Otherwise, the bit is off. By convention, name is supplied by the section to which the relocations apply, so the relocation section for .text is usually named .rel.text or .rela.text.
.rodata, .rodata1: Read-only data that usually forms part of a non-writable segment in the process image.
.shstrtab: Section names.
.strtab: Strings, usually the strings that represent the names associated with symbol table entries. If the file has a loadable segment that includes the symbol string table, the section's attributes include the SHF_ALLOC bit. Otherwise, the bit is off.
.symtab: Symbol table, as described in the symbol table section. If the file has a loadable segment that includes the symbol table, the section's attributes include the SHF_ALLOC bit. Otherwise, the bit is off.
.symtab_shndx: An array of section indexes for the special symbol table, as described in .symtab. If the associated symbol table section includes the SHF_ALLOC bit, this section's attributes include that bit as well. Otherwise, the bit is off.
.tbss: Uninitialized thread-local data that forms part of the program's memory image. By definition, the system initializes the data to zero when it is instantiated for each new execution flow. As the section type SHT_NOBITS indicates, this section takes up no file space.
.tdata, .tdata1: Initialized thread-local data that forms part of the program's memory image. A copy of its contents is instantiated for each new execution flow.
.text: The text, or executable instructions, of a program.
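To connect a few of these sections to source code, here is a minimal C sketch. All identifiers are invented for illustration, and the placement in the comments is what GCC on Linux typically does. The exact result can vary with the compiler, its version and its flags.

```c
/* Initialized global data: typically placed in .data. */
int initialized_counter = 42;

/* No initializer: typically ends up in .bss, takes no space in the
 * file, and is set to zero by the system before the program runs. */
int uninitialized_counter;

/* Constant data: typically placed in .rodata. */
const char greeting[] = "hello";

/* The machine code of functions goes into .text. */
int bump(void)
{
    uninitialized_counter++;
    return initialized_counter + uninitialized_counter;
}
```

Compiling this to an object file and listing its sections with objdump -h is one way to check where each item actually landed on a given toolchain.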

With this information in hand, we can use the objdump command to see which sections the hello executable contains:

    [root@centos7-dev hello]# objdump -h hello

    hello:     file format elf64-x86-64

    Sections:
    Idx Name          Size      VMA               LMA               File off  Algn
      0 .interp       0000001c  0000000000400238  0000000000400238  00000238  2**0
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      1 .note.ABI-tag 00000020  0000000000400254  0000000000400254  00000254  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      2 .note.gnu.build-id 00000024  0000000000400274  0000000000400274  00000274  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      3 .gnu.hash     0000001c  0000000000400298  0000000000400298  00000298  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      4 .dynsym       00000060  00000000004002b8  00000000004002b8  000002b8  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      5 .dynstr       0000003f  0000000000400318  0000000000400318  00000318  2**0
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      6 .gnu.version  00000008  0000000000400358  0000000000400358  00000358  2**1
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      7 .gnu.version_r 00000020  0000000000400360  0000000000400360  00000360  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      8 .rela.dyn     00000018  0000000000400380  0000000000400380  00000380  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
      9 .rela.plt     00000048  0000000000400398  0000000000400398  00000398  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     10 .init         0000001a  00000000004003e0  00000000004003e0  000003e0  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     11 .plt          00000040  0000000000400400  0000000000400400  00000400  2**4
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     12 .text         ...       0000000000400440  0000000000400440  00000440  2**4
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     13 .fini         00000009  00000000004005c4  00000000004005c4  000005c4  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, CODE
     14 .rodata       00000016  00000000004005d0  00000000004005d0  000005d0  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     15 .eh_frame_hdr 00000034  00000000004005e8  00000000004005e8  000005e8  2**2
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     16 .eh_frame     000000f4  0000000000400620  0000000000400620  00000620  2**3
                      CONTENTS, ALLOC, LOAD, READONLY, DATA
     17 .init_array   00000008  0000000000600e10  0000000000600e10  00000e10  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     18 .fini_array   00000008  0000000000600e18  0000000000600e18  00000e18  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     19 .jcr          00000008  0000000000600e20  0000000000600e20  00000e20  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     20 .dynamic      000001d0  0000000000600e28  0000000000600e28  00000e28  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     21 .got          00000008  0000000000600ff8  0000000000600ff8  00000ff8  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     22 .got.plt      00000030  0000000000601000  0000000000601000  00001000  2**3
                      CONTENTS, ALLOC, LOAD, DATA
     23 .data         00000004  0000000000601030  0000000000601030  00001030  ...
                      CONTENTS, ALLOC, LOAD, DATA
     24 .bss          ...       0000000000601034  0000000000601034  00001034  2**0
                      ALLOC
     25 .comment      0000005a  0000000000000000  0000000000000000  00001034  2**0
                      CONTENTS, READONLY

Then use the same command on the object file:

    [root@centos7-dev hello]# objdump -h hello.o

    hello.o:     file format elf64-x86-64

    Sections:
    Idx Name          Size      VMA               LMA               File off  Algn
      0 .text         0000001a  0000000000000000  0000000000000000  00000040  ...
      1 .data         00000000  0000000000000000  0000000000000000  0000005a  2**0
                      CONTENTS, ALLOC, LOAD, DATA
      2 .bss          00000000  0000000000000000  0000000000000000  0000005a  2**0
                      ALLOC
      3 .rodata       ...

The object file contains far fewer sections than the executable, because linking an object file into an executable involves a lot of processing, such as address and space allocation, relocation and symbol resolution.

We can also use objdump to view the contents and disassembly of each section:

    objdump -s -d hello

View the symbol table:

    objdump -x hello

See man objdump for more on using the objdump command.

