
How does the code you write run?

2025-04-02 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)11/24 Report--

This article comes from the WeChat official account: developing Internal skills practice (ID: kfngxl). Author: Zhang Yanfei allen

Hello, everyone. I'm Brother Fei!

Today we're going to think about a simple question: how does a program run on Linux?

Let's take the simplest Hello World program in the universe as an example.

    #include <stdio.h>

    int main()
    {
        printf("Hello, World!\n");
        return 0;
    }

After writing the code, we simply compile it and then start it from the shell command line.

    # gcc main.c -o helloworld
    # ./helloworld
    Hello, World!

So what happens during compiling and running? Today, let's take a closer look.

1. Understanding the executable file format

After compilation, the source code becomes an executable program file. Let's first understand what the compiled binary file looks like.

Let's first look at the format of this file using the file command.

    # file helloworld
    helloworld: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), ...

The file command gives a summary of this binary. "ELF 64-bit LSB executable" indicates that the file is a 64-bit executable in ELF format, and x86-64 is the CPU architecture the executable targets.

LSB here stands for "least significant byte first", meaning the file stores its data in little-endian byte order. (In this output it does not refer to the Linux Standard Base.)

ELF, whose full name is Executable and Linkable Format, is a binary file format. Object files, executable files and core dumps on Linux are all stored in this format.

The ELF file consists of four parts, namely, the ELF header (ELF header), Program header table, Section and Section header table.

Next, let's introduce it one by one in several sections.

1.1 ELF file header

The ELF file header records the attributes of the entire file. The raw binary is very hard to read directly, but there is a handy tool, readelf, that can show us all kinds of information in an ELF file.

Let's first look at the ELF header of the compiled executable, which can be viewed with the --file-header (-h) option.

    # readelf --file-header helloworld
    ELF Header:
      Magic:   7f 45 4c 46 02 01 00 00 00
      Class:                             ELF64
      Data:                              2's complement, little endian
      Version:                           1 (current)
      OS/ABI:                            UNIX - System V
      ABI Version:                       0
      Type:                              EXEC (Executable file)
      Machine:                           Advanced Micro Devices X86-64
      Version:                           0x1
      Entry point address:               0x401040
      Start of program headers:          64 (bytes into file)
      Start of section headers:          23264 (bytes into file)
      Flags:                             0x0
      Size of this header:               64 (bytes)
      Size of program headers:           56 (bytes)
      Number of program headers:         11
      Size of section headers:           64 (bytes)
      Number of section headers:         30
      Section header string table index: 29

The ELF file header contains summary information about the executable. Let me pick out some of the key fields and explain them.

Magic: a special identification code that lets external programs quickly determine whether the file is an ELF file.

Class: indicates that this is an ELF64 file.

Type: EXEC indicates an executable file. Other file types include REL (relocatable object file), DYN (shared object, such as a dynamic link library), and CORE (core dump file).

Entry point address: the address of the program entry, shown here as 0x401040.

Size of this header: the size of the ELF header, shown here as 64 bytes.

The fields above describe the ELF file as a whole. The ELF header also describes where the program headers and section headers are.

Start of program headers: the file offset at which the Program Header Table starts.

Size of program headers: the size of each Program Header.

Number of program headers: how many Program Headers there are in total.

Start of section headers: the file offset at which the Section Header Table starts.

Size of section headers: the size of each Section Header.

Number of section headers: how many Section Headers there are in total.

1.2 Program Header Table

Before introducing the Program Header Table, let's look at two similar-sounding concepts in the ELF file: Segment and Section.

The most important component inside an ELF file is the Section. Each Section is generated by the compiler and linker and serves a different purpose. For example, the compiler puts the code we write into the .text Section and puts global variables into the .data or .bss Section.

The operating system, however, does not care what each specific Section is for. It only cares about the permissions with which a piece of content should be loaded into memory, such as read, write and execute. So Sections with the same permissions can be grouped into a Segment, which makes loading simpler and faster for the operating system.

In Chinese translation, Segment and Section come out so similar that they are easy to confuse. So in this article I use the original terms Segment and Section directly instead of translating them.

The Program Header Table holds the header information that describes every Segment.

Use the --program-headers (-l) option of the readelf tool to parse and view the content stored in this area.

    # readelf --program-headers helloworld
    Elf file type is EXEC (Executable file)
    Entry point 0x401040
    There are 11 program headers, starting at offset 64

    Program Headers:
      Type           Offset             VirtAddr           PhysAddr
                     FileSiz            MemSiz              Flags  Align
      PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                     0x0000000000000268 0x0000000000000268  R      0x8
      INTERP         0x00000000000002a8 0x00000000004002a8 0x00000000004002a8
                     0x000000000000001c 0x000000000000001c  R      0x1
          [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
      LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                     0x0000000000000438 0x0000000000000438  R      0x1000
      LOAD           0x0000000000001000 0x0000000000401000 0x0000000000401000
                     0x00000000000001c5 0x00000000000001c5  RE     0x1000
      LOAD           0x0000000000002000 0x0000000000402000 0x0000000000402000
                     0x0000000000000138 0x0000000000000138  R      0x1000
      LOAD           0x0000000000002e10 0x0000000000403e10 0x0000000000403e10
                     0x0000000000000220 0x0000000000000228  RW     0x1000
      DYNAMIC        0x0000000000002e20 0x0000000000403e20 0x0000000000403e20
                     0x00000000000001d0 0x00000000000001d0  RW     0x8
      NOTE           0x00000000000002c4 0x00000000004002c4 0x00000000004002c4
                     0x0000000000000044 0x0000000000000044  R      0x4
      GNU_EH_FRAME   0x0000000000002014 0x0000000000402014 0x0000000000402014
                     0x000000000000003c 0x000000000000003c  R      0x4
      GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                     0x0000000000000000 0x0000000000000000  RW     0x10
      GNU_RELRO      0x0000000000002e10 0x0000000000403e10 0x0000000000403e10
                     0x00000000000001f0 0x00000000000001f0  R      0x1

     Section to Segment mapping:
      Segment Sections...
       00
       01     .interp
       02     .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
       03     .init .plt .text .fini
       04     .rodata .eh_frame_hdr .eh_frame
       05     .init_array .fini_array .dynamic .got .got.plt .data .bss
       06     .dynamic
       07     .note.gnu.build-id .note.ABI-tag
       08     .eh_frame_hdr
       09
       10     .init_array .fini_array .dynamic .got

The output above shows that there are 11 program headers in total.

For each Segment, the output includes Offset, VirtAddr and other fields that describe it. Offset is the starting position of the Segment within the binary file, FileSiz is the number of bytes it occupies in the file, and MemSiz is its size once loaded into memory. Flags gives the permissions of the Segment: R for readable, E for executable, and W for writable.

At the bottom, the output also shows which Sections each Segment is made up of. For example, Segment 03 is made up of four Sections: .init, .plt, .text and .fini.

1.3 Section Header Table

Unlike the Program Header Table, the Section Header Table describes each Section directly. Both tables ultimately describe the same Sections, but for different purposes: one is used for loading, the other for linking.

Use the --section-headers (-S) option of the readelf tool to parse and view the content stored in this area.

    # readelf --section-headers helloworld
    There are 30 section headers, starting at offset 0x5b10:

    Section Headers:
      [Nr] Name              Type             Address           Offset
           Size              EntSize          Flags  Link  Info  Align
      ...
      [13] .text             PROGBITS         0000000000401040  00001040
           0000000000000175  0000000000000000  AX       0     0     16
      ...
      [23] .data             PROGBITS         0000000000404020  00003020
           0000000000000010  0000000000000000  WA       0     0     8
      [24] .bss              NOBITS           0000000000404030  00003030
           0000000000000008  0000000000000000  WA       0     0     1
      ...
    Key to Flags:
      W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
      L (link order), O (extra OS processing required), G (group), T (TLS),
      C (compressed), x (unknown), o (OS specific), E (exclude),
      l (large), p (processor specific)

The output shows that the file has 30 Sections in total. The position of each Section in the binary file is given by the Offset column, and its size by the Size column.

Each of these 30 Sections has its own role. The code we write is compiled into binary instructions and placed in the .text Section. Note also that the Address column for the .text Section shows 0000000000401040. Recall that the Entry point address in the ELF file header was 0x401040. This means that the entry address of the program is the start address of the .text Section.

Two other Sections worth noting are .data and .bss. After compilation, the global variables in the code occupy space in these two Sections, as the following simple code shows.

    // uninitialized global: placed in the .bss Section
    int data1;

    // initialized global: placed in the .data Section
    int data2 = 100;

    // code: placed in the .text Section
    int main(void)
    {
    }

1.4 A closer look at the entry point

Next, let's take another look at the program entry 0x401040 mentioned earlier and see what it is. This time we use the nm command to list the symbols in the executable together with their addresses. The -n option sorts the symbols by address rather than by name.

    # nm -n helloworld
                     w __gmon_start__
                     U __libc_start_main@@GLIBC_2.2.5
                     U printf@@GLIBC_2.2.5
    0000000000401040 T _start
    0000000000401126 T main

The output shows that the program entry 0x401040 is the address of the _start function. After this function performs some initialization, it calls our main function, which is located at address 0x401126.

2. Overview of how a user process is created

After our code has been compiled into an executable program, the next step is for the shell to load and run it. Generally speaking, a shell process loads and runs a new process through fork + execve. The core logic of a shell that simply runs the helloworld command looks like this.

    // shell code sample
    int main(int argc, char *argv[], char *envp[])
    {
        pid_t pid = fork();
        if (pid == 0) {
            // we are in the child process:
            // use an exec-family function to load and run the executable
            execve("helloworld", argv, envp);
        } else {
        }
    }

The shell process first creates a new process with the fork system call. The child process then calls execve to load the program file to be executed, and control is transferred to the entry point of that program file.

The kernel entry point of the fork system call is in kernel/fork.c.

    // file:kernel/fork.c
    SYSCALL_DEFINE0(fork)
    {
        return do_fork(SIGCHLD, 0, 0, NULL, NULL);
    }

In the implementation of do_fork, the core is the copy_process function, which creates a new task_struct by copying the parent process (thread).

    // file:kernel/fork.c
    long do_fork(...)
    {
        // copy a task_struct
        struct task_struct *p;
        p = copy_process(clone_flags, stack_start, stack_size,
                         child_tidptr, NULL, trace);

        // the child task joins the ready queue and waits for the scheduler
        wake_up_new_task(p);
        ...
    }

The copy_process function allocates a task_struct for the new process, initializes the new process with the current process's own address space, namespaces and so on, and allocates a pid for it.

    // file:kernel/fork.c
    static struct task_struct *copy_process(...)
    {
        // copy the process's task_struct structure
        struct task_struct *p;
        p = dup_task_struct(current);
        ...
        // initialize the core elements of the process
        retval = copy_files(clone_flags, p);
        retval = copy_fs(clone_flags, p);
        retval = copy_mm(clone_flags, p);
        retval = copy_namespaces(clone_flags, p);
        ...
        // allocate a pid and set the process numbers
        pid = alloc_pid(p->nsproxy->pid_ns);
        p->pid = pid_nr(pid);
        p->tgid = p->pid;
        ...
    }

After this completes, wake_up_new_task is called and the new process waits to be picked by the scheduler.

However, the fork system call can only copy a new process based on the current shell process. The code and data in this new process are exactly the same as the original shell process.

To load and run another program, such as the helloworld program we compiled, you also need to use the execve system call.

3. Linux executable file loaders

In fact, Linux is not hard-wired to load only the ELF executable format. At startup it registers all the executable-format loaders it supports and keeps them in a doubly linked list named formats. The structure of the formats linked list in memory is shown in the following figure.

Let's take the ELF loader, elf_format, as an example to see how a loader is registered. In Linux, each loader is represented by a linux_binfmt structure. It holds the load_binary function pointer, which loads a binary executable, and the core_dump function pointer, which writes a core dump when a process crashes. Its definition (abridged) is as follows.

    // file:include/linux/binfmts.h
    struct linux_binfmt {
        ...
        int (*load_binary)(struct linux_binprm *);
        int (*load_shlib)(struct file *);
        int (*core_dump)(struct coredump_params *cprm);
    };

The ELF loader elf_format fills in the concrete functions. For example, its load_binary member points to load_elf_binary, which is the entry point for loading an ELF file.

    // file:fs/binfmt_elf.c
    static struct linux_binfmt elf_format = {
        .module       = THIS_MODULE,
        .load_binary  = load_elf_binary,
        .load_shlib   = load_elf_library,
        .core_dump    = elf_core_dump,
        .min_coredump = ELF_EXEC_PAGESIZE,
    };

The loader elf_format is registered with register_binfmt during initialization.

    // file:fs/binfmt_elf.c
    static int __init init_elf_binfmt(void)
    {
        register_binfmt(&elf_format);
        return 0;
    }

register_binfmt, in turn, hangs the loader on the global loader list, the formats linked list.

    // file:fs/exec.c
    static LIST_HEAD(formats);

    void __register_binfmt(struct linux_binfmt *fmt, int insert)
    {
        ...
        insert ? list_add(&fmt->lh, &formats) :
                 list_add_tail(&fmt->lh, &formats);
    }

Linux supports other formats besides ELF. Searching for register_binfmt in the source tree finds the loaders for all the formats the Linux kernel supports.

    # grep -r "register_binfmt" *
    fs/binfmt_flat.c:       register_binfmt(&flat_format);
    fs/binfmt_elf_fdpic.c:  register_binfmt(&elf_fdpic_format);
    fs/binfmt_som.c:        register_binfmt(&som_format);
    fs/binfmt_elf.c:        register_binfmt(&elf_format);
    fs/binfmt_aout.c:       register_binfmt(&aout_format);
    fs/binfmt_script.c:     register_binfmt(&script_format);
    fs/binfmt_em86.c:       register_binfmt(&em86_format);

From then on, when Linux loads a binary it traverses the formats linked list and picks the appropriate loader based on the format of the file being loaded.

4. execve loads the user program

Loading an executable file is done by the execve system call.

This system call takes the executable file name, argument list and environment variables passed in by the user, and starts loading and running the specified executable. The system call is defined in fs/exec.c.

    // file:fs/exec.c
    SYSCALL_DEFINE3(execve, const char __user *, filename, ...)
    {
        struct filename *path = getname(filename);
        do_execve(path->name, argv, envp);
        ...
    }

    int do_execve(...)
    {
        ...
        return do_execve_common(filename, argv, envp);
    }

The execve system call ends up in the do_execve_common function. Let's look at its implementation.

    // file:fs/exec.c
    static int do_execve_common(const char *filename, ...)
    {
        // the linux_binprm structure holds the parameters used when loading a binary
        struct linux_binprm *bprm;

        // 1. allocate and initialize the bprm object
        bprm = kzalloc(sizeof(*bprm), GFP_KERNEL);
        bprm->file = ...;
        bprm->filename = ...;
        bprm_mm_init(bprm);
        bprm->argc = count(argv, MAX_ARG_STRINGS);
        bprm->envc = count(envp, MAX_ARG_STRINGS);
        prepare_binprm(bprm);

        // 2. traverse the loaders to find a suitable one
        search_binary_handler(bprm);
    }

The work of allocating and initializing the bprm object in this function is shown in the following figure.

Allocating and initializing the bprm object involves the following four pieces of work.

First, kzalloc is used to allocate the linux_binprm kernel object, which holds the parameters used when loading a binary. Once allocated, the object is initialized.

Second, a brand new mm_struct object is allocated in bprm_mm_init and kept ready for use by the new process.

Third, one page of virtual memory is allocated for the stack of the new process, and the stack pointer is recorded.

Fourth, the first 128 bytes of the binary file are read.

Let's take a look at the code related to initializing the stack.

    // file:fs/exec.c
    static int __bprm_mm_init(struct linux_binprm *bprm)
    {
        bprm->vma = vma = kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL);
        vma->vm_end = STACK_TOP_MAX;
        vma->vm_start = vma->vm_end - PAGE_SIZE;
        ...
        bprm->p = vma->vm_end - sizeof(void *);
    }

In the function above, a vma object (representing a range of the virtual address space) is allocated. Its vm_end is set to STACK_TOP_MAX (near the top of the address space), and exactly one page lies between vm_start and vm_end. In other words, 4 KB is allocated for the stack by default. Finally, the stack pointer is recorded in bprm->p.

Let's also look at prepare_binprm, which reads 128 bytes from the start of the file. The point of reading the beginning of the binary is to make it easy to determine its file type later.

    // file:include/uapi/linux/binfmts.h
    #define BINPRM_BUF_SIZE 128

    // file:fs/exec.c
    int prepare_binprm(struct linux_binprm *bprm)
    {
        ...
        memset(bprm->buf, 0, BINPRM_BUF_SIZE);
        return kernel_read(bprm->file, 0, bprm->buf, BINPRM_BUF_SIZE);
    }

After the bprm object has been allocated and initialized, the search_binary_handler function traverses the loaders registered in the system and tries to parse and load the current executable.

In Section 3 we saw that all the loaders in the system are registered in the global formats linked list. The job of search_binary_handler is to traverse this list and find the right loader based on the file type information carried in the header of the binary. Once found, it calls that loader's function to load the binary.

    // file:fs/exec.c
    int search_binary_handler(struct linux_binprm *bprm)
    {
        ...
        for (try = 0; try < 2; try++) {
            list_for_each_entry(fmt, &formats, lh) {
                int (*fn)(struct linux_binprm *) = fmt->load_binary;
                ...
                retval = fn(bprm);

                // return if the load succeeded
                if (retval >= 0) {
                    ...
                    return retval;
                }
                // load failed: continue the loop and try the next loader
                ...
            }
        }
    }

In the code above, list_for_each_entry traverses the global formats linked list. For each element it checks whether a load_binary function is set and, if so, calls it to try to load the file.

Recall from Section 3, where the executable file loaders are registered, that for the ELF loader elf_format the load_binary function pointer points to load_elf_binary.

    // file:fs/binfmt_elf.c
    static struct linux_binfmt elf_format = {
        .module      = THIS_MODULE,
        .load_binary = load_elf_binary,
        ...
    };

So loading continues in the load_elf_binary function. This function is very long, and it is fair to say that all of the program loading logic lives in it. I will walk through it in six small parts, following the main work the function does.

To keep the explanation clear, I will rearrange the source code slightly as I go, so the order may differ from the line order of the kernel source.

4.1 Reading the ELF file header

In load_elf_binary, the ELF file header is read first.

The file header contains data such as the file format type, so a number of validity checks are made after it is read. If the header is not valid, the function exits and returns.

    // file:fs/binfmt_elf.c
    static int load_elf_binary(struct linux_binprm *bprm)
    {
        // 4.1 ELF file header parsing
        // define a struct and allocate memory to hold the ELF file headers
        struct {
            struct elfhdr elf_ex;
            struct elfhdr interp_elf_ex;
        } *loc;
        loc = kmalloc(sizeof(*loc), GFP_KERNEL);

        // get the binary's header
        loc->elf_ex = *((struct elfhdr *)bprm->buf);

        // run a series of validity checks on the header and exit if it is invalid
        if (loc->elf_ex.e_type != ET_EXEC && ...) {
            goto out;
        }
        ...
    }

4.2 Reading the Program Headers

The ELF file header records the number of Program Headers, and the Program Header Table follows the ELF header. So the kernel can now read in all the Program Headers.

    // file:fs/binfmt_elf.c
    static int load_elf_binary(struct linux_binprm *bprm)
    {
        // 4.1 ELF file header parsing

        // 4.2 Program Header read
        // elf_ex.e_phnum holds the number of Program Headers;
        // compute the total size from the size of one Program Header,
        // sizeof(struct elf_phdr), and read them all in
        size = loc->elf_ex.e_phnum * sizeof(struct elf_phdr);
        elf_phdata = kmalloc(size, GFP_KERNEL);
        kernel_read(bprm->file, loc->elf_ex.e_phoff, (char *)elf_phdata, size);
        ...
    }

4.3 Clearing the resources inherited from the parent process

A process created by the fork system call carries a lot of information from the original process, such as the old address space and signal table. These are of little use when the new program runs, so they need to be cleared.

The specific work includes initializing the signal table of the new process, switching to the new address space object, and so on.

    // file:fs/binfmt_elf.c
    static int load_elf_binary(struct linux_binprm *bprm)
    {
        // 4.1 ELF file header parsing
        // 4.2 Program Header read

        // 4.3 clear the resources inherited from the parent process
        retval = flush_old_exec(bprm);
        ...
        current->mm->start_stack = bprm->p;
        ...
    }

After the resources inherited from the parent are cleared (and the new mm_struct object is now in use), the stack address prepared earlier is stored directly in the mm object, so that the stack can be used from here on.

4.4 Loading the Segments

Next, the loader maps all the LOAD-type Segments in the ELF file into memory, using elf_map to allocate virtual memory in the address space. Finally, the address-space pointers in mm_struct, such as start_code, end_code, start_data and end_data, are set appropriately.

Let's take a look at the specific code:

    // file:fs/binfmt_elf.c
    static int load_elf_binary(struct linux_binprm *bprm)
    {
        // 4.1 ELF file header parsing
        // 4.2 Program Header read
        // 4.3 clear the resources inherited from the parent process

        // 4.4 load the Segments
        // traverse the Program Headers of the executable
        for (i = 0, elf_ppnt = elf_phdata;
             i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {

            // only load Segments of type LOAD, skip the rest
            if (elf_ppnt->p_type != PT_LOAD)
                continue;

            // create a memory mapping for the Segment and map the contents of
            // the program file into the virtual address space, so that the
            // program's code and data can be accessed later
            error = elf_map(bprm->file, load_bias + vaddr, elf_ppnt,
                            elf_prot, elf_flags, 0);

            // compute the member addresses needed by mm_struct
            start_code = ...;
            start_data = ...;
            end_code = ...;
            end_data = ...;
        }

        current->mm->end_code = end_code;
        current->mm->start_code = start_code;
        current->mm->start_data = start_data;
        current->mm->end_data = end_data;
        ...
    }

Here load_bias is the base address at which the Segments are loaded into memory. There are several possibilities for this value.

The value is 0: the Segments are mapped into memory at exactly the addresses recorded in the ELF file.

The value is aligned to the start of a whole page: inside the file, Segments may be packed compactly without regard to alignment in order to keep the executable small. But for efficiency at run time, the operating system needs each Segment to start on a page boundary when it is loaded.

4.5 Data memory request & heap initialization

Because the data segment of the process needs write permission, the set_brk function is used to request virtual memory specifically for the data segment.

    // file:fs/binfmt_elf.c
    static int load_elf_binary(struct linux_binprm *bprm)
    {
        // 4.1 ELF file header parsing
        // 4.2 Program Header read
        // 4.3 clear the resources inherited from the parent process
        // 4.4 load the Segments

        // 4.5 data memory request & heap initialization
        retval = set_brk(elf_bss, elf_brk);
        ...
    }

The set_brk function does two things: first it requests virtual memory for the data segment, and second it initializes the start and end pointers of the process heap.

    // file:fs/binfmt_elf.c
    static int set_brk(unsigned long start, unsigned long end)
    {
        // 1. request virtual memory for the data segment
        start = ELF_PAGEALIGN(start);
        end = ELF_PAGEALIGN(end);
        if (end > start) {
            unsigned long addr;
            addr = vm_brk(start, end - start);
        }

        // 2. initialize the heap pointers
        current->mm->start_brk = current->mm->brk = end;
        return 0;
    }

When the program has just been initialized, the heap is still empty. So when the heap pointers are initialized, the heap start address start_brk and the end address brk are set to the same value.

4.6 Jumping to the program entry

The ELF file header records the entry address of the program. For a program that is not dynamically linked, this is the address to jump to.

But for a dynamically linked program, that is, one with a Segment of type INTERP, the dynamic linker is loaded and run first, and it then transfers control to the program's own entry address.

    # readelf --program-headers helloworld
    Program Headers:
      Type           Offset             VirtAddr           PhysAddr
                     FileSiz            MemSiz              Flags  Align
      INTERP         0x00000000000002a8 0x00000000004002a8 0x00000000004002a8
                     0x000000000000001c 0x000000000000001c  R      0x1
          [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]

For a dynamically linked program, the dynamic loader (the ld-linux-x86-64.so.2 file in this article's example) has to be loaded into the address space first.

Once it is loaded, the entry address of the dynamic loader is computed. I show this code below, and impatient readers can skip it. All you really need to know is that this step computes the entry address at which execution will start.

    // file:fs/binfmt_elf.c
    static int load_elf_binary(struct linux_binprm *bprm)
    {
        // 4.1 ELF file header parsing
        // 4.2 Program Header read
        // 4.3 clear the resources inherited from the parent process
        // 4.4 load the Segments
        // 4.5 data memory request & heap initialization
        // 4.6 jump to the program entry

        // first pass over the program header table:
        // only preprocess the Segment of type PT_INTERP, which holds
        // the path of the dynamic loader in the file system
        for (i = 0; i < loc->elf_ex.e_phnum; i++) {
            ...
        }

        // second pass over the program header table, with some special handling
        elf_ppnt = elf_phdata;
        for (i = 0; i < loc->elf_ex.e_phnum; i++, elf_ppnt++) {
            ...
        }

        // if the program specifies a dynamic linker, load it
        if (elf_interpreter) {
            // load it and return the code segment address
            elf_entry = load_elf_interp(&loc->interp_elf_ex,
                                        interpreter,
                                        &interp_map_addr,
                                        load_bias);
            // compute the dynamic linker's entry address
            elf_entry += loc->interp_elf_ex.e_entry;
        } else {
            elf_entry = loc->elf_ex.e_entry;
        }

        // jump to the entry and start executing
        start_thread(regs, elf_entry, bprm->p);
        ...
    }

5. Summary

A helloworld program takes only a few simple lines, but understanding exactly how it runs requires a good deal of knowledge about the internals.

This article first introduced the ELF format used for executable binaries. An ELF file consists of four parts: the ELF file header (ELF header), the Program Header Table, the Sections, and the Section Header Table.

When Linux initializes, it registers all supported loaders in a global linked list. For ELF files, the loader is defined in the kernel as elf_format, and its entry point for loading a binary is the load_elf_binary function.

Generally speaking, a shell process loads and runs a new process through fork + execve. The fork system call creates a new process, but the code and data of that new process are exactly the same as those of the original shell process. To load and run another program, the execve system call is also needed.

In the execve system call, a linux_binprm object is allocated first. While it is being initialized, a brand new mm_struct object is allocated and kept ready for the new process, 4 KB of virtual memory is prepared for the new process's stack, and the first 128 bytes of the executable file are read.

The next step is to call the ELF loader's load_elf_binary function to do the actual loading. It performs the following steps:

ELF file header parsing

Program Header read

Clear the resources inherited by the parent process, using the new mm_struct and the new stack

Perform Segment load to load all the LOAD type Segment in the ELF file into virtual memory

Request memory for the data Segment and initialize the starting pointer of the heap

Finally, calculate and jump to the program entrance for execution.

Once the user process has started, we can view each mapped Segment of the process through the proc pseudo file system.

    # cat /proc/46276/maps
    00400000-00401000 r--p 00000000 fd:01 396999     /root/work_temp/helloworld
    00401000-00402000 r-xp 00001000 fd:01 396999     /root/work_temp/helloworld
    00402000-00403000 r--p 00002000 fd:01 396999     /root/work_temp/helloworld
    00403000-00404000 r--p 00002000 fd:01 396999     /root/work_temp/helloworld
    00404000-00405000 rw-p 00003000 fd:01 396999     /root/work_temp/helloworld
    01dc9000-01dea000 rw-p 00000000 00:00 0          [heap]
    7f0122fbf000-7f0122fc1000 rw-p 00000000 00:00 0
    7f0122fc1000-7f0122fe7000 r--p 00000000 fd:01 1182071    /usr/lib64/libc-2.32.so
    7f0122fe7000-7f0123136000 r-xp 00026000 fd:01 1182071    /usr/lib64/libc-2.32.so
    ......
    7f01231c0000-7f01231c1000 r--p 0002a000 fd:01 1182554    /usr/lib64/ld-2.32.so
    7f01231c1000-7f01231c3000 rw-p 0002b000 fd:01 1182554    /usr/lib64/ld-2.32.so
    7ffdf0590000-7ffdf05b1000 rw-p 00000000 00:00 0          [stack]
    ......

Although this article is long, it still only strings together the overall loading and startup process. If you later run into a problem at work or in your studies that you want to get to the bottom of, you can follow the path laid out here to find the relevant place in the source code and work toward a solution from there.

Finally, careful readers may have noticed that the way the example in this article loads and runs a new program involves some waste. The fork system call first copies a lot of information from the parent process, and execve then replaces it all when it loads the executable. So real shell programs generally use vfork. It works in basically the same way as fork, but it skips copying information that execve does not use, which improves loading performance.
