2025-02-23 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains the linker script syntax used when compiling and linking the Linux kernel. I hope that after reading it you will have a working understanding of the topic.
First of all, let's talk about how this problem came about.
While compiling the kernel, I came across a file named vmlinux.lds.S in the arch/arm/kernel directory. At first glance you might take it for an assembly file, but open it and it does not look like one. So what does it do? As mentioned earlier, running make V=1 shows a file of this name being passed to the ld command through the -T option (ld -T vmlinux.lds), so it appears to be used at link time, as shown below.
For example: arm-linux-ld -EL -p --no-undefined -X --build-id -o vmlinux -T arch/arm/kernel/vmlinux.lds. According to man ld, -T specifies a linker script for ld, which means that ld generates the final binary according to the contents of that file.
You may never have paid attention to this, but when studying kernel code you often run into statements such as "the __init macro places the function in a specific section of the final module, and when the kernel loads it looks for the function in that section". Put plainly, the final generated module contains a specific section. But what is a section?
I hope these questions have aroused your curiosity. Next comes a quick primer, and at the end we will point to a link where readers can go for further study.
What is a section?
First we need to explain how the binaries produced by compiling and linking are organized: executables (such as ELF or EXE), shared libraries (.so or .dll), the kernel (the uncompressed vmlinux; see the first part of this series), and kernel modules (.ko).
We all know, more or less, that these binaries contain sections such as text, data and bss. The text section stores code, the data section stores initialized static variables, and the bss section stores uninitialized ones.
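As a rough illustration of the three classic sections, the C sketch below marks where a typical toolchain would put each object. The names (boot_count, scratch, banner and the helper functions) are made up for this example, and the placement described in the comments is an assumption about common GCC/ELF behavior, not something the C language itself guarantees.

```c
#include <assert.h>
#include <stddef.h>

/* Initialized file-scope variable: typically placed in .data,
 * because its initial value (42) has to be stored in the binary. */
int boot_count = 42;

/* Read-only constant: typically placed in .rodata. */
const char banner[] = "hello";

/* Uninitialized static storage: typically placed in .bss.
 * It occupies no space in the file, and C guarantees it starts as zero. */
static unsigned char scratch[4096];

/* Function code: typically placed in .text. */
int bump_boot_count(void)
{
    return ++boot_count;
}

/* Read one byte of the .bss buffer, with a bounds check. */
unsigned char scratch_byte(size_t i)
{
    assert(i < sizeof scratch);
    return scratch[i];
}
```

On an ELF toolchain, compiling this file with gcc -c and running objdump -h on the resulting .o should show the corresponding sections and their sizes.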
I won't dig into the details here. In any case, a binary ends up containing many sections. So why are the sections called text/bss/data? Can they be given other names?
Yes, they can. But you have to tell ld, which you do by specifying a linker script with the -T option. The examples below cover this.
(Once again, this is only meant to get the discussion started. We hope interested readers will dig deeper on their own and share what they find.)
Linker script basics
The syntax used in a linker script is the linker command language (a simple language, nothing to be afraid of). So what is a linker script (LS from here on) for?
The LS describes how the sections of the input files (the .o object files produced by gcc -c) map into the output file. This is easy to picture: suppose an ELF is built from three .o files, each with its own text/data/bss sections. The final ELF merges the matching sections of the three input files together.
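The merging described above can be sketched in C as a small simulation. This is only a model under stated assumptions: the struct and function names are hypothetical, alignment padding is ignored, and a real linker does far more than add up sizes.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes (in bytes) of the three classic sections of one object file. */
struct obj_sizes {
    size_t text, data, bss;
};

/* Where one input file's pieces land inside the merged output sections. */
struct placement {
    size_t text_off, data_off, bss_off;
};

/*
 * Merge n input files in command-line order: each input section is
 * appended to the output section of the same name. Fills where[i] with
 * the offsets given to input i and returns the total output sizes.
 */
struct obj_sizes merge_sections(const struct obj_sizes *in, size_t n,
                                struct placement *where)
{
    struct obj_sizes out = {0, 0, 0};

    assert(in != NULL && where != NULL);
    for (size_t i = 0; i < n; i++) {
        where[i].text_off = out.text;
        where[i].data_off = out.data;
        where[i].bss_off  = out.bss;
        out.text += in[i].text;
        out.data += in[i].data;
        out.bss  += in[i].bss;
    }
    return out;
}
```

With three inputs, the second file's .text starts where the first file's .text ends, and so on, which is the sense in which the output section is "made up of" the input sections.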
All right, let's introduce some basics:
The job of ld is to assemble input files into one output file. These files have a special internal organization called the object file format, and each such file is called an object file (which is probably where the .o extension comes from). The output file is also called an executable, but to ld it is just another kind of object file. So what is special about object files? Internally they are organized into sections (this article uses "section" and "segment" interchangeably). In short, an object file is made up of sections.
Each section has a name and a size. Most sections also carry data, called the section contents. Sections have different attributes. For example, the text section is marked loadable, meaning that its contents must be loaded into memory at run time (that is, when the output file is executed). A section with no contents may be marked allocatable, meaning that memory must be set aside for it but nothing is loaded (sometimes that memory is zeroed, which is the case for the BSS section). The BSS section takes up no space in the binary file, so the file on disk is smaller, but memory must be allocated for it once the program is loaded. There are also sections that are neither loadable nor allocatable; these typically hold debugging information.
Since sections are loaded into memory, at what address are they loaded? Every loadable or allocatable section has two addresses. The VMA (virtual memory address) is the address the section has while the program runs. For example, if the VMA of the text section is set to 0x8000000, that is its starting address at run time. The LMA (load memory address) is the address at which the section is loaded. What is the difference? In general, VMA = LMA, but there are exceptions. For example, a data section may be loaded into ROM and then copied into RAM when the program starts; the ROM address is then the LMA and the RAM address is the VMA. This technique is often used to initialize global variables on ROM-based systems. (A question to ponder: at run time, how are things set up according to the VMA of each section? Answering that may require studying the execve implementation in the kernel.) You can inspect the VMA and LMA of each section with objdump -h.
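To make the ROM/RAM case concrete, here is a minimal C simulation of what startup code does when LMA and VMA differ. Two ordinary arrays stand in for ROM and RAM, and every name here is hypothetical; on real hardware the addresses would come from the linker script rather than from C arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DATA_SIZE = 8 };

/* Stand-in for ROM: the initial values of .data are stored here (the LMA). */
static const unsigned char rom_data_image[DATA_SIZE] = {1, 2, 3, 4, 5, 6, 7, 8};

/* Stand-in for RAM: where running code reads and writes .data (the VMA). */
static unsigned char ram_data[DATA_SIZE];

/* What startup code does before main(): copy .data from its LMA to its VMA. */
void copy_data_to_ram(void)
{
    memcpy(ram_data, rom_data_image, DATA_SIZE);
}

/* Running code only ever touches the RAM copy. */
unsigned char read_data(size_t i)
{
    assert(i < DATA_SIZE);
    return ram_data[i];
}

void write_data(size_t i, unsigned char value)
{
    assert(i < DATA_SIZE);
    ram_data[i] = value;
}

/* The ROM image keeps the original initial value, whatever RAM holds. */
unsigned char rom_initial(size_t i)
{
    assert(i < DATA_SIZE);
    return rom_data_image[i];
}
```

Before copy_data_to_ram() runs, the RAM copy does not yet hold the initial values; afterwards the program can modify it freely while the ROM image stays unchanged.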
A simple example
Here's a simple example.
SECTIONS
{
  . = 0x10000;
  .text : { *(.text) }
  . = 0x8000000;
  .data : { *(.data) }
  .bss : { *(.bss) }
}
SECTIONS is the key command in LS syntax; it describes the memory layout of the output file. The example above lays out three sections, .text, .data and .bss (those are the sections; the word SECTIONS itself is an LS command, so do not confuse the two).
. = 0x10000; The "." here is crucial: it stands for the location counter (LC). This statement sets the start of the .text section to 0x10000. Strictly speaking, the LC gives the VMA of the section; since no AT() is specified, the LMA is the same, and in most cases VMA = LMA anyway.
.text : { *(.text) } means that the .text section of the output file is made up of the .text sections of all input files (*). They are combined in the order in which the input files appear on the ld command line, for example 1.obj, 2.obj.
Next comes . = 0x8000000;. Without this assignment, the LC would equal 0x10000 + sizeof(.text section); that is, unless it is set explicitly, the LC is its previous value plus the size of the sections placed since then. Here the LC is forced to 0x8000000, so the .data section that follows starts at this address.
The .data and .bss output sections are likewise made up of the .data and .bss sections of the input files, respectively.
So what have we learned from this LS file?
We can set the address of each section as we please. In most cases, of course, we do not need our own LS to control the memory layout of the output file. But the Linux kernel (LK) is different.
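The way the location counter moves through the example script can be sketched as a small C function. This is a simplified model with hypothetical names: it ignores alignment padding and assumes the .text section fits below 0x8000000.

```c
#include <assert.h>
#include <stdint.h>

/* Start address of each output section, plus the final counter value. */
struct layout {
    uint32_t text_start, data_start, bss_start, end;
};

/*
 * Mimic the example script: ". = 0x10000", place .text,
 * ". = 0x8000000", place .data, then place .bss right after .data.
 */
struct layout place_sections(uint32_t text_size, uint32_t data_size,
                             uint32_t bss_size)
{
    struct layout l;
    uint32_t lc;                 /* the location counter, "." in the script */

    /* The sketch assumes .text does not run into the .data address. */
    assert((uint64_t)0x10000 + text_size <= 0x8000000);

    lc = 0x10000;                /* . = 0x10000; */
    l.text_start = lc;
    lc += text_size;             /* placing .text advances the counter */

    lc = 0x8000000;              /* . = 0x8000000; overrides the running value */
    l.data_start = lc;
    lc += data_size;

    l.bss_start = lc;            /* no assignment: .bss follows .data directly */
    lc += bss_size;

    l.end = lc;
    return l;
}
```

For instance, with a 0x2000-byte .text, a 0x100-byte .data and a 0x40-byte .bss, .bss starts at 0x8000100 because nothing reset the counter between .data and .bss.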
Diving straight in: an analysis of vmlinux.lds.S
With the basics above in hand, let's dive straight in and analyze arch/arm/kernel/vmlinux.lds.S. The file actually used at link time is vmlinux.lds, but that file is generated from vmlinux.lds.S (which carries an assembly-source extension and is run through the preprocessor) by the following command:
arm-linux-gcc -E -Wp,-MD,arch/arm/kernel/.vmlinux.lds.d -nostdinc ... -D__KERNEL__ -mlittle-endian ...
-DTEXT_OFFSET=0x00008000 -P -C -Uarm -D__ASSEMBLY__ -o arch/arm/kernel/vmlinux.lds arch/arm/kernel/vmlinux.lds.S
So, let's analyze vmlinux.lds directly.
/*
 * (A pile of comments is omitted here. Below, lines starting with // are annotations added for this article.)
 * Convert a physical address to a Page Frame Number and back
 */
// OUTPUT_ARCH is an LS command that specifies the machine architecture of the output file. objdump -i lists all supported architectures. In addition,
// these things involve a library called BFD; you can look up BFD on your own.
// This line says that the output file is for the ARM architecture.
OUTPUT_ARCH(arm)
// ENTRY is also a command; it sets the entry point, here stext. According to the ld documentation, the entry point is the first instruction the program executes.
// Think of the kernel as one large program running directly on the hardware, while our programs run on top of the kernel, much like the Java virtual machine and the Java programs that run on it.
ENTRY(stext)
// set jiffies to jiffies_64
jiffies = jiffies_64;
// define the sections of the output file
SECTIONS
{
// Set the location counter to 0xc0008000. This is easy to understand: the kernel runs at addresses above 0xC0000000.
. = 0xC0000000 + 0x00008000;
// define a .text.head section, made up of the .text.head sections of all input files
/*
In LS syntax, an output section is defined as follows:
section [address] [(type)] :
  [AT(lma)] [ALIGN(section_align)]
  [SUBALIGN(subsection_align)]
  [constraint]
  {
    output-section-command
    output-section-command
    ...
  } [>region] [AT>lma_region] [:phdr :phdr ...] [=fillexp]
Here address is the VMA, and the LMA is given by the AT command. In general, address is not set, so it defaults to the current location counter.
*/
.text.head : {
/* This part is crucial. Kernel code often contains declarations such as extern int _stext, yet we cannot find where these variables are defined.
In fact, they are defined in the lds file. A little background on compiling and linking helps here; knowing the outline is enough, and you can study the details on your own.
Suppose the C code defines a variable int x = 0; then:
1. the compiler allocates a block of memory to hold the value of the variable;
2. the compiler creates an entry in the program's symbol table that records the address of this variable.
For int x = 0 above, the symbol table gets an entry x that points to a block of memory of sizeof(int) bytes holding the value 0. Wherever x is used, the compiler generates code that
first goes to the memory of x and then reads the value stored there.
That is how a variable is defined in C. Variables can also be defined in a linker script, but in that case only a symbol entry is created and no memory is allocated. For example, _stext = 0x100
creates a symbol entry pointing at address 0x100, but no value is stored there. So a variable defined in the LS can only be used in C by taking its address. Here is an example:
  start_of_ROM = .ROM;
  end_of_ROM = .ROM + sizeof (.ROM) - 1;
  start_of_FLASH = .FLASH;
These three variables are defined in the LS and point to the start and end of the .ROM section and to the start of the .FLASH section. Suppose the C code wants to copy the contents of the ROM section to the FLASH section:
  extern char start_of_ROM, end_of_ROM, start_of_FLASH;
  memcpy (& start_of_FLASH, & start_of_ROM, & end_of_ROM - & start_of_ROM);
Note the address-of operator &. Variables defined in the LS can only be used this way in C code. The value of start_of_ROM itself is meaningless; only its address makes sense, because no value was ever stored for it.
Its address points to the start of the .ROM section.
Put plainly, a variable defined in the LS is really just an address: _stext = 0x100 in the LS is roughly like int *_stext = 0x100 in C code.
In the final link, ld allocates a slot for x and stores its address there; in other words, ld knows about all of this. So naturally we can also
define a variable in the LS and use it from C. The statement below therefore defines a variable _stext, which C code can reference through extern. There is one more
important point, though: the x defined in C is initialized to 0, whereas the slot ... (to be continued)
*/
  _stext = .;
  _sinittext = .;
  *(.text.head)
}
// define the .init section, made up of all .init.text / .cpuinit.text / .meminit.text input sections
// at this point the value of the LC is the start of .init
.init : { /* Init code and data */
  *(.init.text) *(.cpuinit.text) *(.meminit.text)
  // Define a variable _einittext whose value is the current LC, that is, the start of .init plus the size of *(.init.text) *(.cpuinit.text) *(.meminit.text).
  // In other words, _einittext marks an end.
  _einittext = .;
  // the following variable __proc_info_begin marks a beginning
  __proc_info_begin = .;
  *(.proc.info.init) // all the contents of the .proc.info.init sections go here
  __proc_info_end = .; // __proc_info_end marks the end; together with __proc_info_begin it brackets the .proc.info.init contents of the output file
  // With begin and end explained, the rest is easy to follow: most of it is a begin/end pair bracketing a piece of content. As described earlier, begin and end can be referenced from C programs,
  // so the bracketed content can be reached through them. For example, put some initialization function pointers between a begin and an end;
  // a loop can then call each of those functions in turn. An example is given at the end.
  __arch_info_begin = .;
  *(.arch.info.init)
  __arch_info_end = .;
  __tagtable_begin = .;
  *(.taglist.init)
  __tagtable_end = .;
  . = ALIGN(16);
  __setup_start = .;
  *(.init.setup)
  __setup_end = .;
  __early_begin = .;
  *(.early_param.init)
  __early_end = .;
  __initcall_start = .;
  *(.initcallearly.init)
  __early_initcall_end = .;
  *(.initcall0.init) *(.initcall0s.init) *(.initcall1.init) *(.initcall1s.init)
  *(.initcall2.init) *(.initcall2s.init) *(.initcall3.init) *(.initcall3s.init)
  *(.initcall4.init) *(.initcall4s.init) *(.initcall5.init) *(.initcall5s.init)
  *(.initcall6.init) *(.initcall6s.init) *(.initcall7.init) *(.initcall7s.init)
  __initcall_end = .;
  __con_initcall_start = .;
  *(.con_initcall.init)
  __con_initcall_end = .;
  __security_initcall_start = .;
  *(.security_initcall.init)
  __security_initcall_end = .;
  . = ALIGN(32); // ALIGN means alignment: the location counter must be aligned to 32 bytes here
  __initramfs_start = .; // location of the initramfs
  usr/built-in.o(.init.ramfs)
  __initramfs_end = .;
  . = ALIGN(4096); // 4K alignment
  __per_cpu_load = .;
  __per_cpu_start = .;
  *(.data.percpu.page_aligned)
  *(.data.percpu)
  *(.data.percpu.shared_aligned)
  __per_cpu_end = .;
  __init_begin = _stext;
  *(.init.data) *(.cpuinit.data) *(.cpuinit.rodata) *(.meminit.data) *(.meminit.rodata)
  . = ALIGN(4096);
  __init_end = .;
}
// /DISCARD/ is a special output section: input sections matched here are not written to the output, that is, the output file does not contain the following sections
/DISCARD/ : { /* Exit code and data */
  *(.exit.text) *(.cpuexit.text) *(.memexit.text)
  *(.exit.data) *(.cpuexit.data) *(.cpuexit.rodata) *(.memexit.data) *(.memexit.rodata)
  *(.exitcall.exit)
  *(.ARM.exidx.exit.text)
  *(.ARM.extab.exit.text)
}
// part of the content omitted
// ADDR is a built-in function that returns the VMA of the named section
/*
Here is a small example showing what VMA and LMA are for.
SECTIONS
{
  .text 0x1000 : { *(.text) _etext = . ; } // the VMA of the .text section is 0x1000, and LMA = VMA
  .mdata 0x2000 : // the VMA of the .mdata section is 0x2000, but its LMA is at the end of the .text section
    AT ( ADDR (.text) + SIZEOF (.text) )
    { _data = . ; *(.data); _edata = . ; }
  .bss 0x3000 :
    { _bstart = . ; *(.bss) *(COMMON) ; _bend = . ; }
}
As you can see, the .mdata section runs at 0x2000, but its data is loaded right after the .text section, so at run time the contents of .mdata have to be copied:
  extern char _etext, _data, _edata, _bstart, _bend;
  char *src = &_etext; // _etext is the VMA of the end of the .text section, and also the start of the LMA of the .mdata section, as specified by AT in the LS
  char *dst = &_data; // _data is the VMA of the .mdata section; the content starting at the LMA must now be copied to where the VMA begins
  // ROM has data at end of text; copy it.
  while (dst < &_edata)
    *dst++ = *src++; // copy ... Got it? If not, think it over carefully.
  // Zero bss.
  for (dst = &_bstart; dst < &_bend; dst++)
    *dst = 0; // initialize (zero) the data area
*/
.rodata : AT(ADDR(.rodata) - 0) {
  __start_rodata = .;
  *(.rodata) *(.rodata.*) *(__vermagic) *(__markers_strings) *(__tracepoints_strings)
}
.rodata1 : AT(ADDR(.rodata1) - 0) {
  *(.rodata1)
}
...... // part of the content omitted
_edata_loc = __data_loc + SIZEOF(.data);
.bss : {
  __bss_start = .; /* BSS */
  *(.bss)
  *(COMMON)
  _end = .;
}
/* Stabs debugging sections. */
.stab 0 : { *(.stab) }
.stabstr 0 : { *(.stabstr) }
.stab.excl 0 : { *(.stab.excl) }
.stab.exclstr 0 : { *(.stab.exclstr) }
.stab.index 0 : { *(.stab.index) }
.stab.indexstr 0 : { *(.stab.indexstr) }
.comment 0 : { *(.comment) }
}
// ASSERT is a command: if its first argument is 0, it prints the second argument (the error message) and the ld command exits.
ASSERT((__proc_info_end - __proc_info_begin), "missing CPU support")
ASSERT((__arch_info_end - __arch_info_begin), "no machine record defined")
Using variables defined in the LS from kernel code
Let's look at a small example.
[-->init/main.c]
extern initcall_t __initcall_start[], __initcall_end[], __early_initcall_end[]; // these symbols are defined in the LS; search for them above

static void __init do_initcalls(void)
{
	initcall_t *call;

	// The symbols are declared as arrays above, so their names are used directly as pointers below. This means the same thing as taking & in the earlier example; either way, their values cannot be used.
	for (call = __early_initcall_end; call < __initcall_end; call++)
		do_one_initcall(*call);

	/* Make sure there is no pending stuff from the initcall sequence */
	flush_scheduled_work();
}
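The same begin/end pattern can be tried in user space. The sketch below is not kernel code: it assumes GCC or Clang on an ELF system, where the linker (GNU ld and compatible linkers) automatically defines __start_NAME and __stop_NAME for any section whose name is a valid C identifier, so no custom linker script is needed. The section name my_initcalls, the MY_INITCALL macro and the init functions are all hypothetical.

```c
#include <assert.h>

typedef int (*initcall_t)(void);

/* Hypothetical macro: store a pointer to fn in the section "my_initcalls".
 * "used" keeps the otherwise unreferenced pointer from being dropped. */
#define MY_INITCALL(fn) \
    static initcall_t my_initcall_##fn \
    __attribute__((used, section("my_initcalls"))) = fn

/* Defined by the linker, not by any C file; only their addresses matter. */
extern initcall_t __start_my_initcalls[], __stop_my_initcalls[];

static int calls_made;

static int init_a(void) { calls_made++; return 0; }
static int init_b(void) { calls_made++; return 0; }

MY_INITCALL(init_a);
MY_INITCALL(init_b);

/* Walk the table the way do_initcalls() does; returns the number of calls. */
int run_my_initcalls(void)
{
    int n = 0;

    for (initcall_t *call = __start_my_initcalls;
         call < __stop_my_initcalls; call++) {
        int rc = (*call)();
        assert(rc == 0);
        n++;
    }
    return n;
}
```

Neither init_a nor init_b is called by name anywhere; they are reached only through the section that the two linker-provided symbols bracket, which is the point of the pattern.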
That concludes this look at linker script syntax in Linux kernel compilation and linking. I hope it has been of some help.