What is the segmentation mechanism of Linux memory addressing?

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the principle behind the segmentation mechanism of Linux memory addressing, a topic that many people find confusing in practice. It draws on several references and tries to lay the material out simply. Let's get started.

I. Preface

Recently, while studying the Linux kernel, I read the chapter on memory addressing in "Understanding the Linux Kernel". I thought I already understood segmentation and paging, but I found that I knew very little about them. So I looked up a lot of material and finally sorted out my knowledge of memory addressing. Here I record my understanding. I hope it will be of some help to other kernel learners, and I hope readers will point out any mistakes.

II. What's going on with segments?

Anyone who has taken an operating systems course has probably heard of segmentation and paging. Strangely, textbooks rarely explain how they came about, so we end up knowing what they are without knowing why they exist. Let's first take a look at the history of the segmentation mechanism.

The birth of real mode (16-bit processor and addressing)

Before the 8086 processor, programs addressed memory by using physical addresses directly. The 8086 widened the address bus to 20 bits so that it could address 1 MB of memory. This created an awkward problem: the ALU is only 16 bits wide, so it cannot compute a 20-bit address directly. The segmentation mechanism was introduced to solve this problem.

To support segmentation, the 8086 processor provides four segment registers: CS, DS, SS, and ES. Each segment register is 16 bits wide, and so is the address (the offset) in an instruction that accesses memory. Before the address is placed on the address bus, the CPU adds it to the value of one of the segment registers. Note that the segment register value is aligned with the high 16 bits of the 20-bit address, which is the same as shifting it left by 4 bits. The addition therefore combines the high 12 bits of the 16-bit offset with the 16 bits of the segment register, while the low 4 bits of the offset pass through unchanged. In this way a 20-bit physical address is formed, and the conversion, or "mapping", from a 16-bit memory address to a 20-bit physical address is achieved.
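The address formation described above can be sketched in a few lines of C. This is a minimal illustration rather than anything taken from the article; the function name `real_mode_address` is hypothetical, and the 20-bit mask models the 8086 wrapping around at 1 MB.

```c
#include <stdint.h>

/* Real-mode address formation on the 8086:
 * physical = (segment << 4) + offset, truncated to 20 address lines.
 * The function name is hypothetical, chosen for this sketch. */
uint32_t real_mode_address(uint16_t segment, uint16_t offset)
{
    /* Shifting left by 4 aligns the 16-bit segment value with the
     * high 16 bits of the 20-bit address. */
    uint32_t sum = ((uint32_t)segment << 4) + offset;

    /* The 8086 has only 20 address lines, so a carry out of bit 19
     * is lost and the address wraps around. */
    return sum & 0xFFFFFu;
}
```

For example, segment 0x1234 with offset 0x5678 gives 0x12340 + 0x5678 = 0x179B8.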

The birth of protected mode (32-bit processor and addressing)

The 80286 processor has a 24-bit address bus and a 16 MB address space, and it introduced protected mode, in which access to memory segments is restricted.

The 80386 is a 32-bit processor: its ALU and address bus are both 32 bits wide, and its address space is 4 GB. In other words, it could access the whole 4 GB directly without any segmentation mechanism. Although it was the little prince of the new era, surpassing its many predecessors, it had to carry the family mission: compatibility with the processors of previous generations. That is, it must support both real mode and protected mode. Therefore, the 80386 builds its protected mode on top of the segment registers and retains the 16-bit segment registers.

Processors after 80386 have similar architectures, collectively referred to as IA32 (32 Bit Intel Architecture).

III. Memory addressing mechanism of IA32

Addressing hardware

In 8086 real mode, the value of a segment register is shifted left by 4 bits, added to the address ADDR, and the sum is sent directly to the memory bus. The sum is the physical address of the memory unit, while the address ADDR that appears in the program is called the logical address (or virtual address). In IA32 protected mode, the logical address is not sent directly to the memory bus but to the memory management unit (MMU). The MMU consists of one chip or a group of chips whose function is to map logical addresses to physical addresses, that is, to perform address translation, as shown in the figure.

Three addresses of IA32

Logical address: machine language instructions use this address to specify the address of an operand or of an instruction. This kind of addressing is most visible in Intel's segmented architecture, which lets MS-DOS or Windows programmers divide a program into segments. Each logical address consists of a segment and an offset.

Linear address: a linear address is a 32-bit unsigned integer, so it can address up to 2^32 bytes (4 GB). Linear addresses are usually written in hexadecimal, ranging from 0x00000000 to 0xffffffff.

Physical address: that is, the actual address of the memory unit, which is used for chip-level memory cell addressing. The physical address is also represented by a 32-bit unsigned integer.

MMU address translation process

The MMU is a hardware circuit made up of two parts: a segmentation unit and a paging unit. Here we call them the segmentation mechanism and the paging mechanism, so that the hardware implementation can be understood from a logical point of view. The segmentation mechanism converts a logical address into a linear address; the paging mechanism then converts the linear address into a physical address.

[Figure: MMU address translation, from logical address to linear address to physical address]

Segment register of IA32

There are six 16-bit segment registers in IA32: CS, DS, SS, ES, FS, and GS. Unlike the 8086 segment registers, in protected mode these registers no longer hold the base address of a segment but the selector of a segment.
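The article does not spell out what a selector contains. As a sketch based on the IA-32 architecture definition (not on the article itself), a 16-bit selector holds a 13-bit table index, a table indicator bit, and a 2-bit requested privilege level. The struct and function names below are hypothetical.

```c
#include <stdint.h>

/* Fields of a 16-bit IA-32 segment selector.
 * The type and function names are hypothetical, chosen for this sketch. */
struct selector_fields {
    unsigned index; /* bits 15..3: entry number in the descriptor table */
    unsigned ti;    /* bit 2: table indicator, 0 = GDT, 1 = LDT */
    unsigned rpl;   /* bits 1..0: requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = (unsigned)sel >> 3;
    f.ti    = ((unsigned)sel >> 2) & 1u;
    f.rpl   = (unsigned)sel & 3u;
    return f;
}
```

A 13-bit index can name 8192 entries, which matches a descriptor table of at most 64 KB with 8 bytes per descriptor. For example, the selector value 0x0023 decodes to index 4, TI 0 (GDT), RPL 3.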

IV. Implementation of the segmentation mechanism

A segment is the basic unit of the virtual address space, and the segmentation mechanism must convert an address in the virtual address space into a linear address in the linear address space.

To achieve this mapping, it is not enough for a segment register to supply a base address. The hardware must at least also know the length of the segment, and it needs some further information about the segment, such as access rights. What is needed, then, is a data structure that covers three aspects:

1. Base address of the segment (Base Address): the starting address of the segment in the linear address space.

2. Limit of the segment (Limit): the maximum offset that can be used within the segment in the virtual address space.

3. Protection attributes of the segment (Attribute): the properties of the segment, for example whether the segment can be read or written, whether it can be executed as a program, the privilege level of the segment, and so on.

This data structure is called a segment descriptor, and a table made up of multiple segment descriptors is called a segment descriptor table.
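The three parts listed above can be modeled with a small C structure. This is a simplified sketch under stated assumptions, not the hardware's 8-byte layout: the names `segment_info` and `segment_translate` are hypothetical, and the attributes are reduced to a single "writable" flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of what a segment descriptor must carry.
 * Names and the single attribute flag are assumptions of this sketch. */
struct segment_info {
    uint32_t base;     /* start of the segment in the linear address space */
    uint32_t limit;    /* largest valid offset within the segment */
    bool     writable; /* one example of a protection attribute */
};

/* Translate an offset within the segment into a linear address.
 * Returns false when the access would violate the limit or the
 * protection attribute; real hardware raises an exception instead. */
bool segment_translate(const struct segment_info *seg, uint32_t offset,
                       bool is_write, uint32_t *linear)
{
    if (offset > seg->limit)
        return false;               /* beyond the end of the segment */
    if (is_write && !seg->writable)
        return false;               /* write to a read-only segment */
    *linear = seg->base + offset;   /* base address + offset */
    return true;
}
```

With a read-only segment at base 0x00100000 and limit 0xFFFF, a read at offset 0x1234 yields the linear address 0x00101234, while a read at offset 0x10000 or any write is rejected.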

Segment descriptor

A descriptor (Descriptor) is an 8-byte memory unit that describes the attributes of a segment. In real mode, the attributes of a segment amount to little more than its kind (code segment, stack segment, or data segment), its starting address, and its length. In protected mode they are more complicated, and IA32 packs them into an 8-byte value called a descriptor.

As you can see from the figure, a segment descriptor specifies the 32-bit base address of the segment and the 20-bit segment limit (which determines the segment length). Here we focus only on the base address and the segment limit and skip the other attributes.
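Since the figure is not reproduced here, the following sketch shows how the base and limit can be pulled out of an 8-byte descriptor held in a 64-bit integer. The bit positions follow the IA-32 descriptor format; the function names are hypothetical, and the granularity bit (which the article does not discuss) is included because it decides whether the 20-bit limit counts bytes or 4 KB pages.

```c
#include <stdint.h>

/* Extract the 32-bit base from an 8-byte segment descriptor.
 * Base bits 23..0 sit in descriptor bits 39..16,
 * base bits 31..24 sit in descriptor bits 63..56. */
uint32_t descriptor_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0x00FFFFFFu) |
                      ((d >> 32) & 0xFF000000u));
}

/* Extract the segment limit as a byte offset.
 * Limit bits 15..0 sit in descriptor bits 15..0,
 * limit bits 19..16 sit in descriptor bits 51..48.
 * If the granularity bit (descriptor bit 55) is set, the 20-bit
 * limit is counted in 4 KB units instead of bytes. */
uint32_t descriptor_limit(uint64_t d)
{
    uint32_t raw = (uint32_t)((d & 0xFFFFu) | ((d >> 32) & 0x000F0000u));
    int granular = (int)((d >> 55) & 1u);
    return granular ? ((raw << 12) | 0xFFFu) : raw;
}
```

For the descriptor value 0x00CF9A000000FFFF, a typical flat 4 GB code segment, these functions give base 0 and limit 0xFFFFFFFF.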

1. Segment descriptor table

The various user descriptors and system descriptors are placed in the corresponding global descriptor table, local descriptor table, or interrupt descriptor table. The descriptor tables (that is, the segment tables) define all segments of an IA32 system. Every descriptor table occupies a multiple of 8 bytes of memory, from 8 bytes (at least one descriptor) up to 64 KB (at most 8192 descriptors).

2. Global descriptor table (GDT)

The global descriptor table GDT (Global Descriptor Table) contains descriptors for the segments that are common to all tasks in the system. It does not hold interrupt gate or trap gate descriptors, which belong in the IDT. Its first 8-byte entry (the null descriptor) is not used.

3. Interrupt descriptor table IDT (Interrupt Descriptor Table)

The interrupt descriptor table IDT (Interrupt Descriptor Table) contains 256 gate descriptors. The IDT can contain only task gate, interrupt gate, and trap gate descriptors. Although the table could be up to 64 KB long, only its first 2 KB, that is, 256 descriptors of 8 bytes each, can be accessed; this limit was chosen for compatibility with the 8086.

4. Local descriptor table (LDT)

The local descriptor table LDT (Local Descriptor Table) contains the descriptors associated with a given task; each task has its own LDT. With an LDT, the code and data of a given task can be isolated from other tasks. The LDT of each task is itself represented by a descriptor, called the LDT descriptor, which contains information about the local descriptor table and is placed in the global descriptor table GDT.

Summary

The memory addressing mechanism of IA32 translates a logical address into a linear address and then into a physical address. The selector held in the segment register locates a segment descriptor; the segment base address and segment limit are read from that descriptor; the offset of the logical address is then added to the base to produce the linear address; and finally the paging mechanism turns the linear address into a physical address.
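The segmentation half of this chain can be sketched as a table lookup followed by an addition. The code below is an illustrative model only: it treats the descriptor table as a plain array of already-decoded base and limit pairs, ignores the LDT, privilege checks, and paging, and uses hypothetical names throughout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Already-decoded descriptor: only base and limit are modeled.
 * All names here are assumptions of this sketch. */
struct seg_desc {
    uint32_t base;
    uint32_t limit; /* largest valid offset */
};

/* Convert a logical address (selector:offset) into a linear address
 * by looking the selector up in a GDT-like table.
 * Returns false where real hardware would raise an exception. */
bool logical_to_linear(const struct seg_desc *table, size_t entries,
                       uint16_t selector, uint32_t offset,
                       uint32_t *linear)
{
    size_t index = (size_t)selector >> 3;  /* drop the TI and RPL bits */

    if (index == 0 || index >= entries)
        return false;   /* entry 0 is the unused null descriptor */
    if (offset > table[index].limit)
        return false;   /* offset lies outside the segment */

    *linear = table[index].base + offset;
    return true;
}
```

With a two-entry table whose entry 1 has base 0x00400000 and limit 0xFFFF, the logical address 0x0008:0x00000100 maps to the linear address 0x00400100. The paging stage that follows is not modeled here.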

First of all, we need to be clear that the segmentation mechanism is an addressing method provided by IA32 at the hardware level. That is to say, whether you run Windows or Linux, as long as an IA32 CPU accesses memory, the address must go through the MMU translation process to obtain the physical address, that is, through the logical address, linear address, physical address translation.

V. Implementation of segmentation in Linux

So much has been said about how the segmentation mechanism is implemented, but in fact it matters little to Linux, because Linux hardly uses it. In other words, segmentation exists in Linux only to satisfy the IA32 hardware.

The segment mechanism of Intel microprocessors was introduced with the 8086, where it solved the translation from a 16-bit address inside the CPU to a 20-bit physical address. To remain compatible, the 386 still uses the segment mechanism, in a form much more complex than before. For this reason the Linux kernel does not fully adopt the segment scheme provided by Intel and uses segmentation only to a limited extent. This simplifies the design of the Linux kernel and also makes it easier to port Linux to other platforms, because many RISC processors do not support segmentation. Even so, understanding the segment mechanism is a necessary step on the way into the Linux kernel.

Starting with version 2.2, Linux lets all processes (or tasks) use the same logical address space, so there is no need for the local descriptor table LDT. The kernel does still use the LDT, but only when running Wine in VM86 mode, that is, when emulating Windows or DOS software on Linux.

Any address given on IA32 is a virtual address, that is, every address is given in the form "selector: offset", which is the basic characteristic of memory access under the segment mechanism. Therefore, an operating system designed for IA32 cannot avoid the segment mechanism. A virtual address is eventually converted into a linear address by "segment base address + offset". However, most hardware platforms do not support the segment mechanism and support only paging, so to make Linux more portable it is desirable to drop segmentation and use paging alone. Unfortunately, IA32 does not allow the segment mechanism to be disabled, so it cannot be bypassed to issue linear addresses directly. As a workaround, the designers of Linux simply set the base address of the segment to 0 and the segment limit to 4 GB. For any offset, the equation then becomes "0 + offset = linear address", that is, "offset = linear address". In addition, because the segment mechanism stipulates "offset"
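The flat setup with base 0 and a 4 GB limit described in this section can be illustrated with a very small sketch. It is not kernel code; the names `flat_segment`, `FLAT`, and `flat_linear` are hypothetical and only model the arithmetic.

```c
#include <stdint.h>

/* Model of a flat segment as set up by Linux on IA32:
 * base 0 and a limit covering the full 4 GB.
 * All names are assumptions of this sketch, not kernel identifiers. */
struct flat_segment {
    uint32_t base;
    uint32_t limit; /* largest valid offset */
};

const struct flat_segment FLAT = { 0x00000000u, 0xFFFFFFFFu };

/* With base 0, "base + offset" is just the offset, and no 32-bit
 * offset can exceed the limit, so segmentation becomes a no-op. */
uint32_t flat_linear(uint32_t offset)
{
    return FLAT.base + offset;
}
```

Every 32-bit offset maps to itself, so segmentation is effectively neutralized and the real translation work is left to paging.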
