How does Linux manage memory

2025-01-16 Update From: SLTechnology News&Howtos


This article explains how Linux manages memory. The material is presented simply and should be easy to follow.

Basic concept

Every Linux process has an address space consisting of three segments: the text segment, the data segment, and the stack segment. Below is an example of a process address space.

The data segment holds the program's variables, strings, arrays, and other data. It is divided into two parts: initialized data and uninitialized data. The uninitialized part is what we call the BSS. The initialized part holds constants and variables whose initial values are fixed at compile time and loaded when the program starts; all variables in the BSS section are initialized to 0 after loading.

Unlike the text segment, the data segment can change: programs constantly modify their variables, and many programs need to allocate space dynamically at run time. Linux allows the data segment to grow or shrink as memory is allocated and reclaimed, so a program can obtain memory by increasing the size of its data segment. In C, the standard library function malloc is commonly used to allocate this memory. The dynamically allocated region described in the process address space is called the heap.
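As a sketch of heap allocation, the C library's malloc and free can be driven from Python through ctypes. This assumes a Linux system where loading the process's own libc via CDLL(None) works; it is an illustration, not how C programs normally call malloc.

```python
# Sketch: calling libc's malloc/free via ctypes, assuming Linux where
# CDLL(None) exposes the C library's symbols.
import ctypes

libc = ctypes.CDLL(None, use_errno=True)
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.argtypes = [ctypes.c_void_p]

ptr = libc.malloc(4096)          # ask the allocator for 4 KB from the heap
assert ptr is not None           # malloc returns NULL (None here) on failure
ctypes.memset(ptr, 0, 4096)      # the memory is ours to read and write
libc.free(ptr)                   # return it to the allocator
```

Under the hood the allocator grows the data segment with brk/sbrk (or uses mmap for large requests), which is exactly the mechanism described above.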

The third part is the stack segment. On most machines the stack sits at the top of the virtual address space and grows downward (toward address 0). On a 32-bit x86 machine, for example, the stack starts at 0xC0000000, the 3 GB limit of the address range a process may see in user mode. If the stack grows beyond its segment, a hardware fault occurs and the stack is extended downward by one page.

When a program starts, its stack is not empty; it contains all the shell environment variables as well as the command line typed into the shell to invoke it. For example, when you type

cp cxuan lx

the cp program runs with the string cp cxuan lx on the stack, so it can find the names of the source and target files.

When two users run the same program, such as an editor, keeping two copies of the editor's code in memory would be inefficient. Instead, Linux supports shared text segments. In the figure below, two processes A and B share the same text region.

Data segments and stack segments are shared only after a fork, and only the unmodified pages are shared. If either segment needs to grow and there is no adjacent space, that is not a problem, because adjacent virtual pages do not have to map to adjacent physical pages.

Besides dynamically allocating more memory, processes in Linux can access file data through memory-mapped files. This feature lets us map a file into part of the process address space so that the file can be read and written like a byte array in memory. Mapping a file makes random reads and writes much easier than using I/O system calls such as read and write. This mechanism is also used for access to shared libraries, as shown below.

We can see that the same file is mapped to the same physical pages, although it appears in two different address spaces.

The advantage of mapped files is that two or more processes can map the same file at the same time, and any process's writes to the mapped region are visible to the others. Mapping a temporary file provides high-bandwidth shared memory for multithreaded programs, and the temporary file disappears when the process exits. In practice, though, no two address spaces are identical, because each process maintains its own open files and signals.
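The visibility of writes can be sketched with Python's mmap module, a thin wrapper over the mmap system call: two independent mappings of the same temporary file see each other's writes, because both are backed by the same physical pages.

```python
# Sketch: two shared mappings of the same file observe each other's writes.
import mmap, os, tempfile

fd, path = tempfile.mkstemp()
os.ftruncate(fd, mmap.PAGESIZE)     # the file must be large enough to map

m1 = mmap.mmap(fd, mmap.PAGESIZE)   # first mapping (MAP_SHARED by default)
m2 = mmap.mmap(fd, mmap.PAGESIZE)   # second, independent mapping

m1[:5] = b"hello"                   # write through the first mapping
seen = bytes(m2[:5])                # read back through the second mapping

m1.close(); m2.close(); os.close(fd); os.unlink(path)
assert seen == b"hello"
```

Two cooperating processes mapping the same file get the same effect, which is the shared-memory IPC mechanism mentioned later.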

Linux memory management system call

Let's look at the system calls used for memory management. POSIX actually specifies no system calls for memory management, but Linux has its own; the main ones are as follows.

On error these calls return -1. In the signatures, addr is a memory address, len is a length, prot holds the protection bits, flags holds other flag bits, fd is a file descriptor, and offset is a file offset.

brk specifies the size of the data segment by giving the address of the first byte beyond it. If the new value is larger than the old one, the data area grows; otherwise it shrinks.

The mmap and munmap system calls control memory-mapped files. The first parameter of mmap, addr, determines where the file is mapped. It must be a multiple of the page size; if it is 0, the system assigns an address and returns it. The second parameter, len, says how many bytes to map, also a multiple of the page size. prot determines the mapping's protection bits, which can mark it readable, writable, executable, or a combination of these. The fourth parameter, flags, controls whether the mapping is private or shared and whether addr is a requirement or merely a hint. The fifth parameter, fd, is the file descriptor to map; only open files can be mapped, so to map a file you must first open it. The last parameter, offset, indicates where in the file the mapping starts; it need not be the beginning of the file.
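A minimal sketch of these parameters using Python's mmap wrapper. Note that offset must be a multiple of the page size; on Linux, mmap.ALLOCATIONGRANULARITY equals the page size.

```python
# Sketch: mapping only the second page of a file, read-only, at a
# page-aligned offset (the constraint described above).
import mmap, os, tempfile

fd, path = tempfile.mkstemp()
os.ftruncate(fd, 2 * mmap.ALLOCATIONGRANULARITY)
os.pwrite(fd, b"tail", mmap.ALLOCATIONGRANULARITY)   # data in the second page

m = mmap.mmap(fd, mmap.ALLOCATIONGRANULARITY,
              prot=mmap.PROT_READ,                   # read-only mapping
              offset=mmap.ALLOCATIONGRANULARITY)     # must be page-aligned
first4 = m[:4]

m.close(); os.close(fd); os.unlink(path)
assert first4 == b"tail"
```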

Implementation of memory Management in Linux

The memory management system is one of the most important parts of the operating system. Since the early days of computing, we have wanted more memory than physically exists in the system. Memory allocation strategies overcome this limitation, the most famous being virtual memory. By sharing physical memory among competing processes, virtual memory makes the system appear to have more memory than it does. The virtual memory subsystem involves the following concepts.

Large address space

The operating system makes the system appear to have much more memory than it physically does, because virtual memory can be many times larger than physical memory.

Protection

Each process in the system has its own virtual address space. These spaces are completely separate from each other, so a process running one application does not affect another. The hardware virtual memory mechanism also allows critical memory areas to be protected.

Memory mapping

Memory mapping is used to map images and data files to the process address space. In memory mapping, the contents of the file are mapped directly to the virtual space of the process.

Fair physical memory allocation

The memory management subsystem gives each running process a fair share of the system's physical memory.

Shared virtual memory

Although virtual memory gives each process its own memory space, processes sometimes need to share memory. For example, several processes running under a shell may communicate through IPC shared memory, passing information through a shared region rather than each process keeping its own copy.

Let's formally discuss what virtual memory is.

Abstract Model of Virtual memory

Before considering the methods Linux uses to support virtual memory, it is useful to consider an abstract model that is not cluttered with too much detail.

When the processor executes an instruction, it reads the instruction from memory and decodes it. Executing the instruction may require fetching the contents of a memory location or storing data to one; the processor then moves on to the next instruction. Either way, the processor is constantly accessing memory, to fetch instructions and to load and store data.

In a virtual memory system, all addresses used by programs are virtual rather than physical. It is physical addresses, however, that actually locate stored instructions and data, so the processor must convert each virtual address to a physical address using tables maintained by the operating system.

To make this translation simple, virtual and physical address spaces are divided into fixed-size blocks called pages. The pages are all the same size; if they were not, the operating system would be difficult to manage. Linux on Alpha AXP systems uses 8 KB pages, while Linux on Intel x86 systems uses 4 KB pages. Each page is given a unique number, the page frame number (PFN).

The above is the Linux memory mapping model, in which a virtual address consists of two parts: an offset and a virtual page frame number. Each time the processor encounters a virtual address, it extracts these two parts, converts the virtual page frame number to a physical page frame number, and then accesses the physical page at the correct offset.

The figure above shows the virtual address spaces of two processes A and B, each with its own page table. The page tables map each virtual page of the process to a physical page in memory. Each page table entry contains

Valid flag (valid flag): indicates whether this page table entry is valid

The physical page frame number described by the entry

Access control information: how the page may be used, whether it is writable, and whether code in it may be executed

To map a virtual address from the processor to a physical memory address, we first compute the virtual address's page frame number and offset. Because the page size is a power of 2, this can be done with shifts and masks.
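The shift-and-mask computation can be sketched as follows, assuming the 4 KB pages mentioned above for x86:

```python
# Sketch: splitting a virtual address into page frame number and offset,
# assuming 4 KB pages.
PAGE_SHIFT = 12                  # 4096 == 1 << 12
PAGE_SIZE = 1 << PAGE_SHIFT
PAGE_MASK = PAGE_SIZE - 1

def split(vaddr):
    """Return (virtual page frame number, offset within the page)."""
    return vaddr >> PAGE_SHIFT, vaddr & PAGE_MASK

vpn, off = split(0x00403A10)
assert (vpn, off) == (0x403, 0xA10)
```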

If the current process tries to access a virtual address that it cannot access, a page fault occurs, and the faulting virtual address and the cause of the fault are reported to the operating system.

By mapping virtual addresses to physical addresses in this way, virtual memory can be mapped to the physical pages of the system in any order.

Demand paging

Because physical memory is much smaller than virtual memory, the operating system must be careful not to waste it. One way to save physical memory is to load only the pages the executing program is currently using (a lazy idea, isn't it?). For example, a database query does not need every table loaded into memory, only the data being examined. This technique of loading virtual pages only when they are needed is called demand paging.

Swapping

If a process needs to bring a virtual page into memory when no physical pages are free, the operating system must discard another page from physical memory to make room.

If the page to be discarded has been modified, the operating system must preserve its contents so they can be accessed later. Such a page is called a dirty page, and when it is removed from memory it is saved in a special file called the swap file. Access to the swap file is very slow compared with the processor and physical memory, so the operating system must balance writing pages to disk against keeping them in memory for reuse.

Linux uses a least recently used (LRU) page-aging technique to choose fairly the pages to evict from the system. Every page in the system has an age that changes with the number of accesses: the more often a page is accessed, the younger it is, and the less often, the older, making it an easier candidate for replacement.
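The page-aging idea can be sketched with a small LRU model. This is illustrative only, not the kernel's actual implementation: the page touched longest ago is the one evicted when a frame is needed.

```python
# Sketch: LRU page replacement with an ordered dict, oldest page first.
from collections import OrderedDict

class LRUFrames:
    def __init__(self, nframes):
        self.nframes = nframes
        self.frames = OrderedDict()       # page -> present, oldest first

    def touch(self, page):
        """Access a page; return the evicted page, if any."""
        if page in self.frames:
            self.frames.move_to_end(page)  # make it the youngest
            return None
        victim = None
        if len(self.frames) >= self.nframes:
            victim, _ = self.frames.popitem(last=False)  # evict the oldest
        self.frames[page] = True
        return victim

mem = LRUFrames(2)
mem.touch(1); mem.touch(2); mem.touch(1)
assert mem.touch(3) == 2                  # page 2 is least recently used
```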

Physical and virtual addressing modes

Most general-purpose processors support a physical address mode as well as a virtual address mode. Physical addressing mode requires no page tables, and the processor attempts no address translation in this mode. The Linux kernel is linked to run in the physical address space.

The Alpha AXP processor has no physical addressing mode. Instead, it divides the memory space into several areas and designates two of them as physically mapped addresses. The kernel's address space is called the KSEG address space and comprises all addresses upward from 0xfffffc0000000000. To execute code linked in KSEG (by definition, kernel code) or access data there, the code must be running in kernel mode. The Linux kernel on Alpha is linked to execute from address 0xfffffc0000310000.

Access control

Each page table entry also contains access control information, which determines whether a process may access the memory it describes.

Restrictions on memory access are enforced where needed. For example, memory containing executable code is naturally read-only; the operating system should not allow a process to write data over its own executable code. Conversely, a page containing data can be written, but attempts to execute instructions from that memory should fail. Most processors have at least two modes of execution, kernel mode and user mode, and you do not want user-mode code to access kernel code or kernel data structures unless the processor is running in kernel mode.

The access control information is held in the page table entry (PTE); the figure above shows the Alpha AXP PTE. Its bit fields have the following meanings

V

Indicates valid, whether it is a valid bit

FOR

Fault on read: a fault occurs when an attempt is made to read this page

FOW

Fault on write: a fault occurs when an attempt is made to write this page

FOE

Fault on execute: when an attempt is made to execute instructions on this page, the processor reports a page fault and passes control to the operating system

ASM

Address space match, used when the operating system wants to clear only some entries from the translation buffer

GH

A hint used when an entire block is mapped with a single translation buffer entry rather than many entries

KRE

Code running in kernel mode can read the page

URE

Code in user mode can read the page

KWE

Code running in kernel mode can write the page

UWE

Code running in user mode can write the page

Page frame number

For a PTE with the V bit set, this field contains the physical page frame number for the page. For an invalid PTE, if this field is not zero, it holds information about where the page is in the swap file.

In addition, Linux uses two bits

_PAGE_DIRTY

If set, the page needs to be written out to the swap file

_PAGE_ACCESSED

Used by Linux to mark a page as having been accessed
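Flag bits like these are tested with masks, which can be sketched as follows. The bit positions below are illustrative only, not the real Alpha AXP or Linux PTE layout.

```python
# Sketch: decoding access-control bits from a page table entry with masks.
# Bit positions are hypothetical, chosen only to illustrate the technique.
PTE_V   = 1 << 0      # valid
PTE_FOR = 1 << 1      # fault on read
PTE_FOW = 1 << 2      # fault on write
PTE_KRE = 1 << 8      # kernel read enable
PTE_URE = 1 << 9      # user read enable

def user_can_read(pte):
    """User-mode read is allowed only for a valid, user-readable,
    non-faulting page."""
    return bool(pte & PTE_V) and bool(pte & PTE_URE) and not (pte & PTE_FOR)

assert user_can_read(PTE_V | PTE_URE)
assert not user_can_read(PTE_V | PTE_KRE)   # readable only in kernel mode
```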

Caching

The abstract virtual memory model above could be implemented as described, but it would not be very efficient. Operating system and processor designers constantly try to improve performance, and besides making processors and memory faster, the best approach is to maintain caches of useful information and data so that common operations become faster. Linux uses a number of memory-management-related caches to improve efficiency.

Buffer cache

The buffer cache contains the data buffers used by the block device driver.

Do you remember what a block device is? Let's review.

A block device stores information in fixed-size blocks and supports reading and (optionally) writing data in fixed-size blocks, sectors, or clusters. Each block has its own physical address, and block sizes usually range from 512 to 65536 bytes. All transfers are in units of whole blocks. The defining feature of a block device is that each block is independent and can be read and written on its own. Common block devices include hard disks, Blu-ray discs, and USB drives.

Block devices are accessed through a different, block-oriented interface than character devices.

The buffer cache lets the kernel find data blocks quickly by device identifier and block number. If a block can be found in the buffer cache, there is no need to read it from the physical device, which is much faster.

Page cache

The page cache is used to speed up access to images and data on disk.

It caches the contents of files one page at a time, indexed by file and offset within the file. As pages are read from disk into memory, they are cached in the page cache.

Swap area cache

Only modified pages (dirty pages) are saved in the swap file.

As long as a page has not been modified since it was last written to the swap file, the next time it is swapped out there is no need to write it again, because it is already in the swap file; it can simply be discarded. In a heavily swapping system, this saves many unnecessary and costly disk operations.

Hardware caching

Processors usually include a hardware cache of page table entries. Rather than reading the page table directly every time, the processor caches page translations as it needs them. This cache is the translation lookaside buffer (TLB), which holds cached copies of page table entries from one or more processes in the system.

When a virtual address is referenced, the processor looks for a matching TLB entry. If it finds one, it translates the virtual address directly to a physical address and performs the operation on the data. If it cannot find a matching entry, it asks the operating system for help by signaling that a TLB miss has occurred. A system-specific mechanism delivers the exception to operating system code, which fixes the problem by creating a new TLB entry for the address. When the exception is cleared, the processor retries the translation, and this time it succeeds.
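The hit/miss flow can be sketched with a dictionary standing in for the TLB (illustrative only; the frame numbers below are made up):

```python
# Sketch: a TLB as a small cache in front of the page table; a miss falls
# back to the page table and installs a new entry.
page_table = {0x403: 0x1A2, 0x404: 0x0B7}   # virtual page -> physical frame
tlb = {}
misses = 0

def translate(vpn):
    global misses
    if vpn in tlb:                # TLB hit: no page-table walk needed
        return tlb[vpn]
    misses += 1                   # TLB miss: walk the table, refill the TLB
    pfn = page_table[vpn]
    tlb[vpn] = pfn
    return pfn

translate(0x403); translate(0x403); translate(0x404)
assert misses == 2                # second access to 0x403 hit the TLB
```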

Caches have drawbacks, too: Linux must spend time and space maintaining them, and if a cache becomes corrupted, the system will crash.

Linux page table

Linux assumes page tables have three levels, with each entry in one level containing the page frame number of the table at the next level.

The PGD in the figure is the page global directory. When a new process is created, a new page directory (PGD) is created for it.

To convert a virtual address to a physical one, the processor takes each level's field in turn, converts it into an offset into the physical page holding that level's table, and reads the page frame number of the next level's table. This repeats three times, until the page frame number of the physical page containing the virtual address is found.

Every platform on which Linux runs must provide translation macros that allow the kernel to traverse the page tables of a particular process. In this way, the kernel does not need to know the format of the page table entries or how they are arranged.
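The walk described above can be sketched with nested dictionaries standing in for the three levels of tables. The 9-bit index fields and shift amounts below are illustrative, not a real platform's layout.

```python
# Sketch: a three-level page-table walk; dicts stand in for the PGD,
# middle, and leaf tables. Index widths are hypothetical.
SHIFTS = (30, 21, 12)                 # address bits used at each level
INDEX_MASK = 0x1FF                    # 9-bit index per level

def walk(pgd, vaddr):
    table = pgd
    for shift in SHIFTS:              # descend one level per field
        index = (vaddr >> shift) & INDEX_MASK
        table = table[index]          # a missing entry here = page fault
    return table                      # frame number found at the leaf

pgd = {1: {2: {3: 0xCAFE}}}           # maps one virtual page to frame 0xCAFE
vaddr = (1 << 30) | (2 << 21) | (3 << 12) | 0x123
assert walk(pgd, vaddr) == 0xCAFE
```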

Page allocation and deallocation

There are many requirements for physical pages in the system. For example, when an image is loaded into memory, the operating system needs to allocate pages.

All physical pages in the system are described by the mem_map data structure, a list of mem_map_t structures. Important attributes include

count: the number of users of the page; it is greater than 1 when the page is shared between processes

age: describes the age of the page, used to decide whether the page is a good candidate for discarding or swapping

map_nr: the physical page frame number that this mem_map_t describes

The page allocation code uses the free_area vector to find and free pages; each element of free_area holds information about blocks of pages of one size.

Page allocation

Linux uses the well-known buddy algorithm for page allocation and deallocation. Pages are allocated in blocks whose sizes are powers of 2: 1 page, 2 pages, 4 pages, and so on, as long as enough free pages remain in the system. The criterion is nr_free_pages > min_free_pages; when it is met, free_area is searched for a page block of the requested size. Each element of free_area has a map of the allocated and free page blocks of that size.

The allocation algorithm first searches for a page block of the requested size. If none is free, it searches for a block of twice that size, and so on, until all of free_area has been searched or a block is found. If the block found is larger than requested, it is subdivided until a block of the right size remains.

Because block sizes are powers of 2, splitting is easy: each block is simply cut in half. Free blocks are queued on the appropriate list, and the allocated block is returned to the caller.

If a 2-page block is requested, the first 4-page block (starting at page frame 4) is divided into two 2-page blocks. The first (starting at page frame 4) is returned to the caller as the allocated pages, and the second (starting at page frame 6) is queued as a free 2-page block on element 1 of the free_area array.

Page deallocation

The biggest drawback of this allocation scheme is memory fragmentation: large free blocks get divided into ever smaller ones. The page deallocation code therefore recombines pages into larger free blocks whenever it can. Each time a block is freed, its neighboring block of the same size is checked; if it is free, the two are combined into a free block of the next size up, and the code then tries to combine that into a still larger block. In this way, free memory is kept in blocks that are as large as possible.

For example, in the figure above, if the block at page frame 1 is freed, it is combined with the already free block at page frame 0 and queued on element 1 of free_area as a free 2-page block.
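The splitting and coalescing just described can be sketched as a toy model of free_area. This is an illustration of the buddy technique, not the kernel's actual code: a block's buddy differs from it in exactly one address bit, so it is found with an XOR.

```python
# Sketch: a buddy allocator. Free blocks are kept per order (size 2**order
# pages); allocation splits larger blocks, freeing coalesces with the buddy.
MAX_ORDER = 4
free_area = {o: set() for o in range(MAX_ORDER + 1)}
free_area[MAX_ORDER].add(0)                  # one 16-page block at frame 0

def alloc(order):
    for o in range(order, MAX_ORDER + 1):    # find the smallest free block
        if free_area[o]:
            block = free_area[o].pop()
            while o > order:                 # split down to the right size
                o -= 1
                free_area[o].add(block + (1 << o))  # free the upper half
            return block
    raise MemoryError("no free block")

def free(block, order):
    while order < MAX_ORDER:
        buddy = block ^ (1 << order)         # buddy differs in one bit
        if buddy not in free_area[order]:
            break
        free_area[order].remove(buddy)       # coalesce into a bigger block
        block = min(block, buddy)
        order += 1
    free_area[order].add(block)

a = alloc(1)                                 # request a 2-page block
assert a == 0 and 2 in free_area[1]          # frames 2-3 split off as free
free(a, 1)
assert free_area[MAX_ORDER] == {0}           # everything coalesced again
```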

Memory mapping

The kernel supports two kinds of memory mapping: shared and private. Private mapping is more efficient when a process only reads a file rather than writing it. However, any write to a privately mapped page causes the kernel to stop sharing that page with the file: the write does not change the file on disk, nor is it visible to other processes accessing the file.

Demand paging

Once an executable image has been mapped into virtual memory, it can start executing. Since only the first part of the image is physically pulled into memory, the process will soon reach a region of virtual memory with no physical page behind it. When a process accesses a virtual address that has no valid page table entry, the operating system is notified of the fault.

The page fault describes the faulting virtual address and the type of memory access that caused it.

Linux must find the vm_area_struct that represents the memory area where the fault occurred. Because searching vm_area_struct structures is critical to handling page faults efficiently, they are linked together in an AVL (Adelson-Velskii and Landis) tree. If there is no vm_area_struct for the faulting virtual address, the process has accessed an illegal address, and Linux sends it a SIGSEGV signal; if the process has no handler for that signal, it is terminated.
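The SIGSEGV behavior can be observed from Python by letting a child process touch an address with no valid mapping (here ctypes forces a read through a null pointer). This assumes a Linux system; a negative return code means the child was killed by that signal.

```python
# Sketch: a child process reads through a null pointer and is killed by
# SIGSEGV, since address 0 has no vm_area_struct behind it.
import signal, subprocess, sys

child = subprocess.run(
    [sys.executable, "-c", "import ctypes; ctypes.string_at(0)"],
    capture_output=True,
)
assert child.returncode == -signal.SIGSEGV   # terminated by SIGSEGV
```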

Linux then checks the type of fault against the access types allowed for that memory area. If the process accessed the memory in an illegal way, such as writing to a read-only area, it is likewise sent a memory access error signal.

Once Linux has determined that the page fault is legitimate, it must handle it.

Thank you for reading. That concludes "how Linux manages memory"; hopefully this article has given you a deeper understanding of the topic, and the specifics are best verified in practice.
