2025-02-23 Update From: SLTechnology News&Howtos
Today we will look at Linux memory management, a topic many readers may not know well. The following summary is intended to give you a better understanding; I hope you take something away from this article.

Memory management is one of the most important subsystems in the Linux kernel. I have long wanted to write an article on it: the material is so large and complex that staying easy to follow without losing rigor is a real test. Understanding how memory is managed internally helps kernel developers and application developers alike. This article tries to use simple, vivid language to walk through the principles of memory management. Our purpose is not to explore theory for its own sake but to build a reasonably complete picture of how things work, going deeper into the theory only where necessary.
Process and memory
As we all know, a process needs memory to run: memory holds the program code loaded from a storage medium (disk / flash/...) and the data the process needs while running. My other article, "How to understand heaps and stacks in depth", explains the composition of a process. A process has five different segments:

Code segment (text): the code segment stores the executable file's instructions, that is, the in-memory image of the executable program. The code segment may not be modified at run time, so it is read-only, never written.

Data segment (data): the data segment stores initialized global variables, in other words the variables the program allocates statically. (Static allocation means the compiler assigns the memory when the program is compiled; dynamic allocation happens after compilation, at run time, by calling runtime library functions. Static allocation is fast and efficient but inflexible, because it happens before the program runs; dynamic allocation happens while the program runs, so it is slower but flexible.)

BSS segment (bss): the bss segment holds the program's uninitialized global variables; the whole bss segment is zeroed in memory. (Aside: this is why uninitialized global variables read as zero.)

Heap: the heap stores the memory a process allocates dynamically; its size is not fixed. For more detail, see "How to understand heaps and stacks in depth".

Stack: the stack stores temporary variables, that is, local variables declared inside {} in a C program, excluding those declared static (a static local is scoped to its {} block, but its lifetime is the whole program, so it lives in the data segment). When the program calls a function, the function's arguments are pushed onto the stack, and when the call returns, the function's return value is also passed back via the stack. In this sense we can think of the stack as a memory area for storing and exchanging temporary data. For more detail, see "How to understand heaps and stacks in depth".
Based on these different uses of memory, a process is divided into the five segments above. How, then, are these segments organized in memory? Look at the following picture:

From the figure it is not hard to see that the heap and the stack sit at opposite ends of a gap: one grows downward (the stack, on the i386 architecture) and the other grows upward (the heap), toward each other. But you need not worry that they will meet, because the gap between them is genuinely large.

From user mode to kernel mode, the form of the addresses we use changes:

A logical address is converted into a linear address by the segmentation mechanism, and the linear address is converted into a physical address by the paging mechanism. Note, however, that although Linux retains the segmentation mechanism, every program's segments are set up to cover 0-4G, so while logical addresses and linear addresses are in principle two different address spaces, in Linux a logical address equals its linear address: their values are the same. Following this thread, our study focuses on the following questions:
How is the process address space managed?
How are process addresses mapped to physical memory?
How is physical memory managed?
Let's take a look next.
Process address space
Modern operating systems almost universally use virtual memory management, and Linux, as an advanced OS, is no exception: every process has its own process address space, a linear virtual space of 4G. All a user ever sees are virtual addresses; the physical addresses are invisible and need not be cared about. Virtual addressing protects memory resources and provides isolation, and from the program's point of view the space is always 4G in size, so code segment addresses can be fixed at compile time. Three things are worth knowing:

The 4G process address space is artificially divided into two parts: user space and kernel space. User space runs from 0 to 3G (0xC0000000); kernel space occupies 3G to 4G. Normally a user process can only access user-space virtual addresses, not kernel-space ones. Kernel space can be touched only when the user process makes a system call (that is, executes in kernel mode on behalf of the process).

Whenever a process switch happens, the user space changes with it, while the kernel space is mapped by the kernel and does not change with the process. Kernel space has its own page table (init_mm.pgd); each user process has its own page tables.
The user space of each process is independent.
Process memory management
What process memory management manages is the memory image on the process's linear address space, which in practice means the virtual memory areas (memory regions) the process uses. A process's virtual space is a 32- or 64-bit "flat" (independent, contiguous) address space (its exact size depends on the architecture). Managing such a large flat space as a single unit is unwieldy, so for convenience the virtual space is divided into many variable-sized memory areas (each a multiple of the 4096-byte page). Like parking spaces, these areas are laid out in order along the process's linear addresses without overlapping. The principle for dividing them is to keep "address ranges with consistent access attributes" together, where "access attributes" means nothing more than readable, writable, executable, and so on.

If you want to view the memory areas a process occupies, use the command cat /proc/<pid>/maps (pid is the process number); you will see a list like the following:

08048000-08049000 r-xp 00000000 03:03 439029 /home/mm/src/example
08049000-0804a000 rw-p 00000000 03:03 439029 /home/mm/src/example
...
bfffe000-c0000000 rwxp ffff000 00:00 0
Each row of data is in the following format:
(memory area) start-end permissions offset major:minor device number inode file.

Note: you may notice the process space seems to contain only three memory areas, with no separate heap, bss, and so on. In fact that is not what is happening: the program's memory segments correspond only loosely to the memory areas of the process address space; the heap, the bss, and the (initialized) data segment are all represented by the data memory area in the process space.
The data structure that represents a memory region in the Linux kernel is vm_area_struct, and the kernel manages each memory region as a separate memory object. With this object-oriented approach, the same VMA structure can represent many types of memory regions, including memory-mapped files and the process's user-space stack, and the operations on these areas differ accordingly.

The vm_area_struct structure is fairly complex; refer to the relevant references for its full definition. Here we only add a note on how it is organized. vm_area_struct is the basic unit describing a process's address space, and a process usually needs several memory regions to describe its virtual space. How are these different memory areas linked together? You might think of a linked list, and indeed vm_area_struct structures are chained in a list; but to make lookup faster, the kernel also organizes the memory areas in a red-black tree (earlier kernels used a balanced tree) to reduce search time. The two coexisting structures are not redundant: the linked list is used when all nodes must be traversed, while the red-black tree suits locating a particular memory area within the address space. The kernel uses both so that different operations on memory areas all achieve high performance.
The following figure reflects the management model of the process address space:
The structure describing the whole process address space is the memory descriptor (mm_struct); it represents the process's entire address space and contains all the information related to it, including, of course, the process's memory areas.
How exactly is process memory allocated and reclaimed?
System calls we know, such as process creation fork(), program loading execve(), file mapping mmap(), and dynamic memory allocation brk(), all need to allocate memory to a process. But at that point the process does not receive actual physical memory, only virtual memory, which really just means a "memory area" in the kernel. A process's allocation of memory areas ultimately goes through the kernel function do_mmap() (with the exception of brk).

The kernel uses the do_mmap() function to create a new linear address range, and that address range is then added to the process's address space, either by creating a new area or by extending an existing one. The corresponding memory area is freed with the function do_munmap().
How does memory change from virtual to solid?
From the above we can see that what a process can operate on directly are virtual addresses. When a process needs memory, all it gets from the kernel is a virtual memory area, not actual physical pages (for the physical page concept, see the hardware fundamentals chapter); all it gets is the right to use a new linear address range. Only when the process actually accesses the newly obtained virtual address does the "demand paging" mechanism raise a page-fault exception and enter the routine that allocates a real page.

This exception is the basic guarantee behind the virtual memory mechanism: it tells the kernel to actually allocate a physical page for the process and set up the corresponding page table entry, so that the virtual address becomes genuinely mapped to the system's physical memory. (Of course, if the page has been swapped out to disk a page fault also occurs, but then no new page table needs to be created.)

Demand paging postpones page allocation until it can be postponed no longer; it is in no hurry to get everything done at once (the idea is a bit like the proxy in design patterns). It can do this thanks to the "locality principle" of memory access, and its benefit is saving free memory and improving system throughput. For a clearer picture of demand paging, see the book "Understanding the Linux Kernel".
Here we should mention the nopage operation on the memory region structure. When the process accesses virtual memory whose pages have not actually been allocated, this operation is called to allocate the real physical page and create a page table entry for it. In the final example we will demonstrate how to use this method.
How is physical memory managed?
Although what an application operates on is virtual memory mapped onto physical memory, the processor works directly with physical memory. So whenever an application accesses a virtual address, the virtual address must first be converted into a physical address before the processor can resolve the access. Address conversion is done by consulting the page tables: broadly, the virtual address is split into fields, each field indexing into a page table, and each page table entry points either to the next-level page table or to the final physical page.

Each process has its own page tables. The pgd field of the process descriptor points to the process's page global directory. Let us borrow a picture from "Linux Device Drivers" to look at the translation from the process address space to physical pages.
The process above is easy to describe but hard to carry out, because a physical page must be allocated before a virtual address can be mapped to it; that is, free pages must first be obtained from the kernel and the page tables set up. Let us look at the mechanism by which the kernel manages physical memory.

The Linux kernel manages physical memory through paging: the whole of memory is divided into a great many 4k pages (on the i386 architecture), so the basic unit of memory allocation and reclamation is the page. Page-based management helps allocate memory flexibly, because an allocation does not require a large block of contiguous memory; the system can piece together the memory a process needs from scattered pages. Even so, in practice the system tends to hand out contiguous blocks, because allocating contiguous memory means the page tables need not be changed, which reduces TLB flushes (frequent flushing slows access considerably).

To meet that need, the kernel uses the "buddy" relationship to manage free pages, so that handing out physical pages creates as little discontinuity as possible. The buddy allocation algorithm should be familiar; if not, refer to the relevant references. Here we only need to know that free pages in Linux are organized and managed through buddy relationships, so free-page allocation must also follow them: the allocation unit can only be a power-of-two number of pages (1, 2, 4, 8, ..., 512 pages). The basic functions for allocating free pages in the kernel are get_free_page/get_free_pages, which allocate a single page or a specified number of pages, respectively.
Note: get_free_page allocates memory in the kernel, unlike malloc, which allocates in user space. malloc performs dynamic heap allocation and ultimately reaches the brk() system call, which grows or shrinks the process's heap space (it modifies the process's brk field). If the existing heap area cannot satisfy the request, the corresponding memory area is grown or shrunk in multiples of the page size, but the brk value itself is not page-aligned; it is adjusted by exactly the amount requested. So malloc hands out user-space memory at byte granularity, while the kernel internally still allocates in pages.

It should also be said that physical pages are described in the system by the page structure struct page, and all page structures in the system are stored in the array mem_map[], through which every page in the system (free or not) can be found. The free pages among them can be indexed through the buddy-organized free-page lists (free_area[MAX_ORDER]) mentioned above.
What is slab?
Allocating memory with the page as the smallest unit is convenient for the kernel's management of physical memory, but the memory the kernel itself uses most often comes in very small pieces, far smaller than a page: a file descriptor, a process descriptor, or a virtual memory area descriptor each need less than a page. Compared with pages, the memory for these descriptors is like bread crumbs next to bread. Many of these small blocks can be packed into one page, and they are created and destroyed as frequently as bread crumbs appear.

To satisfy the kernel's need for such small memory blocks, Linux uses a technique called the slab allocator. The slab allocator's implementation is quite involved, but the principle is not hard; its core idea is the "storage pool". Memory fragments (small memory blocks) are treated as objects; when one is released it is not freed outright but cached in the storage pool for the next use, which avoids the extra cost of frequently creating and destroying objects.

Slab technology not only avoids the inconvenience of internal fragmentation (the main purpose of introducing the slab allocator is to reduce the number of calls into the buddy system's allocation algorithm, since frequent allocation and reclamation inevitably fragments memory and makes large contiguous blocks hard to find); it also makes good use of the hardware cache to improve access speed.

Slab is not a way of allocating memory that exists independently of the buddy system; slab is still built on pages. In other words, slab chops pages (taken from the buddy-managed free-page lists) into many small memory blocks for allocation. Object allocation and destruction in slab use kmem_cache_alloc and kmem_cache_free.
kmalloc()
The slab allocator is not only used for kernel-specific structures; it also handles the kernel's requests for small memory blocks. Given the slab allocator's characteristics, generally speaking, requests in kernel code for blocks smaller than a page go through the interface the slab allocator provides, kmalloc (although it can allocate from 32 up to 131072 bytes). From the point of view of kernel memory allocation, kmalloc can be seen as an effective complement to get_free_page(s), with more flexible allocation granularity.

If you are interested, you can find live slab statistics in /proc/slabinfo, which shows usage information for all the slabs in the system. From it you can see that besides the slabs used by special-purpose structures, there are also a large number of slabs for kmalloc (some of them for DMA).
vmalloc()
From the point of view of memory management theory, the buddy system and slab serve basically the same purpose: preventing "fragmentation". Fragmentation comes in two kinds, internal and external. Internal fragmentation means the system must allocate a large contiguous region to satisfy a request for a small (contiguous) one, wasting the difference. External fragmentation means the system has enough memory in total, but it is scattered in fragments and cannot satisfy a request for a large block of "contiguous memory". Either kind of fragmentation is an obstacle to the effective use of memory. The slab allocator lets the many small blocks inside one page be allocated independently, avoiding internal fragmentation and saving free memory. The buddy system manages memory blocks grouped by size, which reduces the harm of external fragmentation to some extent, since page-frame allocation is no longer blind but ordered by size; but the buddy system only reduces external fragmentation, it does not eliminate it. Compare for yourself the free memory left over after many rounds of page allocation.

So the ultimate way to avoid external fragmentation is to use discontiguous memory blocks to compose a "seemingly large memory block". This is very similar to allocating virtual memory in user space: the memory is logically contiguous but maps to physical memory that need not be. The Linux kernel borrows the same technique, allowing kernel code to allocate virtual addresses in the kernel address space and likewise using page tables (the kernel page tables) to map those virtual addresses onto scattered physical pages, thereby solving the external fragmentation problem for kernel memory. The kernel provides the vmalloc function to allocate kernel virtual memory. It differs from kmalloc in that it can allocate much larger spaces than kmalloc can (far more than 128K, though the size must be a multiple of the page size); but compared with kmalloc, vmalloc must remap the kernel virtual addresses and update the kernel page tables, so its allocation is less efficient (it trades time for space).

The kernel virtual memory allocated by vmalloc and that allocated by kmalloc/get_free_page lie in different intervals and do not overlap, because the kernel virtual space is managed in partitions, each with its own role. Process space addresses run from 0 to 3G (more precisely, to PAGE_OFFSET, which equals 0xC0000000 on x86). From 3G to vmalloc_start is the physical memory mapping region (which contains the kernel image, the physical page array mem_map, and so on). For example, if the machine I use has 64M of memory (as shown by free), then the range 3G to 3G+64M is mapped to physical memory, and vmalloc_start lies near 3G+64M ("near" because there is an 8M gap between the physical memory mapping region and vmalloc_start to catch out-of-bounds accesses), while vmalloc_end lies close to 4G ("close" because the system reserves a final 128k region for dedicated page mappings, and there may also be a high-memory mapping region; these are details we will not get tangled in here).
A rough outline of the memory layout

The contiguous memory allocated by get_free_page or kmalloc lies in the physical mapping region, so the kernel virtual addresses they return differ from the actual physical addresses only by an offset (PAGE_OFFSET); you can easily convert them to physical addresses, and the kernel provides the virt_to_phys() function to convert an address in the physical mapping region of kernel virtual space into a physical address. Note that addresses in the physical memory mapping region correspond to the kernel page table in order, and every physical page in the system has a corresponding kernel virtual address there (in the physical memory mapping region).

Addresses allocated by vmalloc are confined between vmalloc_start and vmalloc_end. Each block of vmalloc-allocated kernel virtual memory has a corresponding vm_struct structure (not to be confused with vm_area_struct, the structure for process virtual memory areas), and different blocks of kernel virtual addresses are separated by 4k guard gaps to catch out-of-bounds accesses (see the figure below). Like process virtual addresses, these virtual addresses have no simple offset relationship with physical memory; they can be converted to physical addresses or physical pages only through the kernel page tables, and they may not yet be mapped at all: the physical pages are actually allocated only when a page fault occurs.
Having read the above, do you now have a better understanding of Linux memory management? If you want to learn more, please follow the industry information channel. Thank you for your support.