This article shares how we analyze memory management in Curve. We find the approach practical, and we hope you take something away from it.
Preface
Several memory-related problems have been encountered in Curve practice; the following two were closely tied to how the operating system manages memory:
Memory on Chunkserver could not be freed
MDS memory grew slowly but continuously
Memory problems are hard to spot during development; they tend to surface during high-pressure stability tests and fault-injection tests. Even in the testing phase we need to be careful enough: besides io-related metrics, we should also watch the use of server resources such as memory, CPU, and NIC, and check whether the collected metrics match expectations. For example, the slow MDS memory growth above would not have been found during testing if we had only checked whether io was normal. Locating memory problems is not easy, especially in large-scale software.
The following discusses memory management in Curve mainly from a developer's point of view, without dwelling too much on memory management theory. The goal is to share our understanding of Linux memory management and some methods for analyzing memory problems during software development. This article covers the following aspects:
Memory layout: illustrated with the Curve software.
Memory allocation strategy: why a memory allocator is necessary, the problems it must solve and the characteristics it should have, followed by an example of how one allocator manages memory.
Memory management in Curve: which allocators the Curve software chose and why.
Memory layout
Before talking about memory management, let's briefly introduce memory layout.
When software runs, it needs a certain amount of memory to store data, but a process does not deal directly with the physical memory that stores that data; it manipulates virtual memory. Physical memory is the real thing, the RAM modules in the machine; virtual memory hides physical memory from the process, providing it with a simple, easy-to-use interface and richer functionality. The memory management discussed in this article is virtual memory management. Why abstract a layer of virtual memory? How are virtual memory and physical memory mapped and managed? What is physical addressing? These questions below the virtual memory layer are beyond the scope of this article.
Linux maintains a separate virtual address space for each process, consisting of two parts: process virtual memory (user space) and kernel virtual memory (kernel space). This article mainly discusses the user space that the process itself can operate on, as shown in the following figure.
Now let's use pmap to look at the layout of a running curve-mds process's virtual address space. pmap shows the memory map of a process; it reads the information in /proc/[pid]/maps.
// pmap -X {process id}: check the process memory distribution
sudo pmap -X 2804620

The pmap output for curve-mds has many columns:
Address Perm Offset Device Inode Size Rss Pss Referenced Anonymous ShmemPmdMapped Shared_Hugetlb Private_Hugetlb Swap SwapPss Locked Mapping
For readability, the columns after Pss are dropped and the rows in the middle are omitted below.

2804620: /usr/bin/curve-mds -confPath=/etc/curve/mds.conf -mdsAddr=127.0.0.1:6666 -log_dir=/data/log/curve/mds -graceful_quit_on_sigterm=true -stderrthreshold=3
         Address Perm   Offset Device    Inode   Size  Rss  Pss Mapping
      c000000000 rw-p 00000000  00:00        0  65536 1852 1852
    559f0e2b9000 r-xp 00000000  41:42 37763836   9112 6296 6296 curve-mds
    559f0eb9f000 r--p 008e5000  41:42 37763836    136  136  136 curve-mds
    559f0ebc1000 rw-p 00907000  41:42 37763836      4    4    4 curve-mds
    559f0ebc2000 rw-p 00000000  00:00        0  10040 4244 4244
    559f1110a000 rw-p 00000000  00:00        0   2912 2596 2596 [heap]
    7f6124000000 rw-p 00000000  00:00        0    156  156  156
    7f6124027000 ---p 00000000  00:00        0  65380    0    0
    7f612b7ff000 ---p 00000000  00:00        0      4    0    0
    7f612b800000 rw-p 00000000  00:00        0   8192    8    8
    7f612c000000 rw-p 00000000  00:00        0    132    4    4
    7f612c021000 ---p 00000000  00:00        0  65404    0    0
    ...
    7f6188cff000 ---p 0026c000  41:42 37750237   2044    0    0
    7f61895b7000 r-xp 00000000  41:42 50201214     96   96    0 libpthread-2.24.so
    7f61895cf000 ---p 00018000  41:42 50201214   2044    0    0 libpthread-2.24.so
    7f61897ce000 r--p 00017000  41:42 50201214      4    4    4 libpthread-2.24.so
    7f61897cf000 rw-p 00018000  41:42 50201214      4    4    4 libpthread-2.24.so
    7f61897d0000 rw-p 00000000  00:00        0     16    4    4
    7f61897d4000 r-xp 00000000  41:42 50200647     16   16    0 libuuid.so.1.3.0
    7f61897d8000 ---p 00004000  41:42 50200647   2044    0    0 libuuid.so.1.3.0
    7f61899d7000 r--p 00003000  41:42 50200647      4    4    4 libuuid.so.1.3.0
    7f61899d8000 rw-p 00004000  41:42 50200647      4    4    4 libuuid.so.1.3.0
    7f61899d9000 r-xp 00000000  41:42 37617895   9672 8904 8904 libetcdclient.so
    7f618a34b000 ---p 00972000  41:42 37617895   2048    0    0 libetcdclient.so
    7f618a54b000 r--p 00972000  41:42 37617895 655656 5664 5664 libetcdclient.so
    7f618abb2000 rw-p 00fd9000  41:42 37617895    292  252  252 libetcdclient.so
    7f618abfb000 rw-p 00000000  00:00        0    140   60   60
    7f618ac1e000 r-xp 00000000  41:42 50201195    140  136    0 ld-2.24.so
    7f618ac4a000 rw-p 00000000  00:00        0   1964 1236 1236
    7f618ae41000 r--p 00023000  41:42 50201195      4    4    4 ld-2.24.so
    7f618ae42000 rw-p 00024000  41:42 50201195      4    4    4 ld-2.24.so
    7f618ae43000 rw-p 00000000  00:00        0      4    4    4
    7fffffd19000 rw-p 00000000  00:00        0    132   24   24 [stack]
    7fffffdec000 r--p 00000000  00:00        0      8    0    0 [vvar]
    7fffffdee000 r-xp 00000000  00:00        0      8    4    4 [vdso]
ffffffffff600000 r-xp 00000000  00:00        0      4    0    0 [vsyscall]
                                             ======= ===== =====
                                             1709344 42800 37113
The process's address space in the output above starts at 0x559f0e2b9000, not at the 0x40000000 drawn on the memory layout diagram. This is because of address space layout randomization (ASLR), which randomizes the starting addresses of key parts of the process address space (such as the stack, libraries, and heap) to harden the system against attacks on known addresses. In Linux, a value of 1 or 2 in /proc/sys/kernel/randomize_va_space means address space randomization is on (the two values differ in which key parts are randomized); 0 means off.
Next, the mappings starting at 0x559f0e2b9000, 0x559f0eb9f000, and 0x559f0ebc1000 all correspond to the curve-mds file, but with different permissions on it. Each letter denotes a permission: r-read, w-write, x-execute, p-private, s-shared. curve-mds is an ELF file. In terms of content, it contains a code segment, data segment, BSS segment, and so on. In terms of loading into memory, the operating system does not care about each segment's contents, only about load-related issues, chiefly permissions, so it merges segments with the same permissions when loading. Here we see the code segment loaded as a readable and executable mapping, the read-only data as a read-only mapping, and the data segment and BSS segment as a readable and writable mapping.
Further down, 0x559f1110a000 corresponds to the runtime heap in the figure above, where memory dynamically allocated at run time lives. Note that there is also a random offset between the end of the .bss section and the start of the heap.
Then, starting at 0x7f6124000000, comes the memory mapping region (Memory Mapping Region) shown above, which contains dynamic libraries, large chunks of memory requested by the application, and so on. From this we can see that both the heap and the Memory Mapping Region can serve dynamic allocations made with malloc; this is expanded in the next section on memory allocation strategy, which is also the focus of this article.
Then, 0x7fffffd19000 is the start of the stack space, which is usually a few megabytes in size.
Finally, the vvar, vdso, and vsyscall regions exist to accelerate certain system calls, letting the program invoke them without entering kernel mode; they are not expanded upon here.
Memory allocation strategy
We usually use malloc to allocate memory in the heap and the Memory Mapping Region. malloc is ultimately backed by two system calls, brk and mmap (a small sketch exercising both paths follows this list):
The region allocated by brk corresponds to the heap
The region allocated by mmap corresponds to the Memory Mapping Region
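To make the two paths concrete, here is a minimal sketch (not from the Curve codebase) that requests memory once via sbrk and once via mmap, the same primitives malloc builds on:

#include <sys/mman.h>
#include <unistd.h>
#include <cstdio>

int main() {
    // Path 1: extend the program break, the mechanism behind small malloc()s.
    void* old_brk = sbrk(0);                  // current top of the heap
    if (sbrk(4096) != (void*)-1) {
        printf("heap grew from %p by 4096 bytes\n", old_brk);
    }

    // Path 2: map anonymous pages, the mechanism behind large malloc()s.
    void* p = mmap(NULL, 256 * 1024, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p != MAP_FAILED) {
        printf("mmap returned %p in the memory mapping region\n", p);
        munmap(p, 256 * 1024);                // returned to the kernel immediately
    }
    return 0;
}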
If every developer used the brk and mmap system calls directly to allocate and free memory during software development, development efficiency would become very low, and the code would be very error-prone. In practice, we use a memory management library. There are currently three mainstream memory allocators: ptmalloc, tcmalloc, and jemalloc, all of which provide the malloc and free interfaces; glibc uses ptmalloc by default. The role of these libraries is to manage the memory they obtain through system calls. In general, a good general-purpose memory allocator should have the following characteristics (a small demonstration of the first point follows the list):
As little extra space loss as possible. For example, if the application needs only 5K of memory and the allocator hands it 10K, the difference is wasted space.
Allocate as fast as possible.
Avoid memory fragmentation as much as possible. The figure below gives a visual sense of memory fragmentation.
Generality, compatibility, portability, and ease of debugging.
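As a small illustration of the first point, glibc may hand back a usable block larger than what was asked for; malloc_usable_size (a glibc extension) reveals the surplus:

#include <malloc.h>
#include <cstdio>
#include <cstdlib>

int main() {
    // Ask for 5000 bytes; the allocator may reserve more for alignment
    // and bookkeeping. The difference is the "extra space loss".
    void* p = malloc(5000);
    printf("requested 5000 bytes, usable size is %zu bytes\n",
           malloc_usable_size(p));
    free(p);
    return 0;
}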
We use the following figure to illustrate how ptmalloc, glibc's default memory manager, allocates and reclaims heap memory in a single thread (a code sketch reproducing these steps follows the list):
malloc(30k): allocates memory through the brk system call by extending the heap top.
malloc(20k): continues to extend the heap top through brk.
malloc(200k): requests larger than 128K by default (determined by M_MMAP_THRESHOLD, default 128K, adjustable) are allocated with the mmap system call.
free(30k): the space is not returned to the system but kept under ptmalloc's management. As the mallocs in steps 1 and 2 show, allocation extends the heap top via brk; returning space to the system is the reverse operation, shrinking the heap top. Because the 20k from step 2 has not been freed, the heap top cannot be shrunk at this point. The freed space can be reallocated, however: a subsequent malloc(10k) can be satisfied from here without another brk. Now consider the situation where the space at the top of the heap stays occupied while space below it is freed: that space cannot be returned, and if it is too small to satisfy later requests it becomes memory fragmentation.
free(20k): after this release, ptmalloc merges the 20k and 30k regions, and if the free space at the heap top exceeds M_TRIM_THRESHOLD, the heap top is shrunk and the space returned to the operating system.
free(200k): mmap-allocated space is returned directly to the system via munmap.
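Here is a minimal sketch (assuming default glibc tunables) that walks through the same steps and makes the brk/mmap split observable:

#include <malloc.h>
#include <unistd.h>
#include <cstdio>
#include <cstdlib>

int main() {
    void* brk_start = sbrk(0);            // current heap top
    void* a = malloc(30 * 1024);          // step 1: served by extending the break
    void* b = malloc(20 * 1024);          // step 2: break extended again
    printf("break moved %ld bytes\n", (char*)sbrk(0) - (char*)brk_start);

    void* c = malloc(200 * 1024);         // step 3: > M_MMAP_THRESHOLD, served by mmap

    free(a);                              // step 4: cached by ptmalloc; b pins the heap top
    void* d = malloc(10 * 1024);          // reuses part of the freed 30k, no brk needed

    free(d);
    free(b);                              // step 5: heap top free; may shrink past M_TRIM_THRESHOLD
    free(c);                              // step 6: munmap'ed back to the kernel immediately
    malloc_trim(0);                       // explicitly return any remaining free heap top
    return 0;
}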
How does ptmalloc allocate for multithreaded programs? With multiple threads, contention between threads must be handled. If, as before, requests smaller than HEAP_MAX_SIZE (64M by default on 64-bit systems) extended the heap top via brk and larger requests used mmap, then in a program with many threads allocating frequently, contention on the single heap would be fierce. ptmalloc's approach is to use multiple allocation areas, of two kinds: the main allocation area and dynamic allocation areas.
Main allocation area: allocates memory in both the heap and the Memory Mapping Region
Dynamic allocation area: allocates memory in the Memory Mapping Region, reserving HEAP_MAX_SIZE (64M on a 64-bit system) per request by default. The main thread and the threads that subsequently call malloc use different allocation areas, and once the number of dynamic allocation areas has grown, it does not shrink. The maximum number of dynamic allocation areas is (2 x number of cores + 1) on 32-bit systems and (8 x number of cores + 1) on 64-bit systems (the ceiling is tunable; see the sketch below).
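As an aside, glibc exposes the arena ceiling as a tunable. A sketch assuming glibc's mallopt with the M_ARENA_MAX parameter (the MALLOC_ARENA_MAX environment variable has the same effect):

#include <malloc.h>
#include <cstdio>

int main() {
    // Cap the number of ptmalloc arenas; fewer arenas means less reserved
    // address space but more contention between allocating threads.
    if (mallopt(M_ARENA_MAX, 2) == 0) {
        printf("mallopt(M_ARENA_MAX) not supported\n");
    }
    // ... spawn threads and allocate as usual ...
    return 0;
}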
Let's take a multithreaded example and look at space allocation in this case:
#include <pthread.h>
#include <unistd.h>
#include <cstdlib>
#include <vector>

// There are three threads:
// main thread: allocates 4k once
// thread 1: allocates 4k 100 times
// thread 2: allocates 4k 100 times
void* threadFunc(void* id) {
    std::vector<char*> malloclist;
    for (int i = 0; i < 100; i++) {
        malloclist.emplace_back((char*)malloc(1024 * 4));
    }
    sleep(300);  // wait here so we can inspect the memory distribution
    return nullptr;
}

int main() {
    pthread_t t1, t2;
    int id1 = 1;
    int id2 = 2;
    char* addr = (char*)malloc(4 * 1024);
    pthread_create(&t1, NULL, threadFunc, (void*)&id1);
    pthread_create(&t2, NULL, threadFunc, (void*)&id2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
Let's use pmap to check the memory distribution of the program:
741545: ./memory_test
         Address Perm   Offset Device    Inode   Size  Rss  Pss Mapping
    56127705a000 r-xp 00000000  08:02 62259273      4    4    4 memory_test
    56127725a000 r--p 00000000  08:02 62259273      4    4    4 memory_test
    56127725b000 rw-p 00001000  08:02 62259273      4    4    4 memory_test
    5612784b9000 rw-p 00000000  00:00        0    132    8    8 [heap]
    7f0df0000000 rw-p 00000000  00:00        0    404  404  404
    7f0df0065000 ---p 00000000  00:00        0  65132    0    0
    7f0df8000000 rw-p 00000000  00:00        0    404  404  404
    7f0df8065000 ---p 00000000  00:00        0  65132    0    0
    7f0dff467000 ---p 00000000  00:00        0      4    0    0
    7f0dff468000 rw-p 00000000  00:00        0   8192    8    8
    7f0dffc68000 ---p 00000000  00:00        0      4    0    0
    7f0dffc69000 rw-p 00000000  00:00        0   8192    8    8
    7f0e00469000 r-xp 00000000  08:02 50856517   1620 1052    9 libc-2.24.so
    7f0e005fe000 ---p 00195000  08:02 50856517   2048    0    0 libc-2.24.so
    7f0e007fe000 r--p 00195000  08:02 50856517     16   16   16 libc-2.24.so
    7f0e00802000 rw-p 00199000  08:02 50856517      8    8    8 libc-2.24.so
    7f0e00804000 rw-p 00000000  00:00        0     16   12   12
    7f0e00808000 r-xp 00000000  08:02 50856539     96   96    1 libpthread-2.24.so
    7f0e00820000 ---p 00018000  08:02 50856539   2044    0    0 libpthread-2.24.so
    7f0e00a1f000 r--p 00017000  08:02 50856539      4    4    4 libpthread-2.24.so
    7f0e00a20000 rw-p 00018000  08:02 50856539      4    4    4 libpthread-2.24.so
    7f0e00a21000 rw-p 00000000  00:00        0     16    4    4
    7f0e00a25000 r-xp 00000000  08:02 50856513    140  140    1 ld-2.24.so
    7f0e00c31000 rw-p 00000000  00:00        0     16   16   16
    7f0e00c48000 r--p 00023000  08:02 50856513      4    4    4 ld-2.24.so
    7f0e00c49000 rw-p 00024000  08:02 50856513      4    4    4 ld-2.24.so
    7f0e00c4a000 rw-p 00000000  00:00        0      4    4    4
    7ffe340be000 rw-p 00000000  00:00        0    132   12   12 [stack]
    7ffe3415c000 r--p 00000000  00:00        0      8    0    0 [vvar]
    7ffe3415e000 r-xp 00000000  00:00        0      8    4    0 [vdso]
ffffffffff600000 r-xp 00000000  00:00        0      4    0    0 [vsyscall]
                                             ====== ===== ====
                                             153800  2224  943
Pay attention to the two pairs of regions starting at 0x7f0df0000000 and 0x7f0df8000000 above: each pair adds up to 65536K, of which 404K has rw-p (readable and writable) permission and 65132K has ---p (no access) permission. When the two threads first allocate, ptmalloc assigns each of them a dynamic allocation area; 64M of address space is reserved for each area, and slices of that 64M are carved out to satisfy the application's requests, while the untouched remainder stays inaccessible.
Here's another interesting phenomenon. Let's use strace -f -e "brk,mmap,munmap" -p {pid} to trace the system calls made during malloc:
mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f624a169000
strace: Process 774601 attached
[pid 774018] mmap(NULL, 8392704, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7f6249968000
[pid 774601] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f6241968000
[pid 774601] munmap(0x7f6241968000, 40468480) = 0
strace: Process 774602 attached
[pid 774601] munmap(0x7f6248000000, 26640384) = 0
[pid 774602] mmap(NULL, 134217728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_NORESERVE, -1, 0) = 0x7f623c000000
[pid 774602] munmap(0x7f6240000000, 67108864) = 0
Here the main thread [774018] requests 8M+4K of space per mmap (the MAP_STACK flag shows these are the stacks for the two new threads). Thread 1 [774601] first mmaps 128M of space at 0x7f6241968000, then returns two pieces: 40468480 bytes starting at 0x7f6241968000 and 26640384 bytes starting at 0x7f6248000000, keeping 0x7f6244000000 ~ 0x7f6248000000 (0x7f6241968000 + 40468480 = 0x7f6244000000). Reserving first and then trimming ensures that the start and end addresses of the retained 64M region are aligned to the 64M arena size; thread 2's reservation [774602] happened to start on a 64M boundary, so a single munmap of the trailing 64M sufficed.
Memory management in Curve
Curve uses two allocators: ptmalloc and jemalloc. MDS uses the default ptmalloc, while the Chunkserver and Client sides use jemalloc.
Now back to the two issues mentioned at the beginning of this article. First, the slow MDS memory growth, which showed up as roughly 3G of growth per day. The analysis proceeded as follows:
First, use pmap to view the memory distribution. Looking at the slowly growing curve-mds process with pmap, we found a large number of 64M allocations in the Memory Mapping Region, with more still being added over time. This is where we began to suspect a memory leak.
Then, check the request pressure on the application. Looking at the relevant business metrics on MDS, the pressure was very small: the iops of the control-plane rpcs was only in the hundreds, so the growth should not be caused by high business pressure.
Next, look at the data inside those 64M regions of curve-mds. Use gdb -p {pid} to attach to the process and dump memory mem.bin {addr1} {addr2} to dump the memory of the specified address range, then inspect that memory to identify a few suspicious points (a sketch of this workflow follows the list).
Finally, check the code based on these suspicious points to see whether there is a memory leak.
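A hedged sketch of that inspection workflow; the pid and address range are placeholders to be taken from a pmap listing, and the strings pipeline is just one convenient way to spot repeated content:

sudo gdb -p {pid}
(gdb) dump memory /tmp/mem.bin 0x7f6124000000 0x7f6128000000
(gdb) detach
(gdb) quit
# look for repeated, recognizable content in the dumped region
strings /tmp/mem.bin | sort | uniq -c | sort -rn | head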
The Chunkserver side did not use jemalloc from the beginning; it too started with the default ptmalloc. The switch to jemalloc was prompted by the issue mentioned at the start of this article: during testing, Chunkserver memory could not be released. The symptom was that chunkserver memory grew rapidly within 2 hours, by about 50G in total, and was never released afterwards. The analysis proceeded as follows:
First, analyze where the data in memory comes from. This differs from MDS, whose memory holds control-plane requests and some metadata caches. Memory growth on Chunkserver generally comes from two places: requests sent by users, and data synchronization between a copyset's leader and followers. Both go through the brpc module.
brpc's memory management involves two modules, IOBuf and ResourcePool. Space in IOBuf is generally used to store user data; ResourcePool manages objects such as socket and bthread_id, with a managed memory unit of 64K.
Next, look at the trend metrics of the corresponding modules. Observing the metrics of these two modules, we found that the memory consumed by IOBuf and by ResourcePool followed the same growth trend over this period.
Memory released by IOBuf is returned to ptmalloc, while memory managed by ResourcePool is not returned to ptmalloc but kept under its own management.
From this phenomenon we suspected that the memory IOBuf returned to ptmalloc could not be released back to the operating system.
Analysis and verification. Combined with the memory allocation strategy in the previous section: if the space at the top of the heap is occupied all the time, the free space below the heap top cannot be released. The guess can again be confirmed with pmap, by looking at the current sizes and permissions of the heap-related regions (whether there are many large ---p regions). Chunkserver therefore switched to jemalloc. The lesson here: in a multithreaded program, if some memory is held by the application for a long time, ptmalloc may be unable to return freed memory to the system.
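To make the failure mode concrete, here is a minimal sketch (not Curve code, assuming default glibc ptmalloc) of the pattern that pins the heap top:

#include <malloc.h>
#include <cstdlib>
#include <vector>

int main() {
    // Allocate a long run of small blocks, then free everything except the
    // most recently allocated one. The survivor sits at the top of the heap,
    // so ptmalloc cannot shrink the break even though almost everything
    // below it is free: the freed space stays cached inside the process.
    std::vector<void*> blocks;
    for (int i = 0; i < 10000; i++) {
        blocks.push_back(malloc(64 * 1024));  // 64K < M_MMAP_THRESHOLD, so via brk
    }
    for (int i = 0; i < 9999; i++) {
        free(blocks[i]);                      // returned to ptmalloc, not to the OS
    }
    malloc_stats();   // glibc: "in use" is tiny, but the arena stays ~640M
    free(blocks[9999]);
    return 0;
}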
The above is how to analyze memory management in Curve. We believe some of these points will come up in everyday work, and we hope you can learn something more from this article.