Memory Analysis BPF tool

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

The kernel and processor are responsible for mapping virtual memory to physical memory. For efficiency, memory mappings are created in groups of memory called pages, where the size of each page is a processor detail. 4 KB is typical, although most processors also support larger sizes, which Linux calls huge pages. The kernel can service physical memory page requests from its own free lists, which it maintains for each DRAM group and CPU for efficiency. The kernel's own software also consumes memory from these free lists, usually through a kernel allocator such as the slab allocator.
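Because the page size is a processor detail, it is safer to query it than to assume 4 KB. A minimal sketch, assuming a Unix-like system where os.sysconf() knows the name SC_PAGE_SIZE; the helper pages_needed() is a hypothetical name used only for illustration:

```python
import os

# Base page size of the running system, in bytes (commonly 4096).
page_size = os.sysconf("SC_PAGE_SIZE")


def pages_needed(nbytes: int, page: int = page_size) -> int:
    """Number of whole pages needed to hold nbytes, rounded up."""
    return (nbytes + page - 1) // page


print(f"page size: {page_size} bytes")
print(f"1 MiB spans {pages_needed(1 << 20)} pages")
```

This reports only the base page size; huge page sizes are configured separately and are not covered by this sketch.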

Memory pages and swapping

The life cycle of a typical user memory page is shown in figure 7-2, which enumerates the following steps:

1. The application starts with a memory allocation request (for example, libc malloc()).

2. The allocation library can either service the request from its own free lists, or it may need to expand virtual memory to accommodate it. Depending on the allocation library, it will either:

a. Extend the size of the heap by calling the brk() syscall and using the heap memory for the allocation.

b. Create a new memory segment via the mmap() syscall.

3. Sometime later, the application tries to use the allocated memory range through store and load instructions, which involves calling in to the processor memory management unit (MMU) for virtual-to-physical address translation. At this point, the lie of virtual memory is revealed: there is no mapping for this address! This causes an MMU error called a page fault.

4. The page fault is handled by the kernel, which establishes a mapping from its physical memory free lists to virtual memory and then informs the MMU of this mapping for later lookups. The process is now consuming an extra physical memory page. The amount of physical memory in use by the process is called its resident set size (RSS).

5. When there is too much memory demand on the system, the kernel page-out daemon (kswapd) may look for memory pages to free. It frees one of three types of memory (although only (c) is shown in figure 7-2, since the figure shows the life cycle of a user memory page):

a. File system pages that were read from disk and not modified (termed "backed by disk"): these can be freed immediately and simply re-read when needed. These pages are application executable text, data, and file system metadata.

b. File system pages that have been modified: these are "dirty" and must be written to disk before they can be freed.

c. Pages of application memory: these are called anonymous memory because they have no file origin. If swap devices are in use, these can be freed by first being stored on a swap device. Writing pages to a swap device is termed swapping (on Linux).

Memory allocation requests are typically frequent activities: for a busy application, user-level allocations can occur millions of times per second. Load and store instructions and MMU lookups are even more frequent; they can occur billions of times per second. In figure 7-2, these arrows are drawn in bold. Other activities are relatively infrequent: brk() and mmap() calls, page faults, and page-outs (lighter arrows).
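Steps 2 to 4 of the life cycle can be observed from user space. The following is a rough sketch rather than a measurement tool: it assumes a Unix-like system (the resource module), uses an anonymous mmap() as the allocation, and picks a region of 1024 pages purely for illustration.

```python
import mmap
import resource

PAGE = mmap.PAGESIZE
NPAGES = 1024  # hypothetical region size chosen for the demonstration


def minor_faults() -> int:
    """Minor page faults taken so far by this process."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt


# Step 2: an anonymous mmap() reserves virtual address space, but no
# physical pages are assigned to it yet.
region = mmap.mmap(-1, NPAGES * PAGE)
before = minor_faults()

# Step 3: store one byte into every page. The first touch of each page
# finds no mapping, so the MMU raises a page fault.
for offset in range(0, NPAGES * PAGE, PAGE):
    region[offset] = 1

# Step 4: the kernel has handled the faults and mapped physical pages.
after = minor_faults()
print(f"touched {NPAGES} pages, minor faults observed: {after - before}")
region.close()
```

The observed count need not equal the number of pages touched: the kernel may map several pages per fault, or use huge pages, and the interpreter itself also takes faults.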

Page-out daemon

The page-out daemon (kswapd) is activated periodically to scan the LRU lists of inactive and active pages in search of memory to free. As shown in figure 7-3, it is woken when free memory crosses a low threshold and goes back to sleep when free memory crosses a high threshold.

kswapd coordinates background page-outs; apart from CPU and disk I/O contention, these should not directly harm application performance. If kswapd cannot free memory quickly enough, a tunable minimum pages threshold is crossed and direct reclaim is used; this is a foreground mode of freeing memory to satisfy allocations. In this mode, allocations block (stall) and wait synchronously for pages to be freed.

Direct reclaim can call kernel module shrinker functions: these free up memory that may have been kept in caches, including the kernel slab caches.
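The wake and sleep thresholds form a simple hysteresis loop. The sketch below is a toy model of that behavior, not kernel code: the Watermarks class, the reclaim_step() function, and the threshold values are all hypothetical and exist only to make the three thresholds concrete.

```python
from dataclasses import dataclass


@dataclass
class Watermarks:
    """Free-page thresholds in pages (illustrative values only)."""
    min_pages: int
    low: int
    high: int


def reclaim_step(free: int, kswapd_awake: bool, wm: Watermarks):
    """Return (kswapd_awake, direct_reclaim) for the current free-page count."""
    if free < wm.low:
        kswapd_awake = True       # below low: wake background reclaim
    elif free >= wm.high:
        kswapd_awake = False      # back above high: kswapd sleeps again
    # Between low and high the previous state is kept (hysteresis).
    direct_reclaim = free < wm.min_pages  # below min: allocations stall
    return kswapd_awake, direct_reclaim


wm = Watermarks(min_pages=100, low=200, high=300)
awake = False
for free in (400, 250, 150, 80, 250, 320):
    awake, direct = reclaim_step(free, awake, wm)
    print(f"free={free:4d} kswapd_awake={awake} direct_reclaim={direct}")
```

In this model kswapd stays awake at 250 free pages on the way back up, because it only sleeps once the high threshold is reached again.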

Swap devices

Swap devices provide a degraded mode of operation for a system running out of memory: processes can continue to allocate, but less frequently used pages are now moved to and from the swap devices, which usually causes applications to run much more slowly.

Some production systems run without swap. The rationale is that the degraded mode of operation is never acceptable for those critical systems, which may have numerous redundant (and healthy) servers that would be much better to use than one that has begun swapping. (This is usually the case for Netflix cloud instances, for example.)

If a swap-less system runs out of memory, the kernel out-of-memory (OOM) killer kills a process. To avoid this, applications are configured never to exceed the memory limits of the system.

OOM killer

The Linux out-of-memory killer is a last resort for freeing memory: it uses a heuristic to find victim processes and sacrifices them by killing them. The heuristic looks for the largest victim that will free many pages and that is not a critical task such as a kernel thread or init (PID 1). Linux provides ways to tune the behavior of the OOM killer both system-wide and per process.
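One per-process tunable is exposed under /proc: oom_score shows the current badness score of a process, and oom_score_adj (from -1000 to 1000) adjusts it. The following is a small sketch for reading both, assuming a Linux /proc layout; read_oom_values() is a hypothetical helper, and it returns None where the files are not available.

```python
from pathlib import Path


def read_oom_values(pid="self", proc_root="/proc"):
    """Return {'oom_score': int, 'oom_score_adj': int}, or None if unavailable."""
    base = Path(proc_root) / str(pid)
    try:
        return {
            name: int((base / name).read_text().strip())
            for name in ("oom_score", "oom_score_adj")
        }
    except (OSError, ValueError):
        # Not Linux, no such process, or no permission to read the files.
        return None


print(read_oom_values())
```

Lowering oom_score_adj for a process makes it a less likely victim; writing to the file typically requires suitable privileges.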

Page compaction

Over time, freed pages become fragmented, making it difficult for the kernel to allocate a large contiguous chunk when needed. The kernel uses a compaction routine to move pages, freeing up contiguous regions.
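Fragmentation of this kind can be inspected through /proc/buddyinfo, which lists, per zone, how many free blocks exist at each order (a block of order n is 2^n contiguous pages). The sketch below parses that format; the sample line is made up for illustration, and the function names are hypothetical.

```python
def parse_buddyinfo(text):
    """Map (node, zone) -> list of free-block counts indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 5 or parts[0] != "Node":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(n) for n in parts[4:]]
    return zones


def largest_free_order(counts):
    """Highest order that still has at least one free block, or -1."""
    for order in range(len(counts) - 1, -1, -1):
        if counts[order] > 0:
            return order
    return -1


def free_pages(counts):
    """Total free pages represented by the per-order block counts."""
    return sum(n << order for order, n in enumerate(counts))


# Made-up sample: plenty of small blocks, nothing above order 4.
SAMPLE = "Node 0, zone   Normal   4381   1021    310     42      7      0      0      0      0      0      0\n"

for key, counts in parse_buddyinfo(SAMPLE).items():
    print(key, "free pages:", free_pages(counts),
          "largest free order:", largest_free_order(counts))
```

A zone like the sample has thousands of free pages yet cannot satisfy an allocation larger than order 4 without compaction or reclaim.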

File system caching and buffering

Linux borrows free memory for file system caching and returns it to the free state when needed. A consequence of this borrowing is that the free memory reported by the system trends toward zero after Linux starts, which may cause a user to worry that the system is running out of memory when it is actually just warming up its file system cache. In addition, the file system uses memory for write-back buffering.

Linux can be tuned to prefer freeing memory either from the file system cache or by swapping (via the vm.swappiness parameter).
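The difference between free and reclaimable memory is visible in /proc/meminfo, where MemFree can be small while MemAvailable stays large. A minimal sketch, assuming the Linux /proc formats; the sample values are invented, and parse_meminfo() and read_swappiness() are hypothetical helper names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}, values mostly in kB."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Current vm.swappiness value, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


# Invented sample: little free memory, but most of it is cache.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:          400000 kB
MemAvailable:   12000000 kB
Buffers:          300000 kB
Cached:         11000000 kB
"""

info = parse_meminfo(SAMPLE)
print(f"free: {info['MemFree']} kB, available: {info['MemAvailable']} kB")
print("vm.swappiness:", read_swappiness())
```

In the sample, only 400000 kB is free, yet about 12000000 kB is reported as available because the cache can be given back when needed.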

Traditional analysis tools

Traditional performance tools provide many capacity-based memory usage statistics, including how much virtual and physical memory is in use by each process and system-wide, with some breakdowns such as by process segment or slab. Analyzing memory usage beyond basics such as the page fault rate has required either built-in instrumentation for each allocation in the allocation library, runtime, or application, or a virtual machine analyzer such as Valgrind; the latter approach can cause the target application to run more than 10 times slower while instrumented. BPF tools are more efficient and have lower overhead.

Tool    Type                                                     Description
dmesg   Kernel log                                               OOM killer event details
swapon  Kernel statistics                                        Swap device usage
free    Kernel statistics                                        System-wide memory usage
ps      Kernel statistics                                        Process statistics, including memory usage
pmap    Kernel statistics                                        Process memory usage by segment
vmstat  Kernel statistics                                        Various statistics, including memory
sar     Kernel statistics                                        Can show page fault and page scanner rates
perf    Software events, hardware statistics, hardware sampling  Memory-related PMC statistics and event sampling

BPF tools related to memory analysis

Memory-related tools:

Tool       Source  Target    Description
oomkill    BCC/BT  OOM       Shows extra info on OOM kill events
memleak    BCC     Sched     Shows possible memory leak code paths
mmapsnoop  Book    Syscalls  Traces mmap(2) calls system-wide
brkstack   Book    Syscalls  Shows brk() calls with user stack traces
shmsnoop   BCC     Syscalls  Traces shared memory calls with details
faults     Book    Faults    Shows page faults, by user stack trace
ffaults    Book    Faults    Shows page faults, by filename
vmscan     Book    VM        Measures VM scanner shrink and reclaim times
drsnoop    BCC     VM        Traces direct reclaim events, showing latency
swapin     Book    VM        Shows swap-ins by process
hfaults    Book    Faults    Shows huge page faults, by process

In addition, there are several other BPF tools for memory analysis: kmem, kpages, slabratetop, and numamove.

oomkill

oomkill is a BCC and bpftrace tool for tracing out-of-memory killer events and printing details such as the load averages. The load averages provide some additional context for the state of the system at the time of the OOM, showing whether the system was getting busier or was steady.

In the example output, PID 18601 (perl) needed memory, which triggered an OOM kill of PID 1165 (java). PID 1165 had reached 18006224 pages in size; pages are usually 4 KB each, depending on the processor and process memory settings. The load averages show that the system was getting busier at the time of the OOM kill.

The tool works by using kprobes to trace the oom_kill_process() function and print various details. In this case, the load averages are fetched by simply reading /proc/loadavg. The tool can be enhanced to print other details as needed when debugging OOM events. In addition, it does not yet use the oom tracepoints, which can reveal more detail about how tasks are selected.
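To put the page count in more familiar units, the arithmetic can be written out. This assumes the typical 4 KB page size mentioned above, which may not hold on every system:

```python
PAGES = 18006224   # page count reported for PID 1165 (java)
PAGE_SIZE = 4096   # assumption: 4 KB pages

size_bytes = PAGES * PAGE_SIZE
print(f"{size_bytes} bytes = {size_bytes / 2**30:.1f} GiB")
# prints: 73753493504 bytes = 68.7 GiB
```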
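Reading /proc/loadavg is simple enough to sketch. The first three fields are the 1-, 5-, and 15-minute load averages; comparing them gives the "getting busier or steady" context described above. This is an illustration of the idea, not the tool's source: the function names, the sample line, and the trend rule are assumptions.

```python
def parse_loadavg(text):
    """Return the 1-, 5-, and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    one, five, fifteen = (float(x) for x in fields[:3])
    return one, five, fifteen


def trend(one, five, fifteen):
    """Crude reading of the three averages (a hypothetical rule of thumb)."""
    if one > five > fifteen:
        return "getting busier"
    if one < five < fifteen:
        return "getting quieter"
    return "steady or mixed"


# Invented sample line in /proc/loadavg format.
SAMPLE = "3.10 1.75 0.90 4/512 18601\n"
loads = parse_loadavg(SAMPLE)
print(loads, "->", trend(*loads))
```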

memleak

memleak is a BCC tool that traces memory allocation and free events along with the allocation stack traces. Over time, it can show the long-term survivors: the allocations that have not been freed.

This example shows memleak running on a bash shell process:

memleak alone cannot tell you whether these allocations are genuine memory leaks (allocated memory that has no references and will never be freed), memory growth, or just long-lived allocations. To tell them apart, the code paths need to be studied and understood.
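The bookkeeping behind this idea can be illustrated with a toy model: record each allocation by address, drop it when it is freed, and report what is still outstanding after some minimum age, grouped by allocation stack. This is not memleak's implementation; the AllocationTracker class, its methods, and the event values are hypothetical.

```python
from collections import defaultdict


class AllocationTracker:
    """Toy model of tracking outstanding allocations by address."""

    def __init__(self):
        self.live = {}  # address -> (size, stack, timestamp)

    def on_alloc(self, addr, size, stack, ts):
        self.live[addr] = (size, stack, ts)

    def on_free(self, addr):
        # Frees of addresses that were never seen are ignored.
        self.live.pop(addr, None)

    def survivors(self, now, min_age):
        """Return [(stack, total_bytes, count)] for old unfreed allocations."""
        totals = defaultdict(lambda: [0, 0])
        for size, stack, ts in self.live.values():
            if now - ts >= min_age:
                totals[stack][0] += size
                totals[stack][1] += 1
        rows = [(stack, b, c) for stack, (b, c) in totals.items()]
        return sorted(rows, key=lambda r: r[1], reverse=True)


tracker = AllocationTracker()
tracker.on_alloc(0x1000, 64, ("main", "parse"), ts=0)
tracker.on_alloc(0x2000, 128, ("main", "cache_fill"), ts=1)
tracker.on_alloc(0x3000, 128, ("main", "cache_fill"), ts=2)
tracker.on_free(0x1000)
tracker.on_alloc(0x4000, 32, ("main", "parse"), ts=9)

for stack, nbytes, count in tracker.survivors(now=10, min_age=5):
    print(f"{nbytes} bytes in {count} allocations from {' -> '.join(stack)}")
```

The minimum age filters out allocations that are simply too recent to have been freed yet. As noted above, whether the remaining survivors are leaks, growth, or long-lived allocations still has to be decided by reading the code.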
