2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article introduces Android Native memory analysis based on Rust: the pain points that motivated the work, the memory tower (MemTower) solution built on the memory-profiler project, and the capabilities that result.
1. Pain points and requirements of Android Native memory analysis
This section explains why we took on this work and what we expected to achieve.
1.1 Defects in existing tools
Android has mature performance analysis tools at the Java level, but no complete solution at the Native level. The main shortcomings are:
Existing tools do not support Android 4.x. Online statistics show that in-vehicle units running 4.x still account for a large share, so this problem cannot be ignored.
Android's native malloc_debug feature behaves differently across versions, and most Android systems have been customized by system vendors, so there is no guarantee that it is available.
Therefore, Android Native memory performance cannot be analyzed reliably with the system's built-in facilities alone.
Our team had made some progress in this area before, but the earlier scheme still had the following problems:
It hooked the entry and exit of Native functions by modifying compilation parameters, which caused a serious performance degradation.
Because the analysis was intrusive, investigating a memory problem required building a separate package, which greatly reduced efficiency. Troubleshooting a single memory leak was measured in days.
It lacked accurate memory usage data.
1.2 Creating a complete Native memory performance analysis scheme
Given the pain points above, we wanted a complete Native memory performance analysis scheme. The specific requirements are:
Support for most Android systems, including Android 4.x.
Non-intrusive analysis, so that discovering a memory problem and locating it precisely happen in the same run.
Excellent performance and low overhead.
Support for long-running memory leak stress tests. The R&D team and car manufacturer customers run stress tests on navigation, so the tool must survive long stress tests and still locate memory leaks.
Function-level memory usage data. The original scheme focused on memory leaks, and the usage data it obtained was not accurate enough. We wanted the new scheme to provide detailed memory usage data to support memory performance optimization.
2. Memory Tower (MemTower) solution
This section introduces the implementation of the memory-profiler project, the process of porting it to the Android platform as the memory tower (MemTower), and our improvements to the original scheme. It explains how we met the requirements above.
2.1 Choosing Rust & memory-profiler
To meet the requirements above, we looked for a new solution. I happened to be studying Rust at the time, and a keyword search on GitHub turned up the memory-profiler project (hereinafter mp), whose author is a former Nokia engineer. That project later became the basis of the memory tower. This section describes the principles and functions with which mp, together with Rust, implements memory profiling.
2.1.1 Hook implementation
Native memory performance analysis usually works by hooking memory calls such as malloc and free. mp follows the same principle: it uses LD_PRELOAD to preload a custom library that hooks the memory functions. The biggest problem with this approach is that it easily leads to circular malloc calls. As shown in the figure below, once the program's memory requests are hooked, the hook's own bookkeeping also requests memory, which re-enters the hook. The result is a circular call to malloc and a stack overflow crash.
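The circular-call hazard can be reproduced with Rust's own allocator interface. The sketch below is only an illustration and is not how mp solves the problem (mp gives the hook code a separate allocator). It swaps in a simpler technique, a thread-local re-entrancy flag, and the `record` function is a hypothetical stand-in for a profiler's bookkeeping.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

thread_local! {
    // True while this thread is running the hook's own bookkeeping.
    static IN_HOOK: Cell<bool> = const { Cell::new(false) };
}

/// Number of application allocations the hook has recorded.
pub static RECORDED: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical bookkeeping step. It allocates on purpose, the way a real
/// profiler allocates when it stores a backtrace.
fn record(size: usize) {
    let note = vec![size; 4]; // this allocation re-enters `alloc` below
    std::hint::black_box(&note);
    RECORDED.fetch_add(1, Ordering::Relaxed);
}

pub struct HookedAlloc;

unsafe impl GlobalAlloc for HookedAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { System.alloc(layout) };
        // Without the flag, `record` would call `alloc`, which would call
        // `record` again, and so on until the stack overflows.
        // `try_with` only fails during thread teardown; recording is skipped then.
        let _ = IN_HOOK.try_with(|flag| {
            if !flag.get() {
                flag.set(true);
                record(layout.size());
                flag.set(false);
            }
        });
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: HookedAlloc = HookedAlloc;
```

With the flag in place, allocations made inside `record` are served but not recorded, so only application requests are counted.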
mp takes advantage of Rust's customizable memory allocator (Allocator). It uses jemalloc, once Rust's default allocator, as its custom allocator, and in the C code of jemalloc-sys it replaces the final mmap memory request with a custom function entry (so that its own mmap calls can be told apart from the application's), which finally issues the mmap system call.
Once Rust's own memory requests go straight to the system call, the application's memory requests still need to be passed on to the system libc. mp uses a Rust feature switch to handle application memory requests in one of two ways, both implemented by specifying the link_name attribute in Rust:
Forward the application's memory requests directly to libc through the link_name __libc_malloc.
Specify jemallocator's function entry _rjem_malloc, so that the application and Rust share jemalloc.
As a result, the hook code can use the full Rust language without worrying about crashes from circular calls caused by Rust's own code.
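As a rough illustration of selecting the forwarding target with link_name and a feature switch, consider the sketch below. The feature name `share-jemalloc` and the wrapper names are hypothetical, and `_rjem_free` is assumed by analogy with `_rjem_malloc`. Plain `malloc`/`free` stand in for `__libc_malloc` so that the example links against any libc, not only glibc.

```rust
use std::os::raw::c_void;

extern "C" {
    // With the hypothetical `share-jemalloc` feature enabled, application
    // requests go to jemallocator's prefixed entry point, so the application
    // and the Rust code share one jemalloc. Otherwise they go to libc.
    // On glibc the libc variant could name `__libc_malloc` instead.
    #[cfg_attr(feature = "share-jemalloc", link_name = "_rjem_malloc")]
    #[cfg_attr(not(feature = "share-jemalloc"), link_name = "malloc")]
    fn real_malloc(size: usize) -> *mut c_void;

    #[cfg_attr(feature = "share-jemalloc", link_name = "_rjem_free")]
    #[cfg_attr(not(feature = "share-jemalloc"), link_name = "free")]
    fn real_free(ptr: *mut c_void);
}

/// Allocates `size` bytes through the forwarded entry point, touches the
/// memory, and frees it again. Returns true if the round trip worked.
pub fn forward_roundtrip(size: usize) -> bool {
    if size == 0 {
        return false;
    }
    unsafe {
        let p = real_malloc(size) as *mut u8;
        if p.is_null() {
            return false;
        }
        std::ptr::write_bytes(p, 0xAB, size);
        let ok = *p == 0xAB;
        real_free(p as *mut c_void);
        ok
    }
}
```

In a real hook library these declarations would sit behind exported `malloc` and `free` wrappers; that part is omitted here.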
2.1.2 High-performance stack unwinding
Besides using Rust's properties as a systems programming language to avoid circular memory calls, the author also used Rust's performance to implement several high-performance stack unwinding schemes:
Unwinding with the information provided by the .eh_frame section of ELF (the C++ exception handling mechanism).
Unwinding based on .ARM.exidx + .ARM.extab, the unwind tables defined by ARM.
The implementation can be found in the author's crate not-perf. Taking the second approach as an example, as shown in the figure, each thread keeps a stack frame cache in thread-local storage, filled from the unwind table information in the ELF files. On a cache miss, the unwind table of the corresponding binary is loaded into memory; on a hit, no file read is needed. A binary's address range normally does not change after it is loaded, so the cache is very effective. The drawback is that every thread holds a complete cache, which costs a lot of memory system-wide.
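The caching behaviour described above can be sketched in a few lines. This is not not-perf's code: `UnwindTable`, `lookup` and the loader callback are hypothetical names, and the table contents are a placeholder for real .ARM.exidx data.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// One loaded binary: its address range and a stand-in for its unwind table.
#[derive(Debug)]
pub struct UnwindTable {
    pub start: u64,
    pub end: u64,
    pub bytes: Vec<u8>,
}

thread_local! {
    // Every thread owns a complete cache, which is the memory cost noted above.
    static CACHE: RefCell<Vec<Rc<UnwindTable>>> = RefCell::new(Vec::new());
}

/// Finds the unwind table covering `addr`. On a miss, `load_from_file` is
/// called (standing in for reading the ELF file) and the result is cached.
/// Returns the table and whether the lookup was a cache hit.
pub fn lookup<F>(addr: u64, load_from_file: F) -> Option<(Rc<UnwindTable>, bool)>
where
    F: FnOnce(u64) -> Option<UnwindTable>,
{
    CACHE.with(|cache| {
        let mut cache = cache.borrow_mut();
        if let Some(t) = cache.iter().find(|t| t.start <= addr && addr < t.end) {
            return Some((Rc::clone(t), true));
        }
        let table = Rc::new(load_from_file(addr)?);
        cache.push(Rc::clone(&table));
        Some((table, false))
    })
}
```

A second lookup on the same thread for an address in the same binary is a hit, while the first lookup on any other thread is a miss again, because caches are not shared between threads.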
2.1.3 Powerful data analysis capabilities
mp's project page shows that, in addition to the memory profiler, there is a companion data analysis server built on the actix-web framework, with very powerful analysis functions. The main features are:
Time series curves of memory usage and leaks, which are very intuitive.
A powerful filter that can run queries, and render the corresponding memory flame graphs, by memory lifetime, function, time and other dimensions.
Every function is exposed through a RESTful API, so customization is easy.
Detailed usage instructions are not covered here.
2.2 Porting
With the basic principles of mp covered, this section focuses on the problems encountered while porting it to the Android platform.
2.2.1 Custom Allocator
mp's hook scheme has several problems on the Android platform, mainly:
jemalloc has itself been Android's allocator since Android 5.0. The jemalloc-sys bundled with mp therefore puts two copies of jemalloc in one app, which leads to a variety of abnormal crashes on different versions that are hard to troubleshoot.
__libc_malloc is an alias for the malloc entry point provided by glibc; the Android platform has no such implementation.
Therefore, we use the most basic method, dlsym, to obtain the entry points of the memory functions and wrap them in a Rust Allocator. The application's memory requests use the same function addresses. As shown in the following figure, all memory requests eventually reach libc, so the Rust hook code is transparent to libc.
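A minimal sketch of the dlsym approach follows. It is not MemTower's code. It assumes a dynamically linked Linux or 64-bit Android process, where the `RTLD_DEFAULT` handle is a null pointer, and `LibcAlloc` is a hypothetical name. A preloaded hook library would pass `RTLD_NEXT` instead, so that the lookup skips its own `malloc`.

```rust
use std::alloc::{GlobalAlloc, Layout};
use std::os::raw::{c_char, c_void};

extern "C" {
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
}

type MallocFn = unsafe extern "C" fn(usize) -> *mut c_void;
type FreeFn = unsafe extern "C" fn(*mut c_void);

// Assumption: RTLD_DEFAULT is a null handle (glibc, musl, 64-bit Android).
const RTLD_DEFAULT: *mut c_void = std::ptr::null_mut();

/// libc's memory functions, resolved at run time and wrapped as an allocator.
pub struct LibcAlloc {
    malloc: MallocFn,
    free: FreeFn,
}

impl LibcAlloc {
    pub fn resolve() -> Option<LibcAlloc> {
        unsafe {
            let m = dlsym(RTLD_DEFAULT, b"malloc\0".as_ptr() as *const c_char);
            let f = dlsym(RTLD_DEFAULT, b"free\0".as_ptr() as *const c_char);
            // Function pointers are never null, so a null result maps to `None`.
            let malloc: Option<MallocFn> = std::mem::transmute(m);
            let free: Option<FreeFn> = std::mem::transmute(f);
            Some(LibcAlloc { malloc: malloc?, free: free? })
        }
    }
}

unsafe impl GlobalAlloc for LibcAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // malloc only guarantees fundamental alignment, so this sketch
        // refuses over-aligned requests instead of handling them.
        if layout.align() > 2 * std::mem::size_of::<usize>() {
            return std::ptr::null_mut();
        }
        unsafe { (self.malloc)(layout.size()) as *mut u8 }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, _layout: Layout) {
        unsafe { (self.free)(ptr as *mut c_void) }
    }
}
```

Because both the hook code and the forwarded application requests would go through the same resolved addresses, libc sees a single stream of ordinary malloc and free calls.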
2.2.2 Stack unwinding
Stack unwinding also needed changes for the port. As mentioned above, the author provides an unwinding method based on the C++ exception handling mechanism, but it depends on the C++ library, which only became a default dependency after Android 8.0. Applications running on versions before 8.0 would therefore have to depend on the C++ library as well. So we removed this unwinding scheme and, with it, the dependency.
2.2.3 Address space reload
When the program starts or calls dlopen/dlclose, the linker loads (or unloads) ELF files and the program's address space changes accordingly. The address ranges in the stack unwinding cache may then be stale and need to be reloaded. A reload scans the entire address space for changes, which is very expensive, so a low-cost way of detecting address space changes is needed. mp implements two:
dl_iterate_phdr, an interface provided by libc. On Android, the structure this function uses differs across versions (API level below 21, that is, before 5.0, versus 5.0 and later releases). A single C structure layout defined in Rust therefore reads dirty data as the basis for reloading, which triggers reloads at a very high frequency.
The PERF_RECORD_MMAP2 event of perf, which requires a kernel version greater than 3.16, so it is not available on Android 4.x.
In practice, once the program has loaded all the ELF files it depends on, the address space rarely changes again. We therefore changed the logic to reload the address space only when a new ELF is loaded. Flame graph results show that this greatly reduces the computational cost of the hook.
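The reload policy can be pictured as a small bookkeeping structure. The sketch below is an assumption about its shape, not MemTower's implementation: `ReloadTracker` and `on_load` are hypothetical names, how the load notification is obtained is left out, and unloading through dlclose is ignored.

```rust
use std::collections::HashSet;

/// Tracks which ELF files have been seen and how many reloads were triggered.
#[derive(Default)]
pub struct ReloadTracker {
    seen: HashSet<String>,
    pub reloads: usize,
}

impl ReloadTracker {
    pub fn new() -> Self {
        Self::default()
    }

    /// Called when an ELF file is reported as loaded. Returns true only for
    /// a file that has not been seen before, which is the one case where the
    /// expensive address space rescan is worth running.
    pub fn on_load(&mut self, elf_path: &str) -> bool {
        let is_new = self.seen.insert(elf_path.to_owned());
        if is_new {
            // A real profiler would rescan the address space here.
            self.reloads += 1;
        }
        is_new
    }
}
```

Repeated notifications for an already known library then cost one hash lookup instead of a full scan.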
2.3 Improvements
At this point the memory tower works correctly on Android versions that support LD_PRELOAD (including 4.x). One of the requirements above is still unmet, though: long-running memory leak stress tests. We also wanted more dimensions of information during data analysis. This section therefore introduces our improvements to the memory tower.
2.3.1 Memory leak stress test
As its name suggests, mp was designed as a memory performance analysis tool that records complete memory information, and that determines the scale of its data. In several business scenarios tested for an hour, the sampling data generated ranged from 1 GB to 7 GB, depending on memory usage. Data volumes of this size do not meet the needs of the business.
We therefore added a memory leak detection mode (ONLY_LEAKED), which works as follows:
Each call stack recorded at allocation time is stored in a trie (dictionary tree), together with the size of the allocation.
When memory is freed, the corresponding trie node is updated. If the current leak reaches a threshold (for example 100 MB), sampling stops.
At the end of sampling, the unreleased memory records stored in the trie are written to a file.
The advantage of this mode is that the final data volume is small: the data file for an hour-long run is between 100 MB and 200 MB, and less than 100 MB after compression by mp's built-in postprocess subcommand. The drawback is that the memory tower has to cache the complete stack history in memory; its memory growth levels off once no new stack frames are being recorded.
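The three steps of the ONLY_LEAKED mode can be sketched as follows. The data layout is an assumption made for illustration (`LeakTrie`, `on_alloc`, `on_free` and `leaks` are hypothetical names), with frames represented as plain addresses.

```rust
use std::collections::HashMap;

#[derive(Default)]
struct Node {
    children: HashMap<u64, usize>, // frame address -> index of child node
    live_bytes: usize,             // unreleased bytes allocated at this stack
}

pub struct LeakTrie {
    nodes: Vec<Node>,                   // index 0 is the root
    live: HashMap<u64, (usize, usize)>, // pointer -> (node index, size)
    total_live: usize,
    threshold: usize,
}

impl LeakTrie {
    pub fn new(threshold: usize) -> Self {
        LeakTrie { nodes: vec![Node::default()], live: HashMap::new(), total_live: 0, threshold }
    }

    /// Records an allocation under its call stack (outermost frame first).
    /// Returns false once the leak threshold is reached: sampling should stop.
    pub fn on_alloc(&mut self, ptr: u64, size: usize, stack: &[u64]) -> bool {
        let mut cur = 0;
        for &frame in stack {
            cur = match self.nodes[cur].children.get(&frame).copied() {
                Some(next) => next,
                None => {
                    self.nodes.push(Node::default());
                    let next = self.nodes.len() - 1;
                    self.nodes[cur].children.insert(frame, next);
                    next
                }
            };
        }
        self.nodes[cur].live_bytes += size;
        self.live.insert(ptr, (cur, size));
        self.total_live += size;
        self.total_live < self.threshold
    }

    /// Updates the trie node that owns `ptr` when the memory is freed.
    pub fn on_free(&mut self, ptr: u64) {
        if let Some((node, size)) = self.live.remove(&ptr) {
            self.nodes[node].live_bytes -= size;
            self.total_live -= size;
        }
    }

    /// Stacks that still hold unreleased memory, with their byte counts.
    pub fn leaks(&self) -> Vec<(Vec<u64>, usize)> {
        let mut out = Vec::new();
        let mut work = vec![(0usize, Vec::new())];
        while let Some((idx, path)) = work.pop() {
            let node = &self.nodes[idx];
            if node.live_bytes > 0 {
                out.push((path.clone(), node.live_bytes));
            }
            for (&frame, &child) in &node.children {
                let mut p = path.clone();
                p.push(frame);
                work.push((child, p));
            }
        }
        out.sort();
        out
    }
}
```

Shared stack prefixes are stored once, and only stacks with live bytes are written out at the end, which is consistent with the small output size. The trie itself never shrinks, matching the drawback noted above.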
2.3.2 Enhanced analysis filter
Navigation is divided into many business modules and threads, so filter options by thread and by library (matched with a regular expression) were added.
2.3.3 Improved memory flame graphs
The original mp flame graph uses allocated memory size (allocated) as its dimension. The number of allocations (allocations) is also an important metric when analyzing memory performance, so we added a flame graph by allocation count. This was our earliest improvement, and because the flame graph looked like a tower, the project was renamed MemTower.
Finally, the original flame graph is not split by thread; distinguishing stack information by thread makes it more intuitive.
Figure: flame graph by number of allocations
Figure: flame graph by allocated size
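The two dimensions and the per-thread split can be illustrated by aggregating allocation records into the folded-stack text format that common flame graph tools accept (`frame;frame;frame value`). This is a sketch, not MemTower's renderer; `Alloc`, `Metric` and `fold` are hypothetical names, and the thread name is used as the root frame.

```rust
use std::collections::BTreeMap;

/// One sampled allocation: owning thread, call stack (outermost first), size.
pub struct Alloc<'a> {
    pub thread: &'a str,
    pub stack: &'a [&'a str],
    pub size: u64,
}

#[derive(Clone, Copy)]
pub enum Metric {
    Bytes, // flame width = allocated size
    Count, // flame width = number of allocations
}

/// Aggregates allocations into folded-stack lines, sorted by stack.
pub fn fold(allocs: &[Alloc], metric: Metric) -> Vec<String> {
    let mut totals: BTreeMap<String, u64> = BTreeMap::new();
    for a in allocs {
        // The thread name is the root frame, so each thread gets its own tower.
        let mut key = a.thread.to_string();
        for f in a.stack {
            key.push(';');
            key.push_str(f);
        }
        let weight = match metric {
            Metric::Bytes => a.size,
            Metric::Count => 1,
        };
        *totals.entry(key).or_insert(0) += weight;
    }
    totals.into_iter().map(|(k, v)| format!("{} {}", k, v)).collect()
}
```

A stack that makes many small allocations is narrow in the size view and wide in the count view, which is why the two graphs complement each other.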
3. Capabilities of the memory tower and further possibilities
This last section describes the capabilities and benefits of the memory tower and what else it makes possible.
3.1 Capabilities
On Android versions below 8.0, the memory tower (MemTower) depends on setprop wrap.com.xxx.xxx and root permission. On 8.0 and above, without root permission, the memory tower library can also be loaded by configuring wrap.sh in the Android project. In addition, because mp supports Linux natively, we have also successfully adapted it to embedded Linux projects such as Mercedes-Benz Daimler.
Supported platforms: Android 4.x, 5.1.1, and 7 or higher (a bug on 5.0 and 6 prevents setprop from being set); Linux x86_64, AArch64 and Arm.
Sampling method: non-intrusive. Intrusive mode is optional for non-root devices.
Sampling modes: general performance analysis mode and memory leak stress test mode.
Features: high-performance stack unwinding and a complete memory analysis experience (multi-dimensional filtering, memory flame graphs, and so on).
Previously, after a memory leak was discovered, we had to rebuild the package, run a second stress test, and then infer the likely leak point; the whole process was measured in days. With the memory tower (MemTower), detailed data can be analyzed within minutes of a single test, which greatly reduces the cost of memory performance analysis. The hook approach and high-performance stack unwinding provided by mp are not limited to memory analysis; they could also be applied to IO performance analysis and other problems.