Systematic Solution for Android Native Memory Leak
Memory leaks are among the most critical issues for any Android app, and identifying and analyzing C++ memory leaks on the Android platform can be daunting for developers. The AMAP app contains a large volume of C++ code to meet the high-performance requirements of map rendering, navigation, and other core functions. To address these C++ memory leak challenges, AMAP’s technical team has developed a solution based on practical experience to improve product quality.
Allocation function statistics and stack backtrace are the core means of identifying and analyzing C++ memory leaks. To resolve a C++ memory leak, you must determine both the memory allocation point and the call stack. Otherwise, the problem becomes harder to analyze and more costly to resolve.
In Android, the malloc_debug module of Bionic comprehensively monitors the memory allocation functions and collects statistics on them, but its stack backtrace is inefficient. As Android has evolved, Google has provided several methods for analyzing stack backtraces, but these methods face the following obstacles:
- Stack backtrace relies on Libunwind throughout, which consumes considerable resources. Monitoring the call stack of every memory operation function requires frequent calls to Libunwind, and when a large amount of native code exists, these calls may render the application unresponsive.
- Restrictions tied to the device ROM (the installed system image) make routine development and testing inconvenient.
- An environment must be prepared for each troubleshooting session, and the troubleshooting itself must be performed manually through the CLI or DDMS. This process does not support comparative analysis, and its final results are not intuitive.
Therefore, it is essential to implement efficient stack backtrace and build a systematic Android native memory analysis system. The improved and extended capabilities that AMAP built on this basis help to promptly identify and solve C++ memory leaks through automatic testing, which greatly improves development efficiency and reduces troubleshooting costs.
Stack Backtrace Acceleration
The Android platform uses Libunwind to perform stack backtrace, which meets the needs of most scenarios. However, global locks and unwind table parsing in the Libunwind implementation cause performance loss, and under frequent multithreaded calls they can render the application unresponsive.
Compilers such as GCC and Clang provide the -finstrument-functions compilation option, which inserts calls to user-defined functions (UDFs) at the beginning and end of each function.
Specifically, it inserts a call to __cyg_profile_func_enter at the beginning of each function and a call to __cyg_profile_func_exit at the end. Both UDFs receive the address of the function and the address of its call site, which can be recorded to reconstruct the function call stack.
Sample results of instrumentation are illustrated in the image below:
Functions that should not be instrumented can be marked with __attribute__((no_instrument_function)).
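To make the effect of the option concrete, the sketch below writes out by hand roughly what the compiler generates for a small function. The two hook signatures are the ones the compiler expects. The counters g_depth and g_max_depth and the function square_as_instrumented are hypothetical names used only for illustration. In a real build, the enter and exit calls are inserted automatically.

```cpp
#include <cassert>

// Hypothetical counters for this illustration only (not thread-safe).
static int g_depth = 0;      // current nesting depth of instrumented calls
static int g_max_depth = 0;  // deepest nesting observed so far

extern "C" {
// The hooks themselves must not be instrumented; otherwise they would
// call themselves recursively when -finstrument-functions is enabled.
__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void* this_fn, void* call_site) {
    (void)this_fn;
    (void)call_site;
    ++g_depth;
    if (g_depth > g_max_depth) g_max_depth = g_depth;
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void* this_fn, void* call_site) {
    (void)this_fn;
    (void)call_site;
    assert(g_depth > 0);  // every exit should match an earlier enter
    --g_depth;
}
}  // extern "C"

// Hand-written equivalent of what the compiler emits for a function
// `int square(int x) { return x * x; }` under -finstrument-functions.
__attribute__((no_instrument_function))
int square_as_instrumented(int x) {
    void* self = reinterpret_cast<void*>(&square_as_instrumented);
    void* caller = __builtin_return_address(0);  // GCC/Clang builtin
    __cyg_profile_func_enter(self, caller);
    int result = x * x;
    __cyg_profile_func_exit(self, caller);
    return result;
}
```

Calling square_as_instrumented(7) returns 49, leaves g_depth at the value it had before the call, and raises g_max_depth by at least one level.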
The recorded call information must be readable and writable by different threads without the threads interfering with one another. One way to achieve this is thread synchronization. For example, you can add critical sections or mutexes wherever the involved variable is read or written, but this method lowers efficiency.
Another method is to use thread-local storage (TLS), a dedicated storage area that only its owning thread can access. TLS eliminates the thread safety issue without locks and meets the requirement of call information recording. Therefore, to accelerate stack backtrace, you can record the call stack through compiler instrumentation and store it in TLS. The steps are as follows:
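Before the steps, the difference between the two recording methods can be sketched as follows. This is a minimal illustration that uses the C++ thread_local keyword rather than the author's code. All names (record_call_locked, record_call_tls, calls_on_this_thread, run_demo) are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Method 1: one shared record, so every update must take a lock.
static std::mutex g_lock;
static long g_shared_calls = 0;

void record_call_locked() {
    std::lock_guard<std::mutex> guard(g_lock);
    ++g_shared_calls;
}

// Method 2: each thread owns a private record, so no lock is needed.
static thread_local long t_calls = 0;

void record_call_tls() { ++t_calls; }
long calls_on_this_thread() { return t_calls; }

// Runs both recorders on several threads. Returns true when every
// thread saw exactly its own calls in its thread-local record.
bool run_demo(int threads, int calls_per_thread) {
    assert(threads > 0 && calls_per_thread >= 0);
    std::vector<long> per_thread(static_cast<std::size_t>(threads), 0);
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i) {
        pool.emplace_back([&per_thread, i, calls_per_thread] {
            for (int n = 0; n < calls_per_thread; ++n) {
                record_call_locked();
                record_call_tls();
            }
            // Each thread writes only its own slot.
            per_thread[static_cast<std::size_t>(i)] = calls_on_this_thread();
        });
    }
    for (auto& t : pool) t.join();
    for (long seen : per_thread) {
        if (seen != calls_per_thread) return false;
    }
    return true;
}
```

With run_demo(4, 1000), the shared counter needs the mutex to reach 4000 reliably, whereas each thread-local counter reaches 1000 without any synchronization, and the calling thread's own counter is unaffected.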
- During the compilation process, insert the related code by using the -finstrument-functions compilation option.
- Record call addresses in TLS by using an array and a cursor so that addresses can be inserted, removed, and read as quickly as possible.
- Define the data structures of arrays and cursors
- Initialize the storage key of thread_stack_t in TLS
- Initialize thread_stack_t and put it in TLS
- Implement __cyg_profile_func_enter and __cyg_profile_func_exit, and record the call address in TLS. The second parameter of __cyg_profile_func_enter, call_site, is the code segment address of the call point. Upon entry to a function, the call address is recorded in the array allocated in TLS, and the cursor ptr->current moves to the left. Upon exit from the function, the cursor moves to the right. The recording direction is opposite to the array growth direction so that the external interface for stack information acquisition can be more concise and efficient: you can directly copy the memory to obtain a call stack that starts with the address of the nearest call point and ends with the address of the farthest call point.
Refer to the logical diagram below:
- Provide an interface for stack information acquisition.
- Compile the preceding logic into a dynamic library that the other service modules depend on. Add -finstrument-functions to the compilation flags so that calls to all functions are recorded in TLS. You can then acquire the call stack by calling get_tls_backtrace(void** backtrace, int max) from anywhere.
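The steps above can be sketched as follows. This is a minimal sketch rather than AMAP's actual code: the names thread_stack_t, ptr->current, and get_tls_backtrace come from the text, while the field layout, the capacity kMaxDepth, the int return value, and the helper names (NO_INSTRUMENT, g_stack_key, g_key_once, create_key, get_thread_stack) are assumptions.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdlib>
#include <cstring>

#define NO_INSTRUMENT __attribute__((no_instrument_function))

static const int kMaxDepth = 256;  // assumed capacity, not from the source

// Array plus cursor, one instance per thread.
struct thread_stack_t {
    int   current;            // index of the nearest call site; kMaxDepth when empty
    void* frames[kMaxDepth];  // filled from the end toward index 0
};

static pthread_key_t  g_stack_key;
static pthread_once_t g_key_once = PTHREAD_ONCE_INIT;

// Create the TLS key once; free() releases the stack when a thread exits.
NO_INSTRUMENT static void create_key() { pthread_key_create(&g_stack_key, free); }

// Return this thread's stack, allocating it and storing it in TLS on first use.
NO_INSTRUMENT static thread_stack_t* get_thread_stack() {
    pthread_once(&g_key_once, create_key);
    thread_stack_t* ptr = static_cast<thread_stack_t*>(pthread_getspecific(g_stack_key));
    if (ptr == nullptr) {
        ptr = static_cast<thread_stack_t*>(calloc(1, sizeof(thread_stack_t)));
        if (ptr == nullptr) return nullptr;
        ptr->current = kMaxDepth;
        pthread_setspecific(g_stack_key, ptr);
    }
    return ptr;
}

extern "C" {
NO_INSTRUMENT void __cyg_profile_func_enter(void* this_fn, void* call_site) {
    (void)this_fn;
    thread_stack_t* ptr = get_thread_stack();
    if (ptr == nullptr) return;
    --ptr->current;  // the cursor moves left on entry
    // Calls deeper than the capacity are counted but not recorded.
    if (ptr->current >= 0) ptr->frames[ptr->current] = call_site;
}

NO_INSTRUMENT void __cyg_profile_func_exit(void* this_fn, void* call_site) {
    (void)this_fn;
    (void)call_site;
    thread_stack_t* ptr = get_thread_stack();
    if (ptr == nullptr) return;
    if (ptr->current < kMaxDepth) ++ptr->current;  // the cursor moves right on exit
}

// Copy up to `max` call sites, nearest first. Returns the number copied.
NO_INSTRUMENT int get_tls_backtrace(void** backtrace, int max) {
    thread_stack_t* ptr = get_thread_stack();
    if (ptr == nullptr || backtrace == nullptr || max <= 0) return 0;
    int first = ptr->current < 0 ? 0 : ptr->current;
    int count = kMaxDepth - first;
    assert(count >= 0);
    if (count > max) count = max;
    memcpy(backtrace, &ptr->frames[first], static_cast<size_t>(count) * sizeof(void*));
    return count;
}
}  // extern "C"
```

Because the array fills from the end toward index 0, a single memcpy starting at the cursor yields the stack ordered from the nearest call point to the farthest. A production version would also need to guard against re-entrancy once the allocator itself is hooked, because the first call on a thread allocates memory.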
The following charts compare performance, measured with Google Benchmark on a Huawei Enjoy 5s running Android 5.1:
- Single-threaded acquisition through Libunwind
- Single-threaded acquisition through TLS
- Libunwind with 10 threads
- TLS method with 10 threads
As the preceding charts show, in single-thread mode the TLS method acquires the call stack 10 times faster than Libunwind. In 10-thread mode, the TLS method is 50 to 60 times faster than Libunwind.
Advantages and Disadvantages
The primary advantage of this solution is the greatly improved acquisition speed, which meets the requirement of frequent stack backtrace.
The main limitation is that compiler instrumentation increases code size. The instrumented build is therefore suitable only for memory test packages, not for online releases. This problem can be solved through continuous integration with a common library: a corresponding memory test library is built from each C++ project upon delivery.
After accelerating the acquisition of the memory allocation stack, you can troubleshoot native memory leaks by using Google-provided tools, such as DDMS and the adb shell am dumpheap -n pid /data/local/tmp/heap.txt command. However, such troubleshooting is inefficient and must be performed in a specially prepared mobile phone environment.
To improve the efficiency of troubleshooting native memory leaks, we built a comprehensive system as described below:
- Use the malloc_debug module of LIBC for memory monitoring. Enabling memory monitoring through the official method is inconvenient and not conducive to automatic testing. Instead, you can compile code into your project that hooks all memory functions and jumps to the monitoring function leak_xxx of malloc_debug. This approach ensures that malloc_debug monitors all memory allocation and release operations and collects statistics accordingly.
- Use get_tls_backtrace to implement LIBC_HIDDEN int32_t get_backtrace_external(uintptr_t* frames, size_t max_depth), which the malloc_debug module calls, so that monitoring benefits from the stack backtrace acceleration.
- Establish socket communication so that external programs can exchange data with the app and conveniently acquire memory data.
- Build a web client to acquire, parse, and display the uploaded memory data. Addresses must be translated back to function names and source locations by using addr2line.
- Write test cases to be executed in combination with automatic testing. When testing begins, use the socket to collect memory information and store it. When testing ends, upload the memory information to the platform for parsing and send an evaluation email. When an alert is triggered, you can troubleshoot on the web client based on the memory curve and call stack information.
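The statistic that the monitoring step collects can be illustrated with a small stand-in. The sketch below does not use malloc_debug or its leak_xxx functions. It assumes hypothetical wrappers tracked_malloc and tracked_free that record each live allocation together with one call-site address, where the real system records a full call stack obtained through get_tls_backtrace.

```cpp
#include <cstddef>
#include <cstdlib>
#include <map>
#include <mutex>

// One record per live allocation (hypothetical layout).
struct AllocationRecord {
    std::size_t size;       // bytes requested
    void*       call_site;  // address the allocation was requested from
};

static std::mutex g_table_lock;
static std::map<void*, AllocationRecord> g_live;  // pointer -> record

// Allocate memory and remember who asked for it.
void* tracked_malloc(std::size_t size) {
    void* p = malloc(size);
    if (p != nullptr) {
        std::lock_guard<std::mutex> guard(g_table_lock);
        g_live[p] = AllocationRecord{size, __builtin_return_address(0)};
    }
    return p;
}

// Release memory and drop its record.
void tracked_free(void* p) {
    if (p == nullptr) return;
    {
        std::lock_guard<std::mutex> guard(g_table_lock);
        g_live.erase(p);
    }
    free(p);
}

// Total bytes that were allocated but not yet released.
std::size_t outstanding_bytes() {
    std::lock_guard<std::mutex> guard(g_table_lock);
    std::size_t total = 0;
    for (const auto& entry : g_live) total += entry.second.size;
    return total;
}

// Outstanding bytes grouped by the call site that allocated them.
// A call site whose total keeps growing across test runs is a leak suspect.
std::map<void*, std::size_t> outstanding_by_call_site() {
    std::lock_guard<std::mutex> guard(g_table_lock);
    std::map<void*, std::size_t> totals;
    for (const auto& entry : g_live) {
        totals[entry.second.call_site] += entry.second.size;
    }
    return totals;
}
```

Sampling outstanding_bytes at intervals gives the kind of memory curve described above, and the per-call-site totals are the addresses that would then be passed to addr2line.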