What are the knowledge points of Android GC?

2025-04-13 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31 Report

This article introduces the main knowledge points of Android GC. The content is detailed but easy to follow and should serve as a useful reference. Let's take a look.

1. JVM memory recovery mechanism

1.1. Recovery algorithm

Mark-and-sweep algorithm (Mark and Sweep GC)

Starting from the set of "GC Roots", the collector traverses memory once and retains every object that is directly or indirectly reachable from the GC Roots; all remaining objects are treated as garbage and reclaimed. This algorithm has to pause the other components of the process while it runs, and it may leave memory fragmented.
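The traversal described above can be sketched in a few lines. This is only a toy model, not the collector's real code: the heap is assumed to be a dictionary that maps a hypothetical object id to the ids it references, and `mark_and_sweep` is an illustrative name.

```python
def mark_and_sweep(heap, roots):
    """heap maps object id -> list of referenced ids. Returns (live, garbage)."""
    marked = set()
    stack = list(roots)
    # Mark: visit everything reachable from the GC roots.
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj])
    # Sweep: whatever was never marked is garbage and is freed in place.
    garbage = set(heap) - marked
    for obj in garbage:
        del heap[obj]
    return marked, garbage


# "c" and "d" reference each other but are unreachable from the roots.
heap = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"], "e": []}
live, garbage = mark_and_sweep(heap, roots=["a", "e"])
```

Note that the surviving objects stay where they are, which is why the freed slots can end up scattered between them.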

Copying algorithm (Copying)

Divide the available memory into two blocks and use only one of them at a time. During garbage collection, copy the live objects from the block in use into the unused block, then clear everything in the block that was in use. Finally, swap the roles of the two blocks to complete the collection.

Mark-compact algorithm (Mark-Compact)

First, all reachable objects are marked starting from the root nodes. Then, instead of simply sweeping the unmarked objects, the collector moves all live objects toward one end of memory and reclaims everything beyond the resulting boundary. This avoids fragmentation and does not need a second memory space of the same size, so it is relatively cost-effective.

Generational collection

All newly created objects are placed in a memory area called the young generation. Objects there tend to die quickly, so the more efficient copying algorithm is used for the young generation. When an object survives several collections, it is moved into a memory area called the old generation, where the mark-compact algorithm is used.

1.2. The difference between the copying and mark-compact algorithms

At first glance there does not seem to be much difference between the two algorithms: both find the live objects and then move them to another memory address. So why do different generations use different algorithms?

In fact, the biggest difference between the two is that the former trades space for time and the latter trades time for space.

The former has no separate "Mark" and "Copy" phases; it does both in a single action called Scavenge (or Evacuate, or Copy). That is, every time it finds a live object that has not yet been visited in this collection, it copies the object straight to the new location and sets up a Forwarding Pointer at the same time. This requires an extra space to copy into.
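A minimal sketch of such a scavenge, assuming a toy from-space that maps hypothetical addresses to lists of referenced addresses. The function name `scavenge` and the data layout are illustrative only; the loop follows the general Cheney-style idea of scanning the copied objects to fix up their references.

```python
def scavenge(from_space, roots):
    """from_space maps address -> list of referenced addresses.
    Returns (to_space, forwarding, new_roots)."""
    to_space = {}     # the extra space that live objects are copied into
    forwarding = {}   # old address -> new address (the forwarding pointers)

    def evacuate(addr):
        # Copy an object the first time it is reached; afterwards just
        # follow its forwarding pointer.
        if addr not in forwarding:
            forwarding[addr] = len(to_space)
            to_space[forwarding[addr]] = list(from_space[addr])
        return forwarding[addr]

    new_roots = [evacuate(r) for r in roots]
    scan = 0
    # Scan the copied objects and redirect their references; this discovers
    # and copies further live objects as it goes.
    while scan < len(to_space):
        to_space[scan] = [evacuate(ref) for ref in to_space[scan]]
        scan += 1
    return to_space, forwarding, new_roots


from_space = {10: [30], 20: [10], 30: [10], 40: []}
to_space, forwarding, new_roots = scavenge(from_space, roots=[10])
```

Marking and copying happen in the same pass, and only live objects are ever touched, which is where the speed comes from; the price is the second space.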

The latter needs separate Mark and Compact phases. The Mark phase finds and marks all live objects, and the Compact phase then moves them. If the compaction is Sliding Compaction, the objects can be "slid" in order toward one side of the space after the Mark phase. Because the whole object graph has already been traversed and all live objects are known, the objects can be moved within the same space, with no need for an extra one.
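The sliding step can be illustrated as follows. The sketch assumes the Mark phase has already produced a set of live objects, and it models the heap as a plain list of slots in address order; both the representation and the name `sliding_compact` are assumptions made for illustration.

```python
def sliding_compact(heap, marked):
    """heap: list of slots in address order (an object name, or None if free).
    marked: set of live object names produced by the Mark phase.
    Slides live objects toward index 0 and returns the new boundary."""
    free = 0
    for obj in heap:
        if obj is not None and obj in marked:
            # free never runs ahead of the slot being read, so moving
            # objects inside the same list is safe.
            heap[free] = obj
            free += 1
    # Everything beyond the boundary is reclaimed.
    for addr in range(free, len(heap)):
        heap[addr] = None
    return free


heap = ["a", None, "b", "c", None, "d"]
boundary = sliding_compact(heap, marked={"a", "c", "d"})
```

No second space is needed, but the heap is walked again after marking, which is the extra time the text refers to.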

Therefore, collection of the young generation is faster and collection of the old generation takes longer, and the application is paused during the compaction phase. We should therefore try to keep objects from ending up in the old generation.

2. Dalvik virtual machine

2.1. Java heap

The Java heap actually consists of an Active heap and a Zygote heap. The Zygote heap manages the objects that the Zygote process preloads and creates during startup, while the Active heap is created just before the Zygote process forks its first child process. All subsequent application processes are forked from the Zygote process, and each has its own Dalvik virtual machine. When an application is created, the Dalvik virtual machine uses a COW (copy-on-write) policy to duplicate the address space of the Zygote process.

COW policy: at the beginning (before the address space of the Zygote process has been copied), the application process and the Zygote process share the same heap for allocating objects. Only when the Zygote process or the application process writes to that heap does the kernel perform the real copy, so that each process ends up with its own copy. This is what COW means. Because copying is time-consuming, copies should be avoided or kept to a minimum.

To achieve this, when the first application process is created, the heap memory that is already in use is put into one part and the heap memory that is not yet in use into another. The former is called the Zygote heap and the latter the Active heap. Only the contents of the Zygote heap then need to be copied to the application process. From then on, both the Zygote process and the application processes allocate objects on the Active heap. This keeps writes to the Zygote heap to a minimum and so reduces the amount of copy-on-write. The objects allocated in the Zygote heap are mainly the classes, resources, and objects that the Zygote process preloads during startup, which means they can be shared between the Zygote process and the application processes for a long time. This reduces both copy operations and memory requirements.
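The effect of copy-on-write on the two heaps can be modeled with a toy page table. Everything here is an assumption for illustration: the `Page` and `Process` classes, the reference count, and the choice of three pages are hypothetical and are not how the kernel implements it.

```python
class Page:
    def __init__(self, data):
        self.data = list(data)
        self.refs = 1          # how many processes map this page


class Process:
    def __init__(self, pages):
        self.pages = pages     # page table: a list of Page references
        self.copies = 0        # pages this process had to copy

    def fork(self):
        # Share every page with the child; nothing is copied yet.
        for page in self.pages:
            page.refs += 1
        return Process(list(self.pages))

    def write(self, page_no, offset, value):
        page = self.pages[page_no]
        if page.refs > 1:
            # First write to a shared page: make a private copy.
            page.refs -= 1
            page = Page(page.data)
            self.pages[page_no] = page
            self.copies += 1
        page.data[offset] = value


# Pages 0 and 1 play the role of the Zygote heap, page 2 the Active heap.
zygote = Process([Page([0] * 4) for _ in range(3)])
app = zygote.fork()
app.write(2, 0, 7)   # first write to a shared page triggers one copy
app.write(2, 1, 8)   # the page is already private, no further copy
```

As long as allocations only touch the Active heap page, the Zygote heap pages stay shared between both processes.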

2.2. Some GC-related metrics

When we were optimizing GC stutter on a Meizu phone, we found that GC_FOR_MALLOC was triggered very easily. As described later, this kind of GC is caused by insufficient memory when allocating objects. But we had set a large heap size, so why was there not enough memory? To answer that, we need to understand the following concepts: the starting size of the Java heap (Starting Size), the maximum size (Maximum Size), and the growth limit (Growth Limit).

When starting the Dalvik virtual machine, these three values can be specified with the options -Xms, -Xmx, and -XX:HeapGrowthLimit, respectively. They mean the following:

Starting Size: the initial amount of heap memory allocated to the virtual machine when the Dalvik virtual machine starts.

Growth Limit: the maximum heap size the system allows each program. If the program goes beyond this limit, it hits OOM.

Maximum Size: the maximum heap size when the growth limit is not enforced, that is, the largest heap we can get from the system when we use the largeHeap attribute.

Besides these three values, several other metrics deserve attention: the heap minimum free value (Min Free), the heap maximum free value (Max Free), and the heap target utilization (Target Utilization). Suppose that after a GC the surviving objects occupy LiveSize and the target utilization is U. The ideal heap size is then (LiveSize / U), but it must be no smaller than (LiveSize + MinFree) and no larger than (LiveSize + MaxFree). After each GC, the garbage collector tries to bring the heap utilization closer to the target utilization. So when we manually create some objects of a few hundred KB in an attempt to expand the available heap, we only cause frequent GC: allocating these objects triggers GC, after GC the heap returns to the appropriate proportion, the local variables we used are soon reclaimed so the amount of live data is in theory unchanged, and the heap shrinks back without achieving the expansion. This is also one of the factors behind GC_CONCURRENT, which is discussed in detail later.
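The sizing rule is easy to check with numbers. The sketch below simply clamps (LiveSize / U) between the two bounds; the concrete settings (U = 0.75, MinFree = 512 KB, MaxFree = 8192 KB) are assumed example values, not figures taken from the text, and `target_heap_size` is an illustrative name.

```python
def target_heap_size(live_size, utilization, min_free, max_free):
    """Ideal heap size after a GC, clamped to
    [live_size + min_free, live_size + max_free]. All sizes in KB."""
    ideal = live_size / utilization
    lower = live_size + min_free
    upper = live_size + max_free
    return int(min(max(ideal, lower), upper))


# Assumed example settings (KB).
U, MIN_FREE, MAX_FREE = 0.75, 512, 8192

normal = target_heap_size(3000, U, MIN_FREE, MAX_FREE)      # 3000 / 0.75 fits the range
small = target_heap_size(600, U, MIN_FREE, MAX_FREE)        # raised to live + MinFree
large = target_heap_size(60000, U, MIN_FREE, MAX_FREE)      # capped at live + MaxFree
```

With these assumed settings the three cases give 4000 KB, 1112 KB, and 68192 KB. The target depends only on the live size, so temporarily allocating a few hundred KB of short-lived objects does not move it once they have been collected.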

2.3. Types of GC

GC_FOR_MALLOC: a GC triggered by insufficient memory when allocating an object on the heap.

GC_CONCURRENT: when the application's heap usage reaches a certain level, in other words when the heap is almost full, the system automatically triggers a GC to free memory.

GC_EXPLICIT: a GC triggered when the application calls System.gc or VMRuntime.gc, or receives a SIGUSR1 signal.

GC_BEFORE_OOM: a last-ditch GC triggered just before an OOM exception is thrown.

In fact, the three types GC_FOR_MALLOC, GC_CONCURRENT, and GC_BEFORE_OOM are all triggered while allocating objects. The main difference between concurrent and non-concurrent GC is that the former suspends and wakes non-GC threads only at certain points during the GC, while the latter keeps non-GC threads suspended for the whole GC. By suspending and waking non-GC threads conditionally, concurrent GC makes applications more responsive. However, concurrent GC needs one more pass to mark the root set and to recursively mark the objects that were accessed while the GC was running, so it also costs more CPU. The difference between concurrent and non-concurrent GC in ART is covered later.

2.4. Object allocation and when GC is triggered

Call the function dvmHeapSourceAlloc to allocate the requested amount of memory on the Java heap. If the allocation succeeds, the allocated address is returned directly to the caller. dvmHeapSourceAlloc allocates memory without changing the current size of the Java heap, so it is a lightweight allocation.

If the allocation in the previous step fails, a GC is needed. If a GC thread is already running, that is, gDvm.gcHeap->gcRunning is true, simply call dvmWaitForConcurrentGcToComplete and wait for that GC to finish. Otherwise, call the function gcForMalloc to run a GC; the argument false means that objects referenced only by soft references are not reclaimed.

When the GC has finished, call dvmHeapSourceAlloc again to retry the lightweight allocation. If the allocation succeeds, the allocated address is returned directly to the caller.

If the allocation in the previous step fails, raise the current size of the Java heap to the maximum specified when the Dalvik virtual machine started, and then allocate. This is done by calling the function dvmHeapSourceAllocAndGrow.

If dvmHeapSourceAllocAndGrow succeeds, the allocated address is returned directly to the caller.

If the allocation in the previous step still fails, a more drastic step is required: call gcForMalloc again to run a GC, this time with the argument true, which means that objects referenced only by soft references are reclaimed as well.

After that GC, call dvmHeapSourceAllocAndGrow once more. This is the final attempt; whether it succeeds or fails, the process ends here.

[Figure: flow of object allocation and GC triggering]

From this process we can see that allocating objects can cause GC. The first time an allocation fails, a GC is triggered that does not reclaim soft references. If the allocation still fails after that, the memory held through soft references is reclaimed as well. The former is a GC of type GC_FOR_MALLOC and the latter a GC of type GC_BEFORE_OOM. When an allocation succeeds, the virtual machine checks whether the current memory usage has reached the GC_CONCURRENT threshold and, if so, triggers GC_CONCURRENT.
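The fallback chain can be sketched with a toy heap model. The `Heap` class below, its byte counters, and the `allocate` function are hypothetical stand-ins for dvmHeapSourceAlloc, gcForMalloc, and dvmHeapSourceAllocAndGrow; the branch that waits for an already running concurrent GC is left out to keep the sketch short.

```python
class Heap:
    def __init__(self, limit, max_size, used, garbage, soft):
        self.limit = limit        # current heap size
        self.max_size = max_size  # maximum heap size
        self.used = used          # bytes in use, garbage included
        self.garbage = garbage    # unreachable bytes a GC can free
        self.soft = soft          # bytes reachable only via soft references
        self.log = []             # GC types that were triggered

    def try_alloc(self, size, grow=False):
        cap = self.max_size if grow else self.limit
        if self.used + size > cap:
            return False
        if grow:
            self.limit = max(self.limit, self.used + size)
        self.used += size
        return True

    def gc(self, clear_soft):
        self.log.append("GC_BEFORE_OOM" if clear_soft else "GC_FOR_MALLOC")
        self.used -= self.garbage
        self.garbage = 0
        if clear_soft:
            self.used -= self.soft
            self.soft = 0


def allocate(heap, size):
    if heap.try_alloc(size):                # lightweight attempt
        return True
    heap.gc(clear_soft=False)               # GC, soft references kept
    if heap.try_alloc(size):                # retry without growing
        return True
    if heap.try_alloc(size, grow=True):     # grow toward the maximum
        return True
    heap.gc(clear_soft=True)                # last resort: clear soft references
    if heap.try_alloc(size, grow=True):     # final attempt
        return True
    raise MemoryError("OOM")


heap = Heap(limit=100, max_size=200, used=90, garbage=30, soft=20)
allocate(heap, 20)   # fails at first, succeeds after one GC_FOR_MALLOC
```

In this model a heap full of garbage recovers after the first GC, while a heap whose space is held only by soft references has to go all the way to the last step.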

So where does this threshold come from? It follows from the target utilization mentioned above: after each GC a target value is recorded. In theory this value lies within the range described above; if it does not, the boundary value is used as the target instead. The virtual machine records this target value as the total amount of memory currently allowed to be allocated. A fixed value (200 to 500 KB) is then subtracted from the target value to obtain the threshold that triggers GC_CONCURRENT.
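As a small worked example of this threshold, the sketch below subtracts a fixed margin from the target value. The 300 KB margin and the 4000 KB target are assumed values chosen only for illustration, and both function names are hypothetical.

```python
def concurrent_gc_threshold(target_size_kb, margin_kb):
    """Allocation level at which a concurrent GC would be requested."""
    return target_size_kb - margin_kb


def should_start_concurrent_gc(allocated_kb, target_size_kb, margin_kb=300):
    return allocated_kb >= concurrent_gc_threshold(target_size_kb, margin_kb)


threshold = concurrent_gc_threshold(4000, 300)          # 3700 KB
below = should_start_concurrent_gc(3600, 4000)          # not yet
reached = should_start_concurrent_gc(3700, 4000)        # threshold reached
```

Because the threshold sits just under the allowed total, the concurrent GC gets a chance to run before an allocation actually fails.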

2.5. Collection algorithm and memory fragmentation

Most mainstream Dalvik builds use the mark-and-sweep (Mark and Sweep) collection algorithm, and some implement a copying GC, which differs from HotSpot. The algorithm is chosen at compile time and cannot be changed at run time. If the "WITH_COPYING_GC" option is specified when building the Dalvik virtual machine, the source file "/dalvik/vm/alloc/Copying.cpp" is compiled, which is the implementation of the copying GC algorithm in Android; otherwise "/dalvik/vm/alloc/HeapSource.cpp" is compiled, which implements the mark-and-sweep GC algorithm.

The weakness of the Mark and Sweep algorithm is that it easily causes memory fragmentation. Under this algorithm, when free memory consists of many small, non-contiguous pieces, allocating a larger object can still very easily trigger GC, for example when decoding images on the phone.

Therefore, on phones running the Dalvik virtual machine, we should first avoid creating large numbers of small temporary objects (for example, allocating new objects in getView, onDraw, and similar functions), and also avoid creating many large objects with long lifetimes.
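The fragmentation problem can be shown with a toy free list. The first-fit search and the block sizes below are assumptions made for illustration; they are not the allocator Dalvik actually uses.

```python
def first_fit(free_blocks, size):
    """free_blocks: list of (start, length) holes. Returns a start address,
    or None when no single hole is large enough."""
    for start, length in free_blocks:
        if length >= size:
            return start
    return None


# Holes left behind after sweeping: 64 units free in total, largest hole 24.
free_blocks = [(0, 16), (40, 24), (100, 8), (200, 16)]
total_free = sum(length for _, length in free_blocks)

small_request = first_fit(free_blocks, 20)   # fits in the hole at address 40
large_request = first_fit(free_blocks, 48)   # no hole is big enough
```

The 48-unit request fails even though 64 units are free overall, so the allocator has to fall back to a GC or grow the heap.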

3. ART memory recovery mechanism

3.1. Java heap

The Java heap used inside the ART runtime mainly consists of Image Space, Zygote Space, Allocation Space, and Large Object Space. Image Space is used to hold preloaded classes. Zygote Space and Allocation Space play the same roles as the Zygote heap and the Active heap in the garbage collection mechanism of the Dalvik virtual machine.

Large Object Space is a collection of discrete addresses used for allocating large objects, which improves the management efficiency and overall performance of GC.

In the GC log we can also see that ART's log includes LOS information, which makes it easy to check how large allocations are behaving.

3.2. Types of GC

kGcCauseForAlloc: a GC caused by insufficient memory when allocating; in this case the GC will Stop World.

kGcCauseBackground: a GC started when memory usage reaches a certain threshold. This is a background GC and does not Stop World.

kGcCauseExplicit: an explicitly requested GC. If ART has this option enabled, a GC is performed when System.gc is called.

There are other causes as well.

3.3. Object allocation and when GC is triggered

Memory allocation under ART is basically no different from that under Dalvik, so the process is not repeated here.

3.4. Concurrent and non-concurrent GC

Unlike Dalvik, which has only one collection algorithm, ART chooses different collection algorithms in different situations. For example, a non-concurrent GC is used when an allocation runs out of memory, and a concurrent GC is triggered when memory usage reaches a certain threshold after an allocation. The GC strategy also differs between foreground and background; each of these is explained below.

Non-concurrent GC

Step 1. Call the member function InitializePhase implemented by the subclass to perform the GC initialization phase.

Step 2. Suspend all ART runtime threads.

Step 3. The member function MarkingPhase implemented by the subclass is called to perform the GC marking phase.

Step 4. The member function ReclaimPhase implemented by the subclass is called to perform the GC recovery phase.

Step 5. Restore the ART runtime thread that was suspended in step 2.

Step 6. Call the member function FinishPhase implemented by the subclass to execute the GC end phase.

Concurrent GC

Step 1. Call the member function InitializePhase implemented by the subclass to perform the GC initialization phase.

Step 2. Acquire the lock used to access the Java heap.

Step 3. The member function MarkingPhase implemented by the subclass is called to perform the GC parallel marking phase.

Step 4. Release the lock used to access the Java heap.

Step 5. Suspend all ART runtime threads.

Step 6. The member function HandleDirtyObjectsPhase implemented by the subclass is called to handle objects that were modified during the GC parallel marking phase.

Step 7. Resume the ART runtime threads that were suspended in step 5.

Step 8. Repeat steps 5 through 7 until all objects that were modified during the GC parallel marking phase have been processed.

Step 9. Acquire the lock used to access the Java heap.

Step 10. The member function ReclaimPhase implemented by the subclass is called to perform the GC recovery phase.

Step 11. Release the lock used to access the Java heap.

Step 12. Call the member function FinishPhase implemented by the subclass to execute the GC end phase.

So both concurrent and non-concurrent GC cause Stop World, but with concurrent GC each individual Stop World is shorter. The basic difference is similar to that in Dalvik.
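The two step lists can be written out as a small model to see which phases run while the runtime threads are suspended. Only the four phase names come from the steps above; the helper functions and the other step labels (`suspend_all`, `lock_heap`, and so on) are hypothetical.

```python
def run_gc(concurrent, dirty_rounds=1):
    """Return the ordered list of steps for one GC cycle."""
    steps = ["InitializePhase"]
    if not concurrent:
        steps += ["suspend_all", "MarkingPhase", "ReclaimPhase", "resume_all"]
    else:
        steps += ["lock_heap", "MarkingPhase", "unlock_heap"]
        # Threads are suspended only while dirty objects are handled.
        for _ in range(dirty_rounds):
            steps += ["suspend_all", "HandleDirtyObjectsPhase", "resume_all"]
        steps += ["lock_heap", "ReclaimPhase", "unlock_heap"]
    steps.append("FinishPhase")
    return steps


def paused_steps(steps):
    """Steps that execute between suspend_all and resume_all."""
    paused, suspended = [], False
    for step in steps:
        if step == "suspend_all":
            suspended = True
        elif step == "resume_all":
            suspended = False
        elif suspended:
            paused.append(step)
    return paused


non_concurrent_paused = paused_steps(run_gc(concurrent=False))
concurrent_paused = paused_steps(run_gc(concurrent=True, dirty_rounds=2))
```

In this model the non-concurrent cycle marks and reclaims entirely inside the pause, while the concurrent cycle pauses only for the dirty-object handling rounds.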

3.5. Differences between ART concurrent GC and Dalvik concurrent GC

First of all, compare the following two pictures.

[Figure: Dalvik GC]

[Figure: ART GC]

What is the difference between the concurrent GC of ART and the concurrent GC of Dalvik? At first glance the two look similar: neither keeps threads suspended the whole time, but both have a step that pauses threads while objects are marked. According to the relevant documentation, ART concurrent GC has three main advantages over Dalvik:

Self-marking

When allocating objects, ART pushes newly allocated objects onto the Allocation Stack described by the member variable allocation_stack_ of the Heap class, which reduces the range of objects to traverse to some extent.

Prefetching

When marking the memory of the Allocation Stack, the next object to be traversed is prefetched, and the other objects referenced by that object are pushed onto the stack, until the traversal is complete.

Reduced suspend time

Other threads are not blocked during the Mark phase, so dirty data can appear in this phase, for example an object that Mark determined to be unused but that another thread is using at that moment. Some of this dirty data is also processed during the Mark phase rather than being left for the final blocking phase, which shortens the time spent handling dirty data in that later phase.

3.6. Foreground and background GC

Foreground refers to the application running in the foreground, and Background to the application running in the background. Foreground GC is therefore the GC that runs while the application is in the foreground, and Background GC is the GC that runs while it is in the background.

When the application is in the foreground, responsiveness matters most, so the GC must be efficient. When the application is in the background, responsiveness is less important, so it is a good time to deal with heap fragmentation. Mark-Sweep GC is therefore suitable as a Foreground GC and Mark-Compact GC as a Background GC.

Thanks to compaction, ART can avoid fragmentation well, which is one of its strengths.

3.7. ART does it better

Generally speaking, ART handles GC much better than Dalvik. It improves GC efficiency and reduces pause time, it has a separate allocation area for large objects, and it can run algorithms in the background that defragment memory. For developers, ART makes it possible to avoid most of the stutter caused by GC. In addition, according to Google's own data, memory allocation in ART is 10 times more efficient than in Dalvik, and GC is 2 to 3 times more efficient.

4. GC Log

When we want to track down some stutters that may be caused by GC based on the GC log, we need to understand the composition of the GC log and what the different information represents.

4.1. Dalvik GC log

The log format of Dalvik is basically as follows:

D/dalvikvm: <GC_Reason> <Amount_freed>, <Heap_stats>, <Pause_time>, <Total_time>

GC_Reason: the reason for the GC, as mentioned above, for example gc_alloc or gc_concurrent. Knowing the reason makes it easier to handle each case appropriately.

Amount_freed: how much memory this GC freed.

Heap_stats: the percentage of the heap that is currently free, followed by the usage (memory occupied by live objects / total memory of the current program).

Pause_time: the time for which this GC paused the application. Before Android 2.3, GC could not run concurrently: while the system was performing GC, the application could only block and wait for it to finish. Since 2.3, GC runs concurrently, so the application keeps running normally during the GC, but it is still blocked briefly at the beginning and at the end of the GC. That is why there is also a Total_time.

Total_time: the total time spent on this GC. This is different from Pause_time above, which is the stop-all time. Stutter depends mainly on Pause_time.
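To pull these fields out of a log line, a small parser is enough. The sample line below is an assumption in the style of the Dalvik messages shown in Android's documentation; it carries an external-memory field and no separate total time, and real devices may print other variations. The regular expression and the function name `parse_gc_line` are illustrative only.

```python
import re

# Assumed sample line; the exact fields vary between Android versions.
SAMPLE = ("D/dalvikvm( 9050): GC_CONCURRENT freed 2049K, 65% free 3571K/9991K, "
          "external 4703K/5261K, paused 2ms+2ms")

GC_LINE = re.compile(
    r"(?P<reason>GC_[A-Z_]+) freed (?P<freed_kb>\d+)K, "
    r"(?P<free_pct>\d+)% free (?P<live_kb>\d+)K/(?P<total_kb>\d+)K"
    r".*paused (?P<pause>\d+ms(?:\+\d+ms)*)"
)


def parse_gc_line(line):
    """Return the GC fields of one log line as a dict, or None if no match."""
    m = GC_LINE.search(line)
    if m is None:
        return None
    # A concurrent GC reports two pauses (start and end), joined by "+".
    pauses = [int(p[:-2]) for p in m.group("pause").split("+")]
    return {
        "reason": m.group("reason"),
        "freed_kb": int(m.group("freed_kb")),
        "free_pct": int(m.group("free_pct")),
        "live_kb": int(m.group("live_kb")),
        "total_kb": int(m.group("total_kb")),
        "pause_ms": sum(pauses),
    }


info = parse_gc_line(SAMPLE)
```

Summing `pause_ms` per reason over a captured log is a quick way to see which kind of GC contributes most to stutter.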

4.2. ART GC log

I/art:

The basic format is no different from Dalvik's. There are more GC reasons and one additional field, LOS_Space_Status.

LOS_Space_Status: Large Object Space, the space occupied by large objects. This memory is not allocated on the heap but still belongs to the application's memory space. It is mainly used to manage objects that occupy a large amount of memory, such as bitmaps, so that large allocations do not cause frequent GC of the heap.

