What is the memory management and garbage collection algorithm of V8?

2025-03-31 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

Today I will share some knowledge about V8's memory management and garbage collection algorithms. The content is detailed and the logic is clear; most readers still know too little about this topic, so this article is shared for your reference. I hope you get something out of it. Let's take a look.

V8's memory limits and how to change them

V8 was originally designed for browsers, where large-memory scenarios are rare, so by default it only allows part of the system memory to be used: roughly 1.4 GB on 64-bit systems and about 0.7 GB on 32-bit systems. You can inspect the memory usage of the V8 engine embedded in Node with the following code:

process.memoryUsage(); // returns memory usage in bytes
// {
//   rss: 22953984,      // resident set size: total memory held by the process
//   heapTotal: 9682944, // total heap memory requested by V8
//   heapUsed: 5290344,  // heap memory currently in use
//   external: 9388      // memory used by C++ objects bound to JS objects
// }

Another important reason V8 limits memory usage is garbage-collection pause time: when the heap is too large, garbage collection takes longer (over 50 ms for a 1.5 GB heap), and a non-incremental garbage collection takes longer still (over 1 s for 1.5 GB). This will be easier to appreciate after the explanation of V8's garbage collection mechanism below.

Although the V8 engine restricts memory usage, it also exposes a way to change the limits: pass the relevant flags when starting the engine. The following commands change the memory limits of the V8 engine embedded in Node:

# change the memory limit of the old generation, unit: MB
node --max-old-space-size=2048 index.js
# change the memory limit of the new generation, unit: MB
node --max-semi-space-size=64 index.js

Note that the syntax for changing the new-generation memory limit has changed to the form above, and the unit has also changed from KB to MB; the old flag was --max-new-space-size. You can query the new-generation memory flags available in your current Node environment with the following command:

node --v8-options | grep max

V8 garbage collection strategy

Over the history of automatic garbage collection in engines, it has been found that no single algorithm handles garbage collection well in every scenario. Modern garbage collectors therefore divide memory into generations according to how long objects survive, and generational garbage collection applies a different collection algorithm to each generation.

V8 divides memory into two types: the new generation and the old generation:

Objects in new-generation memory have short lifetimes.

Old-generation objects live longer or reside permanently in memory.

New-generation objects are stored in the new-generation memory space (semispace), and old-generation objects are stored in the old-generation memory space (oldspace), as shown in the following figure:

New-generation memory is collected with the Scavenge algorithm.

Old-generation memory is collected with the Mark-Sweep and Mark-Compact algorithms.

Let's take a look at the algorithmic logic of Scavenge.

Scavenge algorithm

The Scavenge algorithm is used to reclaim new-generation memory and is implemented with Cheney's algorithm. Cheney's algorithm splits the new-generation memory space in two: one space in use (FromSpace) and one idle space (ToSpace).

Memory is first allocated in FromSpace. When garbage collection runs, the live objects in FromSpace are identified and copied to ToSpace, and the space occupied by dead objects is freed. Once copying completes, the roles of FromSpace and ToSpace are swapped. When an object is still alive after several rounds of copying, it is considered long-lived and is promoted: it is moved to the old-generation space (oldSpace) and managed by a different algorithm.

The Scavenge algorithm essentially copies live objects back and forth between the two spaces, a typical space-for-time trade-off. It suits the new generation well, because only live objects are copied and the new generation holds few of them. There are, however, several important issues to consider:

Avoid duplicate copies of references

Suppose there are three objects, temp1, temp2, and temp3, where temp2 and temp3 both reference temp1. The JS code is as follows:

var temp1 = {};
var temp2 = { ref: temp1 };
var temp3 = { ref: temp1 };

When temp2 is copied from FromSpace to ToSpace, temp1 is found to be referenced and is copied to ToSpace as well; this is a recursive process. But when temp3 is copied, temp1 is found to be referenced again, and copying temp1 at that point would be a duplicate.

To avoid duplicate copies, a visited tag is added to each object to mark that the node has been visited; the visited attribute then determines whether an object still needs to be copied.

Maintain the correct reference relationship after copying

Because temp1 must not be copied twice, after temp3 is copied to ToSpace it does not know the new memory address of the temp1 object in ToSpace.

The solution: after temp1 is copied, a forwarding attribute pointing to its new address in ToSpace is written into the old temp1 object in FromSpace, so temp3 can find temp1's new address in ToSpace through the forwarding attribute of the old temp1.
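The forwarding-pointer bookkeeping described above can be sketched in plain JavaScript. This is a model of the idea, not V8's actual implementation; the forwarding field, the toSpace array, and the object shapes are all illustrative:

```javascript
// Sketch: copy an object graph from FromSpace to ToSpace, leaving a
// forwarding pointer in each old object so shared references are
// copied only once and are fixed up to point at the single new copy.
function copyObject(obj, toSpace) {
  // Already copied? Follow the forwarding pointer instead of re-copying.
  if (obj.forwarding) return obj.forwarding;
  const copy = { ...obj };  // shallow copy into ToSpace
  obj.forwarding = copy;    // leave the forwarding pointer behind
  toSpace.push(copy);
  // Recursively copy referenced objects and update the copy's fields.
  for (const key of Object.keys(copy)) {
    const val = copy[key];
    if (val && typeof val === "object" && key !== "forwarding") {
      copy[key] = copyObject(val, toSpace);
    }
  }
  return copy;
}

// temp2 and temp3 both reference temp1; temp1 is copied exactly once.
const temp1 = {};
const temp2 = { ref: temp1 };
const temp3 = { ref: temp1 };
const toSpace = [];
const newTemp2 = copyObject(temp2, toSpace);
const newTemp3 = copyObject(temp3, toSpace);
console.log(newTemp2.ref === newTemp3.ref); // true: one shared copy
```

Note how the forwarding pointer doubles as the visited tag from the previous section: its presence both prevents a second copy and supplies the new address.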

Memory objects exist in both the new generation and the old generation, which also poses problems:

How to mark memory objects across generations (cross-space references)

const temp1 = {};
const temp2 = { ref: temp1 };

For example, the two objects in the code above, temp1 and temp2, both start in the new generation, and temp2 references temp1. Suppose temp2 is promoted to the old generation after a GC: in the next GC marking phase, how can we tell whether temp1 is a live object?

In a reachability-based analysis, to know whether temp1 is alive we must know whether any reference chain from a root object reaches temp1. In that case, a new-generation GC would have to traverse all old-generation objects to determine whether anything references temp1, and the generational split would be meaningless.

The solution is to maintain a record set of all cross-generation references, implemented as a write-buffer list. Whenever an old-generation memory object is made to point at a new-generation object, the referencing slot in the old generation is recorded in the record set. Because this generally happens during an object write operation, the mechanism is called a write barrier; the other possibility is that it happens at promotion time. So maintaining the record set only has to care about object writes and promotions. This brings another problem:

Additional overhead of maintaining the recordset each time a write is performed

The optimization is that some Crankshaft operations carry no write barrier, and writes to memory objects on the stack do not need a write barrier either. There are more techniques that will not be discussed further here.
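The record-set idea can be sketched as follows. This is an assumed model, not V8 code: the generation tag, the writeField helper, and the Set-based record set are illustrative. The point is that only old-to-new stores are recorded, so a minor GC can use the record set as extra roots instead of scanning the whole old generation:

```javascript
// Sketch of a remembered set maintained by a write barrier.
const rememberedSet = new Set();

function writeField(holder, key, value) {
  holder[key] = value;
  // Write barrier: record the slot only for old -> new pointers.
  if (holder.generation === "old" && value && value.generation === "new") {
    rememberedSet.add(holder);
  }
}

const oldObj = { generation: "old" };
const youngA = { generation: "new" };
const youngB = { generation: "new" };

writeField(oldObj, "ref", youngA);  // old -> new: recorded
writeField(youngA, "peer", youngB); // new -> new: not recorded

console.log(rememberedSet.has(oldObj)); // true
console.log(rememberedSet.has(youngA)); // false
```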

Alleviating the Scavenge algorithm's low memory utilization

Live objects make up a small fraction of new-generation memory, so ToSpace can be allocated smaller. The approach is to split the space into two parts, S0 and S1: S1 serves as ToSpace, while S0 is merged with the original FromSpace and used for allocation.

The difference between depth-first and breadth-first traversal in the Scavenge algorithm

In garbage collection algorithms, there are generally two mechanisms to identify whether memory objects are garbage: reference counting and reachability analysis.

Reachability analysis finds all root references (global variables and so on), then traverses them and, recursively, every reference they hold; everything reached by the traversal is live and is marked, and the remaining memory objects in the space are dead. The traversal effectively builds a directed graph.

Given the limitations of recursion, the recursive logic is generally implemented non-recursively, and the common algorithms are breadth-first and depth-first traversal. The difference between the two:

Depth-first copying into ToSpace changes the order of memory objects so that objects with reference relationships end up closer together: after an object is copied, the objects it references are copied immediately afterwards, so related objects sit near each other in ToSpace.

Breadth-first is just the opposite.

Because of the CPU's caching strategy, when a memory object is read there is a high probability that the objects right after it are loaded into the cache together, so later reads hit the cache faster. Since chains like obj1.obj2.obj3 are common in application code, if obj2 and obj3 sit right after obj1, reading obj1 makes the subsequent accesses likely to hit the cache.

Therefore the depth-first algorithm is more conducive to cache hits in business logic, but its implementation needs an auxiliary stack, which costs memory. Breadth-first, by contrast, does not improve cache hits, but its implementation can use two pointers to cleverly avoid the extra space, and the algorithm itself executes efficiently.
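The two-pointer trick mentioned above is Cheney's: breadth-first copying can use two indices into ToSpace ("scan" and "free") instead of an explicit queue, so the traversal needs no extra space. A sketch (object shapes and the forwarding Map are illustrative, not V8 internals):

```javascript
// Breadth-first copy with Cheney's scan/free pointers over ToSpace.
function cheneyCopy(roots) {
  const toSpace = [];
  const forward = new Map(); // old object -> its copy

  function copy(obj) {
    if (forward.has(obj)) return forward.get(obj);
    const clone = { ...obj };
    forward.set(obj, clone);
    toSpace.push(clone); // pushing advances the "free" pointer
    return clone;
  }

  const newRoots = roots.map(copy);
  // "scan" chases "free": every object between scan and free still has
  // unprocessed fields; processing one may push more copies onto toSpace.
  let scan = 0;
  while (scan < toSpace.length) {
    const obj = toSpace[scan++];
    for (const key of Object.keys(obj)) {
      if (obj[key] && typeof obj[key] === "object") {
        obj[key] = copy(obj[key]);
      }
    }
  }
  return { newRoots, toSpace };
}

const shared = { name: "shared" };
const a = { left: shared };
const b = { right: shared };
const { newRoots, toSpace } = cheneyCopy([a, b]);
console.log(toSpace.length); // 3: a, b, and one copy of shared
```

Note the resulting order in ToSpace: both roots come before the shared child, the breadth-first layout the text contrasts with depth-first.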

Promotion conditions of the new generation of memory objects

A new-generation memory object is promoted to the old generation when it meets either of the following conditions:

The object has survived a previous Scavenge collection.

The memory usage ratio of ToSpace cannot exceed the limit.

The logic for determining whether an object has survived a Scavenge GC: each time a GC runs, the age attribute of every surviving object is incremented by 1, and the age attribute is checked on the next GC. The basic flow of promotion is shown in the figure below:
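The promotion rules above can be sketched as a small predicate. The age threshold of 1 and the 25% ToSpace usage ratio here are illustrative values for the sketch, not V8's exact tuning:

```javascript
// Sketch: decide whether a surviving new-generation object should be
// promoted to the old generation during a Scavenge.
function shouldPromote(obj, toSpaceUsedRatio) {
  // Condition 1: the object already survived a previous Scavenge.
  if (obj.age >= 1) return true;
  // Condition 2: ToSpace is too full to comfortably hold survivors.
  if (toSpaceUsedRatio > 0.25) return true;
  return false;
}

console.log(shouldPromote({ age: 0 }, 0.1)); // false: young, space available
console.log(shouldPromote({ age: 1 }, 0.1)); // true: survived a Scavenge
console.log(shouldPromote({ age: 0 }, 0.3)); // true: ToSpace over the limit
```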

Old-generation memory holds many long-lived objects, so the Scavenge algorithm cannot be used for its collection because:

A large number of live objects would make copying inefficient

Half of the memory space would be wasted

Algorithms for reclaiming old-generation memory objects

Garbage collection of the old-generation memory space combines mark-sweep (Mark-Sweep) and mark-compact (Mark-Compact). Mark-sweep is divided into two phases:

Marking phase

Sweep phase (for mark-compact, a compact phase)

In the marking phase, all memory objects in the old-generation heap are traversed and the live objects are marked; in the sweep phase, only the unmarked objects are cleaned up. The rationale is that dead objects are a small fraction of old-generation memory.

As the figure above shows, one problem with mark-sweep is that the reclaimed space is discontiguous and hard to reuse, so old-generation collection also needs the mark-compact scheme: live objects are moved toward one end of the space, and once the move is complete, all memory beyond the boundary is reclaimed.
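The mark-sweep scheme can be sketched over a toy heap. This is illustrative, not V8's implementation: the heap is modeled as an array of objects, and marking uses an explicit stack rather than recursion:

```javascript
// Minimal mark-sweep: mark everything reachable from the roots,
// then sweep (filter out) the unmarked objects.
function markSweep(heap, roots) {
  const marked = new Set();
  const stack = [...roots];
  while (stack.length) {                // mark phase
    const obj = stack.pop();
    if (marked.has(obj)) continue;
    marked.add(obj);
    for (const key of Object.keys(obj)) {
      if (obj[key] && typeof obj[key] === "object") stack.push(obj[key]);
    }
  }
  return heap.filter((obj) => marked.has(obj)); // sweep phase
}

const live = { child: {} };
const dead = {};
const heap = [live, live.child, dead];
const survivors = markSweep(heap, [live]);
console.log(survivors.length); // 2: dead was swept
```

A compacting variant would additionally move the survivors to the front of the heap array so the free space becomes one contiguous region.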

Full pause of garbage collection

During garbage collection, the application's execution logic must be paused and resumed only after garbage collection finishes. This behavior is called a "full pause", commonly known as Stop The World, or STW for short. New-generation garbage collection has little impact on application execution, but because old-generation memory holds many live objects, the full pause caused by old-generation garbage collection is severe.

To reduce the full-pause time of GC, V8 has introduced incremental marking, concurrent marking, parallel marking, incremental sweeping, parallel sweeping, lazy sweeping, and so on.

STW optimization

An important measure of garbage-collection cost is how long the main thread is paused while GC executes. The impact of STW can be unacceptable, so V8 takes many optimization measures.

Parallel GC

GC has a lot of work to do, which causes the STW phenomenon on the main thread. Parallel GC starts multiple worker threads to share the GC work. This still cannot avoid STW, but it reduces the total STW time, by an amount that depends on the number of worker threads started.

Incremental GC

Incremental GC splits the GC work into small steps that are interleaved with the main thread's execution. This does not reduce the total GC time (on the contrary, it adds a small cost), but it does reduce the total STW time of GC.

Concurrent GC

Concurrent GC runs garbage collection on background threads rather than on the main thread, which avoids the STW phenomenon.

Idle time GC

Chrome renders animations at about 60 frames per second (about 16.6 ms per frame). If the rendering work of the current frame finishes within its 16.6 ms budget, the remaining idle time can be used for other things, such as GC tasks.

Reduce the impact of garbage collection

To improve execution efficiency, minimize the frequency and cost of garbage collection:

Be careful when using memory as a cache of objects: to reasonably bound expiration time and prevent unbounded growth, adopt an LRU strategy.
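The LRU advice above can be sketched as a minimal cache built on Map's insertion-order iteration. The capacity and API here are illustrative; a production cache would also want TTL-based expiration:

```javascript
// Minimal LRU cache: bounded size, least-recently-used entry evicted.
class LRUCache {
  constructor(capacity) {
    this.capacity = capacity;
    this.map = new Map();
  }
  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    this.map.delete(key);     // re-insert to mark as most recently used
    this.map.set(key, value);
    return value;
  }
  set(key, value) {
    if (this.map.has(key)) this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.capacity) {
      // Map iterates in insertion order: the first key is the LRU entry.
      this.map.delete(this.map.keys().next().value);
    }
  }
}

const cache = new LRUCache(2);
cache.set("a", 1);
cache.set("b", 2);
cache.get("a");          // touch "a", so "b" becomes least recently used
cache.set("c", 3);       // evicts "b"
console.log(cache.get("b")); // undefined
```

Bounding the cache like this keeps long-lived cached objects from accumulating in the old generation indefinitely.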

Avoid using in-process memory to store user sessions in Node; keeping large numbers of session objects in memory makes old-generation memory surge, hurting both cleanup performance and application performance, and it can lead to memory overflow. A better approach is to use an external store such as redis. The benefits of moving the cache out of the process:

Fewer resident memory objects, so garbage collection is more efficient

Caches can be shared between processes

These are all the contents of the article "What is the memory management and garbage collection algorithm of V8". Thank you for reading! I believe you will gain a lot from this article.
