Summary of related knowledge points of JavaScript garbage collector

2025-02-23 Update From: SLTechnology News&Howtos

This article summarizes the key concepts behind JavaScript garbage collection and how V8's garbage collector works.

The garbage collector is a double-edged sword. On the plus side, it can drastically simplify a program's memory management code, since memory no longer has to be managed by hand, and it reduces (but does not eliminate) memory leaks in long-running programs. For some programs it can even improve performance.

On the other hand, choosing a garbage collector means giving up full control over memory, which is the crux of the problem for mobile development. JavaScript offers no way to manage memory from within the program: the ECMAScript standard exposes no interface to the garbage collector. Web applications can neither manage memory themselves nor give hints to the collector.

Strictly speaking, languages with garbage collectors are neither inherently faster nor slower than languages without them. In C, allocating and freeing memory can be expensive operations, and heap management grows complex because allocated memory must eventually be freed. In managed-memory languages, allocation often means little more than bumping a pointer, but then we pay the cost of the collector stepping in when memory runs out. A poorly designed garbage collector can cause long, unpredictable pauses, which directly hurt the experience of interactive systems (especially ones with animation). Reference counting is often touted as a substitute for garbage collection, but it too can pause unexpectedly when the last reference into a large subgraph of objects is dropped, and it imposes a considerable overhead on programs that read, write, and store references frequently.

For better or worse, JavaScript needs a garbage collector. V8's implementation is now mature, with excellent performance, short pauses, and a very manageable overhead.

Basic concept

The most basic problem a garbage collector must solve is identifying the memory to be reclaimed. Once identified, these memory regions can be reused by future allocations or returned to the operating system. An object is dead when it is not live (obviously enough). An object is live if and only if it is pointed to by a root object or by another live object. Root objects are live by definition; they are the objects referenced by the browser or by V8. For example, objects pointed to by local variables are rooted, because the stack is scanned as roots; global objects are rooted because they are always accessible; and browser objects such as DOM elements are rooted too, although in some cases these are only weak references.

Incidentally, the definition above is loose. Strictly speaking, an object is live only if the program could still reference it; reachability is a conservative approximation of that. For example:

```
function f() {
  var obj = {x: 12};
  g();  // may contain an infinite loop
  return obj.x;
}
```

Whether obj is ever used again depends on whether g() returns, but the garbage collector conservatively treats obj as live as long as the stack still points to it. The pseudocode of the Scavenge algorithm, which copies live objects from from-space to to-space, is as follows:

```
def scavenge():
  swap(fromSpace, toSpace)
  allocationPtr = toSpace.bottom
  scanPtr = toSpace.bottom

  for i = 0..len(roots):
    root = roots[i]
    if inFromSpace(root):
      rootCopy = copyObject(&allocationPtr, root)
      setForwardingAddress(root, rootCopy)
      roots[i] = rootCopy

  while scanPtr < allocationPtr:
    obj = object at scanPtr
    scanPtr += size(obj)
    n = sizeInWords(obj)
    for i = 0..n:
      if isPointer(obj[i]) and not inOldSpace(obj[i]):
        fromNeighbor = obj[i]
        if hasForwardingAddress(fromNeighbor):
          toNeighbor = getForwardingAddress(fromNeighbor)
        else:
          toNeighbor = copyObject(&allocationPtr, fromNeighbor)
          setForwardingAddress(fromNeighbor, toNeighbor)
        obj[i] = toNeighbor

def copyObject(*allocationPtr, object):
  copy = *allocationPtr
  *allocationPtr += size(object)
  memcpy(copy, object, size(object))
  return copy
```

During the execution of this algorithm, we always maintain two pointers into to-space (the outbound area): allocationPtr points to where memory for the next object will be allocated, and scanPtr points to the next object to be scanned. Objects below scanPtr have been fully processed: they and their neighbors are all in to-space, and their pointers are up to date. Objects between scanPtr and allocationPtr have been copied into to-space, but may still contain pointers into from-space whose targets have not been copied yet. Logically, you can think of the objects between scanPtr and allocationPtr as the queue of a breadth-first search.

In a breadth-first search, nodes are taken from the head of the queue and expanded, and the newly discovered child nodes are appended to the tail, over and over; this is exactly how the two pointers advance through to-space.

At the start of the algorithm, we copy every object in new space reachable from the roots, then enter the main loop. In each iteration we remove an object from the queue (by incrementing scanPtr) and follow each pointer inside it. A pointer that does not point into from-space is ignored, since it must point into old space, which we are not collecting. A pointer into from-space whose target has not been copied yet (no forwarding address set) causes that target to be copied into to-space, appending it to the end of our queue by incrementing allocationPtr. We then store a forwarding address into the first word of the from-space object, replacing its map pointer; the forwarding address is simply the address the object was copied to. The garbage collector can easily tell a forwarding address from a map pointer because map pointers are tagged and forwarding addresses are not. If we find a pointer whose target has already been copied (a forwarding address is set), we update the pointer to the forwarding address and tag it.

The algorithm terminates when all objects have been processed, that is, when scanPtr catches up with allocationPtr. The contents of from-space are then garbage and can be released or reused later.
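The copying process above can be modelled in a few lines of runnable Python. This is a toy simulation, not V8's code: objects are dicts, "pointers" are dict-valued fields, and a forwarding table stands in for the forwarding address written into an object's first word. Any object never copied is, by omission, garbage left behind in from-space.

```python
def scavenge(roots):
    to_space = []        # plays the role of the to-space
    forwarding = {}      # id(obj) -> copy (stands in for forwarding addresses)

    def copy(obj):
        if id(obj) in forwarding:        # already copied: use forwarding address
            return forwarding[id(obj)]
        clone = dict(obj)                # "allocate" in to-space (bump allocationPtr)
        forwarding[id(obj)] = clone
        to_space.append(clone)           # append to the end of the scan queue
        return clone

    new_roots = [copy(r) for r in roots]

    scan = 0                             # scanPtr: everything before it is processed
    while scan < len(to_space):          # len(to_space) acts as allocationPtr
        obj = to_space[scan]
        scan += 1
        for key, val in list(obj.items()):
            if isinstance(val, dict):    # a "pointer" to another heap object
                obj[key] = copy(val)     # copy target if needed, then redirect
    return new_roots, to_space
```

Calling `scavenge([b])` where `b` references `a` copies both, in breadth-first order, while an unreferenced object is simply never copied.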

Secret weapon: the write barrier

One detail has been glossed over: if the only pointer to an object in new space lives inside an object in old space, how do we know that the new-space object is live? We certainly do not want to scan all of old space again; there are far too many objects there for that to be cheap.

To solve this problem, V8 maintains a list, the store buffer, recording every old-space location that points into new space. When an object is born there are no pointers to it, but whenever an old-space object acquires a pointer to a new-space object, we record the cross-space pointer. Because this recording happens on write operations, it is called a write barrier: every write must pass through it.

You may wonder: isn't it expensive to run extra code on every single write? Yes, this is one of the costs of our garbage collection mechanism, but the situation is not as serious as you might think, since writes are much rarer than reads. Some garbage collection algorithms (not V8's) use a read barrier instead, which requires hardware assistance to keep cheap. V8 also has several optimizations to reduce the cost of the write barrier:

Most script execution time is spent in code compiled by Crankshaft, which can often statically determine that an object lies in new space. Writes into such objects need no write barrier.

A newer Crankshaft optimization allocates an object on the stack when no non-local reference to it exists, and writes into a stack-allocated object obviously need no write barrier. (Translator's note: new space and old space are on the heap; the stack is not.)

The "old → new" case is relatively rare, so by making the two common cases, "new → new" and "old → old", fast, performance improves in the majority of cases. Each page is aligned to 1 MB, so given an object's memory address, the page containing it can be located quickly by masking off the low 20 bits; the page header records whether the page belongs to new space or old space, so by checking which spaces the two objects belong to, we can quickly decide whether a write is "old → new".
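The page lookup described above is a one-line bit trick. A minimal sketch in Python, where `PAGE_SIZE` and `page_of` are illustrative names rather than V8 internals:

```python
PAGE_SIZE = 1 << 20   # pages are 1 MB aligned, so the low 20 bits are the offset

def page_of(address):
    # Clearing the low 20 bits of an object's address yields the start
    # of the 1 MB page that contains it.
    return address & ~(PAGE_SIZE - 1)
```

For example, two objects whose addresses differ only in the low 20 bits live on the same page.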

Whenever we find an "old → new" pointer, we record it at the end of the store buffer. After a while (when the store buffer fills up), we sort it, merge duplicate entries, and remove pointers that no longer fit the "old → new" case. (Translator's note: this reduces the number of recorded pointers, and shortens the time spent in the write barrier accordingly.)
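As a rough model of this behaviour, the sketch below records old→new stores and, when full, sorts, deduplicates, and filters its entries. All names here are illustrative; V8's actual store buffer is considerably more elaborate.

```python
class StoreBuffer:
    """Toy store buffer: remembers old-space slots that point into new space."""

    def __init__(self, capacity=4):
        self.entries = []          # recorded slots (modelled as plain integers)
        self.capacity = capacity

    def write_barrier(self, holder_is_old, slot, value_is_new):
        # Only the rare "old -> new" stores are recorded; "new -> new" and
        # "old -> old" writes skip the buffer entirely.
        if holder_is_old and value_is_new:
            self.entries.append(slot)
            if len(self.entries) >= self.capacity:
                self.compact()

    def compact(self, still_old_to_new=lambda slot: True):
        # Sort, merge duplicates, and drop slots that no longer point old -> new.
        self.entries = sorted({s for s in self.entries if still_old_to_new(s)})
```

A duplicate store to the same slot thus costs one list append at write time, and is merged away lazily at compaction time.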

"Mark-sweep" and "mark-compact" algorithms

The Scavenge algorithm is very effective at quickly reclaiming and compacting small amounts of memory, but its cost for large amounts is too high. Because Scavenge needs two spaces, a to-space and a from-space, it is acceptable for a few MB of new space but impractical for the hundreds of MB of data in old space. For such a large area we use two closely related algorithms: mark-sweep and mark-compact.

Both algorithms work in two phases: a marking phase, followed by either a sweeping phase or a compacting phase.

During the marking phase, all live objects on the heap are marked. Each page contains a bitmap used for marking, with one bit for each word of the page; marking at word granularity is necessary because an object may start at any word-aligned address. The bitmap obviously costs some space (3.1% on 32-bit systems, 1.6% on 64-bit systems), but every memory management scheme pays something comparable, so this is not excessive. In addition, two bits are used to represent each object's marking state; since objects are at least two words long, these bit pairs never overlap. There are three states: a white object has not yet been discovered by the garbage collector; a grey object has been discovered, but not all of its neighbors have been processed; a black object has been discovered and all of its neighbors have been fully processed.

If we view the heap as a directed graph of objects connected by pointers, the core of the marking algorithm is essentially a depth-first search. At the beginning of marking, the bitmap is empty and all objects are white. Objects reachable from the roots are coloured grey and pushed onto a separate marking deque (double-ended queue). In each iteration of the marking loop, the GC pops an object off the deque, colours it black, colours its white neighbours grey, and pushes them onto the deque. The loop ends when the deque is empty and every discovered object has turned black. Very large objects, such as long arrays, may be processed in pieces to reduce the chance of overflowing the deque. If the deque does overflow, objects are still coloured grey but are no longer pushed onto it (so their neighbours would never be visited). Therefore, once the deque is empty, the GC must rescan the heap for grey objects, push them back onto the deque, and resume marking.

The following is the pseudo code of the marking algorithm:

```
markingDeque = []
overflow = false

def markHeap():
  for root in roots:
    mark(root)
  do:
    if overflow:
      overflow = false
      refillMarkingDeque()
    while !markingDeque.isEmpty():
      obj = markingDeque.pop()
      setMarkBits(obj, BLACK)
      for neighbor in neighbors(obj):
        mark(neighbor)
  while overflow

def mark(obj):
  if markBits(obj) == WHITE:
    setMarkBits(obj, GREY)
    if markingDeque.isFull():
      overflow = true
    else:
      markingDeque.push(obj)

def refillMarkingDeque():
  for each obj on heap:
    if markBits(obj) == GREY:
      markingDeque.push(obj)
      if markingDeque.isFull():
        overflow = true
        return
```

When the marking algorithm terminates, all live objects are coloured black, while all dead objects remain white. This is exactly the result that both the sweep and compact phases expect.
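Setting aside the overflow handling, the marking loop boils down to a standard tri-colour traversal, which the following toy Python model makes concrete. The heap is just an adjacency dict here, not a real object graph:

```python
from collections import deque

WHITE, GREY, BLACK = 0, 1, 2

def mark_heap(graph, roots):
    """Tri-colour marking over graph {node: [neighbors]} from the given roots."""
    color = {node: WHITE for node in graph}
    work = deque()
    for root in roots:                    # roots are discovered first
        if color[root] == WHITE:
            color[root] = GREY
            work.append(root)
    while work:
        obj = work.popleft()
        color[obj] = BLACK                # obj and all its neighbors handled below
        for n in graph[obj]:
            if color[n] == WHITE:         # discover white neighbors
                color[n] = GREY
                work.append(n)
    return color
```

On exit, everything reachable from the roots is black and everything else is still white, matching the invariant the text describes.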

After marking finishes, we can choose either to sweep or to compact; both reclaim memory, and both operate at page granularity (remember that V8's memory pages are contiguous 1 MB blocks, unlike virtual memory pages).

The sweeping algorithm scans for runs of contiguous dead objects, converts them into free space, and adds them to free lists. Each page maintains several free lists, each serving free regions in a particular size range.
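A segregated free list of this kind can be sketched as follows. The size classes and the first-fit, no-splitting policy here are purely illustrative, not V8's actual ones.

```python
SIZE_CLASSES = [32, 64, 128, 256]    # region sizes in words (illustrative values)

class FreeLists:
    """Toy segregated free lists: one list per size class."""

    def __init__(self):
        self.lists = {size: [] for size in SIZE_CLASSES}

    def add_region(self, address, size):
        # Place a freed region on the largest class it can fully serve.
        for cls in reversed(SIZE_CLASSES):
            if size >= cls:
                self.lists[cls].append((address, size))
                return
        # Regions smaller than the smallest class are simply wasted.

    def allocate(self, size):
        # First fit: take a region from the smallest class big enough
        # for the request (no splitting, for simplicity).
        for cls in SIZE_CLASSES:
            if cls >= size and self.lists[cls]:
                return self.lists[cls].pop()
        return None                  # no suitable free region on this page
```

Keeping regions bucketed by size means an allocation only inspects a handful of lists instead of walking every free region on the page.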
