Analysis of the Garbage Collection Mechanism in Python

This article walks through how Python manages memory automatically: the reference counting mechanism, the mark-and-sweep algorithm that handles circular references, and generational collection.
Garbage collection mechanism
Garbage collection (GC), short for Garbage Collection, is something most of us have at least heard of. Wikipedia defines it as an automatic memory management mechanism in computer science: when dynamically allocated memory is no longer needed, it should be released to make room for other allocations, and this kind of memory resource management is called garbage collection. In languages such as C/C++, users manage and maintain memory themselves: they are free to allocate and release it at will, but this makes memory leaks and dangling pointers very easy to introduce. High-level languages such as Java and Python instead use a garbage collection mechanism to manage memory automatically. A garbage collector focuses on two things: ① finding useless "garbage" resources in memory, and ② clearing those resources and freeing the memory for other objects to use.
Python is an interpreted language with an easy-to-read syntax: we can assign a value to a variable directly without declaring its type. Determining variable types and allocating and releasing memory are all handled automatically by the Python interpreter at run time, so we normally do not have to think about them. This automatic memory management greatly reduces the coding burden on developers and lets them focus on the business logic, which is one of the important reasons for Python's success. Next, let's look at how Python manages memory.
Reference counting mechanism
Everything in Python is an object; every variable you use is essentially a reference to an object. At the core of every object is a PyObject struct, which contains a reference counter, ob_refcnt. The program updates ob_refcnt in real time to reflect how many names currently refer to the object. When an object's reference count drops to 0, the object has become garbage: it is reclaimed and the memory it occupies is released immediately.
typedef struct _object {
    int ob_refcnt;               /* reference count */
    struct _typeobject *ob_type; /* type of the object */
} PyObject;
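As an aside, in CPython the id() of an object is its memory address, and ob_refcnt is the first field of every PyObject, so we can peek at the counter directly. This is a CPython-specific illustration only, not something to rely on in real code; sys.getrefcount(), shown later, is the supported way.

import ctypes
import sys

a = [1, 2, 3]
# CPython-specific: read the ob_refcnt field that sits at the start of the object.
print(ctypes.c_ssize_t.from_address(id(a)).value)  # 1 -- only the name 'a' refers to the list
print(sys.getrefcount(a))                           # 2 -- passing 'a' as an argument adds one reference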
The following situations cause the reference count to be increased by one:

An object is created, for example a = 5
The object is referenced by another name, for example b = a
The object is passed into a function as an argument (note that during a function call there are two additional references: one from the function's stack frame and one from the function parameter)
The object is stored as an element in a container (for example, in a list)
The following situations cause the reference count to be decreased by one (a short sketch follows this list):

The object's alias is explicitly destroyed, for example del a
The object's alias is rebound to a new object
The object goes out of scope (for example, when a function returns, its local variables are destroyed)
The container holding the object is destroyed, or the object is removed from the container
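Here is a minimal sketch of how these decrements show up through sys.getrefcount(); the variable names are purely for illustration.

import sys

a = [1, 2, 3]
b = a                        # a second name now refers to the same list
print(sys.getrefcount(a))    # 3: 'a', 'b', and getrefcount's own argument

del b                        # the alias is explicitly destroyed
print(sys.getrefcount(a))    # 2

b = a
b = "something else"         # the alias is rebound to a new object
print(sys.getrefcount(a))    # 2 again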
We can also get the current reference count of an object through getrefcount() in the sys module (note that calling getrefcount() itself temporarily increases the count by one, because the object is passed to it as an argument):
import sys

a = [1, 2, 3]
print(sys.getrefcount(a))  # outputs 2: two references (the name 'a' and the getrefcount argument)

def func(a):
    print(sys.getrefcount(a))  # outputs 4: the name 'a', the function stack frame, the function argument, and getrefcount

func(a)
print(sys.getrefcount(a))  # outputs 2 again: the function call is over, leaving the name 'a' and getrefcount
Let's look at the same thing from the perspective of memory usage:
import os
import psutil

def show_memory_info(hint):
    """Show the amount of memory used by the current Python process."""
    pid = os.getpid()
    p = psutil.Process(pid)
    info = p.memory_full_info()
    memory = info.rss / 1024 / 1024
    print('{} memory usage of the current process: {} MB'.format(hint, memory))

def func():
    show_memory_info('initial')
    a = [i for i in range(9999999)]
    show_memory_info('after creating a')

func()
show_memory_info('end')
The output is as follows:
initial memory usage of the current process: 12.125 MB
after creating a memory usage of the current process: 205.15625 MB
end memory usage of the current process: 12.87890625 MB
As you can see, the initial memory usage of the process is 12.125 MB. When func() is called and the list a is created, the memory footprint rises quickly to 205.15625 MB, and after the function call ends it returns to roughly the initial level. This is because the list a declared inside the function is a local variable: once the function returns, the reference from that local variable disappears, the list's reference count drops to 0, and Python reclaims it, so the large block of memory used earlier is given back.
Circular reference
What is a circular reference? To put it simply, two objects refer to each other. Look at the following program:
def func2():
    show_memory_info('initial')
    a = [i for i in range(10000000)]
    b = [x for x in range(10000001, 20000000)]
    a.append(b)
    b.append(a)
    show_memory_info('after creating a and b')

func2()
show_memory_info('end')
The output is as follows:
initial memory usage of the current process: 12.14453125 MB
after creating a and b memory usage of the current process: 396.6875 MB
end memory usage of the current process: 396.96875 MB
As you can see, a and b reference each other. As local variables, they should no longer exist in any meaningful sense after func2() returns, yet the output shows the memory is still occupied. Why? Because they refer to each other, neither object's reference count ever drops to zero.
If circular references exist in a production environment and no other garbage collection mechanism steps in, then after running for a long time the program's memory footprint will keep growing, and if it is not handled in time it can eventually exhaust the server's memory.
If we have to use circular references, we can explicitly call gc.collect() to start a garbage collection:
import gc

def func2():
    show_memory_info('initial')
    a = [i for i in range(10000000)]
    b = [x for x in range(10000001, 20000000)]
    a.append(b)
    b.append(a)
    show_memory_info('after creating a and b')

func2()
gc.collect()
show_memory_info('end')
The output is as follows:
initial memory usage of the current process: 12.29296875 MB
after creating a and b memory usage of the current process: 396.69140625 MB
end memory usage of the current process: 12.95703125 MB
Reference counting has the advantages of being efficient, simple, and real-time: as soon as an object's count reaches zero, its memory is released directly, without waiting for a particular moment as other mechanisms do. The work of reclaiming memory is spread out over normal execution, so the program runs smoothly without long pauses. However, reference counting also has some disadvantages, the common ones being:
① Although the logic is simple, maintenance is somewhat costly. Each object needs extra space to store its reference count, and that count has to be updated on every reference change, which consumes some resources.
② Circular references. This is the fatal flaw of reference counting; it has no solution within the mechanism itself, so it must be supplemented by other garbage collection algorithms.
In fact, Python uses the mark-and-sweep algorithm and generational collection to provide automatic garbage collection for circular references.
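A minimal sketch of the cycle detector at work (the Node class here is just an illustration); gc.collect() returns the number of unreachable objects it found:

import gc

class Node:
    """A simple container object; only container objects can form reference cycles."""
    pass

a = Node()
b = Node()
a.partner = b
b.partner = a         # a and b now reference each other
del a, b              # no names point at them, yet each still has a non-zero reference count

found = gc.collect()  # the cycle detector finds and frees the unreachable cycle
print(found)          # at least 2 (the two Node instances; the exact number varies by Python version)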
Mark-and-sweep and circular references
Python uses the mark-and-sweep algorithm to solve the circular references that container objects can create. Note that only container objects can form circular references, such as lists, dictionaries, instances of user-defined classes, tuples, and so on; simple types such as numbers and strings cannot. (As an optimization, tuples that contain only simple types are also excluded from consideration by the mark-and-sweep collector.)
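We can check whether an object is tracked by the cycle collector with gc.is_tracked(); the dict behaviour below reflects the kind of optimization just mentioned, and exact results can vary slightly between CPython versions.

import gc

print(gc.is_tracked(42))          # False: an int can never be part of a cycle
print(gc.is_tracked("hello"))     # False: same for strings
print(gc.is_tracked([]))          # True: lists are container objects
print(gc.is_tracked({}))          # False: a dict holding no containers is untracked as an optimization
print(gc.is_tracked({"a": []}))   # True: it holds a container, so it is tracked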
The algorithm has two phases: in the first, the marking phase, GC marks all "active" (reachable) objects; in the second, it sweeps up and reclaims the unmarked "inactive" objects.
So how does Python determine which objects are inactive?
For a given collection of objects, we first build a copy of their reference count table. We then remove all internal references (an internal reference means one object in the collection refers to another object in the same collection), decrementing the counts in the copy as we go. The objects whose copied count is still non-zero after all internal references have been removed form the root set: they are kept alive from outside the collection. Then the marking process starts: beginning from the root set we walk the references, restoring the counts in the copy table for every object we reach. Finally, any object whose count in the copy table is still 0 is garbage and needs to be collected. For example:
The nodes in the example collection have incoming references from outside (to a and to b) as well as an outgoing reference (c refers to an external object); alongside the graph is the reference count table. We then remove all internal references:

The root set is therefore {a, b}, and starting from a and b we mark the reachable nodes and restore their reference counts:

The nodes reachable from a and b have their counts restored; the nodes whose count is still zero are the garbage that is only referenced from within the collection (e and f). If all objects are treated as one collection, all garbage can be reclaimed in one pass; alternatively, the objects can be divided into smaller collections and garbage collected per collection.
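To make the walkthrough above concrete, here is a toy simulation of the copy-table idea. The object graph and the names a–f are hypothetical; this is only a sketch of the algorithm, not CPython's actual implementation.

# Total reference counts: a and b each have one reference from outside the collection;
# c is referenced by a; e and f reference each other (a cycle with no outside reference).
ref_count = {"a": 1, "b": 1, "c": 1, "e": 1, "f": 1}
# Internal references only (c's reference to an object outside the collection is irrelevant here).
graph = {"a": ["c"], "b": [], "c": [], "e": ["f"], "f": ["e"]}

# Step 1: copy the counts and subtract every internal reference.
copy_counts = dict(ref_count)
for refs in graph.values():
    for target in refs:
        copy_counts[target] -= 1

# Objects whose copied count stays above zero are kept alive from outside: the root set.
roots = [name for name, count in copy_counts.items() if count > 0]

# Step 2 (mark): walk the internal references from the roots and mark everything reachable.
reachable = set()
stack = list(roots)
while stack:
    node = stack.pop()
    if node not in reachable:
        reachable.add(node)
        stack.extend(graph[node])

# Step 3 (sweep): whatever is unreachable is garbage -- exactly the e-f cycle.
garbage = [name for name in graph if name not in reachable]
print(roots)    # ['a', 'b']
print(garbage)  # ['e', 'f']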
But the whole object graph has to be traversed on every collection, which is a significant performance cost for Python.
Generational collection
Generational collection trades space for time. Python groups objects into different sets according to how long they have been alive, and each set is called a generation. Python divides objects into three generations: the young generation (generation 0), the middle generation (generation 1), and the old generation (generation 2). They correspond to three linked lists, and the garbage collection frequency decreases as an object's lifetime increases.
Newly created objects are placed in the young generation. When the young generation's list reaches its limit, that is, when the number of objects newly tracked by the garbage collector minus the number of deleted objects reaches the corresponding threshold, a garbage collection is triggered for that generation: objects that can be reclaimed are reclaimed, and the survivors are promoted to the middle generation, and so on, so that objects in the old generation are the longest-lived, possibly surviving for the entire lifetime of the program. Generational collection is built on top of the mark-and-sweep technique, and it rests on the observation that newly created objects are more likely to become garbage, while objects that have already survived for a long time are likely to keep surviving. Partitioning the work this way saves a lot of computation and improves Python's performance.
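A minimal sketch of the generation thresholds exposed by the gc module; the values shown in the comments are the usual CPython defaults and may differ between versions.

import gc

print(gc.get_threshold())   # typically (700, 10, 10): the trigger thresholds for generations 0, 1, 2
print(gc.get_count())       # current allocation counts per generation, e.g. (451, 7, 3)

# Collect a specific generation by hand: 0 is the youngest, 2 the oldest (and most expensive).
print(gc.collect(0))

# The thresholds can be tuned; a larger generation-0 threshold means fewer but larger collections.
gc.set_threshold(1000, 15, 15)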
That concludes this analysis of how Python's garbage collection mechanism works; hopefully it gives you a clearer picture of reference counting, mark-and-sweep, and generational collection.