How to understand garbage collection in Python

2025-04-26 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/02 Report--

This article introduces how to understand garbage collection in Python. Many people have questions about it in day-to-day work, so the editor consulted various materials and put together a simple, practical walkthrough. Hopefully it helps answer those questions. Let's get started!

Preface

In Python, everything is an object, and all variable assignments follow the object reference mechanism. While a program runs, it needs space in memory to store the temporary variables generated at run time; once the computation is complete, the results are written out to persistent storage. If the amount of data is too large, poor memory management can easily lead to OOM (out of memory), and the program may be killed by the operating system.

For a server, memory management is even more important, because otherwise memory leaks are easy to cause. A leak here does not mean that the memory has an information security problem and is being used by a malicious program; it means that the program itself is poorly designed and fails to release memory that is no longer in use. Nor does a memory leak mean that memory has physically disappeared. It means that after the code allocates a certain amount of memory, a design error makes it lose control of that memory, which is then wasted. In other words, that memory is outside the control of the gc.

Reference counting

Because everything in Python is an object, every variable you see is essentially a pointer to an object. When an object is no longer referenced, that is, when its reference count (pointer count) is 0, the object can never be reached again, so it becomes garbage and needs to be collected. Put simply, no variable points to it any more.
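The rule above can be observed directly with a weak reference, which watches an object without keeping it alive. This is a minimal sketch that assumes CPython; `Node`, `alias` and `probe` are hypothetical names introduced only for the illustration.

```python
import weakref


class Node:
    """Hypothetical placeholder class; plain lists cannot be weakly referenced."""


obj = Node()
alias = obj                      # a second strong reference to the same object
probe = weakref.ref(obj)         # a weak reference does not keep the object alive

del obj                          # one strong reference (alias) still remains
alive_with_alias = probe() is not None

del alias                        # the last strong reference is gone
alive_after_del = probe() is not None

print(alive_with_alias, alive_after_del)
```

On CPython this prints `True False`: the object survives while any name still points to it and is reclaimed as soon as the last one is removed.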

    import os
    import psutil

    # show the amount of memory used by the current Python process
    def show_memory_info(hint):
        pid = os.getpid()
        p = psutil.Process(pid)

        info = p.memory_full_info()
        memory = info.uss / 1024. / 1024
        print('{} memory used: {} MB'.format(hint, memory))

Suppose func() calls show_memory_info('initial'), builds a list a of ten million integers as a local variable, and then calls show_memory_info('after a created'), with show_memory_info('finished') called once func() has returned. When func() is called, memory usage quickly rises to about 433 MB after list a is created; after the call ends, memory returns to normal. This is because the list a declared inside the function is a local variable, and the reference held by a local variable is dropped when the function returns. At that point the reference count of the list object is 0, so Python garbage collects it and the large amount of memory it occupied is released.
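The same effect can be sketched with only the standard library. The example below uses `tracemalloc` instead of psutil, so the numbers are bytes allocated by Python rather than process memory; `build_list` and the list size are assumptions chosen for the illustration.

```python
import tracemalloc


def build_list():
    # 'data' is a local variable; its only reference disappears on return
    data = [i for i in range(200_000)]
    current, _peak = tracemalloc.get_traced_memory()
    return current               # return only the byte count, not the list


tracemalloc.start()
baseline, _ = tracemalloc.get_traced_memory()
inside = build_list()            # traced bytes while the list is alive
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"baseline={baseline} inside={inside} after={after}")
```

The value measured inside the function should be several megabytes above the baseline, and the value measured after the call should fall back close to it.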

    def func():
        show_memory_info('initial')
        global a
        a = [i for i in range(10000000)]
        show_memory_info('after a created')

    func()
    show_memory_info('finished')

    # output
    # initial memory used: 48.88671875 MB
    # after a created memory used: 433.94921875 MB
    # finished memory used: 433.94921875 MB

In the new code, global a declares a as a global variable. Even after the function returns, the reference to the list still exists, so the object is not garbage collected and still takes up a lot of memory. Similarly, if we return the generated list and receive it in the main program, the reference still exists, garbage collection is not triggered, and a lot of memory is still occupied:

    def func():
        show_memory_info('initial')
        a = [i for i in range(10000000)]
        show_memory_info('after a created')
        return a

    a = func()
    show_memory_info('finished')

    # output
    # initial memory used: 47.96484375 MB
    # after a created memory used: 434.515625 MB
    # finished memory used: 434.515625 MB

So how can you see how many times a variable has been referenced? Through sys.getrefcount.

    import sys

    a = []

    # two references: one from a, one from getrefcount
    print(sys.getrefcount(a))

    def func(a):
        # four references: a, the function call stack, the function argument, and getrefcount
        print(sys.getrefcount(a))

    func(a)

    # two references: one from a, one from getrefcount; the func call no longer exists
    print(sys.getrefcount(a))

    # output
    # 2
    # 4
    # 2

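When a function call is involved, two extra references are added: 1. the function call stack; 2. the function call itself (the argument). The exact numbers are CPython implementation details and can differ between versions.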
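Because the absolute values reported by sys.getrefcount depend on the interpreter version, it can be easier to look at differences. The sketch below stores extra references in a list and compares counts before and after; `value` and `holder` are hypothetical names.

```python
import sys

value = []
before = sys.getrefcount(value)

holder = [value, value]          # the list stores two more references to value
with_holder = sys.getrefcount(value)

holder.clear()                   # dropping them brings the count back down
after_clear = sys.getrefcount(value)

print(with_holder - before, after_clear - before)
```

This prints `2 0`: each slot in the container adds one reference, and clearing the container removes them again.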

From this you can see that Python does not require you to free memory yourself the way C does. However, Python also provides a way to trigger collection manually: gc.collect().

    import gc

    show_memory_info('initial')

    a = [i for i in range(10000000)]

    show_memory_info('after a created')

    del a
    gc.collect()

    show_memory_info('finish')
    print(a)

    # output
    # initial memory used: 48.1015625 MB
    # after a created memory used: 434.3828125 MB
    # finish memory used: 48.33203125 MB
    #
    # NameError  Traceback (most recent call last)
    #      11
    #      12 show_memory_info('finish')
    # ---> 13 print(a)
    #
    # NameError: name 'a' is not defined

So far, Python's garbage collection mechanism looks very simple: as soon as an object's reference count reaches 0, it is collected. But is a reference count of 0 a necessary and sufficient condition for collection?

Circular references

If there are two objects that reference each other and are no longer referenced by other objects, should they be garbage collected?

    def func():
        show_memory_info('initial')
        a = [i for i in range(10000000)]
        b = [i for i in range(10000000)]
        show_memory_info('after a, b created')
        a.append(b)
        b.append(a)

    func()
    show_memory_info('finished')

    # output
    # initial memory used: 47.984375 MB
    # after a, b created memory used: 822.73828125 MB
    # finished memory used: 821.73046875 MB

The results clearly show that they are not collected, even though, from the program's point of view, a and b are local variables that no longer exist once the function has ended. Because they reference each other, their reference counts never drop to zero. There are two ways to deal with this:

1. Fix the code logic so that this kind of circular reference does not occur.
2. Trigger collection manually.

    import gc

    def func():
        show_memory_info('initial')
        a = [i for i in range(10000000)]
        b = [i for i in range(10000000)]
        show_memory_info('after a, b created')
        a.append(b)
        b.append(a)

    func()
    gc.collect()
    show_memory_info('finished')

    # output
    # initial memory used: 49.51171875 MB
    # after a, b created memory used: 824.1328125 MB
    # finished memory used: 49.98046875 MB
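The behaviour can also be checked without measuring memory. The sketch below builds a two-object cycle and uses a weak reference to see when it is actually reclaimed. It assumes CPython, and `Node` and `probe` are hypothetical names; automatic collection is switched off for the duration so that only the explicit gc.collect() call can free the cycle.

```python
import gc
import weakref


class Node:
    """Hypothetical container used only for this illustration."""

    def __init__(self):
        self.other = None


gc.disable()                     # keep the automatic collector out of the way
gc.collect()                     # start from a clean state

a, b = Node(), Node()
a.other, b.other = b, a          # a and b now reference each other
probe = weakref.ref(a)

del a, b                         # no outside references remain
alive_before_collect = probe() is not None   # counts never reached zero

gc.collect()                     # the cycle detector finds and frees the pair
alive_after_collect = probe() is not None

gc.enable()
print(alive_before_collect, alive_after_collect)
```

On CPython this prints `True False`: reference counting alone leaves the pair alive, and the cycle collector reclaims it.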

To handle circular references, Python has its own automatic garbage collection algorithms: 1. mark-sweep; 2. generational collection.

Mark-sweep

The steps of mark-sweep can be summarized as follows:

1. GC marks all "active objects".
2. GC reclaims the "inactive objects" that were not marked.

So how does Python determine what an inactive object is? The concept of unreachability comes from graph theory. In a directed graph, if we start traversing from a node and mark every node we pass through, then once the traversal is finished, all nodes that were not marked are called unreachable nodes. These nodes serve no purpose, so naturally they should be garbage collected.

However, traversing the whole graph every time would be a huge waste of performance. Therefore, in Python's garbage collection implementation, mark-sweep maintains its data structure as doubly linked lists and only considers container objects, because only container objects (list, dict, tuple, instance) can form circular references.
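Which objects the cycle collector considers can be inspected with gc.is_tracked. The sketch below checks a few representative values; `Instance` is a hypothetical class, and the results for other types (tuples and dicts in particular) depend on their contents and on the CPython version, so they are left out.

```python
import gc


class Instance:
    """Hypothetical user-defined class for illustration."""


tracked = {
    "int": gc.is_tracked(42),               # atomic value, cannot hold references
    "str": gc.is_tracked("text"),           # atomic value as well
    "list": gc.is_tracked([]),              # container, tracked by the collector
    "instance": gc.is_tracked(Instance()),  # instances can reference other objects
}
print(tracked)
```

The integer and the string report False, while the list and the instance report True.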

In the figure, the small black circle represents a global variable, that is, a root object. Object 1 can be reached directly from the small black circle, so it is marked; objects 2 and 3 can be reached indirectly, so they are marked as well; objects 4 and 5 cannot be reached. Objects 1, 2 and 3 are therefore active objects, while 4 and 5 are inactive objects that will be reclaimed by GC.
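The reachability idea described above can be sketched as a small graph traversal. This is a toy illustration of the concept, not CPython's actual implementation, and the edges between the objects are assumptions: the text only states which objects are reachable.

```python
def mark_and_sweep(graph, roots):
    """Return (reachable, unreachable) node sets for a directed graph."""
    marked = set()
    stack = list(roots)
    while stack:                          # mark phase: walk outward from the roots
        node = stack.pop()
        if node in marked:
            continue
        marked.add(node)
        stack.extend(graph.get(node, ()))
    unreachable = set(graph) - marked     # sweep phase: whatever was never marked
    return marked, unreachable


# Assumed edges: 1 -> 2 -> 3 hang off the root, while 4 and 5 only
# reference each other.
graph = {1: [2], 2: [3], 3: [], 4: [5], 5: [4]}
active, garbage = mark_and_sweep(graph, roots=[1])
print(sorted(active), sorted(garbage))
```

This prints `[1, 2, 3] [4, 5]`: objects 4 and 5 keep each other's reference counts above zero but are never marked, so they are the ones to reclaim.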

Generational collection

Generational collection trades space for time. Python divides objects into different sets according to how long they have survived, and each set is called a generation. There are three generations: the young generation (generation 0), the middle generation (generation 1) and the old generation (generation 2). They correspond to three linked lists, and the frequency of garbage collection decreases as the survival time of the objects increases.

Newly created objects are placed in the young generation. When the number of objects in the young generation's list reaches its upper limit (that is, when the number of newly created objects minus the number of deleted objects reaches the corresponding threshold), Python's garbage collection is triggered: objects that can be reclaimed are reclaimed, and the survivors are moved to the middle generation, and so on. Objects in the old generation are those that have survived the longest, possibly for the whole lifetime of the program. Generational collection is built on top of the mark-sweep technique.

Generational collection rests on the idea that newly created objects are more likely to become garbage, while objects that have already survived a long time are more likely to keep surviving. This saves a lot of computation and improves Python's performance.

So, to answer the earlier question: a reference count of 0 is a sufficient but not a necessary condition for an object to be collected; objects caught in circular references are collected as well.
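The thresholds and counters mentioned above are exposed by the gc module. The sketch below only reads them and runs one collection of the youngest generation; the concrete numbers vary between CPython versions, so none are assumed here.

```python
import gc

thresholds = gc.get_threshold()   # collection thresholds, one per generation
counts = gc.get_count()           # current counters, one per generation
print("thresholds:", thresholds)
print("counts:", counts)

# Collect only the youngest generation; the return value is the number
# of unreachable objects that were found.
found = gc.collect(0)
print("unreachable objects found in generation 0:", found)
```

gc.set_threshold accepts the same three values if the collection frequency needs tuning.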

Debug

You can use objgraph to debug a program. The author has not yet read its official documentation carefully, so it is only mentioned here for reference. Two of its functions are very useful: 1. show_refs() 2. show_backrefs().
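When objgraph is not installed, the standard gc module offers a rough, text-only way to ask similar questions. The sketch below is not objgraph's API; it uses gc.get_referrers and gc.get_referents, with `target` and `holder` as hypothetical names.

```python
import gc

target = ["payload"]
holder = {"key": target}          # holder references target

# Which objects point at target? (roughly the question show_backrefs answers)
back = gc.get_referrers(target)
holder_is_referrer = any(r is holder for r in back)

# Which objects does holder point at? (roughly the question show_refs answers)
forward = gc.get_referents(holder)
target_is_referent = any(r is target for r in forward)

print(holder_is_referrer, target_is_referent)
```

This prints `True True`. The referrer list usually contains other entries too, such as the namespace that holds the name `target`.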

This concludes the discussion of how to understand garbage collection in Python. Hopefully it has answered your questions. Combining theory with practice is the best way to learn, so go and try it!
