2025-04-04 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains "how to realize the garbage collection mechanism in python". The content is simple and clear, and easy to learn and understand. Follow along to see how garbage collection works in Python.
Python relies mainly on reference counting, supplemented by two mechanisms: mark-and-sweep and generational collection.
Reference counting
Because everything in Python is an object, every variable you see is essentially a pointer to an object. When an object is no longer referenced, that is, when its reference count (pointer count) reaches 0, the object can never be reached again, so it becomes garbage and needs to be reclaimed. Put simply, no variable points to it any more.
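The rule above can be observed directly with a weak reference, which watches an object without adding to its reference count. This is a minimal sketch, assuming CPython (where reclamation happens as soon as the count reaches 0); the class name Payload and the variable names are hypothetical and exist only for this illustration.

```python
import weakref


class Payload:
    """Hypothetical placeholder class; plain lists cannot be weakly referenced."""


obj = Payload()
probe = weakref.ref(obj)      # a weak reference does not raise the reference count
alias = obj                   # second strong reference to the same object

del obj                       # one strong reference (alias) still remains
alive_with_alias = probe() is not None

del alias                     # the last strong reference is gone
alive_after_del = probe() is not None

print(alive_with_alias, alive_after_del)
```

The object survives the first del because alias still points to it, and disappears only once no variable refers to it.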
import os
import psutil

# Show the amount of memory used by the current Python process
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)
    info = p.memory_full_info()
    memory = info.uss / 1024. / 1024
    print('{} memory used: {} MB'.format(hint, memory))
Consider a function func() that calls show_memory_info, creates a large list a as a local variable, and then calls show_memory_info again. When func() is called and the list a is created, memory usage quickly rises to 433 MB; after the call ends, memory returns to normal. This is because the list a declared inside the function is a local variable, and the reference held by a local variable is dropped when the function returns. At that point the reference count of the list object is 0, so Python reclaims it and the large amount of memory it used is released.
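The behaviour described above can be sketched with the standard-library tracemalloc module in place of psutil. This is an illustration under stated assumptions, not the article's original listing: the function name build_local_list and the list size are hypothetical, and tracemalloc reports bytes allocated by Python rather than process memory.

```python
import tracemalloc


def build_local_list(n):
    """Build a large list as a local variable and report traced memory while it is alive."""
    data = [i for i in range(n)]
    current, _peak = tracemalloc.get_traced_memory()
    return current            # only the number is returned, not the list


tracemalloc.start()
before, _ = tracemalloc.get_traced_memory()
during = build_local_list(1_000_000)
after, _ = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(f"before: {before} B, during: {during} B, after: {after} B")
```

Because the list is referenced only by the local name data, the traced memory should fall back close to its starting level once the function has returned.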
def func():
    show_memory_info('initial')
    global a
    a = [i for i in range(10000000)]
    show_memory_info('after a created')

func()
show_memory_info('finished')

# Output
# initial memory used: 48.88671875 MB
# after a created memory used: 433.94921875 MB
# finished memory used: 433.94921875 MB
In the new code, global a declares a as a global variable. Even after the function returns, the reference to the list still exists, so the object is not garbage collected and still takes up a lot of memory. Similarly, if we return the generated list and receive it in the main program, the reference still exists, garbage collection is not triggered, and a lot of memory remains occupied:
def func():
    show_memory_info('initial')
    a = [i for i in range(10000000)]
    show_memory_info('after a created')
    return a

a = func()
show_memory_info('finished')

# Output
# initial memory used: 47.96484375 MB
# after a created memory used: 434.515625 MB
# finished memory used: 434.515625 MB
So how can you see how many references an object has? Use sys.getrefcount:
import sys

a = []

# Two references: one from a, one from getrefcount
print(sys.getrefcount(a))

def func(a):
    # Four references: a, the function call stack, the function argument, and getrefcount
    print(sys.getrefcount(a))

func(a)

# Two references: one from a, one from getrefcount; the references from the func call no longer exist
print(sys.getrefcount(a))

# Output
# 2
# 4
# 2
If a function call is involved, the count increases by two: 1. the function call stack; 2. the function argument. The exact numbers printed can differ between CPython versions.
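Since the absolute numbers depend on the interpreter version, it can be easier to look at how the count changes. The following sketch, assuming CPython, compares sys.getrefcount before and after adding references; the names target, alias, and holder are hypothetical.

```python
import sys

target = []
base = sys.getrefcount(target)

alias = target                       # a second name bound to the same list
with_alias = sys.getrefcount(target)

holder = {"key": target}             # a container that also holds the list
with_holder = sys.getrefcount(target)

del alias, holder                    # drop both extra references
restored = sys.getrefcount(target)

print(with_alias - base, with_holder - base, restored - base)
```

Each new name or container entry adds one reference, and deleting them brings the count back to where it started.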
From this we can see that Python does not require you to free memory by hand the way C does. Python does, however, provide a way to release memory manually: delete the reference with del and then call gc.collect().
import gc

show_memory_info('initial')
a = [i for i in range(10000000)]
show_memory_info('after a created')

del a
gc.collect()

show_memory_info('finish')
print(a)

# Output
# initial memory used: 48.1015625 MB
# after a created memory used: 434.3828125 MB
# finish memory used: 48.33203125 MB
# NameError Traceback (most recent call last)
#      11
#      12 show_memory_info('finish')
# ---> 13 print(a)
# NameError: name 'a' is not defined
So far, Python's garbage collection mechanism seems very simple: as soon as an object's reference count is 0, it is collected. But is a reference count of 0 a necessary and sufficient condition for collection?
Circular references
If there are two objects that reference each other and are no longer referenced by other objects, should they be garbage collected?
def func():
    show_memory_info('initial')
    a = [i for i in range(10000000)]
    b = [i for i in range(10000000)]
    show_memory_info('after a, b created')
    a.append(b)
    b.append(a)

func()
show_memory_info('finished')

# Output
# initial memory used: 47.984375 MB
# after a, b created memory used: 822.73828125 MB
# finished memory used: 821.73046875 MB
The results show that the memory is not released when the function returns. By the time the function ends, a and b, as local variables, no longer exist from the program's point of view. But because they reference each other, their reference counts are not 0. How can this be avoided?
1. Restructure the code to avoid this kind of circular reference
2. Trigger collection manually
import gc

def func():
    show_memory_info('initial')
    a = [i for i in range(10000000)]
    b = [i for i in range(10000000)]
    show_memory_info('after a, b created')
    a.append(b)
    b.append(a)

func()
gc.collect()
show_memory_info('finished')

# Output
# initial memory used: 49.51171875 MB
# after a, b created memory used: 824.1328125 MB
# finished memory used: 49.98046875 MB
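The same effect can be checked without measuring process memory. The sketch below, assuming CPython, builds a two-object cycle and uses a weak reference to see whether the objects are still alive; the class Node and the function make_cycle are hypothetical names. Automatic collection is paused with gc.disable() only so that the result does not depend on when the collector happens to run.

```python
import gc
import weakref


class Node:
    """Hypothetical container used only for this illustration."""

    def __init__(self):
        self.other = None


def make_cycle():
    a, b = Node(), Node()
    a.other, b.other = b, a          # a and b now reference each other
    return weakref.ref(a)            # weak probe; it does not keep a alive


gc.disable()                         # pause automatic cycle detection for a deterministic demo
try:
    probe = make_cycle()
    alive_before = probe() is not None   # the cycle keeps both counts above zero
    found = gc.collect()                 # run the cycle detector by hand
    alive_after = probe() is not None
finally:
    gc.enable()

print(alive_before, found >= 2, alive_after)
```

gc.collect() returns the number of unreachable objects it found, which here includes at least the two Node instances.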
For circular references, Python has its own automatic garbage collection algorithms:
1. Mark-and-sweep
2. Generational collection
Mark-and-sweep
The steps of mark-and-sweep can be summarized as follows:
1. GC marks all "active objects".
2. GC reclaims the "inactive objects" that were not marked.
How does Python determine which objects are inactive? The concept of unreachability comes from graph theory. In a directed graph, if we traverse from one node and mark every node we pass through, then the nodes that remain unmarked after the traversal are called unreachable nodes. Keeping these nodes around serves no purpose, so they should be garbage collected.
However, traversing the whole graph every time would be a huge waste of performance. Therefore, in Python's garbage collection implementation, mark-and-sweep keeps its bookkeeping in doubly linked lists and only considers container objects (only container objects such as list, dict, tuple, and instance can form circular references).
[Figure: Python's garbage collection]
In the figure, the small black circle represents a global variable, that is, a root object. Object 1 can be reached directly from the small black circle, so it is marked; objects 2 and 3 can be reached indirectly, so they are marked as well; objects 4 and 5 cannot be reached. Therefore 1, 2, and 3 are active objects, while 4 and 5 are inactive objects and will be reclaimed by GC.
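The reachability idea in the figure can be written out as a toy model. This is only a sketch of the general mark-and-sweep principle on a plain dictionary, not CPython's actual implementation; the function mark_and_sweep and the exact edges between objects 1, 2, and 3 are assumptions made for illustration.

```python
def mark_and_sweep(graph, roots):
    """Return (reachable, unreachable) node sets for a directed reference graph."""
    marked = set()
    stack = list(roots)
    while stack:                         # mark phase: walk everything reachable from the roots
        node = stack.pop()
        if node in marked:
            continue
        marked.add(node)
        stack.extend(graph.get(node, ()))
    unreachable = set(graph) - marked    # sweep phase: whatever was never marked
    return marked, unreachable


# Assumed edges for the figure: root -> 1, 1 -> 2 -> 3, and 4 <-> 5 referencing each other.
graph = {"root": [1], 1: [2], 2: [3], 3: [], 4: [5], 5: [4]}
active, garbage = mark_and_sweep(graph, roots=["root"])
print(sorted(garbage))
```

Objects 4 and 5 reference each other, so their reference counts never reach 0, yet neither can be reached from the root, which is why the sweep identifies them as garbage.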
Generational collection
Generational collection trades space for time. Python divides memory into different sets according to how long objects have survived, and each set is called a generation. There are three generations: the young generation (generation 0), the middle generation (generation 1), and the old generation (generation 2). They correspond to three linked lists, and the longer an object has survived, the less often its generation is collected.
Newly created objects are placed in the young generation. When the count for the young generation reaches its limit (that is, when the number of newly allocated objects minus the number of deallocated objects reaches the threshold), garbage collection is triggered: objects that can be reclaimed are reclaimed, and those that survive are moved to the middle generation, and so on. Objects in the old generation are the ones that have survived longest, possibly for the whole lifetime of the program. Generational collection is built on top of mark-and-sweep.
The underlying idea is that new objects are more likely to become garbage, while objects that have survived longer are more likely to keep surviving. This saves a lot of computation and improves Python's performance.
So, to answer the earlier question: a reference count of 0 is a sufficient but not a necessary condition for collection, because objects caught in circular references are collected as well.
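The thresholds and counters mentioned above can be inspected through the standard gc module. This is a minimal sketch, assuming CPython; the printed values are not shown here because they vary between interpreter versions and program states, and the precise meaning of the counters has shifted between releases.

```python
import gc

thresholds = gc.get_threshold()   # collection thresholds, one per generation
counts = gc.get_count()           # current counters, one per generation

print("thresholds:", thresholds)
print("counts:", counts)

collected = gc.collect(0)         # collect only the youngest generation
print("unreachable objects found in generation 0:", collected)
```

gc.set_threshold() can adjust these limits, although the defaults are usually left alone unless profiling shows that collection frequency is a problem.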
Debugging
You can use objgraph to debug reference problems. I have not yet read its official documentation carefully, so I only mention it here for reference. Two of its functions are very useful: 1. show_refs() 2. show_backrefs()
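For a rough idea of what "references" and "back references" mean, the standard library offers gc.get_referents() and gc.get_referrers(). The sketch below uses these instead of objgraph, so it is a stand-in for the concept rather than an example of the objgraph API; the names inner and outer are hypothetical.

```python
import gc

inner = ["payload"]
outer = {"child": inner}            # outer refers to inner

# Forward direction: objects that outer refers to (similar in spirit to show_refs).
refers_to_inner = any(obj is inner for obj in gc.get_referents(outer))

# Backward direction: objects that refer to inner (similar in spirit to show_backrefs).
referred_by_outer = any(obj is outer for obj in gc.get_referrers(inner))

print(refers_to_inner, referred_by_outer)
```

Following back references from an object that should have been freed is a common way to find out what is still holding on to it.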
Thank you for reading. That is the content of "how to realize the garbage collection mechanism in python". After studying this article, you should have a deeper understanding of how garbage collection works in Python; the specifics still need to be verified in practice.