How does Python garbage collection work?

2025-01-20 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how Python garbage collection works. The content is simple and clear, so follow along step by step as we study how Python reclaims memory.

What is garbage collection?

You have probably heard of garbage collection (GC) already, but what exactly is it? Wikipedia defines garbage collection as an automatic memory management mechanism in computer science: when dynamically allocated memory is no longer needed, it should be released so that it can be reused. This kind of memory management is called garbage collection. In C and C++, programmers manage memory themselves. They are free to allocate and release memory at will, but this freedom makes memory leaks and dangling pointers very easy to introduce. High-level languages such as Java and Python instead use a garbage collection mechanism to manage memory automatically. A garbage collector does two things: ① it finds the garbage in memory that is no longer in use; ② it clears that garbage and frees the memory for other objects to use.

Python is an interpreted language with easy-to-understand syntax: we can assign values to variables directly without declaring their types. Determining variable types and allocating and releasing memory are all handled automatically by the Python interpreter at run time, so we do not have to manage them ourselves. Automatic memory management greatly reduces the coding burden and lets developers focus on business logic, which is one of the important reasons for Python's success. Next, let's take a look at Python's memory management.

Garbage collection mechanism in Python

Reference count

Everything in Python is an object: every value a variable refers to is an instance of some class. At the core of each object is the C structure PyObject, which contains a reference counter, ob_refcnt. The interpreter updates ob_refcnt as references are created and destroyed, so it always reflects the number of references to the object. When an object's reference count drops to 0, the object has become garbage: it is reclaimed and the memory it uses is released immediately.

    typedef struct _object {
        Py_ssize_t ob_refcnt;           /* reference count */
        struct _typeobject *ob_type;    /* pointer to the object's type */
    } PyObject;

The following situations increase the reference count by one:

① The object is created and bound to a name, e.g. a = 5

② The object is referenced by another name, e.g. b = a

③ The object is passed to a function as an argument (note that a function call can add two references, one from the function call stack and one from the function argument)

④ The object is stored as an element in a container (for example, in a list)

The following situations decrease the reference count by one:

① A name referring to the object is explicitly destroyed, e.g. del a

② A name referring to the object is assigned a new object

③ A name referring to the object goes out of scope, e.g. a local variable when its function returns

④ The container holding the object is destroyed, or the object is removed from the container

We can get the current reference count of an object through getrefcount() in the sys module (note that calling getrefcount() itself temporarily increases the reference count by one, because the object is passed to it as an argument):

    import sys

    a = [1, 2, 3]
    # Output is 2: one reference from the name a, one from the getrefcount() call
    print(sys.getrefcount(a))

    def func(a):
        # Output is 4 in the article's run: the definition of a, the Python
        # function call stack, the function argument, and getrefcount().
        # The exact number can differ between CPython versions.
        print(sys.getrefcount(a))

    func(a)

    # Output is 2 again: the references added by the call to func() are gone
    print(sys.getrefcount(a))
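The rules for increasing and decreasing the count listed earlier can be observed in the same way. The following is a minimal sketch, assuming CPython; the helper refcount_deltas and its variable names are made up for illustration. It reports how far the count has moved from its starting value after each step, so the extra reference added by getrefcount() itself cancels out.

```python
import sys


def refcount_deltas():
    """Return the change in reference count, relative to the start, after each step."""
    obj = []                                   # the object being observed
    base = sys.getrefcount(obj)                # starting count (includes the call itself)

    alias = obj                                # the object is referenced by another name
    after_alias = sys.getrefcount(obj) - base

    box = [obj]                                # the object is stored in a container
    after_store = sys.getrefcount(obj) - base

    del alias                                  # a name is explicitly deleted
    after_del = sys.getrefcount(obj) - base

    box.clear()                                # the object is removed from its container
    after_clear = sys.getrefcount(obj) - base

    return after_alias, after_store, after_del, after_clear


deltas = refcount_deltas()
print(deltas)  # (1, 2, 1, 0) on CPython
```

Working with differences rather than absolute counts keeps the sketch independent of how many temporary references a particular interpreter version adds during a call.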

Let's take a look at it from the perspective of using memory:

    import os
    import psutil

    def show_memory_info(hint):
        """Show the amount of memory used by the current Python process.

        :param hint: label printed in front of the figure
        :return: None
        """
        pid = os.getpid()
        p = psutil.Process(pid)
        info = p.memory_full_info()
        memory = info.rss / 1024 / 1024    # resident set size in MB
        print("{} current process memory usage: {} MB".format(hint, memory))

    def func():
        show_memory_info("initial")
        a = [i for i in range(9999999)]
        show_memory_info("after creating a")

    func()
    show_memory_info("end")

The output is as follows:

    initial current process memory usage: 12.125 MB
    after creating a current process memory usage: 205.15625 MB
    end current process memory usage: 12.87890625 MB

As you can see, the process initially uses 12.125 MB. When func() is called and creates the list a, memory usage quickly rises to 205.15625 MB, and after the function returns it drops back to about its initial level. This is because the list a declared inside the function is a local variable: when the function returns, the reference held by the local variable is removed, the reference count of the list object drops to 0, and Python reclaims it, so the large amount of memory used before is released.
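The same effect can be checked without measuring memory. The sketch below assumes CPython; Payload and create_and_return_probe are hypothetical names used only for this illustration. It uses a weak reference as a probe, because a weak reference does not raise the reference count, and it switches the cyclic collector off to show that reference counting alone frees the object.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large local object such as the list a."""


def create_and_return_probe():
    payload = Payload()            # the local variable holds the only strong reference
    return weakref.ref(payload)    # a weak reference does not raise the count


gc.disable()                       # rule out the cyclic collector for this demo
try:
    probe = create_and_return_probe()
    freed_on_return = probe() is None   # a dead weak reference returns None
finally:
    gc.enable()

print(freed_on_return)  # True on CPython
```

A class instance is used instead of a list because built-in lists do not support weak references.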

Circular reference

What is a circular reference? To put it simply, two objects refer to each other. Look at the following program:

    # Reuses show_memory_info() from the previous example
    def func2():
        show_memory_info("initial")
        a = [i for i in range(10000000)]
        b = [x for x in range(10000001, 20000000)]
        a.append(b)    # a now refers to b
        b.append(a)    # b now refers to a
        show_memory_info("after creating a, b")

    func2()
    show_memory_info("end")

The output is as follows:

    initial current process memory usage: 12.14453125 MB
    after creating a, b current process memory usage: 396.6875 MB
    end current process memory usage: 396.96875 MB

In this program a and b refer to each other. Both are local variables, so after func2() returns, a and b no longer exist as far as the program is concerned, yet the output shows that their memory is still occupied. Why? Because the two lists refer to each other, the reference count of each never drops to zero.

If circular references occurred in a production environment and there were no other garbage collection mechanism, the memory occupied by the program would keep growing over a long run, and if nothing were done about it, it would eventually exhaust the server's memory.

If we have to use circular references, we can explicitly call gc.collect() to start garbage collection:

    import gc

    # Reuses show_memory_info() from the earlier example
    def func2():
        show_memory_info("initial")
        a = [i for i in range(10000000)]
        b = [x for x in range(10000001, 20000000)]
        a.append(b)
        b.append(a)
        show_memory_info("after creating a, b")

    func2()
    gc.collect()    # explicitly run a full garbage collection
    show_memory_info("end")

The output is as follows:

    initial current process memory usage: 12.29296875 MB
    after creating a, b current process memory usage: 396.69140625 MB
    end current process memory usage: 12.95703125 MB
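The behaviour of a reference cycle can also be observed directly on a small scale. This is a minimal sketch, assuming CPython; Node and make_cycle are hypothetical names introduced for the illustration. Automatic collection is disabled during the experiment so that only the explicit gc.collect() call can reclaim the cycle.

```python
import gc
import weakref


class Node:
    """Hypothetical container object that can point at another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    a = Node()
    b = Node()
    a.other = b                    # a refers to b
    b.other = a                    # b refers to a, closing the cycle
    return weakref.ref(a)          # probe that does not keep a alive


gc.disable()                       # keep automatic collection from interfering
try:
    probe = make_cycle()
    alive_after_return = probe() is not None    # the cycle keeps both objects alive
    found = gc.collect()                        # run a full collection by hand
    alive_after_collect = probe() is not None
finally:
    gc.enable()

print(alive_after_return, alive_after_collect)  # True False on CPython
```

The value returned by gc.collect() is the number of unreachable objects the collector found, which here includes at least the two Node instances.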

Reference counting is efficient, simple, and immediate: as soon as an object's reference count drops to zero, its memory is released right away, without waiting for a particular moment as other mechanisms do. The cost of reclaiming memory is spread across normal execution rather than concentrated in long pauses, so the program runs smoothly. However, reference counting also has some disadvantages. The common ones are:

① Although the logic is simple, maintaining the counts has a cost. Each object needs extra space to store its reference count, and the count must be updated on every reference change, which consumes some resources.

② Circular references. This is the fatal weakness of reference counting: it cannot solve the problem on its own, so it must be supplemented by other garbage collection algorithms.

In fact, Python uses the mark-sweep algorithm and generational collection to automatically collect objects involved in circular references.

Mark-sweep for circular references

Python uses the mark-sweep (Mark and Sweep) algorithm to solve the circular references that container objects can create. Note that only container objects, such as lists, dictionaries, tuples, and instances of user-defined classes, can form circular references. Simple types such as numbers and strings cannot. (As an optimization, tuples that contain only simple types are not considered by the mark-sweep algorithm.)
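Which objects the cyclic collector keeps an eye on can be inspected with gc.is_tracked() from the standard library. The following sketch assumes CPython; the samples dictionary and its labels are made up for the illustration. The result for the tuple is printed without an expected value, because when a tuple of simple values stops being tracked depends on the interpreter version.

```python
import gc

# Hypothetical sample values, one per kind of object discussed above
samples = {
    "int": 42,
    "str": "text",
    "list": [1, 2, 3],
    "tuple of ints": tuple([1, 2, 3]),
}

# gc.is_tracked() reports whether the cyclic collector currently tracks an object
tracked = {label: gc.is_tracked(value) for label, value in samples.items()}

for label, flag in tracked.items():
    print(label, flag)

# On CPython, ints and strs are not tracked, while lists are.
```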

The algorithm has two phases: in the first, the marking phase, the GC marks all active objects; in the second, it reclaims the inactive objects that were not marked.

So how does Python determine which objects are inactive?

For any collection of objects, we first build a copy table that stores each object's reference count, and then remove all references internal to the collection (an internal reference is one from an object in the collection to another object in the same collection). Each removed reference decrements the corresponding count in the copy table. After all internal references have been removed, the objects whose count in the copy table is still non-zero form the root set. Then the marking phase begins: starting from the root set, references are followed and gradually restored, increasing the counts in the copy table. In the end, the objects whose count in the copy table is still 0 are garbage and need to be collected. For example:

The nodes in the collection above have incoming references from outside (to a and to b) as well as outgoing references (c refers to an external object). On the right is the reference count table. We then remove all internal references:

The root set is then a and b. We start marking from a and b and restore the reference counts:

The counts of the nodes reachable from a and b are restored, and the nodes whose count is still zero (e and f) are garbage that is referenced only from within the collection. If all objects are treated as one collection, all garbage can be reclaimed at once; alternatively, the objects can be divided into smaller collections that are collected separately.
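The procedure described above can be simulated in a few lines of pure Python. This is only a sketch of the idea, not CPython's actual implementation: the function find_garbage, the graph, and the reference counts are all assumptions chosen to resemble the example (external references to a and b, c pointing at an external object, and e and f referring only to each other). Instead of literally restoring counts, the marking step records which nodes are reachable from the root set.

```python
def find_garbage(refcounts, edges):
    """Return the objects in the collection that are unreachable from outside.

    refcounts: name -> total reference count of the object
    edges:     name -> list of names that the object refers to
    """
    # Step 1: copy the reference counts so the real ones stay untouched.
    copy_table = dict(refcounts)

    # Step 2: remove internal references by decrementing the copied counts.
    for source, targets in edges.items():
        for target in targets:
            if target in copy_table:          # ignore references leaving the collection
                copy_table[target] -= 1

    # Step 3: objects with a non-zero copied count are referenced from outside.
    roots = [name for name, count in copy_table.items() if count > 0]

    # Step 4: mark everything reachable from the root set.
    reachable = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name in reachable:
            continue
        reachable.add(name)
        stack.extend(t for t in edges.get(name, []) if t in copy_table)

    # Step 5: whatever was never marked is garbage.
    return sorted(set(copy_table) - reachable)


# Hypothetical graph: a and b each have one external reference, both refer to c,
# c refers to an object outside the collection, and e and f refer to each other.
refcounts = {"a": 1, "b": 1, "c": 2, "e": 1, "f": 1}
edges = {"a": ["c"], "b": ["c"], "c": ["external"], "e": ["f"], "f": ["e"]}

garbage = find_garbage(refcounts, edges)
print(garbage)  # ['e', 'f']
```

After step 2 the copied counts are a: 1, b: 1, c: 0, e: 0, f: 0, so a and b form the root set; c is reached from them and survives, while e and f are never marked.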

However, this requires traversing the object graph on every collection, which is a significant performance cost for Python.

Generational collection

Generational collection trades space for time. Python groups objects into sets according to how long they have survived, and each set is called a generation. There are three generations: the young generation (generation 0), the middle generation (generation 1), and the old generation (generation 2). They correspond to three linked lists, and the longer objects have survived, the less frequently their generation is collected.

Newly created objects are placed in the young generation. When the number of objects in the young generation's list reaches its limit, that is, when the number of newly allocated objects minus the number of deallocated objects reaches the generation's threshold, garbage collection is triggered for that generation. Objects that can be reclaimed are reclaimed, and the survivors are moved to the middle generation, and so on. Objects in the old generation are therefore the longest-lived ones, and may even survive for the whole lifetime of the program. Generational collection is built on top of mark-sweep. It rests on the observation that newly created objects are the most likely to become garbage, while objects that have already survived for a long time are likely to keep surviving. Collecting old objects less often saves a lot of computation and improves Python's performance.
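The thresholds and counters behind generational collection can be inspected through the standard gc module. The sketch below assumes CPython and prints whatever the running interpreter reports; no particular values are shown because the defaults differ between versions.

```python
import gc

thresholds = gc.get_threshold()    # collection thresholds, one per generation
counts = gc.get_count()            # current counters, one per generation

print("thresholds:", thresholds)
print("counts:", counts)

# Collect only the young generation (generation 0). The return value is the
# number of unreachable objects that the collector found.
collected_young = gc.collect(0)
print("unreachable objects found in generation 0:", collected_young)
```

Passing 1 or 2 to gc.collect() selects an older generation, and calling it without an argument runs a full collection.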

Thank you for reading. That is how Python garbage collection works. After studying this article you should have a deeper understanding of it, though the details are best verified in practice.
