Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to compare garbage collection between Ruby and Python

2025-01-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/02 Report--

This article shows you how to compare Ruby and Python garbage collection, the content is concise and easy to understand, it will definitely brighten your eyes. I hope you can get something through the detailed introduction of this article.

Instead of posting the slide directly, I think it makes more sense to write it as a blog when I still remember it. In addition to the section on Python, it will compare how the garbage collector of MRI,JRuby and Rubinius works.

If algorithms and business logic are a person's brain, which organ of the human body is the garbage collection mechanism?

At the "Ruby Python" conference, I thought it would be interesting to compare the garbage collection mechanisms within Ruby and Python. Before we begin, why are we talking about garbage collection mechanisms? After all, this is one of the most fascinating and exciting topics, isn't it? How many of you are excited about the garbage collection mechanism? Many of the participants raised their hands!

Recently, there was a post in the Ruby community about how to improve the speed of unit testing by changing the settings of Ruby GC. This is great! It's nice to speed up testing by reducing the processing of GC garbage collection, but not really, GC doesn't really excite me. It's like a boring, boring technical post at first glance.

In fact, garbage collection is a fascinating topic: garbage collection algorithms are not only an important part of the history of computer science, but also a topic of cutting-edge research. For example, the "Mark Sweep" algorithm used by the MRI Ruby interpreter has been used for more than 50 years, while a garbage collection algorithm used in the Rubinius interpreter is another implementation in Ruby, which was only developed in 2008.

However, the name "garbage collection" is very inappropriate.

The heart of the application

The garbage collection system needs to do more than just "recycling". In fact, it mainly accomplishes three important tasks:

Allocate memory for new objects

Mark junk object

Reclaim the memory occupied by garbage objects

Imagine that your application is a person's body: all the elegant code you write, your business logic, your algorithms, will become the brain or intelligence of your application. Similarly, which part of the body do you think the garbage collector will become? (I got a lot of interesting answers from the audience: kidneys, white blood cells)

I think the garbage collector is the heart of an application. Just as the heart provides blood and nourishment for other parts of the body, the garbage collector provides memory and objects for programs to use. If your heart stops beating, you won't live for a few seconds. If the garbage collector stops running or slows down, just like a clogged artery, your program will slow down and die!

A simple example.

It is a good way to verify the theory by examples. Here is a simple class written in Python and Ruby, which we can use as a simple example:

At the same time, I was surprised that the two codes were so similar: there was little difference between Python and Ruby in expressing the same semantics. But is the internal implementation of the two languages the same?

Idle object linked list

In the above code, what will ruby do when we call Node.new (1)? That is, how does Ruby create a new object?

Surprisingly, Ruby does very little! In fact, before the code runs, the Ruby interpreter creates thousands of objects in advance and places them in a linked list called the "free list". A linked list of free objects (`free object list`) looks conceptually like the following:

Each white square can be thought of as a pre-created, unused Ruby object. When we call Node.new,Ruby, simply use an object and return its reference to us:

In the image above, the gray square on the left represents an active Ruby object that is used by our code, while the rest of the white square code does not use the object. (note: of course, the figure is a simplified implementation version. In fact, Ruby will use another object to save the string "ABC", a third object to save the definition of Node, and other objects to hold the abstract syntax number "AST" processed by the code, and so on.)

If we call Node.new,Ruby again, only a reference to another object is returned.

John McCarthy implemented garbage collection in Lisp in 1960.

The simple algorithm for using pre-created object lists was invented more than 50 years ago by the legendary computer scientist John McCarthy who implemented the original Lisp interpreter. Lisp is not only a functional programming language, but also contains many breakthroughs in computer science. One of them is to automatically manage memory through a garbage collection mechanism.

The standard Ruby, also known as "Matz's Ruby Interpreter" (MRI), uses a garbage collection algorithm similar to the Lisp implemented by John McCarthy in 1960. Just like Lisp, Ruby pre-creates the object and returns a reference to the object when you create the object or value.

Allocate object memory in Python

As we can see from the above, Ruby creates objects in advance and saves them in the free object linked list (free list). What about Python?

Of course, Python also uses free object lists internally for a variety of reasons (it uses linked list loops to determine objects), and Python often allocates memory for objects and values in a different way than Ruby.

Suppose we create a Node object using Python:

Python is different from Ruby. When you create an object, Python immediately applies to the operating system for memory allocation. Python actually implements its own memory allocation system, which provides another layer of abstraction on the operating system memory heap, but no events are discussed in depth today. )

When we create the second object, Python will again request more memory from the operating system:

It seems quite simple, and when we create the Python object, it will cost the event to apply for memory.

Ruby throws useless objects everywhere until the next garbage collection process

Ruby developers live in a messy room

Back at Ruby, as we allocate more and more objects, Ruby will continue to fetch pre-allocated objects from the free object list (free list) for us. As a result, the list of free objects will become shorter and shorter:

Or shorter:

Please note that I assigned a new value to N1 JKL Ruby will leave the old value behind. Node objects such as "ABC", "JKL" and "MNO" will remain in memory. Ruby does not immediately clean up old objects even though the program is no longer in use! Being a Ruby developer is like living in a messy room with clothes casually still on the floor and kitchen sinks full of dirty dishes. As a Ruby developer, you have to work in a bunch of junk objects.

When your program is no longer using any objects, Python will clean up immediately.

Python developers live in a neat house

Garbage collection mechanisms are very different in Python and Ruby, so let's go back to the first three examples of Node objects in Python:

Internally, whenever we create a new object, Python will save a number called reference technology in the C language structure corresponding to the object. Initially, Python set its value to 1.

A value of 1 indicates that each object has a pointer or reference to it. Suppose we create a new object, JKL:

As mentioned earlier, Python sets the reference count for "JKL" to 1. Also notice that we changed N1 to point to "JKL" and no longer refer to "ABC" while reducing the reference count of "ABC" to 0.

With this, the Python garbage collector will execute immediately! Whenever an object's reference count changes to zero, python will immediately release the object and return its memory to the operating system.

In the figure above, Python will reclaim the memory of the "ABC" object. Remember, Ruby just leaves old objects there and doesn't free up the memory they occupy.

This garbage collection algorithm, known as "reference counting", was invented by George Collins in 1960. Coincidentally, Uncle John McCarthy invented the "free object linked list algorithm" in the same year. As Mike Bernstein said at the Ruby Conference conference, "1960 belongs to the garbage collector." .

Being a Python developer is like living in a clean room. You know, your roommate is a bit of a cleanliness freak. He will clean everything you've ever used. He will wash the dirty dishes and cups as soon as you put them in the sink.

Now looking at another example, suppose we point N2 and N1 to the same node:

As you can see on the left side of the figure above, Python reduces the reference count of "DEF" and immediately recycles the "DEF" object. At the same time, you can see that because N1 and N2 both refer to the "JKL" object, its reference count becomes 2.

Tag recovery algorithm

Eventually, dirty rooms will pile up rubbish, and life can't always be like this. After running the Ruby program for a period of time, the list of free objects will eventually be exhausted.

All the pre-allocated objects in the image above are exhausted (squares are all gray), and there are no objects available on the linked list (there are no remaining white squares).

At this point, Ruby uses an algorithm called "tag recycling" invented by John McCarthy. First of all, Ruby will stop the execution of the program, and Ruby uses the method of "stop the world and then collect the garbage." Ruby then scans all pointers or references to objects and values. Similarly, Ruby iterates over pointers used inside the virtual machine. It marks the object that each pointer can reach. In the following figure, I use "M" to indicate these tags:

The above three "M" marked objects are active objects and are still used by our program. Within the Ruby interpreter, the data structure of "free bitmap" is usually used to save whether an object is marked:

Ruby keeps the "free bitmap" in a separate memory area so that it can make better use of the "copy-on-write" feature of Unix. For more information, please refer to my other article, "Why does Ruby2.0 's garbage collector excite us so much?"

If active objects are marked, the rest are junk objects, meaning they will no longer be used by the code. In the following figure, I use white squares to represent junk objects:

Next, Ruby will clean up unused junk objects and link them into the free object list (free list):

Within the interpreter, this process is so fast that Ruby doesn't actually copy objects from one place to another. Instead, Ruby forms a new linked list of junk objects and links them into the free object linked list (free list).

Now, when we create a new Ruby object, Ruby will return the collected garbage object for us. In Ruby, objects can be reborn and enjoy many lives!

Tag recovery algorithm vs. Citation counting algorithm

At first glance, Python's garbage collection algorithm is quite surprising to Ruby: if you can live in a clean and clean room, why live in a dirty room? Why does Ruby periodically force programs to stop running to clean up garbage instead of using Python's algorithm?

However, the implementation of reference counting is not as simple as it seems. Here are some reasons why many languages are reluctant to use reference counting algorithms like Python:

First of all, it is very difficult to achieve. Python must leave some space for each object to hold the reference count. This can lead to some minor memory overhead. But to make matters worse, a simple operation such as changing a variable or reference will lead to complex operations, and because Python needs to increase the count of one object and decrease the count of another object, it is possible to release an object.

Second, it slows down. Although Python runs smoothly with garbage collection (it cleans up as soon as you put the dirty dishes in the sink), it doesn't run very quickly. Python is always updating the reference count. And when you stop using a huge data structure, such as a sequence containing a large number of elements, Python must release many objects at a time. Reducing reference count can be a complex, recursive process.

*, it doesn't always work well. As you can see in the next part of my presentation, in my next post, reference counting does not deal with circular reference data structures, it contains circular references.

The above content is how to compare the garbage collection of Ruby and Python. Have you learned the knowledge or skills? If you want to learn more skills or enrich your knowledge reserve, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report