What happens if you disable Python GC?


This article takes a detailed look at what happens if you disable Python's garbage collector, based on Instagram's experience running Django in production. It walks through the investigation step by step, from the first hunch to the final fleet-wide rollout.

How do we run the Web server?

Instagram's web server runs Django in multi-process mode, with a master process that forks itself into dozens of worker processes to handle incoming user requests. For the application server we use uWSGI in pre-fork mode, to leverage memory sharing between the master process and the workers.
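
As a rough illustration of the pre-fork pattern (a minimal sketch of the general idea, not uWSGI's actual implementation):

import os

# The master builds its state once; forked workers initially share
# those memory pages with it via Copy-on-Write.
SHARED_STATE = list(range(1_000_000))  # created once in the master

NUM_WORKERS = 4
for _ in range(NUM_WORKERS):
    if os.fork() == 0:  # child: one worker process
        # Merely *reading* the shared state from a worker is not free
        # in Python, as the Copy-on-Read discussion below explains.
        total = sum(SHARED_STATE)
        os._exit(0)

for _ in range(NUM_WORKERS):
    os.wait()  # reap the workers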

To prevent the Django server from running into OOM, the uWSGI master process provides a mechanism to restart a worker process when its RSS memory exceeds a predefined limit.

Understanding memory

We began by investigating why RSS memory grows so quickly right after a worker is spawned by the master process. One observation was that even though RSS starts out at 250 MB, its shared memory drops very quickly, from 250 MB to about 140 MB within a few seconds (the shared memory size can be read from /proc/PID/smaps). The absolute numbers here aren't meaningful, since they change all the time, but the scale of the drop in shared memory is very interesting: about 1/3 of the total memory. Next, we wanted to understand why this shared memory becomes private memory of each process as the workers start up.
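
For reference, here is one way to read a process's shared memory from /proc/PID/smaps in Python (a rough Linux-only helper sketched for illustration; it sums the Shared_Clean and Shared_Dirty fields across all mappings):

import re

def shared_kb(pid):
    # Sum Shared_Clean + Shared_Dirty over every mapping of the process.
    total = 0
    with open(f"/proc/{pid}/smaps") as f:
        for line in f:
            m = re.match(r"Shared_(Clean|Dirty):\s+(\d+) kB", line)
            if m:
                total += int(m.group(2))
    return total

# e.g. sample a worker PID periodically to watch its shared memory shrink:
# print(shared_kb(worker_pid) // 1024, "MB shared")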

Our guess: copy-on-read

The Linux kernel has a mechanism called Copy-on-Write (CoW) that optimizes forking: a child process starts out sharing every memory page with its parent, and a page is copied into the child's own memory space only when the page is written to (see https://en.wikipedia.org/wiki/Copy-on-write for details).

But in the Python world, things get interesting because of reference counting. Every time we merely read a Python object, the interpreter increments its reference count, which is essentially a write into its underlying data structure. That write triggers CoW. So when we use Python, what we are really doing is Copy-on-Read (CoR)!

#define PyObject_HEAD \
    _PyObject_HEAD_EXTRA \
    Py_ssize_t ob_refcnt; \
    struct _typeobject *ob_type;

...

typedef struct _object {
    PyObject_HEAD
} PyObject;
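
You can watch this write-on-read happen from pure Python with sys.getrefcount (which itself adds one temporary reference while it runs):

import sys

x = object()
print(sys.getrefcount(x))  # baseline (includes getrefcount's own temp ref)

y = x                      # just "reading" x to bind a new name...
print(sys.getrefcount(x))  # ...incremented ob_refcnt by one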

So the question is: do we copy-on-write immutable objects such as code objects? Given that PyCodeObject is effectively a "subclass" of PyObject, apparently yes. Our first idea was to disable reference counting on PyCodeObject.

First attempt: disable reference counting for code objects

At Instagram, we start with the simple thing first. As an experiment, we made a small hack to the CPython interpreter so that reference counts would not change for code objects, and then deployed that CPython build to one of our production servers.

The result was disappointing: nothing changed in shared memory. When we tried to find out why, we realized we had no reliable metric to prove that our hack worked, nor any proof of the link between shared memory and the copying of code objects. Obviously, something was missing here. Lesson learned: prove your theory before acting on it.

Page faults analysis

After doing a Google search on Copy-on-Write, we learned that CoW is associated with page faults: each CoW triggers a page fault in the process. Linux's perf tool can record hardware/software system events, including page faults, and can even provide stack traces!

So we went to a production server, restarted it, waited for it to fork, grabbed a worker process PID, and ran the following command.

perf record -e page-faults -g -p <PID>

This gave us an idea of when page faults happen, along with stack traces.

The result was different from what we expected. Rather than the copying of code objects, the top suspect was collect, which belongs to gcmodule.c and is called when garbage collection is triggered. After reading up on how GC works in CPython, we formed the following theory:

CPython's GC is triggered purely by allocation thresholds. The default thresholds are very low, so it kicks in at a very early stage. It maintains linked lists of objects by generation, and those linked lists are reshuffled during a collection. Because the linked-list structure lives alongside the object itself (just like ob_refcnt), reshuffling these objects in the linked lists causes the pages to be copied on write, which is an unfortunate side effect.

/* GC information is stored BEFORE the object structure. */
typedef union _gc_head {
    struct {
        union _gc_head *gc_next;
        union _gc_head *gc_prev;
        Py_ssize_t gc_refs;
    } gc;
    long double dummy;  /* force worst-case alignment */
} PyGC_Head;
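
You can inspect those generation thresholds from Python; the claim above that the defaults are low matches the long-standing CPython default of (700, 10, 10), though exact numbers may differ across versions:

import gc

print(gc.get_threshold())  # e.g. (700, 10, 10): a gen-0 collection runs
                           # after ~700 net container allocations, i.e.
                           # very early in a process's life
print(gc.get_count())      # current allocation counters per generation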

Second attempt: let's try disabling GC

So, since GC is secretly hurting us, let's disable it!

We added a gc.disable() call to our boot script. We restarted the server, but again, no luck! Looking at perf again, gc.collect was still being called and memory was still being copied. With some debugging in GDB, we found that a third-party library we use (msgpack) calls gc.enable() to bring it back, so the gc.disable() in our bootstrapping was wiped out.

Patching msgpack was the last thing we wanted to do, because it would open the door for other libraries doing the same thing in the future without us noticing. First, we had to prove that disabling GC actually helps. The answer once again lay in gcmodule.c: as an alternative to gc.disable, we used gc.set_threshold(0), and this time, no library could revert it.
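
A sketch of why this stuck while gc.disable() did not (in CPython, automatic collection only triggers when the generation-0 threshold is non-zero, and gc.enable() does not touch the thresholds):

import gc

gc.set_threshold(0)        # zero threshold: automatic GC never triggers
gc.enable()                # what a 3rd-party library might do behind our back
print(gc.get_threshold())  # still (0, 0, 0): the setting survives
print(gc.isenabled())      # True, but harmless with a zero threshold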

With that, we successfully raised the shared memory of each worker process from 140 MB to 225 MB, and total memory usage on each host dropped by 8 GB. That saved 25% of RAM across the entire Django fleet. With such headroom, we could run many more processes, or run with a higher RSS memory threshold. In practice, this improved the throughput of the Django tier by more than 10%.

Third attempt: turning GC off completely took a few twists

After trying out a range of settings, we decided to test it at a larger scale: a whole cluster. The feedback came fairly fast: our continuous deployment broke, because restarting our web servers became very slow with GC disabled. A restart usually takes less than 10 seconds, but with GC disabled, it sometimes took more than 60 seconds.

WSGI app 0 (mountpoint='') ready in 115 seconds on interpreter 0x92f480 pid: 4024654 (default app)

Reproducing this bug was very painful because it wasn't deterministic. After a lot of experiments, a true repro finally showed up in atop. When it happens, free memory on the host drops to near zero and then jumps back, forcing all the cached memory out. Then, when all the code and data has to be read back from disk (DSK 100%), everything becomes slow.

This rang a bell: Python does a final GC before the interpreter shuts down, which causes a huge jump in memory usage within a very short period of time. Again, I wanted to prove it first, and then figure out how to handle it properly. So I commented out the call to Py_Finalize in uWSGI's python plugin, and the problem disappeared.

But obviously we couldn't just disable Py_Finalize. We have a bunch of important cleanups relying on the atexit hooks that depend on it. What we ended up doing was adding a runtime flag to CPython that disables GC completely.

Finally, we had to roll it out at scale. We tried it across the whole fleet after that, but the continuous deployment broke again. This time, though, it only broke on machines with older CPUs (Sandy Bridge), and it was even harder to reproduce. Lesson learned: always test on older clients and models, because they are usually the most problematic ones.

Because our continuous deployment is a fairly fast process, to really capture what happened I added a standalone atop to our rollout command. With that, we were able to catch a moment when cached memory drops very low and all the uWSGI processes trigger lots of MINFLT (minor page faults).

Once again, perf profiling pointed at Py_Finalize. Besides the final GC, Python does a bunch of cleanups during shutdown, such as destroying type objects and unloading modules. Once again, this churned the shared memory.

Fourth attempt: the final step of turning off GC: no cleanup

Why on earth do we need to clean up? The process is going to die, and we will get a replacement for it. What we actually care about are our atexit hooks, which do the cleanup for our applications. Python's own cleanup, we don't have to do. This is what we ended up with in our own bootstrapping script:

# gc.disable() doesn't work, because some random 3rd-party library will
# enable it back implicitly.
gc.set_threshold(0)

# Suicide immediately after other atexit functions finish.
# CPython will do a bunch of cleanups in Py_Finalize which
# will again cause Copy-on-Write, including a final GC.
atexit.register(os._exit, 0)

This relies on the fact that atexit functions run in the reverse order of registration: the other atexit functions do their cleanup first, and os._exit(0) then runs as the very last step and exits the current process.
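
A standalone sketch of that ordering (the hook name here is illustrative, not our real cleanup code):

import atexit
import os

# Registered FIRST, therefore it runs LAST: the process exits here,
# before Py_Finalize can run its own cleanups and final GC.
atexit.register(os._exit, 0)

def app_cleanup():
    print("application cleanup runs before the exit", flush=True)

# Registered SECOND, therefore it runs FIRST.
atexit.register(app_cleanup)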

With this two-line change, we finally rolled it out across the whole fleet. After carefully adjusting the memory thresholds, we won a 10% global capacity gain!

Review

Looking back at this performance win, we had two open questions:

First: without garbage collection, won't Python's memory blow up, since all allocated memory is never freed? (Remember, there is no real stack memory in Python; all objects are allocated on the heap.)

Fortunately, this is not true. The main mechanism for freeing objects in Python is still reference counting. When an object's reference is dropped (by calling Py_DECREF), the Python runtime checks whether its reference count has dropped to zero; if so, the object's deallocator is called. The main purpose of garbage collection is to handle reference cycles, where reference counting cannot work.

#define Py_DECREF(op)                                   \
    do {                                                \
        if (_Py_DEC_REFTOTAL  _Py_REF_DEBUG_COMMA       \
            --((PyObject*)(op))->ob_refcnt != 0)        \
            _Py_CHECK_REFCNT(op)                        \
        else                                            \
            _Py_Dealloc((PyObject *)(op));              \
    } while (0)
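
A small demonstration of that division of labor: reference counting alone frees acyclic objects immediately, while a cycle needs the collector (gc.disable() here just keeps the demo deterministic):

import gc
import weakref

gc.disable()  # no automatic collections during the demo

class Node:
    pass

# Acyclic object: freed the moment its refcount hits zero.
n = Node()
r = weakref.ref(n)
del n
print(r() is None)   # True: deallocated by refcounting, no GC involved

# Reference cycle: the refcounts never reach zero on their own.
a, b = Node(), Node()
a.other, b.other = b, a
ra = weakref.ref(a)
del a, b
print(ra() is None)  # False: the cycle keeps both objects alive
gc.collect()         # only the cycle collector can reclaim them
print(ra() is None)  # True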

The second question: where does the gain come from?

The gain from disabling GC came from two sources:

We freed up about 8 GB of RAM on each server, which we used either to create more worker processes on memory-bound server tiers, or to lower the worker respawn rate on CPU-bound ones.

CPU throughput also improved, as instructions per cycle (IPC) increased by about 10%.

# perf stat -a -e cache-misses,cache-references -- sleep 10

Performance counter stats for 'system wide':

      268,195,790 cache-misses      # 12.240 % of all cache refs  [100.005%]
    2,191,115,722 cache-references

     10.019172636 seconds time elapsed

With GC disabled, the cache-miss rate dropped by 2-3%, which was the main driver of the 10% IPC improvement. CPU cache misses are expensive because they stall the CPU pipeline; small improvements in the cache hit rate can usually improve IPC significantly. With less CoW, more CPU cache lines with different virtual addresses (in different worker processes) point to the same physical memory address, which raises the cache hit rate.
