

How to write high-performance and thread-safe Python code

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/02 Report--

Many newcomers are unclear about how to write high-performance, thread-safe Python code. To help, this article explains the topic in detail; anyone with this need is welcome to follow along, and I hope you come away with something useful.

When I was six, I had a music box. When I wound it up, a ballerina on top would spin while the mechanism inside plinked out "Twinkle, Twinkle, Little Star." It must have been tacky, but I loved that music box, and I wanted to know how it worked. When I eventually opened it, I found a simple device inside the body: a thumb-sized metal cylinder that plucked the teeth of a steel comb as it rotated, producing the notes.

Of all a programmer's traits, curiosity about how things work is essential. When I opened the music box and studied its insides, it showed that even if I would not grow into an excellent programmer, I was at least a curious one.

Oddly enough, I had been writing Python programs for years while holding mistaken ideas about the global interpreter lock (GIL), because I was never curious enough about how it works. I have met others who were equally hesitant and equally ignorant of it. It is time for us to open this box and have a look. Let's read the source code of the CPython interpreter and find out what the GIL is, why Python has one, and how it affects multithreaded programs. I will give examples to help you understand the GIL deeply. You will learn how to write fast, thread-safe Python code, and how to choose between threads and processes.

(I describe only CPython in this article, not Jython, PyPy, or IronPython, because CPython is the implementation most programmers still use.)

Behold, the global interpreter lock (GIL)

Here:

static PyThread_type_lock interpreter_lock = 0; /* This is the GIL */

This line of code is taken from ceval.c in the source of the CPython 2.7 interpreter. Guido van Rossum's comment "This is the GIL" was added in 2003, but the lock itself dates back to his first multithreaded Python interpreter in 1997. On Unix systems, PyThread_type_lock is an alias for the standard C lock, mutex_t. It is initialized when the Python interpreter starts:

void
PyEval_InitThreads(void)
{
    interpreter_lock = PyThread_allocate_lock();
    PyThread_acquire_lock(interpreter_lock);
}

All C code in the interpreter must hold this lock while executing Python. Guido originally added the lock because it was simple to implement, and every attempt since to remove the GIL has cost single-threaded programs too much performance: even though removing it would speed up multithreaded programs, the price has not been worth paying. (Single-threaded performance was Guido's chief concern and the most important reason the GIL has stayed. A straightforward attempt in 1999 ended with single-threaded programs running nearly twice as slow.)

The GIL's effect on the threads in your program is simple enough that you can write the principle on the back of your hand: "One thread runs Python, while N others sleep or await I/O." Python threads can also wait for a threading.Lock or another synchronization object from the threading module; count threads in that state as "sleeping," too.

When do threads switch? Whenever a thread begins sleeping or awaiting network I/O, another thread gets a chance to take the GIL and execute Python code. This is cooperative multitasking. CPython also has preemptive multitasking: if a thread runs uninterrupted for 1000 bytecode instructions in Python 2, or for 15 milliseconds in Python 3, it gives up the GIL and another thread may run. Think of this like time slicing in the old days, when there were many threads but one CPU. I will discuss these two kinds of multitasking in detail.
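(As a side note: in Python 3 you can inspect and tune this preemption interval at runtime through the standard sys module. A minimal sketch; the value passed to setswitchinterval() here is only illustrative.)

```python
import sys

# Python 3 exposes the preemption interval in seconds; in Python 2 the
# analogous knob was sys.getcheckinterval(), measured in bytecodes.
interval = sys.getswitchinterval()
print("switch interval: %s seconds" % interval)

# The interval can be tuned at runtime (illustrative value).
sys.setswitchinterval(0.005)
```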

Think of Python as an old mainframe with multiple tasks sharing a CPU.

Cooperative multitasking

When a task such as network I/O begins, and will have no need to run any Python code for a long or uncertain time, a thread gives up the GIL so another thread can take it and run Python. This politeness is called cooperative multitasking, and it allows concurrency: many threads can wait for different events at the same time.

Say two threads each connect a socket:

import socket
import threading

def do_connect():
    s = socket.socket()
    s.connect(('python.org', 80))  # drop the GIL

for i in range(2):
    t = threading.Thread(target=do_connect)
    t.start()

Only one of the two threads can execute Python at a time, but once a thread begins connecting, it drops the GIL so the other can run. This means both threads can wait for their socket connections concurrently, which is a good thing: they do more work in the same amount of time.
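You can watch this concurrency from pure Python. The sketch below substitutes time.sleep() for a blocking socket call, since sleeping also releases the GIL: two one-second waits overlap and finish in about one second, not two.

```python
import threading
import time

def wait():
    time.sleep(1)  # a stand-in for a blocking socket call; sleep also drops the GIL

start = time.time()
threads = [threading.Thread(target=wait) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

elapsed = time.time() - start
print("elapsed: %.1f seconds" % elapsed)  # roughly 1.0, not 2.0
```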

Let's open the box and see how a thread actually drops the GIL while it connects, in socketmodule.c:

/* s.connect((host, port)) method */
static PyObject *
sock_connect(PySocketSockObject *s, PyObject *addro)
{
    sock_addr_t addrbuf;
    int addrlen;
    int res;

    /* convert (host, port) tuple to C address */
    getsockaddrarg(s, addro, SAS2SA(&addrbuf), &addrlen);

    Py_BEGIN_ALLOW_THREADS
    res = connect(s->sock_fd, addr, addrlen);
    Py_END_ALLOW_THREADS

    /* error handling and so on .... */
}

It is at the Py_BEGIN_ALLOW_THREADS macro that the thread drops the GIL. The macro is defined simply as:

PyThread_release_lock(interpreter_lock);

And, of course, Py_END_ALLOW_THREADS reacquires the lock. A thread may block at that spot, waiting for another thread to release the lock; once that happens, the waiting thread grabs the lock back and resumes executing your Python code. In short: while N threads block on network I/O or wait to reacquire the GIL, one thread runs Python.

Below, we'll look at a complete example that uses cooperative multitasking to fetch many URLs quickly. But first, let's contrast cooperative multitasking with the other kind of multitasking.

Preemptive multitasking

A Python thread can give up the GIL voluntarily, but the GIL can also be seized from it preemptively.

Let's back up and review how Python works. Your program runs in two stages. First, your Python text is compiled into a simpler binary format called bytecode. Second, the Python interpreter's main loop, a function named PyEval_EvalFrameEx(), reads the bytecode and executes its instructions one by one.
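You can observe both stages from Python itself, using the built-in compile() and the standard dis module. This sketch compiles a throwaway statement into a code object, then disassembles the bytecode that the eval loop would execute.

```python
import dis

# Stage one: Python source text compiles to a code object full of bytecode.
code = compile("x = 1 + 2", "<example>", "exec")
print(type(code).__name__)  # 'code'

# Stage two: the interpreter's eval loop walks these instructions one by one.
dis.dis(code)
```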

As the interpreter steps through your bytecode, it periodically drops the GIL, without asking permission of the thread whose code it is executing, so other threads get a chance to run:

for (;;) {
    if (--ticker < 0) {
        ticker = check_interval;

        /* Give another thread a chance */
        PyThread_release_lock(interpreter_lock);

        /* Other threads may run now */
        PyThread_acquire_lock(interpreter_lock, 1);
    }

    bytecode = *next_instr++;
    switch (bytecode) {
        /* execute the next instruction ... */
    }
}

By default the check interval is 1000 bytecodes. All threads run this same code and have the lock taken from them periodically in the same way. In Python 3 the GIL's implementation is more complex, and the check interval is not a fixed number of bytecodes but 15 milliseconds. For your code, however, these differences are not significant.

Thread safety in Python

Weaving multiple threads together requires skill.

If a thread can lose the GIL at any moment, you must make your code thread-safe. Python programmers think about thread safety differently from C or Java programmers, though, because many Python operations are atomic.

Calling sort() on a list is one example of an atomic operation: a thread cannot be interrupted mid-sort, so other threads never see a partially sorted list, nor see stale data from before the sort began. Atomic operations simplify our lives, but they bring surprises. For example, += looks simpler than sort(), yet += is not atomic. How do you know which operations are atomic and which are not?

Consider this code:

n = 0

def foo():
    global n
    n += 1

We can see the bytecode this function compiles to, using Python's standard dis module:

>>> import dis
>>> dis.dis(foo)
LOAD_GLOBAL              0 (n)
LOAD_CONST               1 (1)
INPLACE_ADD
STORE_GLOBAL             0 (n)

One line of code, n += 1, is compiled into four bytecodes that perform four basic operations:

Load n values onto the stack

Load constant 1 onto the stack

Add the two values at the top of the stack

Store the sum back to n

Remember that every 1000 bytecodes, the interpreter interrupts a thread and takes the GIL away. If you are unlucky, that can happen between the moment a thread loads the value of n onto the stack and the moment it stores the sum back to n. It is easy to see how this loses updates:

threads = []
for i in range(100):
    t = threading.Thread(target=foo)
    threads.append(t)

for t in threads:
    t.start()

for t in threads:
    t.join()

print(n)

Usually this code prints 100, because each of the 100 threads increments n. But sometimes you will see 99 or 98, when one thread's update is overwritten by another's.

So, despite the GIL, you still need locks to protect shared mutable state:

n = 0
lock = threading.Lock()

def foo():
    global n
    with lock:
        n += 1

What if we were using an atomic operation such as sort() instead?

lst = [4, 1, 3, 2]

def foo():
    lst.sort()

This function's bytecode shows that sort() cannot be interrupted, because it is atomic:

>>> dis.dis(foo)
LOAD_GLOBAL              0 (lst)
LOAD_ATTR                1 (sort)
CALL_FUNCTION            0

The one line compiles to three bytecodes:

Load lst values onto the stack

Load its sorting method onto the stack

Call the sorting method

Even though the line lst.sort() takes several steps, the sort call itself is a single bytecode, so the thread has no opportunity to have the GIL seized from it during the call. We could conclude that there is no need to lock around sort(). Or, to avoid worrying about which operations are atomic, follow a simple rule: always lock around reads and writes of shared mutable state. After all, acquiring a threading.Lock in Python is cheap.
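To make the rule concrete, here is a minimal sketch of coarse locking around a shared dictionary; the names counts and record() are invented for this example.

```python
import threading

# Guard every read and write of the shared dict with one coarse lock.
counts = {}
lock = threading.Lock()

def record(word):
    with lock:
        counts[word] = counts.get(word, 0) + 1

threads = [threading.Thread(target=record, args=('spam',)) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counts['spam'])  # reliably 100
```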

Although the GIL does not excuse us from the need for locks, it does mean there is no need for fine-grained locking. (In a fine-grained scheme the programmer locks and unlocks explicitly to keep shared data safe; Java is the typical example. CPython instead uses a coarse-grained lock: the language itself maintains one global lock to keep the interpreter thread-safe.) In free-threaded languages such as Java, programmers strive to lock shared data for the shortest time possible, to reduce thread contention and maximize parallelism. But because threads cannot run Python in parallel anyway, fine-grained locking buys you nothing. So long as no thread holds a lock while it sleeps, waits for I/O, or does anything else that gives up the GIL, you should use the coarsest, simplest locks possible. Other threads could not have run in parallel regardless.

Finishing sooner with concurrency

I bet what you really want is to speed up your program with multithreading. If your task finishes sooner because it can await many network operations at once, then multiple threads help, even though only one of them can execute Python at a time. This is concurrency, and threads work well in this scenario.

This code finishes its work faster with threads:

import threading
import requests

urls = [...]

def worker():
    while True:
        try:
            url = urls.pop()
        except IndexError:
            break  # Done.

        requests.get(url)

for _ in range(10):
    t = threading.Thread(target=worker)
    t.start()

As we have seen, these threads drop the GIL while waiting on each socket operation involved in fetching a URL over HTTP, so they finish the work sooner than a single thread could.
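As an aside, the standard library's concurrent.futures offers a higher-level way to express the same pattern. This sketch substitutes a hypothetical fetch() for requests.get() so it runs without network access; the thread-pool mechanics are the same.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for requests.get(url); any blocking I/O call
# that releases the GIL fits here.
def fetch(url):
    return url

urls = ['https://example.com/%d' % i for i in range(10)]

# The executor manages the worker threads for us; map preserves input order.
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(fetch, urls))

print(len(results))  # 10
```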

Parallelism

What if you want your task to finish sooner purely by running Python code simultaneously? That approach is called parallelism, and it is exactly what the GIL forbids within a single process. You have to use multiple processes, which is more complicated than threading and requires more memory, but it puts multiple CPUs to work.

This example forks 10 processes and completes faster than a single-process version would, because the processes run in parallel across several cores. But 10 threads would finish no faster than one thread, because at any moment only one thread can execute Python:

import os
import sys

nums = [1 for _ in range(1000000)]
chunk_size = len(nums) // 10
readers = []

while nums:
    chunk, nums = nums[:chunk_size], nums[chunk_size:]
    reader, writer = os.pipe()
    if os.fork():
        readers.append(reader)  # Parent.
    else:
        subtotal = 0
        for i in chunk:  # Intentionally slow code.
            subtotal += i

        print('subtotal %d' % subtotal)
        os.write(writer, str(subtotal).encode())
        sys.exit(0)

# Parent.
total = 0
for reader in readers:
    subtotal = int(os.read(reader, 1000).decode())
    total += subtotal

print("Total: %d" % total)

Because each forked process has its own GIL, this program can spread the work out and run several computations at once.
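The standard library's multiprocessing module wraps the same fork-and-pipe plumbing in a friendlier API. This sketch redoes the subtotal computation with a process pool; the chunking mirrors the example above.

```python
from multiprocessing import Pool

def subtotal(chunk):
    total = 0
    for i in chunk:  # intentionally slow pure-Python loop
        total += i
    return total

if __name__ == '__main__':
    nums = [1] * 1000000
    chunk_size = len(nums) // 10
    chunks = [nums[i:i + chunk_size]
              for i in range(0, len(nums), chunk_size)]

    # Each worker process has its own GIL, so the subtotals run in parallel.
    with Pool(10) as pool:
        total = sum(pool.map(subtotal, chunks))
    print("Total: %d" % total)  # Total: 1000000
```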

(Jython and IronPython provide single-process parallelism, but they are far from fully compatible with CPython. PyPy with software transactional memory may one day be fast. Try these interpreters if you're curious.)

Conclusion

Now that you've opened the music box and seen its simple device, you know all you need to write fast, thread-safe Python code: use threads for concurrent I/O, and processes for parallel computation. The principle is simple enough that you won't even need to write it on your hand.
