2025-02-27 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article focuses on what can be done about Python's global interpreter lock (GIL). Interested readers may wish to take a look.
An unresolved question
Every field has one problem so difficult and time-consuming that merely attempting it is daunting. The wider community gave up on it long ago, and only a few people are still trying to solve it. For a newcomer, solving such a problem would bring instant fame. In computer science, P = NP is such a problem: a proof that P = NP would change the world. Python's hardest problem is easier than P = NP, but it still has no satisfactory answer, and a practical solution would be as transformative for the language as a proof of P = NP would be for computer science. That is why so many people in the Python community keep asking: "What can be done about the global interpreter lock (GIL)?"
The bottom layer of Python
To understand the significance of the GIL, we need to start with the basics of Python. A language like C++ is a compiled language. As the name implies, code in such a language is fed to a compiler, which parses it according to the language's grammar, generates a language-independent intermediate representation, and links it into an executable made up of highly optimized machine code. Because the compiler sees the whole program (or large, relatively self-contained pieces of it), it can optimize deeply: it can reason about how different language constructs interact and choose more effective optimizations.
Python, by contrast, is an interpreted language. Code is fed to an interpreter, which runs it. The interpreter knows nothing about the code before execution; it only knows the rules of Python and how to apply them dynamically as the program runs. It performs some optimizations too, but they are entirely different from those of a compiled language. Because the interpreter cannot reason much about the code ahead of time, most Python optimizations are really optimizations of the interpreter itself. A faster interpreter naturally means faster programs, and this speedup is free for developers: once the interpreter is optimized, they benefit without changing a line of Python code.
This point is important enough to repeat. All else being equal, the speed of a Python program is tied directly to the speed of the interpreter. No matter how much developers optimize their own code, execution speed is still bounded by the efficiency of the interpreter. This is why so much work has gone into optimizing the Python interpreter. It is probably the closest thing to a free lunch that Python developers get.
Free lunch is over.
Or is it? Moore's Law gave us a timetable for hardware speedups, and a whole generation of programmers learned to write code under it. If a program was too slow, the easiest fix was usually to wait for a faster processor. Moore's Law still holds and will for some time, but the way it delivers has changed fundamentally. Clock frequencies no longer climb steadily; instead, the extra transistor density goes into multiple cores. To take full advantage of a new processor, a program must be rewritten to run concurrently.
When most developers hear "concurrency", they immediately think of multithreaded programs. Multithreading is still the most common way to make use of multi-core systems. It is much harder than traditional "sequential" programming, but careful programmers can take full advantage of multithreaded concurrency in their code. Almost every widely used modern programming language supports multithreaded programming, so a language implementation is expected to support it as well.
Unexpected facts
Now we come to the crux of the problem. To take advantage of multi-core systems, Python must support multithreading. Because Python is an interpreted language, the interpreter's own support for multithreading must be both safe and efficient. We all know the hazards of multithreaded programming. The interpreter must prevent different threads from manipulating its internally shared data at the same time, while still letting user threads get as much computation done as possible.
So how is that data protected when different threads access it at the same time? The answer is the global interpreter lock. As the name implies, it is a global lock (a mutex or something similar) on the interpreter. This approach is safe, but (and this surprises Python beginners) it also means that in any Python program, no matter how many threads or processors there are, only one thread executes Python code at any given time.
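As a rough illustration of the idea, the sketch below models the interpreter as a loop in which a thread must hold one global lock to execute each "instruction". It is a toy under stated assumptions, not CPython's implementation: the names gil, run_instructions and run_threads are hypothetical, and the real GIL lives in the interpreter's C code.

```python
import threading

gil = threading.Lock()   # hypothetical stand-in for the interpreter-wide lock
active = 0               # threads currently "executing an instruction"
max_active = 0           # highest number of simultaneously executing threads seen


def run_instructions(count):
    """Pretend to execute `count` instructions, taking the global lock for each."""
    global active, max_active
    for _ in range(count):
        with gil:        # only the lock holder may execute
            active += 1
            max_active = max(max_active, active)
            active -= 1


def run_threads(n_threads=4, count=10_000):
    """Run several worker threads and report the observed peak concurrency."""
    threads = [
        threading.Thread(target=run_instructions, args=(count,))
        for _ in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return max_active


if __name__ == "__main__":
    print("peak threads executing at once:", run_threads())
```

However many threads are started, the peak stays at one, because every "instruction" is executed while holding the same lock.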
Many people discover this fact by accident. Online discussion groups and message boards are full of the same question from Python beginners and experts alike: why does my new multithreaded Python program run more slowly than it did with a single thread? Many feel foolish even asking, because if the program really is parallel, two threads should obviously be faster than one. The question has been asked so many times that Python experts have a standard answer ready: don't use multiple threads, use multiple processes. But that answer is more confusing than the question. Can't I use threads in Python? Is multithreading in a popular language like Python really so bad that even the experts advise against it? Is there something I don't understand?
Unfortunately, no. Because of the way the Python interpreter is designed, using multithreading to improve performance is difficult. In the worst case, threads make the program slower, sometimes noticeably so. A first-year computer science student can tell you what happens when multiple threads compete for a shared resource, and the results are usually not pretty. Multithreading does work well in many cases, and interpreter implementers and core developers would probably prefer fewer complaints about Python's multithreaded performance.
What now? Panic?
So what can we do? Should we, as Python developers, give up on using threads for parallelism? Why does the GIL allow only one thread to run at a time? Couldn't finer-grained locks protect individual objects during concurrent access? Why has no one tried that?
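The slowdown described above can be observed with a small timing experiment. The sketch below is only illustrative: countdown, timed_sequential and timed_threaded are hypothetical helper names, the loop size is arbitrary, and the actual numbers depend on the machine and the interpreter build. On a CPython build with the GIL, the two-thread run is typically no faster than the sequential one.

```python
import threading
import time


def countdown(n):
    """CPU-bound work: a pure-Python loop that never waits on I/O."""
    while n > 0:
        n -= 1


def timed_sequential(n, repeats=2):
    """Run countdown(n) `repeats` times in a row and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        countdown(n)
    return time.perf_counter() - start


def timed_threaded(n, n_threads=2):
    """Run countdown(n) in `n_threads` threads at once and return elapsed seconds."""
    threads = [
        threading.Thread(target=countdown, args=(n,)) for _ in range(n_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    n = 2_000_000
    print(f"sequential : {timed_sequential(n):.2f}s")
    print(f"two threads: {timed_threaded(n):.2f}s")
```

Both runs do the same total amount of work, so any difference between the two timings comes from how the threads are scheduled rather than from the workload.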
These are practical questions with interesting answers. The GIL protects access to things such as the current thread state and heap-allocated objects tracked for garbage collection. Nothing about the Python language itself requires a GIL, however. It is an artifact of the implementation. There are Python interpreters (and compilers) that do not use a GIL. In CPython, though, the GIL has been there almost from the beginning.
So why not get rid of the GIL? Many people may not know that in 1999 Greg Stein submitted a patch set for Python 1.5 called "free threading", which is often mentioned but not well understood. The patch tried to remove the GIL completely and replace it with fine-grained locks. The cost of removing the GIL, however, was that single-threaded programs ran about 40 percent slower. Using two threads did improve speed, but the gain did not scale linearly with the number of cores. Because of the slowdown, the patch was rejected and has been largely forgotten.
GIL is a headache. Let's think of something else.
Although the "free threading" patch was not accepted, it is still instructive. It proved a basic point about the Python interpreter: removing the GIL is very hard. Today's interpreter relies on even more global state than it did when the patch was released, which makes removing the GIL harder still. It is worth noting that this very difficulty is what draws many people to the problem. Hard problems are usually interesting.
But this may be a little misguided. Suppose we had a magic patch that removed the GIL without slowing down single-threaded Python code. We would get what we have always wanted: a threading API that can use all processors at once. Now that we have what we wanted, is it really a good thing?
Thread-based programming is hard. Whenever someone feels they know everything about threads, a new problem turns up. Some very well-known language designers and researchers have come out against the threading model because it is so difficult to reason about correctly. As anyone who has written a multithreaded application can tell you, such applications are far harder to develop and debug than single-threaded ones. A programmer's mental model tends to fit sequential execution and does not match parallel execution. The GIL has inadvertently helped keep developers out of trouble. Synchronization primitives are still needed when using threads, but the GIL does help preserve data consistency between threads.
So Python's hardest question turns out to be the wrong question. The experts' advice to use multiple processes instead of multiple threads makes sense; it is not an attempt to hide the shame of Python threads. This Python implementation encourages developers to use a safer and more intuitive concurrency model, while leaving multithreading available when it is really needed. Most people may not know what the best parallel programming model is, but most agree that threads are not it.
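The point that synchronization primitives are still needed can be shown with a shared counter. This is a minimal sketch, with Counter and hammer as hypothetical names: an update such as value += 1 is a read followed by a write, so it is guarded with an explicit threading.Lock rather than relying on the GIL.

```python
import threading


class Counter:
    """A counter that is safe to increment from several threads."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # value += 1 is a read-modify-write sequence, so it is guarded explicitly.
        with self._lock:
            self.value += 1


def hammer(counter, n_threads=4, per_thread=50_000):
    """Increment `counter` from several threads and return the final value."""

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(hammer(Counter()))
```

With the lock in place the final value always equals n_threads * per_thread. Without it, that outcome would depend on when the interpreter happens to switch threads.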
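The "use multiple processes" advice can be sketched with the standard library's concurrent.futures.ProcessPoolExecutor. The workload (count_primes) and the wrapper (count_primes_parallel) are hypothetical examples chosen only because they are CPU-bound. Each worker process has its own interpreter, so the workers do not contend for a single GIL.

```python
from concurrent.futures import ProcessPoolExecutor


def count_primes(limit):
    """Count the primes below `limit` by trial division (deliberately CPU-bound)."""
    count = 0
    for candidate in range(2, limit):
        divisor = 2
        is_prime = True
        while divisor * divisor <= candidate:
            if candidate % divisor == 0:
                is_prime = False
                break
            divisor += 1
        if is_prime:
            count += 1
    return count


def count_primes_parallel(limits, workers=2):
    """Evaluate count_primes for each limit in a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    # The guard matters: on platforms that start workers by re-importing
    # this module, unguarded code would run again in every worker.
    print(count_primes_parallel([10_000, 20_000]))
```

The trade-off is that arguments and results are pickled and copied between processes, so this model fits best when each task does a lot of computation on a small amount of data.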
Don't assume the GIL is immutable or unreasonable. Antoine Pitrou implemented a new GIL in Python 3.2 that significantly improved the interpreter. It was the most important change to the GIL since 1992. The change is too large to explain fully here, but at a high level, the old GIL counted Python instructions to decide when to release the lock. Since Python instructions do not map one-to-one onto machine instructions, a single Python instruction could contain a great deal of work. The new GIL uses a fixed timeout to tell the current thread to release the lock. When the current thread holds the lock and a second thread requests it, the current thread is forced to release the lock after 5 ms (that is, every 5 ms the current thread checks whether it needs to release the lock). This makes switching between threads more predictable when several threads are ready to run.
The change is not perfect, however. David Beazley is probably the most active researcher of how the GIL affects different types of workloads. Besides carrying out the most in-depth study of the GIL before Python 3.2, he also studied the new GIL implementation and found many interesting scenarios in which even the new GIL performs poorly. He continues to drive the discussion about the GIL and to publish results from practical experiments.
Whatever people think of Python's GIL, it remains the most difficult technical challenge in the Python language. Understanding its implementation requires a thorough grasp of operating system design, multithreaded programming, the C language, interpreter design, and the CPython interpreter implementation. These prerequisites alone keep many developers from studying the GIL more closely. There is no sign that the GIL will go away any time soon. For now, it will continue to confuse and surprise people who are new to Python and interested in solving technical problems.
The above is based on my current research into the Python interpreter. I plan to write about other aspects of the interpreter, but none is as well known as the GIL. Although these technical details come from my careful study of the CPython code base, there may still be inaccuracies. If you find any, please let me know and I will correct them as soon as possible.
At this point, you should have a better understanding of what can be done about Python's global interpreter lock. The best way to reinforce it is to try things out in practice.
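The 5 ms timeout mentioned above is exposed in Python 3.2 and later through sys.getswitchinterval() and sys.setswitchinterval(). The sketch below reads the value and changes it temporarily. The helper name with_switch_interval is hypothetical, and whether changing the interval helps a given program is something to measure rather than assume.

```python
import sys


def with_switch_interval(seconds, func):
    """Call func() with a temporary thread switch interval, then restore the old one."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(seconds)
    try:
        return func()
    finally:
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    print("current switch interval (seconds):", sys.getswitchinterval())
    # Read the interval back while a 10 ms setting is temporarily in effect.
    print("inside helper:", with_switch_interval(0.01, sys.getswitchinterval))
    print("restored:", sys.getswitchinterval())
```

The interval is only a request for the running thread to give up the lock. Which thread runs next is decided by the operating system scheduler.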