2025-02-24 Update From: SLTechnology News&Howtos > Development
Shulou(Shulou.com)06/02 Report--
This article introduces the question "Why is Python so slow?". Many people run into this puzzle in practice, so let's work through it step by step. I hope you read carefully and come away with something useful!
Python is becoming more and more popular and has rapidly expanded into DevOps, data science, web development, information security, and other fields.
However, Python's execution speed has not kept pace with its popularity: Python code runs noticeably slower than code in many other languages.
How do Java, C, C++, C#, and Python compare in speed? There is no one-size-fits-all answer, because the results depend heavily on the type of program being run; the Computer Language Benchmarks Game is one useful yardstick.
Based on my years of experience with language benchmarks, Python is slower than many languages. Whether you compare it with JIT-compiled languages such as C# or Java, AOT-compiled languages such as C, or even interpreted languages such as JavaScript, Python runs slower than all of them.
Note: "Python" in this article generally refers to the official implementation, CPython. I will also mention other implementations of the Python language.
What I want to answer is this: why is Python 2 to 10 times slower than other languages at a comparable task, and is there any way to improve it?
The mainstream theories are as follows:
"It's the Global Interpreter Lock (GIL)."
"Because Python is an interpreted language, not a compiled one."
"Because Python is a dynamically typed language."
Which of these has the biggest impact on Python's performance?
Is it because of the Global Interpreter Lock?
Today, most computers come with CPUs that have multiple cores, and sometimes multiple processors. To exploit all this processing power, the operating system provides a low-level construct called a thread: a single process (say, the Chrome browser) can start many threads, each performing different operations within the system. In this case, CPU-intensive work can be shared across the cores, which can greatly improve the efficiency of the application.
For example, as I was writing this article, my Chrome browser had 44 threads open. Note that the thread structure and API differ between POSIX-based operating systems (such as macOS and Linux) and Windows; the operating system is also responsible for scheduling each thread.
If you haven't written multithreaded code before, the concept you need to pick up is thread locks. A multithreaded process is more complex than a single-threaded one because locks are needed to ensure that data at the same memory address is not read or changed by multiple threads at the same time.
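As a minimal sketch of why locks matter, here is a shared counter updated by several threads (the names and iteration counts are illustrative, not from the original article). Without the lock, the read-modify-write in `increment()` could interleave between threads and lose updates:

```python
import threading

counter = 0
lock = threading.Lock()

def increment(n):
    # Read-modify-write on shared state; the lock makes it atomic
    # with respect to the other threads.
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 400000: no updates were lost
```

Note that even the GIL does not make `counter += 1` atomic, because a thread switch can happen between the bytecode instructions that read and write the variable; the explicit lock is what guarantees correctness here.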
When the CPython interpreter creates a variable, it allocates the memory and then counts the references to that variable; this is known as reference counting. If the reference count drops to 0, the variable's memory is freed. This is why creating temporary variables inside a for-loop block does not keep increasing memory consumption.
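Reference counting can be observed directly with `sys.getrefcount` (a small sketch of my own, not from the original article). The absolute numbers vary by context, so the interesting part is the delta when a reference is dropped:

```python
import sys

x = []                       # one reference: the name x
y = x                        # a second reference to the same list object

before = sys.getrefcount(x)  # counts x, y, plus getrefcount's own argument
del y                        # dropping a reference decrements the count
after = sys.getrefcount(x)

print(before, after)         # after is exactly one less than before
```

When `after` would reach zero (no names referencing the list at all), CPython frees the object immediately.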
The challenge comes when variables are shared among multiple threads: CPython must protect those reference counts from race conditions. It does this with the Global Interpreter Lock (GIL), which carefully controls thread execution: no matter how many threads exist, the interpreter allows only one of them to execute at a time.
How does this affect the performance of Python programs?
If your program is single-threaded and single-process, the speed and performance of the code will not be affected by global interpreter locks.
But if you implement concurrency with multiple threads inside a single process, and your threads are IO-intensive (for example, network IO or disk IO), you will see the effects of GIL contention.
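The flip side is that CPU-bound threads gain nothing from multiple cores under the GIL. The following sketch (modeled loosely on David Beazley's countdown demo; the function name and loop count are illustrative) runs the same pure-Python work threaded and serially; on a standard CPython build the wall-clock times come out roughly the same:

```python
import threading
import time

def count_down(n, results, i):
    # Pure-Python CPU-bound loop; the running thread holds the GIL.
    while n > 0:
        n -= 1
    results[i] = "done"

N = 2_000_000
results = [None, None]

start = time.perf_counter()
t1 = threading.Thread(target=count_down, args=(N, results, 0))
t2 = threading.Thread(target=count_down, args=(N, results, 1))
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

start = time.perf_counter()
count_down(N, results, 0)
count_down(N, results, 1)
serial = time.perf_counter() - start

# With the GIL, the two threads never execute Python bytecode in
# parallel, so threading buys no speedup for CPU-bound work.
print(f"threads: {threaded:.2f}s  serial: {serial:.2f}s")
```

Exact timings depend on the machine and interpreter version, which is why the example prints rather than asserts them.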
GIL contention chart by David Beazley: http://dabeaz.blogspot.com/2010/01/python-gil-visualized.html
If you have a web application (such as Django) and you use WSGI, then each request to the web app runs in a separate Python interpreter, so there is only one lock per request. At the same time, because the Python interpreter is slow to start, some WSGI implementations have a "daemon mode" that keeps Python processes running and ready for you.
How are the other Python interpreters performing?
PyPy is also an interpreter with GIL, but it is usually more than three times faster than CPython.
Jython is an interpreter without GIL because Python threads in Jython are implemented using Java threads and are managed by the JVM memory management system.
What does JavaScript do in this respect?
First, all JavaScript engines use a mark-and-sweep garbage collection algorithm, whereas the GIL exists because of CPython's reference-counting memory management.
JavaScript has no GIL: it is single-threaded, so it does not need one. Instead of threads, JavaScript relies on its event loop and the Promise/callback pattern to achieve asynchronous programming. Python has a similar mechanism in the asyncio event loop.
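A minimal asyncio sketch (the coroutine names and delays are illustrative) shows the same single-threaded concurrency model: while one task waits on simulated IO, the event loop runs the other, so two 0.2-second waits overlap instead of adding up:

```python
import asyncio

async def fetch(name, delay):
    # Simulated IO wait; while this task sleeps, the loop runs others.
    await asyncio.sleep(delay)
    return f"{name} finished"

async def main():
    # gather() schedules both coroutines concurrently on one thread.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))

results = asyncio.run(main())
print(results)  # ['a finished', 'b finished']
```

No locks are needed here: a task only yields control at an await point, so there is no preemption in the middle of an operation.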
Is it because Python is an interpretive language?
I hear this all the time, and it is a crude simplification of how CPython actually works. When python myscript.py is executed at a terminal, CPython reads, lexes, parses, compiles, interprets, and executes the code.
If you are interested in this series of processes, you can also read my previous article: modify the Python language in 6 minutes.
A key step in that process is the creation of .pyc files. During the compilation stage, Python 3 writes the bytecode sequence to files under __pycache__/, while Python 2 writes .pyc files next to the source in the current directory. This applies to the scripts you write, all the code you import, and third-party modules.
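You can look at this bytecode yourself with the standard-library `dis` module (a small illustration of my own; the function is arbitrary). These are the same kinds of instructions that get cached in the .pyc files:

```python
import dis

def greet(name):
    return "hello, " + name

# dis.dis prints the bytecode CPython compiled the function into.
# Exact opcodes vary by Python version, but you will see loads of
# the constant and the argument, a generic add, and a return.
dis.dis(greet)
```

Running this prints an instruction listing rather than machine code: the CPython interpreter loops over these opcodes one by one at run time.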
So, in the vast majority of cases (unless your code is a one-off script that you run exactly once), Python is interpreting bytecode and executing it locally. Compare that with Java and C#/.NET:
Java code is compiled to an "intermediate language", and the Java virtual machine reads that bytecode and compiles it to machine code just in time. .NET's CIL is the same: the .NET Common Language Runtime (CLR) compiles the bytecode to machine code just in time.
Since Python uses a virtual machine and bytecode just like Java and C#, why is Python still much slower than both in benchmarks? The first reason is that .NET and Java are JIT-compiled.
Just-in-time (JIT) compilation requires an intermediate language so that the code can be split into chunks (or frames). An ahead-of-time (AOT) compiler, by contrast, must ensure that the CPU can understand every line of code before any interaction takes place.
The JIT itself does not speed up execution, because it still executes the same bytecode sequences. What JIT does enable is optimization at run time. A good JIT optimizer finds the parts of the program that are executed most often, the "hot spots", and replaces them with more efficient versions.
This means that if your program repeats the same operations many times, it is likely to be sped up considerably by the optimizer. Moreover, Java and C# are statically typed languages, so the optimizer can make far stronger assumptions about the code.
PyPy has a JIT and, as mentioned above, is significantly faster than CPython. More detailed results can be seen in this benchmark article: which version of Python is the fastest?
So why doesn't CPython use JIT?
JIT is not without drawbacks, though; one of its most noticeable disadvantages is startup time. CPython is already relatively slow to start, and PyPy starts 2 to 3 times slower still. The Java virtual machine is also notorious for slow startup. The .NET CLR gets around this by starting at system boot, and it helps that the developers of the CLR also develop the operating system the CLR runs on.
So if you have a single long-running Python process, JIT makes more sense because there are "hotspots" in the code to optimize.
However, CPython is a general-purpose implementation. Imagine developing command-line programs in Python but having to wait for a slow JIT startup every time you invoke the CLI; that would be a pretty bad experience.
CPython has to serve a wide variety of use cases. Plugging a JIT into CPython is possible, but progress on that improvement has largely stalled.
If you want to take full advantage of JIT, please use PyPy.
Is it because Python is a dynamically typed language?
In statically typed languages such as C, C++, Java, C#, and Go, you must specify the type of a variable when you declare it. In a dynamically typed language, the concept of types still exists, but the type of a variable can change.
a = 1
a = "foo"
In the above example, Python first allocates memory to store the integer and binds the name a to it; when a is reassigned, Python creates a new string object and rebinds the same name to it, freeing the integer once its reference count drops to zero.
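This rebinding is observable with `id()` and `type()` (a sketch of my own; the variable names are illustrative): the name stays the same, but the object behind it changes identity and type.

```python
# A name in Python is just a reference; rebinding it points it at a
# brand-new object rather than mutating the old one in place.
a = 1
first_id = id(a)
first_type = type(a)

a = "foo"               # a new str object; the name a is rebound to it
second_id = id(a)

print(first_type, type(a))    # <class 'int'> <class 'str'>
print(first_id != second_id)  # True: two distinct objects
```

Since the name carries no type of its own, the interpreter cannot know until run time what kind of object it will find there.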
Statically typed languages are not designed that way to make your life difficult; they are designed around how the CPU works. Since everything ultimately has to map down to simple binary operations, high-level structures such as objects and types must be converted into low-level data structures.
Python also implements such transformations, but users do not see them and do not need to care about them.
It is not the lack of type declarations that makes Python slow. Python's design lets you make almost anything dynamic: you can swap out methods on objects at run time, you can monkey-patch low-level system calls into a value declared at run time. Almost anything is possible.
But it is this kind of design that makes the optimization of Python extremely difficult.
To prove my point, I used DTrace, a system-call tracing tool on macOS. DTrace support is not built into CPython release binaries, so CPython has to be recompiled with it enabled. Here I used Python 3.6.6:
wget https://github.com/python/cpython/archive/v3.6.6.zip
unzip v3.6.6.zip
cd cpython-3.6.6
./configure --with-dtrace
make
Now python.exe has DTrace probes throughout the code. Paul Ross has also given a lightning talk on DTrace. You can download Python's DTrace starter files to inspect function calls, execution time, CPU time, system calls, and all sorts of other things.
sudo dtrace -s toolkit/<tracer>.d -c '../cpython/python.exe script.py'
The py_callflow tracker shows all the functions called in the program.
So, does Python's dynamic type slow it down?
Type comparison and type conversion consume a lot of resources, and the type of a variable is checked every time a variable is read, written, or referenced.
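One way to see this per-operation dispatch (a sketch of my own; the function name is illustrative) is that a single `+` in the source compiles to one generic instruction, and which implementation runs is decided by the operand types at run time:

```python
# The same function body, and the same "+" bytecode, behaves
# differently depending on the types found at run time.
def add(x, y):
    return x + y

print(add(2, 3))       # 5       -> int.__add__
print(add("2", "3"))   # 23      -> str.__add__
print(add([2], [3]))   # [2, 3]  -> list.__add__
```

A static compiler could pick the right addition at compile time and emit a single machine instruction; CPython must re-check the operand types on every call.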
Python's dynamism makes it hard to optimize, which is why so many alternatives to Python are so much faster: they compromise on flexibility in the name of performance.
Cython, on the other hand, combines C's static typing with Python so that code with known types can be optimized; it can deliver up to an 84x performance improvement.
That's all for "Why is Python so slow?". Thank you for reading. If you want to learn more about the industry, you can follow this site, where the editor will keep publishing practical articles for you!
© 2024 shulou.com SLNews company. All rights reserved.