What is the Python bytecode and the program execution process

2025-07-08 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article covers Python bytecode and how a Python program is executed. Many people know little about this topic, so it is shared here for reference. Let's take a look.

Question:

We write Python programs every day, whether to process text or to do system administration work. Once a program is written, running the python command is all it takes to start it:

    $ python some-program.py

So how is a .py text file turned, step by step, into machine instructions that the CPU can execute? And what are the .pyc files that may be generated during execution for?

1. Execution process

Although Python behaves more like an interpreted language such as shell scripts, Python programs in fact work essentially the same way as Java or C# programs: the model can be summarized as a virtual machine plus bytecode. Python runs a program in two steps: first it compiles the source code into bytecode, then it starts the virtual machine to execute that bytecode.

Although the python command is also called the Python interpreter, it differs fundamentally from a line-by-line script interpreter such as a shell. The Python interpreter actually consists of a compiler and a virtual machine. When it starts, it mainly performs the following two steps:

The compiler compiles the Python source code in the .py file into bytecode.

The virtual machine executes the bytecode generated by the compiler, instruction by instruction.

Therefore, the Python statements in the .py file are not converted directly to machine instructions, but to Python bytecode.
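The two steps can be reproduced by hand with the built-in compile and exec functions. The following is only a minimal sketch; the one-line source string and the "<example>" file label are made up for illustration:

```python
# A tiny program held in a string instead of a .py file (illustrative only).
source = "x = 6 * 7\n"

# Step 1: the compiler turns the source text into a code object (bytecode).
code = compile(source, "<example>", "exec")

# Step 2: the virtual machine executes that bytecode in a namespace.
namespace = {}
exec(code, namespace)

print(type(code).__name__)  # the compiled result is a 'code' object
print(namespace["x"])       # the value computed by running the bytecode
```

Running python some-program.py does essentially the same thing internally, with the source read from the file.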

2. Bytecode

The result of compiling a Python program is bytecode, which carries a great deal of information about how Python runs. Whether the goal is to better understand how the Python virtual machine works or to make Python programs more efficient, bytecode is therefore the key. So what exactly does Python bytecode look like, and how can we get the bytecode of a Python program? Python provides a built-in function, compile, for compiling source code at run time. We only need to call compile with the source code as an argument to obtain the compilation result.

3. Source code compilation

Next, we compile a program through the compile function:

The source code is saved in the demo.py file:

    PI = 3.14

    def circle_area(r):
        return PI * r ** 2

    class Person(object):
        def __init__(self, name):
            self.name = name

        def say(self):
            print('i am', self.name)

You need to read the source code from the file before compilation:

    >>> text = open(r'D:\myspace\code\pythonCode\mix\demo.py').read()
    >>> print(text)
    PI = 3.14

    def circle_area(r):
        return PI * r ** 2

    class Person(object):
        def __init__(self, name):
            self.name = name

        def say(self):
            print('i am', self.name)

Then call the compile function to compile the source code:

    >>> result = compile(text, r'D:\myspace\code\pythonCode\mix\demo.py', 'exec')

The compile function takes three required parameters:

source: the source code to be compiled

filename: the name of the file the source code comes from

mode: the compilation mode. 'exec' means the source code is compiled as a module.

There are three compilation modes:

exec: used to compile module source code

single: used to compile a single Python statement (interactive mode)

eval: used to compile a single expression for eval
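As a rough illustration of how the three modes differ, the sketch below compiles small snippets in each mode. The snippets and the "<example>" file label are made up for this example:

```python
# 'exec': a sequence of statements, as in a module; run with exec().
module_code = compile("a = 1\nb = a + 1\n", "<example>", "exec")
ns = {}
exec(module_code, ns)

# 'eval': exactly one expression; eval() returns its value.
expr_code = compile("a + b", "<example>", "eval")
value = eval(expr_code, ns)
print(value)

# 'single': one interactive statement; an expression result is echoed,
# the same way the interactive prompt echoes it.
single_code = compile("a + b", "<example>", "single")
exec(single_code, ns)
```

Passing a statement such as "a = 1" in eval mode raises a SyntaxError, because that mode accepts expressions only.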

4. PyCodeObject

The compile function returns the compilation result, which we stored in result:

    >>> result
    <code object <module> at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 1>
    >>> result.__class__
    <class 'code'>

Finally, we get an object of type code, and its underlying structure is PyCodeObject.

The PyCodeObject source code is as follows:

    /* Bytecode object */
    struct PyCodeObject {
        PyObject_HEAD
        int co_argcount;            /* #arguments, except *args */
        int co_posonlyargcount;     /* #positional only arguments */
        int co_kwonlyargcount;      /* #keyword only arguments */
        int co_nlocals;             /* #local variables */
        int co_stacksize;           /* #entries needed for evaluation stack */
        int co_flags;               /* CO_..., see below */
        int co_firstlineno;         /* first source line number */
        PyObject *co_code;          /* instruction opcodes */
        PyObject *co_consts;        /* list (constants used) */
        PyObject *co_names;         /* list of strings (names used) */
        PyObject *co_varnames;      /* tuple of strings (local variable names) */
        PyObject *co_freevars;      /* tuple of strings (free variable names) */
        PyObject *co_cellvars;      /* tuple of strings (cell variable names) */
        /* The rest aren't used in either hash or comparisons, except for co_name,
           used in both. This is done to preserve the name and line number
           for tracebacks and debuggers; otherwise, constant de-duplication
           would collapse identical functions/lambdas defined on different lines.
        */
        Py_ssize_t *co_cell2arg;    /* Maps cell vars which are arguments. */
        PyObject *co_filename;      /* unicode (where it was loaded from) */
        PyObject *co_name;          /* unicode (name, for reference) */
        PyObject *co_linetable;     /* string (encoding addr<->lineno mapping) See
                                       Objects/lnotab_notes.txt for details. */
        void *co_zombieframe;       /* for optimization only (see frameobject.c) */
        PyObject *co_weakreflist;   /* to support weakrefs to code objects */
        /* Scratch space for extra data relating to the code object.
           Type is a void* to keep the format private in codeobject.c to force
           people to go through the proper APIs. */
        void *co_extra;

        /* Per opcodes just-in-time cache
         *
         * To reduce cache size, we use indirect mapping from opcode index to
         * cache object:
         *   cache = co_opcache[co_opcache_map[next_instr - first_instr] - 1]
         */

        // co_opcache_map is indexed by (next_instr - first_instr).
        //  * 0 means there is no cache for this opcode.
        //  * n > 0 means there is cache in co_opcache[n-1].
        unsigned char *co_opcache_map;
        _PyOpcache *co_opcache;
        int co_opcache_flag;  // used to determine when create a cache.
        unsigned char co_opcache_size;  // length of co_opcache.
    };

The code object PyCodeObject is used to store compilation results, including bytecode and the constants, names, and so on involved in the code. Key fields include:

    Field               Usage
    co_argcount         number of arguments
    co_kwonlyargcount   number of keyword-only arguments
    co_nlocals          number of local variables
    co_stacksize        stack space required to execute the code
    co_flags            flags
    co_firstlineno      line number of the first line of the code block
    co_code             instruction opcodes, i.e. the bytecode
    co_consts           list of constants
    co_names            list of names
    co_varnames         list of local variable names
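These fields are visible from Python as attributes of any code object, for instance through a function's __code__ attribute. The sketch below uses a small stand-alone function (a variant of circle_area with a local variable, written only for this illustration); the exact values of fields such as co_stacksize and co_flags depend on the interpreter version:

```python
def circle_area(r):
    pi = 3.14  # a local variable, so it is listed in co_varnames
    return pi * r ** 2

# Every function carries its compiled body as a code object.
code = circle_area.__code__

fields = (
    "co_argcount", "co_kwonlyargcount", "co_nlocals", "co_stacksize",
    "co_flags", "co_firstlineno", "co_consts", "co_names", "co_varnames",
)
for field in fields:
    print(f"{field:<18} {getattr(code, field)!r}")
```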

Let's print the data corresponding to these fields:

Get the bytecode through the co_code field:

    >>> result.co_code
    b'd\x00Z\x00d\x01d\x02\x84\x00Z\x01G\x00d\x03d\x04\x84\x00d\x04e\x02\x83\x03Z\x03d\x05S\x00'

Get all the names involved in the code object through the co_names field:

    >>> result.co_names
    ('PI', 'circle_area', 'object', 'Person')

Get all the constants involved in the code object through the co_consts field:

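    >>> result.co_consts
    (3.14, <code object circle_area at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 3>, 'circle_area', <code object Person at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 6>, 'Person', None)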

As you can see, the constant list also contains two code objects: one is the body of the circle_area function and the other is the body of the Person class definition. Given how Python divides scopes, it is natural to guess that each scope corresponds to one code object. If that guess is right, the constant list of the Person code object should in turn contain two code objects: the body of the __init__ function and the body of the say function. Let's look at the Person code object:

    >>> person_code = result.co_consts[3]
    >>> person_code
    <code object Person at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 6>
    >>> person_code.co_consts
    ('Person', <code object __init__ at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 7>, 'Person.__init__', <code object say at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 10>, 'Person.say', None)

Therefore, we come to the conclusion that after the Python source code is compiled, each scope corresponds to a code object, and the child scope code object is located in the constant list of the parent scope code object.
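This nesting can be checked mechanically by walking the constant lists recursively. The sketch below is one possible way to do it; walk_code is a hypothetical helper written for this illustration, and the demo source is embedded as a string so the example does not depend on a file path:

```python
import types

SOURCE = '''\
PI = 3.14

def circle_area(r):
    return PI * r ** 2

class Person(object):
    def __init__(self, name):
        self.name = name

    def say(self):
        print('i am', self.name)
'''

def walk_code(code, depth=0):
    """Yield (depth, name, first line) for a code object and all nested ones."""
    yield depth, code.co_name, code.co_firstlineno
    for const in code.co_consts:
        # Child scopes are stored as code objects in the parent's constants.
        if isinstance(const, types.CodeType):
            yield from walk_code(const, depth + 1)

module_code = compile(SOURCE, "demo.py", "exec")
scopes = list(walk_code(module_code))
for depth, name, lineno in scopes:
    print("    " * depth + f"{name} (line {lineno})")
```

The indentation of the printed names mirrors the scope hierarchy: the module at the top, circle_area and Person below it, and the two methods below Person.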

So far we have a basic understanding of the result of compiling Python source code: the code object, PyCodeObject. It will be examined further when studying the virtual machine, the function mechanism, and the class mechanism.

5. Disassembly

Bytecode is a sequence of bytes that is not human-readable, much like binary machine code. Machine code can be disassembled to make it readable, so can bytecode be disassembled too?

It can, with the dis module:

    >>> import dis
    >>> dis.dis(result.co_code)
              0 LOAD_CONST               0 (0)
              2 STORE_NAME               0 (0)
              4 LOAD_CONST               1 (1)
              6 LOAD_CONST               2 (2)
              8 MAKE_FUNCTION            0
             10 STORE_NAME               1 (1)
             12 LOAD_BUILD_CLASS
             14 LOAD_CONST               3 (3)
             16 LOAD_CONST               4 (4)
             18 MAKE_FUNCTION            0
             20 LOAD_CONST               4 (4)
             22 LOAD_NAME                2 (2)
             24 CALL_FUNCTION            3
             26 STORE_NAME               3 (3)
             28 LOAD_CONST               5 (5)
             30 RETURN_VALUE

The disassembly looks a lot like assembly language. The first column is the offset of the instruction within the bytecode, the second column is the instruction, and the third column is the operand. Take the first instruction as an example: LOAD_CONST pushes a constant onto the stack, and the operand gives the index of that constant in the constant list. The constant at index 0 is:

    >>> result.co_consts[0]
    3.14

In this way, the meaning of the first bytecode is clear: load the constant 3.14 onto the stack.
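The same lookup can be done programmatically with dis.get_instructions, which yields one object per instruction. This is only a sketch on a one-line program with a made-up "<example>" file label; the instruction stream differs between Python versions (newer releases emit extra instructions such as RESUME), so the code searches for the LOAD_CONST instruction instead of assuming it comes first:

```python
import dis

code = compile("PI = 3.14\n", "<example>", "exec")

# List every instruction: offset, name, raw operand, and resolved operand.
for instr in dis.get_instructions(code):
    print(instr.offset, instr.opname, instr.arg, instr.argrepr)

# Resolve the operand of the first LOAD_CONST by hand.
load = next(i for i in dis.get_instructions(code) if i.opname == "LOAD_CONST")
constant = code.co_consts[load.arg]  # the operand is an index into co_consts
print(constant)
```

The value found by indexing co_consts is the same one that dis reports in the instruction's argval attribute.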

Because the code object holds context such as the bytecode, constants, and names, disassembling the code object directly gives clearer results:

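    >>> dis.dis(result)
      1           0 LOAD_CONST               0 (3.14)
                  2 STORE_NAME               0 (PI)

      3           4 LOAD_CONST               1 (<code object circle_area at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 3>)
                  6 LOAD_CONST               2 ('circle_area')
                  8 MAKE_FUNCTION            0
                 10 STORE_NAME               1 (circle_area)

      6          12 LOAD_BUILD_CLASS
                 14 LOAD_CONST               3 (<code object Person at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 6>)
                 16 LOAD_CONST               4 ('Person')
                 18 MAKE_FUNCTION            0
                 20 LOAD_CONST               4 ('Person')
                 22 LOAD_NAME                2 (object)
                 24 CALL_FUNCTION            3
                 26 STORE_NAME               3 (Person)
                 28 LOAD_CONST               5 (None)
                 30 RETURN_VALUE

    Disassembly of <code object circle_area at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 3>:
      4           0 LOAD_GLOBAL              0 (PI)
                  2 LOAD_FAST                0 (r)
                  4 LOAD_CONST               1 (2)
                  6 BINARY_POWER
                  8 BINARY_MULTIPLY
                 10 RETURN_VALUE

    Disassembly of <code object Person at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 6>:
      6           0 LOAD_NAME                0 (__name__)
                  2 STORE_NAME               1 (__module__)
                  4 LOAD_CONST               0 ('Person')
                  6 STORE_NAME               2 (__qualname__)

      7           8 LOAD_CONST               1 (<code object __init__ at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 7>)
                 10 LOAD_CONST               2 ('Person.__init__')
                 12 MAKE_FUNCTION            0
                 14 STORE_NAME               3 (__init__)

     10          16 LOAD_CONST               3 (<code object say at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 10>)
                 18 LOAD_CONST               4 ('Person.say')
                 20 MAKE_FUNCTION            0
                 22 STORE_NAME               4 (say)
                 24 LOAD_CONST               5 (None)
                 26 RETURN_VALUE

    Disassembly of <code object __init__ at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 7>:
      8           0 LOAD_FAST                1 (name)
                  2 LOAD_FAST                0 (self)
                  4 STORE_ATTR               0 (name)
                  6 LOAD_CONST               0 (None)
                  8 RETURN_VALUE

    Disassembly of <code object say at 0x..., file "D:\myspace\code\pythonCode\mix\demo.py", line 10>:
     11           0 LOAD_GLOBAL              0 (print)
                  2 LOAD_CONST               1 ('i am')
                  4 LOAD_FAST                0 (self)
                  6 LOAD_ATTR                1 (name)
                  8 CALL_FUNCTION            2
                 10 POP_TOP
                 12 LOAD_CONST               0 (None)
                 14 RETURN_VALUE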

The actual value of the constant or name that the operand refers to is shown in parentheses next to it. In addition, the bytecode is grouped by statement, with groups separated by a blank line and the statement's line number shown before its first instruction. For example, the statement PI = 3.14 is compiled into two instructions:

      1           0 LOAD_CONST               0 (3.14)
                  2 STORE_NAME               0 (PI)

6. pyc

If you import demo as a module, Python generates a .pyc file. In Python 3 it is written to the __pycache__ subdirectory next to demo.py, with the interpreter tag in the file name (for example, demo.cpython-310.pyc):

    >>> import demo

The .pyc file stores the serialized code object. When Python imports the demo module again, it can read the .pyc file and deserialize it to obtain the code object directly, avoiding the cost of recompiling. Python recompiles only when demo.py has changed, which by default it detects by comparing the source file's modification time and size with the values recorded in the .pyc file.
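To see that a .pyc file really is a serialized code object, the sketch below writes one with the standard py_compile module and reads it back with marshal. It assumes CPython 3.7 or later, where the file starts with a 16-byte header; the temporary demo.py and its one-line content are made up for this illustration:

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile
import types

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "demo.py")
    with open(src_path, "w") as f:
        f.write("PI = 3.14\n")

    # Compile to an explicit .pyc path so everything stays in the temp directory.
    pyc_path = py_compile.compile(
        src_path, cfile=os.path.join(tmp, "demo.pyc"), doraise=True
    )

    with open(pyc_path, "rb") as f:
        # Header (CPython 3.7+): magic number, flags, then either the
        # source mtime and size or a source hash.
        header = f.read(16)
        # The rest of the file is the marshalled code object.
        loaded = marshal.load(f)

print(header[:4] == importlib.util.MAGIC_NUMBER)
print(isinstance(loaded, types.CodeType))

# The deserialized code object can be executed like a freshly compiled one.
ns = {}
exec(loaded, ns)
print(ns["PI"])
```

The magic number identifies the bytecode format of the interpreter that wrote the file, which is why a .pyc produced by one Python version is not reused by another.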

Therefore, by analogy with Java, a .py file in Python corresponds to a .java file (the source file), while a .pyc file corresponds to a .class file (the compilation result). The difference is that a Java program must first be compiled with the javac compiler command and then run with the java virtual machine command, whereas the Python interpreter does both.
