What are the knowledge points of Python virtual machine framework

Updated 2025-04-27. From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/02 report

This article introduces the key concepts of the Python virtual machine framework: Python bytecode, the framework that executes it, and the Python runtime environment.

The key concepts of the Python virtual machine framework are explained below:

I. Python bytecode

Python source code is compiled into a sequence of bytecode instructions before it runs, and the Python virtual machine executes the program by carrying out the operation that each instruction specifies. Python 2.5 defines 104 bytecode instructions:

opcode.h

A close look at these definitions shows that although the opcode numbers range from 0 to 143, the numbering has gaps: it skips from 5 to 9, from 13 to 15, and from 15 to 18, for example. That is why Python 2.5 defines only 104 bytecode instructions rather than 144.
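The same kind of gap can be observed on whatever interpreter is at hand. The following is a minimal sketch that uses the standard dis module; the opcode numbers it reports belong to the running Python version, not to Python 2.5, so the exact gaps will differ from those quoted above.

```python
import dis

# Opcode numbers actually assigned in the running interpreter.
used = sorted(set(dis.opmap.values()))

# A gap is any pair of neighbouring opcodes whose numbers are not consecutive.
gaps = [(lo, hi) for lo, hi in zip(used, used[1:]) if hi - lo > 1]
skipped = sum(hi - lo - 1 for lo, hi in gaps)

print("opcodes defined:", len(used))
print("number range   :", used[0], "to", used[-1])
print("numbers skipped:", skipped)
print("first gaps     :", gaps[:3])
```

The number of opcodes defined plus the numbers skipped always equals the size of the numeric range, which is the same arithmetic that turns the range 0 to 143 into 104 instructions.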

Of Python 2.5's 104 instructions, some take an argument and others do not. Every instruction that takes an argument has an opcode greater than or equal to 90, the value of the constant HAVE_ARGUMENT in opcode.h. Python provides a macro, HAS_ARG, to test whether a bytecode instruction takes an argument:

opcode.h
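As a rough Python rendering of that check (the real macro is written in C), the sketch below hard-codes HAVE_ARGUMENT = 90 and a handful of Python 2.5 opcode numbers. The table is a small excerpt chosen for illustration, and has_arg is a hypothetical helper name.

```python
# Threshold from Python 2.5's opcode.h: opcodes from here on take an argument.
HAVE_ARGUMENT = 90

# A small excerpt of Python 2.5 opcode numbers, for illustration only.
OPCODES_25 = {
    "POP_TOP": 1,
    "NOP": 9,
    "RETURN_VALUE": 83,
    "STORE_NAME": 90,
    "LOAD_CONST": 100,
    "EXTENDED_ARG": 143,
}

def has_arg(op):
    """Mirror of the C macro HAS_ARG(op): true when the opcode takes an argument."""
    return op >= HAVE_ARGUMENT

for name, op in OPCODES_25.items():
    print(f"{name:<13} {op:>3}  has_arg={has_arg(op)}")
```

STORE_NAME is numbered exactly 90 and takes an argument, so the comparison has to be inclusive.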

PyCodeObject was introduced in the earlier articles "Python code object and pyc file" (1), (2) and (3). It is the static object that Python builds in memory when it compiles source code, and it contains the compiled bytecode. We can disassemble that bytecode with dis, the disassembler module that ships with Python.

In the dis output, the leftmost column is the source line number that the bytecode instruction corresponds to, the second column is the offset of the instruction within co_code, the third column is the instruction name, the fourth column is the instruction's argument, and the last column is the value that the argument resolves to.
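A minimal sketch of such a disassembly follows. It runs on a modern Python 3 interpreter rather than Python 2.5, so instruction names and offsets vary by version, and the function add is a made-up example. dis.dis prints the table, and dis.get_instructions exposes the same columns as attributes.

```python
import dis

def add(a, b):
    return a + b

# Human-readable table: line number, offset, instruction name, argument, resolved argument.
dis.dis(add)

# The same information as objects, one per instruction.
rows = [(ins.offset, ins.opname, ins.arg, ins.argrepr)
        for ins in dis.get_instructions(add)]
for offset, opname, arg, argrepr in rows:
    print(offset, opname, arg, argrepr)
```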

II. Python virtual machine running framework

When Python starts, it first initializes the runtime environment. Note that the runtime environment is not the same as the execution environment described in the earlier chapter on Python code objects and pyc files. The runtime environment is a global concept, whereas an execution environment is a stack frame, which corresponds to one code block. The Python virtual machine is implemented in a single function. The source is listed here in abridged form, so the actual source differs in some details:

ceval.c

PyEval_EvalFrameEx begins by initializing its local variables, which capture the important information held by the PyCodeObject inside the PyFrameObject. Another important step is to initialize the stack-top pointer stack_pointer to f->f_stacktop. The co_code field of the PyCodeObject stores the bytecode instructions together with their arguments. The Python virtual machine executes the instruction sequence by walking co_code from beginning to end and executing each instruction in turn.

The Python virtual machine uses three variables to carry out this traversal. co_code is actually a PyStringObject whose character array is what really matters, so the whole bytecode sequence is simply a character array in C. first_instr always points to the start of the bytecode sequence, next_instr always points to the next instruction to be executed, and f_lasti records the position of the most recently executed instruction. first_instr and next_instr are character pointers, while f_lasti is stored in the frame as an integer offset from the start of the sequence.

Figure 1.1 shows the three variables at one moment during the traversal.

The part of the Python virtual machine that executes bytecode instructions is essentially a for loop wrapped around one giant switch/case statement:

ceval.c

The code above is only a heavily simplified version of the Python virtual machine. The full code is the PyEval_EvalFrameEx function in ceval.c.
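The shape of that loop can be imitated in a few lines of Python. This is a toy stack machine, not CPython's code: the opcode names and numbers, the run function, and the instruction format (a list of (opcode, argument) pairs) are all assumptions made for illustration, and an if/elif chain stands in for the C switch.

```python
# Hypothetical opcodes for a toy stack machine (not CPython's numbering).
PUSH_CONST, ADD, MUL, RETURN = range(4)

def run(code, consts):
    """Execute a list of (opcode, arg) pairs and return the resulting value."""
    stack = []
    next_instr = 0                      # index of the next instruction to execute
    while True:                         # plays the role of the for (;;) loop
        opcode, oparg = code[next_instr]
        next_instr += 1
        if opcode == PUSH_CONST:        # each branch plays the role of one case
            stack.append(consts[oparg])
        elif opcode == ADD:
            right = stack.pop()
            stack.append(stack.pop() + right)
        elif opcode == MUL:
            right = stack.pop()
            stack.append(stack.pop() * right)
        elif opcode == RETURN:
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode}")

# Computes (2 + 3) * 4.
program = [(PUSH_CONST, 0), (PUSH_CONST, 1), (ADD, None),
           (PUSH_CONST, 2), (MUL, None), (RETURN, None)]
result = run(program, consts=[2, 3, 4])
print(result)  # 20
```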

In this architecture, several macros step through the bytecode one instruction at a time:

ceval.c

As noted in the analysis of PyCodeObject, some Python bytecode instructions take an argument and others do not, and the HAS_ARG macro tells them apart. Because an instruction with an argument occupies more bytes than one without, next_instr advances by a different amount depending on the instruction. Either way, next_instr always ends up pointing at the next bytecode instruction to be executed.
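To make the varying displacement concrete, here is a sketch in Python that walks a byte string the way the loop walks co_code. It assumes the Python 2.5 encoding, in which an opcode occupies one byte and an argument, when present, occupies the next two bytes in little-endian order. The decode helper and the sample bytes are made up for this illustration.

```python
HAVE_ARGUMENT = 90  # Python 2.5 threshold: opcodes >= 90 carry an argument

def decode(co_code):
    """Return (offset, opcode, oparg) triples; oparg is None when absent."""
    instructions = []
    next_instr = 0
    while next_instr < len(co_code):
        offset = next_instr
        opcode = co_code[next_instr]          # fetch the opcode, advance 1 byte
        next_instr += 1
        oparg = None
        if opcode >= HAVE_ARGUMENT:           # same test as HAS_ARG(opcode)
            low, high = co_code[next_instr], co_code[next_instr + 1]
            oparg = (high << 8) + low         # two-byte little-endian argument
            next_instr += 2                   # advance past the argument
        instructions.append((offset, opcode, oparg))
    return instructions

# LOAD_CONST 1 (100), STORE_NAME 0 (90), RETURN_VALUE (83) in Python 2.5 numbering.
sample = bytes([100, 1, 0, 90, 0, 0, 83])
decoded = decode(sample)
print(decoded)  # [(0, 100, 1), (3, 90, 0), (6, 83, None)]
```

The offsets 0, 3 and 6 show the two step sizes: three bytes for an instruction with an argument, one byte for an instruction without.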

Once Python has fetched an opcode and, where required, its argument, a switch statement dispatches on the opcode to the matching case. Every bytecode instruction has its own case statement, and the body of that case is the instruction's implementation.

After an instruction executes successfully, control jumps either to the fast_next_opcode label or back to the top of the for loop. Either way, Python's next step is to fetch the next bytecode instruction and its argument and execute it. Repeating this for every instruction in co_code completes the execution of the Python program.

One more variable, why, records the state of Python's execution engine when it leaves this giant for loop. The engine does not always run to completion: an error may occur while executing an instruction, which is the familiar exception. So when Python leaves the execution engine, it needs to know why execution ended. Did it finish normally, or did an error make it impossible to continue? The variable why carries this responsibility. Its possible values are defined in ceval.c, and each one describes a state in which bytecode execution can end:

ceval.c
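As an illustration only, the status codes can be modelled in Python as an enumeration. The member names below are modelled on the WHY_ constants in CPython 2.x's ceval.c and should be checked against the source. The numeric values (left to enum.auto) and the execute helper are assumptions made for this sketch.

```python
from enum import Enum, auto

class Why(Enum):
    NOT = auto()        # no error, keep executing
    EXCEPTION = auto()  # an exception occurred
    RERAISE = auto()    # exception re-raised by a finally block
    RETURN = auto()     # return statement
    BREAK = auto()      # break statement
    CONTINUE = auto()   # continue statement
    YIELD = auto()      # yield expression

def execute(steps):
    """Run zero-argument callables in order and report why the loop stopped."""
    why = Why.NOT
    result = None
    for step in steps:
        try:
            result = step()
        except Exception as exc:
            why, result = Why.EXCEPTION, exc
            break
    else:
        why = Why.RETURN    # ran out of steps: treat as a normal return
    return why, result

ok = execute([lambda: 1 + 1, lambda: 2 * 3])
failed = execute([lambda: 1 + 1, lambda: 1 / 0])
print(ok[0].name, ok[1])
print(failed[0].name, type(failed[1]).__name__)
```

The caller inspects the returned status first and only then decides how to interpret the accompanying value, which is the role why plays when the loop exits.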

III. Python runtime environment

As noted earlier, a PyFrameObject corresponds to the stack frame of an executable at run time, but stack frames alone are not enough for an executable to run on an operating system. So far we have ignored two concepts that are crucial to executables: processes and threads. Python creates a main thread during initialization, so its runtime environment always contains a main thread. Because the later analysis of Python's exception mechanism relies on Python's internal threading model, we first need an overall picture of that model.

Take the Win32 platform as an example. A native Win32 executable runs inside a process, but the process itself is not the active entity that corresponds to the executable's sequence of machine instructions. That active entity is abstracted as a thread, and the process is the environment in which the thread runs.

For an ordinary single-threaded executable, the operating system creates a process containing one main thread when the program runs. For a multithreaded executable, it creates a process with multiple threads. These threads can share the global variables in the process's address space, which naturally raises thread synchronization problems. CPU task switching is really switching between threads: before switching, the CPU must save the current thread's environment, and after switching it must restore the environment of the new thread.

The Python virtual machine running framework described earlier is in effect an abstraction of the CPU. It can be regarded as a soft CPU that every thread in Python uses to do its computation. The counterpart of a real machine's task switching is the mechanism by which Python threads take turns using the virtual machine.

Just as the CPU saves a thread's running environment when it switches tasks, Python must save information about the current thread before it switches threads. Python abstracts this thread state information as the PyThreadState object, and each thread has one. In that sense a PyThreadState can be viewed as an abstraction of the thread itself, but it does not simulate a thread: Python threads are still native operating system threads, and PyThreadState only abstracts their state.

Under Win32, threads cannot exist on their own. They live inside a process, and multiple threads can share some of the process's resources. The same is true in Python. If two threads in a Python program both execute import sys, how many copies of the sys module should be stored? Is the module shared globally, or does each thread get its own? If every thread had a separate sys module, Python's memory consumption would be alarming, so modules are shared globally, much like shared resources in a process. Python models the concept of a process with the PyInterpreterState object.
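The sharing is easy to observe from Python itself. In this sketch two threads each execute import sys and record the module object they obtained. The helper name import_in_thread is made up for the example.

```python
import threading

def import_in_thread(results, index):
    import sys                      # executed separately in each thread
    results[index] = sys

results = [None, None]
threads = [threading.Thread(target=import_in_thread, args=(results, i))
           for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

same_module = results[0] is results[1]
print("both threads got the same module object:", same_module)
```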

Just as Win32 usually runs multiple processes, Python can host multiple logical interpreters. In general, though, Python has only one interpreter, which maintains one or more PyThreadState objects, and the threads corresponding to these PyThreadState objects take turns using a single bytecode execution engine.

Here are the PyInterpreterState object, which represents the process concept, and the PyThreadState object, which represents the thread concept:

pystate.h

In the PyThreadState object we see the familiar PyFrameObject (struct _frame). In other words, each PyThreadState maintains a list of stack frames that mirrors the function calls made on its thread. The same holds on Win32, where each thread has its own function call stack.
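The per-thread frame list can be inspected from Python. The sketch below relies on sys._current_frames, a CPython-specific function that maps each thread id to that thread's topmost frame. The worker function and the two events are made up for the example.

```python
import sys
import threading

def worker(started, release):
    started.set()
    release.wait()                  # stay alive until the main thread has looked

started, release = threading.Event(), threading.Event()
thread = threading.Thread(target=worker, args=(started, release))
thread.start()
started.wait()

frames = sys._current_frames()      # thread id -> that thread's topmost frame
main_top = frames[threading.get_ident()]
worker_top = frames[thread.ident]

# Follow the worker's frame list from its topmost frame back to its first call.
worker_calls = []
frame = worker_top
while frame is not None:
    worker_calls.append(frame.f_code.co_name)
    frame = frame.f_back

release.set()
thread.join()
print("worker call stack, innermost first:", worker_calls)
```

The main thread and the worker report different topmost frames, and the worker's chain contains the worker function along with the threading module's own bootstrap calls.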

When the Python virtual machine starts executing, it sets the frame field of the current thread state object to the current execution environment (frame):

When a new PyFrameObject is created, the old frame is taken from the state object of the current thread and linked to the new one, forming a linked list of PyFrameObject objects:
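The resulting chain is visible from Python through the f_back attribute of frame objects. A minimal sketch, using sys._getframe (a CPython-specific function) and made-up function names:

```python
import sys

def call_chain():
    """Walk from the current frame back through its callers via f_back."""
    frame = sys._getframe()         # the frame executing call_chain itself
    names = []
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back        # the frame that was current before this one
    return names

def inner():
    return call_chain()

def outer():
    return inner()

chain = outer()
print(chain[:3])  # ['call_chain', 'inner', 'outer']
```

Each call pushes a new frame whose f_back points at the caller's frame, so the list is ordered from the newest frame to the oldest.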
