How Python's import mechanism is implemented

2025-02-28 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article looks at how Python's import mechanism is implemented inside the virtual machine, and at what each form of the import statement actually does.

What the import mechanism does

Python's import mechanism can be broken down into three separate responsibilities:

Maintaining and searching the global module pool at runtime

Parsing and searching the tree structure of module paths

Dynamically loading modules stored in different file formats
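As a rough illustration of the first point: at the Python level the module pool is visible as sys.modules. The sketch below uses the standard-library json module purely as a stand-in; the variable names cached and again are only for illustration.

```python
import importlib
import sys

import json  # first import: the module object is created and cached

# The cache is an ordinary dict keyed by the fully qualified module name.
cached = sys.modules["json"]

# A second import does not re-execute the module; it returns the cached object.
again = importlib.import_module("json")

print(cached is json)  # True
print(again is json)   # True
```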

Although import statements come in many forms, they all boil down to import x.y.z. Even import sys can be regarded as a special case of x.y.z. Combinations of from, import and as do the same work as import x.y.z; the only difference is which symbols end up in the current namespace.

When a module is imported, the virtual machine calls __import__, so let's see what that function looks like.

    static PyObject *
    builtin___import__(PyObject *self, PyObject *args, PyObject *kwds)
    {
        static char *kwlist[] = {"name", "globals", "locals",
                                 "fromlist", "level", 0};
        // globals, locals and fromlist are initialized to NULL
        PyObject *name, *globals = NULL, *locals = NULL, *fromlist = NULL;
        int level = 0;  // 0 means absolute import, the default

        // parse the required information out of the PyTupleObject
        if (!PyArg_ParseTupleAndKeywords(args, kwds, "U|OOOi:__import__",
                        kwlist, &name, &globals, &locals, &fromlist, &level))
            return NULL;
        // import the module
        return PyImport_ImportModuleLevelObject(name, globals, locals,
                                                fromlist, level);
    }

The PyArg_ParseTupleAndKeywords function used here deserves a mention, because it is used widely throughout the virtual machine. Its prototype is as follows:

    // Python/getargs.c
    int PyArg_ParseTupleAndKeywords(PyObject *, PyObject *, const char *, char **, ...);

This function parses arguments: it converts the objects (pointers) contained in args and kwds into target objects according to the given format string. The targets can be Python objects, such as PyListObject or PyLongObject, or native C values.

We know that the args parameter of builtin___import__ points to a PyTupleObject holding the arguments and information that the import needs. The virtual machine builds this tuple while executing the IMPORT_NAME instruction.

Here, however, the virtual machine does the reverse: it takes the packed PyTupleObject apart and recovers the original arguments. Python uses this pack-and-unpack strategy extensively in its own implementation, because it makes it easy to pass a variable number of objects between functions.
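The pack-and-unpack idea can be sketched in pure Python. This is only an analogue under stated assumptions, not how CPython implements PyArg_ParseTupleAndKeywords: parse_import_args and pack_and_parse are hypothetical helpers, and the checks only loosely mimic the "U|OOOi" format (one required str, three optional objects, one optional int).

```python
KWLIST = ("name", "globals", "locals", "fromlist", "level")
DEFAULTS = {"globals": None, "locals": None, "fromlist": None, "level": 0}


def parse_import_args(args, kwds):
    """Hypothetical analogue: unpack (args, kwds) by a keyword list."""
    if len(args) > len(KWLIST):
        raise TypeError("too many positional arguments")
    values = dict(DEFAULTS)
    # Positional arguments fill the keyword slots from left to right.
    values.update(zip(KWLIST, args))
    for key, value in kwds.items():
        if key not in KWLIST:
            raise TypeError(f"unexpected keyword argument {key!r}")
        if key in KWLIST[:len(args)]:
            raise TypeError(f"argument {key!r} given twice")
        values[key] = value
    if "name" not in values:
        raise TypeError("missing required argument 'name'")
    if not isinstance(values["name"], str):   # loosely, the "U" in the format
        raise TypeError("name must be a str")
    if not isinstance(values["level"], int):  # loosely, the "i" in the format
        raise TypeError("level must be an int")
    return tuple(values[key] for key in KWLIST)


def pack_and_parse(*args, **kwds):
    # *args / **kwds pack the call into a tuple and a dict,
    # playing the role of args and kwds in the C function.
    return parse_import_args(args, kwds)


name, globals_, locals_, fromlist, level = pack_and_parse("sys", fromlist=("path",))
print(name, globals_, locals_, fromlist, level)  # sys None None ('path',) 0
```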

At the end of this series, we will show you how to write an extension to Python in C, and then analyze the usage of this function.

After unpacking the arguments, execution enters PyImport_ImportModuleLevelObject, which we have already seen in import_name; of course, that path also calls __import__ internally.
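The five parameters in kwlist are the same ones accepted by the built-in __import__ in Python code, so the effect of fromlist can be tried directly. The sketch below assumes the standard-library xml.dom.minidom module as a convenient three-level example; it is not part of the original article.

```python
import sys

# Empty fromlist: the top-level package is returned,
# which is what "import xml.dom.minidom" needs.
top = __import__("xml.dom.minidom", globals(), locals(), None, 0)
print(top.__name__)   # xml

# Non-empty fromlist: the module named by the dotted path is returned,
# which is what "from xml.dom.minidom import parseString" needs.
leaf = __import__("xml.dom.minidom", globals(), locals(), ("parseString",), 0)
print(leaf.__name__)  # xml.dom.minidom

print(leaf is sys.modules["xml.dom.minidom"])  # True
```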

In addition, every module and package has a __name__ attribute, and every package also has a __path__ attribute.

    import numpy as np
    import numpy.core
    import six

    print(np.__name__, np.__path__)
    """
    numpy ['C:\\python38\\lib\\site-packages\\numpy']
    """
    print(np.core.__name__, np.core.__path__)
    """
    numpy.core ['C:\\python38\\lib\\site-packages\\numpy\\core']
    """
    print(six.__name__, six.__path__)
    """
    six []
    """

__name__ is the module or package name. For a subpackage or a module inside a package, it is the dotted name: package.subpackage or package.module. __path__ is the list of directories where the package lives. An ordinary module has no __path__ attribute at all; six prints an empty list only because six.py assigns __path__ = [] itself.

There is also a __file__ attribute. For a module, it is the module's own full path. For a package there are two cases: if the package contains an __init__.py file, __file__ is the full path to that __init__.py; if not, it is None.
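These attributes can be checked without third-party packages. The sketch below is an illustration under assumptions: it uses the standard-library xml.dom package in place of numpy, and ns_pkg_demo is a hypothetical directory name created in a temporary folder to show the case without __init__.py.

```python
import importlib
import os
import shutil
import sys
import tempfile

import xml.dom.minidom

print(xml.dom.__name__)                      # xml.dom (package inside a package)
print(xml.dom.minidom.__name__)              # xml.dom.minidom (module inside a package)
print(hasattr(xml.dom, "__path__"))          # True: packages carry __path__
print(hasattr(xml.dom.minidom, "__path__"))  # False: a plain module has none
print(os.path.basename(xml.dom.__file__))    # normally __init__.py

# A directory without __init__.py is imported as a namespace package.
tmp_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(tmp_dir, "ns_pkg_demo"))  # hypothetical package name
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
try:
    ns_pkg = importlib.import_module("ns_pkg_demo")
finally:
    sys.path.remove(tmp_dir)

# getattr is used defensively in case an interpreter leaves __file__ unset.
ns_file = getattr(ns_pkg, "__file__", None)
print(ns_file)  # None

# Clean up the temporary package.
sys.modules.pop("ns_pkg_demo", None)
shutil.rmtree(tmp_dir, ignore_errors=True)
```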

Let's look at the bytecode generated for the different import forms, and then understand each form at the virtual machine level.

Single module import

Take a simple module import as an example:

    import sys
    """
      0 LOAD_CONST               0 (0)
      2 LOAD_CONST               1 (None)
      4 IMPORT_NAME              0 (sys)
      6 STORE_NAME               0 (sys)
      8 LOAD_CONST               1 (None)
     10 RETURN_VALUE
    """

This is the example we looked at in the beginning, and by now the behavior of IMPORT_NAME should be clear. When IMPORT_NAME finishes, the virtual machine pushes the PyModuleObject (pointer) onto the runtime stack, and STORE_NAME then stores it in the current local namespace.
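Listings like the one above can be reproduced with the standard dis module. The sketch below only compiles the statement and does not execute it; the exact instruction list and offsets depend on the Python version, so it prints whatever the running interpreter produces and checks just the IMPORT_NAME / STORE_NAME pair.

```python
import dis

# Compile only; nothing is imported by this line.
code = compile("import sys", "<demo>", "exec")
ops = [(ins.opname, ins.argval) for ins in dis.get_instructions(code)]

for opname, argval in ops:
    print(opname, argval)

names = [opname for opname, _ in ops]
# IMPORT_NAME pushes the module; the next instruction binds it to a name.
i = names.index("IMPORT_NAME")
print(ops[i])      # ('IMPORT_NAME', 'sys')
print(ops[i + 1])  # ('STORE_NAME', 'sys')
```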

Cascading import

    import sklearn.linear_model.ridge
    """
      0 LOAD_CONST               0 (0)
      2 LOAD_CONST               1 (None)
      4 IMPORT_NAME              0 (sklearn.linear_model.ridge)
      6 STORE_NAME               1 (sklearn)
      8 LOAD_CONST               1 (None)
     10 RETURN_VALUE
    """

For a cascading import, the argument of IMPORT_NAME is the complete dotted path. The instruction parses that path and creates a PyModuleObject for each of sklearn, sklearn.linear_model and sklearn.linear_model.ridge, all of which end up in sys.modules.

But the argument of STORE_NAME is sklearn, which means only the symbol sklearn is exposed in the local namespace of the current module. Why sklearn? Shouldn't it be sklearn.linear_model.ridge?

After the earlier analysis this is no longer a puzzle. import sklearn.linear_model.ridge does not mean importing a module or package called sklearn.linear_model.ridge. It means importing sklearn first, then putting linear_model into sklearn's attribute dictionary, and then putting ridge into linear_model's attribute dictionary.

By the same token, the expression sklearn.linear_model.ridge means: find sklearn in the local namespace, find linear_model in sklearn's attribute dictionary, and find ridge in linear_model's attribute dictionary. Because linear_model and ridge are already in the corresponding attribute dictionaries, they can be reached level by level from sklearn, so only the symbol sklearn needs to be exposed in the local namespace.

Put differently, exposing sklearn.linear_model.ridge itself would be unreasonable, because it would mean importing a module or package literally named sklearn.linear_model.ridge, which obviously does not exist. Even if we created such a module or package, Python's syntax rules still would not give the desired result. Otherwise import test_import.a would be ambiguous: does it import a module or package named test_import.a, or a under test_import?

The same holds for the test_import.a example analyzed earlier: import test_import.a loads test_import, adds a to test_import's attribute dictionary, and finally returns just test_import.

Because a can be reached through test_import (test_import.a means fetching a from test_import's attribute dictionary), import test_import.a must return test_import and only test_import.

As for sys.modules, it does contain a key with the string name "test_import.a", but that is only a strategy to avoid loading the module twice. The name test_import.a in code still means fetching a from test_import's attribute dictionary.
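The same behavior can be observed with a standard-library package, so no third-party install is needed. This is a sketch that assumes xml.dom.minidom as a stand-in for sklearn.linear_model.ridge; exec with a fresh dict plays the role of a new module's namespace.

```python
import sys

namespace = {}
exec("import xml.dom.minidom", namespace)

# Only the top-level package name is bound in the importing namespace.
bound = sorted(key for key in namespace if key != "__builtins__")
print(bound)  # ['xml']

# Every level of the dotted path has its own entry in sys.modules.
for key in ("xml", "xml.dom", "xml.dom.minidom"):
    print(key, key in sys.modules)  # True for all three

# The lower levels are reached through attribute lookup, level by level.
xml_pkg = namespace["xml"]
print(xml_pkg.dom.minidom is sys.modules["xml.dom.minidom"])  # True
```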

    import pandas.core
    print(pandas.DataFrame({"a": [1, 2, 3]}))
    """
       a
    0  1
    1  2
    2  3
    """
    # so it can be called through pandas.DataFrame

Importing pandas.core first imports pandas, that is, it executes the __init__.py file inside pandas. Although both "pandas" and "pandas.core" are in sys.modules, only pandas is exposed in the local namespace, so calling pandas.DataFrame makes perfect sense. pandas.core obviously cannot be exposed as a name, because a variable name cannot contain a dot; the expression simply means looking up core in pandas's attribute dictionary.

from & import

    from sklearn.linear_model import ridge
    """
      0 LOAD_CONST               0 (0)
      2 LOAD_CONST               1 (('ridge',))
      4 IMPORT_NAME              0 (sklearn.linear_model)
      6 IMPORT_FROM              1 (ridge)
      8 STORE_NAME               1 (ridge)
     10 POP_TOP
     12 LOAD_CONST               2 (None)
     14 RETURN_VALUE
    """

Notice that the LOAD_CONST at offset 2 no longer loads None but a tuple, and that the virtual machine places ridge in the local namespace of the current module. Both sklearn and sklearn.linear_model are imported and stored in sys.modules.

But sklearn is not in the current local namespace: it has been created, but it is hidden. The argument of IMPORT_NAME is sklearn.linear_model, which still means importing sklearn first and then adding linear_model to sklearn's attribute dictionary.

Why sklearn is absent from the local namespace can be understood as follows. With a plain import, we have to walk down from the top one level at a time, so the top-level package must be bound in the local namespace. With from ... import ..., the name ridge already points at ridge under linear_model under sklearn, so sklearn itself is not needed locally. It has still been imported; it is just not exposed.

The key "ridge" does not exist in sys.modules, but "sklearn.linear_model.ridge" does; the symbol exposed to the local namespace is ridge.

So, as mentioned above, every import form boils down to import x.y.z; only the exposed symbols differ.
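A quick check of these claims, again sketched with the standard-library xml.dom.minidom standing in for sklearn.linear_model.ridge (an assumption made so the example runs anywhere):

```python
import sys

namespace = {}
exec("from xml.dom import minidom", namespace)

# Only the imported name is bound; the parent packages stay hidden.
bound = sorted(key for key in namespace if key != "__builtins__")
print(bound)  # ['minidom']

# The parents were imported all the same.
print("xml" in sys.modules, "xml.dom" in sys.modules)  # True True

# sys.modules is keyed by the full dotted name, never by the short name.
print("minidom" in sys.modules)          # False
print("xml.dom.minidom" in sys.modules)  # True
print(namespace["minidom"] is sys.modules["xml.dom.minidom"])  # True
```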

import & as

    import sklearn.linear_model.ridge as xxx
    """
      0 LOAD_CONST               0 (0)
      2 LOAD_CONST               1 (None)
      4 IMPORT_NAME              0 (sklearn.linear_model.ridge)
      6 IMPORT_FROM              1 (linear_model)
      8 ROT_TWO
     10 POP_TOP
     12 IMPORT_FROM              2 (ridge)
     14 STORE_NAME               3 (xxx)
     16 POP_TOP
     18 LOAD_CONST               1 (None)
     20 RETURN_VALUE
    """

This is similar to the from & import case above. "sklearn", "sklearn.linear_model" and "sklearn.linear_model.ridge" are all in sys.modules. But once we add as xxx, xxx points directly at ridge under linear_model under sklearn, and the name sklearn is no longer needed.

Therefore only xxx is exposed in the local namespace of the current module. sklearn is imported as well, but it lives only in sys.modules and is not exposed to the local namespace.
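To see that the alias is bound to the innermost module rather than to the top-level package, the sketch below compares it with importlib.import_module, which returns the module named by the full dotted path. xml.dom.minidom is again only a stand-in for sklearn.linear_model.ridge.

```python
import importlib
import sys

namespace = {}
exec("import xml.dom.minidom as xxx", namespace)

# Only the alias is bound in the importing namespace.
bound = sorted(key for key in namespace if key != "__builtins__")
print(bound)  # ['xxx']

# The alias refers to the innermost module, not to the package "xml".
leaf = importlib.import_module("xml.dom.minidom")
print(namespace["xxx"] is leaf)  # True

# The top-level package was still imported; it is just not bound locally.
print("xml" in sys.modules)  # True
```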

from & import & as

    from sklearn.linear_model import ridge as xxx

There is no need to post the bytecode here. It is the same as for the previous from & import, except that the symbol exposed to the local namespace is our own xxx instead of ridge.
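For readers who want to confirm this, the two statements can be compiled (not executed, so sklearn does not need to be installed) and their instructions compared. The helper name ops is hypothetical, and the concrete opcodes vary by Python version; the sketch only compares the two sequences with each other.

```python
import dis


def ops(statement):
    """Return (opname, argval) pairs for a statement; compiles only, never imports."""
    code = compile(statement, "<demo>", "exec")
    return [(ins.opname, ins.argval) for ins in dis.get_instructions(code)]


plain = ops("from sklearn.linear_model import ridge")
aliased = ops("from sklearn.linear_model import ridge as xxx")

# Same instruction sequence in both cases...
same_opcodes = [name for name, _ in plain] == [name for name, _ in aliased]
print(same_opcodes)  # True

# ...and the only difference is the name that STORE_NAME binds.
plain_store = [arg for name, arg in plain if name == "STORE_NAME"]
aliased_store = [arg for name, arg in aliased if name == "STORE_NAME"]
print(plain_store, aliased_store)  # ['ridge'] ['xxx']
```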

Namespace issues related to module objects

Like functions and classes, each PyModuleObject has its own namespace, and one module cannot directly access the contents of another. Scoping inside a module can be complex, since it follows the LEGB rule, but the boundary between one module and another is clear-cut.

    # test1.py
    name = "Gu Ming Di Xue"

    def print_name():
        return name

    # test2.py
    from test1 import name, print_name
    name = "Gu Ming Di Jing"
    print(print_name())  # Gu Ming Di Xue

After running test2.py, the output is still "Gu Ming Di Xue". Python resolves names by the LEGB rule: print_name has no local variable called name, so the lookup moves outward. test2.py has set name to "Gu Ming Di Jing", yet the value printed is the "Gu Ming Di Xue" from test1.py. Why?

Again, the boundary between module scopes is strict. print_name is a function defined in test1.py, so when it returns name, the lookup only goes through test1.py's global namespace. It never skips test1.py and looks in test2.py.
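The reason can be made visible through the function's __globals__ attribute, which is the dictionary used for its global lookups. The sketch below avoids creating files: it builds a stand-in for test1 with types.ModuleType and models test2's namespace as a plain dict, both of which are assumptions made for the sake of a self-contained example.

```python
import types

# Stand-in for test1.py, built in memory instead of from a file.
test1 = types.ModuleType("test1")
exec(
    'name = "Gu Ming Di Xue"\n'
    "def print_name():\n"
    "    return name\n",
    test1.__dict__,
)

# Stand-in for test2.py's namespace after "from test1 import name, print_name".
test2_namespace = {"name": test1.name, "print_name": test1.print_name}
test2_namespace["name"] = "Gu Ming Di Jing"  # rebinding affects "test2" only

# The function always resolves globals in the module where it was defined.
print(test1.print_name.__globals__ is test1.__dict__)  # True
print(test2_namespace["print_name"]())                 # Gu Ming Di Xue
```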

Let's take another example:

    # test1.py
    name = "Gu Ming ground sense"
    nicknames = ["Xiao Wu", "Young Girl's sense"]

    # test2.py
    import test1
    test1.name = "❤ Ancient ground sense ❤"
    test1.nicknames = ["sense Lord"]

    from test1 import name, nicknames
    print(name)  # ❤ Ancient ground sense ❤
    print(nicknames)  # ['sense Lord']

This time the printed result changes. The reason is simple: assigning to test1.name and test1.nicknames modifies the variables in test1 directly, which is equivalent to modifying test1's attribute dictionary. The later from test1 import ... then picks up the modified values.

    # test1.py
    name = "Gu Ming ground sense"
    nicknames = ["Xiao Wu", "Young Girl sense"]

    # test2.py
    from test1 import name, nicknames
    name = "Ancient Ming Earth Love"
    nicknames.remove("Xiao Wu")

    from test1 import name, nicknames
    print(name)  # Gu Ming ground sense
    print(nicknames)  # ['Young Girl sense']

from test1 import name, nicknames is equivalent to creating new variables name and nicknames in the current local namespace that point to the same objects as name and nicknames in test1.

name = "Ancient Ming Earth Love" is a rebinding, so it does not affect name in test1. nicknames.remove, on the other hand, modifies the list in place, so it does affect test1.
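The difference between rebinding and in-place modification can be checked with the is operator. As before, this is a sketch that builds a stand-in test1 module in memory with types.ModuleType and registers it in sys.modules so that the from ... import statement works without a file on disk.

```python
import sys
import types

# In-memory stand-in for test1.py.
test1 = types.ModuleType("test1")
test1.name = "Gu Ming ground sense"
test1.nicknames = ["Xiao Wu", "Young Girl sense"]
sys.modules["test1"] = test1  # register so "from test1 import ..." finds it

from test1 import name, nicknames

# Both namespaces refer to the very same list object.
shared_before = nicknames is test1.nicknames
print(shared_before)  # True

name = "Ancient Ming Earth Love"  # rebinding: only the local name changes
nicknames.remove("Xiao Wu")       # in-place change: visible through test1 too

print(test1.name)       # Gu Ming ground sense
print(test1.nicknames)  # ['Young Girl sense']

del sys.modules["test1"]  # clean up the stand-in module
```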

