How to complete the data extraction of call Library with Python probe

Updated: 2025-07-13 | From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)05/31 Report--

Today I would like to share how a Python probe can extract data from calls to a database library. The content is detailed and the logic is clear. Many people are still not very familiar with this topic, so this article is offered for reference. I hope you get something out of it. Let's take a look.

1. A simple, crude approach: wrapping the MySQL library

To measure an operation you need to know where it starts and where it ends. The simplest and crudest approach is therefore to wrap the method being called: put a middle layer between the framework code that calls the MySQL library and the library itself, and do the timing in that layer, for example:

    # pseudo code
    def my_execute(conn, sql, param):
        # MyTracer: a statistics wrapper written for the MySQL library
        with MyTracer(conn, sql, param):
            # below is the normal way of using the MySQL library
            with conn.cursor() as cursor:
                cursor.execute(sql, param)

This looks neat and is easy to change, but because the wrapper sits on the top-level API it is actually quite inflexible. cursor.execute also does some preparatory work of its own, such as merging sql and param and calling nextset to clear the cursor's pending data. As a result the numbers we collect, such as elapsed time, are inaccurate, and detailed metadata such as error codes cannot be obtained.
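The wrapper approach can be tried out without a MySQL server. The following is a minimal sketch, not the article's code: it uses the standard-library sqlite3 module as a stand-in for the MySQL driver, and my_tracer, my_execute and stats are hypothetical names chosen for illustration.

```python
import sqlite3
import time
from contextlib import contextmanager

stats = []  # collected timing records


@contextmanager
def my_tracer(sql, params):
    """Hypothetical tracer: times whatever runs inside the with-block."""
    start = time.perf_counter()
    error = None
    try:
        yield
    except Exception as exc:
        error = type(exc).__name__
        raise
    finally:
        stats.append({
            "sql": sql,
            "params": params,
            "error": error,
            "elapsed": time.perf_counter() - start,
        })


def my_execute(conn, sql, params=()):
    """Middle layer between the caller and the database driver."""
    with my_tracer(sql, params):
        cursor = conn.cursor()
        try:
            cursor.execute(sql, params)
            return cursor.fetchall()
        finally:
            cursor.close()


# sqlite3 stands in for the MySQL library so the sketch runs anywhere
conn = sqlite3.connect(":memory:")
rows = my_execute(conn, "SELECT ? + 1", (41,))
conn.close()
print(rows, stats)
```

Note that the measured time covers everything inside the with-block, including cursor creation and fetching, which is exactly the imprecision described above.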

To get the most direct and useful data you would have to change the library's source code and call the modified version, but doing that for every library would be far too much trouble. Fortunately Python offers probe-like interfaces: a probe can swap parts of a library's code for our own at import time.

2. Python probes

An import hook can be implemented in Python through sys.meta_path. When an import is performed, the import machinery consults the objects in sys.meta_path and can change what gets imported accordingly. In the protocol used here, an object in sys.meta_path implements a find_module method that returns either None or an object implementing load_module, and through that loader we can replace selected functions of a library while it is being imported. This is the legacy protocol: it is deprecated in favor of find_spec, and Python 3.12 removed support for find_module, so the code in this article needs an older interpreter. The example in this section hooks time.sleep and prints how long each call takes.
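On interpreters that no longer consult find_module, the same idea can be written against find_spec. The following is a minimal sketch under stated assumptions, not the article's code: the standard-library colorsys module stands in for the library being instrumented, the target is assumed to be an ordinary module found on sys.path, and PatchingFinder, PatchingLoader, timed and timings are hypothetical names.

```python
import importlib.abc
import importlib.machinery
import sys
import time
from functools import wraps

TARGET = "colorsys"   # stand-in for the library to instrument
timings = []          # (function name, elapsed seconds)


def timed(func):
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            timings.append((func.__name__, time.perf_counter() - start))
    return wrapper


class PatchingLoader(importlib.abc.Loader):
    """Runs the real loader, then swaps in the wrapped function."""

    def __init__(self, real_loader):
        self._real_loader = real_loader

    def create_module(self, spec):
        return self._real_loader.create_module(spec)

    def exec_module(self, module):
        self._real_loader.exec_module(module)
        module.rgb_to_hsv = timed(module.rgb_to_hsv)


class PatchingFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname != TARGET:
            return None  # let the normal finders handle everything else
        # Assumption: the target is an ordinary module found on sys.path.
        spec = importlib.machinery.PathFinder.find_spec(fullname, path)
        if spec is None or spec.loader is None:
            return None
        spec.loader = PatchingLoader(spec.loader)
        return spec


sys.modules.pop(TARGET, None)          # make sure the import really runs
sys.meta_path.insert(0, PatchingFinder())

import colorsys  # imported after the finder is installed, on purpose

hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)
print(hsv, timings)
```

Asking PathFinder directly for the real spec avoids the pop-and-reinsert trick that the legacy loader needs to prevent recursion.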

    # Example: hook time.sleep through sys.meta_path
    # (legacy find_module / load_module protocol)
    import importlib
    import sys
    from functools import wraps


    def func_wrapper(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # record the start time
            start = time.time()
            result = func(*args, **kwargs)
            # compute the elapsed time
            end = time.time()
            print(f"speed time:{end - start}")
            return result
        return wrapper


    class MetaPathFinder:

        def find_module(self, fullname, path=None):
            # shows which modules are being imported
            print(f'find module:{path}:{fullname}')
            return MetaPathLoader()


    class MetaPathLoader:

        def load_module(self, fullname):
            # every imported module is stored in sys.modules;
            # checking it avoids importing the same module twice
            if fullname in sys.modules:
                return sys.modules[fullname]
            # prevent recursive calls
            finder = sys.meta_path.pop(0)
            # import the module
            module = importlib.import_module(fullname)
            if fullname == 'time':
                # replace the function
                module.sleep = func_wrapper(module.sleep)
            sys.meta_path.insert(0, finder)
            return module


    sys.meta_path.insert(0, MetaPathFinder())


    if __name__ == '__main__':
        import time
        time.sleep(1)

    # output example:
    # find module:datetime
    # find module:time
    # load module:time
    # find module:math
    # find module:_datetime
    # speed time:1.00073385238647468

3. Making the probe module

With the main flow understood, we can build our own probe module. The example only concerns the aiomysql module, so MetaPathFinder.find_module only needs to handle aiomysql and ignore everything else. Next we decide which aiomysql function to replace. From a business point of view we generally care about cursor.execute, cursor.fetchone, cursor.fetchall and cursor.executemany, so we need to look inside cursor to see how to change the code and which underlying function to replace.

First, the source of cursor.execute (cursor.executemany is similar). It first calls self.nextset to consume any data left over from the previous request, then merges the arguments into the SQL statement, and finally runs the query through self._query:

    async def execute(self, query, args=None):
        """Executes the given operation

        Executes the given operation substituting any markers with
        the given parameters.

        For example, getting all rows where id is 5:
          cursor.execute("SELECT * FROM t1 WHERE id = %s", (5,))

        :param query: ``str`` sql statement
        :param args: ``tuple`` or ``list`` of arguments for sql query
        :returns: ``int`` Number of rows that has been produced of affected
        """
        conn = self._get_db()

        while (await self.nextset()):
            pass

        if args is not None:
            query = query % self._escape_args(args, conn)

        await self._query(query)
        self._executed = query
        if self._echo:
            logger.info(query)
            logger.info("%r", args)
        return self._rowcount

The source of cursor.fetchone (cursor.fetchall is similar) shows that it only reads from a cache. The data was already fetched while cursor.execute ran:

    def fetchone(self):
        """Fetch the next row """
        self._check_executed()
        fut = self._loop.create_future()
        if self._rows is None or self._rownumber >= len(self._rows):
            fut.set_result(None)
            return fut
        result = self._rows[self._rownumber]
        self._rownumber += 1
        fut = self._loop.create_future()
        fut.set_result(result)
        return fut

Summing up the analysis above, we only need to replace the core method self._query to get the data we want. From the source we know that a wrapper around self._query receives self and the sql argument, and the query result can be read from self afterwards. The decorator also gives us the elapsed time, so we have essentially all the data we need.
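The idea of wrapping a private coroutine method can be tried without a database. The following is a minimal sketch, not aiomysql code: FakeCursor is a hypothetical stand-in whose _query only simulates a round trip, and trace_query and records are illustrative names.

```python
import asyncio
import time
from functools import wraps


class FakeCursor:
    """Hypothetical stand-in for aiomysql.Cursor; not the real class."""

    def __init__(self):
        self._rows = None

    async def _query(self, sql):
        await asyncio.sleep(0.01)  # pretend network round trip
        if "boom" in sql:
            raise RuntimeError("simulated server error")
        self._rows = ((42,),)

    async def execute(self, sql):
        await self._query(sql)
        return len(self._rows)


records = []


def trace_query(func):
    @wraps(func)
    async def wrapper(self, sql, *args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return await func(self, sql, *args, **kwargs)
        except Exception as exc:
            error = repr(exc)
            raise
        finally:
            # self is the cursor, so its state is readable after the call
            records.append({
                "sql": sql,
                "rows": self._rows,
                "error": error,
                "elapsed": time.perf_counter() - start,
            })
    return wrapper


# replace the method on the class; every instance picks up the wrapper
FakeCursor._query = trace_query(FakeCursor._query)


async def main():
    cur = FakeCursor()
    await cur.execute("SELECT 42")
    try:
        await cur.execute("SELECT boom")
    except RuntimeError:
        pass


asyncio.run(main())
print(records)
```

Recording inside a finally block means failed queries are captured too, which a wrapper around the top-level API would need extra work to achieve.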

Following this idea, the modified code is:

    import importlib
    import time
    import sys
    from functools import wraps
    from typing import cast, Any, Callable, Optional, Tuple, TYPE_CHECKING
    from types import ModuleType

    if TYPE_CHECKING:
        import aiomysql


    def func_wrapper(func: Callable):
        @wraps(func)
        async def wrapper(*args, **kwargs) -> Any:
            start: float = time.time()
            func_result: Any = await func(*args, **kwargs)
            end: float = time.time()

            # per the signature of _query, the first argument is self
            # and the second is sql
            self: aiomysql.Cursor = args[0]
            sql: str = args[1]
            # other data can be read through self
            db: str = self._connection.db
            user: str = self._connection.user
            host: str = self._connection.host
            port: int = self._connection.port
            execute_result: Tuple[Tuple] = self._rows
            # The data could be sent to a platform of our choice through our
            # own agent and then viewed or monitored there.
            # Here we only print part of it.
            print({
                "sql": sql,
                "db": db,
                "user": user,
                "host": host,
                "port": port,
                "result": execute_result,
                "speed time": end - start
            })
            return func_result
        return cast(Callable, wrapper)


    class MetaPathFinder:

        @staticmethod
        def find_module(fullname: str, path: Optional[str] = None) -> Optional["MetaPathLoader"]:
            if fullname == 'aiomysql':
                # only hook aiomysql
                return MetaPathLoader()
            else:
                return None


    class MetaPathLoader:

        @staticmethod
        def load_module(fullname: str):
            if fullname in sys.modules:
                return sys.modules[fullname]
            # prevent recursive calls
            finder: "MetaPathFinder" = sys.meta_path.pop(0)
            # import the module
            module: ModuleType = importlib.import_module(fullname)
            # hook _query
            module.Cursor._query = func_wrapper(module.Cursor._query)
            sys.meta_path.insert(0, finder)
            return module


    async def test_mysql() -> None:
        import aiomysql
        pool: aiomysql.Pool = await aiomysql.create_pool(
            host='127.0.0.1', port=3306, user='root', password='123123', db='mysql'
        )
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                await cur.execute("SELECT 42")
                (r,) = await cur.fetchone()
                assert r == 42
        pool.close()
        await pool.wait_closed()


    if __name__ == '__main__':
        sys.meta_path.insert(0, MetaPathFinder())
        import asyncio

        asyncio.run(test_mysql())

    # output example:
    # The sql statement is the same as the one we passed in, and so are db,
    # user, host, port and the other parameters. The execution result and
    # the running time are available as well.
    # {'sql': 'SELECT 42', 'db': 'mysql', 'user': 'root', 'host': '127.0.0.1', 'port': 3306, 'result': ((42,),), 'speed time': 0.00045609474182128906}

This example works, but the hook has to be installed explicitly at the program's entry point. A project often has several entry points, and installing the hook in each one is tedious. The hook also has to be installed before the library is imported, so the import order must be standardized or the hook may silently fail in some places. The problem goes away if the hook is installed right after the interpreter starts. At startup the Python interpreter automatically imports the sitecustomize and usercustomize modules if it can find them, for example in a directory listed in PYTHONPATH, so we only need to create such a module and install our hook from it.

    .
    ├── __init__.py
    ├── hook_aiomysql.py
    ├── sitecustomize.py
    └── test_auto_hook.py

hook_aiomysql.py contains the probe code from the example above. sitecustomize.py is very simple: it imports the probe and inserts it into sys.meta_path:

    import sys
    from hook_aiomysql import MetaPathFinder

    sys.meta_path.insert(0, MetaPathFinder())
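The automatic import of sitecustomize can be checked in isolation before any probe code is involved. The following is a minimal sketch: it writes a throwaway sitecustomize.py into a temporary directory, starts a child interpreter with PYTHONPATH pointing there, and asks the child whether the marker was set. The attribute name probe_installed is hypothetical.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # a throwaway sitecustomize.py that only leaves a marker behind
    with open(os.path.join(tmp, "sitecustomize.py"), "w") as fh:
        fh.write("import sys\nsys.probe_installed = True\n")

    # PYTHONPATH puts the directory on the child's import path
    env = dict(os.environ, PYTHONPATH=tmp)
    child_code = "import sys; print(getattr(sys, 'probe_installed', False))"
    out = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    ).stdout.strip()

print(out)
```

The child never imports sitecustomize itself, so a marker that is present there was set during interpreter startup. Starting the interpreter with options that skip the site module would disable this mechanism.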

test_auto_hook.py is the test code:

    import asyncio
    from hook_aiomysql import test_mysql

    asyncio.run(test_mysql())

Next, just set PYTHONPATH and run the code (if the project is started by supervisor, PYTHONPATH can be set in its configuration file):

    (.venv) ➜  python_hook git:(master) ✗ export PYTHONPATH=.
    (.venv) ➜  python_hook git:(master) ✗ python test_auto_hook.py
    {'sql': 'SELECT 42', 'db': 'mysql', 'user': 'root', 'host': '127.0.0.1', 'port': 3306, 'result': ((42,),), 'speed time': 0.000213623046875}

4. Direct replacement method

The method above works well and is easy to embed in a project, but because it relies on a sitecustomize.py file it is hard to package as a third-party library. To do that we need another way. When introducing MetaPathLoader above we mentioned sys.modules, which is used there to avoid repeated imports:

    class MetaPathLoader:

        def load_module(self, fullname):
            # every imported module is stored in sys.modules;
            # checking it avoids importing the same module twice
            if fullname in sys.modules:
                return sys.modules[fullname]
            # prevent recursive calls
            finder = sys.meta_path.pop(0)
            # import the module
            module = importlib.import_module(fullname)
            if fullname == 'time':
                # replace the function
                module.sleep = func_wrapper(module.sleep)
            sys.meta_path.insert(0, finder)
            return module

Repeated imports are avoided because every imported module is stored in sys.modules, and importing the same name again simply returns the cached module. Relying on this is safe because we do not upgrade third-party dependencies while the program is running. Since we can ignore the case of same-named modules with different implementations being imported, and sys.modules caches whatever is imported, the logic above can be simplified to: import the module, then replace the method on that module with our hook.

    import time
    from functools import wraps
    from typing import Any, Callable, Tuple, cast

    import aiomysql


    def func_wrapper(func: Callable):
        """Same wrapper function as above; omitted here."""


    # whether the hook has already been installed
    _IS_HOOK: bool = False
    # keep a reference to the original _query
    _query: Callable = aiomysql.Cursor._query


    # install the hook
    def install_hook() -> None:
        global _IS_HOOK
        if _IS_HOOK:
            return
        aiomysql.Cursor._query = func_wrapper(aiomysql.Cursor._query)
        _IS_HOOK = True


    # restore the original method
    def reset_hook() -> None:
        global _IS_HOOK
        aiomysql.Cursor._query = _query
        _IS_HOOK = False

The code is simple and straightforward, so let's run the test:

    import asyncio
    import aiomysql

    from demo import install_hook, reset_hook


    async def test_mysql() -> None:
        pool: aiomysql.Pool = await aiomysql.create_pool(
            host='127.0.0.1', port=3306, user='root', password='', db='mysql'
        )
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                await cur.execute("SELECT 42")
                (r,) = await cur.fetchone()
                assert r == 42
        pool.close()
        await pool.wait_closed()


    print("install hook")
    install_hook()
    asyncio.run(test_mysql())
    print("reset hook")
    reset_hook()
    asyncio.run(test_mysql())
    print("end")

The test output shows that the logic is correct: after install hook the extracted metadata is printed, while after reset nothing extra is printed.

    install hook
    {'sql': 'SELECT 42', 'db': 'mysql', 'user': 'root', 'host': '127.0.0.1', 'port': 3306, 'result': ((42,),), 'speed time': 0.000347137451171875}
    reset hook
    end
