PostgreSQL Source Code Interpretation (9) - Insert Data #8 (ExecutorRun and standard...

2025-01-15 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article briefly walks through the PostgreSQL (PG) source code for inserting data, covering the implementation logic of the ExecutorRun and standard_ExecutorRun functions, both of which are located in execMain.c.

Two notes:

1. Reading method: the series takes a bottom-up approach, reading layer by layer from the bottom of the call stack (see the first article in the series for the call stack). Reading the articles in that order is recommended.

2. Open questions: the interpretations so far are not in-depth and in places only scratch the surface, but as the walk up the call stack continues, more information gradually emerges. This takes patience and persistence.

I. Basic information

This section lists the data structures, macro definitions, and dependent functions used by ExecutorRun and standard_ExecutorRun.

Data structures / macro definitions

1. QueryDesc

    // query descriptor
    // the structure contains all the information needed to execute the query
    /* ----------------
     *      query descriptor:
     *
     *  a QueryDesc encapsulates everything that the executor
     *  needs to execute the query.
     *
     *  For the convenience of SQL-language functions, we also support QueryDescs
     *  containing utility statements; these must not be passed to the executor
     *  however.
     * ---------------------
     */
    typedef struct QueryDesc
    {
        /* These fields are provided by CreateQueryDesc */
        CmdType     operation;      /* CMD_SELECT, CMD_UPDATE, etc. */
        PlannedStmt *plannedstmt;   /* planner's output (could be utility, too) */
        const char *sourceText;     /* source text of the query */
        Snapshot    snapshot;       /* snapshot to use for query */
        Snapshot    crosscheck_snapshot;    /* crosscheck for RI update/delete */
        DestReceiver *dest;         /* the destination for tuple output */
        ParamListInfo params;       /* param values being passed in */
        QueryEnvironment *queryEnv; /* query environment passed in */
        int         instrument_options; /* OR of InstrumentOption flags */

        /* These fields are set by ExecutorStart */
        TupleDesc   tupDesc;        /* descriptor for result tuples */
        EState     *estate;         /* executor's query-wide state */
        PlanState  *planstate;      /* tree of per-plan-node state */

        /* This field is set by ExecutorRun */
        bool        already_executed;   /* true if previously executed */

        /* This is always set NULL by the core system, but plugins can change it */
        struct Instrumentation *totaltime;  /* total time spent in ExecutorRun */
    } QueryDesc;

    // snapshot pointer
    typedef struct SnapshotData *Snapshot;

    #define InvalidSnapshot     ((Snapshot) NULL)

    /*
     * We use SnapshotData structures to represent both "regular" (MVCC)
     * snapshots and "special" snapshots that have non-MVCC semantics.
     * The specific semantics of a snapshot are encoded by the "satisfies"
     * function.
     */
    typedef bool (*SnapshotSatisfiesFunc) (HeapTuple htup,
                                           Snapshot snapshot, Buffer buffer);

    /*
     * Struct representing all kind of possible snapshots.
     *
     * There are several different kinds of snapshots:
     * * Normal MVCC snapshots
     * * MVCC snapshots taken during recovery (in Hot-Standby mode)
     * * Historic MVCC snapshots used during logical decoding
     * * snapshots passed to HeapTupleSatisfiesDirty()
     * * snapshots passed to HeapTupleSatisfiesNonVacuumable()
     * * snapshots used for SatisfiesAny, Toast, Self where no members are
     *   accessed.
     *
     * TODO: It's probably a good idea to split this struct using a NodeTag
     * similar to how parser and executor nodes are handled, with one type for
     * each different kind of snapshot to avoid overloading the meaning of
     * individual fields.
     */
    // data structure that stores a snapshot
    typedef struct SnapshotData
    {
        SnapshotSatisfiesFunc satisfies;    /* tuple test function */

        /*
         * The remaining fields are used only for MVCC snapshots, and are normally
         * just zeroes in special snapshots.  (But xmin and xmax are used
         * specially by HeapTupleSatisfiesDirty, and xmin is used specially by
         * HeapTupleSatisfiesNonVacuumable.)
         *
         * An MVCC snapshot can never see the effects of XIDs >= xmax. It can see
         * the effects of all older XIDs except those listed in the snapshot. xmin
         * is stored as an optimization to avoid needing to search the XID arrays
         * for most tuples.
         */
        TransactionId xmin;         /* all XID < xmin are visible to me */
        TransactionId xmax;         /* all XID >= xmax are invisible to me */

        /*
         * For normal MVCC snapshot this contains the all xact IDs that are in
         * progress, unless the snapshot was taken during recovery in which case
         * it's empty. For historic MVCC snapshots, the meaning is inverted, i.e.
         * it contains *committed* transactions between xmin and xmax.
         *
         * note: all ids in xip[] satisfy xmin <= xip[i] < xmax
         */
        TransactionId *xip;
        uint32      xcnt;           /* # of xact ids in xip[] */

        /*
         * For non-historic MVCC snapshots, this contains subxact IDs that are in
         * progress (and other transactions that are in progress if taken during
         * recovery). For historic snapshot it contains *all* xids assigned to the
         * replayed transaction, including the toplevel xid.
         *
         * note: all ids in subxip[] are >= xmin, but we don't bother filtering
         * out any that are >= xmax
         */
        TransactionId *subxip;
        int32       subxcnt;        /* # of xact ids in subxip[] */
        bool        suboverflowed;  /* has the subxip array overflowed? */

        bool        takenDuringRecovery;    /* recovery-shaped snapshot? */
        bool        copied;         /* false if it's a static snapshot */

        CommandId   curcid;         /* in my xact, CID < curcid are visible */

        /*
         * An extra return value for HeapTupleSatisfiesDirty, not used in MVCC
         * snapshots.
         */
        uint32      speculativeToken;

        /*
         * Book-keeping information, used by the snapshot manager
         */
        uint32      active_count;   /* refcount on ActiveSnapshot stack */
        uint32      regd_count;     /* refcount on RegisteredSnapshots */
        pairingheap_node ph_node;   /* link in the RegisteredSnapshots heap */

        TimestampTz whenTaken;      /* timestamp when snapshot was taken */
        XLogRecPtr  lsn;            /* position in the WAL stream when taken */
    } SnapshotData;

    /* ----------------
     *      PlannedStmt node
     *
     * The output of the planner is a Plan tree headed by a PlannedStmt node.
     * PlannedStmt holds the "one time" information needed by the executor.
     *
     * For simplicity in APIs, we also wrap utility statements in PlannedStmt
     * nodes; in such cases, commandType == CMD_UTILITY, the statement itself
     * is in the utilityStmt field, and the rest of the struct is mostly dummy.
     * (We do use canSetTag, stmt_location, stmt_len, and possibly queryId.)
     * ----------------
     */
    // a planned statement,
    // i.e. a statement for which an execution plan has been generated
    typedef struct PlannedStmt
    {
        NodeTag     type;

        CmdType     commandType;    /* select|insert|update|delete|utility */
        uint64      queryId;        /* query identifier (copied from Query) */
        bool        hasReturning;   /* is it insert|update|delete RETURNING? */
        bool        hasModifyingCTE;    /* has insert|update|delete in WITH? */
        bool        canSetTag;      /* do I set the command result tag? */
        bool        transientPlan;  /* redo plan when TransactionXmin changes? */
        bool        dependsOnRole;  /* is plan specific to current role? */
        bool        parallelModeNeeded; /* parallel mode required to execute? */
        int         jitFlags;       /* which forms of JIT should be performed */

        struct Plan *planTree;      /* tree of Plan nodes */
        List       *rtable;         /* list of RangeTblEntry nodes */

        /* rtable indexes of target relations for INSERT/UPDATE/DELETE */
        List       *resultRelations;    /* integer list of RT indexes, or NIL */

        /*
         * rtable indexes of non-leaf target relations for UPDATE/DELETE on all
         * the partitioned tables mentioned in the query.
         */
        List       *nonleafResultRelations;

        /*
         * rtable indexes of root target relations for UPDATE/DELETE; this list
         * maintains a subset of the RT indexes in nonleafResultRelations,
         * indicating the roots of the respective partition hierarchies.
         */
        List       *rootResultRelations;

        List       *subplans;       /* Plan trees for SubPlan expressions; note
                                     * that some could be NULL */
        Bitmapset  *rewindPlanIDs;  /* indices of subplans that require REWIND */
        List       *rowMarks;       /* a list of PlanRowMark's */
        List       *relationOids;   /* OIDs of relations the plan depends on */
        List       *invalItems;     /* other dependencies, as PlanInvalItems */
        List       *paramExecTypes; /* type OIDs for PARAM_EXEC Params */
        Node       *utilityStmt;    /* non-null if this is utility stmt */

        /* statement location in source string (copied from Query) */
        int         stmt_location;  /* start location, or -1 if unknown */
        int         stmt_len;       /* length in bytes; 0 means "rest of string" */
    } PlannedStmt;

    // parameter list information
    typedef struct ParamListInfoData
    {
        ParamFetchHook paramFetch;  /* parameter fetch hook */
        void       *paramFetchArg;
        ParamCompileHook paramCompile;  /* parameter compile hook */
        void       *paramCompileArg;
        ParserSetupHook parserSetup;    /* parser setup hook */
        void       *parserSetupArg;
        int         numParams;      /* nominal/maximum # of Params represented */

        /*
         * params[] may be of length zero if paramFetch is supplied; otherwise it
         * must be of length numParams.
         */
        ParamExternData params[FLEXIBLE_ARRAY_MEMBER];
    } ParamListInfoData;

    typedef struct ParamListInfoData *ParamListInfo;

    // query environment; uses a List to store the relevant information
    /*
     * Private state of a query environment.
     */
    struct QueryEnvironment
    {
        List       *namedRelList;
    };

    // TODO
    typedef struct Instrumentation
    {
        /* Parameters set at node creation: */
        bool        need_timer;     /* true if we need timer data */
        bool        need_bufusage;  /* true if we need buffer usage data */
        /* Info about current plan cycle: */
        bool        running;        /* true if we've completed first tuple */
        instr_time  starttime;      /* Start time of current iteration of node */
        instr_time  counter;        /* Accumulated runtime for this node */
        double      firsttuple;     /* Time for first tuple of this cycle */
        double      tuplecount;     /* Tuples emitted so far this cycle */
        BufferUsage bufusage_start; /* Buffer usage at start */
        /* Accumulated statistics across all completed cycles: */
        double      startup;        /* Total startup time (in seconds) */
        double      total;          /* Total total time (in seconds) */
        double      ntuples;        /* Total tuples produced */
        double      ntuples2;       /* Secondary node-specific tuple counter */
        double      nloops;         /* # of run cycles for this node */
        double      nfiltered1;     /* # tuples removed by scanqual or joinqual */
        double      nfiltered2;     /* # tuples removed by "other" quals */
        BufferUsage bufusage;       /* Total buffer usage */
    } Instrumentation;

Dependent functions

1. InstrStartNode

    /* Entry to a plan node */
    void
    InstrStartNode(Instrumentation *instr)
    {
        if (instr->need_timer)
        {
            if (INSTR_TIME_IS_ZERO(instr->starttime))
                INSTR_TIME_SET_CURRENT(instr->starttime);
            else
                elog(ERROR, "InstrStartNode called twice in a row");
        }

        /* save buffer usage totals at node entry, if needed */
        if (instr->need_bufusage)
            instr->bufusage_start = pgBufferUsage;
    }

2. ScanDirectionIsNoMovement

    // a simple check
    /*
     * ScanDirectionIsNoMovement
     *      True iff scan direction indicates no movement.
     */
    #define ScanDirectionIsNoMovement(direction) \
        ((bool) ((direction) == NoMovementScanDirection))
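To see how the macro gates plan execution, here is a minimal, self-contained sketch. The enum values mirror the three scan directions PostgreSQL defines; the helper would_run_plan is hypothetical and exists only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced copy of the three scan directions used by the executor. */
typedef enum ScanDirection
{
    BackwardScanDirection = -1,
    NoMovementScanDirection = 0,
    ForwardScanDirection = 1
} ScanDirection;

/* Same shape as the macro quoted above. */
#define ScanDirectionIsNoMovement(direction) \
    ((bool) ((direction) == NoMovementScanDirection))

/*
 * Hypothetical helper (not part of PostgreSQL): reports whether
 * standard_ExecutorRun would reach its ExecutePlan call for this direction.
 */
bool
would_run_plan(ScanDirection direction)
{
    return !ScanDirectionIsNoMovement(direction);
}
```

With NoMovementScanDirection the helper returns false, which corresponds to the case where the executor only starts up and shuts down the destination receiver.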

3. ExecutePlan

    // already covered in the previous installment

4. InstrStopNode

    // TODO: understand Instrumentation

    /* Exit from a plan node */
    void
    InstrStopNode(Instrumentation *instr, double nTuples)
    {
        instr_time  endtime;

        /* count the returned tuples */
        instr->tuplecount += nTuples;

        /* let's update the time only if the timer was requested */
        if (instr->need_timer)
        {
            if (INSTR_TIME_IS_ZERO(instr->starttime))
                elog(ERROR, "InstrStopNode called without start");

            INSTR_TIME_SET_CURRENT(endtime);
            INSTR_TIME_ACCUM_DIFF(instr->counter, endtime, instr->starttime);

            INSTR_TIME_SET_ZERO(instr->starttime);
        }

        /* Add delta of buffer usage since entry to node's totals */
        if (instr->need_bufusage)
            BufferUsageAccumDiff(&instr->bufusage,
                                 &pgBufferUsage, &instr->bufusage_start);

        /* Is this the first tuple of this cycle? */
        if (!instr->running)
        {
            instr->running = true;
            instr->firsttuple = INSTR_TIME_GET_DOUBLE(instr->counter);
        }
    }
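The start/stop pairing of InstrStartNode and InstrStopNode can be sketched outside PostgreSQL as follows. This is a simplified illustration under stated assumptions, not the real implementation: MiniInstr, mini_instr_start and mini_instr_stop are hypothetical names, a boolean started flag stands in for the INSTR_TIME_IS_ZERO test, the POSIX clock_gettime call replaces the instr_time macros, buffer usage is left out, and errors are returned as -1 instead of being raised with elog.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, reduced stand-in for the Instrumentation struct. */
typedef struct MiniInstr
{
    bool            need_timer;  /* true if timing was requested */
    bool            running;     /* true once the first stop has completed */
    bool            started;     /* stands in for a non-zero starttime */
    struct timespec starttime;   /* start of the current iteration */
    double          counter;     /* accumulated runtime in seconds */
    double          firsttuple;  /* accumulated time at the first stop */
    double          tuplecount;  /* tuples counted so far */
} MiniInstr;

static double
seconds_between(const struct timespec *start, const struct timespec *end)
{
    return (double) (end->tv_sec - start->tv_sec) +
        (double) (end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Entry: remember the start time; starting twice in a row is an error. */
int
mini_instr_start(MiniInstr *instr)
{
    assert(instr != NULL);
    if (instr->need_timer)
    {
        if (instr->started)
            return -1;
        clock_gettime(CLOCK_MONOTONIC, &instr->starttime);
        instr->started = true;
    }
    return 0;
}

/* Exit: count tuples, accumulate elapsed time, reset the start marker. */
int
mini_instr_stop(MiniInstr *instr, double nTuples)
{
    struct timespec endtime;

    assert(instr != NULL);
    instr->tuplecount += nTuples;

    if (instr->need_timer)
    {
        if (!instr->started)
            return -1;          /* stop without a matching start */
        clock_gettime(CLOCK_MONOTONIC, &endtime);
        instr->counter += seconds_between(&instr->starttime, &endtime);
        instr->started = false;
    }

    if (!instr->running)
    {
        instr->running = true;
        instr->firsttuple = instr->counter;
    }
    return 0;
}
```

A caller would zero-initialise a MiniInstr, set need_timer, and bracket the measured work with one start and one stop call, much as standard_ExecutorRun brackets ExecutePlan when queryDesc->totaltime is set.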

5. MemoryContextSwitchTo

    /*
     * Although this header file is nominally backend-only, certain frontend
     * programs like pg_controldata include it via postgres.h.  For some compilers
     * it's necessary to hide the inline definition of MemoryContextSwitchTo in
     * this scenario; hence the #ifndef FRONTEND.
     */
    #ifndef FRONTEND
    static inline MemoryContext
    MemoryContextSwitchTo(MemoryContext context)
    {
        MemoryContext old = CurrentMemoryContext;

        CurrentMemoryContext = context;
        return old;
    }
    #endif                          /* FRONTEND */

II. Source code interpretation

    /* ----------------------------------------------------------------
     *      ExecutorRun
     *
     *      This is the main routine of the executor module. It accepts
     *      the query descriptor from the traffic cop and executes the
     *      query plan.
     *
     *      ExecutorStart must have been called already.
     *
     *      If direction is NoMovementScanDirection then nothing is done
     *      except to start up/shut down the destination.  Otherwise,
     *      we retrieve up to 'count' tuples in the specified direction.
     *
     *      Note: count = 0 is interpreted as no portal limit, i.e., run to
     *      completion.  Also note that the count limit is only applied to
     *      retrieved tuples, not for instance to those inserted/updated/deleted
     *      by a ModifyTable plan node.
     *
     *      There is no return value, but output tuples (if any) are sent to
     *      the destination receiver specified in the QueryDesc; and the number
     *      of tuples processed at the top level can be found in
     *      estate->es_processed.
     *
     *      We provide a function hook variable that lets loadable plugins
     *      get control when ExecutorRun is called.  Such a plugin would
     *      normally call standard_ExecutorRun().
     *
     * ----------------------------------------------------------------
     */
    /*
     * Input:
     *   queryDesc    - query descriptor, i.e. the information about the SQL
     *                  statement to be executed
     *   direction    - scan direction
     *   count        - counter
     *   execute_once - execute only once?
     * Output:
     */
    void
    ExecutorRun(QueryDesc *queryDesc,
                ScanDirection direction, uint64 count,
                bool execute_once)
    {
        if (ExecutorRun_hook)       // if a hook function is installed, call it
            (*ExecutorRun_hook) (queryDesc, direction, count, execute_once);
        else                        // otherwise call the standard function
            standard_ExecutorRun(queryDesc, direction, count, execute_once);
    }

    // the standard function
    /*
     * Input & output: see ExecutorRun
     */
    void
    standard_ExecutorRun(QueryDesc *queryDesc,
                         ScanDirection direction, uint64 count, bool execute_once)
    {
        EState     *estate;         // executor state
        CmdType     operation;      // command type; INSERT in this case
        DestReceiver *dest;         // destination receiver
        bool        sendTuples;     // whether tuples need to be sent
        MemoryContext oldcontext;   // original memory context (PG's own memory manager)

        /* sanity checks */
        Assert(queryDesc != NULL);

        estate = queryDesc->estate; // get the executor state

        Assert(estate != NULL);
        Assert(!(estate->es_top_eflags & EXEC_FLAG_EXPLAIN_ONLY));

        /*
         * Switch into per-query memory context
         */
        oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);   // switch to the per-query context and save the original one

        /* Allow instrumentation of Executor overall runtime */
        if (queryDesc->totaltime)   // is timing required? (compare "set timing on" in Oracle's sqlplus)
            InstrStartNode(queryDesc->totaltime);

        /*
         * extract information from the query descriptor and the query feature.
         */
        operation = queryDesc->operation;   // operation type
        dest = queryDesc->dest;             // destination

        /*
         * startup tuple receiver, if we will be emitting tuples
         */
        estate->es_processed = 0;           // number of tuples processed
        estate->es_lastoid = InvalidOid;    // last OID

        sendTuples = (operation == CMD_SELECT ||
                      queryDesc->plannedstmt->hasReturning);    // SELECT statements and statements with RETURNING need to send tuples

        if (sendTuples)
            dest->rStartup(dest, operation, queryDesc->tupDesc);    // start the receiver

        /*
         * run plan
         */
        if (!ScanDirectionIsNoMovement(direction))  // a scan is required
        {
            if (execute_once && queryDesc->already_executed)
                elog(ERROR, "can't re-execute query flagged for single execution");
            queryDesc->already_executed = true;

            ExecutePlan(estate,
                        queryDesc->planstate,
                        queryDesc->plannedstmt->parallelModeNeeded,
                        operation,
                        sendTuples,
                        count,
                        direction,
                        dest,
                        execute_once);  // execute the plan
        }

        /*
         * shutdown tuple receiver, if we started it
         */
        if (sendTuples)
            dest->rShutdown(dest);  // shut down the destination receiver

        if (queryDesc->totaltime)
            InstrStopNode(queryDesc->totaltime, estate->es_processed);  // finish timing

        MemoryContextSwitchTo(oldcontext);  // switch back to the original memory context
    }

III. Tracking analysis
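Before stepping through the function in gdb, it can help to predict which branches a plain INSERT takes. The sketch below isolates the sendTuples test from standard_ExecutorRun. The reduced CmdType enum and the helper name needs_tuple_receiver are assumptions made for illustration; they are not PostgreSQL code.

```c
#include <assert.h>
#include <stdbool.h>

/* Reduced copy of the command types referenced in this article. */
typedef enum CmdType
{
    CMD_UNKNOWN,
    CMD_SELECT,
    CMD_UPDATE,
    CMD_INSERT,
    CMD_DELETE,
    CMD_UTILITY,
    CMD_NOTHING
} CmdType;

/*
 * Hypothetical helper mirroring the sendTuples expression: the destination
 * receiver is started only for SELECT or for statements with RETURNING.
 */
bool
needs_tuple_receiver(CmdType operation, bool hasReturning)
{
    return operation == CMD_SELECT || hasReturning;
}
```

For an INSERT without RETURNING the helper yields false, so the rStartup and rShutdown calls are expected to be skipped while ExecutePlan still runs; an INSERT with a RETURNING clause would yield true.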

Insert test data:

    testdb=# -- #8 ExecutorRun & standard_ExecutorRun
    testdb=# -- get pid
    testdb=# select pg_backend_pid();
     pg_backend_pid
    ----------------
               1529
    (1 row)
    testdb=# -- insert a row
    testdb=# insert into t_insert values (16)

Start gdb and trace debugging:

    [root@localhost]# gdb -p 3294
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-100.el7
    Copyright (C) 2013 Free Software Foundation, Inc.
    ...
    (gdb) b standard_ExecutorRun
    Breakpoint 1 at 0x690d09: file execMain.c, line 322.
    (gdb) c
    Continuing.

    Breakpoint 1, standard_ExecutorRun (queryDesc=0x2c2d4e0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:322
    322         estate = queryDesc->estate;
    # check the parameters
    # 1. queryDesc
    (gdb) p *queryDesc
    $1 = {operation = CMD_INSERT, plannedstmt = 0x2cc1488, sourceText = 0x2c09ef0 "insert into t_insert values...", snapshot = 0x2c866e0, crosscheck_snapshot = 0x0, dest = 0x2cc15e8, params = 0x0, queryEnv = 0x0, instrument_options = 0, tupDesc = 0x2c309d0, estate = 0x2c2f900, planstate = 0x2c2fc50, already_executed = false, totaltime = 0x0}
    (gdb) p *(queryDesc->plannedstmt)
    $2 = {type = T_PlannedStmt, commandType = CMD_INSERT, queryId = 0, hasReturning = false, hasModifyingCTE = false, canSetTag = true, ..., subplans = 0x0, rewindPlanIDs = 0x0, rowMarks = 0x0, relationOids = 0x2cc1408, invalItems = 0x0, paramExecTypes = 0x2c2f590, utilityStmt = 0x0, stmt_location = 0, stmt_len = 136}
    (gdb) p *(queryDesc->snapshot)
    $3 = {satisfies = 0x9f73fc, xmin = 1612874, xmax = 1612874, xip = 0x0, xcnt = 0, subxip = 0x0, subxcnt = 0, suboverflowed = false, takenDuringRecovery = false, copied = true, curcid = 0, speculativeToken = 0, ..., ph_node = {..., prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
    (gdb) p *(queryDesc->dest)
    $4 = {receiveSlot = 0x4857ad, rStartup = 0x485196, rShutdown = 0x485bad, rDestroy = 0x485c21, mydest = DestRemote}
    (gdb) p *(queryDesc->tupDesc)
    $5 = {natts = 0, tdtypeid = 2249, tdtypmod = -1, tdhasoid = false, tdrefcount = -1, constr = 0x0, attrs = 0x2c309f0}
    (gdb) p *(queryDesc->estate)
    $6 = {type = T_EState, es_direction = ForwardScanDirection, ..., es_range_table = 0x2cc13b8, es_plannedstmt = 0x2cc1488, es_sourceText = 0x2c09ef0 "insert into t_insert values...", es_junkFilter = 0x0, es_output_cid = 0, es_result_relations = 0x2c2fb40, es_num_result_relations = 1, es_result_relation_info = 0x0, es_root_result_relations = 0x0, es_num_root_result_relations = 0, es_tuple_routing_result_relations = 0x0, es_trig_target_relations = 0x0, es_trig_tuple_slot = 0x2c30ab0, es_trig_oldtup_slot = 0x0, es_trig_newtup_slot = 0x0, es_param_list_info = 0x0, es_param_exec_vals = 0x2c2fb10, es_queryEnv = 0x0, es_query_cxt = 0x2c2f7f0, es_tupleTable = 0x2c30500, es_rowMarks = 0x0, es_processed = 0, es_lastoid = 0, es_top_eflags = 0, es_instrument = 0, es_finished = false, es_exprcontexts = 0x2c2feb0, es_subplanstates = 0x0, es_auxmodifytables = 0x0, es_per_tuple_exprcontext = 0x0, es_epqTuple = 0x0, es_epqTupleSet = 0x0, es_epqScanDone = 0x0, es_use_parallel_mode = false, es_query_dsa = 0x0, es_jit_flags = 0, es_jit = 0x0}
    (gdb) p *(queryDesc->planstate)
    $7 = {type = T_ModifyTableState, plan = 0x2cc10f8, state = 0x2c2f900, ExecProcNode = 0x69a78b, ExecProcNodeReal = 0x6c2485, instrument = 0x0, worker_instrument = 0x0, qual = 0x0, lefttree = 0x0, righttree = 0x0, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, ps_ResultTupleSlot = 0x2c30a00, ..., scandesc = 0x0}
    # 2. direction
    (gdb) p direction
    $8 = ForwardScanDirection
    # 3. count
    (gdb) p count
    $9 = 0
    # 4. execute_once
    (gdb) p execute_once
    $10 = true
    # single-step through the function
    (gdb) next
    330         oldcontext = MemoryContextSwitchTo(estate->es_query_cxt);
    (gdb)
    333         if (queryDesc->totaltime)
    # MemoryContext is a very important memory-management data structure in PG and is worth understanding in depth
    (gdb) p *oldcontext
    $11 = {type = T_AllocSetContext, isReset = false, allowInCritSection = false, methods = 0xb8c720, parent = 0x2c6f380, firstchild = 0x2c2f7f0, prevchild = 0x0, nextchild = 0x0, name = 0xb8d2f1 "PortalContext", ident = 0x2c72e98 "", reset_cbs = 0x0}
    (gdb) p *(estate->es_query_cxt)
    $12 = {type = T_AllocSetContext, isReset = false, allowInCritSection = false, methods = 0xb8c720, parent = 0x2c2d3d0, ..., reset_cbs = 0x0}
    (gdb) next
    339         operation = queryDesc->operation;
    (gdb)
    340         dest = queryDesc->dest;
    (gdb)
    345         estate->es_processed = 0;
    (gdb)
    346         estate->es_lastoid = InvalidOid;
    (gdb)
    348         sendTuples = (operation == CMD_SELECT ||
    (gdb)
    349                       queryDesc->plannedstmt->hasReturning);
    (gdb)
    348         sendTuples = (operation == CMD_SELECT ||
    (gdb)
    351         if (sendTuples)
    (gdb)
    357         if (!ScanDirectionIsNoMovement(direction))
    (gdb)
    359             if (execute_once && queryDesc->already_executed)
    (gdb)
    361             queryDesc->already_executed = true;
    (gdb)
    363             ExecutePlan(estate,
    (gdb)
    365                         queryDesc->plannedstmt->parallelModeNeeded,
    (gdb)
    363             ExecutePlan(estate,
    (gdb)
    377         if (sendTuples)
    (gdb)
    380         if (queryDesc->totaltime)
    (gdb)
    383         MemoryContextSwitchTo(oldcontext);
    (gdb)
    384     }
    (gdb)
    ExecutorRun (queryDesc=0x2c2d4e0, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:307
    307     }
    (gdb)
    # DONE!

IV. Summary

1. Extensibility of PG: PG provides a hook, ExecutorRun_hook, that lets loadable plugins take control when ExecutorRun is called.

2. Important data structure: MemoryContext (the memory context) deserves in-depth study.
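Both summary points can be illustrated with a small standalone sketch: a function-pointer hook that is called in place of the standard routine when set, and a context switch that saves the old value and restores it afterwards. Every name below (MiniContext, MiniQuery, run_hook, standard_run, counting_hook and so on) is hypothetical and merely plays the role of its PostgreSQL counterpart; nothing here is the real API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are PostgreSQL's real types. */
typedef struct MiniContext
{
    const char *name;
} MiniContext;

typedef struct MiniQuery
{
    MiniContext *query_cxt;     /* plays the role of estate->es_query_cxt */
    int          processed;     /* plays the role of estate->es_processed */
} MiniQuery;

typedef void (*run_hook_type) (MiniQuery *query);

run_hook_type run_hook = NULL;          /* plays the role of ExecutorRun_hook */
MiniContext  *current_context = NULL;   /* plays the role of CurrentMemoryContext */
int           hook_calls = 0;

/* Same save-and-return shape as MemoryContextSwitchTo. */
MiniContext *
mini_context_switch_to(MiniContext *context)
{
    MiniContext *old = current_context;

    current_context = context;
    return old;
}

/* Plays the role of standard_ExecutorRun: work inside the per-query context. */
void
standard_run(MiniQuery *query)
{
    MiniContext *oldcontext;

    assert(query != NULL);
    oldcontext = mini_context_switch_to(query->query_cxt);
    query->processed += 1;      /* placeholder for ExecutePlan */
    mini_context_switch_to(oldcontext);
}

/* Plays the role of ExecutorRun: call the hook if set, else the standard routine. */
void
run(MiniQuery *query)
{
    if (run_hook)
        (*run_hook) (query);
    else
        standard_run(query);
}

/* A hypothetical plugin: count the call, then chain to the standard routine. */
void
counting_hook(MiniQuery *query)
{
    hook_calls++;
    standard_run(query);
}
```

Assigning counting_hook to run_hook makes every later run call pass through the plugin first, which is the same arrangement the ExecutorRun comment describes: a plugin gets control and would normally call standard_ExecutorRun itself. Extensions such as pg_stat_statements rely on this hook.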
