PostgreSQL Source Code Interpretation (232) - Query #125 (NOT IN Implementation #3)


2025-07-09 Update From: SLTechnology News&Howtos shulou


This section describes some of the functions that ExecMaterial depends on when PostgreSQL executes a query containing NOT IN.

I. Data structures

SubPlanState

Run-time state of a subplan.

/* ----------------
 *      SubPlanState node
 * ----------------
 */
typedef struct SubPlanState
{
    NodeTag     type;
    SubPlan    *subplan;            /* expression plan node */
    struct PlanState *planstate;    /* subselect plan's state tree */
    struct PlanState *parent;       /* parent plan node's state tree */
    ExprState  *testexpr;           /* state of combining expression */
    List       *args;               /* states of argument expression(s) */
    HeapTuple   curTuple;           /* copy of most recent tuple from subplan */
    Datum       curArray;           /* most recent array from ARRAY() subplan */
    /* these are used when hashing the subselect's output: */
    TupleDesc   descRight;          /* subselect desc after projection */
    ProjectionInfo *projLeft;       /* for projecting lefthand exprs */
    ProjectionInfo *projRight;      /* for projecting subselect output */
    TupleHashTable hashtable;       /* hashtable for no-nulls subselect rows */
    TupleHashTable hashnulls;       /* hashtable for rows with null(s) */
    bool        havehashrows;       /* true if hashtable is not empty */
    bool        havenullrows;       /* true if hashnulls is not empty */
    MemoryContext hashtablecxt;     /* memory context containing hash tables */
    MemoryContext hashtempcxt;      /* temp memory context for hash tables */
    ExprContext *innerecontext;     /* econtext for computing inner tuples */
    AttrNumber *keyColIdx;          /* control data for hash tables */
    Oid        *tab_eq_funcoids;    /* equality funcoids for table datatype(s) */
    Oid        *tab_collations;     /* collations for hash and comparison */
    FmgrInfo   *tab_hash_funcs;     /* hash functions for table datatype(s) */
    FmgrInfo   *tab_eq_funcs;       /* equality functions for table datatype(s) */
    FmgrInfo   *lhs_hash_funcs;     /* hash functions for lefthand datatype(s) */
    FmgrInfo   *cur_eq_funcs;       /* equality functions for LHS vs. table */
    ExprState  *cur_eq_comp;        /* equality comparator for LHS vs. table */
} SubPlanState;

SubPlan

Subquery plan

/*
 * SubPlan - executable expression node for a subplan (sub-SELECT)
 *
 * The planner replaces SubLink nodes in expression trees with SubPlan
 * nodes after it has finished planning the subquery.  SubPlan references
 * a sub-plantree stored in the subplans list of the toplevel PlannedStmt.
 * (We avoid a direct link to make it easier to copy expression trees
 * without causing multiple processing of the subplan.)
 *
 * In an ordinary subplan, testexpr points to an executable expression
 * (OpExpr, an AND/OR tree of OpExprs, or RowCompareExpr) for the combining
 * operator(s); the left-hand arguments are the original lefthand expressions,
 * and the right-hand arguments are PARAM_EXEC Param nodes representing the
 * outputs of the sub-select.  (NOTE: runtime coercion functions may be
 * inserted as well.)  This is just the same expression tree as testexpr in
 * the original SubLink node, but the PARAM_SUBLINK nodes are replaced by
 * suitably numbered PARAM_EXEC nodes.
 *
 * If the sub-select becomes an initplan rather than a subplan, the executable
 * expression is part of the outer plan's expression tree (and the SubPlan
 * node itself is not, but rather is found in the outer plan's initPlan
 * list).  In this case testexpr is NULL to avoid duplication.
 *
 * The planner also derives lists of the values that need to be passed into
 * and out of the subplan.  Input values are represented as a list "args" of
 * expressions to be evaluated in the outer-query context (currently these
 * args are always just Vars, but in principle they could be any expression).
 * The values are assigned to the global PARAM_EXEC params indexed by parParam
 * (the parParam and args lists must have the same ordering).  setParam is a
 * list of the PARAM_EXEC params that are computed by the sub-select, if it
 * is an initplan; they are listed in order by sub-select output column
 * position.  (parParam and setParam are integer Lists, not Bitmapsets,
 * because their ordering is significant.)
 *
 * Also, the planner computes startup and per-call costs for use of the
 * SubPlan.  Note that these include the cost of the subquery proper,
 * evaluation of the testexpr if any, and any hashtable management overhead.
 */
typedef struct SubPlan
{
    Expr        xpr;                /* expression node header */
    /* Fields copied from original SubLink: */
    SubLinkType subLinkType;        /* see above */
    /* The combining operators, transformed to an executable expression: */
    Node       *testexpr;           /* OpExpr or RowCompareExpr expression tree */
    List       *paramIds;           /* IDs of Params embedded in the above */
    /* Identification of the Plan tree to use: */
    int         plan_id;            /* Index (from 1) in PlannedStmt.subplans */
    /* Identification of the SubPlan for EXPLAIN and debugging purposes: */
    char       *plan_name;          /* A name assigned during planning */
    /* Extra data useful for determining subplan's output type: */
    Oid         firstColType;       /* Type of first column of subplan result */
    int32       firstColTypmod;     /* Typmod of first column of subplan result */
    Oid         firstColCollation;  /* Collation of first column of subplan
                                     * result */
    /* Information about execution strategy: */
    bool        useHashTable;       /* true to store subselect output in a hash
                                     * table (implies we are doing "IN") */
    bool        unknownEqFalse;     /* true if it's okay to return FALSE when the
                                     * spec result is UNKNOWN; this allows much
                                     * simpler handling of null values */
    bool        parallel_safe;      /* is the subplan parallel-safe? */
    /* Note: parallel_safe does not consider contents of testexpr or args */
    /* Information for passing params into and out of the subselect: */
    /* setParam and parParam are lists of integers (param IDs) */
    List       *setParam;           /* initplan subqueries have to set these
                                     * Params for parent plan */
    List       *parParam;           /* indices of input Params from parent plan */
    List       *args;               /* exprs to pass as parParam values */
    /* Estimated execution costs: */
    Cost        startup_cost;       /* one-time setup cost */
    Cost        per_call_cost;      /* cost for each subplan evaluation */
} SubPlan;

SubLinkType

SubLink type

/*
 * SubLink
 *
 * A SubLink represents a subselect appearing in an expression, and in some
 * cases also the combining operator(s) just above it.  The subLinkType
 * indicates the form of the expression represented:
 *  EXISTS_SUBLINK      EXISTS(SELECT ...)
 *  ALL_SUBLINK         (lefthand) op ALL (SELECT ...)
 *  ANY_SUBLINK         (lefthand) op ANY (SELECT ...)
 *  ROWCOMPARE_SUBLINK  (lefthand) op (SELECT ...)
 *  EXPR_SUBLINK        (SELECT with single targetlist item ...)
 *  MULTIEXPR_SUBLINK   (SELECT with multiple targetlist items ...)
 *  ARRAY_SUBLINK       ARRAY(SELECT with single targetlist item ...)
 *  CTE_SUBLINK         WITH query (never actually part of an expression)
 * For ALL, ANY, and ROWCOMPARE, the lefthand is a list of expressions of the
 * same length as the subselect's targetlist.  ROWCOMPARE will *always* have
 * a list with more than one entry; if the subselect has just one target
 * then the parser will create an EXPR_SUBLINK instead (and any operator
 * above the subselect will be represented separately).
 * ROWCOMPARE, EXPR, and MULTIEXPR require the subselect to deliver at most
 * one row (if it returns no rows, the result is NULL).
 * ALL, ANY, and ROWCOMPARE require the combining operators to deliver boolean
 * results.  ALL and ANY combine the per-row results using AND and OR
 * semantics respectively.
 * ARRAY requires just one target column, and creates an array of the target
 * column's type using any number of rows resulting from the subselect.
 *
 * SubLink is classed as an Expr node, but it is not actually executable;
 * it must be replaced in the expression tree by a SubPlan node during
 * planning.
 *
 * NOTE: in the raw output of gram.y, testexpr contains just the raw form
 * of the lefthand expression (if any), and operName is the String name of
 * the combining operator.  Also, subselect is a raw parsetree.  During parse
 * analysis, the parser transforms testexpr into a complete boolean expression
 * that compares the lefthand value(s) to PARAM_SUBLINK nodes representing the
 * output columns of the subselect.  And subselect is transformed to a Query.
 * This is the representation seen in saved rules and in the rewriter.
 *
 * In EXISTS, EXPR, MULTIEXPR, and ARRAY SubLinks, testexpr and operName
 * are unused and are always null.
 *
 * subLinkId is currently used only for MULTIEXPR SubLinks, and is zero in
 * other SubLinks.  This number identifies different multiple-assignment
 * subqueries within an UPDATE statement's SET list.  It is unique only
 * within a particular targetlist.  The output column(s) of the MULTIEXPR
 * are referenced by PARAM_MULTIEXPR Params appearing elsewhere in the tlist.
 *
 * The CTE_SUBLINK case never occurs in actual SubLink nodes, but it is used
 * in SubPlans generated for WITH subqueries.
 */
typedef enum SubLinkType
{
    EXISTS_SUBLINK,
    ALL_SUBLINK,
    ANY_SUBLINK,
    ROWCOMPARE_SUBLINK,
    EXPR_SUBLINK,
    MULTIEXPR_SUBLINK,
    ARRAY_SUBLINK,
    CTE_SUBLINK                 /* for SubPlans only */
} SubLinkType;
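The comment above says that ANY combines per-row results with OR semantics. The sketch below illustrates what that means for the NOT IN query traced later, assuming NOT IN is evaluated as the negation of `= ANY` under SQL three-valued logic. The names `TriBool`, `NullableInt`, `any_equal` and `not_in` are hypothetical and exist only for this illustration; they are not PostgreSQL identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* SQL three-valued truth values (hypothetical names for this sketch). */
typedef enum { TV_FALSE, TV_TRUE, TV_UNKNOWN } TriBool;

/* A nullable integer, standing in for a one-column row. */
typedef struct { bool isnull; int value; } NullableInt;

/* lhs = rhs under SQL semantics: a NULL on either side yields UNKNOWN. */
static TriBool tri_equal(NullableInt lhs, NullableInt rhs)
{
    if (lhs.isnull || rhs.isnull)
        return TV_UNKNOWN;
    return lhs.value == rhs.value ? TV_TRUE : TV_FALSE;
}

/* Three-valued NOT: UNKNOWN stays UNKNOWN. */
static TriBool tri_not(TriBool v)
{
    if (v == TV_UNKNOWN)
        return TV_UNKNOWN;
    return v == TV_TRUE ? TV_FALSE : TV_TRUE;
}

/* lhs = ANY (rows): OR-fold of the per-row comparisons. */
TriBool any_equal(NullableInt lhs, const NullableInt *rows, size_t nrows)
{
    TriBool result = TV_FALSE;      /* ANY over an empty set is FALSE */

    for (size_t i = 0; i < nrows; i++)
    {
        TriBool r = tri_equal(lhs, rows[i]);

        if (r == TV_TRUE)
            return TV_TRUE;         /* one match decides the OR */
        if (r == TV_UNKNOWN)
            result = TV_UNKNOWN;    /* remember that a NULL was seen */
    }
    return result;
}

/* lhs NOT IN (rows), modelled as NOT (lhs = ANY (rows)). */
TriBool not_in(NullableInt lhs, const NullableInt *rows, size_t nrows)
{
    return tri_not(any_equal(lhs, rows, nrows));
}
```

A single NULL in the subquery output is enough to turn a would-be TRUE into UNKNOWN, so the outer row is filtered out. This is presumably why the SubPlanState keeps null-containing rows in a separate `hashnulls` table.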

SubLink

SubLink structure

typedef struct SubLink
{
    Expr        xpr;
    SubLinkType subLinkType;    /* see above */
    int         subLinkId;      /* ID (1..n); 0 if not MULTIEXPR */
    Node       *testexpr;       /* outer-query test for ALL/ANY/ROWCOMPARE */
    List       *operName;       /* originally specified operator name */
    Node       *subselect;      /* subselect as Query* or raw parsetree */
    int         location;       /* token location, or -1 if unknown */
} SubLink;

MaterialState

Run-time state of the Material node.

/* ----------------
 *   MaterialState information
 *
 *      materialize nodes are used to materialize the results
 *      of a subplan into a temporary file.
 *
 *      ss.ss_ScanTupleSlot refers to output of underlying plan.
 * ----------------
 */
typedef struct MaterialState
{
    ScanState   ss;                 /* its first field is NodeTag */
    int         eflags;             /* capability flags to pass to tuplestore */
    bool        eof_underlying;     /* reached end of underlying plan? */
    Tuplestorestate *tuplestorestate;
} MaterialState;

II. Source code interpretation

ExecMaterial

Perform a materialization operation.

/* ----------------------------------------------------------------
 *      ExecMaterial
 *
 *      As long as we are at the end of the data collected in the tuplestore,
 *      we collect one new row from the subplan on each call, and stash it
 *      aside in the tuplestore before returning it.  The tuplestore is
 *      only read if we are asked to scan backwards, rescan, or mark/restore.
 *
 * ----------------------------------------------------------------
 */
static TupleTableSlot *         /* result tuple from subplan */
ExecMaterial(PlanState *pstate)
{
    MaterialState *node = castNode(MaterialState, pstate);  /* materialize node */
    EState     *estate;             /* run-time state */
    ScanDirection dir;              /* scan direction */
    bool        forward;            /* true if scanning forward */
    Tuplestorestate *tuplestorestate;
    bool        eof_tuplestore;     /* at end of tuplestore? */
    TupleTableSlot *slot;           /* slot holding the result tuple */

    CHECK_FOR_INTERRUPTS();

    /*
     * get state info from node
     */
    estate = node->ss.ps.state;
    dir = estate->es_direction;
    forward = ScanDirectionIsForward(dir);
    tuplestorestate = node->tuplestorestate;

    /*
     * If first time through, and we need a tuplestore, initialize it.
     */
    if (tuplestorestate == NULL && node->eflags != 0)
    {
        tuplestorestate = tuplestore_begin_heap(true, false, work_mem);
        tuplestore_set_eflags(tuplestorestate, node->eflags);
        if (node->eflags & EXEC_FLAG_MARK)
        {
            /*
             * Allocate a second read pointer to serve as the mark. We know it
             * must have index 1, so needn't store that.
             */
            int         ptrno PG_USED_FOR_ASSERTS_ONLY;

            ptrno = tuplestore_alloc_read_pointer(tuplestorestate,
                                                  node->eflags);
            Assert(ptrno == 1);
        }
        node->tuplestorestate = tuplestorestate;
    }

    /*
     * If we are not at the end of the tuplestore, or are going backwards, try
     * to fetch a tuple from tuplestore.
     */
    eof_tuplestore = (tuplestorestate == NULL) ||
        tuplestore_ateof(tuplestorestate);

    if (!forward && eof_tuplestore)
    {
        if (!node->eof_underlying)
        {
            /*
             * When reversing direction at tuplestore EOF, the first
             * gettupleslot call will fetch the last-added tuple; but we want
             * to return the one before that, if possible. So do an extra
             * fetch.
             */
            if (!tuplestore_advance(tuplestorestate, forward))
                return NULL;    /* the tuplestore must be empty */
        }
        eof_tuplestore = false;
    }

    /*
     * If we can fetch another tuple from the tuplestore, return it.
     */
    slot = node->ss.ps.ps_ResultTupleSlot;
    if (!eof_tuplestore)
    {
        if (tuplestore_gettupleslot(tuplestorestate, forward, false, slot))
            return slot;
        if (forward)
            eof_tuplestore = true;
    }

    /*
     * If necessary, try to fetch another row from the subplan.
     *
     * Note: the eof_underlying state variable exists to short-circuit further
     * subplan calls.  It's not optional, unfortunately, because some plan
     * node types are not robust about being called again when they've already
     * returned NULL.
     */
    if (eof_tuplestore && !node->eof_underlying)
    {
        PlanState  *outerNode;
        TupleTableSlot *outerslot;

        /*
         * We can only get here with forward==true, so no need to worry about
         * which direction the subplan will go.
         */
        outerNode = outerPlanState(node);
        outerslot = ExecProcNode(outerNode);
        if (TupIsNull(outerslot))
        {
            node->eof_underlying = true;
            return NULL;
        }

        /*
         * Append a copy of the returned tuple to tuplestore.  NOTE: because
         * the tuplestore is certainly in EOF state, its read position will
         * move forward over the added tuple.  This is what we want.
         */
        if (tuplestorestate)
            tuplestore_puttupleslot(tuplestorestate, outerslot);

        ExecCopySlot(slot, outerslot);
        return slot;
    }

    /*
     * Nothing left ...
     */
    return ExecClearTuple(slot);
}
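The control flow of ExecMaterial (replay from the store while it has unread rows, otherwise pull one row from the underlying plan and stash it) can be sketched with a toy model. Everything below is hypothetical: `ToySource`, `ToyMaterial`, `toy_material_next` and `toy_material_rescan` are illustration-only names, rows are plain integers, only forward scans are modelled, and a fixed-size array stands in for the tuplestore, which in PostgreSQL can spill to disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_STORE_CAPACITY 16       /* fixed size keeps the sketch short */

/* Stand-in for the underlying plan: hands out rows one at a time. */
typedef struct ToySource
{
    const int  *rows;
    size_t      nrows;
    size_t      next;
    size_t      calls;              /* how often the source was asked for a row */
} ToySource;

/* Stand-in for MaterialState plus its tuplestore. */
typedef struct ToyMaterial
{
    ToySource  *outer;
    int         store[TOY_STORE_CAPACITY];
    size_t      stored;             /* rows stashed so far */
    size_t      readpos;            /* next stashed row to replay */
    bool        eof_underlying;     /* source already returned "no more rows" */
} ToyMaterial;

static bool toy_source_next(ToySource *src, int *out)
{
    src->calls++;
    if (src->next >= src->nrows)
        return false;
    *out = src->rows[src->next++];
    return true;
}

/* Forward-only analogue of ExecMaterial: returns false when nothing is left. */
bool toy_material_next(ToyMaterial *node, int *out)
{
    /* Not at the end of the store: replay a stashed row. */
    if (node->readpos < node->stored)
    {
        *out = node->store[node->readpos++];
        return true;
    }

    /* At the end of the store: pull one new row, unless the source is done. */
    if (!node->eof_underlying)
    {
        int row;

        if (!toy_source_next(node->outer, &row))
        {
            node->eof_underlying = true;    /* never call the source again */
            return false;
        }
        assert(node->stored < TOY_STORE_CAPACITY);  /* sketch limitation */
        node->store[node->stored++] = row;
        node->readpos = node->stored;       /* read position moves past the new row */
        *out = row;
        return true;
    }

    return false;
}

/* Restart reading from the first stashed row (hypothetical rescan helper). */
void toy_material_rescan(ToyMaterial *node)
{
    node->readpos = 0;
}
```

After a rescan, rows already stashed are served from the store without touching the source again, which is the reason to materialize the inner side of a NOT IN subplan that is scanned repeatedly.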

tuplestore_begin_heap

Initialize tuplestore

/*
 * tuplestore_begin_heap
 *
 * Create a new tuplestore; other types of tuple stores (other than
 * "heap" tuple stores, for heap tuples) are possible, but not presently
 * implemented.
 *
 * randomAccess: if true, both forward and backward accesses to the
 * tuple store are allowed.
 *
 * interXact: if true, the files used for on-disk storage persist beyond the
 * end of the current transaction.  NOTE: It's the caller's responsibility to
 * create such a tuplestore in a memory context and resource owner that will
 * also survive transaction boundaries, and to ensure the tuplestore is closed
 * when it's no longer wanted.
 *
 * maxKBytes: how much data to store in memory (any data beyond this
 * amount is paged to disk).  When in doubt, use work_mem.
 */
Tuplestorestate *
tuplestore_begin_heap(bool randomAccess, bool interXact, int maxKBytes)
{
    Tuplestorestate *state;
    int         eflags;

    /*
     * This interpretation of the meaning of randomAccess is compatible with
     * the pre-8.3 behavior of tuplestores.
     */
    eflags = randomAccess ?
        (EXEC_FLAG_BACKWARD | EXEC_FLAG_REWIND) :
        (EXEC_FLAG_REWIND);

    state = tuplestore_begin_common(eflags, interXact, maxKBytes);

    state->copytup = copytup_heap;
    state->writetup = writetup_heap;
    state->readtup = readtup_heap;

    return state;
}

/*
 *      tuplestore_begin_xxx
 *
 * Initialize for a tuplestore operation.
 */
static Tuplestorestate *
tuplestore_begin_common(int eflags, bool interXact, int maxKBytes)
{
    Tuplestorestate *state;

    state = (Tuplestorestate *) palloc0(sizeof(Tuplestorestate));

    state->status = TSS_INMEM;
    state->eflags = eflags;
    state->interXact = interXact;
    state->truncated = false;
    state->allowedMem = maxKBytes * 1024L;
    state->availMem = state->allowedMem;
    state->myfile = NULL;
    state->context = CurrentMemoryContext;
    state->resowner = CurrentResourceOwner;

    state->memtupdeleted = 0;
    state->memtupcount = 0;
    state->tuples = 0;

    /*
     * Initial size of array must be more than ALLOCSET_SEPARATE_THRESHOLD;
     * see comments in grow_memtuples().
     */
    state->memtupsize = Max(16384 / sizeof(void *),
                            ALLOCSET_SEPARATE_THRESHOLD / sizeof(void *) + 1);

    state->growmemtuples = true;
    state->memtuples = (void **) palloc(state->memtupsize * sizeof(void *));

    USEMEM(state, GetMemoryChunkSpace(state->memtuples));

    state->activeptr = 0;
    state->readptrcount = 1;
    state->readptrsize = 8;     /* arbitrary */
    state->readptrs = (TSReadPointer *)
        palloc(state->readptrsize * sizeof(TSReadPointer));

    state->readptrs[0].eflags = eflags;
    state->readptrs[0].eof_reached = false;
    state->readptrs[0].current = 0;

    return state;
}
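The sizing arithmetic in tuplestore_begin_common can be checked by hand. The sketch below reproduces the two formulas, assuming ALLOCSET_SEPARATE_THRESHOLD is 8192 (an assumption of this sketch, not something stated in the listing). The helper names `initial_memtupsize` and `allowed_mem_bytes` are hypothetical, and the pointer size is passed as a parameter so the result does not depend on the build platform.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value of ALLOCSET_SEPARATE_THRESHOLD; not taken from the listing. */
#define SKETCH_SEPARATE_THRESHOLD 8192

static size_t max_size(size_t a, size_t b)
{
    return a > b ? a : b;
}

/*
 * Initial number of slots in the memtuples array, following
 * Max(16384 / sizeof(void *), THRESHOLD / sizeof(void *) + 1).
 */
size_t initial_memtupsize(size_t ptr_size)
{
    return max_size(16384 / ptr_size,
                    SKETCH_SEPARATE_THRESHOLD / ptr_size + 1);
}

/* allowedMem in bytes, following maxKBytes * 1024L. */
long allowed_mem_bytes(int max_kbytes)
{
    return max_kbytes * 1024L;
}
```

With 8-byte pointers this gives 2048 slots, and a maxKBytes of 4096 gives 4194304 bytes; both agree with the `memtupsize = 2048` and `allowedMem = 4194304` values printed by gdb in the tracking section.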

tuplestore_advance

Advance the read position by one tuple without returning it.

/*
 * tuplestore_advance - exported function to adjust position without fetching
 *
 * We could optimize this case to avoid palloc/pfree overhead, but for the
 * moment it doesn't seem worthwhile.
 */
bool
tuplestore_advance(Tuplestorestate *state, bool forward)
{
    void       *tuple;
    bool        should_free;

    tuple = tuplestore_gettuple(state, forward, &should_free);

    if (tuple)
    {
        if (should_free)
            pfree(tuple);
        return true;
    }
    else
    {
        return false;
    }
}

tuplestore_gettupleslot

Fetch the next tuple into a slot.

/*
 * tuplestore_gettupleslot - exported function to fetch a MinimalTuple
 *
 * If successful, put tuple in slot and return true; else, clear the slot
 * and return false.
 *
 * If copy is true, the slot receives a copied tuple (allocated in current
 * memory context) that will stay valid regardless of future manipulations of
 * the tuplestore's state.  If copy is false, the slot may just receive a
 * pointer to a tuple held within the tuplestore.  The latter is more
 * efficient but the slot contents may be corrupted if additional writes to
 * the tuplestore occur.  (If using tuplestore_trim, see comments therein.)
 */
bool
tuplestore_gettupleslot(Tuplestorestate *state, bool forward,
                        bool copy, TupleTableSlot *slot)
{
    MinimalTuple tuple;
    bool        should_free;

    tuple = (MinimalTuple) tuplestore_gettuple(state, forward, &should_free);

    if (tuple)
    {
        /* copy only when the tuple still belongs to the tuplestore */
        if (copy && !should_free)
        {
            tuple = heap_copy_minimal_tuple(tuple);
            should_free = true;
        }
        ExecStoreMinimalTuple(tuple, slot, should_free);
        return true;
    }
    else
    {
        ExecClearTuple(slot);
        return false;
    }
}

tuplestore_gettuple

Return the next tuple.

/*
 * Fetch the next tuple in either forward or back direction.
 * Returns NULL if no more tuples.  If should_free is set, the
 * caller must pfree the returned tuple when done with it.
 *
 * Backward scan is only allowed if randomAccess was set true or
 * EXEC_FLAG_BACKWARD was specified to tuplestore_set_eflags().
 */
static void *
tuplestore_gettuple(Tuplestorestate *state, bool forward,
                    bool *should_free)
{
    TSReadPointer *readptr = &state->readptrs[state->activeptr];    /* read pointer */
    unsigned int tuplen;
    void       *tup;

    Assert(forward || (readptr->eflags & EXEC_FLAG_BACKWARD));

    switch (state->status)
    {
        case TSS_INMEM:         /* tuples are held in memory */
            *should_free = false;
            if (forward)
            {
                if (readptr->eof_reached)
                    return NULL;
                if (readptr->current < state->memtupcount)
                {
                    /* We have another tuple, so return it */
                    return state->memtuples[readptr->current++];
                }
                readptr->eof_reached = true;
                return NULL;
            }
            else
            {
                /*
                 * if all tuples are fetched already then we return last
                 * tuple, else tuple before last returned.
                 */
                if (readptr->eof_reached)
                {
                    readptr->current = state->memtupcount;
                    readptr->eof_reached = false;
                }
                else
                {
                    if (readptr->current <= state->memtupdeleted)
                    {
                        Assert(!state->truncated);
                        return NULL;
                    }
                    readptr->current--; /* last returned tuple */
                }
                if (readptr->current <= state->memtupdeleted)
                {
                    Assert(!state->truncated);
                    return NULL;
                }
                return state->memtuples[readptr->current - 1];
            }
            break;

        case TSS_WRITEFILE:     /* currently writing the temp file */
            /* Skip state change if we'll just return NULL */
            if (readptr->eof_reached && forward)
                return NULL;

            /*
             * Switch from writing to reading.
             */
            BufFileTell(state->myfile,
                        &state->writepos_file, &state->writepos_offset);
            if (!readptr->eof_reached)
                if (BufFileSeek(state->myfile,
                                readptr->file, readptr->offset,
                                SEEK_SET) != 0)
                    ereport(ERROR,
                            (errcode_for_file_access(),
                             errmsg("could not seek in tuplestore temporary file: %m")));
            state->status = TSS_READFILE;
            /* FALLTHROUGH: continue with the read-file logic */

        case TSS_READFILE:
            *should_free = true;
            if (forward)
            {
                /* forward read */
                if ((tuplen = getlen(state, true)) != 0)
                {
                    tup = READTUP(state, tuplen);
                    return tup;
                }
                else
                {
                    readptr->eof_reached = true;
                    return NULL;
                }
            }

            /*
             * Backward.
             *
             * if all tuples are fetched already then we return last tuple,
             * else tuple before last returned.
             *
             * Back up to fetch previously-returned tuple's ending length
             * word. If seek fails, assume we are at start of file.
             */
            if (BufFileSeek(state->myfile, 0, -(long) sizeof(unsigned int),
                            SEEK_CUR) != 0)
            {
                /* even a failed backwards fetch gets you out of eof state */
                readptr->eof_reached = false;
                Assert(!state->truncated);
                return NULL;
            }
            tuplen = getlen(state, false);

            if (readptr->eof_reached)
            {
                readptr->eof_reached = false;
                /* We will return the tuple returned before returning NULL */
            }
            else
            {
                /*
                 * Back up to get ending length word of tuple before it.
                 */
                if (BufFileSeek(state->myfile, 0,
                                -(long) (tuplen + 2 * sizeof(unsigned int)),
                                SEEK_CUR) != 0)
                {
                    /*
                     * If that fails, presumably the prev tuple is the first
                     * in the file.  Back up so that it becomes next to read
                     * in forward direction (not obviously right, but that is
                     * what in-memory case does).
                     */
                    if (BufFileSeek(state->myfile, 0,
                                    -(long) (tuplen + sizeof(unsigned int)),
                                    SEEK_CUR) != 0)
                        ereport(ERROR,
                                (errcode_for_file_access(),
                                 errmsg("could not seek in tuplestore temporary file: %m")));
                    Assert(!state->truncated);
                    return NULL;
                }
                tuplen = getlen(state, false);
            }

            /*
             * Now we have the length of the prior tuple, back up and read it.
             * Note: READTUP expects we are positioned after the initial
             * length word of the tuple, so back up to that point.
             */
            if (BufFileSeek(state->myfile, 0,
                            -(long) tuplen,
                            SEEK_CUR) != 0)
                ereport(ERROR,
                        (errcode_for_file_access(),
                         errmsg("could not seek in tuplestore temporary file: %m")));
            tup = READTUP(state, tuplen);
            return tup;

        default:
            elog(ERROR, "invalid tuplestore state");
            return NULL;        /* keep compiler quiet */
    }
}

III. Tracking analysis

Run the SQL:

[pg12@localhost]$ psql -d testdb
Timing is on.
Expanded display is used automatically.
psql
Type "help" for help.

[local]:5432 pg12@testdb=# select * from tbl;
 id | value
----+-------
  1 |     2
(1 row)

Time: 2.678 ms
[local]:5432 pg12@testdb=# select count(*) from t_big_null;
  count
----------
 10000001
(1 row)

Time: 679.972 ms
[local]:5432 pg12@testdb=# analyze tbl;
ANALYZE
Time: 64.442 ms
[local]:5432 pg12@testdb=# analyze t_big_null;
ANALYZE
Time: 434.702 ms
[local]:5432 pg12@testdb=# select pg_backend_pid();
 pg_backend_pid
----------------
          18758
(1 row)

Time: 1.990 ms
[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);

Start gdb trace

(gdb) b ExecMaterial
Breakpoint 1 at 0x720edb: file nodeMaterial.c, line 41.
(gdb) c
Continuing.

Breakpoint 1, ExecMaterial (pstate=0x1230128) at nodeMaterial.c:41
41          MaterialState *node = castNode(MaterialState, pstate);
(gdb)

Single-step debugging

(gdb) n
49          CHECK_FOR_INTERRUPTS();
(gdb)
54          estate = node->ss.ps.state;
(gdb)
55          dir = estate->es_direction;
(gdb)
56          forward = ScanDirectionIsForward(dir);
(gdb)
57          tuplestorestate = node->tuplestorestate;
(gdb)
62          if (tuplestorestate == NULL && node->eflags != 0)
(gdb)
64              tuplestorestate = tuplestore_begin_heap(true, false, work_mem);
(gdb)
65              tuplestore_set_eflags(tuplestorestate, node->eflags);
(gdb)
66              if (node->eflags & EXEC_FLAG_MARK)
(gdb)
78              node->tuplestorestate = tuplestorestate;
(gdb)
85          eof_tuplestore = (tuplestorestate == NULL) ||
(gdb)
86              tuplestore_ateof(tuplestorestate);
(gdb)
85          eof_tuplestore = (tuplestorestate == NULL) ||
(gdb)
88          if (!forward && eof_tuplestore)
(gdb) p eof_tuplestore
$1 = false
(gdb)

Enter tuplestore_gettupleslot

(gdb) n
107         slot = node->ss.ps.ps_ResultTupleSlot;
(gdb)
108         if (!eof_tuplestore)
(gdb)
110             if (tuplestore_gettupleslot(tuplestorestate, forward, false, slot))
(gdb) step
tuplestore_gettupleslot (state=0x3069c18, forward=true, copy=false, slot=0x30687a8) at tuplestore.c:1084
1084        tuple = (MinimalTuple) tuplestore_gettuple(state, forward, &should_free);
(gdb)

Enter tuplestore_gettuple

(gdb) step
tuplestore_gettuple (state=0x3069c18, forward=true, should_free=0x7ffd18474ff7) at tuplestore.c:906
906     TSReadPointer *readptr = &state->readptrs[state->activeptr];
(gdb)

tuplestore_gettuple -> file read/write pointer information

(gdb) n
910     Assert(forward || (readptr->eflags & EXEC_FLAG_BACKWARD));
(gdb) p *readptr
$2 = {eflags = 2, eof_reached = false, current = 0, file = 2139062143, offset = 9187201950435737471}
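The odd-looking file and offset values in the read pointer are exactly what you get when every byte of the field is 0x7F. That matches the fill byte PostgreSQL uses to clobber freed memory in assert-enabled builds, so a plausible reading is that these two fields are simply not initialized while the tuplestore is still in memory. This is my interpretation, not something the trace states. The struct below is a simplified, hypothetical stand-in for the file/offset pair, not the real TSReadPointer.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the file/offset part of a read pointer. */
typedef struct {
    int32_t file;
    int64_t offset;
} FilePos;

/* Fill every byte of the struct with one value, as a clobbering allocator would. */
FilePos clobbered_pos(unsigned char fill)
{
    FilePos pos;

    memset(&pos, fill, sizeof(pos));
    return pos;
}
```

Calling clobbered_pos(0x7F) yields file = 2139062143 (0x7F7F7F7F) and offset = 9187201950435737471 (0x7F7F7F7F7F7F7F7F), the same numbers gdb printed.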

tuplestore_gettuple -> the current status is TSS_INMEM

(gdb) n
912     switch (state->status)
(gdb) p *state
$3 = {status = TSS_INMEM, eflags = 2, backward = false, interXact = false, truncated = false, availMem = 4177896, allowedMem = 4194304, tuples = 0, myfile = 0x0, context = 0x3067da0, resowner = 0x2fa62c8, copytup = 0xaba7bd, writetup = 0xaba811, readtup = 0xaba9d9, memtuples = 0x3051e90, memtupdeleted = 0, memtupcount = 0, memtupsize = 2048, growmemtuples = true, true = true, readptrs = 0, true = 8, readptrs = 0 Writepos_offset = 0}
(gdb) p state->status
$4 = TSS_INMEM
(gdb)
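The memory figures in this dump can be checked by hand. tuplestore_begin_heap receives its budget in kilobytes, and allowedMem = 4194304 corresponds to 4096 kB, assuming work_mem is at its default of 4MB (the session does not show the setting). The sketch below redoes the arithmetic; the function names are hypothetical, and the 8-byte pointer size assumes a 64-bit build.

```c
#include <stdint.h>

/* Budget in bytes for a limit given in kilobytes (as work_mem is). */
int64_t allowed_bytes(int work_mem_kb)
{
    return (int64_t) work_mem_kb * 1024;
}

/* Bytes already charged against the budget. */
int64_t used_bytes(int64_t allowed_mem, int64_t avail_mem)
{
    return allowed_mem - avail_mem;
}

/* Size of the memtuples pointer array, assuming 8-byte pointers. */
int64_t slot_array_bytes(int memtupsize)
{
    return (int64_t) memtupsize * 8;
}
```

For the values in the dump, 4194304 - 4177896 = 16408 bytes are already in use even though no tuple has been stored. The 2048-entry pointer array accounts for 16384 of them; the remaining 24 bytes are presumably allocator chunk overhead, which is an assumption on my part.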

tuplestore_gettuple -> returns NULL

(gdb) n
915     *should_free = false;
(gdb) n
916     if (forward)
(gdb)
918         if (readptr->eof_reached)
(gdb)
920         if (readptr->current < state->memtupcount)
(gdb) p readptr->current
$5 = 0
(gdb) p state->memtupcount
$6 = 0
(gdb) n
925         readptr->eof_reached = true;
(gdb)
926         return NULL;
(gdb)
1062    }
(gdb)
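The forward, in-memory branch traced here (lines 918 to 926 of tuplestore.c) boils down to three checks. The following is a toy model of that control flow only: the type and function names are hypothetical, and the "tuples" are plain ints rather than MinimalTuple pointers.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool eof_reached;
    int  current;            /* index of the next tuple to return */
} ToyReadPtr;

typedef struct {
    const int *memtuples;    /* toy tuples: plain ints */
    int        memtupcount;
} ToyStore;

/* Forward read from the in-memory array; returns NULL at end of data. */
const int *toy_gettuple(const ToyStore *state, ToyReadPtr *readptr)
{
    if (readptr->eof_reached)
        return NULL;
    if (readptr->current < state->memtupcount)
        return &state->memtuples[readptr->current++];
    readptr->eof_reached = true;
    return NULL;
}
```

With current = 0 and memtupcount = 0, as printed in the trace, the second check fails, eof_reached is set, and NULL is returned. That is why the very first call finds nothing in the store.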

tuplestore_gettupleslot -> returns false

(gdb) n
tuplestore_gettupleslot (state=0x3069c18, forward=true, copy=false, slot=0x30687a8) at tuplestore.c:1086
1086    if (tuple)
(gdb)
1098    ExecClearTuple(slot);
(gdb)
1099    return false;
(gdb)

Back to ExecMaterial

(gdb) n
1101    }
(gdb)
ExecMaterial (pstate=0x3068158) at nodeMaterial.c:112
112         if (forward)
(gdb)
113             eof_tuplestore = true;
(gdb)

Get a row from outerPlan (that is, get a row from t_big_null)

(gdb) n
124     if (eof_tuplestore && !node->eof_underlying)
(gdb) p node->eof_underlying
$7 = false
(gdb) n
133         outerNode = outerPlanState(node);
(gdb)

#define innerPlanState(node) (((PlanState *)(node))->righttree)
#define outerPlanState(node) (((PlanState *)(node))->lefttree)

134         outerslot = ExecProcNode(outerNode);
(gdb) p outerNode
$8 = (PlanState *) 0x3068270
(gdb) p *outerNode
$9 = {type = T_SeqScanState, plan = 0x3037628, state = 0x3067eb8, ExecProcNode = 0x6f802a, ExecProcNodeReal = 0x72b904, instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, qual = 0x0, lefttree = 0x0, righttree = 0x0, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, ps_ResultTupleDesc = 0x3068578, 0x3068578, ps_ResultTupleSlot, 0x0, 0x0, ps_ExprContext, Resultops = 0xc3e780, scanopsfixed = true, outeropsfixed = false, inneropsfixed = false, resultopsfixed = true, scanopsset = true, outeropsset = false, inneropsset = false, resultopsset = true}
(gdb) p *outerNode->state
$10 = {type = T_EState, es_direction = ForwardScanDirection, es_snapshot = 0x2f9cd10, es_crosscheck_snapshot = 0x0, es_range_table = 0x3042130, es_range_table_array = 0x3068108, es_range_table_size = 2, es_relations = 0x3068130, 0x3068130 = es_rowmarks, es_rowmarks = 0x0, es_sourceText = 0x2f74d88 "select * from tbl a where a.id not in (select b.id from t_big_null b);", es_junkFilter = 0x0, es_output_cid = 0, es_result_relations = 0x0, es_num_result_relations = 0, es_result_relation_info = 0x0, es_root_result_relations = 0x0, es_num_root_result_relations = 0, es_partition_directory = 0x0, es_tuple_routing_result_relations = 0x0, es_trig_target_relations = 0x0, es_param_list_info = 0x0, es_param_exec_vals = 0x30680d0, es_queryEnv = 0x0, es_query_cxt = 0x3067da0, es_tupleTable = 0x3068540, es_processed = 0, es_top_eflags = 16, es_instrument = 0, es_finished = false, es_exprcontexts = 0x3068448, es_subplanstates = 0x3068950, es_auxmodifytables = 0x0, es_per_tuple_exprcontext = 0x0, es_epq_active = 0x0, es_use_parallel_mode = false, es_query_dsa = 0x0, es_jit_flags = 25, es_jit = 0x0, es_jit_worker_instr = 0x0}
(gdb) p ((PlanState *) node)->righttree
$21 = (struct PlanState *) 0x0
(gdb)
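The two macros quoted in the trace are just accessors for the child pointers of a plan state node. A minimal sketch with a stripped-down PlanState shows the shape seen here: a Materialize node whose outer (left) child is the Seq Scan and whose inner (right) child is NULL. The struct is a hypothetical reduction of the real one, which has many more fields, and build_subplan is an illustrative helper.

```c
#include <stddef.h>

/* Reduced stand-in for PlanState: only the child links are kept. */
typedef struct PlanState {
    const char       *name;
    struct PlanState *lefttree;    /* outer (input) plan */
    struct PlanState *righttree;   /* inner plan */
} PlanState;

#define innerPlanState(node) (((PlanState *) (node))->righttree)
#define outerPlanState(node) (((PlanState *) (node))->lefttree)

/* Wire up the two-node subtree from the trace: Materialize over Seq Scan. */
void build_subplan(PlanState *material, PlanState *seqscan)
{
    seqscan->name = "SeqScan";
    seqscan->lefttree = NULL;
    seqscan->righttree = NULL;

    material->name = "Material";
    material->lefttree = seqscan;
    material->righttree = NULL;
}
```

After build_subplan, outerPlanState on the Material node yields the Seq Scan node and innerPlanState yields NULL, mirroring lefttree = 0x3068270 and righttree = 0x0 in the dumps.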

Looking back at the execution plan, the lefttree of the Materialize node is Seq Scan on public.t_big_null b, and its righttree is NULL.

[local]:5432 pg12@testdb=# explain verbose select * from tbl a where a.id not in (select b.id from t_big_null b);
                                         QUERY PLAN
--------------------------------------------------------------------------------------------
 Seq Scan on public.tbl a  (cost=0.00..129156.33 rows=1 width=8)
   Output: a.id, a.value
   Filter: (NOT (SubPlan 1))
   SubPlan 1
     ->  Materialize  (cost=0.00..233310.68 rows=9999979 width=4)
           Output: b.id
           ->  Seq Scan on public.t_big_null b  (cost=0.00..144247.79 rows=9999979 width=4)
                 Output: b.id
(8 rows)

Time: 7.681 ms

Get outerslot

(gdb) n
135         if (TupIsNull(outerslot))
(gdb) p *outerslot
$16 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 0, tts_ops = 0xc3e780, tts_tupleDescriptor = 0x7fab449cae98, tts_values = 0x30684f0, tts_isnull = 0x30684f8, tts_mcxt = 0x3067da0, tts_tid = {ip_blkid = {bi_hi = 0, bi_lo = 0}, ip_posid = 1}, tts_tableOid = 49155}
(gdb) p *outerslot->tts_values
$17 = 0
(gdb) p outerslot->tts_values[1]
$18 = 0
(gdb) p outerslot->tts_values[0]
$19 = 0
(gdb) p *outerslot->tts_tupleDescriptor
$20 = {natts = 1, tdtypeid = 49157, tdtypmod = -1, tdrefcount = 2, constr = 0x0, attrs = 0x7fab449caeb0}
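Two fields in the slot dump deserve a closer look: tts_flags = 16 and tts_nvalid = 0. The sketch below decodes them using the flag bits as I recall them from tuptable.h in PostgreSQL 12 (verify against your tree). The ToySlot struct and both helper functions are hypothetical reductions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Slot flag bits as recalled from PostgreSQL 12's tuptable.h (verify locally). */
#define TTS_FLAG_EMPTY      (1 << 1)
#define TTS_FLAG_SHOULDFREE (1 << 2)
#define TTS_FLAG_SLOW       (1 << 3)
#define TTS_FLAG_FIXED      (1 << 4)

/* Reduced stand-in for TupleTableSlot. */
typedef struct {
    uint16_t tts_flags;
    int16_t  tts_nvalid;    /* number of valid entries in tts_values */
} ToySlot;

/* Same test as TupIsNull: a missing slot or an empty one counts as null. */
bool toy_tup_is_null(const ToySlot *slot)
{
    return slot == NULL || (slot->tts_flags & TTS_FLAG_EMPTY) != 0;
}

/* Has the 1-based attribute attnum already been extracted into tts_values? */
bool toy_attr_is_valid(const ToySlot *slot, int attnum)
{
    return attnum <= slot->tts_nvalid;
}
```

With tts_flags = 16 only the FIXED bit is set, so the slot is not empty and TupIsNull is false. With tts_nvalid = 0, no column has been extracted into tts_values yet, so the 0 printed for tts_values[0] may be a leftover value rather than the actual b.id of the row. That last point is my reading of the dump, not something the trace confirms.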

After obtaining outerslot, put it into the tuplestore

(gdb) p *node
$22 = {ss = {ps = {type = T_MaterialState, plan = 0x3040a60, state = 0x3067eb8, ExecProcNode = 0x720ecf, ExecProcNodeReal = 0x720ecf, instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, qual = 0x0, lefttree = 0x3068270, righttree = 0x0, initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, ps_ResultTupleDesc = 0x3068690, ps_ResultTupleSlot = 0x30687a8, 0x30687a8 = 0x0, ps_ResultTupleDesc = 0x3068690, ps_ResultTupleSlot = 0x30687a8, 0x30687a8 = ps_ExprContext, 0x0 = 0x0, 0x0 = 0x0, = Innerops = 0x0, resultops = 0xc3e720, scanopsfixed = true, outeropsfixed = false, inneropsfixed = false, resultopsfixed = true, scanopsset = true, outeropsset = false, inneropsset = false, resultopsset = true}, ss_currentRelation = 0x0, ss_currentScanDesc = 0x0, ss_ScanTupleSlot = 0x3068868}, eflags = 2, eof_underlying = false, tuplestorestate = 0x3069c18}
(gdb) n
146         if (tuplestorestate)
(gdb)
147             tuplestore_puttupleslot(tuplestorestate, outerslot);
(gdb) p outerslot->tts_values[0]
$23 = 0
(gdb) n
149         ExecCopySlot(slot, outerslot);
(gdb) p outerslot->tts_values[0]
$24 = 0
(gdb) n
150         return slot;
(gdb) p outerslot->tts_values[0]
$25 = 0
(gdb) p slot->tts_values[0]
$26 = 0
(gdb) n
157     }
(gdb)

Continue to "materialize"

(gdb) n
ExecProcNodeFirst (node=0x3068158) at execProcnode.c:446
446     }
(gdb) c
Continuing.

Breakpoint 1, ExecMaterial (pstate=0x3068158) at nodeMaterial.c:41
41      MaterialState *node = castNode(MaterialState, pstate);
(gdb) n
49      CHECK_FOR_INTERRUPTS();
(gdb)
54      estate = node->ss.ps.state;
(gdb)
55      dir = estate->es_direction;
(gdb)
56      forward = ScanDirectionIsForward(dir);
(gdb)
57      tuplestorestate = node->tuplestorestate;
(gdb)
62      if (tuplestorestate == NULL && node->eflags != 0)
(gdb)
85      eof_tuplestore = (tuplestorestate == NULL) ||
(gdb)
86          tuplestore_ateof(tuplestorestate);
(gdb)
85      eof_tuplestore = (tuplestorestate == NULL) ||
(gdb)
88      if (!forward && eof_tuplestore)
(gdb)
107     slot = node->ss.ps.ps_ResultTupleSlot;
(gdb)
108     if (!eof_tuplestore)
(gdb)
124     if (eof_tuplestore && !node->eof_underlying)
(gdb)
133         outerNode = outerPlanState(node);
(gdb) p eof_tuplestore
$27 = true
(gdb) n
134         outerslot = ExecProcNode(outerNode);
(gdb)
135         if (TupIsNull(outerslot))
(gdb)
146         if (tuplestorestate)
(gdb)
147             tuplestore_puttupleslot(tuplestorestate, outerslot);
(gdb)
149         ExecCopySlot(slot, outerslot);
(gdb)
150         return slot;
(gdb) p slot->tts_values[0]
$28 = 2
(gdb)
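Taken together, the two passes through ExecMaterial follow one pattern: serve a row from the store if the read pointer has not reached its end; otherwise pull one row from the outer plan, append it to the store, and return it. The following is a toy model of that pattern under simplifying assumptions: rows are plain ints, the store is a fixed array that never spills to disk, only forward scans are handled, and every name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define STORE_CAP 16               /* toy capacity; no spill to disk */

typedef struct {
    const int *src;                /* toy outer plan: rows are plain ints */
    int        src_len;
    int        src_pos;
    int        fetches;            /* rows pulled from the outer plan so far */
    int        store[STORE_CAP];   /* toy tuplestore */
    int        store_len;
    int        read_pos;           /* read pointer into the store */
    bool       eof_underlying;
} ToyMaterial;

void toy_init(ToyMaterial *m, const int *src, int n)
{
    assert(n <= STORE_CAP);
    m->src = src;
    m->src_len = n;
    m->src_pos = 0;
    m->fetches = 0;
    m->store_len = 0;
    m->read_pos = 0;
    m->eof_underlying = false;
}

/* Rewind the read pointer; the stored rows are kept. */
void toy_rescan(ToyMaterial *m)
{
    m->read_pos = 0;
}

/* Produce the next row into *out; returns false when there are no more rows. */
bool toy_next(ToyMaterial *m, int *out)
{
    /* Step 1: the store still has unread rows, so serve from it. */
    if (m->read_pos < m->store_len)
    {
        *out = m->store[m->read_pos++];
        return true;
    }
    /* Step 2: store exhausted; pull from the outer plan unless it is done. */
    if (m->eof_underlying)
        return false;
    if (m->src_pos >= m->src_len)
    {
        m->eof_underlying = true;
        return false;
    }
    m->fetches++;
    m->store[m->store_len++] = m->src[m->src_pos++];
    m->read_pos = m->store_len;    /* read pointer stays at end of store */
    *out = m->store[m->store_len - 1];
    return true;
}
```

On the first scan every call falls through to step 2, as in the trace where eof_tuplestore is true and the row comes from the Seq Scan. After toy_rescan, the same rows come back from step 1 and the fetches counter does not grow, which is the point of materializing the subplan output.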

The first execution takes a long time (about 5 s once debugging time is excluded), while the second and later executions take only a few milliseconds, which is two to three orders of magnitude faster. The reason needs further study.

[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);
 id | value
----+-------
(0 rows)

Time: 3633462.666 ms (01:00:33.463)  -> including debug time; the actual time is about 5s
[local]:5432 pg12@testdb=# select * from tbl a where a.id not in (select b.id from t_big_null b);
 id | value
----+-------
(0 rows)

Time: 6.480 ms  -> the second and later executions are much faster
[local]:5432 pg12@testdb=#
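The timing figures can be sanity-checked with a little arithmetic: how 3633462.666 ms breaks down into hours, minutes and seconds, and how far apart the first and later runs are. The helper names below are hypothetical, and the 5000 ms figure used in the note that follows is the author's rough estimate of the first run without debugging time, not a measured value.

```c
/* Components of a duration, rounded to the nearest millisecond. */
typedef struct {
    long long hours;
    long long minutes;
    long long seconds;
    long long millis;
} Elapsed;

/* Split a non-negative millisecond duration into h/m/s/ms. */
Elapsed split_elapsed(double elapsed_ms)
{
    long long total = (long long) (elapsed_ms + 0.5);   /* round to whole ms */
    Elapsed   e;

    e.hours = total / 3600000;
    total %= 3600000;
    e.minutes = total / 60000;
    total %= 60000;
    e.seconds = total / 1000;
    e.millis = total % 1000;
    return e;
}

/* How many times faster the later run is than the first one. */
double speedup(double first_ms, double later_ms)
{
    return first_ms / later_ms;
}
```

split_elapsed(3633462.666) gives 1 hour, 0 minutes, 33 seconds and 463 ms. speedup(5000.0, 6.480) is roughly 770, so the later runs are between two and three orders of magnitude faster than the estimated first run.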

DONE

IV. Reference materials

N/A
