
Implementation Logic of the Functions ExecHashJoin Depends On in PostgreSQL


This article mainly introduces the implementation logic of the functions that ExecHashJoin depends on in PostgreSQL. Many readers have questions about this topic, so the following walks through the relevant data structures, the source code of the functions themselves, and a gdb tracing session, in the hope of answering those questions. Please follow along.

These functions are used in the HJ_NEED_NEW_OUTER phase; they include ExecHashJoinOuterGetTuple, ExecPrepHashTableForUnmatched, ExecHashGetBucketAndBatch, ExecHashGetSkewBucket, ExecHashJoinSaveTuple and ExecFetchSlotMinimalTuple.
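To see where these functions fit, here is a condensed, non-authoritative sketch of the HJ_NEED_NEW_OUTER case of the hash join state machine, based on the shape of ExecHashJoinImpl in nodeHashjoin.c (non-parallel path only; declarations and the surrounding states are omitted):

/*
 * Condensed sketch of the HJ_NEED_NEW_OUTER state (non-parallel path).
 * Illustrative only -- the real logic lives in ExecHashJoinImpl().
 */
case HJ_NEED_NEW_OUTER:

    /* fetch the next outer tuple, from the outer plan or from a batch file */
    outerTupleSlot = ExecHashJoinOuterGetTuple(outerNode, node, &hashvalue);
    if (TupIsNull(outerTupleSlot))
    {
        /* end of batch (or of the whole join) */
        if (HJ_FILL_INNER(node))
        {
            /* set up to scan for unmatched inner tuples (right/full join) */
            ExecPrepHashTableForUnmatched(node);
            node->hj_JoinState = HJ_FILL_INNER_TUPLES;
        }
        else
            node->hj_JoinState = HJ_NEED_NEW_BATCH;
        continue;
    }

    econtext->ecxt_outertuple = outerTupleSlot;
    node->hj_MatchedOuter = false;

    /* locate the bucket (and possibly skew bucket) for this hash value */
    node->hj_CurHashValue = hashvalue;
    ExecHashGetBucketAndBatch(hashtable, hashvalue,
                              &node->hj_CurBucketNo, &batchno);
    node->hj_CurSkewBucketNo = ExecHashGetSkewBucket(hashtable, hashvalue);
    node->hj_CurTuple = NULL;

    /* the tuple might belong to a later batch: spill it and loop */
    if (batchno != hashtable->curbatch &&
        node->hj_CurSkewBucketNo == INVALID_SKEW_BUCKET_NO)
    {
        bool        shouldFree;
        MinimalTuple mintuple = ExecFetchSlotMinimalTuple(outerTupleSlot,
                                                          &shouldFree);

        ExecHashJoinSaveTuple(mintuple, hashvalue,
                              &hashtable->outerBatchFile[batchno]);

        if (shouldFree)
            heap_free_minimal_tuple(mintuple);

        continue;           /* stay in HJ_NEED_NEW_OUTER */
    }

    /* otherwise probe the in-memory hash table */
    node->hj_JoinState = HJ_SCAN_BUCKET;
    /* FALL THRU into HJ_SCAN_BUCKET */

ExecHashJoinOuterGetTuple supplies the probe tuple, ExecHashGetBucketAndBatch and ExecHashGetSkewBucket locate its bucket and batch, ExecFetchSlotMinimalTuple plus ExecHashJoinSaveTuple spill tuples that belong to a later batch, and ExecPrepHashTableForUnmatched prepares the scan of unmatched inner tuples once the outer side is exhausted.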

I. Data structures

Plan

All plan nodes "derive" from the Plan structure by having the Plan structure as their first field. This ensures that everything works when a node is cast to a Plan node. (Node pointers are frequently cast to Plan * when passed around generically in the executor.)

/* ----------------
 * Plan node
 *
 * All plan nodes "derive" from the Plan structure by having the
 * Plan structure as the first field.  This ensures that everything works
 * when nodes are cast to Plan's.  (node pointers are frequently cast to Plan*
 * when passed around generically in the executor)
 *
 * We never actually instantiate any Plan nodes; this is just the common
 * abstract superclass for all Plan-type nodes.
 * ----------------
 */
typedef struct Plan
{
    NodeTag     type;           /* node type */

    /*
     * estimated execution costs for plan (see costsize.c for more info)
     */
    Cost        startup_cost;   /* cost expended before fetching any tuples */
    Cost        total_cost;     /* total cost (assuming all tuples fetched) */

    /*
     * planner's estimate of result size of this plan step
     */
    double      plan_rows;      /* number of rows plan is expected to emit */
    int         plan_width;     /* average row width in bytes */

    /*
     * information needed for parallel query
     */
    bool        parallel_aware; /* engage parallel-aware logic? */
    bool        parallel_safe;  /* OK to use as part of parallel plan? */

    /*
     * Common structural data for all Plan types.
     */
    int         plan_node_id;   /* unique across entire final plan tree */
    List       *targetlist;     /* target list to be computed at this node */
    List       *qual;           /* implicitly-ANDed qual conditions */
    struct Plan *lefttree;      /* input plan tree(s) */
    struct Plan *righttree;
    List       *initPlan;       /* Init Plan nodes (un-correlated expr
                                 * subselects) */

    /*
     * Information for management of parameter-change-driven rescanning
     *
     * extParam includes the paramIDs of all external PARAM_EXEC params
     * affecting this plan node or its children.  SetParam params from the
     * node's initPlans are not included, but their extParams are.
     *
     * allParam includes all the extParam paramIDs, plus the IDs of local
     * params that affect the node (i.e., the setParams of its initplans).
     * These are _all_ the PARAM_EXEC params that affect this node.
     */
    Bitmapset  *extParam;
    Bitmapset  *allParam;
} Plan;

JoinState

Base class for the Hash/NestLoop/Merge Join state nodes

/* ----------------
 *   JoinState information
 *
 *      Superclass for state nodes of join plans.
 * ----------------
 */
typedef struct JoinState
{
    PlanState   ps;             /* base class: PlanState */
    JoinType    jointype;       /* join type */
    bool        single_match;   /* True if we should skip to next outer tuple
                                 * after finding one inner match */
    ExprState  *joinqual;       /* JOIN quals (in addition to ps.qual) */
} JoinState;

HashJoinState

Hash Join run-time state structure

/* these structs are defined in executor/hashjoin.h: */
typedef struct HashJoinTupleData *HashJoinTuple;
typedef struct HashJoinTableData *HashJoinTable;

typedef struct HashJoinState
{
    JoinState   js;                 /* base class; its first field is NodeTag */
    ExprState  *hashclauses;        /* hash join conditions */
    List       *hj_OuterHashKeys;   /* outer hash keys: list of ExprState nodes */
    List       *hj_InnerHashKeys;   /* inner hash keys: list of ExprState nodes */
    List       *hj_HashOperators;   /* list of operator OIDs */
    HashJoinTable hj_HashTable;     /* hash table */
    uint32      hj_CurHashValue;    /* current hash value */
    int         hj_CurBucketNo;     /* current bucket number */
    int         hj_CurSkewBucketNo; /* current skew bucket number */
    HashJoinTuple hj_CurTuple;      /* current tuple */
    TupleTableSlot *hj_OuterTupleSlot;      /* outer relation slot */
    TupleTableSlot *hj_HashTupleSlot;       /* hash tuple slot */
    TupleTableSlot *hj_NullOuterTupleSlot;  /* null outer slot, for outer joins */
    TupleTableSlot *hj_NullInnerTupleSlot;  /* null inner slot, for outer joins */
    TupleTableSlot *hj_FirstOuterTupleSlot; /* first outer tuple slot */
    int         hj_JoinState;       /* current state of the join state machine */
    bool        hj_MatchedOuter;    /* has the current outer tuple been matched? */
    bool        hj_OuterNotEmpty;   /* is the outer relation known non-empty? */
} HashJoinState;

HashJoinTable

Hash table data structure

typedef struct HashJoinTableData
{
    int         nbuckets;       /* # buckets in the in-memory hash table */
    int         log2_nbuckets;  /* its log2 (nbuckets must be a power of 2) */

    int         nbuckets_original;      /* # buckets when starting the first hash */
    int         nbuckets_optimal;       /* optimal # buckets (per batch) */
    int         log2_nbuckets_optimal;  /* log2(nbuckets_optimal) */

    /* buckets[i] is head of list of tuples in i'th in-memory bucket */
    union
    {
        /* unshared array is per-batch storage, as are all the tuples */
        struct HashJoinTupleData **unshared;
        /* shared array is per-query DSA area, as are all the tuples */
        dsa_pointer_atomic *shared;
    }           buckets;

    bool        keepNulls;      /* true to store unmatchable NULL tuples */

    bool        skewEnabled;    /* are we using skew optimization? */
    HashSkewBucket **skewBucket;    /* hashtable of skew buckets */
    int         skewBucketLen;  /* size of skewBucket array (a power of 2!) */
    int         nSkewBuckets;   /* number of active skew buckets */
    int        *skewBucketNums; /* array indexes of active skew buckets */

    int         nbatch;         /* number of batches */
    int         curbatch;       /* current batch #; 0 during 1st pass */

    int         nbatch_original;    /* nbatch when we started inner scan */
    int         nbatch_outstart;    /* nbatch when we started outer scan */

    bool        growEnabled;    /* flag to shut off nbatch increases */

    double      totalTuples;    /* # tuples obtained from inner plan */
    double      partialTuples;  /* # tuples obtained from inner plan by me */
    double      skewTuples;     /* # tuples inserted into skew buckets */

    /*
     * These arrays are allocated for the life of the hash join, but only if
     * nbatch > 1.  A file is opened only when we first write a tuple into it
     * (otherwise its pointer remains NULL).  Note that the zero'th array
     * elements never get used, since we will process rather than dump out
     * any tuples of batch zero.
     */
    BufFile   **innerBatchFile; /* buffered virtual temp file per batch */
    BufFile   **outerBatchFile; /* buffered virtual temp file per batch */

    /*
     * Info about the datatype-specific hash functions for the datatypes being
     * hashed.  These are arrays of the same length as the number of hash join
     * clauses (hash keys).
     */
    FmgrInfo   *outer_hashfunctions;    /* lookup data for hash functions */
    FmgrInfo   *inner_hashfunctions;    /* lookup data for hash functions */
    bool       *hashStrict;     /* is each hash join operator strict? */

    Size        spaceUsed;      /* memory space currently used by tuples */
    Size        spaceAllowed;   /* upper limit for space used */
    Size        spacePeak;      /* peak space used */
    Size        spaceUsedSkew;  /* skew hash table's current space usage */
    Size        spaceAllowedSkew;   /* upper limit for skew hashtable */

    MemoryContext hashCxt;      /* context for whole-hash-join storage */
    MemoryContext batchCxt;     /* context for this-batch-only storage */

    /* used for dense allocation of tuples (into linked chunks) */
    HashMemoryChunk chunks;     /* one list for the whole batch */

    /* Shared and private state for Parallel Hash. */
    HashMemoryChunk current_chunk;      /* this backend's current chunk */
    dsa_area   *area;           /* DSA area to allocate memory from */
    ParallelHashJoinState *parallel_state;      /* parallel execution state */
    ParallelHashJoinBatchAccessor *batches;     /* parallel batch accessors */
    dsa_pointer current_chunk_shared;   /* shared pointer to the current chunk */
} HashJoinTableData;

typedef struct HashJoinTableData *HashJoinTable;

HashJoinTupleData

Hash join tuple data

/* ----------------------------------------------------------------
 *              hash-join hash table structures
 *
 * Each active hashjoin has a HashJoinTable control block, which is
 * palloc'd in the executor's per-query context.  All other storage needed
 * for the hashjoin is kept in private memory contexts, two for each hashjoin.
 * This makes it easy and fast to release the storage when we don't need it
 * anymore.  (Exception: data associated with the temp files lives in the
 * per-query context too, since we always call buffile.c in that context.)
 *
 * The hashtable contexts are made children of the per-query context, ensuring
 * that they will be discarded at end of statement even if the join is
 * aborted early by an error.  (Likewise, any temporary files we make will
 * be cleaned up by the virtual file manager in event of an error.)
 *
 * Storage that should live through the entire join is allocated from the
 * "hashCxt", while storage that is only wanted for the current batch is
 * allocated in the "batchCxt".  By resetting the batchCxt at the end of
 * each batch, we free all the per-batch storage reliably and without tedium.
 *
 * During first scan of inner relation, we get its tuples from executor.
 * If nbatch > 1 then tuples that don't belong in first batch get saved
 * into inner-batch temp files.  The same statements apply for the
 * first scan of the outer relation, except we write tuples to outer-batch
 * temp files.  After finishing the first scan, we do the following for
 * each remaining batch:
 *  1. Read tuples from inner batch file, load into hash buckets.
 *  2. Read tuples from outer batch file, match to hash buckets and output.
 *
 * It is possible to increase nbatch on the fly if the in-memory hash table
 * gets too big.  The hash-value-to-batch computation is arranged so that this
 * can only cause a tuple to go into a later batch than previously thought,
 * never into an earlier batch.  When we increase nbatch, we rescan the hash
 * table and dump out any tuples that are now of a later batch to the correct
 * inner batch file.  Subsequently, while reading either inner or outer batch
 * files, we might find tuples that no longer belong to the current batch;
 * if so, we just dump them out to the correct batch file.
 * ----------------------------------------------------------------
 */

/* these are in nodes/execnodes.h: */
/*      typedef struct HashJoinTupleData *HashJoinTuple; */
/*      typedef struct HashJoinTableData *HashJoinTable; */

typedef struct HashJoinTupleData
{
    /* link to next tuple in same bucket */
    union
    {
        struct HashJoinTupleData *unshared;
        dsa_pointer shared;
    }           next;
    uint32      hashvalue;      /* tuple's hash code */
    /* Tuple data, in MinimalTuple format, follows on a MAXALIGN boundary */
} HashJoinTupleData;

#define HJTUPLE_OVERHEAD  MAXALIGN(sizeof(HashJoinTupleData))
#define HJTUPLE_MINTUPLE(hjtup)  \
    ((MinimalTuple) ((char *) (hjtup) + HJTUPLE_OVERHEAD))

II. Source code interpretation

ExecHashJoinOuterGetTuple

Gets the next outer tuple for a parallel-oblivious (non-parallel) hash join: either by executing the outer plan node on the first pass, or by reading it back from the temp file of the current hash join batch.

/*----------------------------------------------------------------------------
                            HJ_NEED_NEW_OUTER phase
 ----------------------------------------------------------------------------*/

/*
 * ExecHashJoinOuterGetTuple
 *
 *      get the next outer tuple for a parallel oblivious hashjoin: either by
 *      executing the outer plan node in the first pass, or from the temp
 *      files for the hashjoin batches.
 *
 * Returns a null slot if no more outer tuples (within the current batch).
 *
 * On success, the tuple's hash value is stored at *hashvalue --- this is
 * either originally computed, or re-read from the temp file.
 */
static TupleTableSlot *
ExecHashJoinOuterGetTuple(PlanState *outerNode,     /* outer node */
                          HashJoinState *hjstate,   /* Hash Join executor state */
                          uint32 *hashvalue)        /* returned hash value */
{
    HashJoinTable hashtable = hjstate->hj_HashTable;    /* hash table */
    int         curbatch = hashtable->curbatch;         /* current batch */
    TupleTableSlot *slot;                               /* slot to be returned */

    if (curbatch == 0)          /* if it is the first pass */
    {
        /*
         * Check to see if first outer tuple was already fetched by
         * ExecHashJoin() and not used yet.
         */
        slot = hjstate->hj_FirstOuterTupleSlot;
        if (!TupIsNull(slot))
            hjstate->hj_FirstOuterTupleSlot = NULL;     /* reset the slot */
        else
            slot = ExecProcNode(outerNode);             /* otherwise fetch from the outer node */

        while (!TupIsNull(slot))        /* while the slot is not NULL */
        {
            /*
             * We have to compute the tuple's hash value.
             */
            ExprContext *econtext = hjstate->js.ps.ps_ExprContext;  /* expression context */

            econtext->ecxt_outertuple = slot;           /* store the fetched slot */
            if (ExecHashGetHashValue(hashtable, econtext,
                                     hjstate->hj_OuterHashKeys,
                                     true,      /* outer tuple */
                                     HJ_FILL_OUTER(hjstate),
                                     hashvalue))        /* compute the hash value */
            {
                /* remember outer relation is not empty for possible rescan */
                hjstate->hj_OuterNotEmpty = true;

                return slot;                            /* return the slot */
            }

            /*
             * That tuple couldn't match because of a NULL, so discard it and
             * continue with the next one.
             */
            slot = ExecProcNode(outerNode);             /* fetch the next one */
        }
    }
    else if (curbatch < hashtable->nbatch)      /* not the first batch */
    {
        BufFile    *file = hashtable->outerBatchFile[curbatch];    /* buffered batch file */

        /*
         * In outer-join cases, we could get here even though the batch file
         * is empty.
         */
        if (file == NULL)
            return NULL;                        /* return NULL if the file is NULL */

        slot = ExecHashJoinGetSavedTuple(hjstate,
                                         file,
                                         hashvalue,
                                         hjstate->hj_OuterTupleSlot);   /* read a saved tuple */
        if (!TupIsNull(slot))
            return slot;                        /* not NULL: return it */
    }

    /* End of this batch */
    return NULL;
}

ExecHashGetHashValue

Computes the hash value for a tuple.

/*
 * ExecHashGetHashValue
 *      Compute the hash value for a tuple
 *
 * The tuple to be tested must be in either econtext->ecxt_outertuple or
 * econtext->ecxt_innertuple.  Vars in the hashkeys expressions should have
 * varno either OUTER_VAR or INNER_VAR.
 *
 * A true result means the tuple's hashvalue has been successfully computed
 * and stored at *hashvalue.  A false result means the tuple cannot match
 * because it contains a null attribute, and hence it should be discarded
 * immediately.  (If keep_nulls is true then false is never returned.)
 */
bool
ExecHashGetHashValue(HashJoinTable hashtable,   /* hash table */
                     ExprContext *econtext,     /* expression context */
                     List *hashkeys,            /* list of hash key expressions */
                     bool outer_tuple,          /* is this an outer tuple? */
                     bool keep_nulls,           /* keep NULL tuples? */
                     uint32 *hashvalue)         /* returned hash value */
{
    uint32      hashkey = 0;
    FmgrInfo   *hashfunctions;
    ListCell   *hk;
    int         i = 0;
    MemoryContext oldContext;

    /*
     * We reset the eval context each time to reclaim any memory leaked in
     * the hashkey expressions.
     */
    ResetExprContext(econtext);

    oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);   /* switch the context */

    if (outer_tuple)
        hashfunctions = hashtable->outer_hashfunctions;     /* outer tuple */
    else
        hashfunctions = hashtable->inner_hashfunctions;     /* inner tuple */

    foreach(hk, hashkeys)       /* iterate over the hash keys */
    {
        ExprState  *keyexpr = (ExprState *) lfirst(hk);     /* key expression */
        Datum       keyval;
        bool        isNull;

        /* rotate hashkey left 1 bit at each step */
        hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);

        /*
         * Get the join attribute value of the tuple
         */
        keyval = ExecEvalExpr(keyexpr, econtext, &isNull);

        /*
         * If the attribute is NULL, and the join operator is strict, then
         * this tuple cannot pass the join qual so we can reject it
         * immediately (unless we're scanning the outside of an outer join,
         * in which case we must not reject it).
         */
        if (isNull)
        {
            if (hashtable->hashStrict[i] && !keep_nulls)
            {
                MemoryContextSwitchTo(oldContext);
                return false;   /* cannot match */
            }
            /* else, leave hashkey unmodified, equivalent to hashcode 0 */
        }
        else
        {
            /* Compute the hash function */
            uint32      hkey;

            hkey = DatumGetUInt32(FunctionCall1(&hashfunctions[i], keyval));
            hashkey ^= hkey;
        }

        i++;
    }

    MemoryContextSwitchTo(oldContext);

    *hashvalue = hashkey;
    return true;
}

ExecHashGetBucketAndBatch

Determines the bucket number and batch number for a hash value.

/*
 * ExecHashGetBucketAndBatch
 *      Determine the bucket number and batch number for a hash value
 */
void
ExecHashGetBucketAndBatch(HashJoinTable hashtable,
                          uint32 hashvalue,
                          int *bucketno,
                          int *batchno)
{
    uint32      nbuckets = (uint32) hashtable->nbuckets;    /* number of buckets */
    uint32      nbatch = (uint32) hashtable->nbatch;        /* number of batches */

    if (nbatch > 1)
    {
        /* we can do MOD by masking, DIV by shifting */
        *bucketno = hashvalue & (nbuckets - 1);     /* nbuckets - 1 is an all-ones mask */
        *batchno = (hashvalue >> hashtable->log2_nbuckets) & (nbatch - 1);
    }
    else
    {
        *bucketno = hashvalue & (nbuckets - 1);
        *batchno = 0;                               /* only one batch */
    }
}

ExecHashGetSkewBucket

Returns the index of the skew bucket for this hash value, or INVALID_SKEW_BUCKET_NO if the hash value is not associated with any active skew bucket.

/*
 * ExecHashGetSkewBucket
 *
 *      Returns the index of the skew bucket for this hashvalue,
 *      or INVALID_SKEW_BUCKET_NO if the hashvalue is not
 *      associated with any active skew bucket.
 */
int
ExecHashGetSkewBucket(HashJoinTable hashtable, uint32 hashvalue)
{
    int         bucket;

    /*
     * Always return INVALID_SKEW_BUCKET_NO if not doing skew optimization
     * (in particular, this happens after the initial batch is done).
     */
    if (!hashtable->skewEnabled)
        return INVALID_SKEW_BUCKET_NO;

    /*
     * Since skewBucketLen is a power of 2, we can do a modulo by ANDing.
     */
    bucket = hashvalue & (hashtable->skewBucketLen - 1);

    /*
     * While we have not hit a hole in the hashtable and have not hit the
     * desired bucket, we have collided with some other hashvalue, so try the
     * next bucket location.
     */
    while (hashtable->skewBucket[bucket] != NULL &&
           hashtable->skewBucket[bucket]->hashvalue != hashvalue)
        bucket = (bucket + 1) & (hashtable->skewBucketLen - 1);

    /*
     * Found the desired bucket?
     */
    if (hashtable->skewBucket[bucket] != NULL)
        return bucket;

    /*
     * There must not be any hashtable entry for this hash value.
     */
    return INVALID_SKEW_BUCKET_NO;
}

ExecHashJoinSaveTuple

Saves a tuple to a batch file. The record written for each tuple is its hash value, followed by the tuple itself in MinimalTuple format.

/*
 * ExecHashJoinSaveTuple
 *      save a tuple to a batch file.
 *
 * The data recorded in the file for each tuple is its hash value,
 * then the tuple in MinimalTuple format.
 *
 * Note: it is important always to call this in the regular executor
 * context, not in a shorter-lived context; else the temp file buffers
 * will get messed up.
 */
void
ExecHashJoinSaveTuple(MinimalTuple tuple, uint32 hashvalue,
                      BufFile **fileptr)
{
    BufFile    *file = *fileptr;    /* file pointer */
    size_t      written;            /* number of bytes written */

    if (file == NULL)
    {
        /* First write to this batch file, so open it. */
        file = BufFileCreateTemp(false);
        *fileptr = file;
    }

    /* write the hash value first */
    written = BufFileWrite(file, (void *) &hashvalue, sizeof(uint32));
    if (written != sizeof(uint32))
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not write to hash-join temporary file: %m")));

    /* then write the tuple itself */
    written = BufFileWrite(file, (void *) tuple, tuple->t_len);
    if (written != tuple->t_len)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not write to hash-join temporary file: %m")));
}
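For context (the function is referenced in ExecHashJoinOuterGetTuple above but is not part of this article's excerpt), the counterpart read path is ExecHashJoinGetSavedTuple. The following is a condensed, simplified sketch of its logic; the real function in nodeHashjoin.c has fuller error handling:

/*
 * Condensed sketch of the read path matching ExecHashJoinSaveTuple's format:
 * a uint32 hash value followed by the tuple in MinimalTuple form (whose own
 * first field is its uint32 length).  Simplified; error checks omitted.
 */
static TupleTableSlot *
ExecHashJoinGetSavedTuple_sketch(HashJoinState *hjstate, BufFile *file,
                                 uint32 *hashvalue, TupleTableSlot *tupleSlot)
{
    uint32      header[2];      /* [0] = hash value, [1] = MinimalTuple t_len */
    size_t      nread;
    MinimalTuple tuple;

    /* read the hash value and the tuple length word in one call */
    nread = BufFileRead(file, (void *) header, sizeof(header));
    if (nread == 0)
    {
        /* end of this batch file */
        ExecClearTuple(tupleSlot);
        return NULL;
    }
    *hashvalue = header[0];

    /* allocate the tuple and read the rest of its body after the length word */
    tuple = (MinimalTuple) palloc(header[1]);
    tuple->t_len = header[1];
    BufFileRead(file, (void *) ((char *) tuple + sizeof(uint32)),
                header[1] - sizeof(uint32));

    return ExecStoreMinimalTuple(tuple, tupleSlot, true);
}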

ExecFetchSlotMinimalTuple

Fetches the slot's contents in minimal physical tuple format.

/* --------------------------------
 *      ExecFetchSlotMinimalTuple
 *          Fetch the slot's minimal physical tuple.
 *
 *      If the given tuple table slot can hold a minimal tuple, indicated by a
 *      non-NULL get_minimal_tuple callback, the function returns the minimal
 *      tuple returned by that callback.  It assumes that the minimal tuple
 *      returned by the callback is "owned" by the slot, i.e. the slot is
 *      responsible for freeing the memory consumed by the tuple.  Hence it
 *      sets *shouldFree to false, indicating that the caller should not free
 *      the memory consumed by the minimal tuple.  In this case the returned
 *      minimal tuple should be considered as read-only.
 *
 *      If that callback is not supported, it calls the copy_minimal_tuple
 *      callback, which is expected to return a copy of the minimal tuple
 *      representing the contents of the slot.  In this case *shouldFree is
 *      set to true, indicating that the caller should free the memory
 *      consumed by the minimal tuple.  In this case the returned minimal
 *      tuple may be written to.
 * --------------------------------
 */
MinimalTuple
ExecFetchSlotMinimalTuple(TupleTableSlot *slot,
                          bool *shouldFree)
{
    /*
     * sanity checks
     */
    Assert(slot != NULL);
    Assert(!TTS_EMPTY(slot));

    if (slot->tts_ops->get_minimal_tuple)
    {
        /* callback available: the tuple is read-only and owned by the slot */
        if (shouldFree)
            *shouldFree = false;
        return slot->tts_ops->get_minimal_tuple(slot);
    }
    else
    {
        /* no callback: return a copy that the caller must free */
        if (shouldFree)
            *shouldFree = true;
        return slot->tts_ops->copy_minimal_tuple(slot);
    }
}

III. Tracking analysis
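A short usage sketch (illustrative, not taken from the article's source) showing how a caller is expected to honor the shouldFree contract described above:

/* illustrative use of ExecFetchSlotMinimalTuple and its shouldFree contract */
bool         shouldFree;
MinimalTuple mintuple = ExecFetchSlotMinimalTuple(slot, &shouldFree);

/* ... use mintuple, e.g. pass it to ExecHashJoinSaveTuple() ... */

if (shouldFree)
    heap_free_minimal_tuple(mintuple);   /* we own the copy, so release it */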

The test script is as follows

testdb=# set enable_nestloop=false;
SET
testdb=# set enable_mergejoin=false;
SET
testdb=# explain verbose select dw.*,grjf.grbh,grjf.xm,grjf.ny,grjf.je
testdb-# from t_dwxx dw,lateral (select gr.grbh,gr.xm,jf.ny,jf.je
testdb(#                         from t_grxx gr inner join t_jfxx jf
testdb(#                         on gr.dwbh = dw.dwbh
testdb(#                         and gr.grbh = jf.grbh) grjf
testdb-# order by dw.dwbh;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Sort  (cost=14828.83..15078.46 rows=99850 width=47)
   Output: dw.dwmc, dw.dwbh, dw.dwdz, gr.grbh, gr.xm, jf.ny, jf.je
   Sort Key: dw.dwbh
   ->  Hash Join  (cost=3176.00..6537.55 rows=99850 width=47)
         Output: dw.dwmc, dw.dwbh, dw.dwdz, gr.grbh, gr.xm, jf.ny, jf.je
         Hash Cond: ((gr.grbh)::text = (jf.grbh)::text)
         ->  Hash Join  (cost=289.00..2277.61 rows=99850 width=32)
               Output: dw.dwmc, dw.dwbh, dw.dwdz, gr.grbh, gr.xm
               Inner Unique: true
               Hash Cond: ((gr.dwbh)::text = (dw.dwbh)::text)
               ->  Seq Scan on public.t_grxx gr  (cost=0.00..1726.00 rows=100000 width=16)
                     Output: gr.dwbh, gr.grbh, gr.xm, gr.xb, gr.nl
               ->  Hash  (cost=164.00..164.00 rows=10000 width=20)
                     Output: dw.dwmc, dw.dwbh, dw.dwdz
                     ->  Seq Scan on public.t_dwxx dw  (cost=0.00..164.00 rows=10000 width=20)
                           Output: dw.dwmc, dw.dwbh, dw.dwdz
         ->  Hash  (cost=1637.00..1637.00 rows=100000 width=20)
               Output: jf.ny, jf.je, jf.grbh
               ->  Seq Scan on public.t_jfxx jf  (cost=0.00..1637.00 rows=100000 width=20)
                     Output: jf.ny, jf.je, jf.grbh
(20 rows)

Start gdb and set breakpoints

(gdb) b ExecHashJoinOuterGetTuple
Breakpoint 1 at 0x702edc: file nodeHashjoin.c, line 807.
(gdb) b ExecHashGetHashValue
Breakpoint 2 at 0x6ff060: file nodeHash.c, line 1778.
(gdb) b ExecHashGetBucketAndBatch
Breakpoint 3 at 0x6ff1df: file nodeHash.c, line 1880.
(gdb) b ExecHashJoinSaveTuple
Breakpoint 4 at 0x703973: file nodeHashjoin.c, line 1214.
(gdb)

ExecHashGetHashValue

ExecHashGetHashValue -> enter the function ExecHashGetHashValue

(gdb) c
Continuing.

Breakpoint 2, ExecHashGetHashValue (hashtable=0x14acde8, econtext=0x149c3d0, hashkeys=0x14a8e40, outer_tuple=false, keep_nulls=false,
    hashvalue=0x7ffc7eba5c20) at nodeHash.c:1778
1778        uint32      hashkey = 0;

ExecHashGetHashValue -> initialization; switch the memory context

1778        uint32      hashkey = 0;
(gdb) n
1781        int         i = 0;
(gdb)
1788        ResetExprContext(econtext);
(gdb)
1790        oldContext = MemoryContextSwitchTo(econtext->ecxt_per_tuple_memory);
(gdb)
1792        if (outer_tuple)

ExecHashGetHashValue -> this call hashes the inner relation, so the inner hash functions are used

1792        if (outer_tuple)
(gdb)
1795            hashfunctions = hashtable->inner_hashfunctions;

ExecHashGetHashValue -> get the hash key information

The hash key is the dwbh field (varattno = 2) of RTE No. 1 (varnoold = 1, i.e. t_dwxx); varno = 65000 is INNER_VAR, since the key is evaluated against the inner tuple.

(gdb)
1797        foreach(hk, hashkeys)
(gdb)
1799            ExprState  *keyexpr = (ExprState *) lfirst(hk);
(gdb)
1804            hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);
(gdb) p *(RelabelType *) keyexpr->expr
$3 = {xpr = {type = T_RelabelType}, arg = 0x1499018, resulttype = 25, resulttypmod = -1, resultcollid = 100,
  relabelformat = COERCE_IMPLICIT_CAST, location = -1}
(gdb) p *((RelabelType *) keyexpr->expr)->arg
$4 = {type = T_Var}
(gdb) p *(Var *) ((RelabelType *) keyexpr->expr)->arg
$5 = {xpr = {type = T_Var}, varno = 65000, varattno = 2, vartype = 1043, vartypmod = 24, varcollid = 100, varlevelsup = 0,
  varnoold = 1, varoattno = 2, location = 218}
(gdb)

ExecHashGetHashValue -> evaluate the hash key expression to get the join attribute value

(gdb) n
1809            keyval = ExecEvalExpr(keyexpr, econtext, &isNull);
(gdb)
1824            if (isNull)
(gdb) p hashkey
$6 = 0
(gdb) p keyval
$7 = 140460362257270
(gdb)

ExecHashGetHashValue -> the value is not NULL, so the hash function is applied

(gdb) p isNull
$8 = false
(gdb) n
1838                hkey = DatumGetUInt32(FunctionCall1(&hashfunctions[i], keyval));

ExecHashGetHashValue -> compute the hash value

(gdb) n
1839                hashkey ^= hkey;
(gdb) p hkey
$9 = 3663833849
(gdb) p hashkey
$10 = 0
(gdb) n
1842            i++;
(gdb) p hashkey
$11 = 3663833849
(gdb)
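With a single hash key, the combine step above is easy to follow by hand. A short worked illustration using the values printed in this trace:

/*
 * Worked illustration (values taken from the trace above, single hash key):
 *
 *   hashkey = 0;
 *   hashkey = (hashkey << 1) | ((hashkey & 0x80000000) ? 1 : 0);          -> still 0
 *   hkey    = DatumGetUInt32(FunctionCall1(&hashfunctions[0], keyval));   -> 3663833849
 *   hashkey ^= hkey;                                                      -> 3663833849
 *
 * With several hash keys, the rotate-left-by-1 before each XOR spreads the
 * contribution of each key across different bit positions.
 */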

ExecHashGetHashValue -> return the result

(gdb) n
1797        foreach(hk, hashkeys)
(gdb)
1845        MemoryContextSwitchTo(oldContext);
(gdb)
1847        *hashvalue = hashkey;
(gdb)
1848        return true;
(gdb)
1849    }

ExecHashGetBucketAndBatch

ExecHashGetBucketAndBatch -> enter ExecHashGetBucketAndBatch

(gdb) c
Continuing.

Breakpoint 3, ExecHashGetBucketAndBatch (hashtable=0x14acde8, hashvalue=3663833849, bucketno=0x7ffc7eba5bdc,
    batchno=0x7ffc7eba5bd8) at nodeHash.c:1880
1880        uint32      nbuckets = (uint32) hashtable->nbuckets;

ExecHashGetBucketAndBatch -> get the number of buckets and batches

1880        uint32      nbuckets = (uint32) hashtable->nbuckets;
(gdb) n
1881        uint32      nbatch = (uint32) hashtable->nbatch;
(gdb)
1883        if (nbatch > 1)
(gdb) p nbuckets
$12 = 16384
(gdb) p nbatch
$13 = 1
(gdb)

ExecHashGetBucketAndBatch -> compute the bucket number and batch number (there is only one batch, so batchno is set to 0)

(gdb) n
1891        *bucketno = hashvalue & (nbuckets - 1);
(gdb)
1892        *batchno = 0;
(gdb)
1894    }
(gdb) p bucketno
$14 = (int *) 0x7ffc7eba5bdc
(gdb) p *bucketno
$15 = 11001
(gdb)
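The printed bucket number can be verified by hand from the values in this trace (nbuckets = 16384, hashvalue = 3663833849). The nbatch = 4 case below is a hypothetical value, added only to illustrate the multi-batch formula:

/*
 * bucketno = hashvalue & (nbuckets - 1)
 *          = 3663833849 & (16384 - 1)
 *          = 3663833849 mod 16384
 *          = 11001                          <- matches $15 above
 *
 * Hypothetical multi-batch case (nbatch = 4 chosen only for illustration):
 *   batchno = (hashvalue >> log2_nbuckets) & (nbatch - 1)
 *           = (3663833849 >> 14) & 3
 *           = 223622 & 3
 *           = 2
 */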

ExecHashJoinOuterGetTuple

ExecHashJoinOuterGetTuple -> enter the ExecHashJoinOuterGetTuple function

(gdb) info break
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0x0000000000702edc in ExecHashJoinOuterGetTuple at nodeHashjoin.c:807
2       breakpoint     keep y   0x00000000006ff060 in ExecHashGetHashValue at nodeHash.c:1778
        breakpoint already hit 4 times
3       breakpoint     keep y   0x00000000006ff1df in ExecHashGetBucketAndBatch at nodeHash.c:1880
        breakpoint already hit 4 times
4       breakpoint     keep y   0x0000000000703973 in ExecHashJoinSaveTuple at nodeHashjoin.c:1214
(gdb) del 2
(gdb) del 3
(gdb) c
Continuing.

Breakpoint 1, ExecHashJoinOuterGetTuple (outerNode=0x149ba10, hjstate=0x149b738, hashvalue=0x7ffc7eba5ccc) at nodeHashjoin.c:807
807         HashJoinTable hashtable = hjstate->hj_HashTable;
(gdb)

ExecHashJoinOuterGetTuple -> examine the input parameters

outerNode: the outer relation, obtained here by a sequential scan (Seq Scan on t_grxx, the outer side of the lower Hash Join in the plan above)

hjstate: Hash Join executor state

hashvalue: pointer used to return the computed hash value

(gdb) p *outerNode
$16 = {type = T_SeqScanState, plan = 0x1494d10, state = 0x149b0f8, ExecProcNode = 0x71578d, ExecProcNodeReal = 0x71578d,
  instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, qual = 0x0, lefttree = 0x0, righttree = 0x0,
  initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, ps_ResultTupleSlot = 0x149c178, ps_ExprContext = 0x149bb28,
  ps_ProjInfo = 0x0, scandesc = 0x7fbfa69a8308}
(gdb) p *hjstate
$17 = {js = {ps = {type = T_HashJoinState, ..., ExecProcNode = 0x70291d, ExecProcNodeReal = 0x70291d, instrument = 0x0,
  worker_instrument = 0x0, worker_jit_instrument = 0x0, qual = 0x0, lefttree = 0x149ba10, righttree = 0x149c2b8,
  initPlan = 0x0, subPlan = 0x0, chgParam = 0x0, ps_ResultTupleSlot = 0x14a7498, ps_ExprContext = 0x149b950,
  ps_ProjInfo = 0x149cef0, scandesc = 0x0}, jointype = JOIN_INNER, single_match = true, joinqual = 0x0},
  hashclauses = 0x14a7b30, hj_OuterHashKeys = 0x14a8930, hj_InnerHashKeys = ..., hj_HashOperators = ...,
  hj_HashTable = ..., hj_CurHashValue = 0, hj_CurBucketNo = 0, hj_CurSkewBucketNo = -1, hj_CurTuple = 0x0,
  hj_OuterTupleSlot = 0x14a79f0, hj_HashTupleSlot = 0x149cc18, hj_NullOuterTupleSlot = 0x0, hj_NullInnerTupleSlot = 0x0,
  hj_FirstOuterTupleSlot = 0x149bbe8, hj_JoinState = 2, hj_MatchedOuter = false, hj_OuterNotEmpty = false}
(gdb) p *hashvalue
$18 = 32703
(gdb)

ExecHashJoinOuterGetTuple -> there is only one batch, and the current batch number is 0

(gdb) n
808         int         curbatch = hashtable->curbatch;
(gdb)
811         if (curbatch == 0)          /* if it is the first pass */
(gdb) p curbatch
$20 = 0

ExecHashJoinOuterGetTuple -> get the first outer tuple slot (not NULL), then reset hjstate->hj_FirstOuterTupleSlot to NULL

(gdb) n
817             slot = hjstate->hj_FirstOuterTupleSlot;
(gdb)
818             if (!TupIsNull(slot))
(gdb) p *slot
$21 = {type = T_TupleTableSlot, tts_isempty = false, tts_shouldFree = false, tts_shouldFreeMin = false, tts_slow = false,
  tts_tuple = 0x14ac200, tts_tupleDescriptor = 0x7fbfa69a8308, tts_mcxt = 0x149afe0, tts_buffer = 345, tts_nvalid = 0,
  tts_values = 0x149bc48, tts_isnull = 0x149bc70, tts_mintuple = 0x0, tts_minhdr = {t_len = 0, t_self = {ip_blkid = {bi_hi = 0,
  bi_lo = 0}, ip_posid = 0}, t_tableOid = 0, t_data = 0x0}, tts_off = 0, tts_fixedTupleDescriptor = true}
(gdb) n
819             hjstate->hj_FirstOuterTupleSlot = NULL;
(gdb)

ExecHashJoinOuterGetTuple -> loop until a slot whose hash value can be computed is found

(gdb)
823         while (!TupIsNull(slot))
(gdb) n
828             ExprContext *econtext = hjstate->js.ps.ps_ExprContext;
(gdb)

ExecHashJoinOuterGetTuple -> the hash value is computed successfully; return the slot

(gdb) n
830             econtext->ecxt_outertuple = slot;
(gdb)
834                                      HJ_FILL_OUTER(hjstate),
(gdb)
831             if (ExecHashGetHashValue(hashtable, econtext,
(gdb)
838                 hjstate->hj_OuterNotEmpty = true;
(gdb)
840                 return slot;
(gdb) p *slot
$22 = {type = T_TupleTableSlot, tts_isempty = false, tts_shouldFree = false, tts_shouldFreeMin = false, tts_slow = true,
  tts_tuple = 0x14ac200, tts_tupleDescriptor = 0x7fbfa69a8308, tts_mcxt = 0x149afe0, tts_buffer = 345, tts_nvalid = 1,
  tts_values = 0x149bc48, tts_isnull = 0x149bc70, tts_mintuple = 0x0, tts_minhdr = {t_len = 0, t_self = {ip_blkid = {bi_hi = 0,
  bi_lo = 0}, ip_posid = 0}, t_tableOid = 0, t_data = 0x0}, tts_off = 2, tts_fixedTupleDescriptor = true}
(gdb)

At this point, the study of the implementation logic of the functions that ExecHashJoin depends on in PostgreSQL is complete. Hopefully it has answered your questions; combining theory with practice is the best way to learn, so give it a try. To learn more, stay tuned for further articles.
