PostgreSQL Source Code interpretation (48)-query statement # 33 (query_planner function # 9) 07/04 Update SLTechnology News&Howtos

PostgreSQL Source Code interpretation (48)-query statement # 33 (query_planner function # 9)

2025-07-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

The previous section has introduced the main implementation logic of the subfunctions remove_useless_joins, reduce_unique_semijoins, and add_placeholders_to_base_rels of the function query_planner, and this section continues to introduce the implementation logic of create_lateral_join_info, match_foreign_keys_to_quals, and extract_restriction_or_clauses.

Query_planner code snippet:

/ /... / * * Construct the lateral reference sets now that we have finalized * PlaceHolderVar eval levels. * / create_lateral_join_info (root); / / create Lateral connection information / * * Match foreign keys to equivalence classes and join quals. This must be * done after finalizing equivalence classes, and it's useful to wait till * after join removal so that we can skip processing foreign keys * involving removed relations. * / match_foreign_keys_to_quals (root); / / match foreign key information / * * Look for join OR clauses that we can extract single-relation * restriction OR clauses from. * / extract_restriction_or_clauses (root); / / extract constraints from OR statements / * * We should now have size estimates for every actual table involved in * the query, and we also know which if any have been deleted from the * query by join removal; so we can compute total_table_pages. * * Note that appendrels are not double-counted here, even though we don't * bother to distinguish RelOptInfos for appendrel parents, because the * parents will still have size zero. * * XXX if a table is self-joined, we will count it once per appearance, * which perhaps is the wrong thing. But that's not completely clear, * and detecting self-joins here is difficult, so ignore it for now. * / total_pages = 0; for (rti = 1; rti)

< root->

Simple_rel_array_size; rti++) / / calculate the total pages {RelOptInfo * brel = root- > simple_rel_ array [RTI]; if (brel = = NULL) continue; Assert (brel- > relid = = rti); / * sanity check on array * / if (IS_SIMPLE_REL (brel)) total_pages + = (double) brel- > pages;} root- > total_table_pages = total_pages / / assign / /. I. data structure

RelOptInfo

Data structures related to LATERAL in RelOptInfo

Typedef struct RelOptInfo {the NodeTag type;// node identifies the RelOptKind reloptkind;//RelOpt type / /... / * parameterization information needed for both base rels and join rels * / / * (see also lateral_vars and lateral_referencers) * / Relids direct_lateral_relids; / * uses the explicit syntax and depends on the Relids rels directly laterally referenced * / Relids lateral_relids; / * minimum parameterization of rel * /. List * lateral_vars; / * Vars/PHVs LATERAL Vars and PHVs referenced by rel * / Relids lateral_referencers; / * depends on the Relids rels that reference me laterally * / /...} RelOptInfo; of the relationship. 2. Source code interpretation

Create_lateral_join_info

Before PG provides LATERAL syntax, it is assumed that all subqueries can exist independently and cannot refer to each other or upper-level attributes. In order to reference other or upper-level attributes, the LATERAL keyword needs to be explicitly specified before the subquery.

For example, the following SQL statement does not work properly without explicitly specifying the LATERAL keyword:

Testdb=# select A. from t_grxx from t_grxx b.inner join t_jfxx b.je testdb-# from t_dwxx a, btestdb-# where a.dwbh = '1001'testdb-# order by b.dwbhterEror: invalid reference to FROM-clause entry for table "a" a "LINE 2:. From t_grxx T1 inner join t_jfxx T2 on t1.dwbh = a.dwbh and... ^ HINT: There is an entry for table "a", but it cannot be referenced from this part of the query.

After you explicitly specify LATERAL before the subquery, you can run normally:

As described in the function notes, the purpose of the create_lateral_join_info function is to populate the four related variables in RelOptInfo, "Fill in the per-base-relation direct_lateral_relids, lateral_relids and and lateral_referencers sets"

The source code is as follows:

/ * * create_lateral_join_info * Fill in the per-base-relation direct_lateral_relids, lateral_relids * and lateral_referencers sets. * This has to run after deconstruct_jointree, because we need to know the * final ph_eval_at values for PlaceHolderVars. * / void create_lateral_join_info (PlannerInfo * root) {bool found_laterals = false; Index rti; ListCell * lc; / * We need do nothing if the query contains no LATERAL RTEs * / if (! root- > hasLateralRTEs) / / whether LateralRTE return; / * * Examine all baserels (the rel array has been set up by now) exists. * / for (rti = 1; rti

< root->

Simple_rel_array_size; rti++) / / traversing {RelOptInfo * brel = root- > simple_rel_ Array [RTI]; Relids lateral_relids; / * there may be empty slots corresponding to non-baserel RTEs * / if (brel = = NULL) continue; Assert (brel- > relid = = rti) / * sanity check on array * / ignore RTEs that are "other rels" * / if (brel- > reloptkind! = RELOPT_BASEREL) continue; lateral_relids = NULL; / * consider each laterally-referenced Var or PHV * / foreach (lc, brel- > lateral_vars) {Node * node = (Node *) lfirst (lc) If (IsA (node, Var)) {Var * var = (Var *) node; found_laterals = true; lateral_relids = bms_add_member (lateral_relids, var- > varno) } else if (IsA (node, PlaceHolderVar)) {PlaceHolderVar * phv = (PlaceHolderVar *) node; PlaceHolderInfo * phinfo = find_placeholder_info (root, phv, false); found_laterals = true Lateral_relids = bms_add_members (lateral_relids, phinfo- > ph_eval_at);} else Assert (false);} / * We now have all the simple lateral refs from this rel * / brel- > direct_lateral_relids = lateral_relids Brel- > lateral_relids = bms_copy (lateral_relids);} / * Now check for lateral references within PlaceHolderVars, and mark their * eval_at rels as having lateral references to the source rels. * * For a PHV that is due to be evaluated at a baserel, mark its source (s) * as direct lateral dependencies of the baserel (adding onto the ones * recorded above). If it's due to be evaluated at a join, mark its * source (s) as indirect lateral dependencies of each baserel in the join, * ie put them into lateral_relids but not direct_lateral_relids. This is * appropriate because we can't put any such baserel on the outside of a * join to one of the PHV's lateral dependencies, but on the other hand we * also can't yet join it directly to the dependency. * / foreach (lc, root- > placeholder_list) {PlaceHolderInfo * phinfo = (PlaceHolderInfo *) lfirst (lc); Relids eval_at = phinfo- > ph_eval_at; int varno; if (phinfo- > ph_lateral = = NULL) continue; / * PHV is uninteresting if no lateral refs * / found_laterals = true If (bms_get_singleton_member (eval_at, & varno)) {/ * Evaluation site is a baserel * / RelOptInfo * brel = find_base_rel (root, varno); brel- > direct_lateral_relids = bms_add_members (brel- > direct_lateral_relids, phinfo- > ph_lateral) Brel- > lateral_relids = bms_add_members (brel- > lateral_relids, phinfo- > ph_lateral);} else {/ * Evaluation site is a join * / varno =-1 While ((varno = bms_next_member (eval_at, varno)) > = 0) {RelOptInfo * brel = find_base_rel (root, varno); brel- > lateral_relids = bms_add_members (brel- > lateral_relids, phinfo- > ph_lateral) } / * If we found no actual lateral references, we're done; but reset the * hasLateralRTEs flag to avoid useless work later. * / if (! found_laterals) {root- > hasLateralRTEs = false; return;} / * * Calculate the transitive closure of the lateral_relids sets, so that * they describe both direct and indirect lateral references. If relation * X references Y laterally, and Y references Z laterally, then we will * have to scan X on the inside of a nestloop with Z, so for all intents * and purposes X is laterally dependent on Z too. * * This code is essentially Warshall's algorithm for transitive closure. * The outer loop considers each baserel, and propagates its lateral * dependencies to those baserels that have a lateral dependency on it. * / for (rti = 1; rti

< root->

Simple_rel_array_size; rti++) {RelOptInfo * brel = root- > simple_rel_ Array [RTI]; Relids outer_lateral_relids; Index rti2; if (brel = = NULL | | brel- > reloptkind! = RELOPT_BASEREL) continue; / * need not consider baserel further if it has no lateral refs * / outer_lateral_relids = brel- > lateral_relids If (outer_lateral_relids = = NULL) continue; / * else scan all baserels * / for (rti2 = 1; rti2

< root->

Simple_rel_array_size; rti2++) {RelOptInfo * brel2 = root- > simple_rel_ array [rti2]; if (brel2 = = NULL | | brel2- > reloptkind! = RELOPT_BASEREL) continue / * if brel2 has lateral ref to brel, propagate brel's refs * / if (bms_is_member (rti, brel2- > lateral_relids)) brel2- > lateral_relids = bms_add_members (brel2- > lateral_relids, outer_lateral_relids) }} / * * Now that we've identified all lateral references, mark each baserel * with the set of relids of rels that reference it laterally (possibly * indirectly)-that is, the inverse mapping of lateral_relids. * / for (rti = 1; rti

< root->

Simple_rel_array_size; rti++) {RelOptInfo * brel = root- > simple_rel_ Array [RTI]; Relids lateral_relids; int rti2; if (brel = = NULL | | brel- > reloptkind! = RELOPT_BASEREL) continue; / * Nothing to do at rels with no lateral refs * / lateral_relids = brel- > lateral_relids If (lateral_relids = = NULL) continue; / * We should not have broken the invariant that lateral_relids is * exactly NULL if empty. * / Assert (! bms_is_empty (lateral_relids)); / * Also, no rel should have a lateral dependency on itself * / Assert (! bms_is_member (rti, lateral_relids)); / * Mark this rel's referencees * / rti2 =-1 While ((rti2 = bms_next_member (lateral_relids, rti2)) > = 0) {RelOptInfo * brel2 = root- > simple_rel_ Array [rti2]; Assert (brel2! = NULL & & brel2- > reloptkind = = RELOPT_BASEREL); brel2- > lateral_referencers = bms_add_member (brel2- > lateral_referencers, rti) } / * * Lastly, propagate lateral_relids and lateral_referencers from appendrel * parent rels to their child rels. We intentionally give each child rel * the same minimum parameterization, even though it's quite possible that * some don't reference all the lateral rels. This is because any append * path for the parent will have to have the same parameterization for * every child anyway, and there's no value in forcing extra * reparameterize_path () calls. Similarly, a lateral reference to the * parent prevents use of otherwise-movable join rels for each child. * / for (rti = 1; rti

< root->

Simple_rel_array_size; rti++) {RelOptInfo * brel = root- > simple_rel_ Array [RTI]; RangeTblEntry * brte = root- > simple_rte_ Array [RTI]; / * Skip empty slots. Also skip non-simple relations i.e. Dead * relations. * / if (brel = = NULL | |! IS_SIMPLE_REL (brel)) continue; / * In the case of table inheritance, the parent RTE is directly linked * to every child table via an AppendRelInfo. In the case of table * partitioning, the inheritance hierarchy is expanded one level at a * time rather than flattened. Therefore, an other member rel that is * a partitioned table may have children of its own, and must * therefore be marked with the appropriate lateral info so that those * children eventually get marked also. * / Assert (brte); if (brel- > reloptkind = = RELOPT_OTHER_MEMBER_REL & & (brte- > rtekind! = RTE_RELATION | | brte- > relkind! = RELKIND_PARTITIONED_TABLE) continue If (brte- > inh) {foreach (lc, root- > append_rel_list) {AppendRelInfo * appinfo = (AppendRelInfo *) lfirst (lc); RelOptInfo * childrel; if (appinfo- > parent_relid! = rti) continue Childrel = root- > simple_rel_ array [appinfo-> child_relid]; Assert (childrel- > reloptkind = = RELOPT_OTHER_MEMBER_REL); Assert (childrel- > direct_lateral_relids = = NULL); childrel- > direct_lateral_relids = brel- > direct_lateral_relids; Assert (childrel- > lateral_relids = = NULL); childrel- > lateral_relids = brel- > lateral_relids Assert (childrel- > lateral_referencers = = NULL); childrel- > lateral_referencers = brel- > lateral_referencers;}

Follow-up analysis:

(gdb) b planmain.c:173Breakpoint 1 at 0x76961a: file planmain.c, line 173. (gdb) cContinuing.Breakpoint 1, query_planner (root=0x1702b80, tlist=0x174a870, qp_callback=0x76e97d, qp_extra=0x7ffd35e059c0) at planmain.c:177... (gdb) 212 create_lateral_join_info (root)

View the root variable:

(gdb) p * root$11 = {..., hasLateralRTEs = false,...}

After processing, LATERAL has disappeared (hasLateralRTEs = false) and does not need to be processed.

Match_foreign_keys_to_quals

This is the foreign key related processing, and the equivalence class is matched with the foreign key constraint and added to the conditional statement (quals).

/ * * match_foreign_keys_to_quals * Match foreign-key constraints to equivalence classes and join quals * * The idea here is to see which query join conditions match equality * constraints of a foreign-key relationship. For such join conditions, * we can use the FK semantics to make selectivity estimates that are more * reliable than estimating from statistics, especially for multiple-column * FKs, where the normal assumption of independent conditions tends to fail. * * In this function we annotate the ForeignKeyOptInfos in root- > fkey_list * with info about which eclasses and join qual clauses they match, and * discard any ForeignKeyOptInfos that are irrelevant for the query. * / void match_foreign_keys_to_quals (PlannerInfo * root) {List * newlist = NIL; ListCell * lc; foreach (lc, root- > fkey_list) {ForeignKeyOptInfo * fkinfo = (ForeignKeyOptInfo *) lfirst (lc); RelOptInfo * con_rel; RelOptInfo * ref_rel; int colno / * * Either relid might identify a rel that is in the query's rtable but * isn't referenced by the jointree so won't have a RelOptInfo. Hence * don't use find_base_rel () here. We can ignore such FKs. * / if (fkinfo- > con_relid > = root- > simple_rel_array_size | | fkinfo- > ref_relid > = root- > simple_rel_array_size) continue; / * just paranoia * / con_rel = root- > simple_rel_ Array [fkinfo-> con_relid]; if (con_rel = = NULL) continue; ref_rel = root- > simple_rel_ Array [fkinfo-> ref_relid] If (ref_rel = = NULL) continue; / * * Ignore FK unless both rels are baserels. This gets rid of FKs that * link to inheritance child rels (otherrels) and those that link to * rels removed by join removal (dead rels). * / if (con_rel- > reloptkind! = RELOPT_BASEREL | | ref_rel- > reloptkind! = RELOPT_BASEREL) continue; / * * Scan the columns and try to match them to eclasses and quals. * * Note: for simple inner joins, any match should be in an eclass. * "Loose" quals that syntactically match an FK equality must have * been rejected for EC status because they are outer-join quals or * similar. We can still consider them to match the FK if they are * not outerjoin_delayed. * / for (colno = 0; colno)

< fkinfo->

Nkeys; colno++) {AttrNumber con_attno, ref_attno; Oid fpeqop; ListCell * lc2 Fkinfo- > eclass [colno] = match_eclasses_to_foreign_key_col (root, fkinfo, colno) / * Don't bother looking for loose quals if we got an EC match * / if (fkinfo- > eclass [colno]! = NULL) {fkinfo- > nmatched_ec++; continue;} / * * Scan joininfo list for relevant clauses. Either rel's joininfo * list would do equally well; we use con_rel's. * / con_attno = fkinfo- > conkey [colno]; ref_attno = fkinfo- > confkey [colno]; fpeqop = InvalidOid / * we'll look this up only if needed * / foreach (lc2, con_rel- > joininfo) {RestrictInfo * rinfo = (RestrictInfo *) lfirst (lc2); OpExpr * clause = (OpExpr *) rinfo- > clause; Var * leftvar; Var * rightvar / * Ignore outerjoin-delayed clauses * / if (rinfo- > outerjoin_delayed) continue; / * Only binary OpExprs are useful for consideration * / if (! IsA (clause, OpExpr) | | list_length (clause- > args)! = 2) continue Leftvar = (Var *) get_leftop ((Expr *) clause); rightvar = (Var *) get_rightop ((Expr *) clause); / * Operands must be Vars, possibly with RelabelType * / while (leftvar & & IsA (leftvar, RelabelType) leftvar = (Var *) ((RelabelType *) leftvar)-> arg If (! (leftvar & & IsA (leftvar, Var)) continue; while (rightvar & & IsA (rightvar, RelabelType)) rightvar = (Var *) ((RelabelType *) rightvar)-> arg; if (! (rightvar & & IsA (rightvar, Var)) continue / * Now try to match the vars to the current foreign key cols * / if (fkinfo- > ref_relid = = leftvar- > varno & & ref_attno = = leftvar- > varattno & & fkinfo- > con_relid = = rightvar- > varno & & con_attno = = rightvar- > varattno) {/ * Vars match But is it the right operator? * / if (clause- > opno = = fkinfo- > conpfeqop [colno]) {fkinfo- > rinfos [colno] = lappend (fkinfo- > rinfos [colno], rinfo) Fkinfo- > nmatched_ri++ }} else if (fkinfo- > ref_relid = = rightvar- > varno & & ref_attno = = rightvar- > varattno & & fkinfo- > con_relid = = leftvar- > varno & & con_attno = = leftvar- > varattno) {/ * * Reverse match Must check commutator operator. Look it * up if we didn't already. (In the worst case we might * do multiple lookups here, but that would require an FK * equality operator without commutator, which is * unlikely.) * / if (! OidIsValid (fpeqop)) fpeqop = get_commutator (fkinfo- > conpfeqop [colno]) If (clause- > opno = = fpeqop) {fkinfo- > rinfos [colno] = lappend (fkinfo- > rinfos [colno], rinfo); fkinfo- > nmatched_ri++ } / * If we found any matching loose quals, count col as matched * / if (fkinfo- > rinfos [colno]) fkinfo- > nmatched_rcols++;} / * * Currently, we drop multicolumn FKs that aren't fully matched to the * query. Later we might figure out how to derive some sort of * estimate from them, in which case this test should be weakened to * "if ((fkinfo- > nmatched_ec + fkinfo- > nmatched_rcols) > 0)" * / if ((fkinfo- > nmatched_ec + fkinfo- > nmatched_rcols) = fkinfo- > nkeys) newlist = lappend (newlist, fkinfo);} / * Replace fkey_list, thereby discarding any useless entries * / root- > fkey_list = newlist;}

Extract_restriction_or_clauses

Check the OR-of-AND statement of the join connection condition and extract it if there are useful OR constraints.

As described in the code comments, (A.X = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45), conditions can be extracted (a.x = 42 OR a.x = 44) AND (b.y = 43 OR b.z = 45). The purpose of extracting these conditions is to push these conditions into the relationship before joining and reduce the number of tuples involved in join operations.

For example:

Testdb=# explain verbose select t1.*from t_dwxx T1 inner join t_grxx T2 on (t1.dwbh = '1001' and t2.grbh =' 901') OR (t1.dwbh = '1002' and t2.grbh =' 902') QUERY PLAN-- - -Nested Loop (cost=0.00..17.23 rows=5 width=474) Output: t1.dwmc T1.dwbh, t1.dwdz Join Filter: (t1.dwbh):: text = '1001'::text) AND ((t2.grbh):: text =' 901'::text)) OR (t1.dwbh):: text = '1002'::text) AND ((t2.grbh):: text =' 902'::text)-> Seq Scan on public.t_grxx T2 (cost=0.00..16.00 rows=4 width=38) Output: t2.dwbh, t2.grbh T2.xm, t2.xb, t2.nl Filter: ((t2.grbh):: text = '901'::text) OR ((t2.grbh):: text =' 902'::text))-> Materialize (cost=0.00..1.05 rows=2 width=474) Output: t1.dwmc, t1.dwbh, t1.dwdz-> Seq Scan on public.t_dwxx T1 (cost=0.00..1.04 rows=2 width=474) Output: t1.dwmc T1.dwbh, t1.dwdz Filter: ((t1.dwbh):: text = '1001'::text) OR ((t1.dwbh):: text =' 1002'::text)) (11 rows)

As you can see, t1.dwbh = '1001' OR t1.dwbh =' 1002' and t2.grbh = '901' OR t2.grbh =' 902' are pushed down to the datasheet scan before connection as filter conditions.

/ * * extract_restriction_or_clauses * Examine join OR-of-AND clauses to see if any useful restriction OR * clauses can be extracted. If so, add them to the query. * * Although a join clause must reference multiple relations overall, * an OR of ANDs clause might contain sub-clauses that reference just one * relation and can be used to build a restriction clause for that rel. * For example consider * WHERE ((a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45); * We can transform this into * WHERE ((a.x = 42 AND b.y = 43) OR (a.x = 44 AND b.z = 45) * AND (a.x = 42 OR a.x = 44) * AND (b.y = 43 OR b.z = 45) * which allows the latter clauses to be applied during the scans of an and b, * perhaps as index qualifications, and in any case reducing the number of * rows arriving at the join. In essence this is a partial transformation to * CNF (AND of ORs format). It is not complete, however, because we do not * unravel the original OR-doing so would usually bloat the qualification * expression to little gain. * * The added quals are partially redundant with the original OR, and therefore * would cause the size of the joinrel to be underestimated when it is finally * formed. This would be true of a full transformation to CNF as well; the * fault is not really in the transformation, but in clauselist_selectivity's * inability to recognize redundant conditions. We can compensate for this * redundancy by changing the cached selectivity of the original OR clause, * canceling out the (valid) reduction in the estimated sizes of the base * relations so that the estimated joinrel size remains the same. This is * a MAJOR HACK: it depends on the fact that clause selectivities are cached * and on the fact that the same RestrictInfo node will appear in every * joininfo list that might be used when the joinrel is formed. * And it doesn't work in cases where the size estimation is nonlinear * (i.e., outer and IN joins). But it beats not doing anything. * We examine each base relation to see if join clauses associated with it * contain extractable restriction conditions. If so, add those conditions * to the rel's baserestrictinfo and update the cached selectivities of the * join clauses. Note that the same join clause will be examined afresh * from the point of view of each baserel that participates in it, so its * cached selectivity may get updated multiple times. * / void extract_restriction_or_clauses (PlannerInfo * root) {Index rti; / * Examine each baserel for potential join OR clauses * / for (rti = 1; rti)

< root->

Simple_rel_array_size; rti++) {RelOptInfo * rel = root- > simple_rel_ Array [RTI]; ListCell * lc; / * there may be empty slots corresponding to non-baserel RTEs * / if (rel = = NULL) continue; Assert (rel- > relid = = rti) / * sanity check on array * / * ignore RTEs that are "other rels" * / if (rel- > reloptkind! = RELOPT_BASEREL) continue; / * * Find potentially interesting OR joinclauses. We can use any * joinclause that is considered safe to move to this rel by the * parameterized-path machinery, even though what we are going to do * with it is not exactly a parameterized path. * * However, it seems best to ignore clauses that have been marked * redundant (by setting norm_selec > 1). That likely can't happen * for OR clauses, but let's be safe. * / foreach (lc, rel- > joininfo) {RestrictInfo * rinfo = (RestrictInfo *) lfirst (lc); if (restriction_is_or_clause (rinfo) & & join_clause_is_movable_to (rinfo, rel) & & rinfo- > norm_selec

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.