What is the implementation logic of remove_useless_joins in PostgreSQL 07/02 Update SLTechnology News&Howtos

What is the implementation logic of remove_useless_joins in PostgreSQL

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article mainly introduces "what is the implementation logic of remove_useless_joins in PostgreSQL". In daily operation, I believe that many people have doubts about the implementation logic of remove_useless_joins in PostgreSQL. The editor consulted all kinds of materials and sorted out simple and easy-to-use operation methods. I hope it will be helpful to answer the question of "how is the realization logic of remove_useless_joins in PostgreSQL"! Next, please follow the editor to study!

Query_planner code snippet:

/ /... / * * Remove any useless outer joins. Ideally this would be done during * jointree preprocessing, but the necessary information isn't available * until we've built baserel data structures and classified qual clauses. * / joinlist = remove_useless_joins (root, joinlist); / / clear useless external connections / * * Also, reduce any semijoins with unique inner rels to plain inner joins. * Likewise, this can't be done until now for lack of needed info. * / reduce_unique_semijoins (root); / / eliminate semi-connections / * * Now distribute "placeholders" to base rels as needed. This has to be * done after join removal because removal could change whether a * placeholder is evaluable at a base rel. * / add_placeholders_to_base_rels (root); / / add PH / / to "base rels". I. data structure

PlaceHolderVar

PHInfo was introduced in the previous section

/ * * Placeholder node for an expression to be evaluated below the top level * of a plan tree. This is used during planning to represent the contained * expression. At the end of the planning process it is replaced by either * the contained expression or a Var referring to a lower-level evaluation of * the contained expression. Typically the evaluation occurs below an outer * join, and Var references above the outer join might thereby yield NULL * instead of the expression value. * * Although the planner treats this as an expression node type, it is not * recognized by the parser or executor, so we declare it here rather than * in primnodes.h. * / typedef struct PlaceHolderVar {Expr xpr; Expr * phexpr; / * the represented expression * / Relids phrels; / * base relids syntactically within expr src * / Index phid; / * ID for PHV (unique within planner run) * / Index phlevelsup; / * > 0 if PHV belongs to outer query * /} PlaceHolderVar

SpecialJoinInfo

/ * "Special join" info * One-sided outer joins constrain the order of joining partially but not * completely. We flatten such joins into the planner's top-level list of * relations to join, but record information about each outer join in a * SpecialJoinInfo struct. These structs are kept in the PlannerInfo node's * join_info_list. * * Similarly, semijoins and antijoins created by flattening IN (subselect) * and EXISTS (subselect) clauses create partial constraints on join order. * These are likewise recorded in SpecialJoinInfo structs. * * We make SpecialJoinInfos for FULL JOINs even though there is no flexibility * of planning for them, because this simplifies make_join_rel () 's API. * min_lefthand and min_righthand are the sets of base relids that must be * available on each side when performing the special join. Lhs_strict is * true if the special join's condition cannot succeed when the LHS variables * are all NULL (this means that an outer join can commute with upper-level * outer joins even if it appears in their RHS). We don't bother to set * lhs_strict for FULL JOINs, however. * It is not valid for either min_lefthand or min_righthand to be empty sets; * if they were, this would break the logic that enforces join order. * syn_lefthand and syn_righthand are the sets of base relids that are * syntactically below this special join. (These are needed to help compute * min_lefthand and min_righthand for higher joins.) * delay_upper_joins is set true if we detect a pushed-down clause that has * to be evaluated after this join is formed (because it references the RHS). * Any outer joins that have such a clause and this join in their RHS cannot * commute with this join, because that would leave noplace to check the * pushed-down clause. (We don't track this for FULL JOINs, either.) * For a semijoin, we also extract the join operators and their RHS arguments * and set semi_operators, semi_rhs_exprs, semi_can_btree, and semi_can_hash. * This is done in support of possibly unique-ifying the RHS, so we don't * bother unless at least one of semi_can_btree and semi_can_hash can be set * true. (You might expect that this information would be computed during * join planning; but it's helpful to have it available during planning of * parameterized table scans, so we store it in the SpecialJoinInfo structs.) * jointype is never JOIN_RIGHT; a RIGHT JOIN is handled by switching * the inputs to make it a LEFT JOIN. So the allowed values of jointype * in a join_info_list member are only LEFT, FULL, SEMI, or ANTI. * * For purposes of join selectivity estimation, we create transient * SpecialJoinInfo structures for regular inner joins; so it is possible * to have jointype = = JOIN_INNER in such a structure, even though this is * not allowed within join_info_list. We also create transient * SpecialJoinInfos with jointype = = JOIN_INNER for outer joins, since for * cost estimation purposes it is sometimes useful to know the join size under * plain innerjoin semantics. Note that lhs_strict, delay_upper_joins, and * of course the semi_xxx fields are not set meaningfully within such structs. * / typedef struct SpecialJoinInfo {NodeTag type; Relids min_lefthand; / * base relids in minimum LHS for join * / Relids min_righthand; / * base relids in minimum RHS for join * / Relids syn_lefthand; / * base relids syntactically within LHS * / Relids syn_righthand; / * base relids syntactically within RHS * / JoinType jointype; / * always INNER, LEFT, FULL, SEMI, or ANTI * / bool lhs_strict / * joinclause is strict for some LHS rel * / bool delay_upper_joins; / * can't commute with upper RHS * / / * Remaining fields are set only for JOIN_SEMI jointype: * / bool semi_can_btree; / * true if semi_operators are all btree * / bool semi_can_hash; / * true if semi_operators are all hash * / List * semi_operators / * OIDs of equality join operators * / List * semi_rhs_exprs; / * righthand-side expressions of these ops * /} SpecialJoinInfo; II. Source code interpretation

Remove_useless_joins

Clear useless connections, such as the following SQL statement:

Select t1.dwbh from t_grxx T1 left join t_dwxx T2 on t1.dwbh = t2.dwbh

Left connection, and t_dwxx.dwbh unique, such a connection is not needed connection, directly query t_grxx.

From the point of view of the execution plan, PG scans only t_grxx:

Testdb=# explain verbose select t1.dwbh from t_grxx T1 left join t_dwxx T2 on t1.dwbh = t2.dwbh QUERY PLAN-Seq Scan on public.t_grxx T1 (cost=0.00..14.00 rows=400 width=38) Output: t1.dwbh (2 rows)

The source code is as follows:

/ * * remove_useless_joins * Check for relations that don't actually need to be joined at all, * and remove them from the query. * * We are passed the current joinlist and return the updated list. Other * data structures that have to be updated are accessible via "root". * / List * remove_useless_joins (PlannerInfo * root, List * joinlist) {ListCell * lc; / * * We are only interested in relations that are left-joined to, so we can * scan the join_info_list to find them easily. * / restart: foreach (lc, root- > join_info_list) / / traverse the connection information linked list {SpecialJoinInfo * sjinfo = (SpecialJoinInfo *) lfirst (lc); int innerrelid; int nremoved; / * Skip if not removable * / if (! join_is_removable (root, sjinfo)) / / determine whether the connection continue can be cleared / * Currently, join_is_removable can only succeed when the sjinfo's * righthand is a single baserel. Remove that rel from the query and * joinlist. * / innerrelid = bms_singleton_member (sjinfo- > min_righthand); remove_rel_from_query (root, innerrelid, bms_union (sjinfo- > min_lefthand, sjinfo- > min_righthand)); / / remove the corresponding Rel / * We verify that exactly one reference gets removed from joinlist * / nremoved = 0 from the query Joinlist = remove_rel_from_joinlist (joinlist, innerrelid, & nremoved); if (nremoved! = 1) elog (ERROR, "failed to find relation% d in joinlist", innerrelid); / * We can delete this SpecialJoinInfo from the list too, since it's no * longer of interest. * / / update connection list information root- > join_info_list = list_delete_ptr (root- > join_info_list, sjinfo); / * Restart the scan. This is necessary to ensure we find all * removable joins independently of ordering of the join_info_list * (note that removal of attr_needed bits may make a join appear * removable that did not before). Also, since we just deleted the * current list cell, we'd have to have some kluge to continue the * list scan anyway. * / goto restart;} return joinlist;}

Reduce_unique_semijoins

Convert a semi-connection that can be simplified into an inner connection.

For example, the following SQL statement:

Select t1.*from t_grxx T1 where dwbh IN (select t2.dwbh from t_dwxx T2)

Since the dwbh of the subquery "select t2.dwbh from t_dwxx T2" is competitive, after the subquery is promoted, the dwbh of the t_grxx only corresponds to a unique record of the t_dwxx, so the semi-join can be converted into an inner join. The execution plan is as follows:

Testdb=# explain verbose select t1.*from t_grxx T1 where dwbh IN (select t2.dwbh from t_dwxx T2) QUERY PLAN-Hash Join (cost=1.07. .20.10 rows=6 width=176) Output: t1.dwbh T1.grbh, t1.xm, t1.xb, t1.nl Inner Unique: true Hash Cond: (t1.dwbh):: text = (t2.dwbh):: text)-> Seq Scan on public.t_grxx T1 (cost=0.00..14.00 rows=400 width=176) Output: t1.dwbh, t1.grbh, t1.xm, t1.xb T1.nl-> Hash (cost=1.03..1.03 rows=3 width=38) Output: t2.dwbh-> Seq Scan on public.t_dwxx T2 (cost=0.00..1.03 rows=3 width=38) Output: t2.dwbh (10 rows)

Follow-up analysis:

(gdb) n199 reduce_unique_semijoins (root); (gdb) stepreduce_unique_semijoins (root=0x1702968) at analyzejoins.c:520520 for (lc = list_head (root- > join_info_list); lc! = NULL; lc = next) (gdb) n522 SpecialJoinInfo * sjinfo = (SpecialJoinInfo *) lfirst (lc); (gdb)

View the SpecialJoinInfo memory structure:

Next = lnext (lc); (gdb) p * sjinfo$1 = {type = T_SpecialJoinInfo, min_lefthand = 0x1749818, min_righthand = 0x1749830, syn_lefthand = 0x1749570, syn_righthand = 0x17495d0, jointype = JOIN_SEMI, lhs_strict = true, delay_upper_joins = false, semi_can_btree = true, semi_can_hash = true, semi_operators = 0x17496c8, semi_rhs_exprs = 0x17497b8}

If the inner table (innerrel, that is, t_dwxx) supports uniqueness, you can consider converting semi-joins to inner joins

550 if (! rel_supports_distinctness (root, innerrel))... 575 root- > join_info_list = list_delete_ptr (root- > join_info_list, sjinfo);

The source code is as follows:

/ * * reduce_unique_semijoins * Check for semijoins that can be simplified to plain inner joins * because the inner relation is provably unique for the join clauses. * Ideally this would happen during reduce_outer_joins, but we don't have * enough information at that point. * * To perform the strength reduction when applicable, we need only delete * the semijoin's SpecialJoinInfo from root- > join_info_list. (We don't * bother fixing the join type attributed to it in the query jointree, * since that won't be consulted again.) * / void reduce_unique_semijoins (PlannerInfo * root) {ListCell * lc; ListCell * next; / * * Scan the join_info_list to find semijoins. We can't use foreach * because we may delete the current cell. * / for (lc = list_head (root- > join_info_list); lc! = NULL; lc = next) {SpecialJoinInfo * sjinfo = (SpecialJoinInfo *) lfirst (lc); / / Special connection information, previously generated int innerrelid; RelOptInfo * innerrel; Relids joinrelids; List * restrictlist; next = lnext (lc) through deconstruct function / * Must be a non-delaying semijoin to a single baserel, else we aren't * going to be able to do anything with it. (It's probably not * possible for delay_upper_joins to be set on a semijoin, but we * might as well check.) * / if (sjinfo- > jointype! = JOIN_SEMI | | sjinfo- > delay_upper_joins) continue; if (! bms_get_singleton_member (sjinfo- > min_righthand, & innerrelid)) continue Innerrel = find_base_rel (root, innerrelid); / * * Before we trouble to run generate_join_implied_equalities, make a * quick check to eliminate cases in which we will surely be unable to * prove uniqueness of the innerrel. * / if (! rel_supports_distinctness (root, innerrel) continue; / * Compute the relid set for the join we are considering * / joinrelids = bms_union (sjinfo- > min_lefthand, sjinfo- > min_righthand); / * Since we're only considering a single-rel RHS, any join clauses it * has must be clauses linking it to the semijoin's min_lefthand. We * can also consider EC-derived join clauses. * / restrictlist = list_concat (generate_join_implied_equalities (root, joinrelids, sjinfo- > min_lefthand, innerrel) Innerrel- > joininfo) / * Test whether the innerrel is unique for those clauses. * / if (! innerrel_is_unique (root, joinrelids, sjinfo- > min_lefthand, innerrel, JOIN_SEMI, restrictlist, true)) continue; / * OK, remove the SpecialJoinInfo from the list. * / root- > join_info_list = list_delete_ptr (root- > join_info_list, sjinfo); / / Delete special connection information}}

Add_placeholders_to_base_rels

Distribute PHV to base rels, the code is relatively simple

/ * * add_placeholders_to_base_rels * Add any required PlaceHolderVars to base rels' targetlists. * * If any placeholder can be computed at a base rel and is needed above it, * add it to that rel's targetlist. This might look like it could be merged * with fix_placeholder_input_needed_levels, but it must be separate because * join removal happens in between, and can change the ph_eval_at sets. There * is essentially the same logic in add_placeholders_to_joinrel, but we can't * do that part until joinrels are formed. * / void add_placeholders_to_base_rels (PlannerInfo * root) {ListCell * lc; foreach (lc, root- > placeholder_list) / / traversing the PH linked list {PlaceHolderInfo * phinfo = (PlaceHolderInfo *) lfirst (lc); Relids eval_at = phinfo- > ph_eval_at; int varno If (bms_get_singleton_member (eval_at, & varno) & & bms_nonempty_difference (phinfo- > ph_needed, eval_at)) / / add to the required RelOptInfo {RelOptInfo * rel = find_base_rel (root, varno) Rel- > reltarget- > exprs = lappend (rel- > reltarget- > exprs, copyObject (phinfo- > ph_var)); / * reltarget's cost and width fields will be updated later * /} at this point, the study of "what is the implementation logic of remove_useless_joins in PostgreSQL" is over, hoping to solve everyone's doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.