Analysis of simplified processing of Having and Group By clauses in PostgreSQL query Optimization

2025-04-04 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article analyzes how PostgreSQL's query optimizer simplifies the Having and Group By clauses. It first illustrates the behavior with examples, then walks through the relevant source code.

I. Basic concepts

Simplifying the Having clause

Constraints in the Having clause that can safely be evaluated earlier are promoted into the Where clause; the rest remain in Having. Because Having filtering runs after Group By, promoting a condition to Where performs the selection earlier and reduces the cost of grouping.

In the following statement, the condition dwbh = '1002' is promoted to Where:

testdb=# explain verbose select a.dwbh, a.xb, count(*)
testdb-# from t_grxx a
testdb-# group by a.dwbh, a.xb
testdb-# having count(*) >= 1 and dwbh = '1002';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 GroupAggregate  (cost=15.01..15.06 rows=1 width=84)
   Output: dwbh, xb, count(*)
   Group Key: a.dwbh, a.xb
   Filter: (count(*) >= 1)  -- count(*) >= 1 remains in Having
   ->  Sort  (cost=15.01..15.02 rows=2 width=76)
         Output: dwbh, xb
         Sort Key: a.dwbh, a.xb
         ->  Seq Scan on public.t_grxx a  (cost=0.00..15.00 rows=2 width=76)
               Output: dwbh, xb
               Filter: ((a.dwbh)::text = '1002'::text)  -- promoted to Where: tuples are filtered during the scan
(10 rows)

If both Group By and Grouping Sets are present, the promotion is not performed:

testdb=# explain verbose
testdb-# select a.dwbh, a.xb, count(*)
testdb-# from t_grxx a
testdb-# group by
testdb-# grouping sets ((a.dwbh), (a.xb), ())
testdb-# having count(*) >= 1 and dwbh = '1002'
testdb-# order by a.dwbh, a.xb;
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Sort  (cost=28.04..28.05 rows=3 width=84)
   Output: dwbh, xb, (count(*))
   Sort Key: a.dwbh, a.xb
   ->  MixedAggregate  (cost=0.00..28.02 rows=3 width=84)
         Output: dwbh, xb, count(*)
         Hash Key: a.dwbh
         Hash Key: a.xb
         Group Key: ()
         Filter: ((count(*) >= 1) AND ((a.dwbh)::text = '1002'::text))  -- filtered after scanning the data table
         ->  Seq Scan on public.t_grxx a  (cost=0.00..14.00 rows=400 width=76)
               Output: dwbh, grbh, xm, xb
(11 rows)

Simplifying the Group By clause

If the field list in Group By already contains all of a table's primary-key columns, the table's other columns can be removed from the Group By clause, since they are functionally dependent on the primary key. This improves the performance of the sorting or hashing done for Group By and avoids unnecessary overhead.

testdb=# explain verbose select a.dwbh, a.dwmc, count(*)
testdb-# from t_dwxx a
testdb-# group by a.dwbh, a.dwmc
testdb-# having count(*) >= 1;
                              QUERY PLAN
----------------------------------------------------------------------
 HashAggregate  (cost=13.20..15.20 rows=53 width=264)
   Output: dwbh, dwmc, count(*)
   Group Key: a.dwbh, a.dwmc  -- the grouping key is dwbh & dwmc
   Filter: (count(*) >= 1)
   ->  Seq Scan on public.t_dwxx a  (cost=0.00..11.60 rows=160 width=256)
         Output: dwmc, dwbh, dwdz
(6 rows)

testdb=# alter table t_dwxx add primary key(dwbh);  -- add the primary key
ALTER TABLE
testdb=# explain verbose select a.dwbh, a.dwmc, count(*)
testdb-# from t_dwxx a
testdb-# group by a.dwbh, a.dwmc
testdb-# having count(*) >= 1;
                              QUERY PLAN
----------------------------------------------------------------------
 HashAggregate  (cost=1.05..1.09 rows=1 width=264)
   Output: dwbh, dwmc, count(*)
   Group Key: a.dwbh  -- the grouping key retains only dwbh
   Filter: (count(*) >= 1)
   ->  Seq Scan on public.t_dwxx a  (cost=0.00..1.03 rows=3 width=256)
         Output: dwmc, dwbh, dwdz
(6 rows)

II. Source code interpretation

The relevant processing is implemented in the function subquery_planner, located in the file src/backend/optimizer/plan/planner.c. The code snippet is as follows:

	/*
	 * In some cases we may want to transfer a HAVING clause into WHERE. We
	 * cannot do so if the HAVING clause contains aggregates (obviously) or
	 * volatile functions (since a HAVING clause is supposed to be executed
	 * only once per group).  We also can't do this if there are any nonempty
	 * grouping sets; moving such a clause into WHERE would potentially change
	 * the results, if any referenced column isn't present in all the grouping
	 * sets.  (If there are only empty grouping sets, then the HAVING clause
	 * must be degenerate as discussed below.)
	 *
	 * Also, it may be that the clause is so expensive to execute that we're
	 * better off doing it only once per group, despite the loss of
	 * selectivity.  This is hard to estimate short of doing the entire
	 * planning process twice, so we use a heuristic: clauses containing
	 * subplans are left in HAVING.  Otherwise, we move or copy the HAVING
	 * clause into WHERE, in hopes of eliminating tuples before aggregation
	 * instead of after.
	 *
	 * If the query has explicit grouping then we can simply move such a
	 * clause into WHERE; any group that fails the clause will not be in the
	 * output because none of its tuples will reach the grouping or
	 * aggregation stage.  Otherwise we must have a degenerate (variable-free)
	 * HAVING clause, which we put in WHERE so that query_planner() can use it
	 * in a gating Result node, but also keep in HAVING to ensure that we
	 * don't emit a bogus aggregated row.  (This could be done better, but it
	 * seems not worth optimizing.)
	 *
	 * Note that both havingQual and parse->jointree->quals are in
	 * implicitly-ANDed-list form at this point, even though they are
	 * declared as Node *.
	 */
	newHaving = NIL;
	foreach(l, (List *) parse->havingQual)	/* walk the Having conditions */
	{
		Node	   *havingclause = (Node *) lfirst(l);	/* get the predicate */

		if ((parse->groupClause && parse->groupingSets) ||
			contain_agg_clause(havingclause) ||
			contain_volatile_functions(havingclause) ||
			contain_subplans(havingclause))
		{
			/* keep it in HAVING */
			/* Group By plus Grouping Sets, or an unsafe clause: keep it */
			newHaving = lappend(newHaving, havingclause);
		}
		else if (parse->groupClause && !parse->groupingSets)
		{
			/* move it to WHERE */
			/* plain Group By only: append the clause to the jointree quals */
			parse->jointree->quals = (Node *)
				lappend((List *) parse->jointree->quals, havingclause);
		}
		else
		{
			/* put a copy in WHERE, keep it in HAVING */
			/* neither Group By nor Grouping Sets: copy into the jointree quals */
			parse->jointree->quals = (Node *)
				lappend((List *) parse->jointree->quals,
						copyObject(havingclause));
			newHaving = lappend(newHaving, havingclause);
		}
	}
	parse->havingQual = (Node *) newHaving;	/* adjust the Having clause */

	/* Remove any redundant GROUP BY columns */
	remove_useless_groupby_columns(root);	/* remove useless Group By columns */

remove_useless_groupby_columns

/*
 * remove_useless_groupby_columns
 *		Remove any columns in the GROUP BY clause that are redundant due to
 *		being functionally dependent on other GROUP BY columns.
 *
 * Since some other DBMSes do not allow references to ungrouped columns, it's
 * not unusual to find all columns listed in GROUP BY even though listing the
 * primary-key columns would be sufficient.  Deleting such excess columns
 * avoids redundant sorting work, so it's worth doing.  When we do this, we
 * must mark the plan as dependent on the pkey constraint (compare the
 * parser's check_ungrouped_columns() and check_functional_grouping()).
 *
 * In principle, we could treat any NOT-NULL columns appearing in a UNIQUE
 * index as the determining columns.  But as with check_functional_grouping(),
 * there's currently no way to represent dependency on a NOT NULL constraint,
 * so we consider only the pkey for now.
 */
static void
remove_useless_groupby_columns(PlannerInfo *root)
{
	Query	   *parse = root->parse;	/* query tree */
	Bitmapset **groupbyattnos;			/* bitmapsets of GROUP BY column attnos */
	Bitmapset **surplusvars;			/* bitmapsets of removable attnos */
	ListCell   *lc;
	int			relid;

	/* No chance to do anything if there are less than two GROUP BY items */
	if (list_length(parse->groupClause) < 2)	/* only one item: nothing to do */
		return;

	/* Don't fiddle with the GROUP BY clause if the query has grouping sets */
	if (parse->groupingSets)	/* Grouping Sets present: do not process */
		return;

	/*
	 * Scan the GROUP BY clause to find GROUP BY items that are simple Vars.
	 * Fill groupbyattnos[k] with a bitmapset of the column attnos of RTE k
	 * that are GROUP BY items.
	 */
	groupbyattnos = (Bitmapset **) palloc0(sizeof(Bitmapset *) *
										   (list_length(parse->rtable) + 1));
	foreach(lc, parse->groupClause)
	{
		SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
		TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
		Var		   *var = (Var *) tle->expr;

		/*
		 * Ignore non-Vars and Vars from other query levels.
		 *
		 * XXX in principle, stable expressions containing Vars could also be
		 * removed, if all the Vars are functionally dependent on other GROUP
		 * BY items.  But it's not clear that such cases occur often enough
		 * to be worth troubling over.
		 */
		if (!IsA(var, Var) ||
			var->varlevelsup > 0)
			continue;

		/* OK, remember we have this Var */
		relid = var->varno;
		Assert(relid <= list_length(parse->rtable));
		groupbyattnos[relid] = bms_add_member(groupbyattnos[relid],
											  var->varattno - FirstLowInvalidHeapAttributeNumber);
	}

	/*
	 * Consider each relation and see if it is possible to remove some of its
	 * Vars from GROUP BY.  For simplicity and speed, we do the actual removal
	 * in a separate pass.  Here, we just fill surplusvars[k] with a bitmapset
	 * of the column attnos of RTE k that are removable GROUP BY items.
	 */
	surplusvars = NULL;			/* don't allocate array unless required */
	relid = 0;

	/* if a relation's grouping key already covers its primary key, drop the rest */
	foreach(lc, parse->rtable)
	{
		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);
		Bitmapset  *relattnos;
		Bitmapset  *pkattnos;
		Oid			constraintOid;

		relid++;

		/* Only plain relations could have primary-key constraints */
		if (rte->rtekind != RTE_RELATION)
			continue;

		/* Nothing to do unless this rel has multiple Vars in GROUP BY */
		relattnos = groupbyattnos[relid];
		if (bms_membership(relattnos) != BMS_MULTIPLE)
			continue;

		/*
		 * Can't remove any columns for this rel if there is no suitable
		 * (i.e., nondeferrable) primary key constraint.
		 */
		pkattnos = get_primary_key_attnos(rte->relid, false, &constraintOid);
		if (pkattnos == NULL)
			continue;

		/*
		 * If the primary key is a proper subset of relattnos then we have
		 * some items in the GROUP BY that can be removed.
		 */
		if (bms_subset_compare(pkattnos, relattnos) == BMS_SUBSET1)
		{
			/*
			 * To easily remember whether we've found anything to do, we
			 * don't allocate the surplusvars[] array until we find something.
			 */
			if (surplusvars == NULL)
				surplusvars = (Bitmapset **) palloc0(sizeof(Bitmapset *) *
													 (list_length(parse->rtable) + 1));

			/* Remember the attnos of the removable columns */
			surplusvars[relid] = bms_difference(relattnos, pkattnos);

			/* Also, mark the resulting plan as dependent on this constraint */
			parse->constraintDeps = lappend_oid(parse->constraintDeps,
												constraintOid);
		}
	}

	/*
	 * If we found any surplus Vars, build a new GROUP BY clause without them.
	 * (Note: this may leave some TLEs with unreferenced ressortgroupref
	 * markings, but that's harmless.)
	 */
	if (surplusvars != NULL)
	{
		List	   *new_groupby = NIL;

		foreach(lc, parse->groupClause)
		{
			SortGroupClause *sgc = lfirst_node(SortGroupClause, lc);
			TargetEntry *tle = get_sortgroupclause_tle(sgc, parse->targetList);
			Var		   *var = (Var *) tle->expr;

			/*
			 * New list must include non-Vars, outer Vars, and anything not
			 * marked as surplus.
			 */
			if (!IsA(var, Var) ||
				var->varlevelsup > 0 ||
				!bms_is_member(var->varattno - FirstLowInvalidHeapAttributeNumber,
							   surplusvars[var->varno]))
				new_groupby = lappend(new_groupby, sgc);
		}
		parse->groupClause = new_groupby;
	}
}

At this point, the study of the simplified processing of the Having and Group By clauses in PostgreSQL query optimization is complete. Pairing the theory above with hands-on practice is the best way to absorb it, so go and try it!
