What is the difference between HashAggregate and GroupAggregate in PostgreSQL 04/10 Update SLTechnology News&Howtos

What is the difference between HashAggregate and GroupAggregate in PostgreSQL

2025-04-10 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "what is the difference between HashAggregate and GroupAggregate in PostgreSQL". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

Case one

First of all, let's look at a case study:

Test table:

Drop table if exists tours aggressiveness create table t_agg (bh varchar (20), C1 int,c2 int,c3 int,c4 int,c5 int,c6 int); insert into t_agg select 'GZ01',col,col,col,col,col,col from generate_series (1m 100000) as col;insert into t_agg select 'GZ02',col,col,col,col,col,col from generate_series (1m 100000) as col;insert into t_agg select' GZ03',col,col,col,col,col,col from generate_series (1m 100000) as col Insert into t_agg select 'GZ04',col,col,col,col,col,col from generate_series (1m 100000) as col;insert into t_agg select' GZ05',col,col,col,col,col,col from generate_series (1m 100000) as col

Execute the query:

Testdb=#-disable parallel testdb=# set max_parallel_workers_per_gather=0;SETtestdb=# explain verbose select bh,min (C1), max (C1), min (c2), max (c2), min (c3), max (c3), min (c4), max (c4), min (c5), max (c5) from t_agg group by bh QUERY PLAN -HashAggregate (cost=22427.00..22427.05 rows=5 width=45) Output: bh Min (C1), max (C1), min (c2), max (c2), min (c3), max (c3), min (c4), max (c4), min (c5), max (c5) Group Key: t_agg.bh-> Seq Scan on public.t_agg (cost=0.00..8677.00 rows=500000 width=25) Output: bh, C1, c2, c3, c4, c5, c6 (5 rows)

The optimizer of PG chooses HashAggregate.

HashAggregate is disabled below, and the optimizer can only choose GroupAggregate. You can see the total cost comparison between the two: 22427.05 vs 82968.97

Testdb=# set enable_hashagg = off;SETtestdb=# explain verbose select bh,min (C1), max (C1), min (c2), max (c2), min (c3), max (c3), min (c4), max (c4), min (c5), max (c5) from t_agg group by bh QUERY PLAN -GroupAggregate (cost=67968.92..82968.97 rows=5 width=45) Output: bh Min (C1), max (C1), min (c2), max (c2), min (c3), max (c3), min (c4), max (c4), min (c5), max (c5) Group Key: t_agg.bh-> Sort (cost=67968.92..69218.92 rows=500000 width=25) Output: bh, C1, c2, c3, c4 C5 Sort Key: t_agg.bh-> Seq Scan on public.t_agg (cost=0.00..8677.00 rows=500000 width=25) Output: bh, C1, c2, c3, c4, c5 (8 rows)

Case two

The following is tested with a wide table: there are few grouping keys, but many aggregate columns

Drop table if exists t_agg_width Create table t_agg_width (bh varchar (20), C1 int,c2 int,c3 int,c4 int,c5 int,c6 int,c7 int,c8 int,c9 int,c11 int,c12 int,c13 int,c14 int,c15 int,c16 int,c17 int,c18 int,c19 int,c21 int,c22 int,c23 int,c24 int,c25 int,c26 int,c27 int,c28 int,c29 int,c31 int,c32 int,c33 int,c34 int,c35 int,c36 int,c37 int,c38 int,c39 int) Insert into t_agg_width select 'GZ01',col,col,col,col,col,col,col,col,col, col,col,col,col,col,col,col,col,col from generate_series (1mem100000) as col Insert into t_agg_width select 'GZ02',col,col,col,col,col,col,col,col,col, col,col,col,col,col,col,col,col,col from generate_series (1mem100000) as col Insert into t_agg_width select 'GZ03',col,col,col,col,col,col,col,col,col, col,col,col,col,col,col,col,col,col from generate_series (1mem100000) as col Insert into t_agg_width select 'GZ04',col,col,col,col,col,col,col,col,col, col,col,col,col,col,col,col,col,col from generate_series (1m 100000) as col;-- disable hashaggset enable_hashagg = off;-- disable parallel set max_parallel_workers_per_gather=0 Select bh,min (c1), min (c2), min (c3), min (c4), min (c5), min (c6), min (c7), min (c8), min (c9), min (c11), min (c12), min (c13), min (c14), min (c15), min (c16), min (c17), min (c18), min (c19), min (c21), min (c22), min (c23), min (c24) Min (c25), min (c26), min (c27), min (c28), min (c29), min (c31), min (c32), min (c33), min (c34), min (c35), min (c36), min (c37), min (c38), min (c39) from t_agg_width group by bh

In this case, the optimizer will still choose Hash

Testdb=# explain verbose select bhtestdb-#, min (c1), min (c2), min (c3), min (c4), min (c5), min (c6), min (c7), min (c8), min (c9) testdb-#, min (c11), min (c12), min (c13), min (c14), min (c15), min (c16), min (c17), min (c18), min (c19) testdb-#, min (c21) Min (c22), min (c23), min (c24), min (c25), min (c26), min (c27), min (c28), min (c29) testdb-#, min (c31), min (c32), min (c33), min (c34), min (c35), min (c36), min (c37), min (c38), min (C39) testdb-# from t_agg_width group by bh QUERY PLAN -HashAggregate (cost=49889.00..49889.04 rows=4 width=149) Output: bh Min (c1), min (c2), min (c3), min (c4), min (c5), min (c6), min (c7), min (c8), min (c9), min (c11), min (c12), min (c13), min (c14), min (c15), min (c16), min (c17), min (c18), min (c19), min (c21), min (c22), min (c23), min (c24), min (c25), min (c26), min (c27), min (c28) Min (c29), min (c31), min (c32), min (c33), min (c34), min (c35), min (C36), min (c37), min (c38), min (C39) Group Key: t_agg_width.bh-> Seq Scan on public.t_agg_width (cost=0.00..12889.00 rows=400000 width=149) Output: bh, c1, c2, c3, c4, c5, c6, c7, c8, c9, c11, c12, c13, c14, c15, c16, C17, C18, c19, c21 C22, c23, c24, c25, c26, c27, c28, c29, c31, c32, c33, c34, c35, c36, c37, c38, c39 (5 rows) testdb=# set enable_hashagg = off SETtestdb=# explain verbose select bh,min (c1), min (c2), min (c3), min (c4), min (c5), min (c6), min (c7), min (c8), min (c9), min (c11), min (c12), min (c13), min (c14), min (c15), min (c16), min (c17), min (c18), min (c19), min (c21), min (c22), min (c23) Min (c24), min (c25), min (c26), min (c27), min (c28), min (c29), min (c31), min (c32), min (c33), min (c34), min (c35), min (c36), min (c37), min (c38), min (C39) from t_agg_width group by bh QUERY PLAN -GroupAggregate (cost=110266.28..148266.32 rows=4 width=149) Output: bh Min (c1), min (c2), min (c3), min (c4), min (c5), min (c6), min (c7), min (c8), min (c9), min (c11), min (c12), min (c13), min (c14), min (c15), min (c16), min (c17), min (c18), min (c19), min (c21), min (c22), min (c23), min (c24), min (c25), min (c26), min (c27), min (c28) Min (c29), min (c31), min (c32), min (c33), min (c34), min (c35), min (c36), min (c37), min (c38), min (C39) Group Key: t_agg_width.bh-> Sort (cost=110266.28..111266.28 rows=400000 width=149) Output: bh, c1, c2, c3, c4, c5, c6, c7, c8, c9, c11, c12, c13, c14, c15, c16, c17, C18, c19, c21, c22, c23, c24, c25 C26, c27, c28, c29, c31, c32, c33, c34, c35, c36, c37, c38, c39 Sort Key: t_agg_width.bh-> Seq Scan on public.t_agg_width (cost=0.00..12889.00 rows=400000 width=149) Output: bh, C1, c2, c3, c4, c5, c6, c7, c8, c9, c11, c12, c13, c14, c15, c16, c17, c18, c19, c21, c22, c23, c24, c25, c26, c27, c28 C29, c31, c32, c33, c34, c35, c36, c37, c38, c39 (8 rows) testdb=#

The following is to increase the distribution of grouping keys and improve the selection rate of columns such as C1, and test again:

Testdb=# insert into t_agg_width testdb-# select 'GZ' | | coltestdb-#, mod (col,100), mod (col,100) testdb-#, mod (col,100), mod (col,100) Mod (col,100), mod (col,100), mod (col,100) testdb-#, mod (col,100), mod (col,100) testdb-#, mod (col,100), mod (col,100) 100), mod (col,100), mod (col,100), mod (col,100) testdb-# from generate_series (1m 1000000) as col INSERT 0 1000000testdb=# set enable_hashagg = on SETtestdb=# explain verbose select bh,min (c1), min (c2), min (c3), min (c4), min (c5), min (c6), min (c7), min (c8), min (c9), min (c11), min (c12), min (c13), min (c14), min (c15), min (c16), min (c17), min (c18), min (c19), min (c21), min (c22), min (c23) Min (c24), min (c25), min (c26), min (c27), min (c28), min (c29), min (c31), min (c32), min (c33), min (c34), min (c35), min (c36), min (c37), min (c38), min (C39) from t_agg_width group by bh QUERY PLAN -GroupAggregate (cost=440012.46..586553.52 rows=7414 width=149) Output: bh Min (c1), min (c2), min (c3), min (c4), min (c5), min (c6), min (c7), min (c8), min (c9), min (c11), min (c12), min (c13), min (c14), min (c15), min (c16), min (c17), min (c18), min (c19), min (c21), min (c22), min (c23), min (c24), min (c25), min (c26), min (c27), min (c28) Min (c29), min (c31), min (c32), min (c33), min (c34), min (c35), min (c36), min (c37), min (c38), min (C39) Group Key: t_agg_width.bh-> Sort (cost=440012.46..443866.86 rows=1541757 width=149) Output: bh, c1, c2, c3, c4, c5, c6, c7, c8, c9, c11, c12, c13, c14, c15, c16, c17, C18, c19, c21, c22, c23, c24, c25 C26, c27, c28, c29, c31, c32, c33, c34, c35, c36, c37, c38, c39 Sort Key: t_agg_width.bh-> Seq Scan on public.t_agg_width (cost=0.00..49681.57 rows=1541757 width=149) Output: bh, C1, c2, c3, c4, c5, c6, c7, c8, c9, c11, c12, c13, c14, c15, c16, c17, c18, c19, c21, c22, c23, c24, c25, c26, c27, c28 C29, c31, c32, c33, c34, c35, c36, c37, c38, c39 (8 rows) testdb=#

This time the choice is GroupAggregate.

HashAggregate

HashAggregate, the database will calculate the hash value according to the value after the group by field, and maintain the corresponding Hash table in memory. For example, if select has n aggregate functions, then n Hash tables will be maintained in memory. This method uses more memory than GroupAggregate, which is proportional to the unique key value of COLUMN in group by COLUMN and the number of aggregate columns.

GroupAggregate

GroupAggregate, the database first sorts the data in the table according to the fields of group by, then scans the sorted data, and calculates the aggregate result. This method needs to perform a sort first, and the computational complexity is higher than that of HashAggregate, but the advantage of this method is that it has nothing to do with the number of unique keys / aggregate columns of COLUMN in group by COLUMN. In the case of a large number of grouped keys, a large number of aggregation columns and a high choice of column data, it will be better than HashAggregate.

This is the end of the content of "what is the difference between HashAggregate and GroupAggregate in PostgreSQL". Thank you for your reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.