My knowledge is limited; if there are any mistakes, please point them out. Please credit the source when reprinting.
Test script:
create table tgrploose (p_id int primary key auto_increment, s_id1 int, s_id2 int, s_id3 int, key(s_id1, s_id2, s_id3));
create table tgrpnloose (p_id int primary key auto_increment, s_id1 int, s_id2 int, s_id3 int, key(s_id1, s_id2, s_id3));
delimiter //
create procedure inloose1()
begin
declare i int;
set i = 0;
while i < 20000 do ...
(The body of the loop is garbled in the source. In outline, about 20000 rows are inserted into each table: tgrploose gets only a couple of dozen distinct (s_id1, s_id2, s_id3) combinations, while tgrpnloose gets roughly 10000 distinct values per column. A hedged reconstruction is sketched below.)
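The loop body of inloose1 is unreadable in the source, so the following is only a minimal sketch, under the assumptions just stated, of how data with that distribution could be generated (the procedure name inloose_sketch and the modulus expressions are mine, not the author's original code):
delimiter //
create procedure inloose_sketch()
begin
  declare i int;
  set i = 0;
  while i < 20000 do
    -- sparse table: only 4 * 3 * 2 = 24 distinct (s_id1, s_id2, s_id3) combinations
    insert into tgrploose (s_id1, s_id2, s_id3) values (i % 4, (i div 4) % 3, (i div 12) % 2);
    -- dense table: about 10000 distinct values per column
    insert into tgrpnloose (s_id1, s_id2, s_id3) values (i % 10000, i % 10000, i % 10000);
    set i = i + 1;
  end while;
end //
delimiter ;
call inloose_sketch();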
mysql> explain select s_id1, s_id2, s_id3, min(p_id) from tgrploose group by s_id1, s_id2, s_id3;
+----+-------------+-----------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| id | select_type | table     | partitions | type  | possible_keys | key   | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-----------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | tgrploose | NULL       | range | s_id1         | s_id1 | 15      | NULL |   25 |   100.00 | Using index for group-by |
+----+-------------+-----------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
Because there are only 24 distinct value combinations in this index, i.e. the values are quite sparse, the server can follow the index structure down to the first entry of each combination and read the smallest p_id there. It does not have to look at the remaining p_id values of that combination; it simply skips ahead to the next combination, and this skipping is the advantage of loose (sparse) index scan. If "sparse" has to be defined, it is the ratio of the number of distinct GROUP BY combinations to the total number of rows of the table: here the table has 20000 rows and 24 distinct combinations, so the ratio is 24/20000. The higher this ratio, the denser the data; the smaller it is, the sparser (which is reminiscent of sparse matrices).
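As a quick way to check the ratio just described (my own addition, assuming the two test tables built above):
select count(distinct s_id1, s_id2, s_id3) / count(*) as group_ratio from tgrploose;
select count(distinct s_id1, s_id2, s_id3) / count(*) as group_ratio from tgrpnloose;
-- under the assumed data the first is roughly 24/20000 = 0.0012 (sparse) and the second is far larger (dense)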
Secondly, the performance of loose (sparse) index scan has to be considered (this part is my own understanding; I have no references for it). Normally a GROUP BY of this kind is handled with a plain index scan (type=index), in which access is largely sequential and only the leaf nodes need to be read. A loose index scan, by contrast, has to re-position itself through the root and branch nodes many times, and the distance skipped each time may mean random access; those repeated descents through root and branch nodes also carry a cost. So whether a loose index scan or a tight index scan is used is a comprehensive cost decision, and the root of that decision is the sparseness ratio discussed above. Consider the tgrpnloose table created earlier: when I inserted the data I gave s_id1, s_id2 and s_id3 about 10000 distinct values each, so the data is very dense. The execution plans compare as follows:
mysql> explain select s_id1, s_id2, s_id3, min(p_id) from tgrpnloose group by s_id1, s_id2, s_id3;
+----+-------------+------------+------------+-------+---------------+-------+---------+------+-------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key   | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+-------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | tgrpnloose | NULL       | index | s_id1         | s_id1 | 15      | NULL | 19982 |   100.00 | Using index |
+----+-------------+------------+------------+-------+---------------+-------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> explain select s_id1, s_id2, s_id3, min(p_id) from tgrploose group by s_id1, s_id2, s_id3;
+----+-------------+-----------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| id | select_type | table     | partitions | type  | possible_keys | key   | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-----------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | tgrploose | NULL       | range | s_id1         | s_id1 | 15      | NULL |   25 |   100.00 | Using index for group-by |
+----+-------------+-----------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
You can see that for tgrpnloose MySQL chose the tight index scan (compact index scan, described later), indicated by Using index, whereas Using index for group-by means the GROUP BY is completed with a loose (sparse) index scan. The restrictions that prevent a loose index scan from being used for GROUP BY are as follows:
1. Performance considerations (see the tgrpnloose example above).
2. It only applies to a single table; a GROUP BY over a join cannot use it (see the tight index scan section for an example).
3. It cannot use a prefix index.
4. Only the MIN() and MAX() aggregate functions can be used; other aggregate functions such as COUNT() and SUM() are not supported.
For example:
mysql> explain select a.s_id1, a.s_id2, count(*) from tgrploose a where a.s_id1 > 30 group by a.s_id1, a.s_id2;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref  | rows | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
|  1 | SIMPLE      | a     | NULL       | range | s_id1         | s_id1 | 5       | NULL |    1 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
5. The GROUP BY columns must start from the leftmost column of the index; if they do not, loose index scan has to be abandoned.
For example:
mysql> explain select a.s_id2 from tgrploose a group by a.s_id3;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+-------+----------+----------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref  | rows  | filtered | Extra                                        |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+-------+----------+----------------------------------------------+
|  1 | SIMPLE      | a     | NULL       | index | s_id1         | s_id1 | 15      | NULL | 19982 |   100.00 | Using index; Using temporary; Using filesort |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+-------+----------+----------------------------------------------+
1 row in set, 1 warning (0.00 sec)
Notice that this query uses a full index scan together with Using temporary; Using filesort, i.e. a temporary table is used to hold the a.s_id2 and a.s_id3 values, which are then sorted. Because a temporary table and a sort are used, this is not a tight index scan either.
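For contrast (my own example, not in the original text): a query on the same table that stays within the restrictions above, i.e. leftmost-prefix grouping and a MIN() aggregate, should be eligible for the loose index scan and show Using index for group-by:
mysql> explain select s_id1, min(s_id2) from tgrploose group by s_id1;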
According to the official documentation, loose (sparse) index scan can be used for the following queries (given key(c1, c2, c3) on table t1(c1, c2, c3, c4)):
SELECT c1, c2 FROM t1 GROUP BY c1, c2;
SELECT DISTINCT c1, c2 FROM t1;
SELECT c1, MIN(c2) FROM t1 GROUP BY c1;
SELECT c1, c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT MAX(c3), MIN(c3), c1, c2 FROM t1 WHERE c2 > const GROUP BY c1, c2;
SELECT c2 FROM t1 WHERE c1 < const GROUP BY c1, c2;
SELECT c1, c2 FROM t1 WHERE c3 = const GROUP BY c1, c2;
SELECT COUNT(DISTINCT c1), SUM(DISTINCT c1) FROM t1;
SELECT COUNT(DISTINCT c1, c2), COUNT(DISTINCT c2, c1) FROM t1;
In fact, with a little thought about the index structure and the principle of loose scanning, it becomes clear why these queries can use a loose index scan.
2. Tight index scan (compact index scan)
The execution plan necessarily shows Using index, but does not involve Using temporary; Using filesort. It is described as grouping only the data of the driving table (note: the driving table), where the GROUP BY columns are given in index order, or, if a leading index column is missing from the GROUP BY, it is fixed by a column = constant condition.
For example, with the columns given in index order:
mysql> explain select b.s_id1, b.s_id2, max(a.s_id3) from tgrpnloose b STRAIGHT_JOIN tgrploose a on a.s_id1 = b.s_id1 group by b.s_id1, b.s_id2;
+----+-------------+-------+------------+-------+---------------+-------+---------+--------------+-------+----------+--------------------------+
| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref          | rows  | filtered | Extra                    |
+----+-------------+-------+------------+-------+---------------+-------+---------+--------------+-------+----------+--------------------------+
|  1 | SIMPLE      | b     | NULL       | index | s_id1         | s_id1 | 15      | NULL         | 19982 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | a     | NULL       | ref   | s_id1         | s_id1 | 5       | test.b.s_id1 |  9991 |   100.00 | Using index              |
+----+-------------+-------+------------+-------+---------------+-------+---------+--------------+-------+----------+--------------------------+
2 rows in set, 1 warning (0.01 sec)
A leading index column missing from the GROUP BY, but fixed with b.s_id1 = 10:
mysql> explain select b.s_id2, max(a.s_id3) from tgrpnloose b STRAIGHT_JOIN tgrploose a on a.s_id1 = b.s_id1 where b.s_id1 = 10 group by b.s_id2;
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra                    |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+--------------------------+
|  1 | SIMPLE      | b     | NULL       | ref  | s_id1         | s_id1 | 5       | const |    2 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | a     | NULL       | ref  | s_id1         | s_id1 | 5       | const |    1 |   100.00 | Using index              |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+--------------------------+
2 rows in set, 1 warning (0.02 sec)
We can clearly see that a loose index scan cannot be used here, because this is not a single-table query; yet there is no Using temporary; Using filesort in Extra, because a tight index scan (compact index scan) is used. Note that type=ref rather than const for b.s_id1 = 10, because the index is not unique.
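As a side note (my own illustration, not one of the author's test tables), const is chosen only when all parts of a PRIMARY KEY or UNIQUE index are compared to constants:
create table tuniq (id int primary key, u_col int not null, unique key (u_col));
insert into tuniq values (1, 10);
explain select * from tuniq where u_col = 10;
-- expected: type=const, because every part of the unique index is compared to a constant,
-- while the non-unique s_id1 index above can only give type=ref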
Take a look at the following example:
mysql> explain select b.s_id2, max(a.s_id3) from tgrpnloose b STRAIGHT_JOIN tgrploose a on a.s_id1 = b.s_id1 group by b.s_id2;
+----+-------------+-------+------------+-------+---------------+-------+---------+--------------+-------+----------+------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref          | rows  | filtered | Extra                                                      |
+----+-------------+-------+------------+-------+---------------+-------+---------+--------------+-------+----------+------------------------------------------------------------+
|  1 | SIMPLE      | b     | NULL       | index | s_id1         | s_id1 | 15      | NULL         | 19982 |   100.00 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | a     | NULL       | ref   | s_id1         | s_id1 | 5       | test.b.s_id1 |  9991 |   100.00 | Using index                                                |
+----+-------------+-------+------------+-------+---------------+-------+---------+--------------+-------+----------+------------------------------------------------------------+
2 rows in set, 1 warning (0.01 sec)
Obviously group by b.s_id2 does not follow the index column order, i.e. it does not satisfy the leftmost-prefix rule, and there is no constant condition such as s_id1 = 10, so the plan falls back to Using index; Using temporary; Using filesort, i.e. a temporary table and a filesort are used.
To be clear, the rule applies to the driving table. Note that if you do not add STRAIGHT_JOIN, MySQL gives the following execution plan:
mysql> explain select b.s_id1, b.s_id2, max(a.s_id3) from tgrpnloose b join tgrploose a on a.s_id1 = b.s_id1 group by b.s_id1, b.s_id2;
+----+-------------+-------+------------+-------+---------------+-------+---------+--------------+-------+----------+------------------------------------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref          | rows  | filtered | Extra                                                      |
+----+-------------+-------+------------+-------+---------------+-------+---------+--------------+-------+----------+------------------------------------------------------------+
|  1 | SIMPLE      | a     | NULL       | index | s_id1         | s_id1 | 15      | NULL         | 19982 |   100.00 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | b     | NULL       | ref   | s_id1         | s_id1 | 5       | test.a.s_id1 |     2 |   100.00 | Using index                                                |
+----+-------------+-------+------------+-------+---------------+-------+---------+--------------+-------+----------+------------------------------------------------------------+
2 rows in set, 1 warning (0.00 sec)
The difference from the first example is that a is now the driving table and b the driven table, while the GROUP BY columns still come from b, so the plan uses Using index; Using temporary; Using filesort.
In addition, tight index scan (compact index scan) also works for single-table queries, for example:
-- does not satisfy the leftmost-prefix rule, so loose index scan is abandoned
mysql> explain select b.s_id2 from tgrploose b where b.s_id1 = 10 group by b.s_id2;
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra                    |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+--------------------------+
|  1 | SIMPLE      | b     | NULL       | ref  | s_id1         | s_id1 | 5       | const |    1 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+--------------------------+
1 row in set, 1 warning (0.01 sec)
-- the aggregate function is count(), so loose index scan is abandoned
mysql> explain select b.s_id1, count(*) from tgrploose b group by b.s_id1;
+----+-------------+-------+------------+-------+---------------+-------+---------+------+-------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key   | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | b     | NULL       | index | s_id1         | s_id1 | 15      | NULL | 19982 |   100.00 | Using index |
+----+-------------+-------+------------+-------+---------------+-------+---------+------+-------+----------+-------------+
-- loose index scan abandoned for performance (cost) reasons
mysql> explain select s_id1, s_id2, s_id3, min(p_id) from tgrpnloose group by s_id1, s_id2, s_id3;
+----+-------------+------------+------------+-------+---------------+-------+---------+------+-------+----------+-------------+
| id | select_type | table      | partitions | type  | possible_keys | key   | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+------------+------------+-------+---------------+-------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | tgrpnloose | NULL       | index | s_id1         | s_id1 | 15      | NULL | 19982 |   100.00 | Using index |
+----+-------------+------------+------------+-------+---------------+-------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
In short, when a loose index scan cannot be used, a tight index scan is preferred, to avoid temporary tables and sort operations where possible.
3. Conventional scan: the execution plan involves Using temporary; Using filesort
Simply put, when there is no way to use an index to avoid sorting, GROUP BY has to be done this way: the GROUP BY columns are first written to a temporary table, which is then sorted and de-duplicated (aggregated).
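A small practical note of my own (not from the original text): in MySQL 5.7 and earlier, GROUP BY also sorts its result implicitly, so when the conventional path cannot be avoided and sorted output is not needed, ORDER BY NULL removes the filesort step while the temporary table remains:
explain select s_id3, count(*) from tgrploose group by s_id3 order by null;
-- compared with the same statement without ORDER BY NULL, Extra should no longer contain Using filesort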
An example has already been given above (the join without STRAIGHT_JOIN); here is another one:
-- does not satisfy the leftmost-prefix rule, so loose index scan is discarded; the missing column is not fixed by column = constant, so tight index scan is discarded too
mysql> explain select b.s_id3, count(*) from tgrploose b where b.s_id1 = 10 group by b.s_id3;
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+------------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key   | key_len | ref   | rows | filtered | Extra                                                      |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+------------------------------------------------------------+
|  1 | SIMPLE      | b     | NULL       | ref  | s_id1         | s_id1 | 5       | const |    1 |   100.00 | Using where; Using index; Using temporary; Using filesort |
+----+-------------+-------+------------+------+---------------+-------+---------+-------+------+----------+------------------------------------------------------------+
Change it to:
mysql> explain select b.s_id3, count(*) from tgrploose b where b.s_id1 = 10 and b.s_id2 = 10 group by b.s_id3;
+----+-------------+-------+------------+------+---------------+-------+---------+-------------+------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key   | key_len | ref         | rows | filtered | Extra                    |
+----+-------------+-------+------------+------+---------------+-------+---------+-------------+------+----------+--------------------------+
|  1 | SIMPLE      | b     | NULL       | ref  | s_id1         | s_id1 | 10      | const,const |    1 |   100.00 | Using where; Using index |
+----+-------------+-------+------------+------+---------------+-------+---------+-------------+------+----------+--------------------------+
and the tight index scan (compact index scan) can be used again, which is exactly the rule described earlier.
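A quick way to confirm which of the three strategies the optimizer chose, without reading a debug trace (my own suggestion; EXPLAIN FORMAT=JSON is available from MySQL 5.6 on), is the JSON plan, which states explicitly whether the grouping operation needs a temporary table or a filesort:
explain format=json select b.s_id3, count(*) from tgrploose b where b.s_id1 = 10 and b.s_id2 = 10 group by b.s_id3\G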
III. Index skip scan in Oracle
ORACLE has a similar feature with a wider scope: it is not limited to GROUP BY. It is generally used when the predicate does not cover the leading column of a composite index; the resulting access path is not optimal, but it is better than a full table scan. It is only suitable for cases where the leading column has few distinct values and the non-leading column is fairly selective, i.e. the leading column is sparse and the non-leading column is dense. Its performance problem likewise lies in the repeated positioning through root and branch nodes and the possible random reads, so no test case is reproduced here; only the general shape of such a query is sketched below.
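A minimal sketch of that shape (assumed table, column and index names, loosely following the classic gender example; the INDEX_SS hint only suggests a skip scan, it does not force one):
create index t_emp_skip_ix on t_emp (gender, emp_no);
select /*+ index_ss(t t_emp_skip_ix) */ *
  from t_emp t
 where emp_no = 100;  -- the leading column gender is not referenced; with few distinct genders and a selective emp_no, a skip scan may be chosen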
IV. Summary
You can see that the three GROUP BY methods of MySQL become less and less efficient (assuming the optimizer chooses correctly):
loose index scan > tight index scan > conventional scan
while their scope of application becomes wider and wider.
Regarding the cost calculation found in the trace, which is also the basis on which the optimizer switches between loose index scan and tight index scan: later I will follow the trace into the source code to look at how this decision is made.
Tight index scan:                        Loose index scan:
T@2: | opt: distinct_aggregate: 0        T@2: | opt: distinct_aggregate: 0
T@2: | opt: rows: 19983                  T@2: | opt: rows: 7
T@2: | opt: cost: 8040.2                 T@2: | opt: cost:
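For readers without a debug build, similar cost figures can be inspected through the optimizer trace (a suggestion of mine, not part of the original text; available since MySQL 5.6):
set optimizer_trace = 'enabled=on';
select s_id1, s_id2, s_id3, min(p_id) from tgrploose group by s_id1, s_id2, s_id3;
select trace from information_schema.optimizer_trace\G
set optimizer_trace = 'enabled=off';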
Finally, to make the above easier to understand, here is how an index over 12 rows of data is arranged, with key(s_id1, s_id2) and primary key(p_id). Entries are sorted by s_id1 first, then by s_id2 within the same s_id1, and then by the primary key p_id within the same (s_id1, s_id2):
s_id1 |  0 |  0 |  0 |  0 |  0 |  0 |  1 |  1 |  1 |  1 |  1 |  1 |
s_id2 |  0 |  0 |  1 |  1 |  2 |  2 |  0 |  0 |  1 |  1 |  2 |  2 |
p_id  |  5 | 12 |  1 |  6 |  3 |  9 |  2 |  8 |  4 | 10 |  0 | 11 |
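Reading this layout the way a loose index scan would (my own illustration, assuming a table t12 that holds exactly these 12 rows with key(s_id1, s_id2)), only the first entry of each (s_id1, s_id2) group has to be read to answer MIN(p_id), and the scan then skips straight to the next group:
select s_id1, s_id2, min(p_id)
  from t12
 group by s_id1, s_id2;
-- from the layout above: (0,0) -> 5, (0,1) -> 1, (0,2) -> 3, (1,0) -> 2, (1,1) -> 4, (1,2) -> 0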