Detailed explanation of the basic implementation principle of MySQL DISTINCT 04/17 Update SLTechnology News&Howtos

Detailed explanation of the basic implementation principle of MySQL DISTINCT

2025-04-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Preface

DISTINCT is actually very similar to the implementation of GROUP BY operations, except that only one record is fetched from each group after GROUP BY. Therefore, the implementation of DISTINCT is basically the same as that of GROUP BY, and there is not much difference. This can also be done through loose index scans or compact index scans, but of course, when DISTINCT cannot be done with indexes alone, MySQL can only be done through temporary tables.

However, unlike GROUP BY, DISTINCT does not require sorting. In other words, if the Query which is only a DISTINCT operation cannot use the index alone to complete the operation, MySQL will use the temporary table to "cache" the data once, but will not filesort the data in the temporary table.

Of course, if we also use GROUP BY and group when doing DISTINCT, and use aggregate function operations like MAX, we can't avoid filesort.

Let's demonstrate the implementation of DISTINCT through a few simple Query examples.

1. First, take a look at the operation of completing DISTINCT through a loose index scan:

Sky@localhost: example 11:03:41 > EXPLAIN SELECT DISTINCT group_id-> FROM group_messageG** 1. Row * * id: 1 SELECT_type: SIMPLE table: group_message type: rangepossible_keys: NULL key: idx_gid_uid_gc key_ Len: 4 ref: NULL rows: 10 Extra: Using index for group-by1 row in set (0.00 sec)

We can clearly see that the Extra message in the execution plan is "Using index for group-by". What does this mean? Why does the execution plan tell me that GROUP BY is done by index when I am not doing GROUP BY operations?

In fact, this is related to the implementation principle of DISTINCT, in the process of implementing DISTINCT, we also need to group, and then take one of each group of data and return it to the client. The Extra information here tells us that MySQL completes the whole operation using a loose index scan.

Of course, it would be better and easier for people to understand if MySQL Query Optimizer could humanize the information here and replace it with "Using index for distinct".

two。 Let's take a look at an example of a compact index scan:

Sky@localhost: example 11:03:53 > EXPLAIN SELECT DISTINCT user_id-> FROM group_message-> WHERE group_id = 2Graph * 1. Row * * id: 1 SELECT_type: SIMPLE table: group_message type: refpossible_keys: idx_gid_uid_ Gc key: idx_gid_uid_gc key_len: 4 ref: const rows: 4 Extra: Using WHERE Using index1 row in set (0.00 sec)

The display here is exactly the same as implementing GROUP BY through a compact index scan. In fact, during the implementation of this Query, MySQL will let the storage engine scan all the index keys of group_id = 2, get all the user_id, and then make use of the sorted feature of the index to retain a piece of information each time the index key value of user_id is changed, that is, the entire DISTINCT operation can be completed when scanning all the index keys of gruop_id = 2.

3. Let's take a look at what happens when you can't do DISTINCT with an index alone:

Sky@localhost: example 11:04:40 > EXPLAIN SELECT DISTINCT user_id-> FROM group_message-> WHERE group_id > 1 AND group_id

< 10G*************************** 1. row *************************** id: 1 SELECT_type: SIMPLE table: group_message type: rangepossible_keys: idx_gid_uid_gc key: idx_gid_uid_gc key_len: 4 ref: NULL rows: 32 Extra: Using WHERE; Using index; Using temporary1 row in set (0.00 sec) 当 MySQL 无法仅仅依赖索引即可完成 DISTINCT 操作的时候，就不得不使用临时表来进行相应的操作了。但是我们可以看到，在 MySQL 利用临时表来完成 DISTINCT 的时候，和处理 GROUP BY 有一点区别，就是少了 filesort。实际上，在 MySQL 的分组算法中，并不一定非要排序才能完成分组操作的，这一点在上面的 GROUP BY 优化小技巧中我已经提到过了。实际上这里 MySQL 正是在没有排序的情况下实现分组最后完成 DISTINCT 操作的，所以少了 filesort 这个排序操作。 4.最后再和 GROUP BY 结合试试看： sky@localhost : example 11:05:06>

EXPLAIN SELECT DISTINCT max (user_id)-> FROM group_message-> WHERE group_id > 1 AND group_id

< 10 ->

GROUP BY group_idG** 1. Row * * id: 1 SELECT_type: SIMPLE table: group_message type: rangepossible_keys: idx_gid_uid_gc key: idx_gid_uid_gc key_len: 4 ref: NULL rows: 32 Extra: Using WHERE; Using index Using temporary; Using filesort1 row in set (0.00 sec)

Finally, let's take a look at this example of using an aggregate function with GROUP BY. Compared to the third example above, we can see that there are more filesort sorting operations, precisely because we use the MAX function. To get the grouped MAX value, you cannot use the index to complete the operation, you can only sort it.

Since the implementation of DISTINCT is basically similar to that of GROUP BY, this article will no longer draw a picture to show the implementation process.

The above is the whole content of this article, I hope it will be helpful to your study, and I also hope that you will support it.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.