Analysis of unstable ORDER BY ... LIMIT results in MySQL


2025-07-01 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)05/31 Report--

This article analyzes why ORDER BY ... LIMIT can return unstable results in MySQL. Many people run into this question in day-to-day work, so the sections below walk through an example and the relevant server code to explain it.

I. The problem

The data are as follows:

CREATE TABLE `testse` (
  `id` int(11) NOT NULL,
  `nu` int(11) DEFAULT NULL,
  `name` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `nu` (`nu`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `testse` VALUES
  (-1,14,'gaopeng'),(0,14,'gaopeng'),(1,1,'gaopeng'),(2,99,'gaopeng'),
  (3,55,'gaopeng'),(4,20,'gaopeng'),(5,24,'gaopeng'),(6,14,'gaopeng'),
  (7,14,'gaopeng'),(8,13,'gaopeng'),(9,9,'gaopeng'),(10,19,'gaopeng'),
  (11,20,'gaopeng'),(12,14,'gaopeng'),(13,15,'gaopeng'),(14,16,'gaopeng'),
  (15,20,'gaopeng'),(100,14,'gaopeng'),(111,14,'gaopeng');

The queries in question are as follows:

mysql> select * from testse order by nu limit 3,1;
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
| ... |   14 | gaopeng |
+-----+------+---------+
1 row in set (2.76 sec)

mysql> select * from testse force index (nu) order by nu limit 3,1;
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|  -1 |   14 | gaopeng |
+-----+------+---------+
1 row in set (0.00 sec)

Why do the two statements return different rows?

II. Cause of the problem

The reason first. When MySQL sorts, it can read rows through an index and avoid sorting altogether, so no filesort is needed. When it does use filesort, the in-memory sort takes one of two forms: heap sort or quick sort. So the ways MySQL can produce sorted rows in memory are:

Directly use the index to avoid sorting

Quick sort

Heap sort

Which method is used is decided by the optimizer, in general as follows:

Directly using the index to avoid sorting: when a suitable index exists and looking the rows up in the table through it is efficient

Quick sort: when no index can be used and a large number of rows must be sorted

Heap sort: when no index can be used and only a small number of rows must be sorted

Quick sort and heap sort are unstable sorting algorithms: the relative order of rows with equal sort values is not guaranteed. Reading directly through the index returns rows in a stable order, because the order of the entries in the index's B+ tree leaf nodes is unique and fixed. For the key nu above, each leaf entry contains nu plus id, a combination that is unique and stored in ascending order. The queries above return different results because avoiding the sort through the index and heap sort treat duplicate values differently.
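The effect on duplicate values can be reproduced outside MySQL. The following is a minimal Python sketch, not MySQL's code: `heap_sort` and `sift_down` are hypothetical helper names, the rows are a cut-down version of the test data, and the index is modelled simply as an ordering by (nu, id).

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int]  # (id, nu), a cut-down version of the testse rows


def sift_down(a: List[Row], i: int, n: int, key: Callable[[Row], int]) -> None:
    """Restore the max-heap property for the subtree rooted at index i."""
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and key(a[left]) > key(a[largest]):
            largest = left
        if right < n and key(a[right]) > key(a[largest]):
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def heap_sort(rows: List[Row], key: Callable[[Row], int]) -> List[Row]:
    """Classic heap sort on a copy of rows; it only ever compares key(row)."""
    a = list(rows)
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):   # build the max-heap
        sift_down(a, i, n, key)
    for end in range(n - 1, 0, -1):       # move the current maximum to the end
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end, key)
    return a


rows = [(-1, 14), (0, 14), (6, 14), (1, 1)]

# Index order: the secondary index on nu also stores the primary key,
# so its entries are ordered by (nu, id).
index_order = sorted(rows, key=lambda r: (r[1], r[0]))

# Heap sort compares nu only, so the rows with nu = 14 end up in whatever
# order the swaps leave them.
heap_order = heap_sort(rows, key=lambda r: r[1])

print([r[0] for r in index_order])  # [1, -1, 0, 6]
print([r[0] for r in heap_order])   # [1, 6, 0, -1]
```

Both lists are correctly ordered by nu; only the ids within the run of equal nu values differ, which is exactly what a LIMIT then cuts at different rows.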

You may ask why there are two sorting algorithms. Quick sort has the advantage when many rows must be sorted, while heap sort, implemented with a priority queue, has the advantage when only a few sorted rows are needed: it does not sort the whole input, only the rows the query asks for.
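The idea of sorting only the rows that are needed can be sketched as a bounded priority queue. This is an illustrative Python sketch built on the standard heapq module, not MySQL's Bounded_queue; the function name `top_n` is hypothetical, and it assumes numeric sort keys so that they can be negated.

```python
import heapq
from itertools import count
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, int]  # (id, nu)


def top_n(rows: Iterable[Row], n: int, key: Callable[[Row], int]) -> List[Row]:
    """Return the n rows with the smallest key, reading every row once."""
    if n <= 0:
        return []
    heap: list = []   # max-heap on key, emulated by storing the negated key
    seq = count()     # arrival counter, so rows themselves are never compared
    for row in rows:
        entry = (-key(row), next(seq), row)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            # The new key is strictly smaller than the worst key kept so far.
            heapq.heapreplace(heap, entry)
    # Only the n surviving rows are sorted at the end.
    return [row for _, _, row in sorted(heap, key=lambda e: -e[0])]


testse = [(-1, 14), (0, 14), (1, 1), (2, 99), (3, 55), (4, 20), (5, 24),
          (6, 14), (7, 14), (8, 13), (9, 9), (10, 19), (11, 20), (12, 14),
          (13, 15), (14, 16), (15, 20), (100, 14), (111, 14)]

first8 = top_n(testse, 8, key=lambda r: r[1])
print([nu for _, nu in first8])  # [1, 9, 13, 14, 14, 14, 14, 14]
```

The queue never holds more than n rows, so the cost depends mainly on n rather than on the table size. Which of the seven rows with nu = 14 survive is an accident of the heap's internal layout.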

MySQL assumes that quick sort is about three times faster than the priority-queue heap sort, as this source comment shows:

/*
  How much Priority Queue sort is slower than qsort.
  Measurements (see unit test) indicate that PQ is roughly 3 times slower.
*/
const double PQ_slowness= 3.0;
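To make the role of such a constant concrete, here is a deliberately simplified Python sketch of a cost rule. It is an assumption for illustration only: the function name `prefer_priority_queue` and the single comparison are hypothetical, and the real check_if_pq_applicable also weighs other conditions such as the memory available for the sort.

```python
PQ_SLOWNESS = 3.0  # mirrors the PQ_slowness constant quoted above


def prefer_priority_queue(limit_rows: int, estimated_rows: int) -> bool:
    """Hypothetical, simplified rule: the priority queue pays off only when
    the requested row count is small relative to the estimated input size."""
    return limit_rows < estimated_rows / PQ_SLOWNESS


print(prefer_priority_queue(8, 1000))    # True
print(prefer_priority_queue(500, 1000))  # False
```

Under a rule of this shape the row count is an estimate rather than an exact figure, so the switch point for a given table cannot be read off the LIMIT value alone.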

When sorting, MySQL switches between the two algorithms according to the amount of data to be sorted. The decision is made in the function check_if_pq_applicable. The filesort function contains the following code:

if (check_if_pq_applicable(trace, &param, &table_sort,
                           table, num_rows, memory_available,
                           subselect != NULL))
{
  DBUG_PRINT("info", ("filesort PQ is applicable")); // use heap sort
  /*
    For PQ queries (with limit) we know exactly how many pointers/records
    we have in the buffer, so to simplify things, we initialize
    all pointers here. (We cannot pack fields anyways, so there is no
    point in doing lazy initialization).
  */
  table_sort.init_record_pointers();
  filesort->using_pq= true;
  param.using_pq= true;
}
else // use quick sort
{
  DBUG_PRINT("info", ("filesort PQ is not applicable"));
  filesort->using_pq= false;
  param.using_pq= false;
  ...

III. How to determine which method produced the sorted results

When the index is used to avoid sorting, the execution plan shows it: "Using filesort" does not appear. Telling quick sort from heap sort is harder, because the execution plan is identical in both cases. The only approach I could think of was to set breakpoints in a debugger; I don't know whether there is another way. So I set two breakpoints in the if branches of the code above, one on the heap sort path and one on the quick sort path:

3  breakpoint  keep y  0x0000000000f50e62 in filesort(THD*, Filesort*, bool, ha_rows*, ha_rows*, ha_rows*) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/filesort.cc:359
   breakpoint already hit 3 times
4  breakpoint  keep y  0x0000000000f50d41 in filesort(THD*, Filesort*, bool, ha_rows*, ha_rows*, ha_rows*) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/filesort.cc:333
   breakpoint already hit 1 time

Breakpoint 3 represents a quick sort hit and breakpoint 4 represents a heap sort hit.

IV. Additional testing

As described above, the result can be produced in three ways. The tests below show that the three ways return different data.

Results when the index is used to avoid sorting

Statement:

select * from testse force index (nu) order by nu;

mysql> desc select * from testse force index (nu) order by nu;
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+-------+
| id | select_type | table  | partitions | type  | possible_keys | key  | key_len | ref  | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+-------+
|  1 | SIMPLE      | testse | NULL       | index | NULL          | nu   | 5       | NULL |   19 |   100.00 | NULL  |
+----+-------------+--------+------------+-------+---------------+------+---------+------+------+----------+-------+
1 row in set, 1 warning (0.00 sec)

mysql> select * from testse force index (nu) order by nu;
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|   1 |    1 | gaopeng |
|   9 |    9 | gaopeng |
|   8 |   13 | gaopeng |
|  -1 |   14 | gaopeng |
|   0 |   14 | gaopeng |
|   6 |   14 | gaopeng |
|   7 |   14 | gaopeng |
|  12 |   14 | gaopeng |
| 100 |   14 | gaopeng |
| 111 |   14 | gaopeng |
|  13 |   15 | gaopeng |
|  14 |   16 | gaopeng |
|  10 |   19 | gaopeng |
|   4 |   20 | gaopeng |
|  11 |   20 | gaopeng |
|  15 |   20 | gaopeng |
|   5 |   24 | gaopeng |
|   3 |   55 | gaopeng |
|   2 |   99 | gaopeng |
+-----+------+---------+
19 rows in set

Results of quick sort

Statement:

select * from testse order by nu;

Because I set a breakpoint earlier, the breakpoint hit is as follows:

Breakpoint 3, filesort (thd=0x7fff300128c0, filesort=0x7fff30963e90, sort_positions=false, examined_rows=0x7ffff01158a0, found_rows=0x7ffff0115898, returned_rows=0x7ffff0115890) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/filesort.cc:359
359         DBUG_PRINT("info", ("filesort PQ is not applicable"));

You can see that PQ is not enabled, that is, the priority-queue heap sort is not used and quick sort is used instead. The result is:

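mysql> desc select * from testse order by nu;
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------+
|  1 | SIMPLE      | testse | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   19 |   100.00 | Using filesort |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

mysql> select * from testse order by nu;
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|   1 |    1 | gaopeng |
|   9 |    9 | gaopeng |
|   8 |   13 | gaopeng |
| 111 |   14 | gaopeng |
| 100 |   14 | gaopeng |
|  12 |   14 | gaopeng |
|   7 |   14 | gaopeng |
|   6 |   14 | gaopeng |
|   0 |   14 | gaopeng |
|  -1 |   14 | gaopeng |
|  13 |   15 | gaopeng |
|  14 |   16 | gaopeng |
|  10 |   19 | gaopeng |
|   4 |   20 | gaopeng |
|  11 |   20 | gaopeng |
|  15 |   20 | gaopeng |
|   5 |   24 | gaopeng |
|   3 |   55 | gaopeng |
|   2 |   99 | gaopeng |
+-----+------+---------+
19 rows in set (1.74 sec)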

Results of heap sort

Statement:

select * from testse order by nu limit 8;

The breakpoint hit:

Breakpoint 4, filesort (thd=0x7fff300128c0, filesort=0x7fff3095ecc8, sort_positions=false, examined_rows=0x7ffff01158a0, found_rows=0x7ffff0115898, returned_rows=0x7ffff0115890) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/filesort.cc:333
333         DBUG_PRINT("info", ("filesort PQ is applicable"));

You can see that PQ is turned on, that is, heap sorting.

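mysql> desc select * from testse order by nu limit 8;
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra          |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------+
|  1 | SIMPLE      | testse | NULL       | ALL  | NULL          | NULL | NULL    | NULL |   19 |   100.00 | Using filesort |
+----+-------------+--------+------------+------+---------------+------+---------+------+------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

mysql> select * from testse order by nu limit 8;
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|   1 |    1 | gaopeng |
|   9 |    9 | gaopeng |
|   8 |   13 | gaopeng |
|  -1 |   14 | gaopeng |
|   0 |   14 | gaopeng |
|  12 |   14 | gaopeng |
| ... |   14 | gaopeng |
|   6 |   14 | gaopeng |
+-----+------+---------+
8 rows in set (2.20 sec)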

Comparing the first 8 rows, each method returns different data:

Using the index to avoid sorting:
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|   1 |    1 | gaopeng |
|   9 |    9 | gaopeng |
|   8 |   13 | gaopeng |
|  -1 |   14 | gaopeng |
|   0 |   14 | gaopeng |
|   6 |   14 | gaopeng |
|   7 |   14 | gaopeng |
|  12 |   14 | gaopeng |
+-----+------+---------+

Using quick sort:
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|   1 |    1 | gaopeng |
|   9 |    9 | gaopeng |
|   8 |   13 | gaopeng |
| 111 |   14 | gaopeng |
| 100 |   14 | gaopeng |
|  12 |   14 | gaopeng |
|   7 |   14 | gaopeng |
|   6 |   14 | gaopeng |
+-----+------+---------+

Using heap sort:
+-----+------+---------+
| id  | nu   | name    |
+-----+------+---------+
|   1 |    1 | gaopeng |
|   9 |    9 | gaopeng |
|   8 |   13 | gaopeng |
|  -1 |   14 | gaopeng |
|   0 |   14 | gaopeng |
|  12 |   14 | gaopeng |
| ... |   14 | gaopeng |
|   6 |   14 | gaopeng |
+-----+------+---------+

V. Summary

You can see that the different ways of producing the result return different data, but only the order of rows with duplicate sort values differs, which follows from the instability of the sorting algorithms. Quick sort suits sorting a large amount of data and heap sort has the advantage for a small number of rows, so once n in ORDER BY ... LIMIT n crosses a certain threshold the sorting algorithm switches, and this cannot be seen in the execution plan. Which algorithm is used is decided by the optimizer through the function check_if_pq_applicable.

This is only one of the situations that make sorted results inconsistent. Another is the choice between single-pass and two-pass sorting, which is influenced by the parameter max_length_for_sort_data; I will study that code when I have the chance. Generally speaking, the default of 1024 bytes is rarely exceeded.

VI. A brief explanation of the heap sort algorithm

Finally, a brief review of the heap sort algorithm; for details, refer to a book such as Introduction to Algorithms. Take the max-heap as the example. Any array to be sorted can be regarded as a complete binary tree. A screenshot from Introduction to Algorithms:

[Figure: max-heap example, screenshot from Introduction to Algorithms]

This tree satisfies the definition of a max-heap. A max-heap has the following properties:

It must be a complete binary tree

The positions of the two child nodes are easy to compute from the position of the parent node

With positions counted from 1, if the left child is at position i, the parent is at i/2 and the right child is at i+1; in other words, a parent at position p has its children at 2p and 2p+1. This follows from the properties of a complete binary tree.

Every subtree can itself be regarded as a heap, so every node satisfies

parent node >= left child node && parent node >= right child node

The largest element is therefore easy to find: it is the root node of the whole heap.
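The position arithmetic and the heap condition can be written down directly. A small Python sketch follows; because Python lists are indexed from 0, the formulas are shifted by one relative to the positions counted from 1 in the text, and the helper names are hypothetical.

```python
from typing import List


def parent(i: int) -> int:
    """0-based index of the parent of node i (for i >= 1)."""
    return (i - 1) // 2


def left(i: int) -> int:
    """0-based index of the left child of node i."""
    return 2 * i + 1


def right(i: int) -> int:
    """0-based index of the right child of node i."""
    return 2 * i + 2


def is_max_heap(a: List[int]) -> bool:
    """True if every node is <= its parent; equal values are allowed."""
    return all(a[parent(i)] >= a[i] for i in range(1, len(a)))


print(left(1), right(1), parent(4))                    # 3 4 1
print(is_max_heap([16, 14, 10, 8, 7, 9, 3, 2, 4, 1]))  # True
print(is_max_heap([16, 4, 10, 14, 7, 9, 3, 2, 8, 1]))  # False
```

Note that the check uses >= rather than >, since duplicate values such as the repeated nu = 14 rows are perfectly legal in a heap.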

In this algorithm, the most important parts are maintaining and building the heap.

Maintenance:

Maintenance works from the top down and can be implemented recursively.

The e-book scan is a little unclear here; the black node has the value 4.

[Figure: maintaining the heap, screenshot from Introduction to Algorithms]

This corresponds to the bigheapad function in my code at the end.
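A recursive version of the maintenance step might look like the following Python sketch. It is not the author's bigheapad function; `max_heapify` is a hypothetical name, and the input is a textbook-style array in which only the node holding 4 violates the max-heap condition.

```python
from typing import List


def max_heapify(a: List[int], i: int, n: int) -> None:
    """Top-down maintenance: let a[i] sink until its subtree is a max-heap.

    Assumes the subtrees rooted at both children of i are already max-heaps.
    """
    left, right, largest = 2 * i + 1, 2 * i + 2, i
    if left < n and a[left] > a[largest]:
        largest = left
    if right < n and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, n)  # recurse into the subtree that changed


heap = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]  # only the node holding 4 is out of place
max_heapify(heap, 1, len(heap))
print(heap)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

The value 4 is swapped with 14 and then with 8, moving down one level per call, so the work is bounded by the height of the tree.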

Construction:

The build is bottom-up: for any unordered array, run the maintenance step on each parent node in turn, starting from the last parent and working back to the root, until the max-heap condition holds. Because the subtrees below a node already satisfy the max-heap condition when that node is processed, maintaining the node makes its whole subtree a max-heap.

[Figure: building the heap, screenshot from Introduction to Algorithms]

This corresponds to the bigheapbulid function in my code at the end.
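The bottom-up build can be sketched in a few lines of Python. Again this is not the author's bigheapbulid function; `build_max_heap` and `sift_down` are hypothetical names, and the maintenance step is written as a loop here instead of recursively.

```python
from typing import List


def sift_down(a: List[int], i: int, n: int) -> None:
    """Iterative maintenance step for the subtree rooted at index i."""
    while True:
        left, right, largest = 2 * i + 1, 2 * i + 2, i
        if left < n and a[left] > a[largest]:
            largest = left
        if right < n and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a: List[int]) -> None:
    """Bottom-up build: maintain every parent, from the last one to the root."""
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):  # indexes >= n // 2 are leaves
        sift_down(a, i, n)


data = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_max_heap(data)
print(data)  # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

Starting at the last parent guarantees that both subtrees of every node are already heaps when that node is processed, which is the precondition of the maintenance step.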

Sort

Sorting swaps the first element of the array, which is the largest, with the last element, shrinks the heap by one, and runs maintenance again to restore the max-heap. Repeating this until the heap is empty leaves the whole array sorted.

[Figure: heap sort steps, screenshot from Introduction to Algorithms]

In my code, see the biglimitn and bigheapsort functions. In the MySQL source, the heap sort code lives in the priority_queue.h file, where you can find methods such as:

Maintenance: the heapify function

Build: the build_heap function

Sort: the sort function

The implementation in the source is of course much more complex; interested readers can dig deeper. A stack trace for reference:

#0  Priority_queue::heapify(size_t, size_t) (this=0x7ffff0115650, i=0, last=8) at /root/mysql5.7.14/percona-server-5.7.14-7/include/priority_queue.h:124
#1  0x0000000000f5807a in Priority_queue::heapify(size_t) (this=0x7ffff0115650, i=0) at /root/mysql5.7.14/percona-server-5.7.14-7/include/priority_queue.h:147
#2  0x0000000000f57d1e in Priority_queue::update_top(void) (this=0x7ffff0115650) at /root/mysql5.7.14/percona-server-5.7.14-7/include/priority_queue.h:354
#3  0x0000000000f57814 in Bounded_queue::push(uchar *) (this=0x7ffff0115650, element=0x7fff309166a0 "o") at /root/mysql5.7.14/percona-server-5.7.14-7/sql/bounded_queue.h:106
#4  0x0000000000f52da7 in find_all_keys(Sort_param *, QEP_TAB *, Filesort_info *, IO_CACHE *, IO_CACHE *, Bounded_queue *, ha_rows *) (param=0x7ffff01154c0, qep_tab=0x7fff309268e8, fs_info=0x7ffff0115550, chunk_file=0x7ffff0115200, tempfile=0x7ffff0115360, pq=0x7ffff0115650, found_rows=0x7ffff0115898) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/filesort.cc:1013
#5  0x0000000000f51165 in filesort (thd=0x7fff30000bc0, filesort=0x7fff30926bd8, sort_positions=false, examined_rows=0x7ffff01158a0, found_rows=0x7ffff0115898, returned_rows=0x7ffff0115890) at /root/mysql5.7.14/percona-server-5.7.14-7/sql/filesort.cc:425

Below is an implementation I wrote following the heap sort material in Chapter 6 of Introduction to Algorithms. It covers both a max-heap and a min-heap, for your reference:
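The author's listing is not present in this copy of the article. As a stand-in, here is a minimal Python sketch of the same ideas, not the author's code: `heapify`, `build_heap`, `heap_sort` and `smallest_n` are hypothetical names that loosely parallel bigheapad, bigheapbulid, bigheapsort and biglimitn, and a comparison function selects between a max-heap and a min-heap.

```python
import operator
from typing import Callable, Iterable, List

Wins = Callable[[int, int], bool]  # wins(x, y): x belongs nearer the root than y


def heapify(a: List[int], i: int, n: int, wins: Wins) -> None:
    """Maintenance: sift a[i] down within a[:n]."""
    while True:
        top = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and wins(a[child], a[top]):
                top = child
        if top == i:
            return
        a[i], a[top] = a[top], a[i]
        i = top


def build_heap(a: List[int], wins: Wins) -> None:
    """Build: maintain every parent from the last one back to the root."""
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(a, i, len(a), wins)


def heap_sort(values: Iterable[int], ascending: bool = True) -> List[int]:
    """Sort: a max-heap yields ascending order, a min-heap descending order."""
    a = list(values)
    wins = operator.gt if ascending else operator.lt
    build_heap(a, wins)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]  # move the root to its final position
        heapify(a, 0, end, wins)     # restore the heap on the shrunken prefix
    return a


def smallest_n(values: Iterable[int], n: int) -> List[int]:
    """LIMIT n: keep the n smallest values in a max-heap of size n."""
    values = list(values)
    if n <= 0:
        return []
    heap = values[:n]
    build_heap(heap, operator.gt)
    for v in values[n:]:
        if v < heap[0]:              # smaller than the largest value kept
            heap[0] = v
            heapify(heap, 0, len(heap), operator.gt)
    return heap_sort(heap)


nus = [14, 14, 1, 99, 55, 20, 24, 14, 14, 13, 9, 19, 20, 14, 15, 16, 20, 14, 14]
print(heap_sort(nus)[:4])                   # [1, 9, 13, 14]
print(heap_sort(nus, ascending=False)[:3])  # [99, 55, 24]
print(smallest_n(nus, 8))                   # [1, 9, 13, 14, 14, 14, 14, 14]
```

The values are the nu column of the test table. With bare integers the duplicates are indistinguishable; attach an id to each value, as in the test table, and their relative order becomes visible and arbitrary.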
