Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

[MySQL] order by principle and optimization

2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

A brief introduction

For MySQL-oriented DBA or business developers, order by sorting is a common business function that sorts the results according to the specified fields to meet the needs of the front-end display. However, sorting operations are also the guests who often appear in slow query rankings. This article will gradually understand order by sorting from several aspects, such as principle and actual case optimization, order by usage restrictions and so on.

Two principles

Before understanding the principle of order by sorting, Amway wrote two articles on sorting algorithm, "implementation of merge sorting" and "Classical sorting algorithm". MySQL supports two sorting algorithms, regular sorting and optimization, and has made a special optimization for order by limit MMagin N in MySQL version 5.6, which is listed as the third one.

2.1 General sorting

A get the records that meet the WHERE condition from table T1

B for each record, take out the primary key + sort key (id,col2) of the record and put it into sort buffer

C sort if the sort buffer can hold all the id,col2 pairs that meet the criteria; otherwise, when the sort buffer is full, sort and solidify it into a temporary file. (the sorting algorithm uses a fast sorting algorithm)

D if temporary files are generated in sorting, the merge sorting algorithm needs to be used to ensure that the records in the temporary files are orderly.

E cycle to perform the above process until all records that meet the criteria participate in the sorting

F scan the sorted (id,col2) pairs and use id to retrieve the columns to be returned by the SELECT (col1,col2,col3)

G returns the obtained result set to the user.

Judging from the above process, whether or not to use file sorting depends on whether sort buffer can accommodate (id,col2) pairs that need to be sorted. The size of this buffer is controlled by the sort_buffer_size parameter. In addition, a sort needs two IO, one is id,col2, the second is col1,col2,col3. Because the returned result set is sorted according to col2, so the id is out of order, and a large number of random IO will be generated when col1,col2,col3 through out-of-order id. For the second MySQL itself, an optimization is made, that is, before fishing, the id is sorted and put into the buffer. The size of the cache is controlled by the parameter read_rnd_buffer_size, and then the records are sequentially fished, and the random IO is changed into sequential IO.

2.2 optimized sorting

The regular sorting method requires two additional IO in addition to the sort itself. Compared with the conventional sorting, the optimized sorting method reduces the second IO. The main difference is that putting in sort buffer is not (id,col2), but (col1,col2,col3). Because the sort buffer contains all the fields needed for the query, it can be returned directly after the sorting is completed, without the need to reclaim the data. The cost of this approach is that the number of col1,col2,col3 that can be stored in a sort buffer of the same size is less than (id,col2). If the sort buffer is not large enough, it may lead to the need to write temporary files, resulting in additional IO. Of course, MySQL provides the parameter max_length_for_sort_data, and only when the sort tuple is less than max_length_for_sort_data, can you use the optimized sort mode, otherwise you can only use the normal sort method.

2.3 priority queue sorting

In order to get the final sort result, in any case, we need to sort all the records that meet the criteria before we can return. So is there any room for optimization relative to the optimization of sorting? Version 5.6 optimizes the Order by limit M ~ (th) N statement at the spatial level and adds a new sorting method: priority queue, which is implemented by heap sorting. The characteristics of heap sorting algorithm can just solve the sorting problems such as limit MMagne N. Although all elements are still needed to participate in the sorting, only the sort buffer space of M Magee N tuples is needed. For scenes with very small M score N, there is basically no need for temporary files to merge and sort because of insufficient sort buffer. For the ascending order, the large top heap is used, and the elements in the final heap constitute the smallest N elements, while for the descending order, the small top heap is used, and finally the elements in the heap form the largest N elements.

Three optimization

Through the above principle analysis, we know that the essence of sorting is to change the result set into an ordered result set through a certain algorithm (consuming cpu operation, memory, temporary file IO). How to optimize it? The answer is to take advantage of the ordering of the index in two ways (MySQL's Btree index is the default incremental sort from small to large) to reduce sorting, and the best way is not to sort directly.

Create table T1 (

Id int not null primary key

Key_part1 int (10) not null

Key_part2 varchar (10) not null default''

Key_part3

Key idx_kp1_kp2 (key_part1,key_part2,key_part4)

Key idx_kp3 (id)

) engine=innodb default charset=utf8 the following types of queries can take advantage of index idx_kp1_kp2

SELECT * FROM T1 ORDER BY key_part1,key_part2,...

SELECT * FROM T1 WHERE key_part1 = constant ORDER BY key_part2

SELECT * FROM T1 ORDER BY key_part1 DESC, key_part2 DESC

SELECT * FROM T1 WHERE key_part1 = 1 ORDER BY key_part1 DESC, key_part2 DESC

SELECT * FROM T1 WHERE key_part1 > constant ORDER BY key_part1 ASC

SELECT * FROM T1 WHERE key_part1 constant2 ORDER BY key_part2 warm reminder that you should look at the official examples dialectically and practice more on your own.

Can not take advantage of the situation of index sorting, in fact, I think this is the focus of this article, for the majority of developers, it is easier to remember that index sorting can not be used.

1 the most common case is that the index used to find the results (key2) is different from the sorted index (key1), where aquix and bready order by id

SELECT * FROM T1 WHERE key2=constant ORDER BY key1

2 sort fields are in different indexes and cannot be sorted by index

SELECT * FROM T1 ORDER BY key1,key2

3 the order of the sort fields is not consistent with the order of the columns in the index, so the index sorting cannot be used. For example, the index is key idx_kp1_kp2 (key_part1,key_part2).

SELECT * FROM T1 ORDER BY key_part2, key_part1

4 the ascending and descending order in order by is inconsistent with the default ascending and descending order in the index, so index sorting cannot be used.

SELECT * FROM T1 ORDER BY key_part1 DESC, key_part2 ASC

5 ey_part1 is a range query, key_part2 cannot use index sorting

SELECT * FROM T1 WHERE key_part1 > constant ORDER BY key_part2

5 rder by and group by field columns are inconsistent

SELECT * FROM T1 WHERE key_part1=constant ORDER BY key_part2 group by key_part4

6 the index itself is stored out of order, such as the hash index, which cannot take advantage of the order of the index.

7 the order by field is only indexed with the prefix, key idx_col (col (10))

Select * from T1 order by col

8 for the associated query with join, not all the sort fields come from the first table, and the type value of the first table of the execution plan is not const using explain.

When sorting operations cannot be avoided, how to optimize them? Obviously, the priority is to choose the sorting method of using index, and when it is impossible to satisfy the use of index sorting, let MySQL choose to use the second one-way algorithm for sorting as much as possible. This can reduce a large number of random IO operations and greatly improve the efficiency of sorting.

1 increase the setting of max_length_for_sort_data parameters

In MySQL, the decision to use the old sorting algorithm or the improved sorting algorithm is determined by the parameter max_length_for_sort_data. When the maximum length of all returned fields is less than this parameter value, MySQL chooses the improved sorting algorithm and, conversely, the old algorithm. So, if you have enough memory for MySQL to store the unsorted fields that need to be returned, you can increase the value of this parameter to let MySQL choose to use the improved sorting algorithm.

2 remove unnecessary return fields

When there is not enough memory, you can't simply force MySQL to use the improved sorting algorithm by forcibly increasing the above parameters, otherwise MySQL may have to divide the data into many segments and then sort them, which may outweigh the gain. At this point, you need to remove the unnecessary return fields and make the length of the returned result adapt to the limitation of the max_length_for_sort_data parameter.

At the same time, we should also standardize the MySQL development specification and avoid large fields as far as possible. When a select query column contains large fields blob or text, MySQL chooses regular sorting.

The optimizer selects which filesort algorithm to use. It normally uses the modified algorithm except when BLOB or TEXT columns are involved, in which case it uses the original algorithm.

3 increase the sort_buffer_size parameter setting

If this value is too small, and if you return too many entries at a time, you are likely to sort it many times, and then finally concatenate the sorting results of each time, which will be slower. Increasing sort_buffer_size is not for MySQL to choose an improved version of the sorting algorithm, but for MySQL to minimize the segmentation of the data to be sorted in the sorting process. Because of the segmentation, MySQL has to use temporary tables for exchange sorting. But this value is not the bigger the better:

1 sort_buffer_size is a connection-level parameter that allocates the set memory at once when each connection needs to use this buffer for the first time.

2 sort_buffer_size is not the bigger the better, because it is a connection-level parameter, too large setting + high concurrency may deplete system memory resources.

3 it is said that when the sort_buffer_size exceeds 2m, mmap () instead of malloc () will be used for memory allocation, resulting in inefficiency.

Four reference articles

[1] MySQL order by tuning official documentation

[2] MySQL sorting principle and case study

[3] Taobao MySQL monthly report

The principle analysis part of this article adopts the lazy strategy, which is extracted directly from the blog [2] of my former colleague Yanxu, a strong Amway Yanxu blog, MySQL source researcher, proficient in MySQL business and underlying operation and maintenance.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report