In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-29 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
Preface
Sorting is a basic function in a database, and MySQL is no exception. The specified result set can be sorted through the Order by statement.
In fact, not only Order by statements, Group by statements, Distinct statements will implicitly use sorting.
In the actual business scenario, some developers often have an orderby,SQL that looks very slick, while the actual business application leads to GAME OVER.
Firstly, the internal principle of sorting by MySQL is introduced, and the parameters related to sorting are introduced. Finally, several "strange" sorting are given to talk about the problem of sorting consistency.
1. Sorting algorithm:
For SQL which cannot avoid sorting by index, the database has to sort by itself to meet the business needs, and "USING TEMPORARY; USING filesort" appears in the execution plan.
Sometimes filesore does not mean that even file sorting can be sorted in memory, only determined by the parameter sort_buffer_size and the size of the result set.
There are three main internal sorting methods in MySQL: general sorting, optimized sorting and priority queue sorting. Suppose the table structure is as follows:
CREATE TABLE `t1` (
`id`int (11) NOT NULL AUTO_INCREMENT
`col1` varchar (64) COLLATE utf8mb4_unicode_ci NOT NULL
`col2` varchar (64) COLLATE utf8mb4_unicode_ci NOT NULL
`col3` varchar (64) COLLATE utf8mb4_unicode_ci DEFAULT NULL
PRIMARY KEY (`id`)
KEY `col1` (`col1`, `col2`)
) ENGINE=InnoDB AUTO_INCREMENT=10 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci
SELECT col1,col2,col3 FROM T1 WHERE col1= "100" ORDER BY col2
a. General sorting
(1)。 Get the records that meet the WHERE condition from table T1
(2)。 For each record, take out the primary key + sort key (id,col2) of the record and put it into sort buffer
(3)。 Sort if the sort buffer can hold all the id,col2 pairs that meet the criteria; otherwise, when the sort buffer is full, sort and solidify it into a temporary file. (the sorting algorithm uses a fast sorting algorithm)
(4)。 If temporary files are generated in sorting, it is necessary to use the merge sorting algorithm to ensure that the records in the temporary files are orderly.
(5)。 Cycle through the above process until all records that meet the criteria participate in the sorting
(6)。 Scan the sorted (id,col2) pairs and use id to retrieve the columns to be returned by SELECT (col1,col2,col3)
(7)。 Return the obtained result set
Judging from the above process, whether or not to use file sorting depends on whether sort buffer can accommodate (id,col2) pairs that need to be sorted. The size of this buffer is controlled by the sort_buffer_size parameter. In addition, a sort needs two IO, one is id,col2, the second is col1,col2,col3. Because the returned result set is sorted according to col2, so the id is out of order, and a large number of random IO will be generated when col1,col2,col3 through out-of-order id. An optimization for the second MySQL itself
That is, before fishing, the id is sorted and put into the buffer. The size of the cache is controlled by the parameter read_rnd_buffer_size, and then the records are sequentially fished, and the random IO is converted into sequential IO.
b. Optimized sorting
The regular sorting method requires two additional IO in addition to the sort itself. Compared with the conventional sorting, the optimized sorting method reduces the second IO. The main difference is that putting in sort buffer is not (id,col2), but (col1,col2,col3). Because the sort buffer contains all the fields needed for the query, it can be returned directly after the sorting is completed, without the need to reclaim the data. The cost of this approach is that the number of col1,col2,col3 that can be stored in a sort buffer of the same size is less than (id,col2). If the sort buffer is not large enough, it may lead to the need to write temporary files, resulting in additional IO. Of course, MySQL provides the parameter max_length_for_sort_data.
Only when the sorting tuple is less than max_length_for_sort_data, can we use the optimized sorting method, otherwise we can only use the conventional sorting method.
c. Priority queue sorting
In order to get the final sort result, in any case, we need to sort all the records that meet the criteria before we can return. Then, compared to the optimized sorting method,
At the spatial level, the black box is optimized and a new sorting method, priority queue, is added, which is realized by heap sorting. The heap sorting algorithm feature can solve the sorting problem such as limit M ~ (N) N. Although all elements are still needed to participate in the sorting, only the sort buffer space of M ~ N tuples is needed. There is basically no problem that temporary files need to be merged and sorted because there is not enough sort buffer.
For the ascending order, the large top heap is used, and the elements in the final heap form the smallest N elements, while for the descending order, the small top heap is used, and finally the elements in the heap form the largest N elements.
2. Sort optimization and index usage
In order to optimize the sorting performance of SQL statements, the best case is to avoid sorting, rational use of the index is a good method.
Because the index itself is ordered, if an appropriate index is established on the fields that need to be sorted, then the sorting process can be skipped and the query speed of SQL can be improved.
Use some typical SQL to show which indexes can be used to reduce sorting and which can't.
1. Select * from T1 order by col1,col2
2. Select * from T1 where col1= "100" order by col2
3. Select * from T1 col1 > "100" order by col1 asc
4. Select * from T1 where col1= "100" and col2 >" 100" order by col2
3. You can't use the index to avoid sorting
The number of records scanned by the index exceeds 30%, and the full table scan is changed.
In the federated index, the use range query of the first index column
In the federated index, the first query condition is not the leftmost index column
Inconsistent ascending and descending order cannot be used
Sort fields cannot be used in multiple indexes (a federated index, a single-column index, and a SQL can only use one index at a time)
The sort field is a separate column and cannot use the index
4. Business case, add a reasonable index
1. Business DDL:
2. View the original SQL execution plan:
3. Optimized SQL execution plan-1
3. Optimized SQL execution plan-2
By rewriting the original SQL and adding the corresponding index, we can achieve the optimization of SQL and the optimization of running efficiency.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.