Shulou (Shulou.com), SLTechnology News&Howtos, Database section. Report dated 05/31; page updated 2025-01-19.
This article explains the MySQL priority queue optimization and the surprising ORDER BY ... LIMIT behavior it causes. Many people run into this problem in practice, so let Xiaobian walk you through how to handle it. I hope you read carefully and learn something!
0. The problem first
When category has no index and contains duplicate values, combining ORDER BY category with LIMIT can return rows in an unexpected order.
Reproducing the problem:
Table structure (two columns):
CREATE TABLE `ratings` ( `id` int(11) NOT NULL AUTO_INCREMENT, `category` int(11) DEFAULT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=11 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_general_ci;
Sort all data by category: select * from ratings order by category;
id  category
1   1
5   1
10  1
3   2
4   2
6   2
9   2
2   3
7   3
8   3
Select * from ratings order by category limit 5;
The expected ID sequence is 1 5 10 3 4.
However, the actual results were as follows:
id  category
1   1
10  1
5   1
3   2
4   2
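Both results are in fact ordered by category, which a few lines of Python can confirm. The sketch re-enters the ten sample rows from the first table as `(id, category)` tuples (the names `ROWS` and `CATEGORY` exist only for this example): Python's stable sort reproduces the expected 1 5 10 3 4, and the expected and actual answers have the same category sequence, differing only in the order of the three rows with category 1.

```python
# The ten sample rows as (id, category), in id order.
ROWS = [(1, 1), (2, 3), (3, 2), (4, 2), (5, 1),
        (6, 2), (7, 3), (8, 3), (9, 2), (10, 1)]
CATEGORY = dict(ROWS)  # id -> category

expected_ids = [1, 5, 10, 3, 4]  # what we expected from LIMIT 5
actual_ids = [1, 10, 5, 3, 4]    # what MySQL actually returned

# Python's sort is stable: rows with equal category keep their id order.
stable_top5 = [row_id for row_id, _ in sorted(ROWS, key=lambda row: row[1])[:5]]
print(stable_top5)                          # [1, 5, 10, 3, 4]

# Same category sequence in both answers; only the tie order differs.
print([CATEGORY[i] for i in expected_ids])  # [1, 1, 1, 2, 2]
print([CATEGORY[i] for i in actual_ids])    # [1, 1, 1, 2, 2]
```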
What is going on? A MySQL bug?
Some readers may have hit this problem and fixed it after a quick Baidu or Google search. But is the fix you found the best one? Where did the idea come from? Why does MySQL behave this way, and does it depend on the version?
Conclusions first:
The optimal solution: follow the sort column with a column whose values are unique, for example order by category,id;
Why does MySQL do this? The answer is speed (MySQL 5.6 and later);
The suboptimal solution: add an index on category, the ORDER BY column (why suboptimal? You will know by the end of this article).
The rest of this article retraces how these three conclusions were reached.
1. Optimal solution
The MySQL Reference Manual, section 8.2.1.19 "LIMIT Query Optimization", describes this scenario as follows:
If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns.
One factor that affects the execution plan is LIMIT, so an ORDER BY query with and without LIMIT may return rows in different orders.
To sum it up:
When the ORDER BY column contains duplicate values, the order of those rows in the result may change depending on whether LIMIT is present.
This is MySQL's default behavior for this scenario. If the order must be the same with and without LIMIT, the documentation also gives a remedy:
If it is important to ensure the same row order with and without LIMIT, include additional columns in the ORDER BY clause to make the order deterministic.
That is, append an additional sort column (such as id) to the ORDER BY clause.
The passage above first appeared in the MySQL 5.6 documentation; this ORDER BY ... LIMIT optimization was introduced in that release.
Select * from ratings order by category,id;
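The effect of the extra column can be illustrated with a plain Python analogy (not MySQL's implementation): two lists holding the same rows in different orders stand in for two executions that happen to meet tied rows in different orders. Sorting by `category` alone gives different top-5 answers, while sorting by `(category, id)` gives the same one every time. The helper names are made up for this example.

```python
# The ten sample rows as (id, category).
ROWS = [(1, 1), (2, 3), (3, 2), (4, 2), (5, 1),
        (6, 2), (7, 3), (8, 3), (9, 2), (10, 1)]

def top_ids(rows, key, limit=5):
    """Sort rows by `key` and return the ids of the first `limit` rows."""
    return [row_id for row_id, _ in sorted(rows, key=key)[:limit]]

def by_category(row):       # ORDER BY category
    return row[1]

def by_category_id(row):    # ORDER BY category, id
    return (row[1], row[0])

scan_a = ROWS                   # rows arrive in id order
scan_b = list(reversed(ROWS))   # rows arrive in reverse id order

print(top_ids(scan_a, by_category))     # [1, 5, 10, 3, 4]
print(top_ids(scan_b, by_category))     # [10, 5, 1, 9, 6]
print(top_ids(scan_a, by_category_id))  # [1, 5, 10, 3, 4]
print(top_ids(scan_b, by_category_id))  # [1, 5, 10, 3, 4]
```

With `category` alone, even the set of category-2 rows that make it into the top 5 changes; the unique `id` tie-breaker removes that freedom.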
Which raises the question: why would MySQL make an optimization that looks like a bug?
2. MySQL ORDER BY logic
As the name suggests, ORDER BY means sorting.
Run EXPLAIN on the LIMIT query: explain select * from ratings order by category limit 5;
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ratings
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 10
     filtered: 100.00
        Extra: Using filesort
1 row in set, 1 warning (0.00 sec)
Extra: Using filesort shows that a sort step is required.
Normally, MySQL sorts in one of two ways, in memory or externally:
If the data to be sorted fits in the sort buffer, it is sorted in memory (quicksort);
If the data to be sorted is larger than the sort buffer, temporary files are used for an external sort (merge sort).
Either way, the entire result set gets sorted. With or without LIMIT, the server then reads the required number of rows from the front of the sorted output, so LIMIT does not change the order of the rows returned.
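The two strategies can be modeled as follows. This is a toy sketch, not MySQL's filesort: `buffer_size` is a hypothetical stand-in for `sort_buffer_size`, Python's built-in sort stands in for quicksort, and in-memory lists stand in for temporary files. Whichever path is taken, the whole input ends up sorted, and LIMIT just takes a prefix of it.

```python
import heapq

def filesort(keys, buffer_size):
    """Toy model: sort in memory if the keys fit, else sort runs and merge them."""
    if len(keys) <= buffer_size:
        return sorted(keys)                         # in-memory sort
    runs = [sorted(keys[i:i + buffer_size])         # each full buffer becomes
            for i in range(0, len(keys), buffer_size)]  # a sorted run on "disk"
    return list(heapq.merge(*runs))                 # k-way merge of the runs

categories = [1, 3, 2, 2, 1, 2, 3, 3, 2, 1]  # category column in id order
in_memory = filesort(categories, buffer_size=16)
external = filesort(categories, buffer_size=3)
print(in_memory[:5], external[:5])  # [1, 1, 1, 2, 2] [1, 1, 1, 2, 2]
```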
MySQL 5.6, however, added a small optimization for ORDER BY ... LIMIT: when the sort cannot use an index (as with category here, whose values also repeat), the optimizer can use a priority queue instead of sorting everything.
The following pseudocode, from a comment in filesort.cc, shows where the priority queue fits into the loop that collects sort keys:
while (get_next_sortkey())
{
  if (using priority queue)
    push sort key into queue
  else
  {
    try to put sort key into buffer;
    if (no free space in sort buffer)
    {
      do {
        allocate new, larger buffer;
        retry putting sort key into buffer;
      } until (record fits or no space for new buffer)
      if (no space for new buffer)
      {
        sort record pointers (all buffers);
        dump sorted sequence to 'tempfile';
        dump Merge_chunk describing sequence location into 'chunk_file';
      }
    }
    if (key was packed)
      tell sort buffer the actual number of bytes used;
  }
}
if (buffer has some elements && dumped at least once)
  sort-dump-dump as above;
else
  don't sort, leave sort buffer to be sorted by caller.
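To make the control flow of that comment concrete, here is a heavily simplified Python model. It is not MySQL code: the function name and the `buffer_size`/`limit` parameters are hypothetical, `heapq.nsmallest` stands in for the bounded priority queue, Python lists stand in for 'tempfile' and 'chunk_file', and the buffer-growing `do ... until` loop is collapsed into one fixed-size buffer.

```python
import heapq

def collect_sort_keys(keys, buffer_size, limit=None):
    """Simplified model of the filesort key-collection loop (not MySQL code).

    Returns (buffer, runs): `buffer` holds keys left for the caller to sort,
    `runs` holds sorted sequences that would have been dumped to 'tempfile'.
    """
    if limit is not None:
        # Priority-queue branch: nsmallest stands in for the bounded queue
        # (its result comes back already sorted).
        return heapq.nsmallest(limit, keys), []

    buffer, runs = [], []
    for key in keys:
        if len(buffer) == buffer_size:   # no free space in sort buffer
            runs.append(sorted(buffer))  # sort and dump the sequence
            buffer = []
        buffer.append(key)
    if buffer and runs:                  # has elements && dumped at least once
        runs.append(sorted(buffer))
        buffer = []
    return buffer, runs                  # else: caller sorts the buffer

categories = [1, 3, 2, 2, 1, 2, 3, 3, 2, 1]
print(collect_sort_keys(categories, buffer_size=4, limit=5))
# ([1, 1, 1, 2, 2], [])
print(collect_sort_keys(categories, buffer_size=4))
# ([], [[1, 2, 2, 3], [1, 2, 3, 3], [1, 2]])
print(collect_sort_keys(categories, buffer_size=16))
# ([1, 3, 2, 2, 1, 2, 3, 3, 2, 1], [])
```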
WL#1393: Optimizing filesort with small limit:
Many web customers have to do "SELECT ... ORDER BY non_index_column LIMIT X", When X * is smaller than sort_buff_size we can use the following algoritm to speed up the sort: - Create a queue to hold 'limit' keys. - Scan through the table and store the first (last if DESC) keys in the queue - Return values from queue This is much faster than the current algoritm that works as:
The WorkLog reports the result of the optimization: 10 to 20 times faster than a quicksort (the original text is worth reading if you are interested).
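A rough comparison count shows where a gain of that order can come from. This is a back-of-the-envelope estimate, not the WorkLog's measurement: M is the number of rows scanned, N is the LIMIT, constants are ignored, and the cost of writing and merging temporary files (which the bounded queue avoids entirely) is not counted.

```latex
\begin{aligned}
\text{full sort} &\approx M \log_2 M \\
\text{bounded heap of } N \text{ keys} &\lesssim M \log_2 N \\
\frac{M \log_2 M}{M \log_2 N} &= \frac{\log_2 M}{\log_2 N}
  = \frac{\log_2 10^6}{\log_2 5} \approx \frac{19.93}{2.32} \approx 8.6
  \qquad (M = 10^6,\ N = 5)
\end{aligned}
```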
In short, it is done for speed!
MySQL treats this scenario as a top-N problem, which a priority queue solves efficiently.
3. Priority queue
A priority queue is usually implemented as a heap; Java's java.util.PriorityQueue, for example, is backed by this data structure.
A simple explanation of what a heap is:
A heap is a complete binary tree;
Every node's value is greater than or equal to every value in its subtree (a large top heap, or max-heap), or less than or equal to every value in its subtree (a small top heap, or min-heap).
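When a complete binary tree is stored in an array, the children of the node at index i sit at indexes 2i+1 and 2i+2, which makes the heap property easy to check. A minimal sketch (the helper `is_heap` is hypothetical):

```python
def is_heap(values, largest_on_top=True):
    """Check the heap property for a complete binary tree stored in a list."""
    for parent in range(len(values)):
        for child in (2 * parent + 1, 2 * parent + 2):
            if child < len(values):
                if largest_on_top and values[parent] < values[child]:
                    return False   # a child is larger than its parent
                if not largest_on_top and values[parent] > values[child]:
                    return False   # a child is smaller than its parent
    return True

print(is_heap([3, 2, 2, 1, 1]))                        # True  (large top heap)
print(is_heap([1, 1, 2, 3, 2], largest_on_top=False))  # True  (small top heap)
print(is_heap([1, 3, 2]))                              # False
```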
With merge sort or quicksort, MySQL must sort all the rows and then take the first LIMIT rows; the rest of the sorted data is wasted.
A priority queue only needs to maintain a heap whose size equals the LIMIT count; passing every row through this heap once yields the result.
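A generic sketch of this idea using Python's `heapq` module, which only provides a min-heap, so keys are negated to keep the largest retained key at the root. The heap never holds more than `limit` keys, and most new keys cost a single comparison with the root. The function name `smallest_n` is made up for this example.

```python
import heapq

def smallest_n(keys, limit):
    """Return the `limit` smallest keys in ascending order with a bounded max-heap."""
    if limit <= 0:
        return []
    heap = []                              # negated keys: root = largest kept key
    for key in keys:
        if len(heap) < limit:
            heapq.heappush(heap, -key)
        elif key < -heap[0]:               # smaller than the largest key kept so far
            heapq.heapreplace(heap, -key)  # pop the root, push the new key
    return sorted(-k for k in heap)

data = [7, 3, 9, 1, 8, 2, 6, 5, 4, 0]
print(smallest_n(data, 3))   # [0, 1, 2]
print(sorted(data)[:3])      # [0, 1, 2]
```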
To check whether MySQL uses a priority queue, enable the optimizer trace:
SET optimizer_trace='enabled=on';
select * from ratings order by category limit 5;
SELECT * FROM `information_schema`.`OPTIMIZER_TRACE`\G
The trace output includes:
"filesort_priority_queue_optimization": { "limit": 5, "chosen": true },
filesort_priority_queue_optimization.chosen = true means the priority queue was used.
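If the trace is read by application code rather than in the mysql client, the decision can be found by searching the parsed JSON for that key. A minimal sketch, assuming the TRACE column's text has already been fetched into a string; the helper name is hypothetical, and the sample document wraps only the fragment quoted above in a made-up `steps` list instead of a full trace.

```python
import json

def find_key(node, wanted):
    """Yield every value stored under `wanted` anywhere in a parsed JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == wanted:
                yield value
            yield from find_key(value, wanted)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, wanted)

# Hypothetical, heavily trimmed trace: only the inner fragment comes from a real trace.
trace_text = '{"steps": [{"filesort_priority_queue_optimization": {"limit": 5, "chosen": true}}]}'
decisions = list(find_key(json.loads(trace_text), "filesort_priority_queue_optimization"))
print(decisions)  # [{'limit': 5, 'chosen': True}]
```

Searching recursively avoids depending on where exactly the key sits in the trace's nesting.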
The following flowchart walks through the priority queue's execution logic, using LIMIT 5 as an example:
Note: the heaps in the figures are ordered by category value.
1. Take the first five rows and build a heap from them:
2. Read the next row, (6,2). Its category, 2, is smaller than the largest category in the heap, 3, so (2,3) is removed from the heap and (6,2) is inserted:
3. Repeat step 2 until every row that satisfies the query conditions has been compared against the heap. The final heap is shown in the figure:
This is how the priority queue finds the 5 rows with the smallest category values.
Finally, the result is read out of the heap: each time the minimum element is removed, the last element is moved to the top of the heap and sifted down to restore the small top heap. The process is shown in the figure:
Select * from ratings order by category limit 5;
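The walk-through can be modeled in Python on the same ten rows. Assumptions: during the scan the heap keeps the largest category at its root (so step 2 can evict it); for output the kept rows are re-heaped with the smallest category on top and removed one at a time as described above; rows are compared by category only. This is a sketch, not MySQL's source, so the tie order it produces (10, 5, 1) differs from what the server returned. What it does show is that heap operations move rows with equal keys around, so no particular order among ties is guaranteed.

```python
# The ten sample rows as (id, category), in scan (id) order.
ROWS = [(1, 1), (2, 3), (3, 2), (4, 2), (5, 1),
        (6, 2), (7, 3), (8, 3), (9, 2), (10, 1)]

def larger(a, b):
    """Large top heap on category: a belongs above b if its category is larger."""
    return a[1] > b[1]

def smaller(a, b):
    """Small top heap on category: a belongs above b if its category is smaller."""
    return a[1] < b[1]

def sift_down(heap, i, above):
    """Move heap[i] down until neither child belongs above it."""
    n = len(heap)
    while True:
        top = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and above(heap[child], heap[top]):
                top = child
        if top == i:
            return
        heap[i], heap[top] = heap[top], heap[i]
        i = top

def heapify(heap, above):
    for i in reversed(range(len(heap) // 2)):
        sift_down(heap, i, above)

def order_by_category_limit(rows, limit):
    # Step 1: the first `limit` rows form a heap with the largest category on top.
    heap = list(rows[:limit])
    heapify(heap, larger)
    # Steps 2-3: a later row replaces the root only if its category is smaller.
    for row in rows[limit:]:
        if row[1] < heap[0][1]:
            heap[0] = row
            sift_down(heap, 0, larger)
    # Output: re-heap with the smallest category on top, then repeatedly take
    # the root, move the last element to the top and sift it down.
    heapify(heap, smaller)
    result = []
    while heap:
        result.append(heap[0])
        last = heap.pop()
        if heap:
            heap[0] = last
            sift_down(heap, 0, smaller)
    return result

print(order_by_category_limit(ROWS, 5))
# [(10, 1), (5, 1), (1, 1), (3, 2), (4, 2)]
```

If `larger` and `smaller` compared `(category, id)` instead of `category` alone, the output would be 1, 5, 10, 3, 4 every time, which is exactly what order by category,id asks for.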
4. Why is indexing suboptimal?
Given how ORDER BY works, an index on the sort column also solves this problem: rows can be read in index order, so the sort step disappears entirely.
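A sorted list can stand in for the index to show why an index read needs no sort and gives a stable answer. The assumption is InnoDB's layout, where each secondary-index entry also stores the primary key, so entries are ordered by `(category, id)`; `bisect.insort` on a Python list is only a stand-in for a B+tree.

```python
import bisect

# The ten sample rows as (id, category).
ROWS = [(1, 1), (2, 3), (3, 2), (4, 2), (5, 1),
        (6, 2), (7, 3), (8, 3), (9, 2), (10, 1)]

# Stand-in for a secondary index on category: entries are (category, id),
# kept in sorted order as rows are inserted.
index = []
for row_id, category in ROWS:
    bisect.insort(index, (category, row_id))

# "ORDER BY category LIMIT 5" via the index: read the first 5 entries in order.
print([row_id for _, row_id in index[:5]])  # [1, 5, 10, 3, 4]
```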
But an index is no silver bullet. An extra index on category increases the table's maintenance cost, and adding one with no clear business need, only to sidestep the priority queue optimization, is in the class representative's view not worth it.
This is especially true for large tables, where the index itself can take up considerable space. Moreover, in this article's scenario category is a classification column with a high repetition rate, so even if business SQL does query by category, MySQL will not necessarily choose this index.
In summary, for this scenario, I believe that order by category,id is the optimal solution to this problem.
PS: Someone may ask: why should I care? I never write SQL with LIMIT!
Don't you write CRUD functions with pagination? Have you looked at how PageHelper builds its paged queries?
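Pagination is where this bites: page 1 and page 2 are separate queries, and if they order tied rows differently, a row can appear on both pages while another is skipped. The sketch fakes that by giving each page query a different arrival order for tied rows (an assumption standing in for whatever makes the two plans differ); the `page` helper is hypothetical and unrelated to PageHelper's actual API.

```python
# The ten sample rows as (id, category).
ROWS = [(1, 1), (2, 3), (3, 2), (4, 2), (5, 1),
        (6, 2), (7, 3), (8, 3), (9, 2), (10, 1)]

def page(rows, offset, size, key):
    """Ids on one page: sort by `key`, skip `offset` rows, take `size` rows."""
    ordered = sorted(rows, key=key)
    return [row_id for row_id, _ in ordered[offset:offset + size]]

def by_category(row):
    return row[1]

def by_category_id(row):
    return (row[1], row[0])

# Assumption: the two page queries meet tied rows in different orders.
scan_page1, scan_page2 = ROWS, list(reversed(ROWS))

# ORDER BY category LIMIT 0,4 and LIMIT 4,4
p1 = page(scan_page1, 0, 4, by_category)
p2 = page(scan_page2, 4, 4, by_category)
print(p1, p2)  # [1, 5, 10, 3] [6, 4, 3, 8]  (id 3 twice; 9 and 2 missing)

# ORDER BY category, id LIMIT 0,4 and LIMIT 4,4
q1 = page(scan_page1, 0, 4, by_category_id)
q2 = page(scan_page2, 4, 4, by_category_id)
print(q1, q2)  # [1, 5, 10, 3] [4, 6, 9, 2]
```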
© 2024 shulou.com SLNews company. All rights reserved.