Shulou (Shulou.com), 06/02 report. Updated 2025-01-16. From: SLTechnology News&Howtos, Development.
This article explains how MySQL handles order by sorting and what role indexes play in it. The material is straightforward, so follow along step by step.
When I hear the word "sort", my first thought is that almost every app has a ranked list somewhere: Taobao orders are sorted by purchase time, and bilibili comments are sorted by popularity.
For MySQL, what comes to mind first when sorting is mentioned? The order by keyword? Whether the order by column should have an index? The fact that index leaf nodes are already ordered? Or that sorting inside MySQL should be avoided where possible?
How it started
Suppose you have a table of users' friends:
CREATE TABLE `user` (
  `id` int(10) AUTO_INCREMENT,
  `user_id` int(10),
  `friend_id` int(10),
  `friend_addr` varchar(1000),
  `friend_name` varchar(1000),
  PRIMARY KEY (`id`),
  KEY `user_id` (`user_id`)
) ENGINE=InnoDB;
Two things about this table are worth noting:
It stores the user's id (user_id), the friend's name (friend_name), and the friend's address (friend_addr).
user_id has an index.
One day Little Ape, a junior developer, received a requirement from Xiao Wang, a junior product manager:
Xiao Wang: Comrade Little Ape, we need a new back-office feature: given a user id, look up the names and addresses of all of that user's friends, with the friend names in dictionary order.
Little Ape: OK, that's simple. I'll have it online right away.
So Little Ape wrote this SQL:
select friend_name, friend_addr from user where user_id=? order by friend_name
In a flash, Little Ape shipped it with a swagger, and all went well until one day a colleague from operations ran this query:
select friend_name, friend_addr from user where user_id=10086 order by friend_name
This query was much slower than usual, and the database reported a slow query. Little Ape panicked: what is going on? user_id clearly has an index, and I was even careful to select only friend_name and friend_addr rather than select *. Telling himself to calm down, he remembered the explain command and used it to check the execution plan. In the Extra column he found an ominous phrase: Using filesort.
"This query is actually using the legendary file sort. But if a user does not have many friends, even a file sort should be fast," he thought, unless user_id=10086 has a great many friends. He checked, and this user has more than 100,000 friends.
Lost in thought, Little Ape realized he would be taking the blame for this one. 100,000 rows is a lot, but how does this Using filesort actually sort?
Dissecting file sort
Some might say the real problem is that 100,000 rows is simply too much, sorted or not. That is fair: fetching 100,000 rows at once is costly both in MySQL memory buffers and in network bandwidth. So what if we add limit 1000? The bandwidth problem goes away because the result set is much smaller, but the Using filesort problem remains. You may wonder: does Using filesort really sort in a file? How exactly? Or put differently: how would you design the sort yourself? With these questions in mind, let's look at the difficulties Using filesort has to deal with and how it solves them.
First, user_id has an index, so MySQL looks up the target data (user_id=10086) in the user_id index tree. But the query asks for friend_name and friend_addr, and those two values cannot be obtained from the user_id index alone.
So MySQL has to go back to the table: it takes the primary key stored with the user_id entry and looks it up in the primary key index tree. Now we have friend_name and friend_addr for the first row with user_id=10086.
What next? Returning the row immediately would be wrong, because the result must be sorted by friend_name. Until all the rows have been found, the ones found so far have to be kept somewhere, and that place is sort_buffer. As the name suggests, sort_buffer is the buffer used for sorting in this situation. Note that each thread has its own sort_buffer, mainly to avoid lock contention from multiple threads working on the same block of memory.
Once friend_name and friend_addr of the first row are in sort_buffer, the same steps are repeated until friend_name and friend_addr of every row with user_id=10086 are in sort_buffer.
With all the data in sort_buffer, it is time to sort. MySQL runs a quicksort on friend_name, after which the rows in sort_buffer are ordered by friend_name.
Finally, the first 1000 rows in sort_buffer are returned, and we are done.
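The flow above can be sketched in a few lines of Python. This is only an illustration under assumptions, not MySQL's implementation: the dictionaries primary_index and user_id_index are hypothetical stand-ins for the two index trees, the sample rows are invented, and Python's built-in sort stands in for the quicksort.

```python
from collections import defaultdict

# Hypothetical sample rows; in MySQL these would live in the InnoDB table.
rows = [
    {"id": 1, "user_id": 10086, "friend_name": "Carol", "friend_addr": "Shanghai"},
    {"id": 2, "user_id": 10010, "friend_name": "Zed", "friend_addr": "Beijing"},
    {"id": 3, "user_id": 10086, "friend_name": "Alice", "friend_addr": "Hangzhou"},
    {"id": 4, "user_id": 10086, "friend_name": "Bob", "friend_addr": "Shenzhen"},
]

# Stand-in for the primary key index tree: id -> full row.
primary_index = {row["id"]: row for row in rows}

# Stand-in for the user_id secondary index: user_id -> list of primary keys.
user_id_index = defaultdict(list)
for row in rows:
    user_id_index[row["user_id"]].append(row["id"])


def full_field_sort(user_id, limit):
    """Put every selected column into the sort buffer, sort, then truncate."""
    sort_buffer = []
    for pk in user_id_index.get(user_id, []):       # walk the secondary index
        row = primary_index[pk]                     # go back to the table
        sort_buffer.append((row["friend_name"], row["friend_addr"]))
    sort_buffer.sort(key=lambda entry: entry[0])    # order by friend_name
    return sort_buffer[:limit]                      # apply the limit last


print(full_field_sort(10086, limit=2))
```

Note that the limit is applied only after every matching row has been collected and sorted, which is why adding limit 1000 does not remove the sorting cost.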
Everything looks smooth, but sort_buffer takes up memory, and that is the catch. Memory is not infinite, so sort_buffer must have an upper limit; it also cannot be too small, or it would be of little use. The limit is set by sort_buffer_size, which is a server-level variable rather than an InnoDB setting; its default here is 256 KB.
mysql> show variables like 'sort_buffer_size';
+------------------+--------+
| Variable_name    | Value  |
+------------------+--------+
| sort_buffer_size | 262144 |
+------------------+--------+
In other words, if the data to be placed in sort_buffer exceeds 256 KB, quicksorting it entirely inside sort_buffer will not work. You may ask: can't MySQL grow the buffer automatically to fit the data? MySQL is multithreaded and every thread gets its own sort_buffer. If each of them grew freely, less memory would be left for other purposes (such as the change buffer), and those would suffer.
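To get a feel for the numbers, here is a back-of-the-envelope check. It is a simplified model: the 200 bytes per row is an assumed average for one friend_name plus friend_addr pair, and real sort buffer entries carry extra overhead that is ignored here.

```python
SORT_BUFFER_SIZE = 262144  # bytes, the sort_buffer_size value shown above


def fits_in_sort_buffer(row_count, bytes_per_row, buffer_size=SORT_BUFFER_SIZE):
    """Return True when all rows fit in the buffer at once (simplified model)."""
    return row_count * bytes_per_row <= buffer_size


# Assumed average of 200 bytes per (friend_name, friend_addr) pair.
print(fits_in_sort_buffer(1_000, 200))    # 200,000 bytes needed
print(fits_in_sort_buffer(100_000, 200))  # 20,000,000 bytes needed
```

Under this assumption a user with 1,000 friends fits in the buffer, while a user with 100,000 friends needs roughly 76 times the buffer size.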
At this point the sort has to be done differently. This is the real file sort, which uses temporary files on disk. MySQL applies the idea of merge sort: the data to be sorted is split into several pieces, each piece is sorted in memory and written to a temporary file, and finally the sorted temporary files are merged into one ordered result. It is classic divide and conquer. The steps are as follows:
First, the data to be sorted is split into pieces, each small enough to fit in sort_buffer.
Each piece is sorted in sort_buffer and then written to a temporary file.
Once all the data has been written, each temporary file is ordered internally, but the files together are not, so the data still has to be merged.
Suppose there are two temporary files, tmpX and tmpY. Part of tmpX is read into memory, then part of tmpY. Why a part, rather than the whole file or a single row? Because disk is slow, it pays to read as much as possible per read, but not too much, since buffer space is limited.
Say tmpX[0-5] is read from tmpX and tmpY[0-5] from tmpY. Then compare as follows: if tmpX[0] < tmpY[0], tmpX[0] must be the smallest; next compare tmpX[1] with tmpY[0], and if tmpX[1] > tmpY[0], tmpY[0] must be the second smallest, and so on. Comparing pairwise like this merges tmpX and tmpY into one ordered file tmpZ. Multiple such tmpZ files are merged again, until all the data ends up in one large ordered file.
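The steps above can be sketched as a small external merge sort. This is a teaching sketch under assumptions, not MySQL's code: buffer_rows is a hypothetical stand-in for how many rows fit in sort_buffer, each value is assumed to contain no newline, and Python's buffered file reads play the role of reading part of each temporary file into memory.

```python
import os
import tempfile


def write_sorted_runs(names, buffer_rows):
    """Split the input into buffer-sized pieces, sort each, write one temp file per piece."""
    paths = []
    for start in range(0, len(names), buffer_rows):
        chunk = sorted(names[start:start + buffer_rows])  # in-memory sort of one piece
        fd, path = tempfile.mkstemp(suffix=".run", text=True)
        with os.fdopen(fd, "w") as handle:
            handle.writelines(name + "\n" for name in chunk)
        paths.append(path)
    return paths


def merge_two_runs(path_x, path_y):
    """Two-way merge of sorted files tmpX and tmpY into a new sorted file tmpZ."""
    fd, path_z = tempfile.mkstemp(suffix=".run", text=True)
    with open(path_x) as run_x, open(path_y) as run_y, os.fdopen(fd, "w") as out:
        x, y = run_x.readline(), run_y.readline()
        while x and y:
            # Always emit the smaller head element, then advance that file.
            if x.rstrip("\n") <= y.rstrip("\n"):
                out.write(x)
                x = run_x.readline()
            else:
                out.write(y)
                y = run_y.readline()
        # One file is exhausted; copy whatever is left of the other.
        while x:
            out.write(x)
            x = run_x.readline()
        while y:
            out.write(y)
            y = run_y.readline()
    os.remove(path_x)
    os.remove(path_y)
    return path_z


def external_sort(names, buffer_rows):
    """Sort names using at most buffer_rows values in memory per piece."""
    runs = write_sorted_runs(names, buffer_rows)
    if not runs:
        return []
    while len(runs) > 1:
        runs.append(merge_two_runs(runs.pop(0), runs.pop(0)))
    with open(runs[0]) as handle:
        result = [line.rstrip("\n") for line in handle]
    os.remove(runs[0])
    return result


friends = ["delta", "alpha", "echo", "charlie", "bravo", "foxtrot", "golf"]
print(external_sort(friends, buffer_rows=3))
```

With seven names and room for three per piece, the sketch writes three temporary files and needs two merge passes, which shows where the extra disk I/O of a file sort comes from.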
File sort is slow. Is there another way?
From the process above we know that when the data to sort exceeds the size of sort_buffer, a file sort is needed. File sort means sorting in batches and then merging, which is time-consuming. The root cause is that sort_buffer is not big enough. You may have noticed that we only sort by friend_name, yet we also stuff friend_addr into sort_buffer, so each row takes the length of friend_name plus the length of friend_addr. Could sort_buffer hold only the friend_name field? Then more rows would fit in the same space, and temporary files might not be needed at all. This is the optimization discussed next: rowid sort.
The idea of rowid sort is to keep unnecessary data out of sort_buffer, so that it holds only what is required. What is required? Just friend_name? That cannot work, because there would be no way to get friend_addr after sorting. So the primary key id goes in as well; after sorting, MySQL can go back to the table by id and fetch friend_addr. The general flow is:
Find the target data through the user_id index, go back to the table, and put only id and friend_name into sort_buffer.
Repeat step 1 until all the target data is in sort_buffer.
Sort the data in sort_buffer by friend_name.
After sorting, go back to the table again by id to fetch friend_addr and return the rows, until 1000 rows have been returned.
A few points are worth noting:
This method goes back to the table twice.
Although each row in sort_buffer is smaller, temporary files are still needed if the amount of data is still large.
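A minimal sketch of rowid sort, for comparison with the full-field flow. As before, this is an assumption-laden illustration: primary_index and user_id_index are hypothetical dictionaries standing in for the index trees, the rows are invented, and the lookups counter exists only to make the two trips back to the table visible.

```python
# Hypothetical sample rows; in MySQL these would live in the InnoDB table.
rows = [
    {"id": 1, "user_id": 10086, "friend_name": "Carol", "friend_addr": "Shanghai"},
    {"id": 2, "user_id": 10010, "friend_name": "Zed", "friend_addr": "Beijing"},
    {"id": 3, "user_id": 10086, "friend_name": "Alice", "friend_addr": "Hangzhou"},
    {"id": 4, "user_id": 10086, "friend_name": "Bob", "friend_addr": "Shenzhen"},
]
primary_index = {row["id"]: row for row in rows}   # id -> full row
user_id_index = {10086: [1, 3, 4], 10010: [2]}     # user_id -> primary keys


def rowid_sort(user_id, limit):
    """Keep only (friend_name, id) in the sort buffer; fetch friend_addr afterwards."""
    lookups = 0
    sort_buffer = []
    for pk in user_id_index.get(user_id, []):
        row = primary_index[pk]                    # first trip back to the table
        lookups += 1
        sort_buffer.append((row["friend_name"], pk))
    sort_buffer.sort()                             # by friend_name, ties broken by id

    result = []
    for friend_name, pk in sort_buffer[:limit]:
        row = primary_index[pk]                    # second trip, only for returned rows
        lookups += 1
        result.append((friend_name, row["friend_addr"]))
    return result, lookups


print(rowid_sort(10086, limit=2))
```

The buffer entries are smaller because friend_addr is left out, but the price is one extra primary key lookup for every row that is actually returned.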
So how does MySQL choose between the two methods? It decides by the length of a single row that would go into sort_buffer. If that length (here, friend_name + friend_addr) is too large, rowid sort is used; otherwise full-field sort is used. The threshold is max_length_for_sort_data, whose default in the version shown here is 1024 bytes:
mysql> show variables like 'max_length_for_sort_data';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| max_length_for_sort_data | 1024  |
+--------------------------+-------+
Don't want to go back to the table, don't want to sort again
In fact, both methods above need to go back to the table and to sort: going back is needed because the secondary index does not contain the target fields, and sorting is needed because the data is not ordered. If the secondary index contained the target fields, already in order, we would get the best of both worlds.
The answer is a composite index. We only need to create a composite index on (user_id, friend_name, friend_addr). The target rows can be fetched through this index, friend_name is already ordered within each user_id, and friend_addr is there too. In one stroke, there is no need to go back to the table and no need to sort. For the SQL above, the general flow is:
Find the first entry with user_id=10086 in the composite index, read its friend_name and friend_addr, and return them directly; friend_name is already in order, so no extra processing is needed.
Repeat the first step, moving forward along the leaf nodes, until the first entry whose user_id is not 10086 is reached (or, with limit 1000, until 1000 rows have been returned).
Although a composite index solves this problem, it should not be created blindly in practice. Decide based on the actual business logic: if such queries are not frequent, the index is not needed, because a composite index takes more storage space and costs more to maintain.
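The scan described above can be sketched with a sorted list of tuples standing in for the leaf level of the composite index. This is an illustration under assumptions: composite_index and its sample entries are invented, and bisect_left plays the role of descending the tree to the first matching entry.

```python
from bisect import bisect_left

# Hypothetical leaf entries, kept in (user_id, friend_name, friend_addr) order.
composite_index = sorted([
    (10010, "Zed", "Beijing"),
    (10086, "Carol", "Shanghai"),
    (10086, "Alice", "Hangzhou"),
    (10086, "Bob", "Shenzhen"),
    (10087, "Aaron", "Chengdu"),
])


def covering_index_scan(user_id, limit):
    """Read rows straight from the index: no table lookup and no sort step."""
    result = []
    # (user_id,) sorts before every (user_id, name, addr) entry with that user_id,
    # so this lands on the first entry for the user.
    position = bisect_left(composite_index, (user_id,))
    while position < len(composite_index) and len(result) < limit:
        entry_user_id, friend_name, friend_addr = composite_index[position]
        if entry_user_id != user_id:               # first entry of another user: stop
            break
        result.append((friend_name, friend_addr))  # already in friend_name order
        position += 1
    return result


print(covering_index_scan(10086, limit=2))
```

Unlike the two sorting methods, the scan can stop as soon as the limit is reached, so a limit 1000 query touches about 1000 index entries even when the user has 100,000 friends.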
Thank you for reading. That covers how MySQL handles order by sorting with and without a suitable index. How these points apply in a specific case still needs to be verified in practice.