In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article is about why the commonly used engine of MySQL uses B+ tree as an index by default. The editor thinks it is very practical, so I share it with you. I hope you can get something after reading this article. Let's take a look at it.
I. Preface
In order to explain this problem clearly, A Fan first takes you to understand what an index is.
I remember when I first studied the database, the teacher liked to use the catalogue of books to compare the index of the database, and told us that the index can be like a catalog, so that we can find the data we want to find more quickly.
If this is the first time to touch the index, this metaphor can give us an intuitive impression. But when we go deep into the study of indexes, we can't continue to hold the idea that indexes are catalogues. we have to jump out and think about the nature of indexes.
Second, the nature of the index
In the absence of an index, we can only look up the data row by row from beginning to end, and each row of data means that we have to go to the corresponding location on the disk to read a piece of data.
If the query of a piece of data is divided into two steps: querying to the disk and comparing the query conditions, the query time to the disk will be much longer than the time to compare the query conditions, which means that in a query, the disk io takes up most of the time. Furthermore, the efficiency of a query depends on the number of disk io, if we can reduce the number of disk io as much as possible in a query, then we can speed up the query.
Now that we know that reducing disk io can speed up queries, we need to focus on how to reduce disk io. If you query row by row according to the original table, n pieces of data will be queried n times, that is, the time complexity of O (N). In order to reduce the number of disk io, we must use a data structure with lower query time complexity to save the data.
This kind of data structure with low query time complexity is called index. So in popular terms, an index is actually a kind of data structure, and there are a variety of data structures that can act as an index.
Third, the choice of index
Since the index is a kind of data structure that is easy to query, if you have some understanding of the data structure, the tree structure will be the first choice. After all, the tree structure generally has O (logN) query time complexity, and the performance of inserting and deleting data is relatively average. (you may say array, hash table query speed is also very high, this will also be analyzed later)
Although we all know that the most commonly used engines in Mysql, such as InnoDB and MyISAM, eventually choose the B+ tree as the index, I intend to start with the most common binary tree, deduce why I chose the B+ tree as the index, and compare the advantages and disadvantages of several tree structures when acting as indexes.
Binary tree
The problem with the most common binary tree is that it cannot guarantee the query time complexity of O (logN). Let's look at the following figure:
As the inserted element increases gradually, the element is always inserted on the right, and a good binary tree eventually becomes a "linked list". In this extreme case, the query time complexity of the binary tree is no longer O (logN), but degenerates to O (N), which obviously does not meet the requirements of the index.
Balanced binary tree (red-black tree)
For a balanced binary tree like the red-black tree, no matter how to insert elements, it can adjust the height of the tree by some rotation methods to maintain the query efficiency of the whole tree at O (logN), as shown in the following figure:
In this way, he has met the necessary conditions for becoming an index, but in the end, not choosing him as the index shows that there are still some deficiencies. If you take a closer look, you can find that each node of the balanced binary tree has only two child nodes, and if the amount of data in a table is particularly large, the height of the whole tree will rise. If a ten-million-level table is indexed by a balanced binary tree, the tree height will reach more than 20 layers. This means that it takes more than 20 disk io to make a query, which is a lot of overhead.
So is it possible to maintain a tree structure with a smaller height in the case of a large amount of data?
B tree and B + tree
The answer is B-tree. As we mentioned above, the bottleneck of a balanced binary tree is that there are only two child nodes in a node, while a node in the B tree can store N child nodes, which perfectly solves the problem of tree height. We can call B tree a balanced multi-tree, and B-tree as an index is shown in the following figure:
Picture source network
However, there is still room for optimization by using the structure of the B-tree as the index. let's take a look at the final B + tree, and then carefully analyze what improvements the B + tree has made on the basis of the B tree, and why the B + tree is finally competent for the index:
Picture source network
From the picture, we can see that the B + tree is also a multi-difference balance tree, which solves the problem of tree height as well as the B tree.
Improvement point 1:
But if we look carefully, we can find that the nodes of the B-tree store both the index and the corresponding data of the table, while the non-leaf nodes of the B + tree do not store data, only the index, and all the data is stored on the leaf node.
Why make such an improvement? Let's do the math once and we'll know.
Suppose the tree height is 2, the primary key ID is bigint type, the length is 8 bytes, the node pointer is 6 bytes, the size of a row of data record is 1k, an io operation can get a page of 16k data.
When the index is B + tree, the root node can store: 16k / (6 + 8) = 1170 index pointers; at the first layer, it can point to 1170 * 1170 = 1368900 index pointers; to the bottom leaf node, a node can store 16k / 1k = 16 records and a total of 1170 * 1170 * 16 = 21902400 records.
In the case of B-tree, because non-leaf nodes use a lot of space to store data, there must be fewer index pointers. Eventually, if the whole tree wants to store as much data as B + tree, it must increase the height of the tree, which increases the disk io, so the performance of B + tree as an index is higher than that of B tree.
Improvement point 2:
Pointers are used to connect leaf nodes to improve the efficiency of interval access. If we want to do a range query, we can easily traverse through the pointers between the leaf nodes of the B+ tree, reducing unnecessary disk io.
Seeing here, I believe you have a preliminary understanding of why Mysql's commonly used engines use B+ trees as indexes by default. Just keep in mind that indexes exist to reduce disk io and improve query performance.
Although the query efficiency of the array is high, the efficiency of adding and deleting is low, because the index has to be maintained when the records are added and deleted, which will lead to the low efficiency of adding or deleting a record in the case of a large amount of data.
The above is why the commonly used engines of MySQL use B+ tree as the index by default. The editor believes that there are some knowledge points that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.