What is the reason why MySQL database indexes choose to use B+ tree? 07/02 Update SLTechnology News&Howtos

What is the reason why MySQL database indexes choose to use B+ tree?

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article mainly introduces the reason why MySQL database index chooses to use B+ tree. It is very detailed and has certain reference value. Friends who are interested must read it!

Unary and binary search tree

(1) introduction to binary tree:

Binary search tree, also known as ordered binary search tree, satisfies the general properties of binary search tree, which means that an empty tree has the following properties:

1. If the left subtree of any node is not empty, the value of the left subtree is less than that of the root node.

2. If the right subtree of any node is not empty, the value of the right subtree is greater than that of the root node.

3. The left and right subtrees of any node are also binary search trees.

4. There are no nodes with equal key values.

The image above is a common binary search tree. The output can be sorted from small to large by traversing in the middle order: 2, 3, 5, 6, 7, 8.

Look for the above binary tree, such as the record with a key value of 5, first find the root, the key value is 6, 6 is greater than 5, so find the left subtree of 6, find 3; and 5 is greater than 3, and then find its right subtree; a total of 3 times. If you look for the same requirement 3 times in the order of 2, 3, 5, 6, 7, 8. Using the same method to find the record with a key value of 8, this time three searches are used, while a sequential search takes six times. The average search times are calculated as follows: the average search times of sequential search is (1 "2" 3 "4" 5 "6) / 6 = 3.3, and the average search number of binary search tree is (3" 3 "3" 2 "2" 1) / 6 "2.3 times. The average search speed of a binary search tree is faster than that of a sequential search.

(2) limitation and application

A binary search tree is randomly composed of n nodes, so in some cases, the binary search tree will degenerate into a linear chain with n nodes. As shown below:

If you look at the figure above, if our root node selection is the minimum or maximum number, then the binary search tree will be completely reduced to a linear structure. The average number of searches in the figure above is (1 "2" 3 "4" 5 "5) / 6" 3.16, which is similar to the sequential search. Obviously, the query efficiency of this binary tree is very low, so if we want to construct a binary search tree with maximum performance, it is necessary that the binary tree is balanced (the balance here shows that the height of this tree is larger than the height of the last input, which is unbalanced in the case of the same node), which leads to a new definition-balanced binary tree AVL.

2. AVL tree

(1) introduction

AVL tree is a binary search tree with balance condition, which is generally judged by the difference of balance factor and balanced by rotation. The height of the left and right subtree is not more than 1. Compared with the red-black tree, it is a strict balanced binary tree, and the balance condition must be satisfied (the height difference between the left and right subtrees of all nodes is not more than 1). No matter whether we are performing insert or delete operations, as long as the above conditions are not met, we have to keep the balance through rotation, and rotation is very time-consuming, so we can know that the AVL tree is suitable for situations where there are fewer inserts and deletions but more lookups.

From the above is a common balanced binary tree, we can see that the balance factor difference between the left and right subtrees of any node will not be greater than 1.

(2) limitation

Because the cost of maintaining this high balance is greater than the efficiency gain from it, there are not many practical applications, and more places are red-black trees that pursue local rather than very strict overall balance. Of course, if the insertion and deletion is not frequent in the application scenario, but the search is more demanding, then AVL is still better than the red-black tree.

(3) Application

1. Windows NT kernel exists widely.

Third, red and black trees

(1) introduction

A binary lookup tree that adds a storage bit to each node to represent the color of the node, which can be red or black. By limiting the way nodes are shaded on any path from root to leaf, the red-black tree ensures that no path is twice as long as the others. It is a weakly balanced binary tree (because it is balanced, it can be deduced, the height of the AVL tree is lower than the red-black tree at the same node), compared to the strict requirements of the AVL tree, its rotation times become less, so for the search, insert, delete operations in the case of more, we use the red-black tree.

(2) Nature

1. Each node is either red or black

2. The root node is black

3. Each leaf node (that is, the NULL pointer or NULL node at the end of the tree) is black

4. If a node is red, then its two sons are black

5. For any node, each path to the NULL pointer of the leaf tree contains the same number of black nodes.

6. Each path contains the same black node

(3) Application

1. Widely used in C++ 's STL, Map and Set are implemented with red-black trees.

2. The famous Linux process scheduling Completely Fair Scheduler manages the process control block with a red-black tree. The virtual memory area of the process is stored on a red-black tree. Each virtual address area corresponds to a node of the red-black tree. The left pointer points to the adjacent address virtual storage area, and the right pointer points to the adjacent high-address virtual address space.

3. The implementation of IO multiplexing epoll uses red-black tree to organize and manage sockfd to support rapid addition, deletion, modification and query.

4. The red-black tree is used to manage timer in Nginx, because the red-black tree is orderly, and the timer with the smallest distance can be obtained quickly.

5. Implementation of TreeMap in Java.

4. Bhand B + tree

Talking about the above three kinds of trees: binary search tree, AVL and red-black tree, it seems that we haven't figured out why MySQL uses B+ tree as the index implementation. Don't worry, let's first discuss what B-tree is.

(1) introduction

Our data in MySQL is usually placed on the disk, and there must be operations to access the disk when reading the data. There are two parts of the disk that are mechanically moving, namely, disk rotation and arm movement. Disk rotation is the number of revolutions per minute mentioned in the market, while disk movement is after the disk rotates to a specified position, after moving the magnetic arm to start reading and writing data. Then there is a process of locating the block in the disk, and positioning takes a lot of time to access the disk, after all, mechanical movement takes much more time than electronic movement. When large-scale data is stored on disk, positioning is obviously a very time-consuming process, but we can optimize it through B-tree to improve the efficiency of positioning when reading disk.

Why can class B trees be optimized? According to the characteristics of the class B tree, we can construct a multi-order class B tree, and then store the relevant information on the nodes as much as possible to ensure that the number of layers is as few as possible, so that we can find the information more quickly, and the disk has less Igo O operations, and the class B tree is a balanced tree, and the height from each node to the leaf node is the same, which ensures that every query is stable.

Generally speaking, the Bax B+ tree is a balanced multi-path search tree designed for disks or other storage devices (compared to the binary, each node of the B-tree has multiple branches). Compared with the red-black tree, in the case of the same node, the height of a Bhand B+ tree is much smaller than that of the red-black tree (mentioned in the performance analysis of the Bash B+ tree below). The time of operation on the Bax B + tree usually consists of the time of accessing the disk and the time of CPU computing, and the speed of CPU is very fast, so the operation efficiency of the B tree depends on the number of times to access the disk. The smaller the height of the B tree with the same number of keywords, the less time it takes for disk ISyno.

Notice that a B-tree is a B-tree.-it's just a symbol.

(2) the properties of B-tree

1. Define that any non-leaf node has at most M sons, and M > 2.

2. The number of sons in the root node is [2, M]

3. The number of sons of non-leaf nodes other than root nodes is [M]

4. Each node stores at least 2-1 (rounded up) and at most 1 keyword (at least 2 keywords).

5. The number of keywords in non-leaf nodes = the number of pointers to the son-1

6. Keywords of non-leaf node: K [1], K [2], … , K [M-1]; and K [I]

< K[i+1]； 7、非叶子结点的指针：P[1], P[2], …, P[M]；其中P[1]指向关键字小于K[1]的子树，P[M]指向关键字大于K[M-1]的子树，其它P[i]指向关键字属于(K[i-1], K[i])的子树； 8、所有叶子结点位于同一层；这里只是一个简单的B树，在实际中B树节点中关键字很多的，上面的图中比如35节点，35代表一个key(索引)，而小黑块代表的是这个key所指向的内容在内存中实际的存储位置，是一个指针。五、B+树（1）简介 B+树是应文件系统所需而产生的一种B树的变形树（文件的目录一级一级索引，只有最底层的叶子节点（文件）保存数据）非叶子节点只保存索引，不保存实际的数据，数据都保存在叶子节点中，这不就是文件系统文件的查找吗? 我们就举个文件查找的例子：有3个文件夹a、b、c， a包含b，b包含c，一个文件yang.c，a、b、c就是索引（存储在非叶子节点）， a、b、c只是要找到的yang.c的key，而实际的数据yang.c存储在叶子节点上。所有的非叶子节点都可以看成索引部分！（2）B+树的性质(下面提到的都是和B树不相同的性质) 1、非叶子节点的子树指针与关键字个数相同； 2、非叶子节点的子树指针p[i],指向关键字值属于[k[i],k[i+1]]的子树.(B树是开区间,也就是说B树不允许关键字重复,B+树允许重复)； 3、为所有叶子节点增加一个链指针； 4、所有关键字都在叶子节点出现(稠密索引). (且链表中的关键字恰好是有序的)； 5、非叶子节点相当于是叶子节点的索引(稀疏索引),叶子节点相当于是存储(关键字)数据的数据层； 6、更适合于文件系统；非叶子节点（比如5，28，65）只是一个key（索引），实际的数据存在叶子节点上（5，8，9）才是真正的数据或指向真实数据的指针。（3）应用　　 1、B和B+树主要用在文件系统以及数据库做索引，比如MySQL；六、B/B+树性能分析 n个节点的平衡二叉树的高度为H(即logn)，而n个节点的B/B+树的高度为logt((n+1)/2)+1；若要作为内存中的查找表，B树却不一定比平衡二叉树好，尤其当m较大时更是如此。因为查找操作CPU的时间在B-树上是O(mlogtn)=O(lgn(m/lgt))，而m/lgt>

1; so the operation time of O (mlogtn) is much longer than that of balanced binary tree when m is larger. Therefore, the use of B-tree in memory must take a smaller m. (usually take the minimum value mtree 3, when each internal node in the B-tree can have 2 or 3 children, this kind of third-order B-tree is called 2-3 tree).

7. Why is B + tree more suitable for database index than B tree?

1. The disk read and write cost of the B + tree is lower: the internal node of the B + tree does not have a pointer to the specific information of keywords, so its internal node is smaller than the B tree. If all the keywords of the same internal node are stored in the same disk block, the more keywords the disk can hold, the more keywords need to be searched for in memory at one time, and the relative IO read and write times will be reduced.

2. The query efficiency of B+ tree is more stable: because the non-terminal node is not the node that finally points to the content of the file, but only the index of keywords in the leaf node. Therefore, any keyword search must take a road from the root node to the leaf node. The path length of all keyword queries is the same, resulting in the same query efficiency for each data.

3. Because the data of the B + tree is stored in the leaf nodes, and the branch nodes are all indexed, it is convenient to scan the database, but the B tree because its branch nodes also store data, we need to find the specific data, we need to do a mid-order traversal to scan in order, so the B+ tree is more suitable for interval query, so it is usually used for database indexing.

PS: I saw someone say this on Zhihu, and I think it makes sense:

They believe that the main reason why the database index adopts B+ tree is that B-tree does not solve the inefficient problem of element traversal while improving the performance of IO. The B+ tree only needs to traverse the leaf nodes to traverse the whole tree. And scope-based queries are very frequent in the database, while B-trees do not support such operations or are too inefficient.

The above is all the content of the article "Why MySQL database index chose to use B+ tree". Thank you for reading! Hope to share the content to help you, more related knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.