The reason why MySQL index should use B+tree 07/19 Update SLTechnology News&Howtos

The reason why MySQL index should use B+tree

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Understand the reason why MySQL indexes use B+tree? This problem may be often seen in our daily study or work. I hope you can gain a lot from this question. The following is the reference content that the editor brings to you, let's take a look at it!

When you come across a slow SQL that needs to be optimized, what is the first optimization that you can think of?

Most people's first reaction may be to add indexes, which in most cases can improve the query efficiency of a SQL statement by several orders of magnitude.

The nature of an index: a data structure used to quickly find records.

Common data structures for indexes:

Binary tree red-black tree Hash table B-tree (B-tree, not B-minus tree) B+tree

Graphical data structure website: https://www.cs.usfca.edu/~galles/visualization/Algorithms.html

Index query

As we all know, if the SQL statement select * from t where col = 88 is not searched by the index, the normal search is a full table scan: start from the first row of the table, and compare the value of the col field of each row with 88, which is obviously very inefficient.

If we go to the index, the query process will be completely different (suppose we now use a balanced binary tree data structure to store our index columns)

At this point, the storage structure of the binary tree (Key-Value): Key is the data of the index field, and Value is the disk file address of the row in which the index is located.

When 88 is finally found, you can take out the disk file address corresponding to its Value, and then go directly to the disk to find this row of data, which will be much faster than a full table scan.

But in fact, the bottom layer of MySQL does not use binary tree to store index data, but uses B+tree (B+ tree).

Why not use binary tree

Suppose we record the id index column with a normal binary tree at this time, and we maintain the binary tree index field every time we insert a row of records.

At this point, when I look for the piece of data with id = 7, the search process is as follows:

At this time, I looked for the row record of id = 7 for 7 times, which is not very different from our full table scan. It is obvious that binary tree is not suitable as an index data structure for this kind of data column increasing in turn.

Why not use Hash table

Hash table: a fast search data structure, search time complexity O (1)

Hash function: converts an arbitrary type of key to a subscript of type int

Assuming that the id index column is recorded in the Hash table at this time, we maintain the Hash table index field at the same time as we insert a row of records.

At this time, we started to find the tree node with id = 7 only once, which is very efficient.

However, the index of MySQL still does not use the Hash table that can be accurately located. Because it does not apply to scope queries.

Why not use red and black trees

The red-black tree is a specialized AVL tree (balanced binary tree), which maintains the balance of the binary search tree through specific operations during insert and delete operations.

If a binary search tree is a red-black tree, then any of its sub-trees must be a red-black tree.

Suppose we record the id index column with a red-black tree at this time, and we maintain the red-black tree index field at the same time as we insert a row of records.

During the insertion process, it will be found that it is different from the ordinary binary tree in that when the height difference between the left and right subtrees of a tree is more than 1, it will spin to maintain the balance of the tree.

At this time, I started looking for the tree node with id = 7 only three times, which is faster than the so-called ordinary binary tree.

However, the index of MySQL still does not use red-black trees which are excellent in precise location and range query.

Because when the amount of MySQL data is very large, the volume of the index will also be very large, so it may not be able to store it in it, so you need to read and write from the disk. If the level of the tree is too high, the more times you read and write to the disk (I O interaction), the worse the performance will be.

B-tree

At present, the only deficiency of the red-black tree is that the height of the tree is uncontrollable, so now our starting point is the height of the tree.

At present, only one storage element is allocated to a node. If we want to control the height, we can allocate more space for a node and let it store multiple elements horizontally. At this time, the height can be controlled. Such a transformation process becomes B-tree.

B-tree is an absolutely balanced multipath tree. There are two more concepts in its structure.

Degree (Degree): the number of child nodes (subtrees) that a node has. (some places explain B-tree in terms of degrees, please explain it here.)

Order: the maximum number of child nodes of a node. (usually represented by m)

Keyword: data index.

An m-order B-tree is a balanced m-path search tree. It may be an empty tree, or it may satisfy the following characteristics:

Except for the root node and the leaf node, each node has at least ⌈ 2m ⌉ child nodes

⌈ 2m ⌉ is m / 2 and then round up

The number of keywords contained in each non-root node satisfies: ⌈ 2m ⌉-1 ≤ j ≤ m-1

The keywords of nodes are arranged incrementally from left to right, and the non-leaf nodes with k keywords happen to have (k + 1) child nodes.

All the leaf nodes are on the same layer.

Take the meaning of the name (digression, relax)

Here is an excerpt from Wikipedia

Rudolf Bayer and Ed M. McCreight invented B-tree in 1972 while working at the Boeing Research Laboratory (Boeing Research Labs), but they did not explain what B stands for, if any.

Douglas Comer (Douglas Comer) explains that the two authors have never explained the original meaning of B-tree. We may think that balanced, broad or bushy might be suitable. Others suggested that the letter B stands for Boeing. It comes from his sponsorship, but it seems more appropriate to think of B-tree as a Bayer tree.

In his paper entitled "CS144C classroom lecture about disk storage and B-trees" published in May 1980, Donald Knuth speculated on the meaning of the name B-tree and suggested that B might mean the name Boeing or Bayer.

Find

The search for B-tree is actually very similar to a binary tree:

A binary tree has one keyword and two branches on each node, and each node on B-tree has k keywords and (k + 1) branches.

The search of binary tree only considers whether to go left or right, while in B-tree it needs to be decided by multiple branches.

The search for B-tree is divided into two steps:

First look for the node, because B-tree is usually stored on disk, so this step requires disk IO operation; look for keywords, when you find a node, read the node into memory, and then find the keyword through sequential or half search. If the keyword is not found, you need to determine the size to find the appropriate branch to continue the search. Operation flow

Now you need to find the element: 88

First time: disk IO

Second time: disk IO

Third time: disk IO

Then there is a memory comparison, which is compared with 70 and 88 respectively, and finally 88 is found.

From the search process, it is found that the number of B-tree comparisons and disk IO are actually not much different from the binary tree, so it does not seem to have any advantage.

But if you take a closer look, you will find that the alignment is done in memory, does not involve the disk IO, and the time consuming is negligible.

In addition, a node in B-tree can store a lot of keywords (the number is determined by the order). The number of nodes generated by the same number of keywords in B-tree is far less than that in the binary tree, and the number of nodes in the difference is equal to the number of disk IO. After reaching a certain number, the difference in performance becomes apparent.

insert

When B-tree wants to insert keywords, it always goes directly to the leaf node to operate.

Find the leaf node to be inserted according to the keyword to be inserted; because the maximum number of child nodes of a node (order) is m, it is necessary to determine whether the number of keywords of the current node is less than (m-1). Yes: insert directly No: a node split occurs. The node is divided into left and right parts by the middle keyword of the node, and the middle keyword is put into the parent node. Operation flow

For example, we now need to insert elements in B-tree with Max Degree (order) 3: 72.

Find the leaf node to be inserted

Node splitting: it should be on the same disk block as [70Power88], but when a node has 3 keywords, it may have 4 child nodes, which exceeds the maximum degree of 3 defined by us. So splitting must be carried out at this time: the node is divided into two with the intermediate keyword as the boundary, a new node is generated, and the intermediate keyword is moved up to the parent node.

Tip: when there are two middle keywords, the left keyword is usually split up.

Delete

Delete operation will be more troublesome than finding and inserting, because the keyword to be deleted may or may not be on the leaf node, and delete may also lead to B-tree imbalance, but also need to merge, rotate and other operations to maintain the balance of the whole tree.

Take any tree (order 5) as an example.

Thank you for reading! After reading the above, do you have a general idea of why you want to use B+tree in MySQL indexes? I hope the content of the article will be helpful to all of you. If you want to know more about the relevant articles, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.