Detailed introduction of Mysql Index Model B + Tree 07/02 Update SLTechnology News&Howtos

Detailed introduction of Mysql Index Model B + Tree

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/02 Report--

This article introduces the relevant knowledge of "detailed introduction of Mysql index model B + tree". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

MySQL Index-B+ Tree (you'll understand after reading it)

An index is a data structure that helps us quickly locate the data we are looking for in a large amount of data.

The most vivid analogy of the index is the catalogue of books. Pay attention to the large amount of data here, the index only makes sense when there is a large amount of data. If I want to find 4 of this data in [1Magne2Magazine 3Power4], it is also very fast to retrieve the whole data directly, so there is no need to make efforts to build an index and look for it again.

Indexes fall into three categories in the MySQL database:

B+ tree index

Hash index

Full-text index

What we are going to introduce today is the B+ tree index in the InnoDB storage engine that is most commonly encountered in work development. To introduce B + tree index, we have to mention binary search tree and balance the three data structures of binary tree and B tree. B + trees evolved from the three of them.

Binary search tree

First of all, let's look at a picture:

As you can see from the figure, we have created an index of a binary lookup tree for the user table (user information table).

The circle in the figure is the node of the binary search tree, in which keys (key) and data (data) are stored. The key corresponds to the id in the user table and the data corresponds to the row data in the user table.

The characteristic of binary search tree is that the key value of the left child node of any node is smaller than that of the current node, and the key value of the right child node is larger than that of the current node. The top node is called the root node, and the node without child nodes is called the leaf node.

If we need to find the user information of id=12, use the binary search tree index we created. The search process is as follows:

Take the root node as the current node, compare 12 with the key value of the current node 10, 12 is greater than 10, and then we take the right child node of the current node > as the current node.

Continue to compare 12 with the key value 13 of the current node, and find that 12 is less than 13, taking the left child node of the current node as the current node.

Compare 12 with the key value 12 of the current node. 12 equals 12. If the condition is met, we take the data, that is, id=12,name=xm, from the current node.

Using the binary search tree, we only need 3 times to find the matching data. If we look it up one by one in the table, we need 6 times to find it.

Balanced binary tree

Above we explained that data can be found quickly by using binary search tree. However, if the binary search tree above is constructed like this:

At this point, you can see that our binary search tree has become a linked list. If we need to find the user information of id=17, we need to look it up 7 times, which is equivalent to a full table scan. The reason for this phenomenon is that the binary search tree becomes unbalanced, that is, the height is too high, which leads to the instability of search efficiency. In order to solve this problem, we need to ensure that the binary search tree is always balanced, so we need to use the balanced binary tree. Balanced binary tree, also known as AVL tree, requires that the height difference between the left and right subtrees of each node should not exceed 1 on the basis of satisfying the characteristics of binary search tree. Here is a comparison between a balanced binary tree and an unbalanced binary tree:

From the construction of the balanced binary tree, we can find that the binary tree in the first picture is actually a balanced binary tree.

The balanced binary tree ensures that the construction of the tree is balanced. when we insert or delete data that does not satisfy the imbalance of the balanced binary tree, the balanced binary tree will adjust the nodes on the tree to keep the balance. The specific mode of adjustment will not be introduced here. Compared with the binary search tree, the balanced binary tree has more stable search efficiency and faster overall search speed.

B tree

Because of the volatility of memory. In general, we choose to store the data and indexes in the user table in a peripheral such as a disk. However, compared with memory, the speed of reading data from disk will be hundreds of times or even ten thousand times slower, so we should try our best to reduce the number of times reading data from disk. In addition, when reading data from the disk, it is read according to the disk block, not one by one. If we can put as much data as possible into the disk block, the disk read operation will read more data, and the time it takes to find the data will be greatly reduced. If we use the data structure of the tree as the data structure of the index, we need to read a node from the disk, that is, a disk block, every time we find the data. We all know that balanced binary trees store only one key value and data per node. What does that mean? It means that only one key and data are stored in each disk block! What if we want to store huge amounts of data?

You can imagine that the binary tree will have a lot of nodes, and its height will be extremely high. When we look for data, we will do disk IO many times, and our efficiency of finding data will be very low!

In order to solve this disadvantage of balanced binary tree, we should look for a balanced tree in which a single node can store multiple key values and data. That's the B-tree we're going to talk about next.

B tree (Balance Tree) means balanced tree. The following picture shows a B tree:

The p node in the graph is the pointer to the child node, and the binary search tree and balanced binary tree also exist, because of the aesthetics of the graph, it is omitted.

Each node in the graph is called a page, and a page is the disk block we mentioned above. In MySQL, the basic unit of data reading is a page, so what we call a page here is more in line with the underlying data structure of the index in MySQL.

As can be seen from the above figure, compared with the balanced binary tree, each node stores more key values (key) and data (data), and each node has more child nodes, the number of child nodes is generally called order, the B-tree in the above figure is a third-order B-tree, the height will also be very low.

Based on this feature, the number of times for B-tree to find data and read disk will be very small, and the search efficiency of data will be much higher than that of balanced binary tree.

If we want to find the user information for id=28, the process we look up in the B tree above is as follows:

First find the root node, that is, page 1, and determine that 28 is between the key values 17 and 35, then we find page 3 according to the pointer p2 in page 1.

Comparing 28 with the key value in page 3, 28 is between 26 and 30, and we find page 8 according to the pointer p2 in page 3.

Comparing the key value 28 with the key value in page 8, it is found that there is a matching key value 28, and the user information corresponding to the key value 28 is (2828 BV).

B + tree

B + tree is a further optimization of B tree. Let's first take a look at the structure diagram of the B+ tree:

Based on the picture above, let's take a look at the difference between a B + tree and a B tree:

① B + tree non-leaf nodes do not store data, only key values are stored, while B-tree nodes store not only key values, but also data.

This is done because the size of the page is fixed in the database, and the default size of the page in InnoDB is 16KB.

If the data is not stored, more keys will be stored, the corresponding tree order (the child node tree of the node) will be larger, and the tree will be shorter and fatter, so the number of IO we need to find data for disk will be reduced again, and the efficiency of data query will be faster.

In addition, the order of the B + tree is equal to the number of key values. If one node of our B + tree can store 1000 key values, then the 3-layer B + tree can store 1000 × 1000 × 100 billion data.

Generally, the root node is resident in memory, so generally we need to find 1 billion data and only need 2 disk IO.

② because all the data indexed by the B+ tree is stored in the leaf node, and the data is arranged in order.

Then the B + tree makes range lookup, sorted lookup, grouping lookup, and de-relookup extremely easy. And B-tree because the data is scattered in each node, it is not easy to achieve this.

Interested readers may also find that the pages in the B+ tree above are connected by a two-way linked list, and the data in the leaf node is connected by an one-way linked list.

In fact, we can also add a linked list to each node of the B-tree above. These are not their previous differences, because this is how indexes are stored in MySQL's InnoDB storage engine.

In other words, the Btree index in the figure above is the real implementation of the Btree index in InnoDB, which should be a clustered index (which will be discussed below in clustered and nonclustered indexes).

As can be seen from the above figure, in InnoDB, we can find all the data in the table through a two-way linked list connection between data pages and an one-way linked list connection between the data in the leaf node.

The implementation of the B+ tree index in MyISAM is slightly different from that in InnoDB. In MyISAM, the leaf node of the B+ tree index does not store the data, but the file address where the data is stored.

Clustered index VS nonclustered index

When we introduced the B+ tree index in the previous section, we mentioned that the index in the figure is actually a way to implement a clustered index.

So what is a clustered index? In MySQL, B+ tree indexes are divided into clustered indexes and nonclustered indexes according to the different storage modes.

Here we focus on clustered and nonclustered indexes in InnoDB:

① clustered index (clustered index): in a table with InnoDB as the storage engine, the data in the table will have a primary key, and even if you do not create a primary key, the system will help you create an implicit primary key.

This is because InnoDB stores the data in the B+ tree, and the key value of the B+ tree is the primary key, and all the data in the table is stored in the leaf node of the B+ tree.

This kind of B+ tree index, which takes the primary key as the key value of the B+ tree index, is called the clustered index.

② nonclustered index (nonclustered index): a B + tree index built with column values other than the primary key as key values, which we call nonclustered indexes.

The difference between a nonclustered index and a clustered index is that the leaf node of a nonclustered index does not store the data in the table, but stores the corresponding primary key of the column. If we want to find the data, we also need to look up the data in the clustered index according to the primary key. This process of finding data according to the clustered index is called returning to the table.

Knowing the definition of clustered index and nonclustered index, we should understand this sentence: data is index, index is data.

Find data using clustered and nonclustered indexes

When we explained the B+ tree index, we did not talk about how to look up the data in the B+ tree, mainly because the concept of clustered index and nonclustered index has not been introduced.

Let's introduce the method of finding data by B + tree index by explaining how to find data in a data table through clustered index and nonclustered index.

Using clustered index to find data

It's the same B+ tree index diagram, and we should now know that this is the clustered index in which the data in the table is stored.

Now suppose we want to look for id > = 18 and id=18 and id=18 and id

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.