What are the reasons why mysql index adopts B+ tree structure? 07/09 Update SLTechnology News&Howtos

What are the reasons why mysql index adopts B+ tree structure?

2025-07-09 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article will explain in detail the reasons why the mysql index uses the B+ tree structure. The editor thinks it is very practical, so I share it with you for reference. I hope you can get something after reading this article.

Indexing improves query efficiency, just like the books we read. If we want to turn directly to a certain chapter, we just need to look at the catalog and find the number of pages according to the catalog.

In the computer, we need a data structure to store this directory, the common data structures are hash table, binary search tree, binary balanced tree (AVL), red-black tree, so why Innodb and MyISAM choose b + tree.

1. Hash table

A hash table is an array + linked list, using the subscript 0pm 1 pm 2 pm 3. Represents the location of its data. If you want to store data in a hash table, first use the hashing algorithm for this data (the basic is modular operation). If the length of the array is 13 and the module 13 is followed by 0-12, it happens to be the subscript of the corresponding data. if the calculated subscript is the same, it will follow the linked list at the subscript position.

Disadvantages:

Using hash storage needs to add all the data files to memory, which consumes memory space.

The search of hash is equivalent query, the speed is very fast, but there is no range rule between each data, but in the actual work, it is more range query, hash is not very suitable.

We cannot directly say that mysql does not use hash tables, but should be determined according to the storage engine. Memory storage engine uses hash tables.

two。 Binary search tree

Disadvantages:

As shown in the figure, the tilt problem may occur in extreme cases, and eventually become a linked list structure.

Causes the tree node to be too deep, thus increasing the IO of search, and now IO is the bottleneck of search.

3. Binary balanced tree-AVL

In order to keep the balance of the tree and avoid data skew, it is necessary to rotate the tree. Through left or right rotation, the length of the longest and shortest subtree cannot exceed 1. If it exceeds 1, it is not an AVL tree in the strict sense.

Disadvantages:

1. When the amount of data is very large, in order to keep the balance, it is necessary to rotate for 1mi n times, which is a waste of performance, the efficiency of insertion and deletion is very low, and the query efficiency is very high.

There are only two branches, and when the amount of data is large, the depth of the tree is still very deep.

4. Red and black tree

The longest subtree cannot be more than twice that of the shortest subtree, and there is a balance between insertion and query through discoloration and rotation.

Red-black tree is a variant of avl tree, which loses some query performance to improve insertion performance.

Disadvantages:

Similarly, there are only two branches, and the depth will still be very deep when the amount of data is large.

With the increase of data, the above three kinds of binary trees will eventually have too many nodes, and they have only 2 branches, so the number of IO is the same.

How to solve the problem that there are only two branches and the depth is too deep, so there is a B-tree and add branches.

5. B-Tree

First of all, don't read B minus tree, read B tree.

All the key values are distributed throughout the tree.

The search may end at the non-leaf node and do a search in the full set of keywords, and the performance is close to the binary search.

Each node has at most m subtrees.

The root node has at least 2 subtrees.

The branch node has at least 2 subtrees (all branch nodes except root and leaf nodes).

All leaf nodes are on the same layer, and each node can have up to 1 key in ascending order.

Such as the picture above: (only part of the picture is drawn, in fact, there is no limit, not only p1diary p2recoverp3)

Each node occupies a disk block, and on one node there are two keywords arranged in ascending order and three pointers to the root node of the subtree, which stores the address of the disk block where the child node is located. The three scope fields divided by the two keywords correspond to the scope fields of the data of the subtree pointed to by the three pointers. Taking the root node as an example, the data range of the subtree pointed to by the keyword 16 and 34 p1 pointer is smaller than that pointed to by the 16p2 pointer, and the data range of the subtree pointed to by the 16-34 pointer is greater than 34.

The process of finding keyword 28:

Find disk block 1 according to the root node and read it into memory. [the first disk Ihop O operation]

The comparison keyword 28 is in the interval (161.34), and the pointer p2 of disk block 1 is found.

Find disk block 3 according to the p2 pointer and read to memory. [second disk Ihop O operation]

The comparison keyword 28 is in the interval (251.31), and find the pointer p2 of disk block 3.

Find disk block 8 according to pointer p2 and read to memory. [the third disk Ihop O operation]

Find keyword 28 in the keyword list in disk block 8, and end.

Disadvantages:

Each node has key and contains data, and the storage space of each page is limited, so if the data is very large, it will result in a smaller number of key that each node can store.

When storing a large amount of data, it will lead to a greater depth, increase the number of io queries to the disk, and then affect the query performance.

6. B+ tree

The B + tree is an optimization based on the B tree, and the changes are as follows:

Each node of a B+ tree can contain more nodes for two reasons. The first reason is to reduce the height of the tree, and the second reason is to change the data range into multiple intervals. The more intervals, the faster the data retrieval.

Non-leaf nodes store only key, while leaf nodes store key and data.

The two pointers of the leaf node are connected to each other (in line with the characteristics of disk pre-reading), and the sequential query performance is higher.

Like the figure above: there are two head pointers on the B+ tree, one pointing to the root node and the other to the smallest leaf node of the keyword, and there is a chain ring structure between all leaf nodes (and data nodes). Therefore, there are two kinds of search operations for the B+ tree: one is the range search and paging search for the primary key, and the other is a random search starting from the root node.

Differences in indexes between InnoDB and MyISAM 1. InnoDB- primary key index

The leaf node stores specific row data.

2. InnoDB- non-primary key index

The leaf node of a non-primary key index stores the primary key value (so the query data basically goes back to the table)

3. MyISAM

The leaf node stores the address of the row data, requiring an extra address and an extra IO.

On "what are the reasons for mysql index using B+ tree structure" this article is shared here, I hope the above content can be of some help to you, so that you can learn more knowledge, if you think the article is good, please share it out for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.