How to use B+ Tree in MySQL 07/15 Update SLTechnology News&Howtos

How to use B+ Tree in MySQL

2025-07-15 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/02 Report--

This article introduces how to use B+ tree in MySQL, the content is very detailed, interested friends can refer to, hope to be helpful to you.

First of all, it needs to be clarified that MySQL is not directly related to the B+ tree. What is really related to the B+ tree is that the storage engine in InnoDB,MySQL, the default storage engine of MySQL, is responsible for data storage and extraction. In addition to InnoDB, MyISAM is also supported as the underlying storage engine for tables in MySQL.

We can specify the storage engine for the current table when we use the SQL statement to create the table. You can find all the storage engines supported by it in MySQL's document Alternative Storage Engines, such as MyISAM, CSV, MEMORY, etc. However, by default, using the SQL statement shown below to create a table will get the table supported by the InnoDB storage engine:

CREATE TABLE T1 (

An INT

B CHAR (20

), PRIMARY KEY (a)) ENGINE=InnoDB

Readers who want to learn more about the MySQL default storage engine can learn about InnoDB storage methods, indexes, and locks through the previous articles "simple and simple" MySQL and InnoDB. We will not cover too much about InnoDB here.

In fact, the question we are going to analyze today is why InnoDB, MySQL's default storage engine, uses MySQL to store data. I believe people who know a little bit about MySQL know that both the data in the table (primary key index) and the secondary index will eventually use the B+ tree to store the data, in which the former will be stored in the table, while the latter will be stored in the same way, which is easier to understand:

In the primary key index, id is the primary key, and we can find all the columns of the row through id.

In the secondary index, several columns in the index form the key. We can find the id through the columns in the index, and if necessary, we can find the entire contents of the current data row through id.

For InnoDB, all data is stored as key-value pairs. Primary and secondary indexes store data with id and index as keys, and all columns and id as key values.

Before specifically analyzing the reasons behind InnoDB's use of B+ tree, we need to find several "imaginary enemies" for B+ tree, because if we have only one choice, then choosing B+ tree is not worth discussing. The two imaginary enemies found are B-tree and hash. I believe this is also the real problem that many people will encounter in the interview. We take these two data structures as examples to analyze and compare the advantages of B+ tree.

Design

At this point, we have identified the issue to be discussed today, that is, why MySQL's InnoDB storage engine chooses B + tree as the underlying data structure instead of B tree or hash? In this section, we will introduce the reasons for InnoDB's choice in the following two aspects.

Scenarios and functions that InnoDB needs to support require strong performance on specific queries

It takes a lot of time for CPU to load data from disk into memory, which makes B+ tree a very good choice.

Data persistence and query of persistent data is actually a common requirement, and data persistence requires us to deal with disk, memory and CPU. As the database of OLTP, MySQL needs not only transaction processing ability, but also data persistence and real-time data query ability. These requirements together determine the choice of B+ tree. Next, we will analyze the logic behind the above two reasons in detail.

Read and write performance

Many people may not know the word OLTP very well. We help you quickly understand that compared with OLTP, there are OLAP, which are Online Transaction Processing and Online Analytical Processing. From these two names, we can see that the former refers to the traditional relational database, which is mainly used to handle basic, day-to-day transactions, while the latter is mainly used in data warehouses to support some complex analysis and decision-making.

As the storage engine that supports OLTP databases, we often use InnoDB to do the following:

Add, modify and delete data in the table through INSERT, UPDATE and DELETE statements

Batch deletion of qualified data through UPDATE and DELETE statements

Query all columns of a record through SELECT statements and primary keys

Use the SELECT statement to query the table for records that meet certain criteria and sort them according to certain fields

Query the number of rows of data in the table through the SELECT statement

Guarantee the uniqueness of a field or fields in a table through a unique index

If we use the B+ tree as the underlying data structure, then the time complexity of all SQL that can only access or modify one piece of data is O (log n), that is, the height of the tree, but using hashing is possible to achieve O (1) time complexity, doesn't it look particularly good? But when we use the SQL shown below, the hash will not perform so well:

SELECT * FROM posts WHERE author = 'draven' ORDER BY created_at DESC

SELECT * FROM posts WHERE comments_count > 10

UPDATE posts SET github = 'github.com/draveness' WHERE author =' draven'

DELETE FROM posts WHERE author = 'draven'

If we use hash as the underlying data structure, when we encounter the above scenario, the primary key index or secondary index composed of hash may not be able to deal with it quickly, and its performance for processing range queries or sorting will be very poor. We can only scan the full table and determine whether the conditions are met in turn. A full table scan is a very bad result for the database, which means that the data structure we use has no other effect on these queries, and the final performance may not be as good as matching sequentially from the log.

The use of B+ tree actually ensures that the data is stored in the order of keys, that is, all the adjacent data are actually arranged in a natural order, but this effect cannot be achieved by using hash, because the purpose of the hash function is to make the data as scattered as possible in different buckets for storage, so when there may be the same key author = 'draven or sort and range query comments_count > 10:00 Tables with hashes as the underlying data structure may face the nightmare of database queries-full table scans.

B-tree and B + tree actually have some similarities in data structure, they can traverse the contents of the index in a certain order, for operations such as sorting and range query, B-tree and B + tree will bring better performance than hashing, of course, if the index is not good enough or the SQL query is very complex, it will still lead to full table scan.

Compared with B tree and B + tree, the hash table as the underlying data structure can handle the addition, deletion, modification and query of a single data row at the speed of O (1), but the range query or sorting will lead to the result of full table scan. Although B tree and B + tree need O (log n) time in the addition, deletion, query and modification of single data rows, it will store the data similar to the index column in order, so it can avoid full table scanning.

Data loading

Since hashing cannot handle operations such as sorting and range queries in our common SQL, and B-tree and B-tree and B +-tree can execute these queries relatively efficiently, why not choose B-tree? The reason is actually very simple-the computer loads data into memory on a page-by-page basis when reading and writing files. The page size may vary depending on the operating system, but in most operating systems, the page size is 4KB, and you can get the page size on the operating system by using the following command:

$getconf PAGE_SIZE

4096

The page size of the author's macOS system is 4KB, and of course it is possible to get different results on different computers.

When we need to query the data in the database, CPU will find that the current data is on disk rather than in memory, which will trigger the Ibind O operation to load the data into memory for access. The data is loaded in the dimension of the page. However, the cost of reading the data from disk to memory is very high. Normal disk (non-SSD) loads data through queuing, seeking, rotation, and transfer, which takes about 10ms time.

When we estimate the query of MySQL, we can use the order of magnitude of 10ms to estimate the time occupied by random Icano. What we want to say here is that the performance of random Icano will have a great impact on the query performance of MySQL, and the speed of sequentially reading data from disk can reach 40MB/s. The performance difference between the two is of several orders of magnitude, so we should try our best to reduce the number of random Imax O in order to improve performance.

The biggest difference between B-tree and B + tree is that B-tree can store data in non-leaf nodes, but all the data of B + tree is actually stored in leaf nodes. When the data structure at the bottom of a table is B-tree, suppose we need to access all "data greater than 4 and less than 9":

Without considering any optimization, in the simple B-tree above, we need to do 4 random disk I-O to find all the data rows that meet the criteria:

Loading the page where the root node is located, it is found that the first element of the root node is 6, which is greater than 4

Load the page where the left child node is located through the pointer of the root node, traverse the data in the page, and find 5

Reload the page where the root node is located and find that the root node does not contain the second element

Load the page where the right child node is located through the pointer of the root node, and traverse the data in the page to find 7 and 8

Of course, we can optimize the above process in a variety of ways, but B-tree can optimize B + tree basically, so we don't need to consider the benefits of optimizing B-tree. Let's just see what kind of optimized B + tree can do, but B-tree can't.

Because all nodes may contain target data, we always have to traverse the subtree from the root node to find the data rows that meet the conditions. This feature brings a large number of random I-map O, which is also the biggest performance problem of B-tree.

This problem does not exist in the B+ tree, because all the data rows are stored in the leaf nodes, and these leaf nodes can be connected sequentially through the "pointer". When we traverse the data in the B+ tree shown below, we can jump directly between multiple child nodes, which can save a lot of disk Ihand O time, and there is no need to splice and sort the data between nodes at different levels. Through the leftmost leaf node of a B+ tree, we can traverse all the data in the whole tree like a linked list, or we can introduce a two-way linked list to ensure the performance of reverse traversal.

Some readers may think that using the data structure of B+ tree will increase the height of the tree and increase the overall time-consuming. However, a B+ tree with a height of 3 can store tens of millions of levels of data. In practice, the height of B+ tree is only 4 or 5 at most, so this is not a fundamental problem affecting performance.

On how to use the B+ tree in MySQL to share here, I hope the above content can be of some help to you, can learn more knowledge. If you think the article is good, you can share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.