How to understand B-Tree 07/02 Update SLTechnology News&Howtos

How to understand B-Tree

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/01 Report--

Today, I would like to talk to you about how to understand B-Tree. Many people may not know much about it. In order to make you understand better, the editor has summarized the following contents for you. I hope you can get something from this article.

B-Tree is what we often call B-tree, must not be read as B minus tree, otherwise it will be very humiliating. The data structure of B-tree is often used to implement database index because of its high search efficiency.

Disk IO and pre-read

Disk reading depends on mechanical motion, which is divided into three parts: seek time, rotation delay and transmission time. The sum of these three parts is the time of a disk IO, about 9ms. This cost is about 100, 000 times that of accessing memory; it is precisely because disk IO is a very expensive operation that the computer operating system optimizes it: read ahead; with each IO, not only the data from the current disk address is loaded into memory, but also adjacent data is loaded into the memory buffer.

Because the principle of local pre-reading states that when you access an address data, the data adjacent to it will soon be accessed. Each time the disk IO reads data, we call it a page. The size of a page is related to the operating system, usually 4k or 8k. This means that when reading data within a page, a disk IO actually occurs.

Comparison between B-Tree and binary search Tree

We know that the time complexity of binary search tree query is O (logN), the fastest search speed and the least number of comparisons. Since the performance is already so good, why the index is implemented using B-Tree instead of binary search tree is the key factor is the number of disk IO.

The database index is stored on disk, and when the amount of data in the table is relatively large, the size of the index increases to a few gigabytes or more. When we use the index to query, it is impossible to load all the indexes into memory, we can only load each disk page one by one, where the disk page corresponds to the node of the index tree.

Unary and binary tree

Let's first look at the times of disk IO in binary tree lookups: define a binary tree with a height of 4, with a lookup value of 10:

First disk IO:

Second disk IO

Third disk IO:

Fourth disk IO:

From the look-up process of the binary tree, both the height of the tree and the number of disk IO are 4, so in the worst case the number of disk IO is determined by the height of the tree.

From the previous analysis, to reduce the number of disk IO, we must compress the height of the tree and let the thin tree become a chunky tree as much as possible, so B-Tree was born under such a great background.

II. B-Tree

The m-order B-Tree satisfies the following conditions:

1. Each node has a maximum of m subtrees

2. The root node has at least 2 subtrees

3. The branch node has at least 2 subtrees (except the root node and the leaf node).

4. All leaf nodes are on the same layer, and each node can have at most 1 key in ascending order.

Here is a third-order B-tree to observe the process of finding element 21:

First disk IO:

Second disk IO:

Here's a memory comparison: compare with 3 and 12, respectively.

Third disk IO:

Here's a memory comparison, with 14 and 21, respectively.

From the search process, it is found that the number of comparisons of B-tree and the number of disk IO are not much different from that of binary tree, so it does not seem to have any advantage.

But if you take a closer look, you will find that the alignment is done in memory, does not involve the disk IO, and the time consuming is negligible. In addition, species B can store a lot of key in one node (the number is determined by the tree order).

The number of nodes generated by the same number of key in the B-tree is much less than that in the binary tree, and the difference in the number of nodes is equal to the number of disk IO. After reaching a certain number, the difference in performance becomes apparent.

Third, the addition of B-tree

Add element 4 to the previous one, which should be between 3 and 9:

IV. Deletion of B-tree

Delete element 9:

Inserting or deleting elements will cause nodes to fission, which is sometimes very troublesome, but it is precisely because of this that B-tree can always maintain a multi-path balance, which is also an advantage of B-tree itself: self-balancing; B-tree is mainly used in the file system and some database indexes, such as MongoDB, most relational database indexes are implemented using B+ tree.

After reading the above, do you have any further understanding of how to understand B-Tree? If you want to know more knowledge or related content, please follow the industry information channel, thank you for your support.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.