Definition and usage of B-tree in java 07/02 Update SLTechnology News&Howtos

Definition and usage of B-tree in java

2025-07-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/02 Report--

This article mainly introduces the definition and usage of B tree in java. In daily operation, I believe many people have doubts about the definition and usage of B tree in java. Xiaobian consulted all kinds of information and sorted out simple and easy to use operation methods. I hope to help you answer the doubts about the definition and usage of B tree in java! Next, please follow the small series to learn together!

B-trees and B+-trees

The development speed of computer is very fast, CPU, memory, graphics card and so on are no longer the bottleneck of computer performance, the emergence of SSD hard disk also makes the speed of hard disk reading and writing has a qualitative leap, but there is still a great gap compared with memory, which means that we design algorithms in memory environment, when it comes to hard disk reading and writing efficiency will be greatly reduced. For example, red-black trees, AVL trees, etc., because each node can only store one data, and each node has at most two child nodes, which means that when there is a lot of data, the height of the tree will be very large, which means that IO operations should be frequently performed. Even if it is an ordinary tree, each node can have multiple children, and it is either very large, or extremely high, or both, and it cannot get rid of the performance bottleneck caused by frequent IO operations. The B-trees and B+-trees we're going to look at today are solutions to this frequent IO scenario.

B-tree definition

We first need to understand the concept of multi-way search tree, which is defined as follows:

A multi-way search tree can have more than two children per node, and multiple elements can be stored at each node.

Compared with ordinary trees, a node of a multi-way lookup tree can no longer store only one element, which breaks our understanding of trees, but it is this feature that makes it excellent for solving IO problems. The B-tree we are studying is a multi-path search tree, which is defined as follows:

A B-tree is a balanced multi-path search tree, and the maximum number of children in a node is called the order of the B-tree.

A B-tree of order m has the following properties:

If the root node is not a leaf node, it has at least two subtrees.

Each non-root branching node has k − 1 elements and k children, where [m/2]≤ k ≤ m. Each leaf node n has k-1 elements, where [m/2]≤ k ≤ m.

All leaf nodes are at the same level.

All branch nodes contain the following information data ( n, A0, K1, A1, K2, A2,..., Kn, An ), where Ki( i=1, 2,..., n ) is a keyword, and Ki

< Ki+1( i=1, 2, …, n-1 ); Ai ( i=0, 2, …, n) 为指向子树根结点的指针，且指针Ai-1所指子树中所有结点的关键字均小于Ki( i=1, 2, …, n ) ，An 所指子树中所有结点的关键字均大于Kn，n ( [m/2]-1 ≤ n ≤ m-1 ) 为关键字的个数(或 n+1 为子树的个数)。这段定义一定让人感到费解吧，那我们就从B树的一个特例：2-3树作为切入点，来看看一个B树是如何构建和操作的。 2-3树是这样的一棵多路查找树：其中的每一个结点都具有两个孩子（称为2结点）或三个孩子（称为3结点）。它拥有如下属性：一个2结点包含一个元素和两个孩子（或没有孩子），和二叉排序树一致，左子树包含的元素小于该元素，右子树包含的元素大于该元素。但是这个2结点要么有两个孩子，要么没有孩子，不能只有一个孩子。一个3结点包含两个元素和三个孩子（或没有孩子），左子树、较小元素、中间子树、较大元素和右子树也按照从小到大排序。一个3结点要么有三个孩子，要么没有孩子。 2-3树的所有叶子结点都在同一层次上。按照这个描述，一棵正确的2-3树大概长这个样子：

inserted

Let's demonstrate this process by constructing a 2-3 tree, assuming that the initial data is {1, 7, 4, 9, 15, 13, 6, 5, 8, 10, 3, 12, 14, 2, 11}. Now that the tree is empty, to insert a 1 into it, you just need to construct a 2 node, as follows:

Next insert element 7, simply upgrade the current node to 3 nodes, as follows:

Next insert 4, you can find that the root node is already 3 nodes, because it must be balanced, so you can only split the root node into 3 2 nodes, as follows:

When inserting 9, because 9 is larger than 4, it is inserted to the right, and the node where 7 is located can be upgraded to 3 nodes, so the insertion result is as follows:

Next, we need to insert 15, because the node 9 is already a 3 node, but its parent node 4 is a 2 node, so we can upgrade the node 4, because the 3 node must have three children, so the nodes 7 and 9 need to be split, as follows:

When 13 and 6 are inserted next, the corresponding nodes can be upgraded, so the insertion results are as follows:

When inserting element 5, we find that the node where element 6 is located is already 3 nodes, and the parent node, that is, the root node, is also 3 nodes. At this time, we can only split it again. First of all, the number between 5, 6, and 7 is 6, and we propose that it should be located between 4 and 9, as follows:

Because a 3-node can only have two elements, the root node must also be split, resulting in the following:

It can be seen that splitting the root node increases the height of the tree. Then insert 8, 10, 3 and repeat the steps as follows:

At this point, inserting elements 12, 14, and 2 is also very simple, and the result is as follows:

Finally insert 11, you can find that it is between 10 and 12, and the parent node is also 3 nodes, so 10 and 12 should be split, 9 and 13 should also be split, 11 should be upgraded to 3 nodes together with 6, the result is as follows:

delete

Now that we have built a 2-3 tree, let's demonstrate the deletion process in order of insertion. First delete element 1, because 1 is a 2 node, deleting it will affect the balance, but we find that its parent node is a 3 node, so we can split the parent node, 2 and 3 merged into a 3 node, the result is as follows:

Now, to delete 7, because 7 is a leaf node and also a 3 node, you can delete it directly. The result is as follows:

Delete node 4, because its left child is node 3, as long as it can be disassembled, the result is as follows:

Deleting 9 is more complicated, because its left and right children are 2 nodes, first merge its two children into 3 nodes and replace it, the result is as follows:

At this point the tree is unbalanced, and it is found that 3 and 6 on the left can be merged into 3 nodes, with the following results:

Then delete 15, delete it directly, and the result is as follows:

Deleting 13 is also more complicated. First, you need to merge its two children, and then use 11 as the root node to do a right-handed operation. The specific method is that the right child of 6 becomes the left child of 11, and then 6 becomes the parent node of 11. This is consistent with the right-handed operation of AVL trees. The result is as follows:

The next element to be deleted is the root node. The method is to first find its predecessor (the first element smaller than it) 5 to replace it. At this time, nodes 2 and 3 need to be merged. After merging, the left and right subtrees are no longer balanced, so 5 and 11 need to be merged. The result is as follows:

The rest of the deletion operations are actually similar to the previous ones. Here is no longer a demonstration. Interested parties can try it themselves and soon discover the rules.

summary

The 2-3 tree introduced here is a special case of B-tree, which extends the order of 2-3 tree to m, and its every node has the same characteristics as 2-3 tree. The pointer field and data field of every node except leaf node must be filled. So how does B-tree solve IO access problems? Suppose we have a B-tree of order 1001, that is, each node can store 1000 data and 1001 pointers, then at the level of height 2, the data that can be stored is 1001X1000, and the number of pointers it is 1001X1001. These pointers can point to 1001X1000 data, which is about 1 billion pieces of data. This means that as long as we keep the root node in memory, accessing the billion pieces of data requires at most two IO operations, which is unmatched by other structures.

What is the problem with B-trees? In actual use, the order is usually matched to the disk page size, that is, one page of data is read at a time, because the disk reads very fast continuously within the page, but relatively slowly between pages. That is its strength and precisely its problem. Assuming that each node is on a different page, we want to traverse it in order, which is roughly as follows:

The sequence is page 2-> page 1-> page 3-> page 1-> page 4. It can be seen that the node at page 1 will be visited many times, and the elements at this node will also be traversed many times, which will become inefficient, so the B-tree is not friendly to traversal. The B+ tree introduced next is an optimization for this problem.

B+ tree

The requirements for traversal mainly come from "scanning the library." For example, websites are flooded with various lists. If you use B-tree traversal, the efficiency is too low. The B+ tree is an improvement on the B tree, in which elements appearing in a branch node are listed again as if they were the mid-order successors (leaf nodes) of that branch node location, and each leaf node holds a pointer to the next leaf node. Here is a B+ tree:

For simplicity, the pointer fields on the left and right sides of leaf nodes are omitted. It is characterized by the fact that any non-leaf node appears again on a leaf node, and all leaf nodes are linked from left to right. In general, it also has the characteristics of a B-tree, but differs in two ways. The first is to find the element, even if the target value is found in the non-leaf node, it is only used for indexing, and it needs to continue to find its position in the leaf node. The second is that if you want to traverse, you only need to traverse the leaf node once. B+ tree structure is also very suitable for range search, just need to find the minimum value of the range location, and then traverse along the linked list.

Comparison of B tree and B+ tree

Both B-trees and B+-trees are disk-friendly data structures that significantly reduce disk access times. The advantage of B-trees is that data is stored in each node and can be accessed faster without having to go to leaf nodes. B-trees are more used in file systems. Each non-leaf node of the B+ tree only acts as an index, so the query must end at the leaf node, but it is very suitable for "sweeping the library" and interval search, and because most nodes are only used for indexes, they will not store real data, and will be more compact in memory. For example, Pinyin and Chinese characters in a dictionary are separated, and only dozens of pages are needed to obtain a complete Pinyin table, but if Pinyin and Chinese characters are mixed together, the entire dictionary is needed to obtain a complete index (Pinyin) table. These characteristics of the B+ tree make it more suitable for indexing databases.

At this point, the study of "the definition and usage of B tree in java" is over, hoping to solve everyone's doubts. Theory and practice can better match to help everyone learn, go and try it! If you want to continue learning more relevant knowledge, please continue to pay attention to the website, Xiaobian will continue to strive to bring more practical articles for everyone!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.