Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to understand the B + tree of database

2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "how to understand the B+ tree of the database". In the operation process of the actual case, many people will encounter such difficulties. Next, let Xiaobian lead you to learn how to deal with these situations! I hope you can read carefully and learn something!

1 What are the differences between reading and writing data from disk and memory

We usually have access to mechanical hard drives and solid state drives. Memory belongs to semiconductor devices, for memory, we know the memory address can get data through the address, that is, the random access characteristics of memory. Access speed is fast but expensive, so memory space is generally small.

For magnetic disks, it belongs to mechanical devices. Whenever the disk accesses data, it needs to wait for the disk to rotate to the head before reading the corresponding data. Even if the disk rotates very fast, it is still slag compared with random access of memory. See, if it is random read and write, its performance gap is very large. If it is sequential access to a large amount of data, the performance of the disk and memory gap is not big, why is this?

The minimum read and write unit of a disk is a sector, and now the disk sector is generally 4k bytes. For the operating system, multiple sectors will be read at a time, and the minimum read unit of the operating system is a block. Every time we read a piece of data from disk, the operating system reads the entire block at once, so disk efficiency is much higher for a large number of sequential reads and writes than random reads and writes.

Suppose you now have an ordered array, all stored in blocks on disk, and now we find element A by binary search. First we find the middle element, pull it out of the block, put it from disk into memory, and then do a binary search in memory. In the next step, if the element we are looking for is in another block, we need to continue reading from disk into memory. This repeated from disk to memory, its efficiency will be very low. So we need to figure out how to keep disk access as low as possible.

2 Data and index separation

Take the police system as an example. There are a lot of users in the system. In addition to basic information such as name and age, each user also has a unique ID. We can know the corresponding basic information when we get this ID. However, the basic information of each user is too much to be stored in memory, so consider storing it on disk.

user data

Using an ordered array, where the user ID and the location of the disk where the user information is stored are stored, so I only need to store two elements directly in memory. as shown in the following figure

sorted arrays

But in scenarios where the data changes frequently, the drawbacks of ordered arrays arise. In most cases, consider binary search trees or hash tables. However, hash tables do not support interval queries, so binary search trees are used more often. as shown in the following figure

Insert picture description here

If there are too many indexes and they still cannot be stored completely in memory, can we consider storing the index on disk as well? How to efficiently organize index structure on disk? This introduces the B+ tree.

2 B+ trees

Make node size equal to block size

We know that when the operating system accesses the disk, it usually reads in blocks. If the data you currently need to read is only a few bytes, but the disk will still read the entire block, is it inefficient to read and write? In the B+ tree, the big boss uses to make the size of a node equal to the size of a block. The node stores not an element, but an ordered array. This makes full use of the routine of the operating system to maximize the reading efficiency.

Internal nodes and leaf nodes

Although internal nodes and leaf nodes have the same structure, they store different contents. The inner node stores keys and pointers that maintain the tree structure, and it does not store the data corresponding to the key. Leaf nodes store keys and corresponding data, but do not store pointers that maintain the tree structure, thus maximizing the use of node space.

Internal nodes and leaf nodes

B+ tree uses double linked list, which has good range query ability and flexible adjustment ability.

In conclusion, B+ tree is a perfectly balanced tree of order m.

multitree of order m

3 Retrieval scheme of B+ tree

Just now I bragged about how awesome the B+ tree was. How did I retrieve it? The specific search process is like this: we first determine which two adjacent elements of the array we are looking for are located between, and then we read the pointer corresponding to the first element to obtain the position of the next block. After reading the node data of the next block, we will do the same to it. In this way, the B+ tree accesses the internal nodes layer by layer until the leaf nodes are read. For the array in the leaf node, we can determine whether the element we are looking for exists by directly using the binary search algorithm. If it exists, we can get the stored data corresponding to the query value. If this data is a pointer to the location of the details, then we need to access the disk again to read the details.

A B+ tree is an m-th order multitree, so a node in a B+ tree can store an array of m elements, ok, so that only a few layers of b+ trees can index a large number of data. For example, a 2k node can store 200 elements, so a four-level B+ tree can store 200^4, or 1.6 billion elements.

If there are only four layers, it means that we can access the disk at most 4 times. Assuming that each node is currently 2k, then the first layer has one node, which is 2k, and the second layer has a maximum of 200 elements, which is 0.8M. The third layer is 200^2, which is 40000 nodes, totaling 80M. For the current computer, we can store the first three layers in memory, only need to store the fourth layer in disk, so we only need to deal with the disk once to break up, that is, the interview wants to know why it is divided into internal nodes and leaf nodes.

4 How B+ trees adjust dynamically

The above introduces the structure and query principle of B+ tree. Now let's see how B+ tree is added and deleted.

Now let's take a B+ tree with three elements as an example. Suppose we want to insert an element with ID 6=5. The first step is to find the corresponding leaf node. If the leaf node is not full, insert it directly.

Insert element 6

What if the element we inserted was 10? Logically we should insert after 9, but the node is full, so we need to do something else. The method is to split this leaf node, that is, generate a new node, and then divide the data equally between the two nodes.

node splitting

Obviously, the splitting of the leaf node affects the parent node. If the parent node is also full, it will also split.

node splitting

"How to understand the database B+ tree" content is introduced here, thank you for reading. If you want to know more about industry-related knowledge, you can pay attention to the website. Xiaobian will output more high-quality practical articles for everyone!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report