Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What is the role of B + tree in database index?

2025-01-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/02 Report--

This article is to share with you about the role of B+ tree in the database index. The editor thinks it is very practical, so I share it with you. I hope you can get something after reading this article. Let's take a look at it with the editor.

1. Review of B-trees and B + trees 1.B-trees

B-tree (Multipath search Tree) is a common data structure. The use of B-tree structure can significantly reduce the intermediate process of locating records, thus speeding up access speed. According to the translation, B is usually regarded as the abbreviation of Balance. This data structure is generally used for the index of the database, and the comprehensive efficiency is high.

Each node of the B-tree stores key and data, all nodes make up the tree, and the leaf node pointer is null.

Characteristics of B-tree:

The root node has at least two children.

Each non-root node has [, M] children.

Each non-root node has [- 1m Mmur1] keywords, which are arranged in ascending order.

The value of the child node between key [I] and key [I + 1] is between key [I] and key [iTun1].

All the leaf nodes are on the same layer.

Advantages of B-tree:

The advantage of B-tree lies in multi-path search, which is the specific reason why it is better than red-black tree. Think about it. B-tree has multiple key for each node, while red-black tree has one key for each node. Then with the increasing data, the height of red-black tree continues to increase and the efficiency decreases, while the height of B-tree is generally very low. Why? Because a node of B tree can put N key,. Only when all of them are full will they split once! Why does the B-tree split? Because with the increase of data, the key of a node is full, in order to maintain the characteristics of the B-tree, it will split, just like the red-black tree and the AVL tree need to rotate in order to maintain the nature of the tree!

2. B + tree

B+ tree is a variant of B-tree and a kind of multipath search tree, and its definition is basically the same as that of B-tree. The leaf node on the B+ tree stores the keyword and the address of the corresponding record, and each layer above the leaf node is used as an index.

B+ tree: only the leaf node stores data, the leaf node contains all the key values of the tree, and the leaf node does not store pointers.

Characteristics of B+ tree:

All keywords appear in the linked list of leaf nodes (dense index), and the keywords in the linked list happen to be ordered

It is impossible to hit at a non-leaf node.

A non-leaf node is equivalent to an index (sparse index) of a leaf node, and a leaf node is a data layer that stores (keywords) data.

More suitable for file indexing system

The difference between B+Tree and B-tree:

The upper limit of the pointer per node is 2d instead of 2d+1

The inner node does not store data, but only key, that is, all keywords appear in the leaf node.

The leaf node does not store pointers, but adds a chain pointer to all leaf nodes.

Advantages of B+ tree:

A sequential access pointer is added to the B+ tree, that is, each leaf node adds a pointer to the adjacent leaf node, so that a tree becomes the preferred data structure for the database system to implement the index. There are many reasons, the most important is that the tree is short and fat, generally speaking, the index is very large, often stored in the form of index files on the disk, the index search produces disk Imax O consumption, which is several orders of magnitude higher than memory access, so the most important index to evaluate the quality of a data structure as an index is the time complexity of the number of disk Icano operations in the search process. The smaller the height of the tree is, the less the number of Ihammer O is. Then why is it a B + tree instead of a B tree? because the nodes in it do not store data, so one node can store more key.

Second, what is the decisive factor of index search speed?

Because most of the data in the database is stored on disk, generally speaking, the index itself is so large that it is impossible to store it all in memory, so the index is often stored on disk in the form of index files. In this way, the disk Imax O consumption will be generated in the index search process, which is several orders of magnitude higher than the memory access. Therefore, the most important index to evaluate the quality of a data structure as an index is the time complexity of the number of disk Imax O operations in the search process. The smaller the height of the tree is, the less the number of Ihammer O is. In other words, the structure of the index should minimize the number of access to disk Imando O during the lookup process.

The height of binary tree is too deep to perform disk IO for many times, resulting in low query efficiency, while each node in B-tree and B + tree contains at most m children, so the height of B-tree and B + tree is lower than that of binary tree, B-tree and B + tree.

Storage engine in MySQL

In MySQL, the two most commonly used storage engines are MyISAM and InnoDB, which are two generations of search engines of MySQL.

They implement the index differently. MyISAM data stores data address, index and data separately. What InnoDB data stores is the data itself, and the index is also data.

Indexes are divided into primary indexes and secondary indexes: generally, those indexed by primary keys are called primary indexes, while those indexed by other keys are called secondary indexes.

4. MyISAM is realized by using B+ tree.

Main index:

As you can see from the figure above, the col1 is the primary key, while the data stored by the leaf node is an address, which is found by the address.

Secondary index (unlike the primary index, the key of the secondary index is repeatable):

5. InnoDB is realized by using B+ tree.

Main index:

Note that, unlike MyISAM, the data field of the leaf node holds all the data.

Secondary index:

If you look carefully at the difference between the secondary index and the primary index, the leaf node of the secondary index holds the primary key; this is the biggest difference between MyISAM and InnoDB.

6. Where is InnoDB better than MyISAM?

Since MyISAM and InnoDB are two-generation engines of MySQL, there is bound to be an upgrade, and InnoDB is the latest generation, so what's its advantage?

Just imagine, both MyISAM and InnoDB are implemented on the basis of the B + tree, and the difference from the B tree has already been mentioned, that is, the separation of data domains and nodes.

On the other hand, MyISAM separates the index from the file. The data field of the leaf node of the B + tree stores the address of the file content, and the Btree of both the primary index and the secondary index is the same, so if I change an address, will all index trees have to be changed? as we said earlier, frequent read and write operations on disk are very inefficient, and this block does not apply the local principle, because logically adjacent nodes It is not necessarily physically adjacent, so it will lead to a reduction in efficiency.

As a result, InnoDB came into being, which allows the data fields of the leaf nodes of the secondary index except the primary index to keep the primary key, first find the primary key through the secondary index, and then find all the data of the leaf node through the primary key, which sounds like a lot of trouble, traversing two trees, but in this way, if there is a change, only the primary index will be changed, and the other auxiliary indentation will not be moved. The key of each node of the tree in the database is not as small as we give. Imagine that if a node has 1024 key, then the B + tree with a height of 2 has 1024 key, so the height of the general tree is very low, so the consumption of traversing the tree is almost negligible!

7. Summary 1. Why use B+ trees?

The files are too large to be stored in memory, so they have to be stored on disk.

The structure and organization of the index should minimize the number of access to disk Imax O during the search process (why you use B-/+Tree is also related to the principle of disk access, see the following analysis)

Locality principle and disk pre-reading, the length of pre-reading is generally an integral multiple of the page (page size is usually 4k in many operating systems)

The database system skillfully makes use of the principle of disk pre-reading, setting the size of a node to be equal to a page, so that each node can be fully loaded with only one Ibank O (because there are two arrays in the node, so the address is continuous). On the other hand, the structure of the red-black tree is obviously much deeper. Because logically close nodes (father and son) may be physically far away, locality cannot be taken advantage of.

two。 Why is B + tree more suitable for indexing than B tree?

The disk read and write cost of B+ tree is lower.

The internal node of B+ has no pointer to the specific information of the keyword, that is, the internal node does not store data. Therefore, its internal node is smaller than that of B-tree. If all the keywords of the same internal node are stored in the same disk, the more keywords the disk can hold. The more keywords you need to find when you read them into memory at once. Relatively speaking, the number of IO reads and writes has been reduced.

The query efficiency of bounded-tree is more stable.

Because the non-endpoint is not the node that ultimately points to the content of the file, but only the index of the keyword in the leaf node. Therefore, any keyword search must take a road from the root node to the leaf node. The path length of all keyword queries is the same, resulting in the same query efficiency for each data.

The difference between MyISAM and InnoDB in the two indexes of 3.MySQL

MyISAM is non-transactional secure, while InnoDB is transactional secure.

The granularity of MyISAM locks is table-level, while InnoDB supports row-level locks

MyISAM supports full-text indexing, while InnoDB does not support full-text indexing

MyISAM is relatively simple and superior to InnoDB in efficiency. Small applications can consider using MyISAM.

The MyISAM table is saved as a file, which is more convenient to use across platforms.

MyISAM manages non-transactional tables, provides high-speed storage and retrieval, as well as full-text search capabilities, and can be selected if a large number of select operations are performed in the application

InnoDB is used for transaction processing and has features such as ACID transaction support. If you perform a large number of insert and update operations in an application, you can choose.

The above is the role of B+ tree in the database index. The editor believes that there are some knowledge points that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report