Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use B+Tree in MySQL

2025-01-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article shows you how to use B+Tree in MySQL. The content is concise and easy to understand. It will definitely brighten your eyes. I hope you can get something through the detailed introduction of this article.

B + tree

B + tree is actually an m-ary balanced search tree (not a binary tree).

Balanced search tree definition: the height difference between the left and right subtrees of any node in the tree cannot be greater than 1

/ * this is the definition of a non-leaf node in a B+ tree. * * suppose keywords= [3 INF,3 5, 8 score 10] * 4 key values divide the data into 5 intervals: (- INF,3), [3 force 5), [5 line 8), [8 line 10), [10 line INF) * 5 intervals correspond to: children [0]... children [4] * * m is calculated beforehand. The calculation is based on the fact that the size of all the information is exactly equal to the size of the page: * PAGE_SIZE = (PAGE_SIZE 1) * 4 [keywordss size] + MF8 [children size] * / public class BPlusTreeNode {/ / 5 tree public static int m = 5 / / key value, used to divide the data interval public int [] keywords = new int [m-1]; / / save the child node pointer public BPlusTreeNode [] children = new BPlusTreeNode [m];} / * * this is the definition of the leaf node in the B+ tree. * the leaf node in the B+ tree is different from the internal node. * the leaf node stores values, not intervals. * in this definition, each leaf node stores the key value and address information of three data rows. * * the k value is calculated in advance on the basis that the size of all the information is exactly equal to the size of the page: * PAGE_SIZE = Know4 [keyw.. Size] + KFL8 [dataAd.. Size] + 8 [prev size] + 8 [next size] * / public class BPlusTreeLeafNode {public static int k = 3; / / the key value of the data public int [] keywords = new int [k]; / / data address public long [] dataAddress = new long [k]; / / the precursor node of this node in the linked list public BPlusTreeLeafNode prev; / / the successor node of this node in the linked list public BPlusTreeLeafNode next;}

In a B+ tree, the nodes in the tree do not store the data itself, but only as indexes. In addition, all recorded nodes are stored in the same layer of leaf nodes in order of size, and each leaf node is connected by a pointer.

To sum up, the B + tree has the following characteristics

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

Each node of a B + tree can contain more nodes for two reasons, one of which is to reduce the height of the tree (indexes are not all stored in memory, which may not be supported in memory, so the index tree is generally stored on disk. Just put the root node in memory. In this way, access to each node is actually access to the disk, and the height of the tree is equal to the number of disk IO operations each time the data is queried.), the other is to change the data range to multiple intervals. The larger the interval, the faster the data retrieval (you can imagine a jump table)

Instead of storing one key, each node stores multiple key

Leaf nodes store data, while other nodes are used for indexing

Leaf nodes are linked to each other by two pointers, resulting in better sequential query performance.

This design also has the following advantages:

The non-leaf nodes of the B + tree only store keys and take up very little space, so the range of data that each layer of the node can index is much wider. In other words, you can search for more data for each IO operation

The leaf nodes are connected in pairs, which accords with the pre-read characteristics of the disk. For example, leaf nodes store 50 and 55, which have pointers to leaf nodes 60 and 62. When we read the data corresponding to 50 and 55 from the disk, we will mention 60 and 62 by the way because of the pre-read nature of the disk. Read out the corresponding data. This is a sequential read, not a disk search, which speeds up the speed.

Support range query, local scope query is very efficient, each node can index larger and more accurate range, which means that B + tree single disk IO information is larger than B tree, and I / O efficiency is higher.

Because the data is stored in the leaf node layer and has pointers to other leaf nodes, the range query only needs to traverse the leaf node layer without traversing the entire tree.

Because of the gap between disk access speed and memory, disk I / O should be minimized in order to improve efficiency. Disks are usually not strictly read on demand, but are pre-read every time. After the disk reads the required data, it will read back a certain length of data in memory. The theoretical basis for this is the well-known local principle in computer science:

B-Tree

B-Tree is actually an m-ary balanced search tree.

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

All the key values are distributed throughout the tree.

All key values appear in one node

The search can end at non-leaf nodes.

In the complete keyword search process, the performance is close to binary search.

The difference between B-tree and B + tree

Hongmeng official Strategic Cooperation to build HarmonyOS Technology Community

The non-leaf nodes in the B + tree do not store data, and all the data stored in the leaf nodes make the query time complexity fixed as log n.

The complexity of B-tree query time is not fixed, it is related to the position of the key in the tree, preferably O (1).

Because the leaf nodes of the B + tree are linked through a two-way linked list, the range query is supported and the efficiency is higher than that of the B tree.

The keys and data of each node of the B-tree are the same.

Why does MongoDB use BMurTreeMagneMysql and B+Tree?

The non-leaf nodes in the B + tree do not store data, and all the data stored in the leaf nodes make the query time complexity fixed as log n. The time complexity of B-tree query is not fixed, it is related to the position of the key in the tree, preferably O (1).

As we have said, having as few disks as possible IO is an effective way to improve performance. MongoDB is an aggregate database, while B-tree happens to be a cluster of key domains and data domains.

As to why MongoDB uses a B-tree instead of a B + tree, consider it from the perspective of its design. MongoDB is not a traditional relational database, but a nosql stored in BSON format (which can be thought of as JSON). The goal is high performance, high availability and easy scalability.

Mysql is a relational database, the most commonly used is data traversal operation (join), while MongoDB its data is more aggregated data, the relationship between tables is not as strong as Mysql, so MongoDB is more of a single query.

Because Mysql uses the B + tree, the data is on the leaf node, and the leaf nodes are connected by a two-way linked list, which is more convenient for data traversal, while MongoDB uses the B tree, and all nodes have a data field. As long as the specified index is found, it can be accessed. There is no doubt that the average query speed of a single query MongoDB is faster than Mysql.

Hash index

In short, hash indexing uses some kind of hash algorithm to convert key values into new hash values. There is no need to search step by step from the root node to the leaf node like the B + tree. Only need a hash algorithm, you can find the corresponding location immediately, the speed is very fast. Here you can think of HashMap in Java.

The difference between B+ Tree Index and Hash Index

1. If it is an equivalent query, the hash index obviously has an absolute advantage, because only one algorithm is needed to find the corresponding key value; of course, the premise is that the key value is unique, and if there is a hash conflict, the linked list must be traversed.

Hash indexing does not support range queries (but after modification, LinkedHashMap in Java saves the insertion order of nodes through linked lists, so you can also use linked lists to save the size order of the data)

two。 Although this supports range query, the time complexity is O (n), and the efficiency is lower than that of jump table and B+Tree.

3. Hash indexing cannot use index sorting and fuzzy matching

4.。 Hash indexing also does not support leftmost matching rules for multi-column federated indexes.

5. In the case of a large number of key value conflicts, the efficiency of Hash index is very low.

The above is how to use B+Tree in MySQL. Have you learned any knowledge or skills? If you want to learn more skills or enrich your knowledge reserve, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report