A brief discussion on B-Tree Index and Index Optimization of MySQL


MySQL's MyISAM and InnoDB engines both use B+ tree indexes by default (both are displayed as "BTREE" when querying). This article discusses two issues:

Why do mainstream databases such as MySQL choose the B+ tree as their index structure? And how can the common MySQL index optimization ideas be understood in terms of that structure?

Why can't all indexes be loaded into memory?

The choice of index structure is premised on one fact: when the data volume is large, the index cannot be fully loaded into memory.

Why can't the index be fully loaded into memory? Assuming the index is organized as a tree structure, here is a simple estimate.

Suppose a single index entry occupies 12B and there are 10 million (1000W) data rows under a unique index; then the leaf nodes occupy roughly 120MB in total, and the whole tree at most about 200MB. Assuming each data row occupies 200B, the total data volume is about 2GB.

Suppose the index is kept entirely in memory. In other words, for every 2GB of data stored on physical disk, about 200MB of memory is consumed, so the index-to-data footprint ratio is about 1:10. Is a ratio of 1:10 large? Physical disk is much cheaper than memory. Take a server with 16GB of memory and a 1TB hard disk as an example: to index all the data on the 1TB disk you would need at least 100GB of memory, far more than 16GB.
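Written as a single ratio (simply collecting the figures above: index tree at most about 200MB, data about 2GB):

$$\frac{\text{index size}}{\text{data size}} \approx \frac{200\,\mathrm{MB}}{2\,\mathrm{GB}} = \frac{1}{10}$$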

Considering that a table may have multiple indexes, composite (multi-column) indexes, and smaller data rows, the actual ratio is usually higher than 1:10 and can reach 1:3 in some cases. In an index-based storage architecture, the index-to-data footprint ratio is too high for the indexes to be fully loaded into memory.

Problems with other structures

Since the index cannot be fully loaded into memory, it must reside on disk (or SSD). Memory reads and writes are thousands of times faster than disk (the exact factor depends on the hardware), so the core problem is how to reduce the number of disk reads and writes.

First, setting the paging mechanism aside and assuming that every read and write goes directly to disk:

Linear structure: average O(n) reads/writes.
Binary search tree (BST): average O(log2(n)) reads/writes; if the tree is unbalanced, worst case O(n).
Self-balancing binary search tree (AVL): a BST plus a self-balancing algorithm; at most O(log2(n)) reads/writes.
Red-black tree (RBT): another self-balancing search tree; at most O(log2(n)) reads/writes.

BST, AVL and RBT reduce the number of reads/writes from O(n) to O(log2(n)); among them, AVL and RBT additionally self-balance, which caps the number of reads/writes at O(log2(n)) even in the worst case.

Assuming an auto-increment primary key is used, the key itself is ordered, and the number of reads/writes for a tree structure can be reduced to the tree height: the lower the tree, the fewer the reads/writes; self-balancing guarantees the stability of the tree structure. To optimize further, the B-tree and B+ tree come in.

What problem does B-tree solve?

Many articles mistakenly call the B-tree a "B-minus tree", probably a misreading of its English name "B-Tree" (some even call it a binary tree or binary search tree). Especially when it is discussed alongside the B+ tree, people take it for granted that if there is a "B plus" tree there must be a "B minus" tree. In fact, the English name of the B+ tree is "B+-Tree".

If you set the maintenance operations aside, the B-tree is just an "m-way search tree" (m is the maximum number of subtrees per node), with time complexity O(log_m(n)). More importantly, the B-tree defines efficient and simple maintenance operations that keep its depth between roughly log_{ceil(m/2)}(n) and log_m(n), which greatly reduces the height of the tree.

Emphasize again:

Do not focus only on the time complexity; unlike a purely in-memory algorithm, the number of disk I/Os is the bigger factor here. Readers can verify that the asymptotic time complexity of the B-tree is the same as that of the AVL tree, but because the B-tree has far fewer levels and therefore fewer disk I/Os, in practice the B-tree outperforms AVL and other binary trees.

Similar to a binary search tree, each node stores multiple keys and subtrees, and both the keys and the subtrees are kept in sorted order.

The purpose of paging is to extend external storage and to speed up disk reads and writes. A page (Page) is usually 4KB (equal to the size of a disk data block; see analyses of inode and block). Each time, the operating system loads content from disk into memory one page at a time (to amortize the seek cost), modifies the page, and later writes the page back to disk. Given these properties of paging, the size of each node can be made roughly equal to one page (making m very large), so that each loaded page completely covers one node and the next-level subtree can be selected; the same holds when descending into a subtree. From the perspective of paging, an AVL (or RBT) node is equivalent to a B-tree node with 1 key and 2 subtrees; because logically adjacent nodes are usually not physically adjacent, when a 4KB page is read in, most of the space in that page is useless data.

Assume both the key and the child-node pointer occupy 4B; then a B-tree node occupies at most m * (4 + 4) = 8m bytes. With a page size of 4KB, m = 4 * 1024 / 8 = 512. For a 512-way B-tree holding 10 million rows, the maximum depth is log_{ceil(512/2)}(10^7) ≈ 3, i.e. at most 4 levels. By comparison, the depth of a binary tree such as AVL is log_2(10^7) = 23.25 ≈ 24, more than 5 times deeper. What a shock: the depth of a B-tree index is that low!

In addition, the B-tree is very friendly to the principle of locality. If the key is small (such as the 4B auto-increment key above), the CPU cache can further accelerate access through prefetching, on top of the benefit from paging. Sweet.

What problem does B+ tree solve?

Remaining problems of the B-tree

However, some problems remain when the B-tree is actually applied to database indexes:

It cannot locate the data row; it cannot handle range queries well.

Question 1

A record in the data table has multiple fields, so locating only the primary key is not enough; the data row must also be located. There are three solutions:

Scheme 1: store the data row(s) corresponding to the key directly in the node.
Scheme 2: store data rows separately; add a field to the node that records the location of the data row corresponding to the key.
Scheme 3: modify the key/subtree comparison logic so that each subtree holds keys greater than or equal to the previous key and less than the next key, so every lookup eventually lands on a leaf node; the leaf node directly stores the data rows or their locations.

Scheme 1 is rejected outright: storing data rows reduces the number of subtrees that fit in a page, m decreases, and the tree height increases.

In Scheme 2, a field is added to each node. Assuming a 4B pointer, the new m = 4 * 1024 / 12 = 341.33 ≈ 341, and the maximum depth is log_{ceil(341/2)}(10^7) ≈ 3.14.

In Scheme 3, m and the depth remain unchanged, but the time complexity becomes a stable O(log_m(n)).

Scheme 3 can be considered.

Question 2

In real business scenarios, range queries are very frequent, but the B-tree can only locate a single index position (which may correspond to multiple rows), so range queries are hard to handle. There are two schemes that require only minor changes:

Scheme 1: change nothing; when querying, locate the left bound, then the right bound, and traverse the nodes between the two bounds with DFS (or BFS).
Scheme 2: building on Scheme 3 of Question 1, since all data rows are stored in leaf nodes and the leaf nodes of the B-tree are themselves ordered, add a pointer in each leaf node to the next leaf node in primary-key order; when querying, locate the left bound, then the right bound, and traverse linearly from the left bound to the right bound.

At first glance, Scheme 1 seems no worse than Scheme 2: the time complexity and constant factors are the same, and Scheme 1 requires no changes. But do not forget the locality principle: regardless of whether the node stores the data rows or their locations, the advantage of Scheme 2 is that paging and caching can still prefetch the next leaf node, whereas Scheme 1 suffers from nodes that are logically adjacent but physically scattered.
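Scheme 2 (the ordered leaf chain) is what a typical range query relies on in practice. A minimal illustrative query, assuming a hypothetical table named orders with an integer primary key:

-- With a B+ tree style index, the engine descends once to the leaf holding
-- id = 100, then scans the chained leaf nodes sequentially until id exceeds 200.
SELECT * FROM orders WHERE id BETWEEN 100 AND 200;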

Arriving at the B+ tree

To sum up, Scheme 2 of Question 1 and Scheme 1 of Question 2 can be combined into one design (an index based on the B-tree), while Scheme 3 of Question 1 and Scheme 2 of Question 2 can be combined into another (an index based on the B+ tree). In practice, some databases and file systems use B-trees and some use B+ trees.

For reasons this code monkey has not fully figured out, most mainstream databases, including MySQL, chose the B+ tree.

The main changes are as follows:

Modify the organization of keys and subtrees so that every index access descends to a leaf node, and chain the leaf nodes together in order (convenient for range queries).

Insert, delete, and search procedures of the B-tree and B+ tree

For now, for the insert and delete procedures of the B-tree, refer to the section "6. Insert and delete operations of the B-tree" in "From B-tree, B+ tree and B* tree to R-tree"; they are not repeated here.

MySQL index optimization

Given the properties of the B+ tree, the common MySQL index optimization ideas are easy to understand.

Don't consider the difference between different engines for the time being.

Prefer an auto-increment key as the primary key

In the earlier analysis, assuming a 4B auto-increment key as the index key, m can reach 512 and the tree height is only about 3. Using an auto-increment key brings two benefits:

The auto-increment key is generally an integer type such as int, so the key is compact, m can be very large, and the index occupies little space. As an extreme counter-example, if a 50B varchar (including the length byte) is used as the key, then m = 4 * 1024 / 54 = 75.85 ≈ 76, the maximum depth is log_{ceil(76/2)}(10^7) = 4.43 ≈ 5, and together with the cost of cache misses and string comparison, the time cost grows substantially. Meanwhile, since the key grows from 4B to 50B, the space footprint of the whole index tree also grows alarmingly (and if secondary indexes use the primary key to locate data rows, the space growth is even worse).

The auto-increment property means that inserts of new data rows always land at the far right of the index tree, so node splits are infrequent and, ideally, the index tree stays close to "full". A full index tree means a lower tree height on the one hand, and less frequent node merging when rows are deleted on the other.

Optimization experience:

This code monkey once used a varchar(100) column storing containerId as the primary key. After 3 or 4 days, the 100GB database was full, and the DBA politely expressed her disdain for me by email. I then added an auto-increment column as the primary key and made containerId a unique secondary index; the time and space savings were quite significant.
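A sketch of the corrected schema described above (table, column types, and index names are illustrative; the article does not give the exact definitions):

CREATE TABLE container (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- compact auto-increment primary key
  containerId VARCHAR(100) NOT NULL,           -- former primary key, now a unique secondary index
  PRIMARY KEY (id),
  UNIQUE KEY uk_containerId (containerId)
) ENGINE=InnoDB;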

Leftmost prefix match

An index can be as simple as a single column (a) or as complex as multiple columns (a, b, c, d), i.e. a composite index. With a composite index, the key also consists of multiple columns. At the same time, the index can only be used to check whether a key exists (equality); once a range condition (>, <, between, like prefix matching) is encountered, the remaining columns cannot be matched further and degrade to linear scanning, so the column order determines how many index columns can be hit. For example, with an index (a, b, c, d) and the condition a = 1 and b = 2 and c > 3 and d = 4, each node hits a, b, c in turn but cannot hit d. This is the leftmost prefix matching principle.
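A sketch of leftmost prefix matching (table, column, and index names are hypothetical; EXPLAIN shows how much of the composite index each query can use):

CREATE TABLE t (
  id INT PRIMARY KEY,
  a INT, b INT, c INT, d INT,
  KEY idx_abcd (a, b, c, d)
);

-- Matches a and b by equality and c by range; d cannot be matched inside the
-- index because the range condition on c stops further column matching.
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 AND c > 3 AND d = 4;

-- Skips the leading column a, so idx_abcd cannot be used to seek rows.
EXPLAIN SELECT * FROM t WHERE b = 2 AND c = 3;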

The order of = and in conditions is optimized automatically

Regardless of the order in which conditions such as = and in appear, MySQL automatically optimizes their order to match as many index columns as possible.

If there is an index (a, b, c, d), query conditions written as c > 3 and b = 2 and a = 1 and d < 4, or as a = 1 and c > 3 and b = 2 and d < 4, and so on, are all acceptable; MySQL automatically optimizes them into a = 1 and b = 2 and c > 3 and d < 4, hitting a, b, c in turn.

Index columns cannot participate in the calculation

Query conditions in which the index column participates in a computation are not index-friendly (the index may not be usable at all), for example from_unixtime(create_time) = '2014-05-29'.

The reason is simple: how would the corresponding key be found within a node? A linear scan would have to recompute the expression for every key, which is too expensive; a binary search would require knowing how the from_unixtime function affects the ordering of keys.

Therefore, index columns should not participate in calculations. The condition from_unixtime(create_time) = '2014-05-29' above should be rewritten as create_time = unix_timestamp('2014-05-29').
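For instance (a hypothetical table named logs with an index on create_time; from_unixtime and unix_timestamp are standard MySQL functions):

-- Index-unfriendly: the function wraps the indexed column, so the index on
-- create_time cannot be used for the lookup.
SELECT * FROM logs WHERE from_unixtime(create_time) = '2014-05-29';

-- Index-friendly rewrite: the computation moves to the constant side, leaving
-- the bare column to be compared against a precomputed value.
SELECT * FROM logs WHERE create_time = unix_timestamp('2014-05-29');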

Extend an existing index rather than creating a new one

If you already have an index (a) and want to create an index (a, b), try to modify index (a) to index (a, b).

The cost of creating a brand-new index is easy to understand. If index (a) is changed to index (a, b), MySQL can transform the existing B+ tree of index (a) into index (a, b) through splitting, merging, and similar operations.
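A minimal sketch, assuming an existing single-column index named idx_a on column a of a hypothetical table items:

-- Extend the single-column index (a) into a composite index (a, b)
-- instead of adding a second, separate index.
ALTER TABLE items DROP INDEX idx_a, ADD INDEX idx_a_b (a, b);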

There is no need to build an index that is a leftmost prefix of an existing one

If you already have an index (a, b), you no longer need to build an index (a), but if necessary, you still need to consider indexing (b).

Choose a highly selective column as the index

This is easy to understand. For example, with gender as the index, the index can only split 10 million rows of data into two parts (say, 5 million male and 5 million female), so the index is almost useless.

The selectivity formula is count(distinct col) / count(*), the proportion of distinct values in the field; the larger the proportion, the better the selectivity. A unique key has a selectivity of 1, while status or gender fields may have a selectivity close to 0 on large data sets.

It is hard to give a universal threshold. Generally speaking, a column used for joins should have a selectivity above 0.1, i.e. on average at most 10 records are scanned per lookup.
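The selectivity of a candidate column can be measured directly. A minimal sketch, assuming a hypothetical table users with a gender column:

-- Proportion of distinct values: close to 1 means a good index candidate,
-- close to 0 (as gender typically is) means a poor one.
SELECT COUNT(DISTINCT gender) / COUNT(*) AS selectivity FROM users;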

That is all for this article; I hope it is helpful to your study.
