What is the index structure of MySQL?

Updated 2025-04-05. From: SLTechnology News&Howtos

Shulou (Shulou.com), 05/31

This article explains how MySQL indexes are structured. The content is meant to be clear and easy to follow, so let's work through it step by step.

Database storage unit

First, to be persistent, an index has to be stored on disk, so querying through an index incurs disk I/O. Index design therefore aims to minimize the number of disk lookups and so reduce query time.

Another important principle: the basic unit in which a database manages storage space is the page (Page), and one page stores multiple row records (Row).

Computer systems optimize disk I/O with read-ahead: a single I/O reads not only the data at the requested disk address but also the adjacent data into the in-memory buffer pool. The unit of data read by one I/O is called a page, and the default page size in InnoDB is 16KB.

64 consecutive pages form an extent (Extent), one or more extents form a segment (Segment), and one or more segments form a tablespace (Tablespace). InnoDB has two types of tablespace: a shared tablespace, in which multiple tables share one tablespace, and an independent (file-per-table) tablespace, in which the data and indexes of each table are stored in their own tablespace.
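The sizes above can be checked with a little arithmetic. The following is a minimal sketch, assuming the default 16KB page; the helper `pages_needed` and the average row size passed to it are hypothetical, and the estimate ignores the space taken by page headers, trailers and the page directory.

```python
import math

PAGE_SIZE_BYTES = 16 * 1024   # InnoDB default page size (16KB)
PAGES_PER_EXTENT = 64         # 64 consecutive pages form one extent

# Size of one extent in bytes: 64 pages of 16KB each.
EXTENT_SIZE_BYTES = PAGE_SIZE_BYTES * PAGES_PER_EXTENT


def pages_needed(row_count, avg_row_bytes):
    """Rough page count for row_count rows.

    Hypothetical helper: ignores per-page overhead, so real tables
    need somewhat more pages than this estimate.
    """
    rows_per_page = PAGE_SIZE_BYTES // avg_row_bytes
    return math.ceil(row_count / rows_per_page)


if __name__ == "__main__":
    print(EXTENT_SIZE_BYTES // (1024 * 1024), "MB per extent")
    print(pages_needed(1_000_000, 200), "pages for 1,000,000 rows of about 200 bytes")
```

With the default page size, one extent therefore works out to 1MB.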

The structure of a data page is as follows (image source: Geek Time, "MySQL must know"):

The seven parts of a data page fall roughly into three categories:

The general file part, used to verify that the page was transferred intact.

File header (File Header): describes the page. Its FIL_PAGE_PREV and FIL_PAGE_NEXT fields form a doubly linked list that points to the previous and next data pages.

Page header (Page Header): records the status information of the page.

File trailer (File Trailer): used to check that the page is complete.

The record part, used to store data records.

Infimum and supremum records (Infimum/Supremum): virtual row records that represent the minimum and maximum records of the data page.

User records (User Records) and free space (Free Space): hold the contents of the data rows.

The index part, used to speed up record retrieval.

Page directory (Page Directory): stores the relative positions of the user records.

For more information, please refer to Taobao's monthly database kernel report.

Index data structure

A natural first thought is to use one of the common search data structures, such as a binary search tree or a balanced binary tree. In fact, InnoDB indexes are implemented with B+ trees. Let's look at why this structure was chosen.

Limitations of binary trees

First, a brief review of the binary search tree (Binary Search Tree). To find a key, compare it with the root node: if the key is greater, continue in the right subtree; if it is smaller, continue in the left subtree; repeat until the key is found. When the tree is balanced, the time complexity is O(log n). For example, the sequence [4, 2, 6, 1, 3, 5, 7] generates the following binary search tree:

However, in some cases the depth of a binary search tree becomes very large. For example, inserting the sorted sequence [1, 2, 3, 4, 5, 6, 7] generates the following tree:

Here, in the worst case, seven nodes must be checked to find the desired result, and the query time degrades to O(n).
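The two cases can be reproduced with a small experiment. This is a minimal sketch of a plain (unbalanced) binary search tree, not anything MySQL uses; the names `Node`, `build`, `height` and `search_steps` are chosen here for illustration.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key into the tree rooted at root and return the root."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                break
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                break
            node = node.right
    return root


def build(keys):
    """Build a tree by inserting keys in the given order."""
    root = None
    for key in keys:
        root = insert(root, key)
    return root


def height(node):
    """Number of levels in the tree."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def search_steps(root, key):
    """Number of nodes visited while searching for key."""
    steps, node = 0, root
    while node is not None:
        steps += 1
        if key == node.key:
            break
        node = node.left if key < node.key else node.right
    return steps


balanced = build([4, 2, 6, 1, 3, 5, 7])
skewed = build([1, 2, 3, 4, 5, 6, 7])

if __name__ == "__main__":
    print(height(balanced), search_steps(balanced, 7))  # 3 levels, 3 nodes visited
    print(height(skewed), search_steps(skewed, 7))      # 7 levels, 7 nodes visited
```

The same seven keys give a tree of 3 levels or 7 levels depending only on insertion order, which is why an unbalanced tree cannot guarantee O(log n) lookups.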

To avoid this, there is the balanced binary search tree (AVL tree), in which the heights of the left and right subtrees of every node differ by at most 1, so the search time complexity is O(log n). This is already a good search tree, but for a table with tens of millions of rows the tree is still very deep, so it is still not the ideal structure.

B tree

Extending a binary tree to an N-ary tree greatly reduces the depth of the tree. In fact, a tree structure with only 4 levels can already support tens of terabytes of data.
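The effect of a larger fan-out on depth can be estimated with a simplified model. The sketch below counts how many levels a completely full tree needs before its bottom level has room for a given number of records; the fan-out of 1000 is an assumed, illustrative figure, not a value taken from InnoDB.

```python
def levels_needed(num_records, fanout):
    """Levels a full tree with the given fan-out needs to address num_records.

    Simplified model: each level multiplies the number of reachable
    slots by the fan-out.
    """
    levels = 1
    capacity = fanout
    while capacity < num_records:
        capacity *= fanout
        levels += 1
    return levels


if __name__ == "__main__":
    rows = 10_000_000
    print("binary tree:", levels_needed(rows, 2), "levels")
    print("fan-out 1000 (assumed):", levels_needed(rows, 1000), "levels")
```

For ten million records this model gives 24 levels for a binary tree and 3 levels at a fan-out of 1000. Since each level may cost one disk I/O, the difference matters.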

The B-tree (Balance Tree) is such an N-ary tree. It satisfies the following definition:

Let k be the degree of the B-tree (the maximum number of child nodes per node).

Each disk block contains at most k-1 keywords and k pointers to child nodes.

A leaf node contains only keywords and no child pointers.

The keywords in each node are arranged in ascending order; all keywords in the left subtree of a keyword are smaller than it, and all keywords in the right subtree are larger than it.

All leaf nodes are on the same level.

As mentioned above, each I/O pre-reads one disk block, whose size is one page. In the figure, the contents of disk blocks are used to represent the structure of a B-tree (image source: Geek Time, "must know"):

A B-tree is also ordered. Because a node always has one more child pointer than it has keywords, the keywords delimit the ranges covered by the child nodes. In the example in the figure, each node has two keywords and three child nodes. Take disk block 2, whose keywords are 8 and 12: the keywords of its first child node (3, 5) are smaller than its first keyword 8, the keywords of its second child node (9, 10) lie between 8 and 12, and the keywords of its third child node (13, 15) are greater than its second keyword 12.

Suppose we are looking for 9. The steps are as follows:

Compare with the root node, disk block 1 (17, 35): 9 is less than 17, so follow pointer P1 to disk block 2.

Compare with disk block 2 (8, 12): 9 lies between the two, so follow pointer P2 to disk block 6.

Compare with disk block 6 (9, 10): 9 is found.

Although many comparisons are made, thanks to read-ahead the comparisons within a disk block happen in memory and cost no disk I/O. The lookup above needs only 3 disk I/Os, which is already a good structure.
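The lookup steps can be sketched in code. This is a minimal illustration, not InnoDB's implementation: it models only the disk blocks the text describes (block 1, block 2 and its three children), leaves the other children of the root as None, and uses labels of its own for the two leaves the text does not number.

```python
from bisect import bisect_left


class BTreeNode:
    def __init__(self, name, keys, children=None):
        self.name = name                 # label of the disk block
        self.keys = keys                 # sorted keywords in this block
        self.children = children or []   # empty list for a leaf


def search(root, key):
    """Return (found, blocks_read) for a B-tree lookup."""
    path = []
    node = root
    while node is not None:
        path.append(node.name)           # reading this block costs one I/O
        i = bisect_left(node.keys, key)  # comparisons happen in memory
        if i < len(node.keys) and node.keys[i] == key:
            return True, path
        if not node.children:
            return False, path
        node = node.children[i]
    return False, path                   # walked into a part not modelled here


# Only the subtree described in the text; leaf labels are assumptions.
first_leaf = BTreeNode("leaf (3, 5)", [3, 5])
block6 = BTreeNode("block 6", [9, 10])
third_leaf = BTreeNode("leaf (13, 15)", [13, 15])
block2 = BTreeNode("block 2", [8, 12], [first_leaf, block6, third_leaf])
block1 = BTreeNode("block 1", [17, 35], [block2, None, None])

if __name__ == "__main__":
    print(search(block1, 9))
    print(search(block1, 12))
```

Searching for 9 reads block 1, block 2 and block 6, matching the three I/Os above. Searching for 12 stops at block 2, because in a B-tree a keyword can be found in an internal node.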

B+ tree index

The B+ tree is a further improvement on the B-tree. The differences between the two are as follows:

In a B+ tree, for each keyword in a parent node, all keywords in its left subtree are less than it, and all keywords in its right subtree are greater than or equal to it.

Non-leaf nodes are used only for indexing and do not store data records.

Each keyword of a parent node also appears in a child node, where it is the maximum (or minimum) value.

All keywords appear in the leaf nodes, which form an ordered linked list sorted from smallest to largest.

An example follows. Here the keyword of the parent node is the minimum value in the child node (image source: Geek Time, "SQL must know"):

Suppose we want to find keyword 16. The steps are as follows:

Compare with the root node, disk 1 (1, 18, 35): 16 lies between 1 and 18, so follow pointer P1 to disk 2.

Search disk 2 (1, 8, 14): 16 is greater than 14, so follow pointer P3 to disk 7.

Search disk 7 (14, 16, 17): 16 is found.
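The same walk can be sketched for a B+ tree in which each parent keyword is the minimum of its child. Again this is only an illustration under assumptions: it models just the three disks named in the text and leaves every other child pointer as None.

```python
from bisect import bisect_right


class BPlusNode:
    def __init__(self, name, keys, children=None):
        self.name = name
        self.keys = keys
        self.children = children   # None marks a leaf node


def find(root, key):
    """Return (found, disks_read). Lookups always end at a leaf."""
    path = []
    node = root
    while node is not None:
        path.append(node.name)
        if node.children is None:                 # leaf: holds the real entries
            return key in node.keys, path
        # Parent keyword = minimum of the child, so pick the last
        # keyword that is <= key and follow its pointer.
        i = bisect_right(node.keys, key) - 1
        if i < 0:                                 # smaller than every keyword
            return False, path
        node = node.children[i]
    return False, path                            # part of the tree not modelled


# Only the path described in the text is built.
disk7 = BPlusNode("disk 7", [14, 16, 17])
disk2 = BPlusNode("disk 2", [1, 8, 14], [None, None, disk7])
disk1 = BPlusNode("disk 1", [1, 18, 35], [disk2, None, None])

if __name__ == "__main__":
    print(find(disk1, 16))
    print(find(disk1, 14))
```

Note that looking up 14 also reads all three disks even though 14 appears in disk 2: internal nodes only route the search, and the record itself lives in the leaf.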

Advantages of the B+ tree:

Internal nodes do not store data, so each internal node can hold far more keywords than a B-tree node. The tree is therefore shorter, fewer I/Os are needed, and each page read by an I/O carries more index content.

It supports range queries, which simply traverse the ordered linked list formed by the leaf nodes.

All data is stored in leaf nodes, so every lookup descends to the leaf level and query performance is more stable.
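The range-query advantage can be illustrated with the leaf level alone. In this minimal sketch the leaf contents are hypothetical, and the scan starts from the first leaf for simplicity; a real B+ tree would first descend from the root to the leaf containing the lower bound.

```python
class Leaf:
    def __init__(self, keys):
        self.keys = keys    # sorted keywords stored in this leaf
        self.next = None    # pointer to the next leaf in key order


def link(leaves):
    """Chain the leaves into a linked list and return the first one."""
    for current, following in zip(leaves, leaves[1:]):
        current.next = following
    return leaves[0]


def range_scan(start_leaf, lo, hi):
    """Collect all keywords k with lo <= k <= hi by walking the leaf list."""
    result = []
    leaf = start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:          # keys are ordered, so the scan can stop here
                return result
            if k >= lo:
                result.append(k)
        leaf = leaf.next
    return result


# Hypothetical leaf contents, for illustration only.
head = link([Leaf([1, 3, 5]), Leaf([8, 10, 12]), Leaf([14, 16, 17]), Leaf([18, 20, 25])])

if __name__ == "__main__":
    print(range_scan(head, 10, 17))
```

Because the leaves are linked in order, the scan never has to go back up to an internal node, and it stops as soon as it passes the upper bound.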

HASH index

The default index structure of MySQL's MEMORY storage engine is the hash index. A hash function converts input of arbitrary length into output of fixed length using a specific algorithm (such as MD5, SHA1, or SHA2). The same input always produces the same output, although two different inputs can occasionally produce the same output (a hash collision). This article does not cover hash functions in depth. For more information, please refer to Baidu Encyclopedia.
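The fixed-length property is easy to observe with Python's standard hashlib module. This sketch only illustrates hash functions in general; it makes no claim about which hash function the MEMORY engine uses internally, and the helper name `digest_lengths` is chosen here.

```python
import hashlib


def digest_lengths(data: bytes):
    """Length of the hex digest each algorithm produces for data."""
    return {
        "md5": len(hashlib.md5(data).hexdigest()),
        "sha1": len(hashlib.sha1(data).hexdigest()),
        "sha256": len(hashlib.sha256(data).hexdigest()),
    }


if __name__ == "__main__":
    short_input = b"a"
    long_input = b"a" * 100_000
    # The output length depends only on the algorithm, not on the input size.
    print(digest_lengths(short_input))
    print(digest_lengths(long_input))
    # The same input always gives the same digest.
    print(hashlib.sha256(short_input).hexdigest() == hashlib.sha256(short_input).hexdigest())
```

A one-byte input and a 100,000-byte input produce digests of the same length: 32 hex characters for MD5, 40 for SHA1 and 64 for SHA-256.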

A hash lookup takes O(1) time on average, which is very efficient. Python's dict, Go's map, and Java's HashMap are all implemented with hashing, and key-value databases such as Redis are also hash-based.

For exact-match lookups, a hash index is more efficient than a B+ tree index, but it has several limitations, so it is not the mainstream index structure.

Because the data a hash index points to is unordered, a hash index cannot serve range queries and does not support ORDER BY sorting.

Because a hash index only does exact matching, it cannot serve fuzzy queries (such as LIKE) either.

A hash index does not support the leftmost-prefix matching rule of composite indexes; a composite hash index takes effect only when all of its columns are matched exactly. This is because the hash value is computed over the indexed columns combined, not separately for each column.

If the indexed field has many duplicate values, there will be a large number of hash collisions, and queries become very time-consuming.
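These limitations can be seen with a Python dict standing in for a hash index. This is only an analogy under assumptions: the rows, column names and helper functions are hypothetical, and a real hash index stores row pointers rather than values.

```python
# Hypothetical rows: (last_name, first_name, row_id)
people = [
    ("Smith", "Anna", 1),
    ("Smith", "Bob", 2),
    ("Jones", "Anna", 3),
]

# Composite hash index on (last_name, first_name): the key is hashed as a whole.
composite_index = {(last, first): row_id for last, first, row_id in people}


def exact_lookup(last, first):
    """Exact match on all indexed columns: a single hash probe."""
    return composite_index.get((last, first))


def lookup_by_last_name(last):
    """Leftmost column only: the hash cannot help, so every entry is scanned."""
    return sorted(row_id for (l, _), row_id in composite_index.items() if l == last)


# Hash index on a single numeric column (hypothetical ages -> row ids).
age_index = {30: [1], 25: [2, 4], 35: [3]}


def age_between(lo, hi):
    """Range query: keys are unordered, so every bucket must be checked."""
    return sorted(rid for age, rids in age_index.items() if lo <= age <= hi for rid in rids)


if __name__ == "__main__":
    print(exact_lookup("Smith", "Bob"))
    print(lookup_by_last_name("Smith"))
    print(age_between(26, 40))
```

The exact lookup is one probe, while the prefix lookup and the range query both fall back to scanning all entries. The bucket for age 25 also shows how duplicate values pile up under one hash key.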

For these reasons, the MySQL InnoDB engine does not support user-defined hash indexes. Its memory structures do, however, include an adaptive hash index: when an index value is accessed very frequently, InnoDB automatically builds a hash index on top of the B+ tree index to improve query performance.

The adaptive hash index can be understood as an "index on an index". It uses a hash to store page addresses of the B+ tree index so that the corresponding leaf node can be located quickly. You can check whether it is enabled through the innodb_adaptive_hash_index variable.

