Principle of Mysql indexing 07/13 Update SLTechnology News&Howtos

Principle of Mysql indexing

2025-07-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Speaking of Mysql is inseparable from SQL optimization, speaking of optimization is inseparable from the index, so what is the index? Why is indexing faster? Then let's discuss the index-related knowledge together!

I. Common indexes in data structures [Students who understand this data structure are recommended to skip this section] 1. Binary tree

When it comes to binary trees, we all know that each node can have at most two child nodes, for example:

It can be found that the binary tree is very regular, the left child node is smaller than the current node, and the right child node is larger than the current node. Then isn't it convenient to inquire like this? Binary tree properties determine its time complexity is O log (n), of course, binary tree time complexity and its insertion order has, if the ascending or descending way to insert data, then its binary tree height h is equal to the number of nodes, then the complexity is increased to O(n).

If the database uses a binary tree for indexing, and 1000 entries need to be inserted, let's calculate the height of the tree. (A binary tree of depth k has at least k nodes and at most 2^k − 1 nodes)

2^10-1 ≈ 1000 In this case, the tree height is about 10, and in the worst case, the tree height is 1000.

The height of the tree determines the efficiency of the query, and the binary tree will have a height of 10~1000 such a big gap, it is obvious that it is not suitable for our index!

2. balanced tree

Before the problem put out, binary tree height is very unstable, then we can not put the height of stability? This is a balanced tree, which dynamically adjusts the height of the binary tree according to the insertion situation (the height difference between the left and right subtrees is at most 1), for example: we insert from 10, 9, 8,, 1

See, I didn't lie to you, it will adjust the height of the tree according to the insertion situation, how to adjust the specific, I just briefly explain it, after all, is not the focus of this article.

There are four ways to adjust a balanced tree:

LL、LR、RL、RR

It's understandable.

In this case, the nodes 5 and 6 are rotated to the left, then transformed into LL type, and then rotated to the right.

Well, the other two are not mentioned, and the rotation of these two ways is exactly the opposite.

Cough cough, back to topic, now good, balanced tree enough to ensure the balance of the tree, then use it to do the index there is no problem?

If we have 100000 records, then according to the nature of the binary tree, the minimum height of the tree is about 2^16, that is, to find an element needs to find 16 times, some students may say, um, query 16 times I can accept, so what if insert or delete data? The biggest drawback of AVL trees is that they require frequent rotation and adjustment of nodes when inserting nodes, so balanced trees are not suitable for indexing.

3. red-black tree

As mentioned above, the height difference between the left and right subtrees is at most 1. If the data is inserted carelessly, it will cause the conversion operation of the balanced tree, and the conversion is a very time-consuming thing.

The red-black tree appears to avoid frequent node conversion operations in balanced trees. Red-black tree does not pursue complete balance, it only requires partial nodes to achieve balance, reducing the rotation requirements, thus improving performance. Let's look at the definition of red-black tree!

* Each node is either red or black. * The root node is black. * Each leaf node (i.e., a NIL pointer or NULL node at the end of the tree) is black. * If a node is red, then both of its children are black. * For any node, each path to the NIL pointer at the end of the leaf tree contains the same number of black nodes.

When inserting or deleting elements, it is also necessary to maintain the red-black tree structure. The reason why the index does not use the red-black tree is mainly when the binary tree stores a large number of nodes, which will lead to the height of the tree increasing continuously. For example, 100 million nodes, the height of the tree will reach about 27 layers, and the general index is saved to disk, if each query requires an IO, then it requires 27 IO operations, and each IO operation is very time-consuming.

4.B tree

Balanced trees, red-black trees are binary trees, when the binary tree to save a large number of elements will lead to the height of the tree continues to increase, it can not use a multi-fork tree?

Let's look at the definition of B-tree:

1. Composition of B-tree Keywords (can be understood as data) Pointer to child node

2. Properties of B-tree: * If the root node is not a terminal node, then there are at least 2 subtrees * All non-leaf nodes except the root node have at least M/2 subtrees, and at most M subtrees (the key number is subtree minus one)* All leaf nodes are located at the same level 5.B+ tree

B+ trees differ from B trees in that:

* The number of subtrees of a node is the same as the number of keywords (the number of keywords in a B-tree is one less than the number of subtrees)* The keyword of the node represents the maximum number in the subtree, which also contains this data in the subtree * The leaf node contains all the data, and it conforms to the order of left small and right large.

Compared with B tree, B+ tree has the following advantages: lower hierarchy, leaf nodes form linked list, and range query is convenient;

Second, B tree and B+ tree in Mysql 1. Disk reading principle

Index files generally exist in the form of files on the disk, index retrieval operations require disk IO, but disk sequential reading efficiency is very high, so when reading, the disk is often not on-demand read, and each time will be pre-read, even if only one byte is needed, the disk will start from this position, sequentially read a certain length of data into memory. Because sequential disk reads are efficient, read-ahead can improve IO efficiency for programs that are local. The length of the prefetch is generally an integer multiple of the page (page is the logical block of computer management memory, the operating system often divides the main memory and disk storage area into consecutive blocks of equal size, each storage block is called a page, the size is usually 4K) Main memory and disk exchange data in units of pages. When the data to be read by the program is not in the main memory, a page missing exception will be triggered. At this time, the system will send a disk read signal to the disk. The disk will find the starting position of the data and continuously read one or several pages backward into the memory. Then the exception returns and the program continues to run.

B+ trees in Innodb

Innodb uses a B+ tree as an index, such as the primary index in the following figure:

Leaf nodes contain all nodes, except leaf nodes, which contain no values, while leaf nodes contain specific values.

secondary index

The secondary index in Innodb is also a B+ tree, but its leaf nodes point to the primary key value in the primary index, and then go to the primary index to find the specific value;

B+ tree in myISAM

When MyISAM engine uses B+ tree as index, its structure is similar to Innodb, except that its leaf nodes store not specific record values, but record addresses.

secondary index

In the primary index, MyISAM's index file only stores the address of the data record, while the secondary index has no difference in structure, and the secondary index also points directly to the address of the record.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.