What is the underlying layer and optimization method of Mysql index? 07/06 Update SLTechnology News&Howtos

What is the underlying layer and optimization method of Mysql index?

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

Today, I would like to share with you about the underlying Mysql index and optimization methods of what the relevant knowledge, detailed content, clear logic, I believe that most people still know too much about this knowledge, so share this article for your reference, I hope you will learn something after reading this article, let's take a look at it.

one。 First of all, let's talk about what an index is and why we use it.

The index is used to quickly find the row with a specific value in a column, without using the index, MySQL must read the entire table from the first record until finding out the relevant row, the larger the table, the more time it takes to query the data. If the query column in the table has an index, MySQL can quickly reach a location to search the data file without having to view all the data, then it will save a lot of time.

two。 Index types are divided into two categories: 1.hash index 2.bTree III. Let's briefly analyze the hash index and the bTree index.

1. Hash table is a structure that stores data by key-value (key-value). As long as we enter the key to be looked up, namely key, we can find its corresponding value, namely Value. The hash's idea is simple: put the value in the array, convert the key into a definite position with a hash function, and then put the value in this position of the array.

Inevitably, multiple key values will be converted by the hash function and the same value will occur. One way to handle this is to pull out a linked list.

two。 When it comes to bTree, we have to mention binary tree, binary tree is divided into many, for example: binary search tree, balanced binary tree and so on. And, of course, the key red-black tree.

1) the characteristic of binary search tree is that the value of all nodes in the left child tree of the parent node is less than that of the parent node. The value of all nodes in the right child tree is greater than that of the parent node. Here is a picture as an example to show the binary search tree.

IDname5 5 6 6 7 7 7 2 2 1 4 4 3 3

There is a need to find Zhang San, if we do not use the binary search tree then we need to search 7 times, using the binary search tree we only need to find 4 times to find the value we want.

According to what has been mentioned above, using a binary search tree can indeed reduce the number of queries, but have you ever thought that if the data in the database are incremented in turn like 1, 2, 3, 4, 5, 6, 7, continue to use the binary search tree will become a linked list. So if we want to find 7, we need to search 7 times, and scan the table 7 times. This is no different from not indexing, which is one of the drawbacks. The following figure is an example.

2) balanced binary tree: also known as AVL tree, the absolute value of the height difference between the left and right subtrees is not more than 1, and the left and right subtrees are both balanced binary trees. AVL tree is the earliest self-balanced binary search tree. In an AVL tree, the maximum height difference between two subtrees of any node can only be 1, so it is also called a height balanced tree. Queries, additions, and deletions are O (log n) on average and at worst. Adding and deleting will require one or more tree rotations to rebalance the tree.

The purpose of introducing binary tree is to improve the efficiency of binary tree search, so as to reduce the average search length of the tree. Therefore, we must adjust the structure of the tree when each binary tree inserts a node, so that the binary tree search can be balanced, thus it is possible to reduce the height of the tree and reduce the search length of the average tree.

The characteristics of balanced binary tree are as follows:

1. Its left and right subtrees are both AVL trees.

two。 The height difference between the left subtree and the right subtree cannot exceed 1.

Example figure:

3) Red-black tree: it can be understood that the red-black tree is above the balanced binary tree, the red-black tree will not pursue "complete balance", it will only partially meet the balance requirements, reducing the requirements for rotation, so as to improve performance. In addition, because of its design, all imbalances can be solved within three rotations. In the red-black tree, the time complexity of its algorithm is the same as that of AVL, and the statistical performance will force the AVL tree to be higher. Therefore, compared with the balanced binary tree, the red-black tree is not a balanced binary tree in the strict sense, the insertion and deletion efficiency of the red-black tree is higher, and the query efficiency is relatively lower than the balanced binary tree, but the difference between the two query efficiency is basically negligible. The red-black tree has the following characteristics:

1. The node is red or black.

two。 The root node is black.

3. The two children of each red node are black. (the child of a red node must be a black node)

4. All paths from any node to each of its leaves contain the same number of black nodes.

The parent and child nodes of a red node can only be black nodes.

Example figure:

4) BTree (B tree): of course, the red-black tree is mentioned above, and its performance is very high. For example, the highest height of the tree is 4, with a total of 9 pieces of data, but for Mysql database, millions of pieces of data, tens of millions of pieces of data, the height of the tree is inestimable, for example, millions of pieces of data need to go through 30-50 disk IO to query the data, or even more times, obviously can not meet the efficient query efficiency of the Mysql index. If we control the height of the tree, it will greatly reduce the number of requests for disk IO. If the height is controlled at 4, it only takes 4 times for disk IO to query the data.

But how to control the height of the tree? the red-black tree stores only one element per node. If more than one element is stored in each node, the height problem can be solved. Some students must have questions and put all the elements on one node. Then the height value is 1, isn't it faster? It must be wrong to think this way. Every time Mysql deals with disk IO, there is a size limit. Mysql limits the size of each node to 16K. Students who want to see their own Mysql limit node size can execute the sql below.

Show global status like 'Innodb_page_size'

Take the figure as an example to illustrate BTree

BTree has the following characteristics:

1. All index elements are not duplicated

two。 The data index of the node is incremented from left to right

3. The leaf node has the same depth and the pointer of the leaf node is empty

4. Both leaf nodes and non-leaf nodes store indexes and data

5) B+ tree: it is mentioned above that BTree controls the height of the tree, which can meet the needs of Mysql for the index, but in the end, the implementation of Mysq index is not BTree but B+ tree. Mysql made a little modification to the B tree to get the B+ tree, which can also be understood as the upgraded version of the B tree.

Take the figure as an example:

As you can see from this figure, our non-leaf nodes only store indexes and not data, and leaf nodes are connected by pointers. Both the leaf node and the non-leaf node of the B tree store indexes and data, and the pointer of the leaf node is empty. The B + tree puts the data on the leaf node, so that the non-leaf node can store more indexes and get more indexes each time from the disk IO.

The characteristics of B + tree are as follows:

1. Non-leaf nodes do not store data, but only indexes (redundancy) and lower-level pointers, which can put more indexes.

two。 The leaf node contains all index fields, and data

3. Leaf nodes are connected with double pointers to improve the performance of interval access.

The B+ tree drawn on Baidu and many blogs is wrong, so you must avoid the pit.

If you are interested in Mysql's official explanation of the B+ tree, you can take a look.

Link: Mysql official website.

four。 Index classification

1. Classified according to the storage association of the index: divided into two categories

1.) clustering index: the leaf node contains the complete data record and does not need to go back to the table.

2.) Nonclustered index: need to go back to the table and look up the tree twice, which affects the performance.

We all know that there are two commonly used storage engines in Mysql, MyISAM and InnoDB, but have you actually understood the underlying data storage structures of the two storage engines?

Let's take the picture as an example to illustrate:

The test. Myisam table is the MyISAM storage engine, and the InnoDB table is the InnoDB storage engine. You can see that the MyISAM storage engine has three files, namely, frm, MYD and MYI. It is easy to understand the abbreviation of frm-frame, which stores the structure of the table, MYD-MYData stores the data, MYI-MYIndex stores the index, and the index and data are stored separately. InnoDB has only frm and IBD, in which frm is also the structure of the stored table, and IBD files store indexes and data. InnoDB is different from MyISAM in this respect.

The following figure is used as an example to illustrate that the primary key index of the MyISAM storage engine needs to return to the table operation (nonclustered index)

Of which 15 is the primary key index, 0x07 stores the disk file address pointer of 15 records in the line, for example, if we want to find the data of 15, we should first find the corresponding pointer of 15 through the primary key index tree, and then find the pointer to find the specific data in the MyD file, need to carry out a secondary search, this process is called back table operation.

The following figure shows that the primary key index of the InnoDB storage engine does not need to return to the table. (clustered index)

The first row of the InnoDB storage engine child node 15 stores the index, and the column below 15 stores all the other fields in the row of the index. If we want to look up the data of 15, we can find it directly without having to look up the tree twice.

two。 Classified according to function: mainly divided into five categories

2.1 Primary key index: InnoDB primary key index does not need to return to the table operation

2.2 ordinary index (secondary index): InnoDB ordinary index needs to return table operation, for secondary index, it will do federated index with primary key by default.

2.3 unique Index

2.4 full-text index

2.5 Federated index: need to meet the leftmost prefix principle

3. It is mentioned in 2.2 that the ordinary index needs to go back to the table. Is there any ordinary index that does not need to go back to the table? the answer is yes. In a query, the index has covered our query requirements, which we call the overlay index. There is no need to return to the table at this time.

Because the overlay index can reduce the search times of the tree and significantly improve the query performance, the use of overlay index is a common means of performance optimization.

For example: here is the initialization statement for this table.

Mysql > create table T (ID int primary key,k int NOT NULL DEFAULT 0, s varchar (16) NOT NULL DEFAULT'', index k (k)) engine=InnoDB;insert into T values (100 1, 'aa'), (200pime 2)), (300pc'), (5005pime'), (600pc6), (700pcg')

In table T above, if I perform select * from T where k between 3 and 5, how many tree search operations do I need to perform, how many rows will be scanned?

Now, let's take a look at the execution flow of this SQL query. Look at the picture below.

1.) find the record of kappa 3 on the k index tree and get ID = 300

2.) then go to the ID index tree to find the R3 corresponding to ID=300

3.) take a value of KF5 in the k index tree to get ID=500

4.) go back to the ID index tree and find the R4 corresponding to ID=500.

5.) if the k index tree takes a value of k _ index 6, if the condition is not satisfied, the loop ends.

In this process, the process of going back to the primary key index tree search is called back to the table. As you can see, this query process reads three records of the k-index tree (steps 1, 3, and 5) and returns the table twice (steps 2 and 4).

In this example, because the data needed for the query result is only available on the primary key index, it has to go back to the table.

If the statement you are executing is select ID from T where k between 3 and 5, you only need to check the value of ID, and the value of ID is already on the k index tree, so you can provide the query results directly without going back to the table. In other words, in this query, index k has "overridden" our query requirements, which we call an overlay index.

In InnoDB, tables are stored in the form of indexes according to the order of primary keys, which is called index organization tables. And because we mentioned earlier, InnoDB uses the B + tree index model, so the data is stored in the B + tree. Each index corresponds to a B + tree in InnoDB.

five。 Index optimization 1. The basic concepts, classification and underlying structure of the index are described above. Let's talk about index optimization.

1.) when only one column in the composite index contains a null value, the index is invalid

2.) the calculated index on the column fails, and all the indexes after the range fail

3.) using functions on query conditions will cause index failure

4.) use the! = or operator in the where sentence, causing the index to fail

5.) avoid using or, resulting in index failure

6.) using fuzzy queries can also cause index invalidation. You can use like'a% 'instead of like'% a%'.

7.) use overlay indexes as much as possible and reduce select * statements

8.) satisfies the leftmost prefix rule, the leftmost leading column starts and does not skip the columns in the index

9.) the index of a string without single quotation marks is invalid

two。 The following is a practical example of index optimization.

Create a new employee table with five attributes, as follows.

Create table employees (id int primary key auto_increment comment' primary key increment', name varchar (30) not null default''comment' name', age int not null default 1 comment' age', id_card varchar (40) not null default''comment' ID number', position varchar (40) not null default''comment' location');-- create a federated index create index name_index on employees (name,age,position) -- insert a data insert into employees (name,age,id_card,position) values ('Zhang San', 15meme 201124199011035321 'Beijing');-- with 10 sql tests below, note that the joint index established is name,age,position1.explain select * from employees where age=15 and position=' Beijing 'and name=' Zhang San'; 2.explain select * from employees where name=' Zhang San 'and age=15 and position=' Beijing'; 3.explain select * from employees where age=15 and name=' Zhang San' 4.explain select * from employees where position=' Beijing 'and name=' Zhang San'; 5.explain select * from employees where position=' Beijing 'and age=15;6.explain select * from employees where position=' Beijing' and age > 15 and name=' Zhang San'; 7.explain select * from employees where position=' Beijing'; 8.explain select * from employees where age=15;9.explain select * from employees where name=' Zhang San'; 10.explain select * from employees where name! = 'Zhang San' Which of the above 10 sql are indexed and which are not invalidated? I believe the students already have the answer, but whether the answer is right or not, let's analyze it together. First of all, the query conditions used all three indexes, but the order of the indexes changed from name,age,position to age,position,name. I thought that many students must have given the answer that the index failed, but the fact proved that the result was wrong and the index was in force. Many students must have wondered why this sql does not meet the leftmost principle. This is about to involve the execution process of sql. Here the blogger simply says that sql executes the process of an optimizer, and one of the functions of the optimizer is to optimize the selection of the index, so the optimizer helps us make the order of the index correct, so the index takes effect. The following is the result of the execution of Article 1 in indexed order sql and Article 2 not in indexed order sql. The execution results are shown in the following figure: you can find that all of them are in effect.

Article 1 the value of sql type is ref, the byte is 288, and ref has 3 const, all of which are valid.

Article 2 the value of sql type is ref, the byte is 288 and ref has 3 const, all of which are valid.

Those who want to learn about the execution process of sql can see another article by the blogger about the execution process of sql. Some students have questions, is the leftmost principle useless? Answer: useful. Now let's talk about Article 3, 4, and 5 sql Article 3: explain select * from employees where age=15 and name=' Zhang San'; when sql is executed, the optimizer optimizes the order of the index for us, from age-> name to name-> age, when the index is valid. Article 4: explain select * from employees where position=' Beijing 'and name=' Zhang San'; the index order is optimized to name-> position, but at this time only the name index takes effect, position does not take effect, because we build the index order is name-> age-> position, you will find that skipping the age, the index is essentially a tree, without a node, the following index will not take effect, of course, this does not meet the leftmost principle. Article 5: explain select * from employees where position=' Beijing 'and age=15; this is the same as Article 4 sql. The first index is gone, and the latter cannot be effective. The implementation results are as follows:

You can find that the value of the third sql type is ref, the byte is 126, and the ref has two const, all of which are valid.

Article 4 sql is only 122 bytes and ref has only 1 const, only the name index is valid.

Article 5 the value of sql type is all, bytes and ref are null and void.

Here is Article 6 sql, and the rest of the sql is the same as the previous sql. Explain select * from employees where position=' Beijing 'and age > 15 and name=' Zhang San'; this sql we will find that all the index fields are used and queried as conditions, except that age is a range lookup, and the optimizer optimizes the index order to name-> age-> position for us, according to our index optimization article 2: calculate the index on the column, all the indexes after the range are invalid, you must know the answer. The implementation results are as follows:

Article 6 sql is only 126 bytes and the value of type is range, range lookup, only name and age indexes are valid.

These are all the contents of this article entitled "what are the underlying levels and optimization methods of Mysql indexes?" Thank you for reading! I believe you will gain a lot after reading this article. The editor will update different knowledge for you every day. If you want to learn more knowledge, please pay attention to the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.