Example Analysis of InnoDB Storage Engine Index in MySQL

2025-07-11 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article walks through an example analysis of indexes in the InnoDB storage engine of MySQL: how the indexes are structured, and how that structure affects query performance.

Overview

In a database, an index works like the table of contents of a book: it speeds up the search for data. For a SQL query, the index lets the engine quickly discard rows that do not meet the conditions and locate the rows that do, so it does not need to scan the whole table to get the required data.

In the InnoDB storage engine, indexes are mainly based on the B+ tree. Index keys are stored in the non-leaf nodes, and the leaf nodes store either the data records themselves (in the primary key index, also called the clustered index) or the primary key values of the records (in a secondary index). All leaf nodes are on the same level, and they are connected by pointers to form a doubly linked list, which makes it convenient to traverse all records or a certain range of records.

B-tree and B+ tree

Both the B-tree and the B+ tree are multiway balanced search trees. They reduce the height of the tree by storing more keys in each node, and they keep the tree balanced through rotation and splitting operations, which reduces the number of disk accesses needed to retrieve data.
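To see why more keys per node means a shorter tree, here is a minimal Python sketch. It assumes an idealized tree in which every level multiplies the number of reachable keys by the same fanout; the figures (one million keys, a fanout of 100) are hypothetical and are not taken from InnoDB.

```python
def min_tree_height(num_keys: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= num_keys."""
    if num_keys < 1 or fanout < 2:
        raise ValueError("need num_keys >= 1 and fanout >= 2")
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout   # one more level multiplies the reach by fanout
        levels += 1
    return levels


# Hypothetical figures: one million keys.
binary_height = min_tree_height(1_000_000, 2)    # two-way (binary) tree
wide_height = min_tree_height(1_000_000, 100)    # 100-way tree
print(binary_height, wide_height)                # 20 3
```

If each level costs one disk access, the wide tree in this model needs 3 accesses where the binary tree needs 20.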

One of the main differences between the B+ tree and the B-tree is that the leaf nodes of a B+ tree are linked to their neighbors by pointers, specifically as a doubly linked list, which makes the B+ tree well suited to range searches. For details, please refer to:

Data structure - tree (3): multiway search trees, B-tree and B+ tree

Both the clustered index and the non-clustered indexes of the InnoDB storage engine are implemented as B+ trees.
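The range search over linked leaves can be sketched in Python as follows. This is only an illustration of the idea, not InnoDB's page format: the Leaf class, link_leaves and range_scan are hypothetical names, and the scan starts at the first leaf instead of descending from the root to find its starting leaf.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class Leaf:
    keys: List[int]
    prev: Optional["Leaf"] = field(default=None, repr=False)
    next: Optional["Leaf"] = field(default=None, repr=False)


def link_leaves(pages: List[List[int]]) -> Leaf:
    """Chain sorted key pages into a doubly linked list; return the first leaf."""
    leaves = [Leaf(keys=page) for page in pages]
    for left, right in zip(leaves, leaves[1:]):
        left.next, right.prev = right, left
    return leaves[0]


def range_scan(start_leaf: Leaf, low: int, high: int) -> Iterator[int]:
    """Yield keys in [low, high] by following next pointers from start_leaf."""
    leaf = start_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return          # keys are sorted, so nothing further can match
            if key >= low:
                yield key
        leaf = leaf.next


first = link_leaves([[1, 3, 5], [7, 9, 11], [13, 15, 17]])
print(list(range_scan(first, 5, 13)))   # [5, 7, 9, 11, 13]
```

Once the first matching leaf is reached, the scan never has to climb back up the tree; it just moves sideways until it passes the upper bound.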

Primary key index

InnoDB uses the primary key index as the clustered index of the table. In a clustered index, the non-leaf nodes store primary key values as search keys, and the leaf nodes (also known as data pages) store the actual data records. The records are stored from left to right in key order, so the clustered index is in fact the way the table data itself is stored, and a table can therefore have only one clustered index. For this reason an InnoDB table is also called an index-organized table. The structure is as follows: (the picture is quoted from "Inside MySQL Technology: InnoDB Storage Engine")

When a query finds its data through the primary key, that is, when EXPLAIN shows PRIMARY in the key column, the lookup is the most efficient. The leaf nodes store the data records themselves, so the rows can be returned directly, without the extra lookup back to the table (in the primary key index) that a non-clustered index needs in order to fetch the data record.

Secondly, for ORDER BY, whether ascending (ASC) or descending (DESC), if the ORDER BY column is the primary key, the storage engine already returns rows in primary key order, because the B+ tree of the primary key index is ordered. No sorting is needed at the MySQL server layer, which improves performance. If the Extra column of EXPLAIN shows Using filesort, the sort has to be carried out at the MySQL server layer. That may require temporary tables or an external file sort, and such a query is generally worth optimizing.

For a range search on the primary key, the leaf nodes of the clustered index are already linked into a doubly linked list in primary key order, so a range of data records can be found quickly.

Auxiliary index

An auxiliary index, also known as a secondary index, is a non-clustered index, generally created to make particular queries more efficient: when a query filters on the indexed column, the secondary index is used and a full table scan is avoided. Because a secondary index is not the clustered index, a table can have multiple secondary indexes, with the following structure:

The non-leaf nodes of a secondary index store the keys of the indexed columns, and the leaf nodes store the primary key value of the corresponding row in the clustered index (the primary key index). After the required entries are located through the secondary index, if the index cannot cover the query, that is, if not all the columns the query needs can be obtained from the secondary index, the primary key value is used to search the clustered index and find the corresponding leaf page, from which the data record is obtained. So the whole process involves two steps: first a lookup in the secondary index, then a lookup in the clustered index (the query back to the table).

For example:

If the B+ tree of the secondary index has a height of 3, three disk I/Os are required to locate the leaf node, which contains the primary key value of the row in the clustered index.

Then, using that primary key value, the corresponding data record is looked up in the clustered index. If the B+ tree of the clustered index also has a height of 3, three more disk I/Os are needed to locate the leaf page of the clustered index and read the actual data record from it.

The above process requires a total of 6 disk I/Os. If many rows have to be looked up back in the table, the required disk I/O grows in proportion and query performance declines. Secondary indexes should therefore be created on columns with high selectivity, that is, columns with few duplicate values.
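The arithmetic above can be written as a small Python sketch. The function page_reads is hypothetical and deliberately crude: it assumes no page is cached, that all matching secondary index entries sit in one leaf, and that every matching row needs its own full descent of the clustered index.

```python
def page_reads(secondary_height: int, clustered_height: int, matching_rows: int) -> int:
    """Rough page-read count for a secondary index lookup that goes back to the table.

    Simplifying assumptions (not InnoDB's real cost model): nothing is cached,
    one descent of the secondary index finds all matching entries, and each
    matching row costs one full descent of the clustered index.
    """
    return secondary_height + matching_rows * clustered_height


print(page_reads(3, 3, 1))      # 6, the single-row case from the text
print(page_reads(3, 3, 1000))   # 3003
```

Under these assumptions the cost is dominated by the number of rows that go back to the table, which is why an index that matches many rows helps so little.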

Cardinality: how much the values of the index column repeat

As the analysis above shows, when a query through a secondary index has to go back to the table for a large number of rows, it needs a large amount of disk I/O. Such an index not only fails to improve query performance but reduces it, and when many rows have to be returned the MySQL optimizer will give up on the index and scan the whole table directly. The columns chosen for a secondary index should therefore have few repeated values, so that a typical query returns only a small number of rows. If a column has too many duplicate values, consider not creating a secondary index on it.

You can judge this from the Cardinality value in the output of SHOW INDEX FROM for the table:

Cardinality is the estimated number of unique values in the index column. If it is close to the number of rows in the table, the column has few duplicate values and filters well. If the gap is large, that is, if Cardinality divided by the total number of rows is very small, the column contains a large number of duplicate values, and you should consider whether to drop the index. A gender column that contains only the values "male" and "female" is an example.
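The ratio described above can be illustrated with a short Python sketch. Note the difference from MySQL: the Cardinality column is an estimate, while this hypothetical selectivity function counts distinct values exactly over made-up sample data.

```python
def selectivity(values) -> float:
    """Distinct values divided by total values; 1.0 means every value is unique."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical columns of a 1000-row table.
order_ids = list(range(1, 1001))        # unique per row
genders = ["male", "female"] * 500      # only two distinct values

print(selectivity(order_ids))   # 1.0
print(selectivity(genders))     # 0.002
```

A ratio near 1.0 suggests a good candidate for a secondary index; a ratio near 0, as for the gender column, suggests the index would rarely be worth using.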

Covering index

Because a query back to the table is expensive, you can reduce the number of such lookups by putting all the columns a query needs into the secondary index, for example by using a composite index. The query can then get all of its data from the secondary index, with no need to go back to the table for the complete row, which improves performance. This is called a covering index. Because the leaf pages of a secondary index contain the primary key value, a query that returns only the primary key and the indexed columns is also covered, even though the primary key is not declared as part of the index.

When EXPLAIN is used to analyze the query, Using index in the Extra column means that a covering index is used to return the data, and the query performs well.
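The rule for when an index covers a query can be sketched as a set check in Python. The function is_covered and the column names id and amount are hypothetical; the sketch only models which columns are available in a secondary index leaf, not how the optimizer actually decides.

```python
def is_covered(needed_columns, index_columns, primary_key_columns) -> bool:
    """True if every needed column is available in the secondary index leaf.

    The leaf holds the indexed columns plus the primary key value, so both
    sets count as available without going back to the table.
    """
    available = set(index_columns) | set(primary_key_columns)
    return set(needed_columns) <= available


index_columns = ["user_id", "buy_date"]   # a secondary index on two columns
primary_key = ["id"]                      # hypothetical primary key column

print(is_covered(["id", "user_id", "buy_date"], index_columns, primary_key))   # True
print(is_covered(["user_id", "amount"], index_columns, primary_key))           # False
```

In the second call the hypothetical amount column lives only in the clustered index, so the query would have to go back to the table.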

Indexes add overhead to data changes: when rows are inserted, updated, or deleted, the corresponding secondary indexes must be updated as well. The design therefore has to strike a balance between query performance and update cost.

Composite indexes and leftmost-prefix matching

A composite index uses multiple columns as one index; for example, (a, b, c) means that three columns are used as an index. It follows from the characteristics of the B+ tree that a composite index is matched by its leftmost prefix, so in effect it is equivalent to three indexes: (a), (a, b) and (a, b, c).

Therefore, when designing a composite index, consider not only whether it can serve as a covering index but also the order of its columns. As a rule of thumb, the columns that are queried most often and filter best (fewest repeated values) go first, that is, on the left.
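Leftmost-prefix matching on a composite (multi-column) index can be sketched in Python as follows. The function usable_prefix is hypothetical and models equality conditions only; range conditions and other optimizer behavior are outside this sketch.

```python
def usable_prefix(index_columns, equality_columns):
    """Return the leading index columns that the equality conditions can use."""
    filtered = set(equality_columns)
    prefix = []
    for column in index_columns:
        if column not in filtered:
            break               # a gap ends the usable prefix
        prefix.append(column)
    return prefix


index = ["a", "b", "c"]
print(usable_prefix(index, ["a"]))        # ['a']
print(usable_prefix(index, ["a", "b"]))   # ['a', 'b']
print(usable_prefix(index, ["b", "c"]))   # []
print(usable_prefix(index, ["a", "c"]))   # ['a']
```

The last two calls show the cost of a poor column order: a query that does not filter on a cannot use the index for lookup at all, and a query on a and c can use only the a part.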

Optimizing ORDER BY sorting with a composite index

In addition, a composite index can reduce sorting at the MySQL server layer. For example, suppose the user order table has a composite index (user_id, buy_date) and a single-column index (user_id). (Note that both exist only to demonstrate the composite index; a real project needs only the composite index because, as mentioned above, (a, b) is equivalent to the two indexes (a) and (a, b).)

KEY `idx_user_id_buy_id` (`user_id`),
KEY `idx_user_id_buy_date` (`user_id`, `buy_date`)

If you are just querying a user's orders, the user_id index is used.

However, when you need the user's purchase records for the last 3 days, sorted by the purchase date buy_date, both the single-column index on user_id and the composite index (user_id, buy_date) are candidates, and MySQL chooses the composite index. Because buy_date is already ordered within each user_id in the composite index, no additional sort is needed at the MySQL server layer, which improves performance.

If the composite index idx_user_id_buy_date is deleted, EXPLAIN shows Using filesort.
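Why the (user_id, buy_date) index removes the sort can be illustrated in Python by keeping tuples in sorted order, the way index entries are ordered. The data and the function orders_for_user are hypothetical, and a sorted list with bisect stands in for the B+ tree.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical (user_id, buy_date) entries, kept sorted like index entries.
entries = sorted([
    (2, date(2024, 5, 3)),
    (1, date(2024, 5, 1)),
    (1, date(2024, 5, 4)),
    (2, date(2024, 5, 1)),
    (1, date(2024, 5, 2)),
])


def orders_for_user(index_entries, user_id, since):
    """Return buy dates for one user on or after `since`, in index order."""
    lo = bisect_left(index_entries, (user_id, since))
    hi = bisect_right(index_entries, (user_id, date.max))
    return [buy_date for _, buy_date in index_entries[lo:hi]]


result = orders_for_user(entries, 1, date(2024, 5, 2))
print(result == sorted(result))   # True: the slice is already ordered by date
```

The matching entries form one contiguous slice that is already in buy_date order, so no separate sort step is needed. With an index on user_id alone, the dates for a user would come back in no particular order and would have to be sorted afterwards.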
