

How to Understand the Principles of MySQL Indexes

2025-10-25 update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/03 report.

This article focuses on how to understand the principles behind MySQL indexes. If you are interested, read on: the approach is simple, fast, and practical. Let's learn how MySQL indexing works.

Case background

Suppose an interviewer asks you: in the order center of an e-commerce platform, orders are usually filtered by product type and order status and then sorted by creation time. How would you use an index to speed up the following SQL?

SELECT * FROM `order` WHERE status = 1 ORDER BY create_time ASC;

Some candidates think an index on status alone is enough. A better approach is a composite index on (status, create_time), which lets MySQL avoid a filesort.

With an index on status alone, MySQL can use the index only to filter rows. Sorting the result by create_time then needs an extra sort step, a filesort, which shows up in the execution plan as Using filesort in the Extra column.

So you should exploit the ordering of the index by building a composite index on (status, create_time). Within each status value, the index entries are already ordered by create_time, so the rows filtered by status come out in create_time order and no filesort is needed.
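To see why the composite index removes the sort, here is a minimal Python sketch that models the (status, create_time) index as a sorted list of tuples. The orders and the helper name ids_by_status_sorted are hypothetical, and a real InnoDB index is a B+Tree rather than a Python list; the point is only that entries sharing one status value are already ordered by create_time, so a single range read returns them in the right order.

```python
from bisect import bisect_left

# Hypothetical orders: (id, status, create_time as an ISO date string).
orders = [
    (1, 1, "2024-03-02"), (2, 0, "2024-03-01"), (3, 1, "2024-01-15"),
    (4, 2, "2024-02-20"), (5, 1, "2024-02-01"), (6, 0, "2024-01-03"),
]

# Model of a composite index on (status, create_time): entries are sorted by
# status first and by create_time within each status; each carries the order id.
index = sorted((status, created, oid) for oid, status, created in orders)


def ids_by_status_sorted(idx, status):
    """WHERE status = ? ORDER BY create_time ASC, served by one range read."""
    lo = bisect_left(idx, (status,))      # first entry with this status
    hi = bisect_left(idx, (status + 1,))  # first entry of the next status
    # The slice is already in create_time order, so no extra sort is needed.
    return [oid for _status, _created, oid in idx[lo:hi]]


print(ids_by_status_sorted(index, 1))  # [3, 5, 1]
```

With an index on status alone, the matching entries would come back in primary key order instead, and a separate sort by create_time would still be required.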

Case analysis

This case shows how important index knowledge is. It leads to questions such as:

What data structures and algorithms underlie database indexes?

Why did MySQL InnoDB choose B+Tree as its default index data structure?

How can you inspect index usage through the execution plan?

What causes an index to fail (that is, not be used)?

What are the common ways to optimize indexes?

……

In summary, you should:

Understand the indexing principle of MySQL InnoDB

Know the advantages of B+Tree over other index data structures (such as the B-tree, the binary tree, and the hash table)

Know how to read MySQL execution plans

Know the common situations that cause index failure

Master the techniques for building efficient indexes that are common in practice (such as prefix indexes and covering indexes)

If you have been asked any of these questions, it is worth taking the time to solidify your knowledge of MySQL indexes and index optimization.

Case solution: the indexing principle of MySQL InnoDB

From a data-structure perspective, MySQL's common index types are B+Tree indexes, hash indexes, and full-text indexes. Which of these are supported depends on the storage engine; MySQL's common engines are InnoDB, MyISAM, and Memory. (The last two rarely come up in practice or in interviews, so this article covers only InnoDB.)

Index type

In practice, InnoDB is the default storage engine when MySQL creates tables, and B+Tree is the index type it uses most often.

When a table is created, InnoDB uses the table's primary key as the primary key index, which is the clustered index. If the table defines no primary key, InnoDB uses the first UNIQUE index whose columns are all NOT NULL as the clustered index; if there is none, it generates a hidden 6-byte row ID and builds the clustered index on it. Either way, the primary key index is a B+Tree index by default.

Next, a simple example shows how a B+Tree index stores data, to help you understand how indexing works with a B+Tree.

First, create a product table:

CREATE TABLE `product` (
  `id` int(11) NOT NULL,
  `product_no` varchar(20) DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  `price` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE
) CHARACTER SET = utf8 COLLATE = utf8_general_ci ROW_FORMAT = Dynamic;

Then add a few rows of data:

Querying product data by primary key (primary key index)

When we query the product with id 15 through the primary key index, how does the B+Tree index locate the matching data?

Select * from product where id = 15

We can build a B+Tree by hand from this data, with each node having 3 children (in general, a B+Tree allows M children per node, with M > 2). The values 1, 18, and 36 in the root node are the minimum values of its child nodes (1, 6, 12), (18, 24, 30), and (36, 41, 52), respectively.

Every value in a parent node also appears in the child nodes below it, so the leaf nodes contain all the data values, and each leaf node points to the next one, forming a linked list, as shown in the figure:

Let's walk through a B+Tree query with an example. To find the data value 15, the B+Tree searches level by level from top to bottom:


Compare 15 with the root node's values (1, 18, 36). Since 15 lies between 1 and 18, the B+Tree search logic descends to the second-level data block (1, 6, 12).

Search the second-level data block (1, 6, 12). Because 15 is greater than 12, the search descends to the third-level data block (12, 15, 17).

Search the leaf data block (12, 15, 17) and find the data value 15.

Finally, the data stored in the leaf node is retrieved using the value 15.

The whole lookup visits only three nodes, that is, three I/O operations at most. This query efficiency is the biggest advantage of B+Tree over B-trees and binary trees.
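The three-step lookup can be simulated in a few lines. This is a minimal in-memory sketch, not InnoDB's implementation: each visited node stands in for one page read, the Node class and search function are hypothetical, and every leaf except (12, 15, 17) holds made-up filler values.

```python
from bisect import bisect_right


class Node:
    """A B+Tree node. In internal nodes, keys[i] is the smallest key in children[i]."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []  # an empty list means this is a leaf

    def is_leaf(self):
        return not self.children


def search(root, target):
    """Descend from the root; return (found, number of nodes visited)."""
    node, visited = root, 0
    while True:
        visited += 1
        if node.is_leaf():
            return target in node.keys, visited
        # Follow the rightmost child whose smallest key is <= target.
        i = bisect_right(node.keys, target) - 1
        if i < 0:
            return False, visited  # target is below every key in the tree
        node = node.children[i]


# The example tree; leaf contents other than (12, 15, 17) are hypothetical.
leaves = [Node(list(k)) for k in [(1, 3, 5), (6, 8, 10), (12, 15, 17),
                                   (18, 20, 22), (24, 26, 28), (30, 32, 34),
                                   (36, 38, 40), (41, 45, 50), (52, 55, 60)]]
root = Node([1, 18, 36], [Node([1, 6, 12], leaves[0:3]),
                          Node([18, 24, 30], leaves[3:6]),
                          Node([36, 41, 52], leaves[6:9])])

print(search(root, 15))  # (True, 3)
```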

So what happens if you query the data not by the primary key id but by the product code (product_no)?

Querying product data by a non-primary key (secondary index)

If you query a product by its product code (that is, through a secondary index), InnoDB first searches the secondary index's B+Tree for the product code, finds the matching leaf node, and reads the primary key value stored there. It then searches the primary key index's B+Tree with that value to reach the corresponding leaf node and fetch the whole row. This second lookup is called returning to the table.
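Here is a minimal sketch of the two searches involved in returning to the table, assuming both indexes can be modeled as sorted lists searched with binary search (a real InnoDB index is a B+Tree). The rows, product codes, and function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical clustered index: (id, full row) pairs sorted by id.
clustered = [
    (1, {"id": 1, "product_no": "P003", "name": "router", "price": 99.0}),
    (15, {"id": 15, "product_no": "P001", "name": "switch", "price": 199.0}),
    (18, {"id": 18, "product_no": "P002", "name": "cable", "price": 9.9}),
]
# Hypothetical secondary index on product_no: (product_no, id) pairs.
secondary = sorted((row["product_no"], pk) for pk, row in clustered)


def lookup_sorted(entries, key):
    """Binary-search a sorted list of (key, value) pairs; None if absent."""
    i = bisect_left(entries, (key,))
    if i < len(entries) and entries[i][0] == key:
        return entries[i][1]
    return None


def find_by_product_no(product_no):
    pk = lookup_sorted(secondary, product_no)  # search 1: secondary index -> id
    if pk is None:
        return None
    return lookup_sorted(clustered, pk)  # search 2: back to the table for the row


print(find_by_product_no("P001"))  # the full row with id 15
```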

That is how the index works under the hood.

In an interview, you are rarely asked to describe the index lookup process directly. Instead, the interviewer assesses your grasp of index principles through questions about index optimization, such as "Why did MySQL InnoDB choose B+Tree as its default index data structure?" and "What are the common ways to optimize indexes in MySQL?"

Next, let's look at how to answer these index optimization questions in an interview.

Advantages of B+Tree Index

If you are asked "Why did MySQL choose B+Tree as its index data structure?", the interviewer is really testing two things: your understanding of how B+Tree indexes work, and of the advantages of B+Tree over other index structures.

Having covered how B+Tree indexes work, let's look at the advantages of B+Tree over other common index structures: the B-tree, the binary tree, and the hash table.

Advantages of B+Tree over B-tree index structure

B+Tree stores row data only in leaf nodes, whereas a B-tree also stores data in its non-leaf nodes. A B+Tree internal node therefore holds only keys and child pointers, so a page of the same size fits more keys, the tree is wider and shallower, and the same number of disk I/O operations can search more data.
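A rough calculation shows why keeping row data out of internal nodes matters. The sketch assumes InnoDB's default 16 KB page, an 8-byte BIGINT key, a 6-byte child pointer, and a 1 KB average row; these sizes are illustrative assumptions, and page headers are ignored.

```python
PAGE_SIZE = 16 * 1024  # InnoDB's default page size
KEY_BYTES = 8          # assumption: BIGINT primary key
POINTER_BYTES = 6      # assumption: size of a child-page pointer
ROW_BYTES = 1024       # assumption: average row size


def fanout(entry_bytes):
    """Roughly how many entries fit in one page (page headers ignored)."""
    return PAGE_SIZE // entry_bytes


bplus_internal = fanout(KEY_BYTES + POINTER_BYTES)          # key + pointer only
btree_node = fanout(KEY_BYTES + POINTER_BYTES + ROW_BYTES)  # row data as well
print(bplus_internal, btree_node)  # 1170 15
```

Under these assumptions, a B+Tree internal page can point to about 1,170 children, while a B-tree node that also carries row data fits only about 15 entries, so the B-tree has to be much taller to hold the same data.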

In addition, B+Tree leaf nodes are connected by a doubly linked list, which suits the range-based sequential scans that are common in MySQL: once the first matching key is found, the scan simply walks along the leaves. A B-tree has no such leaf chain, so it cannot do this.
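The leaf-level linked list can be sketched as follows. This is a hypothetical model: locating the first leaf with a binary search over the leaves' minimum keys stands in for the top-down descent, and the leaf contents are made up.

```python
from bisect import bisect_right


class Leaf:
    """A leaf page: sorted keys plus links to its neighbours."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.prev = None
        self.next = None


def build_leaves(groups):
    leaves = [Leaf(g) for g in groups]
    for a, b in zip(leaves, leaves[1:]):
        a.next, b.prev = b, a  # doubly linked, as described above
    return leaves


def range_scan(leaves, low, high):
    """Return keys in [low, high]: find the first leaf, then follow next links."""
    mins = [leaf.keys[0] for leaf in leaves]
    leaf = leaves[max(bisect_right(mins, low) - 1, 0)]
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result  # keys are ordered, so the scan can stop here
            if k >= low:
                result.append(k)
        leaf = leaf.next
    return result


leaves = build_leaves([(1, 3, 5), (6, 8, 10), (12, 15, 17), (18, 20, 22)])
print(range_scan(leaves, 8, 17))  # [8, 10, 12, 15, 17]
```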

Advantages of B+Tree over binary tree index structure

For a B+Tree with N leaf nodes, the search complexity is O(log_d N), where d is the maximum number of children a node may have.

In practice, d is greater than 100, so even with data on the order of 10 million rows, the B+Tree stays about 3 to 4 levels tall. A lookup therefore needs only 3 to 4 disk I/O operations to reach the target data (here a lookup means a search of the clustered index described above).

A binary tree node, by contrast, has at most 2 children, so its search complexity is O(log_2 N). Because log_2 N is far larger than log_d N when d exceeds 100, a binary tree is much taller than a B+Tree and needs many more node visits to retrieve the target data.
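The height difference is easy to compute. This sketch assumes perfectly full trees, so it returns the smallest number of levels that can hold n entries; real trees are somewhat taller because pages are not completely full.

```python
def levels(n, fanout):
    """Smallest h such that fanout**h >= n (a perfectly full tree)."""
    h, capacity = 0, 1
    while capacity < n:
        capacity *= fanout
        h += 1
    return h


N = 10_000_000
print(levels(N, 2))     # balanced binary tree: 24
print(levels(N, 100))   # fan-out d = 100: 4
print(levels(N, 1170))  # fan-out d = 1170: 3
```

With 10 million entries, a balanced binary tree needs at least 24 levels, while a fan-out of 100 needs 4 and a fan-out of about 1,170 needs 3, in line with the 3 to 4 levels quoted for B+Tree.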

Advantages of B+Tree over Hash storage structure

Range queries are common in MySQL, but a hash table is not suited to them: it handles equality lookups well, yet hashing scatters keys without preserving their order, so a range cannot be read in sequence. This is why B+Tree indexes fit a wider range of scenarios than hash indexes.
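Python's dict is a hash table, so it can illustrate the difference. In this hypothetical sketch, an equality lookup is a single probe, but a range query over the hash has to check every key, while a sorted list (standing in for the ordered B+Tree leaves) answers it with two binary searches.

```python
from bisect import bisect_left, bisect_right

# Hypothetical prices keyed by product id.
prices = {1: 99.0, 6: 15.5, 12: 42.0, 15: 199.0, 17: 8.0, 18: 9.9}

# Equality lookup: one hash probe.
print(prices[15])  # 199.0


def hash_range(table, low, high):
    """Range query on a hash: every key must be examined (a full scan)."""
    return sorted(k for k in table if low <= k <= high)


sorted_ids = sorted(prices)


def ordered_range(ids, low, high):
    """Range query on ordered keys: binary-search both ends, then slice."""
    return ids[bisect_left(ids, low):bisect_right(ids, high)]


print(hash_range(prices, 10, 17))         # [12, 15, 17]
print(ordered_range(sorted_ids, 10, 17))  # [12, 15, 17]
```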

At this point, you know why MySQL chose B+Tree for indexing. When answering, focus on the advantages of B+Tree, then describe the lookup process behind the indexing principle; once you have mastered these points, the question is easy to answer.

Next, let's move on to the next question: how to inspect index usage through the execution plan in practice.

Checking index usage through the execution plan

Here is a demo table, product, that stores product information:

CREATE TABLE `product` (
  `id` int(11) NOT NULL,
  `product_no` varchar(20) DEFAULT NULL,
  `name` varchar(255) DEFAULT NULL,
  `price` decimal(10,2) DEFAULT NULL,
  PRIMARY KEY (`id`) USING BTREE,
  KEY `index_name` (`name`),
  KEY `index_id_name` (`id`, `name`)
) CHARACTER SET = utf8 COLLATE = utf8_general_ci;

The table has a primary key index, a secondary index on the name column, and a composite index on (id, name). Now let's look at the execution plan of a simple query:

Execution plan

In the execution plan, possible_keys lists the indexes that might be used, key shows the index actually chosen, key_len gives the length of the index key used, and rows estimates the number of rows scanned.

Pay particular attention to the type column, which shows the access type, that is, how MySQL scans for the required data. Common access types, ordered from least to most efficient, are listed below (for query performance, avoid full table scans and full index scans where possible):

ALL (full table scan)

index (full index scan)

range (index range scan)

ref (non-unique index scan)

eq_ref (unique index scan)

const (the primary key or a unique index matches at most one row)
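As a small illustration of reading plans, the helper below flags rows whose type is ALL or index, or whose Extra column contains Using filesort. The plan rows are hypothetical dictionaries shaped like EXPLAIN output, not output captured from a real server, and the index name idx_status is made up.

```python
def review_plan(plan_rows):
    """Return warnings for plan rows that deserve a closer look."""
    warnings = []
    for row in plan_rows:
        # ALL and index are the two least efficient access types listed above.
        if row["type"] in ("ALL", "index"):
            warnings.append(f"{row['table']}: {row['type']} scan, key={row['key']}")
        if "Using filesort" in (row.get("Extra") or ""):
            warnings.append(f"{row['table']}: Using filesort")
    return warnings


# Hypothetical plan rows: the order query with only a status index,
# and a product query that uses no index at all.
plan = [
    {"table": "order", "type": "ref", "key": "idx_status", "Extra": "Using filesort"},
    {"table": "product", "type": "ALL", "key": None, "Extra": None},
]
for warning in review_plan(plan):
    print(warning)
```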

In general, reading execution plans is an essential skill for engineers analyzing index usage (many large companies list "SQL statement tuning" in their job descriptions), so you should know the meaning of the plan's core columns, such as type. In your answer, start from the key columns, expand to the others, and then describe how you have carried out SQL optimization in practice.

Common cases of index failure

In practice, we often meet SQL statements that do not use the existing indexes. Let's look at an example of index failure:

This SQL statement with a like query does not use the index_name index in the product table.

Let's use the B+Tree structure of the secondary index to see why the index fails. When the MySQL optimizer evaluates the condition name LIKE '%router' against the B+Tree of index_name, the leading wildcard means a match could lie in either the left or the right child of any node, so the tree's ordering cannot narrow the search. The optimizer concludes that using this index would mean scanning the entire index and then returning to the table for each match, so it scans the whole table instead.
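A sorted list of names can stand in for the index_name B+Tree to show why the leading wildcard hurts. In this hypothetical sketch, a prefix pattern like 'router%' can jump to the first candidate with a binary search, whereas a suffix pattern like '%router' has to check every entry.

```python
from bisect import bisect_left

# Hypothetical product names, kept in index order.
names = sorted(["adapter", "cable", "home router", "modem",
                "router", "router pro", "switch"])


def prefix_match(sorted_names, prefix):
    """LIKE 'prefix%': binary-search to the first candidate, then scan forward."""
    i = bisect_left(sorted_names, prefix)
    out = []
    while i < len(sorted_names) and sorted_names[i].startswith(prefix):
        out.append(sorted_names[i])
        i += 1
    return out


def suffix_match(sorted_names, suffix):
    """LIKE '%suffix': the sort order does not help, so check every entry."""
    return [n for n in sorted_names if n.endswith(suffix)]


print(prefix_match(names, "router"))  # ['router', 'router pro']
print(suffix_match(names, "router"))  # ['home router', 'router']
```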

Other common causes of index failure include:

Performing calculations, functions, or type conversions on the indexed column. The index is ordered by the stored column values, not by the computed results, so the query would have to scan the entire index and return to the table, which costs more than a direct full table scan.

A LIKE pattern that begins with a wildcard, such as '%abc'

Comparing a string column with an unquoted value, which causes an implicit type conversion (for example, WHERE product_no = 123 instead of WHERE product_no = '123')

The underlying rule is this: if the MySQL query optimizer estimates that using an index costs more than a full table scan, it skips the index and scans the whole table; if using the index is cheaper than a full table scan, it uses the index.
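The decision rule can be sketched as a toy cost comparison. The per-row costs below are made-up constants chosen for illustration; MySQL's real cost model is far more detailed, so treat this only as a picture of the trade-off.

```python
def choose_access_path(total_rows, matching_rows,
                       scan_cost_per_row=1.0, lookup_cost_per_row=4.0):
    """Toy comparison; the per-row costs are made-up, not MySQL's cost model."""
    full_scan = total_rows * scan_cost_per_row
    # Index path: read each matching index entry, then return to the table.
    index_path = matching_rows * (scan_cost_per_row + lookup_cost_per_row)
    return "index" if index_path < full_scan else "full table scan"


print(choose_access_path(1_000_000, 100))      # index
print(choose_access_path(1_000_000, 900_000))  # full table scan
```

The more rows a condition matches, the more table lookups the index path needs, until scanning the table directly becomes cheaper.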

Common methods of index optimization

Prefix index optimization

A prefix index indexes only the first few characters of a string column; for example, we could index the first five characters of the item name column in the order table. A prefix index shrinks each index entry, so more entries fit in each index page, which can noticeably speed up index lookups. When indexing long string columns, a prefix index helps keep index entries small.
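Choosing the prefix length usually comes down to checking how selective each prefix is. This sketch, using hypothetical product names, computes the number of distinct n-character prefixes divided by the number of rows; a good prefix length is the shortest one whose value gets close to that of the full column.

```python
def prefix_selectivity(values, n):
    """Distinct n-character prefixes divided by total rows."""
    return len({v[:n] for v in values}) / len(values)


# Hypothetical product names.
names = ["router ax1800", "router ax3000", "router be7200", "switch 8-port",
         "switch 16-port", "modem docsis", "cable cat6", "cable cat7"]

for n in (5, 8, 13):
    print(n, prefix_selectivity(names, n))  # 5 0.5, then 8 0.75, then 13 1.0
```

Here a 13-character prefix already distinguishes every row, so indexing the full column would add size without adding selectivity.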

However, prefix indexes have limitations: ORDER BY cannot use a prefix index, and a prefix index cannot serve as a covering index.

Covering index optimization

A covering index is one whose B+Tree leaf nodes contain every column the SQL query needs, so the records can be read from the secondary index alone, without a lookup in the clustered index.

Suppose we only need to query the product's name and price. Is there a way to avoid returning to the table?

We can create a composite index on product ID, name, and price. Because all the queried columns exist in the index, the query does not need to search the primary key index again, which avoids returning to the table. The benefit of a covering index is clear: there is no need to fetch the entire row, which saves a large number of I/O operations.
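A small simulation makes the saving concrete. In this hypothetical sketch, the Table class counts trips back to the clustered index; a name-only secondary index needs one trip per row to fetch price, while the (id, name, price) index answers the query without any.

```python
class Table:
    """Hypothetical table: a clustered index plus a counter of table lookups."""

    def __init__(self, rows):
        self.clustered = {row["id"]: row for row in rows}  # id -> full row
        self.lookups = 0

    def fetch(self, pk):
        self.lookups += 1  # one trip back to the clustered index
        return self.clustered[pk]


rows = [
    {"id": 1, "product_no": "P003", "name": "router", "price": 99.0},
    {"id": 15, "product_no": "P001", "name": "switch", "price": 199.0},
    {"id": 18, "product_no": "P002", "name": "cable", "price": 9.9},
]
table = Table(rows)

# Secondary index on name only: entries hold (name, id), so price must be
# fetched from the clustered index.
name_index = sorted((r["name"], r["id"]) for r in rows)
non_covering = [(name, table.fetch(pk)["price"]) for name, pk in name_index]
lookups_non_covering = table.lookups

# Composite index on (id, name, price): every queried column is in the entry.
covering_index = sorted((r["id"], r["name"], r["price"]) for r in rows)
covering = [(name, price) for _pk, name, price in covering_index]
lookups_covering = table.lookups - lookups_non_covering

print(sorted(non_covering) == sorted(covering), lookups_non_covering, lookups_covering)
# True 3 0
```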

Composite index optimization

Composite indexes follow the leftmost matching principle: index columns are matched starting from the leftmost one.

For example, the composite index (userpin, username) can be used when the query condition is WHERE userpin=1 AND username=2, and also when the condition is WHERE userpin=1, but it cannot be used when the condition is only WHERE username=2.
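The leftmost matching rule can be modeled with a simplified helper that handles equality conditions only. The function name is hypothetical and the real optimizer has more cases; the last lines show the reason behind the rule: entries are sorted by userpin first, so the username values on their own are not in order.

```python
def usable_columns(index_columns, where_columns):
    """Leading index columns usable under the leftmost-prefix rule.
    Simplified model: equality conditions only."""
    used = []
    for col in index_columns:
        if col not in where_columns:
            break  # the first missing column ends the usable prefix
        used.append(col)
    return used


index = ("userpin", "username")
print(usable_columns(index, {"userpin", "username"}))  # ['userpin', 'username']
print(usable_columns(index, {"userpin"}))              # ['userpin']
print(usable_columns(index, {"username"}))             # []

# Why: hypothetical (userpin, username) entries sorted as the index stores them.
entries = sorted([(2, 1), (1, 5), (3, 2), (1, 2)])
print([username for _userpin, username in entries])   # [2, 5, 1, 2]
```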

In addition, the column order in a composite index strongly affects its efficiency: the further left a column is, the more likely it is to be used for index filtering. In practice, when creating a composite index, put highly selective columns first, so that they can be used by more SQL statements.

Selectivity is the number of distinct values in a column divided by the total number of rows in the table. Gender, for example, has very low selectivity and is a poor choice for an index or for the leading column of a composite index, while a column such as a UUID has high selectivity and is well suited to an index or to the leading position of a composite index.
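Selectivity is easy to compute. In this sketch the 100-row table is hypothetical, with an alternating gender column and a random UUID column generated by Python's uuid module; ranking the columns by selectivity suggests which one should lead a composite index.

```python
import uuid


def selectivity(values):
    """Distinct values divided by total rows (1.0 means every value is unique)."""
    return len(set(values)) / len(values)


# Hypothetical 100-row table.
rows = [{"gender": "M" if i % 2 else "F", "uuid": str(uuid.uuid4())}
        for i in range(100)]
columns = {name: selectivity([r[name] for r in rows]) for name in ("gender", "uuid")}
print(columns)  # {'gender': 0.02, 'uuid': 1.0}

# Most selective column first.
order = sorted(columns, key=columns.get, reverse=True)
print(order)  # ['uuid', 'gender']
```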

By now you should have a deeper understanding of how MySQL indexing works. The best next step is to try it out in practice.


© 2024 shulou.com SLNews company. All rights reserved.
