This article explains why a query can still be slow even when it uses an index. We will work through a concrete example step by step; I hope you read it carefully and get something out of it.
Case analysis
Let's get straight to the point. For this experiment, I created the following table:
CREATE TABLE `t` (`id` int(11) NOT NULL, `a` int(11) DEFAULT NULL, PRIMARY KEY (`id`), KEY `a` (`a`)) ENGINE=InnoDB
The table has two fields: id is the primary key, and a carries an ordinary (secondary) index.
First of all, MySQL decides whether a statement is a slow query by its execution time. It compares the statement's execution time with the long_query_time system parameter; if the execution time exceeds it, the statement is recorded in the slow query log. The default value of this parameter is 10 seconds. Of course, in production we do not set it that large; it is usually set to 1 second, and for more latency-sensitive businesses it may be set to a value below 1 second.
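If you want to check or adjust this threshold yourself, here is a minimal sketch (these are standard MySQL system variables; setting them globally requires the appropriate privilege):
show variables like 'long_query_time';
set global slow_query_log = ON;     -- make sure the slow query log is enabled
set global long_query_time = 1;     -- statements running longer than 1 second will be logged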
Whether a statement uses one of the table's indexes during execution can be seen from the explain output: check whether the value of the KEY column is NULL.
Let's take a look at explain select * from t; its KEY result is NULL.
(figure 1)
The KEY result of explain select * from t where id=2; is PRIMARY, which is what we usually mean by "using the primary key index".
(figure 2)
The KEY result of explain select a from t; is a, indicating that index a is used.
(figure 3)
Although the KEY of the last two queries is not NULL, the last one actually scans the entire index tree a.
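For reference, the KEY (and type) values that explain typically reports for these three statements look roughly like this (the exact output depends on your MySQL version and table statistics):
explain select * from t;              -- type=ALL,   key=NULL     (full table scan)
explain select * from t where id=2;  -- type=const, key=PRIMARY  (point lookup on the primary key)
explain select a from t;             -- type=index, key=a        (scans the whole index tree a)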
Assuming the table has 1 million rows of data, the statement in figure 2 still executes very fast, but the one in figure 3 is bound to be slow. In a more extreme case, for example when the CPU pressure on this database is very high, even the execution time of the figure-2 statement may exceed long_query_time and end up in the slow query log.
So we can draw a conclusion: there is no necessary connection between whether a statement uses an index and whether it ends up in the slow query log. Using an index only describes part of the statement's execution process, while whether it becomes a slow query is determined by its execution time, which can be affected by all kinds of external factors. In other words, a statement that uses an index can still be slow.
The shortcomings of a full index scan
If we look at this issue at a deeper level, there is another hidden question that needs to be clarified: what exactly does "using an index" mean?
As we all know, InnoDB uses index-organized tables: all data is stored on index trees. For example, the table t above has two indexes, the primary key index and the ordinary index a. In InnoDB, the row data is stored in the primary key index, as shown in the figure:
You can see that the data sits on the primary key index. Logically, then, every query on an InnoDB table uses at least one index. So let me ask you a question: if you execute select * from t where id > 0, do you think this statement uses an index?
The explain output of this statement shows PRIMARY in the KEY column. But judging from the data, you know this statement must have scanned the whole table. The optimizer simply considers that, during execution, it needs to use the primary key index to locate the first value satisfying id > 0, and then scan rightward from there.
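A hedged illustration of what explain typically reports for this statement (the exact access type can vary with version and statistics):
explain select * from t where id > 0;  -- key=PRIMARY, type is usually range, but rows is close to the size of the whole table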
So even if the KEY shown in the explain result is not NULL, the statement may in fact be a full table scan. In InnoDB there is really only one situation we could call "not using an index": starting from the leftmost leaf node of the primary key index and scanning the entire index tree to the right.
In other words, "not using an index" is not an accurate description.
You can say a query does a full table scan, meaning it traverses the entire primary key index tree.
You can also say a query such as select a from t; does a full index scan, meaning it scans the entire ordinary index tree.
A statement like select * from t where id=2 is what we usually mean by "using an index": it takes advantage of the index's fast-lookup capability and effectively reduces the number of rows scanned.
The filtering of the index should be good enough
Based on the analysis above, we know that a full index scan makes a query slow. Next, let's talk about the filtering (selectivity) of an index.
Suppose you maintain a table recording the basic information of China's 1.4 billion citizens, and you want to find the names and basic information of everyone aged between 10 and 15. Your statement would be: select * from t_people where age between 10 and 15.
Seeing this statement, you would naturally build an index on the age field, otherwise it would be a full table scan. But you may find that the statement is still slow after the index is built, because there could be more than 100 million rows of data matching this condition.
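A minimal sketch of the table and index assumed in this example (the column types and the index name are illustrative, not taken verbatim from the original):
create table t_people (id int, name varchar(20), age int) engine=InnoDB;
alter table t_people add index age(age);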
Let's take a look at the organization chart of this table after indexing:
The execution process of this statement is as follows:
Starting a tree search from the age index, find the first record with age equal to 10, take its primary key id, go to the primary key index to fetch the whole row by that id, and return it as part of the result set.
Scan rightward on the age index, take the id of each entry, fetch the whole row from the primary key index, and return it as part of the result set.
Repeat the previous step until you encounter the first record with age greater than 15.
Look at this statement: although it uses the index, it scans more than 100 million rows. So now you see that when we discuss whether a statement uses an index, what we really care about is the number of rows it scans.
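If you want to see for yourself roughly how many index rows a statement reads, one way is to use MySQL's session status counters; a rough sketch:
flush status;                                      -- reset the session's counters
select * from t_people where age between 10 and 15;
show session status like 'Handler_read%';          -- Handler_read_next roughly reflects the number of index entries scanned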
For a large table, not only should there be an index, but the index's filtering must also be good enough.
The age column in this example does not filter well enough. When designing the table structure, we should make the filtering as good as possible, that is, choose indexed columns whose selectivity is high enough.
The cost of returning to the table
So if the filtering is better, does it mean that the number of rows scanned by the query must be small?
Let's look at another example:
Suppose your statement is select * from t_people where name='Zhang San' and age=8.
Suppose the t_people table has a joint index on (name, age). The filtering of this joint index is good: on the joint index you can quickly locate the children named Zhang San who are 8 years old. There should not be many such children, so the rightward scan covers very few rows and the query is very efficient.
But the query's filtering and the index's filtering are not necessarily the same thing. What if your requirement is to find all 8-year-old children whose family name is Zhang, that is, whose name starts with "Zhang"? How would you write the statement?
Obviously you would write it like this: select * from t_people where name like 'Zhang%' and age=8.
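A minimal sketch of the joint index assumed by these two queries (the index name name_age is illustrative):
alter table t_people add index name_age(name, age);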
In MySQL 5.5 and earlier versions, the execution flow of this statement is as follows:
First, from the joint index, find the first record whose name begins with Zhang, take out its primary key id, and then fetch the whole row from the primary key index tree by that id.
Check whether the age field equals 8: if so, return the row as part of the result set; if not, discard it.
Continue traversing rightward on the joint index, repeating the back-to-table lookup and the check, until reaching a record on the joint index tree whose name no longer begins with Zhang.
We call the action of fetching the whole row from the primary key index by id "going back to the table". You can see that in this execution process the most time-consuming step is going back to the table. Assuming 80 million people have names beginning with Zhang, this process goes back to the table 80 million times. When locating the first row, only the leftmost prefix of the joint index can be used; this is what we call the leftmost prefix principle.
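To make the leftmost prefix point concrete, here is a hedged comparison (the behaviour described is the typical case; exact plans depend on version and data):
explain select * from t_people where name like 'Zhang%' and age=8;  -- can seek into the (name, age) index using the prefix on name
explain select * from t_people where age=8;                          -- no condition on name, so the (name, age) index typically cannot be used to locate rows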
Looking at this execution process, it goes back to the table a great many times and the performance is not good enough. Is there any way to optimize it?
MySQL 5.6 introduced the index condition pushdown optimization. Let's look at the execution flow with this optimization:
First, from the joint index tree, find the first record whose name begins with Zhang, and check whether the age value stored in that index entry is 8. If so, go back to the table, fetch the whole row, and return it as part of the result set; if not, discard it.
Continue traversing rightward on the joint index tree, checking the age field in the index entry and going back to the table only when needed, until reaching a record whose name no longer begins with Zhang.
The difference from the previous process is that, while traversing the joint index, the condition age=8 is pushed down into the traversal itself, which reduces the number of times we go back to the table. Assuming 80 million people have names beginning with Zhang, of whom 1 million are 8-year-old children, this query still traverses 80 million entries in the joint index, but only goes back to the table 1 million times.
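You can tell whether index condition pushdown is in effect from the Extra column of explain, and you can switch it off for comparison; a brief sketch (the optimizer_switch flag is standard in MySQL 5.6 and later):
explain select * from t_people where name like 'Zhang%' and age=8;
-- with index condition pushdown in effect, the Extra column typically shows "Using index condition"
set optimizer_switch = 'index_condition_pushdown=off';   -- turn it off to compare with the pre-5.6 behaviour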
Virtual column
As you can see, the effect of this optimization is quite good, but it still cannot bypass the limitation of the leftmost prefix principle, so the joint index still has to be scanned for 80 million rows. Is there any way to optimize further?
We can consider building a joint index on the first character of the name together with age. This can be done with the virtual (generated) columns introduced in MySQL 5.7. The SQL statement to modify the table structure is:
alter table t_people add name_first varchar(2) generated always as (left(name,1)) virtual, add index(name_first, age);
Let's look at the table structure after executing this statement:
CREATE TABLE `t_people` (`id` int(11) DEFAULT NULL, `name` varchar(20) DEFAULT NULL, `age` int(11) DEFAULT NULL, `name_first` varchar(2) GENERATED ALWAYS AS (left(`name`,1)) VIRTUAL, KEY `name_first` (`name_first`,`age`)) ENGINE=InnoDB DEFAULT CHARSET=utf8
First, this statement creates a virtual column called name_first on t_people, and then builds a joint index on (name_first, age). The value of the virtual column is always the first character of the name field: you cannot specify a value for it when inserting data and you cannot modify it directly when updating; its value is generated automatically from its definition and is updated automatically whenever the name field changes.
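A small hedged example of how the generated column behaves (this assumes names are stored in Chinese, as in the original example, so left(name,1) yields the family-name character):
insert into t_people (id, name, age) values (1, '张三', 8);   -- name_first is not listed; it is filled in automatically as '张'
select name, name_first, age from t_people where id = 1;      -- returns 张三, 张, 8
-- trying to write the column directly, e.g. supplying a value for name_first in the insert, is rejected by MySQL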
With this new joint index, when we look for 8-year-old children whose family name is Zhang, the SQL statement can be written as: select * from t_people where name_first='Zhang' and age=8.
In this way, the statement only needs to scan 1 million rows of the joint index and go back to the table 1 million times. The essence of this optimization is that we created a more compact index to speed up the query.
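And a quick hedged check that the new index is picked up (using the Chinese character stored in the generated column; the exact plan depends on statistics):
explain select * from t_people where name_first = '张' and age = 8;
-- key shows the joint index on (name_first, age); only the matching entries are scanned, with about 1 million trips back to the table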
This concludes the discussion of why a query can still be slow even when it uses an index. Thank you for reading!