How to solve an online slow-query incident caused by a wrong index choice on a table with tens of millions of rows

2025-07-16 Update From: SLTechnology News&Howtos (shulou)

Shulou (Shulou.com) 05/31 report

This article explains how to resolve an online slow-query incident caused by MySQL choosing the wrong index on a table with tens of millions of rows.

Fault description

At 11:00 on July 24, a database suddenly fired a large number of alarms: slow queries exceeded the threshold, which caused a sharp rise in connections, slowed database responses, and affected the business. According to the monitoring chart, slow queries peaked at 140,000 (14w) per minute, whereas normally there are fewer than ten per minute.

A quick look at the slow query log showed that all the slow queries came from the same kind of statement (table names and other private details have been masked):

Select * from sample_table where 1 = 1 and (city_id = 565) and (type = 13) order by id desc limit 0,1

The statement looks simple, nothing special. Yet each execution took an astonishing 44 seconds.

That is shocking; "slow" no longer begins to describe it.

Next, we checked the table's status information.

The table is large: its estimated row count is 83,683,240, about 80 million (8000W), a true "table with tens of millions of rows".

That is the situation in a nutshell. Let's move on to troubleshooting.

Troubleshooting the cause of the problem

The first suspicion, naturally, is that the statement does not use an index at all. Check the indexes in the table's CREATE TABLE statement (DDL):

KEY `idx_1` (`city_id`, `type`, `rank`),
KEY `idx_log_dt_city_id_rank` (`log_dt`, `city_id`, `rank`),
KEY `idx_city_id_type` (`city_id`, `type`)

Please ignore the redundancy between idx_1 and idx_city_id_type; it is a legacy problem.

The table has the indexes idx_city_id_type and idx_1, and our query filters on city_id and type, so either index could be used.

But are city_id and type really the only columns in our query that matter? (Sharp-eyed readers may already have spotted the problem; let's keep going and leave it as food for thought.)
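Why both indexes qualify comes down to the leftmost-prefix rule for composite indexes: equality conditions can use an index's leading columns, up to the first column that has no condition. The following is a minimal Python sketch of that rule under a simplified model that looks only at equality predicates; the function `usable_prefix` and the dictionary of index definitions are hypothetical, built from the DDL above.

```python
def usable_prefix(index_cols, eq_cols):
    """Return the leading index columns that equality predicates can use
    (simplified leftmost-prefix rule)."""
    used = []
    for col in index_cols:
        if col not in eq_cols:
            break  # the usable prefix stops at the first column without a condition
        used.append(col)
    return used


# Index definitions copied from the DDL above.
indexes = {
    "idx_1": ["city_id", "type", "rank"],
    "idx_log_dt_city_id_rank": ["log_dt", "city_id", "rank"],
    "idx_city_id_type": ["city_id", "type"],
}
where_eq = {"city_id", "type"}  # WHERE city_id = 565 AND type = 13

for name, cols in indexes.items():
    print(name, usable_prefix(cols, where_eq))
```

In this model, idx_1 and idx_city_id_type both match on city_id and type, while idx_log_dt_city_id_rank cannot be used because its first column, log_dt, has no condition.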

Since the indexes exist, the next step is to check whether the statement actually uses one. MySQL provides EXPLAIN, which shows the execution plan the optimizer chooses for a statement; here we apply it to a SELECT.

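The most important EXPLAIN columns here are:

select_type: the query type, such as SIMPLE, UNION, or SUBQUERY

possible_keys: the indexes the optimizer considered

key: the index actually chosen

rows: the estimated number of rows to examine

Extra: additional information, such as Using filesort when an extra sort is needed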

For more details on EXPLAIN, see the article "MySQL performance optimization artifact: Explain usage analysis".

We run EXPLAIN on the statement:

Select * from sample_table where city_id = 565 and type = 13 order by id desc limit 0,1

The resulting execution plan:

Although possible_keys includes our indexes, the optimizer ends up choosing the primary key index. The table holds tens of millions of rows, and this query condition actually matches no rows at all, so MySQL spends a long time walking the primary key index looking for a match, which produces the slow query.

We can use FORCE INDEX (idx_city_id_type) to make the statement use the composite index we defined:

Select * from sample_table force index (idx_city_id_type) where (1 = 1) and (city_id = 565) and (type = 13) order by id desc limit 0,1

This time the statement is clearly fast. Analyzing it with EXPLAIN again:

The actual execution time is 0.00175714 s. With the composite index, it is no longer a slow query.

The cause is found. To sum up: with LIMIT 1, the MySQL optimizer expected the primary key index to find that single row faster. The composite index would need an extra sort, whereas the primary key index is naturally ordered by id, so the optimizer chose the primary key index. In reality, MySQL traversed all 80 million rows without finding a single qualifying row, wasting a great deal of time.

How MySQL chooses an index

The optimizer's criteria for choosing an index

In the execution flow of a MySQL statement, the index is selected by the query optimizer.

To explain, here is a passage from a reference:

The first thing to know is that it is the job of the MySQL optimizer to choose an index.

The optimizer's goal in choosing an index is to find the best execution plan and run the statement at the lowest cost. In a database, the number of rows scanned is one of the factors that determine execution cost: the fewer rows scanned, the fewer disk accesses and the less CPU consumed.

Of course, the number of rows scanned is not the only criterion; the optimizer also weighs factors such as whether a temporary table is needed and whether sorting is required.

To sum up, the optimizer's choice depends on many factors: the number of rows scanned, whether a temporary table is used, whether sorting is needed, and so on.

Let's look back at the two EXPLAIN results from earlier:

For the statement without a forced index, which uses the primary key index, the rows estimate is 1833. For the statement forced onto the composite index, it is 45640, and the Extra column shows Using filesort, meaning an additional sort is required. So without a forced index, the optimizer picks the primary key index: it appears to scan fewer rows and, being inherently ordered, needs no extra sort.
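The decision can be sketched with a deliberately crude cost model in Python. The constants and the `plan_cost` function are hypothetical and far simpler than MySQL's real cost model; the only real inputs are the two rows estimates from the EXPLAIN results above and the table's estimated row count. Any reasonable weighting picks the primary key here, because its estimate is already about 25 times smaller before sorting is even counted.

```python
import math

# Hypothetical constants for a toy cost model (not MySQL's real cost constants).
ROW_COST = 1.0          # assumed cost of examining one row
SORT_COST_FACTOR = 0.1  # assumed cost per comparison during a filesort


def plan_cost(est_rows, needs_filesort):
    """Toy cost: rows examined, plus an n*log2(n) sort term if filesort is needed."""
    cost = est_rows * ROW_COST
    if needs_filesort and est_rows > 1:
        cost += SORT_COST_FACTOR * est_rows * math.log2(est_rows)
    return cost


# Row estimates taken from the two EXPLAIN results discussed above.
plans = {
    "PRIMARY": plan_cost(1833, needs_filesort=False),
    "idx_city_id_type": plan_cost(45640, needs_filesort=True),
}
chosen = min(plans, key=plans.get)
print(chosen, plans)

# What the primary-key plan really cost in the incident: the whole table was read.
actual_pk_cost = plan_cost(83_683_240, needs_filesort=False)
print(actual_pk_cost > plans["idx_city_id_type"])
```

The model is only as good as its inputs: fed the number of rows the primary-key plan actually read, it ranks that plan far worse. The mistake lies in the estimate, not in the comparison.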

How is rows estimated?

Readers will ask why rows is only 1833 when MySQL actually scanned the entire primary key index, far more than a few thousand rows. The answer is that the rows column of EXPLAIN is an estimate, which MySQL derives from the query conditions, the index statistics, and the LIMIT.
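One plausible way to reproduce the 1833 is the following model: when the optimizer considers an index that is already in ORDER BY order, it assumes the matching rows are spread evenly through it, so to collect LIMIT matches it expects to read about LIMIT × (table rows / matching rows) entries. The Python sketch below is a simplification under that assumption, and `estimate_ordered_scan_rows` is a hypothetical name. Plugging in the table's estimated 83,683,240 rows and the composite index's 45,640 estimated matches gives exactly 1833.

```python
def estimate_ordered_scan_rows(limit, table_rows, matching_rows):
    """Rows expected to be read along an already-ordered index (e.g. the
    primary key for ORDER BY id) before `limit` matches are found.

    Simplifying assumption: matching rows are evenly spread through the index.
    """
    if matching_rows <= 0:
        # No matches expected at all: the scan would read the whole index.
        return table_rows
    return min(table_rows, int(limit * table_rows / matching_rows))


# Table row estimate from the table status, match estimate from EXPLAIN.
print(estimate_ordered_scan_rows(1, 83_683_240, 45_640))  # 1833
```

The even-spread assumption is exactly what failed in this incident: the statistics suggested about 45,640 matching rows, the real number was zero, so the primary-key scan had nothing to stop at and read the entire index.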

How does MySQL obtain the cardinality of an index, that is, the number of distinct values in it? Here is a brief introduction to how MySQL samples its statistics.

Why sample? Counting the whole table row by row would give an exact result, but at too high a cost, so MySQL uses sampled statistics instead.

When sampling, InnoDB by default selects N data pages, counts the distinct values on each of these pages, averages them, and multiplies the average by the number of pages in the index to get the index's cardinality.
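A direct Python translation of the sampling scheme just described may help. It follows the simplified description above rather than InnoDB's actual code, and the page layouts are made-up data. The second case shows one way the estimate goes wrong: values that repeat across pages are counted again on every page.

```python
import random


def sample_cardinality(pages, n_sample, seed=0):
    """Estimate an index's cardinality: average the number of distinct values
    on n_sample randomly chosen pages, then multiply by the total page count."""
    rng = random.Random(seed)
    chosen = rng.sample(pages, min(n_sample, len(pages)))
    avg_distinct = sum(len(set(page)) for page in chosen) / len(chosen)
    return round(avg_distinct * len(pages))


# Hypothetical index of 50 pages, 100 entries each.
# Case 1: every value is unique, so the true cardinality is 5000.
unique_pages = [list(range(p * 100, p * 100 + 100)) for p in range(50)]
# Case 2: the same 5 values repeat on every page, so the true cardinality is 5.
repeated_pages = [[i % 5 for i in range(100)] for _ in range(50)]

print(sample_cardinality(unique_pages, 20))    # 5000
print(sample_cardinality(repeated_pages, 20))  # 250
```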

Because the table is constantly updated, index statistics do not stay fixed. When the number of changed rows exceeds 1/M of the table, InnoDB automatically recalculates the index statistics.

MySQL offers two ways to store index statistics, selected with the parameter innodb_stats_persistent:

When it is ON, the statistics are persisted to disk. In this case the default N is 20 and M is 10.

When it is OFF, the statistics are kept only in memory. In this case the default N is 8 and M is 16.
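The two settings can be compared with a small Python sketch of the 1/M trigger rule. It ignores other settings that also affect automatic recalculation (for example innodb_stats_auto_recalc), and `needs_recalc` is a hypothetical helper. On a table of about 80 million rows, 1/M means the statistics are not refreshed until millions of rows have changed.

```python
# Defaults described above, keyed by the value of innodb_stats_persistent.
# N = pages sampled, M = recalculate once more than 1/M of the rows changed.
STATS_DEFAULTS = {
    "ON": {"n_sample_pages": 20, "m": 10},
    "OFF": {"n_sample_pages": 8, "m": 16},
}


def needs_recalc(changed_rows, total_rows, stats_persistent="ON"):
    """True once the changed rows exceed 1/M of the table."""
    m = STATS_DEFAULTS[stats_persistent]["m"]
    return changed_rows > total_rows / m


TOTAL = 83_683_240  # estimated rows of the table in this incident
print(needs_recalc(6_000_000, TOTAL, "ON"))   # False: threshold is about 8.37M rows
print(needs_recalc(6_000_000, TOTAL, "OFF"))  # True: threshold is about 5.23M rows
```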

Because the statistics are sampled, the cardinality can easily be inaccurate, whether N is 20 or 8.

The ANALYZE TABLE t command recalculates a table's index statistics. In our production environment, running it has to go through the DBA, so I did not experiment with it here; you can try it yourself.

Indexes should take ORDER BY columns into account

Why? If the table had a composite index on (city_id, type, id), the optimizer would use it, because within each city_id and type the index entries are already sorted by id.
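The rule behind this can be sketched in Python: once the equality conditions pin the leading columns, rows come out of the index sorted by the next declared column, so an ORDER BY on that column needs no filesort. `avoids_filesort` is a hypothetical helper, and the model looks only at the columns declared in each index definition.

```python
def avoids_filesort(index_cols, eq_cols, order_col):
    """True if the index returns rows already ordered by order_col once the
    equality-matched leading columns are fixed (declared columns only)."""
    i = 0
    while i < len(index_cols) and index_cols[i] in eq_cols:
        i += 1  # skip leading columns pinned by equality conditions
    return i < len(index_cols) and index_cols[i] == order_col


where_eq = {"city_id", "type"}
print(avoids_filesort(["city_id", "type"], where_eq, "id"))          # False
print(avoids_filesort(["city_id", "type", "rank"], where_eq, "id"))  # False
print(avoids_filesort(["city_id", "type", "id"], where_eq, "id"))    # True
```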

Can changing the LIMIT solve the problem?

Does adjusting the LIMIT change the estimated row count (rows), and in turn the optimizer's choice of index?

The answer is yes.

Let's run it with LIMIT 10:

Select * from sample_table where city_id = 565 and type = 13 order by id desc limit 0,10

The rows estimate rises to 18211, roughly ten times the previous value. What happens with LIMIT 100?

With LIMIT 100, the optimizer switched to the composite index. My guess is that the rows estimate grew by a similar factor again, so the optimizer gave up on the primary key index, preferring to use the composite index and sort rather than scan the primary key.
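The effect of LIMIT can be played through with a toy model in Python: assume the optimizer expects to read about LIMIT × (table rows / matching rows) primary-key entries, and compare that with the composite index's estimate. The decision rule in `toy_choice` is a deliberately crude assumption (it ignores sort cost), both function names are hypothetical, and this is not MySQL's actual cost formula. The inputs are the table's 83,683,240 estimated rows and the 45,640 estimated matches from EXPLAIN; the model gives 18,335 for LIMIT 10, close to the 18211 observed, and it agrees that the composite index wins at LIMIT 100.

```python
def estimate_ordered_scan_rows(limit, table_rows, matching_rows):
    """Expected rows read along the primary key before `limit` matches,
    assuming matches are evenly spread through it."""
    return min(table_rows, int(limit * table_rows / matching_rows))


def toy_choice(limit, table_rows=83_683_240, matching_rows=45_640):
    """Crude rule: keep the ordered primary-key scan while its row estimate is
    below the composite index's estimate; sort cost is ignored."""
    pk_rows = estimate_ordered_scan_rows(limit, table_rows, matching_rows)
    return "PRIMARY" if pk_rows < matching_rows else "idx_city_id_type"


for limit in (1, 10, 100, 1000):
    print(limit, toy_choice(limit))
```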

Why did the slow queries appear so suddenly?

Q: This query has been running stably online for a long time. Why did it suddenly turn into a slow query this time?

A: Previously, the query conditions always matched some rows, so with LIMIT 1 the scan found one quickly and returned. This time, the query conditions in the code matched nothing, so MySQL scanned the entire primary key index.
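The difference between the two situations can be simulated in Python on a small synthetic table. The data and the helper `scan_pk_desc` are hypothetical: 100,000 rows where city_id = id % 1000 and type = id % 20. Because 1000 is a multiple of 20, every row with city_id = 565 has type = 5, so the condition type = 13 matches nothing, just as in the incident.

```python
def scan_pk_desc(rows, city_id, type_, limit=1):
    """Mimic ORDER BY id DESC LIMIT n over the primary key: walk rows from the
    highest id down, filter each one, stop after `limit` matches.
    Returns (matches, rows_examined)."""
    matches, examined = [], 0
    for row in sorted(rows, key=lambda r: r["id"], reverse=True):
        examined += 1
        if row["city_id"] == city_id and row["type"] == type_:
            matches.append(row)
            if len(matches) == limit:
                break
    return matches, examined


# Hypothetical scaled-down table.
table = [{"id": i, "city_id": i % 1000, "type": i % 20} for i in range(100_000)]

found, examined = scan_pk_desc(table, 565, 5)
print(found[0]["id"], examined)  # 99565 435: a match exists, the scan stops early

found, examined = scan_pk_desc(table, 565, 13)
print(found, examined)  # [] 100000: no match, every row is read
```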

Solution

Now that we know why MySQL chose this index, we can derive solutions from the analysis above.

There are two main directions:

Force a specific index

Influence the optimizer's choice

Forcing an index: FORCE INDEX

As at the beginning, we can use FORCE INDEX directly to make the statement use the index we want.

Select * from sample_table force index (idx_city_id_type) where (1 = 1) and (city_id = 565) and (type = 13) order by id desc limit 0,1

The advantage is that it takes effect quickly and solves the problem immediately.

The disadvantages are also obvious:

High coupling: a hint hard-coded into the statement makes the code hard to maintain. If the index is renamed or dropped, the code has to be changed again and again.

Many codebases build SQL through a framework, where adding FORCE INDEX () is not easy.

Let's take a different approach and guide the optimizer into choosing the composite index.

Influencing the optimizer: increase the LIMIT

Increasing the LIMIT quickly raises the estimated number of rows to scan, for example to LIMIT 0,1000:

SELECT * FROM sample_table where city_id = 565 and type = 13 order by id desc LIMIT 0,1000

This makes the optimizer use the composite index and then sort, but inflating the LIMIT this way always feels like black-box parameter tuning. Is there a more elegant solution?

Influencing the optimizer: add a composite index that includes the ORDER BY id column

This slow query uses ORDER BY id, but id is not part of our composite index, so the optimizer concludes that it would have to sort after using the composite index and is reluctant to use it.

We can solve this by creating a new composite index on (city_id, type, id).

This approach has drawbacks too. With 80 million rows, building an index takes a long time, and an index on this table is typically about 3.4 GB. Solving every problem by adding indexes can create new problems; a table should not have too many indexes.

Influencing the optimizer: rewrite as a subquery

Is there anything else we can do? We can use a subquery: the subquery uses the composite index on city_id and type to get the result set, and the outer query takes the first row with LIMIT 1.

However, subqueries carry risks, and some DBA guidelines advise against them, recommending that complex queries be handled in application logic instead. Of course, our statement is not complex.

Select * From sample_table Where id in (Select id From `newhome_db`.`af_hot_price_region` where (city_id = 565 and type = 13)) limit 0,1

There are many other solutions.

SQL optimization is a big topic, and there are plenty of other ways to solve this slow query, which we won't go through one by one here. I'll leave them for you to think about.
