Case Analysis of MySQL not exists Indexes 04/26 Update SLTechnology News&Howtos

Case Analysis of MySQL not exists Indexes

2025-04-26 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/02 Report--

This article introduces you the case analysis of MySQL not exists index, the content is very detailed, interested friends can refer to, hope to be helpful to you.

In some business scenarios, NOT EXISTS statements are used to ensure that the returned data does not exist in a specific collection. Some colleagues will find that the performance of NOT EXISTS is poor in some scenarios, and there is even a rumor on the Internet that "NOT EXISTS does not walk the index". How can we optimize the NOT EXISTS statement?

Take today's optimized SQL as an example, the SQL before optimization is:

SELECT count (1) FROM t_monitor mWHERE NOT exists (SELECT 1 FROM t_alarm_realtime AS a WHERE a.resource_id=m.resource_id AND a.resource_type=m.resource_type AND a.monitor_name=m.monitor_name)

We use LEFT JOIN for optimization, and the optimized SQL is as follows:

SELECT count (1) FROM t_monitor mLEFT JOIN t_alarm_realtime AS an ON a.resource_id=m.resource_id AND a.resource_type=m.resource_type AND a.monitor_name=m.monitor_nameWHERE a.resource_id is NULL

Optimization effect:

The execution time is more than 29 seconds before optimization and 1.2 seconds after optimization, and the optimization is increased by 25 times.

Is it true that NOT EXISTS does not take the index?

Check out the execution plans for both SQL!

Execution plan using the NOT EXIST approach:

Execution plan using the LEFT JOIN approach:

In terms of the execution plan, both tables use indexes, with the difference being that NOT EXISTS uses the "DEPENDENT SUBQUERY" approach, while LEFT JOIN uses a normal table association.

It is recommended to take a look at this: why can indexes improve query speed?

View the execution process of the two modes through the Profiling method provided by MySQL.

The execution process using NOT EXIST:

The execution process using LEFT JOIN:

From the point of view of the execution process, the LEFT JOIN mode is mainly consumed in Sending data (1.2s), while the NOT EXISTS mode is mainly consumed in executeing and Sending data, which is limited by the fact that Profiling only stores 100 rows of records.

From the Profiling, you can only see 47 "executeing and Sending data" combination items (each combination is about 50us). Through the execution plan, we can see that the amount of data of the exterior t_monitor is 578436 rows. Ignoring the inaccuracy of statistical information, using the NOT EXISTS method should produce 578436 "executeing and Sending data" combination items, with a total consumption time of 50 μ s * 578436=28921800us=28.92s.

It can be inferred from the above execution process:

The performance of execution using NOT EXISTS heavily depends on the number of times the NOT exists subquery is executed, that is, the amount of data in the outer query result set.

When the amount of data N of the outer query result set is small, the execution performance is better. If the execution time of Niss10 is 50 μ s * 10=500us=0.005s, coupled with some extra consumption, the execution result can also be in the range of 0.01s or 10ms. This response time should be accepted by most applications.

When the amount of data N of the outer program result set is large or even tens of millions of data, the query performance of NOT EXISTS will become very poor, and even consume a lot of server IO and CPU resources, thus affecting the normal operation of other businesses.

In addition to the above problems, in the process of optimization, it is found that the resource_id columns that should store the same data are defined differently in two tables, one table is VARCHAR and the other table is BIGINT. The difference between the field types of the external result set and the field types in the NOT EXIST word table leads to the inability to use indexes in the NOT exists subquery, resulting in poor performance of the subquery and ultimately affecting the execution performance of the whole query.

There have been a large number of similar cases in JD.com Mall. Some tables use VARCHAR to store the order number, while others use BIGINT to store. The performance of the two tables is very poor when managing the two tables, which I hope colleagues in R & D will take as a warning. Follow the official account Java technology stack and reply to M36 to get a copy of the MySQL R & D military rules.

The case analysis of MySQL not exists indexing is shared here. I hope the above content can be helpful to everyone and learn more knowledge. If you think the article is good, you can share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.