In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-03 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article introduces how to compare and analyze IN and Exists in MySQL query sentences. The content is very detailed. Interested friends can use it for reference. I hope it will be helpful to you.
Background introduction
Recently, when writing SQL statements, I was hesitant to choose IN or Exists, so I wrote out the SQL of both methods to compare the execution efficiency. I found that the query efficiency of IN was much higher than that of Exists, so I took it for granted that IN was more efficient than Exists, but in line with the principle of getting to the bottom of the matter, I wanted to know whether this conclusion was applicable to all scenarios and why this result occurred.
Check the relevant information on the Internet, generally can be summarized as: external table is small, internal table is large, suitable for Exists; external table is large, internal table is small, applicable IN. Then I am confused, because inside my SQL statement, there is only 1W-level data on the outside and 30W-level data on the inner table. According to the Internet, Exists will be more efficient than IN, but my result is just the opposite!
"without investigation, there is no right to speak"! So I began to study the actual implementation process of IN and Exists, from a practical point of view, to find the fundamental reasons, so I have this blog post to share.
Experimental data
My experimental data consists of two tables: the t _ author table and the t _ poetry table.
Amount of data in the corresponding table:
T_author table, 13355 records
T_poetry table, 289917 records.
The corresponding table structure is as follows:
CREATE TABLE t_poetry (
Id bigint (20) NOT NULL AUTO_INCREMENT
Poetry_id bigint (20) NOT NULL COMMENT 'poem id'
Poetry_name varchar (200) NOT NULL COMMENT 'Poetry name'
Author_id bigint (20) NOT NULL COMMENT 'author id'
PRIMARY KEY (id)
UNIQUE KEY pid_idx (poetry_id) USING BTREE
KEY aid_idx (author_id) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=291270 DEFAULT CHARSET=utf8mb4
CREATE TABLE t_author (
Id int (15) NOT NULL AUTO_INCREMENT
Author_id bigint (20) NOT NULL
Author_name varchar (32) NOT NULL
Dynasty varchar (16) NOT NULL
Poetry_num int (8) NOT NULL DEFAULT'0'
PRIMARY KEY (id)
UNIQUE KEY authorid_idx (author_id) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=13339 DEFAULT CHARSET=utf8mb4
Execution Plan Analysis of IN execution process
Sql example: select * from tabA where tabA.x in (select x from tabB where y > 0)
Its implementation plan:
(1) execute the subquery of the tabB table to get the result set B, and you can use the index y of the tabB table.
(2) execute the query of tabA table on the condition that tabA.x is in the result set B and the index x of tabA table can be used.
Exists execution process
Sql example: select from tabA where exists (select from tabB where y > 0)
Its implementation plan:
(1) fetch all the records of tabA table first.
(2) for the records of the tabA table row by row, disassociate the tabB table and determine whether the subquery of the tabB table returns data. The version after 5.5 uses Block Nested Loop (Block nesting loop).
(3) if the subquery has returned data, the current tabA record is returned to the result set.
TabA is equivalent to traversing all table data, and tabB can use indexes.
Experimental process
The experiment analyzes the SQL statements of IN and Exists with the same result set.
The SQL statement that contains IN:
Select from t_author ta where author_id in
(select author_id from t_poetry tp where tp.poetry_id > 3650)
The SQL statement that contains Exists:
Select from t_author ta where exists
(select * from t_poetry tp where tp.poetry_id > 3650 and tp.author_id=ta.author_id)
Data of the first experiment
T_author table, 13355 records; t_poetry table, subquery filter result set where poetry_id > 293650 records
Execution result
It takes 0.94S to use exists and 0.03S to use in. The efficiency of IN is higher than that of Exists.
Cause analysis
The subquery result set of the t_poetry table is very small, and both of them can use indexes in the t_poetry table, and the consumption of the t _ poetry subquery is basically the same. The difference is that when using in, the t_author table can use indexes:
When using exists, the t_author table scans the entire table:
When the subquery result set is small, the query time is mainly manifested in traversing the t_author table.
Data of the second experiment
T_author table, 13355 records; t_poetry table, subquery filter result set where poetry_id > 3650, 287838 records
Execution time
It takes 0.12s to use exists and 0.48S to use in.
Cause analysis
The index usage of the two is the same as that of the first experiment, the only difference is that the size of the subquery filtering result set is different, but the experimental results are different from the first experiment. In this case, the subquery result set is very large, so let's take a look at mysql's query plan:
When using in, because the result set of the subquery is very large, both the t_author table and the t _ author table are close to a full table scan, so the effect of the traversal time difference on the overall efficiency of the t_author table can be ignored, and there is one more row in the execution plan. In the case of close to the full table scan, the mysql optimizer chooses auto_key to traverse the t _ author table:
When using exists, the change in the amount of data does not change the execution plan, but due to the large subquery result set, MySQL versions after 5.5 use Block Nested-Loop (Block nested loop, join buffer, similar to caching function) to match query results, especially when the subquery result set is large. Query matching efficiency can be significantly improved:
According to the above two experiments and experimental results, we can clearly understand the implementation process of IN and Exists, and sum up the applicable scenarios of IN and Exists.
IN queries can use indexes on both internal and external tables
Exists queries can only use indexes on internal tables
When the result set of the subquery is large and the external table is small, the role of Block Nested Loop (Block nested Loop) of Exists begins to appear, and makes up for the defect that the external table can not use the index, and the query efficiency will be better than IN.
When the result set of subquery is small and the external table is very large, the optimization effect of Block nested loop of Exists is not obvious, and the advantage of external index of IN plays a major role, so the query efficiency of IN is better than that of Exists.
The statement on the Internet is not accurate. In fact, the "size of the table" does not look at the internal and external tables, but the external tables and subquery result sets.
Finally, and the most important point: there is no absolute truth in the world, to grasp the nature of things, for different scenarios for practical verification is the most reliable and effective method.
Supplement to the problems found in the process of experiment
It is strange to find that the larger the dataset, the less time is consumed when we analyze the above exists statements in different datasets.
The specific query conditions are:
Where tp.poetry_id > 3650, time consuming 0.13s
Where tp.poetry_id > 293650, time-consuming 0.46s
Possible reason: the larger the condition value, the lower the query, the more records to traverse, resulting in the final consumption of more time. This explanation needs to be further verified and then added.
On the MySQL query sentence IN and Exists how to share the comparative analysis here, I hope the above content can be of some help to you, can learn more knowledge. If you think the article is good, you can share it for more people to see.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.