In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
Let's talk about whether to choose IN or Exists when querying MySQL statements. The secret of the text lies in being close to the topic. So, no gossip, let's just look at the following, I believe that after reading the query MySQL sentence to choose IN or Exists this article will certainly benefit you.
Experimental data
My experimental data consists of two tables: the t _ author table and the t _ poetry table.
Amount of data in the corresponding table:
T_author table, 13355 records
T_poetry table, 289917 records.
The corresponding table structure is as follows:
CREATE TABLE t_poetry (
Id bigint (20) NOT NULL AUTO_INCREMENT
Poetry_id bigint (20) NOT NULL COMMENT 'poem id'
Poetry_name varchar (200) NOT NULL COMMENT 'Poetry name'
Author_id bigint (20) NOT NULL COMMENT 'author id'
PRIMARY KEY (id)
UNIQUE KEY pid_idx (poetry_id) USING BTREE
KEY aid_idx (author_id) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=291270 DEFAULT CHARSET=utf8mb4
CREATE TABLE t_author (
Id int (15) NOT NULL AUTO_INCREMENT
Author_id bigint (20) NOT NULL
Author_name varchar (32) NOT NULL
Dynasty varchar (16) NOT NULL
Poetry_num int (8) NOT NULL DEFAULT'0'
PRIMARY KEY (id)
UNIQUE KEY authorid_idx (author_id) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=13339 DEFAULT CHARSET=utf8mb4
Execution Plan Analysis of IN execution process
Sql example: select * from tabA where tabA.x in (select x from tabB where y > 0)
Its implementation plan:
(1) execute the subquery of the tabB table to get the result set B, and you can use the index y of the tabB table.
(2) execute the query of tabA table on the condition that tabA.x is in the result set B and the index x of tabA table can be used.
Exists execution process
Sql example: select from tabA where exists (select from tabB where y > 0)
Its implementation plan:
(1) fetch all the records of tabA table first.
(2) for the records of the tabA table row by row, disassociate the tabB table and determine whether the subquery of the tabB table returns data. The version after 5.5 uses Block Nested Loop (Block nesting loop).
(3) if the subquery has returned data, the current tabA record is returned to the result set.
TabA is equivalent to traversing all table data, and tabB can use indexes.
Experimental process
The experiment analyzes the SQL statements of IN and Exists with the same result set.
The SQL statement that contains IN:
Select from t_author ta where author_id in
(select author_id from t_poetry tp where tp.poetry_id > 3650)
The SQL statement that contains Exists:
Select from t_author ta where exists
(select * from t_poetry tp where tp.poetry_id > 3650 and tp.author_id=ta.author_id)
Data of the first experiment
T_author table, 13355 records; t_poetry table, subquery filter result set where poetry_id > 293650 records
Execution result
It takes 0.94S to use exists and 0.03S to use in. The efficiency of IN is higher than that of Exists.
Cause analysis
The subquery result set of the t_poetry table is very small, and both of them can use indexes in the t_poetry table, and the consumption of the t _ poetry subquery is basically the same. The difference is that when using in, the t_author table can use indexes:
When using exists, the t_author table scans the entire table:
When the subquery result set is small, the query time is mainly manifested in traversing the t_author table.
Data of the second experiment
T_author table, 13355 records; t_poetry table, subquery filter result set where poetry_id > 3650, 287838 records
Execution time
It takes 0.12s to use exists and 0.48S to use in.
Cause analysis
The index usage of the two is the same as that of the first experiment, the only difference is that the size of the subquery filtering result set is different, but the experimental results are different from the first experiment. In this case, the subquery result set is very large, so let's take a look at mysql's query plan:
When using in, because the result set of the subquery is very large, both the t_author table and the t _ author table are close to a full table scan, so the effect of the traversal time difference on the overall efficiency of the t_author table can be ignored, and there is one more row in the execution plan. In the case of close to the full table scan, the mysql optimizer chooses auto_key to traverse the t _ author table:
When using exists, the change in the amount of data does not change the execution plan, but due to the large subquery result set, MySQL versions after 5.5 use Block Nested-Loop (Block nested loop, join buffer, similar to caching function) to match query results, especially when the subquery result set is large. Query matching efficiency can be significantly improved:
Experimental conclusion
According to the above two experiments and experimental results, we can clearly understand the implementation process of IN and Exists, and sum up the applicable scenarios of IN and Exists.
IN query can use index on both internal table and external table; Exists query can only use index on internal table; when the result set of subquery is large, but the external table is small, the function of Block Nested Loop (Block nested loop) of Exists begins to appear, and make up for the defect that the external table can not use index, the query efficiency will be better than IN. When the result set of subquery is small and the external table is very large, the optimization effect of Block nested loop of Exists is not obvious, and the advantage of external index of IN plays a major role, so the query efficiency of IN is better than that of Exists. The statement on the Internet is not accurate. In fact, the "size of the table" does not look at the internal and external tables, but the external tables and subquery result sets. Finally, and the most important point: there is no absolute truth in the world, to grasp the nature of things, for different scenarios for practical verification is the most reliable and effective method. Supplement to the problems found in the process of experiment
It is strange to find that the larger the dataset, the less time is consumed when we analyze the above exists statements in different datasets.
The specific query conditions are:
Where tp.poetry_id > 3650, time consuming 0.13s
Where tp.poetry_id > 293650, time-consuming 0.46s
Possible reason: the larger the condition value, the lower the query, the more records to traverse, resulting in the final consumption of more time. This explanation needs to be further verified and then added.
For the above query MySQL statement to choose IN or Exists-related content, is there anything you don't understand? Or if you want to know more about it, you can continue to follow our industry information section.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.