What is the principle of MySQL subquery? 07/06 Update SLTechnology News&Howtos

What is the principle of MySQL subquery?

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/01 Report--

This article mainly introduces "what is the principle of MySQL sub-query". In the daily operation, I believe that many people have doubts about the principle of MySQL sub-query. The editor consulted all kinds of materials and sorted out simple and easy-to-use operation methods. I hope it will be helpful to answer the doubts about "what is the principle of MySQL sub-query?" Next, please follow the editor to study!

01 preface

The popular explanation of sub-query is that there is another query statement nested in the query statement. I believe that students who come into contact with MySQL in their daily work have understood or used sub-queries, but how is it implemented? How efficient is the query? I'm afraid many people are not clear about these. Let's explore these two issues together.

02 prepare content

Here we need to use three tables, all of which have a primary key index id and an index a, and no index on field b. The stored procedure idata () inserts 100 rows of data into table T1 and 1000 rows into tables T2 and T3. The statement of the table is as follows:

CREATE TABLE `t1` (`id` INT (11) NOT NULL, `t1a` INT (11) DEFAULT NULL, `t1b` INT (11) DEFAULT NULL,PRIMARY KEY (`id`), KEY `idx_ a` (`t1a`) ENGINE = INNODB;CREATE TABLE `t2` (`id` INT (11) NOT NULL, `t2a` INT (11) DEFAULT NULL, `t2b` INT (11) DEFAULT NULL,PRIMARY KEY (`id`), KEY `idx_ a` (`t2a`) ENGINE = INNODB CREATE TABLE `t3` (`id` INT (11) NOT NULL, `t3a` INT (11) DEFAULT NULL, `t3b` INT (11) DEFAULT NULL,PRIMARY KEY (`id`), KEY `idx_ a` (`t3a`) ENGINE = INNODB;-- add 100 pieces of data to T1-- drop procedure idata;delimiter;; create procedure idata () begin declare i int; set iDiver1; while (i explain select * from T1 where T1 limit a = (select t2rooma from T2 limit 1) +-+ | id | select_type | table | type | possible_keys | key | key_len | | ref | rows | Extra | +-+ | 1 | PRIMARY | T1 | ref | | | idx_a | idx_a | 5 | const | 1 | Using where | | 2 | SUBQUERY | T2 | index | | idx_a | 5 | 1000 | Using index | +-+ -+

How it is executed:

First execute the subquery (select T2 query a from T2 limit 1) separately.

Then the outer query select * from T1 where T1 query is executed as the parameter of the outer query.

In other words, for queries that contain unrelated scalar quantum queries or row subqueries, MySQL executes outer queries and subqueries independently, just as two single-table queries.

4.1.2 related subqueries

For example, the following query:

Mysql root@localhost:test > explain select * from T1 where T1 where t1.t1_b=t2.t2_b limit = (select T2 rooma from T2 where t1.t1_b=t2.t2_b limit 1) +-- + | id | select_type | table | type | Possible_keys | key | key_len | ref | rows | Extra | +- -+ | 1 | PRIMARY | T1 | ALL | Using where | 2 | DEPENDENT SUBQUERY | T2 | ALL | 1000 | Using where | + -+

This is how it works:

First get a record from the outer query, in this case, from the T1 table.

Then find the value involved in the subquery from the record obtained in the previous step, that is, find the value of the t1.t1_b column in the T1 table, and then execute the subquery.

Finally, the condition of the outer query WHERE clause is tested according to the query results of the subquery. If so, the record of the outer query is added to the result set, otherwise it will be discarded.

Then repeat the above steps until all the records in T1 are matched.

4.2 in subquery 4.2.1 materialization

If the number of records in the result set of a subquery is very small, it is quite efficient to treat the subquery and the outer query as two separate single-table queries, but if there are too many result sets after executing the subquery separately, this will lead to these problems:

There are so many result sets that it may not fit in memory ~

For outer queries, if the subquery has too many result sets, it means that there are a lot of parameters in the IN clause, which leads to:

1) the index cannot be used effectively, and the outer query can only be scanned by a full table.

2) when performing a full table scan on an outer query, because there are too many parameters in the IN clause, it takes too long to test whether a record matches the parameters in the IN clause.

So instead of directly taking the result set of an unrelated subquery as a parameter of the outer query, the result set is written into a temporary table. The process of writing a temporary table is as follows:

The columns of the temporary table are the columns in the subquery result set.

Records written to the temporary table are deduplicated, making the temporary table smaller and more space-efficient.

In general, when the result set of a subquery is small, a memory-based temporary table using the Memory storage engine is created for it, and a hash index is established for the table.

If the result set of a subquery is very large, exceeding the system variable tmp_table_size or max_heap_table_size, the temporary table will instead use a disk-based storage engine to save the records in the result set, and the index type will be changed to a Btree index.

This process of saving records in a subquery result set to a temporary table is called Materialize. For convenience, we call the temporary table that stores the result set of the subquery a materialized table. Because all the records in the materialized table are indexed (the memory-based materialized table has a hash index and the disk-based one has a B+ tree index), it is very fast to execute IN statements to judge that a certain Operand is not in the subquery result set, thus improving the performance of the subquery statement.

Mysql root@localhost:test > explain select * from T3 where T3 in (select T2 from T2) +-+ | id | | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + +-+ | 1 | SIMPLE | T3 | ALL | idx_a | | 1000 | Using where | | 1 | SIMPLE | | eq_ref | | 5 | test.t3.t3_a | 1 | 2 | MATERIALIZED | T2 | | | index | idx_a | idx_a | 5 | | 1000 | Using index | +-- +-- | -+

In fact, the above query is equivalent to the inner join of table T3 and subquery materialized table:

Mysql root@localhost:test > explain select * from T3 left join T2 on t3.t3_a=t2.t2_a +-+ | id | select_type | table | type | possible_keys | key | key_ Len | ref | rows | Extra | +-- + | 1 | SIMPLE | T3 | | ALL | 1000 | 1 | SIMPLE | T2 | ref | idx_a | idx_a | 5 | test.t3.t3_a | 1 | | +-+ -+

At this point, the MySQL query optimizer uses operations to select a lower-cost solution to execute the query.

Although the in subquery is transformed into a join query by materializing the table above, there is still the cost of establishing a temporary table. Can the subquery be directly converted into a join without materialization? It's definitely not possible to switch directly.

Here we first construct three records, which are also common indexes that are not unique

+-+ | id | T2 rooma | T2 roomb | +-+ | 1100 | 1000 | 1000 | 1101 | 1000 | 1000 | 1102 | 1000 | 1000 | +-add restrictions where t2.id > = 1100 to reduce the mysql root@localhost of data to be displayed : test > select * from T3 where T3 in (select T2 from T2 where t2.id > = 1100) +-+ | id | T3 rooma | T3 roomb | +-+ | 1000 | 1000 | +-+ 1 row in setTime: 0.016smysql root@localhost:test > select * from T3 left join T2 on t3.t3_a=t2.t2_a where t2.id > = 1100 +-+-+ | id | T3 rooma | T3 roomb | id | T2 rooma | T2 roomb | +-+-+ | 1000 | 1100 | 1000 | 1000 | 101 | 1000 | 1101 | 1000 | 1000 | 1101 | 1000 | 1102 | 1000 | 1000 | +-+-+ 3 rows in setTime: 0.018s

So IN subqueries and table joins are not exactly equivalent. What we need is another kind of join called semi-join: for a record in the T3 table, we only care about whether there are matching records in the T2 table, not how many records match it, and only the records of the T3 table are retained in the final result set.

Note: semi-join is only a way to execute subqueries within MySQL, and MySQL does not provide a user-oriented semi-join syntax.

4.2.2 implementation of semi-join:

Table pullout (table pull-up in subquery)

When there is only a primary key or unique index column in the query list of a subquery, you can directly pull up the table in the subquery to the FROM clause of the outer query, and merge the search conditions in the subquery into the search conditions of the outer query, such as this:

Mysql root@localhost:test > select * from T3 where T3 rooma in (select T2 from T2 where t2.id=999) +-+ | id | T3 rooma | T3 roomb | +-+ | 102 | 999 | 102 | +-1 row in setTime: 0.024smysql root@localhost:test > select * from T3 Join t2 on t3.t3_a=t2.t2_a where t2.id=999 +-+-+ | id | t3rooma | t3cubb | id | T2 rooma | T2 roomb | +-+-+ | 102 | 999 | 999 | 999 | +-- -+ 1 row in setTime: 0.028smysql root@localhost:test > explain select * from T3 where T3 rooma in (select T2 rooma from T2 where t2.id=999) +-+- -+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +- -- +-+ | 1 | SIMPLE | T2 | const | PRIMARY Idx_a | PRIMARY | 4 | const | 1 | | 1 | SIMPLE | T3 | ref | idx_a | idx_a | 5 | const | 1 | | + -- +

FirstMatch execution strategy (first match)

FirstMatch is the most primitive semi-join execution method, in the same way as related subqueries, that is, first take the records in an outer query, and then go to the table of the subquery to find records that match the criteria. If one can be found, the records of the outer query will be put into the final result set and stop looking for more matching records. If not, the records of the outer query will be discarded. Then start fetching the record from an outer query and repeat the above process.

Mysql root@localhost:test > explain select * from T3 where T3 in (select T2 from T2 where t2.t2_a=1000) +-+- -+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + +-+ | 1 | SIMPLE | T3 | ref | idx_a | idx_a | 5 | const | 1 | | 1 | SIMPLE | T2 | ref | idx_a | Idx_a | 5 | const | 4 | Using index FirstMatch (T3) | +-+

DuplicateWeedout execution strategy (duplicate value elimination)

After conversion to a semi-join query, a record in the T3 table may have multiple matching records in the T2 table, so the record may be added to the final result set many times. In order to eliminate duplication, we can create a temporary table and set the primary key id. Whenever a record in a T3 table is to be added to the result set, the id value of this record is first added to the temporary table. It shows that the record in the previous T2 table has not been added to the final result set, which is a required result. If the addition fails, the record in the previous S1 table has been added to the final result set and is discarded directly.

LooseScan execution strategy (loose scan)

Although this is a scan index, but only the first entry of the same value of the record to do the matching operation is called loose scanning.

4.2.3 applicable conditions for semi-connection

Of course, not all query statements that contain in subqueries can be converted to semi-join, and only queries like this can be converted to semi-join:

SELECT... FROM outer_tables WHERE expr IN (SELECT... FROM inner_tables.) AND.-- or in this form: SELECT. FROM outer_tables WHERE (oe1, oe2,...) IN (SELECT ie1, ie2,... FROM inner_tables.) AND...

To sum up in words, only subqueries that meet the following criteria can be converted to semi-join:

The subquery must be a Boolean expression consisting of an IN statement and appear in the WHERE or ON clause of the outer query

The outer query can also have other search criteria, except that the search criteria for the IN subquery must be joined using AND

The subquery must be a single query and cannot be in the form of several queries joined by UNION

The subquery cannot contain GROUP BY or HAVING statements or aggregate functions

4.2.4 convert to EXISTS subquery

Regardless of whether the subquery is relevant or irrelevant, you can try to turn the IN subquery into an exists subquery. In fact, any IN subquery can be converted to an EXISTS subquery. The general examples are as follows:

Outer_expr IN (SELECT inner_expr FROM... WHERE subquery_where)-can be converted to: EXISTS (SELECT inner_expr FROM... WHERE subquery_where AND outer_expr=inner_expr)

Of course, there are some special cases in this process, such as when the value of outer_expr or inner_expr is NULL. Because expressions with NULL values as operands tend to result in NULL, for example:

Mysql root@localhost:test > SELECT NULL IN (1,2,3); +-+ | NULL IN (1,2,3) | +-+ | | +-+ 1 row in set

The result of the EXISTS subquery must be TRUE or FASLE. But in reality, most of the scenarios where we use IN subqueries put it in the WHERE or ON clause, while the WHERE or ON clause does not distinguish between NULL and FALSE, for example:

Mysql root@localhost:test > SELECT 1 FROM S1 WHERE NULL;+---+ | 1 | +-- + 0 rows in setTime: 0.016smysql root@localhost:test > SELECT 1 FROM S1 WHERE FALSE;+---+ | 1 | +-+ 0 rows in setTime: 0.033s

So as long as our in subquery is placed in the WHERE or ON clause, the conversion of IN-> EXISTS is fine. After all that has been said, why change it? This is because an index may not be needed without conversion, such as the following query:

Mysql root@localhost:test > explain select * from T3 where T3 rooma in (select T2 from T2 where t2.t2_a > = 999) or T3 roomb > 1000 +-+-+ | id | select_type | Table | type | possible_keys | key | key_len | ref | rows | Extra | +- -+ | 1 | PRIMARY | T3 | ALL | 1000 | Using where | | 2 | SUBQUERY | T2 | range | idx_a | idx_a | 5 | 1000 | Using where Using index | +-

However, after converting it to an EXISTS subquery, you can use the index:

Mysql root@localhost:test > explain select * from T3 where exists (select 1 from T2 where t2.t2_a > = 999 and t2.t2_a=t3.t3_a) or T3 roomb > 1000 +- -+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + -+ | 1 | PRIMARY | T3 | ALL | 1000 | Using where | 2 | DEPENDENT SUBQUERY | T2 | ref | idx_a | idx_a | 5 | test.t3.t3_a | 1 | Using where Using index | +-

It is important to note that if the IN subquery does not meet the conditions for conversion to semi-join, cannot be converted to a materialized table, or is too expensive to convert to a materialized table, then it will be converted to an EXISTS query. Or if the cost of converting to a materialized table is too high, it will be converted to an EXISTS query.

At this point, the study on "what is the principle of MySQL subquery" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.