How to analyze the execution efficiency of exists and in in Oracle 07/06 Update SLTechnology News&Howtos

How to analyze the execution efficiency of exists and in in Oracle

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

How to analyze the implementation efficiency of exists and in in Oracle, many novices are not very clear about this. In order to help you solve this problem, the following editor will explain it in detail. People with this need can come and learn. I hope you can get something.

In refers to the outer and inner tables as hash join, while exists refers to the external tables as loop. Each time loop queries the inner table. It is generally believed that exists is more efficient than in statements, but this statement is actually not accurate, and it is necessary to distinguish between environments.

Exists uses loop to query the appearance one by one, and each query will check the conditional statement of exists. When the conditional statement in exists can return record rows (no matter how many record rows are, as long as it can), the condition is true and the record to which the current loop goes is returned. On the contrary, if the conditional statement in exists cannot return record rows, the record to which the current loop arrives is discarded, and the condition of exists is like a bool condition. True when the result set can be returned, false if the result set cannot be returned.

For example:

Select * from user where exists (select 1)

The records of the user table are taken out one by one. Since select 1 in the subcondition can always return record rows, then all records of the user table will be added to the result set, so it is the same as select * from user;.

Again as follows

Select * from user where exists (select * from user where userId = 0)

You can know that when loop the user table, check the conditional statement (select * from user where userId = 0). Because the userId is never 0, the conditional statement always returns an empty set, and the condition is always false, then all records of the user table will be discarded.

Not exists is the opposite of exists, that is, when the exists condition has a result set returned, the record of loop will be discarded, otherwise the record of loop will be added to the result set.

In general, if table A has n records, then the exists query is to take out the n records one by one, and then judge the n-pass exists condition

The in query is equivalent to the superposition of multiple or conditions, which is easy to understand, such as the following query

Select * from user where userId in (1,2,3)

Equivalent to

Select * from user where userId = 1 or userId = 2 or userId = 3

Not in is the opposite of in, as follows

Select * from user where userId not in (1,2,3)

Equivalent to

Select * from user where userId! = 1 and userId! = 2 and userId! = 3

Generally speaking, the in query is to first find out all the records of the subquery condition, assuming that the result set is B, with a total of m records, and then decompose the result set of the subquery condition into m, and then make m queries.

It is worth mentioning that the subcondition of an in query must return a result with only one field, such as

Select * from user where userId in (select id from B)

And can't be

Select * from user where userId in (select id, age from B)

Exists does not have this restriction.

Consider the performance of exists and in:

For the above two cases, in traverses the comparison in memory, while exists needs to query the database, so when the amount of B-table data is large, exists is more efficient than in.

Consider the following SQL statement

Select * from A where exists (select * from B where B.id = A.id)

Select * from A where A.id in (select id from B)

1. Select * from A where exists (select * from B where B.id = A.id)

Exists () executes A.length times, it does not cache the exists () result set, because the content of the exists () result set is not important, the important thing is that the result set of the query statement in it is empty or non-empty, empty returns false, non-empty returns true.

Its query process is similar to the following:

For ($I = 0; $I

< count(A); $i++) { 　　$a = get_record(A, $i); #从A表逐条获取记录　　if (B.id = $a[id]) #如果子条件成立　　　　$result[] = $a; } return $result; 当B表比A表数据大时适合使用exists()，因为它没有那么多遍历操作，只需要再执行一次查询就行。如：A表有10000条记录，B表有1000000条记录，那么exists()会执行10000次去判断A表中的id是否与B表中的id相等。如：A表有10000条记录，B表有100000000条记录，那么exists()还是执行10000次，因为它只执行A.length次，可见B表数据越多，越适合exists()发挥效果。再如：A表有10000条记录，B表有100条记录，那么exists()还是执行10000次，还不如使用in()遍历10000*100次，因为in()是在内存里遍历比较，而exists()需要查询数据库，我们都知道查询数据库所消耗的性能更高，而内存比较很快。结论：exists()适合B表比A表数据大的情况 2、select * from A where id in (select id from B); in()只执行一次，它查出B表中的所有id字段并缓存起来。之后，检查A表的id是否与B表中的id相等，如果相等则将A表的记录加入结果集中，直到遍历完A表的所有记录。它的查询过程类似于以下过程： Array A=(select * from A); Array B=(select id from B); for(int i=0;icreate index idx_table_name on wjq2(table_name); Index created. SEIANG@seiang11g>

Select count (*) from wjq1

COUNT (*)

86976

SEIANG@seiang11g > select count (*) from wjq2

COUNT (*)

2868

1. What happens when the inner query result set is relatively small and the outer query is larger. 1. In CBO mode:

SEIANG@seiang11g > select * from wjq1 where object_name in (select table_name from wjq2 where table_name like'M%')

815 rows selected.

Execution Plan

Plan hash value: 1638414738

| | 0 | SELECT STATEMENT | | 1238 | 270K | 354 (1) | 00:00:05 |

| | * 1 | HASH JOIN RIGHT SEMI | | 1238 | 270K | 354 (1) | 00:00:05 |

| | * 2 | INDEX RANGE SCAN | IDX_TABLE_NAME | 772 | 13124 | 7 (0) | 00:00:01 |

| | * 3 | TABLE ACCESS FULL | WJQ1 | 5503 | 1112K | 347 (1) | 00:00:05 |

Predicate Information (identified by operation id):

1-access ("OBJECT_NAME" = "TABLE_NAME")

2-access ("TABLE_NAME" LIKE'M%')

Filter ("TABLE_NAME" LIKE'M%')

3-filter ("OBJECT_NAME" LIKE'M%')

Note

-dynamic sampling used for this statement (level=2)

Statistics

17 recursive calls

0 db block gets

1462 consistent gets

1256 physical reads

0 redo size

46140 bytes sent via SQL*Net to client

1117 bytes received via SQL*Net from client

56 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

815 rows processed

SEIANG@seiang11g > select * from wjq1 where exists (select 1 from wjq2 where wjq1.object_name=wjq2.table_name and wjq2.table_name like'M%')

815 rows selected.

Execution Plan

Plan hash value: 1638414738

| | 0 | SELECT STATEMENT | | 1238 | 270K | 354 (1) | 00:00:05 |

| | * 1 | HASH JOIN RIGHT SEMI | | 1238 | 270K | 354 (1) | 00:00:05 |

| | * 2 | INDEX RANGE SCAN | IDX_TABLE_NAME | 772 | 13124 | 7 (0) | 00:00:01 |

| | * 3 | TABLE ACCESS FULL | WJQ1 | 5503 | 1112K | 347 (1) | 00:00:05 |

Predicate Information (identified by operation id):

1-access ("WJQ1". "OBJECT_NAME" = "WJQ2". "TABLE_NAME")

2-access ("WJQ2". "TABLE_NAME" LIKE'M%')

Filter ("WJQ2". "TABLE_NAME" LIKE'M%')

3-filter ("WJQ1". "OBJECT_NAME" LIKE'M%')

Note

-dynamic sampling used for this statement (level=2)

Statistics

13 recursive calls

0 db block gets

1462 consistent gets

1242 physical reads

0 redo size

46140 bytes sent via SQL*Net to client

1117 bytes received via SQL*Net from client

56 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

815 rows processed

Through the comparison of the above implementation plan, it is found that:

In CBO mode, we can see that the execution plans of the two are exactly the same, and the statistics are the same.

Let's take a look at the situation in RBO mode, which is a bit more complicated.

2. In RBO mode:

SEIANG@seiang11g > select / * + rule*/ * from wjq1 where object_name in (select table_name from wjq2 where table_name like'M%')

815 rows selected.

Elapsed: 00:00:00.01

Execution Plan

Plan hash value: 144941173

| | 0 | SELECT STATEMENT |

| | 1 | NESTED LOOPS |

| | 2 | NESTED LOOPS |

| | 4 | SORT UNIQUE |

| | 7 | TABLE ACCESS BY INDEX ROWID | WJQ1 |

Predicate Information (identified by operation id):

5-access ("TABLE_NAME" LIKE'M%')

Filter ("TABLE_NAME" LIKE'M%')

6-access ("OBJECT_NAME" = "TABLE_NAME")

Note

-rule based optimizer used (consider using cbo)

Statistics

0 recursive calls

0 db block gets

698 consistent gets

0 physical reads

0 redo size

55187 bytes sent via SQL*Net to client

1117 bytes received via SQL*Net from client

56 SQL*Net roundtrips to/from client

1 sorts (memory)

0 sorts (disk)

815 rows processed

SEIANG@seiang11g > select / * + rule*/ * from wjq1 where exists (select 1 from wjq2 where wjq1.object_name=wjq2.table_name and wjq2.table_name like'M%')

815 rows selected.

Elapsed: 00:00:00.15

Execution Plan

Plan hash value: 3545670754

| | 0 | SELECT STATEMENT |

| | * 1 | FILTER |

| | 2 | TABLE ACCESS FULL | WJQ1 |

Predicate Information (identified by operation id):

1-filter (EXISTS (SELECT 0 FROM "WJQ2"WJQ2" WHERE)

"WJQ2". "TABLE_NAME" =: B1 AND "WJQ2". "TABLE_NAME" LIKE'M%'))

3-access ("WJQ2". "TABLE_NAME" =: B1)

Filter ("WJQ2". "TABLE_NAME" LIKE'M%')

Note

-rule based optimizer used (consider using cbo)

Statistics

0 recursive calls

0 db block gets

91002 consistent gets

1242 physical reads

0 redo size

46140 bytes sent via SQL*Net to client

1117 bytes received via SQL*Net from client

56 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

815 rows processed

Through the comparison of the above two implementation plans, it is found that:

Here, we can see that, in fact, using in is more efficient than exists. We can understand the situation like this:

The results of the memory query selected by the in,RBO optimizer are used as the driving table for nest loops join, so when the result set of the memory query is relatively small, the efficiency of this in is relatively high.

For the exists,RBO optimizer, the full table scan result set of the external query table is used to filter the result set of the inner query. When the outer query table is larger, the relative efficiency is relatively low.

2. What happens when the inner query result set is large and the outer query is small. 1. In CBO mode:

SEIANG@seiang11g > select * from wjq2 where table_name in (select object_name from wjq1 where object_name like's%')

278 rows selected.

Elapsed: 00:00:00.03

Execution Plan

Plan hash value: 1807911610

| | 0 | SELECT STATEMENT | | 278 | 164k | 55 (0) | 00:00:01 |

| | * 1 | HASH JOIN SEMI | | 278 | 164k | 55 (0) | 00:00:01 |

| | * 2 | TABLE ACCESS FULL | WJQ2 | 278 | 146k | 31 (0) | 00:00:01 |

| | * 3 | INDEX RANGE SCAN | IDX_OBJECT_NAME | 4435 | 285k | 24 (0) | 00:00:01 |

Predicate Information (identified by operation id):

1-access ("TABLE_NAME" = "OBJECT_NAME")

2-filter ("TABLE_NAME" LIKE'S%')

3-access ("OBJECT_NAME" LIKE'S%')

Filter ("OBJECT_NAME" LIKE'S%')

Note

-dynamic sampling used for this statement (level=2)

Statistics

67 recursive calls

0 db block gets

403 consistent gets

446 physical reads

0 redo size

22852 bytes sent via SQL*Net to client

721 bytes received via SQL*Net from client

20 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

278 rows processed

SEIANG@seiang11g >

SEIANG@seiang11g > select * from wjq2 where exists (select 1 from wjq1 where wjq1.object_name=wjq2.table_name and wjq1.object_name like'S%')

278 rows selected.

Elapsed: 00:00:00.02

Execution Plan

Plan hash value: 1807911610

| | 0 | SELECT STATEMENT | | 278 | 164k | 55 (0) | 00:00:01 |

| | * 1 | HASH JOIN SEMI | | 278 | 164k | 55 (0) | 00:00:01 |

| | * 2 | TABLE ACCESS FULL | WJQ2 | 278 | 146k | 31 (0) | 00:00:01 |

| | * 3 | INDEX RANGE SCAN | IDX_OBJECT_NAME | 4435 | 285k | 24 (0) | 00:00:01 |

Predicate Information (identified by operation id):

1-access ("WJQ1". "OBJECT_NAME" = "WJQ2". "TABLE_NAME")

2-filter ("WJQ2". "TABLE_NAME" LIKE'S%')

3-access ("WJQ1". "OBJECT_NAME" LIKE'S%')

Filter ("WJQ1". "OBJECT_NAME" LIKE'S%')

Note

-dynamic sampling used for this statement (level=2)

Statistics

13 recursive calls

0 db block gets

295 consistent gets

2 physical reads

0 redo size

22852 bytes sent via SQL*Net to client

721 bytes received via SQL*Net from client

20 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

278 rows processed

Through the comparison of the above two implementation plans, it is found that:

Although they have the same execution plan, using exists is significantly smaller than physical and logical reads using in, so using exists is a little more efficient.

Is it helpful for you to read the above content? If you want to know more about the relevant knowledge or read more related articles, please follow the industry information channel, thank you for your support.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.