In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-20 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
How to analyze the implementation efficiency of exists and in in Oracle, many novices are not very clear about this. In order to help you solve this problem, the following editor will explain it in detail. People with this need can come and learn. I hope you can get something.
In refers to the outer and inner tables as hash join, while exists refers to the external tables as loop. Each time loop queries the inner table. It is generally believed that exists is more efficient than in statements, but this statement is actually not accurate, and it is necessary to distinguish between environments.
Exists uses loop to query the appearance one by one, and each query will check the conditional statement of exists. When the conditional statement in exists can return record rows (no matter how many record rows are, as long as it can), the condition is true and the record to which the current loop goes is returned. On the contrary, if the conditional statement in exists cannot return record rows, the record to which the current loop arrives is discarded, and the condition of exists is like a bool condition. True when the result set can be returned, false if the result set cannot be returned.
For example:
Select * from user where exists (select 1)
The records of the user table are taken out one by one. Since select 1 in the subcondition can always return record rows, then all records of the user table will be added to the result set, so it is the same as select * from user;.
Again as follows
Select * from user where exists (select * from user where userId = 0)
You can know that when loop the user table, check the conditional statement (select * from user where userId = 0). Because the userId is never 0, the conditional statement always returns an empty set, and the condition is always false, then all records of the user table will be discarded.
Not exists is the opposite of exists, that is, when the exists condition has a result set returned, the record of loop will be discarded, otherwise the record of loop will be added to the result set.
In general, if table A has n records, then the exists query is to take out the n records one by one, and then judge the n-pass exists condition
The in query is equivalent to the superposition of multiple or conditions, which is easy to understand, such as the following query
Select * from user where userId in (1,2,3)
Equivalent to
Select * from user where userId = 1 or userId = 2 or userId = 3
Not in is the opposite of in, as follows
Select * from user where userId not in (1,2,3)
Equivalent to
Select * from user where userId! = 1 and userId! = 2 and userId! = 3
Generally speaking, the in query is to first find out all the records of the subquery condition, assuming that the result set is B, with a total of m records, and then decompose the result set of the subquery condition into m, and then make m queries.
It is worth mentioning that the subcondition of an in query must return a result with only one field, such as
Select * from user where userId in (select id from B)
And can't be
Select * from user where userId in (select id, age from B)
Exists does not have this restriction.
Consider the performance of exists and in:
For the above two cases, in traverses the comparison in memory, while exists needs to query the database, so when the amount of B-table data is large, exists is more efficient than in.
Consider the following SQL statement
Select * from A where exists (select * from B where B.id = A.id)
Select * from A where A.id in (select id from B)
1. Select * from A where exists (select * from B where B.id = A.id)
Exists () executes A.length times, it does not cache the exists () result set, because the content of the exists () result set is not important, the important thing is that the result set of the query statement in it is empty or non-empty, empty returns false, non-empty returns true.
Its query process is similar to the following:
For ($I = 0; $I
< count(A); $i++) { $a = get_record(A, $i); #从A表逐条获取记录 if (B.id = $a[id]) #如果子条件成立 $result[] = $a; } return $result; 当B表比A表数据大时适合使用exists(),因为它没有那么多遍历操作,只需要再执行一次查询就行。 如:A表有10000条记录,B表有1000000条记录,那么exists()会执行10000次去判断A表中的id是否与B表中的id相等。 如:A表有10000条记录,B表有100000000条记录,那么exists()还是执行10000次,因为它只执行A.length次,可见B表数据越多,越适合exists()发挥效果。 再如:A表有10000条记录,B表有100条记录,那么exists()还是执行10000次,还不如使用in()遍历10000*100次,因为in()是在内存里遍历比较,而exists()需要查询数据库,我们都知道查询数据库所消耗的性能更高,而内存比较很快。 结论:exists()适合B表比A表数据大的情况 2、select * from A where id in (select id from B); in()只执行一次,它查出B表中的所有id字段并缓存起来。之后,检查A表的id是否与B表中的id相等,如果相等则将A表的记录加入结果集中,直到遍历完A表的所有记录。 它的查询过程类似于以下过程: Array A=(select * from A); Array B=(select id from B); for(int i=0;icreate index idx_table_name on wjq2(table_name); Index created. SEIANG@seiang11g>Select count (*) from wjq1
COUNT (*)
-
86976
SEIANG@seiang11g > select count (*) from wjq2
COUNT (*)
-
2868
1. What happens when the inner query result set is relatively small and the outer query is larger. 1. In CBO mode:
SEIANG@seiang11g > select * from wjq1 where object_name in (select table_name from wjq2 where table_name like'M%')
815 rows selected.
Execution Plan
Plan hash value: 1638414738
-
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 1238 | 270K | 354 (1) | 00:00:05 |
| | * 1 | HASH JOIN RIGHT SEMI | | 1238 | 270K | 354 (1) | 00:00:05 |
| | * 2 | INDEX RANGE SCAN | IDX_TABLE_NAME | 772 | 13124 | 7 (0) | 00:00:01 |
| | * 3 | TABLE ACCESS FULL | WJQ1 | 5503 | 1112K | 347 (1) | 00:00:05 |
-
Predicate Information (identified by operation id):
1-access ("OBJECT_NAME" = "TABLE_NAME")
2-access ("TABLE_NAME" LIKE'M%')
Filter ("TABLE_NAME" LIKE'M%')
3-filter ("OBJECT_NAME" LIKE'M%')
Note
-
-dynamic sampling used for this statement (level=2)
Statistics
17 recursive calls
0 db block gets
1462 consistent gets
1256 physical reads
0 redo size
46140 bytes sent via SQL*Net to client
1117 bytes received via SQL*Net from client
56 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
815 rows processed
SEIANG@seiang11g > select * from wjq1 where exists (select 1 from wjq2 where wjq1.object_name=wjq2.table_name and wjq2.table_name like'M%')
815 rows selected.
Execution Plan
Plan hash value: 1638414738
-
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 1238 | 270K | 354 (1) | 00:00:05 |
| | * 1 | HASH JOIN RIGHT SEMI | | 1238 | 270K | 354 (1) | 00:00:05 |
| | * 2 | INDEX RANGE SCAN | IDX_TABLE_NAME | 772 | 13124 | 7 (0) | 00:00:01 |
| | * 3 | TABLE ACCESS FULL | WJQ1 | 5503 | 1112K | 347 (1) | 00:00:05 |
-
Predicate Information (identified by operation id):
1-access ("WJQ1". "OBJECT_NAME" = "WJQ2". "TABLE_NAME")
2-access ("WJQ2". "TABLE_NAME" LIKE'M%')
Filter ("WJQ2". "TABLE_NAME" LIKE'M%')
3-filter ("WJQ1". "OBJECT_NAME" LIKE'M%')
Note
-
-dynamic sampling used for this statement (level=2)
Statistics
13 recursive calls
0 db block gets
1462 consistent gets
1242 physical reads
0 redo size
46140 bytes sent via SQL*Net to client
1117 bytes received via SQL*Net from client
56 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
815 rows processed
Through the comparison of the above implementation plan, it is found that:
In CBO mode, we can see that the execution plans of the two are exactly the same, and the statistics are the same.
Let's take a look at the situation in RBO mode, which is a bit more complicated.
2. In RBO mode:
SEIANG@seiang11g > select / * + rule*/ * from wjq1 where object_name in (select table_name from wjq2 where table_name like'M%')
815 rows selected.
Elapsed: 00:00:00.01
Execution Plan
Plan hash value: 144941173
| | Id | Operation | Name | |
| | 0 | SELECT STATEMENT |
| | 1 | NESTED LOOPS |
| | 2 | NESTED LOOPS |
| | 3 | VIEW | VW_NSO_1 |
| | 4 | SORT UNIQUE |
| | * 5 | INDEX RANGE SCAN | IDX_TABLE_NAME |
| | * 6 | INDEX RANGE SCAN | IDX_OBJECT_NAME |
| | 7 | TABLE ACCESS BY INDEX ROWID | WJQ1 |
Predicate Information (identified by operation id):
5-access ("TABLE_NAME" LIKE'M%')
Filter ("TABLE_NAME" LIKE'M%')
6-access ("OBJECT_NAME" = "TABLE_NAME")
Note
-
-rule based optimizer used (consider using cbo)
Statistics
0 recursive calls
0 db block gets
698 consistent gets
0 physical reads
0 redo size
55187 bytes sent via SQL*Net to client
1117 bytes received via SQL*Net from client
56 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
815 rows processed
SEIANG@seiang11g > select / * + rule*/ * from wjq1 where exists (select 1 from wjq2 where wjq1.object_name=wjq2.table_name and wjq2.table_name like'M%')
815 rows selected.
Elapsed: 00:00:00.15
Execution Plan
Plan hash value: 3545670754
-
| | Id | Operation | Name | |
-
| | 0 | SELECT STATEMENT |
| | * 1 | FILTER |
| | 2 | TABLE ACCESS FULL | WJQ1 |
| | * 3 | INDEX RANGE SCAN | IDX_TABLE_NAME |
-
Predicate Information (identified by operation id):
1-filter (EXISTS (SELECT 0 FROM "WJQ2"WJQ2" WHERE)
"WJQ2". "TABLE_NAME" =: B1 AND "WJQ2". "TABLE_NAME" LIKE'M%'))
3-access ("WJQ2". "TABLE_NAME" =: B1)
Filter ("WJQ2". "TABLE_NAME" LIKE'M%')
Note
-
-rule based optimizer used (consider using cbo)
Statistics
0 recursive calls
0 db block gets
91002 consistent gets
1242 physical reads
0 redo size
46140 bytes sent via SQL*Net to client
1117 bytes received via SQL*Net from client
56 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
815 rows processed
Through the comparison of the above two implementation plans, it is found that:
Here, we can see that, in fact, using in is more efficient than exists. We can understand the situation like this:
The results of the memory query selected by the in,RBO optimizer are used as the driving table for nest loops join, so when the result set of the memory query is relatively small, the efficiency of this in is relatively high.
For the exists,RBO optimizer, the full table scan result set of the external query table is used to filter the result set of the inner query. When the outer query table is larger, the relative efficiency is relatively low.
2. What happens when the inner query result set is large and the outer query is small. 1. In CBO mode:
SEIANG@seiang11g > select * from wjq2 where table_name in (select object_name from wjq1 where object_name like's%')
278 rows selected.
Elapsed: 00:00:00.03
Execution Plan
Plan hash value: 1807911610
-
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 278 | 164k | 55 (0) | 00:00:01 |
| | * 1 | HASH JOIN SEMI | | 278 | 164k | 55 (0) | 00:00:01 |
| | * 2 | TABLE ACCESS FULL | WJQ2 | 278 | 146k | 31 (0) | 00:00:01 |
| | * 3 | INDEX RANGE SCAN | IDX_OBJECT_NAME | 4435 | 285k | 24 (0) | 00:00:01 |
-
Predicate Information (identified by operation id):
1-access ("TABLE_NAME" = "OBJECT_NAME")
2-filter ("TABLE_NAME" LIKE'S%')
3-access ("OBJECT_NAME" LIKE'S%')
Filter ("OBJECT_NAME" LIKE'S%')
Note
-
-dynamic sampling used for this statement (level=2)
Statistics
67 recursive calls
0 db block gets
403 consistent gets
446 physical reads
0 redo size
22852 bytes sent via SQL*Net to client
721 bytes received via SQL*Net from client
20 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
278 rows processed
SEIANG@seiang11g >
SEIANG@seiang11g > select * from wjq2 where exists (select 1 from wjq1 where wjq1.object_name=wjq2.table_name and wjq1.object_name like'S%')
278 rows selected.
Elapsed: 00:00:00.02
Execution Plan
Plan hash value: 1807911610
-
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 278 | 164k | 55 (0) | 00:00:01 |
| | * 1 | HASH JOIN SEMI | | 278 | 164k | 55 (0) | 00:00:01 |
| | * 2 | TABLE ACCESS FULL | WJQ2 | 278 | 146k | 31 (0) | 00:00:01 |
| | * 3 | INDEX RANGE SCAN | IDX_OBJECT_NAME | 4435 | 285k | 24 (0) | 00:00:01 |
-
Predicate Information (identified by operation id):
1-access ("WJQ1". "OBJECT_NAME" = "WJQ2". "TABLE_NAME")
2-filter ("WJQ2". "TABLE_NAME" LIKE'S%')
3-access ("WJQ1". "OBJECT_NAME" LIKE'S%')
Filter ("WJQ1". "OBJECT_NAME" LIKE'S%')
Note
-
-dynamic sampling used for this statement (level=2)
Statistics
13 recursive calls
0 db block gets
295 consistent gets
2 physical reads
0 redo size
22852 bytes sent via SQL*Net to client
721 bytes received via SQL*Net from client
20 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
278 rows processed
Through the comparison of the above two implementation plans, it is found that:
Although they have the same execution plan, using exists is significantly smaller than physical and logical reads using in, so using exists is a little more efficient.
Is it helpful for you to read the above content? If you want to know more about the relevant knowledge or read more related articles, please follow the industry information channel, thank you for your support.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.