In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >
Share
Shulou(Shulou.com)06/02 Report--
In Oracle, determining the type of join operation is an important aspect of execution plan generation. Various connection operation types represent different connection operation algorithms, and different connection operation types are also suitable for different data volume and data distribution.
Both Nest Loop Join (nested Loop) and Merge Sort Join (merge sort join) are classical join methods adapted to different special situations. Although Nest Loop Join algorithm can use join column index, the cost of random reading is too large. Although Merge Sort Join can reduce the situation of random reading, the large-scale Sort operation brings too much pressure on memory and Temp space. When dealing with massive data, the two algorithms are not acceptable join algorithms if they are massive random reading or massive sorting. In this article, we introduce a commonly used connection method, Hash Join connection.
1. Hash Join (Hash join) principle
Starting with Oracle 7. 3, Hash Join officially enters the optimizer to perform plan generation, and only CBO can use Hash Join operations. In essence, Hash Join connection is an algorithm that uses Hash algorithm to connect small-scale Nest Loop Join and uses memory space for high-speed data cache retrieval.
Let's introduce the steps of Hash Join algorithm step by step:
I. Hash Join connection object is still two data tables, first select one of the "small tables". The small table here means that the amount of data set involved in the join operation is small. Perform a Hash function operation on all data values of the connection column field. Hash function is a kind of processing function which is often used in computer science, and the fast search algorithm using Hash value has been regarded as a mature retrieval method. The characteristic of the data processed by Hash function is that "the value of Hash function with the same data value must be the same, and the value of Hash function with different data value may be the same"
ii. The concatenated columns of small tables processed by Hash are stored in Oracle PGA space together with the data. There is a space in PGA called hash_area, which is dedicated to storing this kind of data. Moreover, the Bucket operation is divided according to different Hash function values. Each Bucket includes all small table data with the same hash function value. At the same time, the corresponding bitmap of Hash key value is established.
iii. After that, the data connection columns of the large table for Hash connection are read in turn, and each Hash value is matched by Bucket to locate to the appropriate Bucket (apply Hash retrieval algorithm)
iv. In the located Bucket, make a small-scale exact match. Because the scope at this time has been narrowed, the success rate of matching is high. At the same time, the matching operation is performed in memory, which is much faster than that of Merge Sort Join.
Here is an execution plan for Hash Join.
PLAN_TABLE_OUTPUT
Plan hash value: 779051904
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
| | 0 | SELECT STATEMENT | | 2617 | 572k | 142 (1) | 00:00:02 |
| | * 1 | HASH JOIN | | 2617 | 572k | 142 (1) | 00:00:02 |
| | 2 | TABLE ACCESS FULL | SEGS | 2503 | 312K | 16 (0) | 00:00:01 |
| | 3 | TABLE ACCESS FULL | OBJTS | 31083 | 2914K | 126 (1) | 00:00:02 |
Predicate Information (identified by operation id):
1-access ("SEGS". "SEGMENT_NAME" = "OBJTS". "OBJECT_NAME")
From the point of view of the principle process, there is a certain similarity between Hash Join and Nest Loop Join/Merge Sort Join.
First of all, Hash Join, like Nest Loop Join, performs some nested loop matching operations, but the difference is that the range of random reads for matching is limited. Full-scale random readings are not performed directly and frequently as Nest Loop Join does.
Secondly, Hash Join is similar to the previously introduced Merge Sort Join, which uses the space of PGA to operate independently. The Bucket in Hash Join is stored in the PGA in memory, and there is a special Hash_Area for this operation. To choose a small table as a driver join table is to try to fully load the small table data in PGA memory and try not to use Temp tablespaces. In this way, the speed of Hash matching and exact matching is guaranteed.
Finally, the scenarios used by Hash Join are limited. One of the largest is that connection operations can only use "=" connections. Because the process of Hash matching can only support equality operations. In addition, the data distribution of join columns should be as uniform as possible, so that the resulting Bucket will be as uniform as possible. Only in this way can the speed of the match be limited. If the data column distribution deviation is serious, the efficiency of Hash Join algorithm will be degraded.
With the continuous increase of the amount of data in the system, there will be more and more Hash Join scenarios. The following is a series of experiments to determine the various characteristics of Hash Join.
2. Hash Join connection experiment
The first is to prepare the experimental environment.
SQL > create table segs as select * from dba_segments where wner='SYS'
Table created
SQL > create table objts as select * from dba_objects where wner='SYS'
Table created
SQL > select count (*) from segs
COUNT (*)
-
2503
SQL > select count (*) from objts
COUNT (*)
-
31083
SQL > create index idx_segs_name on segs (segment_name)
Index created
SQL > create index idx_objts_name on objts (object_name)
Index created
SQL > exec dbms_stats.gather_table_stats (user,'SEGS',cascade = > true)
PL/SQL procedure successfully completed
SQL > exec dbms_stats.gather_table_stats (user,'OBJTS',cascade = > true)
PL/SQL procedure successfully completed
At this point, we compare the cost factors of the three connections.
SQL > set autotrace traceonly
SQL > select * from segs, objts where segs.segment_name=objts.object_name
4870 rows have been selected.
Carry out the plan
Plan hash value: 779051904
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
| | 0 | SELECT STATEMENT | | 2617 | 572k | 142 (1) | 00:00:02 |
| | * 1 | HASH JOIN | | 2617 | 572k | 142 (1) | 00:00:02 |
| | 2 | TABLE ACCESS FULL | SEGS | 2503 | 312K | 16 (0) | 00:00:01 |
| | 3 | TABLE ACCESS FULL | OBJTS | 31083 | 2914K | 126 (1) | 00:00:02 |
Predicate Information (identified by operation id):
1-access ("SEGS". "SEGMENT_NAME" = "OBJTS". "OBJECT_NAME")
Statistical information
1 recursive calls
0 db block gets
814 consistent gets
0 physical reads
0 redo size
356347 bytes sent via SQL*Net to client
3940 bytes received via SQL*Net from client
326 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4870 rows processed
SQL > select / * + use_nl (segs,objts) * / * from segs,objts where segs.segment_name=objts.object_name
4870 rows have been selected.
Carry out the plan
Plan hash value: 2045044449
-
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 2617 | 572k | 5023 (1) | 00:01:01 |
| | 1 | NESTED LOOPS | | |
| | 2 | NESTED LOOPS | | 2617 | 572k | 5023 (1) | 00:01:01 |
| | 3 | TABLE ACCESS FULL | SEGS | 2503 | 312K | 16 (0) | 00:00:01 |
| | * 4 | INDEX RANGE SCAN | IDX_OBJTS_NAME | 1 | | 1 (0) | 00:00:01 |
| | 5 | TABLE ACCESS BY INDEX ROWID | OBJTS | 1 | 96 | 2 (0) | 00:00:01 |
-
Predicate Information (identified by operation id):
4-access ("SEGS". "SEGMENT_NAME" = "OBJTS". "OBJECT_NAME")
Statistical information
1 recursive calls
0 db block gets
5799 consistent gets
0 physical reads
0 redo size
406352 bytes sent via SQL*Net to client
3940 bytes received via SQL*Net from client
326 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
4870 rows processed
SQL > select / * + use_merge (segs,objts) * / * from segs,objts where segs.segment_name=objts.object_name
4870 rows have been selected.
Carry out the plan
Plan hash value: 2272228973
-
| | Id | Operation | Name | Rows | Bytes | TempSpc | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 2617 | 572k | | 900K (1) | 00:00:11 |
| | 1 | MERGE JOIN | | 2617 | 572k | | 900K (1) | 00:00:11 |
| | 2 | SORT JOIN | | 2503 | 312K | 920K | 90 (2) | 00:00:02 |
| | 3 | TABLE ACCESS FULL | SEGS | 2503 | 312K | | 16 (0) | 00:00:01 |
| | * 4 | SORT JOIN | | 31083 | 2914K | 8168K | 809 (1) | 00:00:10 |
| | 5 | TABLE ACCESS FULL | OBJTS | 31083 | 2914K | | 126 (1) | 00:00:02 |
-
Predicate Information (identified by operation id):
4-access ("SEGS". "SEGMENT_NAME" = "OBJTS". "OBJECT_NAME")
Filter ("SEGS". "SEGMENT_NAME" = "OBJTS". "OBJECT_NAME")
Statistical information
1 recursive calls
0 db block gets
494 consistent gets
0 physical reads
0 redo size
427743 bytes sent via SQL*Net to client
3940 bytes received via SQL*Net from client
326 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
4870 rows processed
For a detailed comparison, see the following figure:
Block reading
Sort
CPU cost
Execution time
Hash Join
eight hundred and fourteen
0
one hundred and forty two
0.02
Nest Loop Join
5799
0
5023
1.01
Merge Sort Join
four hundred and ninety four
two
nine hundred
0.11
Three connection methods, the same amount of data and statements in SQL, and finally get different cost consumption. It can be seen that when the amount of data reaches ten thousand, the random reading of Nest Loop Join will increase sharply, and the cost of CPU and the total execution time will also increase greatly.
The use of Merge Sort Join brings relatively few block reads, but the CPU cost and execution time can not be ignored. Mapping a data set sort to memory (possibly using Temp Tablespace) consumes a lot of CPU and memory resources (sort segments).
Generally speaking, Hash Join can bring good comprehensive performance in this SQL. Only the block reading is slightly larger, and other indicators are the best acceptable values.
Below we introduce some system parameters related to Hash Join and the three operating modes of Hash Join. Different system parameters may affect the cost calculation of CBO. Different operation modes help us understand how the size of hash_area in PGA affects the performance of Hash Join operations.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.