In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-26 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >
Share
Shulou(Shulou.com)06/02 Report--
Hash Join is a kind of connection method that often appears in the Oracle CBO era, and it often appears in the execution plan when dealing with massive data. The first part of this article (http://space.itpub.net/17203031/viewspace-697442) introduces some external characteristics and operation algorithm flow of Hash Join. Let's take a look at some important parameters and internal indicators that affect Hash Join.
3. Hash Join related parameters
Hash Join is an execution plan operation that can only be generated by the CBO optimizer, and an execution plan that includes Hash Join cannot be generated if RBO is selected. In addition, the Oracle parameters related to Hash Join include the following:
ü Hash_Join_Enable
This parameter is the switch that controls CBO to enable Hash Join. If set to True, it means that CBO can use Hash Join connection mode, otherwise it cannot be used. In the current version, this parameter has evolved into an implicit parameter named "_ hash_join_enable".
SQL > col name for A20
SQL > col value for A10
SQL > col DESCRIB for A30
SQL > SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2 FROM SYS.x$ksppi x, SYS.x$ksppcv y
3 WHERE x.inst_id = USERENV ('Instance')
4 AND y.inst_id = USERENV ('Instance')
5 AND x.indx = y.indx
6 AND x.ksppinm LIKE'% hash_join_enable%'
NAME VALUE DESCRIB
_ hash_join_enabled TRUE enable/disable hash join
The implicit formulation of this parameter indicates that CBO has matured to a certain extent, and Oracle officials do not want us to disable this Hash Join connection method. Of course, we can disable hash Join temporarily from both system and session layers.
/ / now _ hash_join_enable=true
SQL > explain plan for select * from segs, tabs where segs.segment_name=tabs.table_name
Explained
SQL > select * from table (dbms_xplan.display)
PLAN_TABLE_OUTPUT
Plan hash value: 2106473715
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
| | 0 | SELECT STATEMENT | | 990 | 354k | 25 (4) | 00:00:01 |
| | * 1 | HASH JOIN | | 990 | 354k | 25 (4) | 00:00:01 |
| | 2 | TABLE ACCESS FULL | TABS | 968 | 229K | 11 (0) | 00:00:01 |
| | 3 | TABLE ACCESS FULL | SEGS | 2267 | 274k | 13 (0) | 00:00:01 |
Predicate Information (identified by operation id):
1-access ("SEGS". "SEGMENT_NAME" = "TABS". "TABLE_NAME")
15 rows selected
/ / disable hash_join connection at the session level
SQL > alter session set "_ hash_join_enabled" = false
Session altered
NAME VALUE DESCRIB
_ hash_join_enabled FALSE enable/disable hash join
/ / the same SQL, when the parameter environment has changed
SQL > explain plan for select * from segs, tabs where segs.segment_name=tabs.table_name
Explained
SQL > select * from table (dbms_xplan.display)
PLAN_TABLE_OUTPUT
-
Plan hash value: 3475644097
| | Id | Operation | Name | Rows | Bytes | TempSpc | Cost (% CPU) | Time |
| | 0 | SELECT STATEMENT | | 990 | 354k | | 144K (2) | 00:00:02 |
| | 1 | MERGE JOIN | | 990 | 354k | | 144K (2) | 00:00:02 |
| | 2 | SORT JOIN | | 968 | 229K | 712K | 65 (2) | 00:00:01 |
| | 3 | TABLE ACCESS FULL | TABS | 968 | 229K | | 11 (0) | 00:00:01 |
| | * 4 | SORT JOIN | | 2267 | 274k | 824K | 79 (2) | 00:00:01 |
| | 5 | TABLE ACCESS FULL | SEGS | 2267 | 274k | | 13 (0) | 00:00:01 |
Predicate Information (identified by operation id):
4-access ("SEGS". "SEGMENT_NAME" = "TABS". "TABLE_NAME")
Filter ("SEGS". "SEGMENT_NAME" = "TABS". "TABLE_NAME")
18 rows have been selected.
It can be seen that when we disable the hash Join connection at the session level, CBO cannot choose the Hash Join path. So choose the Merge Join path, obviously whether it is the execution time or the cost of CPU, Merge Join is slightly inferior.
ü Hash_Area_Size
Hash Join operations rely on independent private spaces, which we call Hash_Area. The role of Hash Area in the Join process is to cache the join table in Hash Area as much as possible for Hash matching and Bucket internal exact matching. Hash Area is stored in PGA and belongs to a separate space of session session. If the Hash Area is too small to hold all the data in the small table, it will lead to the use of Temp tablespaces, which in turn affects the performance of Hash Join.
SQL > show parameter hash
NAME TYPE VALUE
-
Hash_area_size integer 131072
Because each session opens a Hash Area for Hash operation, the size of the Hash Area is usually not set very large. The space similar to Hash Area is Sort Area, which is used for Order by operations in SQL statements and is also a parameter item that depends on allocation. Typically, Hash Area is assigned twice the size of Sort Area.
SQL > show parameter sort_area
NAME TYPE VALUE
-
Sort_area_retained_size integer 0
Sort_area_size integer 65536
After entering Oracle 9i, especially 10g, Oracle shared memory and exclusive memory allocation strategy shows the trend of automation and adaptation, and this technology is becoming more and more mature. DBA only needs to determine the total memory usage size of the Oracle database (memory_target), and it will adjust the memory partition adaptively according to the algorithm and load.
As a PGA allocation, the automatic regulation parameter introduced by Oracle is pga_aggregate_target, which indicates the total PGA allocation size of all sessions. If PGA auto-allocation is not enabled, the parameter value is set to 0.
SQL > show parameter pga
NAME TYPE VALUE
-
Pga_aggregate_target big integer 0
ü Hash_multiblock_io_count
This parameter represents the number of blocks that can be read at a time during a Hash Join connection operation. In the latest version, this parameter has become an implicit parameter.
SQL > SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
2 FROM SYS.x$ksppi x, SYS.x$ksppcv y
3 WHERE x.inst_id = USERENV ('Instance')
4 AND y.inst_id = USERENV ('Instance')
5 AND x.indx = y.indx
6 AND x.ksppinm LIKE'% hash_multiblock%'
NAME VALUE DESCRIB
-
_ hash_multiblock_io_count 0 number of blocks hash join wil
L read/write at once
This parameter dates back to the era of Oracle 8i, when the default value was set to 1. In later versions, it is usually set to 0. This parameter has a great influence on IO, and the effect is different under different hardware environment and system load. So, when set to 0, Oracle automatically calculates the value each time.
As far as we are concerned, it is best not to set this parameter.
4. Connect the three modes
One of the advantages of Hash Join over Merge Sort Join is the limited use of PGA space. However, using PGA is a risky operation after all. Because Hash Area, like Sort Area, invokes the hard disk space of the Temp tablespace when the small table cannot be fully loaded into the system. In this way, it will cause some problems.
The following exposition of the three modes draws lessons from the "Oracle Hash Join" (http://www.alidba.net/index.php/archives/440) of the eight gods' predecessors. I would like to express my gratitude.
Oracle has different modes corresponding to different states.
Optimal mode
This is the ideal situation for us to carry out Hash Join. When the Hash data set generated by the driver table (small table) can be completely stored in Hash Area, we call it the Optimal schema.
First of all, find the driver table and get the driver table. Stored in Hash_Area
In Hash Area, Hash operation is performed on the driver table to form Hash Bulket and corresponding partition information. For multiple Bulket, a Bitmap list is formed at the same time to achieve the connection between Bulket and Bitmap bits.
Different data rows are distributed in each Bulket. If the join columns are evenly distributed, the data in the Bulket is more uniform. If data is included in the Bulket, the Bitmap bit of the corresponding Bulket is 1, otherwise it is 0
Find each column of the driven table and Hash the value of the join column. Match the bitmap bit. If Bitmap is 0, the column value does not exist and is discarded directly. Otherwise, enter Bulket for exact matching.
Onepass mode
If we set up a small PGA space, or if the connected small table volume is already large, then the temporary tablespace will be used. The specific treatment is to do two Hash processing, and establish the Partition partition above the Bulket level.
When Hash operations are performed, part of the Partition is in memory and the other part of the Partition is stored in the Temp tablespace.
When doing connection matching, if you can determine that the Partition is in memory in Bitmap, then the retrieval and exact matching process is performed directly in memory. Otherwise, the corresponding Partition is called into memory from the Temp tablespace for matching operation.
Multipass mode
This is a very extreme situation, if the Hash Area is too small to hold a Partition. After the Hash operation, only half of the Partition can be loaded into the Hash Area.
In this case, if a Partition match is not achieved, the operation cannot be abandoned, and the remaining half of the Partition should be obtained for Hash Join matching. That is, a Partition has to go through two Bitmap matching processes.
5. Conclusion
Hash Join is a highly efficient and common way to connect in the CBO era. However, compared with other classical algorithms, the synthesis efficiency of Hash Join is very high, especially in the era of massive data.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.