In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
This article discusses the mechanism behind join technology. We know that the commonly used table joins are as follows
Cartesian connection
Internal connection
Left outer connection
Right outer connection
Full connection
The writing of these sql must be very clear to everyone, so how is the data access of these connections realized?
Nested loop
Let's look at the following query
SQL > alter session set optimizer_mode=rule;Session altered.SQL > select ename,dname from emp,dept where emp.deptno=dept.deptno 14 rows selected.Execution Plan---Plan hash value: 3625962092 UV-| Id | Operation | Name |-- | 0 | SELECT STATEMENT | | 1 | NESTED LOOPS | 2 | NESTED LOOPS | | 3 | TABLE ACCESS FULL | | EMP | | * 4 | INDEX UNIQUE SCAN | PK_DEPT | | 5 | TABLE ACCESS BY INDEX ROWID | DEPT |-- Predicate Information (identified by operation id):-- | -4-access ("EMP". "DEPTNO" = "DEPT". "DEPTNO")
According to the interpretation of the execution plan we talked about earlier, this query is implemented as follows:
Full table scan emp table (non-blocking scan, not all the data is taken out before performing the next step).
Pull out the data in emp one by one, query the rowid in the index through the index PK_DEPT, and the result set becomes (ename,rowid).
Take out the result set generated by 2 one by one, access the dept table through rowid, and the result set becomes (ename,dname)
Returns the result set.
This join implementation of fetching data in a loop is called a nested loop.
This plan can be implemented with the following logic pseudocode
For y in (for x in (select * from emp) loop index lookup the rowid for x.deptno output joined record (ename,dept.rowid) end loop) loop select * from dept where rowid=y.rowid output joined record (ename,dname) end loop
We call the emp table the driver table (note that the driver table has nothing to do with the table order of the from clause and mainly depends on the execution plan).
This connection mode is suitable for driving the table to return less data, and the deptno column on the driven table dept has an index. If the query returns n rows, the dept table will be scanned n times. This connection is good at quickly fetching the first row from the result set.
Hash Join
Hash Join is suitable for dealing with large result sets, and the optimizer selects two tables or the smaller of the source data, and uses join key to build a hash table in memory. Then scan the large table and explore the hash table to find matching records.
A small table is called a driver table, and a large table is called a probe table.
This is best when the hash table is all in memory. If there is no room for hash table in memory, the optimizer will partition hash table, and partitions that are out of memory will be written to the temporary tablespace.
We discuss the implementation of hash join in two situations.
Hash table is all in memory.
Hash table is an Oracle that splits a small table into multiple bucket using a hash function based on join key. After the establishment of the hash table, Oracle scans the large table and uses the same hash algorithm to split the read data into multiple bucket. Join operation is performed between bucket and bucket, and the result is returned. Until the big watch has been read.
2. Hash table can't be all in memory.
This situation is a bit troublesome. When Oracle finds that memory cannot fully store the small table, Oracle partitions the small table when constructing hash table, and then constructs bucket in each partition. When memory is full, Oracle writes the largest partition in memory to tempfile. Use this method until the construction of the small table hash table is complete. At this point, part of the data in hashtable is in memory and part of the data is in tempfile.
When Oracle scans a large table, if the scanned row can find the result in memory through hash, the match is successful. If it fails, the data is written to the tempfile first in the same way as hash table. When all the large tables have been scanned, the parts of the hash table memory have been matched. At this point, load the partitions in tempfile into memory in turn. Rescanning the large table temporarily exists in the corresponding partition in the tempfile to match. Until all the data is processed.
SQL > insert into big_emp select * from big_emp;SQL > insert into big_emp select * from big_emp;# execute multiple times SQL > / 458752 rows created.SQL > create table dept_new as select * from dept;Table created.SQL > set autot traceonlySQL > select * from big_emp 917504 rows selected.Execution Plan---Plan hash value: 1925493178 Murray- -| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |- -| 0 | SELECT STATEMENT | | 917K | 54m | 1490 (2) | 00:00:18 | | * 1 | HASH JOIN | | 917K | 54m | 1490 (2) | 00:00:18 | | 2 | TABLE ACCESS FULL | DEPT_NEW | 4 | 120 | 3 (0) | 00:00:01 | 3 | TABLE ACCESS | FULL | BIG_EMP | 917K | 28m | 1482 (1) | 00:00:18 |-Predicate Information (identified by operation id): -- 1-access ("A". "DEPTNO" = "B". "DEPTNO") Note--dynamic sampling used for this statement (level=2) Statistics-- -4 recursive calls 1 db block gets 66338 consistent gets 0 physical reads 0 redo size 62512398 bytes sent via SQL*Net to client 673349 bytes received via SQL*Net from client 61168 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 917504 rows processed
Sort Merge Joins
Sort merge joins are different from nested loops and hash joins. Sort merge joins have no concept of driving tables. In short, the sort merge will sort the first input set, sort the second input set, and then merge the results. Sort merging is usually not as efficient as hashing, because both result sets need to be sorted, while hashing joins only one result set before data output. Sort merges are usually valid in non-equivalent connections. That is, the connection condition is not an equation but a range comparison (=). Or the data of the two tables have been sorted.
Let's look at the following example
SQL > set linesize 200 pagesize 200SQL > set autot traceonlySQL > select a.ename from emp b.enamerecy a.hiredatereditionb.hiredate 2 from emp emp b 3 where a.empnob.empno and a.hiredate
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.