In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
Table connection
As the name implies, a table join means that multiple tables are joined together by join conditions, and the purpose of using the target SQL of a table join is to obtain data from multiple tables in different dimensions stored in these tables. Reflected in the SQL statement, multiple tables appear in the from portion of the target SQL with table joins, while the where condition parts of these SQL define specific table join conditions.
When the optimizer parses a target SQL with a table join, in addition to determining the type of table join based on the writing of the SQL text of the target SQL, it must decide on three things to get the final execution plan.
1. Table join order
No matter how many tables are used as table joins in the target SQL, Oracle can only do table joins in pairwise when actually executing the SQL, and then perform such a pairwise table join process in turn until all the tables in the target SQL are joined. Strictly speaking, the table join order here contains two meanings: one is that when two tables are joined, the optimizer needs to decide which of the two tables is the driven table (outer table) and which is the driven table (inner table). Another meaning is that when multiple tables (more than two tables) make a table join, the optimizer needs to decide who joins the table with whom first, and then decide which table join the result set of the table join result is in. This pairwise table join process will continue until all the tables in the target SQL have been joined.
two。 Table join method
In Oracle database, there are four kinds of table join methods between two tables: sort join (sort merge join), nested circular join (nested loops join), hash join (hash join) and Cartesian join (cross join). Therefore, when parsing the target SQL with table join, the optimizer needs to choose one of the above four methods as the method for each pair of tables to join.
3. The method of accessing single table
It is not enough for the optimizer to determine the table join order and table join method, which is not enough to get the final execution plan of the target SQL, because when the optimizer joins each table in the target SQL, it must also decide how to obtain the different dimensions of data stored in those tables, that is, the optimizer also decides how to access the single table. For example, when accessing a single table, whether to use the full table scan or walk the index, if it is to walk the index, what kind of index access method should be used and so on.
1.1 types of table joins
In general, we can think that the table join in Oracle database can be divided into two types: inner join and outer join. The type of table join directly determines the result of table join, and the writing of the SQL text of the target SQL directly determines the type of table join.
1.1.1 Internal connection
Inner Join means that the result of a table join contains only those records that fully meet the join conditions. For a target SQL that contains table joins, as long as its where condition does not write those keywords that represent outer joins defined in standard SQL or customized in Oracle (such as left outer join, right outer join, full outer join in standard SQL, or "(+)" customized in Oracle to represent outer joins), the join type of that SQL is inner join.
Oracle custom inner connection method:
Target Table 1, Target Table 2 where connection conditions
In standard SQL, internal connections are made using JOIN ON or JOIN USING.
The syntax of JOIN ON:
Target Table 1 join Target Table 2 on (connection conditions)
The syntax of JOIN USING:
Target Table 1 join Target Table 2 using (collection of join columns).
For a target SQL that uses JOIN USING, if there are multiple join columns, the join columns in the syntax "(set of join columns)" should be separated by commas. It should be noted that using JOIN USING's join syntax, if the join column appears in the query column at the same time, the join column cannot be preceded by the table name or table name alias (alias), otherwise Oracle will report an error (ORA-25154).
There is also a special JOIN USING in the standard SQL, which we call NATURAL JOIN, which means that the join column of a table join using NATURAL JOIN is all the same name columns of the two tables joined by the table. Syntax:
Target Table 1 natural join Target Table 2
This is actually equivalent to target table 1 join target table 2 using (a collection of all identical columns in target table 1 and target table 2). The advantage of using NATURAL JOIN is that there is no need to write a set of join columns, but the downside is that it increases the risk of errors in the execution results of table joins, because columns with the same name between two tables may not be exactly the same in meaning, and even if they have the same meaning, they do not necessarily need to be used as join columns.
1.1.2 external connection
Outer join (Outer Join) is an extension of inner join, which means that the join result of a table join contains not only those records that fully meet the join condition, but also all records in the driver table that do not meet that condition.
The outer join in the standard SQL can be divided into three types: left outer join (Left Outer Join), right join (Right Outer Join) and full join (Full Outer Join). Their corresponding keywords in the standard SQL are left outer join, right outer join and full outer join, which can be used with JOIN ON or JOIN USING.
The syntax of the left connection:
Target Table 1 left outer join Target Table 2 on (join conditions) or Target Table 1 left outer join Target Table 2 using (set of join columns)
It means that the target table 1 and the target table 2 are joined according to the join conditions in parentheses, and the table 1 to the left of the keyword is used as the drive table (outer table). In this case, the join result includes not only all the records in tables 1 and 2 that meet the connection conditions, but also all records in the driver table (table 1) that do not meet the connection conditions. The query columns in the driven table (Table 2) corresponding to all records in the driven table that do not meet the join condition are populated with NULL values.
Syntax for right join:
Target Table 1 right outer join Target Table 2 on (join conditions) or Target Table 1 right outer join Target Table 2 using (set of join columns)
The meaning is similar to the left join, but this time Table 2, located on the right table of the keyword, is the driver table.
Full join syntax:
Target Table 1 full outer join Target Table 2 on (join conditions) or Target Table 1 full outer join Target Table 2 using (set of join columns)
It means that target table 1 and target table 2 are joined according to the join conditions in parentheses. In addition to all the records in tables 1 and 2 that meet the join condition, the connection result will also include all records in target tables 1 and 2 that do not meet the join condition. At the same time, the query columns in another table in tables 1 and 2 that do not meet the join condition will be populated with NULL values.
In the example SQL described above, there are no additional restrictions except with join conditions. If there are additional constraints in the target SQL in addition to the table join conditions, the type of table joins in the target SQL and the location of the additional constraints in the SQL text of the target SQL may affect the final execution result.
Add other restrictions to the internal connection:
For internal joins, the location of additional constraints other than table join conditions in the SQL text of the target SQL does not affect the actual execution result of that SQL.
Add other restrictions to the outer connection:
For outer joins, if the additional constraint is in parentheses corresponding to the outer join keyword, this means that the constraint will be applied to table T1 before the right join of table T1 and table T2, while if the additional constraint is outside the parentheses corresponding to the outer join keyword, it means that the constraint will not be applied to the join result set of table T1 and table T2 after the right join of table T1 and table T2.
So, for outer joins, the location of additional constraints in the SQL text of the target SQL in addition to the table join condition may indeed affect the actual execution of that SQL.
Unlike the syntax for outer joins in standard SQL, Oracle uses the custom keyword "(+)" to represent outer joins. The position of the keyword "(+)" is after the join column of a table in the target SQL connection condition, and its meaning is that the keyword "(+)" appears after the join column of which table, indicating which table will fill the query column in the table that does not meet the connection conditions to find the location. At this time, the corresponding table of the keyword "(+)" should be used as the outer join driver table, which is the key is which table is the driver table!
As mentioned earlier, for outer joins, the location of additional constraints other than table join conditions in the SQL text of the target SQL may affect the actual execution result of that SQL. So if you use the Oracle custom keyword "(+)" to represent external joins, then how to reflect it? Quite simply, Oracle reflects this effect by placing the same keyword "(+)" after the target column of the additional constraint:
Select t1.col1,t1.col2,t2.col3
From t1,t2
Where t1.col2 (+) = t2.col2
And t1.col1 (+) = 1
The NATURAL JOIN mentioned earlier applies not only to internal connections, but also to external connections:
Select t1.col1,col2,t2.col3
From t1 natural left outer join t2
1.2 the method of table join
As mentioned earlier, when the optimizer parses a target SQL with a table join, when it determines the type of table join based on the writing of the SQL text of the target SQL, one of the next things to do is to determine the method of the table join.
In Oracle database, there are four kinds of table join methods between two tables: sort merge join, nested loop join, hash join and Cartesian join. These four kinds of table joins have their own advantages and disadvantages, and each has its own applicable scenarios. Next, we will introduce them respectively.
1.2.1 sort merge join
Sort merge join (Sort Merge Join) is a table join method in which two tables use sort operation (Sort) and merge operation (Merge) to get the join result set.
If two tables (if T1 and T2) are joined using a sort merge join, Oracle performs the following steps sequentially:
First, access table T1 with the predicate condition (if any) specified in the target SQL, and then sort the access results according to the connections in table T1. The sorted result set is recorded as result set 1.
Then access table T2 with the predicate condition (if any) specified in the target SQL, and then sort the access results according to the connections in table T2, and the sorted result set is recorded as result set 2.
Finally, the merge operation is performed on result set 1 and result set 2, from which the matching record is taken as the final execution result of the sort merge join.
The advantages and disadvantages of sorting and merging connections and the applicable scenarios are summarized as follows:
In general, sort merge joins are far less efficient than hash joins, but the former is more widely used because hash joins are usually only used for equivalent joins, while sort merge joins cannot be used for other conditions (for example, =).
In general, sorting merge connections are not short-distance OLTP-type systems, the essential reason is that sorting is a very expensive operation for OLTP-type systems, of course, if sorting operations can be avoided, then even OLTP-type systems can still use sort merge connections. For example, although two tables are sort merge joins, they do not actually need to be sorted, because indexes exist on the join columns of both tables.
Strictly speaking, sort merge joins do not have the concept of driving tables.
1.2.2 nested loop connection
Nested Loop join (Nested Loops Join) is a table join method in which two tables rely on two layers of nested loops (outer loop and inner loop) to get the join result set.
If two tables (if T1 and T2) use nested loop joins when doing table joins, Oracle performs the following steps sequentially:
First of all, the optimizer will follow certain rules to determine who is the driven table and who is the driven table in tables T1 and T2. The driven table is used for the outer loop, and the driven table is used for the memory loop. This assumes that the driven table is T1 and the driven table is T2.
Then access the driver table T1 with the predicate condition (if any) specified in the target SQL, and the result set obtained after accessing the driver table T1 is recorded as the driver result set 1.
Then traverse the driven result set 1 and traverse the driven table T2 at the same time, that is, first take out the first record in the driven result set 1, then traverse the driven table T2 and judge whether there is a matching record in T2 according to the connection conditions. then take out the second record in the driven result set 1, traverse the driven table T2 according to the same connection conditions and determine whether there is a matching record in T2. Until all the records in the driver result set 1 are traversed. Here, the outer loop refers to traversing the loop corresponding to the driven result set 1, and the inner loop refers to traversing the loop corresponding to the driven table T2. Obviously, how many records there are in the driving result set 1 corresponding to the outer loop, and how many times you have to traverse the inner loop of the driven table T2, which is the meaning of the so-called "nested loop".
The advantages and disadvantages of nested loop connections and the applicable scenarios are summarized as follows:
From the specific execution process of the above nested loop join, we can see that if the number of records of the driving result set corresponding to the driven table is small, at the same time, there is a unique index on the join column of the driven table (or there is a selective non-unique index on the join column of the driven table), then the execution efficiency of using nested loop join will be very high. However, if the number of records of the driven result set corresponding to the driven table is large, even if there is an index on the join column of the driven table, the execution efficiency of using nested loop joins will not be high.
As long as the number of records driving the result set is small, there is a prerequisite for making a nested loop join, and the drive result set is the result set obtained after applying the predicate condition (if any) specified in the target SQL to the driver table, so the large table can also be used as the driver table of the nested loop join, depending on whether the predicate condition (if any) specified in the target SQL can reduce the amount of data driving the result set.
Nested loop connection has an advantage that other connection methods do not have: nested loop connection can achieve fast response, that is, it can return records that have been connected and meet the connection conditions at the first time. You don't have to wait for all the connection operations to be done before returning the connection results. Although sorting merge connections is OK, they are not returned the first time, because sort merge connections can not start to return data until the merge operation is done after sorting, while hash connections can not start to return data until all the Hash Table corresponding to the driving result set has been built.
If Oracle uses a nested loop join and there is an index on the join column of the driven table, Oracle usually uses a single block read when accessing the index, which means that Oracle will need to access the index as many times as there are as many records in the driven result set of the nested loop join. In addition, if not all the query columns in the target SQL can be obtained from the relevant indexes of the driven table, then Oracle needs to perform a table return operation on the driven table after the nested loop join. This green-back operation usually uses a single-block read, which means that Oracle needs to return to the table as many times as there are records in the join result set after the nested loop join.
In order to improve the execution efficiency of nested loop joins, Oracle introduces the vector I-hand O (Vector I-hand O) in Oracle 11g. After the introduction of vector I _ peg O, Oracle can combine the previous batches of single-block reads and batch them with a single-block read, so as to reduce the number of physical I-sign O consumed by these single-block reads without reducing the number of single-block reads, and improve the execution efficiency of nested loop joins.
1.2.3 Hash connection
Hash join (Hash Join) is a table join method in which two tables mainly rely on hash operations to get the join result set.
Before Oracle7.3, there were only two table join methods commonly used in Oracle database: sort merge join and nested loop join, but each of them has its own obvious defects. For sort merge joins, if the result set of two tables after applying the predicate condition (if any) specified in the target SQL is large and needs sorting, the execution efficiency of the sort merge join must be inefficient. For nested loop joins, if the number of records in the driving result set corresponding to the driven table is very large, even if there is an index on the join column of the driven table, the execution efficiency of using nested loop joins will be equally inefficient. In order to solve the problem of inefficiency in the above situations, and to provide a new choice for the optimizer, Oracle introduced hash joins in 7. 3. In theory, hash joins are more efficient than sort merge joins and nested loop joins, but this is not always the case.
In Oracle10g and later versions of the Oracle database, whether the optimizer (actually CBO, because hash connections are only applicable to CBO) considers that the hash connection is limited by the implicit parameter _ HASH_JOIN_ENABLED when parsing the target SQL, while before Oracle10g, whether CBO considers the hash connection when parsing the target SQL is limited by the parameter HASH_JOIN_ENABLED. The default value for _ HASH_JOIN_ENABLED is TRUE, which allows CBO to consider hash connections when parsing the target SQL. Of course, even if this parameter is FALSE, using USE_HASH Hint still allows CBO to consider the hash connection when parsing the target SQL, which means that USE_HASH Hint has a higher priority than the parameter _ HASH_JOIN_ENABLED.
If two tables (if T1 and T2) use a hash join when making a table join, Oracle performs the following steps sequentially:
First, Oracle determines the data of Hash Partition based on the values of the parameters HASH_AREAS_SIZE, DB_BLOCK_SIZE, and _ HASH_MULTIBLOCK_IO_COUNT. (Hash Partition is a logical concept, which is actually a collection of Hash Bucket. The set of all HashPartition is called Hash Table, that is, a Hash Table consists of multiple HashPartition, and a HashPartition consists of multiple Hash Bucket).
After the predicate conditions (if any) specified in the target SQL are imposed on tables T1 and T2, the result set with a small number of result sets is selected by Oracle as the driven result set of the hash join. Here we assume that the data amount of the result set corresponding to T1 is relatively small, and that the result set corresponding to T2 has relatively much data, marked B as B. Obviously, S is the driven result set and B is the driven result set.
Oracle then traverses S, reads each record in S, and hashes each record according to the join column of that record in table T1. This hash operation uses two built-in hash functions, which calculate the hash value for the join column at the same time. We record the two built-in hash functions as hash_func_1 and hash_func_2, and their calculated hash values are hash_vale_1 and hash_value_2, respectively.
Then Oracle stores the corresponding record in the corresponding S in different Hash Partition in different Hash Bucket according to the value of hash_value_1, along with the hash_value_2 calculated by hash_func_2. Note that the records stored in the Hash Bucket are not the complete row records of the target table; it is sufficient to store the query and join columns associated with the target table in the location target SQL. We mark each Hash Partition corresponding to S as Si.
While building the Si, Oracle builds a bitmap (BITMAT) that marks whether each Hash Bucket contained in the Si is recorded (that is, whether the number of records is greater than 0).
If S has a large amount of data, then when building the Hash Table corresponding to S, the WORK AREA of the PGA may be filled. At this point, Oracle will write the Hash Partition that contains the most records in the workspace to disk (TEMP tablespace). Then Oracle will continue to build the Hash Table corresponding to S, and in the process of building, if the workspace is full again, Oracle will continue to repeat the above action, that is, pick the Hash Partition with the largest number of records and write back to disk. If the Hash Partition corresponding to the record to be built has been written back to disk by Oracle, Oracle will update the Hash Partition on disk at this time, that is, add the record and Hash_vale_2 directly to the corresponding Hash Bucket of the Hash Partition that is already on disk. Note that in extreme cases, it may occur that only part of a Hash Partition is recorded in memory, and the rest of the Hash Partition and all the remaining Hash Partition have been written back to disk.
The above process of building the Hash Table corresponding to S continues until all the records in S are traversed.
Then, Oracle will sort all the Si according to the number of records they contain, and then put the sorted Hash Partition in memory as much as possible (the workspace of the PGA). Of course, if you can't put it down, the part of the Hash Partition that can't be put will still be on disk.
At this point, Oracle is done with S, and now it's time to deal with B.
Oracle traverses B, reads each record in B, and does a hash operation according to the join column of the record in table T2, which is exactly the same as the hash operation in step 3, that is, it still uses the hash_func_1 and hash_func_2 in step 3, and calculates the two hash values hash_value_1 and hash_value_2.
Then Oracle will go to Si to find the join columns of each record in the matching Hash Bucket according to the corresponding hash value hash_value_1 of the record to see if it really matches (that is, to verify whether the join columns corresponding to the matching records in S and B are really equal, because for hashing, the results of different values after hashing may be the same). If it does match, the query column in the target SQL recorded in the B corresponding to the above hash_value_1 and the matching record in the Hash Bucket will be combined and returned together as records that meet the target SQL connection conditions. If no matching Hash Bucket is found, Oracle accesses the bitmap built in step 5.
If the bitmap shows that the number of records corresponding to the Hash Bucket in the Si is greater than 0, it means that although the Hash Bucket is not in memory, it has been written back to disk, and the Oracle will write the corresponding record in the corresponding B back to the disk in the way of Hahs Partition according to the value of hash_value_1. At the same time, the record is stored with the value of the hash_value_2 calculated by hash_func_2. If the bitmap shows that the number of records corresponding to the Hash Bucket in the Si is equal to 0, then the Oralce does not need to write the record in the B corresponding to the above hash_value_1 back to disk, because the record must not meet the connection conditions of the target SQL. This action that determines whether to write the B record of hash_value_1 to disk based on location is the so-called "bitmap filtering" (Oralce does not necessarily enable bitmap filtering, because if all the Si is already in memory and the operation of writing Si back to disk has not happened, then Oracle does not need to enable bitmap filtering here). We write down each Hash Partition corresponding to B as Bj.
The above process of finding a matching Hash Bucket in Si and building a Bj continues until all the records in B are traversed.
At this point, Oracle has processed all the Si in memory and the corresponding Bj, and now only the si and Bj on disk have not been processed.
Because the same hash functions hash_func_1 and hash_func_2 are used when building Si and Bj, Oracle can safely pair up when dealing with Si and Bj on disk, that is, only Si and Bj with the same Hash Partition number values can produce records that meet the connection conditions. Here we use Sn and Bn to represent Si and Bj that are located on disk and have the same Hash Partition number value.
For each pair of Sn and Bn, the one with fewer records will be regarded as the driving result set, then the Oracle will use the hash_vale_2 of the records in the driven result set Hash Bucket to build the new Hash Table, and the other pair with more records will be treated as the driven result set, and then Oracle will use the hash_value_2 recorded in the driven result set Hash Bucket to find matching records in the new Hash Table built above. Note that for each pair of Sn and Bn, Oracle always chooses the less recorded of them as the driver result set, so the driver result set of each pair of Sn and Bn may change, which is called "dynamic role exchange".
If a matching record exists in step 14, the matching record is returned as a record that meets the criteria for the target SQL connection.
The above process of processing Sn and Bn continues until all the Sn and Bn are traversed.
The advantages and disadvantages of hash connection and the applicable scenarios are summarized as follows:
Hash joins are not necessarily sorted, or in most cases there is no need for sorting.
The possibility of the join column corresponding to the driver table of the hash join should be as good as possible, because this selectivity will affect the number of records in the corresponding Hash Bucket, and the number of records in the Hash Bucket will directly affect the efficiency of finding matching records from the Hash Bucket. If there are too many records in a Hash Bucket, the execution efficiency of the corresponding hash connection may be seriously reduced. At this time, the typical performance is that the hash connection has not been finished for a long time, and the CPU occupancy rate on the database server where the database is located is very high, but the logical read consumption of the target SQL is very low, because most of the time is spent traversing all the records in the above Hash Bucket. Traversing the records in Hash Bucket takes place in the workspace of PGA, so it doesn't cost logical reading.
Hash joins apply only to CBO, and it can only be used for equivalent join conditions (even if it is a hash disjoin, Oracle actually converts it to an equivalent join).
Hash joins are very suitable for table joins between small tables and large tables with a large number of records in the join result set, especially when the selectivity of the join columns of a small table is very good. The execution time of the hash join can be regarded as approximately equal to the time it takes to scan that large table.
When two tables do a hash join, if the Hash Table corresponding to the result set with a small amount of data after applying the predicate condition (if any) specified in the target SQL can be completely contained in memory (the workspace of the PGA), then the hash join will be very efficient.
1.2.4 Cartesian connection
Cartesian join (Cross Join), also known as Cartesian product (Caresian Product), is a method of joining two tables without any join conditions.
If two tables (if T1 and T2) use Cartesian joins when making table joins, Oracle performs the following steps sequentially:
Table T1 is first accessed with the predicate condition (if any) specified in the target SQL, and the resulting result set is recorded as result set 1, where the number of records in result set 1 is assumed to be m.
Table T2 is then accessed with the predicate condition (if any) specified in the target SQL, and the resulting result set is recorded as result set 2, where the number of records of result set 2 is assumed to be n.
Finally, a merge operation is performed on result set 1 and result set 2, from which the matching record is taken as the final execution result of the Cartesian connection. The special thing here is that for Cartesian joins, because there are table join conditions, when merging result set 1 and result set 2, all records in result set 2 satisfy the condition, that is, they will all be matching records, so the number of records of the above Cartesian join results is the product of m and n (that is, m × n).
Statement example: select t1.col1pender t2.col3 from t1peng T2
Standard SQL uses the keyword "CROSS JOIN" to represent Cartesian connections, such as select t1.col1 from t2.col3 from T1 cross join T2
The advantages and disadvantages of Cartesian connections and the applicable scenarios are summarized as follows:
Cartesian joins usually occur because table join conditions are omitted in the target SQL, so Cartesian joins are generally not good unless they are deliberately done (for example, in some cases Cartesian joins can be used to reduce the number of full table scans of large tables in the target SQL).
Sometimes a Cartesian join occurs because ORDERED Hint is used in the target SQL, and there is no direct association condition between two tables adjacent to each other in the SQL text of the SQL.
Sometimes Cartesian joins occur because the statistics of the related tables in the target SQL are inaccurate. For example, three tables T1, T2 and T3 are joined. The join condition of T1 and T2 is T1.ID1 join T2.ID1 T2 and T3 is T2.ID2=T3.ID2. At the same time, there is a combined index containing these two join columns on the join columns ID1 and ID2 of table T2. If the statistical information of tables T1 and T3 is not accurate, causing Oracle to think that tables T1 and T3 have only a small number of records (for example, there is only one record), then Oracle will probably choose to make a Cartesian join to tables T1 and T3 first, and then to table T2. Because Oracle thinks that after tables T1 and T3 make a Cartesian join, the value of the Cardinality of the join result set is 1, and the join result will contain both column ID1 and column ID2, which means that Oracle can take advantage of the above combined index in table T2. This kind of Cartesian connection is usually problematic. If the actual number of records of tables T1 and T3 is not all 1, but all 1000, then the value of Cardinality of the result set of Cartesian connections made by tables T1 and T3 will be 1 million. Obviously, if the execution efficiency of the SQL is seriously affected if it is still carried out in the way of Cartesian connections.
Refer to "SQL Optimization based on Oracle"
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 275
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.