Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What is the implementation plan of Oracle?

2025-01-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Share

Shulou(Shulou.com)06/01 Report--

This article shows you what the Oracle implementation plan is like, the content is concise and easy to understand, it will definitely brighten your eyes. I hope you can get something through the detailed introduction of this article.

This paper introduces in detail the related concepts of oracle execution plan, the access method of access data, the connection between tables and so on.

I. related concepts

The concept of Rowid: rowid is a pseudo column, since it is a pseudo column, then this column is not user-defined, but is added by the system itself. There is a pseudo column of rowid for each table, but the value of the ROWID column is not physically stored in the table. However, you can use it like other columns, but you cannot delete and change the column, nor can you modify or insert the value of the column. Once a row of data is inserted into the database, the rowid is unique during the lifetime of the row, that is, even if the row produces a row migration, the row's rowid does not change.

Recursive SQL concept: sometimes in order to execute a sql statement issued by a user, Oracle must execute some additional statements, which we call''recursive calls''' or 'recursive SQL statements''.'. For example, when a DDL statement is issued, ORACLE always implicitly issues some recursive SQL statements to modify the data dictionary information, so that the user can successfully execute the DDL statement. Recursive calls often occurs when the required data dictionary information is not in shared memory, and these Recursive calls will read the data dictionary information from the hard disk into memory. Users are less concerned about the execution of these recursive SQL statements, and ORACLE automatically executes these statements internally when needed. Of course, both DML statement and SELECT can cause recursive SQL. To put it simply, we can think of triggers as recursive SQL.

Row Source (Row Source): used in a query, the collection of qualified rows returned by the previous operation, that is, it can be a collection of all row data of a table; it can also be a collection of partial row data of a table; it can also be a collection of row data obtained after a join operation (such as a join join) on the last two row source.

Predicate (predicate): WHERE constraint in a query

Driving Table (driver table): this table is also called outer layer table (OUTER TABLE). This concept is used in nesting with HASH connections. If the row source returns more row data, it has a negative impact on all subsequent operations. Note that although it is translated as a driver table here, it is actually more accurately translated as a driving row source. Generally speaking, after applying the constraints of the query, the table that returns fewer row sources is used as the driver table, so if a large table has constraints in the WHERE condition (such as the equivalent limit system), then the large table as a driver table is also appropriate, so not only a smaller table can be used as a driver table, the correct statement should be the application of query constraints, return a table with fewer row sources as a driver table. In the execution plan, it should be the upper row source, which will be specified later. In our later description, this table is generally referred to as row source 1. 1 for join operations.

Probed Table (probed table): this table is also called inner table (INNER TABLE). After we get a specific row of data from the driver table, look for rows in the table that meet the join criteria. So the table should be large (in fact, it should be a table that returns a large row source) and there should be indexes on the corresponding columns. In our later description, this table is generally referred to as row source 2. 0 for join operations.

Composite index (concatenated index): an index consisting of multiple columns, such as create index idx_emp on emp (col1, col2, col3,... Then we call the idx_emp index a composite index There is an important concept in the composite index: the boot column (leading column). In the above example, col1 is listed as the boot column. When we query, we can use either "where col1 =?" or "where col1 =? and col2 =?" All such restrictions will use the index, but the "where col2 =?" query will not use the index Therefore, the constraint uses the combined index only when the constraint contains a leading column.

Selectivity: compare the number of unique keys in the column with the number of rows in the table to determine the selectivity of the column. If the ratio of the number of unique keys to the number of rows in the table for the column is closer to 1, the higher the selectivity of the column, the more suitable the column is for indexing, and the more selective the index is. In the column up-row query with high selectivity, less data is returned, so it is more suitable to use index query.

II. The access method of oracle access data

1) full table scan (Full Table Scans, FTS)

In order to achieve a full table scan, Oracle reads all the rows in the table and checks whether each row satisfies the WHERE restriction of the statement. A multi-block read operation can make it possible to read multiple blocks at a time (db_block_multiblock_read_count parameter setting) instead of only one block, which greatly reduces the total number of times and improves the throughput of the system. Therefore, the use of multi-block read method can be very efficient to achieve full table scan, and only in the case of full table scan can use multi-block read operation. In this access mode, each data block is read only once.

Prerequisite for using FTS: full table scanning is not recommended on larger tables, unless more data is fetched, more than 5-10% of the total, or when you want to use parallel query.

An example of using a full table scan:

SQL > explain plan for select * from dual

Query Plan

--

SELECT STATEMENT [CHOOSE] Cost=

TABLE ACCESS FULL DUAL

2) access through ROWID table (Table Access by ROWID or rowid lookup)

The ROWID of a row indicates the data file, the data block and the position of the row in the block, so accessing data through ROWID can quickly locate to the target data, which is the fastest way for Oracle to access single-row data.

This access method does not use multi-block read operations, and Iswap O can only read one block at a time. We often see this access method in the execution plan, such as querying data through an index.

The method of using ROWID access:

SQL > explain plan for select * from dept where rowid =''AAAAyGAADAAAAATAAF''

Query Plan

--

SELECT STATEMENT [CHOOSE] Cost=1

TABLE ACCESS BY ROWID DEPT [ANALYZED]

3) Index scan (Index Scan or index lookup)

We first find the rowid value corresponding to the data through index (multiple rowidth values may be returned for non-unique indexes), and then get the specific data directly from the table according to rowid, which is called index scan or index lookup (index lookup). A rowid uniquely represents a row of data, and the corresponding data block for that row is obtained by an iUniqo, in which case only one database block is read.

In an index, in addition to storing the value of each index, the index also stores the ROWID value corresponding to the row with this value.

An index scan can consist of 2 steps:

(1) scan the index to get the corresponding rowwidth value.

(2) read out the specific data from the table through the rowid found.

Each step is a separate iUnip O, but for indexes, because they are often used, most of them have been CACHE into memory, so the iUnip O in step 1 is often a logical iUnip O, that is, the data can be obtained from memory. But for step 2, if the table is large, its data cannot be all in memory, so its Icano is likely to be a physical IUnip O, which is a mechanical operation, which is extremely time-consuming compared to logical Imax O. Therefore, if the size of the table is scanned by the index, if the data extracted is greater than 5%-10% of the total, the efficiency of using index scanning will be much lower. As shown below:

SQL > explain plan for select empno, ename from emp where empno=10

Query Plan

--

SELECT STATEMENT [CHOOSE] Cost=1

TABLE ACCESS BY ROWID EMP [ANALYZED]

INDEX UNIQUE SCAN EMP_I1

However, if all the data of the query can be found in the index, the second step operation can be avoided and the unnecessary Ibig O can be avoided. At this time, even if more data is fetched through index scanning, the efficiency is still very high.

SQL > explain plan for select empno from emp where empno=10;-- queries only empno column values

Query Plan

--

SELECT STATEMENT [CHOOSE] Cost=1

INDEX UNIQUE SCAN EMP_I1

Further, if the index columns are sorted in the sql statement, because the indexes are pre-sorted, there is no need to sort the index columns in the execution plan

SQL > explain plan for select empno, ename from emp

Where empno > 7876 order by empno

Query Plan

SELECT STATEMENT [CHOOSE] Cost=1

TABLE ACCESS BY ROWID EMP [ANALYZED]

INDEX RANGE SCAN EMP_I1 [ANALYZED]

You can see from this example that because the index is already sorted, the eligible rows will be queried in the order of the index, thus avoiding further sorting operations.

Depending on the type of index and the where constraint, there are four types of index scans:

Index unique scan (index unique scan)

Index range scan (index range scan)

Index full scan (index full scan)

Index Quick scan (index fast full scan)

(1) Index unique scan (index unique scan)

Finding a value through a unique index often returns a single ROWID. Oracle often implements unique scans if there are UNIQUE or PRIMARY KEY constraints that guarantee that statements access only a single line.

Examples of using uniqueness constraints:

SQL > explain plan for

Select empno,ename from emp where empno=10

Query Plan

--

SELECT STATEMENT [CHOOSE] Cost=1

TABLE ACCESS BY ROWID EMP [ANALYZED]

INDEX UNIQUE SCAN EMP_I1

(2) Index range scan (index range scan)

The typical case of using an index to access multiple rows of data and index range scanning on a unique index is to use range operators (such as >, =, explain plan for select empno,ename from emp) in predicates (where constraints)

Where empno > 7876 order by empno

Query Plan

SELECT STATEMENT [CHOOSE] Cost=1

TABLE ACCESS BY ROWID EMP [ANALYZED]

INDEX RANGE SCAN EMP_I1 [ANALYZED]

On non-unique indexes, the predicate col = 5 may return multiple rows of data, so index range scanning is used on non-unique indexes.

There are 3 situations when using index rang scan:

(a) the range operator is used on the unique index column (>

< >

= explain plan for select empno,ename from big_emp order by empno,ename

Query Plan

SELECT STATEMENT [CHOOSE] Cost=26

INDEX FULL SCAN BE_IX [ANALYZED]

(4) Fast index scan (index fast full scan)

Scanning all data blocks in the index is similar to index full scan, but a significant difference is that it does not sort the data from the query, that is, the data is not returned in sort order. In this access method, multi-block read function or parallel read function can be used to maximize throughput and shorten execution time.

An example of a quick index scan:

The BE_IX index is a multi-column index: big_emp (empno,ename)

SQL > explain plan for select empno,ename from big_emp

Query Plan

--

SELECT STATEMENT [CHOOSE] Cost=1

INDEX FAST FULL SCAN BE_IX [ANALYZED]

Select only the second column of the multi-column index:

SQL > explain plan for select ename from big_emp

Query Plan

--

SELECT STATEMENT [CHOOSE] Cost=1

INDEX FAST FULL SCAN BE_IX [ANALYZED]

III. Connection between tables

Join is a predicate that attempts to join two tables together, joining only two tables at a time, and table joins can also be referred to as table associations. In a later description, we will use "row source" instead of "table" because it is more rigorous to use row source, and each step in the process of calling the two connected row source row source1 and row source 2.Join respectively is often a serial operation, even if the related row source can be accessed in parallel, that is, the data of the two row source connected to the join can be read in parallel. However, after the qualified data in the table is read into memory to form row source, the other steps of join are generally serial. There are many ways to join two tables together, of course, each method has its own advantages and disadvantages, and each join type will give full play to its greatest advantages only under specific conditions.

The join order between row source (tables) has a great impact on the efficiency of the query. By first accessing a specific table, that is, the table as the driver table, you can apply some restrictions first, resulting in a smaller row source, which makes the join more efficient, which is why we often say that we have to implement the constraints first. Typically, the restrictions on the table in the where clause are applied when the table is read into memory.

According to the different operators in the connection conditions of the two row source, connections can be divided into equivalent connections (such as WHERE A.COL3 = B.COL4), non-equivalent connections (WHERE A.COL3 > B.COL4), and external connections (WHERE A.COL3 = B.COL4 (+)). The connection principle of each of the above connections is basically the same, so for a simple period, let's take the equivalent connection as an example.

In the following introduction, the following Sql is used as an example:

SELECT A.COL1, B.COL2

FROM A, B

WHERE A.COL3 = B.COL4

Assuming that table An is Row Soruce1, its corresponding join operation is associated with COL 3.

Table B is Row Soruce2, then its corresponding join operation is associated with COL 4.

Connection type:

So far, regardless of the connection operator, there are three typical connection types:

Sort-merge connections (Sort Merge Join (SMJ))

Nested loops (Nested Loops (NL))

Hash join (Hash Join)

In addition, there is a Cartesian product (Cartesian product), which is generally avoided.

1, sort-merge connections (Sort Merge Join, SMJ)

Internal connection process:

1) first generate the data needed by row source1, and then sort the data by join operation associated column (such as A.col3).

2) then generate the data needed by the row source2, and then sort the data according to the associated column of the join operation corresponding to the sort source1, such as B.col4.

3) the last two sorted rows are put together to perform the merge operation, that is, the two row source are connected according to the join conditions.

The following is a graphical representation of the connection steps:

MERGE

/\

SORTSORT

| | |

Row Source 1Row Source 2

If row source is already sorted on the join associated column, the join operation no longer requires a sort operation, which can greatly improve the join speed of this join operation, because sorting is an extremely resource-intensive operation, especially for larger tables. A pre-sorted row source includes columns that are already indexed (such as an index on a.col3 or b.col4) or row source that has been sorted in the previous step. Although the process of merging two row source is serial, the two row source can be accessed in parallel (such as parallel reading data, parallel sorting).

Examples of SMJ connections:

SQL > explain plan for

Select / * + ordered * / e.deptno, d.deptno

From emp e, dept d

Where e.deptno = d.deptno

Order by e.deptno, d.deptno

Query Plan

--

SELECT STATEMENT [CHOOSE] Cost=17

MERGE JOIN

SORT JOIN

TABLE ACCESS FULL EMP [ANALYZED]

SORT JOIN

TABLE ACCESS FULL DEPT [ANALYZED]

Sorting is a time-consuming and resource-consuming operation, especially for large tables. For this reason, SMJ is often not a particularly effective join method, but it is also quite efficient if both row source are pre-sorted.

2, nested loops (Nested Loops, NL)

This join method has the concept of driving tables (external tables). In fact, the join process is a 2-layer nested loop, so the less the number of outer loops, the better, which is why we use a small table or a table that returns a smaller row source as the theoretical basis for driving tables (for outer loops). But this theory is only a general guiding principle, because following this theory does not always guarantee that the number of statements generated will be minimized. Sometimes if you don't abide by this theoretical basis, you will get better efficiency. If you use this approach, it is important to decide which table to use as the driver table. Sometimes if the driver table is not selected correctly, it will result in poor performance of the statement.

Internal connection process:

Row 1 of Row source1-- Probe-> Row source 2

Row 2 of Row source1-- Probe-> Row source 2

Row 3 of Row source1-- Probe-> Row source 2

…….

Row source1's Row n-- Probe-> Row source 2

From the perspective of internal join process, it is necessary to use every row in row source1 to match all rows in row source2, so keeping row source1 as small and efficient as possible to access row source2 (usually implemented through index) is a key issue that affects the efficiency of this connection. This is only a theoretical guiding principle, and the purpose is to minimize the number of physical Imax O times for the whole join operation, and if this principle is followed, it will generally minimize the total number of physical Imax O s. But if you don't follow this guideline, you'll be able to connect with less physical Icano, even if you violate the guidelines! Because the least number of physical Istroke O times is the real guiding principle that we should follow, such an example is given in the following specific case studies.

In the above join process, we call Row source1 a driver table or an external table. Row Source2 is called a probe table or an internal table.

In a NESTED LOOPS connection, Oracle reads each row in row source1, then checks in row sourc2 for matching rows, all matched rows are placed in the result set, and then processes the next row in row source1. This process continues until all lines in the row source1 are processed. This is one of the fastest ways to get the first matching row from a join operation, and this type of join can be used in statements that require quick response, with response speed as the primary goal.

If the driving row source (external table) is small and there is a unique index on the inner row source (internal table), or if there is a highly selective non-unique index, this method can be used to achieve better efficiency. One of the advantages of NESTED LOOPS that other connection methods do not have is that you can return the connected rows first without having to wait for all the connection operations to be processed before returning data, which enables fast response time.

If parallel operations are not used, the best driver tables are those that can return fewer rows of data after applying where constraints, so large tables may also be called driver tables, depending on the constraints. For parallel queries, we often choose large tables as driver tables, because large tables can make full use of parallel functionality. Of course, sometimes using parallel operations on queries is not necessarily more efficient than queries without using parallel operations, because in the end, only a few rows of each table may meet the restrictions, and it depends on whether your hardware configuration can support parallelism (such as whether there are multiple CPU, multiple hard disk controllers), so you should deal with specific problems.

Examples of NL connections:

SQL > explain plan for

Select a.dname,b.sql

From dept a,emp b

Where a.deptno = b.deptno

Query Plan

-

SELECT STATEMENT [CHOOSE] Cost=5

NESTED LOOPS

TABLE ACCESS FULL DEPT [ANALYZED]

TABLE ACCESS FULL EMP [ANALYZED]

3, Hash connection (Hash Join, HJ)

This connection was introduced after oracle 7.3, is theoretically more efficient than NL and SMJ, and is only used in the CBO optimizer.

The smaller row source is used to build hash table and bitmap, and the second row source is used to hansed and match the hash table generated by the first row source for further connections. Bitmap is used as a fast lookup method to check if there are matching rows in the hash table. In particular, this lookup method is more useful when the hash table is too large to be fully contained in memory. This join method also has the concept of so-called driver table in NL connection. The table constructed as hash table and bitmap is the driver table. When the constructed hash table and bitmap can be contained in memory, this join method is extremely efficient.

Examples of HASH connections:

SQL > explain plan for

Select / * + use_hash (emp) * / empno

From emp, dept

Where emp.deptno = dept.deptno

Query Plan

--

SELECT STATEMENT [CHOOSE] Cost=3

HASH JOIN

TABLE ACCESS FULL DEPT

TABLE ACCESS FULL EMP

To make the hash connection valid, you need to set HASH_JOIN_ENABLED=TRUE, which is TRUE by default, and don't forget to set the hash_area_size parameter to make the hash connection run efficiently, because the hash connection will run in memory of the size specified by this parameter, and too small a parameter will make the performance of the hash connection worse than other connection methods.

In addition, Cartesian product (Cartesian Product)

When two row source are connected, but there is no association condition between them, a Cartesian product is made in the two row source, which is usually caused by an omission in writing code (that is, the programmer forgot to write the association condition). The Cartesian product is that each row of one table matches all the rows of another table in turn. In special cases, we can use Cartesian product, such as in star connection, in addition, we should try not to use Cartesian product, otherwise, we can figure out what the result is!

Notice in the following statement that there is no join between the two tables.

SQL > explain plan for

Select emp.deptno,dept,deptno

From emp,dept

Query Plan

SLECT STATEMENT [CHOOSE] Cost=5

MERGE JOIN CARTESIAN

TABLE ACCESS FULL DEPT

SORT JOIN

TABLE ACCESS FULL EMP

The CARTESIAN keyword indicates a Cartesian product between two tables. If the table emp has n rows and the dept table has m rows, the result of the Cartesian product is the result of n * m rows.

Finally, summarize which connection method is better in which case:

Sort-merge connections (Sort Merge Join, SMJ):

A) for non-equivalent connections, this connection is more efficient.

B) if there are indexes on all the associated columns, the effect is better.

C) for connecting 2 larger row source, this connection method is better than NL connection.

D) however, if the row source returned by sort merge is too large, it will lead to database performance degradation when using too much rowid to query data in the table.

Nested loops (Nested Loops, NL):

A) if the driving row source (external table) is small and there is a unique index on the inner row source (internal table), or if there is a highly selective non-unique index, this method can be used to achieve better efficiency.

B) NESTED LOOPS has an advantage that other connection methods do not have: you can return connected rows first without having to wait for all connection operations to be processed before returning data, which enables fast response time.

Hash connection (Hash Join, HJ):

A) this method was introduced later in oracle7, using a relatively advanced connection theory, generally speaking, its efficiency should be better than the other two connections, but this kind of connection can only be used in CBO optimizer, and appropriate hash_area_size parameters need to be set in order to achieve better performance.

B) relatively better efficiency can be achieved when connecting between two larger row source, and better efficiency can be achieved when a row source is smaller.

C) can only be used in equivalent connections

+ + +

Overview of the Oracle execution plan

Related concepts of Oracle execution plan:

Rowid: a pseudo column attached by the system to each row of oracle data, including the data table name, database id, storage database id, and a serial number. Rowid is unique during the life cycle of the row.

Recursive sql: in order to execute user statements, the system appends additional operation statements, such as the maintenance of data dictionaries.

Row source (Row Source): the collection of eligible rows returned by the previous operation during the oracle step.

Predicate (predicate): constraints after where.

Driving table (driver table): also known as a join outer table, is mainly used for nesting and hash join. Generally speaking, after applying the constraints, the table that returns fewer row sources is used as the driver table. In a later description, driving table is referred to as row source 1 of the join operation.

Probed table (probed table): joined inner table, after we get a specific row of data from driving table, look for a qualified row in probed table, so the table should be a larger row source, and there should be an index on the column corresponding to the join condition. In the later description, this table is generally referred to as row source 2. 0 for join operations.

Concatenated index (composite index): an index is called a composite index if it consists of multiple columns. The first column of the combined index is the boot column, and the index is available only if the predicate contains the boot column.

Optional: if the number of different values of a column in the table / the total number of rows of the table is close to 1, the selectivity of the column is high.

The access method of Oracle access data:

Full table scans, FTS (full table scan): by setting db_block_multiblock_read_count, you can set the number of data blocks that IO can read at a time, thus effectively reducing the total number of IO during full table scanning, that is, the data blocks to be accessed are pre-read into memory through the pre-read mechanism. Multi-block read operations can be used only in the case of a full table scan.

Table Access by rowed (access Table through rowid, rowid lookup): because the location of row storage is recorded in rowid, this is the fastest way for oracle to access single row data.

Index scan (Index scan index lookup): in the index, in addition to storing the value of each index, the index also stores the rowd value corresponding to the row with this value. Index scanning is divided into two steps: 1, scan the index to get rowid;2, and read the specific data through rowid. Each step is a separate IO, so if the total amount of data filtered by constraints is more than 5% of the total number of rows in the original table, the efficiency of using index scanning is much lower. If the result data can all be found in the index, the second step can be avoided, thus speeding up the retrieval speed.

Depending on the type of index and the where constraint, there are four types of index scans:

Index unique scan (index unique scan): returns a single rowid data content if unique or primary key exists.

Index range scan (index range scan): 1, using the range operator on the unique index (>, =

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Servers

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report