Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Oracle dynamic sampling learning

2025-01-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

Dynamic sampling (Dynamic Sampling) is a technology introduced in ORACLE 9i Release 2 to cope with the optimizer generating a better execution plan when database objects are not analyzed (statistics are missing). In a nutshell, in the case of no analysis of database segment (table, index, partition) objects, a technique invented to enable the CBO optimizer to get enough information to ensure that the optimizer makes a correct execution plan. It analyzes the data blocks on a certain number of segment objects to get the statistics needed by CBO. Dynamic sampling technology is only a supplement to statistical information, and it can not completely replace statistical information analysis.

Note: dynamic sampling is called Dynamic Sampling before Oracle 11g and renamed Dynamic Statistic after ORACLE 12c.

Default sampling level for Oracle11G R2:

SQL > show parameter optimizer_dynamic_sampling

NAME TYPE VALUE

-

Optimizer_dynamic_sampling integer 2

SQL > show parameter Dynamic Statistic

NAME TYPE VALUE

-

Optimizer_dynamic_sampling integer 2

There are 11 levels of dynamic sampling: please check the official documentation for yourself

Http://docs.oracle.com/cd/E11882_01/server.112/e41573/stats.htm#PFGRF30101

Dynamic sampling experiment:

1. Create a test table test

SQL > create table test as select * from dba_objects

Table created.

SQL > select count (1) from test

COUNT (1)

-

86259

2. Instead of using dynamic sampling, view the execution plan

SQL > set autotrace traceonly explain

SQL > select / * + dynamic_sampling (test 0) * / * from test

Execution Plan

Plan hash value: 1357081020

| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

| | 0 | SELECT STATEMENT | | 100K | 19m | 336 (1) | 00:00:05 |

| | 1 | TABLE ACCESS FULL | TEST | 100K | 19m | 336 (1) | 00:00:05 |

Note

-

-dynamic sampling used for this statement (level=2)

As you can see from the above, the number of rows in the order optimizer estimate table test is shown as 100K. Let's look at the following execution plan using dynamic sampling, and how many rows will be estimated by the optimizer:

3. Use dynamic sampling to view the execution plan (the following is queried directly, because dynamic sampling is enabled by default in 11G)

SQL > select * from test

Execution Plan

Plan hash value: 1357081020

| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

| | 0 | SELECT STATEMENT | | 72258 | 14m | 336 (1) | 00:00:05 |

| | 1 | TABLE ACCESS FULL | TEST | 72258 | 14m | 336 (1) | 00:00:05 |

Note

-

-dynamic sampling used for this statement (level=2)

If dynamic sampling is enabled (by default, the dynamic sampling level is 2), the optimizer guesses based on some data information obtained by dynamic sampling and estimates that the number of rows recorded in the table TEST is 86259, which is close to the actual number of rows recorded. It's much better than not doing dynamic sampling analysis.

If we increase the dynamic sampling level to 3, as shown below, we find that the information obtained by the optimizer based on dynamic sampling is more accurate than the default (by default, dynamic sampling level is 2). The optimizer estimates that the number of rows in the table TEST is 92364, which is closer to the actual situation than 72258.

SQL > select / * + dynamic_sampling (test 3) * / * from test

Execution Plan

Plan hash value: 1357081020

| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

| | 0 | SELECT STATEMENT | | 92364 | 18m | 336 (1) | 00:00:05 |

| | 1 | TABLE ACCESS FULL | TEST | 92364 | 18m | 336 (1) | 00:00:05 |

Note

-

-dynamic sampling used for this statement (level=2)

4. In this article by Master Tom, it is mentioned that without dynamic sampling, if the table data is deleted, the result set estimated by the CBO optimizer is the same as before.

This is because when the data of a table is deleted, the extent and block allocated by the table will not be automatically recycled (the high water mark remains the same). Therefore, if CBO does not sample data blocks for analysis, but only obtains information such as extend from the data dictionary, it will mistakenly think that there is still so much data. Let's empty the test table data and see how the execution plan is.

SQL > delete from test

86259 rows deleted.

SQL > commit

SQL > select / * + dynamic_sampling (test 0) * / * from test;-do not use dynamic sampling

Execution Plan

Plan hash value: 1357081020

| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

| | 0 | SELECT STATEMENT | | 100K | 19m | 336 (1) | 00:00:05 |

| | 1 | TABLE ACCESS FULL | TEST | 100K | 19m | 336 (1) | 00:00:05 |

SQL > select * from test;-use dynamic sampling

Execution Plan

Plan hash value: 1357081020

| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

| | 0 | SELECT STATEMENT | | 1 | 207 | 335 (0) | 00:00:05 |

| | 1 | TABLE ACCESS FULL | TEST | 1 | 207 | 335 (0) | 00:00:05 |

Note

-

-dynamic sampling used for this statement (level=2)

From the above view, we can see the difference between not using dynamic sampling and using dynamic sampling.

5. We collect the statistics on the test table: query again, and the execution plan of the table will be less: dynamic sampling

SQL > select * from test

Execution Plan

Plan hash value: 1357081020

| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

| | 0 | SELECT STATEMENT | | 1 | 207 | 335 (0) | 00:00:05 |

| | 1 | TABLE ACCESS FULL | TEST | 1 | 207 | 335 (0) | 00:00:05 |

SQL >

The second case: when the table TEST is parsed, dynamic sampling is used if the query script contains temporary tables. Because temporary tables are not analyzed, they have no statistics. As follows:

SQL > drop table test

SQL > create table test as select * from dba_objects

SQL > exec dbms_stats.gather_table_stats (ownname = > 'SYS',tabname = >' TEST',cascade= > TRUE)

SQL > create global temporary table tmp (object_type varchar2 (19))

SQL > insert into tmp select distinct object_type from dba_objects

44 rows created.

SQL > commit

Then look at the execution plan of the following query statement:

SQL > select t.owner.owner.objectoriented type from test t inner join tmp l on t.object_type=l.object_type

Execution Plan

Plan hash value: 19574435

| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

| | 0 | SELECT STATEMENT | | 1 | 26 | 338 (1) | 00:00:05 |

| | * 1 | HASH JOIN | | 1 | 26 | 338 (1) | 00:00:05 |

| | 2 | TABLE ACCESS FULL | TMP | 1 | 11 | 2 (0) | 00:00:01 |

| | 3 | TABLE ACCESS FULL | TEST | 86260 | 1263K | 336 (1) | 00:00:05 |

Predicate Information (identified by operation id):

1-access ("T". "OBJECT_TYPE" = "L". "OBJECT_TYPE")

Note

-

-dynamic sampling used for this statement (level=2)

SQL >

You can see from the above that although you performed a full table scan on the tmp table, the optimizer estimated only one row of data

6. Dynamic sampling also has a unique ability to count the correlation between different columns.

Table statistics are relatively independent. When the query involves the correlation between columns, the statistics are somewhat inadequate. Take a look at Master Tom's example.

Create a special table t, then create an index t_idx on fields flag1 and flag2, and then analyze and collect statistics

SQL > create table t as select decode (mod (rownum,2), 0Magnesia,'Y') flag1, decode (mod (rownum,2), 0LINGY,'N') flag2, a.* from all_objects a

SQL > create index t_idx on t (flag1, flag2)

SQL > begin

Dbms_stats.gather_table_stats (user,'T'

Method_opt = > 'for all indexed columns size 254')

End

/

PL/SQL procedure successfully completed.

6.2. View the number of rows in the table:

SQL > select num_rows, num_rows/2, num_rows/2/2 from user_tables where table_name='T'

NUM_ROWS NUM_ROWS/2 NUM_ROWS/2/2

84396 42198 21099

6.3.Look at the execution plan for the SQL statement of the flag1 filter condition:

SQL > select * from t where flag1='N'

Execution Plan

Plan hash value: 1601196873

| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

| | 0 | SELECT STATEMENT | | 42937 | 4276K | 342K (1) | 00:00:05 |

| | * 1 | TABLE ACCESS FULL | T | 42937 | 4276K | 342K (1) | 00:00:05 |

Predicate Information (identified by operation id):

1-filter ("FLAG1" ='N')

As can be seen from the above execution plan, the number of lines guessed and estimated by the CBO optimizer is 42937, which is quite close to the number of 42198 records.

6.4. take a look at the execution plan for the SQL statement of the flag2 filter condition:

SQL > select * from t where flag2='N'

Execution Plan

Plan hash value: 1601196873

| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

| | 0 | SELECT STATEMENT | | 41459 | 4129K | 342K (1) | 00:00:05 |

| | * 1 | TABLE ACCESS FULL | T | 41459 | 4129K | 342K (1) | 00:00:05 |

Predicate Information (identified by operation id):

1-filter ("FLAG2" ='N')

As can be seen from the above execution plan, the number of lines guessed and estimated by the CBO optimizer is 41459, which is quite close to the number of 42198 records.

6.5. if the condition flag1 ='N' and flag2 = 'Nice, we judge that such a record certainly does not exist according to logical reasoning, which is the original intention of painstakingly constructing this special case. Let's see how the CBO optimizer detects and predicts.

SQL > select * from t where flag1 ='N' and flag2 ='N'

Execution Plan

Plan hash value: 1601196873

| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

| | 0 | SELECT STATEMENT | | 21093 | 2101K | 342K (1) | 00:00:05 |

| | * 1 | TABLE ACCESS FULL | T | 21093 | 2101K | 342K (1) | 00:00:05 |

Predicate Information (identified by operation id):

1-filter ("FLAG2" ='N' AND "FLAG1" ='N')

From the above, the number of records estimated by CBO is 12468, which is a far cry from the actual situation. In fact, it is estimated by the CBO optimizer:

The number of records in flag1='N' accounts for 1 / 2 of the total.

The number of records in flag2='N' accounts for 1 / 2 of the total.

6.6.According to NUM_ROWS/2/2 = 12468. This is obviously unreasonable. Let's see if dynamic sampling can avoid CBO errors by raising the dynamic sampling level:

SQL > select / * + dynamic_sampling (t 3) * / * from t where flag1 ='N' and flag2 ='N'

Execution Plan

Plan hash value: 470836197

-

| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |

-

| | 0 | SELECT STATEMENT | | 6 | 612 | 2 (0) | 00:00:01 |

| | 1 | TABLE ACCESS BY INDEX ROWID | T | 6 | 612 | 2 (0) | 00:00:01 |

| | * 2 | INDEX RANGE SCAN | T_IDX | 6 | | 1 (0) | 00:00:01 |

-

Predicate Information (identified by operation id):

2-access ("FLAG1" ='N' AND "FLAG2" ='N')

Note

-

-dynamic sampling used for this statement (level=2)

Note:

①: the higher the sampling level, the more blocks of data are sampled, the closer the analytical data is to the reality, but at the same time, the overhead of resource consumption increases. This is something that needs to be considered. The default sampling level for ORACLE 10g & 11g is 2, and it is common to use the dynamic_sampling prompt in a session to modify the dynamic sampling level.

②: everything has its advantages and disadvantages, and dynamic sampling is not an artifact. The more data blocks it samples, the greater the system overhead, which will increase the hard parsing time of SQL. If it is a database warehouse (DW, OLAP) environment, the SQL execution time is quite long, and the hard parsing time is only a small part of the whole SQL execution time, then the dynamic sampling level can be appropriately improved, which is helpful for the optimizer to obtain more correct information. It is generally appropriate to set it to 3 or 4.

③: in OLTP systems with serious concurrency, thousands of SQL statements are executed every second, which requires short SQL statements and short execution time, so dynamic sampling level should be reduced or dynamic sampling should not be used in OLTP systems.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report