What is the impact of cluster factor on the execution plan in the database? 07/04 Update SLTechnology News&Howtos

What is the impact of cluster factor on the execution plan in the database?

2025-07-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article mainly explains "what is the impact of cluster factor on the implementation plan in the database". The content of the explanation in the article is simple and clear, and it is easy to learn and understand. Please follow the editor's way of thinking to study and learn "what is the impact of cluster factor on the implementation plan in the database"?

The influence of cluster factor on the implementation of the Plan

Test environment: Linux 7.6 + ORACLE 19.6.1

1. Create test environment 1.1.Create test tables and insert data CZH@czhpdb > create table test_ffs as select * from hr.employees; Table created. CZH@czhpdb > insert into test_ffs select * from test_ffs Execution Plan---Plan hash value: 296244252-- -| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |- -| 0 | INSERT STATEMENT | | 107 | 7383 | 3 (0) | 00:00:01 | | 1 | LOAD TABLE CONVENTIONAL | TEST_FFS | | 2 | | OPTIMIZER STATISTICS GATHERING | | 7383 | 3 (0) | 00:00:01 | | 3 | TABLE ACCESS FULL | TEST_FFS | 7383 | 3 (0) | 00:00:01 |-- | -Note--dynamic statistics used: statistics for conventional DML Statistics 72 recursive calls 89 db block gets 81 consistent gets 12 physical reads 21576 redo size 195 bytes sent via SQL*Net to client 394 bytes received via SQL*Net from client 1 SQL*Net roundtrips to/from client 3 sorts (memory) 0 sorts (disk) 107 rows processed

Two new features can be seen in the above autotrace execution plan:

1.212c R1 and 19c two new features 1.2.1 12c R1 new features OPTIMIZER STATISTICS GATHERING:

In the new feature after # OPTIMIZER STATISTICS GATHERING:12cR1, statistics are automatically collected when an empty table loads data for the first time in direct path load.

# Oracle Database 12c introduced online statistics gathering for CREATE TABLE AS SELECT statements and direct-path inserts.

1.2.2 19c New feature real-time statistics

Oracle Database 19c introduces real-time statistics

, which extend online support to conventional DML statements

. Because statistics can go stale between DBMS_STATS jobs, real-time statistics helps the optimizer generate more optimal plans.Whereas bulk load operations gather all necessary statistics, real-time statistics augment rather than replace traditional statistics.

Oracle introduced new parameters

"_ optimizer_gather_stats_on_conventional_dml" and "_ optimizer_use_stats_on_conventional_dml" which are true by default

"_ optimizer_stats_on_conventional_dml_sample_rate" at 100%

How does real time statistics works?

By default the "_ optimizer_gather_stats_on_conventional_dml" is true so its automatically kicks off

When a DML operation is currently modifying a table (conventional), Oracle Database dynamically computes values for the most essential statistics if the above parameter is on.

Consider an example of table that is having lot of inserts and rows are increasing. Real-time statistics keep track of the increasing row count as rows are being inserted. If the optimizer performs a hard parse of a new query, then the optimizer can use the real-time statistics to obtain a more accurate cost estimate.

DBA_TAB_COL_STATISTICS and DBA_TAB_STATISITICS has columns NOTES tell real time statistics have been used. STATS_ON_CONVENTIONAL_DML

SELECT NVL (PARTITION_NAME, 'GLOBAL') PARTITION_NAME, NUM_ROWS, BLOCKS, NOTESFROM USER_TAB_STATISTICSWHERE TABLE_NAME =' SALES'ORDER BY 1,4 PARTITION_NAM NUM_ROWS BLOCKS NOTES--GLOBAL 1837686 3315 STATS_ON_CONVENTIONAL_DML1.3 inserts large amounts of data and collects statistics CZH@czhpdb > set autot offCZH@czhpdb > insert into test_ffs select * from test_ffs CZH@czhpdb > insert into test_ffs select * from test_ffs;CZH@czhpdb > insert into test_ffs select * from test_ffs;CZH@czhpdb > insert into test_ffs select * from test_ffs;CZH@czhpdb > insert into test_ffs select * from test_ffs;CZH@czhpdb > insert into test_ffs select * from test_ffs;CZH@czhpdb > insert into test_ffs select * from test_ffs;CZH@czhpdb > insert into test_ffs select * from test_ffs;CZH@czhpdb > insert into test_ffs select * from test_ffs CZH@czhpdb > insert into test_ffs select * from test_ffs;CZH@czhpdb > commit; CZH@czhpdb > CREATE INDEX IDX_TEST_FFS ON TEST_FFS (EMPLOYEE_ID); CZH@czhpdb > EXEC DBMS_STATS.GATHER_TABLE_STATS (user,'TEST_FFS',cascade= > true); 1.4 use Hint / * + gather_plan_statistics * / to get sql real execution plan

In # sqlplus, both set autotrace and explain plan for are the estimated execution plans of CBO, which may not be the same as the actual execution. We use the following hint to get the real execution plan.

CZH@czhpdb > SELECT / * + gather_plan_statistics * / salary from test_ffs where employee_id

< 100; no rows selected 真实执行计划： SYS@orcl2 >

Select * from table (dbms_xplan.display_cursor ('c9qg9su5khys last') PLAN_TABLE_OUTPUT- -SQL_ID c9qg9su5khysd Child number 0--SELECT / * + gather_plan_statistics * / salary from test_ffs whereemployee_id

< 100 Plan hash value: 296244252 ----------------------------------------------------------------------------------------| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |----------------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 1 | | 0 |00:00:00.01 | 2288 ||* 1 | TABLE ACCESS FULL| TEST_FFS | 1 | 1 | 0 |00:00:00.01 | 2288 |---------------------------------------------------------------------------------------- Predicate Information (identified by operation id):--------------------------------------------------- 1 - filter("EMPLOYEE_ID" alter system flush shared_pool; System altered. #如果不清空shared_pool或者使游标失效，软解析开启10053事件，不会生成trace文件。 CZH@czhpdb >

ALTER SESSION SET EVENTS '10053 trace name context forever,level 1; Session altered. CZH@czhpdb > SELECT / * + gather_plan_statistics * / salary from test_ffs where employee_id

< 100; no rows selected CZH@czhpdb >

ALTER SESSION SET EVENTS '10053 trace name context off'; Session altered.

19c 10053:

# you can see DK (distinct key), CLUF (clustering factor), IX_SEL from 10053 below. The next step is to calculate why CBO thinks that index cost is higher than full table scan based on several parameters.

* * BASE STATISTICAL INFORMATION***Table Stats:: Table: TEST_FFS Alias: TEST_FFSonline table stats for conventional DML (block count: 2263 row count: 219029) used on (TEST_FFS) block count: 5-> 2263 Row count: 219136 # Rows: 219136 SSZ: 0 # Blks: 2263 AvgRowLen: 69.00 NEB: 0 ChainCnt: 0.00 ScanRate: 0.00 SPC: 0 RFL: 0 RNF: 0 CBK: 0 CHR: 0 KQDFLG: 193 # IMCUs: 0 IMCRowCnt: 0 IMCJournalRowCnt: 0 # IMCBlocks: 0 IMCQuotient: 0.000000Index Stats:: Index: IDX_TEST_FFS Col#: 1 LVLS: 1 # LB: 458 # DK: 107 LB/K: 4.00 DB / K: 1524.00 CLUF: 163174.00 NRW: 219136.00 SSZ: 0.00 LGR: 0.00 CBK: 0.00 GQL: 0.00 CHR: 0.00 KQDFLG: 8192 BSZ: 1 KKEISFLG: 1 try to generate single-table filter predicates from ORs for query block SEL$1 (# 0) finally: "TEST_FFS". "EMPLOYEE_ID" 206.00 Nnl: 0-> 0, acl: 4-> 0 Column (# 1): EMPLOYEE_ID (NUMBER) AvgLen: 22 NDV: 107 Nulls: 0 Density: 0.009346 Min: 100.000000 Max: 206.000000 Using density: 0.009346 of col # 1 as selectivity of unpopular value pred Table: TEST_FFS Alias: TEST_FFS Card: Original: 219136.000000 Rounded: 2048 Computed: 2048.000000 Non Adjusted: 2048.000000 Scan IO Cost (Disk) = 615.000000 Scan CPU Cost (Disk) = 49272938.720000 Cost of predicates: io = NOCOST Cpu = 50.000000, sel = 0.009346 flag = 2048 ("TEST_FFS". "EMPLOYEE_ID" 2100.00, max: 24000.00-> 24000.00, nnl: 0-> 0, acl: 4-> 0 * 2. Adjust the cluster factor2.1 cluster factor clustering factor description

Cluster factor indicates that the index order is consistent with the table storage data. When the index is scanned sequentially, if the index key value scans the data block corresponding to the table data row corresponding to the key value, the cluster factor is added by 1, so the minimum cluster factor is the table data block, and the maximum is the table data row, which is highly related to the table storage order. If the table is inserted sequentially, the cluster factor is lower, and if the table data is inserted out of order, the cluster factor is higher. This is why the execution plan sometimes varies in the case of the same table data.

Index scan cost formula:

INDEX ACCESS I take O COST=BLEVEL+CEIL (# LEAF_BLOCKS*IX_SEL)

TABLE_ACCESS I take O COST=CEIL (CLUSTERING_FACTOR*IX_SEL_WITH_FILTERS)

IX_SEL and IX_SEL_WITH_FILTERS are index selection rate and index with predicate selection rate, which is generally 1 / (DISTINCT KEY). In this case, when you scan the whole table, IX_SEL=1/107=0.009345, the index cost is calculated as follows:

ACCESS INDEX COST=INDEX ACCESS I COST + TABLE ACCESS I COST=2+CEIL O COST=2+CEIL (458 to 0.009345) + CEIL (163174 to 0.009345) = 1540

It is approximately equal to the 1532 estimated by CBO, which is higher than the COST 615 with full table scan, so I chose to take the full table scan.

2.2 adjust cluster factor

Rebuild tables, order by sort, lower cluster factor

CZH@czhpdb > create table test_ffs_03 as select * from test_ffs_02 order by employee_id; Table created. CZH@czhpdb > create index idx_test_ffs_03 on test_ffs_03 (employee_id); Index created. CZH@czhpdb > select clustering_factor,index_name from user_indexes where index_name='IDX_TEST_FFS_03' CLUSTERING_FACTOR INDEX_NAME---1128 IDX_TEST_FFS_03

# you can see a significant decrease in cluster factor.

CZH@czhpdb > select / * + gather_plan_statistics * / salary from test_ffs_03 where employee_id

< 100; no rows selected SYS@orcl2 >

Select * from table (dbms_xplan.display_cursor ('8fpk2b8vzn5y2) PLAN_TABLE_OUTPUT- -SQL_ID 8fpk2b8vzn5y2 Child number 0--select / * + gather_plan_statistics * / salary from test_ffs_03 whereemployee_id < 100Plan hash value: 704625359-- -| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads |- -| 0 | | SELECT STATEMENT | | 1 | 0 | 00 TEST_FFS_03 00.01 | 2 | 1 | 1 | 1 | TABLE ACCESS BY INDEX ROWID BATCHED | TEST_FFS_03 | 1 | 1024 | 0 | 00 TEST_FFS_03 00.01 | 2 | 1 | * 2 | INDEX RANGE SCAN | IDX_TEST_FFS_03 | 1 | 1024 | 0 | 000.00 | 00.01 | 2 | 1 |- -Predicate Information (identified by operation id):-2-access ("EMPLOYEE_ID")

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.