The meaning and importance of clustering Factor of Oracle Index 07/13 Update SLTechnology News&Howtos

The meaning and importance of clustering Factor of Oracle Index

2025-07-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

In the index statistics, we need to focus on CLUSTERING_FACTOR (clustering factor).

In Oracle database, clustering factor refers to the storage order and similarity of index rows sorted according to index key values and data rows stored in the corresponding table. Oracle calculates the value of the clustering factor according to the following algorithm:

The initial value of clustering factor is 1.

Oracle first locates to the leftmost leaf block of the target index.

The sequential scan starts with the index row where the first index key value of the leftmost leaf block is located. In the process of sequential scan, Oracle compares the rowid of the current index row with the rowid of the index row before it (they are adjacent relations). If the two rowid do not point to the same table block, then Oracle increments the current value of the clustering factor by 1. If the two rowid point to the same table block, the Oracle does not change the current value of the clustering factor. Note that here Oracle does not need to go back to the table to access the corresponding table block when comparing rowid.

The above process of comparing rowid continues until all index rows in all leaf blocks of the target index are sequentially scanned.

After the above sequential scan operation is completed, the current value of the clustering factor is the CLUSTERING_FACTOR,Oracle in the index statistics and stores it in the data dictionary.

From the algorithm of the clustering factor mentioned above, we can know that if the value of the clustering factor is close to the number of table blocks of the corresponding table, it means that the storage order of the target index row and the data row stored in the corresponding table is very similar. This means that when Oracle scans the index range and obtains the target rowid and then returns to the table to access the data of the corresponding table block, the rowid corresponding to the adjacent index rows is very likely to be in the same table block, that is, after the Oracle reads the corresponding table block through the index row record rowid for the first time and caches the table block in buffer cache, when the corresponding table block is read back to the table for the second time through the rowid of the adjacent index row record There is no need to generate the physical Istroke O, because the table block to be accessed this time is the same as the table block that was accessed last time, and Oracle has cached it in the buffer cache. If the value of the clustering factor is close to the number of records in the corresponding table, it means that the storage order and similarity between the target index rows and the data rows stored in the corresponding table are very low, which means that when Oracle scans the index range to get the target rowid and then returns to the table to access the data of the corresponding table block, the rowid corresponding to the adjacent index rows is very likely not in the same table block. That is, after the Oracle reads the corresponding table block for the first time and caches the table block in the buffer cache through the rowid recorded by the index row, when the corresponding table block is read back to the table for the second time through the rowid of the adjacent index row record, it is also necessary to generate the physical Icando O, because the table block to be accessed this time is not the same as the table block that has been accessed last time.

In other words, the index with high clustering factor consumes more physical Imax O when scanning the index range than the index with low clustering factor under the same conditions. therefore, the cost of index range scanning of indexes with high clustering factors is higher than that of indexes with low clustering factors under the same conditions.

Here we construct a very extreme example where the rowid without any adjacent index row records in the full index points to the same data block in the table:

According to the algorithm of clustering factor mentioned above, we can calculate that the value of clustering factor of this index IDX_T1 should be 20.

Zx@MYDB > create table T1 (id number,name char (1200)); Table created.zx@MYDB > insert into T1 values (1Zhe 1'); 1row created.zx@MYDB > insert into T1 values (3ZHI 3'); 1row created.zx@MYDB > insert into T1 values (5MJ 5'); 1row created.zx@MYDB > insert into T1 values (7MJ 7'); 1row created.zx@MYDB > insert into T1 values (9LJ 9'); 1row created.zx@MYDB > insert into T1 values (11MJ 11') 1 row created.zx@MYDB > insert into T1 values (13meme 13'); 1 row created.zx@MYDB > insert into T1 values (15'); 1 row created.zx@MYDB > insert into T1 values (17'); 1 row created.zx@MYDB > insert into T1 values (19'); 1 row created.zx@MYDB > insert into T1 values (2); 1 row created.zx@MYDB > insert into T1 values (4); 1 row created.zx@MYDB > insert into T1 values (6) 1 row created.zx@MYDB > insert into T1 values (8 row created.zx@MYDB > insert into T1 values); 1 row created.zx@MYDB > insert into T1 values (12'); 1 row created.zx@MYDB > insert into T1 values (14'); 1 row created.zx@MYDB > insert into T1 values (16); 1 row created.zx@MYDB > insert into T1 values (18'); 1 row created.zx@MYDB > insert into T1 values (20') 1 row created.zx@MYDB > commit;Commit complete.zx@MYDB > create index idx_t1 on T1 (id); Index created.zx@MYDB > col location for a10zx@MYDB > select id,dbms_rowid.rowid_relative_fno (rowid) | |'_'| | dbms_rowid.rowid_block_number (rowid) location from T1 order by location,id ID LOCATION--1 4 '300 3 4' 300 5 4 '300 7 4' 300 9 4 '300 11 4' 301 13 4 '301 15 4' 301 17 4 '301 19 4' 301 4 '302 4' 302 6 4_302 8 4_302 10 4_302 12 4_303 14 4_303 16 4_303 18 4_30320 4_30320 rows selected.

From the above display results, we can see that 1, 3, 5, 7, 9 are in data block 300 of document 4, 11, 13, 15, 17, 19 are in data block 301 of document 4, 2, 4, 6, 8, 10 are in data block 302 of document 4, and 12, 14, 16, 18, 20 are in data block 303 of document 4.

Collect statistics and view the value of the clustering factor

# collect statistical information and view the value of clustering factor zx@MYDB > exec dbms_stats.gather_table_stats (ownname= > 'ZX',tabname= >' T1century grammar methodological opt = > 'for all columns size auto',cascade= > true,estimate_percent= > 100); PL/SQL procedure successfully completed.zx@MYDB > select index_name,clustering_factor from dba_indexes where index_name='IDX_T1' INDEX_NAME CLUSTERING_FACTOR -- IDX_T1 20

In Oracle database, the only way to reduce the clustering factor of the target index is to sort the data in the table according to the index key value of the target index and then re-store it. It is important to note that this method of re-storing data in a table after sorting by the index key value of a target index does reduce the value of the clustering factor of that target index. however, it may also increase the value of the clustering factor of other index values that exist on the table.

Copy the data from table T1 to table T2 intact, except that the data from table T2 is sorted according to the id column at the time of storage.

Zx@MYDB > create table T2 as select * from T1 order by id;Table created.zx@MYDB > create index idx_t2 on T2 (id); Index created.zx@MYDB > select id,dbms_rowid.rowid_relative_fno (rowid) | |'_'| | dbms_rowid.rowid_block_number (rowid) location from T2 order by location,id ID LOCATION--1 4 "171 2 4" 171 3 4 "171 4" 171 5 5 "171 6 4" 172 7 4 "172 8 4" 172 9 4 "172 10 4 172 11 4" 173 12 4 "173 13 4 '173 14 4' 173 15 4 '173 16 4' 174 174 18 4 '174 19 4' 17420 4 '17420 rows selected.zx@MYDB > exec dbms_stats.gather_table_stats (ownname= >' ZX' Tabname= >'T1 methods > 'for all columns size auto',cascade= > true,estimate_percent= > 100) PL/SQL procedure successfully completed.zx@MYDB > select index_name,clustering_factor from dba_indexes where index_name='IDX_T2' INDEX_NAME CLUSTERING_FACTOR -- IDX_T2 4

Repeat the same series of operations as table T1, and you can see that the clustering factor of index IDX_T2 is reduced to 4. And the adjacent data are all in the same data block.

In the Oracle database, CBO uses the following formula when calculating the cost of an index range scan (Index Range Scan):

IRS Cost = O Cost + CPU Cost

The formula for calculating the Cost is as follows:

Index Access O Cost = Table Access O Cost + Table Access O Cost

Index Access I Cost O Cost = BLEVEL + CEIL (# LEAF_BLOCKS * IX_SEL)

Table Access I Cost O Cost = CEIL (CLUSTERING_FACTOR * IX_SEL_WITH_FILTERS)

It can be inferred from this formula that the cost of scanning the index range can be approximately regarded as directly proportional to the clustering factor, so the value of the clustering factor actually plays an important role in CBO's judgment of whether to walk the related index or not.

To demonstrate an example, by changing the value of a clustered index, the original execution plan of the index range scan is changed to a full table scan:

Zx@MYDB > create table T1 as select * from dba_objects;Table created.zx@MYDB > create index idx_t1 on T1 (object_id); Index created.zx@MYDB > select clustering_factor from dba_indexes where index_name='IDX_T1';CLUSTERING_FACTOR- 1063zx@MYDB > select / * + cluster_factor_expmple_1 * / object_id,object_name from T1 where object_id between 103 and 108 OBJECT_ID OBJECT_NAME--103MIGRATE$ 104DEPENDENCY$ 105ACCESS$ 106I_DEPENDENCY1 107I_DEPENDENCY2 108I_ACCESS16 rows selected.zx@MYDB > select * from table (dbms_xplan.display_cursor (null,null,'all')) PLAN_TABLE_OUTPUT- -SQL_ID ga3jv3kwwwmx5 Child number 0--select / * + cluster_factor_expmple_1 * / object_id Object_name from t1where object_id between 103and 108Plan hash value: 50753647-| Id | Operation | Name | Rows | | Bytes | Cost (% CPU) | Time |-| 0 | SELECT STATEMENT | | | 3 | TABLE ACCESS BY INDEX ROWID | | 1 | TABLE ACCESS BY INDEX ROWID | T1 | 6 | 474 | 3 (0) | 00:00:01 | * 2 | INDEX RANGE SCAN | IDX_T1 | 6 | | 2 (0) | 00:00:01 |-| -. Omit part of the output

SQL scanned the index range with a cost value of 3

Use Hint to force SQL to walk through the table scan:

Zx@MYDB > select / * + full (T1) * / object_id,object_name from T1 where object_id between 103 and 108 OBJECT_ID OBJECT_NAME--103MIGRATE$ 104DEPENDENCY$ 105ACCESS$ 106I_DEPENDENCY1 107I_DEPENDENCY2 108I_ACCESS16 rows selected.zx@MYDB > select * from table (dbms_xplan.display_cursor (null,null,'all')) PLAN_TABLE_OUTPUT- -SQL_ID b7hjwuvmg2ncy Child number 0--select / * + full (T1) * / object_id Object_name from T1 where object_idbetween 103 and 108Plan hash value: 3617692013 Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |- -| 0 | SELECT STATEMENT | 287 (100) | * 1 | TABLE ACCESS FULL | T1 | 6 | 474 | 287 (1) | ) | 00:00:04 | -. Omit part of the output

Now SQL runs a full table scan with a cost value of 287.

We already know that the cost of walking the index range scan can be approximately proportional to the clustering factor, so if we want to change the execution plan of the above SQL from the index range scan to the full table scan, we only need to adjust the value of the clustering factor so that the cost of walking the index range scan is greater than that of the full table scan.

Manually adjust the value of the clustering factor of index IDX_T1 to 1 million:

Zx@MYDB > exec dbms_stats.set_index_stats (ownname= > 'ZX',indname= >' IDX_T1',clstfct= > 1000000 where object_id between where object_id between = > false); PL/SQL procedure successfully completed.zx@MYDB > select clustering_factor from dba_indexes where index_name='IDX_T1';CLUSTERING_FACTOR- 1000000zx@MYDB > select / * + cluster_factor_expmple_2 * / object_id,object_name from T1 invalidates OBJECT_ID OBJECT_NAME--103MIGRATE$ 104DEPENDENCY$ 105ACCESS$ 106I_DEPENDENCY1 107I_DEPENDENCY2 108I_ACCESS16 rows selected.zx@MYDB > select * from table (dbms_xplan.display_cursor (null,null,'all')) PLAN_TABLE_OUTPUT- -SQL_ID 1ucqsj4j0j432 Child number 0--select / * + cluster_factor_expmple_2 * / object_id Object_name from t1where object_id between 103and 108Plan hash value: 50753647-| Id | Operation | Name | Rows | | Bytes | Cost (% CPU) | Time |-| 0 | SELECT STATEMENT | | | 1 | TABLE ACCESS BY INDEX ROWID | T1 | 6 | 474 | 105 (0) | 00:00:02 | * 2 | INDEX RANGE SCAN | IDX_T1 | 6 | | 2 (0) | 00:00:01 |-| -. Omit part of the output

As can be seen from the execution plan, the cost has changed from 3 to 105 (that is, an increase of 102), which indicates that our adjustment to the clustering factor of the index IDX_T1 has taken effect.

To make the cost greater than 287, you only need to adjust the value of the clustering factor to 4 million.

Zx@MYDB > exec dbms_stats.set_index_stats (ownname= > 'ZX',indname= >' IDX_T1',clstfct= > 40000 bookrunners invalidates = > false); PL/SQL procedure successfully completed.zx@MYDB > select clustering_factor from dba_indexes where index_name='IDX_T1';CLUSTERING_FACTOR- 4000000zx@MYDB > select / * + cluster_factor_expmple_3 * / object_id,object_name from T1 where object_id between 103 and 108 OBJECT_ID OBJECT_NAME--103MIGRATE$ 104DEPENDENCY$ 105ACCESS$ 106I_DEPENDENCY1 107I_DEPENDENCY2 108I_ACCESS16 rows selected.zx@MYDB > select * from table (dbms_xplan.display_cursor (null,null,'all')) PLAN_TABLE_OUTPUT- -SQL_ID cwkc8q61bypa6 Child number 0--select / * + cluster_factor_expmple_3 * / object_id Object_name from t1where object_id between 103 and 108Plan hash value: 3617692013 Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |- -| 0 | SELECT STATEMENT | 287 (100) | * 1 | TABLE ACCESS FULL | T1 | 6 | 474 | 287 (1) | ) | 00:00:04 |-

From what is shown above, you can see that the execution plan has changed from an index range scan to a full table scan.

Refer to "SQL Optimization based on Oracle"

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.