Using functional Index to deal with data skew cases in Oracle 04/20 Update SLTechnology News&Howtos

Using functional Index to deal with data skew cases in Oracle

2025-04-20 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

For this article, refer to realkid4's blog:

Http://blog.itpub.net/17203031/viewspace-681311/

For more information about B-Tree, Bitmap, and functional indexing, please refer to another blog post:

Summary of use cases of B-Tree, Bitmap and functional indexing in Oracle

Generally speaking, the data columns selected by the index are best with high dispersion and good selectivity. From the point of view of the index tree structure, the column values are distributed in the position of the leaf node. In this way, the number of leaf nodes obtained by tree structure search is more efficient.

In practice, we often encounter the skew of data column values. That is, the value of the entire column data is limited. However, most of the data values are concentrated in a few values, and the proportion of other values is very small. For example, a data column has values of "N", "B", "M", "P" and "Q", of which 55% of the data rows are "N", 40% of the data rows are "B", and the remaining values are distributed in 5% of the data rows. There are some problems with indexing data columns of this structure.

First of all, the default database is to index all column values (non-null). This means that both high-frequency and low-frequency values will appear on the leaf nodes of the index structure. Of course, most of these leaf nodes are these duplicate values.

Secondly, under the action of CBO (cost-based optimizer), the search for high frequency values generally does not choose the index as the search path, because full table scanning may be more efficient. We have indexed the data columns, but queries for high-frequency words never go to the index path (as the following experiment will prove).

Finally, the established index consumes a lot of space and time. The established index covers all values, and the space occupied by massive data tables is bound to be large. At the same time, in the low-frequency data query, although the index path will be taken, but the logical and physical reading also has some loss.

Let's introduce a solution

Idea: since the high frequency value will not go to the index path when querying, you can consider removing it from the process of building the index and establishing the index structure only for those low frequency data values. In this way, the index tree structure is relatively small, and the efficiency of index query can be improved.

The specific method is to use the decode function. Decode (a ~ ~ b ~ c ~ ~ d ~ e … F) meaning: if aquib, return c, equal to d, return e, and finally return f if there is no match. For the above example, you can use decode (column name, 'column name, null,'B',null, column name), which means that if the column value is N or B, set it directly to null, otherwise the column value will be returned, and the function index will be established.

This is done with the help of two functions of Oracle: 1, do not generate an index on the null value; 2, functional index

The following experiments are used to prove this method:

1. Construct the experimental test environment.

-- create a test data table tb_wjq

SEIANG@seiang11g > create table tb_wjq as select * from dba_objects where owner in ('SEIANG','PUBLIC','HR','SYSMAN','XDB','BI','SYS')

Table created.

-- use scripts to insert large amounts of data

Begin

For i in 1..8 loop

Insert / * + append * / into tb_wjq select * from tb_wjq

Commit

End loop

End

SEIANG@seiang11g > select count (*) from tb_wjq

COUNT (*)

9804160

Elapsed: 00:00:01.54

The distribution of the amount of data used in the experiment

SEIANG@seiang11g > select owner,count (*) from tb_wjq group by owner

OWNER COUNT (*)

SEIANG 3072

PUBLIC 4352256

HR 4352

SYSMAN 454912

XDB 149760

BI 1024

SYS 4838784

7 rows selected.

Elapsed: 00:00:02.44

It can be seen that most of the more than 90, 000 pieces of data are concentrated on PUBLIC, SYSMAN and SYS, and the frequency of other data is relatively small. The tilting trend of the data is obvious.

2. Build an index

-- build general and functional indexes on owner columns, respectively.

SEIANG@seiang11g > create index idx_tb_wjq_owner_normal on tb_wjq (owner)

Index created.

Elapsed: 00:00:24.72

SEIANG@seiang11g > create index func_idx_tb_wjq_owner on tb_wjq (decode (owner,'PUBLIC',null,'SYSMAN',null,'SYS',null,owner))

Index created.

Elapsed: 00:00:12.34

Index func_idx_tb_wjq_owner converts PUBLIC, SYSMAN, Sys values into null, excluding the process of indexing. It can be seen not only from the time taken to create the index above, but also from the index segment information below, the space difference between the two indexes is relatively large, which also proves this point.

SEIANG@seiang11g > select owner,segment_name,segment_type,bytes/1024/1024,blocks,extents from dba_segments where segment_name='IDX_TB_WJQ_OWNER_NORMAL'

OWNER SEGMENT_NAME SEGMENT_TYPE BYTES/1024/1024 BLOCKS EXTENTS

SEIANG IDX_TB_WJQ_OWNER_NORMAL INDEX 184 23552 94

Elapsed: 00:00:00.01

SEIANG@seiang11g > select owner,segment_name,segment_type,bytes/1024/1024,blocks,extents from dba_segments where segment_name='FUNC_IDX_TB_WJQ_OWNER'

OWNER SEGMENT_NAME SEGMENT_TYPE BYTES/1024/1024 BLOCKS EXTENTS

SEIANG FUNC_IDX_TB_WJQ_OWNER INDEX 3 384 18

As can be seen from above, a data column is also indexed. The common index type IDX_TB_WJQ_OWNER_NORMAL occupies 94 extents and 23552 blocks, accounting for about 184m of space. However, the space of the functional index FUNC_IDX_TB_WJQ_OWNER only uses the initial allocation of 18 regions, 384 data blocks, accounting for about 3m of space. As a result, the space advantage is intuitively reflected!

-- collecting statistical data. Because of the experimental nature and large amount of data, high sampling rate is used to collect statistical information.

SEIANG@seiang11g > exec dbms_stats.gather_table_stats ('SEIANG',' TB_WJQ', cascade = > true, estimate_percent = > 100pr methodological opt = > 'for all indexed columns')

PL/SQL procedure successfully completed.

Elapsed: 00:00:49.67

3. Analysis of retrieval efficiency

The XDB value of 149760 of owner data is analyzed.

-- Direct index search:

SEIANG@seiang11g > select * from tb_wjq where owner='XDB'

149760 rows selected.

Elapsed: 00:00:01.89

Execution Plan

Plan hash value: 3735191644

| | 0 | SELECT STATEMENT | | 149k | 13m | 4708 (1) | 00:00:57 |

| | 1 | TABLE ACCESS BY INDEX ROWID | TB_WJQ | 149k | 13m | 4708 (1) | 00:00:57 |

| | * 2 | INDEX RANGE SCAN | IDX_TB_WJQ_OWNER_NORMAL | 149k | | 348 (1) | 00:00:05 |

Predicate Information (identified by operation id):

2-access ("OWNER" = 'XDB')

Statistics

8 recursive calls

0 db block gets

26000 consistent gets

6349 physical reads

0 redo size

16482673 bytes sent via SQL*Net to client

110336 bytes received via SQL*Net from client

9985 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

149760 rows processed

It is found that when BI is used as the search value, the index search is carried out. The following is a comparison with a functional index search.

SEIANG@seiang11g > select * from tb_wjq where decode (owner,'PUBLIC',null,'SYSMAN',null,'SYS',null,owner) = 'XDB'

149760 rows selected.

Elapsed: 00:00:01.54

Execution Plan

Plan hash value: 3652333940

| | 0 | SELECT STATEMENT | | 149k | 13m | 7091 (1) | 00:01:26 |

| | 1 | TABLE ACCESS BY INDEX ROWID | TB_WJQ | 149k | 13m | 7091 (1) | 00:01:26 |

| | * 2 | INDEX RANGE SCAN | FUNC_IDX_TB_WJQ_OWNER | 149k | | 316 (1) | 00:00:04 |

Predicate Information (identified by operation id):

2-access (DECODE ("OWNER", 'PUBLIC',NULL,'SYSMAN',NULL,'SYS',NULL, "OWNER") =' XDB')

Statistics

0 recursive calls

0 db block gets

25998 consistent gets

0 physical reads

0 redo size

9017261 bytes sent via SQL*Net to client

110336 bytes received via SQL*Net from client

9985 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

149760 rows processed

After comparison, we can find that the method of using functional index has some differences in execution time, physical logic reading, and CPU usage.

General index

Functional index

Execution time

00: 00: 01.89

00: 00: 01.54

CPU usage

4708

7091

Consistent gets

26000

25998

Physical reads

6349

Conclusion: the functional index processing bias method is used to optimize the query efficiency and index structure in a certain length. The data in the above table shows that the consumption of logical physical reads is greatly reduced (the index structure is simplified), and the execution time is reduced in turn. Because the use of functions is calculated, CPU usage is relatively high, within an acceptable range.

However, this method has some limitations and must be carefully planned before application.

First of all, the data table data should be large. Because after all, the establishment and search of the functional index consumes more than the ordinary index, if the data table is small, the degree of optimization can not make up for the cost of consumption, the loss may outweigh the gain. In a series of experiments, the author also found that this performance advantage can not be highlighted when the amount of data is medium to small.

Secondly, the tilt trend of column value is obvious. Through the discussion at the beginning, it is not difficult to find that the higher the tilt of the column value, the larger the amount of data removed by the functional index, and the smaller and more optimized the generated index tree structure. This is the core of this method!

Finally, when using the functional index to search, the higher the frequency of the search, the better the optimization effect. In this case, the column with the value XDB has 149760 rows, and you can see the obvious performance optimization. However, when we select the BI value with 1024 pieces of data, this optimization trend can be seen, but the degree is significantly reduced (the experimental results are shown below). The reason here may be that the amount of data is small, and the difference between the logical and physical readings of the two methods is reduced.

-- Direct index

SEIANG@seiang11g > select * from tb_wjq where owner='BI'

1024 rows selected.

Elapsed: 00:00:00.03

Execution Plan

Plan hash value: 3735191644

| | 0 | SELECT STATEMENT | | 1024 | 98K | 35 (0) | 00:00:01 |

| | 1 | TABLE ACCESS BY INDEX ROWID | TB_WJQ | 1024 | 98K | 35 (0) | 00:00:01 |

| | * 2 | INDEX RANGE SCAN | IDX_TB_WJQ_OWNER_NORMAL | 1024 | | 5 (0) | 00:00:01 |

Predicate Information (identified by operation id):

2-access ("OWNER" = 'BI')

Statistics

0 recursive calls

0 db block gets

280 consistent gets

151 physical reads

0 redo size

98579 bytes sent via SQL*Net to client

1271 bytes received via SQL*Net from client

70 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1024 rows processed

-- functional index

SEIANG@seiang11g > select * from tb_wjq where decode (owner,'PUBLIC',null,'SYSMAN',null,'SYS',null,owner) = 'BI'

1024 rows selected.

Elapsed: 00:00:00.01

Execution Plan

Plan hash value: 3652333940

| | 0 | SELECT STATEMENT | | 1024 | 98K | 50 (0) | 00:00:01 |

| | 1 | TABLE ACCESS BY INDEX ROWID | TB_WJQ | 1024 | 98K | 50 (0) | 00:00:01 |

| | * 2 | INDEX RANGE SCAN | FUNC_IDX_TB_WJQ_OWNER | 1024 | | 3 (0) | 00:00:01 |

Predicate Information (identified by operation id):

2-access (DECODE ("OWNER", 'PUBLIC',NULL,'SYSMAN',NULL,'SYS',NULL, "OWNER") =' BI')

Statistics

0 recursive calls

0 db block gets

279 consistent gets

3 physical reads

0 redo size

33969 bytes sent via SQL*Net to client

1271 bytes received via SQL*Net from client

70 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1024 rows processed

However, if you are querying SYS, PUBLIC, or SYSMAN, all indexes are skipped and the diameter is scanned throughout the table.

SEIANG@seiang11g > select * from tb_wjq where owner='SYS'

4838784 rows selected.

Elapsed: 00:00:45.85

Execution Plan

Plan hash value: 1501781665

| | 0 | SELECT STATEMENT | | 4838K | 452m | 39893 (1) | 00:07:59 |

| | * 1 | TABLE ACCESS FULL | TB_WJQ | 4838K | 452m | 39893 (1) | 00:07:59 |

Predicate Information (identified by operation id):

1-filter ("OWNER" = 'SYS')

Statistics

8 recursive calls

0 db block gets

457638 consistent gets

139684 physical reads

0 redo size

255169095 bytes sent via SQL*Net to client

3548958 bytes received via SQL*Net from client

322587 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

4838784 rows processed

Author: SEian.G (hard practice changes in 72, but it is difficult to laugh at 81)

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.