The influence of histogram usage skills and analyze table Operation on histogram Statistics 07/01 Update SLTechnology News&Howtos

The influence of histogram usage skills and analyze table Operation on histogram Statistics

2025-07-01 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Original: http://www.cnblogs.com/yumiko/p/6060485.html

The preface aims at the index columns, especially the index columns with serious data skew, and the statistical information of the histogram is very important for the CBO optimizer to select the execution plan more accurately. For beginners, you can learn the value, function, and skills of histograms for index columns from this article. For those with experience, you can also learn about the impact of different analyze table operations on histogram information statistics from the article. This part can also directly refer to the final summary of this article. First of all, I would like to emphasize: analyze table table_name compute statistics this operation should be cautious!

1. Histogram Overview histogram, as a statistical tool for measuring data distribution, is not exclusive to ORACLE. For ORACLE, histogram is mainly used to count the data on related columns and record the overall data distribution of the column when analyzing tables and indexes.

2. Classification of histograms there are mainly two kinds of ORACLE histograms: equal-frequency histogram and equal-height histogram by default, when the number of unique values on the column is less than 254, ORACLE will establish equal-frequency histogram. By default, ORACLE creates a contour histogram when the number of unique values on a column is higher than 254. You can use a contour histogram by setting the number of unique values whose SIZE is lower than the target column through the method_opt parameter when performing dbms_stats.gather_table_stats to collect statistics.

3. Advantages of histogram

For ORACLE, the CBO optimizer can use the index for column values with high selectivity (with less return rows) based on the column value distribution information collected by the histogram, while column values with low selectivity (with more data rows) do not use indexes. Histograms are important, especially for columns with severe data skew.

Note: data tilt, mainly refers to a value on a column, compared with other values in the column, there is a high proportion, such as: "gender" column, "male" accounts for 80% of the overall value of the column (male, female), there is an obvious data tilt phenomenon.

4. The scope of application of histogram

In general, histograms are not restricted by the use of indexes, that is, they can be used to count indexed columns as well as non-indexed columns. But for the statistics of non-index columns, it is of little significance.

5. The main views involved in the histogram

View of histogram type: DBA_TAB_COL_STATISTICS,USER_TAB_COL_STATISTICS,ALL_TAB_COL_STATISTICS

View of histogram details: DBA_TAB_HISTOGRAMS,USER_TAB_HISTOGRAMS,ALL_TAB_HISTOGRAMS

6. an example of the influence of histograms on the selection of execution plans

The database version used in this example is ORACLE 11.2.0.4.

First, prepare a test table, TEST, in which the OWNER column has a serious data skew, as shown below.

Yumiko_sunny@OA01 > select distinct owner, count (*) as col_rows

(select count (*) from test) as tab_rows

To_char (round (count (*) / (select count (*) from test) * 100Power2), '90.99') | |'%'

As data_ratio

From test group by owner

OWNER COL_ROWS TAB_ROWS DATA_RATIO

HR 476 535164 0.09%

OE 1988 535164 0.37%

ORDDATA 3598 535164 0.67%

SCOTT 98 535164 0.02%

SYS 529004 535164 98.85%

As can be seen from the above figure, the SYS value distribution of this column accounts for 98% of the total, indicating that there is a serious tilt.

Create an index on the OWNER column and use the ANALYZE TABLE method to collect statistics.

-- collect statistics

Yumiko_sunny@OA01 > analyze table test compute statistics

Table analyzed.

-- verify the time of the last statistical collection

Yumiko_sunny@OA01 > select table_name

To_char (LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') LAST_ANALYZED

From dba_tables where TABLE_NAME='TEST'

TABLE_NAME LAST_ANALYZED

TEST 2016-11-13 21:23:19

-- check the statistics of histograms

Yumiko_sunny@OA01 > select column_name,histogram from dba_tab_col_statistics where table_name='TEST'

COLUMN_NAME HISTOGRAM

OWNER NONE

OBJECT_NAME NONE

SUBOBJECT_NAME NONE

OBJECT_ID NONE

DATA_OBJECT_ID NONE

OBJECT_TYPE NONE

CREATED NONE

LAST_DDL_TIME NONE

TIMESTAMP NONE

STATUS NONE

TEMPORARY NONE

COLUMN_NAME HISTOGRAM

GENERATED NONE

SECONDARY NONE

NAMESPACE NONE

EDITION_NAME NONE

In the above method, although the table statistics are collected through the analyze table table_name compute statistics method, the histogram information is not collected.

Ignore it here, and you can compare it with the for all columns clause later.

Check the selection of the execution plan of the index column at this time, which is retrieved on the condition of tilted data SYS.

Yumiko_sunny@OA01 > select * from test where owner='SYS'

Rows selected.

Execution Plan

Plan hash value: 3856466897

| | 0 | SELECT STATEMENT | | 107k | 10m | 1799 (1) | 00:00:22 |

| | 1 | TABLE ACCESS BY INDEX ROWID | TEST | 107k | 10m | 1799 (1) | 00:00:22 |

| | * 2 | INDEX RANGE SCAN | IND_TEST | 107k | | 228 (1) | 00:00:03 |

Predicate Information (identified by operation id):

-access ("OWNER" = 'SYS')

From the results returned above, the SYS data of 529K rows returns only 107K rows, which obviously has a big error.

In addition, for SYS with data skew up to 98%, it is clear that full table scanning should be more efficient, which should be related to incorrect statistics.

Use dbms_stats.gather_table_stats to collect table statistics again.

Yumiko_sunny@OA01 > exec dbms_stats.gather_table_stats ('SCOTT','TEST',cascade= > true)

PL/SQL procedure successfully completed.

Yumiko_sunny@OA01 > select table_name

To_char (LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') LAST_ANALYZED

From dba_tables where TABLE_NAME='TEST'

TABLE_NAME LAST_ANALYZED

TEST 2016-11-13 21:50:01

Yumiko_sunny@OA01 > select column_name,histogram from dba_tab_col_statistics where table_name='TEST'

COLUMN_NAME HISTOGRAM

OWNER FREQUENCY

OBJECT_NAME NONE

SUBOBJECT_NAME NONE

OBJECT_ID NONE

DATA_OBJECT_ID NONE

OBJECT_TYPE NONE

CREATED NONE

LAST_DDL_TIME NONE

TIMESTAMP NONE

STATUS NONE

TEMPORARY NONE

COLUMN_NAME HISTOGRAM

GENERATED NONE

SECONDARY NONE

NAMESPACE NONE

EDITION_NAME NONE

As can be seen from the above figure, the latest statistics of the table are completed at this time, and the histogram information of the index column is collected, and the histogram is "equal frequency histogram".

Once again, check the execution plan selection of the index column at this time, where the retrieval is carried out under the condition of skew data SYS with poor selectivity and SCOTT with good selectivity, respectively.

-- query based on SYS

Yumiko_sunny@OA01 > select * from test where owner='SYS'

Rows selected.

Execution Plan

Plan hash value: 1357081020

| | 0 | SELECT STATEMENT | | 530K | 49m | 2098 (1) | 00:00:26 |

| | * 1 | TABLE ACCESS FULL | TEST | 530K | 49m | 2098 (1) | 00:00:26 |

Predicate Information (identified by operation id):

-filter ("OWNER" = 'SYS')

-- query based on SCOTT

Yumiko_sunny@OA01 > select * from test where owner='SCOTT'

Rows selected.

Execution Plan

Plan hash value: 3856466897

| | 0 | SELECT STATEMENT | | 98 | 9506 | 5 (0) | 00:00:01 |

| | 1 | TABLE ACCESS BY INDEX ROWID | TEST | 98 | 9506 | 5 (0) | 00:00:01 |

| | * 2 | INDEX RANGE SCAN | IND_TEST | 98 | | 3 (0) | 00:00:01 |

Predicate Information (identified by operation id):

-access ("OWNER" = 'SCOTT')

As mentioned earlier, accurate data distribution information is collected through histograms.

For the Sys value with poor selectivity, the CBO optimizer uses full table scan to access the data.

For the Scott value with good selectivity, the CBO optimizer uses index scanning to access the data.

What is the real cost of accessing SYS-related data rows by index? here, access to an index scan is performed through hint.

Yumiko_sunny@OA01 > select / * + index (test,ind_test) * / * from test where owner='SYS'

Rows selected.

Execution Plan

Plan hash value: 3856466897

| | 0 | SELECT STATEMENT | | 529K | 49m | 8885 (1) | 00:01:47 |

| | 1 | TABLE ACCESS BY INDEX ROWID | TEST | 529K | 49m | 8885 (1) | 00:01:47 |

| | * 2 | INDEX RANGE SCAN | IND_TEST | 529K | | 1115 (1) | 00:00:14 |

Predicate Information (identified by operation id):

-access ("OWNER" = 'SYS')

As you can see from the above figure, if the information is collected correctly, the real cost of using index scanning is 4 times that of full table scanning.

The above execution plan also illustrates the importance of accurate statistics to the execution plan selection for the CBO optimizer.

View the details of the histogram at this time through the DBA_TAB_HISTOGRAMS view

Yumiko_sunny@OA01 > select TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER

To_char (ENDPOINT_VALUE,'999999999999999999999999999999999999')

As ENDPOINT_VALUE,ENDPOINT_ACTUAL_VALUE

From DBA_TAB_HISTOGRAMS

Where TABLE_NAME='TEST' and COLUMN_NAME='OWNER'

TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU

TEST OWNER 19 41159093808690300000000000

TEST OWNER 61 41186001805076000000000000

TEST OWNER 62 43232584582496500000000000

TEST OWNER 5 37550853140200700000000000

TEST OWNER 5518 43277234965060400000000000

You can see that although ENDPOINT_VALUE collects hash values that are unique to the OWNER column, the real column ENDPOINT_ACTUAL_VALUE is displayed as empty.

After comparing the analyze table for all columns operation, we will see.

Delete the data rows related to the SYS value and observe the changes in histogram statistics.

Yumiko_sunny@OA01 > delete from test where owner='SYS'

Rows deleted.

Yumiko_sunny@OA01 > commit

Commit complete.

Yumiko_sunny@OA01 > select TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER

To_char (ENDPOINT_VALUE,'999999999999999999999999999999999999')

As ENDPOINT_VALUE,ENDPOINT_ACTUAL_VALUE

From DBA_TAB_HISTOGRAMS

Where TABLE_NAME='TEST' and COLUMN_NAME='OWNER'

TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU

TEST OWNER 19 41159093808690300000000000

TEST OWNER 61 41186001805076000000000000

TEST OWNER 62 43232584582496500000000000

TEST OWNER 5 37550853140200700000000000

TEST OWNER 5518 43277234965060400000000000

As you can see, the histogram information does not change automatically for DML operations on table data.

Use dbms_stats.gather_table_stats to collect statistics again, and the histogram is updated, as shown in the following figure:

Yumiko_sunny@OA01 > exec dbms_stats.gather_table_stats ('SCOTT','TEST',cascade= > true)

PL/SQL procedure successfully completed.

Yumiko_sunny@OA01 > select TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER

To_char (ENDPOINT_VALUE,'999999999999999999999999999999999999')

As ENDPOINT_VALUE,ENDPOINT_ACTUAL_VALUE

From DBA_TAB_HISTOGRAMS

Where TABLE_NAME='TEST' and COLUMN_NAME='OWNER'

TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU

TEST OWNER 476 37550853140200700000000000

TEST OWNER 2464 41159093808690300000000000

TEST OWNER 6062 41186001805076000000000000

TEST OWNER 6160 43232584582496500000000000

As explained above, the information of the histogram needs to be collected on a regular basis.

7. An example of the influence of ANALYZE TABLE operation on histogram statistics

This example inherits the contents of the above example, and the database version is consistent.

In the above, it has been demonstrated that analyze table table_name compute statistics cannot collect histogram information for tables.

So what effect does this have on tables that already have a histogram?

First of all, continue the above, execute the operation of analyze table table_name compute statistics again, and observe the change of histogram information.

Yumiko_sunny@OA01 > analyze table test compute statistics

Table analyzed.

Yumiko_sunny@OA01 > select column_name,histogram from dba_tab_col_statistics where table_name='TEST'

COLUMN_NAME HISTOGRAM

OWNER NONE

OBJECT_NAME NONE

SUBOBJECT_NAME NONE

OBJECT_ID NONE

DATA_OBJECT_ID NONE

OBJECT_TYPE NONE

CREATED NONE

LAST_DDL_TIME NONE

TIMESTAMP NONE

STATUS NONE

TEMPORARY NONE

COLUMN_NAME HISTOGRAM

GENERATED NONE

SECONDARY NONE

NAMESPACE NONE

EDITION_NAME NONE

It is obvious that the histogram information of the OWNER column has disappeared, indicating that this operation will delete the existing histogram information.

If this is a production environment, it may have an unpredictable impact on such a table with data skew columns.

For the operation of analyze table table_name compute statistics for all indexes, I will not demonstrate here, but will directly give the conclusion of the impact on the histogram in the summary below.

Let's take a look at the impact of analyze table table_name compute statistics for all indexes for all columns operations.

Yumiko_sunny@OA01 > analyze table test compute statistics for all indexes for all columns

Table analyzed.

Yumiko_sunny@OA01 > select column_name,histogram from dba_tab_col_statistics where table_name='TEST'

COLUMN_NAME HISTOGRAM

OWNER FREQUENCY

OBJECT_NAME HEIGHT BALANCED

SUBOBJECT_NAME NONE

OBJECT_ID HEIGHT BALANCED

DATA_OBJECT_ID HEIGHT BALANCED

OBJECT_TYPE FREQUENCY

CREATED FREQUENCY

LAST_DDL_TIME FREQUENCY

TIMESTAMP FREQUENCY

STATUS FREQUENCY

TEMPORARY FREQUENCY

COLUMN_NAME HISTOGRAM

GENERATED FREQUENCY

SECONDARY FREQUENCY

NAMESPACE FREQUENCY

EDITION_NAME NONE

As you can see from the above, when the for all columns clause is executed, not only the histogram information of the index column is collected, but also the histogram information of the non-index column is collected.

Take a look at the details of the DBA_TAB_HISTOGRAMS view at this time.

Yumiko_sunny@OA01 > select TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER

To_char (ENDPOINT_VALUE,'999999999999999999999999999999999999')

As ENDPOINT_VALUE,ENDPOINT_ACTUAL_VALUE

From DBA_TAB_HISTOGRAMS

Where TABLE_NAME='TEST' and COLUMN_NAME='OWNER'

TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU

TEST OWNER 476 37550853140200700000000000 HR

TEST OWNER 2464 41159093808690300000000000 OE

TEST OWNER 6062 41186001805076000000000000 ORDDATA

TEST OWNER 6160 43232584582496500000000000 SCOTT

As you can see, you can see the information about the true value of the ENDPOINT_ACTUAL_VALUE column on the DBA_TAB_HISTOGRAMS view.

8. Summary histogram can provide accurate data distribution reference for CBO optimizer in order to choose the correct execution plan.

By default, the histogram information obtained by using dbms_stats.gather_table_stats collects only the histogram information of the index column. By default, the histogram information obtained by using dbms_stats.gather_table_stats cannot see the real value in the DBA_TAB_HISTOGRAMS view, but does not affect the choice of the CBO optimizer.

Using analyze table table_name compute statistics for statistical analysis, histogram information is not collected for tables that do not already have histogram information. Using analyze table table_name compute statistics for statistical analysis, for tables that already have histogram information, the previous histogram information is deleted.

Using analyze table table_name compute statistics for all indexes for statistical analysis, histogram information is not collected for tables that do not already have histogram information. Using analyze table table_name compute statistics for all indexes for statistical analysis, for tables that already have histogram information, the previous histogram information is not deleted.

Using analyze table table_name compute statistics for all indexes for all columns for statistical analysis, for tables that do not already have histogram information, histogram information is collected, including indexed and non-indexed columns. Using analyze table table_name compute statistics for all indexes for all columns for statistical analysis, the latest histogram information is collected for tables that already have histogram information.

Finally, at least from the collection of histograms: analyze table table_name compute statistics is not equivalent to analyze table table_name compute statistics for all indexes for all columns.

Use the analyze table table_name compute statistics operation with caution.

Say important things three times!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.