In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-15 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
Original: http://www.cnblogs.com/yumiko/p/6060485.html
The preface aims at the index columns, especially the index columns with serious data skew, and the statistical information of the histogram is very important for the CBO optimizer to select the execution plan more accurately. For beginners, you can learn the value, function, and skills of histograms for index columns from this article. For those with experience, you can also learn about the impact of different analyze table operations on histogram information statistics from the article. This part can also directly refer to the final summary of this article. First of all, I would like to emphasize: analyze table table_name compute statistics this operation should be cautious!
1. Histogram Overview histogram, as a statistical tool for measuring data distribution, is not exclusive to ORACLE. For ORACLE, histogram is mainly used to count the data on related columns and record the overall data distribution of the column when analyzing tables and indexes.
2. Classification of histograms there are mainly two kinds of ORACLE histograms: equal-frequency histogram and equal-height histogram by default, when the number of unique values on the column is less than 254, ORACLE will establish equal-frequency histogram. By default, ORACLE creates a contour histogram when the number of unique values on a column is higher than 254. You can use a contour histogram by setting the number of unique values whose SIZE is lower than the target column through the method_opt parameter when performing dbms_stats.gather_table_stats to collect statistics.
3. Advantages of histogram
For ORACLE, the CBO optimizer can use the index for column values with high selectivity (with less return rows) based on the column value distribution information collected by the histogram, while column values with low selectivity (with more data rows) do not use indexes. Histograms are important, especially for columns with severe data skew.
Note: data tilt, mainly refers to a value on a column, compared with other values in the column, there is a high proportion, such as: "gender" column, "male" accounts for 80% of the overall value of the column (male, female), there is an obvious data tilt phenomenon.
4. The scope of application of histogram
In general, histograms are not restricted by the use of indexes, that is, they can be used to count indexed columns as well as non-indexed columns. But for the statistics of non-index columns, it is of little significance.
5. The main views involved in the histogram
View of histogram type: DBA_TAB_COL_STATISTICS,USER_TAB_COL_STATISTICS,ALL_TAB_COL_STATISTICS
View of histogram details: DBA_TAB_HISTOGRAMS,USER_TAB_HISTOGRAMS,ALL_TAB_HISTOGRAMS
6. an example of the influence of histograms on the selection of execution plans
The database version used in this example is ORACLE 11.2.0.4.
First, prepare a test table, TEST, in which the OWNER column has a serious data skew, as shown below.
Yumiko_sunny@OA01 > select distinct owner, count (*) as col_rows
(select count (*) from test) as tab_rows
To_char (round (count (*) / (select count (*) from test) * 100Power2), '90.99') | |'%'
As data_ratio
From test group by owner
OWNER COL_ROWS TAB_ROWS DATA_RATIO
--
HR 476 535164 0.09%
OE 1988 535164 0.37%
ORDDATA 3598 535164 0.67%
SCOTT 98 535164 0.02%
SYS 529004 535164 98.85%
As can be seen from the above figure, the SYS value distribution of this column accounts for 98% of the total, indicating that there is a serious tilt.
Create an index on the OWNER column and use the ANALYZE TABLE method to collect statistics.
-- collect statistics
Yumiko_sunny@OA01 > analyze table test compute statistics
Table analyzed.
-- verify the time of the last statistical collection
Yumiko_sunny@OA01 > select table_name
To_char (LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') LAST_ANALYZED
From dba_tables where TABLE_NAME='TEST'
TABLE_NAME LAST_ANALYZED
TEST 2016-11-13 21:23:19
-- check the statistics of histograms
Yumiko_sunny@OA01 > select column_name,histogram from dba_tab_col_statistics where table_name='TEST'
COLUMN_NAME HISTOGRAM
--
OWNER NONE
OBJECT_NAME NONE
SUBOBJECT_NAME NONE
OBJECT_ID NONE
DATA_OBJECT_ID NONE
OBJECT_TYPE NONE
CREATED NONE
LAST_DDL_TIME NONE
TIMESTAMP NONE
STATUS NONE
TEMPORARY NONE
COLUMN_NAME HISTOGRAM
--
GENERATED NONE
SECONDARY NONE
NAMESPACE NONE
EDITION_NAME NONE
In the above method, although the table statistics are collected through the analyze table table_name compute statistics method, the histogram information is not collected.
Ignore it here, and you can compare it with the for all columns clause later.
Check the selection of the execution plan of the index column at this time, which is retrieved on the condition of tilted data SYS.
Yumiko_sunny@OA01 > select * from test where owner='SYS'
Rows selected.
Execution Plan
Plan hash value: 3856466897
-
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 107k | 10m | 1799 (1) | 00:00:22 |
| | 1 | TABLE ACCESS BY INDEX ROWID | TEST | 107k | 10m | 1799 (1) | 00:00:22 |
| | * 2 | INDEX RANGE SCAN | IND_TEST | 107k | | 228 (1) | 00:00:03 |
-
Predicate Information (identified by operation id):
-access ("OWNER" = 'SYS')
From the results returned above, the SYS data of 529K rows returns only 107K rows, which obviously has a big error.
In addition, for SYS with data skew up to 98%, it is clear that full table scanning should be more efficient, which should be related to incorrect statistics.
Use dbms_stats.gather_table_stats to collect table statistics again.
Yumiko_sunny@OA01 > exec dbms_stats.gather_table_stats ('SCOTT','TEST',cascade= > true)
PL/SQL procedure successfully completed.
Yumiko_sunny@OA01 > select table_name
To_char (LAST_ANALYZED,'YYYY-MM-DD HH24:MI:SS') LAST_ANALYZED
From dba_tables where TABLE_NAME='TEST'
TABLE_NAME LAST_ANALYZED
TEST 2016-11-13 21:50:01
Yumiko_sunny@OA01 > select column_name,histogram from dba_tab_col_statistics where table_name='TEST'
COLUMN_NAME HISTOGRAM
--
OWNER FREQUENCY
OBJECT_NAME NONE
SUBOBJECT_NAME NONE
OBJECT_ID NONE
DATA_OBJECT_ID NONE
OBJECT_TYPE NONE
CREATED NONE
LAST_DDL_TIME NONE
TIMESTAMP NONE
STATUS NONE
TEMPORARY NONE
COLUMN_NAME HISTOGRAM
--
GENERATED NONE
SECONDARY NONE
NAMESPACE NONE
EDITION_NAME NONE
As can be seen from the above figure, the latest statistics of the table are completed at this time, and the histogram information of the index column is collected, and the histogram is "equal frequency histogram".
Once again, check the execution plan selection of the index column at this time, where the retrieval is carried out under the condition of skew data SYS with poor selectivity and SCOTT with good selectivity, respectively.
-- query based on SYS
Yumiko_sunny@OA01 > select * from test where owner='SYS'
Rows selected.
Execution Plan
Plan hash value: 1357081020
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
| | 0 | SELECT STATEMENT | | 530K | 49m | 2098 (1) | 00:00:26 |
| | * 1 | TABLE ACCESS FULL | TEST | 530K | 49m | 2098 (1) | 00:00:26 |
Predicate Information (identified by operation id):
-filter ("OWNER" = 'SYS')
-- query based on SCOTT
Yumiko_sunny@OA01 > select * from test where owner='SCOTT'
Rows selected.
Execution Plan
Plan hash value: 3856466897
-
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 98 | 9506 | 5 (0) | 00:00:01 |
| | 1 | TABLE ACCESS BY INDEX ROWID | TEST | 98 | 9506 | 5 (0) | 00:00:01 |
| | * 2 | INDEX RANGE SCAN | IND_TEST | 98 | | 3 (0) | 00:00:01 |
-
Predicate Information (identified by operation id):
-access ("OWNER" = 'SCOTT')
As mentioned earlier, accurate data distribution information is collected through histograms.
For the Sys value with poor selectivity, the CBO optimizer uses full table scan to access the data.
For the Scott value with good selectivity, the CBO optimizer uses index scanning to access the data.
What is the real cost of accessing SYS-related data rows by index? here, access to an index scan is performed through hint.
Yumiko_sunny@OA01 > select / * + index (test,ind_test) * / * from test where owner='SYS'
Rows selected.
Execution Plan
Plan hash value: 3856466897
-
| | Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |
-
| | 0 | SELECT STATEMENT | | 529K | 49m | 8885 (1) | 00:01:47 |
| | 1 | TABLE ACCESS BY INDEX ROWID | TEST | 529K | 49m | 8885 (1) | 00:01:47 |
| | * 2 | INDEX RANGE SCAN | IND_TEST | 529K | | 1115 (1) | 00:00:14 |
-
Predicate Information (identified by operation id):
-access ("OWNER" = 'SYS')
As you can see from the above figure, if the information is collected correctly, the real cost of using index scanning is 4 times that of full table scanning.
The above execution plan also illustrates the importance of accurate statistics to the execution plan selection for the CBO optimizer.
View the details of the histogram at this time through the DBA_TAB_HISTOGRAMS view
Yumiko_sunny@OA01 > select TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER
To_char (ENDPOINT_VALUE,'999999999999999999999999999999999999')
As ENDPOINT_VALUE,ENDPOINT_ACTUAL_VALUE
From DBA_TAB_HISTOGRAMS
Where TABLE_NAME='TEST' and COLUMN_NAME='OWNER'
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
-
TEST OWNER 19 41159093808690300000000000
TEST OWNER 61 41186001805076000000000000
TEST OWNER 62 43232584582496500000000000
TEST OWNER 5 37550853140200700000000000
TEST OWNER 5518 43277234965060400000000000
You can see that although ENDPOINT_VALUE collects hash values that are unique to the OWNER column, the real column ENDPOINT_ACTUAL_VALUE is displayed as empty.
After comparing the analyze table for all columns operation, we will see.
Delete the data rows related to the SYS value and observe the changes in histogram statistics.
Yumiko_sunny@OA01 > delete from test where owner='SYS'
Rows deleted.
Yumiko_sunny@OA01 > commit
Commit complete.
Yumiko_sunny@OA01 > select TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER
To_char (ENDPOINT_VALUE,'999999999999999999999999999999999999')
As ENDPOINT_VALUE,ENDPOINT_ACTUAL_VALUE
From DBA_TAB_HISTOGRAMS
Where TABLE_NAME='TEST' and COLUMN_NAME='OWNER'
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
-
TEST OWNER 19 41159093808690300000000000
TEST OWNER 61 41186001805076000000000000
TEST OWNER 62 43232584582496500000000000
TEST OWNER 5 37550853140200700000000000
TEST OWNER 5518 43277234965060400000000000
As you can see, the histogram information does not change automatically for DML operations on table data.
Use dbms_stats.gather_table_stats to collect statistics again, and the histogram is updated, as shown in the following figure:
Yumiko_sunny@OA01 > exec dbms_stats.gather_table_stats ('SCOTT','TEST',cascade= > true)
PL/SQL procedure successfully completed.
Yumiko_sunny@OA01 > select TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER
To_char (ENDPOINT_VALUE,'999999999999999999999999999999999999')
As ENDPOINT_VALUE,ENDPOINT_ACTUAL_VALUE
From DBA_TAB_HISTOGRAMS
Where TABLE_NAME='TEST' and COLUMN_NAME='OWNER'
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
-
TEST OWNER 476 37550853140200700000000000
TEST OWNER 2464 41159093808690300000000000
TEST OWNER 6062 41186001805076000000000000
TEST OWNER 6160 43232584582496500000000000
As explained above, the information of the histogram needs to be collected on a regular basis.
7. An example of the influence of ANALYZE TABLE operation on histogram statistics
This example inherits the contents of the above example, and the database version is consistent.
In the above, it has been demonstrated that analyze table table_name compute statistics cannot collect histogram information for tables.
So what effect does this have on tables that already have a histogram?
First of all, continue the above, execute the operation of analyze table table_name compute statistics again, and observe the change of histogram information.
Yumiko_sunny@OA01 > analyze table test compute statistics
Table analyzed.
Yumiko_sunny@OA01 > select column_name,histogram from dba_tab_col_statistics where table_name='TEST'
COLUMN_NAME HISTOGRAM
--
OWNER NONE
OBJECT_NAME NONE
SUBOBJECT_NAME NONE
OBJECT_ID NONE
DATA_OBJECT_ID NONE
OBJECT_TYPE NONE
CREATED NONE
LAST_DDL_TIME NONE
TIMESTAMP NONE
STATUS NONE
TEMPORARY NONE
COLUMN_NAME HISTOGRAM
--
GENERATED NONE
SECONDARY NONE
NAMESPACE NONE
EDITION_NAME NONE
It is obvious that the histogram information of the OWNER column has disappeared, indicating that this operation will delete the existing histogram information.
If this is a production environment, it may have an unpredictable impact on such a table with data skew columns.
For the operation of analyze table table_name compute statistics for all indexes, I will not demonstrate here, but will directly give the conclusion of the impact on the histogram in the summary below.
Let's take a look at the impact of analyze table table_name compute statistics for all indexes for all columns operations.
Yumiko_sunny@OA01 > analyze table test compute statistics for all indexes for all columns
Table analyzed.
Yumiko_sunny@OA01 > select column_name,histogram from dba_tab_col_statistics where table_name='TEST'
COLUMN_NAME HISTOGRAM
--
OWNER FREQUENCY
OBJECT_NAME HEIGHT BALANCED
SUBOBJECT_NAME NONE
OBJECT_ID HEIGHT BALANCED
DATA_OBJECT_ID HEIGHT BALANCED
OBJECT_TYPE FREQUENCY
CREATED FREQUENCY
LAST_DDL_TIME FREQUENCY
TIMESTAMP FREQUENCY
STATUS FREQUENCY
TEMPORARY FREQUENCY
COLUMN_NAME HISTOGRAM
--
GENERATED FREQUENCY
SECONDARY FREQUENCY
NAMESPACE FREQUENCY
EDITION_NAME NONE
As you can see from the above, when the for all columns clause is executed, not only the histogram information of the index column is collected, but also the histogram information of the non-index column is collected.
Take a look at the details of the DBA_TAB_HISTOGRAMS view at this time.
Yumiko_sunny@OA01 > select TABLE_NAME,COLUMN_NAME,ENDPOINT_NUMBER
To_char (ENDPOINT_VALUE,'999999999999999999999999999999999999')
As ENDPOINT_VALUE,ENDPOINT_ACTUAL_VALUE
From DBA_TAB_HISTOGRAMS
Where TABLE_NAME='TEST' and COLUMN_NAME='OWNER'
TABLE_NAME COLUMN_NAME ENDPOINT_NUMBER ENDPOINT_VALUE ENDPOINT_ACTUAL_VALU
-
TEST OWNER 476 37550853140200700000000000 HR
TEST OWNER 2464 41159093808690300000000000 OE
TEST OWNER 6062 41186001805076000000000000 ORDDATA
TEST OWNER 6160 43232584582496500000000000 SCOTT
As you can see, you can see the information about the true value of the ENDPOINT_ACTUAL_VALUE column on the DBA_TAB_HISTOGRAMS view.
8. Summary histogram can provide accurate data distribution reference for CBO optimizer in order to choose the correct execution plan.
By default, the histogram information obtained by using dbms_stats.gather_table_stats collects only the histogram information of the index column. By default, the histogram information obtained by using dbms_stats.gather_table_stats cannot see the real value in the DBA_TAB_HISTOGRAMS view, but does not affect the choice of the CBO optimizer.
Using analyze table table_name compute statistics for statistical analysis, histogram information is not collected for tables that do not already have histogram information. Using analyze table table_name compute statistics for statistical analysis, for tables that already have histogram information, the previous histogram information is deleted.
Using analyze table table_name compute statistics for all indexes for statistical analysis, histogram information is not collected for tables that do not already have histogram information. Using analyze table table_name compute statistics for all indexes for statistical analysis, for tables that already have histogram information, the previous histogram information is not deleted.
Using analyze table table_name compute statistics for all indexes for all columns for statistical analysis, for tables that do not already have histogram information, histogram information is collected, including indexed and non-indexed columns. Using analyze table table_name compute statistics for all indexes for all columns for statistical analysis, the latest histogram information is collected for tables that already have histogram information.
Finally, at least from the collection of histograms: analyze table table_name compute statistics is not equivalent to analyze table table_name compute statistics for all indexes for all columns.
Use the analyze table table_name compute statistics operation with caution.
Say important things three times!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.