Microclass sql optimization (5), statistics collection (3)-about default sampling rate 07/11 Update SLTechnology News&Howtos

Microclass sql optimization (5), statistics collection (3)-about default sampling rate

2025-07-11 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

1. The default values for collecting statistics about the default sampling rate may not be the same on different versions of Oracle, for example: ESTIMATE_PERCENT: default: 9i 10g:DBMS_STATS.AUTO_SAMPLE_SIZE (using a very small estimated percentage) 11g:DBMS_STATS.AUTO_SAMPLE_SIZE (using a larger estimated percentage) METHOD_OPT: default: 9i: "FOR ALL COLUMNS SIZE 1", in fact, there are no detailed column statistics. 10g and 11g: "FOR ALL COLUMNS SIZE AUTO"-this setting means that DBMS_STATS decides which columns to add histograms to make a better plan. In 11g, use the default value of 100% of ESTIMATE_PERCENT to collect statistics as accurately as possible. The time limit for collecting statistics is that in previous versions, 100% of samples were impossible, but 11g implemented a new hashing algorithm to calculate statistics instead of sorting (in 9i and 10g, the "slow" part is usually sorted). This significantly improves collection time and resource usage. 2. What statistics are collected and what is the use of table statistics NUM_ROWS: NUM_ROWS is the basis for calculating cardinality. If the table is a nested loop join (nested loops join) driven table, the number of rows of the table is particularly important, and the number of rows of the driven table determines the number of scans of the driven table. BLOCKS: the number of blocks used by the table. This parameter is used with DB_FILE_MULTIBLOCK_READ_COUNT to calculate the number of IO required for a full table scan. AVG_ROW_LEN: average size recorded in the table bytes EMPTY_BLOCKS: the amount of unused empty block data, the number of blocks used and the high water level. (ANALYZE) AVG_SPACE: average free space bytes (ANALYZE) CHAIN_CNT: row migration or line link

Index statistics

If you need to use an index, CBO needs the leading column of the index to be used in the query condition, and CBO prefers to use a full index scan or a full table scan if there are no query conditions or if the index column contains all the query results. BLEVEL: the height from the root node to the leaf node is used to calculate the cost of finding the leaf node. 0 indicates that the root block is the same as the leaf block. LEAF_BLOCKS: used to calculate the cost of full index scane and index range scan. CLUSTERING_FACTOR: calculates the degree of order of records in the table based on the values of the index. If the value is close to the number of blocks, the records in the table are stored sequentially. In this case, index entries in a single leaf block tend to point to rows in the same data block. If the value is close to the number of rows in the table, the record is stored out of order. In this case, it is unlikely that the same leaf block index entry points to rows in the same data block. DISTINCT_KEYS: number of different index values. In primary key constraints and unique indexes, this value is equal to the number of table records. AVG_LEAF_BLOCKS_PER_KEY: the average number of leaf blocks appears in the index at each different value, rounded to the nearest integer. Implement unique index and primary key constraints, which are always 1 (one). The average number of index values that appear at different leaf nodes, which is always the 1 in unique indexes and primary key constraints. AVG_DATA_BLOCKS_PER_KEY: the average number of data blocks in the table that point to the same index value. NUM_ROWS: number of rows in the index

Column statistics

NUM_DISTINCT: calculate the selectivity. (example: 1/NDV) LOW_VALUE and HIGH_VALUE: for all data types, CBO assumes a uniform distribution between high and low values. These values are used to calculate the selectivity of range scans. NUM_NULLS: for columns with null values, when using is null and is not null queries, calculate selective DENSITY: uncollected histogram Density = 1 / Number of distinct NON null values The number of distinct NON-null values for a column (COL1) on table TABLE1 can be obtained as follows: select distinct count (COL1) from TABLE1 where COL1 is not null; collect histogram A non-popular value is one that does not span multiple bucket end points. A popular value is one that spans multiple end points. For non-popular values the density is calculated as the number of non-popular values divided by the total number of values. Formula: Density = Number of non-popular values-- total number of values We only use the density statistic for non-popular values. Popular values calculate the selectivity of a particular column values by using histograms as follows: The Selectivity for popular values is calculated as the number of end points spanned by that value divided by the total number of end points. Formula: Selectivity = Number of end points spanned by this value-- total number of end points3, exercise 3Number of end points spanned by this value 10g and 11g automatic sampling rate test exec DBMS_STATS.DELETE_TABLE_STATS ('HT','A_AMT'); exec dbms_stats.gather_table_stats (' HT','A_AMT') Set line 200 col owner for a10 col table_name for a20 col LAST_ANALYZED for A30 SELECT owner, table_name, num_rows, sample_size, trunc (sample_size / num_rows * 100) estimate_percent, to_char (LAST_ANALYZED,'yyyy-mm-dd hh34:mi:ss') LAST_ANALYZED FROM DBA_TAB_STATISTICS WHERE owner='HT' and table_name='A_AMT'; oracle 10g test results SQL > exec dbms_stats.gather_table_stats ('HT','A_AMT'); PL/SQL procedure successfully completed. SQL > set line 200 SQL > col owner for a10 col table_name for a20 col LAST_ANALYZED for A30 SELECT owner, table_name, num_rows, sample_size, trunc (sample_size / num_rows * 100) estimate_percent, to_char (LAST_ANALYZED,'yyyy-mm-dd hh34:mi:ss') LAST_ANALYZED FROM DBA_TAB_STATISTICS WHERE owner='HT' and table_name='A_AMT' SQL > 2 3 4 5 6 7 8 OWNER TABLE_NAME NUM_ROWS SAMPLE_SIZE ESTIMATE_PERCENT LAST_ANALYZED- -HT A_AMT 59968 59968 2017-09-22 02:07:47 SQL >! date Fri Sep 22 02:08:00 CST 2017 SQL > select version from v$instance VERSION-10.2.0.5.0 SQL > oracle 11g test results OWNER TABLE_NAME NUM_ROWS SAMPLE_SIZE ESTIMATE_PERCENT- -HT A_AMT 59968 59968 1004, Summary 1. Oracle 10g may use a very small estimated percentage Manual collection of 100% or at least 30% is recommended. 2. In 11g, use the default value of 100% of ESTIMATE_PERCENT to collect statistics as accurately as possible. -- 11g:DBMS_STATS.AUTO_SAMPLE_SIZE (using a larger estimated percentage-100%) 3, histogram collection: METHOD_OPT: default, 10g and 11g: "FOR ALL COLUMNS SIZE AUTO"-this setting means that DBMS_STATS decides which columns to add the histogram to make a better plan. 4. Understand the collection of statistical information: table statistics, index statistics, column statistics, the content and function of each index.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.