
How Oracle collects optimizer statistics


This article explains how Oracle collects optimizer statistics. It should be a useful reference for anyone interested in the topic; I hope you gain a lot from reading it.

Introduction

The oracle optimizer examines all possible execution plans for a sql statement and selects the one with the lowest cost, where cost represents the resource usage of a particular plan. For the optimizer to accurately judge the cost of an execution plan, it must have information about all the objects (tables and indexes) accessed by the sql statement, as well as information about the system running the statement.

This necessary information is usually called optimizer statistics. Understanding and managing optimizer statistics is the key to optimal sql execution, and knowing when and how to collect them is critical to maintaining acceptable performance.

This article describes when and how to collect statistics in common oracle database scenarios. It covers the following topics:

How to collect statistics

When to collect statistics

Improve the efficiency of collecting statistics

When not to collect statistics

Collect other types of statistics

How to collect statistics

The preferred way to collect statistics in oracle is to use the provided automatic statistics collection job.

Automatic Statistics Collection job

During a predefined maintenance window, the oracle autotask infrastructure collects statistics for all database objects whose statistics are missing or stale. Oracle internally prioritizes the database objects that need statistics, so the objects most in need of updated statistics are processed first.

The automatic statistics collection job is implemented by the dbms_stats.gather_database_stats_job_proc procedure, which uses the same default parameter values as the dbms_stats.gather_*_stats procedures. These defaults are sufficient in most cases, but occasionally they need to be changed.

The default values of the statistics collection parameters can be changed with the dbms_stats.set_*_prefs procedures. For example, statistics are by default considered stale once 10% of the data in a table has changed. If you would rather have a table's statistics treated as stale after only 5% of its data changes, you can modify the stale_percent preference with the dbms_stats.set_table_prefs procedure.

Sys@JINGYONG> begin
  2    dbms_stats.set_table_prefs('SH','SALES','STALE_PERCENT','5');
  3  end;
  4  /

PL/SQL procedure successfully completed.

This call to dbms_stats.set_table_prefs changes the staleness threshold stale_percent of the SH.SALES table to 5%.

Manual statistics collection

If you already have a well-established statistics collection process, or if for some other reason you want to disable automatic statistics collection for your application schemas while keeping the data dictionary statistics maintained, you can use the dbms_stats.set_global_prefs procedure to change the autostats_target preference from auto to oracle.

Sys@JINGYONG> begin
  2    dbms_stats.set_global_prefs('AUTOSTATS_TARGET','ORACLE');
  3  end;
  4  /

PL/SQL procedure successfully completed.

With this setting, the automatic statistics collection job collects only data dictionary statistics.

To collect statistics manually, use the dbms_stats package, not the outdated analyze command. The dbms_stats package provides multiple dbms_stats.gather_*_stats procedures to collect statistics on user schema objects, the data dictionary, and fixed objects. Ideally, all parameters other than the schema name and object name should be left at their default values; the defaults and adaptive parameter settings are sufficient in most cases.

Sys@JINGYONG> begin
  2    dbms_stats.gather_table_stats('SH','SALES');
  3  end;
  4  /

PL/SQL procedure successfully completed.

The two most frequently modified parameters are estimate_percent and method_opt.

ESTIMATE_PERCENT

The most common question about collecting statistics is "what sample size should I use?". The setting that governs this is the estimate_percent parameter of the dbms_stats.gather_*_stats procedures, which determines the percentage of rows used to calculate the statistics. Statistics are most accurate when every row in the table is processed (a 100% sample), but the larger the sample, the longer the collection operation takes. So which sample size provides statistics that are both timely and accurate?

ESTIMATE_PERCENT before oracle11g

In oracle10g, the default value of estimate_percent changed from 100% to auto_sample_size. The purpose of auto_sample_size is to let oracle determine an appropriate sample size for each table, each time statistics are collected, so that the sample size can vary per table while statistics are still gathered in a timely and effective way. This approach works well for most tables, but it is problematic for tables with skewed data: when the data in a table is skewed, the sample size chosen by the auto_sample_size algorithm is usually too small. In that case it is best to specify the estimate_percent parameter manually.

ESTIMATE_PERCENT in oracle11g

oracle11g introduced a new hash-based sampling algorithm that provides accurate statistics while addressing both key concerns, accuracy and speed. Its accuracy is close to that of a 100% sample, but its cost is on the order of a 10% sample. The new algorithm is used only when the estimate_percent parameter of the dbms_stats.gather_*_stats procedures is set to auto_sample_size.

The table below shows the results of gathering statistics on a table with a 1% sample, with auto_sample_size, and with a 100% sample. The first row compares elapsed time; the following rows show the number of distinct values (NDV) computed for the L_ORDERKEY and L_COMMENT columns in each run.

                             1% sample   auto_sample_size   100% sample
---------------------------  ----------  -----------------  ------------
Elapsed time (sec)                  797               1908         18772
NDV for L_ORDERKEY column     225000000          450000000     450000000
NDV for L_COMMENT column        7244885          177499684     181122127

In this case, the new auto_sample_size algorithm ran almost 10 times faster than the 100% sample and only about 2.4 times slower than the 1% sample, while the quality of the statistics it produced was nearly at the level of the 100% sample (not different enough to change the execution plans).

Starting with oracle11g, it is strongly recommended to leave estimate_percent at its default value. If you set estimate_percent manually, even to 100%, the old collection algorithm is used.

METHOD_OPT

By far the most controversial parameter of the dbms_stats.gather_*_stats procedures is method_opt. This parameter controls whether histograms are created while statistics are gathered. Histograms are a special type of column statistic that provides detailed information about the distribution of data in a column, and that is exactly why they are controversial.

Creating histograms lengthens statistics collection and consumes more system resources, but the biggest problems are the interaction between histograms and the bind peeking feature, and their effect on the cardinality estimates for nearly popular values.

Histograms and bind peeking

The adverse interaction between bind peeking and histograms has been largely defused by adaptive cursor sharing, introduced in oracle11g, but its effects can still be felt today. To illustrate how adaptive cursor sharing solves the problem, let's first examine its cause.

Histograms and bind peeking before oracle11g

Prior to oracle11g, when optimizing a statement with bind variables in its where clause, the optimizer peeked at the values of those bind variables on the first execution of the statement (during the hard parse) and chose the execution plan based on those initial values. On subsequent executions no bind peeking takes place (no hard parse occurs), so every later execution uses the plan generated from the bind values seen first, even when the bind values change.

If a histogram exists on a column referenced with a bind variable, it helps the optimizer determine the plan that is optimal for the initial bind values. As a result, the execution plan for the very same statement can differ depending on the bind values supplied at the initial hard parse.

There are two ways to avoid this problem: drop the histograms and stop collecting them, or disable bind peeking. You can decide which approach suits your database better based on whether all of your sql uses bind variables.

Disabling histogram creation

If all sql statements in your environment use bind variables, it is best to drop the existing histograms and disable histogram creation in future statistics collection. Disabling histogram creation ensures that execution plans will not vary with the bind values, and it also shortens statistics collection. Without a histogram, the optimizer assumes the distinct values of a column are evenly distributed across the rows and computes the cardinality estimate from the NDV (number of distinct values) when it peeks at the initial bind values of a sql statement.

You can use the dbms_stats.delete_table_stats procedure to delete the existing statistics, including the histograms.

Sys@JINGYONG> begin
  2    dbms_stats.delete_table_stats('SH','SALES');
  3  end;
  4  /

PL/SQL procedure successfully completed.

Next, you can use the dbms_stats.set_param procedure to change the default value of the method_opt parameter so that no histograms are created in the future. This ensures that neither the dbms_stats.gather_*_stats procedures nor the automatic statistics collection job will collect histograms.

Sys@JINGYONG> begin
  2    dbms_stats.set_param(pname => 'METHOD_OPT', pval => 'FOR ALL COLUMNS SIZE 1');
  3  end;
  4  /

PL/SQL procedure successfully completed.

Finally, use the dbms_stats.gather_table_stats procedure to re-gather statistics on the affected objects.

Note that in oracle11g you can drop an unwanted histogram without deleting all of a column's statistics: call dbms_stats.delete_column_stats with col_stat_type set to HISTOGRAM. You can also use the dbms_stats.set_table_prefs procedure on a single table to disable histogram creation for specific columns only.
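A sketch of the 11g approach just described; the choice of the CUST_ID column is purely illustrative:

begin
  -- drop only the histogram on SH.SALES.CUST_ID; basic column statistics remain
  dbms_stats.delete_column_stats(ownname => 'SH', tabname => 'SALES',
                                 colname => 'CUST_ID',
                                 col_stat_type => 'HISTOGRAM');
  -- stop creating histograms on this table in future collections
  dbms_stats.set_table_prefs('SH', 'SALES', 'METHOD_OPT',
                             'FOR ALL COLUMNS SIZE 1');
end;
/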

Keep in mind that histograms are also used for some join predicates, so deleting them affects the cardinality estimates of those join predicates too. In that situation a safer approach is to disable bind peeking.

Disabling bind peeking

If some sql statements in your environment use bind variables and others use literal values, you should disable bind peeking instead. With bind peeking disabled, the optimizer never peeks at the initial bind values and does not use histograms for cardinality estimation; it assumes the distinct values of the column are evenly distributed across the rows and computes the estimate from the NDV (number of distinct values). Statements that use bind variables therefore get a consistent execution plan, while statements that use literal values can still benefit from histograms and obtain the optimal plan. You can disable bind peeking by setting the hidden parameter _optim_peek_user_binds to false.
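A minimal sketch; note that _optim_peek_user_binds is a hidden parameter, and changing hidden parameters is normally done only under guidance from oracle support:

-- hidden parameters must be quoted; this disables bind peeking instance-wide
alter system set "_optim_peek_user_binds" = false;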

Histograms and bind peeking in oracle11g

In oracle11g the optimizer was enhanced to allow multiple versions of the execution plan for a single sql statement that uses bind variables. This feature, called adaptive cursor sharing, relies on monitoring execution statistics to ensure that each bind value gets the correct execution path.

On the first execution the optimizer peeks at the bind values and chooses the execution plan based on their selectivity, just as in releases before oracle11g. If the optimizer believes the best plan may depend on the bind values (for example, there is a histogram on the column, or the predicate is a range predicate), the cursor is marked bind sensitive. Whenever a cursor is marked bind sensitive, oracle monitors its behavior under different bind values to decide whether a different execution plan is called for.

If a different bind value is used in a subsequent execution, the optimizer initially reuses the same execution plan, because oracle assumes the cursor can be shared. However, the execution statistics for the new bind value are recorded and compared with those of the previous value. If oracle decides that the new bind value causes the operation to process a significantly different amount of data, the statement is hard parsed on its next execution with that value and the cursor is marked bind aware. Each bind-aware cursor is associated with a selectivity range of bind values, so a cursor is shared only when the statement's bind value falls within the range considered shareable.

When yet another new bind value is used, the optimizer looks for an existing cursor whose selectivity range best matches the new value; if it cannot find one, it creates a new cursor. If the plan of the new cursor is the same as that of an existing cursor, the two cursors are merged in the shared pool to save space, and the selectivity range of the remaining cursor is widened to include the new bind value.

By allowing multiple execution plans for a single sql statement, oracle11g removes the negative effect histograms once had on statements using bind variables.

Histograms and nearly popular values

When the optimizer encounters a histogram on a column used by a where-clause predicate, it estimates cardinality from the frequency with which the literal value occurs. For example, suppose there is a height-balanced histogram on the cust_city_id column of the customers table in the sh schema, and a query with the predicate cust_city_id=51806. The optimizer first checks how many histogram buckets have 51806 as their endpoint. Here, 51806 is the endpoint of buckets 136, 137, 138 and 139 (visible in user_histograms). Because a value that is the endpoint of two or more buckets is considered popular, the optimizer uses the following formula to estimate its cardinality:

(number of buckets with this endpoint / total number of buckets) x number of rows in the table

In this case: 4/254 x 55500 = 874.

Sys@JINGYONG> set autotrace traceonly
Sys@JINGYONG> select count(*) from sh.customers where cust_city_id=51806;

Execution Plan
----------------------------------------------------------
Plan hash value: 296924608

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     5 |   382   (3)| 00:00:04 |
|   1 |  SORT AGGREGATE    |           |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |   874 |  4370 |   382   (3)| 00:00:04 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("CUST_CITY_ID"=51806)

However, if the predicate is cust_city_id=52500, a value that is not the endpoint of any bucket, the optimizer uses a different formula. When a value is the endpoint of only one bucket, or of no bucket at all, the estimate becomes:

density x number of rows in the table

The density value is visible in user_tab_col_statistics, but from oracle10.2.0.4 onward the optimizer no longer uses the stored value; it is kept for backward compatibility and is used by oracle9i and earlier releases of oracle10g. If, however, the optimizer_features_enable parameter is set to a version below 10.2.0.4, the density in the view is still used.

Sys@JINGYONG> select column_name, density from dba_tab_col_statistics
  2  where owner='SH' and table_name='CUSTOMERS';

COLUMN_NAME                       DENSITY
------------------------------ ----------
CUST_ID                        .000018018
CUST_FIRST_NAME                .000769231
CUST_LAST_NAME                 .001101322
CUST_GENDER                            .5
CUST_YEAR_OF_BIRTH             .013333333
CUST_MARITAL_STATUS            .090909091
CUST_STREET_ADDRESS            .000019629
CUST_POSTAL_CODE               .001605136
CUST_CITY                      .001612903
CUST_CITY_ID                   .002179391
CUST_STATE_PROVINCE            .006896552
CUST_STATE_PROVINCE_ID         .000009009
COUNTRY_ID                     .000009009
CUST_MAIN_PHONE_NUMBER         .000019608
CUST_INCOME_LEVEL              .083333333
CUST_CREDIT_LIMIT                    .125
CUST_EMAIL                     .000588582
CUST_TOTAL                              1
CUST_TOTAL_ID                  .000009009
CUST_SRC_ID                             0
CUST_EFF_FROM                           1
CUST_EFF_TO                             0
CUST_VALID                             .5

23 rows selected.

Sys@JINGYONG> select column_name, num_buckets, histogram from dba_tab_col_statistics
  2  where owner='SH' and table_name='CUSTOMERS' and column_name='CUST_CITY_ID';

COLUMN_NAME          NUM_BUCKETS HISTOGRAM
-------------------- ----------- ---------------
CUST_CITY_ID                 254 HEIGHT BALANCED

Sys@JINGYONG> show parameter optimizer_features_enable

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_features_enable            string      11.2.0.1

Sys@JINGYONG> set autotrace traceonly
Sys@JINGYONG> select count(*) from sh.customers where cust_city_id=52500;

Execution Plan
----------------------------------------------------------
Plan hash value: 296924608

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     5 |   382   (3)| 00:00:04 |
|   1 |  SORT AGGREGATE    |           |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |    66 |   330 |   382   (3)| 00:00:04 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("CUST_CITY_ID"=52500)

Now set optimizer_features_enable to 10.2.0.3

Sys@JINGYONG> alter session set optimizer_features_enable='10.2.0.3';

Session altered.

Sys@JINGYONG> set autotrace traceonly
Sys@JINGYONG> select count(*) from sh.customers where cust_city_id=52500;

Execution Plan
----------------------------------------------------------
Plan hash value: 296924608

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     5 |   382   (3)| 00:00:04 |
|   1 |  SORT AGGREGATE    |           |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |   121 |   605 |   382   (3)| 00:00:04 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("CUST_CITY_ID"=52500)

The cardinality estimate is now 121: 55500 x .002179391, where .002179391 is the density of the CUST_CITY_ID column.

Nearly popular values, values that are the endpoint of exactly one bucket, are classified as non-popular and use the same formula as non-popular values. For example, if the predicate is cust_city_id=52114, its estimated cardinality is 66 rows, the same as for the non-popular value 52500, yet cust_city_id=52114 actually matches 227 rows:

Sys@JINGYONG> select count(*) from sh.customers where cust_city_id=52114;

Execution Plan
----------------------------------------------------------
Plan hash value: 296924608

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     5 |   382   (3)| 00:00:04 |
|   1 |  SORT AGGREGATE    |           |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |    66 |   330 |   382   (3)| 00:00:04 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("CUST_CITY_ID"=52114)

Sys@JINGYONG> select count(*) from sh.customers where cust_city_id=52114;

  COUNT(*)
----------
       227

The only way to make the optimizer aware of these nearly popular values is to use dynamic sampling. Dynamic sampling collects additional, statement-specific object statistics while a sql statement is being optimized. Here, a dynamic sampling hint is added to the query and the optimizer obtains a much more accurate cardinality estimate:

Sys@JINGYONG> select /*+ dynamic_sampling(a 2) */ count(a.cust_id)
  2  from sh.customers a where a.cust_city_id=52114;

Execution Plan
----------------------------------------------------------
Plan hash value: 296924608

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |     1 |     5 |   382   (3)| 00:00:04 |
|   1 |  SORT AGGREGATE    |           |     1 |     5 |            |          |
|*  2 |   TABLE ACCESS FULL| CUSTOMERS |   246 |   410 |   382   (3)| 00:00:04 |
--------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("A"."CUST_CITY_ID"=52114)

Note
-----
   - dynamic sampling used for this statement (level=2)

Dynamic sampling can thus improve the cardinality estimates for nearly popular values in height-balanced histograms.

The sections above discussed the problems histograms can cause in oracle10g and how to work around them. From oracle11g onward, it is recommended to leave method_opt at its default value and rely on adaptive cursor sharing when collecting statistics.

If you want to set method_opt manually rather than use the default, make sure you create histograms only on the columns that need them. Setting method_opt to for all columns size 254 makes oracle collect a histogram on every single column. That is unnecessary, lengthens statistics collection, wastes system resources, and inflates the space needed to store the statistics.

Also avoid setting method_opt to for all indexed columns size 254: it makes oracle collect a histogram on every indexed column, which again wastes system resources, and it has the side effect of preventing oracle from collecting even basic column statistics on the non-indexed columns.
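If a histogram really is needed, request it only on the columns that need it. A sketch, under the (illustrative) assumption that only CUST_CITY_ID is skewed enough to deserve one:

begin
  -- basic statistics on all columns, histogram only on CUST_CITY_ID
  dbms_stats.gather_table_stats('SH', 'CUSTOMERS',
      method_opt => 'FOR ALL COLUMNS SIZE 1 FOR COLUMNS CUST_CITY_ID SIZE 254');
end;
/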

Pending statistics

When you decide to change the default parameter values of the dbms_stats.gather_*_stats procedures, it is strongly recommended to validate the change before making it on a production system. If you do not have a full test environment, you should use pending statistics. With pending statistics, the gathered statistics are stored in pending tables instead of the usual data dictionary tables, so they can be enabled and tested in a controlled way before being published for system-wide use.

To activate pending statistics collection, use the appropriate dbms_stats.set_*_prefs procedure to change the publish preference from its default, true, to false for the objects you want pending statistics on. The example below enables pending statistics on the sales table in the sh schema and then gathers statistics on the table.

Sys@JINGYONG> begin
  2    dbms_stats.set_table_prefs('SH','SALES','PUBLISH','FALSE');
  3  end;
  4  /

PL/SQL procedure successfully completed.

With publish set to false, pending statistics are enabled. Now gather the object's statistics as usual:

Sys@JINGYONG> begin
  2    dbms_stats.gather_table_stats('SH','SALES');
  3  end;
  4  /

PL/SQL procedure successfully completed.

The gathered pending statistics can be displayed by querying the *_tab_pending_stats views:

Sys@JINGYONG> select * from dba_tab_pending_stats where owner='SH';

OWNER TABLE_NAME PARTITION_NAME  SUBPARTITION_NAME  NUM_ROWS BLOCKS AVG_ROW_LEN SAMPLE_SIZE LAST_ANALYZED
----- ---------- --------------- ------------------ -------- ------ ----------- ----------- -------------
SH    SALES                                           918843   1769          29      918843 2013-12-18 22
SH    SALES      SALES_1995                                0      0           0           0 2013-12-18 22
SH    SALES      SALES_1996                                0      0           0           0 2013-12-18 22
SH    SALES      SALES_H1_1997                             0      0           0           0 2013-12-18 22
SH    SALES      SALES_H2_1997                             0      0           0           0 2013-12-18 22
SH    SALES      SALES_Q1_1998                         43687     90          29       43687 2013-12-18 22
SH    SALES      SALES_Q1_1999                         64186    121          29       64186 2013-12-18 22
SH    SALES      SALES_Q1_2000                         62197    119          29       62197 2013-12-18 22
SH    SALES      SALES_Q1_2001                         60608    119          30       60608 2013-12-18 22
SH    SALES      SALES_Q1_2002                             0      0           0           0 2013-12-18 22
SH    SALES      SALES_Q1_2003                             0      0           0           0 2013-12-18 22
SH    SALES      SALES_Q2_1998                         35758     76          29       35758 2013-12-18 22
SH    SALES      SALES_Q2_1999                         54233    103          29       54233 2013-12-18 22
SH    SALES      SALES_Q2_2000                         55515    109          30       55515 2013-12-18 22
SH    SALES      SALES_Q2_2001                         63292    119          30       63292 2013-12-18 22
SH    SALES      SALES_Q2_2002                             0      0           0           0 2013-12-18 22
SH    SALES      SALES_Q2_2003                             0      0           0           0 2013-12-18 22
SH    SALES      SALES_Q3_1998                         50515     95          29       50515 2013-12-18 22
SH    SALES      SALES_Q3_1999                         67138    120          29       67138 2013-12-18 22
SH    SALES      SALES_Q3_2000                         58950    110          30       58950 2013-12-18 22
SH    SALES      SALES_Q3_2001                         65769    124          29       65769 2013-12-18 22
SH    SALES      SALES_Q3_2002                             0      0           0           0 2013-12-18 22
SH    SALES      SALES_Q3_2003                             0      0           0           0 2013-12-18 22
SH    SALES      SALES_Q4_1998                         48874    108          29       48874 2013-12-18 22
SH    SALES      SALES_Q4_1999                         62388    114          29       62388 2013-12-18 22
SH    SALES      SALES_Q4_2000                         55984    106          30       55984 2013-12-18 22
SH    SALES      SALES_Q4_2001                         69749    136          29       69749 2013-12-18 22
SH    SALES      SALES_Q4_2002                             0      0           0           0 2013-12-18 22
SH    SALES      SALES_Q4_2003                             0      0           0           0 2013-12-18 22

29 rows selected.

You can test pending statistics by setting the initialization parameter optimizer_use_pending_stats to true with an alter session command. Once pending statistics are enabled this way, any sql run in that session uses the new, unpublished statistics; for tables in the workload that have no pending statistics, the optimizer uses the current statistics from the standard data dictionary tables. When you have validated the pending statistics, you can publish them with the dbms_stats.publish_pending_stats procedure.
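A sketch of the test-and-publish cycle just described:

-- in a test session, make the optimizer use the pending statistics
alter session set optimizer_use_pending_stats = true;

-- ... run the representative workload and verify the execution plans ...

-- once validated, publish the pending statistics for the table
begin
  dbms_stats.publish_pending_stats('SH', 'SALES');
end;
/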

When to collect statistics

For the optimizer to select the best execution plan, it must have representative statistics. Representative statistics are not necessarily the most up-to-date statistics, but a set that helps the optimizer determine the correct number of rows to expect at each step of the execution plan.

Automatic Statistics Collection job

Oracle automatically collects statistics for all objects with missing or stale statistics during a predefined maintenance window (10 p.m. to 2 a.m. on weekdays, and 6 a.m. to 2 a.m. on weekends).

You can use Enterprise Manager or use the dbms_scheduler and dbms_auto_task_admin packages to change the maintenance window.

If you already have a well-established statistics collection process, or if you want to disable automatic statistics collection for some other reason, you can disable the task:

Sys@JINGYONG> begin
  2    dbms_auto_task_admin.disable(
  3      client_name => 'auto optimizer stats collection',
  4      operation   => null,
  5      window_name => null);
  6  end;
  7  /

PL/SQL procedure successfully completed.

Manual statistics collection

If you plan to maintain optimizer statistics manually, you need to determine when to collect them.

You can tell when to collect statistics by tracking when they become stale, which usually happens when new data is loaded into the system. If the underlying data does not change significantly, constantly re-gathering statistics is not recommended; it only wastes system resources.

If data is loaded into the system only by a scheduled ETL or ELT job, statistics collection should be scheduled as part of that process. Note that if you use partition exchange loading and want to take advantage of incremental statistics, you need to gather statistics after the exchange has completed.

However, if the system performs a large number of online transactions that each insert only a small amount of data, but those operations happen all day long, you need to determine when the statistics become stale and then trigger a statistics collection job. If you plan to rely on the stale_stats column of user_tab_statistics to decide whether statistics are stale, be aware that it is refreshed only periodically, on roughly a daily basis. If you need more timely information, such as how much DML a table has undergone, query the user_tab_modifications view instead: it shows the insert, update and delete operations performed on each table and whether the table has been truncated, so you can calculate staleness yourself. Note that this information is flushed from memory to the view with some delay; if you need it fully up to date, call the dbms_stats.flush_database_monitoring_info procedure to flush it manually.
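For example, a sketch of checking DML activity on a table after flushing the monitoring data from memory (the SALES table name is illustrative):

begin
  -- push the in-memory DML monitoring data into the dictionary views
  dbms_stats.flush_database_monitoring_info;
end;
/

select table_name, inserts, updates, deletes, truncated
  from user_tab_modifications
 where table_name = 'SALES';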

Preventing out-of-range conditions

Whether you use the automatic statistics collection job or gather statistics manually, if end users start querying newly inserted data before statistics have been gathered, they can get a suboptimal execution plan even though less than 10% of the data has changed and the statistics are not yet formally stale. The most common cause is a where-clause predicate whose value lies beyond the range covered by the column's minimum/maximum statistics. This is commonly known as an out-of-range error.

This is especially common with partitioned tables: a new partition is added to an existing range-partitioned table, rows are inserted into it, and end users query the new data before statistics have been gathered on the new partition. For partitioned tables you can use the dbms_stats.copy_table_stats procedure (available from oracle10.2.0.4) to prevent out-of-range conditions. This procedure copies the statistics of a source partition to the newly created partition, including the statistics of dependent objects: columns, local (partitioned) indexes, and so on. The copied statistics should be treated only as a stopgap until real statistics are gathered on the partition; they are not a substitute for actually collected statistics.

Note that dbms_stats.copy_table_stats normally adjusts only partition statistics, not global or table-level statistics. If you also want the global range of the partitioning column extended when copying statistics, set the flags parameter of dbms_stats.copy_table_stats to 8.
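A sketch of copying statistics to a freshly added partition (the partition names are illustrative):

begin
  -- seed the new partition's statistics from the latest analyzed partition
  dbms_stats.copy_table_stats(
      ownname     => 'SH',
      tabname     => 'SALES',
      srcpartname => 'SALES_Q4_2001',
      dstpartname => 'SALES_Q1_2002');
end;
/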

For non-partitioned tables you can set a column's maximum value manually with the dbms_stats.set_column_stats procedure. In general this approach is not recommended, and it is not a substitute for actually gathering statistics.

Improve the efficiency of collecting statistics

As data volumes grow and maintenance windows shrink, collecting statistics in time becomes ever more important. Oracle provides several ways to speed up statistics collection.

Use parallelism

Statistics can be gathered in parallel in several ways:

Intra-object parallelism

Inter-object parallelism

A combination of intra-object and inter-object parallelism

Intra-object parallelism

Intra-object parallelism is controlled by the degree parameter of the dbms_stats.gather_*_stats procedures, which sets the number of parallel server processes used to gather statistics on one object.

By default, oracle uses the parallel degree attribute recorded for the table in the data dictionary to decide how many parallel server processes to use. Every table in an oracle database has this attribute, and its default value is 1. Setting the degree parameter explicitly for large tables can speed up statistics collection.

You can also set degree to auto_degree, in which case oracle automatically determines the appropriate number of parallel server processes based on the size of the object, ranging from 1 (a serial operation) for small objects to default_degree (parallel_threads_per_cpu x cpu_count) for large objects.

Note that setting degree on a partitioned table means that multiple parallel server processes are used on each partition in turn, not on different partitions at the same time: statistics for the next partition are gathered only after the previous partition has finished.
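A sketch of both options, using SH.SALES as the example table:

begin
  -- fixed degree of parallelism: four parallel server processes
  dbms_stats.gather_table_stats('SH', 'SALES', degree => 4);

  -- or let oracle choose the degree based on the object's size
  dbms_stats.gather_table_stats('SH', 'SALES',
                                degree => dbms_stats.auto_degree);
end;
/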

Inter-object parallelism

oracle11.2.0.2 introduced inter-object parallelism, controlled by the global statistics gathering preference concurrent. When concurrent is set to true, oracle uses the job scheduler and advanced queues to create and manage multiple statistics collection jobs that execute concurrently, making full use of multiple cpus to gather statistics on several tables and (sub)partitions at once and so reducing the overall collection time.
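Enabling it is a one-line preference change (a sketch; the preference exists from oracle11.2.0.2):

begin
  dbms_stats.set_global_prefs('CONCURRENT', 'TRUE');
end;
/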

The maximum number of active concurrent statistics collection jobs is controlled by the job_queue_processes parameter. Its default of 1000 is usually far too high for concurrent statistics collection, especially when parallel execution is also in use. A more efficient value is twice the total number of cpu cores (per node in a rac system). The parameter must be set at the system level (with an alter system command or in the init.ora file), not at the session level (alter session).
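For example, on a hypothetical 8-core server the suggested setting would be:

-- twice the total number of cpu cores, set at the system level
alter system set job_queue_processes = 16;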

Combining intra-object and inter-object parallelism

Each job in a concurrent statistics collection operation can itself run in parallel. Combining concurrent statistics collection with parallel execution can dramatically reduce the time needed to gather statistics.

When using parallel execution as part of concurrent statistics collection, you should disable the parallel_adaptive_multi_user initialization parameter to prevent the parallel jobs from being downgraded to serial execution. This should be done at the system level rather than at the session level:

Sys@JINGYONG> alter system set parallel_adaptive_multi_user=false;

System altered.

Incremental statistics

Gathering statistics on a partitioned table consists of collecting statistics at the table level (global statistics) and at the (sub)partition level. If the incremental preference of a partitioned table is set to true, the granularity parameter of dbms_stats.gather_*_stats includes global, and estimate_percent is set to auto_sample_size, oracle derives the global-level statistics by scanning only the partitions that have been added or modified, instead of the entire table.

Incremental global statistics are computed from a synopsis stored for each partition of the table. A synopsis is metadata about the partition and the statistics of its columns. Aggregating the partition-level statistics and the synopses of all partitions produces accurate global-level statistics, eliminating the need to scan the whole table. When a new partition is added, you only need to gather statistics on that partition; the table-level statistics are computed automatically and accurately from the new partition's synopsis together with the synopses of the existing partitions.

Note that when incremental statistics are enabled, partition-level statistics are not aggregated from subpartition statistics.
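A sketch of enabling incremental statistics on a partitioned table and then gathering, following the preferences named above:

begin
  dbms_stats.set_table_prefs('SH', 'SALES', 'INCREMENTAL', 'TRUE');
  -- with incremental=true, a granularity that includes global, and
  -- auto_sample_size, global statistics are derived from partition synopses
  dbms_stats.gather_table_stats('SH', 'SALES',
      estimate_percent => dbms_stats.auto_sample_size);
end;
/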

When not to collect statistics

Although oracle needs accurate statistics to choose an optimal execution plan, there are situations in which gathering statistics is difficult, expensive, or impossible to do in time, and a sound alternative strategy is needed.

Volatile tables

A volatile table is one whose data volume changes dramatically over time: for example, an order-lines table that is empty at the start of the day, fills up as the day goes on, and is emptied again at the end of the day when the processed orders are deleted.

If you rely on the automatic statistics collection job to maintain statistics on such a table, its statistics will always say the table is empty, because the job runs at night after the table has been emptied, even though the table may hold many thousands of rows during the working day.

For such a table it is best to gather a representative set of statistics during the day and lock them. Locking the statistics prevents the automatic statistics collection job from overwriting them. Alternatively, the optimizer can use dynamic sampling to gather basic statistics on the table while compiling a sql statement. Statistics produced by dynamic sampling are lower-quality and less complete than statistics gathered with the dbms_stats package, but in most cases they are good enough.
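A sketch, using a hypothetical ORDER_LINES table: gather statistics at a representative time of day, then lock them so the nightly job cannot overwrite them:

begin
  -- gather while the table holds a representative amount of data
  dbms_stats.gather_table_stats(user, 'ORDER_LINES');
  -- lock the statistics against the automatic collection job
  dbms_stats.lock_table_stats(user, 'ORDER_LINES');
end;
/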

Global temporary tables

Global temporary tables are often used to store intermediate results in application code. A global temporary table shares its definition among all users with the appropriate privileges, but its data is visible only within the session that inserted it, and no physical storage is allocated until data is inserted. A global temporary table can be transaction-specific (delete rows on commit) or session-specific (preserve rows on commit). Gathering statistics on a transaction-specific temporary table results in the table being emptied. On session-specific global temporary tables, by contrast, statistics can be gathered; the gathered statistics are based only on the session-private data of the gathering session, yet they are used by all sessions that access the table.

If a global temporary table is used in a steady way, with each session holding roughly the same amount of data with similar values, you should gather a representative set of statistics in one session and lock them so that other sessions cannot overwrite them. Note that the automatic statistics collection job does not gather statistics on global temporary tables.

Intermediate work tables

Intermediate work tables are typical of an ELT process or a complex transaction: they are written once, read once, and then truncated or deleted. In such cases the cost of gathering statistics outweighs the benefit, because the statistics would be used at most once; dynamic sampling is more useful here. It is recommended to lock the statistics of intermediate work tables to permanently prevent the automatic statistics collection job from gathering statistics on them.
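A sketch for a hypothetical staging table: remove any existing statistics, then lock the (now absent) statistics so the automatic job leaves the table to dynamic sampling:

begin
  dbms_stats.delete_table_stats(user, 'ETL_STAGING');
  dbms_stats.lock_table_stats(user, 'ETL_STAGING');
end;
/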

Collect other types of statistics

Because the cost-based optimizer is now the only supported optimizer, all tables in the database need statistics. That includes all data dictionary tables (the tables owned by the sys and system users, stored in the system and sysaux tablespaces) and the x$ tables accessed through the dynamic v$ performance views.

Data dictionary statistics

Statistics on the data dictionary tables are gathered by the automatic statistics collection job during the nightly maintenance window. It is strongly recommended to let the automatic job maintain data dictionary statistics even if you turn off automatic statistics collection for your application schemas; you can do that by using the dbms_stats.set_global_prefs procedure to change autostats_target from auto to oracle:

Sys@JINGYONG> begin
  2    dbms_stats.set_global_prefs('AUTOSTATS_TARGET','ORACLE');
  3  end;
  4  /

PL/SQL procedure successfully completed.

Fixed object statistics

The automatic statistics collection job does not gather statistics on fixed objects, and unlike regular tables, the x$ tables referenced in sql statements do not automatically fall back on dynamic sampling when optimizer statistics are missing. If fixed object statistics are missing, the optimizer uses predefined default statistics instead. These defaults may not be representative and can lead to suboptimal execution plans, which can cause serious performance problems across the system. If that is the cause of your performance problems, it is strongly recommended to gather fixed object statistics manually.

You can use the dbms_stats.gather_fixed_objects_stats procedure to gather statistics on fixed objects (a sketch follows the list below). It is important to do this while the system has a representative workload. On large systems gathering fixed object statistics is not always feasible at peak load, because it requires extra resources; if you cannot gather them during a representative workload, gather statistics for the three key types of fixed object tables once the system load has dropped:

Structural data, such as controlfile contents

Session-based data, such as v$session and v$access

Workload data, such as v$sql and v$sql_plan
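A sketch of gathering fixed object statistics, ideally while a representative workload is running:

begin
  -- gathers statistics on the x$ tables behind the v$ views
  dbms_stats.gather_fixed_objects_stats;
end;
/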

It is recommended to re-gather fixed object statistics after a major database or application upgrade, after implementing a new module, or after changing the database configuration. For example, significantly increasing the size of the SGA changes the contents of the x$ tables behind views such as v$buffer_pool and v$shared_pool_advice, which describe the buffer cache and shared pool.

System statistics

System statistics let the optimizer cost each step of an execution plan more accurately by using information about the actual hardware that executes the statement, such as cpu speed and I/O performance. System statistics are enabled by default and are automatically initialized with default values that are representative of most systems.

Thank you for reading this article carefully. I hope "How Oracle collects optimizer statistics" has been helpful to everyone.
