Oracle histogram 07/04 Update SLTechnology News&Howtos

Oracle histogram

2025-07-04 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Histogram Overview:

Histogram is a statistical tool, not exclusive to Oracle. Usually, it is shown as a geometric table, which is drawn based on the data collected from the actual environment about the quality distribution of a certain aspect of the managed object. It is usually drawn as a series of connected rectangles with the quantity as the bottom edge and the frequency as the height, so the histogram is also called mass distribution graph in statistics. Histogram is a tool to describe the quality of data distribution in Oracle. It will plot the distribution of data according to the number and frequency of different values in a column, so as to guide the optimizer to make the right choice according to the distribution of the data. In some cases, the distribution of values in the columns of a table will affect the optimizer's decision to use an index or perform a full table scan. This occurs when the value of the where clause has a disproportionate number of values, making a full table scan cheaper than index access. In this case, if there is a reasonable and correct histogram above the filter predicate column of the where clause, it will play a great role in making the right choice for the optimizer, minimizing the execution cost of SQL statements and improving performance. When analyzing a table or index, the histogram is used to record the distribution of data. By obtaining this information, the cost-based optimizer can decide to use indexes that will return a small number of rows, rather than indexes that return many rows based on constraints. The use of the histogram is not restricted by the index, and you can build the histogram on any column of the table. The main reason for constructing a histogram is to help the optimizer make better planning when the data in the table is severely skewed. For example, if one or two values make up most of the data in the table (data skew), the associated index may not help reduce the number of Icano required to satisfy the query. Creating a histogram lets the cost-based optimizer know when it is most appropriate to use an index or when it should return 80% of the records in the table based on the value in the WHERE clause. Histograms are generally recommended in the following situations: 1. When the Where clause refers to a column with an obvious deviation in the column value distribution: when the deviation is so obvious that the value in the WHERE clause will cause the optimizer to choose a different execution plan. At this point, the histogram should be used to help the optimizer correct the execution path. (note: if the query does not reference this column, there is no point in creating a histogram. This error is common, and many DBA will create a bar chart on the deviation column, even if no query references the column. ) 2. When column values lead to incorrect judgment: this usually occurs when multiple table joins, for example, suppose we have a five-item table join with only 10 rows in the result set. Oracle joins tables in such a way that the result set (set cardinality) of the first join is as small as possible. By carrying less load in the intermediate result set, the query will run faster. To minimize intermediate results, the optimizer attempts to evaluate the set cardinality of each result set during the analysis phase performed by SQL. Having a histogram on a deviating column will greatly help the optimizer make the right decision. If the optimizer makes an incorrect judgment on the size of the intermediate result set, it may choose a table join method that is not optimized. So adding a histogram to this column often provides the optimizer with the information it needs to use the best join method.

How to use a histogram:

Create a histogram for all index fields of the table:

Exec dbms_stats.gather_table_stats ('HDC_DAG','EB_JBXX',method_opt = >' for all indexed columns size 254 cascade = > true)

Create a histogram for the specified field:

Exec dbms_stats.gather_table_stats ('HDC_DAG','EB_JBXX',method_opt = >' for columns size 100 xb',cascade = > true)

Exec dbms_stats.gather_table_stats ('HDC_DAG','DA_GRJBXX', estimate_percent = > dbms_stats.auto_sample_size,method_opt = >' for columns xm size auto', cascade = > true)

Delete the histogram:

Exec dbms_stats.gather_table_stats ('user', 'table', cascade= > false, method_opt= > 'for columns column size 1')

Exec dbms_stats.delete_column_stats ('HDC_DAG','EB_JBXX','JLLYXTBH',col_stat_type = >' HISTOGRAM')

View the histogram bucket (bucket) information for the column:

Select * from user_histograms t where t.TABLE_NAME = 'EB_JBXX' and t.COLUMN_NAME =' XM'

The analysis of histogram is based on the analysis of index.

Considerations for creating a histogram:

Size 10 specifies the number of bucket needed for the histogram. The so-called bucket can be understood as a container for storing data. This container will average the data into each bucket as far as possible according to the distribution of the data. For example, if a table has 6000 records, there will be an average of 600records in each bucket, but this is only an average, and the number of records in each bucket will not be equal. It will concentrate the high-frequency occurrence records in some buckets, and the low-frequency records will be stored in a small number of barrels, so if the number of buckets (bucket) is increased appropriately, the number of barrels used in the high-frequency records will be reduced, and the statistical results will be more accurate (it can avoid being forced to store the low-frequency records in the high-frequency barrels, affecting the optimizer to generate an accurate execution plan). So the accuracy of the histogram information we get is determined by two values, one is the number of BUCTET, the other is the number of NUM_DISTINCT. So when creating a histogram, you should first estimate the number of buckets (bucket) correctly. By default, the histogram of Oracle produces 75 buckets. The value of SIZE can be specified between 1 and 254.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.