Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Histogram of Oracle

2025-01-15 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

1 the meaning of histogram

In the Oracle database, CBO defaults to assume that the data of the target column is evenly distributed between its minimum LOW_VALUE and maximum HIGH_VALUE, and calculates the selectable rate after applying query conditions to the target column and the Cardinality of the result set according to this uniform distribution principle, and then calculates the cost value and selects the execution plan accordingly. However, the principle that the data of the target column is uniformly distributed is not always correct. In the actual system, we can easily see that the data distribution of some target columns is uneven, even extremely inclined, extremely uneven. If the selectable rate and Cardinality are calculated according to the principle of uniform distribution, and the cost is calculated and the execution plan is selected according to this, then the execution plan chosen by CBO may be unreasonable or even wrong.

Consider an example of CBO choosing the wrong execution plan due to the extremely uneven distribution of data:

Zx@ORCL > create table T1 (a number (5), b varchar2 (5)); Table created.zx@ORCL > declare cnt number (5): = 1; 2 begin 3 loop 4 insert into T1 values (1 values); 5 if cnt=10000 then 6 exit; 7 end if; 8 cnt:=cnt+1; 9 end loop; 10 insert into T1 values (2 Magazine 2'); 11 commit; 12 end; 13 / PL/SQL procedure successfully completed.zx@ORCL > select bcount (*) from T1 group by b B COUNT (*)-1 100002 1zx@ORCL > create index t1_ix_b on T1 (b); Index created.

Collect statistics for table T1 in a way that does not collect histogram statistics:

Zx@ORCL > exec dbms_stats.gather_table_stats (USER,'T1',estimate_percent= > 100 methodological opt = > 'for all columns size 1'); PL/SQL procedure successfully completed.zx@ORCL > select * from T1 where breadwinner 2; A B-2 2zx@ORCL > select * from table (dbms_xplan.display_cursor (null,null,'all')) PLAN_TABLE_OUTPUT- -SQL_ID 5p7b772tpcvm4 Child number 0--select * from T1 where b='2'Plan hash value: 3617692013- -| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |-| 0 | SELECT STATEMENT | | | | 7 (100) | * 1 | TABLE ACCESS FULL | T1 | 5001 | 25005 | 7 (0) | 00:00:01 | -. Omit part of the output

You can see from the execution plan that the execution plan is a full table scan, but it is clear that you should go to the index T1_IX_B. This is because CBO defaults to the view that the data in column B is uniformly distributed, while the distinct values on column B are only 1 and 2, so the selectable rate evaluated by CBO to impose an equivalent query condition on column B is 1 stroke 2, and the Cardinality of the result set that applies the equivalent query condition on column B is 5001:

Zx@ORCL > select round (10001* (1go 2)) from dual;ROUND (10001* (1max 2))-5001

It is precisely because CBO estimates that the Cardinality of the above equivalent query to return the result set is 5001, which already accounts for half of the total number of records in table T1, so CBO thinks that it is no longer appropriate to go to the index T1_IX_B on column B at this time, so he chose the full table scan of column T1. But in fact, CBO's evaluation of the Cardinality of the result set to be returned by the above equivalent query is seriously inconsistent with the fact that the evaluated value is 5001, but in fact it is only 1, several orders of magnitude less.

CBO chose the execution plan here, and the correct execution plan should be index T1_IX_B. The root cause of CBO choosing the wrong execution plan is that the distribution of column B of table T1 is actually extremely uneven (column B has a total of two values, of which 10000 are 1s and only one is 2). The principles used by CBO at the beginning of the evaluation were wrong, and of course the results were wrong.

To solve the above problems, Oracle introduces histogram (Histogram). Histogram is a special kind of column statistics, which describes in detail the data distribution of the target column. The histogram is actually stored in the data dictionary base table HISTGRM$, and the histogram statistics of the table, the partition of the partition table and the sub-partition of the partition table can be viewed through the data dictionaries DBA_TAB_HISTOGRAMS, DBA_PART_HISTOGRAMS and DBA_SUBPART_HISTOGRAMS, respectively.

If the histogram is collected on the target column, it means that CBO will no longer think that the data on the target column is evenly distributed, and CBO will use the histogram statistics on the target column to calculate the selectable rate after applying query conditions to the column and the Cardinality of the returned result set, and then calculate the cost and select the corresponding execution plan.

Also using the above example, after collecting histogram statistics for column B of table T1, CBO correctly evaluates that the Cardinality that returns the result set is not 5001 but 1, and then correctly chooses the execution plan for indexing T1_IX_B:

Zx@ORCL > exec dbms_stats.gather_table_stats (USER,'T1',estimate_percent= > 100 methodological opt = > 'for all columns size auto',cascade= > true); if PL/SQL procedure successfully completed.# clears shared_pool, do not execute zx@ORCL > alter system flush shared_pool;System altered.zx@ORCL > select * from T1 where breadwinner 2' in the production system. A B-2 2zx@ORCL > select * from table (dbms_xplan.display_cursor (null,null,'all')) PLAN_TABLE_OUTPUT- -SQL_ID 5p7b772tpcvm4 Child number 0--select * from T1 where b='2'Plan hash value: 3579362925- -| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |- -| 0 | SELECT STATEMENT | 2 (100) | 1 | TABLE ACCESS BY INDEX ROWID | T1 | 1 | 5 | 2 (0) | 00:00:01 | | * 2 | INDEX RANGE SCAN | T1_IX_B | 1 | | 1 | (0) | 00:00:01 | -. Omit part of the output

Therefore, the histogram is introduced by Oracle specifically to accurately evaluate the selectivity of the target column and the Cardinality of the result set. It describes the data distribution of the target column in detail, and records these distributions in the data dictionary, which is equivalent to directly telling CBO about the data distribution of these columns, so CBO can make a relatively accurate judgment based on this.

2 types of histograms

The histogram in the Oracle database uses a method called Bucket to describe the data distribution of the target column. This is a bit like the Bucket of the hash algorithm, it is actually a logical concept, equivalent to grouping, each Bucket is a group, and each Bucket stores data on one or more target columns. Oracle describes a Bucket in two dimensions, ENDPOINT NUMBER and ENDPOINT VALUE. Oracle records the dimensions ENDPOIONTNUMBER and ENDPOINT VALUE of each Bucket in the data dictionary base table HISTGRM$, thus achieving the goal of recording the histogram statistics of the target column in the data dictionary. The dimensions ENDPOINT NUMBER and ENDPOINT VALUE correspond to the fields ENDPOINT_NUMBER/BUCKET_NUMBER and ENDPOINT_VALUE in the data dictionaries DBA_TAB_HISTOGRAMS, DBA_PART_HISTOGRAMS and DBA_SUBPART_HISTOGRAMS, respectively. At the same time, Oracle also records the total number of Bucket occupied by the histogram statistics of the target column. You can view the total number of Bucket corresponding to the histogram of the target column through the data dictionaries DBA_TAB_COL_STATISTICS, DBA_PART_COL_STATISTICS and the field NUM_BUCKETS in DBA_SUBPART_COL_STATISTICS.

Before Oracle 12c, the histograms in the Oracle database were divided into two types, Frequency and HeightBalanced (there are also histograms of types Top-Frequency and Hybrid in Oracle 12c). Before Oracle 12, if the number of Buckt stored in the data dictionary describing the target column histogram was equal to the number of distinct values of the target column, then this type of histogram was a Frequency type histogram. If the number of Bucket stored in the data dictionary describing the target column histogram is less than the number of distinct values of the target column, then this type of histogram is the Height Balanced type histogram.

2.1histogram of Frequency type

For a histogram of Frequency type, the number of Bucket in the target column histogram is equal to the number of distinct in the target column. In this case, the target column has multiple distinct values, and Oracle stores how many records are stored in the data dictionaries DBA_TAB_HISTOGRAMS, DBA-PART_HISTOGRAMS, and DBA_SUBPART_HISTOGRAMS. Each record does not represent a description of one of the Bucket. The field ENDPOINT_VALUE in the above data dictionary records these distinct values. The field ENDPOINT_NUMBER is a cumulative value. In fact, we can subtract the ENDPOINT_NUMBER value of the previous record from the ENDPOINT_ number value of a record to get the number of records corresponding to the ENDPOINT_ value of the record itself.

In fact, the histogram of the Frequency type records each distinct value of the target column in the data dictionary, and records in the data dictionary how many records each distinct value has in the target table, so that CBO can know very clearly the actual data distribution of the target column in the target table. The collection method corresponding to this Frequency type histogram is not suitable for situations where the target column has a large number of distinct values, so Oracle has the following restrictions on Frequence type histograms: the number of Bucket corresponding to Frequency type histograms cannot exceed 254. note that this restriction will no longer be imposed in Oracle 12c, while in Oracle 12c, the number of Bucket corresponding to Frequency type histograms can exceed 254s. That is, a histogram of type Frequency applies only to situations where the number of distinct values of the target column is less than or equal to 254.

Zx@ORCL > create table h (x number); Table created.zx@ORCL > declare I number;beginfor I in 1.. 3296 loopinsert into h values (1); 5 6 end loop; 7 for i in 1.. 100 loop 8 insert into h values (3); 9 end loop; 10 for i in 1.. 798 loop 11 insert into h values (5); 12 end loop; 13 for i in 1.. 3970 loop 14 insert into h values (7); 15 end loop; 16 for i in 1.. 16293 loop 17 insert into h values (10); 18 end loop 19 for i in 1... 3399 loop 20 insert into h values (16); 21 end loop; 22 for i in 1.. 3651 loop 23 insert into h values (27); 24 end loop; 25 for i in 1.. 3892 loop 26 insert into h values (32); 27 end loop; 28 for i in 1.. 3521 loop 29 insert into h values (39); 30 end loop; 31 for i in 1.. 1080 loop 32 insert into h values (49); 33 end loop; 34 commit; 35 end 36 / PL/SQL procedure successfully completed.zx@ORCL > select count (*) from h; COUNT (*)-40000

According to the definition of the Frequency type histogram, if the Frequency type histogram is collected for column X, there should be 10 records in DBA_TAB_HISTOGRAMS, and the ENDPOINT_VALUE of these 10 records records these 10 distinct values, and the corresponding ENDPOINT_NUMBER is the number of row records accumulated up to this distinct value. The ENDPOINT_VALUE and ENDPOINT_NUMBER of these 10 records can actually be simulated with the following SQL display:

Zx@ORCL > select x as x account count (*) as cardinality,sum (count (*)) over (order by x range unbounded preceding) as cum_cardinality from h group by x X CARDINALITY CUM_CARDINALITY- 1 3296 3296 3 100 3396 5 798 4194 7 3970 8164 10 16293 24457 16 3399 27856 27 3651 31507 32 3892 35399 39 3521 38920 49 1080 4000010 rows selected.

Column X in the above query results simulates the ENDPOINT_VALUE of the 10 records in DBA_TAB_HISTOGRAMS, and column CUM_CARDINALITY simulates the ENDPOINT_NUMBER of the 10 records in DBA_TAB_HISTOGRAMS.

Actually collect histogram statistics for column x of table h

Zx@ORCL > exec dbms_stats.gather_table_stats (ownname= > USER,tabname= > 'Hang Magi methodological opt = >' for columns size auto X-ray Magazine casca = > true,estimate_percent= > 100), PL/SQL procedure successfully completed.

After collecting statistics, it is found that the value of the field HISTOGRAM corresponding to column x in DBA_TAB_COL_STATISTICS is NONE, which indicates that there is still no histogram statistics on column x:

Zx@ORCL > select table_name,column_name,num_distinct,density,num_buckets,histogram from dba_tab_col_statistics where table_name='H' TABLE_NAME COLUMN_NAM NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM--H X 10. 1 1 NONE

This phenomenon is normal. Because Oracle adheres to the principle of automatically collecting histogram statistics, it collects histogram statistics only for those columns that have been used (that is, columns that have appeared in the where condition of the SQL statement). Oracle records the usage of each column in each table in the table SYS.COL_USAGE$, and Oracle queries SYS.COL_USAGE$, when collecting histogram statistics automatically. If you find that there is no usage record for the target column, then it does not collect histogram statistics for the target column. Table H has just been created and column X has not been used in the where condition of the SQL statement, so histogram statistics are not collected for column X here.

The prerequisites for collecting histograms are: 1. The data on the column is unevenly distributed, 2. The column has been used in the where condition of sql

Zx@ORCL > select name,intcol# from sys.col$ where obj# = (select object_id from dba_objects where object_name='H') NAME INTCOL# -- X 1zx@ORCL > select obj# Intcol#,equality_preds from sys.col_usage$ where obj# = (select object_id from dba_objects where object_name='H') No rows selectedzx@ORCL > select count (*) from h where Xero10; COUNT (*)-16293zx@ORCL > select obj#,intcol#,equality_preds from sys.col_usage$ where obj# = (select object_id from dba_objects where object_name='H') OBJ# INTCOL# EQUALITY_PREDS- 88766 1 1

Once again, histogram statistics are automatically collected for column X of table H:

Zx@ORCL > exec dbms_stats.gather_table_stats (ownname= > USER,tabname= > 'Hang Magna methodological opt = >' for columns size auto Xray Magazine cascade` > true,estimate_percent= > 100); PL/SQL procedure successfully completed.zx@ORCL > select table_name,column_name,num_distinct,density,num_buckets,histogram from dba_tab_col_statistics where table_name='H' TABLE_NAME COLUMN_NAM NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM--H X 10. 0000125 10 FREQUENCY

In addition, the value of the field HISTORAM corresponding to column x in DBA_TAB_COL_STATISTICS has changed from NONE to RREQUENCY, which shows that there is a histogram of Frequency type on column X now.

You can see the specific information about the Frequence type histogram of column x from DBA_TAB_HISTOGRAMS:

Zx@ORCL > select table_name,column_name,endpoint_number,endpoint_value from dba_tab_histograms where table_name='H' TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE--H X 3296 1H X 3396 3H X 4194 5H X 8164 7H X 24457 10H X 27856 16H X 31507 27H X 35399 32H X 38920 39H X 40000 4910 rows selected.

As can be seen from the results, the 10 records in DBA_TAB_HISTOGRAMS are exactly the same as those simulated before.

After introducing the meaning of Frequency type histogram, let's discuss the inherent defects of histogram statistics for text type fields in Oracle database.

In the Oracle database, if histogram statistics are collected for fields that are open for text, Oracle will only take out the first 32 bytes (Byte) of the text value of that text field (actually only the first 15 bytes) and convert it to a floating-point number, which is then stored in the above data dictionary as its histogram statistics. The inherent trap of this processing mechanism is that for text-type fields that are more than 32 bytes long, as long as the first 32 bytes of the text value of the corresponding record are the same, Oracle will assume that when collecting histogram statistics, the text value of the record in that field is the same, even if they are not the same. This congenital defect will directly affect CBO's evaluation of the selectivity of relevant text type fields and the Cardinality that returns the result set.

Use the previous test table T1, where column B is a text field

Zx@ORCL > select baccount count (*) from T1 group by bauxite B COUNT (*)-- 1 100002 1zx@ORCL > select count (*) from T1 where breadwinner 1' COUNT (*)-10000zx@ORCL > exec dbms_stats.gather_table_stats (USER,'T1',estimate_percent= > 100 methods = > 'for columns size auto B'); PL/SQL procedure successfully completed.zx@ORCL > select table_name,column_name,num_distinct,density,num_buckets,histogram from dba_tab_col_statistics where table_name='T1' TABLE_NAME COLUMN_NAM NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM--T1 B 2. 000049995 2 FREQUENCYT1 A 2. 5 1 NONE

View histogram details of column B from DBA_TAB_HISTOGRAMS

Zx@ORCL > select table_name,column_name,endpoint_number,endpoint_value from dba_tab_histograms where table_name='T1' TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE--T1 B 10000 2.5442E+35T1 B 10001 2.5961E+35T1 A 0 1T1 A 1 2

As you can see from the results, floating-point numbers are converted from text-based'1' and'2'.

Conversion method:

Select dump ('1century no. 16) from dual

Fill in 0 on the right side of 0x31 to a length of 15 bytes and convert it to a decimal number:

Zx@ORCL > select dump ('1century pr 16) from dual;DUMP (' 1x pr 16)-Typ=96 Len=1: 31zx@ORCL > select to_number ('310000000000000000000000000000000000000000) from dual TO_NUMBER ('3100000000000000000000000000000000000000)-2.5442E+35

The converted value is consistent with the data in the data dictionary.

Create another test table T2 with a text field B with a length of 33 bytes:

Zx@ORCL > create table T2 (b varchar2 (33)); Table created.zx@ORCL > insert into T2 values ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'); 1 row created.zx@ORCL > insert into T2 values (' aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa2'); 1 row created.zx@ORCL > insert into T2 values ('aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa2'); 1 row created.zx@ORCL > commit;Commit complete.

The first 32 bytes of the three records are all the same, all 32 a, but there are two distinct values

Zx@ORCL > select bfocus length (b) from T2 B LENGTH (B)- -aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1 33aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa2 33aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa2 33zx@ORCL > select count (distinct (b)) from T2 COUNT (DISTINCT (B))-2

Use one of the following B to have a record of the usage of column B in SYS.COL_USAGE$:

Select count (*) from T2 where breadth aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa2'

Collect histograms automatically for column B:

Zx@ORCL > exec dbms_stats.gather_table_stats (USER,'T2',estimate_percent= > 100 methodological opt = > 'for columns size auto B'); PL/SQL procedure successfully completed.

Now the field HISTOGRAM corresponding to column B in DBA_TAB_COL_STATISTICS has the value of FREQUENCY (note: 10.2.0.4 and 11.2.0.1 means FREQUENCY,11.2.0.4 is HEIGHT BALANCED), indicating that histogram statistics of Frequency type are already available on column B:

SQL > select table_name,column_name,num_distinct,density,num_buckets,histogram from dba_tab_col_statistics where table_name='T2' TABLE_NAME COLUMN_NAM NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM--T2 B 1. 166666667 1 FREQUENCY

Note that in the above query results, there is only one different distinct for field B of text type, and only one number of Bucket for the histogram of type Frequency, which is obviously not true. In fact, this has shown that for text-type fields that exceed 32 bytes, as long as the first 32 bytes of the text value of the corresponding record are the same, Oracle will assume that the text value of these records in that field is the same when collecting histogram statistics, even if they are not the same in fact.

The specific contents of the histogram statistics of the Frequency type of column B are seen in DBA_TAB_HISTOGRAMS, which further confirms the above conclusion:

SQL > select table_name,column_name,endpoint_number,endpoint_value from dba_tab_histograms where table_name='T2' TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE--T2B 3 5.0563E+35SQL > select dump ('axiomachine 16') from dual DUMP-Typ=96 Len=1: 61SQL > select to_number ('6161616161616161616161616161616161) from dual TO_NUMBER ('616161616161616161616161616161)-5.0563E+35

It is consistent with each other by calculation.

Execute the following sql on table T2

Select count (*) fromt2 where bust aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'

The actual Cardinality of the returned result set is 1

However, from the results of the execution plan, we can see that CBO mistakenly estimated that the Cardinality of the result set returned by the above SQL is 3:

SQL > select count (*) from T2 where breadaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa PLAN_TABLE_OUTPUT- -SQL_ID 3n69wfhjuj4sg Child number 0--select count (*) from T2 where b='aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa1'Plan hash value: 3321871023 Murray- -| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |-| 0 | SELECT STATEMENT | | | 3100 | | 1 | SORT AGGREGATE | | 1 | 34 | * 2 | TABLE ACCESS FULL | T2 | 3 | 102 | 3 (0) | 00:00:01 |-| -

This is because the Frequency type histogram of column B in DBA_TAB_HISTOGRAMS has only one Bucket, which makes Oracle think that there is only one distinct text value in table T2 with 32 'astats, so for the above SQL, Oracle will think that the SQL is going to access all the data in table T2.

2.2 histogram of Height Balanced type

Before the previous introduction to Oracle 12c, the number of Bucket corresponding to a histogram of type Frequence cannot exceed 254. what if the number of distinct values of the target column is greater than 254? At this point, Oracle collects Height Balanced-type histograms for the target column.

Zx@ORCL > create table T1 (id number); Table created.zx@ORCL > begin 2 for i in 1.. 254 loop 3 for j in 1.. I loop 4 insert into T1 values (I); 5 end loop; 6 end loop; 7 commit; 8 end; 9 / PL/SQL procedure successfully completed.# distinction values are 254zx@ORCL > select count (distinct (id) from T1) COUNT (DISTINCT (ID))-25 columns execute a query to make the id column collect histogram information in the where condition zx@ORCL > select * from T1 where id=1; ID- columns zx@ORCL > exec dbms_stats.gather_table_stats (ownname= > USER,tabname= >'T1 methods true,estimate_percent= > 100) PL/SQL procedure successfully completed.zx@ORCL > col table_name for a10zx@ORCL > col column_name for a10zx@ORCL > set linesize 200zx@ORCL > select table_name,column_name,num_distinct,density,num_buckets,histogram from dba_tab_col_statistics where table_name='T1' TABLE_NAME COLUMN_NAM NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM--T1 ID 254.251 FREQUENCYzx@ORCL > select endpoint_value 000015372 Endpoint_number from dba_tab_histograms where owner=user and table_name='T1' ENDPOINT_VALUE ENDPOINT_NUMBER--1 1 2 3 3 6 4 10 5 15. 252 31878 253 32131 254 32385254 rows selected.

You can see from the output that there is already a histogram of type Frequency on the ID column.

Now insert another record with different ID values in table T1, then delete the histogram information on column ID, then column ID column to collect histogram information again, and then query the type of ID column histogram.

Zx@ORCL > insert into T1 values; 1 row created.zx@ORCL > commit;Commit complete.zx@ORCL > select count (distinct id) from T1 prospect count (DISTINCTID)-255zx@ORCL > exec dbms_stats.gather_table_stats (ownname= > USER,tabname= >'T1 id',cascade= > true,estimate_percent= > 100) PL/SQL procedure successfully completed.zx@ORCL > exec dbms_stats.gather_table_stats (ownname= > USER,tabname= >'T1 magic method > > 'for columns size auto id',cascade= > true,estimate_percent= > 100); PL/SQL procedure successfully completed.zx@ORCL > select table_name,column_name,num_distinct,density,num_buckets,histogram from dba_tab_col_statistics where table_name='T1' TABLE_NAME COLUMN_NAM NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM--T1 ID 255. 004243247 254 HEIGHT BALANCED

From the output, we can see that the histogram type on the ID column has changed from Frequency to Height Balanced.

For Height Balanced-type histograms, that is, when the number of Bucket in the target column histogram is less than the number of distinct values of the target column, Oracle will first sort all the records of the target table according to the target column, and then divide the total number of records in the target table by the number of Bucket to be used to determine the number of sorted records that need to be described in each Bucket. Assuming that the total number of records in the target table is M, the number of Bucket to be used is N, and the number of sequenced records that need to be described in each Bucket is O, then O=M/N

Then Oracle records the Bucket number with the ENDPOINT_NUMBER of each record in DBA_TAB_HISTOGRAMS, DBA_PART_HISTOGRAMS, and DBA_SUBPART_HISTOGRAMS, and the Bucket number starts at 0 and goes all the way to N. Among them, the minimum value of the target column is stored in the No. 0 Bucket, so the ENDPOINT_ number value of the record where the No. 0 Bucket is located is 0, and the ENDPOINT_NUMBER of the records where the other Bucket is located increases from 1 to N. except that the ENDPOINT_VALUE value of the record where the No. 0 Bucket is located is the minimum value of the target column, the ENDPOINT_ value of all other records actually stores the maximum value of the target column in all records described by the Bucket described by this record. That is, except for Bucket 0, the ENDPOINT_ value values of all other records are calculated using the following formula:

Finally, Oracle uses a storage-saving technique when storing these ENDPOINT_NUMBER and ENDPOINT_VALUE in the data dictionary: merging records with different values of the adjacent common endpoint _ NUMBER but the same ENDPOINT_ value, and storing only the merged records in the data dictionary. For example, if the ENDPOINT_NUMBER of barrel 2 is 2, the ENDPOINT_VALUE of barrel 3 is 3, and its ENDPOINT_NUMBER is 3, and its ENDPOINT_VALUE is also P, then Oracle will merge the above adjacent records and store only the merged values in the data dictionary. At this time, the ENDPOINT_NUMBER of the merged record is 3PowerEndPoint value is P, that is to say, the ENDPOINT_ number value of the record where the histogram of Height Balanced type in DBA_TAB_HISTOGRAMS, DBA_PART_HISTOGRAMS and DBA_SUBPART_HISTOGRAMS is located may be discontinuous, and the ENDPOINT_VALUE,Oracle of the merged record in the data dictionary is called popular value. Obviously, the greater the difference between the ENDPOINT_ number value of the record where the popular value is located and the ENDPOINT_ number value of its previous record, the greater the percentage of the popular value in the target table and the larger the corresponding Cardinality.

Let's use the previous H table to illustrate the Height Balanced type histogram.

First delete the histogram of the Frequency type that already exists in Table H

Zx@ORCL > exec dbms_stats.gather_table_stats (ownname= > USER,tabname= > 'Hang Magnum methodological opt = >' for columns size 1 X-ray Magazine cascade> true,estimate_percent= > 100), PL/SQL procedure successfully completed.

For histograms of type Height Balanced, the number of Bucket of the target column histogram is less than the number of distinct values of the target column. Table H here has 10 distinct values, and if the number of Bucket is specified as 5 when collecting histogram statistics, then Oracle should collect histograms of type Height Balanced. Here, when collecting histogram statistics, the value of method_opt is specified as' for columns size 5 X', which means that the number of Bucket used is 5 when collecting histograms for column X (note that the number of Bucket here does not include the number of Bucket 0):

Zx@ORCL > exec dbms_stats.gather_table_stats (ownname= > USER,tabname= > 'Hang Magna methodological opt = >' for columns size 5 X-ray Magazine cascade` > true,estimate_percent= > 100); PL/SQL procedure successfully completed.zx@ORCL > select table_name,column_name,num_distinct,density,num_buckets,histogram from dba_tab_col_statistics where table_name='H' TABLE_NAME COLUMN_NAM NUM_DISTINCT DENSITY NUM_BUCKETS HISTOGRAM--H X 10. 085276318 5 HEIGHT BALANCED

From the output, we can see that the value of the field HISTOGRAM for the X column is HEIGHT BALANCED, which indicates that there is already a histogram of type Height Balanced on the X column.

Now follow the algorithm just described to calculate the details of the histogram statistics of the Height Balanced type stored in DBA_TAB_HISTOGRAM.

Now the number of Bucket that needs to be used is 5 (excluding Bucket 0) the total number of records in table H is 40000, so the number of records that need to be described in each Bucket is 40000 Bucket 50008000.

The ENDPOINT_NUMBER of the record where Bucket 0 is 0 is the lowest of the 10 distinct values in table H.

Use the following formula to calculate the ENDPOINT_ value of each Bucket record:

# Bucket1zx@ORCL > select max (x) from (select x from h order by x) where rownumselect max (x) from (select x from h order by x) where rownumselect max (x) from (select x from h order by x) where rownumselect max (x) from (select x from h order by x) where rownumselect max (x) from (select x from h order by x) where rownumselect table_name,column_name,endpoint_number,endpoint_value from dba_tab_histograms where table_name='H' TABLE_NAME COLUMN_NAM ENDPOINT_NUMBER ENDPOINT_VALUE--H X 1 7H X 3 10H X 4 32H X 5 49H X 0 1

You can see that the actual query results are consistent with our analysis.

3 the collection method of histogram

Collecting histogram statistics in an Oracle database is usually achieved by specifying the input parameter METHOD_OPT when calling the stored procedure GATHER_DATABASE_STATS/GATHER_DICTIONARY_STATS/GATHER_SCHEMA_STAS/GATHER_TABLE_STATS in the DBMS_STATS package to collect statistics. Of course, you can also use the ANALYZE command to collect histogram statistics, such as using the command "analyze table h compute statistics forcolumns X" to collect histogram statistics for column X of table H. Because the ANALYZE command is inherently flawed in collecting statistics, we only discuss using the DBMS_STATS package to collect histogram statistics.

The input parameter METHOD_OPT of the above stored procedure in the DBMS_STATS package can accept the following input values:

FOR all [indexes | HIDDEN] COLUMNS [size_clause]

FOR columns [size _ clause] column | attribute [size_clause] [, column | attribution [size _ clause].]

The size_clause must conform to the following format:

SIZE {integer | REPEAT | AUTO | SKEWONLY}

The meaning of the options in the size_clause subname is as follows:

Integer: the number of Bucket for the histogram, which must be in the range of 1 to 254, where 1 means to delete histogram statistics on the target column.

REPEAT: collect histogram statistics only for columns that already have histogram statistics.

AUTO: let Oracle decide whether to collect histogram statistics for the target column and which type of histogram to use.

SKEWONLY: collect histogram statistics only for columns with uneven data distribution.

Use the table EMP under the SCOTT user as an example to illustrate:

Scott@ORCL > desc emp Name Null? Type-- -EMPNO NOT NULL NUMBER (4) ENAME VARCHAR2 (10) JOB VARCHAR2 (9) MGR NUMBER (4) HIREDATE DATE SAL NUMBER (7 dint 2) COMM NUMBER (7) DEPTNO NUMBER (2)

1) collect histogram statistics automatically for all indexed columns of table EMP:

Exec dbms_stats.gather_table_stats (ownname= > 'SCOTT',tabname= >' EMP',method_opt= > 'for all indexed columns size auto')

2) collect histogram statistics automatically for columns EMPNO and DEPTNO on table EMP:

Exec dbms_stats.gather_table_stats (ownname= > 'SCOTT',tabname= >' EMP',method_opt= > 'for columns size auto EMPNO DEPTNO')

3) collect histogram statistics for columns EMPNO and DEPTNO on table EMP, and specify that the number of Bucket is 10:

Exec dbms_stats.gather_table_stats (ownname= > 'SCOTT',tabname= >' EMP',method_opt= > 'for columns size 10 EMPNO DEPTNO')

4) collect histogram statistics for columns EMPNO and DEPTNO on table EMP, and specify that the number of Bucket for column EMPNO is 10 and the number of Bucket for column DEPTNO is 5:

Exec dbms_stats.gather_table_stats (ownname= > 'SCOTT',tabname= >' EMP',method_opt= > 'for columns EMPNO size 10 DEPTNO size 5')

5) delete only the histogram statistics of the EMPNO listed in table EMP:

Execdbms_stats.gather_table_stats (ownname= > 'SCOTT',tabname= >' EMP',method_opt= > 'for columns EMPNO size 1')

6) Delete the histogram statistics of all columns on the table EMP:

Exec dbms_stats.gather_table_stats (ownname= > 'SCOTT',tabname= >' EMP',method_opt= > 'for all columns size 1')

Refer to "SQL Optimization based on Oracle"

Official document: http://docs.oracle.com/cd/E11882_01/server.112/e41573/stats.htm#PFGRF30103

Http://docs.oracle.com/cd/E11882_01/server.112/e41573/stats.htm#PFGRF003

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report