Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Oracle 11g collects multiple columns of statistical information in detail

2025-04-01 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

Preface

Typically, when we submit a SQL statement to an Oracle database, Oracle chooses the best way to execute it, which is done by querying the optimizer Query Optimizer. CBO (Cost-Based Optimizer) is the query optimizer mode that Oracle uses by default. In CBO, the generation of SQL execution plan is a goal-oriented implementation plan exploration process to find the optimal cost (Cost). The so-called Cost is a quantitative indicator that integrates CPU and IO consumption, and the cost of each execution plan is the numerical value estimated by the internal formula of the optimizer.

When we write SQL statements, we often encounter the situation that there are multiple conditions after the where clause, that is, we filter the data according to the conditions of multiple columns. By default, oracle multiplies the selectivity of multiple columns to get the selectivity of the where statement, which may cause the selection rate to be inaccurate and cause the optimizer to make a wrong judgment. In order to enable the optimizer to make an accurate judgment and generate an accurate execution plan, oracle introduced the collection of multi-column statistics in the 11g database. Through the multi-conditional query of the test table, this paper introduces the importance of collecting multi-column statistical information.

I. Environmental preparation

We carried out experiments in Oracle 11g.

SQL > SQL > select * from v$version BANNER----Oracle Database 11g Enterprise Edition Release 11.2.0.3.0-ProductionPL/SQL Release 11.2.0.3.0-ProductionCORE 11.2.0.3. 0 ProductionTNS for Linux: Version 11.2.0.3.0-ProductionNLSRTL Version 11.2.0.3.0-ProductionSQL >

Create the test table hoegh under the hr user, and insert the data repeatedly, the amount of data is equivalent to 16 employees tables (the total number of rows is 1712-107-16).

SQL > SQL > conn hr/hrConnected.SQL > SQL > create table hoegh as select * from employees;Table created.SQL > select count (*) from hoegh; COUNT (*)-107SQL > SQL > insert into hoegh select * from hoegh;107 rows created.SQL > / 214rows created.SQL > / 428rows created.SQL > / 856 rows created.SQL > commit;Commit complete.SQL > select count (*) from hoegh; COUNT (*)-1712SQL >

Second, collect statistical information in accordance with conventional methods

SQL > SQL > exec dbms_stats.gather_table_stats (\'HR\',\ 'HOEGH\'); PL/SQL procedure successfully completed.SQL >

View the execution plan of a where statement that executes a single condition

SQL > SQL > explain plan for select * from hoegh where employee_id=110;Explained.SQL > select * from table (dbms_xplan.display) PLAN_TABLE_OUTPUT----Plan hash value: 774871165UV- -| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |-- -| 0 | SELECT STATEMENT | | 16 | 1104 | 8 (0) | 00:00:01 | | * 1 | TABLE ACCESS FULL | HOEGH | 16 | 1104 | 8 (0) | 00:00:01 |- -Predicate Information (identified by operation id):-PLAN_TABLE_OUTPUT-- -1-filter (\ "EMPLOYEE_ID\" = 110) 13 rows selected.SQL >

You can see from the execution plan that 16 rows of records are returned, and there is no problem with the result. But where did this 16 come from? first, we need to understand how the selectivity and the number of rows returned are calculated:

Selection rate (selectivity) = 1 / unique value in this case

Number of rows returned = selection rate (selectivity) * Total number of table records

That is to say, in this query, the selection rate = 1max 107 and the number of rows returned = 1max 107 171216

4. View the execution plan of the where statement that executes two conditions

SQL > SQL > explain plan for select * from hoegh where employee_id=110 and email=\ 'JCHEN\'; Explained.SQL > SQL > select * from table (dbms_xplan.display) PLAN_TABLE_OUTPUT----Plan hash value: 774871165UV- -| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |-- -| 0 | SELECT STATEMENT | | 1 | 69 | 8 (0) | 00:00:01 | | * 1 | TABLE ACCESS FULL | HOEGH | 1 | 69 | 8 (0) | 00:00:01 |- -Predicate Information (identified by operation id):-PLAN_TABLE_OUTPUT-- -1-filter (\ "EMPLOYEE_ID\" = 110 AND\ "EMAIL\" =\ 'JCHEN\') 13 rows selected.SQL >

From the execution plan, you can see that one row of records has been returned, and what is the fact? Let's execute this sql statement.

SQL > select count (*) from hoegh where employee_id=110 and email=\ 'JCHEN\'; COUNT (*)-16SQL >

From this, we can see that the test table hoegh has 16 rows of data that meet the query criteria, while the execution plan prompts only 1 row, which is an error. What's going on? there's something wrong with the selectivity that we mentioned at the beginning.

In this multi-column conditional query statement, the selection rate = 1 from hoegh where employee_id=110 and email= 107, the number of rows returned = 1 SQL 107, the number of rows returned is 107 SQL > exec dbms_stats.gather_table_stats (\'HR\',\ 'HOEGH\', method_opt= >\ 'for columns (employee_id,email)\'); PL/SQL procedure successfully completed.SQL > SQL > explain plan for select * from hoegh where employee_id=110 and email=\ 'JCHEN\'; Explained.SQL > select * from table (dbms_xplan.display) PLAN_TABLE_OUTPUT----Plan hash value: 774871165UV- -| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |-- -| 0 | SELECT STATEMENT | | 16 | 1152 | 8 (0) | 00:00:01 | | * 1 | TABLE ACCESS FULL | HOEGH | 16 | 1152 | 8 (0) | 00:00:01 |- -Predicate Information (identified by operation id):-PLAN_TABLE_OUTPUT-- -1-filter (\ "EMPLOYEE_ID\" = 110 AND\ "EMAIL\" =\ 'JCHEN\') 13 rows selected.SQL >

From the result of the execution plan, after collecting multi-column statistics in the same sql query statement, the selectivity of Oracle changes from error to right, this is because the two conditions in the sql statement are related, that is, employee_id and email are unique in the employees table, and both can uniquely identify a row of records. Before collecting multi-column statistics, Oracle did not know that the two query conditions were related, so when calculating the selection rate (selectivity), it simply adopted the multiplication method.

Summary

The above is the whole content of this article, I hope that the content of this article can bring some help to your study or work, if you have any questions, you can leave a message and exchange, thank you for your support.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report