SQL optimized statistics and indexes 07/01 Update SLTechnology News&Howtos

SQL optimized statistics and indexes

2025-07-01 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

In most cases, sql optimization scenarios occur when we want our sql statements to run faster, and we tend to take some "steps" to adjust the execution path of sql so that they can finish faster.

When I first joined the dba industry, I often heard my mentor say to me, "20% of sql optimization skills can optimize 80% of sql performance problems." In most cases, sql optimization is not complex, but the key step is whether we can find the problem, and then we can prescribe the right medicine. Today we first learn about sql from two aspects: statistics and indexing in sql optimization.

For the cbo schema, statistics are extremely important, which is an important factor affecting the execution of sql statements. For the database with large data changes, it is easy to have the problem that the statistical information is too old, that is, the table information recorded by the system is inconsistent with the actual table information. Then the optimizer may choose an inappropriate execution path (non-optimal path) when choosing the optimal execution path. Therefore, our front-line operators often need to collect statistical information on a regular basis. So the question is, what is the basis for collecting statistics?

Since we want to collect, we can often use the LAST_ANALYZED field in the dba_tables view to see the date of the last collection and make a pre-judgment as to whether it is necessary to do a statistical collection.

Sys@RAC > alter session set NLS_DATE_FORMAT='YYYY-MM-DD HH24:MI:SS';Session altered. Sys@RAC > select owner,TABLE_NAME,LAST_ANALYZED from dba_tables where table_name='T1' OWNER TABLE_NAME LAST_ANALYZED- TEST T1 2017-10-28 15:08:25

Since statistical information has not been collected for nearly a year, there is a great suspicion that the statistical information of this kind of table is not accurate.

Another point is that when we look at the output dynamic sampling used for thisstatement at the end of the sql execution plan, it shows that dynamic sampling has occurred in this sql execution. Dynamic sampling is a technology invented in order to make the CBO optimizer get enough information to ensure that the correct execution plan is made without analysis of segments (table, index, partition). It can be regarded as a means for the system to collect statistical information automatically. When the object has no statistical information (that is, no analysis), the dynamic sampling technology can abstract the statistical information needed by CBO by directly sampling data blocks from the objects that need to be analyzed.

Dynamic sampling requires additional consumption of database resources, so if the frequency of SQL execution is very high, the consumption of these resources may also have a great impact on the efficiency of sql execution, which requires our attention. In such an environment, dynamic sampling is not suitable. Therefore, in order to prevent the additional consumption of system performance caused by dynamic sampling, we recommend that we regularly collect statistics on missing or old statistical tables in the database.

In both cases, we generally need to collect statistics. So, how to collect it is another problem.

Example:

BEGINDBMS_STATS.GATHER_TABLE_STATS (OWNNAME = > 'xxxxx', TABNAME = >' SR_PROBLEM_xxxx', PARTNAME= > 'P201802', ESTIMATE_PERCENT = > 1, METHOD_OPT = > 'FOR ALL COLUMNS SIZE 1', NO_INVALIDATE= > FALSE FORCE= > TRUE, CASCADE = > TRUE, DEGREE = > 10) END;/

Parameter description:

Ownname: the owner of the table to be analyzed

Tabname: table name to analyze

Partname: the name of the partition, useful only for partition tables or partition indexes

Estimate_percent: percentage of sampling rows, value range [0.000001100]

Method_opt: determines how histograms information is counted. The value of method_opt is as follows (default is FOR ALL COLUMNS SIZE AUTO)

For columns SIZE | REPEAT | AUTO | SKEWONLY: count the value range of histograms,N in the specified column [1254]

Force: collect statistics even if the table is locked

Cascade: collects information about the index. Default is FALSE.

Degree: determines the degree of parallelism. Default is null.

So are there only two situations in which statistics need to be collected? The answer is obviously no. And the statistical information collection is rich in parameters, and this is just an example, which can be used in daily scenarios. In many cases, it needs to be adjusted dynamically according to the actual situation.

Let's take a look at the index. In most scenarios, the reasonable use of the index will improve the performance of the sql statement exponentially, but the index is good, but it must not be overused, because maintaining the index requires a certain amount of resources.

If there are a large number of data records in a data table, when a query with specified conditions is executed on the table, the conventional query method will read out all the records, and then compare each record read with the query conditions. finally, the record that meets the condition is returned. In this way, the time cost of operation and the cost of Ithumb O are very high. In this case, it can be considered to reduce the system overhead by establishing indexes.

If you want to query a specified record in a table, you must traverse the entire table without an index, and after you have an index, you only need to find the index field value in the index that meets the query criteria, and you can quickly find the corresponding record in the table through the ROWID saved in the index. The description of the image is similar to the function of the dictionary directory, which can quickly query the relevant information.

According to the storage mode of index data, indexes can be divided into B-tree index, bitmap index, reverse key index and function-based index, and can be divided into unique index and non-unique index according to the uniqueness of index columns. creating unique indexes can also ensure the uniqueness of the data in related columns. According to the number of index columns, it can be divided into single-column index and compound index.

To build and plan a reasonable index that can speed up sql access efficiency, you should pay attention to the following points:

1. The index should be based on a list of frequent references to WHERE clauses, and if you frequently use a column or columns on a large table as a condition to perform index operations, and the number of rows retrieved is less than 15% of the total number of rows (empirical value), then you can consider establishing a combined index on these columns.

two。 If you often need to sort operations based on a column or several columns, you should index those columns to speed up data sorting.

3. Limit the number of indexes on the table. Indexes are mainly used to speed up queries, but slow down DML operations. The more indexes, the slower the DML operation, which in particular greatly affects the speed of INSERT and DELETE operations. Therefore, when planning an index, you must carefully weigh the requirements of the query and DML.

4. The characteristics of columns that are not suitable for indexing: columns that rarely search; columns with fewer values; columns of type blob.

5. During the creation process, do not interrupt, because it will cause the information of the index already contained in the oracle data dictionary, but there is no actual allocation of segments for the index, resulting in re-establishment of the index and deletion of index errors.

| | author profile |

Li Yujun, Technology Database engineer of Waldorf

Mainly participate in the company's product implementation, testing, maintenance and optimization.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.