
SQL Optimization: Rewriting Based on SQL Features


Preface

Today, Lao K continues the series with its ninth installment.

Over the weekend, Lao K stayed home and watched the two fascinating man-versus-machine Go matches. Lao K is neither a technology buff nor a Go fan, but the matches left a deep impression: AlphaGo learns Go by continuously playing against itself, and by relying on its superior learning algorithm it reached the top of human Go in just a few months. Lee Sedol, on the other hand, drew on his accumulated experience and uniquely human inspiration to produce the "hand of God", proving that a human could still win a game against AlphaGo despite its enormous computing power. All of this reminded Lao K of the most artistic part of his own work: SQL tuning. On the one hand we must keep learning, accumulating and applying different optimization methods; on the other hand we need imagination and inspiration at the right moment. Only then can we play our own "divine hand" when facing different SQL problems.

Well, the case Lao K shares today is a SQL tuning case, but what Lao K hopes you take away is the optimization method and the way of thinking used during the tuning process, so that, as the saying goes, stones from another hill can polish your own jade. And if you find Lao K's method worthwhile, feel free to forward it and share it with more ORACLE technology enthusiasts around you.

The problem analyzed today is a SQL statement from a customer DBA; it had been bothering him for some time, and he hoped Lao K could help analyze and solve it. Solving the problem itself was not particularly difficult for Lao K, but in the course of the analysis Lao K explored several optimization directions, finally chose the one that works best for both the whole system and this SQL, and the result performed very well in the test environment.

Part 1

State the problem and gather the information

For SQL tuning, Lao K's first focus is on the SQL text, the execution plan and the execution statistics, and of course don't forget the system and database versions.

1.1 Introduction to the environment

Operating system: AIX 6.1

Database: ORACLE 11.2.0.3, two-node RAC

1.2 SQL text
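(The original article showed the statement as a screenshot, which is not reproduced here. Based on the analysis in Part 2, a minimal sketch of its shape might look as follows; the table, column and bind-variable names come from the discussion below, while the inner date column tran_date and the exact predicates are assumptions for illustration only.)

    UPDATE TARGET_BIG_TABLE a
       SET a.chq_pay_name =                             -- part1: a scalar subquery as the SET target
           (SELECT b.chq_pay_name
              FROM SOURCE_SMALL_TABLE b
             WHERE b.ad02_acct_no = a.acct_no           -- correlated to the outermost T table
               AND b.tran_date =
                   (SELECT MAX(c.tran_date)             -- second-level subquery (the SELECT MAX() part),
                      FROM SOURCE_SMALL_TABLE c         -- also correlated to the outermost T table
                     WHERE c.ad02_acct_no = a.acct_no))
     WHERE a.post_date = :V1
       AND EXISTS                                       -- part2: basically the same code as part1
           (SELECT 1
              FROM SOURCE_SMALL_TABLE b
             WHERE b.ad02_acct_no = a.acct_no
               AND b.tran_date =
                   (SELECT MAX(c.tran_date)
                      FROM SOURCE_SMALL_TABLE c
                     WHERE c.ad02_acct_no = a.acct_no));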

1.3 Execution plan
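(The plan screenshot is also not reproduced. From the interpretation in section 2.4 below, its shape was roughly the following; the step Ids, the FILTER operations, the full scans and the 623K row estimate come from the text, while the exact layout is an assumption.)

    Id 0    UPDATE STATEMENT
    Id 1      UPDATE TARGET_BIG_TABLE
    Id 2        FILTER                                          -- part2: steps Id 2-7
    Id 3          TABLE ACCESS FULL TARGET_BIG_TABLE            -- filter POST_DATE=:V1, Rows 623K
    Id 4-7        FILTER / TABLE ACCESS FULL SOURCE_SMALL_TABLE -- first- and second-level subqueries
    Id 8-12     FILTER / TABLE ACCESS FULL SOURCE_SMALL_TABLE   -- part1: the SET scalar subquery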

1.4 Execution statistics

The information is all here. What should we pay attention to? Lao K's experience: identify the features first, then extract the additional information needed according to each feature.

Part 2

Find the features and supplement the information

2.1 SQL text features

> > the SQL code of the update set section (part1) and the exists clause (part2) is basically the same, as shown in the sketch under 1.2

In part1, the result of a scalar subquery is used as the target value of the set column, which means the business logic must guarantee that the subquery returns at most one record.

2.2 Execution plan features

> > the whole execution plan is driven by FILTER operations

> > combining the SQL text with the predicate information, we can see that after the target table TARGET_BIG_TABLE is filtered with POST_DATE=:V1, the number of records returned is estimated at 623K

2.3 Supplementary information: table statistics

> > TARGET_BIG_TABLE is about 2 GB and SOURCE_SMALL_TABLE about 3 MB

> > TARGET_BIG_TABLE has about 2.5 million records. The statistics estimate that 623K records are returned after the POST_DATE filter. Note: this is an estimate, and the actual value changes with the input variable :V1.

> > SOURCE_SMALL_TABLE has about 120,000 records, and the selectivity of the ad02_acct_no column is relatively high

2.4 Supplementary information: interpreting the execution plan

Note: TARGET_BIG_TABLE is abbreviated as the T table; SOURCE_SMALL_TABLE as the S table.

Note: the key to the interpretation is understanding the FILTER in the execution plan.

> > the execution plan splits into two parts: steps Id 2-7 correspond to part2 of the SQL text, and steps Id 8-12 correspond to part1

> > the part2 process: filter the T table by POST_DATE, then iterate each filtered record into the exists subquery (the T-table values are passed into the subquery as bind values). During the execution of the subquery, if the first-level correlation conditions are met, iterate into the second-level subquery (the select max() part) to match.

> > the part1 process: for the result set filtered out by steps 2-7, update the records one by one; the target value of each update is obtained through the same kind of iterative subquery as in steps 2-7

> > every single-table access in the plan is a full table scan

> > as the execution plan shows, the result set is estimated at 623K rows (the Rows column) after the T table is filtered in step 3, and at 1 row after the S table is filtered

> > from this we can estimate the table accesses during execution (Lao K suggests memorizing the following formulas from this installment; let's call them the "access formula" for now):

Table accesses in the filtering process = T table full scan + 623K × (S table full scan + (0 or 1) × S table full scan)

Table accesses in the modification process = number of records to be modified × (S table full scan + (0 or 1) × S table full scan)

Total table accesses = table accesses in the filtering process + table accesses in the modification process

Note: the (0 or 1) × S table full scan term reflects the second-level subquery. If the correlation conditions of the first-level subquery are not met, there is no need to iterate into the second level, i.e. 0 full scans of the S table; otherwise it is 1 full scan. The filtering process therefore requires at least 623K and at most 1246K full scans of the S table; the same holds for the modification process.

2.5 Execution statistics features

> > the average logical reads for a single execution of the SQL are 355,245,774 blocks

> > the average elapsed time for a single execution of the SQL is about 2,000 seconds

> > the average number of records modified per execution is close to 0

Part 3

Think, DBA.

All right, the information gathering is complete, and we now enter Lao K's usual train of thought. For any SQL tuning problem, Lao K asks himself the following three questions, and this case is no exception.

3.1 Lao K's routine questions

> > is this execution plan the optimal one for the current SQL statement? (choose the optimization direction)

> > what kind of execution plan do we want? (determine the optimization goal)

> > how do we get the SQL to produce that execution plan? (achieve the optimization goal)

If you are reading this article, you might pause and think about these three questions yourself, or recall whether you have asked them, and what you do think about, when facing a SQL tuning problem.

Combining the earlier analysis with a moment's thought, Lao K solemnly gave his own answers:

3.2 Lao K's answer: not the optimal plan

Lao K first checked the SQL's historical execution plans; this is the only one ever used, but that alone does not mean it is the optimal plan for this SQL.

In the interpretation of the execution plan, Lao K gave its "access formula". The formula tells us that although the S table is small, it is actually the key to the whole plan: up to 1246K × 2 accesses to it may be needed over the whole process. So, can we improve the access efficiency of the S table? Of course. The estimates in the plan show that each access to the S table returns about 1-2 records (Lao K verified this separately), so the overall selectivity is high; we only need to create an appropriate index to greatly improve the access efficiency of the S table.

Let's briefly estimate how efficient an index would be. Originally, a full scan of the S table requires 3 MB (table size) / 8192 bytes per block ≈ 375 logical reads. With the index, a single access to the S table should need at most (2 index block accesses + 2 data block accesses) = 4 logical reads; so index access costs about 1% of a full scan, and the statement's average logical reads after creating the index should drop to roughly 3.5 million. (As a cross-check: 355,245,774 observed logical reads / 375 blocks per S-table scan ≈ 947K scans, which falls neatly between the 623K minimum and the 1246K maximum predicted by the access formula.)
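For concreteness, the index meant here might be sketched like this (the column follows the selectivity observation in section 2.3; the index name is made up, and whether a date column should be added as a second key is left open):

    -- hypothetical index on the S table's highly selective correlation column
    CREATE INDEX idx_s_ad02_acct_no ON SOURCE_SMALL_TABLE (ad02_acct_no);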

So, is this the execution plan Lao K wants: create the new index and turn the S-table full scans into index scans?

Obviously not. Such a plan is merely an upgraded version of the original one: the process is still iterative, and the time consumed still tracks the amount of data returned in step 3 of the original plan (remember the value 623K? it varies with the input, and the cost may grow linearly with it). So although this plan should perform much better than the original, it is still not the plan Lao K wants.

3.3 Lao K's answer: the plan we want

The SQL text tells us that what the statement really does is associate the T table with the S table through exists. The execution plan Lao K wants joins the two tables with NL or hash join instead of iterating with FILTER, which guarantees that the T table and S table only need to be scanned a few times during execution, thus reducing the SQL's logical reads.

3.4 Lao K's answer: how to generate the desired execution plan

To answer this question, we first need to ask why the SQL does not currently get the plan we want. Are the statistics inaccurate? Is the index design unreasonable? Or is there a column type mismatch?

None of the above!

Let's go back to the SQL statement itself and see what makes it special.

Here we see the crux of the problem. Precisely because the outermost T table is correlated into the second-level subquery, ORACLE cannot automatically rewrite (unnest) the SQL, and so it cannot generate a plan that joins the T table and the S table; it can only generate a FILTER plan.

So the conclusions of this round of thinking are:

> > because of the correlation into the two-level subquery, ORACLE cannot use a JOIN to associate the T table and the S table

> > the statement must be rewritten to generate a better execution plan

> > the rewritten statement must not have the outermost table correlated into a second-level subquery

The last point is the key to our rewrite.

Part 4

Rewrite it, DBA.

In Lao K's experience, rewriting a SQL statement usually requires the rewriter to understand the business behind the SQL well; only by reconstructing the statement around the business characteristics can we improve performance without changing the business logic. For this SQL, however, we already know the root cause of the poor plan, and Lao K believes we can rewrite it effectively using database properties alone, without considering the business.

4.1 A first rewrite (an aside)

Based on the fact that part1 and part2 of the SQL are basically the same, Lao K casually rewrote the SQL as follows (not aimed at the key rewriting point identified earlier, of course).
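(The rewritten statement was also shown as a screenshot; under the same naming assumptions as the sketch in section 1.2, it might look like this:)

    UPDATE TARGET_BIG_TABLE a
       SET a.chq_pay_name =
           NVL((SELECT b.chq_pay_name
                  FROM SOURCE_SMALL_TABLE b
                 WHERE b.ad02_acct_no = a.acct_no
                   AND b.tran_date =
                       (SELECT MAX(c.tran_date)
                          FROM SOURCE_SMALL_TABLE c
                         WHERE c.ad02_acct_no = a.acct_no)),
               a.chq_pay_name)                           -- no match: keep the original value
     WHERE a.post_date = :V1;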

There are several key points in this rewriting:

> > first, the filter condition on the post_date field is kept directly, consistent with the original logic

> > based on the fact that part1 and part2 are basically the same, the nvl function replaces the original exists clause

> > if the select part (similar to the original exists clause) finds the record, the chq_pay_name field is updated with the query result

> > if the select part cannot find the record, the field is updated with its own original value (set chq_pay_name = chq_pay_name), so the record's data is unchanged before and after the update

These points ensure that the rewritten SQL is logically consistent with the original. It is worth noting, though, that the original SQL modifies only a few records, while the new SQL modifies all 623K records, most of which are redundant no-op updates.

Let's take a look at the rewritten SQL execution plan:

It is similar to the original plan, except that the part1 portion of the original plan has disappeared.

With the new execution plan in hand, Lao K asked himself again:

4.2 Is this rewrite really good?

Recall the "access formula" given while parsing the original execution plan:

Total table accesses = table accesses in the filtering process + table accesses in the modification process

Under the new execution plan, because the redundant part1 work is removed, the formula becomes:

Total table accesses = table accesses in the filtering process

In effect, the SQL can now reuse the data produced during the filtering process when it modifies the data.

However, the execution statistics tell us that each execution of this statement modifies very little data; in other words, the "table accesses in the modification process" removed by this rewrite contribute little to the overall cost, so the gain is minimal.

Is there any harm in rewriting in this way?

Yes! The root cause, as noted above, is that the new SQL actually modifies 623K records:

> > holding row locks on a much larger set of rows may block a large number of other sessions performing DML on the table

> > if any modified column is indexed, index maintenance time increases greatly, making the new SQL even less efficient

To sum up, this rewriting is not appropriate for this SQL statement.

However, if the original SQL modified close to 623K records per execution, the benefit of this rewrite would be much higher and its drawbacks would disappear. The rewrite is merely unsuited to this particular business environment (only a few records modified at a time); it is general to a certain extent, which is why Lao K shares it here. What matters most are the ideas and methods used in solving the problem.

4.3 Continuing the rewrite

We analyzed the key rewriting point earlier: the rewritten statement must not have the outermost table correlated into a second-level subquery. Let's rewrite our SQL statement toward this goal.

Additional information before the rewrite:

Once the rewriting idea had taken shape in Lao K's mind, Lao K checked the T table's metadata and confirmed that it has a primary key constraint on the columns ACCT_NO and JRNL_NO.

4.4 Add redundancy

> > add a redundant T-table alias d to the exists clause

> > add join conditions between table d and table a: the jrnl_no and acct_no columns together form the T table's primary key, while the joins on the other redundant columns pave the way for further rewriting

> > table d is not joined to any other table anywhere in the SQL statement

> > since tables d and a are joined on the primary key, for each record in table a exactly one record in d can match the join in the statement

In summary, adding this redundancy does not change the logical relationships of the SQL at all.
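Under the same naming assumptions, a sketch of the exists clause after this step (the new alias d joined to a on the primary key plus a redundant column, while b and c are still correlated to a):

     WHERE a.post_date = :V1
       AND EXISTS
           (SELECT 1
              FROM TARGET_BIG_TABLE d,                   -- the redundant T-table alias d
                   SOURCE_SMALL_TABLE b
             WHERE d.acct_no = a.acct_no                 -- primary key join between d and a
               AND d.jrnl_no = a.jrnl_no
               AND d.post_date = a.post_date             -- a redundant column join paving the way
               AND b.ad02_acct_no = a.acct_no            -- correlations still on a at this step
               AND b.tran_date =
                   (SELECT MAX(c.tran_date)
                      FROM SOURCE_SMALL_TABLE c
                     WHERE c.ad02_acct_no = a.acct_no))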

4.5 Shift the key role:

Based on the redundant equivalence relation from the previous step, all the correlations between a and b, c in the exists clause are replaced with correlations between d and b, c.

4.6 Reduce redundancy:

Because a and d have equal primary key values, their other column values must also be equal; so the join between a and d only needs to keep the primary key columns (keeping the others is also correct, but removing them is cleaner).
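Combining steps 4.4-4.6, the rewritten filtering part might be sketched as follows (all correlations with b and c now go through d, and a appears in the subquery only via its primary key columns):

     WHERE a.post_date = :V1
       AND EXISTS
           (SELECT 1
              FROM TARGET_BIG_TABLE d,
                   SOURCE_SMALL_TABLE b
             WHERE d.acct_no = a.acct_no                 -- only the primary key still joins a
               AND d.jrnl_no = a.jrnl_no
               AND d.post_date = :V1                     -- follows from a.post_date = :V1 and the pk join
               AND b.ad02_acct_no = d.acct_no            -- correlations shifted from a to d
               AND b.tran_date =
                   (SELECT MAX(c.tran_date)
                      FROM SOURCE_SMALL_TABLE c
                     WHERE c.ad02_acct_no = d.acct_no))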

This step-by-step rewrite preserves the logic while ensuring that the outermost T table is no longer correlated into a second-level subquery, so we can expect the execution plan to be close to what Lao K wants:

> > tables b, d, c are joined with hash join in the execution plan

> > after the join completes, the result set VW_SQ_2 is formed through a series of SORT/FILTER steps, where the FILTER part compares values within the result set (i.e. different columns of the same record), which is very efficient

> > finally, VW_SQ_2 joins the outer T table via NL on the primary key columns

With this plan in hand, let's estimate the SQL's "access formula" during execution:

Total table accesses = S table full scan + T table full scan + S table full scan + (rows in VW_SQ_2) × (1 T-table primary key index block + 1 T-table data block)

4.7 Don't forget "set"

Part2 of the rewritten statement is now close to what Lao K expects, and part1 of the original statement is the same as part2, so can we simply rewrite part1 the same way? Obviously not! Generally speaking, such an update statement is conveniently rewritten as a merge into statement. Here we can take advantage of the consistency between the original part1 and part2 and rewrite as follows:

> > the statement is rewritten as a merge into

> > the source of the merge is the same as the rewritten exists clause in the previous step, except that the join with a is extracted into the on clause of the merge statement

> > this way, only the very few records that actually need modification are locked while the rewritten SQL executes
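Under the same naming assumptions, a hedged sketch of the merge into form consistent with the points above:

    MERGE INTO TARGET_BIG_TABLE a
    USING (SELECT d.acct_no, d.jrnl_no, b.chq_pay_name
             FROM TARGET_BIG_TABLE d,
                  SOURCE_SMALL_TABLE b
            WHERE d.post_date = :V1
              AND b.ad02_acct_no = d.acct_no
              AND b.tran_date =
                  (SELECT MAX(c.tran_date)
                     FROM SOURCE_SMALL_TABLE c
                    WHERE c.ad02_acct_no = d.acct_no)) v  -- same source as the rewritten exists clause
       ON (a.acct_no = v.acct_no AND a.jrnl_no = v.jrnl_no)  -- the join with a moved into the ON clause
     WHEN MATCHED THEN
       UPDATE SET a.chq_pay_name = v.chq_pay_name;        -- only matched rows are touched and locked

(As noted in section 2.1, the business logic must guarantee that the source returns at most one row per primary key value, so each target row is matched at most once.)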

The execution plan of this rewrite is similar to that of the previous update form, so Lao K does not list and analyze it separately.

Part 5

A final overview

Finally, let's look at the rewritten statement and its execution plan:

The statement is as follows:

The final execution plan:

Final test results:

In the test environment the rewritten statement was executed twice, modifying 7.5 records on average per execution, taking 4 seconds and about 34,000 logical reads. Careful readers may notice from the final plan that the full table scan of the T table could perhaps also be avoided; for reasons of space and test environment, Lao K does not dig further here. After all, what Lao K shares is the method of SQL tuning; how to avoid a full table scan, and how to estimate the execution efficiency after avoiding it, you can surely work out yourselves. You might as well make your own estimate.

Written at the end

Having read to the end, let's carefully recall what Lao K shared:

> > how to estimate the logical reads of a SQL execution from its execution plan during analysis

> > how adding an index could improve the execution efficiency of the SQL in this case

> > how to rewrite the SQL in this case with NVL, and in which scenarios that rewrite is or is not appropriate

> > how to guide the database to generate the execution plan we want by adding redundant joins

> > how to use the merge syntax to rewrite an update statement

Finally, Lao K stresses once more that what matters most in the SQL tuning process is the train of thought and the way of thinking about the problem. Lao K hopes smart readers draw inspiration from this sharing.

Postscript: after communicating with the application development team to understand the business behind the SQL, Lao K rewrote the SQL once more based on its business characteristics, and execution efficiency improved greatly yet again. Clearly, understanding the business is indeed a very important part of SQL tuning.
