How to judge self-correlation query by rewriting scope of analysis function 07/13 Update SLTechnology News&Howtos

How to judge self-correlation query by rewriting scope of analysis function

2025-07-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

The editor will share with you how to use the rewriting scope of the analysis function to judge the self-correlation query. I hope you will get something after reading this article. Let's discuss it together.

Preface

Recently, we have encountered a problem that the inefficiency of a single SQL leads to a high overall running load of the database.

Analyzing and locating the main load of the database is the process caused by this statement is relatively simple, through the AWR report can be relatively easy to complete the location, I will not repeat it here.

Now take a look directly at the SQL statement that caused the performance problem. The corresponding SQL REPORT statistics are as follows:

Stat NameStatement TotalPer Execution% Snap TotalElapsed Time (ms) 363, 741363740.788. 42 CPU Time (ms) 362, 770362770.008. 81Executions1Buffer Gets756756.000.00Disk Reads00.000.00Parse Calls11.000.01Rows50,82550825.00User Imax O Wait Time (ms) 0

Cluster Wait Time (ms) 0Application Wait Time (ms) 0Concurrency Wait Time (ms) 0Invalidations0Version Count1Sharable Mem (KB) 28

From the performance index of SQL, it takes about 6 minutes for a single execution, processes more than 50, 000 records, and the logicality is only 756, which mainly consumes time on CPU. There is a doubt here, however, that logical reading is so low and CPU time is so high, so where is all this CPU consumed? Of course, the answer to this question cannot be found in the statistics of SQL, so let's take a look at the implementation plan of SQL:

IdOperationNameRowsBytesTempSpcCost (% CPU) Time0SELECT STATEMENT

1226 1SORT ORDER BY493793375K3888K1226 (2) 00:00:052HASH JOIN ANTI493793375K2272K401 (3) 00:00:023TABLE ACCESS FULLT_NUM493791687K88 (4) 00:00:014TABLE ACCESS FULLT_NUM493791687K88 (4) 00:00:01

From the point of view of the execution plan, Oracle chose both tables of HASH JOIN ANTI,JOIN to be T_NUM, and both used full table scans, and no index was selected. Only by implementing the plan, we can only get the above conclusion, as to why the index is not selected and why it takes too long to execute, further analysis is needed.

After a simple decryption and rewriting of the original SQL, the SQL text is similar to the following:

SELECT BEGIN, END, ROWID, LENGTH (BEGIN) FROM T_NUM AWHERE NOT EXISTS (SELECT 1FROM T_NUM BWHERE B.BEGIN = A.ENDAND B.ROWID! = A.ROWIDAND LENGTH (B.BEGIN) = LENGTH (A.BEGIN))

If you analyze the SQL statement, you will find that this is a self-associated statement. Under the premise that the length of the BEGIN field is equal, you want to find out which records do not have a BEGIN smaller than the current record BEGIN and a larger END than the current record END.

To put it simply, the records in the table represent the range from the beginning of the BEGIN to the end of the END, so the result you want to get now is to find out which ranges are not included in the range. It is important to note that for the current SQL logic, if there are two records with exactly the same scope, both records will eventually be discarded.

The logic of the business is not particularly complex, but to compare a record with other records, most of the method is self-correlation, and in this self-association, there are both greater than or equal to and less than equal to, and the only association condition equal to comes from the comparison of the length of the range segment BEGIN.

Obviously, if it is a comparison of the range segment itself, the selection is generally good, but if you just compare its length, it is undoubtedly easy to produce a lot of repetition, such as in this example:

SQL > select length (begin), count (*) from t_num group by length (begin) order by 2 desc; LENGTH (BEGIN) COUNT (*)-12 2209611 901113 899914 818616 499 458 417 27

A large number of duplicate data appear in the range of length 11 to 14. In this case, the only equivalent judgment condition LENGTH (BEGIN) is very inefficient. According to this equivalent condition, a record will be associated with nearly 10,000 records and set to more than 20,000 records. Obviously, a large number of practices are consumed in the inefficient connection process.

Take a look at the specific SQL statement, you will find that there is almost no way to build an index, because the selection of LENGTH (BEGIN) is very look-up, while other conditions are unequal queries, the selection will not be good, even if the index is established and the selection index is enforced, the efficiency will not be good.

So if you want to continue to optimize the SQL, there is only one way, and that is to rewrite the SQL. For the self-association query, the best rewriting method is to use the analysis function, whose strong row-level processing ability can obtain the relationship between a record and other records in a scan process, thus eliminating the necessity of self-correlation.

The result of SQL rewriting is as follows:

SELECT BEGIN, OLDEND END, LENGTH (BEGIN) FROM (SELECT BEGIN, OLDEND, END, LENGTH (BEGIN), COUNT (*) OVER (PARTITION BY LENGTH (BEGIN), BEGIN, OLDEND) CN,ROW_NUMBER () OVER (PARTITION BY LENGTH (BEGIN), END ORDER BY BEGIN) RNFROM (SELECT BEGIN, END OLDEND, MAX (END) OVER (PARTITION BY LENGTH (BEGIN) ORDER BY BEGIN, END DESC) ENDFROM T_NUM) WHERE RN = 1AND CN

To put it simply, the inner analysis function MAX is used to determine the value of the maximum END corresponding to each range according to the conditions of BEGIN from small to large and END from large to small. The outer two analysis functions, COUNT, are used to remove completely duplicated records, while ROW_NUMBER is used to obtain the largest range of records (that is, not covered by the scope of other records).

After rewriting, the SQL avoids self-correlation, so there is no performance risk of excessive repetition of the association condition. In a simulated environment, the performance comparison is as follows:

SQL > SELECT BEGIN, END, ROWID, LENGTH (BEGIN) 2 FROM T_NUM A3 WHERE NOT EXISTS (4 SELECT 15 FROM T_NUM B6 WHERE B.BEGIN = A.END8 AND B.ROWID! = A.ROWID9 AND LENGTH (B.BEGIN) = LENGTH (A.BEGIN)) 10; 48344 rows selected. Elapsed: 0000 Elapsed 57.68 Execution Plan-Plan hash value: 2540751655-| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |-| 0 | SELECT STATEMENT | | 48454 | | 1703K | | 275K | 00:00:04 | * 1 | HASH JOIN ANTI | | 48454 | 17024K | 14024K | 275 (1) | 00:00:04 | | 2 | TABLE ACCESS FULL | T_NUM | 48454 | 851K | 68 (0) | 00:00:01 | 3 | TABLE ACCESS FULL | T_NUM | 48454 | 851K | 68 (0) | 00:00:01 |-- Predicate Information (identified by operation id) | ):-1-access (LENGTH (TO_CHAR ("B". "BEGIN")) = LENGTH (TO_CHAR ("A". "BEGIN")) filter ("B". "BEGIN" = "A". "END" AND "B" .ROWID "A" .ROWID) Statistics-0 recursive calls0 db block gets404 consistent gets0 physical reads0 redo size2315794 bytes sent via SQL*Net to client35966 bytes received via SQL*Net from client3224 SQL*Net roundtrips to/from client0 sorts (memory) 0 sorts (disk) 48344 rows processed SQL > SELECT BEGIN OLDEND END, LENGTH (BEGIN) 2 FROM (3 SELECT BEGIN, OLDEND, END, LENGTH (BEGIN), COUNT (*) OVER (PARTITION BY LENGTH (BEGIN), BEGIN, OLDEND) CN,4 ROW_NUMBER () OVER (PARTITION BY LENGTH (BEGIN), END ORDER BY BEGIN) RN5 FROM6 (7 SELECT BEGIN, END OLDEND, MAX (END) OVER (PARTITION BY LENGTH (BEGIN) ORDER BY BEGIN, END DESC) 10) 11 WHERE RN = 112AND CN = 1 48344 rows selected. Elapsed: 00.72 Execution Plan-Plan hash value: 1546715670-| Id | Operation | Name | Rows | Bytes | Cost (% CPU) | Time |-| 0 | SELECT STATEMENT | | 48454 | 2460K | | 00:00:10 | * 1 | VIEW | | 48454 | 2460K | 00:00:10 | | * 2 | WINDOW SORT PUSHED RANK | | 48454 | 1845K | 2480K | 800 (1) | 00:00:10 | 3 | WINDOW BUFFER | 48454 | 1845K | 48454 | 1845K | 00:00:10 | 4 | VIEW | 48454 | 1845K | 311 (1) | 00:00:04 | 5 | WINDOW SORT | 48454 | 662K | 1152K | | 00:00:04 | | 6 | TABLE ACCESS FULL | T_NUM | 48454 | 662K | | 68 (0) | 00:00:01 |-- Predicate Information (identified by operation id):-1-filter ("RN" = 1 AND "CN" = 1) 2-filter (ROW_NUMBER () OVER (PARTITION BY LENGTH (TO_CHAR ("BEGIN") "END" ORDER BY "BEGIN")

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.