Optimization of SUM and GROUP BY statements 07/13 Update SLTechnology News&Howtos

Optimization of SUM and GROUP BY statements

2025-07-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

I. Optimization of sum and GROUP BY statements:

1. The original statement is:

SELECT IID.INVENTORY_ITEM_ID, SUM (IID.AVAILABLE_TO_PROMISE_DIFF), SUM (IID.QUANTITY_ON_HAND_DIFF), SUM (IID.ACCOUNTING_QUANTITY_DIFF)

FROM BOSENT.INVENTORY_ITEM_DETAIL IID

WHERE ((IID.INVENTORY_ITEM_ID ='?'))

GROUP BY IID.INVENTORY_ITEM_ID

The reason for using GROUP BY in the SQL statement is that the field INVENTORY_ITEM_ID is in the select clause.

Analyze the above SQL statement, INVENTORY_ITEM_ID as a condition in the WHERE clause, its value is known and does not need to be queried from the SQL, so you can remove this field from the select clause and remove the GROUP BY clause. SQL can be rewritten as:

SELECT SUM (IID.AVAILABLE_TO_PROMISE_DIFF), SUM (IID.QUANTITY_ON_HAND_DIFF), SUM (IID.ACCOUNTING_QUANTITY_DIFF)

FROM BOSENT.INVENTORY_ITEM_DETAIL IID

WHERE ((IID.INVENTORY_ITEM_ID ='?'))

two。 Carry out the plan

(1) optimize the execution plan of the previous SQL statement (use 160001 instead of? In order to see the execution plan)

| | 0 | SELECT STATEMENT | | 1 | 19 | 8008 (1) | 00:01:37 |

| | 1 | SORT GROUP BY NOSORT | | 1 | 19 | 8008 (1) | 00:01:37 |

* | 2 | TABLE ACCESS FULL | INVENTORY_ITEM_DETAIL | 670K | 12m | 8008 (1) | 00:01:37 |

-* *

Predicate Information (identified by operation id):

2-filter ("IID". "INVENTORY_ITEM_ID" = '160001')

(2) optimize the execution plan of the pre-SQL statement

| | 0 | SELECT STATEMENT | | 1 | 19 | 8008 (1) | 00:01:37 |

| | 1 | SORT AGGREGATE | | 1 | 19 |

| | * 2 | TABLE ACCESS FULL | INVENTORY_ITEM_DETAIL | 670K | 12m | 8008 (1) | 00:01:37 |

Predicate Information (identified by operation id):

2-filter ("IID". "INVENTORY_ITEM_ID" = '160001')

3. Execution plan analysis

(1) SORT (AGGREGATE)

SORT (AGGREGATE) refers to the execution plan displayed when using statistical functions to operate on all data objects without GROUP BY. General sorting is not performed when using statistical functions such as SUM, COUNT, MIN, MAX, AVG, and so on. Instead, each row of data is summed up while reading the data in the table, and AVG or COUNT is also calculated repeatedly for each row of data read.

SORT AGGREGATE as one of the option of sort is quite special, it does not do sort,SORT AGGREGATE to act on all data set.

So how do MIN and MAX calculate? The method is basically the same as that described above. MIN executes by recording the first value read at the beginning, then comparing the value with the next read value, and replacing it if it is less than that value. If the index can be used when performing this statistical operation, very good results can be obtained. Although a full index scan is shown in the execution plan, in fact, only the first index block is read, and then the scan does not continue (because the index is sorted, so the minimum value of the index column value must be on the first block of the index). MAX, on the other hand, reads only the last index block. To obtain such an execution plan, WHERE and GROUP BY cannot be used, of course, the column to be counted must be at the forefront of the index (or, in the case of a combined index, it must be at the forefront), and no additional operational requirements can be added to SELECT-List.

(2) SORT (GROUP BY)

This operation is the operation of aggregating the rows of data into different groups, that is, the related operations are carried out according to the GROUP BY used in the query statement, and can only be sorted for grouping, so the larger the amount of data needed for the grouping, the higher the cost.

If you want to divide the Hai Meter into thousands of groups through GROUP BY, you can clearly feel the cost of this operation when performing GROUP BY on a very large table. When the amount of data to be sorted exceeds a certain amount, the cost becomes very high, and one way to solve this problem is to use HASH (GROUP BY).

(3) SORT (UNIQUE)

This operation refers to the process of turning the output of a query statement into a unique collection. There are two situations in which this sort occurs, one is the use of "DISTINCT", and the other is that the subquery provides its execution results to the main query in the provider role.

SELECT order_id,order_date

FROM orders

WHERE order_id in (SELECT order_id

FROM order_item

WHERE item_id =: b1

AND order_qty > 100)

Since the results of the main query must exist in the subquery, where the subquery as the "M" collection must be transformed into the "1" collection that does not allow repeating elements, the SORT (UNIQUE) operation is performed. If the subquery is executed after the main query, the sort execution plan is not displayed, and the FILTER is displayed.

4. Another optimization method without modifying programs and SQL statements

Table INVENTORY_ITEM_DETAIL already has an index of a single field of INVENTORY_ITEM_ID. This table has 26 fields, and SUM counts only three of them. By creating a composite index below, performance has been greatly improved.

CREATE INDEX

I_INVENTORY_ITEM_DETAIL_XX

INVENTORY_ITEM_DETAIL

(

INVENTORY_ITEM_ID,AVAILABLE_TO_PROMISE_DIFF,QUANTITY_ON_HAND_DIFF,ACCOUNTING_QUANTITY_DIFF

);

After the verification test, after creating the index, the response time of withdrawal transaction decreased from 4.5s to 2.0s, and the deposit transaction decreased from 2.7s to 1.6s. The effect is obvious. The number of records in table INVENTORY_ITEM_DETAIL in the stress test environment is 2.13 million. The test machine is a virtual machine)

5. The performance is better when the third and fourth optimization methods are used at the same time.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.