Sorting, grouping, and set operations


1 Sort operations

Sorting is one of the most fundamental operations a computer performs, and database processing is no exception. Operations that may require Oracle to sort data include the following (a couple of these are sketched right after the list):

(1) Creating an index

(2) Grouping or de-duplicating data with the GROUP BY, UNIQUE, or DISTINCT keywords

(3) Returning data in a requested order because of an ORDER BY clause

(4) Joining tables or result sets with a sort-merge join

(5) Performing the set operations UNION, INTERSECT, or MINUS

(6) Executing certain subqueries
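
For example, each of the following statements includes a sort or equivalent de-duplication step in its plan (a sketch against the same customers and sales tables used later in this article; the exact plan operation depends on version and statistics):

SQL> select distinct cust_year_of_birth from customers;   -- DISTINCT: HASH UNIQUE or SORT UNIQUE
SQL> select cust_id from customers
     minus
     select cust_id from sales;                            -- set operation: SORT UNIQUE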

Sorts can consume considerable resources:

CPU is always consumed; the amount of CPU required is roughly proportional to the size of the result set being sorted.

Oracle allocates a block of memory for each sort from the Program Global Area (PGA). The total amount of PGA memory available usually depends on the value of MEMORY_TARGET or PGA_AGGREGATE_TARGET.
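
A quick way to check the relevant settings and current PGA usage (a sketch; the v$pgastat statistic names shown are as in recent Oracle versions and may vary slightly):

SQL> show parameter pga_aggregate_target
SQL> show parameter memory_target
SQL> select name, round(value/1024/1024) mb
       from v$pgastat
      where name in ('aggregate PGA target parameter', 'total PGA allocated');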

If that memory is insufficient to complete the sort, Oracle allocates one or more temporary segments in the temporary tablespace. This is called a disk sort. Disk sorts add overhead: the IO of allocating the temporary segments, writing the sort data out to the temporary tablespace, and reading it back from disk.

1.1 Optimal, one-pass, and multi-pass sorts

The amount of memory available for sorting is the most important factor determining sort performance. If there is enough memory for the entire sort to be done in memory, performance is best; Oracle calls this an optimal sort.

If there is not enough memory, Oracle must write to and read from temporary segments while sorting. In a one-pass sort, Oracle only needs to write the sort data (and then read it back) once. In a multi-pass sort, the data must be written and read back multiple times. The more passes required, the more IO the sort involves and the worse it performs.

The IO required grows sharply with the number of passes, and the number of passes eventually becomes the dominant factor in the SQL's performance.
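
Instance-wide counts of optimal, one-pass, and multi-pass work area executions can be checked in v$sysstat (a quick sketch; these statistic names exist in recent Oracle versions):

SQL> select name, value
       from v$sysstat
      where name like 'workarea executions%';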

1.2 Measuring sort activity

Now let's look at how sort activity can be measured.

The optimizer calculates the amount of sort memory it expects to need; you can see this estimate in the TempSpc column of the DBMS_XPLAN output, even before the SQL is executed:

SQL> explain plan for select * from customers order by cust_last_name, cust_first_name, cust_year_of_birth;

Explained.

SQL> select * from table(dbms_xplan.display());

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------
Plan hash value: 2792773903

--------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | TempSpc | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           | 55500 |  9810K|         |  2609   (1)| 00:00:02 |
|   1 |  SORT ORDER BY     |           | 55500 |  9810K|     12M |  2609   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS | 55500 |  9810K|         |   405   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

9 rows selected.

The value generated by DBMS_XPLAN is an estimate, but it gives you a general idea of the memory you need.

The V$SQL_WORKAREA view contains detailed statistics about the memory usage and sort behavior of SQL statements that have executed in the database, as in the following query:

WITH sql_workarea AS
     (SELECT sql_id || '-' || child_number sql_id_child,
             operation_type operation,
             last_execution last_exec,
             ROUND(active_time / 1000000, 2) seconds,
             optimal_executions || '/' || onepass_executions || '/' ||
                multipasses_executions olm,
             SUBSTR(sql_text, 1, 155) sql_text,
             RANK() OVER (ORDER BY active_time DESC) ranking
        FROM v$sql_workarea
        JOIN v$sql
       USING (sql_id, child_number))
SELECT sql_id_child "SQL_ID-CHILD",
       seconds,
       operation,
       last_exec,
       olm "O/1/M",
       sql_text
  FROM sql_workarea
 WHERE ranking <= 10;

1.3 Tracing sort activity

Event 10032 writes detailed sort statistics to a trace file. It can be enabled for the current session with oradebug:

SQL> oradebug event 10032 trace name context forever, level 12

The resulting trace file contains output such as the following:

---- Sort Parameters ------------------------------
sort_area_size                    12691456
sort_area_retained_size           12691456
sort_multiblock_read_count        1
max intermediate merge width      773

*** 2014-12-15 13:14:36.765

sorcls: sorp 0x7fd8df3ea1c8

---- Sort Statistics ------------------------------
Input records                             55500
Output records                            55500
Total number of comparisons performed     656484
  Comparisons performed by in-memory sort 656484
Total amount of memory used               12691456
Uses version 2 sort
---- End of Sort Statistics -----------------------
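
Once done, the event can be switched off and the trace file located (a sketch of typical oradebug usage; oradebug requires a SYSDBA connection and the event applies to the current session):

SQL> oradebug setmypid
SQL> oradebug event 10032 trace name context off
SQL> oradebug tracefile_name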

1.4 Using indexes to avoid sorting

If an index exists on some or all of the columns in the ORDER BY clause, Oracle may be able to read rows from the index in the required order and avoid the sort altogether.

If the index covers the same columns as the ORDER BY clause, Oracle can read rows in sorted order directly from the index. However, reading rows in key order requires walking the index leaf blocks one by one in a full index scan. An index fast full scan is much more efficient than a regular full index scan, but it cannot return rows in index order, so it cannot be used to avoid the sort.

SQL> select * from customers order by cust_last_name, cust_first_name, cust_year_of_birth;

55500 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2792773903

--------------------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | TempSpc | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           | 55500 |  9810K|         |  2609   (1)| 00:00:02 |
|   1 |  SORT ORDER BY     |           | 55500 |  9810K|     12M |  2609   (1)| 00:00:02 |
|   2 |   TABLE ACCESS FULL| CUSTOMERS | 55500 |  9810K|         |   405   (1)| 00:00:01 |
--------------------------------------------------------------------------------------------

Statistics

1 recursive calls

0 db block gets

1456 consistent gets

1454 physical reads

0 redo size

6366832 bytes sent via SQL*Net to client

41213 bytes received via SQL*Net from client

3701 SQL*Net roundtrips to/from client

1 sorts (memory)

0 sorts (disk)

55500 rows processed

However, if we create an index on the ORDER BY columns and use a first_rows hint (or an index hint), Oracle will use the index instead of sorting:

SQL> create index cust_member_i on customers (cust_last_name, cust_first_name, cust_year_of_birth);

Index created.

SQL> select /*+ index(customers cust_member_i) */ * from customers order by cust_last_name, cust_first_name, cust_year_of_birth;

55500 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3023887059

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |               | 55500 |  9810K| 20550   (1)| 00:00:15 |
|   1 |  TABLE ACCESS BY INDEX ROWID| CUSTOMERS     | 55500 |  9810K| 20550   (1)| 00:00:15 |
|   2 |   INDEX FULL SCAN           | CUST_MEMBER_I | 55500 |       |   225   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Statistics

1 recursive calls

0 db block gets

26557 consistent gets

1701 physical reads

0 redo size

11063057 bytes sent via SQL*Net to client

41213 bytes received via SQL*Net from client

3701 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

55500 rows processed

Although the index eliminates the sort, the cost of reading both the index blocks and the table blocks one block at a time is usually much higher than a single full table scan (compare the 26557 consistent gets above with 1456 for the full scan). In general, this means that using an index to avoid a sort can actually lead to worse overall performance. However, the index does return the first rows faster, because each row is returned as soon as it is retrieved, whereas the sort approach must retrieve and sort all rows before returning any of them. Therefore, when the optimizer goal is first_rows_n it tends to use the index, while with an all_rows goal it prefers the full table scan.
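
For instance, the optimizer goal can be influenced at the statement level with the first_rows(n) hint (a sketch; the plan actually chosen depends on statistics and version, but with this hint the optimizer would typically favor the CUST_MEMBER_I index created above rather than a sort):

SQL> select /*+ first_rows(10) */ *
       from customers
      order by cust_last_name, cust_first_name, cust_year_of_birth;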

2 Grouping and aggregation

Aggregate operations return a single value, such as an average, sum, maximum, or minimum, computed across multiple rows. Grouping operations return one aggregated row for each distinct set of GROUP BY values.
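
For example (a sketch using the sales columns shown elsewhere in this article; whether Oracle satisfies the GROUP BY with a SORT GROUP BY or a HASH GROUP BY step depends on the version and on whether ordered output is also requested):

SQL> select prod_id, sum(amount_sold) total_sold, avg(amount_sold) avg_sold
       from sales
      group by prod_id;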

2.1 Aggregate operations

Aggregate operations such as SUM and AVG must process every input row, so they are usually associated with full table scans:

SQL> select sum(quantity_sold) from sales;

Execution Plan
----------------------------------------------------------
Plan hash value: 3519235612

----------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |     3 |   525   (2)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE      |       |     1 |     3 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|       |   918K|  2691K|   525   (2)| 00:00:01 |     1 |    28 |
|   3 |    TABLE ACCESS FULL | SALES |   918K|  2691K|   525   (2)| 00:00:01 |     1 |    28 |
----------------------------------------------------------------------------------------------

If there is an index on the aggregated column, a fast full scan of that index is usually more efficient, for example:
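
A sketch of what such a test could look like; the index name quantity_sold_idx and the INDEX_FFS hint are assumptions, not part of the original example:

SQL> create index quantity_sold_idx on sales (quantity_sold);
SQL> select /*+ index_ffs(sales quantity_sold_idx) */ sum(quantity_sold) from sales;

With such an index in place, the plan would normally show an INDEX FAST FULL SCAN of QUANTITY_SOLD_IDX feeding the SORT AGGREGATE step instead of the full table scan shown above.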

2.2 Maximum and minimum

Unlike most other aggregate operations, MAX and MIN do not need to read every row if there is an index on the relevant column. With a B-tree index, Oracle can determine the maximum or minimum value by reading just the last or first index entry, which typically requires only 3-5 logical reads.

SQL> select max(amount_sold) from sales;        (no index on amount_sold)

Execution Plan
----------------------------------------------------------
Plan hash value: 3519235612

----------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |     5 |   525   (2)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE      |       |     1 |     5 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|       |   918K|  4486K|   525   (2)| 00:00:01 |     1 |    28 |
|   3 |    TABLE ACCESS FULL | SALES |   918K|  4486K|   525   (2)| 00:00:01 |     1 |    28 |
----------------------------------------------------------------------------------------------

Statistics

0 recursive calls

0 db block gets

1635 consistent gets

1619 physical reads

0 redo size

536 bytes sent via SQL*Net to client

524 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

With an index on amount_sold in place:
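
The plan below references an index named AMOUNT_SOLD_IDX; its creation is not shown in the original listing, but it would be something like this (a sketch, with the name taken from the plan):

SQL> create index amount_sold_idx on sales (amount_sold);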

SQL> select max(amount_sold) from sales;

Execution Plan
----------------------------------------------------------
Plan hash value: 781264156

-----------------------------------------------------------------------------------------------
| Id  | Operation                  | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |                 |     1 |     5 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE            |                 |     1 |     5 |            |          |
|   2 |   INDEX FULL SCAN (MIN/MAX)| AMOUNT_SOLD_IDX |     1 |     5 |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Statistics

1 recursive calls

0 db block gets

3 consistent gets

2 physical reads

0 redo size

536 bytes sent via SQL*Net to client

524 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

However, if we ask for both the maximum and the minimum in the same query, Oracle cannot apply the MIN/MAX optimization. A full index scan would require many logical reads and would be more expensive than a full table scan, so Oracle chooses the full table scan:

SQL> select max(amount_sold), min(amount_sold) from sales;

Execution Plan
----------------------------------------------------------
Plan hash value: 3519235612

----------------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |     1 |     5 |   525   (2)| 00:00:01 |       |       |
|   1 |  SORT AGGREGATE      |       |     1 |     5 |            |          |       |       |
|   2 |   PARTITION RANGE ALL|       |   918K|  4486K|   525   (2)| 00:00:01 |     1 |    28 |
|   3 |    TABLE ACCESS FULL | SALES |   918K|  4486K|   525   (2)| 00:00:01 |     1 |    28 |
----------------------------------------------------------------------------------------------

Statistics

1 recursive calls

0 db block gets

1635 consistent gets

1619 physical reads

0 redo size

618 bytes sent via SQL*Net to client

524 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

A better approach is to compute MAX and MIN in separate subqueries and join the results together:

SQL> select max_sold, min_sold
       from (select max(amount_sold) max_sold from sales) maxt,
            (select min(amount_sold) min_sold from sales) mint;

Execution Plan
----------------------------------------------------------
Plan hash value: 3650580342

--------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                 |     1 |    26 |     6   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |                 |     1 |    26 |     6   (0)| 00:00:01 |
|   2 |   VIEW                        |                 |     1 |    13 |     3   (0)| 00:00:01 |
|   3 |    SORT AGGREGATE             |                 |     1 |     5 |            |          |
|   4 |     INDEX FULL SCAN (MIN/MAX) | AMOUNT_SOLD_IDX |     1 |     5 |     3   (0)| 00:00:01 |
|   5 |   VIEW                        |                 |     1 |    13 |     3   (0)| 00:00:01 |
|   6 |    SORT AGGREGATE             |                 |     1 |     5 |            |          |
|   7 |     INDEX FULL SCAN (MIN/MAX) | AMOUNT_SOLD_IDX |     1 |     5 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Statistics

0 recursive calls

0 db block gets

6 consistent gets

0 physical reads

0 redo size

602 bytes sent via SQL*Net to client

524 bytes received via SQL*Net from client

2 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

1 rows processed

2.3 "Top N" query

For example, to find the top 10 sales rows by dollar amount (amount_sold) in the sales table, the following query is wrong, because ROWNUM is applied before the ORDER BY and therefore returns 10 arbitrary rows rather than the 10 largest:

SQL> select * from sales where rownum <= 10 order by amount_sold desc;

The correct approach is to perform the sort inside an in-line view and apply the ROWNUM filter to the already-sorted result:

SQL> select /* top10 subquery */ *
       from (select cust_id, prod_id, time_id, amount_sold
               from sales
              order by amount_sold desc)
      where rownum <= 10;
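
As an aside, Oracle Database 12c and later also support the ANSI row-limiting clause, which expresses the same Top-N requirement more directly (a sketch; internally it is rewritten into a window-function query):

SQL> select cust_id, prod_id, time_id, amount_sold
       from sales
      order by amount_sold desc
      fetch first 10 rows only;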
