How to increase Distinct to improve query efficiency 07/19 Update SLTechnology News&Howtos

How to increase Distinct to improve query efficiency

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article will explain in detail how to increase Distinct to improve query efficiency. The editor thinks it is very practical, so I share it for you as a reference. I hope you can get something after reading this article.

Only by adding the DISTINCT keyword, Oracle necessarily needs to sort all the fields that follow. In the past, it is often found that because developers do not understand SQL very well, adding DISTINCT in front of more than 20 fields in the SELECT list makes it almost impossible to execute the query, or even produce ORA-7445 errors. So the impact of DISTINCT on performance has been emphasized to developers. The following editor to explain how to increase Distinct to improve query efficiency?

How to increase Distinct to improve query efficiency

The first thing that comes to mind is that it is possible that DISTINCT is in the subquery. Due to the addition of DISTINCT, the result set of the first step is reduced, resulting in improved query performance. As soon as you look at the SQL, you find that DISTINCT is actually in the outermost layer of the query.

Because the original SQL is too long and involves too many tables, it is difficult to say clearly. Here we simulate an example that cannot see the obvious difference in execution time between the two due to the limitations of the amount of data and the complexity of SQL. This paper illustrates the problem from the logical reading comparison of the two cases.

First, establish a simulation environment:

SQL > CREATE TABLE T1 AS SELECT * FROM DBA_OBJECTS

2 WHERE OWNER = 'SYS'

3 AND OBJECT_TYPE NOT LIKE'% BODY'

4 AND OBJECT_TYPE NOT LIKE 'JAVA%'

Table created.

SQL > CREATE TABLE T2 AS SELECT * FROM DBA_SEGMENTS WHERE OWNER = 'SYS'

Table created.

SQL > CREATE TABLE T3 AS SELECT * FROM DBA_INDEXES WHERE OWNER = 'SYS'

Table created.

SQL > ALTER TABLE T1 ADD CONSTRAINT PK_T1 PRIMARY KEY (OBJECT_NAME)

Table altered.

SQL > CREATE INDEX IND_T2_SEGNAME ON T2 (SEGMENT_NAME)

Index created.

SQL > CREATE INDEX IND_T3_TABNAME ON T3 (TABLE_NAME)

Index created.

SQL > EXEC DBMS_STATS.GATHER_TABLE_STATS (USER,'T1, METHOD_OPT = > 'FOR ALL INDEXED COLUMNS SIZE 100, CASCADE = > TRUE)

PL/SQL procedure successfully completed.

SQL > EXEC DBMS_STATS.GATHER_TABLE_STATS (USER,'T2, METHOD_OPT = > 'FOR ALL INDEXED COLUMNS SIZE 100, CASCADE = > TRUE)

PL/SQL procedure successfully completed.

SQL > EXEC DBMS_STATS.GATHER_TABLE_STATS (USER,'T3, METHOD_OPT = > 'FOR ALL INDEXED COLUMNS SIZE 100, CASCADE = > TRUE)

PL/SQL procedure successfully completed.

Still using the structure in the above example, take a look at the difference between the original SQL and the addition of DISTINCT:

How to increase Distinct to improve query efficiency

SQL > SET AUTOT TRACE

SQL > SELECT T1.OBJECT_NAME, T1.OBJECT_TYPE, T2.TABLESPACE_NAME

2 FROM T1, T2

3 WHERE T1.OBJECT_NAME = T2.SEGMENT_NAME

4 AND T1.OBJECT_NAME IN

5 (

6 SELECT INDEX_NAME FROM T3

7 WHERE T3.TABLESPACE_NAME = T2.TABLESPACE_NAME

311 rows selected.

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=668 Bytes=62124)

1 0 HASH JOIN (SEMI) (Cost=12 Card=668 Bytes=62124)

2 1 HASH JOIN (Cost=9 Card=668 Bytes=39412)

3 2 TABLE ACCESS (FULL) OF 'T2' (Cost=2 Card=668 Bytes=21376)

4 2 TABLE ACCESS (FULL) OF 'T1' (Cost=6 Card=3806 Bytes=102762)

5 1 TABLE ACCESS (FULL) OF 'T3' (Cost=2 Card=340 Bytes=11560)

Statistics

0 recursive calls

0 db block gets

93 consistent gets

0 physical reads

0 redo size

8843 bytes sent via SQL*Net to client

723 bytes received via SQL*Net from client

22 SQL*Net roundtrips to/from client

0 sorts (memory)

0 sorts (disk)

311 rows processed

SQL > SELECT DISTINCT T1.OBJECT_NAME, T1.OBJECT_TYPE, T2.TABLESPACE_NAME

2 FROM T1, T2

3 WHERE T1.OBJECT_NAME = T2.SEGMENT_NAME

4 AND T1.OBJECT_NAME IN

5 (

6 SELECT INDEX_NAME FROM T3

7 WHERE T3.TABLESPACE_NAME = T2.TABLESPACE_NAME

311 rows selected.

Execution Plan

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=16 Card=1 Bytes=93)

1 0 SORT (UNIQUE) (Cost=16 Card=1 Bytes=93)

2 1 HASH JOIN (Cost=12 Card=1 Bytes=93)

3 2 HASH JOIN (Cost=5 Card=668 Bytes=44088)

4 3 TABLE ACCESS (FULL) OF 'T3' (Cost=2 Card=340 Bytes=11560)

5 3 TABLE ACCESS (FULL) OF 'T2' (Cost=2 Card=668 Bytes=21376)

62 TABLE ACCESS (FULL) OF 'T1' (Cost=6 Card=3806 Bytes=102762)

Statistics

0 recursive calls

0 db block gets

72 consistent gets

0 physical reads

0 redo size

8843 bytes sent via SQL*Net to client

723 bytes received via SQL*Net from client

22 SQL*Net roundtrips to/from client

1 sorts (memory)

0 sorts (disk)

311 rows processed

As can be seen from the statistics, the logical reading of statements with DISTINCT is higher than that without DISTINCT. Why this happens, we should start with the implementation of the plan.

Without DISTINCT, because of the query using in subquery, Oracle uses HASH JOIN SEMI for the second join, which is more expensive than ordinary HASH JOIN.

After adding DISTINCT, Oracle knows that sorting and deduplicating operations must be done eventually, so it chooses HASH JOIN as the connection method. This is why logical reading has been reduced with the addition of DISTINCT. But at the same time, with the addition of DISTINCT, the statement adds a sort operation, which is not available without DISTINCT.

When the join table has a large amount of data, but the final result of SELECT is not much, and the number of SELECT columns is not many, after adding DISTINCT, the cost of this sort is less than the cost of SEMI JOIN join. This is the real reason why adding an DISTINCT operation makes queries more efficient, which seems impossible.

Finally, it should be noted that the purpose of this article is to show that nothing is immutable during optimization, almost anything can happen, and do not be limited by some so-called dead rules. It's okay to understand that. This article is not intended to provide a way to optimize SQL. Strictly speaking, adding DISTINCT and not adding DISTINCT are two completely different SQL statements. Although the two are equivalent in this example, it is the result of a common limitation of the table structure, constraints, and the data itself. In another environment, the results of the two SQL may be very different, so don't try to use the example in this article as a way to optimize.

This is the end of the article on "how to increase Distinct to improve query efficiency". I hope the above content can be of some help to you, so that you can learn more knowledge. if you think the article is good, please share it for more people to see.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.