Here I present two methods for querying and deleting duplicate records: one relies on rowid, the other on an analytic function. Both are more efficient than the usual alternatives. Even so, there is an inherent performance difference between the two, and the experiments below show exactly where that difference comes from.
1. Create a test table and initialize some sample data
sec@ora10g> create table t (x number, y varchar2(10));
sec@ora10g> insert into t values (1, 'sec');
sec@ora10g> insert into t values (2, 'Andy01');
sec@ora10g> insert into t values (2, 'Andy02');
sec@ora10g> insert into t values (3, 'Anna');
sec@ora10g> insert into t values (4, 'Anna');
sec@ora10g> insert into t values (5, 'John');
sec@ora10g> commit;

sec@secooler> analyze table t compute statistics for table for all indexes for all indexed columns;

Table analyzed.
sec@ora10g> select * from t;

         X Y
---------- ----------
         1 sec
         2 Andy01
         2 Andy02
         3 Anna
         4 Anna
         5 John

6 rows selected.
2. The first method: use rowid to query and delete duplicate records
1) Query duplicate records
sec@ora10g> SELECT *
  2    FROM t t1
  3   WHERE t1.ROWID > (SELECT MIN(t2.ROWID)
  4                       FROM t t2
  5                      WHERE t1.x = t2.x)
  6  /

         X Y
---------- ----------
         2 Andy02
BTW: if you want to find rows that are duplicated on both the x and y columns, add the extra condition "AND t1.y = t2.y" to the subquery above.
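A minimal sketch of that variant; it was not part of the original session, so no output is shown:

-- Sketch only (not run in the original session): a row counts as a
-- duplicate only when another row matches it on both x and y.
SELECT *
  FROM t t1
 WHERE t1.ROWID > (SELECT MIN(t2.ROWID)
                     FROM t t2
                    WHERE t1.x = t2.x
                      AND t1.y = t2.y);

With the sample data above this returns no rows, since no two rows agree on both columns.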
2) Delete duplicate records
The query above can be rewritten directly into a DELETE statement to perform the deletion.
sec@ora10g> DELETE FROM t t1
  2   WHERE t1.ROWID > (SELECT MIN(t2.ROWID)
  3                       FROM t t2
  4                      WHERE t1.x = t2.x)
  5  /

1 row deleted.
As you can see, the row with the duplicate x value has been deleted.
sec@ora10g> select * from t;

         X Y
---------- ----------
         1 sec
         2 Andy01
         3 Anna
         4 Anna
         5 John
3. The second method: use an analytic function to query and delete duplicate records
1) Use the analytic function to quickly locate duplicate records. In the result below, rows with an rn value greater than 1 are the duplicates.
sec@ora10g> SELECT t1.x,
  2         t1.y,
  3         ROW_NUMBER() OVER (PARTITION BY t1.x ORDER BY t1.ROWID) rn
  4    FROM t t1
  5  /

         X Y                  RN
---------- ---------- ----------
         1 sec                 1
         2 Andy01              1
         2 Andy02              2
         3 Anna                1
         4 Anna                1
         5 John                1

6 rows selected.
2) Use the rn value from the query above as a filter to retrieve the duplicate records.
sec@ora10g> SELECT t2.x, t2.y
  2    FROM (SELECT t1.x,
  3                 t1.y,
  4                 ROW_NUMBER() OVER (PARTITION BY t1.x ORDER BY t1.ROWID) rn
  5            FROM t t1) t2
  6   WHERE t2.rn > 1
  7  /

         X Y
---------- ----------
         2 Andy02
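The refinement from the BTW note in section 2 applies here as well: to treat rows as duplicates only when both x and y match, extend the PARTITION BY list. A minimal sketch, not part of the original session:

-- Sketch only (not run in the original session): rn restarts for every
-- distinct (x, y) pair, so rn > 1 flags rows duplicated on both columns.
SELECT t2.x, t2.y
  FROM (SELECT t1.x,
               t1.y,
               ROW_NUMBER() OVER (PARTITION BY t1.x, t1.y ORDER BY t1.ROWID) rn
          FROM t t1) t2
 WHERE t2.rn > 1;

Again, the sample data contains no rows that match on both columns, so this would return nothing.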
3) Deletion methods
(1) The first option is to build a DELETE statement driven by rowid, which is relatively inefficient.
sec@ora10g> DELETE FROM t WHERE ROWID IN (
  2         SELECT t2.rid
  3           FROM (SELECT t1.x,
  4                        t1.y,
  5                        t1.ROWID rid,
  6                        ROW_NUMBER() OVER (PARTITION BY t1.x ORDER BY t1.ROWID) rn
  7                   FROM t t1) t2
  8          WHERE t2.rn > 1
  9         )
 10  /

1 row deleted.
(2) The second option is to build an intermediate table t1 with CREATE TABLE ... AS SELECT, keeping only the rows with rn = 1. This is a very efficient way to deduplicate and is the recommended approach for databases with massive amounts of data.
sec@ora10g> create table t1 as
  2  SELECT t2.x, t2.y
  3    FROM (SELECT t1.x,
  4                 t1.y,
  5                 ROW_NUMBER() OVER (PARTITION BY t1.x ORDER BY t1.ROWID) rn
  6            FROM t t1) t2
  7   WHERE t2.rn = 1
  8  /

Table created.

sec@ora10g> drop table t;

Table dropped.

sec@ora10g> alter table t1 rename to t;

Table altered.
sec@ora10g> select * from t;

         X Y
---------- ----------
         1 sec
         2 Andy01
         3 Anna
         4 Anna
         5 John
4. Compare the execution plans of the two query methods to see where their inherent performance gap comes from.
1) The execution plan of the first query, which relies on rowid, is as follows.
sec@ora10g> set autot trace exp
sec@ora10g> SELECT *
  2    FROM t t1
  3   WHERE t1.ROWID > (SELECT MIN(t2.ROWID)
  4                       FROM t t2
  5                      WHERE t1.x = t2.x)
  6  /
Execution Plan
----------------------------------------------------------
Plan hash value: 3924487551

------------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      |     5 |    55 |     6   (0)| 00:00:01 |
|*  1 |  FILTER              |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL  | T    |     6 |    66 |     3   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE     |      |     1 |    11 |            |          |
|*  4 |    TABLE ACCESS FULL | T    |     1 |    11 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("T1".ROWID>(SELECT MIN("T2".ROWID) FROM "T" "T2" WHERE
              "T2"."X"=:B1))
   4 - filter("T2"."X"=:B1)
2) The execution plan of the second query, which relies on the analytic function, is as follows.
sec@ora10g> SELECT t1.x,
  2         t1.y,
  3         ROW_NUMBER() OVER (PARTITION BY t1.x ORDER BY t1.ROWID) rn
  4    FROM t t1
  5  /

Execution Plan
----------------------------------------------------------
Plan hash value: 2335850315

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     6 |    66 |     4  (25)| 00:00:01 |
|   1 |  WINDOW SORT        |      |     6 |    66 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS FULL | T    |     6 |    66 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------
3) Comparing the two execution plans leads to the following conclusion: the first method performs TABLE ACCESS FULL on T twice, while the second method performs it only once. The execution plans therefore show that the analytic function method is the more efficient of the two.
5. Summary
Removing duplicate records is a task a DBA can rarely avoid. In a database holding a large amount of data it becomes an arduous job, and choosing the wrong method can make it very hard to finish at all. The two methods introduced here are both comparatively efficient; take the time to understand the details behind them. There are many ways to get the job done, but only the method that minimizes both the maintenance time and the impact on the production database is the right one to accept.
Good luck.
-- The End--