In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
How to delete duplicate records using SQL statements? Many novices are not very clear about this. In order to help you solve this problem, the following editor will explain it in detail. People with this need can come and learn. I hope you can gain something.
If you want to delete the same data of mobilePhone (phone), officePhone (phone) and email (email) at the same time, use this statement to remove duplicates in the past:
Delete from Table where id not in (select max (id) from Table group by mobilePhone,officePhone,email) or delete from Table where id not in (select min (id) from Table group by mobilePhone,officePhone,email) delete from Table where id not in (select max (id) from Table group by mobilePhone,officePhone,email) or delete from Table where id not in (select min (id) from Table group by mobilePhone,officePhone,email)
The lower one will be a little faster. The above data is OK for data efficiency of less than 1 million, ranging from a few minutes to dozens of minutes in the case of repetition of 1 to 5, but if the amount of data reaches more than 3 million, the efficiency drops abruptly, and if you repeat more data, you will often run for dozens of hours, sometimes locking the watch and running all night. I had no choice but to find a new and feasible method, and today I finally got something:
/ / query the ID of unique data and import them into temporary table tmp select min (id) as mid into tmp from table group by mobilePhone,officePhone,email / / query the duplicated data and insert insert into finally select (fields except ID) from customers_1 where id in (select mid from tmp) / / ID of unique data in finally table, and import them into temporary table tmp select min (id) as mid into tmp from table group by mobilePhone,officePhone Email / / queries the duplicated data and inserts insert into finally select (fields other than ID) from customers_1 where id in (select mid from tmp) into the finally table
Efficiency comparison: the 5 million data were removed by delete method (1 beat and 2 repeats) for about 4 hours. Four hours, a long time.
Remove the weight of 5 million data with temporary table insertion (1 beat 2 repeat) in less than 10 minutes.
In fact, the way to delete is relatively slow, which may be the reason for deleting while looking for it, while using temporary tables, you can pick out the data ID that does not repeat and put it in the temporary table, and then press the ID selected by the temporary table to find out and insert them into the new table, and then delete the original table, so you can quickly repeat it.
SQL statement removes duplicate records and gets duplicate records
Look up the duplicate data that exists in these fields in the table according to the names of certain fields and delete them in the order of insertion, depending on order by and row_num.
Method 1: repeat processing according to multiple conditions:
Delete tmp from (select row_num = row_number () over (partition by field, field order by time desc) from table where time > getdate ()-1) tmp where row_num > 1 delete tmp from (select row_num = row_number () over (partition by field, field order by time desc) from table where time > getdate ()-1) tmp where row_num > 1
The second method is to remove the weight according to a single condition:
Delete from table where primary key ID not in (select max (primary key ID) from table group by fields that need to be deduplicated having count (fields that need to be deduplicated) > = 1) delete from table where primary key ID not in (select max (primary key ID) from table fields having count (fields that need to be deduplicated) > = 1)
Note: in order to improve efficiency, temporary tables can be used in both methods. Tables in not in can extract temporary tables # tmp first.
Then it is executed by not exists. In order to avoid excessive quantity, the amount of deletion can be controlled by Top in batch.
Delete top (2) from table where not exists (select primary key ID from # tmp where # tmp. Primary key ID= table. Primary key ID) is it helpful for you to read the above content? If you want to know more about the relevant knowledge or read more related articles, please follow the industry information channel, thank you for your support.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.