In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-03-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
This article mainly shows you "how to check and remove duplicates in a SQL". The content is simple and clear. I hope it can help you solve your doubts. Let me lead you to study and learn this article "how to check duplicates and remove duplicates in a SQL".
Preface
When using SQL to retrieve data, we often encounter the need to remove duplicates when there are duplicate values in the table. This paper classifies the common methods.
⛳️1.distinct
Topic: now the operator needs to check which schools the users come from. Please take out the school's deduplicated data from the user information table.
Example: user_profile
Mysql > SELECT DISTINCT university FROM user_profile
? According to the example, the query returns the following results
? Tips:
The keyword distinct in SQL is deduplicated:
Distinct stands for a unique meaning in English.
He means to remove weight in SQL: for example, the column "university" appears twice in Peking University.
After using distinct for de-re-query, Peking University appears only once.
Distinct is generally inefficient
When distinct is in use, put it after select, and remove the duplicate values of all the subsequent fields.
? Expand:
Topic: now the operator needs to check the total number of users.
Select count (distinct university) from user_profile
⛳️2.group by
? For example, Chestnut, there is such a table task
Note:
Task_id: task id
Order_id: order id
Start_time: start Tim
Note: one task corresponds to multiple orders
Topic: list the total number of tasks
? According to the example, the query method is as follows:
Step 1: list all unique values of task_id (deduplicated record, null is also a value)
Select task_idfrom Taskgroup by task_id
Step 2: total tasks
Select count (task_id) task_numfrom (select task_idfrom Taskgroup by task_id) tmp; ⛳️3.row_number window function
? For example, Chestnut, there is such a table task
Note:
Task_id: task id
Order_id: order id
Start_time: start Tim
Note: one task corresponds to multiple orders
Topic: query the duplicate data of the whole table
? According to the example, the query method is as follows:
-used in sql that supports window functions
Select count (case when rn=1 then task_id else null end) task_numfrom (select task_id, row_number () over (partition by task_id order by start_time) rnfrom Task) tmp
? Tips:
Sorting can be realized by using three window functions ROW_NUMBER (), DENSE_RANK () and RANK () in MySQL8.0.
One thing to note is the alias after as. Do not repeat the alias with the previous function name, or you will report an error.
The following are examples of ranking achieved by these three functions:
-three sentences for the above three rankings
Select xuehao,score, ROW_NUMBER () OVER (order by score desc) as row_r from scores_tb;select xuehao,score, DENSE_RANK () OVER (order by score desc) as dense_r from scores_tb;select xuehao,score, RANK () over (order by score desc) as r from scores_tb
-one sentence can also query different rankings.
SELECT xuehao,score,ROW_NUMBER () OVER w AS 'row_r',DENSE_RANK () OVER w AS' dense_r',RANK () OVER w AS 'r'FROM scores_tbWINDOW w AS (ORDER BY score desc); delete duplicate data
Create test data
We create a personnel information table and insert some duplicate data into it.
CREATE TABLE Person (id int auto_increment primary key comment 'primary key', Name VARCHAR (20) NULL,Age INT NULL,Address VARCHAR (20) NULL,Sex CHAR (2) NULL) INSERT INTO Person (ID,Name,Age,Address,Sex) VALUES (1, 'Zhang San', 18, 'Beijing Road 18', 'male'), (2,'Li Si', 19, 'Beijing Road 29', 'male'), (3, 'Wang Wu', 19, 'Nanjing Road 11', 'female'), (4, 'Zhang San', 18, 'Beijing Road 18', 'male'), (5,'Li Si') 19, 29 Beijing Road, male), (6, Zhang San, 18, 18 Beijing Road, male), (7, Wang Wu, 19, 11 Nanjing Road, female), (8, Ma Liu, 18, 19 Nanjing Road, female)
Topic: there are duplicate records in the database. Delete and retain one of them.
We found that apart from the self-increasing ID, there are several pieces of data that are duplicated in other fields.
? Step 1: find the duplicate data mysql > SELECT MAX (ID) ID,Name,Age,Address,SexFROM PersonGROUP BY Name,Age,Address,SexHAVING COUNT (1) > 1
? Tips:
HAVING will group the number of data rows greater than 1, which is the duplicate data we are looking for.
Either the Max function or the Min function can be used above, just to ensure the uniqueness of the extracted data.
? Step 2: delete duplicate data
In fact, the last result to be retained in our database is the data queried in the second step.
We can just delete the other data.
How do I delete it? We use ID to rule out.
DELETE FROM PersonWHERE EXISTS (SELECT * FROM (SELECTMAX (ID) ID,Name,Age,Address,SexFROM PersonGROUP BY Name,Age,Address,SexHAVING COUNT (1) > 1) TWHERE Person.Name=T.NameAND Person.Age=T.AgeAND Person.Address=T.AddressAND Person.Sex=T.SexAND Person.ID
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 221
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.