In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
This article is to share with you what the SQL de-duplication method has. The editor thinks it is very practical, so share it with you as a reference and follow the editor to have a look.
When using SQL to retrieve data, we often encounter duplicate values in the table, for example, if we want to get uv (unique visitors), we need to remove duplicates.
Distinct or group by clauses are usually used in MySQL, but you can also use the row_number window function to remove duplicates in sql that support window functions (such as Hive SQL, Oracle, and so on).
For example, Chestnut, there is such a table as task:
Note:
Task_id: task id
Order_id: order id
Start_time: start Tim
Note: one task corresponds to multiple orders
We need to ask for the total number of tasks, because task_id is not unique, so we need to repeat:
Distinct
-- list all unique values of task_id (deduplicated records) total select distinct task_idfrom Task;-- tasks select count (distinct task_id) task_numfrom Task
Distinct is generally inefficient. It is not suitable to show the specific value after weight removal, but is generally used in conjunction with count to calculate the number of entries.
When distinct is in use, it is placed after select, and the values of all subsequent fields are deduplicated uniformly. For example, if there are two fields after distinct, then the records 1 and 2 are not duplicate values.
Group by
-- lists all unique values of task_id (deduplicated records, null is also a value)-- Total number of select task_id-- from Task-- group by task_id;-- tasks select count (task_id) task_numfrom (select task_id from Task group by task_id) tmp
Row_number
Row_number is a window function. The syntax is as follows:
Row_number () over (partition by order by)
The partition by part can be omitted.
-- use select count (case when rn=1 then task_id else null end) task_numfrom (select task_id, row_number () over (partition by task_id order by start_time) rn from Task) tmp in sql that supports window functions
In addition, use a table test to manage the use of distinct and group by in deweighting:
-- semicolon below; used to separate lines select distinct user_idfrom Test;-- returns 1; 2select distinct user_id, user_typefrom Test;-- returns 1, 1; 1, 2; 2, 1select user_idfrom Testgroup by user_id;-- returns 1; 2select user_id, user_typefrom Testgroup by user_id, user_type;-- returns 1; 1; 2; 2, 1select user_id, user_typefrom Testgroup by user_id -- Hive, Oracle, etc. will report an error. Mysql can write like this. Returns 1, 1 or 1, 2; 2, 1 (two lines). Only the fields after group by will be deduplicated, that is, the number of records returned in the end is equal to the number of records in the previous sql, that is, two-- fields that are not placed after group by but are placed in select will only return a record (as if it is usually the first one, it should be irregular) Thank you for reading! This is the end of this article on "what are the ways to remove weight from SQL?". I hope the above content can be of some help to you, so that you can learn more knowledge. if you think the article is good, you can share it for more people to see!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.