How to remove duplicate data from a result set in MySQL


2025-04-16 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains how to remove duplicate data from a result set in MySQL. It covers the DISTINCT and DISTINCTROW keywords, the risk of discarding valid rows, how DISTINCT compares with GROUP BY, and how to count distinct values.

I. The keywords DISTINCT and DISTINCTROW

The keyword DISTINCT follows directly after SELECT in a query, in place of the optional keyword ALL, which is the default. DISTINCTROW is a synonym for DISTINCT and produces the same effect:

    SELECT [ALL | DISTINCT | DISTINCTROW] select_expr
    [FROM table_references
    [WHERE where_condition]]

To illustrate the use of these keywords, take a table named fruits as an example. It contains some fruit names and their corresponding colors.

The following query retrieves the names of all fruits from that table (add ORDER BY name if the names must be listed alphabetically, since row order is not guaranteed without it):

    SELECT name
    FROM fruits;

Because the color column is not selected, each fruit name is repeated once for every color it is listed with.

Now, let's query again with the keyword distinct to see what the result is:

    SELECT DISTINCT name
    FROM fruits;

As expected, each fruit name now appears only once, because the color that distinguished the repeated rows is not part of the selected columns.
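The effect can be reproduced with a small sketch. The article's own table is not available here, so the rows below are hypothetical, and Python's built-in sqlite3 module stands in for a MySQL server purely so the example runs without setup; the two SELECT statements are the ones discussed above, with ORDER BY added to make the output deterministic.

```python
import sqlite3

# Hypothetical sample data: the same fruit name appears once per color.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (name TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO fruits (name, color) VALUES (?, ?)",
    [
        ("apple", "red"),
        ("apple", "green"),
        ("apple", "yellow"),
        ("banana", "yellow"),
        ("grape", "green"),
        ("grape", "purple"),
    ],
)

# Without DISTINCT: one row per (name, color) pair, so names repeat.
all_names = [
    row[0] for row in conn.execute("SELECT name FROM fruits ORDER BY name")
]

# With DISTINCT: each name is returned only once.
distinct_names = [
    row[0]
    for row in conn.execute("SELECT DISTINCT name FROM fruits ORDER BY name")
]

print(all_names)       # ['apple', 'apple', 'apple', 'banana', 'grape', 'grape']
print(distinct_names)  # ['apple', 'banana', 'grape']
```

Selecting DISTINCT name, color instead would return all six rows again, because DISTINCT compares the whole selected row rather than a single column.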

II. The trade-off of removing duplicate data

Sometimes DISTINCT should not be used, because removing duplicate rows can lead to incorrect results. Consider the following situation.

A customer wants a list of employees for some statistics. To produce it, we can use the following query:

    SELECT name, gender, salary
    FROM employees
    ORDER BY name;

Oddly enough, "Kristen Ruegg" appears twice in the result.

The customer says they do not want duplicates, so the developer adds DISTINCT to the SELECT statement. This satisfies the request, but it introduces a problem: the company really does have two employees with the same name. Adding DISTINCT therefore drops a valid record, and the result is actually wrong. We can confirm through emp_id_number that there are indeed two employees named Kristen Ruegg:

    SELECT name, gender, salary, emp_id_number
    FROM employees
    ORDER BY name;

The two rows in question each have a unique emp_id_number.

The lesson from this case: when using DISTINCT, make sure you do not inadvertently drop valid rows from the result!
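A minimal sketch of this pitfall follows. Only the name "Kristen Ruegg" and the column names come from the article; the other employee, the IDs, and the salaries are hypothetical, and Python's sqlite3 module is used as a stand-in for MySQL so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "emp_id_number INTEGER PRIMARY KEY, name TEXT, gender TEXT, salary INTEGER)"
)
# Hypothetical rows: two different people share name, gender and salary.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (101, "Kristen Ruegg", "F", 50000),
        (102, "Kristen Ruegg", "F", 50000),
        (103, "Jon Simpson", "M", 45000),
    ],
)


def count_rows(sql):
    """Run a query and return how many rows it produces."""
    return len(conn.execute(sql).fetchall())


# All three employees are listed.
plain = count_rows("SELECT name, gender, salary FROM employees")

# DISTINCT merges the two Kristen Ruegg rows into one: a real person is lost.
deduped = count_rows("SELECT DISTINCT name, gender, salary FROM employees")

# Including the unique key keeps both people even with DISTINCT.
with_id = count_rows(
    "SELECT DISTINCT name, gender, salary, emp_id_number FROM employees"
)

print(plain, deduped, with_id)  # 3 2 3
```

Selecting a column that uniquely identifies each row, as in the last query, is one way to keep apparently duplicate but valid rows apart.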

III. The difference between DISTINCT and GROUP BY

Using DISTINCT is logically equivalent to using GROUP BY on all selected columns with no aggregate function. For such a query, GROUP BY simply produces a list of the distinct grouping values. When a single column is displayed and grouped, the query returns the distinct values in that column. When multiple columns are displayed and grouped, the query returns the distinct combinations of values across those columns. For example, the following query produces exactly the same rows as the first SELECT DISTINCT statement:

    SELECT name
    FROM fruits
    GROUP BY name;

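Similarly, the following statement produces the same rows as the SELECT DISTINCT statement on the employees table. Every selected column must appear in the GROUP BY clause: grouping by name alone would collapse rows that differ in gender or salary, and MySQL rejects such a query when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5):

    SELECT name, gender, salary
    FROM employees
    GROUP BY name, gender, salary;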

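A historical difference between DISTINCT and GROUP BY is that, in MySQL 5.7 and earlier, GROUP BY also sorted the rows by the grouping columns. In those versions the following two statements return the same rows, ordered by name. As of MySQL 8.0, GROUP BY no longer sorts implicitly, so add an explicit ORDER BY to either form whenever a particular order is required:

    SELECT name, gender, salary
    FROM employees
    GROUP BY name, gender, salary;

or:

    SELECT DISTINCT name, gender, salary
    FROM employees
    ORDER BY name;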
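The equivalence between DISTINCT and grouping by every selected column can be checked with a short sketch. The data is hypothetical and SQLite (via Python's sqlite3) stands in for MySQL; both queries carry an explicit ORDER BY so the comparison does not rely on any implicit sorting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "emp_id_number INTEGER PRIMARY KEY, name TEXT, gender TEXT, salary INTEGER)"
)
# Hypothetical rows: one exact duplicate pair, and one pair that shares
# a name but differs in salary.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (101, "Kristen Ruegg", "F", 50000),
        (102, "Kristen Ruegg", "F", 50000),
        (103, "Jon Simpson", "M", 45000),
        (104, "Jon Simpson", "M", 47000),
    ],
)

distinct_rows = conn.execute(
    "SELECT DISTINCT name, gender, salary FROM employees "
    "ORDER BY name, gender, salary"
).fetchall()

# Grouping by every selected column, with no aggregate function.
grouped_rows = conn.execute(
    "SELECT name, gender, salary FROM employees "
    "GROUP BY name, gender, salary "
    "ORDER BY name, gender, salary"
).fetchall()

print(distinct_rows == grouped_rows)  # True
print(len(distinct_rows))             # 3
```

Both queries keep the two Jon Simpson rows, because they differ in salary, and merge only the rows that match in every selected column.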

IV. Counting distinct values

The keyword DISTINCT can be used inside the COUNT() function to count how many different values a column contains. COUNT(DISTINCT expr) returns the number of different non-NULL values of the given expression. The expression can be the name of a column whose different non-NULL values you want to count.

Consider all the data in the employees table.

Applying COUNT(DISTINCT name) to the name column of that table yields six different names.
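The sketch below shows how COUNT(DISTINCT ...) treats repeated and NULL values. The article's employee data is not available, so the rows are made up (chosen so that the distinct count also comes out as six), and SQLite via Python's sqlite3 module is used in place of MySQL to keep the example runnable on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (emp_id_number INTEGER PRIMARY KEY, name TEXT)"
)
# Hypothetical rows: one repeated name and one missing (NULL) name.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [
        (101, "Kristen Ruegg"),
        (102, "Kristen Ruegg"),  # same name, different person
        (103, "Jon Simpson"),
        (104, "Ted Bennet"),
        (105, "Peter Champion"),
        (106, "Mary Ann Cram"),
        (107, "Sandra Frye"),
        (108, None),  # NULL is skipped by COUNT(name) and COUNT(DISTINCT name)
    ],
)

total_rows, non_null_names, distinct_names = conn.execute(
    "SELECT COUNT(*), COUNT(name), COUNT(DISTINCT name) FROM employees"
).fetchone()

print(total_rows, non_null_names, distinct_names)  # 8 7 6
```

Comparing COUNT(name) with COUNT(DISTINCT name) is a quick way to tell whether a column contains repeated values at all.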

That is how to remove duplicate data from a result set in MySQL. If you run into similar questions, the analysis above should serve as a reference.

