What's the difference between distinct in SQL and row_number () over () 07/19 Update SLTechnology News&Howtos

What's the difference between distinct in SQL and row_number () over ()

2025-07-19 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Shulou(Shulou.com)06/02 Report--

This article is to share with you about the difference between distinct and row_number () over () in SQL. The editor thinks it is very practical, so I share it with you. I hope you can get something after reading this article. Without saying much, let's take a look at it.

In SQL, the keyword distinct is used to return a unique different value. Its grammatical format is:

SELECT DISTINCT column name FROM table name

Suppose you have a table "CESHIDEMO" that contains two fields, NAME and AGE, in the following format:

Looking at the above table, we will find that there are two records with the same NAME and three records with the same AGE. If we run the following SQL statement

/ * where PPPRDER is the name of Schema, that is, table CESHIDEMO is in PPPRDER * /

Select name from PPPRDER.CESHIDEMO

The results will be as follows:

Looking at this result, we will find that in the above four records, there are two records with the same NAME value, that is, the values of the second record and the third record are both "gavin". So, what if we want records with the same NAME to show only one? At this point, you need to use the distinct keyword! Next, run the following SQL statement

Select distinct name from PPPRDER.CESHIDEMO

The results will be as follows:

Observing the result, it is obvious that our request has been realized! However, we can't help thinking, what will happen if you apply the distinct keyword to both fields at the same time? Now that we've thought of it, let's try it by running the following SQL statement

Select distinct name, age from PPPRDER.CESHIDEMO

The results are as follows:

Observe the result, alas, it doesn't seem to work? She showed all the records! There are two records with the same NAME value and three records with the same AGE value, which has not changed at all. But in fact, this should be the result. Because when distinct acts on multiple fields, she will only "de-duplicate" all records with the same field value, obviously our "poor" four records do not meet this condition, so distinct will think that the above four records are not the same. Empty words, next, let's add an exact same record to the table "CESHIDEMO" and verify it. The table after adding a record is as follows:

Then run the following SQL statement

Select distinct name, age from PPPRDER.CESHIDEMO

The results are as follows:

Observing the result, it perfectly validates our above conclusion.

In addition, there is one thing to pay special attention to, that is, the keyword distinct can only be placed in front of all fields in the SQL statement. If misplaced, SQL will not report an error, but it will not have any effect.

3 row_number () over ()

In the SQL Server database, we provide a function row_number () to label the records in the database table, followed by a function over (), and the function over () is to group and sort the records in the table. The syntax used by both is:

ROW_NUMBER () OVER (PARTITION BY COLUMN1 ORDER BY COLUMN2)

The records in the table are grouped by field COLUMN1 and sorted by field COLUMN2, where

PARTITION BY: indicates grouping

ORDER BY: indicates sorting

Next, we also test with the data in the table "CESHIDEMO". First, give the result of the query without using the row_number () over () function, as shown below:

Then, run the following SQL statement

Select PPPRDER.CESHIDEMO.*, row_number () over (partition by age order by name desc) from PPPRDER.CESHIDEMO

The results are as follows:

As can be seen from the above results, there is an additional column marked with numerical sorting on the basis of the original table. Then analyze the SQL statement we run and find that it is indeed grouped by the value of the field AGE and sorted by the value of the field NAME! Therefore, the function of the function has been verified.

Next, we study how to use the row_number () over () function to achieve the de-duplication function. By observing the above results, we can find that if we group by NAME, sort by AGE, and then take the first record of each group, we may be able to achieve the function of "de-duplicating". To try it, run the following SQL statement

/ * * where rn represents the last added column * /

Select * from (select PPPRDER.CESHIDEMO.*, row_number () over (partition by name order by age desc) rn from PPPRDER.CESHIDEMO) where rn = 1

After running, the result is as follows:

Observing the above results, we found that, alas, the function of "de-duplication" of data was accidentally realized by us. Unfortunately, if we are careful, we will find a very uncomfortable thing, that is, when we execute the above SQL statement to "deduplicate", a record with a NAME value of "gavin" and an AGE value of "18" is filtered out, but in real life, things of the same name and different ages are too normal.

This is the difference between distinct in SQL and row_number () over (). The editor believes that there are some knowledge points that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.