In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
This article mainly introduces what the SQL window function is, has a certain reference value, interested friends can refer to, I hope you can learn a lot after reading this article, the following let Xiaobian take you to understand it.
The window function looks like:
Expression OVER (PARTITION BY grouping field ORDER BY sort field)
There are two abilities:
When the expression is rank () dense_rank () row_number (), it has the ability to sort groups.
When the expression is an aggregate function such as sum (), it has the ability to accumulate.
Regardless of the ability, the window function does not affect the number of rows of data, but spreads the calculation evenly over each row.
These two abilities need to be distinguished and understood.
Bottom table
The above is the bottom table of the example, with a total of 8 pieces of data, cities 1 and 2, each with an area of 1 to 4 below, and each data has the number of people of the data.
Grouping sorting
ORDER BY people is fine if we rank it by population, but what if we want to rank it within the city?
At this point, we need to use the grouping sorting ability of the window function:
SELECT *, rank () over (PARTITION BY city ORDER BY people) FROM test
The SQL means to sort by people within the city group.
In fact, PARTITION BY is optional, if we ignore it:
SELECT *, rank () over (ORDER BY people) FROM test
It is also valid, but this statement is equivalent to normal ORDER BY, so PARTITION BY is generally used when grouping sorting using window functions.
The difference of each grouping sorting function
We print out the results of rank () dense_rank () row_number ():
SELECT *, rank () over (PARTITION BY city ORDER BY people), dense_rank () over (PARTITION BY city ORDER BY people), row_number () over (PARTITION BY city ORDER BY people) FROM test
In fact, you can guess from the results that when the three functions encounter the same value in sorting, they have the following differences in ranking statistical logic:
Rank (): values rank the same at the same time, but occupy ranking numbers.
Dense_rank (): the value is the same at the same time, but does not occupy the ranking number, the overall ranking is more compact.
Row_number (): force ranking by line number regardless of whether the value is the same or not.
The above example can be optimized because all the window logic is the same, and we can extract it into a variable using WINDOW AS:
SELECT *, rank () over wd, dense_rank () over wd, row_number () over wdFROM testWINDOW wd as (PARTITION BY city ORDER BY people) cumulative aggregation
As we said before, the use of aggregate functions will turn queries into aggregate mode. If you do not use GROUP BY, the number of rows returned after aggregation will be compressed to one row, and even with GROUP BY, the number of rows returned will generally be greatly reduced because the grouping is aggregated.
However, aggregations using window functions do not result in a reduction in the number of rows returned, so how is this aggregation calculated? Let's just look at the following example:
SELECT *, sum (people) over (PARTITION BY city ORDER BY people) FROM test
As you can see, in each city group, after sorting by people, the same values are accumulated (the same values will be merged together). This is what BI tools generally call the implementation of RUNNGIN_SUM. Of course, in general, our collation rules use dates that will never be repeated, so we will not encounter the problem of merging calculation in the first red box.
Cumulative functions also include avg () min (), etc., all of which can also act on window functions, and their logic can be understood as follows:
You may be wondering, isn't it more convenient to directly sum (previous line result, next line)? To test the guess, let's try the result of avg ():
It can be seen that if you directly take advantage of the cache of the results of the previous row, then the avg results must be inaccurate, so the window cumulative aggregation is recalculated per row. Of course, we don't rule out the possibility of doing additional performance optimizations for sum, max, and min, but avg can only repeat calculations per line.
Used in combination with GROUP BY
The window function can be used in combination with GROUP BY, following the rule that the window range takes effect on subsequent query results, so it doesn't really care whether GROUP BY is performed or not. Let's look at the following example:
The cumulative aggregation after grouping by region is based on the granularity of the data rows after GROUP BY, rather than the previous detail rows.
Summary
Window functions are very useful in scenarios such as sorting or accumulating GVM in the calculation group. We just need to keep two knowledge points in mind:
Grouping sorting makes sense only when combined with PARTITION BY.
Cumulative aggregation acts on the query result row granularity and supports all aggregate functions.
The address for discussion is: intensive reading of "SQL window functions" Issue # 405 ascoders/weekly
Thank you for reading this article carefully. I hope the article "what is the SQL window function" shared by the editor will be helpful to you. At the same time, I also hope you will support us and pay attention to the industry information channel. More related knowledge is waiting for you to learn!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.