Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What is a SQL window function

2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/03 Report--

This article mainly explains "what is SQL window function", interested friends may wish to take a look. The method introduced in this paper is simple, fast and practical. Let Xiaobian take you to learn "What is SQL window function"!

Window Function is a new feature defined in SQL2003, and improved in SQL2011 and SQL2016, adding several extensions. Window functions differ from familiar ordinary functions and aggregate functions in that they perform one calculation for each row of data: enter multiple rows (a window) and return a value. In analytical queries such as reports, window functions can elegantly express certain requirements and play an irreplaceable role.

What is a window function?

The window function appears in the expression list of the SELECT clause, and its most prominent feature is the OVER keyword. The syntax is defined as follows:

window_function (expression) OVER ( [ PARTITION BY part_list ] [ ORDER BY order_list ] [ { ROWS | RANGE } BETWEEN frame_start AND frame_end ] )

These include the following options:

PARTITION BY means to partition the data first by part_list

ORDER BY means to sort the data in each partition by order_list

Figure 1. Basic concepts of window function

The last item represents the definition of Frame, i.e.: What data does the current window contain?

ROWS selects several rows before and after, for example, ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING indicates that the previous 3 rows to the next 3 rows, a total of 7 rows of data (or less than 7 rows, if the boundary is encountered)

RANGE selects a range of data, for example RANGE BETWEEN 3 PRECEDING AND 3 FOLLOWING indicates all rows with values in the range [c−3,c+3][c−3,c+3], cc is the value of the current row

Figure 2. Rows window and Range window

Logically and semantically speaking, the computation "process" of a window function is as follows:

Partitioning and reordering all input data by window definition (if necessary)

For each row of data, calculate its Frame range

Input the row set in Frame into window function, and fill the calculation result into current row.

For example:

SELECT dealer_id, emp_name, sales, ROW_NUMBER() OVER (PARTITION BY dealer_id ORDER BY sales) AS rank, AVG(sales) OVER (PARTITION BY dealer_id) AS avgsales FROM sales

In the above query, the rank column indicates the sales ranking of the employee under the current dealer;avgsales indicates the average sales of all employees under the current dealer. The query results are as follows:

+------------+-----------------+--------+------+---------------+ | dealer_id | emp_name | sales | rank | avgsales | +------------+-----------------+--------+------+---------------+ | 1 | Raphael Hull | 8227 | 1 | 14356 | | 1 | Jack Salazar | 9710 | 2 | 14356 | | 1 | Ferris Brown | 19745 | 3 | 14356 | | 1 | Noel Meyer | 19745 | 4 | 14356 | | 2 | Haviva Montoya | 9308 | 1 | 13924 | | 2 | Beverly Lang | 16233 | 2 | 13924 | | 2 | Kameko French | 16233 | 3 | 13924 | | 3 | May Stout | 9308 | 1 | 12368 | | 3 | Abel Kim | 12369 | 2 | 12368 | | 3 | Ursa George | 15427 | 3 | 12368 | +------------+-----------------+--------+------+---------------+

Note: Each part of the grammar is optional:

If PARTITION BY is not specified, the data is not partitioned; in other words, all data is treated as one partition

If ORDER BY is not specified, partitions are not sorted and are typically used for order-independent window functions such as SUM()

If you do not specify a Frame clause, the following Frame definition is assumed by default:

RANGE BETWEEN UNBOUND PRECEDING AND UNBOUND FOLLOWING

If ORDER BY is specified, the first row in the partition to the current value is used by default. RANGE BETWEEN UNBOUND PRECEDING AND CURRENT ROW

Finally, window functions can be divided into three categories:

Aggregate: AVG(), COUNT(), MIN(), MAX(), SUM()...

Value: FIRST_VALUE(), LAST_VALUE(), LEAD(), LAG()...

Ranking: RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE()...

Limited by space, this article does not explore the meaning of each window function. Pay attention to the Java technology stack of the public number, reply in the background: interview, you can get the MySQL series of interview questions and answers I organized, very complete.

Note: Frame definitions do not apply to all window functions, such as ROW_NUMBER(), RANK(), LEAD(), etc. These functions always apply to the entire partition, not the current Frame.

window function VS. aggregate function

In the sense of aggregation, it seems that both window functions and Group By aggregation functions can do the same thing. However, the similarities between them were limited to this! The key difference is that the window function simply appends the result to the current result, it does not modify any existing rows or columns. Group By does something completely different: it keeps only one row of aggregate results for each Group.

Some readers may ask, isn't it a modification that the windowing function apparently changes the order in which results are returned? Because SQL and relational algebra are defined on a multi-set basis, the result set itself has no order, ORDER BY is simply the order in which the results are finally presented.

On the other hand, logically and semantically, the parts of a SELECT statement can be thought of as being "executed" in the following order:

Figure 3. Logical execution order of SQL parts

Notice that the evaluation of window functions comes just before ORDER BY and after most of SQL. This is also consistent with the window function's appending, not modifying semantics-the result set is determined at this point, and the window function is calculated accordingly. Don't select * again, send you 12 query skills, recommend to see.

Execution of window functions

The classic execution of window functions is divided into sorting and function evaluation.

Figure 4. The execution process of a window function is usually divided into two steps: sorting and evaluation.

PARTITION BY and ORDER BY in window definitions are easy to sort. For example, for the window PARTITION BY a, b ORDER BY c, d, we can sort the input data by (a,b,c,d)(a, b,c,d) or (b,a,c,d)(b,a,c,d), and then the data will be arranged as shown in Figure 1.

Next consider: How to handle Frame?

For the Frame of the entire partition (such as RANGE BETWEEN UNBOUND PRECEDING AND UNBOUND FOLLOWING), just calculate it once for the entire partition, there is nothing to say;

For frames that grow gradually (e.g. RANGE BETWEEN UNBOUND PRECEDING AND CURRENT ROW), it is easy to maintain the accumulated state with Aggregator;

For sliding frames (e.g. ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING) it is relatively difficult. A classic approach is to ask Aggregator to support not only add but also remove, which may be more complicated than you think, for example consider the implementation of MAX() below.

Optimization of window function

There is a limit to what the optimizer can do for window functions. For the sake of completeness, a brief explanation is still made here.

Usually, we first extract the window function from Project into a separate operator called Window.

Figure 5. Optimization Process of Window Function

Sometimes, a SELECT statement contains multiple window functions with the same or different window definitions (OVER clauses). Obviously, for the same window, there is no need to partition and sort again, we can combine them into a Window operator.

For different windows, most simply, we can divide them all into different Windows, as shown in the above figure. In actual execution, each Window needs to be sorted once, and the cost is not small.

Is it possible to compute multiple window functions at once? In some cases, this is possible. For example, there are two window functions in this example:

... ROW_NUMBER() OVER (PARTITION BY dealer_id ORDER BY sales) AS rank, AVG(sales) OVER (PARTITION BY dealer_id) AS avgsales ...

Although these two windows are not exactly the same, AVG(sales) does not care about the order within the partition and can reuse the window of ROW_NUMBER().

Parallel Execution of Window Functions

Most modern DBMSs support parallel execution. For window functions, since the computation between partitions is completely independent, we can easily dispatch partitions to different nodes (threads) to achieve inter-partition parallelism.

But what if the window function has only one global partition (no PARTITION BY clause), or if the number of partitions is too small to be sufficiently parallel? The Removable Aggregator technique we mentioned above is obviously no longer available, it relies on the internal state of a single Aggregator and is difficult to parallelize effectively.

In this paper, TUM proposes to use Segment Trees to achieve efficient intra-partition parallelism. A segment tree is an N-ary tree data structure, where each node contains the partial aggregation results under the current node.

The following illustration shows an example of calculating SUM() using a binary segment tree. For example, 1212 in the third row in the figure below represents the aggregation result of leaf nodes 5+75+7, while 2525 above it represents the aggregation result of leaf nodes 5+7+3+105+7+3+10.

Figure 6. Computes the sum of a given range using a segment tree

Assuming that the current Frame is the 2nd to 8th lines, that is, 7+3+10+...+ needs to be calculated. 47+3+10+...+ Sum of 4 intervals. With the segment tree in place, we can compute the aggregate directly using 7+13+207+13+20 (shown in red).

A segment tree can be constructed in O(nlogn)O(nlog n) time and can query the aggregate results of any interval in O(logn)O(log n) time. Even better, not only can queries be multithreaded concurrently without interference, but the construction of segment trees can also be well parallelized.

At this point, I believe that everyone has a deeper understanding of "what is SQL window function", may wish to actually operate it! Here is the website, more related content can enter the relevant channels for inquiry, pay attention to us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report