What is the window function in MySQL8.0 07/11 Update SLTechnology News&Howtos

What is the window function in MySQL8.0

2025-07-11 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/01 Report--

This article is about what window functions are in MySQL8.0. The editor thinks it is very practical, so share it with you as a reference and follow the editor to have a look.

There were no window functions in previous versions of MySQL, and window functions were not introduced until MySQL8.0. The window function performs a calculation on each record in the query, and the result of the calculation is obtained from multiple records associated with that record.

1. Window function and aggregate function

Window functions are very similar to aggregate functions in that they are executed on a set of records rather than on the entire table. However, an aggregate function returns only one result after a set of records is executed, while the window function returns a result for each row of records in the change group.

two。 Common window functions

The main window functions defined in MySQL8.0 are as follows:

The function name parameter describes whether cume_dist () accumulates the distribution value. That is, the ratio of the number of rows in which the packet value is less than or equal to the current value to the total number of rows in the packet. The range of values is (0pl). Dense_rank () No uninterrupted sorting within the group. When using this function, there can be grouping in the form of 1, 1, 1, 2, and 2. First_value () is; first_value (expr) returns the first value in the group as of the current row. Lag () Yes; lag (expr, [N, [default]]) takes the Nth line forward from the current line, and defaults to 1 if N is missing. If not, default is returned by default. The default value of default is NULLlast_value (); last_value (expr) returns the last value in the group as of the current row. Lead () Yes; lead (expr, [N, [default]]) takes the Nth line back from the current line. The function is the opposite of lag (), and the rest is the same as lag (). Nth_value () is; nth_value (expr,N) returns the Nth row in the packet as of the current line. The first_value\ last_value\ nth_value function has a similar function, except that it returns data for different line numbers of the current line in the group. Ntile () Yes; ntile (N) returns the bucket number of the current row within the group. In the calculation, we should first divide all the data in the change group into N buckets, and then return the bucket number where each record is located. Returns the cumulative percentage from 1 to Npercent_rank () No. The function calculates as follows: the number of rows of all records less than the value of the record / the total number of rows of the packet-1. Therefore, the return value of the record is changed to a non-interrupted intra-group sort of [0Power1] rank (). The sorting results may be as follows: 1 whether the current line is the sequence number in its packet. Regardless of whether there are duplicate values in the sorting results, the sorting results are as follows: 1meme2pence3pence5

Note: the 'parameter' column indicates whether the function can add parameters. "No" means that parameters cannot be added to the parentheses of the function. Expr can represent either a field or a calculation on a field, such as sum (col), and so on. The following is the same.

3. Over clause

The over clause specifies how records are partitioned for processing by window functions. If over () is empty, the entire query record is grouped as a group. If the over clause is not empty, it can specify how query records are grouped and how records are sorted within the group. In addition, the over clause can also be used with aggregate functions. If the over clause appears after the aggregate function, then the aggregate function becomes a window function. If there is no over clause, they are still aggregate functions. The main aggregate functions that can use the over clause are as follows:

Avg (), bit_and (), bit_or (), bit_xor (), count (), max (), min (), stddev_pop (), stddev (), std (), stddev_samp (), sum (), var_pop (), variance (), var_samp

For the window functions described in the previous section, the over () clause is mandatory.

The common grammatical forms in over clauses are:

Over_clause:

{OVER (window_spec) | OVER window_name}

Where:

Window_spec:

[window_name] [partition_clause] [order_clause] [frame_clause]

Window_name: refers to the window clause defined in the query statement. If you encounter a group by, having clause, order by clause, then the window clause should be placed between the having clause and the order by clause. The syntax is as follows:

WINDOW window_name AS (window_spec)

[, window_name AS (window_spec)]

And

Window_spec:

[window_name] [partition_clause] [order_clause] [frame_clause]

From the grammatical structure, we can see that the window clause is actually a separate extraction of the content placed in parentheses in over ().

Partition_clause: the parittion by expr clause. Used to specify how records are grouped. The expr in the syntax can be not only the field itself, but also an evaluation expression. For example, there is a field ts of type timestramp in the record, and both partition by ts and partition by hour (ts) are valid in MySQL.

Order_clause: order by expr desc | asc,expr desc | asc. Used to specify how to sort within a group.

Frame_clause: used to specify the subset partition method in the current grouping. The frame can be moved within each group based on the position of the current row. Use frame to calculate the sum of pipelining (from the partition to the current line) and the rolling average (rolling averages).

Its grammatical structure is as follows:

Frame_clause: frame_units frame_extentframe_units: {ROWS | RANGE} frame_extent: {frame_start | frame_between} frame_between: BETWEEN frame_start AND frame_end frame_start, frame_end: {CURRENT ROW | UNBOUNDED PRECEDING | UNBOUNDED FOLLOWING | expr PRECEDING | expr FOLLOWING}

Where:

Frame_units is used to indicate the relationship between the current line and frame

ROWS: used to define the start and end lines of the frame (offset is based on location); RANGE: defines the interval of the frame. (the base of the offset is the value of the current row)

Frame_entent is used to indicate the start and end lines of the frame. One is by specifying start and end (frame_start,frame_end. Frame_end may not be specified, if not explicitly specified, the current line defaults to the end line), and the other uses between (frame_between). The syntax of frame_between is simple. Let's take a look at frame_start and frame_end.

Current row: when used with rows, the boundary is the current line. When used with range, the boundary is the peer of the current row (personally, I understand that the peer here should be all records equal to the value of the current row).

Unbounded precceding: when you use it, the first line of each partition is the boundary.

Unbounded following: when you use it, the first line of each partition is the boundary.

Expr preceding\ expr following: the offset that can be set up (preceding) and down (following) by expr personalization.

4. Code example

The table structure is as follows:

4.1 row_number\ dense_rank\ rankselect order_date,sum (quantity) as quantity,rank () over (ORDER BY sum (quantity) desc) as rank_result,dense_rank () over (ORDER BY sum (quantity) desc) as dense_result,row_number () over (ORDER BY sum (quantity) desc) as row_resultfrom spm_order group by order_date-- limits part of the data, which has no practical significance. It is OK to show the difference between these three functions having quantity > = 98order by quantity desc.

The running results are as follows:

From the above results, we can see:

Once the rank () function encounters a duplicate value, the sequence number is broken. For example, the sequence number that appears next after two 7s is 9.

Even if there are duplicate values in the dense_rank () function, the sequence number is continuous. After two 7s, the next sequence number is 8.

Row_number () will not have the same sequence number.

4.2 cume_dist\ percent_rankselect order_date,num,cume_dist () over (order by num asc) as cume_result,percent_rank () over (order by num asc) as percent_resultfrom (select order_date,count (1) as numfrom spm_order group by order_datehaving num > = 27) aorder by num asc

The result of the code running is as follows

The analysis is as follows:

Cume_dist (): first of all, there are 10 total records. When num=27, there are five values with num less than or equal to 27, so its cume_dist () value is 0.5; when num=28, there are seven values less than or equal to 28, so cume_dist () value is 0.7; and so on.

Percent_rank (). When num=27, the number of records with num less than 27 is 0, so percent_rank () is 0; when num=28, num

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.