In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-15 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/01 Report--
This article mainly introduces how to use the windowing function in MySQL, which has a certain reference value, interested friends can refer to, I hope you can learn a lot after reading this article, let the editor take you to understand it.
(1) definition of windowing function
Windowing function, also known as OLAP function (Online Analytical Processing, online Analytical processing), is mainly used to analyze and process data in real time. The windowing function is not supported in previous versions of MySQL, but it has been supported since version 8.0.
# windowing function syntax func_name () OVER ([PARTITION BY] [ORDER BY ASC | DESC])
Parsing of windowing function statements:
The function is divided into two parts, one is the function name, the number of windowing functions is relatively small, a total of only 11 windowing functions + aggregate functions (all aggregate functions can be used as windowing functions). According to the nature of the function, some need to write parameters, some do not need to write parameters.
The other part is the over statement. Over () must be written, and the parameters in it are all optional parameters, which can be selectively used according to the requirements:
The first parameter is the partition by + field, which means to divide the dataset into multiple parts based on this field
The second parameter is the order by + field, according to which the data of each window is sorted in ascending or descending order.
The windowing function is similar to the grouping aggregation function, which divides the data into multiple parts through specified fields. The difference is:
The SQL standard allows all aggregate functions to be used as windowing functions, and windowing functions and aggregate functions are distinguished by the OVER keyword.
Aggregate functions return only one value per group, and windowing functions can return multiple values per group.
Among the 11 windowing functions, the three sorting functions, ROW_NUMBER (), RANK () and DENSE_RANK (), are the most commonly used in practical work. Let's learn about these three windowing functions through a simple dataset.
# first create a virtual salesman sales data CREATE TABLE Sales (idate date, iname char (2), sales int) # insert data INSERT INTO Sales VALUES into the table ('2021 prime,' Ding Yi, 200), ('2021 prime,' Ding Yi, 180), ('2021 prime,' Li Si, 100), ('2021 prime,' Liu Meng, 150), ('2021 prime,' Liu Meng, 180) ('2021 prime,' Wang er, 200), ('2021 prime,' Wang er, 180), ('2021 prime,' Wang er, 300), ('2021 prime,' Zhang San', 300), ('2021 prime,' Zhang San', 280), ('2021 prime,' Zhang San', 280), ('2021 prime,' Zhang San, 280) # data query SELECT * FROM Sales; # query the worst sales salesman of each month SELECT month (idate), iname,sales, ROW_NUMBER () OVER (PARTITION BY month (idate) ORDER BY sales) as sales_order FROM Sales SELECT * FROM (SELECT month (idate), iname,sales, ROW_NUMBER () OVER (PARTITION BY month (idate) ORDER BY sales) as sales_order FROM Sales) as tWHERE sales_order=1
# the difference between ROW_NUMBER (), RANK () and DENSE_RANK () SELECT * FROM (SELECT month (idate) as imonth,iname,sales, ROW_NUMBER () OVER (PARTITION BY month (idate) ORDER BY sales) as row_order,RANK () OVER (PARTITION BY month (idate) ORDER BY sales) as rank_order, DENSE_RANK () OVER (PARTITION BY month (idate) ORDER BY sales) as dense_order FROM Sales) as t
ROW_NUMBER (): sort sequentially-1, 2, 3
RANK (): sort side by side, skip repeating sequence numbers-1, 1, 3
DENSE_RANK (): sort side by side without skipping repeated sequence numbers-1, 1, 2
(2) the practical application scenario of windowing function
In the actual job or interview, you may encounter some problems, such as asking the user to log in continuously, check-in days and so on. Here is a way to solve this kind of problem with windowing function.
# first create a virtual user login table and insert data create table user_login (user_id varchar, login_time datetime) Insert into user_login values (1), (1), (1), (1), (1), (1), (), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1 (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (1), (2), (2), (2), (2), (2), (2), (2), (2), (2), (2), (2), (2), (2), (2), (2), (2), (2), (2), (2), (2) '2020-11-20 05VERV 38RV 18'), (2ZHI 202011-1909VRV 21VIREV 42'), (2MIHERE 202011-0200VOLING 1938'), (2MIREE 2011109VOL09VL03R03R11'), (2LINGOLING 2020-10-3107RV 44VIE55'), (2LTHING 2020-10-308MUR 5633'), (2MIHERE 202010-2909RV 30WOR 28') # View data SELECT * FROM user_login
There are usually three situations when calculating the number of consecutive login days:
Check the continuous login of each user
View the maximum number of consecutive days of login per user
View users who have logged in for more than N days in a certain period of time
For the first case: check the continuous login of each user
According to practical experience, we know that users may log in several times in a period of time, and we have to output all this information, so the final output fields can be user ID, first login date, end login date, and consecutive login days.
# data preprocessing: since the statistical window period is the number of days, you can convert the login time field into a date format and then remove it (excluding multiple logins in the same day) # to facilitate subsequent code viewing, put the processing results in a new table and operate create table user_login_date (select distinct user_id, date (login_time) login_date from user_login step by step) # the processed data are as follows: select * from user_login_date;# the first case: check the continuous login of each user # sort the user login data create table user_login_date_1 (select *, rank () over (partition by user_id order by login_date) irank from user_login_date); # View the result select * from user_login_date_1 # add a secondary column to help determine whether a user logs in to create table user_login_date_2 continuously (select *, date_sub (login_date, interval irank DAY) idate # data_sub minus the specified time interval from user_login_date_1 from the specified date; # View the result select * from user_login_date_2 # calculate the consecutive login days per user select user_id, min (login_date) as start_date, max (login_date) as end_date, count (login_date) as days from user_login_date_2 group by user_id,idate # = [Integration Code Solve the user continuous login problem] = select user_id, min (login_date) start_date, max (login_date) end_date, count (login_date) days from (select *, date_sub (login_date, interval irank day) idate from (select *, rank () over (partition by user_id order by login_date) irank from (select distinct user_id, date (login_time) login_date from user_login) as a) as b) as c group by user_id Idate
For the second case: view the maximum number of consecutive days of login per user
# calculate the maximum number of consecutive login days per user select user_id,max (days) from (select user_id, min (login_date) start_date, max (login_date) end_date, count (login_date) days from (select *, date_sub (login_date, interval irank day) idate from (select *) Rank () over (partition by user_id order by login_date) irank from (select distinct user_id, date (login_time) login_date from user_login) as a) as b) as c group by user_id,idate) as d group by user_id
For the third case: check the users who have logged in for more than N days in a certain period of time.
Let's say that our demand is to view users who have logged on to ≥ for 5 days in a row during this period. This requirement can also be filtered with the results of the first case query.
# View users who have logged on to ≥ for 5 consecutive days during this period: select distinct user_id from (select user_id, min (login_date) start_date, max (login_date) end_date, count (login_date) days from (select *, date_sub (login_date, interval irank day) idate from (select * Rank () over (partition by user_id order by login_date) irank from (select distinct user_id, date (login_time) login_date from user_login) as a) as b) as c group by user_id,idate having days > = 5) as d
It is possible to get results in this way, but it is a bit troublesome to solve this problem. Here is a simple method: reference a new static window function lead ()
Select *, lead (login_date,4) over (partition by user_id order by login_date) as idate5 from user_login_date
The lead function has three parameters, the first parameter is the specified column (in this case, the login date is used), the second parameter is the value of the current row several rows back, here is 4, that is, the date of the fifth login, and the third parameter is that if the null value returned can be replaced with the specified value, the third parameter is not used. The over statement is split for user_id, and each window is in ascending order for the login date.
Use the fifth login date-login_date+1. If it is equal to 5, you have been logged in for five consecutive days. If you get a null value or greater than 5, you have not logged in for five consecutive days. The code and result are as follows:
# calculate the difference between the fifth login date and the current day select *, datediff (idate5,login_date) + 1 days from (select *, lead (login_date,4) over (partition by user_id order by login_date) idate5from user_login_date) as a # find the records with a difference of 5 days select distinct user_id from (select *, datediff (idate5,login_date) + 1 as days from (select *, lead (login_date,4) over (partition by user_id order by login_date) idate5 from user_logrin_date) as a) as b where days = 5
[exercise] Meituan takeout platform data analysis interview questions-- SQL
The existing transaction data sheet user_goods_table is as follows:
Now the boss wants to know the distribution of takeout preferences for each user and find out which takeout category each user buys the most.
# Analysis title: the output field is required to be the user name user_name, and the user buys the most takeout goods_kind # solution: this is a problem of grouping and sorting, you can consider the window function # step 1: use the window function row_number () Group statistics and ranking select user_name,goods_kind,count (goods_kind), rank () over (partition by user_name order by count (goods_kind) desc) as irankfrom user_goods_tablegroup by user_name,goods_kind for the takeout categories purchased by each user # step 2: select the first takeout category for each user: select user_id,goods_kind from (select user_name,goods_kind,count (goods_kind), rank () over (partition by user_name order by count (goods_kind) desc) as irankfrom user_goods_tablegroup by user_name,goods_kind) as a where irank=1. Thank you for reading this article carefully. I hope the article "how to use the window function in MySQL" shared by the editor will be helpful to you. At the same time, I also hope that you will support and pay attention to the industry information channel. More related knowledge is waiting for you to learn!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.