What is the method of calculating the number of consecutive days with HiveSQL

2025-04-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains how to calculate the number of consecutive days with HiveSQL, working through a real requirement step by step.

In day-to-day work, you may often receive requests like this from the business side:

Find each user's longest run of consecutive check-in days this year.

Find the topics that received posts on more than 5 consecutive days in the last month.

Find the products whose daily sales exceeded 100 on more than 3 consecutive days this quarter.

This kind of "consecutive days" problem looks simple, but it takes careful thinking and the ability to write fairly complex SQL. Below, a requirement we actually received serves as the example for a solution.

create table user_calendar_record (
  user_id bigint comment 'user ID',
  event_type int comment 'record type',
  event_data string comment 'record data',
  upload_time string comment 'upload time',
  del_status int comment 'deletion status'
) partitioned by (pt_date string comment 'record (partition) date')

Now we need to look at each user's records of type 24 in April. If a user records this item on seven or more consecutive days, it shows that they pay special attention to certain aspects and should be a focus for operations.

The SQL is built up as follows. To avoid deep nesting, each step is written as its own subquery, and the steps are then combined into the complete statement.

1. Add a ranking column, partitioned by user ID and ordered by record date. Because a user can record more than once a day, use dense_rank() rather than rank() or row_number(), so that all records from the same day share one rank.

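(select user_id, pt_date,
        dense_rank() over (partition by user_id order by pt_date) as date_rank
 from user_calendar_record
 where pt_date >= 20190401 and pt_date …

2. Subtract the rank from the record date to get a start point. Dates that belong to the same consecutive run share the same start point.

select user_id, pt_date, date_sub(pt_date, cast(date_rank as int)) as start_point from …

3. Group by user ID and start point and count the distinct dates, which gives the length of each run.

select user_id, start_point, count(distinct pt_date) as day_count from …

4. Group by user ID, take the longest run, and keep the users whose maximum is >= 7.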

select user_id, max(day_count) as max_day_count from t_c group by user_id having max(day_count) >= 7
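To see why subtracting the rank isolates each run, and why dense_rank() matters when a day has several records, it may help to trace a tiny example. The sketch below is illustrative only: the `sample` rows are hypothetical, the dates are assumed to be in 'yyyy-MM-dd' form so that date_sub() can parse them, and `row_num` is added purely for comparison.

```sql
-- Hypothetical sample: one user, two records on 2019-04-02, and a gap after 2019-04-03.
with sample as (
  select 1 as user_id, '2019-04-01' as pt_date union all
  select 1, '2019-04-02' union all
  select 1, '2019-04-02' union all
  select 1, '2019-04-03' union all
  select 1, '2019-04-06'
),
ranked as (
  select user_id, pt_date,
         dense_rank() over (partition by user_id order by pt_date) as date_rank,
         row_number() over (partition by user_id order by pt_date) as row_num
  from sample
)
select user_id, pt_date, date_rank, row_num,
       date_sub(pt_date, cast(date_rank as int)) as start_point
from ranked;

-- Values per row (the order of returned rows is not guaranteed):
-- pt_date      date_rank   row_num   start_point
-- 2019-04-01   1           1         2019-03-31
-- 2019-04-02   2           2 or 3    2019-03-31
-- 2019-04-02   2           2 or 3    2019-03-31
-- 2019-04-03   3           4         2019-03-31
-- 2019-04-06   4           5         2019-04-02
```

The first three calendar days share the start point 2019-03-31 and so form one run of length 3, while 2019-04-06 starts a new run. Had the subtraction used row_num instead, the second record on 2019-04-02 and the record on 2019-04-03 would both map to 2019-03-30, splitting the three-day run in two.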

Putting the four steps together gives the following complete SQL statement:

select user_id, max(day_count) as max_day_count
from (
  select user_id, start_point, count(distinct pt_date) as day_count
  from (
    select user_id, pt_date, date_sub(pt_date, cast(date_rank as int)) as start_point
    from (
      select user_id, pt_date,
             dense_rank() over (partition by user_id order by pt_date) as date_rank
      from user_calendar_record
      where pt_date >= 20190401 and pt_date … >= 7

What if you also need the start date of the longest run of consecutive days? You can save the per-run day counts as one temporary table and the per-user maximum as another, then join the two tables to get the result.
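The join described above could be sketched roughly as follows. This is a sketch under stated assumptions, not the article's own statement. It uses common table expressions in place of temporary tables. The CTE names (`ranked`, `runs`, `max_runs`) and the columns `record_date` and `first_day` are hypothetical. The upper date bound and the `event_type = 24` filter are inferred from the stated requirement. pt_date is assumed to be stored as 'yyyyMMdd', so it is converted to 'yyyy-MM-dd' before date_sub() is applied.

```sql
with ranked as (
  -- Step 1. Assumption: pt_date is a 'yyyyMMdd' string, converted so date_sub() can parse it.
  select user_id,
         from_unixtime(unix_timestamp(pt_date, 'yyyyMMdd'), 'yyyy-MM-dd') as record_date,
         dense_rank() over (partition by user_id order by pt_date) as date_rank
  from user_calendar_record
  where pt_date >= '20190401' and pt_date <= '20190430'  -- assumed bounds for April
    and event_type = 24                                     -- assumed filter for record type 24
),
runs as (
  -- Steps 2 and 3: records of one run share a start_point; count the distinct days per run.
  select user_id,
         min(record_date) as first_day,
         count(distinct record_date) as day_count
  from (
    select user_id, record_date,
           date_sub(record_date, cast(date_rank as int)) as start_point
    from ranked
  ) t
  group by user_id, start_point
),
max_runs as (
  -- Step 4: the longest run per user, kept only if it lasts at least 7 days.
  select user_id, max(day_count) as max_day_count
  from runs
  group by user_id
  having max(day_count) >= 7
)
-- Join the per-run counts with the per-user maximum to recover the start date.
select r.user_id, r.first_day, m.max_day_count
from runs r
join max_runs m
  on r.user_id = m.user_id
 and r.day_count = m.max_day_count;
```

If a user has several runs of the same maximum length, this join returns one row per run, so a further min(first_day) or max(first_day) per user would be needed to keep only one.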

That is how to calculate the number of consecutive days with HiveSQL. If you face a similar problem, the analysis above may serve as a reference.

© 2024 shulou.com SLNews company. All rights reserved.