How to use dates and time periods in SQL

2025-01-28 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

This article explains how to work with dates and time periods in SQL. These operations come up constantly in day-to-day work and often cause confusion, so the editor has collected a set of simple, practical methods from various sources. Hopefully they answer the common questions about dates and time periods in SQL. Let's get started.

First: the use of dates and periods

Filtering by date and time period comes up constantly at work, because weekly, monthly, quarterly and annual performance is usually the focus when pulling reports, building dashboards and running analyses.

Time-part extraction: the extract syntax. The field can be day, hour, minute, month, quarter, and so on.

-- source can be of date or timestamp type
extract(field FROM source)

Usage

SELECT extract(year FROM '2020-08-05 09:30:08'); -- result is 2020
SELECT extract(quarter FROM '2020-08-05 09:30:08'); -- result is 3
SELECT extract(month FROM '2020-08-05 09:30:08'); -- result is 8
SELECT extract(week FROM '2020-08-05 09:30:08'); -- result is 32, the ISO week number of the year
SELECT extract(day FROM '2020-08-05 09:30:08'); -- result is 5
SELECT extract(hour FROM '2020-08-05 09:30:08'); -- result is 9
SELECT extract(minute FROM '2020-08-05 09:30:08'); -- result is 30
SELECT extract(second FROM '2020-08-05 09:30:08'); -- result is 8

The fields that can be extracted differ somewhat between databases. Hive, for example, supports day, dayofweek, hour, minute, month, quarter, second, week and year. Week, month and year are used most, because the statistics back ends of both in-house and commercial products report mainly on weekly and monthly cycles (for example, the last 7 days or the last 30 days).

Note:

Impala supports: YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND, EPOCH

Hive supports: day, dayofweek, hour, minute, month, quarter, second, week and year

Hive has provided the extract function since version 2.2.0.
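The field values can be cross-checked outside the database. The following is a minimal Python sketch, not Hive SQL; it assumes the timestamp 2020-08-05 09:30:08 and ISO-8601 week numbering, and the helper name extract_fields is made up for illustration.

```python
from datetime import datetime


def extract_fields(ts: str) -> dict:
    """Return the date/time parts that extract(field FROM source) picks out."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return {
        "year": dt.year,
        "quarter": (dt.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, ...
        "month": dt.month,
        "week": dt.isocalendar()[1],         # ISO-8601 week of the year
        "day": dt.day,
        "hour": dt.hour,
        "minute": dt.minute,
        "second": dt.second,
    }


fields = extract_fields("2020-08-05 09:30:08")
print(fields)
```

Under ISO numbering, 2020-08-05 falls in week 32, because ISO week 1 of 2020 starts on Monday 2019-12-30.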

Week extraction syntax

When aggregating by week, you need to find the dates of the Monday and the Sunday of the week. The following function is commonly used for this:

next_day(STRING start_date, STRING day_of_week)

-- returns the first date later than start_date that falls on the given day of the week
-- 2020-08-05 is a Wednesday
SELECT next_day('2020-08-05', 'MO'); -- next Monday: 2020-08-10
SELECT next_day('2020-08-05', 'TU'); -- next Tuesday: 2020-08-11
SELECT next_day('2020-08-05', 'WE'); -- next Wednesday: 2020-08-12
SELECT next_day('2020-08-05', 'TH'); -- next Thursday: 2020-08-06, that is, this Thursday
SELECT next_day('2020-08-05', 'FR'); -- next Friday: 2020-08-07, that is, this Friday
SELECT next_day('2020-08-05', 'SA'); -- next Saturday: 2020-08-08, that is, this Saturday
SELECT next_day('2020-08-05', 'SU'); -- next Sunday: 2020-08-09, that is, this Sunday

The English day names from Monday to Sunday are Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday.

Usage

How do you get the Monday of the week that contains a given date? Take the next Monday after that date and subtract 7 days:

SELECT date_add(next_day('2020-08-05', 'MO'), -7); -- 2020-08-03

Similarly, to get the Sunday of the week that contains a given date, take the next Monday after that date and subtract 1 day:

SELECT date_add(next_day('2020-08-05', 'MO'), -1); -- 2020-08-09
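The week arithmetic can be sketched in Python to make the offsets explicit. This is an illustration under assumptions, not Hive's implementation: the names next_day, week_bounds and DAY_CODES are defined here only for the example, and only the two-letter day codes are handled.

```python
from datetime import date, timedelta

# Two-letter day codes mapped to Python's weekday() numbering (Monday = 0)
DAY_CODES = {"MO": 0, "TU": 1, "WE": 2, "TH": 3, "FR": 4, "SA": 5, "SU": 6}


def next_day(start: date, day_code: str) -> date:
    """First date strictly after `start` that falls on the given weekday."""
    delta = (DAY_CODES[day_code] - start.weekday()) % 7
    return start + timedelta(days=delta or 7)  # same weekday -> a full week later


def week_bounds(d: date) -> tuple:
    """Monday and Sunday of the week containing `d`."""
    next_monday = next_day(d, "MO")
    return next_monday - timedelta(days=7), next_monday - timedelta(days=1)


monday, sunday = week_bounds(date(2020, 8, 5))
print(monday, sunday)
```

Because next_day always moves strictly forward, a date that is itself a Monday maps to the following Monday, and subtracting 7 days brings it back to itself, so the approach holds for every day of the week.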

Month extraction syntax

To derive the month from a single date, the last_day function maps any date to the last day of its month (the 28th, 29th, 30th or 31st), so all dates in the same month collapse to one value:

last_day(STRING date)

Usage

SELECT last_day('2020-08-05'); -- 2020-08-31

Alternatively, you can use the date_format function, for example:

SELECT date_format('2020-08-05', 'yyyy-MM'); -- 2020-08

Monthly window: using add_months together with trunc()

-- returns the date after adding or subtracting months
SELECT add_months('2020-08-05', -1); -- 2020-07-05

-- returns the first day of the month of the given date
SELECT trunc("2020-08-05", 'MM'); -- 2020-08-01

As the examples show, add_months on its own, with minus N months, returns the same day of the month N months earlier, so the window covers exactly N calendar months back from the given date. If you wrap the date in trunc() first, the window starts on the first day of the month N months earlier.
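The difference between the two window starts can be sketched in Python. The helpers add_months and trunc_month below are simplified stand-ins written for this example: they clamp the day to the length of the target month, and they do not try to reproduce any further end-of-month rules that Hive's own add_months applies.

```python
import calendar
from datetime import date


def add_months(d: date, n: int) -> date:
    """Shift `d` by n months, clamping the day to the target month's length."""
    index = d.year * 12 + (d.month - 1) + n   # months counted from year 0
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def trunc_month(d: date) -> date:
    """First day of the month containing `d`, like trunc(d, 'MM')."""
    return d.replace(day=1)


today = date(2020, 8, 5)
rolling_start = add_months(today, -1)                 # same day, one month back
calendar_start = add_months(trunc_month(today), -1)   # first day of previous month
print(rolling_start, calendar_start)
```

With today set to 2020-08-05, the rolling window starts on 2020-07-05 and the calendar-aligned window starts on 2020-07-01.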

-- Select all data from 2020-07-05 to 2020-08-05
BETWEEN add_months('2020-08-05', -1) AND '2020-08-05'

-- Select all data from 2020-07-01 to 2020-08-05
BETWEEN add_months(trunc("2020-08-05", 'MM'), -1) AND '2020-08-05'

Second: temporary tables and Common Table Expressions (WITH)

Both techniques are used often in daily work. For complex computations, to avoid piling up JOINs, the data that is needed is usually pulled out ahead of the main query block into a temporary table or a CTE.

The temporary-table approach:

CREATE TEMPORARY TABLE table_1 AS
SELECT
columns
FROM table_a;

CREATE TEMPORARY TABLE table_2 AS
SELECT
columns
FROM table_b;

-- schematic: the join conditions are not given
SELECT
table_1.columns,
table_2.columns,
c.columns
FROM table_c c
JOIN table_1
JOIN table_2;

The CTE approach:

Note that this syntax is supported by Hive and Impala; MySQL supports it only from version 8.0 onward, not in earlier versions.

WITH employee_by_title_count AS (
SELECT
t.name AS job_title
, COUNT(e.id) AS amount_of_employees
FROM employees e
JOIN job_titles t ON e.job_title_id = t.id
GROUP BY 1
),
salaries_by_title AS (
SELECT
name AS job_title
, salary
FROM job_titles
)
SELECT *
FROM employee_by_title_count e
JOIN salaries_by_title s ON s.job_title = e.job_title

As you can see, temporary tables and CTEs (WITH) are used in much the same way, and both aim to make a query clearer and more elegant. Many people write an entire query as a single block with too many JOINs or subqueries, until the logic gets lost and they no longer know where they are. Bringing in a temporary table or a CTE at the right moment is a definite plus.
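To try the CTE pattern without a Hive cluster, the sketch below runs the same shape of query in SQLite through Python's sqlite3 module. The table contents are invented purely for illustration, and the query groups by t.name instead of a position number; everything else follows the employees and job_titles example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_titles (id INTEGER, name TEXT, salary INTEGER);
CREATE TABLE employees (id INTEGER, job_title_id INTEGER);
-- made-up rows, for illustration only
INSERT INTO job_titles VALUES (1, 'analyst', 100), (2, 'engineer', 120);
INSERT INTO employees VALUES (1, 1), (2, 2), (3, 2);
""")

cte_rows = conn.execute("""
WITH employee_by_title_count AS (
    SELECT t.name AS job_title, COUNT(e.id) AS amount_of_employees
    FROM employees e
    JOIN job_titles t ON e.job_title_id = t.id
    GROUP BY t.name
),
salaries_by_title AS (
    SELECT name AS job_title, salary
    FROM job_titles
)
SELECT e.job_title, e.amount_of_employees, s.salary
FROM employee_by_title_count e
JOIN salaries_by_title s ON s.job_title = e.job_title
ORDER BY e.job_title
""").fetchall()

for row in cte_rows:
    print(row)
```

Each named block in the WITH clause can be read and checked on its own, which is the readability benefit described above.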

Third: the combined use of Aggregation and CASE WHEN

Combining aggregate functions (SUM/COUNT/COUNT DISTINCT/MIN/MAX) with CASE WHEN is one of the most powerful and interesting techniques. It produces an effect similar to SUMIF/COUNTIF in Excel and makes many analyses efficient.

Table name: order
Columns: register_date, order_date, user_id, country, order_sales, order_id

Data preparation

-- order is a reserved word in Hive, so the table name is quoted with backticks
CREATE TABLE `order` (
register_date string,
order_date string,
user_id string,
country string,
order_sales decimal(10,2),
order_id string);

INSERT INTO TABLE `order` VALUES ("2020-06-07", "2020-06-09", "001", "c0", 210, "o1");
INSERT INTO TABLE `order` VALUES ("2020-06-08", "2020-06-09", "002", "c1", 220, "o2");
INSERT INTO TABLE `order` VALUES ("2020-06-07", "2020-06-10", "003", "c2", 230, "o3");
INSERT INTO TABLE `order` VALUES ("2020-06-09", "2020-06-10", "004", "c3", NULL, "o4"); -- order_sales value missing in the source
INSERT INTO TABLE `order` VALUES ("2020-06-07", "2020-06-20", "005", "c4", 300, "o5");
INSERT INTO TABLE `order` VALUES ("2020-06-10", "2020-06-23", "006", "c5", 400, "o6");
INSERT INTO TABLE `order` VALUES ("2020-06-07", "2020-06-19", "007", "c6", 600, "o7");
INSERT INTO TABLE `order` VALUES ("2020-06-12", "2020-06-18", "008", "c7", 700, "o8");
INSERT INTO TABLE `order` VALUES ("2020-06-07", "2020-06-09", "009", "c8", 100, "o9");
INSERT INTO TABLE `order` VALUES ("2020-06-15", "2020-06-18", "0010", "c9", 200, "o10");
INSERT INTO TABLE `order` VALUES ("2020-06-15", "2020-06-19", "0011", "c10", 250, "o11");
INSERT INTO TABLE `order` VALUES ("2020-06-12", "2020-06-29", "0012", "c11", NULL, "o12"); -- order_sales value missing in the source
INSERT INTO TABLE `order` VALUES ("2020-06-16", "2020-06-19", "0013", "c12", 230, "o13");
INSERT INTO TABLE `order` VALUES ("2020-06-17", "2020-06-20", "0014", "c13", 290, "o14");
INSERT INTO TABLE `order` VALUES ("2020-06-20", "2020-06-29", "0015", "c14", 203, "o15");

CASE WHEN with time: retention / usage analysis

-- allow DISTINCT on more than one expression

SET hive.groupby.skewindata = false;
-- allow grouping or sorting by position number
SET hive.groupby.orderby.position.alias = true;

SELECT
date_add(next_day(register_date, 'MO'), -1) AS week_end,
COUNT(DISTINCT CASE WHEN order_date BETWEEN register_date AND date_add(register_date, 6) THEN user_id END) AS first_week_order,
COUNT(DISTINCT CASE WHEN order_date BETWEEN date_add(register_date, 7) AND date_add(register_date, 13) THEN user_id END) AS second_week_order,
COUNT(DISTINCT CASE WHEN order_date BETWEEN date_add(register_date, 14) AND date_add(register_date, 20) THEN user_id END) AS third_week_order
FROM `order`
GROUP BY 1;

The example above shows whether users place an order after registering: how many users ordered in the first, second and third week after registration. From this you can analyze user activity and retention.

Note: this usage requires two parameters to be set:

hive.groupby.skewindata = false: allows DISTINCT on more than one expression; otherwise an error is reported:

SemanticException [Error 10022]: DISTINCT on different columns not supported with skew in data

hive.groupby.orderby.position.alias = true: allows grouping or sorting by position number; otherwise an error is reported:

SemanticException [Error 10025]: line 79:13 Expression not in GROUP BY key 'MO'
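The retention query can be tried locally with SQLite through Python's sqlite3 module. This is a sketch under assumptions: the table is named orders to avoid the reserved word, only the three columns the query needs are loaded, and SQLite's date() modifiers stand in for Hive's date_add and next_day ('weekday 0' moves to the Sunday on or after the date).

```python
import sqlite3

# (register_date, order_date, user_id) from the sample orders
ROWS = [
    ("2020-06-07", "2020-06-09", "001"), ("2020-06-08", "2020-06-09", "002"),
    ("2020-06-07", "2020-06-10", "003"), ("2020-06-09", "2020-06-10", "004"),
    ("2020-06-07", "2020-06-20", "005"), ("2020-06-10", "2020-06-23", "006"),
    ("2020-06-07", "2020-06-19", "007"), ("2020-06-12", "2020-06-18", "008"),
    ("2020-06-07", "2020-06-09", "009"), ("2020-06-15", "2020-06-18", "0010"),
    ("2020-06-15", "2020-06-19", "0011"), ("2020-06-12", "2020-06-29", "0012"),
    ("2020-06-16", "2020-06-19", "0013"), ("2020-06-17", "2020-06-20", "0014"),
    ("2020-06-20", "2020-06-29", "0015"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (register_date TEXT, order_date TEXT, user_id TEXT)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)

QUERY = """
SELECT
    date(register_date, 'weekday 0') AS week_end,
    COUNT(DISTINCT CASE WHEN order_date BETWEEN register_date
          AND date(register_date, '+6 days') THEN user_id END) AS first_week_order,
    COUNT(DISTINCT CASE WHEN order_date BETWEEN date(register_date, '+7 days')
          AND date(register_date, '+13 days') THEN user_id END) AS second_week_order,
    COUNT(DISTINCT CASE WHEN order_date BETWEEN date(register_date, '+14 days')
          AND date(register_date, '+20 days') THEN user_id END) AS third_week_order
FROM orders
GROUP BY 1
ORDER BY 1
"""

retention = conn.execute(QUERY).fetchall()
for row in retention:
    print(row)
```

Each output row is one registration week (identified by its Sunday) followed by the number of distinct users who ordered in week one, two and three after registering.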

CASE WHEN with time: analyzing the amount each user spends

SELECT
user_id,
SUM(CASE WHEN order_date BETWEEN register_date AND date_add(register_date, 6) THEN order_sales END) AS first_week_amount,
SUM(CASE WHEN order_date BETWEEN date_add(register_date, 7) AND date_add(register_date, 13) THEN order_sales END) AS second_week_amount
FROM `order`
GROUP BY 1;

By comparing registration and order dates and summing the order amounts, you get each user's spending in each period (the first week after registration, the second week, and so on), and can see whether users keep up their spending habits or spend less over time.

CASE WHEN with counts: counting orders above a given amount

SELECT
user_id,
COUNT(DISTINCT CASE WHEN order_sales >= 100 THEN order_id END) AS count_of_order_greater_than_100
FROM `order`
GROUP BY 1;

This works like COUNTIF: for each user, it counts the orders whose amount is at least a given value, which helps pick out high-value customers.

CASE WHEN with amounts, combined with time

SELECT
user_id,
MIN(CASE WHEN order_sales > 100 THEN order_date END) AS first_order_date_over100,
MAX(CASE WHEN order_sales > 100 THEN order_date END) AS recent_order_date_over100
FROM `order`
GROUP BY 1;

Combining CASE WHEN with MIN/MAX on the date gives the date of the user's first order above a given amount and the date of the most recent one.
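In the sample data each user has only one order, so the per-user aggregates are easier to see on a user with several orders. The sketch below uses SQLite through Python's sqlite3 module with made-up rows (users u1 and u2 and their orders are hypothetical) and a table named orders; it combines the COUNT and MIN/MAX patterns in one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (user_id TEXT, order_id TEXT, order_date TEXT, order_sales REAL)"
)
# Hypothetical rows, for illustration only
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("u1", "a1", "2020-06-01", 80),
        ("u1", "a2", "2020-06-05", 150),
        ("u1", "a3", "2020-06-20", 300),
        ("u2", "b1", "2020-06-03", 50),
    ],
)

per_user = conn.execute("""
SELECT
    user_id,
    -- COUNTIF-style: only orders of at least 100 are counted
    COUNT(DISTINCT CASE WHEN order_sales >= 100 THEN order_id END) AS orders_100_plus,
    -- dates are considered only for orders above 100; others yield NULL
    MIN(CASE WHEN order_sales > 100 THEN order_date END) AS first_order_date_over100,
    MAX(CASE WHEN order_sales > 100 THEN order_date END) AS recent_order_date_over100
FROM orders
GROUP BY user_id
ORDER BY user_id
""").fetchall()

for row in per_user:
    print(row)
```

A CASE WHEN without an ELSE branch returns NULL for non-matching rows, and aggregates skip NULLs, which is why a user with no qualifying order gets a count of 0 and empty dates.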

Fourth: other uses of Window Function

Window functions are used often at work and asked about often in interviews. A common scenario is taking the top N rows per group. The other use introduced here is analyzing user access sessions with window functions.

A session is a series of interactions that take place on a website within a given period of time. For example, one session can include multiple page views, events, social interactions and e-commerce transactions. A session acts as a container for the actions a user performs on the site.

A session has an expiration time, for example 30 minutes: after more than 30 minutes of inactivity, the session expires.

Suppose Zhang San visits the website. Timing starts the moment he arrives. If 30 minutes pass without any interaction from him, the session is considered over. However, each time Zhang San interacts with an element (an event, a social interaction, or a new page), the expiration time is reset to 30 minutes after that interaction.
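The expiration rule can be sketched as a small Python function. It is an illustration only: assign_sessions and SESSION_TIMEOUT are names chosen for this example, the action times are illustrative, and real web analytics tools may apply further rules beyond the 30-minute inactivity limit.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)


def assign_sessions(action_times):
    """Pair each action time with a session number for a single user.

    A new session starts at the first action and whenever the gap since
    the previous action exceeds the timeout.
    """
    sessions = []
    current = 0
    previous = None
    for t in sorted(action_times):
        if previous is None or t - previous > SESSION_TIMEOUT:
            current += 1
        sessions.append((t, current))
        previous = t  # every action resets the expiration clock
    return sessions


times = [
    datetime(2020, 8, 6, 13, 34),
    datetime(2020, 8, 6, 13, 35),
    datetime(2020, 8, 6, 13, 36),
    datetime(2020, 8, 6, 14, 30),  # 54 minutes after the previous action
    datetime(2020, 8, 6, 14, 31),
]
session_numbers = [number for _, number in assign_sessions(times)]
print(session_numbers)
```

The gap is always measured from the most recent action, not from the start of the session, which matches the reset behavior described above.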

Data preparation

Table name: user_visit_action
Columns: user_id, session_id, page_url, action_time

CREATE TABLE user_visit_action (
user_id string,
session_id string,
page_url string,
action_time string);

INSERT INTO TABLE user_visit_action VALUES ("001", "ss001", "http://a.com", "2020-08-06 13:34:11.478");
INSERT INTO TABLE user_visit_action VALUES ("001", "ss001", "http://b.com", "2020-08-06 13:35:11.478");
INSERT INTO TABLE user_visit_action VALUES ("001", "ss001", "http://c.com", "2020-08-06 13:36:11.478");
INSERT INTO TABLE user_visit_action VALUES ("001", "ss002", "http://a.com", "2020-08-06 14:30:11.478");
INSERT INTO TABLE user_visit_action VALUES ("001", "ss002", "http://b.com", "2020-08-06 14:31:11.478");
INSERT INTO TABLE user_visit_action VALUES ("001", "ss002", "http://e.com", "2020-08-06 14:33:11.478");
INSERT INTO TABLE user_visit_action VALUES ("001", "ss002", "http://f.com", "2020-08-06 14:35:11.478");
INSERT INTO TABLE user_visit_action VALUES ("002", "ss003", "http://u.com", "2020-08-06 18:34:11.478");
INSERT INTO TABLE user_visit_action VALUES ("002", "ss003", "http://k.com", "2020-08-06 18:38:11.478");

User access session analysis

The sample table above holds, for each user and each visit (session), the page URLs and the time of each action. The query below uses PARTITION BY to describe each user's browsing behavior within each visit.

SELECT
user_id,
session_id,
page_url,
DENSE_RANK() OVER (PARTITION BY user_id, session_id ORDER BY action_time ASC) AS page_order,
MIN(action_time) OVER (PARTITION BY user_id, session_id) AS session_start_time,
MAX(action_time) OVER (PARTITION BY user_id, session_id) AS session_finish_time
FROM user_visit_action;

For each user and each visit, the query above returns the order in which pages were viewed and the start and end time of the session. The result can be stored in a temporary table or a CTE for further analysis.
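The window query can also be run locally in SQLite through Python's sqlite3 module, assuming the bundled SQLite is version 3.25 or later (the first release with window functions). The rows are modeled on the sample data, with action times written as HH:MM:SS.mmm; the final ORDER BY is added here only to make the output order predictable.

```python
import sqlite3

# (user_id, session_id, page_url, action_time), modeled on the sample data
ROWS = [
    ("001", "ss001", "http://a.com", "2020-08-06 13:34:11.478"),
    ("001", "ss001", "http://b.com", "2020-08-06 13:35:11.478"),
    ("001", "ss001", "http://c.com", "2020-08-06 13:36:11.478"),
    ("001", "ss002", "http://a.com", "2020-08-06 14:30:11.478"),
    ("001", "ss002", "http://b.com", "2020-08-06 14:31:11.478"),
    ("001", "ss002", "http://e.com", "2020-08-06 14:33:11.478"),
    ("001", "ss002", "http://f.com", "2020-08-06 14:35:11.478"),
    ("002", "ss003", "http://u.com", "2020-08-06 18:34:11.478"),
    ("002", "ss003", "http://k.com", "2020-08-06 18:38:11.478"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_visit_action "
    "(user_id TEXT, session_id TEXT, page_url TEXT, action_time TEXT)"
)
conn.executemany("INSERT INTO user_visit_action VALUES (?, ?, ?, ?)", ROWS)

session_rows = conn.execute("""
SELECT
    user_id,
    session_id,
    page_url,
    DENSE_RANK() OVER (PARTITION BY user_id, session_id
                       ORDER BY action_time ASC) AS page_order,
    MIN(action_time) OVER (PARTITION BY user_id, session_id) AS session_start_time,
    MAX(action_time) OVER (PARTITION BY user_id, session_id) AS session_finish_time
FROM user_visit_action
ORDER BY user_id, session_id, action_time
""").fetchall()

for row in session_rows:
    print(row)
```

Every input row is kept in the output; the window functions only add the per-session rank, start time and end time alongside it, which is what distinguishes them from GROUP BY aggregation.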

That concludes this look at how to use dates and time periods in SQL. Hopefully it has answered your questions. Theory works best combined with practice, so go and try it out!
