Shulou (Shulou.com), 06/01 report. Updated 2025-01-28. From: SLTechnology News&Howtos
This article explains how to work with dates and time periods in SQL, a topic that raises questions for many people in day-to-day work. It collects simple, practical techniques for answering those questions.
First: the use of dates and periods
Filtering by date and time period comes up constantly at work, because weekly, monthly, quarterly and annual performance is usually the focus when pulling reports, building dashboards and running analyses.
Extracting time parts: EXTRACT syntax
-- field can be day, hour, minute, month, quarter, etc.
-- source can be of date or timestamp type
extract(field FROM source)
Usage
SELECT extract(year FROM '2020-08-05 09:30:08'); -- result is 2020
SELECT extract(quarter FROM '2020-08-05 09:30:08'); -- result is 3
SELECT extract(month FROM '2020-08-05 09:30:08'); -- result is 8
SELECT extract(week FROM '2020-08-05 09:30:08'); -- result is 32, the week number within the year (weeks start on Monday)
SELECT extract(day FROM '2020-08-05 09:30:08'); -- result is 5
SELECT extract(hour FROM '2020-08-05 09:30:08'); -- result is 9
SELECT extract(minute FROM '2020-08-05 09:30:08'); -- result is 30
SELECT extract(second FROM '2020-08-05 09:30:08'); -- result is 8
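The extracted parts can be cross-checked outside the database. The sketch below is not Hive code: it uses Python's standard library on the timestamp 2020-08-05 09:30:08, the name parts is made up for illustration, and the week value assumes ISO 8601 week numbering, which may not match every engine's week rule.

```python
from datetime import datetime

# Timestamp used for the cross-check (assumed: 2020-08-05 09:30:08).
ts = datetime(2020, 8, 5, 9, 30, 8)

# One entry per EXTRACT field; "parts" is a hypothetical name.
parts = {
    "year": ts.year,
    "quarter": (ts.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, and so on
    "month": ts.month,
    "week": ts.isocalendar()[1],         # ISO 8601 week number (assumption)
    "day": ts.day,
    "hour": ts.hour,
    "minute": ts.minute,
    "second": ts.second,
}

for field, value in parts.items():
    print(field, value)
```

A check like this is handy when a report's week or quarter boundaries look off and you want an independent reference value.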
The fields that can be extracted differ between databases. Hive, for example, supports day, dayofweek, hour, minute, month, quarter, second, week and year. Week, month and year are the most widely used, because the statistics back ends of both in-house and commercial products focus on weekly and monthly reporting periods (for example, the last 7 days or the last 30 days).
Note:
Impala supports: YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND, EPOCH
Hive supports: day, dayofweek, hour, minute, month, quarter, second, week and year
Hive has provided this function since version 2.2.0.
Week extraction syntax
When aggregating by week, you need to identify the dates of the Monday and the Sunday of that week. The following function is commonly used for this:
next_day(STRING start_date, STRING day_of_week)
-- returns the first date after start_date that falls on day_of_week
-- 2020-08-05 is a Wednesday
SELECT next_day('2020-08-05', 'MO'); -- next Monday: 2020-08-10
SELECT next_day('2020-08-05', 'TU'); -- next Tuesday: 2020-08-11
SELECT next_day('2020-08-05', 'WE'); -- next Wednesday: 2020-08-12
SELECT next_day('2020-08-05', 'TH'); -- next Thursday: 2020-08-06, that is, this Thursday
SELECT next_day('2020-08-05', 'FR'); -- next Friday: 2020-08-07, that is, this Friday
SELECT next_day('2020-08-05', 'SA'); -- next Saturday: 2020-08-08, that is, this Saturday
SELECT next_day('2020-08-05', 'SU'); -- next Sunday: 2020-08-09, that is, this Sunday
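day_of_week is the English name of the day, from Monday to Sunday (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday); two-letter abbreviations such as MO are also used in this article.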
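The behavior described here, returning the first matching weekday strictly after the start date, can be sketched in a few lines. This is an illustrative Python re-implementation and not Hive's own code; the helper name next_day and the DAY_CODES list are assumptions made for the example.

```python
from datetime import date, timedelta

# Two-letter day codes; the list index matches date.weekday() (Monday == 0).
DAY_CODES = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]

def next_day(start, day_of_week):
    """Return the first date strictly after `start` that falls on `day_of_week`."""
    # Only the first two letters matter, so "MO", "Mon" and "Monday" all work.
    target = DAY_CODES.index(day_of_week.upper()[:2])
    delta = (target - start.weekday() - 1) % 7 + 1  # always between 1 and 7
    return start + timedelta(days=delta)

wednesday = date(2020, 8, 5)
for code in DAY_CODES:
    print(code, next_day(wednesday, code))
```

Because the offset is always at least one day, asking for Wednesday on a Wednesday yields the Wednesday of the following week.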
Usage
To get the Monday of the week that contains a given date, take the next Monday after that date and subtract 7 days:
SELECT date_add(next_day('2020-08-05', 'MO'), -7);
-- 2020-08-03
Similarly, to get the Sunday of the week that contains a given date, take the next Monday after that date and subtract 1 day:
SELECT date_add(next_day('2020-08-05', 'MO'), -1);
-- 2020-08-09
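As a cross-check on the two expressions above, the same week boundaries can be computed directly from the weekday number. This is a minimal Python sketch, assuming Monday-to-Sunday weeks; week_bounds is a hypothetical helper name.

```python
from datetime import date, timedelta

def week_bounds(d):
    """Return (Monday, Sunday) of the Monday-to-Sunday week that contains d."""
    monday = d - timedelta(days=d.weekday())  # weekday(): Monday == 0
    return monday, monday + timedelta(days=6)

monday, sunday = week_bounds(date(2020, 8, 5))
print(monday, sunday)
```

The boundary cases are worth checking when porting this logic: a Monday maps to itself as the week start, and a Sunday maps to itself as the week end.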
Month extraction syntax
To group dates by month, the last_day function maps every date to the last day of its month (the 28th, 29th, 30th or 31st), as follows:
last_day(STRING date)
Usage:
SELECT last_day('2020-08-05'); -- 2020-08-31
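To see why mapping each date to its month end works as a grouping key, here is a small Python sketch. It is an illustration only, not Hive code; last_day mirrors the function described above and the sample dates are made up.

```python
import calendar
from datetime import date

def last_day(d):
    """Return the last day of the month that contains d."""
    _, days_in_month = calendar.monthrange(d.year, d.month)
    return d.replace(day=days_in_month)

# Hypothetical sample dates: two in August 2020 and one in September 2020.
dates = [date(2020, 8, 5), date(2020, 8, 20), date(2020, 9, 1)]

# Dates in the same month collapse to the same key.
month_ends = sorted({last_day(d) for d in dates})
print(month_ends)
```

Every date in a month shares one month-end value, so grouping by that value is equivalent to grouping by month, and leap years are handled by the calendar lookup.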
Alternatively, you can use the date_format function:
SELECT date_format('2020-08-05', 'yyyy-MM');
-- 2020-08
Date ranges
Monthly windows: using add_months together with trunc()
-- returns the date after adding or subtracting months
-- 2020-07-05
SELECT add_months('2020-08-05', -1);
-- returns the first day of the month of the given date
-- 2020-08-01
SELECT trunc('2020-08-05', 'MM');
As the examples show, add_months alone, minus N months, gives a window of exactly N months ending on the given date. Combined with trunc(), the window instead starts on the first day of the month N months earlier.
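The difference between the two window starts can be checked with a short Python sketch. The helpers below are simplified stand-ins, not Hive's implementation: add_months only clamps the day to the length of the target month, and any special month-end handling an engine may apply is not modeled.

```python
import calendar
from datetime import date

def add_months(d, n):
    """Shift d by n months, clamping the day to the length of the target month."""
    month_index = d.year * 12 + (d.month - 1) + n
    year, month0 = divmod(month_index, 12)
    days_in_month = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, days_in_month))

def trunc_month(d):
    """Return the first day of the month that contains d."""
    return d.replace(day=1)

end = date(2020, 8, 5)
window_rolling = (add_months(end, -1), end)                # exactly one month back
window_calendar = (add_months(trunc_month(end), -1), end)  # from the 1st of last month
print(window_rolling, window_calendar)
```

The rolling window suits "last N months" metrics, while the calendar-aligned window suits reports that must start on a month boundary.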
-- select all data from 2020-07-05 to 2020-08-05
BETWEEN add_months('2020-08-05', -1) AND '2020-08-05'
-- select all data from 2020-07-01 to 2020-08-05
BETWEEN add_months(trunc('2020-08-05', 'MM'), -1) AND '2020-08-05'
Second: temporary tables and Common Table Expressions (WITH)
Both techniques are used often in daily work. For complex computations, to avoid too many JOINs, the data that needs to be pre-extracted is usually pulled into a temporary table or a CTE ahead of the main query block.
Using temporary tables:
CREATE TEMPORARY TABLE table_1 AS
SELECT
    columns
FROM table_a;
CREATE TEMPORARY TABLE table_2 AS
SELECT
    columns
FROM table_b;
SELECT
    table_1.columns,
    table_2.columns,
    c.columns
FROM table_c c
JOIN table_1  -- ON <join condition>
JOIN table_2; -- ON <join condition>
Using a CTE:
Note: Hive and Impala support this syntax; older MySQL versions do not (MySQL 8.0 and later do).
WITH employee_by_title_count AS (
    SELECT
        t.name AS job_title
        , COUNT(e.id) AS amount_of_employees
    FROM employees e
    JOIN job_titles t ON e.job_title_id = t.id
    GROUP BY 1
),
salaries_by_title AS (
    SELECT
        name AS job_title
        , salary
    FROM job_titles
)
SELECT *
FROM employee_by_title_count e
JOIN salaries_by_title s ON s.job_title = e.job_title;
As you can see, temporary tables and CTEs (WITH) are used in very similar ways, and both aim to make a query clearer and more elegant. Many people habitually write an entire query as a single block with too many JOINs or subqueries, until the logic gets lost and it is no longer clear what belongs where. Using temporary tables and CTEs at the right moments is a definite plus.
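To make the similarity concrete, the sketch below runs the same intermediate result once as a CTE and once as a temporary table. It uses Python's built-in sqlite3 module rather than Hive, so the dialect is SQLite's, and the rows in employees and job_titles are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE job_titles (id INTEGER, name TEXT, salary INTEGER);
    CREATE TABLE employees (id INTEGER, job_title_id INTEGER);
    -- Hypothetical sample rows, for illustration only.
    INSERT INTO job_titles VALUES (1, 'analyst', 100), (2, 'engineer', 120);
    INSERT INTO employees VALUES (10, 1), (11, 1), (12, 2);
""")

# Shared body of the intermediate result: employees per job title.
count_by_title = """
    SELECT t.name AS job_title, COUNT(e.id) AS amount_of_employees
    FROM employees e
    JOIN job_titles t ON e.job_title_id = t.id
    GROUP BY t.name
"""

# Variant 1: the intermediate result lives only inside this statement.
cte_rows = conn.execute(
    "WITH employee_by_title_count AS (" + count_by_title + ") "
    "SELECT * FROM employee_by_title_count ORDER BY job_title"
).fetchall()

# Variant 2: the intermediate result is materialized for the whole session.
conn.execute("CREATE TEMPORARY TABLE employee_by_title_count AS " + count_by_title)
temp_rows = conn.execute(
    "SELECT * FROM employee_by_title_count ORDER BY job_title"
).fetchall()

print(cte_rows)
print(temp_rows)
```

Both variants return the same rows. The practical difference is scope: a CTE exists for one statement, while a temporary table can be reused by later statements in the same session.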
Third: the combined use of Aggregation and CASE WHEN
Combining aggregate functions (SUM/COUNT/COUNT DISTINCT/MIN/MAX) with CASE WHEN is one of the most powerful and interesting techniques. It produces an effect similar to SUMIF/COUNTIF in Excel, and supports a lot of efficient analysis.
Table name: order
Columns: register_date, order_date, user_id, country, order_sales, order_id
Data preparation
-- order is a reserved word in Hive, so the table name is quoted with backticks
CREATE TABLE `order` (
    register_date string,
    order_date string,
    user_id string,
    country string,
    order_sales decimal(10,2),
    order_id string);
INSERT INTO TABLE `order` VALUES ('2020-06-07', '2020-06-09', '001', 'c0', 210, 'o1');
INSERT INTO TABLE `order` VALUES ('2020-06-08', '2020-06-09', '002', 'c1', 220, 'o2');
INSERT INTO TABLE `order` VALUES ('2020-06-07', '2020-06-10', '003', 'c2', 230, 'o3');
INSERT INTO TABLE `order` VALUES ('2020-06-09', '2020-06-10', '004', 'c3', NULL, 'o4'); -- order_sales is missing in the source
INSERT INTO TABLE `order` VALUES ('2020-06-07', '2020-06-20', '005', 'c4', 300, 'o5');
INSERT INTO TABLE `order` VALUES ('2020-06-10', '2020-06-23', '006', 'c5', 400, 'o6');
INSERT INTO TABLE `order` VALUES ('2020-06-07', '2020-06-19', '007', 'c6', 600, 'o7');
INSERT INTO TABLE `order` VALUES ('2020-06-12', '2020-06-18', '008', 'c7', 700, 'o8');
INSERT INTO TABLE `order` VALUES ('2020-06-07', '2020-06-09', '009', 'c8', 100, 'o9');
INSERT INTO TABLE `order` VALUES ('2020-06-15', '2020-06-18', '0010', 'c9', 200, 'o10');
INSERT INTO TABLE `order` VALUES ('2020-06-15', '2020-06-19', '0011', 'c10', 250, 'o11');
INSERT INTO TABLE `order` VALUES ('2020-06-12', '2020-06-29', '0012', 'c11', NULL, 'o12'); -- order_sales is missing in the source
INSERT INTO TABLE `order` VALUES ('2020-06-16', '2020-06-19', '0013', 'c12', 230, 'o13');
INSERT INTO TABLE `order` VALUES ('2020-06-17', '2020-06-20', '0014', 'c13', 290, 'o14');
INSERT INTO TABLE `order` VALUES ('2020-06-20', '2020-06-29', '0015', 'c14', 203, 'o15');
CASE WHEN with time, for retention / usage analysis
-- allows DISTINCT on multiple columns
SET hive.groupby.skewindata = false;
-- allows grouping or sorting by position number
SET hive.groupby.orderby.position.alias = true;
SELECT
    date_add(next_day(register_date, 'MO'), -1) AS week_end,
    COUNT(DISTINCT CASE WHEN order_date BETWEEN register_date AND date_add(register_date, 6) THEN user_id END) AS first_week_order,
    COUNT(DISTINCT CASE WHEN order_date BETWEEN date_add(register_date, 7) AND date_add(register_date, 13) THEN user_id END) AS second_week_order,
    COUNT(DISTINCT CASE WHEN order_date BETWEEN date_add(register_date, 14) AND date_add(register_date, 20) THEN user_id END) AS third_week_order
FROM `order`
GROUP BY 1;
The example above shows whether users place orders after registering: how many users ordered in the first, second and third week after registration. This makes it possible to analyze user engagement and retention.
Note: the query above requires two parameters to be configured:
hive.groupby.skewindata = false: allows DISTINCT on multiple columns; otherwise an error is reported:
SemanticException [Error 10022]: DISTINCT on different columns not supported with skew in data
hive.groupby.orderby.position.alias = true: allows grouping or sorting by position number; otherwise an error is reported:
SemanticException [Error 10025]: line 79:13 Expression not in GROUP BY key''MO''
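The retention query can be tried without a Hive cluster. The sketch below ports it to SQLite through Python's sqlite3 module, so some parts are assumptions rather than Hive syntax: date(x, '+6 days') stands in for date_add, date(x, 'weekday 0') stands in for the week-end expression, and the table is named orders to avoid the reserved word. The five rows reuse the register_date, order_date and user_id values from the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (register_date TEXT, order_date TEXT, user_id TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("2020-06-07", "2020-06-09", "001"),
        ("2020-06-07", "2020-06-10", "003"),
        ("2020-06-07", "2020-06-20", "005"),
        ("2020-06-10", "2020-06-23", "006"),
        ("2020-06-12", "2020-06-29", "0012"),
    ],
)

retention_sql = """
SELECT
    date(register_date, 'weekday 0') AS week_end,  -- Sunday on or after register_date
    COUNT(DISTINCT CASE WHEN order_date BETWEEN register_date
                             AND date(register_date, '+6 days')
                        THEN user_id END) AS first_week_order,
    COUNT(DISTINCT CASE WHEN order_date BETWEEN date(register_date, '+7 days')
                             AND date(register_date, '+13 days')
                        THEN user_id END) AS second_week_order,
    COUNT(DISTINCT CASE WHEN order_date BETWEEN date(register_date, '+14 days')
                             AND date(register_date, '+20 days')
                        THEN user_id END) AS third_week_order
FROM orders
GROUP BY week_end
ORDER BY week_end
"""
retention = conn.execute(retention_sql).fetchall()
for row in retention:
    print(row)
```

A CASE WHEN without an ELSE branch yields NULL for rows outside the window, and COUNT(DISTINCT ...) skips NULLs, which is what turns each column into a conditional count.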
CASE WHEN with time, to analyze how much each user spends
SELECT
    user_id,
    SUM(CASE WHEN order_date BETWEEN register_date AND date_add(register_date, 6) THEN order_sales END) AS first_week_amount,
    SUM(CASE WHEN order_date BETWEEN date_add(register_date, 7) AND date_add(register_date, 13) THEN order_sales END) AS second_week_amount
FROM `order`
GROUP BY 1;
By comparing registration and order dates and summing the order amounts, you get each user's spending in each period (the first week after registration, the second week, and so on). This shows whether users keep up their spending habits or spend less over time.
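Here is a runnable sketch of the per-period spending pattern, again using SQLite through Python's sqlite3 module instead of Hive. The table name orders and its four rows are hypothetical, chosen so that one user has orders in both weeks and the other only in the second week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (user_id TEXT, register_date TEXT, order_date TEXT, order_sales INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        # Hypothetical rows, for illustration only.
        ("u1", "2020-06-01", "2020-06-02", 100),
        ("u1", "2020-06-01", "2020-06-05", 50),
        ("u1", "2020-06-01", "2020-06-09", 30),
        ("u2", "2020-06-01", "2020-06-10", 70),
    ],
)

weekly_spend = conn.execute("""
    SELECT
        user_id,
        SUM(CASE WHEN order_date BETWEEN register_date AND date(register_date, '+6 days')
                 THEN order_sales END) AS first_week_amount,
        SUM(CASE WHEN order_date BETWEEN date(register_date, '+7 days') AND date(register_date, '+13 days')
                 THEN order_sales END) AS second_week_amount
    FROM orders
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
print(weekly_spend)
```

Note that SUM returns NULL (None in Python) when no order falls in a period. If a zero is preferred in the report, add ELSE 0 to the CASE expression or wrap the SUM in COALESCE.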
CASE WHEN with counts, to analyze orders above a certain amount
SELECT
    user_id,
    COUNT(DISTINCT CASE WHEN order_sales >= 100 THEN order_id END) AS count_of_order_greater_than_100
FROM `order`
GROUP BY 1;
The example above works like COUNTIF: for each user, it counts the orders whose amount reaches a given threshold, which helps identify high-value customers.
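The COUNTIF-style count can be verified on a tiny data set. This sketch uses SQLite through Python's sqlite3 module; the table name orders, the rows, and the alias big_orders are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, order_id TEXT, order_sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        # Hypothetical rows, for illustration only.
        ("u1", "o1", 120),
        ("u1", "o2", 80),
        ("u1", "o3", 100),
        ("u2", "o4", 99),
    ],
)

big_order_counts = conn.execute("""
    SELECT
        user_id,
        COUNT(DISTINCT CASE WHEN order_sales >= 100 THEN order_id END) AS big_orders
    FROM orders
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
print(big_order_counts)
```

Unlike SUM, COUNT returns 0 rather than NULL when nothing matches, so users without qualifying orders still appear with a count of zero.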
CASE WHEN with amounts, combined with time
SELECT
    user_id,
    MIN(CASE WHEN order_sales > 100 THEN order_date END) AS first_order_date_over100,
    MAX(CASE WHEN order_sales > 100 THEN order_date END) AS recent_order_date_over100
FROM `order`
GROUP BY 1;
Combining CASE WHEN with MIN/MAX on a date gives the date of a user's first order above a certain amount and the date of the most recent one.
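The first and most recent qualifying dates can be checked the same way. The sketch below runs in SQLite through Python's sqlite3 module; the orders table and its rows are hypothetical, and dates are stored as ISO strings so that MIN and MAX compare them chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (user_id TEXT, order_date TEXT, order_sales INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        # Hypothetical rows, for illustration only.
        ("u1", "2020-06-01", 150),
        ("u1", "2020-06-10", 90),
        ("u1", "2020-06-20", 300),
        ("u2", "2020-06-05", 50),
    ],
)

order_dates_over_100 = conn.execute("""
    SELECT
        user_id,
        MIN(CASE WHEN order_sales > 100 THEN order_date END) AS first_order_date_over100,
        MAX(CASE WHEN order_sales > 100 THEN order_date END) AS recent_order_date_over100
    FROM orders
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
print(order_dates_over_100)
```

A user with no order above the threshold gets NULL in both columns, which makes such users easy to filter out afterwards.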
Fourth: other uses of Window Function
Window functions are used frequently at work and also come up often in interviews. A common scenario is taking the top N rows per group. This article introduces another use: analyzing user access sessions with window functions.
A session is a series of interactions that take place on a website within a given period. For example, one session can include multiple page views, events, social interactions and e-commerce transactions. A session acts as a container for the actions a user performs on the site.
A session has an expiration time, for example 30 minutes: after more than 30 minutes of inactivity, the session expires.
Suppose Zhang San visits the website, and timing starts the moment he arrives. If 30 minutes pass without any interaction from him, the session is considered over. However, each time Zhang San interacts with an element (an event, a social interaction, or a new page), the expiration time is reset to 30 minutes after that interaction.
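The expiration rule described above can be sketched as a small function that assigns a session index to each action of one user. This is an illustrative Python sketch, not part of the SQL that follows; assign_sessions and SESSION_TIMEOUT are hypothetical names, and the timestamps are sample values.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def assign_sessions(timestamps):
    """Return a session index per timestamp; input must be sorted ascending."""
    sessions = []
    current = 0
    previous = None
    for ts in timestamps:
        # More than 30 minutes since the last action: the old session expired.
        if previous is not None and ts - previous > SESSION_TIMEOUT:
            current += 1
        sessions.append(current)
        previous = ts  # every action resets the expiration clock
    return sessions

actions = [
    datetime(2020, 8, 6, 13, 34, 11),
    datetime(2020, 8, 6, 13, 35, 11),
    datetime(2020, 8, 6, 13, 36, 11),
    datetime(2020, 8, 6, 14, 30, 11),
    datetime(2020, 8, 6, 14, 31, 11),
]
session_ids = assign_sessions(actions)
print(session_ids)
```

The gap is always measured from the previous action rather than from the session start, which is what lets a long but continuously active visit stay in one session.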
Data preparation
Table name: user_visit_action
Columns: user_id, session_id, page_url, action_time
CREATE TABLE user_visit_action (
    user_id string,
    session_id string,
    page_url string,
    action_time string);
INSERT INTO TABLE user_visit_action VALUES ('001', 'ss001', 'http://a.com', '2020-08-06 13:34:11.478');
INSERT INTO TABLE user_visit_action VALUES ('001', 'ss001', 'http://b.com', '2020-08-06 13:35:11.478');
INSERT INTO TABLE user_visit_action VALUES ('001', 'ss001', 'http://c.com', '2020-08-06 13:36:11.478');
INSERT INTO TABLE user_visit_action VALUES ('001', 'ss002', 'http://a.com', '2020-08-06 14:30:11.478');
INSERT INTO TABLE user_visit_action VALUES ('001', 'ss002', 'http://b.com', '2020-08-06 14:31:11.478');
INSERT INTO TABLE user_visit_action VALUES ('001', 'ss002', 'http://e.com', '2020-08-06 14:33:11.478');
INSERT INTO TABLE user_visit_action VALUES ('001', 'ss002', 'http://f.com', '2020-08-06 14:35:11.478');
INSERT INTO TABLE user_visit_action VALUES ('002', 'ss003', 'http://u.com', '2020-08-06 18:34:11.478');
INSERT INTO TABLE user_visit_action VALUES ('002', 'ss003', 'http://k.com', '2020-08-06 18:38:11.478');
User access session analysis
The sample table above holds each user's ID, session, page URL and access time. The query below uses PARTITION BY to describe each user's browsing behavior within each session.
SELECT
    user_id,
    session_id,
    page_url,
    DENSE_RANK() OVER (PARTITION BY user_id, session_id ORDER BY action_time ASC) AS page_order,
    MIN(action_time) OVER (PARTITION BY user_id, session_id) AS session_start_time,
    MAX(action_time) OVER (PARTITION BY user_id, session_id) AS session_finish_time
FROM user_visit_action;
For each user and each session, the query above returns the order in which pages were viewed along with the session's start and end times. The result can then be stored in a temporary table or CTE for further analysis.
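As one possible form of that further analysis, the sketch below wraps the window query in a CTE and rolls it up to one row per session. It runs in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), the CTE name session_pages and the column pages_viewed are made up, and the five rows are sample values for a single user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_visit_action (user_id TEXT, session_id TEXT, page_url TEXT, action_time TEXT)"
)
conn.executemany(
    "INSERT INTO user_visit_action VALUES (?, ?, ?, ?)",
    [
        # Sample rows, for illustration only.
        ("001", "ss001", "http://a.com", "2020-08-06 13:34:11.478"),
        ("001", "ss001", "http://b.com", "2020-08-06 13:35:11.478"),
        ("001", "ss001", "http://c.com", "2020-08-06 13:36:11.478"),
        ("001", "ss002", "http://a.com", "2020-08-06 14:30:11.478"),
        ("001", "ss002", "http://b.com", "2020-08-06 14:31:11.478"),
    ],
)

session_summary = conn.execute("""
    WITH session_pages AS (
        SELECT
            user_id,
            session_id,
            page_url,
            DENSE_RANK() OVER (PARTITION BY user_id, session_id ORDER BY action_time) AS page_order,
            MIN(action_time) OVER (PARTITION BY user_id, session_id) AS session_start_time,
            MAX(action_time) OVER (PARTITION BY user_id, session_id) AS session_finish_time
        FROM user_visit_action
    )
    SELECT
        user_id,
        session_id,
        MAX(page_order) AS pages_viewed,
        MIN(session_start_time) AS session_start_time,
        MAX(session_finish_time) AS session_finish_time
    FROM session_pages
    GROUP BY user_id, session_id
    ORDER BY user_id, session_id
""").fetchall()
for row in session_summary:
    print(row)
```

Keeping the window query inside a CTE leaves the per-page detail available for other roll-ups, such as the landing page (page_order = 1) of each session.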
This concludes the discussion of how to use dates and time periods in SQL. Combining theory with practice is the best way to learn, so try these queries yourself.