2025-01-30 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains the common functions in Hive. The methods introduced here are simple, fast, and practical, so interested readers may find them worth a look.
1. HQL null-checking functions
https://www.w3school.com.cn/sql/func_date_format.asp
2. Time conversion functions
year(string date): returns the year part of a date string or timestamp string: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970
https://blog.csdn.net/wzzfeitian/article/details/55097563
https://www.yiibai.com/hive/hive_built_in_functions.html
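As a rough illustration of what Hive's year() does with the two input shapes above, here is a minimal Python sketch (the function name hive_year and the fallback-parsing approach are my own, not part of Hive):

```python
from datetime import datetime

def hive_year(s: str) -> int:
    """Rough Python analogue of Hive's year(): accepts either a
    'yyyy-MM-dd HH:mm:ss' timestamp string or a plain 'yyyy-MM-dd' date."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(s, fmt).year
        except ValueError:
            continue  # try the next accepted format
    raise ValueError(f"unrecognized date string: {s}")

print(hive_year("1970-01-01 00:00:00"))  # 1970
print(hive_year("1970-01-01"))           # 1970
```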
3. group by with multiple grouping columns
Data type conversion: blog.csdn.net/qq_31573519/article/details/100139218
4. Hive: get the first and last day of the previous month

For reference, the PHP equivalents:

// first day of the previous month
date('Y-m-01', strtotime('-1 month'));
// last day of the previous month
date('Y-m-t', strtotime('-1 month'));

In Hive:

SELECT
    -- first day of last month
    concat(substr(date_sub(from_unixtime(unix_timestamp()), day(from_unixtime(unix_timestamp()))), 1, 7), '-01'),
    -- last day of last month
    date_sub(from_unixtime(unix_timestamp()), day(from_unixtime(unix_timestamp())));

select
    -- first day of last month
    trunc(add_months(CURRENT_TIMESTAMP, -1), 'MM'),
    -- first day of last month (string form)
    concat(substr(add_months(from_unixtime(unix_timestamp(), 'yyyy-MM-dd'), -1), 1, 7), '-01'),
    -- last day of last month
    date_sub(trunc(CURRENT_TIMESTAMP, 'MM'), 1);

5. First day of the current month:

SELECT TRUNC(sysdate(0), 'MM');

6. First day of the month two months ago:

SELECT TRUNC(add_months(CURRENT_TIMESTAMP, -2), 'MM');

7. Rounding with cast to decimal (recommended):

select cast(68.6666666668 as decimal(10,2));  -- _c0: 68.67
select cast(68.6666666668 as decimal(10,3));  -- _c0: 68.667

8. Rounding a computed percentage; either form works, cast is recommended:

ROUND((SUM(delivered_num) / SUM(plan_num) * 100), 2)
CAST((SUM(delivered_num) / SUM(plan_num) * 100) AS DECIMAL(10,2))

9. Convert strings to double before multiplying and dividing, then cast the result to decimal:

SELECT CAST(CAST('913' AS DOUBLE) / CAST('1000' AS DOUBLE) * CAST('100' AS DOUBLE) AS DECIMAL(10,2));  -- 91.3

Note: column aliases must use "double quotes".

10. if statements
Handling null values with if
Functions:

1. coalesce(value1, value2, ...) -- returns the first non-null argument
2. if(value1 is null, value2, value1), i.e. IF(test_condition, true_value, false_value)

Example:

hive> select coalesce(col1, col2, col3) as res1,
    >        if(col1 is null, if(col2 is null, col3, col2), col1) as res2
    > from (
    >     select 1 as col1, 2 as col2, 3 as col3
    >     union all
    >     select null as col1, 2 as col2, 3 as col3
    >     union all
    >     select null as col1, null as col2, 3 as col3
    >     union all
    >     select null as col1, null as col2, null as col3
    > ) as test;

Results:

res1  res2
1     1
2     2
3     3
NULL  NULL

11. One-to-many table association: de-duplicated statistics
Background: suppose we run an e-commerce website. Users place orders, each order records its source, and each order has multiple order details. Each detail records commodity information, and each commodity's issue, shipment, and receipt times differ.
Requirement: for order-level reporting, count the number of issued, shipped, and received orders, broken down by order source.
Analysis: order and order detail are one-to-many: one order has multiple details, and the goods in each detail have their own issue, delivery, and receipt times. Goods may not yet be issued, delivered, or received, leaving those times null. To count at the order level, it is enough to determine, across the multiple details of each order, whether any issue, delivery, or receipt time exists, and then aggregate.
Implementation:
First, build the base data: join orders with order details, select the order ID and order source, and derive 0/1 flags for issued, shipped, and received based on whether the corresponding time exists.
Then group by order source and order ID, and use the aggregate function max to collapse the issued/shipped/received flags within each order.
Finally, group the previous step's result by order source and use the aggregate function sum to compute the total order count and the issued, shipped, and received counts.
Note that group by is required here, and after grouping, every selected column that is not a grouping column must be wrapped in an aggregate function such as max, min, or sum.
with base_data as (
    select
        source,                         -- order source
        max(if_out) as if_out,
        max(if_send) as if_send,
        max(if_received) as if_received
    from (
        select
            o.id,                       -- order id
            o.source,                   -- order source
            (case when od.out_time is not null then 1 else 0 end) as if_out,           -- issued?
            (case when od.send_time is not null then 1 else 0 end) as if_send,         -- shipped?
            (case when od.received_time is not null then 1 else 0 end) as if_received  -- received?
        from order o
        left join order_detail od on o.id = od.order_id
        where o.create_time > '2021-04-01'
    ) a
    group by source, id
)
select
    source,              -- order source
    sum(1),              -- order count
    sum(if_out),         -- issued count
    sum(if_send),        -- shipped count
    sum(if_received)     -- received count
from base_data
group by source;
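The two-stage aggregation above (max per order, then sum per source) can be checked with a minimal Python sketch; the sample rows below are made up for illustration and stand in for the joined order/order_detail result:

```python
from collections import defaultdict

# Hypothetical rows after the order/order_detail join:
# (order_id, source, out_time, send_time, received_time); None = time missing.
joined = [
    (1, "app", "t1", "t2", None),
    (1, "app", None, None, None),
    (2, "app", "t3", None, None),
    (3, "web", "t4", "t5", "t6"),
]

# Stage 1: per (source, order_id), max of the 0/1 flags,
# like "group by source, id" with max(if_out), max(if_send), max(if_received).
per_order = defaultdict(lambda: [0, 0, 0])
for oid, src, out_t, send_t, recv_t in joined:
    flags = per_order[(src, oid)]
    flags[0] = max(flags[0], int(out_t is not None))
    flags[1] = max(flags[1], int(send_t is not None))
    flags[2] = max(flags[2], int(recv_t is not None))

# Stage 2: per source, count orders (sum(1)) and sum the flags.
totals = defaultdict(lambda: [0, 0, 0, 0])  # [orders, issued, shipped, received]
for (src, _oid), (f_out, f_send, f_recv) in per_order.items():
    t = totals[src]
    t[0] += 1
    t[1] += f_out
    t[2] += f_send
    t[3] += f_recv

print(dict(totals))  # {'app': [2, 2, 1, 0], 'web': [1, 1, 1, 1]}
```

Order 1 has one issued/shipped detail and one untouched detail, but the max in stage 1 still counts the whole order as issued and shipped once.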
In fact, an order contains multiple commodity rows, and the three times differ for each commodity. To compute the three times at the whole-order level, just take the minimum or maximum time across the commodity rows of each order when grouping.
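Taking the earliest and latest issue time per order, as described above, can be sketched in Python as follows (the sample detail rows are made up for illustration):

```python
# Hypothetical order-detail rows: (order_id, out_time); None = not yet issued.
details = [
    (1, "2021-04-02"), (1, "2021-04-05"),
    (2, None), (2, "2021-04-03"),
]

# Per order, the min and max issue time across its details,
# like min(out_time)/max(out_time) with group by order_id in Hive.
earliest, latest = {}, {}
for order_id, out_time in details:
    if out_time is None:
        continue  # details with no issue time yet are skipped, like SQL nulls
    if order_id not in earliest or out_time < earliest[order_id]:
        earliest[order_id] = out_time
    if order_id not in latest or out_time > latest[order_id]:
        latest[order_id] = out_time

print(earliest)  # {1: '2021-04-02', 2: '2021-04-03'}
print(latest)    # {1: '2021-04-05', 2: '2021-04-03'}
```

String comparison works here because 'yyyy-MM-dd' dates sort lexicographically in date order.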
Note that distinct de-duplicates whole rows only. If the query result has multiple columns, such as user_id, name, age, and order_id, and the user_id, name, and age values are all the same while the order_ids differ, distinct will only remove rows where all four fields match.
At this point, you should have a deeper understanding of the common functions in Hive, so try them out in practice. Pay attention to us and keep learning for more related content.
© 2024 shulou.com SLNews company. All rights reserved.