Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What are the common functions of Hive

2025-01-30 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/02 Report--

This article mainly explains "what common functions Hive has". Interested friends may wish to take a look. The method introduced in this paper is simple, fast and practical. Let's let Xiaobian take you to learn "what common functions Hive has"!

1. HQL judges empty functions

https://www.w3school.com.cn/sql/func_date_format.asp

2. time transfer function

year(string date): Returns the date or timestamp of the year part String: year("1970-01-01 00:00:00") = 1970, year("1970-01-01") = 1970

https://blog.csdn.net/wzzfeitian/article/details/55097563

https://www.yiibai.com/hive/hive_built_in_functions.html

3. The group by function requires multiple grouping.

Data type conversion: blog.csdn.net/qq_31573519/article/details/100139218

4. hive Get the first and last day of the previous month//Get the first day of the previous month date ('Y-m-01', strtotime ('-1 month')); //Get date of last month ('Y-m-t',strtotime('-1 month')); SELECT--concat on the first day of last month (SUBSTR(DATE_SUB(FROM_UNIXTIME(UNIX_TIMESTAMP()),DAY (FROM_UNIXTIME(UNIX_TIMESTAMP())),1,7),'-01'),--Last day of last month DATE_SUB (FROM_UNIXTIME(UNIX_TIMESTAMP()),DAY (FROM_UNIXTIME(UNIX_TIMESTAMP()))select--first day of last month trunc (add_months(CURRENT_TIMESTAMP,-1),'MM '),--first day of last month concat (substr(add_months(from_unixtime(unix_timestamp(),'y-MM-dd'),-1),1,7),'-01'),--last day of last month date_sub(trunc(CURRENT_TIMESTAMP,'MM'),1);5. SELECT TRUNC(sysdate(0),'MM ') ;6. Select TRUNC(add_months(CURRENT_TIMESTAMP, -2),'MM ');7. cast(column_name as decimal(10,2)) cast function rounding (recommended) select cast(68.6666666668 as decimal(10,2));_c068.67select cast(68.6666666668 as decimal(10,3));_c068.6678. ROUND((SUM(delivered_num) / SUM(plan_num) * 100),2)cast(column_name as decimal(10,2)) cast function rounded (recommended) CAST((SUM(delivered_num) / SUM(plan_num) * 100) AS DECIMAL(10,2))9. Select CAST( CAST ('913 ' AS DOUBLE) / CAST ('1000' AS DOUBLE) * cast ('100 ' as DOUBLE) as DECIMAL(10,2)); convert to double, multiply and divide, and then convert to decimal==> 91.3 Column aliases must use "double quotes"10. if statement

if null statement

Function: 1. coalesce( value1,value2,… )2. if( value1 is null, value2, value1)IF( Test Condition, True Value, False Value ) Examples: hive> select coalesce(col1, col2, cols) as res1, if(col1 is null, if(col2 is null, col3, col2), col1) as res2 > from( > select 1 as col1, 2 as col2, 3 as col3 > union all > select null as col1, 2 as col2, 3 as col3 > union all > select null as col1, null as col2, 3 as col3 > union all > select null as col1, null as col2, null as col3 > ) as test results: res1 res2 1 1 2 2 3 3 null null11. One-to-many table association de-duplication statistics

Background: Suppose we are an e-commerce website, using users to place orders, orders have source data, each order will have multiple order details, order details record commodity related information, and commodity delivery, delivery, receipt time are different.

Demand: To calculate data related to order dimensions, you need to calculate the quantity of issue documents, shipment documents and receipt documents under different order sources.

Analysis: Order and order details are one-to-many. An order has multiple details. Goods in each detail have their own issue, delivery, and receipt times. Goods may not be shipped, delivered, or received in time, resulting in blanks. To count these quantities according to order dimensions, you only need to determine that among the multiple details corresponding to the order, only issue and delivery are available. The time of receipt can be collected, and then summarized.

Achieve:

First, count the basic data, associate the order with the order details, query the order ID, order source, and judge whether to issue, ship, or receive according to whether the issue time, delivery time, and receipt time exist.

Grouping according to the order source and order id, and then using the single-line function max to extract whether to issue, ship, and receive values under the specific id grouping;

Finally, the data checked in the previous step is counted, and then grouped according to the source of the order. The single line function sum is used to calculate the total order quantity, issue quantity, delivery quantity and receipt quantity.

At this time, you need to use group by to group. At the same time, after grouping, you need to perform a single-line function on the select column for the ungrouped fields, such as max, min, sum, etc.

with base_data as ( select source, --order source max(if_out) if_out max(if_send) if_send max(if_received) if_received from ( select o.id, --Order id o.source, --source ( case when od.out_time is not null then 1 else 0 end ) as if_out, --whether to issue ( case when od.send_time is not null then 1 else 0 end ) as if_send, --whether to send ( case when od.received_time is not null then 1 else 0 end ) as if_received --whether received from order o left join order_detail od on o.id = od.order_id where o.create_time > '2021-04-01' ) a group by source,id)select source, --order source sum(1), --Order quantity sum(if_out), --Total quantity of issued documents sum(if_send), --Total shipment quantity sum(if_received) --Total number of orders received from base_datagroup by source

In fact, an order contains multiple commodity data, and the three different times of each commodity are different. If you want to calculate the three times according to the whole order, you only need to take the data of multiple commodities corresponding to the order and take their minimum or maximum time for grouping statistics.

Note that distinct is only used to remove duplication for the whole row of data. If the query result has multiple columns, such as user_id, name, age, order_id, etc., and the user_id, name, and age in the query data are all the same, but there are multiple order_ids, then distinct will only remove the row data with the same four fields.

At this point, I believe everyone has a deeper understanding of "what common functions Hive has", so let's actually operate it! Here is the website, more related content can enter the relevant channels for inquiry, pay attention to us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report