First, create a partitioned table whose columns include an array and a map (the page swallowed the angle-bracket type parameters; array<string> and map<string,string> match the data shown below):

create table psn (
    id int,
    name string,
    likes array<string>,
    address map<string,string>
)
partitioned by (age int)
row format delimited
fields terminated by '\t'
collection items terminated by '-'
map keys terminated by ':'
lines terminated by '\n';

Load the sample data into the age=10 partition and check it:

hive> load data local inpath '/root/a.txt' overwrite into table psn partition (age=10);
Loading data to table default.psn partition (age=10)
OK
Time taken: 3.817 seconds

hive> select * from psn;
OK
1    zhang3    ["sing","tennis","running"]    {"beijing":"daxing"}     10
2    li4       ["sing","pingpong","swim"]     {"shanghai":"baoshan"}   10
3    wang5     ["read","joke","football"]     {"guangzou":"baiyun"}    10

Demand: count how many distinct hobbies appear in total, how many distinct cities, and how many distinct districts.

Analysis: this requirement looks a lot like implementing wordcount in Hive; in fact it is an aggregation of two wordcount cases, except that no split is needed here. In the wordcount case, explode handled a column of records neatly. However, Hive's UDTF functions (split/explode) come with a restriction: a select clause may contain only one UDTF, and the UDTF cannot be mixed with other columns or functions.

# allowed:     select explode(...) from emp;
# not allowed: select explode(...), explode(...) from emp;
# not allowed: select id, explode(...) from emp;

This restriction makes some complex logic impossible to express directly, such as the two-column wordcount above. That is where lateral view comes in: it organizes the multiple rows produced by a UDTF into a virtual table that the rest of the query can use.

hive> select count(distinct c1), count(distinct c2), count(distinct c3) from psn
    > lateral view explode(likes) t1 as c1
    > lateral view explode(address) t2 as c2, c3;

Here t1 and t2 are the names of the virtual tables generated by the UDTFs, and c1/c2/c3 are column aliases: exploding the array produces one column of data, while exploding the map produces two (key and value; see the standalone sketch after the job output below).

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-04-24 22:59:.. Stage-1 map = 0%, reduce = 0%
2019-04-24 22:59:25,681 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.76 sec
2019-04-24 22:59:36,268 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.15 sec
MapReduce Total cumulative CPU time: 4 seconds 150 msec
Ended Job = job_1556088929464_0004
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.15 sec HDFS Read: 14429 HDFS Write: 14429 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 150 msec
OK
8    3    3
Time taken: 35.986 seconds, Fetched: 1 row(s)

The answer reads: 8 distinct hobbies, 3 distinct cities, and 3 distinct districts.
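A side note on the two aliases for the map: a UDTF that returns multiple columns uses Hive's multi-alias syntax. Exploding the address map on its own, as a minimal sketch on the same table, should return the key and the value as two separate columns, roughly:

hive> select explode(address) as (city, district) from psn;
OK
beijing     daxing
shanghai    baoshan
guangzou    baiyun

The aliases city and district here are illustrative names chosen for this sketch, not columns of psn.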
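And if the demand were the per-value "wordcount" shape instead of distinct counts, the same lateral view pairs naturally with group by. A minimal sketch, not from the original article:

hive> select c1 as hobby, count(*) as times
    > from psn
    > lateral view explode(likes) t1 as c1    -- one output row per hobby per person
    > group by c1;

On the sample data this should report sing twice (zhang3 and li4 both like it) and every other hobby once.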
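Finally, for reference: the article never shows /root/a.txt, but given the declared delimiters (tab between fields, '-' between collection items, ':' between map key and value) and the select output above, the file presumably looks like this, with age absent because it is a partition column:

1	zhang3	sing-tennis-running	beijing:daxing
2	li4	sing-pingpong-swim	shanghai:baoshan
3	wang5	read-joke-football	guangzou:baiyun

(columns separated by tab characters)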