Example Analysis of grouped Limit non-UDF Scheme in Hive 07/13 Update SLTechnology News&Howtos

Example Analysis of grouped Limit non-UDF Scheme in Hive

2025-07-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)05/31 Report--

Editor to share with you the Hive grouping Limit non-UDF program example analysis, I believe that most people do not know much about it, so share this article for your reference, I hope you will learn a lot after reading this article, let's go to know it!

Description: id (self-increment), type (aaa, bbb,ccc, ddd), status (ok,error) three fields, each type, filter the record of status='ok' and the smallest id.

Mysql:

Create table having_test (id int (11), type varchar (50), status varchar (50); mysql > select * from having_test +-+ | id | type | status | 1 | aaa | ok | | 2 | aaa | error | | 3 | aaa | ok | 4 | bbb | ok | 5 | ccc | error | 6 | ccc | ok | 7 | ddd | error | +-- -+-+ mysql > select * from having_test where status='ok' group by type having min (id) +-+ | id | type | status | +-+ | 1 | aaa | ok | | 4 | bbb | ok | | 6 | ccc | ok | +-+

It is very simple to implement in mysql, first group and then having, but sql syntax is not fully supported on hive. Will it be so simple on hive? the answer is no.

In HIVE:

Create table tmp_wjk_having_test (id int, type string, status string) ROW FORMAT DELIMITED FIELDS TERMINATED BY'\ t'; load data local inpath'/ tmp/load.csv' overwrite into table tmp_wjk_having_test; select * from tmp_wjk_having_test 1 aaa ok 2 aaa error 3 aaa ok 4 bbb ok 5 ccc error 6 ccc ok 7 ddd error select * from tmp_wjk_having_test where status='ok' group by type having min (id) FAILED: Error in semantic analysis: Line 1:73 Expression not in GROUP BY key 'id' # hive does not support this type of writing. Or subquery select * from tmp_wjk_having_test T1 join (select min (id) id from tmp_wjk_having_test where status='ok' group by type) T2 on t1.id=t2.id; 1 aaa ok 14 bbb ok 46 ccc ok 6

The subquery has no effect on the small dataset, but the best thing to apply to big data is to pass only one table and then come up with the results. So I'm still thinking about a new plan.

Select *, min (id) ii from tmp_wjk_having_test where status='ok' group by type; aaa 1 1bbb 4 4ccc 6 6

This kind of plan is feasible. Problem points:

1. Why does the condition of min (id) clearly write about the non-where in select, but it does play the role of screening?

two。 Why do you get 3 columns (type, id, min (id)) when it is obviously select *, min (id), and only 2 columns (type, min (id)) if you write it as select type (id)?

Select type,min (id) ii from tmp_wjk_having_test where status='ok' group by type;aaa 1bbb 4ccc 6

+ Update on November 11, 2014

3. General feasible scheme:

2 columns: select type,min (id) ii from tmp_wjk_having_test where status='ok' group by type; multiple columns: select T1. * from having_test T1 join (select name,min (age) mm from having_test group by name) T2 on t1.name = t2.name and t1.age=t2.mm. The above is all the contents of this article entitled "sample Analysis of grouped Limit non-UDF schemes in Hive". Thank you for reading! I believe we all have a certain understanding, hope to share the content to help you, if you want to learn more knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.