Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to packet Top N by Hive

2025-04-03 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

Editor to share with you how to group Hive to take Top N, I believe most people do not know much about it, so share this article for your reference, I hope you can learn a lot after reading this article, let's go to know it!

Hive packet fetching Top N

Hive has added row_number, rank and dense_rank analysis functions since version 0.11.0, which can query the top value after grouping and sorting.

Usage rules:

Row_number () over ([partition col1] [order by col2])

Rank () over ([partition col1] [order by col2])

Dense_rank () over ([partition col1] [order by col2])

They are all grouped according to the col1 field, and then sort the col2 field to generate a line number for each row after sorting, which is incremented from 1

Both col1 and col2 can be multiple fields separated by','.

Difference

1) row_number: the line number keeps increasing regardless of whether the value of the col2 field is equal or not. For example, there are two records whose values are equal, but one is the first and the other is the second.

2) rank: when the col2 of the two records are equal, the line number of the record is the same, but the line number of the next col2 value is incremented by N (N is the number of repetitions). For example, there are two lines juxtaposing the first, the next is the third, and there is no second.

3) dense_rank: when the col2 of the upper and lower records is equal, the line number of the next col2 value increases by 1. For example, there are two items juxtaposing the first, and the next is the second.

Row_number can realize paging query.

Actual operation

Create a tabl

Create table t (name string, sub string, score int) row format delimited fields terminated by

The data is in the attached a.txt.

A chinese 98a english 90d chinese 88c english 82c math 98b math 89b chinese 79z english 90z math 89z chinese 80e math 99e english 87d english 90 load data load data local inpath'/ home/hadoop/hive-example/a.txt' into table tb4

Grouping sorting

-- row_numberselect *, row_number () over (partition by sub order by score) as od from t;-- rankselect *, rank () over (partition by sub order by score) as od from t;-- dense_ranselect *, dense_rank () over (partition by sub order by score desc) from t; Business examples-- Statistics of the top three select * from (select *, row_number () over (partition by sub order by score desc) as od from t) t where od in each discipline

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report