Updated 2025-01-19. From: SLTechnology News&Howtos (shulou), Internet Technology
Shulou (Shulou.com) 06/01 Report
This article shows how to sort data in Hive using four clauses: order by, sort by, distribute by, and cluster by.
1. Global sorting: order by
The order by clause appears at the end of the select statement and sorts the final result. The default order is ascending (ASC); add DESC after a field name to sort that field in descending order.
order by performs a global sort, so all rows pass through a single reducer.
-- sort by a column alias
select empno, ename, job, mgr, sal + nvl(comm, 0) salcomm, deptno from emp order by salcomm desc;
-- multi-column sorting: deptno ascending, then salcomm descending
select empno, ename, job, mgr, sal + nvl(comm, 0) salcomm, deptno from emp order by deptno, salcomm desc;
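Because order by sends every row through one reducer, a common way to limit its cost is to pair it with limit when only the leading rows are needed. The following is a minimal sketch, assuming the same emp table as above; the limit of 10 is an arbitrary illustrative value. In strict mode (hive.mapred.mode=strict in older Hive releases), order by without limit is rejected, so the limit may be required rather than optional depending on the cluster configuration.

```sql
-- Top earners by total pay (salary plus commission).
-- The value 10 is an illustrative assumption, not taken from the article.
select empno, ename, sal + nvl(comm, 0) salcomm
from emp
order by salcomm desc
limit 10;
```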
2. Sorting within each reducer: sort by
order by is inefficient for large data sets. In many business scenarios we do not need globally ordered data, so we can use sort by instead. sort by produces one sorted output file per reducer: rows are sorted within each reducer, so the result is locally ordered rather than globally ordered.
-- set the number of reducers
set mapreduce.job.reduces=2;
-- view employee information in descending order of salary
select * from emp sort by sal desc;
-- write the query results to files, in descending order of salary; this generates two output files, each sorted internally by salary in descending order
insert overwrite local directory '/home/hadoop/output/sortsal' select * from emp sort by sal desc;
3. Partition sorting: distribute by
distribute by sends rows with the same value of the specified field to the same reducer, which makes later aggregation and sorting easier. It is similar to the partition step in MapReduce and can be combined with sort by so that the data within each partition is ordered. distribute by must be written before sort by.
-- split the data into three partitions (one per reducer), each of which receives data
set mapreduce.job.reduces=3;
insert overwrite local directory '/home/hadoop/output/distBy1' select empno, ename, job, deptno, sal + nvl(comm, 0) salcomm from emp distribute by deptno sort by salcomm desc;
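When there are more distinct deptno values than reducers, several departments land in the same output file, and sorting by salcomm alone interleaves their rows. A minimal sketch, assuming the same emp table and a hypothetical output directory distBy2, adds deptno as the leading sort key so that each department's rows stay contiguous within a file:

```sql
-- The output directory name distBy2 is a hypothetical example.
set mapreduce.job.reduces=3;
insert overwrite local directory '/home/hadoop/output/distBy2'
select empno, ename, job, deptno, sal + nvl(comm, 0) salcomm
from emp
distribute by deptno            -- rows with the same deptno go to the same reducer
sort by deptno, salcomm desc;   -- within each file: grouped by deptno, highest pay first
```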
4. Partitioning and sorting on the same field: cluster by
When distribute by and sort by use the same field, cluster by can replace both to simplify the syntax. cluster by always sorts in ascending order; a sort direction cannot be specified.
-- the following two statements are equivalent
select * from emp distribute by deptno sort by deptno;
select * from emp cluster by deptno;
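Since cluster by accepts no sort direction, a descending order on the same field has to be written with the two separate clauses. A short sketch, assuming the same emp table:

```sql
-- cluster by deptno would sort ascending only;
-- spelling out both clauses allows desc on the sort key.
select * from emp distribute by deptno sort by deptno desc;
```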
Sort summary:
order by: performs a global sort, which is inefficient on large data; use it with caution in production environments
sort by: makes the data locally ordered (within each reducer)
distribute by: partitions data by the specified field; often used together with sort by so that the data in each partition is ordered
cluster by: shorthand for distribute by and sort by on the same field