2025-03-26 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains how to use order by, distribute by, sort by, and cluster by in Hive, and how the four clauses differ in the way they use reducers.
Using order by, distribute by, sort by, and cluster by in queries
Example: sort meteorological data by year and temperature. The second query also ensures that all rows with the same year end up in the same reducer partition.
-- One reducer (massive data, very slow).
-- The source omits the from clause; "records" is a placeholder table name.
select year, temperature
from records
order by year asc, temperature desc
limit 100;
-- Multiple reducers (massive data, fast).
select year, temperature
from records
distribute by year
sort by year asc, temperature desc
limit 100;
Order by (global sort)
Order by sorts the entire input globally, so it runs with a single reducer (multiple reducers cannot guarantee a global order).
With only one reducer, the sort takes a long time when the input is large.
When hive.mapred.mode=strict, order by must be accompanied by a limit clause, which reduces the amount of data sent to the reducer.
For example, with limit 100 and 50 map tasks, the reducer receives at most 100 * 50 = 5,000 rows.
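The limit arithmetic can be checked with a small simulation. This is a minimal sketch in plain Python, not Hive code: the function names, the integer rows, and the 50 splits of 1,000 rows are all assumptions made for illustration. Each simulated map task forwards only its own first 100 rows, and the single reducer merges them.

```python
import heapq

def map_side_top_n(rows, n):
    """Each simulated map task keeps only its own n smallest rows."""
    return heapq.nsmallest(n, rows)

def order_by_limit(splits, n):
    # Every map task forwards at most n rows, so the single reducer
    # receives at most n * len(splits) rows instead of the full input.
    reducer_input = [row for split in splits for row in map_side_top_n(split, n)]
    return sorted(reducer_input)[:n], len(reducer_input)

# 50 hypothetical input splits of 1,000 integers each (50,000 rows in total).
splits = [list(range(m, 50_000, 50)) for m in range(50)]
top, reducer_rows = order_by_limit(splits, 100)
print(reducer_rows)  # number of rows the single reducer had to sort
```

The global result is still correct because any row in the overall first 100 must also be in the first 100 of its own split.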
Distribute by (similar to bucketing)
Distribute by divides the rows among the reducers, and therefore among their output files, according to the specified fields. Rows with the same values in those fields always go to the same reducer.
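The routing can be pictured as hash partitioning. The sketch below is a plain Python simulation under that assumption (key hash modulo the number of reducers); the function name and the sample (year, temperature) rows are hypothetical, and Hive's actual partitioner may differ in detail.

```python
from collections import defaultdict

def distribute_by(rows, key, num_reducers):
    """Route each row to a reducer by hashing its key modulo the reducer count."""
    reducers = defaultdict(list)
    for row in rows:
        reducers[hash(key(row)) % num_reducers].append(row)
    return dict(reducers)

# Hypothetical (year, temperature) rows.
rows = [(1950, 22), (1949, 111), (1950, 0), (1949, 78), (1951, -11)]
parts = distribute_by(rows, key=lambda r: r[0], num_reducers=2)

for reducer_id, part in sorted(parts.items()):
    print(reducer_id, part)
```

Every year lands in exactly one partition, but the rows inside a partition stay in arrival order: distribute by alone does no sorting.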
Sort by (similar to in-bucket sorting)
Sort by is not a global sort. It sorts the rows before they are fed to each reducer.
Therefore, if you use sort by with mapred.reduce.tasks > 1, only the output of each individual reducer is guaranteed to be ordered. The combined output is not globally ordered.
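The difference between per-reducer order and global order is easy to see in a simulation. The following is a minimal Python sketch, not Hive code; the function name and the two reducers' sample (year, temperature) inputs are invented for illustration.

```python
def sort_by(reducer_inputs, key):
    """Sort each reducer's rows independently; nothing is merged across reducers."""
    return [sorted(rows, key=key) for rows in reducer_inputs]

# Two hypothetical reducers that each received an arbitrary mix of rows.
reducer_inputs = [
    [(1951, -11), (1949, 111), (1950, 22)],
    [(1950, 0), (1949, 78)],
]

# Year ascending, temperature descending.
order = lambda r: (r[0], -r[1])
outputs = sort_by(reducer_inputs, key=order)
concatenated = [row for out in outputs for row in out]

print(outputs)
print(concatenated == sorted(concatenated, key=order))
```

Each list in outputs is ordered, yet concatenating them puts (1951, -11) ahead of (1949, 78), so the combined result is not globally sorted.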
Cluster by
Cluster by combines the functions of distribute by and sort by on the same columns.
However, it always sorts in ascending order; the sort direction cannot be specified as asc or desc.
Therefore, cluster by x is commonly described as shorthand for distribute by x + sort by x.
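To make the equivalence concrete, here is a small Python simulation, offered as a sketch rather than as Hive's implementation. It assumes hash-modulo routing and uses a hypothetical function name and sample (year, temperature) rows. It follows the Hive documentation's description of cluster by as distribute by plus an ascending sort by on the same column.

```python
def cluster_by(rows, key, num_reducers):
    """Simulate cluster by: distribute by key, then sort by key ascending."""
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        # Distribute step: same key, same reducer (hash-modulo is an assumption).
        reducers[hash(key(row)) % num_reducers].append(row)
    # Sort step: ascending on the same key, with no option to choose a direction.
    return [sorted(part, key=key) for part in reducers]

# Hypothetical (year, temperature) rows, clustered by year.
rows = [(1951, -11), (1950, 22), (1949, 111), (1950, 0), (1949, 78)]
result = cluster_by(rows, key=lambda r: r[0], num_reducers=2)

for reducer_id, part in enumerate(result):
    print(reducer_id, part)
```

When a different direction or a secondary sort column is needed, such as temperature descending within each year, the explicit distribute by ... sort by ... form is the one to use.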