2025-03-28 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
1. Order by
order by in Hive works the same way as order by in traditional SQL: it sorts the query result globally. As a result, any Hive query that contains order by sends all of its data through a single reducer (no matter how many map tasks or block files there are, only one reducer is launched), so it can take a long time to run on a large amount of data.
There is one difference from traditional SQL: if hive.mapred.mode=strict is set (nonstrict is the default in older Hive releases; check the default for your version), an order by query must also specify limit to cap the number of output rows. Because all data is sorted on a single reducer, a query over a large data set may never return a result, so strict mode forces you to bound the output.
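The single-reducer behaviour described above can be pictured with a small Python simulation. This is only a sketch, not Hive code: the function order_by and the sample rows are hypothetical, and the real execution plan Hive builds may differ.

```python
import heapq

def order_by(map_outputs, key, limit=None):
    """Simulate order by: every map output goes to ONE reducer, which sorts globally."""
    # Single reducer: concatenate the output of all map tasks.
    single_reducer_input = [row for output in map_outputs for row in output]
    if limit is not None:
        # With a limit, only the first `limit` rows in sort order are returned.
        return heapq.nsmallest(limit, single_reducer_input, key=key)
    return sorted(single_reducer_input, key=key)

# Hypothetical output of three map tasks, as (mid, money) rows.
map_outputs = [
    [("AA", 15.0), ("CC", 44.0)],
    [("BB", 22.0)],
    [("AA", 20.0)],
]

all_rows = order_by(map_outputs, key=lambda row: row[1])
top2 = order_by(map_outputs, key=lambda row: row[1], limit=2)
print(all_rows)
print(top2)
```

However many map tasks produce rows, the sort itself happens in one place, which is why the cost grows with the full data size.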
2. Sort by
If sort by is specified, Hive sorts within each reducer, so only local ordering is guaranteed: the output of each reducer is ordered, but the data as a whole is not, unless there is only one reducer. The advantage is that the locally sorted outputs make a later global sort cheaper, since a merge sort over them is enough to produce a global order.
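A minimal Python sketch of local ordering followed by a merge, assuming the rows have already been split across two reducers (the function sort_by and the values are made up for illustration; this is a simulation, not Hive code):

```python
import heapq

def sort_by(reducer_inputs):
    """Simulate sort by: each reducer sorts only the rows it received."""
    return [sorted(rows) for rows in reducer_inputs]

# Hypothetical money values, already split across two reducers.
reducer_inputs = [[44.0, 15.0], [22.0, 20.0]]

local = sort_by(reducer_inputs)

# Each reducer's output is ordered, but the concatenation is not.
concatenated = [value for rows in local for value in rows]

# Merging the sorted runs yields the global order without a full re-sort.
merged = list(heapq.merge(*local))

print(local)
print(concatenated)
print(merged)
```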
3. Using distribute by together with sort by
distribute by controls how the map output is divided among the reducers. For example, suppose we have a table store in which mid is the merchant that a store belongs to, money is the store's profit, and name is the store's name.
store:
mid   money   name
AA    15.0    Store 1
AA    20.0    Store 2
BB    22.0    Store 3
CC    44.0    Store 4
Execute the Hive statement:
select mid, money, name from store distribute by mid sort by mid asc, money asc
Because distribute by mid is specified, all rows with the same mid are sent to the same reducer, so we can rank the profit of each store within each merchant. This ranking is complete for every merchant, because all of a merchant's rows are processed in the same reducer. Note that distribute by must be written before sort by.
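The effect of distribute by mid followed by sort by mid, money can be sketched in Python as below. This is a simulation under stated assumptions: zlib.crc32 stands in for whatever hash Hive uses to pick a reducer, the number of reducers is fixed at 2, and the helper names are hypothetical. The rows are those of the store table, deliberately listed out of order.

```python
import zlib

# Rows of the store table (mid, money, name), in arbitrary input order.
STORE = [
    ("AA", 20.0, "Store 2"),
    ("CC", 44.0, "Store 4"),
    ("AA", 15.0, "Store 1"),
    ("BB", 22.0, "Store 3"),
]

def distribute_by(rows, key, num_reducers):
    """Send each row to a reducer chosen from a hash of its key."""
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        # crc32 is only a deterministic stand-in for Hive's partitioning hash.
        index = zlib.crc32(key(row).encode("utf-8")) % num_reducers
        reducers[index].append(row)
    return reducers

def sort_by(reducers, key):
    """Sort the rows inside each reducer independently."""
    return [sorted(rows, key=key) for rows in reducers]

reducers = sort_by(
    distribute_by(STORE, key=lambda row: row[0], num_reducers=2),
    key=lambda row: (row[0], row[1]),  # mid asc, money asc
)

for index, rows in enumerate(reducers):
    print(index, rows)
```

Whichever reducer a merchant lands on, both AA rows end up together and in money order, which is what makes the per-merchant ranking reliable.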
4. Cluster by
The function of cluster by is to combine distribute by with sort by. The following two statements are equivalent:
select mid, money, name from store cluster by mid
select mid, money, name from store distribute by mid sort by mid
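cluster by cannot be combined with sort by in the same query, so to get the same effect as the statement in 3, which also orders by money, use the distribute by and sort by form:
select mid, money, name from store distribute by mid sort by mid, money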
Note that columns specified by cluster by can only be sorted in ascending order; asc and desc cannot be specified.
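To make the equivalence concrete, here is a small Python sketch that models cluster by as distribute by plus an ascending sort on the same column. It is a simulation only: the helper names are hypothetical and zlib.crc32 is an assumed stand-in for Hive's partitioning hash.

```python
import zlib

STORE = [
    ("CC", 44.0, "Store 4"),
    ("AA", 20.0, "Store 2"),
    ("BB", 22.0, "Store 3"),
    ("AA", 15.0, "Store 1"),
]

def distribute_by(rows, key, num_reducers):
    """Send each row to a reducer chosen from a hash of its key."""
    reducers = [[] for _ in range(num_reducers)]
    for row in rows:
        # crc32 is only a deterministic stand-in for Hive's partitioning hash.
        reducers[zlib.crc32(key(row).encode("utf-8")) % num_reducers].append(row)
    return reducers

def cluster_by(rows, key, num_reducers):
    """cluster by = distribute by key, then sort each reducer ascending on that key."""
    return [sorted(part, key=key) for part in distribute_by(rows, key, num_reducers)]

def by_mid(row):
    return row[0]

clustered = cluster_by(STORE, by_mid, num_reducers=2)

# The long form: distribute by mid sort by mid.
long_form = [sorted(part, key=by_mid) for part in distribute_by(STORE, by_mid, 2)]

print(clustered == long_form)
```

Only mid is ordered here. Rows that share a mid are not ordered by money, which is why a secondary sort column needs the explicit distribute by and sort by form.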