2025-01-18 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article covers some principles and techniques of Hive: the differences between Order By, Sort By, Distribute By and Cluster By, type conversion, and how partitions relate to the table structure.
Hive Order/Sort/Distribute/Cluster By:
Order By: all data is sorted in a single reducer, which gives a total order. Because sorting a large data set in one reducer is slow, Hive's strict mode (hive.mapred.mode=strict, which is the default in some Hive versions) requires an ORDER BY query to be followed by a LIMIT clause. To run ORDER BY without LIMIT, set hive.mapred.mode to nonstrict, and do so with care when the amount of data is large.
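The effect of strict mode on ORDER BY can be sketched as follows. This is a minimal illustration, not output from a real cluster: the table page_views and its columns user_id and pv are hypothetical names, and the behavior of hive.mapred.mode may differ between Hive versions.

```sql
-- Hypothetical table: page_views(user_id STRING, pv INT)

-- In strict mode, ORDER BY is expected to be accompanied by LIMIT.
SET hive.mapred.mode=strict;
SELECT user_id, pv
FROM page_views
ORDER BY pv DESC
LIMIT 100;

-- In nonstrict mode, ORDER BY without LIMIT is accepted,
-- but all rows still pass through a single reducer.
SET hive.mapred.mode=nonstrict;
SELECT user_id, pv
FROM page_views
ORDER BY pv DESC;
```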
Sort By: rows are sorted before they are fed to each reducer. The sort order depends on the column type: numeric order for a numeric column, lexicographical order for a string column. The output of each reducer is therefore ordered, but the overall result is not necessarily ordered (that is, the final result is only piecewise ordered).
Distribute By: rows with the same column value are sent to the same reducer, but there is no guarantee that each distinct value gets its own reducer, and no guarantee that rows are sorted when they reach the reducer. For example, if five rows with the values x1, x2, x4, x3, x1 are distributed to two reducers, reducer1 may get x1, x2, x1 and reducer2 may get x4, x3; the data within each reducer is not sorted.
Cluster By: equivalent to Distribute By plus Sort By on the same column. For example, with Cluster By on the data above, reducer1 gets x1, x1, x2 and reducer2 gets x3, x4. Cluster By can only distribute and sort on the same columns, whereas the combination of Distribute By and Sort By can use different columns for distribution and for sorting.
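The three clauses can be compared side by side in a short sketch. The table t and its columns x and y are hypothetical, and the reducer count is forced to 2 only to mirror the two-reducer example; mapred.reduce.tasks is the older property name and newer Hadoop versions use mapreduce.job.reduces.

```sql
-- Hypothetical table: t(x STRING, y INT)
SET mapred.reduce.tasks=2;

-- Each reducer's output is sorted by x; the overall result is only piecewise ordered.
SELECT x, y FROM t SORT BY x;

-- Rows with the same x go to the same reducer; nothing is sorted.
SELECT x, y FROM t DISTRIBUTE BY x;

-- Distribute by x and sort by x within each reducer.
SELECT x, y FROM t CLUSTER BY x;

-- The same thing written out explicitly.
SELECT x, y FROM t DISTRIBUTE BY x SORT BY x;

-- Distribute by one column and sort by another, which CLUSTER BY cannot express.
SELECT x, y FROM t DISTRIBUTE BY x SORT BY y DESC;
```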
Addendum: I do not fully understand the following note in the official documentation; can anyone explain it? "Note: It may be confusing as to the difference between SORT BY alone of a single column and CLUSTER BY. The difference is that CLUSTER BY partitions by the field and SORT BY if there are multiple reducers partitions randomly in order to distribute data (and load) uniformly across the reducers." One reading is that with SORT BY alone, rows are spread across the reducers without regard to the column value, so equal values can end up in different reducers, whereas CLUSTER BY partitions by the column, so equal values always land in the same reducer.
For details, see the Hive LanguageManual.
Note: a query such as SELECT DISTINCT xxx FROM table SORT BY yyy fails with the error "Invalid table alias or column reference 'xxx': (possible column names are: _col0)". After DISTINCT is applied, the result is sorted in ascending order by the DISTINCT columns by default (ORDER BY or SORT BY can change this), and when DISTINCT is used, the sort column must appear in the SELECT list.
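A hedged sketch of the DISTINCT restriction described above, using a hypothetical table t with columns xxx and yyy. The failing form is left commented out, and the two alternatives are assumptions about what the author intended rather than the only possible fixes.

```sql
-- Hypothetical table: t(xxx STRING, yyy INT)

-- Fails as described above: yyy is not in the SELECT list.
-- SELECT DISTINCT xxx FROM t SORT BY yyy;

-- Option 1: sort by a column that is in the SELECT list.
SELECT DISTINCT xxx FROM t SORT BY xxx;

-- Option 2: add the sort column to the SELECT list.
-- Note that this changes the result: it returns distinct (xxx, yyy) pairs.
SELECT DISTINCT xxx, yyy FROM t SORT BY yyy;
```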
Hive type conversion:
Hive supports implicit type conversion and applies it when necessary. For example, if event_day is a STRING column, a condition such as WHERE event_day=20140703 still works: the STRING column and the INT literal 20140703 are implicitly converted to a common type before they are compared.
Alternatively, use an explicit cast, such as CAST(event_day AS DOUBLE) = 20140703.0.
For details, see the official documentation: Allowed Implicit Conversions, or Hive data type conversion.
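The implicit and explicit forms can be put next to each other. The table logs is a hypothetical name, and event_day is assumed to be a STRING column as in the example above.

```sql
-- Hypothetical table: logs(event_day STRING)

-- Implicit conversion: STRING column compared with an INT literal.
SELECT * FROM logs WHERE event_day = 20140703;

-- Explicit cast, as in the text above.
SELECT * FROM logs WHERE CAST(event_day AS DOUBLE) = 20140703.0;

-- Comparing against a string literal needs no conversion at all.
SELECT * FROM logs WHERE event_day = '20140703';
```

When the column is a string, the last form states the intent most directly, because the comparison then does not depend on the conversion rules.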
Understanding Hive table structure and partitions:
First, a description of the observed behavior:
CREATE EXTERNAL TABLE IF NOT EXISTS test (name STRING, age INT) PARTITIONED BY (date INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
ALTER TABLE test ADD PARTITION (date=20140710) LOCATION '/test/20140710/';
LOAD DATA LOCAL INPATH "/home/work/20140710.txt" INTO TABLE test PARTITION (date=20140710);
-- 20140710.txt content:
-- Baron 999999999 (Note: out of int range)
A: with the setup above, the age field returned by "SELECT * FROM test;" is NULL. After "ALTER TABLE test CHANGE COLUMN age age STRING AFTER name;", querying again gives the same result as before (age is still NULL). But after "ALTER TABLE test DROP IF EXISTS PARTITION (date=20140710);" and re-adding the partition, the age value can be queried.
B: conversely, if age is defined as STRING when the table is created, the query returns the age value. If the age column is then changed to INT, the value can no longer be queried; after changing it back to STRING, the data can be queried again (no partition update is involved in this process).
After repeated tests, the following behavior (or principle) emerges: a Hive partition is affected only by the table structure defined before the partition was created. Changes to the table's attributes made after the partition is created do not affect the metadata that the partition saved at creation time, which holds some attributes of the table as it was then defined.
Therefore, in A the column type saved in the partition metadata is INT, and even after the table column is changed to STRING, the type already stored in the partition is not updated. In B the partition metadata stores STRING; after the table column is changed to INT the value cannot be queried (it cannot be converted), and after changing back to STRING it can be queried again, while the partition metadata stays unchanged throughout. (This explanation is admittedly somewhat shaky. I wonder whether the partition metadata and Hive's current table structure both act on the query, so that the result is constrained by their intersection. Can anyone explain this?)
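One way to check this explanation is to compare the table-level schema with the schema stored for the partition, and then repeat the drop and re-add steps from scenario A. This is a sketch under assumptions: it reuses the test table defined above, quotes date with backticks because newer Hive versions treat it as a reserved word, and assumes that DESCRIBE FORMATTED on a partition shows the column types stored for that partition.

```sql
-- Table-level schema (reflects the latest ALTER TABLE).
DESCRIBE test;

-- Partition-level schema (assumed to show what the partition stored at creation time).
DESCRIBE FORMATTED test PARTITION (`date`=20140710);

-- Scenario A: change the table column, then recreate the partition
-- so that it picks up the new table schema.
ALTER TABLE test CHANGE COLUMN age age STRING AFTER name;
ALTER TABLE test DROP IF EXISTS PARTITION (`date`=20140710);
ALTER TABLE test ADD PARTITION (`date`=20140710) LOCATION '/test/20140710/';

-- The table is EXTERNAL, so dropping the partition removes only metadata;
-- the files under /test/20140710/ are expected to stay in place.
SELECT * FROM test WHERE `date`=20140710;
```

In newer Hive versions (1.1.0 and later, according to the DDL manual), ALTER TABLE ... CHANGE COLUMN accepts a CASCADE keyword that also updates existing partition metadata; whether it is available depends on the Hive version in use.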
These are the Hive principles and techniques covered in this article. If you have run into similar questions, the analysis above may help.