2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/02 Report--
This article explains how a Hive join is implemented on top of MapReduce. The material is short and easy to follow, so let's walk through it step by step.
Common Join
If hive.auto.convert.join=true is not enabled, or the MapJoin conditions are not met, the Hive parser converts the Join operation into a Common Join and completes the join in the Reduce phase. The whole process consists of three stages: Map, Shuffle, and Reduce.
1. Map stage
When reading a table's data, the Map task uses the columns in the Join ON condition as the output key. If the join has multiple join keys, the combination of those keys is used as the key.
The Map output value holds the columns that are needed in the final output or in conditions evaluated after the join. The value also carries a Tag that identifies which table the record came from. The Map output is sorted by key.
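The Map stage described above can be sketched in Python. This is a minimal simulation, not Hive's actual code: the function name `map_rows`, the row dictionaries, and the tag numbers (0 for the user table) are assumptions made for illustration.

```python
def map_rows(rows, tag, key_cols, value_cols):
    """Emit (join key, (tag, value columns)) for every row of one table."""
    out = []
    for row in rows:
        # With several join columns, the key is their combination.
        key = tuple(row[c] for c in key_cols)
        # The tag (hypothetical numbering) records the source table.
        value = (tag, tuple(row[c] for c in value_cols))
        out.append((key, value))
    # The map output is sorted by key.
    return sorted(out, key=lambda kv: kv[0])


users = [
    {"user_id": "2", "user_name": "Xiao Ming"},
    {"user_id": "1", "user_name": "Xiao Hong"},
]
user_pairs = map_rows(users, tag=0, key_cols=["user_id"], value_cols=["user_name"])
print(user_pairs)
```

Each emitted pair keeps the tag next to the payload, so a later stage can tell which table a record belongs to even after records from both tables are mixed together.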
2. Shuffle stage
A hash value is computed from the key, and each key/value pair is sent to a reducer chosen by that hash value. Records with the same key therefore always reach the same reducer.
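A minimal sketch of this hash-based routing follows. The `partition` and `shuffle` helpers are hypothetical names, and `zlib.crc32` is only a deterministic stand-in for whatever hash function Hive actually applies to the key.

```python
import zlib
from collections import defaultdict


def partition(key, num_reducers):
    """Choose a reducer index from the key's hash (crc32 is a stand-in)."""
    return zlib.crc32(repr(key).encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Route every (key, value) pair to the reducer selected by its key."""
    buckets = defaultdict(list)
    for key, value in pairs:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


# Tagged map output from both tables (tags 0 and 1 are hypothetical).
pairs = [
    (("1",), (0, ("Xiao Hong",))),
    (("1",), (1, ("spark",))),
    (("2",), (0, ("Xiao Ming",))),
    (("2",), (1, ("flink",))),
]
buckets = shuffle(pairs, num_reducers=2)
for reducer, items in sorted(buckets.items()):
    print(reducer, items)
```

Because the reducer index depends only on the key, both sides of a matching pair always end up in the same bucket, which is what makes the reduce-side join possible.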
3. Reduce stage
The reducer completes the join by matching records that share the same key, using the Tag to tell which table each record came from. The Tag is discarded when the records are merged.
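The reduce-side merge can be sketched as below, assuming an inner join and two hypothetical tags, `LEFT` and `RIGHT`. The function `reduce_join` is illustrative only and does not mirror Hive's internal operator.

```python
from itertools import groupby, product

LEFT, RIGHT = 0, 1  # hypothetical tags for the two input tables


def reduce_join(pairs):
    """Inner-join records that share a key, using the tag to split sides."""
    joined = []
    ordered = sorted(pairs, key=lambda kv: kv[0])
    for key, group in groupby(ordered, key=lambda kv: kv[0]):
        sides = {LEFT: [], RIGHT: []}
        for _, (tag, cols) in group:
            sides[tag].append(cols)
        # Pair every left record with every right record for this key.
        for left_cols, right_cols in product(sides[LEFT], sides[RIGHT]):
            # The tag is not copied into the output row.
            joined.append(key + left_cols + right_cols)
    return joined


pairs = [
    (("1",), (0, ("Xiao Hong",))),
    (("2",), (1, ("flink",))),
    (("1",), (1, ("spark",))),
    (("2",), (0, ("Xiao Ming",))),
]
rows = reduce_join(pairs)
print(rows)
```

A key that appears on only one side produces no output row in this sketch, which matches inner-join semantics.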
4. Example
DROP TABLE IF EXISTS wedw_dwd.user_info_df;
CREATE TABLE wedw_dwd.user_info_df (
  user_id   string COMMENT 'user id',
  user_name string COMMENT 'user name'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS textfile;

+---------+-----------+
| user_id | user_name |
+---------+-----------+
| 1       | Xiao Hong |
| 2       | Xiao Ming |
| 3       | floret    |
+---------+-----------+

DROP TABLE IF EXISTS wedw_dwd.order_info_df;
CREATE TABLE wedw_dwd.order_info_df (
  user_id     string COMMENT 'user id',
  course_name string COMMENT 'course name'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS textfile;

+---------+-------------+
| user_id | course_name |
+---------+-------------+
| 1       | spark       |
| 2       | flink       |
| 3       | java        |
+---------+-------------+
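SELECT t1.user_id, t1.user_name, t2.course_name
FROM wedw_dwd.user_info_df t1
JOIN wedw_dwd.order_info_df t2
  ON t1.user_id = t2.user_id;

+---------+-----------+-------------+
| user_id | user_name | course_name |
+---------+-----------+-------------+
| 1       | Xiao Hong | spark       |
| 2       | Xiao Ming | flink       |
| 3       | floret    | java        |
+---------+-----------+-------------+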
Illustration: (the table Tag is discarded during the merge)
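To make the data flow concrete, the example tables can be traced through the three stages with a small Python simulation. This is a sketch under simplifying assumptions: a single reducer, tags 0 and 1 chosen arbitrarily, and plain tuples standing in for Hive rows.

```python
from collections import defaultdict

users = [("1", "Xiao Hong"), ("2", "Xiao Ming"), ("3", "floret")]
orders = [("1", "spark"), ("2", "flink"), ("3", "java")]

# Map: key = user_id, value = (tag, payload). Tag 0 = users, 1 = orders.
mapped = [(uid, (0, name)) for uid, name in users]
mapped += [(uid, (1, course)) for uid, course in orders]

# Shuffle: bring records with the same key together (one reducer here).
grouped = defaultdict(list)
for key, value in sorted(mapped):
    grouped[key].append(value)

# Reduce: split each group by tag, combine the sides, drop the tag.
result = []
for key, values in grouped.items():
    names = [payload for tag, payload in values if tag == 0]
    courses = [payload for tag, payload in values if tag == 1]
    result.extend((key, name, course) for name in names for course in courses)

for key, values in grouped.items():
    print("reducer input:", key, values)
for row in result:
    print("joined row:", row)
```

The reducer input lines show the tagged values for each user_id, and the joined rows carry only user_id, user_name, and course_name, with no tag.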
Thank you for reading. That covers how the underlying MapReduce of a Hive join is implemented. After working through this article you should have a clearer picture of the process, though the details are best verified in practice.