How Hive join is implemented in the underlying MapReduce

2025-04-27 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/02 Report

This article explains how a Hive join is implemented in the underlying MapReduce. The content is short and is meant to be followed step by step.

Common Join

If hive.auto.convert.join=true is not set, or the MapJoin conditions are not met, the Hive parser converts the Join operation into a Common Join, which completes the join in the Reduce phase. The whole process consists of three stages: Map, Shuffle, and Reduce.

1 Map stage

While reading a table's data, the Map task uses the columns in the Join ON condition as the output key. If the join has multiple join keys, the combination of those keys is used as the key.

The Map output value contains the columns that are needed after the join, either for output or as filter conditions. The value also carries the table's Tag, which identifies the table that the value came from. The Map output is sorted by key.
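The Map stage described above can be sketched in plain Python. This is a simplified illustration and not Hive's actual code: the function name `map_side`, the tag numbers 0 and 1, the tuple layout, and the use of the tag as a secondary sort field are all assumptions made for the example. In a real job each map task sorts only its own output; here both tables are collected into one list for brevity.

```python
# Simplified sketch of the Map stage of a Common Join (not Hive's real code).
# Assumption: tag 0 marks rows from user_info_df, tag 1 rows from order_info_df.

def map_side(rows, key_columns, value_columns, tag):
    """Emit (join_key, (tag, value)) pairs for one table."""
    for row in rows:
        # A multi-column join condition yields a composite key.
        key = tuple(row[c] for c in key_columns)
        # The value carries only the columns needed after the join, plus the tag.
        value = tuple(row[c] for c in value_columns)
        yield key, (tag, value)


users = [{"user_id": "2", "user_name": "Xiao Ming"},
         {"user_id": "1", "user_name": "Xiao Hong"}]
orders = [{"user_id": "1", "course_name": "spark"},
          {"user_id": "2", "course_name": "flink"}]

map_output = list(map_side(users, ["user_id"], ["user_name"], tag=0))
map_output += list(map_side(orders, ["user_id"], ["course_name"], tag=1))

# Sort by key; the tag is used as a secondary field (an assumption of this
# sketch) so that, for one key, table 0 rows come before table 1 rows.
map_output.sort(key=lambda kv: (kv[0], kv[1][0]))

for key, tagged_value in map_output:
    print(key, tagged_value)
```

Passing several names in `key_columns` produces the composite key that the text describes for joins with more than one join key.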

2 Shuffle stage

A hash value is computed from the key, and each key/value pair is sent to a reducer according to that hash value. Rows with the same join key, from either table, therefore arrive at the same reducer.
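As a rough sketch of this hash-based distribution, the snippet below assigns each map output pair to a reducer by hashing the join key and taking the remainder modulo the reducer count. The reducer count, the `partition` helper, and the use of `zlib.crc32` as the hash function are assumptions for illustration; Hive and Hadoop use their own hash of the serialized key.

```python
import zlib

NUM_REDUCERS = 2  # hypothetical reducer count for the example


def partition(key, num_reducers=NUM_REDUCERS):
    """Pick a reducer from the hash of the join key only (the tag is not hashed)."""
    # zlib.crc32 is a stand-in for the real hash function. It is used here
    # because it is deterministic across runs, unlike built-in hash() on str.
    digest = zlib.crc32("\x00".join(key).encode("utf-8"))
    return digest % num_reducers


# (join_key, (tag, value)) pairs as they might leave the Map stage.
map_output = [
    (("1",), (0, ("Xiao Hong",))),
    (("1",), (1, ("spark",))),
    (("2",), (0, ("Xiao Ming",))),
    (("2",), (1, ("flink",))),
]

reducer_inputs = {r: [] for r in range(NUM_REDUCERS)}
for key, tagged_value in map_output:
    reducer_inputs[partition(key)].append((key, tagged_value))

for reducer, pairs in reducer_inputs.items():
    print(reducer, pairs)
```

Because only the key is hashed, the two rows for user 1 always end up in the same reducer bucket, whichever table they came from.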

3 Reduce stage

The join is completed by matching rows on the key value; rows from different tables are told apart by their Tag. The table tag is discarded during the merge.
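A minimal sketch of the Reduce stage, under stated assumptions: the input is a list of `(join_key, (tag, value))` pairs that one reducer received, tags 0 and 1 mark the two tables, and the join is an inner join. The function name `reduce_side` is hypothetical and the logic is an illustration of the idea, not Hive's join operator.

```python
from collections import defaultdict


def reduce_side(pairs):
    """Join the tagged values of each key and return rows without the tag."""
    # Group values per key, keeping the two source tables apart by tag.
    grouped = defaultdict(lambda: {0: [], 1: []})
    for key, (tag, value) in pairs:
        grouped[key][tag].append(value)

    joined = []
    for key in sorted(grouped):
        # Inner join: every table-0 row pairs with every table-1 row of the key.
        for left in grouped[key][0]:
            for right in grouped[key][1]:
                joined.append(key + left + right)  # the tag is dropped here
    return joined


reducer_input = [
    (("1",), (0, ("Xiao Hong",))),
    (("1",), (1, ("spark",))),
    (("2",), (0, ("Xiao Ming",))),
    (("2",), (1, ("flink",))),
]

for row in reduce_side(reducer_input):
    print(row)
# ('1', 'Xiao Hong', 'spark')
# ('2', 'Xiao Ming', 'flink')
```

The tag is needed only to decide which side of the join a value belongs to, which is why it does not appear in the output rows.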

4 Example

DROP TABLE IF EXISTS wedw_dwd.user_info_df;
CREATE TABLE wedw_dwd.user_info_df (
  user_id string COMMENT 'user id',
  user_name string COMMENT 'user name'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

+----------+------------+
| user_id  | user_name  |
+----------+------------+
| 1        | Xiao Hong  |
| 2        | Xiao Ming  |
| 3        | floret     |
+----------+------------+

DROP TABLE IF EXISTS wedw_dwd.order_info_df;
CREATE TABLE wedw_dwd.order_info_df (
  user_id string COMMENT 'user id',
  course_name string COMMENT 'course name'
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

+----------+--------------+
| user_id  | course_name  |
+----------+--------------+
| 1        | spark        |
| 2        | flink        |
| 3        | java         |
+----------+--------------+

[Illustration: the table tag is discarded during the merge]
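The flow for the two sample tables can also be traced with a short Python sketch. It assumes that the example joins user_info_df and order_info_df on user_id and selects user_id, user_name, and course_name, which the article does not state explicitly. The tag values 0 and 1 and the single-reducer simplification are likewise assumptions.

```python
from itertools import groupby

# Rows of the two sample tables: (user_id, user_name) and (user_id, course_name).
user_info = [("1", "Xiao Hong"), ("2", "Xiao Ming"), ("3", "floret")]
order_info = [("1", "spark"), ("2", "flink"), ("3", "java")]

# Map: key = user_id, value = (tag, remaining column); tags 0 and 1 are assumed.
tagged = [(uid, (0, name)) for uid, name in user_info]
tagged += [(uid, (1, course)) for uid, course in order_info]

# Shuffle: with a single reducer every key lands in the same place,
# so only the sort by key (then tag) is modelled here.
tagged.sort()

# Reduce: walk each key group, split values by tag, and emit rows without the tag.
result = []
for uid, group in groupby(tagged, key=lambda kv: kv[0]):
    values = [v for _, v in group]
    names = [x for tag, x in values if tag == 0]
    courses = [x for tag, x in values if tag == 1]
    result.extend((uid, n, c) for n in names for c in courses)

for row in result:
    print(row)
# ('1', 'Xiao Hong', 'spark')
# ('2', 'Xiao Ming', 'flink')
# ('3', 'floret', 'java')
```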

Thank you for reading. That is how a Hive join is implemented in the underlying MapReduce. The specific behavior still needs to be verified in practice.
