
Example Analysis of Map JOIN in Hive

2025-04-03 Update From: SLTechnology News&Howtos

This article walks through map-side JOIN in Hive: when it applies, how to configure it, and how it works.

Map-side JOIN

A map-side join suits the case where one table is small enough to fit in memory: the small table is loaded into memory so the join can be done entirely in the map phase. Hive has supported automatic conversion to a map-side join since version 0.7. The configuration is as follows:

SET hive.auto.convert.join=true; -- default is true since Hive 0.11.0

SET hive.mapjoin.smalltable.filesize=600000000; -- default is 25000000 (about 25 MB)

SET hive.auto.convert.join.noconditionaltask=true; -- default is true, so you do not need to specify a MAPJOIN hint

SET hive.auto.convert.join.noconditionaltask.size=10000000; -- upper bound on the combined size of the small tables loaded into memory

Once automatic map-side join conversion is enabled, Hive checks the size of the small table against hive.mapjoin.smalltable.filesize. If the table is larger than this threshold, the query runs as a normal (common) join; if it is smaller, it is converted to a map-side join.
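The size check above can be sketched as a small decision function. This is a simplified model, not Hive's source: the function name choose_join is hypothetical, it ignores the noconditionaltask path, and treating a table exactly at the threshold as eligible is an assumption.

```python
# Simplified model of Hive's small-table size check (not Hive's actual code).
# Assumption: a table exactly at the threshold still qualifies for a map join.

DEFAULT_SMALLTABLE_FILESIZE = 25_000_000  # default of hive.mapjoin.smalltable.filesize (~25 MB)


def choose_join(small_table_bytes: int,
                auto_convert: bool = True,
                threshold: int = DEFAULT_SMALLTABLE_FILESIZE) -> str:
    """Return the join strategy the simplified rule would pick."""
    if not auto_convert:
        # hive.auto.convert.join=false: no automatic conversion happens
        return "common join"
    if small_table_bytes <= threshold:
        return "map join"
    return "common join"


print(choose_join(10_000_000))                          # map join
print(choose_join(100_000_000))                         # common join
# The article's example raises the threshold to 600000000 bytes (~600 MB).
print(choose_join(100_000_000, threshold=600_000_000))  # map join
```

Raising hive.mapjoin.smalltable.filesize, as the configuration above does, lets larger tables qualify, at the cost of more memory in every map task.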

The principle of map-side join is shown in the following figure:

First, Task A, a local task run on the client, reads small table a, converts it into a HashTable data structure, writes it to a local file, and then loads that file into the distributed cache.
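The build step performed by Task A can be approximated in plain Python. This is a minimal sketch under stated assumptions: the helper names (build_hashtable, task_a, load_hashtable) and the file name small_table.hashtable are hypothetical, pickle stands in for Hive's own serialization format, and shipping the file through the distributed cache is not modeled.

```python
import os
import pickle
import tempfile


def build_hashtable(small_rows, key_index):
    """Group the small table's rows by join key (key -> list of rows)."""
    table = {}
    for row in small_rows:
        table.setdefault(row[key_index], []).append(row)
    return table


def task_a(small_rows, key_index, out_dir):
    """Hypothetical local task: build the hash table and dump it to a local file.

    The returned path stands in for the file Hive would ship to mappers
    via the distributed cache (the shipping itself is not modeled here).
    """
    path = os.path.join(out_dir, "small_table.hashtable")
    with open(path, "wb") as f:
        pickle.dump(build_hashtable(small_rows, key_index), f)
    return path


def load_hashtable(path):
    """What each mapper would do: read the cached file back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)


a = [(1, "x"), (2, "y"), (1, "z")]  # hypothetical small table a: (id, value)
with tempfile.TemporaryDirectory() as d:
    ht = load_hashtable(task_a(a, key_index=0, out_dir=d))
    print(ht)  # {1: [(1, 'x'), (1, 'z')], 2: [(2, 'y')]}
```

Each key maps to a list because table a may contain duplicate join keys, and every match must be emitted.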

Task B then launches map tasks that read large table b. In the map phase, each record of b is looked up in the HashTable of table a held in the distributed cache, and the joined records are output.

Note: a map-side join has no reduce tasks, so the map tasks write the results directly. As a result, there are as many result files as there are map tasks.
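The probe step of Task B and the map-only output can be simulated as follows. It is a sketch, not Hive's implementation: the functions map_task and map_side_join, the in-memory a_hashtable (standing in for what Task A would produce), and the split layout of table b are all hypothetical, and inner-join semantics are assumed.

```python
def map_task(split, hashtable, key_index):
    """One mapper: stream its split of big table b and probe the in-memory hash table.

    Inner-join semantics: rows of b with no match in a are dropped.
    """
    out = []
    for b_row in split:
        for a_row in hashtable.get(b_row[key_index], []):
            out.append(b_row + a_row)
    return out


def map_side_join(big_splits, hashtable, key_index):
    """Run one map task per split. With no reduce phase, each mapper's output
    is its own result (modeled here as one list per mapper, even if empty)."""
    return [map_task(split, hashtable, key_index) for split in big_splits]


a_hashtable = {1: [(1, "x"), (1, "z")], 2: [(2, "y")]}  # hash table of small table a
b_splits = [                                          # big table b, split across 3 mappers
    [(1, "b1"), (3, "b2")],
    [(2, "b3")],
    [(4, "b4")],
]
result_files = map_side_join(b_splits, a_hashtable, key_index=0)
print(len(result_files))  # 3, one output per map task
for i, rows in enumerate(result_files):
    print(f"part-{i:05d}", rows)
```

Because no shuffle happens, the number of outputs follows the number of splits of table b rather than any reducer count.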

