How to use mapjoin in hive

2025-01-19 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31

This article explains how to use mapjoin in Hive and when it helps. It is intended as a practical reference for readers who run into slow or failing joins.

Suppose you run into a problem with a Hive query such as:

select f.a, f.b from A t join B f on (f.a = t.a and f.ftime = 20110802)

In this statement, table B has 3 billion rows and table A has only 100 rows, and the data in table B is severely skewed: a single key accounts for 1.5 billion rows. The query runs very slowly, and the reduce phase reports errors when it runs out of memory.
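The effect of the skew on a reduce-side join can be illustrated with a small sketch. It assumes a simplified partitioner (key modulo the number of reducers) and scaled-down, made-up row counts; it is not Hive's actual partitioning code.

```python
from collections import Counter

def rows_per_reducer(join_keys, num_reducers):
    """Count how many rows each reducer receives when rows are
    partitioned by join key, as in a reduce-side (shuffle) join."""
    counts = Counter()
    for key in join_keys:
        # Simplified stand-in for hash partitioning (assumption):
        # every row with the same key goes to the same reducer.
        counts[key % num_reducers] += 1
    return [counts[r] for r in range(num_reducers)]

# Scaled-down skewed "table B": half of the rows share one hot key (7),
# the rest are spread evenly over keys 0..99.
hot_rows = [7] * 1500
other_rows = [k % 100 for k in range(1500)]
load = rows_per_reducer(hot_rows + other_rows, num_reducers=10)
print(load)
```

In this sketch one reducer ends up with far more rows than any other, which is the situation that makes the reduce phase slow and memory-hungry.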

To solve this problem, consider using mapjoin. The principle of mapjoin is as follows:

Simply put, MapJoin reads the small table into memory in the map phase and scans the large table sequentially to complete the join.

The schematic diagram of Hive MapJoin, taken from slides on join optimization by Facebook engineer Liyin Tang, shows that MapJoin is divided into two phases:

First, a MapReduce local task reads the small table into memory, generates HashTableFiles, and uploads them to the Distributed Cache. The HashTableFiles are compressed at this stage.

Second, in the map phase of the MapReduce job, each mapper loads the HashTableFiles from the Distributed Cache into memory, scans the large table sequentially, performs the join directly in the map phase, and passes the result to the next MapReduce task.
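The two phases above can be sketched in a few lines of Python. This is a minimal illustration of a map-side hash join, not Hive's implementation: the function names and sample rows are hypothetical, and the Distributed Cache and compression steps are left out.

```python
from collections import defaultdict

def build_hash_table(small_rows, key_index):
    """Phase 1 (local task): load the small table into an in-memory
    hash table keyed by the join column."""
    table = defaultdict(list)
    for row in small_rows:
        table[row[key_index]].append(row)
    return table

def map_side_join(big_rows, key_index, hash_table):
    """Phase 2 (each mapper): scan the large table sequentially and
    probe the hash table; no shuffle or reduce step is involved."""
    for big_row in big_rows:
        for small_row in hash_table.get(big_row[key_index], []):
            yield small_row + big_row

# Hypothetical sample data: A(a, name) is small, B(a, ftime) is large.
table_a = [(1, "x"), (2, "y")]
table_b = [(1, 20110802), (1, 20110802), (3, 20110802), (2, 20110802)]

hash_table = build_hash_table(table_a, key_index=0)
joined = list(map_side_join(table_b, 0, hash_table))
print(joined)
```

Rows of the large table whose key is missing from the small table (key 3 here) simply produce no output, which matches inner-join semantics.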

MAPJOIN reads the entire small table into memory and, in the map phase, matches the rows of the other table directly against that in-memory table. Because the join is performed in the map phase, the reduce phase is skipped, which makes the job much more efficient.

As a result, the job no longer fails because data skew sends too much data to a single reducer. The original SQL can therefore request a mapjoin for the join by using a hint:
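The reason skew stops mattering can be sketched as follows: map tasks receive their input by position in the file, not by join key. The sketch assumes splits of a fixed number of rows, whereas real input splits are sized in bytes, and the data is made up.

```python
def rows_per_mapper(rows, split_size):
    """Cut the input into fixed-size splits by position, the way map
    tasks receive blocks of a file, and return each split's row count."""
    return [len(rows[i:i + split_size]) for i in range(0, len(rows), split_size)]

# Scaled-down skewed input: half of the rows share the hot key 7.
skewed_keys = [7] * 1500 + [k % 100 for k in range(1500)]
mapper_load = rows_per_mapper(skewed_keys, split_size=300)
print(mapper_load)
```

Every mapper in this sketch handles the same number of rows no matter how the keys are distributed, so no single task is overloaded by the hot key.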

select /*+ mapjoin(A) */ f.a, f.b
from A t join B f
on (f.a = t.a and f.ftime = 20110802)

Running the query again shows that it executes much more efficiently than the previous version.

Another major advantage of mapjoin is that it can perform non-equi joins. Writing a non-equality condition directly in the join syntax is not supported, and Hive's parser throws an error. If the non-equality condition is written in the where clause instead, an ordinary join produces a Cartesian product: the data volume becomes abnormally large, the job runs very slowly, and it may even fail to complete.

Because of how mapjoin works, MAPJOIN reads the entire small table into memory and matches the other table's rows directly against it in the map phase. In this case, even a Cartesian product has little impact on the task's running speed.

Moreover, Hive evaluates the where condition in the map phase anyway, so writing the non-equality comparison in where adds no extra burden.

Seen this way, a program written with MAPJOIN can complete a non-equi join using only a single map pass, which greatly improves efficiency.

Example:

select /*+ MAPJOIN(a) */ a.start_level, b.*
from dim_level a join (select * from test) b
where b.xx >= a.start_level and b.xx
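A non-equi join of this kind can be sketched in Python as a nested scan of the in-memory small table for every row of the large table. The upper-bound name end_level, the level names, and the sample values are assumptions for illustration; the query above does not show them.

```python
def range_map_join(big_values, levels):
    """Join each large-table value to the in-memory levels whose range
    [start_level, end_level) contains it."""
    for value in big_values:
        # The small table is scanned in full for every large-table row,
        # which is cheap only because it is small and held in memory.
        for name, start_level, end_level in levels:
            if start_level <= value < end_level:
                yield (value, name)

# Hypothetical small dimension table held in memory by every mapper.
dim_level = [("bronze", 0, 10), ("silver", 10, 50), ("gold", 50, 100)]
# Hypothetical values of the large table's column xx.
test_xx = [3, 10, 49, 50, 120]

matches = list(range_map_join(test_xx, dim_level))
print(matches)
```

The work per large-table row is proportional to the size of the small table, so this stays fast only while the in-memory table is small.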
