How to merge small Hive files

Updated 2025-04-03. Source: SLTechnology News&Howtos (Shulou), Servers

Shulou (Shulou.com), 05/31

This article explains how to merge small Hive files, working through a case from a data warehouse.

Cause:

Recently, a new partitioned table was built in the warehouse, holding about 1.2 billion rows. It has many partitions: one per day since July 2008.

A job was configured to run a GROUP BY over this table. It started more than 2,800 map tasks and took as long as 10 minutes to run.

Looking at the table's files in HDFS showed that each partition held more than 20 small files, each only about 300 KB to 1 MB.
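
One way to confirm this kind of file layout is to inspect a partition directory from the Hive CLI, which passes dfs commands through to HDFS. This is only a sketch: the warehouse path, table name, and partition value below are hypothetical placeholders, not taken from the original case.

```sql
-- Hypothetical path: substitute your own warehouse location, table, and partition.
-- List the files in one partition to see their individual sizes.
dfs -ls /user/hive/warehouse/my_table/dt=2012-01-01;

-- Summarize the whole table directory.
-- Output columns: directory count, file count, total bytes, path.
dfs -count /user/hive/warehouse/my_table;
```

Dividing the total bytes by the file count gives a rough average file size; a value far below the HDFS block size points to a small-files problem.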

The Hive parameters in effect at the time:

hive.merge.mapfiles=true

hive.merge.mapredfiles=false

hive.merge.rcfile.block.level=true

hive.merge.size.per.task=256000000

hive.merge.smallfiles.avgsize=16000000

hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

mapred.max.split.size=256000000

mapred.min.split.size=1

mapred.min.split.size.per.node=1

mapred.min.split.size.per.rack=1

hive.merge.mapredfiles controls whether small files are merged at the end of a map-reduce job (hive.merge.mapfiles does the same for map-only jobs).
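
As a rough sketch of how the merge settings fit together, the statements below can be run in a Hive session. SET followed by only a property name prints the value currently in effect. Nothing here is specific to the original cluster; the numbers simply repeat the values listed above.

```sql
-- Print the current value of each merge flag.
-- hive.merge.mapfiles applies to map-only jobs.
SET hive.merge.mapfiles;
-- hive.merge.mapredfiles applies to jobs that have a reduce stage.
SET hive.merge.mapredfiles;

-- The extra merge step runs only when a job's average output file size
-- is below this threshold (about 16 MB here).
SET hive.merge.smallfiles.avgsize=16000000;

-- Target size of each merged file (about 256 MB here).
SET hive.merge.size.per.task=256000000;
```

The flags decide which kinds of job are eligible for merging, while the two size settings decide when the merge step is triggered and how large the merged files should be.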

Solution:

1. Set hive.merge.mapredfiles=true.

2. Regenerate the data into a new table with a map-reduce job; the output then becomes one file per partition.

Running the GROUP BY again showed that efficiency had improved greatly.
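
The two steps can be sketched in HiveQL as follows. The table names (events, events_merged), the selected columns, and the partition column dt are hypothetical. DISTRIBUTE BY is one assumed way to force a reduce stage so that all rows of a partition reach the same reducer; the original does not say which query was used. The target table is assumed to exist already with the same schema and partitioning as the source.

```sql
-- Step 1: merge small files at the end of jobs that have a reduce stage.
SET hive.merge.mapredfiles=true;

-- Allow a single INSERT to write many partitions (dynamic partitioning).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Step 2: rewrite the data into a new table through a map-reduce job.
-- Hypothetical names: events (source), events_merged (target), dt (partition column).
-- The dynamic partition column must come last in the SELECT list.
INSERT OVERWRITE TABLE events_merged PARTITION (dt)
SELECT user_id, event_type, dt
FROM events
DISTRIBUTE BY dt;
```

With one partition per day over several years, the limits hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode may need to be raised, or the rewrite can be run over one date range at a time.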
