2025-01-18 Update From: SLTechnology News & Howtos (shulou)
Shulou (Shulou.com) 05/31 Report
This article walks through an example analysis of merging small files in Hive, covering both the input and output sides. It is intended as a practical reference; I hope you find it useful.
When a Hive job's input consists of many small files, each small file starts its own map task. If the files are small enough, starting and initializing a map task takes longer than the actual processing, which wastes resources and can even lead to OOM errors.
For this reason, when you start a job and notice that the input data is small but the number of tasks is large, consider merging the input before the Map phase.
Likewise, when writing data into a table, pay attention to the size of the output files.
1. Merging small files on the Map input side
Corresponding parameters:
set mapred.max.split.size=25600000;          # maximum input size per map task
set mapred.min.split.size.per.node=1000000;  # minimum split size on a single node
set mapred.min.split.size.per.rack=1000000;  # minimum split size under a single switch (rack)
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;  # merge small files before the Map phase
When org.apache.hadoop.hive.ql.io.CombineHiveInputFormat is enabled, multiple small files on the same data node are combined into one split, and mapred.max.split.size caps the size of each combined split.
mapred.min.split.size.per.node determines whether the leftover data on a node is too small to form its own split and must instead be merged with files from other nodes.
mapred.min.split.size.per.rack does the same at the rack level: leftover data smaller than this threshold under one switch is merged with data from other racks. (In Hadoop 2+, the corresponding property names are mapreduce.input.fileinputformat.split.maxsize, .minsize.per.node, and .minsize.per.rack.)
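Putting the input-side settings together, a session might look like the following sketch. The table name `events_raw` and the column `dt` are hypothetical, and the split sizes are simply the example values above:

```sql
-- Enable combining of small input files before the Map phase.
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
SET mapred.max.split.size=25600000;          -- cap each combined split
SET mapred.min.split.size.per.node=1000000;  -- smaller node leftovers are pooled at rack level
SET mapred.min.split.size.per.rack=1000000;  -- smaller rack leftovers are combined across racks

-- A query over a directory of many small files now launches
-- far fewer map tasks than one-task-per-file.
SELECT dt, COUNT(*) AS cnt
FROM events_raw
GROUP BY dt;
```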
2. Merging output files
set hive.merge.mapfiles = true;              # merge small files at the end of a map-only job
set hive.merge.mapredfiles = true;           # merge small files at the end of a map-reduce job
set hive.merge.size.per.task = 256000000;    # target size of each merged file
set hive.merge.smallfiles.avgsize = 1600000; # launch an extra map-reduce job to merge output files when their average size is below this value
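As a minimal sketch of the output side, the settings can be applied before an insert so that Hive compacts the written files. The table names `events_raw` and `events_daily` are hypothetical:

```sql
-- Compact output files when writing a table.
SET hive.merge.mapfiles=true;                -- merge after map-only jobs
SET hive.merge.mapredfiles=true;             -- merge after map-reduce jobs
SET hive.merge.size.per.task=256000000;      -- target size of each merged file
SET hive.merge.smallfiles.avgsize=1600000;   -- trigger the merge job below this average size

-- If the insert would otherwise leave many small files whose average
-- size is under the threshold, Hive runs an extra job to merge them.
INSERT OVERWRITE TABLE events_daily
SELECT dt, COUNT(*) AS cnt
FROM events_raw
GROUP BY dt;
```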
Thank you for reading. I hope this sample analysis of merging small files in Hive has been helpful.