This article mainly introduces how to increase and decrease the number of map tasks in Hive. It is fairly detailed and has some reference value; interested readers should read it to the end!
How to merge small files to reduce the number of map tasks?
Suppose a SQL task:
select count(1) from popt_tbaccountcopy_mes where pt = '2012-07-04';
The task's input directory is /group/p_sdo_data/p_sdo_data_etl/pt/popt_tbaccountcopy_mes/pt=2012-07-04. It contains 194 files, many of them much smaller than 128 MB, with a total size of about 9 GB; normal execution would use 194 map tasks.
Total computing resources consumed by the map phase: SLOTS_MILLIS_MAPS = 623020
The number of map tasks can be reduced by merging the small files before the map phase runs:
set mapred.max.split.size=100000000;
set mapred.min.split.size.per.node=100000000;
set mapred.min.split.size.per.rack=100000000;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
Executing the same statement again then uses only 74 map tasks, and the computing resources consumed by the map phase drop to SLOTS_MILLIS_MAPS = 333500.
For this simple SQL task the execution time is about the same, but roughly half of the computing resources are saved (333500 / 623020 ≈ 0.54).
Roughly speaking, 100000000 means 100 MB, and set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat; tells Hive to merge small files before execution.
The first three parameters determine the size of the merged splits: data larger than 128 MB is split on 128 MB boundaries, data between 100 MB and 128 MB is split at 100 MB, and everything smaller than 100 MB (the remaining small files plus the leftovers from splitting the larger files) is merged together.
In the end, 74 splits were produced.
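Putting the pieces together, a minimal sketch of the whole session might look like this (the table name, partition, and 100 MB split sizes are taken from the example above and are only illustrative):
set mapred.max.split.size=100000000;
set mapred.min.split.size.per.node=100000000;
set mapred.min.split.size.per.rack=100000000;
set hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- the same query now runs over roughly 74 merged splits instead of 194 maps
select count(1) from popt_tbaccountcopy_mes where pt = '2012-07-04';
As a rough sanity check, 9 GB is about 9216 MB, and 9216 MB / 74 splits ≈ 125 MB per split, which is consistent with a merge target between 100 MB and 128 MB.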
How to increase the number of map tasks appropriately?
When the input files are large, the task logic is complex, and map execution is slow, consider increasing the number of map tasks so that each map processes less data, which improves the efficiency of the task.
Suppose there is a task:
select data_desc,
       count(1),
       count(distinct id),
       sum(case when ...),
       sum(case when ...),
       sum(...)
from a
group by data_desc;
If table a has only one file of 120 MB but it contains tens of millions of records, finishing this task with a single map will certainly be time-consuming. In this case, consider splitting the file into several smaller files so that multiple map tasks can be used:
set mapred.reduce.tasks=10;
create table a_1 as
select * from a
distribute by rand(123);
In this way, the records of table a are randomly distributed into table a_1, which consists of 10 files. Replacing table a in the SQL above with a_1 then uses 10 map tasks to complete the job.
Each map task processes roughly 12 MB of data (millions of records), which is certainly much more efficient, as the sketch below makes explicit.
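A minimal sketch of the rewrite (a_1 and the column names come from the example above; the sum(case when ...) expressions elided in the original query are left as a placeholder comment):
set mapred.reduce.tasks=10;
-- spread table a's records evenly across 10 files
create table a_1 as
select * from a
distribute by rand(123);
-- the same aggregation against a_1 now runs with 10 map tasks
select data_desc,
       count(1),
       count(distinct id)
       -- plus the sum(case when ...) expressions from the original query
from a_1
group by data_desc;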
The two techniques may look contradictory: one merges small files, the other splits a large file into smaller ones. This is exactly the point to pay attention to.
According to the actual situation, control the number of map tasks by following two principles: a large volume of data should use an appropriate number of maps, and each individual map task should handle an appropriate amount of data.
That is all of the content of "how to increase or decrease the number of map tasks in Hive". Thank you for reading! I hope the content shared here is helpful; for more related knowledge, welcome to follow the industry information channel!