What are Hadoop's tuning parameters, how are they organized, and how do they work? This article analyzes the question in detail, in the hope of helping readers who want to tune their jobs find a simple and practical approach.
1 Map-side tuning parameters
1.1 The internal principle of MapTask operation
When a map task runs and produces intermediate data, the intermediate results are not simply written straight to disk. The process is more involved: a memory buffer caches the results produced so far, and some pre-sorting is done in that buffer to improve the performance of the whole map phase. Each map task has its own in-memory buffer (the MapOutputBuffer shown in the original article's figure, which is not reproduced here), and the map first writes its output into this buffer. The buffer defaults to 100MB, but its size can be adjusted through a parameter set when the job is submitted: io.sort.mb. When a map produces a very large amount of data, enlarging io.sort.mb reduces the number of spills during the computation, and therefore the number of times the map task touches the disk. If the bottleneck of the map tasks is disk I/O, this adjustment can improve map performance considerably. (The original article includes a figure of the map-side sort and spill memory structure at this point.)
While running, the map keeps writing results into the buffer, but the buffer may not be able to hold all of the map's output. When the buffered output exceeds a threshold, the map must write the buffered data to disk; in MapReduce this is called a spill. The map does not wait until the buffer is completely full before spilling, because that would force the map's computation to stall while waiting for buffer space to be freed. Instead, the map starts spilling once the buffer is filled to a certain proportion, 80% by default. This threshold is controlled by the job configuration parameter io.sort.spill.percent, whose default is 0.80. The parameter affects how often spills occur, and therefore how often the map task reads and writes the disk over its lifetime; in most cases, however, there is no need to adjust it manually, and tuning io.sort.mb is more convenient for users.
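As a concrete illustration, here is a minimal sketch of adjusting these two knobs through the Hadoop Configuration API when submitting a job. The property names are the classic MRv1 names used in this article (newer releases expose them under mapreduce.task.io.sort.* aliases), the Job API shown assumes a Hadoop 2.x-style client, and the values are examples rather than recommendations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class MapBufferTuningSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Enlarge the map-side sort buffer from the default 100 MB to 200 MB
        // so that large map outputs spill to disk less often.
        conf.setInt("io.sort.mb", 200);
        // Keep the default spill threshold: spilling starts once the buffer is
        // about 80% full, i.e. roughly 200 MB * 0.80 = 160 MB of buffered output.
        conf.setFloat("io.sort.spill.percent", 0.80f);
        Job job = Job.getInstance(conf, "map-buffer-tuning-sketch");
        // ... set mapper, reducer, input and output as usual, then submit the job.
    }
}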
When the computation part of a map task finishes and the map has output, it will have produced one or more spill files. Before the map exits normally, these spills need to be merged into a single file, so the map still performs a merge step before it ends. One parameter adjusts the behavior of this step: io.sort.factor, which defaults to 10 and specifies the maximum number of parallel streams that can be merged at once into the merged spill file. For example, if a map produces so much data that it generates more than 10 spill files while io.sort.factor is left at the default 10, the final merge cannot combine all spill files in a single pass; it has to run in several passes, merging at most 10 streams at a time. In other words, when the map's intermediate output is very large, increasing io.sort.factor reduces the number of merge passes and therefore the frequency of the map's disk reads and writes, which can help optimize the job.
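To see why io.sort.factor bounds the number of merge passes, here is a small illustrative sketch. It is a simplified model in which each pass merges up to "factor" spill files into one; Hadoop's actual merge scheduling is more sophisticated, and the spill counts below are assumed, not measured:

public class MergePassSketch {
    // Rough model: one pass merges up to `factor` spill files into a single file.
    static int mergePasses(int spills, int factor) {
        int passes = 0;
        while (spills > 1) {
            spills = spills - Math.min(factor, spills) + 1;
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        // With 25 spill files and the default io.sort.factor = 10,
        // the merge cannot finish in a single pass.
        System.out.println(mergePasses(25, 10)); // 3 passes
        System.out.println(mergePasses(25, 25)); // 1 pass if io.sort.factor >= 25
    }
}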
When a job specifies a combiner, the map output is aggregated on the map side according to the function the combiner defines. The combiner may run before or after the merge completes, and the timing is controlled by the parameter min.num.spill.for.combine (default 3): when the job sets a combiner and there are at least three spill files, the combiner runs before the merge produces its result file. When many spills need to be merged and much of the data can be combined, this reduces the amount of data written to disk, again lowering the frequency of disk reads and writes, and possibly improving the job.
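A minimal sketch of wiring this up, assuming a word-count style job where the stock IntSumReducer can serve as the combiner; the property name follows the text (newer releases use mapreduce.map.combine.minspills), and the value shown is simply the default:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

public class CombinerTuningSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Run the combiner at merge time only if at least 3 spill files exist
        // (this is the default described in the text).
        conf.setInt("min.num.spill.for.combine", 3);
        Job job = Job.getInstance(conf, "combiner-tuning-sketch");
        // A word-count style combiner that pre-aggregates counts on the map side.
        job.setCombinerClass(IntSumReducer.class);
    }
}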
These are not the only ways to reduce disk traffic for intermediate results; compression is another. The map's intermediate data, both at spill time and in the result file produced by the merge, can be compressed. The benefit is that compression reduces the volume of data written to and read from disk, which is especially useful for jobs whose intermediate results are very large and for which disk speed becomes the bottleneck of map execution. The parameter controlling whether map intermediate results are compressed is mapred.compress.map.output (true/false). When it is set to true, the map compresses its intermediate results before writing them to disk and decompresses them when reading them back. The consequence is that less intermediate data is written to disk, at the cost of some CPU for compression and decompression. This approach therefore suits jobs whose intermediate results are large and whose bottleneck is disk I/O rather than CPU; put bluntly, it trades CPU for I/O. In practice, CPU is rarely the bottleneck unless the job logic is extremely complex, so compressing intermediate results is usually a win. Below is a comparison of the amount of data read from and written to local disk for the wordcount intermediate results with and without compression:
(The original article shows two screenshots here: local disk I/O with the map intermediate results uncompressed, and with them compressed.)
As the comparison shows, for the same job on the same data, compression shrinks the map's intermediate results by nearly a factor of ten. If the map's bottleneck is the disk, job performance improves substantially.
When compressing map intermediate results, users can also choose which compression format to use. Hadoop currently supports GzipCodec, LzoCodec, BZip2Codec, LzmaCodec, and other formats. Generally speaking, LzoCodec strikes a good balance between CPU cost and compression ratio, but the right choice depends on the specifics of the job. To select the compression algorithm for intermediate results, set the configuration parameter mapred.map.output.compression.codec=org.apache.hadoop.io.compress.DefaultCodec, or whichever codec the user prefers.
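A sketch of enabling map output compression programmatically, under the same MRv1 property names used in the text. DefaultCodec is used here because it is the codec named above; LZO typically gives a better CPU/ratio balance but ships as a separate library in most distributions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.mapreduce.Job;

public class MapOutputCompressionSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Compress map intermediate output before it is written to local disk.
        conf.setBoolean("mapred.compress.map.output", true);
        // Choose the codec for the compressed intermediate data.
        conf.setClass("mapred.map.output.compression.codec",
                      DefaultCodec.class, CompressionCodec.class);
        Job job = Job.getInstance(conf, "map-output-compression-sketch");
    }
}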
1.2 Map-side related parameter tuning
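The parameter table that originally accompanied this section is not reproduced here. As a recap, the following sketch collects the map-side knobs discussed above, set to the defaults stated in the text; the values are placeholders to adjust per job, not recommendations:

import org.apache.hadoop.conf.Configuration;

public class MapSideDefaultsSketch {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setInt("io.sort.mb", 100);                        // sort buffer size in MB
        conf.setFloat("io.sort.spill.percent", 0.80f);         // buffer fill level that triggers a spill
        conf.setInt("io.sort.factor", 10);                     // max streams merged at once
        conf.setInt("min.num.spill.for.combine", 3);           // min spills before the combiner runs at merge time
        conf.setBoolean("mapred.compress.map.output", false);  // compress intermediate map output
        conf.set("mapred.map.output.compression.codec",
                 "org.apache.hadoop.io.compress.DefaultCodec"); // codec for compressed map output
    }
}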
2 Reduce-side tuning parameters
2.1 The internal principle of ReduceTask operation
A reduce runs in three stages, in order: copy, sort, reduce. Because each map of the job splits its output into n partitions, where n is the number of reduces, each map's intermediate result may contain some of the data that every reduce needs to process. To shorten reduce execution time, as soon as the first map of the job finishes, every reduce tries to download its own partition of the data from each completed map. This process is commonly called shuffle, i.e. the copy phase.
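To make the partitioning concrete, here is a small sketch using Hadoop's default HashPartitioner, which routes each map output record to a partition by hashing its key modulo the number of reduces; the key/value types and the reduce count below are assumptions for illustration:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class PartitionSketch {
    public static void main(String[] args) {
        int numReduces = 4; // each map's output is split into this many partitions
        HashPartitioner<Text, Text> partitioner = new HashPartitioner<>();
        // Reduce i later downloads partition i from every finished map.
        int p = partitioner.getPartition(new Text("hello"), new Text("1"), numReduces);
        System.out.println("record goes to partition " + p + " of " + numReduces);
    }
}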
While a reduce task performs the shuffle, it is actually downloading the portion of data that belongs to it from different completed maps. Since there are usually many maps, a single reduce can download from several maps in parallel, and this parallelism is adjustable via mapred.reduce.parallel.copies (default 5). By default, only 5 threads download data from maps in parallel for each reduce; if 100 or more maps of a job finish within a short period, the reduce can still fetch from at most 5 maps at a time. This parameter is therefore most useful for jobs with many maps that finish quickly, helping each reduce obtain its share of the data sooner.
While a reduce's download threads are fetching map data, problems can occur: the machine holding a map's intermediate result may fail, the intermediate result file may be lost, the network may drop, and so on, so a download can fail. The download threads therefore do not wait forever; if a download keeps failing past a certain time limit, the thread gives up and tries to fetch the data from somewhere else (the map may have been rerun in the meantime). The maximum time a reduce download thread waits is set by mapred.reduce.copy.backoff (default 300s). If the cluster's network is itself a bottleneck, users can raise this value to keep download threads from being misjudged as failed; on a healthy network there is no need to change it, and since professional cluster networks rarely have serious problems, this parameter seldom needs adjusting.
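A sketch of tuning these two shuffle-copy settings together, again using the MRv1 property names from the text; the values are examples only:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ShuffleCopyTuningSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // More parallel fetch threads per reduce; useful when a job has many
        // maps that finish quickly (default is 5).
        conf.setInt("mapred.reduce.parallel.copies", 10);
        // How long a fetch may keep failing before the thread gives up and
        // retries elsewhere; raise only on genuinely slow networks (default 300s).
        conf.setInt("mapred.reduce.copy.backoff", 300);
        Job job = Job.getInstance(conf, "shuffle-copy-tuning-sketch");
    }
}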
When a reduce downloads map results locally, it also has to merge them, so the io.sort.factor setting affects the reduce's merge behavior as well. As mentioned above, if a reduce shows very high iowait during the shuffle phase, increasing this parameter to raise the merge's concurrent throughput may improve reduce efficiency.
During the shuffle phase, the reduce does not write downloaded map data to disk immediately; it caches the data in memory first and flushes it to disk once memory usage reaches a certain level. Unlike the map side, this memory is not sized by io.sort.mb but by another parameter: mapred.job.shuffle.input.buffer.percent (default 0.7). It is a percentage, meaning the shuffle may use at most 0.7 × the reduce task's maximum heap for caching, where the maximum heap is usually set through mapred.child.java.opts (for example -Xmx1024m). By default, then, a reduce uses up to 70% of its heap size to cache data in memory. If the reduce's heap is enlarged for business reasons, the cache grows with it, which is why the reduce's cache is expressed as a percentage rather than a fixed value.
Assuming mapred.job.shuffle.input.buffer.percent is 0.7 and the reduce task's maximum heap is 1G, the memory used to cache downloaded data is about 700MB. Like the map side, this 700MB is not flushed to disk only once it is completely full; flushing starts when usage reaches a certain limit (again a percentage), which can be set through the job parameter mapred.job.shuffle.merge.percent (default 0.66). If downloads are fast and the in-memory cache fills up easily, adjusting this parameter may help reduce performance.
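The arithmetic behind this example, spelled out as a small sketch; the 1 GB heap is the assumption quoted in the text and the percentages are the defaults it states:

public class ShuffleBufferArithmetic {
    public static void main(String[] args) {
        double heapMb = 1024.0;            // assumed reduce task heap, e.g. -Xmx1024m
        double inputBufferPercent = 0.70;  // mapred.job.shuffle.input.buffer.percent
        double mergePercent = 0.66;        // mapred.job.shuffle.merge.percent

        // ~717 MB of heap may cache shuffled data (the "about 700MB" above).
        double shuffleCacheMb = heapMb * inputBufferPercent;
        // Flushing to disk starts once roughly 66% of that cache is used (~473 MB).
        double spillToDiskAtMb = shuffleCacheMb * mergePercent;
        System.out.printf("cache %.0f MB, flush threshold %.0f MB%n",
                          shuffleCacheMb, spillToDiskAtMb);
    }
}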
Once a reduce has downloaded all the data of its partition from every map, it begins the actual reduce computation phase. (There is a sort phase in between, but it usually finishes in a few seconds, because sorting and merging already happen while the data is being downloaded.) When the reduce task truly enters the reduce function's computation phase, one more parameter adjusts its behavior: mapred.job.reduce.input.buffer.percent (default 0.0). The reduce computation certainly consumes memory, and reading the data the reduce needs also requires memory as a buffer; this parameter controls what percentage of memory may be used to buffer the sorted data the reduce reads. The default of 0 means that, by default, the reduce reads and processes all of its data from disk. If the parameter is greater than 0, some of the data is kept cached in memory and handed to the reduce from there, which pays off when the reduce logic itself consumes very little memory and part of the reduce's memory would otherwise sit idle.
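A closing sketch of setting this last parameter; the 0.5 value is purely illustrative and only makes sense when the reduce logic itself needs little memory:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class ReduceInputBufferSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Keep part of the sorted, merged map output cached in memory while the
        // reduce function runs, instead of reading everything back from disk.
        conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.5f); // default 0.0
        Job job = Job.getInstance(conf, "reduce-input-buffer-sketch");
    }
}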
That concludes the overview of Hadoop's tuning parameters and how they work. Hopefully the material above is of some help.