How to analyze JobSplit Source Code in MapReduce

2025-01-30 Update From: SLTechnology News&Howtos shulou

This article walks through the JobSplit source code in MapReduce: how a job's input is divided into splits when the job is submitted, and how the split information is written out. It is shared here for reference.

MapReduce source code analysis: JobSplit

Following the way MapReduce works, we analyze the MR source code in four stages:

Split stage

MapTask stage

Shuffle stage

ReduceTask stage

Let's start with the source code for the Split phase.

Split source code analysis

An MR job is submitted to the ResourceManager (RM) through JobSubmitter.submitJobInternal.

Inside submitJobInternal, the job's input files are split by calling writeSplits(JobContext job, Path jobSubmitDir).

writeSplits only wraps the new and old APIs and picks one according to which API your code uses. Here it calls writeNewSplits, which splits the files with the new API.

The logic of the whole splitting process is mainly in writeNewSplits.
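The dispatch between the two APIs can be sketched as follows. This is a simplified, self-contained model, not Hadoop's code: the class WriteSplitsDispatch and the Map-based configuration are hypothetical, and the flag name mapred.mapper.new-api is an assumption about which property the real writeSplits consults.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified model of how writeSplits chooses between the old and new API. */
final class WriteSplitsDispatch {
    // Assumption: the configuration flag that marks a job as using the new mapper API.
    static final String USE_NEW_MAPPER = "mapred.mapper.new-api";

    /** Returns the name of the method that would create the splits for this configuration. */
    static String chooseSplitWriter(Map<String, String> conf) {
        boolean useNewMapper =
                Boolean.parseBoolean(conf.getOrDefault(USE_NEW_MAPPER, "false"));
        return useNewMapper ? "writeNewSplits" : "writeOldSplits";
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(USE_NEW_MAPPER, "true");
        System.out.println(chooseSplitWriter(conf));
    }
}
```

The point of the sketch is only that writeSplits holds no splitting logic of its own; everything interesting happens in the method it delegates to.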

① writeNewSplits source code analysis

On entering writeNewSplits(), you can see that the method first obtains the list of splits, sorts it so that the largest splits come first, and finally returns the number of mappers. The work falls into two parts: determining the splits and writing the split information.

Determining the splits is left to getSplits(job) of the InputFormat (FileInputFormat for file input). Writing the split information is left to JobSplitWriter.createSplitFiles(jobSubmitDir, conf, jobSubmitDir.getFileSystem(conf), array), which writes both the split information and the SplitMetaInfo to HDFS.

The method returns array.length, which is the number of map tasks. When the split size equals the block size, the default number of map tasks is approximately: default_num = total_size / block_size

Its internal logic is mainly divided into the following steps:

Create an InputFormat instance and call its getSplits method to split the input. getSplits contains the main splitting logic.

Sort the splits in descending order of size.

createSplitFiles: write the array that holds the split information to files.
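The three steps above can be sketched as follows. This is a minimal model under stated assumptions, not the Hadoop implementation: the Split class and the sink list are hypothetical stand-ins for the real InputSplit objects and for the files written by createSplitFiles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Simplified model of the three steps inside writeNewSplits. */
final class WriteNewSplitsSketch {

    /** Hypothetical stand-in for an input split: a name and a length in bytes. */
    static final class Split {
        final String name;
        final long length;

        Split(String name, long length) {
            this.name = name;
            this.length = length;
        }
    }

    /**
     * @param fromGetSplits splits as an InputFormat's getSplits would return them
     * @param sink          hypothetical stand-in for the split files
     * @return the number of splits, which is the number of map tasks
     */
    static int writeNewSplits(List<Split> fromGetSplits, List<String> sink) {
        // Step 1: take the splits produced by getSplits and copy them into an array.
        Split[] array = fromGetSplits.toArray(new Split[0]);
        // Step 2: sort in descending order of size so the largest splits come first.
        Arrays.sort(array, Comparator.comparingLong((Split s) -> s.length).reversed());
        // Step 3: persist the split information (here: append it to the sink).
        for (Split s : array) {
            sink.add(s.name + ":" + s.length);
        }
        return array.length;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        List<Split> splits = Arrays.asList(
                new Split("a", 10), new Split("b", 30), new Split("c", 20));
        int maps = writeNewSplits(splits, sink);
        System.out.println(maps + " map tasks, written order: " + sink);
    }
}
```

Sorting the largest splits first means the longest-running map tasks can be scheduled earliest.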

② getSplits source code analysis

getSplits slices the files. For each split it records the file path, the offset (the starting position of the split within the whole file), the split size splitSize, the hosts that hold the block containing the offset, and the hosts that hold that block in memory, all in a FileSplit object. One split corresponds to one FileSplit object, and the objects are finally returned in splits.
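The slicing loop for a single file can be sketched as follows. This is a simplified model, assuming the file is splittable and non-empty. The FileSplitInfo class is a hypothetical stand-in for FileSplit with the host information left out, and the 1.1 slack factor is an assumption modelled on the SPLIT_SLOP constant in Hadoop's FileInputFormat.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of how getSplits cuts one file into splits. */
final class GetSplitsSketch {
    // Assumption: a final piece up to 10% larger than splitSize is not cut again.
    static final double SPLIT_SLOP = 1.1;

    /** Hypothetical stand-in for FileSplit; host information is omitted. */
    static final class FileSplitInfo {
        final String path;
        final long start;
        final long length;

        FileSplitInfo(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Assumes fileLength > 0 and splitSize > 0. */
    static List<FileSplitInfo> split(String path, long fileLength, long splitSize) {
        List<FileSplitInfo> splits = new ArrayList<>();
        long bytesRemaining = fileLength;
        // Emit full-size splits while clearly more than one split's worth remains.
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits.add(new FileSplitInfo(path, fileLength - bytesRemaining, splitSize));
            bytesRemaining -= splitSize;
        }
        // Whatever is left becomes the last split.
        if (bytesRemaining != 0) {
            splits.add(new FileSplitInfo(path, fileLength - bytesRemaining, bytesRemaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        for (FileSplitInfo s : split("/data/in.txt", 300 * mb, 128 * mb)) {
            System.out.println(s.path + " start=" + s.start + " length=" + s.length);
        }
    }
}
```

With a 128 MB split size, a 300 MB file yields splits of 128 MB, 128 MB and 44 MB, while a 130 MB file stays in a single split because the remainder falls within the slack factor.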

③ createSplitFiles source code analysis

createSplitFiles creates two files: the split file, which records the splits themselves, and the split metadata file, which records metadata about them.
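The idea of writing two files can be sketched as follows. This is a deliberately simplified model, not Hadoop's on-disk format: the byte layout, the Split and Output classes, and the in-memory buffers that stand in for the two HDFS files are all assumptions made for illustration. What it shows is the relationship between the files: the metadata records where each split starts inside the split file.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Simplified model of writing a split file plus a split metadata file. */
final class SplitFilesSketch {

    /** Hypothetical stand-in for a file split. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** Bytes of the two outputs; stands in for the two files on HDFS. */
    static final class Output {
        final byte[] splitFile;
        final byte[] metaFile;

        Output(byte[] splitFile, byte[] metaFile) {
            this.splitFile = splitFile;
            this.metaFile = metaFile;
        }
    }

    static Output createSplitFiles(Split[] splits) {
        ByteArrayOutputStream splitBytes = new ByteArrayOutputStream();
        ByteArrayOutputStream metaBytes = new ByteArrayOutputStream();
        try (DataOutputStream splitOut = new DataOutputStream(splitBytes);
             DataOutputStream metaOut = new DataOutputStream(metaBytes)) {
            // Metadata starts with the number of splits.
            metaOut.writeInt(splits.length);
            for (Split s : splits) {
                // Remember where this split's record begins in the split file.
                long offsetInSplitFile = splitOut.size();
                splitOut.writeUTF(s.path);
                splitOut.writeLong(s.start);
                splitOut.writeLong(s.length);
                // One metadata entry per split: its offset and its data length.
                metaOut.writeLong(offsetInSplitFile);
                metaOut.writeLong(s.length);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Output(splitBytes.toByteArray(), metaBytes.toByteArray());
    }

    public static void main(String[] args) {
        Split[] splits = { new Split("/a", 0L, 100L), new Split("/a", 100L, 50L) };
        Output out = createSplitFiles(splits);
        System.out.println("split file bytes: " + out.splitFile.length
                + ", meta file bytes: " + out.metaFile.length);
    }
}
```

Keeping the metadata separate lets a reader look up one split's record by offset without parsing the whole split file.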

Supplementary content:

The split size in the getSplits method

First, distinguish two concepts: block and split. A block is an HDFS concept: files are stored in HDFS as blocks. A split is a MapReduce concept: it is the logical slice of input handled by one map task.

The split size is given by the formula splitSize = max(minSize, min(maxSize, blockSize)), so it depends on how minSize, blockSize and maxSize compare, which in turn determines how splits relate to blocks. In practice, we should keep a one-to-one relationship between splits and blocks.
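A small worked example of the split size calculation, assuming the formula splitSize = max(minSize, min(maxSize, blockSize)) used by FileInputFormat. The class name SplitSizeSketch and the sample values are illustrative only.

```java
/** Worked example of the split size formula. */
final class SplitSizeSketch {

    /** splitSize = max(minSize, min(maxSize, blockSize)). */
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long block = 128 * mb;
        // No limits that bite: split size equals block size (one split per block).
        System.out.println(computeSplitSize(block, 1L, Long.MAX_VALUE) / mb);
        // maxSize below the block size: splits shrink, so there are more map tasks.
        System.out.println(computeSplitSize(block, 1L, 64 * mb) / mb);
        // minSize above the block size: splits grow, so there are fewer map tasks.
        System.out.println(computeSplitSize(block, 256 * mb, Long.MAX_VALUE) / mb);
    }
}
```

Leaving minSize below and maxSize above the block size keeps the split size equal to the block size, which is the one-to-one case recommended above.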

That concludes this walkthrough of the JobSplit source code in MapReduce. I hope the content above is of some help to you.
