
What is the MapTask process?


This article explains the MapTask process by stepping through the Hadoop MapReduce source code. The walkthrough below follows the execution path step by step; read it slowly and trace each step in the source to understand how a MapTask runs.

MapTask process source code interpretation

1. Start the MapTask analysis from step 24 of the job submission process. Enter submitJob (LocalJobRunner.java, line 788):

Job job = new Job(JobID.downgrade(jobid), jobSubmitDir);
// Creates an executable Job. Its type is LocalJobRunner$Job, an inner class ($ marks the inner class) that is also a thread.

2. Because this Job object is a thread, execution happens in its run method, so go straight to the run method of LocalJobRunner$Job (line 537):

TaskSplitMetaInfo[] taskSplitMetaInfos = SplitMetaInfoReader.readSplitMetaInfo(jobId, localFs, conf, systemJobDir);
// Reads the split metadata, i.e. the job.splitmetainfo file generated in the temporary directory during job submission.

3. Step the breakpoint forward to line 547:

List mapRunnables = getMapTaskRunnables(taskSplitMetaInfos, jobId, mapOutputFiles);
// From the split metadata the runner knows how many splits there are and generates one Runnable per split. Runnable type: LocalJobRunner$Job$MapTaskRunnable, each of which is handed to a thread.

4. Create a thread pool and submit the runnables:

ExecutorService mapService = createMapExecutor(); // creates the thread pool object
runTasks(mapRunnables, mapService, "map"); // submits all LocalJobRunner$Job$MapTaskRunnable objects to the pool; enter the runTasks method (LocalJobRunner, line 466)

5. Each thread is responsible for executing one Runnable (nested as an inner class). Inside runTasks:

for (Runnable r : runnables) { service.submit(r); }

When a LocalJobRunner$Job$MapTaskRunnable is handed to a thread, the thread executes its run method, so look next at the run method of LocalJobRunner$Job$MapTaskRunnable (LocalJobRunner, line 248). A minimal sketch of this submit-and-wait pattern follows below.
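The submit-and-wait pattern in steps 4-5 can be seen in isolation. Below is a minimal, self-contained sketch of it (not Hadoop's actual runTasks, which also tracks task failures): each Runnable stands in for one MapTaskRunnable, and the pool runs them concurrently.

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Minimal sketch of the LocalJobRunner#runTasks pattern: one Runnable per
// map task, all submitted to a thread pool, then wait for completion.
public class RunTasksSketch {

    static void runTasks(List<Runnable> runnables, ExecutorService service)
            throws InterruptedException {
        for (Runnable r : runnables) {
            service.submit(r); // each "MapTaskRunnable" runs on a pool thread
        }
        service.shutdown(); // no new tasks accepted
        service.awaitTermination(1, TimeUnit.HOURS); // block until all tasks finish
    }

    public static void main(String[] args) throws InterruptedException {
        // Two stand-in "map tasks"; the real runner builds one per input split.
        List<Runnable> tasks = List.of(
                () -> System.out.println("map task 0 on " + Thread.currentThread().getName()),
                () -> System.out.println("map task 1 on " + Thread.currentThread().getName()));
        runTasks(tasks, Executors.newFixedThreadPool(2));
    }
}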

6. Enter the run method and navigate to line 254:

MapTask map = new MapTask(systemJobFile.toString(), mapId, taskId, info.getSplitIndex(), 1);
// Creates the MapTask object; this executes in each thread, so each thread creates its own MapTask.

7. Enter map.run(localConf, Job.this); (line 271) // executes the run method of MapTask; follow it into MapTask#run.

Entering the MapTask run method, the partitioner is set up first:

partitions = jobContext.getNumReduceTasks();
if (partitions > 1) {
    partitioner = (org.apache.hadoop.mapreduce.Partitioner<K,V>)
        ReflectionUtils.newInstance(jobContext.getPartitionerClass(), job);
} else {
    partitioner = new org.apache.hadoop.mapreduce.Partitioner<K,V>() {
        @Override
        public int getPartition(K key, V value, int numPartitions) {
            return partitions - 1;
        }
    };
}

8. Locate line 347 of MapTask#run. It first determines whether to use the new API, then enters the runNewMapper() method. Follow it to MapTask line 745 and start reading the source.

9. Inside runNewMapper, the core objects are created:

// Create the Mapper object by reflection, e.g. WordCountMapper:
org.apache.hadoop.mapreduce.Mapper mapper = (org.apache.hadoop.mapreduce.Mapper) ReflectionUtils.newInstance(taskContext.getMapperClass(), job);
// Create the InputFormat object by reflection, e.g. TextInputFormat (the default):
org.apache.hadoop.mapreduce.InputFormat inputFormat = (org.apache.hadoop.mapreduce.InputFormat) ReflectionUtils.newInstance(taskContext.getInputFormatClass(), job);
// Get the split this MapTask is responsible for:
org.apache.hadoop.mapreduce.InputSplit split = null;
split = getSplitDetails(new Path(splitIndex.getSplitLocation()), splitIndex.getStartOffset());
// Get the RecordReader object:
org.apache.hadoop.mapreduce.RecordReader input = new NewTrackingRecordReader(split, inputFormat, reporter, taskContext);
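For reference, when partitions > 1 the partitioner obtained through getPartitionerClass() defaults to HashPartitioner (org.apache.hadoop.mapreduce.lib.partition.HashPartitioner). A sketch of its logic:

import org.apache.hadoop.mapreduce.Partitioner;

// Sketch of the default HashPartitioner's logic: a key is mapped to one of
// numReduceTasks partitions; masking off the sign bit keeps the result
// non-negative even when hashCode() is negative.
public class HashPartitionerSketch<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

This is why records with equal keys always land in the same partition and therefore reach the same ReduceTask.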

10. Create the output collector and enter its constructor:

output = new NewOutputCollector(taskContext, job, umbilical, reporter);
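The NewOutputCollector created here wraps the in-memory sort buffer whose parameters are read in steps 11-15 below. Assuming the standard Hadoop 2.x property keys, the two most commonly tuned values can be set when building a job; this is an optional sketch, not part of the walkthrough:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch: tuning the map-side sort buffer before submitting a job.
public class SortBufferTuning {
    public static Job buildJob() throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.task.io.sort.mb", 200); // buffer size; default 100 (MB)
        conf.setFloat("mapreduce.map.sort.spill.percent", 0.9f); // spill threshold; default 0.80
        return Job.getInstance(conf, "sort-buffer-tuning-demo");
    }
}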

11. Locate MapTask line 710:

collector = createSortingCollector(job, reporter); // the collector object, which can be understood as the buffer object

12. Enter the createSortingCollector method (MapTask, line 388).

13. collector.init(context); // initializes the buffer object; collector type: MapTask$MapOutputBuffer

14. Enter the init method (MapTask, line 968).

15. Inside init, starting at line 980, the buffer's working parameters are read:

// Get the spill percentage (80%), configured by the mapreduce.map.sort.spill.percent parameter:
final float spillper = job.getFloat(JobContext.MAP_SORT_SPILL_PERCENT, (float) 0.8);
// Get the buffer size (100 MB):
final int sortmb = job.getInt(MRJobConfig.IO_SORT_MB, MRJobConfig.DEFAULT_IO_SORT_MB);
// Get the sorter object (QuickSort.class by default), which sorts indexes:
sorter = ReflectionUtils.newInstance(job.getClass(MRJobConfig.MAP_SORT_CLASS, QuickSort.class, IndexedSorter.class), job);
// Get the key comparator object:
comparator = job.getOutputKeyComparator();
// Get the k/v serialization objects.
// Get the counter objects (output counters).
// Get the compression codec, used to compress spill output.
// Get the Combiner object; the combiner is used during spill and merge.
spillThread.start(); // starts the spill thread; a spill happens only when the spill percentage is reached

16. mapper.run(mapperContext); // executes the run method of the Mapper object, e.g. WordCountMapper. Inside mapper.run():
- setup(context); runs once (line 143);
- map(context.getCurrentKey(), context.getCurrentValue(), context); runs in a loop, once per record (line 146); inside WordCount's map() method, context.write(outK, outV); writes out each k/v pair produced by map;
- cleanup(context); runs once at the end.
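To make step 16 concrete, here is a typical WordCountMapper of the kind the walkthrough assumes (the class and the outK/outV field names mirror the article's example; the input types assume the default TextInputFormat). Mapper#run drives it: setup() once, map() once per record, cleanup() once at the end.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Typical mapper driven by the Mapper#run lifecycle described in step 16.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text outK = new Text();
    private final IntWritable outV = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Called once per input record (one text line under TextInputFormat).
        for (String word : value.toString().split("\\s+")) {
            if (!word.isEmpty()) {
                outK.set(word);
                context.write(outK, outV); // each write lands in MapTask$MapOutputBuffer
            }
        }
    }
}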

Thank you for reading. That is the whole of "What is the MapTask process?". After working through this article you should have a deeper understanding of the MapTask process; the details are best verified by running and debugging the code yourself.
