What is the ReduceTask process?

2025-07-03 Update From: SLTechnology News&Howtos shulou

Shulou(Shulou.com)06/01 Report--

This article walks through the ReduceTask process by following the source code step by step, from the point where the map output files are ready to the point where the reducer writes its results. Many people are unsure what actually happens inside a ReduceTask, so the steps below point to the specific classes and approximate line numbers involved.

ReduceTask process: source code walkthrough

1. The map side ends by producing two files, file.out and file.out.index, which wait to be copied by the reduce side.

2. In the run method of LocalJobRunner$Job (LocalJobRunner class, around line 555):

    if (numReduceTasks > 0) {   // check the number of ReduceTasks
        // create the Runnable objects: LocalJobRunner$Job$ReduceTaskRunnable
        List<RunnableWithThrowable> reduceRunnables = getReduceTaskRunnables(jobId, mapOutputFiles);
        // create the thread pool
        ExecutorService reduceService = createReduceExecutor();
        // submit every LocalJobRunner$Job$ReduceTaskRunnable to the thread pool for execution
        runTasks(reduceRunnables, reduceService, "reduce");
    }

3. Step into runTasks(reduceRunnables, reduceService, "reduce") (around line 559):

    for (Runnable r : runnables) {
        // loop over the Runnables; each one is submitted to the thread pool for execution
        service.submit(r);
    }
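The submit loop in steps 2 and 3 can be tried outside Hadoop with plain JDK classes. The following is a minimal sketch, not the Hadoop implementation: the class name RunTasksSketch, the runReducers helper, the fixed pool size of 2, and the 30-second wait are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for the runTasks pattern; this is not Hadoop code.
class RunTasksSketch {

    // Submits every Runnable to the pool, then waits for the pool to drain.
    static void runTasks(List<Runnable> runnables, ExecutorService service, String taskType) {
        for (Runnable r : runnables) {
            service.submit(r);
        }
        service.shutdown(); // accept no new tasks; already submitted ones still run
        try {
            // The 30-second limit is an arbitrary choice for this sketch.
            if (!service.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException(taskType + " tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(taskType + " tasks were interrupted", e);
        }
    }

    // Builds hypothetical "reduce" tasks that each increment a shared counter.
    static int runReducers(int numReduceTasks) {
        AtomicInteger finished = new AtomicInteger();
        List<Runnable> reduceRunnables = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            reduceRunnables.add(() -> finished.incrementAndGet());
        }
        ExecutorService reduceService = Executors.newFixedThreadPool(2);
        runTasks(reduceRunnables, reduceService, "reduce");
        return finished.get();
    }

    public static void main(String[] args) {
        System.out.println("finished reduce tasks: " + runReducers(3));
    }
}
```

Because the pool has fewer threads than tasks in this sketch, some tasks queue up and run once a thread is free, which mirrors the point made later that parallelism depends on available resources.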

4. When a pool thread picks up a task, it executes the run method of LocalJobRunner$Job$ReduceTaskRunnable.

5. Create the ReduceTask object (LocalJobRunner class, around line 332):

    ReduceTask reduce = new ReduceTask(systemJobFile.toString(), reduceId, taskId, mapIds.size(), 1);

6. Call the run method of ReduceTask (LocalJobRunner class, around line 347):

    reduce.run(localConf, Job.this);   // step into the run method

7. Inside the run method of ReduceTask (ReduceTask class, around line 320):

    initialize(job, getJobID(), reporter, useNewApi);   // initialization
    sortPhase.complete();   // around line 333: the sort phase is complete
    RawComparator comparator = job.getOutputValueGroupingComparator();   // around lines 382-387: get the grouping comparator

8. Step into the following call (around line 390):

    runNewReducer(job, umbilical, reporter, rIter, comparator, keyClass, valueClass);

Inside the runNewReducer method (ReduceTask class, around line 577):

    // get the job information
    org.apache.hadoop.mapreduce.TaskAttemptContext taskContext =
        new org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl(job, getTaskID(), reporter);
    // create the reducer object by reflection, for example WordCountReducer
    org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE> reducer =
        (org.apache.hadoop.mapreduce.Reducer<INKEY,INVALUE,OUTKEY,OUTVALUE>)
            ReflectionUtils.newInstance(taskContext.getReducerClass(), job);
    // create the RecordWriter object
    org.apache.hadoop.mapreduce.RecordWriter<OUTKEY,OUTVALUE> trackedRW =
        new NewTrackingRecordWriter<OUTKEY,OUTVALUE>(this, taskContext);
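Step 8 creates the reducer by reflection from a configured class rather than with new. The idea can be sketched with the JDK alone. This is only an illustration under stated assumptions: SimpleReducer, SumReducer, and the local newInstance helper are hypothetical names, and the helper is a simplified stand-in that does not reproduce what Hadoop's ReflectionUtils does beyond calling a no-argument constructor.

```java
import java.util.Arrays;
import java.util.List;

class ReflectionSketch {

    // Hypothetical stand-in for the Reducer base type.
    interface SimpleReducer {
        int reduce(String key, List<Integer> values);
    }

    // Hypothetical user class, playing the role of WordCountReducer.
    static class SumReducer implements SimpleReducer {
        public SumReducer() {}

        @Override
        public int reduce(String key, List<Integer> values) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            return sum;
        }
    }

    // Creates an instance from a Class object through its no-argument constructor.
    static <T> T newInstance(Class<T> clazz) {
        try {
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + clazz.getName(), e);
        }
    }

    // The framework only knows the configured class, not the concrete type.
    static int demo() {
        Class<? extends SimpleReducer> configured = SumReducer.class;
        SimpleReducer reducer = newInstance(configured);
        return reducer.reduce("hadoop", Arrays.asList(1, 1, 1));
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The design point is that the calling code depends only on the base type, so any user-supplied reducer class can be plugged in through configuration.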

9. Continue down to reducer.run(reducerContext) and step into the run method of Reducer (around line 628):

    setup(context);
    reduce(context.getCurrentKey(), context.getValues(), context);   // runs the reduce method in WordCountReducer; it is called in a loop, once per key

Inside reduce, WordCountReducer calls context.write(key, outv). The write path through the source is:

    ① reduceContext.write(key, value);
    ② output.write(key, value);   // enters the write method in ReduceTask, around line 557
    ③ real.write(key, value);     // real is a TextOutputFormat$LineRecordWriter

Step into real.write() (TextOutputFormat class, around line 84):

    writeObject(key);     // write out the key
    writeObject(value);   // write out the value

A brief look at the writeObject source (TextOutputFormat class, around line 75):

    private void writeObject(Object o) throws IOException {
        if (o instanceof Text) {
            Text to = (Text) o;
            out.write(to.getBytes(), 0, to.getLength());
        } else {
            // call the object's toString method, convert the returned string to bytes, and write them out through the stream
            out.write(o.toString().getBytes(StandardCharsets.UTF_8));
        }
    }

10. cleanup(context);   // final cleanup, after which the output file for the partition is complete

Summary of the MR source code walkthrough

1. Purpose of reading the source: to become familiar with the whole MR process, so that each concept can be traced to a specific location in the source code. This is also useful interview preparation.

2. A whole MR job has N MapTasks (determined by the number of input splits) and M ReduceTasks (a number you set yourself). In a cluster, multiple MapTasks run in parallel, and the degree of parallelism is determined by the cluster's resources. Multiple ReduceTasks also run in parallel, again limited by the cluster's resources. In general the number of ReduceTasks is fairly small, so they can usually all run at the same time.

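The call order described in steps 9 and 10 (setup once, reduce once per key, cleanup once, with each context.write ending up as a line of text) can be sketched without Hadoop. This is a simplified model, not the Hadoop classes: Context, Reducer, and WordCountReducer here are hypothetical stand-ins, the input is assumed to be already sorted and grouped by key, and the tab between key and value is an assumption about the default text output separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReducerRunSketch {

    // Collects output lines; stands in for the RecordWriter behind context.write.
    static class Context {
        final List<String> lines = new ArrayList<>();

        void write(Object key, Object value) {
            // Like the non-Text branch of writeObject: fall back to toString().
            lines.add(key.toString() + "\t" + value.toString());
        }
    }

    static abstract class Reducer {
        void setup(Context context) {}

        abstract void reduce(String key, Iterable<Integer> values, Context context);

        void cleanup(Context context) {}

        // Template method: setup once, reduce once per key group, cleanup once.
        void run(Map<String, List<Integer>> groupedInput, Context context) {
            setup(context);
            try {
                for (Map.Entry<String, List<Integer>> group : groupedInput.entrySet()) {
                    reduce(group.getKey(), group.getValue(), context);
                }
            } finally {
                cleanup(context);
            }
        }
    }

    // Sums the counts for one word and writes a single output record.
    static class WordCountReducer extends Reducer {
        @Override
        void reduce(String key, Iterable<Integer> values, Context context) {
            int sum = 0;
            for (int v : values) {
                sum += v;
            }
            context.write(key, sum);
        }
    }

    static List<String> demo() {
        // TreeMap keeps keys sorted, mimicking the sorted input a reducer sees.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        grouped.put("hello", Arrays.asList(1, 1));
        grouped.put("hadoop", Arrays.asList(1));
        Context context = new Context();
        new WordCountReducer().run(grouped, context);
        return context.lines;
    }

    public static void main(String[] args) {
        for (String line : demo()) {
            System.out.println(line);
        }
    }
}
```

Running main prints one line per key in sorted key order, with the summed count after the tab.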
That concludes this look at the ReduceTask process. Theory sticks better when paired with practice, so try tracing the source yourself.
