What is the overall working mechanism of MR in java? 07/09 Update SLTechnology News&Howtos

This article explains the overall MapReduce (MR) working mechanism in Java. It steps through the Hadoop source code of the job submission process in a debugger, using the WordCount example.

1. Interpreting the source code of the overall MR working mechanism (job submission process)

1.1. Job submission process: debugging the WordCount case with breakpoints

1. In the WordCountDriver class, set a breakpoint (the entry point) at job.waitForCompletion(true); and run in debug mode. Before that line is reached:
a. Configuration conf = new Configuration(); reads all the relevant configuration files into conf.
b. The job object is created with Job job = Job.getInstance(conf);
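
Conceptually, new Configuration() layers several XML resources (for example core-default.xml, then core-site.xml), and a value from a later resource overrides the same key from an earlier one. Below is a simplified, Hadoop-free model of that lookup. The maps are hypothetical stand-ins for the real XML files, and details such as final properties and variable expansion are deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of how Configuration resolves a key across resources:
// resources are applied in load order and a later resource overrides an earlier one.
// Final properties, ${var} expansion, and lazy loading are deliberately left out.
public class LayeredConfigDemo {

    // Merge resources in load order; for a key present in several resources, the last one wins.
    static Map<String, String> load(List<Map<String, String>> resourcesInLoadOrder) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> resource : resourcesInLoadOrder) {
            merged.putAll(resource);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical maps standing in for a *-default.xml file and a *-site.xml file.
        Map<String, String> defaults = Map.of(
                "mapreduce.framework.name", "local",
                "fs.defaultFS", "file:///");
        Map<String, String> site = Map.of("mapreduce.framework.name", "yarn");

        Map<String, String> conf = load(List.of(defaults, site));
        System.out.println(conf.get("mapreduce.framework.name")); // yarn (site overrides default)
        System.out.println(conf.get("fs.defaultFS"));             // file:/// (only in defaults)
    }
}
```

Settings made in driver code with conf.set(...) sit on top of all the files, which is why they take precedence over both.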

2. Step into waitForCompletion(). It first checks the job's current state and submits the job only if that state is DEFINE:
if (state == JobState.DEFINE) { submit(); }

3. The parameter of waitForCompletion() is boolean verbose, which the driver passes as true. When verbose is true, monitorAndPrintJob() monitors the current job and prints its progress:
if (verbose) { monitorAndPrintJob(); }

4. Step into the submit() method (Job.java, line 1562):
--ensureState(JobState.DEFINE); confirms the job's state again
--setUseNewAPI(); sets the job to use the new API (Hadoop provides two MapReduce APIs: the old org.apache.hadoop.mapred API and the new org.apache.hadoop.mapreduce API)
--connect(); determines whether the submitted job runs locally or in a cluster environment

4.1. Step into connect() (Job.java, line 1534). cluster can be understood as the environment object the current job needs in order to run. It is null at first and is created inside an anonymous inner class.

4.2. Step into return new Cluster(getConfiguration()); (Job.java, line 1540).

4.3. In the Cluster.java class, look at Cluster's parameterized constructor (Cluster.java, line 105).

4.4. Step into initialize(jobTrackAddr, conf); and navigate to initProviderList();, which gets the list of environments in which the job can run.

4.5. Step into initProviderList() (Cluster.java, around line 75), then look at line 124 of Cluster.java, where providerList is traversed. It holds two runtime environments:
YarnClientProtocolProvider ==> cluster environment
LocalClientProtocolProvider ==> local environment

4.6. At line 130 of Cluster.java, step into clientProtocol = provider.create(conf);

4.7. This leads to line 19 of YarnClientProtocolProvider.class.
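
The traversal at lines 124 to 130 of Cluster.java boils down to: ask each provider to create a client protocol from conf, skip providers that return null, and keep the first one that does not. Below is a minimal, Hadoop-free model of that loop. The interface and runner names mirror the classes in this walkthrough, but here they are hypothetical lambdas returning plain strings, and the only key consulted is mapreduce.framework.name (treated as local when unset).

```java
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free model of the provider loop in Cluster.initialize():
// each provider is asked to create a client; a provider that does not match the
// configured framework returns null, and the first non-null client wins.
public class ProviderSelectionDemo {

    interface ClientProtocolProvider {
        String create(Map<String, String> conf); // returns a runner name, or null if not applicable
    }

    // Mirrors YarnClientProtocolProvider: applies only when the framework is "yarn".
    static final ClientProtocolProvider YARN = conf ->
            "yarn".equals(conf.get("mapreduce.framework.name")) ? "YARNRunner" : null;

    // Mirrors LocalClientProtocolProvider: applies when the framework is "local" or unset.
    static final ClientProtocolProvider LOCAL = conf ->
            "local".equals(conf.getOrDefault("mapreduce.framework.name", "local"))
                    ? "LocalJobRunner" : null;

    static String selectClient(List<ClientProtocolProvider> providerList, Map<String, String> conf) {
        for (ClientProtocolProvider provider : providerList) {
            String clientProtocol = provider.create(conf);
            if (clientProtocol != null) {
                return clientProtocol; // first matching provider wins
            }
        }
        throw new IllegalStateException("Cannot initialize Cluster: no provider matched");
    }

    public static void main(String[] args) {
        List<ClientProtocolProvider> providers = List.of(YARN, LOCAL);
        System.out.println(selectClient(providers, Map.of()));                                   // LocalJobRunner
        System.out.println(selectClient(providers, Map.of("mapreduce.framework.name", "yarn"))); // YARNRunner
    }
}
```

Because each rule matches a different framework value, at most one provider returns non-null here, so the order of the list does not change the result.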

At this point clientProtocol is still null. Continue stepping: the code that follows checks the current runtime environment, with each provider deciding from the current conf whether it applies:
YarnClientProtocolProvider ==> YARNRunner (the YARN runner object)
LocalClientProtocolProvider ==> LocalJobRunner (the local runner object)

5. Once connect() has finished, continue to line 1565 of Job.java, which constructs the job submitter object from the file system object and client object of the current cluster:
final JobSubmitter submitter = this.getJobSubmitter(this.cluster.getFileSystem(), this.cluster.getClient());

6. Continue to line 1570 of Job.java. This line performs the actual job submission through the JobSubmitter:
return submitter.submitJobInternal(Job.this, cluster);

7. After submission, the job's state changes to RUNNING (Job.java, line 1573):
this.state = Job.JobState.RUNNING;

8. Set a breakpoint at line 1570 and step into the method. This opens the JobSubmitter.java class; go to line 139, the submitJobInternal() method, and keep stepping from there.

9. Locate checkSpecs(job);, which validates the output path, and step into it.
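
Stepping back from the debugger, the state handling in steps 2 through 7 can be summarized in a small model: waitForCompletion() only calls submit() while the job is in DEFINE, submit() re-checks that with ensureState(), and the job moves to RUNNING after submission. The class below is hypothetical; connecting to a Cluster and the JobSubmitter call are reduced to a counter and comments.

```java
// Hypothetical, stripped-down model of the state handling in Job.waitForCompletion() and
// Job.submit(). Connecting to a Cluster, building a JobSubmitter, and monitoring progress
// are replaced by comments and a submission counter.
public class JobLifecycleDemo {

    enum JobState { DEFINE, RUNNING }

    private JobState state = JobState.DEFINE;
    private int submissions = 0;

    private void ensureState(JobState expected) {
        if (state != expected) {
            throw new IllegalStateException("Job in state " + state + " instead of " + expected);
        }
    }

    void submit() {
        ensureState(JobState.DEFINE); // re-check the state, as submit() does
        // setUseNewAPI(); connect(); getJobSubmitter(...) would run here
        submissions++;                // stands in for submitter.submitJobInternal(...)
        state = JobState.RUNNING;
    }

    boolean waitForCompletion(boolean verbose) {
        if (state == JobState.DEFINE) {
            submit();
        }
        if (verbose) {
            System.out.println("monitoring job, state = " + state); // stands in for monitorAndPrintJob()
        }
        return state == JobState.RUNNING; // the real method returns isSuccessful() once the job ends
    }

    JobState getState() { return state; }

    int getSubmissions() { return submissions; }

    public static void main(String[] args) {
        JobLifecycleDemo job = new JobLifecycleDemo();
        job.waitForCompletion(true);              // submits once
        job.waitForCompletion(false);             // already RUNNING, so no second submission
        System.out.println(job.getSubmissions()); // 1
    }
}
```

The same ensureState(JobState.DEFINE) guard appears in many of Job's setters, such as setMapperClass(), which is why a Job object cannot be reconfigured after it has been submitted.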

10. Step into output.checkOutputSpecs(job) and view the source, in FileOutputFormat.java around line 151. As a result, the output path is validated before the job is submitted: if the output directory is not set or already exists, an exception is thrown and the job is never submitted.
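
The check in step 10 amounts to two conditions: the output directory must be configured, and it must not exist yet. This is why rerunning WordCount without deleting the previous output directory fails immediately. A minimal local-disk sketch, assuming a java.nio.file.Path instead of a Hadoop FileSystem; Hadoop itself reports these cases with InvalidJobConfException and org.apache.hadoop.mapred.FileAlreadyExistsException, while this model uses IllegalArgumentException and java.nio.file.FileAlreadyExistsException.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local-disk model of the two output checks made before submission.
// Hadoop runs them against a Hadoop FileSystem and throws its own exception types.
public class OutputSpecCheckDemo {

    static void checkOutputSpecs(Path outDir) throws IOException {
        // 1. The output directory must be set.
        if (outDir == null) {
            throw new IllegalArgumentException("Output directory not set.");
        }
        // 2. The output directory must not exist yet, so earlier results are never overwritten.
        if (Files.exists(outDir)) {
            throw new FileAlreadyExistsException("Output directory " + outDir + " already exists");
        }
    }

    public static void main(String[] args) throws IOException {
        Path outDir = Files.createTempDirectory("wc-output"); // simulates a leftover output dir
        try {
            checkOutputSpecs(outDir);
        } catch (FileAlreadyExistsException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        Files.delete(outDir);     // remove the old output, as you would before a rerun
        checkOutputSpecs(outDir); // now passes
        System.out.println("accepted: " + outDir);
    }
}
```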

11. Step out of checkSpecs(job); and continue. The next line gets the job's temporary working (staging) directory:
Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
--D:/tmp/hadoop/mapred/staging/Administrator1590679188/.staging

12. Continue to line 157, which gets the jobId of the submitted job:
JobID jobId = submitClient.getNewJobID(); //jobId=job_local11590679188_001
Every job has a jobId, whether it runs locally or on YARN; the value above is what it looks like in local mode.

13. The job submission path is generated from the staging directory and the jobId:
Path submitJobDir = new Path(jobStagingArea, jobId.toString());
--D:/tmp/hadoop/mapred/staging/Administrator1590679188/.staging/job_local11590679188_001

14. copyAndConfigureFiles(job, submitJobDir); copies the job's resources and configuration information and creates the job submission path on disk.
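
Steps 11 to 14 build the submission path by appending the job ID to the staging area and then create it on disk. A minimal sketch with java.nio.file inside a temporary directory; the directory names are copied from the walkthrough, and submitJobDir() is a hypothetical helper standing in for new Path(jobStagingArea, jobId.toString()).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local-disk model of steps 11 to 14: the job submission path is the staging
// area plus the job ID, and it is created on disk before the job's files are copied in.
public class SubmitDirDemo {

    // Stands in for: Path submitJobDir = new Path(jobStagingArea, jobId.toString());
    static Path submitJobDir(Path jobStagingArea, String jobId) {
        return jobStagingArea.resolve(jobId);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("mr-demo"); // stands in for D:/tmp/hadoop/mapred
        Path jobStagingArea = root.resolve("staging")
                                  .resolve("Administrator1590679188")
                                  .resolve(".staging");
        Path dir = submitJobDir(jobStagingArea, "job_local11590679188_001");
        Files.createDirectories(dir); // the "create the job submission path on disk" part of step 14
        System.out.println(dir.getFileName());      // job_local11590679188_001
        System.out.println(Files.isDirectory(dir)); // true
    }
}
```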

15. Step into uploadResourcesInternal(job, submitJobDir); from line 99 of the JobResourceUploader class.

16. Inside uploadResourcesInternal(job, submitJobDir), the relevant configuration items are read.

17. Step into writeSplits(job, submitJobDir);, which generates the split information.

18. Locate maps = writeNewSplits(job, jobSubmitDir); and step into it. This is where the splits are generated.

The content of the splits object is file:///D:/input/inputWord/JaneEyre.txt:0+36306679, meaning the file is read from byte offset 0 for a length of 36306679 bytes. A split is a logical description that records where reading starts in the file, not a physical copy of the data.

19. In other words, each split records which file is read, the offset it starts from, and how many bytes it covers.
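
The single 0+36306679 split follows from how FileInputFormat.getSplits() cuts a file: the split size is max(minSize, min(maxSize, blockSize)), and full-size splits are cut only while the remainder is more than 1.1 times the split size (the SPLIT_SLOP factor), so a slightly oversized tail stays in the last split. The dependency-free sketch below reproduces that loop for one splittable file. It assumes a 32 MB block size for the local file system, minSize = 1 and maxSize = Long.MAX_VALUE, and it ignores block locations and non-splittable input such as gzip files.

```java
import java.util.ArrayList;
import java.util.List;

// Dependency-free sketch of the split loop in FileInputFormat.getSplits() for one splittable file.
// Block locations, hosts, and non-splittable input are ignored.
public class SplitDemo {

    static final double SPLIT_SLOP = 1.1; // the tail may be up to 10% larger than splitSize

    static final class Split {
        final String file;
        final long start;
        final long length;

        Split(String file, long start, long length) {
            this.file = file;
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return file + ":" + start + "+" + length; // same "file:start+length" shape as FileSplit
        }
    }

    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    static List<Split> getSplits(String file, long length, long blockSize, long minSize, long maxSize) {
        List<Split> splits = new ArrayList<>();
        long splitSize = computeSplitSize(blockSize, minSize, maxSize);
        long bytesRemaining = length;
        // Cut full-size splits while the remainder is clearly more than one split.
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits.add(new Split(file, length - bytesRemaining, splitSize));
            bytesRemaining -= splitSize;
        }
        // Whatever is left (at most 1.1 * splitSize) becomes the last split.
        if (bytesRemaining != 0) {
            splits.add(new Split(file, length - bytesRemaining, bytesRemaining));
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 32L * 1024 * 1024; // assumption: 32 MB block size on the local file system
        List<Split> splits = getSplits("file:///D:/input/inputWord/JaneEyre.txt",
                36306679L, blockSize, 1L, Long.MAX_VALUE);
        System.out.println(splits); // [file:///D:/input/inputWord/JaneEyre.txt:0+36306679]
    }
}
```

Under these assumptions 36306679 / 33554432 is about 1.08, which is below 1.1, so the whole file stays in one split and only one MapTask is started.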

20. return array.length; returns the number of splits back to line 200, where the value returned by writeSplits(job, submitJobDir) is assigned to maps.

21. conf.setInt(MRJobConfig.NUM_MAPS, maps); sets how many MapTasks to start according to the number of splits. By now the job submission path contains two files:

--job.split: split details

--job.splitmetainfo: split metadata (description information)
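
Steps 20 and 21 can be summarized as: sort the splits so the largest comes first (Hadoop's writeNewSplits() sorts them by size), count them, and store the count as the number of map tasks. In the sketch below, plain Long lengths stand in for InputSplit objects, a HashMap stands in for the job configuration, and mapreduce.job.maps is the property name behind MRJobConfig.NUM_MAPS; writeNewSplits() here is a simplified hypothetical version.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of steps 20 and 21: sort splits largest-first, return their count,
// and record that count as the number of MapTasks. Long lengths stand in for InputSplits.
public class NumMapsDemo {

    static int writeNewSplits(Long[] splitLengths) {
        // Biggest split first, so the longest-running map is scheduled early.
        Arrays.sort(splitLengths, Comparator.reverseOrder());
        // Real code writes job.split and job.splitmetainfo at this point.
        return splitLengths.length;
    }

    public static void main(String[] args) {
        Long[] splitLengths = {46_137_344L, 134_217_728L, 134_217_728L};
        Map<String, String> conf = new HashMap<>(); // stands in for the job's Configuration
        int maps = writeNewSplits(splitLengths);
        conf.put("mapreduce.job.maps", String.valueOf(maps)); // like conf.setInt(MRJobConfig.NUM_MAPS, maps)
        System.out.println(conf.get("mapreduce.job.maps"));   // 3
        System.out.println(Arrays.toString(splitLengths));    // [134217728, 134217728, 46137344]
    }
}
```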

22. writeConf(conf, submitJobFile); writes all of the job's configuration information to the job submission path, producing one more file there: job.xml. This file records all of the XML configuration information, including the settings you made yourself.
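
job.xml uses Hadoop's configuration XML layout: a configuration root element with one property element per entry, each holding a name and a value. The sketch below writes that layout from a Map. toJobXml() and escape() are hypothetical helpers, a simplified stand-in for Configuration.writeXml(), which may also record extra elements such as whether a value is final and where it came from.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the job.xml layout produced by writeConf(conf, submitJobFile):
// every configuration entry becomes a <property> holding a <name> and a <value>.
public class JobXmlDemo {

    // Escape the characters that would break the XML (& must be replaced first).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String toJobXml(Map<String, String> conf) {
        StringBuilder xml = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<configuration>\n");
        // TreeMap sorts the keys so the output is deterministic.
        for (Map.Entry<String, String> e : new TreeMap<>(conf).entrySet()) {
            xml.append("  <property><name>").append(escape(e.getKey()))
               .append("</name><value>").append(escape(e.getValue()))
               .append("</value></property>\n");
        }
        return xml.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of(
                "mapreduce.job.maps", "1",
                "mapreduce.framework.name", "local");
        System.out.print(toJobXml(conf));
    }
}
```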

23. Based on the split information (which determines how many MapTasks are started) and the configuration information, the job is now actually launched.

24. status = submitClient.submitJob(jobId, submitJobDir.toString(), job.getCredentials()); //submits the job for execution
