How to analyze Hadoop source code

2025-02-24 Update From: SLTechnology News&Howtos


This article introduces how to approach Hadoop source code analysis. The content is fairly detailed; interested readers can use it as a reference, and we hope it is helpful to you.

Everyone is familiar with file systems, so before analyzing HDFS we did not spend much time introducing its background: we already have a reasonable understanding of file systems, and good documentation exists. Before analyzing the MapReduce part of Hadoop, however, let us first understand how the system works as a whole, and then move on to the analysis itself.

Take the wordcount example shipped with Hadoop (this is the launch command):

hadoop jar hadoop-0.19.0-examples.jar wordcount /usr/input /usr/output

After the user submits a job, the job is coordinated by the JobTracker: first the Map phase runs (M1, M2 and M3 in the figure), and then the Reduce phase (R1 and R2 in the figure). The actions of both the Map phase and the Reduce phase are monitored by a TaskTracker and run in a Java virtual machine separate from the TaskTracker's own.
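The two-phase flow described above can be sketched with a toy wordcount in plain Python. This is only a simulation of the idea, not the Hadoop API: `map_phase` plays the role of the Mappers emitting (word, 1) pairs, and `reduce_phase` plays the role of shuffling values to a key and summing them.

```python
from collections import defaultdict

def map_phase(lines):
    """Toy stand-in for the Map phase: emit a (word, 1) pair per word."""
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Toy stand-in for shuffle + Reduce: gather values by key, then sum."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(map_phase(["a b a", "b c"]))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

In real Hadoop these two phases run on different machines under TaskTracker supervision; here they are simply two function calls.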

Our inputs and outputs are directories on HDFS (as shown in the figure above). The input is described by the InputFormat interface, and its implementations (for plain text files, JDBC databases, and so on) each deal with one kind of data source and describe some characteristics of the data. Through an InputFormat implementation you can obtain implementations of the InputSplit interface, which is used to partition the data (split1 to split5 in the figure are the result of this partitioning). From the InputFormat you can also obtain a RecordReader implementation, which generates <key, value> pairs from the input. Once you have the <key, value> pairs, you can start the map operation.
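A minimal sketch of the InputSplit / RecordReader idea, using hypothetical Python names rather than the real Hadoop interfaces: `make_splits` partitions the input like `InputFormat.getSplits`, and `record_reader` yields records from one split the way a line-oriented RecordReader does (line numbers stand in for the byte offsets the real LineRecordReader uses as keys).

```python
def make_splits(lines, split_size):
    """Partition the input into fixed-size splits (the InputSplit idea)."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]

def record_reader(split, start_offset=0):
    """Yield (offset, line) pairs from one split (the RecordReader idea)."""
    for i, line in enumerate(split):
        yield (start_offset + i, line)

lines = ["one fish", "two fish", "red fish", "blue fish", "old fish"]
splits = make_splits(lines, 2)        # three splits: 2 + 2 + 1 lines
records = list(record_reader(splits[0]))
# records == [(0, 'one fish'), (1, 'two fish')]
```

Each split would be handed to a different Mapper, whose map method then consumes the <key, value> records one at a time.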

Map writes its results to the context through context.collect (ultimately through OutputCollector.collect). Once the Mapper's output has been collected, the Partitioner class writes it out to the output files in the specified way. We can also provide a Combiner for the Mapper: when the Mapper emits output, the key-value pairs are not written out immediately but are first collected in lists (one list per key); once a certain number of key-value pairs has been buffered, that part of the buffer is merged by the Combiner and then passed on to the Partitioner (the yellow part of M1 in the figure corresponds to the Combiner and Partitioner).
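A toy sketch of the Combiner and Partitioner roles described above (hypothetical Python, not Hadoop's classes; NUM_REDUCERS and the hash are illustrative assumptions): the mapper output is buffered as one list of values per key, the combiner pre-merges each list, and the partitioner routes each key to a reducer like a hash partitioner would.

```python
from collections import defaultdict

NUM_REDUCERS = 2  # assumed reducer count for illustration

def combiner(buffered):
    """Merge the buffered list of values per key (for wordcount: pre-sum)."""
    return {key: sum(values) for key, values in buffered.items()}

def partitioner(key, num_reducers=NUM_REDUCERS):
    """Pick a reducer for a key, hash-partitioner style: hash(key) mod R."""
    return sum(map(ord, key)) % num_reducers  # stable toy hash

# Mapper output buffered as "one key, one list of values"
buffer = defaultdict(list)
for word in "a b a b c a".split():
    buffer[word].append(1)

combined = combiner(buffer)               # {'a': 3, 'b': 2, 'c': 1}
routed = defaultdict(dict)
for key, value in combined.items():
    routed[partitioner(key)][key] = value # reducer index -> its share of keys
```

Combining before partitioning is exactly the point of the buffer: the reducers receive one pre-summed pair per key instead of every raw (word, 1) pair.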

After the Map actions are done, the job enters the Reduce phase. This phase is divided into three steps: shuffle, sort, and reduce.

In the shuffle step, Hadoop's MapReduce framework routes each Map result to a particular Reducer according to its key (the intermediate results for the same key produced by multiple Mappers are spread across different machines; when this step ends, they have all been transferred to the machine that will process that key). The file transfer in this step uses the HTTP protocol.

Sorting is done together with shuffling; this step merges the <key, value> pairs with the same key coming from different Mappers.
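The merge described above can be sketched with Python's standard library (again a simulation, not Hadoop internals): each mapper is assumed to have produced a sorted run of (key, value) pairs, `heapq.merge` merges the sorted runs, and `groupby` collects equal keys together, mirroring the shuffle-and-sort step.

```python
import heapq
from itertools import groupby
from operator import itemgetter

# Sorted (key, value) runs, as two different mappers might have produced them
mapper1 = [("apple", 1), ("banana", 1), ("banana", 1)]
mapper2 = [("apple", 1), ("cherry", 1)]

# Shuffle + sort: merge the sorted runs, then group equal keys together
merged = heapq.merge(mapper1, mapper2, key=itemgetter(0))
grouped = {key: [v for _, v in group]
           for key, group in groupby(merged, key=itemgetter(0))}
# grouped == {'apple': [1, 1], 'banana': [1, 1], 'cherry': [1]}
```

Merging pre-sorted runs rather than re-sorting everything is the same design choice the framework makes: each reducer only has to do a streaming merge of its inputs.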

In the reduce step, the groups obtained through shuffle and sort are fed to the Reducer. There they are processed in the reduce method, and the result is written out to DFS through OutputFormat.
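A last toy sketch of this final step (hypothetical Python, not the Hadoop API): `reduce_fn` plays the role of the reduce method for wordcount, and the output is rendered as tab-separated "key value" lines, in the style of a text output format.

```python
def reduce_fn(key, values):
    """The reduce method for wordcount: sum the counts for one key."""
    return sum(values)

grouped = {"a": [1, 1, 1], "b": [1]}    # output of shuffle + sort
output_lines = [f"{key}\t{reduce_fn(key, values)}"
                for key, values in sorted(grouped.items())]
# output_lines == ['a\t3', 'b\t1']
```

In real Hadoop each of these lines would land in a part file (e.g. one per Reducer) in the job's output directory on HDFS.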

That is all for this overview of how to carry out Hadoop source code analysis. We hope the content above is helpful and lets you learn something new. If you found the article useful, feel free to share it so more people can see it.
