

What does distributed computing with Hadoop mean?

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains what distributed computing with Hadoop refers to. The content is fairly detailed; interested readers can use it as a reference, and I hope it is helpful to you.

What is Hadoop: Hadoop is a software platform for developing and running applications that process large-scale data. It is an open-source software framework from Apache that implements distributed computing over massive data on a cluster composed of a large number of computers.

The core designs of the Hadoop framework are HDFS and MapReduce: HDFS provides storage for massive data, and MapReduce provides computation over that data.

The data-processing flow in Hadoop can be understood simply as follows: the data is processed by the Hadoop cluster and the result is obtained.

HDFS: Hadoop Distributed File System, Hadoop's distributed file system.

Large files are divided into blocks (64 MB by default in early Hadoop versions) and stored distributed across the machines of the cluster.
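This splitting can be sketched as follows. A minimal illustration, not Hadoop's actual API: the function and block layout here are assumptions for demonstration, and 64 MB is the classic default block size (newer Hadoop releases default to 128 MB).

```python
# Sketch: how HDFS conceptually splits a large file into fixed-size blocks.
# Illustrative only -- real HDFS splitting happens inside the cluster.

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the classic HDFS default block size

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs describing each block of the file."""
    blocks = []
    offset = 0
    while offset < file_size_bytes:
        length = min(block_size, file_size_bytes - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 150 MB file becomes two full 64 MB blocks plus one 22 MB tail block.
blocks = split_into_blocks(150 * 1024 * 1024)
print(len(blocks))  # 3
```

Only the final block may be smaller than the block size; every other block is exactly one block-size long.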

The file data1 in the figure below is divided into three blocks, which are distributed across different machines as redundant replicas.

MapReduce: Hadoop creates a task for each input split and calls Map on it; within that task, the records in the split are processed one by one. Map outputs its results as key-value pairs. Hadoop then sorts the Map output by key and feeds it to Reduce as input; the output of the Reduce tasks is the output of the whole job and is saved on HDFS.
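The map, sort-by-key, and reduce steps above can be sketched in a single process, using word count as the classic example. Real Hadoop runs Map and Reduce tasks across the cluster; the function names and sample split here are illustrative assumptions, not Hadoop's API.

```python
# Toy, single-process sketch of the map -> sort-by-key -> reduce flow.
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    """Emit one ("word", 1) pair per word, like a word-count Mapper."""
    for line in records:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Group the already-sorted pairs by key and sum the values, like a Reducer."""
    for key, group in groupby(pairs, key=itemgetter(0)):
        yield (key, sum(v for _, v in group))

# One input split of two records.
split = ["hadoop stores data", "hadoop computes data"]

# Hadoop sorts Map output by key before handing it to Reduce.
sorted_pairs = sorted(map_phase(split))
result = dict(reduce_phase(sorted_pairs))
print(result)  # {'computes': 1, 'data': 2, 'hadoop': 2, 'stores': 1}
```

The sort between the two phases is what guarantees that all values for one key arrive at the same Reduce call.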

A Hadoop cluster is mainly composed of a NameNode, DataNodes, a Secondary NameNode, a JobTracker, and TaskTrackers.

As shown in the following figure:

The NameNode records how each file is split into blocks and which DataNode nodes store those blocks.

The NameNode also saves status information about the running file system.

The DataNodes store the split blocks themselves.
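The division of labor between NameNode and DataNodes can be pictured as two mappings. A hypothetical sketch, assuming the block IDs, node names, and `locate` helper as illustration (not Hadoop's actual data structures): the NameNode knows which blocks make up each file and which DataNodes hold a replica of each block.

```python
# Illustrative sketch of the NameNode's in-memory metadata.
# All names here are made up for demonstration.

# File name -> ordered list of block IDs (the file-to-blocks mapping).
file_to_blocks = {
    "data1": ["blk_1", "blk_2", "blk_3"],
}

# Block ID -> DataNodes holding a replica (HDFS defaults to 3 replicas).
block_locations = {
    "blk_1": ["datanode-a", "datanode-b", "datanode-c"],
    "blk_2": ["datanode-b", "datanode-c", "datanode-d"],
    "blk_3": ["datanode-a", "datanode-c", "datanode-d"],
}

def locate(filename):
    """Answer a client's question: where can I read each block of this file?"""
    return [(blk, block_locations[blk]) for blk in file_to_blocks[filename]]

for blk, nodes in locate("data1"):
    print(blk, nodes)
```

A client first asks the NameNode for this block map, then reads the block contents directly from the DataNodes.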

The Secondary NameNode helps the NameNode collect status information about the running file system.

The JobTracker is responsible for running a job when tasks are submitted to the Hadoop cluster, and for scheduling the multiple TaskTrackers.

Each TaskTracker is responsible for running a map or reduce task.

That covers what distributed computing with Hadoop refers to. I hope the above content is helpful to you and lets you learn something new. If you think the article is good, you can share it for more people to see.
