What are the advantages of Hadoop as a distributed system?

Shulou (Shulou.com), report of 05/31; updated 2025-04-29. Source: SLTechnology News&Howtos.

This article looks at the advantages of Hadoop as a distributed system. Many people have questions about this in day-to-day work, so the sections below first explain what Hadoop is, then list its main advantages, and finally walk through how its core, MapReduce, runs a job.

1. What is a Hadoop distributed system?

Going by the name, Hadoop is first of all a tool. Some may find that a trivial thing to say, but the idea matters to me and is a large part of why I study it seriously: since it is a tool, I believe I can master it.

Hadoop is a distributed infrastructure developed under the Apache Software Foundation. Its design derives from three well-known Google papers (on the Google File System, MapReduce, and Bigtable), which are also often cited as the origin of the cloud computing concept put forward by Google. They are worth reading when you have time. Hadoop is a framework that lets users store and process large volumes of data in parallel without having to understand the underlying distributed details. Its purpose is to make massive data storage and querying possible on ordinary small hosts, so that users do not need expensive servers.

2. What are the advantages of Hadoop as a distributed system?

1. High reliability: Hadoop has a good fault-tolerance mechanism. It maintains replicas on other machines, so if one machine fails, a replica takes over directly and the system keeps running normally.

2. High scalability: Hadoop spreads computation across a cluster of computers, and the cluster can easily be extended to thousands of nodes.

3. High efficiency: Hadoop makes full use of its distributed cluster. It divides the massive data set into blocks and hands the blocks to the individual small machines for computation, which gives it efficient parallel processing.
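The replica and block ideas in the list above can be sketched in a few lines of plain Python. This is only a toy model, not Hadoop code: the block size, the replica count, the round-robin placement rule, and every function name here are assumptions made for illustration, and threads stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 4    # hypothetical: records per block (real HDFS blocks are sized in bytes)
REPLICATION = 2   # hypothetical replica count chosen for this sketch


def split_into_blocks(records, block_size=BLOCK_SIZE):
    """Cut the input into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` nodes, round-robin (an assumed policy).

    The nodes are distinct as long as replication <= len(nodes).
    """
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


def pick_live_replica(holders, failed):
    """Return the first node holding the block that has not failed."""
    for node in holders:
        if node not in failed:
            return node
    raise RuntimeError("all replicas of this block are lost")


def parallel_sum(records, nodes, failed=frozenset()):
    """Sum the records block by block, skipping failed nodes."""
    blocks = split_into_blocks(records)
    placement = place_replicas(len(blocks), nodes)

    def work(block_id):
        node = pick_live_replica(placement[block_id], failed)
        return node, sum(blocks[block_id])

    # Threads stand in for the cluster's worker machines.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        results = list(pool.map(work, range(len(blocks))))
    used_nodes = {node for node, _ in results}
    return sum(partial for _, partial in results), used_nodes


if __name__ == "__main__":
    total, used = parallel_sum(list(range(1, 21)),
                               ["node-a", "node-b", "node-c"],
                               failed={"node-b"})
    print(total, sorted(used))
```

With "node-b" marked as failed, every block is still readable from its other replica, so the total is unchanged and only the surviving nodes do any work.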

Hadoop is written in Java, so it is highly portable across platforms. It has many other advantages as well, which I won't list one by one here.

3. MapReduce, the core of Hadoop

A MapReduce job run involves the following four entities; the stages of the run itself are described after them:

Client: submits the MapReduce job.

JobTracker: coordinates the running of the job. The JobTracker is a Java application whose main class is JobTracker.

TaskTracker: runs the tasks that the job has been split into. It is also a Java application, and its main class is TaskTracker.

Distributed file system: generally HDFS, which is used to share job files between the other entities.
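Before going through the stages, it may help to see what a MapReduce computation looks like in miniature. The following is a single-process word count in plain Python, assuming the usual map, shuffle, and reduce steps; the function names are made up for this sketch and none of it uses Hadoop's API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Hadoop processes data"]))
```

In a real cluster the map calls would run on different machines over different blocks, and the grouped keys would be partitioned across several reducers; here everything runs in one process to keep the data flow visible.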

1. Job submission

The runJob() method of JobClient is a convenience method: it creates a JobClient instance and calls its submitJob() method. After submitting the job, runJob() polls its progress and reports to the console whenever the progress has changed. When the job finishes successfully, the job counters are printed to the console; if it fails, the error that caused the failure is printed instead.
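The submit-then-poll behaviour can be sketched as a small loop. FakeJob, its scripted progress values, and the "records_read" counter are all hypothetical stand-ins invented for this example; only the general shape (poll, report on change, print counters or an error at the end) follows the description above.

```python
import time


class FakeJob:
    """Hypothetical stand-in for a submitted job that replays scripted progress."""

    def __init__(self, progress_steps, succeeds=True):
        self._steps = list(progress_steps)       # must contain at least one value
        self._index = 0
        self.succeeds = succeeds
        self.counters = {"records_read": 100}    # made-up counter for the sketch

    def poll(self):
        """Return (progress, finished) and advance the script by one step."""
        progress = self._steps[self._index]
        finished = self._index == len(self._steps) - 1
        if not finished:
            self._index += 1
        return progress, finished


def run_job(job, poll_interval=0.0, report=print):
    """Poll the job until it finishes, reporting only when progress changes."""
    last = None
    while True:
        progress, finished = job.poll()
        if progress != last:
            report(f"progress: {progress:.0%}")
            last = progress
        if finished:
            break
        time.sleep(poll_interval)
    if job.succeeds:
        report(f"counters: {job.counters}")
    else:
        report("error: job failed")
    return job.succeeds


if __name__ == "__main__":
    run_job(FakeJob([0.0, 0.0, 0.5, 0.5, 1.0]))
```

Passing a list's append method as report makes the loop easy to check without reading the console.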

2. Job initialization

When the JobTracker receives a call to its submitJob() method, it places the job in an internal queue, from which the job scheduler picks it up and initializes it. Initialization creates an object that represents the running job, together with its tasks and the bookkeeping records that let the system track the status and progress of those tasks.
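As a rough picture of queueing and initialization, here is a sketch under stated assumptions: the queue is served first-in first-out, one task record is created per input split, and the class and field names (MiniJobTracker, JobRecord, TaskRecord) are invented for this example rather than taken from Hadoop.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TaskRecord:
    """Bookkeeping for one task (illustrative fields only)."""
    task_id: str
    state: str = "pending"
    progress: float = 0.0


@dataclass
class JobRecord:
    """Bookkeeping object that represents one running job."""
    job_id: str
    tasks: list = field(default_factory=list)


class MiniJobTracker:
    def __init__(self):
        self.queue = deque()   # submitted jobs waiting to be initialized
        self.running = {}      # job_id -> JobRecord

    def submit_job(self, job_id, num_splits):
        """Accept a job and place it in the internal queue."""
        self.queue.append((job_id, num_splits))

    def schedule_next(self):
        """Take the oldest queued job (assumed FIFO) and initialize it."""
        job_id, num_splits = self.queue.popleft()
        tasks = [TaskRecord(f"{job_id}_m_{i}") for i in range(num_splits)]
        job = JobRecord(job_id, tasks)
        self.running[job_id] = job
        return job


if __name__ == "__main__":
    tracker = MiniJobTracker()
    tracker.submit_job("job_1", 3)
    print(tracker.schedule_next())
```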

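3. Task assignment

The TaskTracker runs a simple loop that periodically sends a "heartbeat" to the JobTracker. The heartbeat tells the JobTracker that the TaskTracker is still alive, and it also serves as the message channel between the two: when a TaskTracker reports that it is ready for work, the JobTracker can hand it a new task in its reply.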
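The heartbeat exchange can be modelled as one method call per beat. This sketch assumes that the reply to a heartbeat is where a task gets handed out and that a tracker counts as dead after a fixed silence; the 10-second timeout, the class name, and the method names are assumptions for illustration only.

```python
class HeartbeatJobTracker:
    def __init__(self, pending_tasks, timeout=10.0):
        self.pending = list(pending_tasks)   # tasks not yet assigned
        self.timeout = timeout               # hypothetical expiry window, in seconds
        self.last_seen = {}                  # tracker_id -> time of last heartbeat

    def heartbeat(self, tracker_id, now, ready_for_task):
        """Record that the tracker is alive; return a task if it asked for one."""
        self.last_seen[tracker_id] = now
        if ready_for_task and self.pending:
            return self.pending.pop(0)
        return None

    def live_trackers(self, now):
        """Trackers whose last heartbeat falls within the timeout window."""
        return sorted(
            tracker for tracker, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )


if __name__ == "__main__":
    jt = HeartbeatJobTracker(["map_0", "map_1"])
    print(jt.heartbeat("tt1", now=0.0, ready_for_task=True))
    print(jt.heartbeat("tt2", now=1.0, ready_for_task=False))
    print(jt.live_trackers(now=12.0))
```

Time is passed in explicitly rather than read from a clock, which keeps the sketch deterministic.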

Task execution

Once the TaskTracker has been assigned a task, the next step is to run it. First, it localizes the job JAR file by copying it from the shared file system to the file system of the machine the TaskTracker runs on. It also copies any files the application needs from the distributed cache to the local disk and unpacks them. It then creates a task instance and runs it.
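The localization step amounts to "copy the archive locally, then unpack it". Below is a sketch using only the Python standard library, with two temporary directories standing in for the shared file system and the TaskTracker's local disk. The directory layout, the file name job.jar, and the function names are assumptions; reading a file from the unpacked archive stands in for actually running a task.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def localize_job(shared_dir, local_dir, jar_name="job.jar"):
    """Copy the job archive from shared storage and unpack it locally."""
    shared_jar = Path(shared_dir) / jar_name
    local_jar = Path(local_dir) / jar_name
    work_dir = Path(local_dir) / "work"
    work_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(shared_jar, local_jar)             # copy from the shared file system
    with zipfile.ZipFile(local_jar) as archive:    # a JAR file is a ZIP archive
        archive.extractall(work_dir)
    return work_dir


def demo():
    """Build a tiny archive, localize it, and read one file back."""
    with tempfile.TemporaryDirectory() as shared, tempfile.TemporaryDirectory() as local:
        with zipfile.ZipFile(Path(shared) / "job.jar", "w") as jar:
            jar.writestr("task.txt", "hello from the job archive")
        work_dir = localize_job(shared, local)
        return (work_dir / "task.txt").read_text()


if __name__ == "__main__":
    print(demo())
```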

4. Progress and status

A MapReduce job is a long-running batch job whose running time ranges from a few seconds to several hours or more. Over such a span the user needs to keep track of how the job is progressing. A job and each of its tasks have a status, which includes the state of the job or task (running, succeeded, or failed), the progress of the map and reduce phases, the values of the job counters, and a status message or description. These statuses change continuously over the lifetime of the job.
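One way to picture these status fields is as a small record per task that gets rolled up into a job-level view. The roll-up rules below are simplifying assumptions for this sketch (the job fails as soon as any task is marked failed, progress is the plain average of task progress, and counters are summed); real Hadoop retries failed tasks and tracks map and reduce progress separately. All names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class TaskStatus:
    task_id: str
    state: State = State.RUNNING
    progress: float = 0.0                  # fraction complete, 0.0 to 1.0
    message: str = ""
    counters: dict = field(default_factory=dict)


def job_snapshot(tasks):
    """Roll task statuses up into one job-level view (expects at least one task)."""
    counters = {}
    for task in tasks:
        for name, value in task.counters.items():
            counters[name] = counters.get(name, 0) + value
    if any(t.state is State.FAILED for t in tasks):
        state = State.FAILED
    elif all(t.state is State.SUCCEEDED for t in tasks):
        state = State.SUCCEEDED
    else:
        state = State.RUNNING
    progress = sum(t.progress for t in tasks) / len(tasks)
    return {"state": state, "progress": progress, "counters": counters}


if __name__ == "__main__":
    tasks = [
        TaskStatus("m0", State.SUCCEEDED, 1.0, counters={"records": 10}),
        TaskStatus("m1", State.RUNNING, 0.5, counters={"records": 4}),
    ]
    print(job_snapshot(tasks))
```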

5. Job completion

When the JobTracker is notified that the last task of the job has completed, it changes the job's status to "successful". The next time the JobClient polls for status, it learns that the job has finished successfully, prints a message to tell the user, and then returns from the runJob() method.
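The "last task flips the job to successful" rule is simple enough to sketch directly. The class and function names are hypothetical, and the client loop here is driven by a list of completion events rather than by real polling, purely to keep the example self-contained.

```python
class JobCompletionTracker:
    def __init__(self, task_ids):
        self.remaining = set(task_ids)   # tasks that have not reported completion
        self.status = "running"

    def task_completed(self, task_id):
        """Mark one task done; the job succeeds when none remain."""
        self.remaining.discard(task_id)
        if not self.remaining:
            self.status = "succeeded"
        return self.status


def wait_for_completion(tracker, completions, report=print):
    """Check the job status after each completion event and report success."""
    for task_id in completions:
        if tracker.task_completed(task_id) == "succeeded":
            report("job completed successfully")
            return True
    return False


if __name__ == "__main__":
    wait_for_completion(JobCompletionTracker(["m0", "m1", "r0"]), ["m0", "m1", "r0"])
```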

That concludes this look at the advantages of Hadoop as a distributed system. I hope it has answered your questions. Theory is best learned together with practice, so go and try it out!
