The kernel of Hadoop 1.0 consists of two branches: MapReduce and HDFS. As is well known, both systems share the same design flaw: a single point of failure. The two core services, MapReduce's JobTracker and HDFS's NameNode, are each a single point of failure, and this went unresolved for a long time, which restricted Hadoop to offline storage and offline computation.
Fortunately, these problems are thoroughly addressed in Hadoop 2.0. The Hadoop 2.0 kernel consists of three branches, HDFS, MapReduce, and YARN, on top of which the other systems in the Hadoop ecosystem, such as HBase, Hive, and Pig, are built. As of this article's publication, the single points of failure in these three subsystems have either been solved or are being solved (Hadoop HA). This article introduces the current progress and the specific solutions.
Before formally introducing the single-point-of-failure solutions, let us briefly review the three systems (all three use a simple master/slaves architecture, in which the master is a single point of failure).
(1) HDFS: a distributed storage system modeled on Google's GFS, consisting of NameNode and DataNode services. The NameNode stores the metadata (fsimage) and the operation log (edits). Because the NameNode is unique, its availability directly determines the availability of the whole storage system.
(2) YARN: the new resource management system introduced in Hadoop 2.0, which frees Hadoop from being limited to MapReduce computation and lets it support multiple computing frameworks. It consists of two types of services, ResourceManager and NodeManager; the ResourceManager, being unique in the whole system, is a single point of failure.
(3) MapReduce: there are currently two MapReduce implementations. The first is the standalone MapReduce, which consists of two types of services, JobTracker and TaskTracker; the JobTracker is a single point of failure. The second is MapReduce On YARN, in which each job has its own independent job manager (ApplicationMaster), so jobs no longer affect one another and there is no single point of failure. The single point of failure discussed in this article refers to the JobTracker in the first implementation.
First, the current progress on solving Hadoop's single points of failure. As of this article's publication, HDFS's single point of failure has been solved, with two feasible solutions available. The MapReduce single point of failure (JobTracker) has been solved by CDH4 (CDH4 packages both MRv1 and MRv2; the single point of failure here refers to MRv1's) and has been released. YARN's single point of failure has not yet been resolved, but a solution has been proposed; since it draws on the HDFS HA and MapReduce HA implementations, it should be resolved soon.
Generally speaking, the single-point-of-failure solutions for HDFS, MapReduce, and YARN in Hadoop share exactly the same architecture, divided into a manual mode and an automatic mode. In manual mode, an administrator switches the active and standby masters by command, which is useful during service upgrades; automatic mode reduces operation and maintenance costs but carries potential risks. The architectures of the two modes are as follows.
[Figure: manual-mode architecture]
[Figure: automatic-mode architecture]
A Hadoop HA deployment mainly consists of the following components:
(1) MasterHADaemon: runs in the same process as the Master service and can receive external RPC commands to start and stop the Master service.
(2) SharedStorage: a shared storage system. The active master writes its state to the shared storage, and the standby master reads that state to stay synchronized with the active master, thereby reducing switchover time. Common shared storage systems include ZooKeeper (adopted by YARN HA), NFS (adopted by HDFS HA), HDFS (adopted by MapReduce HA), and BookKeeper-like systems (adopted by HDFS HA).
(3) ZKFailoverController: a ZooKeeper-based failover controller composed of two core components, ActiveStandbyElector and HealthMonitor. ActiveStandbyElector interacts with the ZooKeeper cluster and determines whether the managed master enters the active or standby state by attempting to acquire a global lock (the election pattern is sketched after this list); HealthMonitor monitors each master's health so that switchovers can be triggered according to its state.
(4) ZooKeeper cluster: its core function is to guarantee, by maintaining a global lock, that the whole cluster has one and only one active master. If SharedStorage also uses ZooKeeper, it additionally records some state and runtime information.
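To make the global-lock idea concrete, here is a minimal election sketch using the plain ZooKeeper Java client. This illustrates the pattern only; it is not Hadoop's actual ActiveStandbyElector, which adds fencing data and a richer state machine. The znode path and session timeout are assumptions.

```java
import org.apache.zookeeper.*;

// Minimal leader-election sketch using a ZooKeeper ephemeral znode.
public class MasterElector implements Watcher {
    private static final String LOCK = "/active-master"; // hypothetical path
    private final ZooKeeper zk;

    public MasterElector(String zkQuorum) throws Exception {
        this.zk = new ZooKeeper(zkQuorum, 5000, this);
    }

    /** Try to take the global lock; return true if we became the active master. */
    public boolean tryBecomeActive(String masterId) throws Exception {
        try {
            // Ephemeral: the znode vanishes if our session dies,
            // releasing the lock automatically.
            zk.create(LOCK, masterId.getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            return true;                 // we hold the lock: active master
        } catch (KeeperException.NodeExistsException e) {
            zk.exists(LOCK, true);       // standby: watch for lock release
            return false;
        }
    }

    @Override
    public void process(WatchedEvent event) {
        // When the active master's session dies, a NodeDeleted event
        // arrives here and a standby can call tryBecomeActive() again.
    }
}
```

The ephemeral node is what makes failover automatic: if the active master's process or session dies, ZooKeeper deletes the lock znode and the standbys' watches fire.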
In particular, an HA solution needs to address the following issues:
(1) Split-brain: during an active/standby switchover, an incomplete handover or other causes may lead clients and slaves to mistakenly believe there are two active masters, leaving the whole cluster in a chaotic state. Split-brain is usually prevented with a fencing (isolation) mechanism, which covers three aspects:
Shared storage fencing: make sure that only one Master writes data to shared storage.
Client fencing: make sure that only one Master can respond to the client's request.
Slave fencing: make sure that only one Master can issue commands to Slave.
The Hadoop common library provides two fencing implementations, sshfence and shellfence (the default). sshfence logs in to the target Master node via ssh and kills the process with the fuser command (locating the process PID by TCP port number, which is more accurate than the jps command); shellfence executes a user-defined shell command (script) to perform the isolation. A configuration sketch follows.
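As a reference point, HDFS HA selects fencing methods through the dfs.ha.fencing.methods property, normally set in hdfs-site.xml. The sketch below sets the same keys programmatically via Hadoop's Configuration API; the shell-script path is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: selecting fencing methods for HDFS HA. These keys normally
// live in hdfs-site.xml; the programmatic form is for illustration.
public class FencingConfigExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Try sshfence first (ssh in and fuser-kill the old active master),
        // then fall back to a user-supplied shell script if ssh fails.
        // Methods are separated by newlines; the script path is hypothetical.
        conf.set("dfs.ha.fencing.methods",
                 "sshfence\nshell(/path/to/my-fence.sh)");
        // sshfence needs a passphraseless private key on the controller host.
        conf.set("dfs.ha.fencing.ssh.private-key-files",
                 "/home/hdfs/.ssh/id_rsa");
        System.out.println(conf.get("dfs.ha.fencing.methods"));
    }
}
```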
(2) Transparency of the switchover: to keep the entire switchover transparent to the outside, Hadoop must ensure that all clients and slaves are automatically redirected to the new active master. This is usually achieved by having them retry against the new master after several failed attempts to connect to the old one, so the whole process involves some delay. In the newer Hadoop RPC, users can set parameters such as the client retry policy, the number of retries, and the retry timeout; a sketch of the HDFS client-side knobs follows.
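For HDFS HA specifically, client-side redirection is handled by a failover proxy provider plus retry/backoff parameters. A minimal sketch, assuming a nameservice named mycluster (the nameservice name and the values are illustrative; the property keys come from hdfs-default.xml):

```java
import org.apache.hadoop.conf.Configuration;

// Sketch: client-side failover settings for an HA HDFS nameservice.
public class ClientFailoverExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // "mycluster" is a hypothetical nameservice name.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        // How many times to fail over between NameNodes before giving up.
        conf.setInt("dfs.client.failover.max.attempts", 15);
        // Exponential backoff between retries (base and cap, in ms); this
        // backoff is the source of the switchover delay mentioned above.
        conf.setLong("dfs.client.failover.sleep.base.millis", 500);
        conf.setLong("dfs.client.failover.sleep.max.millis", 15000);
    }
}
```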
To make the general scheme above concrete, take MapReduce HA as an example. For an introduction to the HA scheme adopted in CDH4, see my article "Introduction to the JobTracker HA Scheme in CDH". The architecture diagram is as follows:
For the Hadoop 2.0 HDFS HA solution, see the article "Hadoop 2.0 NameNode HA and Federation Practice". HDFS2 currently provides two HA solutions: one based on NFS shared storage, and one based on a Paxos-family algorithm, the Quorum Journal Manager (QJM). QJM's basic principle is to store the EditLog on 2N+1 JournalNodes; each write is considered successful once a majority (>= N+1) of them acknowledge it, and no committed data is lost (the quorum arithmetic is sketched below). The community is also trying to use BookKeeper as the shared storage system. The HDFS HA architecture diagram given in HDFS-1623 is as follows:
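The quorum arithmetic behind QJM is worth spelling out: with 2N+1 JournalNodes, a write commits once N+1 of them have persisted it, so up to N JournalNodes can fail without losing committed edits. A small worked sketch:

```java
// Quorum arithmetic for a 2N+1-node JournalNode ensemble.
public class QuorumMath {
    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            int nodes = 2 * n + 1;   // ensemble size (3, 5, 7)
            int quorum = n + 1;      // acks required to commit a write
            int tolerated = n;       // failures survivable without data loss
            System.out.printf(
                "%d JournalNodes: commit after %d acks, tolerate %d failures%n",
                nodes, quorum, tolerated);
        }
    }
}
```

Any two quorums of size N+1 overlap in at least one node, which is why a majority-acknowledged edit can always be recovered after a failover.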
The slowest progress at present is on the YARN HA solution, which has a design document and is under specification and development; see https://issues.apache.org/jira/browse/YARN-149. Its overall architecture is similar to MapReduce HA's, but the shared storage system is ZooKeeper. A lightweight "storage system" such as ZooKeeper (note that ZooKeeper is designed not for storage but for distributed coordination; however, it can safely and reliably store small amounts of data, solving the problem of sharing data among multiple services in a distributed environment) suffices because most of YARN's information can be dynamically reconstructed from the heartbeats of NodeManagers and ApplicationMasters; the ResourceManager itself only needs to record a small amount of information in ZooKeeper. A configuration sketch of the ZooKeeper-backed scheme that later shipped follows.
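For reference, the ZooKeeper-backed ResourceManager HA described here eventually shipped in later Hadoop 2.x releases. A configuration sketch under that assumption (the property keys are from later yarn-site.xml documentation; the hostnames and ZooKeeper quorum are hypothetical):

```java
import org.apache.hadoop.conf.Configuration;

// Sketch of ZooKeeper-backed ResourceManager HA settings as exposed by
// later Hadoop 2.x releases (yarn-site.xml keys shown programmatically).
public class RmHaConfigExample {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
        conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
        conf.set("yarn.resourcemanager.hostname.rm1", "master1.example.com");
        conf.set("yarn.resourcemanager.hostname.rm2", "master2.example.com");
        // Persist the RM's small amount of state in ZooKeeper, as the
        // article describes; everything else is rebuilt from heartbeats.
        conf.setBoolean("yarn.resourcemanager.recovery.enabled", true);
        conf.set("yarn.resourcemanager.store.class",
            "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
        conf.set("yarn.resourcemanager.zk-address",
            "zk1:2181,zk2:2181,zk3:2181");
    }
}
```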
Generally speaking, the difficulty of an HA solution depends on how much information the Master itself records and whether that information can be reconstructed. If the recorded information is large and cannot be dynamically reconstructed, as with the NameNode, a shared storage system with high reliability and performance is needed. If the Master holds a lot of information but most of it can be dynamically reconstructed from the slaves, the HA solution is much easier; MapReduce and YARN are typical examples. From another angle, computing frameworks are not very sensitive to information loss (for example, lost information about a completed task can be recovered simply by recomputing it), which makes HA design for a computing framework much less difficult than for a storage system.
Hadoop HA configuration guides:
(1) HDFS HA: Hadoop 2.0 NameNode HA and Federation Practice
(2) MapReduce HA: Configuring JobTracker High Availability