2025-03-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article explains the differences between Hadoop 1.x and Hadoop 2.x.
Background: why Hadoop 2.0 was developed
In Hadoop 1.0, both HDFS and MapReduce had problems with high availability and scalability.
Problems in HDFS
The NameNode is a single point of failure, which makes HDFS hard to use in online scenarios.
The NameNode is under heavy load and its memory is limited, which limits the scalability of the system.
Problems in MapReduce
The JobTracker is under heavy access load, which limits the scalability of the system.
It is difficult to support computing frameworks other than MapReduce, such as Spark and Storm.
HDFS 2.x
HDFS 2.x resolves the single point of failure and the memory limitation of HDFS 1.0.
Solving the single point of failure
HDFS HA: solved by running an active NameNode and a standby NameNode.
If the active NameNode fails, the system switches to the standby NameNode.
Reference: HDFS High Availability Using the Quorum Journal Manager
Reference: ZooKeeper Getting Started Guide
Solving the memory limitation
HDFS Federation
Scales horizontally by supporting multiple NameNodes.
Each NameNode manages part of the directory tree.
All NameNodes share the storage resources of all DataNodes.
In 2.x only the architecture has changed; the way HDFS is used remains the same.
The change is transparent to HDFS users.
The commands and APIs of HDFS 1.x can still be used.
Active and standby NameNodes
This pair solves the single point of failure.
The active NameNode serves clients; the standby NameNode synchronizes the active NameNode's metadata so that it is ready to take over.
All DataNodes report block information to both NameNodes.
Two switching options
Manual switching: switch between active and standby with a command; suitable for situations such as HDFS upgrades. (X)
Automatic switching: based on ZooKeeper. (√)
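The synchronization described above can be pictured with a small toy model. This is only a sketch, not Hadoop code: `SharedEditLog`, `ToyNameNode`, and `send_block_report` are hypothetical names, and the shared edit log is assumed to stand in for the journal storage mentioned in the Quorum Journal Manager reference. It shows why the standby can take over quickly: it replays the same edits and receives the same block reports as the active NameNode.

```python
class SharedEditLog:
    """Hypothetical stand-in for shared edit storage that both NameNodes can read."""
    def __init__(self):
        self.entries = []


class ToyNameNode:
    """Toy NameNode that keeps a namespace and a block map in memory."""
    def __init__(self, name, edit_log):
        self.name = name
        self.edit_log = edit_log
        self.namespace = set()   # file paths known to this NameNode
        self.block_map = {}      # block id -> set of DataNode names
        self.applied = 0         # number of edit-log entries already applied

    def create_file(self, path):
        # Intended for the active NameNode only: log the edit, then apply it.
        self.edit_log.entries.append(("create", path))
        self.catch_up()

    def catch_up(self):
        # Replay edits that have not been applied yet (the standby's job).
        for op, path in self.edit_log.entries[self.applied:]:
            if op == "create":
                self.namespace.add(path)
        self.applied = len(self.edit_log.entries)

    def receive_block_report(self, datanode, block_ids):
        for block_id in block_ids:
            self.block_map.setdefault(block_id, set()).add(datanode)


def send_block_report(datanode, block_ids, namenodes):
    # A DataNode reports its blocks to every NameNode, active and standby alike.
    for namenode in namenodes:
        namenode.receive_block_report(datanode, block_ids)


log = SharedEditLog()
active = ToyNameNode("nn1", log)
standby = ToyNameNode("nn2", log)

active.create_file("/data/a.txt")
send_block_report("dn1", ["blk_1"], [active, standby])
standby.catch_up()

print(standby.namespace == active.namespace)   # both know the same files
print(standby.block_map == active.block_map)   # both know the same block locations
```

Because the standby already holds both the namespace and the block locations, a switch does not require rebuilding either from scratch.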
Automatic switching based on ZooKeeper
ZooKeeper Failover Controller (ZKFC): monitors the health of its NameNode and registers the NameNode with ZooKeeper.
When the active NameNode fails, the ZKFCs compete for a lock on behalf of their NameNodes; the NameNode whose ZKFC acquires the lock becomes active.
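The lock competition can be sketched as follows. This is a simplified in-memory model, not the ZooKeeper or ZKFC API: `ToyLockService` and `ToyFailoverController` are hypothetical names, and the health flag is set by hand instead of by a real health check.

```python
class ToyLockService:
    """Hypothetical stand-in for ZooKeeper: at most one lock holder at a time."""
    def __init__(self):
        self.holder = None

    def try_acquire(self, who):
        if self.holder is None:
            self.holder = who
            return True
        return False

    def release(self, who):
        if self.holder == who:
            self.holder = None


class ToyFailoverController:
    """Toy failover controller: one per NameNode."""
    def __init__(self, namenode, lock):
        self.namenode = namenode
        self.lock = lock
        self.healthy = True      # would come from a health check in a real system
        self.state = "standby"

    def tick(self):
        if not self.healthy:
            # An unhealthy NameNode gives up the lock so another can take over.
            self.lock.release(self.namenode)
            self.state = "failed"
        elif self.lock.try_acquire(self.namenode) or self.lock.holder == self.namenode:
            self.state = "active"
        else:
            self.state = "standby"


lock = ToyLockService()
zkfc1 = ToyFailoverController("nn1", lock)
zkfc2 = ToyFailoverController("nn2", lock)

for zkfc in (zkfc1, zkfc2):
    zkfc.tick()
states_before = [zkfc1.state, zkfc2.state]

zkfc1.healthy = False            # simulate a failure of nn1
for zkfc in (zkfc1, zkfc2):
    zkfc.tick()
states_after = [zkfc1.state, zkfc2.state]

print(states_before, states_after, lock.holder)
```

In this model the failed side releases the lock explicitly. A real deployment would more likely rely on the lock disappearing when the controller's ZooKeeper session ends, which this sketch does not attempt to reproduce.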
HDFS 2.x Federation
With multiple NameNodes/namespaces, the storage and management of metadata are distributed across several nodes, so the NameNode/namespace layer can scale horizontally by adding machines.
Federation spreads the load of a single NameNode across multiple nodes, so HDFS performance does not degrade as the data volume grows. Multiple namespaces can also isolate different types of applications: the HDFS storage and management for each type of application can be assigned to a different NameNode.
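One way to picture "each NameNode manages part of the directory tree" is a table that maps path prefixes to NameNodes. The sketch below is illustrative only: the table contents, the NameNode names, and the `namenode_for` helper are assumptions made for the example, not configuration or API taken from Hadoop.

```python
# Hypothetical mapping from directory prefix to the NameNode that owns it.
MOUNT_TABLE = {
    "/user": "nn1",
    "/logs": "nn2",
    "/warehouse": "nn3",
}


def namenode_for(path, mount_table=MOUNT_TABLE):
    """Return the NameNode responsible for a path (longest matching prefix wins)."""
    matches = [
        prefix for prefix in mount_table
        if path == prefix or path.startswith(prefix + "/")
    ]
    if not matches:
        raise KeyError(f"no NameNode is responsible for {path}")
    return mount_table[max(matches, key=len)]


print(namenode_for("/user/alice/a.txt"))
print(namenode_for("/logs/2025/03/31.log"))
```

Adding another NameNode then amounts to adding another entry to the table, which is the horizontal scaling described above. The DataNodes are not partitioned by this table; they remain shared by all NameNodes.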
YARN
YARN: Yet Another Resource Negotiator
YARN is the new resource management system introduced in Hadoop 2.0; it evolved directly from MRv1.
Core idea: separate the resource management and task scheduling functions of the MRv1 JobTracker and implement them in the ResourceManager and ApplicationMaster processes respectively.
ResourceManager: responsible for resource management and scheduling of the whole cluster; there is only one in the whole cluster.
ApplicationMaster: responsible for application-specific work such as task scheduling, task monitoring, and fault tolerance; each application has its own ApplicationMaster.
With the introduction of YARN, multiple computing frameworks can run in one cluster.
Several computing frameworks can currently run on YARN, such as MapReduce, Spark, and Storm.
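The split of responsibilities can be sketched with a toy model in which one ResourceManager only hands out containers, while each ApplicationMaster decides which of its own tasks to run in them. The class names, the container counts, and the round-based loop are all assumptions made for illustration; none of this is the YARN API.

```python
class ToyResourceManager:
    """Cluster-wide resource bookkeeping; knows nothing about tasks."""
    def __init__(self, total_containers):
        self.free = total_containers

    def allocate(self, app_id, requested):
        granted = min(requested, self.free)
        self.free -= granted
        return granted

    def release(self, count):
        self.free += count


class ToyApplicationMaster:
    """Per-application scheduler; knows its tasks but owns no resources."""
    def __init__(self, app_id, framework, tasks):
        self.app_id = app_id
        self.framework = framework
        self.pending = list(tasks)
        self.done = []

    def run_tasks(self, rm):
        # Ask the ResourceManager for containers, then schedule our own tasks in them.
        granted = rm.allocate(self.app_id, len(self.pending))
        running, self.pending = self.pending[:granted], self.pending[granted:]
        self.done.extend(running)   # tasks are assumed to finish within the round
        return granted


rm = ToyResourceManager(total_containers=5)
mr = ToyApplicationMaster("app_1", "MapReduce", ["map0", "map1", "map2", "reduce0"])
spark = ToyApplicationMaster("app_2", "Spark", ["stage0-t0", "stage0-t1", "stage0-t2"])
apps = [mr, spark]

rounds = 0
while any(am.pending for am in apps):
    grants = [am.run_tasks(rm) for am in apps]
    for granted in grants:
        rm.release(granted)         # containers go back to the cluster after the round
    rounds += 1

print(rounds, mr.done, spark.done)
```

The ResourceManager never inspects a task, so a new framework only needs to supply its own ApplicationMaster, which is what lets several frameworks share one cluster.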
MapReduce on YARN
MapReduce running on YARN is called MRv2.
MapReduce jobs run directly on YARN rather than on the MRv1 system built from JobTracker and TaskTracker.
JobTracker and TaskTracker do not exist in Hadoop 2.0.
Basic functions of the MRv2 modules:
YARN: responsible for resource management and scheduling.
MRAppMaster: responsible for task splitting, task scheduling, task monitoring, and fault tolerance for one application / job.
Map/Reduce Task: the task execution engine, the same as in MRv1.
Each application / job (MapReduce job) has its own MRAppMaster.
The MRAppMaster handles everything related to its application / job, including distributing the resources it obtains from YARN to its internal tasks, task splitting, task monitoring, and fault tolerance.
If a single application / job fails, other applications / jobs are not affected, and YARN restarts it.
When a task fails, the MRAppMaster requests resources for it again.
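The task-level fault tolerance described above can be sketched as a retry loop. This is a simplified illustration, not MRAppMaster code: `run_job`, the stubbed `request_container` and `run_task` callbacks, and the `max_attempts` limit are all hypothetical.

```python
import itertools


def run_job(task_ids, run_task, request_container, max_attempts=3):
    """Run every task, requesting a fresh container each time a task is (re)tried."""
    attempts = {task: 0 for task in task_ids}
    completed = []
    pending = list(task_ids)
    while pending:
        task = pending.pop(0)
        container = request_container()   # ask the resource layer (stubbed) for resources
        attempts[task] += 1
        if run_task(task, container, attempts[task]):
            completed.append(task)
        elif attempts[task] < max_attempts:
            pending.append(task)          # failed: queue it to request resources again
        else:
            raise RuntimeError(f"{task} failed {max_attempts} times")
    return completed, attempts


# Stubs for the demo: containers are just numbered strings,
# and "map1" is made to fail on its first attempt only.
counter = itertools.count(1)


def request_container():
    return f"container_{next(counter)}"


def run_task(task, container, attempt):
    return not (task == "map1" and attempt == 1)


completed, attempts = run_job(["map0", "map1", "reduce0"], run_task, request_container)
print(completed, attempts)
```

Only the failed task is retried, and the retry runs in a newly requested container; the other tasks of the job are not repeated.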