This article explains the key points of Hadoop's federation mechanism. The explanation is kept simple and clear and is easy to follow; work through it step by step to understand how HDFS federation works.
I. Limitations and Shortcomings of Hadoop
The core components of Hadoop 1.0, MapReduce (MR) and HDFS, have several major shortcomings:
1) Low level of abstraction: even simple functions require a lot of code.
2) Limited expressiveness: MR reduces complex distributed programming to two functions, Map and Reduce, which is often not enough for production workloads.
3) Complex job dependencies are hard to manage: practical applications usually involve many cooperating jobs, and the dependencies between them are often intricate.
4) Low iteration efficiency: iterative tasks must repeatedly read and write HDFS files, which greatly reduces efficiency.
5) Wasted resources: Reduce tasks must wait for all Map tasks to complete before they can start.
6) Poor timeliness: MR is only suitable for offline batch processing.
II. How the Federation Is Implemented
A federation is formed from multiple NameNodes (NNs). The NNs are independent and do not need to call each other; together they make up the federation, and the DataNodes (DNs) they manage serve as shared storage for blocks. Federation introduces the concept of the block pool: each namespace has its own pool, and every DataNode stores blocks for all of the pools in the cluster. Block pools are managed independently, so a namespace does not need to coordinate with other namespaces when generating a block ID, and the failure of one namespace does not prevent the DataNodes from serving the other namespaces. A namespace together with its block pool forms a management unit: when a namespace is deleted, its block pool on the DataNodes is deleted as well, and when the cluster is upgraded, each of these units is upgraded independently. A clusterID is introduced to identify all the nodes in the cluster; it is generated when the first NameNode is formatted and is reused when formatting the other NameNodes in the same cluster.
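To make these identifiers concrete, here is a small sketch of how they can be inspected on disk; the directory paths below are placeholders for whatever dfs.namenode.name.dir and dfs.datanode.data.dir are set to in your cluster.
#On a NameNode, the VERSION file records the shared clusterID and this namespace's block pool ID
grep clusterID /data/dfs/name/current/VERSION
grep blockpoolID /data/dfs/name/current/VERSION
#On a DataNode, each block pool in the cluster gets its own BP-* directory under the data directory
ls /data/dfs/data/current/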
III. Main advantages:
Namespace scalability: the namespace scales horizontally by adding NameNodes, which in turn allows more DataNodes to be added to the cluster.
Performance: file system throughput is no longer limited by a single NameNode; adding NameNodes to the cluster extends read/write throughput.
Isolation: different types of applications can be isolated on different namespaces, which also gives some control over how resources are allocated.
IV. Configuration:
The federated configuration is backward compatible: a currently running single-NameNode environment can be converted to a federated environment without changing any existing configuration. The new configuration scheme also keeps the configuration files identical on every node in the cluster. The concept of a NameServiceID is introduced and used as a suffix on the NameNode configuration keys. Step 1: set the dfs.nameservices property so that the DataNodes can identify all of the NameNodes in the cluster. Step 2: for each NameNode, add its NameServiceID as a suffix to its configuration parameters.
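As a minimal sketch only (the nameservice names ns1 and ns2, the hostnames, and the ports are placeholders, and in a real deployment these properties would be merged into the existing hdfs-site.xml rather than replacing it), a federated configuration could look like this:
#Write a federated hdfs-site.xml that declares two nameservices, ns1 and ns2
cat > $HADOOP_CONF_DIR/hdfs-site.xml <<'EOF'
<configuration>
  <property><name>dfs.nameservices</name><value>ns1,ns2</value></property>
  <property><name>dfs.namenode.rpc-address.ns1</name><value>nn1.example.com:8020</value></property>
  <property><name>dfs.namenode.http-address.ns1</name><value>nn1.example.com:9870</value></property>
  <property><name>dfs.namenode.rpc-address.ns2</name><value>nn2.example.com:8020</value></property>
  <property><name>dfs.namenode.http-address.ns2</name><value>nn2.example.com:9870</value></property>
</configuration>
EOF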
V. Operation:
#Create a federation by formatting a NameNode; if no cluster ID is specified, one is generated automatically
$HADOOP_HOME/bin/hdfs namenode -format [-clusterId <cluster_id>]
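For example, when formatting additional NameNodes for an existing cluster, reuse the clusterID produced by the first format so that all NameNodes join the same federation (the ID below is a made-up placeholder):
$HADOOP_HOME/bin/hdfs namenode -format -clusterId CID-example-1234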
#Upgrade an older release to a federated cluster, assigning it a cluster ID
$HADOOP_HOME/bin/hdfs start namenode --config $HADOOP_CONF_DIR -upgrade -clusterId <cluster_id>
#Extend an existing federation with a new NameNode: refresh each DataNode so that it registers with the new NameNode
$HADOOP_HOME/bin/hdfs dfsadmin -refreshNamenodes <datanode_host>:<datanode_rpc_port>
#Withdraw a DataNode from the federation: distribute the exclude file to all NameNodes, then refresh them
$HADOOP_HOME/sbin/distribute-exclude.sh <exclude_file>
$HADOOP_HOME/sbin/refresh-namenodes.sh
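As a hedged end-to-end example (the exclude file path and the DataNode hostname are placeholders for your own environment):
#List the DataNode to retire, push the exclude file to every NameNode, then refresh them
echo dn1.example.com >> /etc/hadoop/conf/dfs.exclude
$HADOOP_HOME/sbin/distribute-exclude.sh /etc/hadoop/conf/dfs.exclude
$HADOOP_HOME/sbin/refresh-namenodes.sh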
What is CDH?
It is one of the many distributions of Hadoop. It is maintained by Cloudera, built on a stable version of Apache Hadoop, and integrates many patches, so it can be used directly in production environments.
Advantages of CDH:
Clear division between versions
Fast version updates
Support for Kerberos security authentication
Clear documentation
Support for multiple installation methods (Cloudera Manager, YUM, RPM, Tarball)
What is Cloudera Manager (CM)?
It is a tool that makes it easier to manage Hadoop clusters: it greatly simplifies the installation and configuration management of hosts and of services such as Hadoop, Hive and Spark in the cluster.
Cloudera Manager has four main functions:
(1) Management: Manage clusters, such as adding and deleting nodes.
(2) Monitoring: Monitor the health of the cluster and comprehensively track the configured metrics and the system's operating state.
(3) Diagnosis: Diagnose problems that occur in the cluster and suggest solutions for them.
(4) Integration: Integration of multiple components of Hadoop.
Thank you for reading. That covers the key points of Hadoop's federation mechanism; after working through this article you should have a deeper understanding of them, although how they behave in a specific environment still needs to be verified in practice.