
How to dynamically add and remove DataNode nodes and how the replica balancing mechanism works

2025-02-24 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)05/31 Report--

This article introduces how to dynamically add and remove DataNode nodes and how the HDFS replica balancing mechanism works. These are situations many people run into in practice, so follow the steps below carefully and you should come away with something useful.

Quick notes for testing on the new server: start the DataNode with sbin/hadoop-daemon.sh start datanode, and remove the copied tmp directory (Hadoop's local data directory) first so the new node starts with a clean storage state.

The main body of this article explains in detail how to dynamically add nodes to a cluster in a Hadoop 2.6.0 environment, in three parts: basic preparation, adding a DataNode, and adding a NodeManager.

Basic preparation

The basic preparation step sets up the system environment in which Hadoop runs on the new node. The items are listed below, followed by a command sketch.

Modify the system hostname (via the hostname command and /etc/sysconfig/network)

Modify the hosts file to include every node in the cluster (keep /etc/hosts identical on all nodes)

Set up password-free SSH login from the NameNode (both NameNodes when HA is used) to the new DataNode (done with the ssh-copy-id command, which avoids the permission fixes needed after manually copying the *.pub file)

Modify the slaves file on the master node and add the new node's IP/hostname (this is only read when the cluster is restarted)

Copy (scp) the Hadoop configuration files to the new node
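A minimal sketch of the preparation steps above (the hostname newnode01, the user hadoop, and the example IP address are placeholders for illustration, not values from the original article):

# On the new node: set the hostname (CentOS 6 style, matching /etc/sysconfig/network above)
hostname newnode01
# persist it by editing /etc/sysconfig/network and setting HOSTNAME=newnode01

# On every node: keep /etc/hosts identical, e.g. add a line such as
# 192.168.1.110  newnode01

# On the NameNode(s): distribute the public key for password-free SSH
ssh-copy-id hadoop@newnode01

# On the master node: add newnode01 to etc/hadoop/slaves, then push the Hadoop config
scp -r $HADOOP_HOME/etc/hadoop/ hadoop@newnode01:$HADOOP_HOME/etc/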

Add DataNode

For the newly added node, starting the DataNode process is enough to join it to the cluster.

On the new node, just run sbin/hadoop-daemon.sh start datanode

Then check the cluster status with hdfs dfsadmin -report on the NameNode.
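A minimal sketch of these two steps, noting which machine each command runs on:

# On the new node: start the DataNode process
sbin/hadoop-daemon.sh start datanode

# On the NameNode: confirm the new node appears in the report
hdfs dfsadmin -report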

Finally, you also need to run HDFS load balancing. The default balancer transfer bandwidth is fairly low, so it can be raised to 64 MB/s: hdfs dfsadmin -setBalancerBandwidth 67108864.

The balancer's default threshold is 10%, meaning each node's storage utilization may differ from the cluster average by at most 10%. Here we tighten it to 5%.

Then start the Balancer with sbin/start-balancer.sh -threshold 5 and wait for the cluster to finish rebalancing.
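Putting the balancing steps together (67108864 = 64 x 1024 x 1024 bytes, i.e. 64 MB/s):

# Raise the balancer transfer bandwidth to 64 MB/s
hdfs dfsadmin -setBalancerBandwidth 67108864

# Start the balancer with a 5% utilization threshold and wait for it to finish
sbin/start-balancer.sh -threshold 5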

Add NodeManager

Hadoop 2.x introduces the YARN framework, so each compute node is managed by a NodeManager. As with the DataNode, the new node joins the cluster once its NodeManager process is started.

On the new node, run sbin/yarn-daemon.sh start nodemanager

On the ResourceManager, check the cluster status with yarn node -list.
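A minimal sketch for the NodeManager step:

# On the new node: start the NodeManager process
sbin/yarn-daemon.sh start nodemanager

# On the ResourceManager: verify the new node has registered
yarn node -list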

This concludes "How to dynamically add and remove DataNode nodes and how the replica balancing mechanism works". Thank you for reading.
