Dynamic addition of nodes in hadoop+Spark+hbase cluster 12/06 Update SLTechnology News&Howtos


2025-12-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/03 Report--

One of the advantages of a distributed system is dynamic scalability. That would not be possible if the cluster had to be restarted whenever a node was added. After looking into it, I found that no restart is needed: it is enough to start the following processes on the new nodes.

Taking Hadoop, Spark, and HBase as examples:

I. Adding DataNodes to Hadoop

Because Hadoop 1.x and 2.x differ considerably, I will use Hadoop 2.7 as the example.

From the NameNode, copy the hadoop-2.7 directory to the new node. Then delete the files in the data and logs directories on the new node, so that it does not start with another machine's state.
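The copy-and-clean step can be scripted from the NameNode. This is a minimal sketch, not the author's script. The host name `dn4`, the install path `/opt/hadoop-2.7`, passwordless SSH, rsync on both hosts, and `data/` and `logs/` sitting directly under the install directory are all assumptions. Setting `DRY_RUN=1` prints the commands instead of running them.

```bash
#!/usr/bin/env bash
# Minimal sketch. Assumptions: passwordless SSH, rsync on both hosts,
# data/ and logs/ directly under the Hadoop install directory.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# copy_hadoop_to HOST HADOOP_DIR
# e.g. copy_hadoop_to dn4 /opt/hadoop-2.7   (hypothetical host and path)
copy_hadoop_to() {
  local host=$1 home=$2
  # Copy the whole installation, including the etc/hadoop configuration.
  run rsync -a "$home/" "$host:$home/"
  # Delete the copied data and logs so the new DataNode starts clean.
  # The globs are quoted locally and expanded by the remote shell.
  run ssh "$host" "rm -rf $home/data/* $home/logs/*"
}
```

For example, `DRY_RUN=1 copy_hadoop_to dn4 /opt/hadoop-2.7` shows the two commands before anything is touched.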

1. Add an HDFS DataNode

Start the DataNode on the new node:

./sbin/hadoop-daemon.sh start datanode    # background mode
or
./bin/hdfs datanode    # console (foreground) mode

2. Start the DataNode automatically next time

On every machine in the cluster, add the new node's hostname to $HADOOP_HOME/etc/hadoop/slaves.
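Because the same edit has to be made on every machine, it can be scripted. The helpers below are hypothetical:

- `add_host` appends a hostname to a one-host-per-line file only if it is not already listed.
- `push_to_all` copies the file with `scp` to every host the file lists. It assumes passwordless SSH and the same `$HADOOP_HOME` path on every machine.

`DRY_RUN=1` prints the copy commands instead of running them.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for maintaining a slaves file (one hostname per line).

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# add_host FILE HOST: append HOST unless the file already contains it.
add_host() {
  local file=$1 host=$2
  touch "$file"
  if grep -qxF "$host" "$file"; then
    return 0                      # already listed, nothing to do
  fi
  # If the last line lacks a trailing newline, add one before appending.
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    echo >> "$file"
  fi
  echo "$host" >> "$file"
}

# push_to_all FILE: copy FILE to the same path on every host listed in it.
push_to_all() {
  local file=$1 h
  while read -r h || [ -n "$h" ]; do
    [ -n "$h" ] || continue       # skip blank lines
    # </dev/null stops scp/ssh from consuming the loop's input.
    run scp "$file" "$h:$file" < /dev/null
  done < "$file"
}
```

Typical use on the NameNode is `add_host "$HADOOP_HOME/etc/hadoop/slaves" dn4` followed by `push_to_all "$HADOOP_HOME/etc/hadoop/slaves"`.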

3. Refresh the node information:

./bin/hdfs dfsadmin -refreshNodes

4. Check the HDFS node status, i.e. how many DataNodes are live:

./bin/hdfs dfsadmin -report

5. After startup, balance the data:

./sbin/start-balancer.sh
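Steps 3 and 4 can be combined into a small check that waits until the new DataNode is reported live. This sketch assumes the Hadoop 2.x report format, in which `hdfs dfsadmin -report` prints a line such as `Live datanodes (4):`. It also assumes `hdfs` is on the `PATH`. The function names and the 10-second polling interval are illustrative, not part of Hadoop.

```bash
#!/usr/bin/env bash
# Assumes the Hadoop 2.x `hdfs dfsadmin -report` line "Live datanodes (N):".

# live_datanodes < report: print N from the "Live datanodes (N):" line.
live_datanodes() {
  sed -n 's/^Live datanodes (\([0-9][0-9]*\)):.*$/\1/p' | head -n 1
}

# wait_for_datanodes WANT: refresh, then poll until at least WANT are live.
wait_for_datanodes() {
  local want=$1 n
  hdfs dfsadmin -refreshNodes
  while :; do
    n=$(hdfs dfsadmin -report 2>/dev/null | live_datanodes)
    if [ "${n:-0}" -ge "$want" ]; then
      echo "live datanodes: $n"
      return 0
    fi
    sleep 10
  done
}
```

For a cluster growing from three to four DataNodes, `wait_for_datanodes 4` returns once the new node has registered.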

Without balancing, data is distributed unevenly. The existing blocks stay on the old nodes while the new node starts out almost empty, which lowers cluster efficiency.

Check the status of the HDFS nodes:

hdfs dfsadmin -report

Set the balancer bandwidth in bytes per second, for example 1048576 (= 1 MB/s) or 104857600 (= 100 MB/s):

hdfs dfsadmin -setBalancerBandwidth 104857600

# Limits the bandwidth each DataNode may use to move block replicas during balancing. The default is 1 MB/s.
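The bandwidth argument is in bytes per second, so 100 MB/s is 100 × 1024 × 1024 = 104857600. The small helper below is hypothetical, not part of Hadoop; it saves doing this arithmetic by hand.

Keep in mind that the value set with `-setBalancerBandwidth` is not persistent. After a restart, DataNodes fall back to `dfs.datanode.balance.bandwidthPerSec` from `hdfs-site.xml`.

```bash
#!/usr/bin/env bash
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# mb_per_sec_to_bytes MB: convert MB/s (1 MB = 1024 * 1024 bytes) to bytes/s.
mb_per_sec_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# set_balancer_bandwidth_mb MB: set the per-DataNode balancing limit in MB/s.
set_balancer_bandwidth_mb() {
  run hdfs dfsadmin -setBalancerBandwidth "$(mb_per_sec_to_bytes "$1")"
}
```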

start-balancer.sh -threshold 1

# -threshold 1: a DataNode whose disk utilization is more than 1 percentage point above the cluster average is treated as over-utilized. Blocks are moved from it to DataNodes below the average until every node's utilization is within 1% of the average.
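To make the `-threshold` rule concrete, the sketch below classifies DataNodes the way the balancer's threshold test does:

- Cluster utilization is total used space divided by total capacity.
- A node more than THRESHOLD percentage points above that average is over-utilized.
- A node more than THRESHOLD points below it is under-utilized.

It is a simplified illustration, not the balancer's actual code. The input format (one `host used capacity` line per node) is an assumption.

```bash
#!/usr/bin/env bash
# classify_datanodes THRESHOLD < lines of "host used_bytes capacity_bytes"
# Prints "host utilization% class" with class over, under or within.
classify_datanodes() {
  awk -v t="$1" '
    NF >= 3 && $3 > 0 {
      n++; host[n] = $1; used[n] = $2; cap[n] = $3
      total_used += $2; total_cap += $3
    }
    END {
      if (n == 0) exit
      avg = 100 * total_used / total_cap      # cluster-wide utilization (%)
      for (i = 1; i <= n; i++) {
        u = 100 * used[i] / cap[i]
        if (u > avg + t)      cls = "over"     # source of blocks to move
        else if (u < avg - t) cls = "under"    # target for moved blocks
        else                  cls = "within"
        printf "%s %.1f %s\n", host[i], u, cls
      }
    }'
}
```

With three equal-sized nodes at 80%, 70% and 0% used, the average is 50%. A threshold of 1 marks both 80% and 70% as over-utilized, while a threshold of 25 leaves the 70% node within range. A lower threshold therefore means more data is moved.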

Or run the balancer with its default threshold (10%), and stop it if needed:

start-balancer.sh
stop-balancer.sh

6. Decommission a node

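In general, a DataNode should not simply be stopped to remove it. Decommission it from the cluster first, so that the NameNode re-replicates its blocks onto the remaining nodes.

Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml and set:

dfs.hosts = .../etc/hadoop/datanode-allow.list
dfs.hosts.exclude = .../etc/hadoop/datanode-deny.list

If dfs.hosts is set, every DataNode that should stay in the cluster, including newly added ones, must be listed in datanode-allow.list.

Add the hostname of the node to be decommissioned to datanode-deny.list.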
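Once the node is in the deny list and the NameNode has re-read it with `-refreshNodes`, decommissioning takes a while because the node's blocks are re-replicated elsewhere. The sketch below polls for completion before the DataNode is stopped.

It assumes the per-node section of the Hadoop 2.x `hdfs dfsadmin -report` output, where a `Hostname:` line is followed by `Decommission Status : ...`. The function names and the 30-second interval are hypothetical.

```bash
#!/usr/bin/env bash
# is_decommissioned HOST < report: succeed if HOST's section of the report
# says "Decommission Status : Decommissioned" (assumed 2.x report format).
is_decommissioned() {
  awk -v h="$1" '
    /^Hostname:/ { cur = $2 }
    /^Decommission Status/ && cur == h { seen = 1; ok = ($NF == "Decommissioned") }
    END { exit !(seen && ok) }'
}

# wait_for_decommission HOST: poll the report until HOST is decommissioned.
wait_for_decommission() {
  until hdfs dfsadmin -report 2>/dev/null | is_decommissioned "$1"; do
    sleep 30
  done
}
```

Running `wait_for_decommission dn4 && ./sbin/hadoop-daemon.sh stop datanode` (on dn4, a hypothetical host) stops the DataNode only after its data has been moved.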

Refresh the node information:

./bin/hdfs dfsadmin -refreshNodes

Wait until the node is shown in the Decommissioned state, then stop it:

./sbin/hadoop-daemon.sh stop datanode

7. Add a NodeManager task node

# launch:

./sbin/yarn-daemon.sh start nodemanager    # background mode
or
./bin/yarn nodemanager    # console (foreground) mode

# stop:

./sbin/yarn-daemon.sh stop nodemanager

8. Take the NameNode out of safe mode:

./bin/hadoop dfsadmin -safemode leave

II. Adding Worker nodes to Spark

1. Add a node by running the following on that node:

./sbin/start-slave.sh spark://:7077

This registers the new node with the master and adds it to the cluster.
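The worker only needs the master URL. The sketch below wraps the command and first checks that the URL has the form `spark://HOST:PORT`. The function name is hypothetical. It assumes `SPARK_HOME` points at the Spark installation, falling back to the current directory as in the `./sbin` command. `DRY_RUN=1` prints the command instead of running it.

```bash
#!/usr/bin/env bash
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# start_spark_worker MASTER_URL, where MASTER_URL looks like spark://HOST:PORT
start_spark_worker() {
  local url=$1
  # Require a non-empty host and a numeric port before starting anything.
  if [[ ! $url =~ ^spark://[^:/]+:[0-9]+$ ]]; then
    echo "expected spark://HOST:PORT, got '$url'" >&2
    return 1
  fi
  run "${SPARK_HOME:-.}/sbin/start-slave.sh" "$url"
}
```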

2. Verify that the new node has started

Run the jps command on the new node; the Worker process should be listed.
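The jps check can also be scripted, which is handy when verifying several nodes. `jps` prints one `PID ClassName` line per JVM. The hypothetical helpers below read that output on standard input and report whether each expected class is present. This works equally for `Worker`, `DataNode`, `NodeManager` or `HRegionServer`.

```bash
#!/usr/bin/env bash
# has_jvm_process NAME < jps output: succeed if a JVM with main class NAME runs.
has_jvm_process() {
  awk -v n="$1" '$2 == n { found = 1 } END { exit !found }'
}

# check_jps NAME... < jps output: print "NAME: running|missing" for each name;
# returns non-zero if any expected process is missing.
check_jps() {
  local out name rc=0
  out=$(cat)                      # read the jps output once
  for name in "$@"; do
    if printf '%s\n' "$out" | has_jvm_process "$name"; then
      echo "$name: running"
    else
      echo "$name: missing"
      rc=1
    fi
  done
  return "$rc"
}
```

On the new node, `jps | check_jps Worker` confirms the Spark worker, and `jps | check_jps DataNode NodeManager` covers the Hadoop daemons.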

3. Check the Spark UI

The new node should appear in the Workers list.

4. Stop the node:

./sbin/stop-slave.sh

The master's management UI will then show the node as "dead", and this entry remains until the master is restarted.

5. Start the new node automatically next time

Add the node's hostname to the $SPARK_HOME/conf/slaves file.

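III. Adding a RegionServer to HBase

1. Start the HRegionServer process:

hbase-daemon.sh start regionserver

2. Start the HQuorumPeer process (only if HBase manages ZooKeeper and this node should run a ZooKeeper peer):

hbase-daemon.sh start zookeeper

3. Check the cluster status: run hbase shell and enter status

4. Enable the load balancer: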

In the HBase shell, enter: balance_switch true
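The HBase steps can also be run non-interactively, because `hbase shell` reads commands from standard input. This sketch assumes `hbase-daemon.sh` and `hbase` are on the `PATH` of the new node. The function names are hypothetical, and `DRY_RUN=1` prints what would run instead of running it.

```bash
#!/usr/bin/env bash
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# hbase_shell CMD...: send each argument to the HBase shell as one line.
hbase_shell() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'hbase shell> %s\n' "$@"
  else
    printf '%s\n' "$@" | hbase shell
  fi
}

# add_regionserver: start the RegionServer on this node, then check the
# cluster status and switch the balancer on.
add_regionserver() {
  run hbase-daemon.sh start regionserver
  hbase_shell "status" "balance_switch true"
}
```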
