2025-02-24 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article covers how to dynamically bring DataNode nodes online and offline in a Hadoop cluster and how the replica balancing mechanism works. These situations come up often in practice, so the steps below walk through how to handle them.
Test the project on the new server
Start the DataNode: sbin/hadoop-daemon.sh start datanode
Delete the tmp directory
This article explains in detail how to dynamically add nodes to a cluster in a Hadoop 2.6.0 environment, in three parts: basic preparation, adding a DataNode, and adding a NodeManager.
Basic preparation
The purpose of the basic preparation is to set up the system environment that Hadoop runs in.
Change the system hostname (with the hostname command and in /etc/sysconfig/network)
Edit the hosts file so that it lists every node in the cluster (keep the hosts file identical on all nodes)
Set up passwordless SSH login from the NameNode to the new DataNode (in an HA setup, from both NameNodes). Using the ssh-copy-id command avoids having to fix file permissions by hand after copying the *.pub file.
Edit the slaves file on the master node and add the new node's IP (it is used when the cluster is restarted)
Copy the Hadoop configuration files to the new node with scp
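The preparation steps that run on the NameNode can be sketched as a small script. This is only an illustration under assumptions: the host name datanode-new, the install path /opt/hadoop-2.6.0, the etc/hadoop location of the slaves file and the DRY_RUN switch are placeholders that do not come from the article, and changing the hostname on the new node itself is left out. By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only: NEW_NODE, HADOOP_HOME and DRY_RUN are illustrative assumptions.
NEW_NODE="${NEW_NODE:-datanode-new}"              # hypothetical name of the node being added
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop-2.6.0}"   # assumed install directory
DRY_RUN="${DRY_RUN:-1}"                           # 1 = print commands only, 0 = execute them

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

prepare_new_node() {
    # Passwordless SSH from this NameNode to the new node.
    run ssh-copy-id "$NEW_NODE"
    # Keep /etc/hosts identical on every node (needs write access on the target).
    run scp /etc/hosts "$NEW_NODE:/etc/hosts"
    # Register the new node in the slaves file read by the cluster start scripts.
    run sh -c "echo '$NEW_NODE' >> '$HADOOP_HOME/etc/hadoop/slaves'"
    # Ship the Hadoop configuration directory to the new node.
    run scp -r "$HADOOP_HOME/etc/hadoop" "$NEW_NODE:$HADOOP_HOME/etc/"
}

prepare_new_node
```

Running it with DRY_RUN=0 would execute the commands for real; in an HA setup the ssh-copy-id step would have to be repeated from the second NameNode.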
Add DataNode
To add the new DataNode to the cluster, start the datanode process on it.
On the new node, run sbin/hadoop-daemon.sh start datanode
Then check the cluster status on the NameNode with hdfs dfsadmin -report
Finally, configure HDFS balancing. The default balancer bandwidth is relatively low (1 MB/s in Hadoop 2.6.0), so raise it, for example to 64 MB/s: hdfs dfsadmin -setBalancerBandwidth 67108864 (the value is given in bytes per second).
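The value 67108864 is simply 64 MB expressed in bytes. A minimal sketch of the conversion follows; the helper name mb_to_bytes is made up for illustration, and the script only prints the resulting command rather than running it.

```shell
#!/bin/sh
# Convert a bandwidth in MB/s to the bytes-per-second value
# that hdfs dfsadmin -setBalancerBandwidth expects.
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

BANDWIDTH_MB=64
BANDWIDTH_BYTES=$(mb_to_bytes "$BANDWIDTH_MB")

# Print the command instead of executing it, so the sketch works without a cluster.
echo "hdfs dfsadmin -setBalancerBandwidth $BANDWIDTH_BYTES"
```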
The balancer's default threshold is 10%, meaning that each node's storage utilization may differ from the overall utilization of the cluster by no more than 10 percentage points. We can tighten it to 5%.
Then start the balancer with sbin/start-balancer.sh -threshold 5 and wait for the cluster to finish rebalancing.
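To make the threshold rule concrete, the check can be sketched as follows. This illustrates the rule as described here and is not the balancer's actual code; the function name is_balanced and the utilization figures in the usage comment are hypothetical.

```shell
#!/bin/sh
# is_balanced NODE_UTIL CLUSTER_UTIL THRESHOLD
# All three arguments are percentages. Succeeds (exit status 0) when the
# node's utilization is within THRESHOLD points of the cluster's utilization.
is_balanced() {
    awk -v n="$1" -v c="$2" -v t="$3" 'BEGIN {
        d = n - c
        if (d < 0) d = -d      # absolute difference
        exit !(d <= t)         # exit 0 when within the threshold
    }'
}

# Usage with made-up numbers: a freshly added node at 3% utilization
# in a cluster that is 50% full is far outside a 5% threshold.
#   is_balanced 3 50 5 || echo "node needs rebalancing"
```

A newly added, nearly empty node fails this check until the balancer has moved enough blocks onto it, which is why the balancer is run after the DataNode joins.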
Add NodeManager
Because Hadoop 2.x introduces the YARN framework, each compute node is managed by a NodeManager. As with the DataNode, starting the NodeManager process on the new node adds it to the cluster.
On the new node, run sbin/yarn-daemon.sh start nodemanager
On the ResourceManager, check the cluster status with yarn node -list
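The final check can be scripted as well. The sketch below assumes that each node line printed by yarn node -list starts with a host:port node ID followed by the node state; that column layout, the helper name node_is_running and the host name datanode-new are assumptions for illustration.

```shell
#!/bin/sh
# node_is_running HOST
# Reads the output of `yarn node -list` on stdin and succeeds when a line
# whose first column starts with "HOST:" has RUNNING in its second column.
node_is_running() {
    awk -v host="$1" '
        index($1, host ":") == 1 && $2 == "RUNNING" { found = 1 }
        END { exit !found }
    '
}

# Usage on the ResourceManager host (hypothetical host name):
#   yarn node -list | node_is_running datanode-new && echo "NodeManager joined"
```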