
How to build a fully distributed Hadoop cluster

2025-01-15 Update From: SLTechnology News&Howtos

This article explains in detail how to build a fully distributed Hadoop cluster. The steps are practical, so they are shared here as a reference; I hope you get something out of them.

Hadoop distributed cluster build (environment: Linux virtual machines)

1. Preparation (plan the hostnames, IPs, and roles; build three machines first, then dynamically add the fourth)

(In the role column, namenode, secondaryNamenode, and jobTracker can also be deployed on separate machines, depending on actual needs; the layout below is not the only option.)

Hostname   IP              Role
cloud01    192.168.1.101   namenode/secondaryNamenode/jobTracker
cloud02    192.168.1.102   datanode/taskTracker
cloud03    192.168.1.103   datanode/taskTracker
cloud04    192.168.1.104   datanode/taskTracker (added in step 5)
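A sketch of the /etc/hosts entries implementing the plan above (same lines on every machine, using the IPs listed):

```
192.168.1.101 cloud01
192.168.1.102 cloud02
192.168.1.103 cloud03
192.168.1.104 cloud04
```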

2. Configure the Linux environment (refer to the pseudo-distributed build)

2.1 Modify the hostname (cloud01, cloud02, cloud03)

2.2 Modify each machine's IP (as planned above)

2.3 Modify the hostname-to-IP mapping

(Edit it on cloud01 only, then copy it to the other machines. The mapping lives in /etc/hosts, so the commands are:

scp /etc/hosts cloud02:/etc/
scp /etc/hosts cloud03:/etc/)

2.4 Turn off the firewall (on a CentOS-style system, e.g. service iptables stop and chkconfig iptables off)

2.5 Restart

3. Install the JDK (refer to the pseudo-distributed build; jdk1.6.0_45 is used as the example version)

Install it on one machine only, then copy it to the others (it is best to keep software under one directory).

For example, on cloud01 the JDK is installed under /soft/java.

(The commands

scp -r /soft/java/ cloud02:/soft/
scp -r /soft/java/ cloud03:/soft/

would copy the JDK over. But do not copy it yet; copy it together with Hadoop after the installation below.)
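The pseudo-distributed build this refers to also sets JAVA_HOME system-wide; a minimal sketch of the lines to append to /etc/profile (assuming the JDK was unpacked to /soft/java/jdk1.6.0_45), applied with `source /etc/profile`:

```shell
# Environment variables for the JDK; append to /etc/profile on each machine.
# The path assumes the JDK was unpacked to /soft/java/jdk1.6.0_45.
export JAVA_HOME=/soft/java/jdk1.6.0_45
export PATH=$JAVA_HOME/bin:$PATH
```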

4. Install the Hadoop cluster (hadoop-1.1.2 is used as the example version)

4.1 Upload the Hadoop package to the /soft directory and unpack it there (see the pseudo-distributed build)

4.2 Configure Hadoop (six files need to be configured this time)

4.2.1 hadoop-env.sh

On line 9, uncomment (remove the leading #) and set:

export JAVA_HOME=/soft/java/jdk1.6.0_45

4.2.2 core-site.xml

<property>
    <name>fs.default.name</name>
    <value>hdfs://cloud01:9000</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/soft/hadoop-1.1.2/tmp</value>
</property>

4.2.3 hdfs-site.xml

<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>

4.2.4 mapred-site.xml

<property>
    <name>mapred.job.tracker</name>
    <value>cloud01:9001</value>
</property>

4.2.5 masters (specifies the secondaryNamenode address)

cloud01

4.2.6 slaves (specifies the child nodes)

cloud02
cloud03

4.3 Copy the configured Hadoop to the other two machines

Copy the /soft folder directly (it contains both the JDK and Hadoop, which is why keeping software under one directory is highly recommended).

Commands:

scp -r /soft/ cloud02:/
scp -r /soft/ cloud03:/

4.4 Configure passwordless ssh login

Only logins from the primary node to the child nodes need to be passwordless, i.e. from cloud01 to cloud02 and cloud03, so the key pair only needs to be generated on cloud01.

Command: ssh-keygen -t rsa

Then copy the public key to the other two machines.

Commands: ssh-copy-id -i cloud02
          ssh-copy-id -i cloud03
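A non-interactive variant of the key setup, useful for scripting (a sketch; -P '' gives the key an empty passphrase, the usual choice on a closed cluster network):

```shell
# Generate the RSA key pair once on cloud01 (skipped if one already exists).
mkdir -p "$HOME/.ssh"
chmod 700 "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -P '' -f "$HOME/.ssh/id_rsa"
# Then push the public key to each child node:
#   ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" cloud02
#   ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" cloud03
```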

4.5 Format HDFS

It only needs to be formatted on cloud01 (the master node, where the namenode runs).

Command: hadoop namenode -format

4.6 Verification

Start the cluster: start-all.sh

If a safemode-related exception occurs during startup, execute:

hadoop dfsadmin -safemode leave (exit safe mode)

and start Hadoop again.

Then run jps on each machine and check that the running daemons match the planned roles.

If everything matches the plan, the build is done.
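The jps check can be scripted; a minimal sketch (the daemon names are the hadoop-1.1.2 ones; `check_daemons` and the expected lists are illustrative, adjust them to your layout):

```shell
# check_daemons "Daemon1 Daemon2 ..." reads jps output on stdin and
# reports any expected daemon that is not running.
check_daemons() {
  jps_out=$(cat)
  missing=""
  for d in $1; do
    echo "$jps_out" | grep -qw "$d" || missing="$missing $d"
  done
  if [ -z "$missing" ]; then echo "OK"; else echo "missing:$missing"; fi
}
# on cloud01:        jps | check_daemons "NameNode SecondaryNameNode JobTracker"
# on cloud02/03:     jps | check_daemons "DataNode TaskTracker"
```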

5. Add a node dynamically

(This is very common and practical in actual production.)

cloud04 192.168.1.104 datanode/taskTracker

5.1 Add a Linux machine by cloning (cloning cloud01 is used as the example; this would not happen in actual production, where virtual machines are rarely used and everything runs directly on servers. Note that the machine to be cloned must be stopped before cloning.)

5.2 Modify the hostname and IP address, configure the mapping file, turn off the firewall, add cloud04 to the Hadoop slaves file, set up passwordless login, and restart.

If you cloned the machine, the mapping file and firewall no longer need to be configured, because the cloned machine already has them.

5.3 After restarting the machine, start the datanode and taskTracker on it:

hadoop-daemon.sh start datanode
hadoop-daemon.sh start tasktracker

5.4 On cloud01, the node where the namenode runs, refresh the node list:

hadoop dfsadmin -refreshNodes

5.5 Verification

Open http://192.168.1.101:50070 (the HDFS management interface on the namenode)

and check whether one more node has appeared; if so, the addition is done.

6. Delete a node (decommissioning)

6.1 modify the / soft/hadoop-1.1.2/conf/hdfs-site.xml file on cloud01

Add the configuration:

<property>
    <name>dfs.hosts.exclude</name>
    <value>/soft/hadoop-1.1.2/conf/excludes</value>
</property>

6.2 Determine the machines to be taken offline

The file defined by dfs.hosts.exclude lists the machines to be taken offline, one per line.
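For example, to take cloud04 back offline, /soft/hadoop-1.1.2/conf/excludes would contain the single line:

```
cloud04
```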

6.3 Force a configuration reload

Command: hadoop dfsadmin -refreshNodes

6.4 Watch the node shut down

Command: hadoop dfsadmin -report

shows the nodes currently connected to the cluster.

While decommissioning is in progress, it shows:

Decommission Status : Decommission in progress

When it has completed, it shows:

Decommission Status : Decommissioned
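With several nodes decommissioning at once, scanning the full report by hand gets tedious. A minimal helper sketch (`decomm_status` is illustrative; it assumes the hadoop-1.x report layout, where each node's block starts with a `Name:` line):

```shell
# decomm_status HOST reads `hadoop dfsadmin -report` output on stdin and
# prints the Decommission Status line of the first block mentioning HOST.
decomm_status() {
  awk -v h="$1" '
    index($0, h)                  { found = 1 }      # entered the block for HOST
    found && /Decommission Status/ { print; exit }
  '
}
# usage: hadoop dfsadmin -report | decomm_status 192.168.1.104
```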

6.5 Edit the excludes file again

Once the machines have been decommissioned, they can be removed from the excludes file.

Log in to a decommissioned machine and you will find that the DataNode process is gone, but the TaskTracker still exists and has to be stopped by hand.

This concludes the article on how to build a fully distributed Hadoop cluster. I hope the content above is of some help; if you found the article useful, please share it so more people can see it.
