Hadoop distributed deployment


Roles in Hadoop Cluster:

HDFS:

NameNode (NN)

SecondaryNameNode (SNN)

DataNode (DN)

YARN:

ResourceManager

NodeManager

Considerations for Hadoop distributed deployment in a production environment:

HDFS Cluster:

The NameNode and SecondaryNameNode should be deployed on separate hosts, so that a single failure cannot take out both at once and leave the cluster unrecoverable.

There should be at least 3 DataNodes, because at least 3 replicas of each block should be kept (see the snippet after this list).

YARN Cluster:

The ResourceManager is deployed on a dedicated node.

NodeManagers run on the same hosts as the DataNodes.
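
For illustration only: in a production hdfs-site.xml the replication factor would typically be kept at the default of 3, along the lines of the snippet below (the test deployment later in this article lowers it to 2):

<property>
    <name>dfs.replication</name>
    <value>3</value>
</property>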

[Figure: Hadoop cluster architecture]

For this test-environment deployment, the NameNode, SecondaryNameNode, and ResourceManager roles were all placed on the same master node.

The three slave nodes each run a DataNode and a NodeManager.

1. Configure the hosts file

Append the following to the /etc/hosts file on node1, node2, node3, and node4:

172.16.2.3 node1.hadooptest.com node1 master
172.16.2.14 node2.hadooptest.com node2
172.16.2.60 node3.hadooptest.com node3
172.16.2.61 node4.hadooptest.com node4

2. Create hadoop users and groups

To start or stop the entire cluster from the master node, the user that runs the services on the master (such as hdfs or yarn) must also be able to connect to each slave node over SSH with key-based authentication.

Execute the following on node1, node2, node3, and node4:

useradd hadoop
echo 'pawssw0rd' | passwd --stdin hadoop

Log in to node1 and create a key

su - hadoop
ssh-keygen -t rsa

Upload the public key from node1 to node2, node3, and node4 respectively

ssh-copy-id -i .ssh/id_rsa.pub hadoop@node2
ssh-copy-id -i .ssh/id_rsa.pub hadoop@node3
ssh-copy-id -i .ssh/id_rsa.pub hadoop@node4

Note: the master node must also copy the public key to its own hadoop account; otherwise you will be prompted for a password when starting the SecondaryNameNode.

[hadoop@node1 hadoop]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@0.0.0.0

Test login from node1 to node2, node3, node4

[hadoop@OPS01-LINTEST01 ~]$ ssh node2 'date'
Tue Mar 27 14:26:10 CST 2018
[hadoop@OPS01-LINTEST01 ~]$ ssh node3 'date'
Tue Mar 27 14:26:13 CST 2018
[hadoop@OPS01-LINTEST01 ~]$ ssh node4 'date'
Tue Mar 27 14:26:17 CST 2018

3. Configure the hadoop environment

This needs to be done on node1, node2, node3, and node4.

vim /etc/profile.d/hadoop.sh

export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
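
These variables only take effect in new login shells. A minimal sketch for picking them up immediately and sanity-checking the setup on each node (the hadoop version check only works once the archive has been unpacked under /bdapps/hadoop, which is done in the next step):

# load the new environment variables into the current shell
. /etc/profile.d/hadoop.sh
# confirm the hadoop command resolves and prints its version
hadoop version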

Node1 configuration

Create the data directories

[root@OPS01-LINTEST01 ~]# mkdir -pv /bdapps /data/hadoop/hdfs/{nn,snn,dn}
mkdir: created directory `/bdapps'
mkdir: created directory `/data/hadoop'
mkdir: created directory `/data/hadoop/hdfs'
mkdir: created directory `/data/hadoop/hdfs/nn'
mkdir: created directory `/data/hadoop/hdfs/snn'
mkdir: created directory `/data/hadoop/hdfs/dn'

Configure permissions

chown -R hadoop:hadoop /data/hadoop/hdfs/
cd /bdapps/
[root@OPS01-LINTEST01 bdapps]# ls
hadoop-2.7.5
[root@OPS01-LINTEST01 bdapps]# ln -sv hadoop-2.7.5 hadoop
[root@OPS01-LINTEST01 bdapps]# cd hadoop
[root@OPS01-LINTEST01 hadoop]# mkdir logs

Change the owner and group of all files in the hadoop directory to hadoop, and add group write permission to the logs directory.

[root@OPS01-LINTEST01 hadoop]# chown -R hadoop:hadoop ./*
[root@OPS01-LINTEST01 hadoop]# ll
total 140
drwxr-xr-x 2 hadoop hadoop  4096 Dec 16 09:12 bin
drwxr-xr-x 3 hadoop hadoop  4096 Dec 16 09:12 etc
drwxr-xr-x 2 hadoop hadoop  4096 Dec 16 09:12 include
drwxr-xr-x 3 hadoop hadoop  4096 Dec 16 09:12 lib
drwxr-xr-x 2 hadoop hadoop  4096 Dec 16 09:12 libexec
-rw-r--r-- 1 hadoop hadoop 86424 Dec 16 09:12 LICENSE.txt
drwxr-xr-x 2 hadoop hadoop  4096 Mar 27 14:51 logs
-rw-r--r-- 1 hadoop hadoop 14978 Dec 16 09:12 NOTICE.txt
-rw-r--r-- 1 hadoop hadoop  1366 Dec 16 09:12 README.txt
drwxr-xr-x 2 hadoop hadoop  4096 Dec 16 09:12 sbin
drwxr-xr-x 4 hadoop hadoop  4096 Dec 16 09:12 share
[root@OPS01-LINTEST01 hadoop]# chmod g+w logs

core-site.xml file configuration

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
        <final>true</final>
    </property>
</configuration>

yarn-site.xml file configuration

Note: yarn-site.xml holds the ResourceManager-related configuration. In a production environment, the ResourceManager and the NameNode should be deployed on separate machines, so the master referenced in this file would not be the same host as the master in core-site.xml. Because this is a simulated distributed deployment in a test environment, the NameNode and ResourceManager run on the same machine, so this file is configured on the NameNode server as well.

<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>

hdfs-site.xml file configuration

Modify dfs.replication, the number of replicas to keep:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/hdfs/nn</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/hdfs/dn</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
    <property>
        <name>fs.checkpoint.edits.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
</configuration>

mapred-site.xml file configuration

cp mapred-site.xml.template mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

slaves file configuration

node2
node3
node4

hadoop-env.sh file configuration

export JAVA_HOME=/usr/java/jdk1.8.0_151

Configure the node2, node3, and node4 nodes

# Create the hadoop installation directory, data directories and logs directory, and fix permissions
mkdir -pv /bdapps /data/hadoop/hdfs/{nn,snn,dn}
chown -R hadoop:hadoop /data/hadoop/hdfs/
tar zxf hadoop-2.7.5.tar.gz -C /bdapps/
cd /bdapps
ln -sv hadoop-2.7.5 hadoop
cd hadoop
mkdir logs
chmod g+w logs
chown -R hadoop:hadoop ./*

Configuration file modification

Since the Hadoop configuration files were already modified on the master node (node1), we can copy them directly from the master node to node2, node3, and node4:

scp /bdapps/hadoop/etc/hadoop/* node2:/bdapps/hadoop/etc/hadoop/
scp /bdapps/hadoop/etc/hadoop/* node3:/bdapps/hadoop/etc/hadoop/
scp /bdapps/hadoop/etc/hadoop/* node4:/bdapps/hadoop/etc/hadoop/

Start Hadoop-related services

Master node

As in pseudo-distributed mode, the directory that holds the NameNode's data must be initialized before the HDFS cluster's NN is started. If the directory specified by the dfs.namenode.name.dir property in hdfs-site.xml does not exist, the format command creates it automatically; if it already exists, make sure its permissions are set correctly, because the format operation clears all data inside it and builds a new file system. Execute the following command on the master node as the user that runs HDFS (hadoop in this deployment):

hdfs namenode -format

There are two ways to start the cluster nodes:

1. Log in to each node and start its services manually (a sketch of this follows the next paragraph).

2. Start the whole cluster from the master node.

When the cluster is large, starting the services on each node individually is cumbersome, so Hadoop provides start-dfs.sh and stop-dfs.sh to start and stop the entire HDFS cluster, and start-yarn.sh and stop-yarn.sh to start and stop the entire YARN cluster.
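
For the first approach, a minimal sketch of starting the daemons node by node (assuming the environment configured above; hadoop-daemon.sh and yarn-daemon.sh are the per-daemon scripts shipped in Hadoop 2.x's sbin directory):

# on the master node, as the hadoop user
hadoop-daemon.sh start namenode
hadoop-daemon.sh start secondarynamenode
yarn-daemon.sh start resourcemanager
# on each slave node, as the hadoop user
hadoop-daemon.sh start datanode
yarn-daemon.sh start nodemanager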

[hadoop@node1 hadoop]$ start-dfs.sh
Starting namenodes on [master]
hadoop@master's password:
master: starting namenode, logging to /bdapps/hadoop/logs/hadoop-hadoop-namenode-node1.out
node4: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-node4.out
node2: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-node2.out
node3: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-node3.out
Starting secondary namenodes [0.0.0.0]
hadoop@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /bdapps/hadoop/logs/hadoop-hadoop-secondarynamenode-node1.out
[hadoop@node1 hadoop]$ jps
69127 NameNode
69691 Jps
69566 SecondaryNameNode

Log in to the DataNode nodes to view the processes

[root@node2 ~]# jps
66968 DataNode
67436 Jps
[root@node3 ~]# jps
109281 DataNode
109991 Jps
[root@node4 ~]# jps
108753 DataNode
109674 Jps

Stop the service of the entire cluster

[hadoop@node1 hadoop]$ stop-dfs.sh
Stopping namenodes on [master]
master: stopping namenode
node4: stopping datanode
node2: stopping datanode
node3: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode

Test

On the master node, upload a file

[hadoop@node1 ~]$ hdfs dfs -mkdir /test
[hadoop@node1 ~]$ hdfs dfs -put /etc/fstab /test/fstab
[hadoop@node1 ~]$ hdfs dfs -ls -R /test
-rw-r--r--   2 hadoop supergroup        223 2018-03-27 16:48 /test/fstab

Log in to node2

[hadoop@node2]$ ls /data/hadoop/hdfs/dn/current/BP-1194588190-172.16.2.3-1522138946011/current/finalized/
[hadoop@node2]$

There is no copy of the fstab file here.

Log in to node3 and you can see the fstab file

[hadoop@node3]$ cat /data/hadoop/hdfs/dn/current/BP-1194588190-172.16.2.3-1522138946011/current/finalized/subdir0/subdir0/blk_1073741825
UUID=dbcbab6c-2836-4ecd-8d1b-2da8fd160694 / ext4 defaults 1 1
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/vdb1 none swap sw 0 0

Log in to node4 and you can also see the fstab file

[hadoop@node4 root]$ cat /data/hadoop/hdfs/dn/current/BP-1194588190-172.16.2.3-1522138946011/current/finalized/subdir0/subdir0/blk_1073741825
UUID=dbcbab6c-2836-4ecd-8d1b-2da8fd160694 / ext4 defaults 1 1
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/vdb1 none swap sw 0 0

Conclusion: because dfs.replication is set to 2, copies of the file's block exist only on node3 and node4, not on node2.
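
The block placement can also be confirmed from the master node instead of inspecting the DataNode data directories by hand (a minimal sketch using the standard HDFS fsck tool):

# show the file's blocks and the DataNodes holding each replica
hdfs fsck /test/fstab -files -blocks -locations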

Start the YARN cluster

Log in to node1 (master) and execute start-yarn.sh

[hadoop@node1 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-resourcemanager-node1.out
node4: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-node4.out
node2: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-node2.out
node3: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-node3.out
[hadoop@node1 ~]$ jps
78115 ResourceManager
71574 NameNode
71820 SecondaryNameNode
78382 Jps

Log in to node2, execute the jps command, and you can see that the NodeManager service has been started

[ansible@node2 ~]$ sudo su - hadoop
[hadoop@node2 ~]$ jps
68800 DataNode
75400 Jps
74856 NodeManager
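
The same can be verified from the master node with the standard YARN CLI (a minimal sketch):

# lists the NodeManagers registered with the ResourceManager
yarn node -list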

View the Web UI console
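
Assuming the addresses configured above and the Hadoop 2.x default ports, the consoles should be reachable at URLs along these lines (port 8088 was set explicitly via yarn.resourcemanager.webapp.address; 50070 is the NameNode's default HTTP port in Hadoop 2.x):

http://master:8088    # YARN ResourceManager web UI
http://master:50070   # HDFS NameNode web UI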

Other reference documents:

http://www.codeceo.com/understand-hadoop-hbase-hive-spark-distributed-system-architecture.html
