
Building a Hadoop 2.10 High Availability (HA) Cluster on CentOS 7


This article describes how to build a Hadoop 2.10 high-availability cluster on CentOS 7. First, prepare six machines: 2 namenodes (nn), 4 datanodes (dn), and 3 journalnodes (jn).

IP              hostname  processes
192.168.30.141  s141      nn1 (namenode), zkfc (DFSZKFailoverController), zk (QuorumPeerMain)
192.168.30.142  s142      dn (datanode), jn (journalnode), zk (QuorumPeerMain)
192.168.30.143  s143      dn (datanode), jn (journalnode), zk (QuorumPeerMain)
192.168.30.144  s144      dn (datanode), jn (journalnode)
192.168.30.145  s145      dn (datanode)
192.168.30.146  s146      nn2 (namenode), zkfc (DFSZKFailoverController)

The jps processes expected on each machine are listed in the table above.

Since I use VMware virtual machines, I configured one machine first and then cloned it to create the remaining machines, modifying the hostname and IP of each clone. This keeps the configuration uniform: every machine has the hdfs user and group added, the JDK environment configured, and Hadoop installed. This time the high-availability cluster is built under the hdfs user; for the base setup you can refer to: CentOS 7 build Hadoop 2.10 pseudo-distributed mode.
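As a rough sketch of that preparation step (run as root on the template machine before cloning; the JDK install path below is an assumption, adjust it to wherever your JDK actually lives):

groupadd hdfs
useradd -g hdfs hdfs
passwd hdfs
# assumed JDK location; append the variables to /etc/profile
echo 'export JAVA_HOME=/opt/soft/jdk1.8.0_201' >> /etc/profile
echo 'export PATH=$PATH:$JAVA_HOME/bin' >> /etc/profile
source /etc/profile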

Here are some steps and details for installing a highly available cluster:

1. Set up hostname and hosts for each machine

Modify the hosts file on each machine. Once hosts is set, machines can be reached by hostname, which is convenient. Modify it as follows:

127.0.0.1 localhost
192.168.30.141 s141
192.168.30.142 s142
192.168.30.143 s143
192.168.30.144 s144
192.168.30.145 s145
192.168.30.146 s146
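A minimal sketch for pushing the same /etc/hosts from s141 to the other machines (run as root; assumes root ssh access to the other hosts):

for h in s142 s143 s144 s145 s146; do
  scp /etc/hosts root@$h:/etc/hosts
done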

2. Set up passwordless ssh login. Since both s141 and s146 are namenodes, it is best for both the hdfs and root users on them to be able to log in to all machines without a password.

Since s141 is nn1 and s146 is nn2, both need to be able to log in to the other machines over ssh. So generate a key pair under the hdfs user on the s141 and s146 machines, send the s141 and s146 public keys to every machine (including themselves), and append them to the ~/.ssh/authorized_keys file.

Generate key pairs on s141 and s146 machines

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

Append the contents of id_rsa.pub to /home/hdfs/.ssh/authorized_keys on each of the s141-s146 machines. If a machine does not yet have an authorized_keys file, you can simply rename id_rsa.pub to authorized_keys; if the file already exists, append the contents of id_rsa.pub to it. Use scp to copy the public key to the other machines:

Copy the s141 public key to the other machines:

scp id_rsa.pub hdfs@s141:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s142:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s143:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s144:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s145:/home/hdfs/.ssh/id_rsa_141.pub
scp id_rsa.pub hdfs@s146:/home/hdfs/.ssh/id_rsa_141.pub

Copy the s146 public key to the other machines:

scp id_rsa.pub hdfs@s141:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s142:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s143:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s144:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s145:/home/hdfs/.ssh/id_rsa_146.pub
scp id_rsa.pub hdfs@s146:/home/hdfs/.ssh/id_rsa_146.pub

On each machine, use cat to append the public keys to the authorized_keys file:

cat id_rsa_141.pub >> authorized_keys
cat id_rsa_146.pub >> authorized_keys

At this point, the authorized_keys file permissions need to be changed to 644 (note that wrong permissions on this file are a common cause of passwordless ssh failures):

chmod 644 authorized_keys
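To verify passwordless login, run a quick check as the hdfs user on s141 (and again on s146); each command should print the remote hostname without asking for a password:

for h in s141 s142 s143 s144 s145 s146; do
  ssh hdfs@$h hostname
done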

3. Configure the Hadoop configuration files (${hadoop_home}/etc/hadoop/)

Configuration details:

Note: s141 and s146 must have exactly the same configuration, especially the ssh setup.

1) configure nameservice

[hdfs-site.xml]
<property><name>dfs.nameservices</name><value>mycluster</value></property>

2) dfs.ha.namenodes.[nameservice ID]

[hdfs-site.xml]
<property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>

3) dfs.namenode.rpc-address.[nameservice ID].[namenode ID]

Configure the RPC address for each nn.

[hdfs-site.xml]
<property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>s141:8020</value></property>
<property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>s146:8020</value></property>

4) dfs.namenode.http-address.[nameservice ID].[namenode ID]

Configure the webui port.

[hdfs-site.xml]
<property><name>dfs.namenode.http-address.mycluster.nn1</name><value>s141:50070</value></property>
<property><name>dfs.namenode.http-address.mycluster.nn2</name><value>s146:50070</value></property>

5) dfs.namenode.shared.edits.dir

The namenode shared edits directory. Select three journalnode nodes; here the three machines s142, s143, and s144 are used.

[hdfs-site.xml]
<property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://s142:8485;s143:8485;s144:8485/mycluster</value></property>

6) dfs.client.failover.proxy.provider.[nameservice ID]

Configure the Java class used for HA failover (this value is fixed); the HDFS client uses it to determine which namenode is active.

[hdfs-site.xml]
<property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>

7) dfs.ha.fencing.methods

A list of scripts or Java classes used to fence the previously active namenode during a failover.

[hdfs-site.xml]
<property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
<property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/hdfs/.ssh/id_rsa</value></property>

8) fs.defaultFS

Configure the hdfs file system name service. The mycluster here is the dfs.nameservices configured above

[core-site.xml]
<property><name>fs.defaultFS</name><value>hdfs://mycluster</value></property>

9) dfs.journalnode.edits.dir

Configure the local path where the JN stores edit logs.

[hdfs-site.xml]
<property><name>dfs.journalnode.edits.dir</name><value>/home/hdfs/hadoop/journal</value></property>

Full configuration files:

core-site.xml

<configuration>
    <property><name>fs.defaultFS</name><value>hdfs://mycluster/</value></property>
    <property><name>hadoop.tmp.dir</name><value>/home/hdfs/hadoop</value></property>
</configuration>

hdfs-site.xml

<configuration>
    <property><name>dfs.replication</name><value>3</value></property>
    <property><name>dfs.hosts</name><value>/opt/soft/hadoop/etc/dfs.include.txt</value></property>
    <property><name>dfs.hosts.exclude</name><value>/opt/soft/hadoop/etc/dfs.hosts.exclude.txt</value></property>
    <property><name>dfs.nameservices</name><value>mycluster</value></property>
    <property><name>dfs.ha.namenodes.mycluster</name><value>nn1,nn2</value></property>
    <property><name>dfs.namenode.rpc-address.mycluster.nn1</name><value>s141:8020</value></property>
    <property><name>dfs.namenode.rpc-address.mycluster.nn2</name><value>s146:8020</value></property>
    <property><name>dfs.namenode.http-address.mycluster.nn1</name><value>s141:50070</value></property>
    <property><name>dfs.namenode.http-address.mycluster.nn2</name><value>s146:50070</value></property>
    <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://s142:8485;s143:8485;s144:8485/mycluster</value></property>
    <property><name>dfs.client.failover.proxy.provider.mycluster</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
    <property><name>dfs.ha.fencing.methods</name><value>sshfence</value></property>
    <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/hdfs/.ssh/id_rsa</value></property>
    <property><name>dfs.journalnode.edits.dir</name><value>/home/hdfs/hadoop/journal</value></property>
</configuration>

mapred-site.xml

<configuration>
    <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>

yarn-site.xml

<configuration>
    <property><name>yarn.resourcemanager.hostname</name><value>s141</value></property>
    <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>

4. Deployment details

1) start the jn process on each of the jn nodes (s142, s143, s144)

hadoop-daemon.sh start journalnode

2) after starting jn, synchronize disk metadata between the two NN

A) if it is a brand-new cluster, format the file system first; this only needs to be executed on one nn.

[s141 | s146]

hadoop namenode -format

B) if you are converting a non-HA cluster to an HA cluster, copy the metadata of the original NN to the other NN.

1. Step one

On the s141 machine, copy the hadoop data to the directory corresponding to s146

scp -r /home/hdfs/hadoop/dfs hdfs@s146:/home/hdfs/hadoop/

2. Step two

Run the following command on the new nn (the unformatted nn, in my case s146) to bootstrap it as standby. Note: the s141 namenode must already be running (it can be started with: hadoop-daemon.sh start namenode).

hdfs namenode -bootstrapStandby

If the s141 namenode is not running, this command will fail.

So first start the s141 namenode by executing the following command on s141:

hadoop-daemon.sh start namenode

Then run the standby bootstrap command again. Note: when prompted whether to re-format, answer N.

3. Step three

Execute the following command on one of the NNs to initialize the shared edits directory and transfer the edit logs to the jn nodes.

hdfs namenode -initializeSharedEdits

If a java.nio.channels.OverlappingFileLockException error is reported during execution:

it means the namenode is still running; stop it first (hadoop-daemon.sh stop namenode) and then rerun the command.

After execution, check whether edit data exists on s142 and s143. Look at the generated mycluster directory under the journalnode edits dir; it should contain edit log data.
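For example, a quick check on the journal nodes (the path comes from dfs.journalnode.edits.dir configured above; the exact subdirectory layout may vary slightly):

ssh hdfs@s142 ls -R /home/hdfs/hadoop/journal/mycluster
ssh hdfs@s143 ls -R /home/hdfs/hadoop/journal/mycluster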

4. Step 4

Start all nodes.

Start the name node and all data nodes on s141:

hadoop-daemon.sh start namenode
hadoop-daemons.sh start datanode

Start the name node on s146

hadoop-daemon.sh start namenode

When you visit http://192.168.30.141:50070/ and http://192.168.30.146:50070/ in your browser, you will find that both namenodes are in the standby state.

At this point, you need to manually switch one of them to active with the following command; here s141 (nn1) is set to active:

hdfs haadmin -transitionToActive nn1

At this point, s141 is active.

Common hdfs haadmin commands:
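Some hdfs haadmin subcommands that are useful here (standard Hadoop 2.x admin CLI):

hdfs haadmin -getServiceState nn1      # print whether nn1 is active or standby
hdfs haadmin -transitionToActive nn1   # manually make nn1 active
hdfs haadmin -transitionToStandby nn1  # manually make nn1 standby
hdfs haadmin -failover nn1 nn2         # fail over from nn1 to nn2
hdfs haadmin -checkHealth nn1          # check the health of nn1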

At this point, HA with manual failover is configured and working, but it is not intelligent: it cannot detect a failure and switch over on its own. The automatic disaster recovery (automatic failover) configuration is introduced below.

5. Automatic disaster recovery configuration

Two components need to be introduced: a ZooKeeper quorum and the ZK failover controller (ZKFC).

Set up a ZooKeeper cluster on three machines: s141, s142, and s143. Download ZooKeeper from: http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.5.6

1) decompress zookeeper:

tar -xzvf apache-zookeeper-3.5.6-bin.tar.gz -C /opt/soft/zookeeper-3.5.6

2) configure environment variables: add the zk environment variables to /etc/profile and re-source the /etc/profile file.
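The lines added to /etc/profile might look like this (a sketch, assuming the install path from the tar command above; the variable name is just a convention):

export ZOOKEEPER_HOME=/opt/soft/zookeeper-3.5.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin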

source /etc/profile

3) configure the zk configuration file (zoo.cfg); it is identical on all three machines:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hdfs/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable autopurge feature
#autopurge.purgeInterval=1
server.1=s141:2888:3888
server.2=s142:2888:3888
server.3=s143:2888:3888

4) create a myid file on each of the three zk machines, as shown in the sketch below:

On s141, under /home/hdfs/zookeeper (the dataDir path configured in zoo.cfg), create a myid file containing the value 1 (corresponding to server.1 in zoo.cfg).

On s142, under /home/hdfs/zookeeper (the dataDir path configured in zoo.cfg), create a myid file containing the value 2 (corresponding to server.2 in zoo.cfg).

On s143, under /home/hdfs/zookeeper (the dataDir path configured in zoo.cfg), create a myid file containing the value 3 (corresponding to server.3 in zoo.cfg).
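A minimal sketch of creating the myid files, assuming the dataDir from zoo.cfg above:

# on s141
mkdir -p /home/hdfs/zookeeper && echo 1 > /home/hdfs/zookeeper/myid
# on s142
mkdir -p /home/hdfs/zookeeper && echo 2 > /home/hdfs/zookeeper/myid
# on s143
mkdir -p /home/hdfs/zookeeper && echo 3 > /home/hdfs/zookeeper/myid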

5) start zk on each of the three machines:

zkServer.sh start

If startup succeeds, the zk process (QuorumPeerMain) will appear in jps.
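To confirm, check jps on s141, s142, and s143, and ask each node for its role:

jps
zkServer.sh status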

Next, configure the hdfs-related settings:

1) stop all processes in hdfs

stop-all.sh

2) configure hdfs-site.xml to enable automatic disaster recovery.

[hdfs-site.xml]
<property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>

3) configure core-site.xml and specify the connection address of zk.

[core-site.xml]
<property><name>ha.zookeeper.quorum</name><value>s141:2181,s142:2181,s143:2181</value></property>

4) distribute the above two files to all nodes.
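A minimal sketch of distributing the two files from s141, assuming Hadoop is installed under /opt/soft/hadoop (as the dfs.hosts path above suggests) and the config dir is /opt/soft/hadoop/etc/hadoop:

for h in s142 s143 s144 s145 s146; do
  scp /opt/soft/hadoop/etc/hadoop/hdfs-site.xml /opt/soft/hadoop/etc/hadoop/core-site.xml hdfs@$h:/opt/soft/hadoop/etc/hadoop/
done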

5) on one of the NNs (s141), initialize the HA state in ZK:

hdfs zkfc -formatZK

If the command completes without errors, the initialization succeeded.

You can also check it in zk:
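For example, using the zk CLI (the znode path assumes the default parent /hadoop-ha and the mycluster nameservice):

zkCli.sh -server s141:2181
# then, at the zk prompt:
ls /hadoop-ha
ls /hadoop-ha/mycluster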

6) start the hdfs cluster

start-dfs.sh

Check the processes on each machine:
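A quick way to list the processes on every machine from s141 (a sketch; assumes jps is on the PATH for non-interactive ssh sessions):

for h in s141 s142 s143 s144 s145 s146; do
  echo "== $h =="
  ssh hdfs@$h jps
done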

Startup succeeded. Take a look at the webui again:

s146 is now in the active state.

s141 is in standby.

At this point, Hadoop HA with automatic failover has been built.

Summary

The above describes how to build Hadoop 2.10 high availability (HA) on CentOS 7. I hope it helps!
