Prerequisites: a zookeeper cluster is already built, a Java environment is installed, and the nodes have passwordless (key-based) SSH to each other.
Note: hostnames, IP addresses, and paths shown in this article should be adjusted to your actual environment.
For the zookeeper setup, refer to: zookeeper portal
Download the installation package
https://mirrors.tuna.tsinghua.edu.cn/apache/ is one of the download mirrors for the installation packages listed on the official website.
Structure
Hostname   | study           | centos156       | client
IP         | 192.168.233.155 | 192.168.233.156 | 192.168.233.158
Service 1  | zookeeper1      | zookeeper2      | zookeeper3
Service 2  | namenode        | namenode        |
Service 3  | datanode        | datanode        | datanode
Service 4  | journalnode     | journalnode     | journalnode
Service 5  | nodemanager     | nodemanager     | nodemanager
Service 6  | zkfc            | zkfc            |
Service 7  | ResourceManager |                 |
Service descriptions:
Zookeeper: distributed application coordination service.
Namenode: management service; manages metadata, maintains the directory tree, and responds to requests.
Datanode: stores the actual hadoop data.
Journalnode: shares namenode data and keeps it consistent between the namenodes.
ResourceManager: unified management and allocation of resources in the yarn cluster.
Nodemanager: the ResourceManager's agent on each machine.
Reference documentation: introduction to hadoop
Note: zookeeper and hadoop do not have to be installed on the same machines; the zookeeper addresses only need to be specified in the hadoop configuration files.
Start the installation on primary node 1: study
Check the java environment
java -version
If the version is displayed, the JDK is installed correctly.
# download the package (the link may change as the version is updated)
cd /tmp
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
# extract the archive
tar axf hadoop-2.7.5.tar.gz -C /usr/local
# rename the directory to make configuration and management easier
cd /usr/local
rename hadoop-2.7.5 hadoop hadoop-2.7.5
# modify environment variables
vim /etc/profile
Add the following at the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native/:${LD_LIBRARY_PATH}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
# make it effective
source /etc/profile
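To confirm that the variables took effect, a quick check (both are standard commands; the path shown is the one used in this article):
echo $HADOOP_HOME      # should print /usr/local/hadoop
hadoop version         # should print the Hadoop 2.7.5 version banner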
# create a file storage directory
mkdir -p /usr/local/hadoop/{name,data,tmp,journal}
# modify configuration file
cd $HADOOP_HOME/etc/hadoop
# modify the slaves file to specify the slave servers
vim slaves
study
centos156
client
# modify core-site.xml to specify the hdfs cluster name, temporary file directory, zookeeper quorum, etc.
vim core-site.xml
fs.defaultFS = hdfs://hadoop
    The logical service name of HDFS; the name "hadoop" is arbitrary.
hadoop.tmp.dir = /usr/local/hadoop/tmp
    Hadoop temporary file directory; can also be written as file:/usr/local/hadoop/tmp.
io.file.buffer.size = 4096
    IO buffer size used when reading and writing files; can be set larger on capable machines.
ha.zookeeper.quorum = study:2181,centos156:2181,client:2181
    The zookeeper address list.
Reference documentation
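For reference, a minimal sketch of how these properties are laid out inside core-site.xml (values as used in this article):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>study:2181,centos156:2181,client:2181</value>
  </property>
</configuration>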
vim hdfs-site.xml
dfs.nameservices = hadoop
    The logical name of the HDFS nameservice; must match the logical service name in core-site.xml, so "hadoop" is used here.
dfs.ha.namenodes.hadoop = study,centos156
    The list of namenodes under the logical name "hadoop".
dfs.namenode.rpc-address.hadoop.study = study:9000
    RPC address of namenode "study" in nameservice "hadoop".
dfs.namenode.http-address.hadoop.study = study:50070
    HTTP address of namenode "study" in nameservice "hadoop".
dfs.namenode.rpc-address.hadoop.centos156 = centos156:9000
    RPC address of namenode "centos156" in nameservice "hadoop".
dfs.namenode.http-address.hadoop.centos156 = centos156:50070
    HTTP address of namenode "centos156" in nameservice "hadoop".
dfs.namenode.shared.edits.dir = qjournal://study:8485;centos156:8485;client:8485/hadoop
    The URI of the journalnodes; the active namenode writes its edit log to the journalnodes.
dfs.journalnode.edits.dir = /usr/local/hadoop/journal
    Directory used to store the edit log and other state information.
dfs.ha.automatic-failover.enabled = true
    Enables automatic failover; see the reference documentation for details.
dfs.client.failover.proxy.provider.hadoop = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
    Java class through which clients find the active NameNode.
dfs.ha.fencing.methods = sshfence
    Fencing method used to avoid split-brain in the HA cluster; only one namenode is allowed to write.
dfs.ha.fencing.ssh.private-key-files = /root/.ssh/id_rsa
    Location of the ssh private key used for fencing during failover; using a non-root user is recommended; this can be left unset.
dfs.ha.fencing.ssh.connect-timeout = 5000
    ssh connection timeout; can be omitted if the key above is not set.
dfs.namenode.name.dir = file:/usr/local/hadoop/name
    Namenode data storage directory.
dfs.datanode.data.dir = file:/usr/local/hadoop/data
    Datanode data storage directory.
dfs.replication = 2
    Number of replicas kept for each file in hdfs; 2 or 3 is typical, and it must not exceed the number of datanodes.
dfs.webhdfs.enabled = true
    Allow reading data through webhdfs.
Webhdfs detailed explanation
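As a sketch, the HA-related part of hdfs-site.xml looks like this; the remaining properties from the list above follow the same <property> pattern:
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.hadoop</name>
    <value>study,centos156</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hadoop.study</name>
    <value>study:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hadoop.study</name>
    <value>study:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://study:8485;centos156:8485;client:8485/hadoop</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hadoop</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <!-- the centos156 rpc/http addresses, journalnode edits dir, fencing key and timeout,
       name/data directories, replication and webhdfs settings follow the same pattern -->
</configuration>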
vim mapred-site.xml
mapreduce.framework.name = yarn
    The MapReduce framework to use; yarn is normally used.
MapReduce detailed explanation
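A sketch of mapred-site.xml (in Hadoop 2.x this file usually has to be created first by copying mapred-site.xml.template):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>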
vim yarn-site.xml
yarn.nodemanager.aux-services = mapreduce_shuffle
    Auxiliary service loaded by the nodemanager at startup; set to mapreduce_shuffle for MapReduce.
yarn.nodemanager.aux-services.mapreduce_shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
    Java class implementing mapreduce_shuffle.
yarn.resourcemanager.hostname = study
    The resourcemanager node; it can be placed on a namenode host.
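A sketch of yarn-site.xml with the three properties above:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>study</value>
  </property>
</configuration>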
Modify JAVA_HOME in the environment scripts
vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk
Note: HADOOP_SSH_OPTS in this file holds the ssh options (including the port) hadoop uses when starting services on other nodes. If ssh does not listen on the default port 22, adjust this option, otherwise HA will not work.
vim yarn-env.sh
JAVA_HOME=/usr/local/jdk
Primary node 2: centos156
# copy the hadoop directory from the study node to /usr/local on centos156
scp -r study:/usr/local/hadoop /usr/local
# copy the environment variables
scp -r study:/etc/profile /etc/
source /etc/profile
# if hadoop will be run by a different user, adjust the ownership of the hadoop files accordingly
Slave node: client
# same operations as on primary node 2 (centos156)
scp -r study:/usr/local/hadoop /usr/local
scp -r study:/etc/profile /etc/
source /etc/profile
Start the hadoop cluster
The first time hadoop runs, the data has to be formatted, so the initial startup takes several steps; after that, start-all.sh starts the cluster and stop-all.sh stops it.
Prerequisites: zookeeper is running normally, the JDK works, and the environment variables are set correctly.
Primary node 1: study
# create the namespace in zookeeper
hdfs zkfc -formatZK
# start journalnode
hadoop-daemon.sh start journalnode    # preferably started on all three nodes
# format the namenode
hdfs namenode -format hadoop
# start namenode
hadoop-daemon.sh start namenode
# start zkfc
hadoop-daemon.sh start zkfc
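At this point the daemons on study can be checked with jps (a standard JDK tool); the PIDs below are only illustrative, and QuorumPeerMain is the zookeeper process:
jps
# roughly expected on study:
# 2101 QuorumPeerMain
# 2342 JournalNode
# 2480 NameNode
# 2603 DFSZKFailoverController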
Primary node 2: centos156
# start journalnode
hadoop-daemon.sh start journalnode
# copy the formatted metadata from the active namenode
hdfs namenode -bootstrapStandby
# start namenode
hadoop-daemon.sh start namenode
# start zkfc
hadoop-daemon.sh start zkfc
Slave node: client
# start journalnode
hadoop-daemon.sh start journalnode
After the commands above have been run on all three nodes, restart the hadoop cluster.
Shut down the cluster
stop-all.sh
Start the cluster
start-all.sh
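To confirm the HA state after a full start, the zkfc-managed state can be queried with haadmin (the service IDs are the namenode names configured in dfs.ha.namenodes.hadoop):
hdfs haadmin -getServiceState study        # prints active or standby
hdfs haadmin -getServiceState centos156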
Access the status page
http://192.168.233.155:50070/dfshealth.html#tab-overview
http://192.168.233.156:50070/dfshealth.html#tab-overview
The Datanodes tab on the page shows the status of each datanode, and the Utilities tab lets you browse files and logs.
Hadoop common commands
Shut down the cluster: stop-all.sh
Start the cluster: start-all.sh
Start namenode alone: hadoop-daemon.sh start namenode
Start datanode alone: hadoop-daemon.sh start datanode
Start journalnode alone: hadoop-daemon.sh start journalnode
Start zkfc alone: hadoop-daemon.sh start zkfc
Manually switch the active namenode: hdfs haadmin -transitionToActive --forcemanual study
List the files under /: hadoop fs -ls /
Upload a file to hadoop: hadoop fs -put
    e.g. hadoop fs -put /etc/passwd /
Create a directory in hadoop: hadoop fs -mkdir
    e.g. hadoop fs -mkdir /tmp
Create an empty file: hadoop fs -touchz
    e.g. hadoop fs -touchz /tmp/hello
View a file: hadoop fs -cat
    e.g. hadoop fs -cat /passwd
Move or rename: hadoop fs -mv
Download a file or directory from hadoop to local: hadoop fs -get
    e.g. hadoop fs -get /passwd /tmp
Modify file permissions: hadoop fs -chmod [-R]
    e.g. hadoop fs -chmod 777 /passwd
Delete a file: hadoop fs -rm
Delete a directory: hadoop fs -rm -r
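A short end-to-end example combining the commands above (the /demo path is just an example):
hadoop fs -mkdir /demo
hadoop fs -put /etc/passwd /demo
hadoop fs -ls /demo
hadoop fs -cat /demo/passwd
hadoop fs -get /demo/passwd /tmp
hadoop fs -rm -r /demo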
Hadoop data migration: backing up the data
mkdir /tmp/hadoop
chmod 777 /tmp/hadoop
hadoop fs -copyToLocal hdfs://study:9000/ /tmp/hadoop
Restoring the data
First transfer the files to the target machine with a USB drive or any other means, then:
hadoop fs -copyFromLocal /tmp/hadoop hdfs://study:9000/
hadoop fs -ls /
Error handling
Master: Host key verification failed.
Check whether the authorized_keys and known_hosts files contain entries for the host, and whether ssh to the hostname works.
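A minimal check from the study node, assuming the root user and the hostnames used in this article:
ssh centos156 hostname     # should print centos156 without any password or host-key prompt
ssh client hostname
# if a prompt appears, redistribute the key and accept the host key once:
ssh-copy-id root@centos156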
A master node in the hadoop cluster starts abnormally after a namenode went down.
2018-01-17 08:53:15 FATAL [hadoop1:16000.activeMasterManager] master.HMaster: Failed to become active master
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1774)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3850)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1011)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
Handling:
Check the status (on the web pages): study was found to be standby and centos156 active.
With one namenode active the cluster is still normal; if both are standby, the cluster is in an abnormal state.
An abnormal state can be caused by bad data, for example when a namenode has been down for a long time and its data is out of sync. Method 1: resynchronize the data from the centos156 node. Method 2: delete all hadoop data on every node and delete the hadoop-ha directory in zookeeper.
Method 2 is used here.
On all nodes, delete all files under the name, data, logs, and tmp directories in ${HADOOP_HOME} (see the sketch below).
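A minimal sketch of that cleanup, run on every node; the paths assume the directory layout used in this article, so double-check before deleting:
rm -rf ${HADOOP_HOME}/name/* ${HADOOP_HOME}/data/* ${HADOOP_HOME}/logs/* ${HADOOP_HOME}/tmp/*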
Delete data in zk
zkCli.sh
ls /
rmr /hadoop-ha
quit
Regenerate data
Primary node 1: study
# create the namespace in zookeeper
hdfs zkfc -formatZK
# start journalnode
hadoop-daemon.sh start journalnode    # preferably started on all three nodes
# format the namenode
hdfs namenode -format hadoop
# start namenode
hadoop-daemon.sh start namenode
# start zkfc
hadoop-daemon.sh start zkfc
Primary node 2: centos156
# start journalnode
hadoop-daemon.sh start journalnode
# copy the formatted metadata from the active namenode
hdfs namenode -bootstrapStandby
# start namenode
hadoop-daemon.sh start namenode
# start zkfc
hadoop-daemon.sh start zkfc
Slave node: client
# start journalnode
hadoop-daemon.sh start journalnode
Check the status: it should be active, since everything has been rebuilt.
Attention! Rebuilding the hadoop data leaves hbase unable to find its data through zookeeper. No way to recover it was found, so the only option was to delete the /hbase/table reference in zookeeper.
Reference links:
Hadoop introduction
Configuration file reference documentation
WebHDFS detailed explanation
Hadoop HDFS common file operation commands
Hadoop fs file permission management: https://www.cnblogs.com/linn/p/5526071.html