Prerequisites: a zookeeper cluster is already built, a Java environment is installed, and the nodes have passwordless (key-based) SSH to each other.
Note: hostnames, IP addresses, and paths shown in this article should be adjusted to your actual environment.
For the zookeeper setup, refer to: zookeeper portal
Download the installation package
https://mirrors.tuna.tsinghua.edu.cn/apache/ is one of the download mirrors for the installation packages listed on the official website.
Structure
Hostname   | study           | centos156       | client
IP         | 192.168.233.155 | 192.168.233.156 | 192.168.233.158
Service 1  | zookeeper1      | zookeeper2      | zookeeper3
Service 2  | namenode        | namenode        |
Service 3  | datanode        | datanode        | datanode
Service 4  | journalnode     | journalnode     | journalnode
Service 5  | nodemanager     | nodemanager     | nodemanager
Service 6  | zkfc            | zkfc            |
Service 7  | ResourceManager |                 |
Service descriptions:
Zookeeper: distributed application coordination service.
Namenode: management service; manages metadata, maintains the directory tree, and responds to requests.
Datanode: stores the actual hadoop data.
Journalnode: shares namenode data and keeps it consistent between the namenodes.
ResourceManager: unified management and allocation of resources in the yarn cluster.
Nodemanager: the ResourceManager's agent on each machine.
Reference documentation: introduction to hadoop
Note: zookeeper and hadoop do not have to be installed on the same machines; the zookeeper addresses only need to be specified in the hadoop configuration files.
Start the installation on primary node 1: study
Check the java environment
java -version
If the version is displayed, the JDK is installed correctly.
# download the package (the link may change as the version is updated)
cd /tmp
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.5/hadoop-2.7.5.tar.gz
# extract the archive
tar axf hadoop-2.7.5.tar.gz -C /usr/local
# rename the directory to make configuration and management easier
cd /usr/local
rename hadoop-2.7.5 hadoop hadoop-2.7.5
# modify environment variables
vim /etc/profile
Add the following at the end of the file:
export HADOOP_HOME=/usr/local/hadoop
export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${PATH}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_YARN_HOME=${HADOOP_HOME}
export YARN_HOME=${HADOOP_HOME}
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export LD_LIBRARY_PATH=${HADOOP_HOME}/lib/native/:${LD_LIBRARY_PATH}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
# make it effective
source /etc/profile
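To confirm that the variables took effect, a quick check (both are standard commands; the path shown is the one used in this article):
echo $HADOOP_HOME      # should print /usr/local/hadoop
hadoop version         # should print the Hadoop 2.7.5 version banner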
# create a file storage directory
mkdir -p /usr/local/hadoop/{name,data,tmp,journal}
# modify configuration file
cd $HADOOP_HOME/etc/hadoop
# modify the slaves file to specify the slave servers
vim slaves
study
centos156
client
# modify core-site.xml to specify the hdfs cluster name, temporary file directory, zookeeper quorum, etc.
vim core-site.xml
fs.defaultFS = hdfs://hadoop
    The logical service name of HDFS; the name "hadoop" is arbitrary.
hadoop.tmp.dir = /usr/local/hadoop/tmp
    Hadoop temporary file directory; can also be written as file:/usr/local/hadoop/tmp.
io.file.buffer.size = 4096
    IO buffer size used when reading and writing files; can be set larger on capable machines.
ha.zookeeper.quorum = study:2181,centos156:2181,client:2181
    The zookeeper address list.
Reference documentation
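For reference, a minimal sketch of how these properties are laid out inside core-site.xml (values as used in this article):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>study:2181,centos156:2181,client:2181</value>
  </property>
</configuration>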
vim hdfs-site.xml
dfs.nameservices = hadoop
    The logical name of the HDFS nameservice; must match the logical service name in core-site.xml, so "hadoop" is used here.
dfs.ha.namenodes.hadoop = study,centos156
    The list of namenodes under the logical name "hadoop".
dfs.namenode.rpc-address.hadoop.study = study:9000
    RPC address of namenode "study" in nameservice "hadoop".
dfs.namenode.http-address.hadoop.study = study:50070
    HTTP address of namenode "study" in nameservice "hadoop".
dfs.namenode.rpc-address.hadoop.centos156 = centos156:9000
    RPC address of namenode "centos156" in nameservice "hadoop".
dfs.namenode.http-address.hadoop.centos156 = centos156:50070
    HTTP address of namenode "centos156" in nameservice "hadoop".
dfs.namenode.shared.edits.dir = qjournal://study:8485;centos156:8485;client:8485/hadoop
    The URI of the journalnodes; the active namenode writes its edit log to the journalnodes.
dfs.journalnode.edits.dir = /usr/local/hadoop/journal
    Directory used to store the edit log and other state information.
dfs.ha.automatic-failover.enabled = true
    Enables automatic failover; see the reference documentation for details.
dfs.client.failover.proxy.provider.hadoop = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
    Java class through which clients find the active NameNode.
dfs.ha.fencing.methods = sshfence
    Fencing method used to avoid split-brain in the HA cluster; only one namenode is allowed to write.
dfs.ha.fencing.ssh.private-key-files = /root/.ssh/id_rsa
    Location of the ssh private key used for fencing during failover; using a non-root user is recommended; this can be left unset.
dfs.ha.fencing.ssh.connect-timeout = 5000
    ssh connection timeout; can be omitted if the key above is not set.
dfs.namenode.name.dir = file:/usr/local/hadoop/name
    Namenode data storage directory.
dfs.datanode.data.dir = file:/usr/local/hadoop/data
    Datanode data storage directory.
dfs.replication = 2
    Number of replicas kept for each file in hdfs; 2 or 3 is typical, and it must not exceed the number of datanodes.
dfs.webhdfs.enabled = true
    Allow reading data through webhdfs.
Webhdfs detailed explanation
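As a sketch, the HA-related part of hdfs-site.xml looks like this; the remaining properties from the list above follow the same <property> pattern:
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.hadoop</name>
    <value>study,centos156</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.hadoop.study</name>
    <value>study:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.hadoop.study</name>
    <value>study:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://study:8485;centos156:8485;client:8485/hadoop</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.hadoop</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <!-- the centos156 rpc/http addresses, journalnode edits dir, fencing key and timeout,
       name/data directories, replication and webhdfs settings follow the same pattern -->
</configuration>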
vim mapred-site.xml
mapreduce.framework.name = yarn
    The MapReduce framework to use; yarn is normally used.
MapReduce detailed explanation
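A sketch of mapred-site.xml (in Hadoop 2.x this file usually has to be created first by copying mapred-site.xml.template):
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>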
vim yarn-site.xml
yarn.nodemanager.aux-services = mapreduce_shuffle
    Auxiliary service loaded by the nodemanager at startup; set to mapreduce_shuffle for MapReduce.
yarn.nodemanager.aux-services.mapreduce_shuffle.class = org.apache.hadoop.mapred.ShuffleHandler
    Java class implementing mapreduce_shuffle.
yarn.resourcemanager.hostname = study
    The resourcemanager node; it can be placed on a namenode host.
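A sketch of yarn-site.xml with the three properties above:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>study</value>
  </property>
</configuration>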
Modify JAVA_HOME in the environment scripts
vim hadoop-env.sh
export JAVA_HOME=/usr/local/jdk
Note: HADOOP_SSH_OPTS in this file holds the ssh options (including the port) hadoop uses when starting services on other nodes. If ssh does not listen on the default port 22, adjust this option, otherwise HA will not work.
vim yarn-env.sh
JAVA_HOME=/usr/local/jdk
Primary node 2: centos156
# copy the hadoop directory from the study node to /usr/local on centos156
scp -r study:/usr/local/hadoop /usr/local
# copy the environment variables
scp -r study:/etc/profile /etc/
source /etc/profile
# if hadoop will be run by a different user, adjust the ownership of the hadoop files accordingly
Slave node: client
# same operations as on primary node 2 (centos156)
scp -r study:/usr/local/hadoop /usr/local
scp -r study:/etc/profile /etc/
source /etc/profile
Start the hadoop cluster
The first time hadoop runs, the data has to be formatted, so the initial startup takes several steps; after that, start-all.sh starts the cluster and stop-all.sh stops it.
Prerequisites: zookeeper is running normally, the JDK works, and the environment variables are set correctly.
Primary node 1: study
# create the namespace in zookeeper
hdfs zkfc -formatZK
# start journalnode
hadoop-daemon.sh start journalnode    # preferably started on all three nodes
# format the namenode
hdfs namenode -format hadoop
# start namenode
hadoop-daemon.sh start namenode
# start zkfc
hadoop-daemon.sh start zkfc
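At this point the daemons on study can be checked with jps (a standard JDK tool); the PIDs below are only illustrative, and QuorumPeerMain is the zookeeper process:
jps
# roughly expected on study:
# 2101 QuorumPeerMain
# 2342 JournalNode
# 2480 NameNode
# 2603 DFSZKFailoverController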
Primary node 2: centos156
# start journalnode
hadoop-daemon.sh start journalnode
# copy the formatted metadata from the active namenode
hdfs namenode -bootstrapStandby
# start namenode
hadoop-daemon.sh start namenode
# start zkfc
hadoop-daemon.sh start zkfc
Slave node: client
# start journalnode
hadoop-daemon.sh start journalnode
After the commands above have been run on all three nodes, restart the hadoop cluster.
Shut down the cluster
stop-all.sh
Start the cluster
start-all.sh
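To confirm the HA state after a full start, the zkfc-managed state can be queried with haadmin (the service IDs are the namenode names configured in dfs.ha.namenodes.hadoop):
hdfs haadmin -getServiceState study        # prints active or standby
hdfs haadmin -getServiceState centos156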
Access the status page
http://192.168.233.155:50070/dfshealth.html#tab-overview
http://192.168.233.156:50070/dfshealth.html#tab-overview
The Datanodes tab on the page shows the status of each datanode, and the Utilities tab lets you browse files and logs.
Hadoop common commands
Shut down the cluster: stop-all.sh
Start the cluster: start-all.sh
Start namenode alone: hadoop-daemon.sh start namenode
Start datanode alone: hadoop-daemon.sh start datanode
Start journalnode alone: hadoop-daemon.sh start journalnode
Start zkfc alone: hadoop-daemon.sh start zkfc
Manually switch the active namenode: hdfs haadmin -transitionToActive --forcemanual study
List the files under /: hadoop fs -ls /
Upload a file to hadoop: hadoop fs -put
    e.g. hadoop fs -put /etc/passwd /
Create a directory in hadoop: hadoop fs -mkdir
    e.g. hadoop fs -mkdir /tmp
Create an empty file: hadoop fs -touchz
    e.g. hadoop fs -touchz /tmp/hello
View a file: hadoop fs -cat
    e.g. hadoop fs -cat /passwd
Move or rename: hadoop fs -mv
Download a file or directory from hadoop to local: hadoop fs -get
    e.g. hadoop fs -get /passwd /tmp
Modify file permissions: hadoop fs -chmod [-R]
    e.g. hadoop fs -chmod 777 /passwd
Delete a file: hadoop fs -rm
Delete a directory: hadoop fs -rm -r
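A short end-to-end example combining the commands above (the /demo path is just an example):
hadoop fs -mkdir /demo
hadoop fs -put /etc/passwd /demo
hadoop fs -ls /demo
hadoop fs -cat /demo/passwd
hadoop fs -get /demo/passwd /tmp
hadoop fs -rm -r /demo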
Hadoop data migration: backing up the data
mkdir /tmp/hadoop
chmod 777 /tmp/hadoop
hadoop fs -copyToLocal hdfs://study:9000/ /tmp/hadoop
Restoring the data
First transfer the files to the target machine with a USB drive or any other means, then:
hadoop fs -copyFromLocal /tmp/hadoop hdfs://study:9000/
hadoop fs -ls /
Error handling
Master: Host key verification failed.
Check whether the authorized_keys and known_hosts files contain entries for the host, and whether ssh to the hostname works.
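A minimal check from the study node, assuming the root user and the hostnames used in this article:
ssh centos156 hostname     # should print centos156 without any password or host-key prompt
ssh client hostname
# if a prompt appears, redistribute the key and accept the host key once:
ssh-copy-id root@centos156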
A master node in the hadoop cluster starts abnormally after a namenode went down.
2018-01-17 08:53:15 FATAL [hadoop1:16000.activeMasterManager] master.HMaster: Failed to become active master
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1774)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3850)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1011)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:843)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
Handling:
Check the status (on the web pages): study was found to be standby and centos156 active.
With one namenode active the cluster is still normal; if both are standby, the cluster is in an abnormal state.
An abnormal state can be caused by bad data, for example when a namenode has been down for a long time and its data is out of sync. Method 1: resynchronize the data from the centos156 node. Method 2: delete all hadoop data on every node and delete the hadoop-ha directory in zookeeper.
Method 2 is used here.
On all nodes, delete all files under the name, data, logs, and tmp directories in ${HADOOP_HOME} (see the sketch below).
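A minimal sketch of that cleanup, run on every node; the paths assume the directory layout used in this article, so double-check before deleting:
rm -rf ${HADOOP_HOME}/name/* ${HADOOP_HOME}/data/* ${HADOOP_HOME}/logs/* ${HADOOP_HOME}/tmp/*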
Delete data in zk
zkCli.sh
ls /
rmr /hadoop-ha
quit
Regenerate data
Primary node 1: study
# create the namespace in zookeeper
hdfs zkfc -formatZK
# start journalnode
hadoop-daemon.sh start journalnode    # preferably started on all three nodes
# format the namenode
hdfs namenode -format hadoop
# start namenode
hadoop-daemon.sh start namenode
# start zkfc
hadoop-daemon.sh start zkfc
Primary node 2: centos156
# start journalnode
hadoop-daemon.sh start journalnode
# copy the formatted metadata from the active namenode
hdfs namenode -bootstrapStandby
# start namenode
hadoop-daemon.sh start namenode
# start zkfc
hadoop-daemon.sh start zkfc
Slave node: client
# start journalnode
hadoop-daemon.sh start journalnode
Check the status: it should be active, since everything has been rebuilt.
Attention! Rebuilding the hadoop data leaves hbase unable to find its data through zookeeper. No way to recover it was found, so the only option was to delete the /hbase/table reference in zookeeper.
Reference links:
Hadoop introduction
Configuration file reference documentation
WebHDFS detailed explanation
Hadoop HDFS common file operation commands
Hadoop fs file permission management: https://www.cnblogs.com/linn/p/5526071.html