Roles in a Hadoop cluster:
HDFS:
NameNode (NN)
SecondaryNameNode (SNN)
DataNode (DN)
YARN:
ResourceManager
NodeManager
Considerations for hadoop distributed deployment in a production environment:
HDFS Cluster:
NameNode and SecondaryNameNode should be deployed on separate hosts, so that a single failure cannot take down both at once and leave the metadata unrecoverable
There should be at least 3 DataNodes, because HDFS keeps 3 replicas of each block by default.
YARN Cluster:
ResourceManager is deployed on a separate node
NodeManager runs on the DataNode nodes
The Hadoop cluster architecture is shown in the following figure:
When I deployed the distributed cluster in the test environment, I placed the NameNode, SecondaryNameNode and ResourceManager roles together on the same server, the master node
The three slave nodes each run DataNode and NodeManager
1. Configure the hosts file
Append the following to the /etc/hosts file on node1, node2, node3 and node4:
172.16.2.3 node1.hadooptest.com node1 master
172.16.2.14 node2.hadooptest.com node2
172.16.2.60 node3.hadooptest.com node3
172.16.2.61 node4.hadooptest.com node4
2. Create hadoop users and groups
If you want to be able to start or stop the entire cluster from the master node, the users that run the services on the master node (such as hdfs and yarn) must also be able to log in to each slave node over ssh with key-based authentication
Execute on node1, node2, node3 and node4, respectively
useradd hadoop
echo 'pawssw0rd' | passwd --stdin hadoop
Log in to node1 and create a key
su - hadoop
ssh-keygen -t rsa
Upload the public key from node1 to node2, node3, and node4 respectively
ssh-copy-id -i .ssh/id_rsa.pub hadoop@node2
ssh-copy-id -i .ssh/id_rsa.pub hadoop@node3
ssh-copy-id -i .ssh/id_rsa.pub hadoop@node4
Note: the master node should also copy the public key to its own hadoop account; otherwise you will be prompted for a password when starting the SecondaryNameNode.
[hadoop@node1 hadoop]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@0.0.0.0
Test login from node1 to node2, node3, node4
[hadoop@OPS01-LINTEST01 ~]$ ssh node2 'date'
Tue Mar 27 14:26:10 CST 2018
[hadoop@OPS01-LINTEST01 ~]$ ssh node3 'date'
Tue Mar 27 14:26:13 CST 2018
[hadoop@OPS01-LINTEST01 ~]$ ssh node4 'date'
Tue Mar 27 14:26:17 CST 2018
3. Configure the hadoop environment
It needs to be executed on node1, node2, node3 and node4.
vim /etc/profile.d/hadoop.sh

export HADOOP_PREFIX=/bdapps/hadoop
export PATH=$PATH:${HADOOP_PREFIX}/bin:${HADOOP_PREFIX}/sbin
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
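These variables only take effect at the next login. If you want to use the hadoop commands in the current shell right away, you can simply source the file (a convenience step, not part of the original procedure):
source /etc/profile.d/hadoop.sh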
Node1 configuration
Create a directory
[root@OPS01-LINTEST01 ~]# mkdir -pv /bdapps /data/hadoop/hdfs/{nn,snn,dn}
mkdir: created directory `/bdapps'
mkdir: created directory `/data/hadoop'
mkdir: created directory `/data/hadoop/hdfs'
mkdir: created directory `/data/hadoop/hdfs/nn'
mkdir: created directory `/data/hadoop/hdfs/snn'
mkdir: created directory `/data/hadoop/hdfs/dn'
Configure permissions
chown -R hadoop:hadoop /data/hadoop/hdfs/
cd /bdapps/
[root@OPS01-LINTEST01 bdapps]# ls
hadoop-2.7.5
[root@OPS01-LINTEST01 bdapps]# ln -sv hadoop-2.7.5 hadoop
[root@OPS01-LINTEST01 bdapps]# cd hadoop
[root@OPS01-LINTEST01 hadoop]# mkdir logs
Change the owner and group of all files in the hadoop directory to hadoop, and give the group write permission on the logs directory
[root@OPS01-LINTEST01 hadoop]# chown -R hadoop:hadoop ./*
[root@OPS01-LINTEST01 hadoop]# ll
total 140
drwxr-xr-x 2 hadoop hadoop  4096 Dec 16 09:12 bin
drwxr-xr-x 3 hadoop hadoop  4096 Dec 16 09:12 etc
drwxr-xr-x 2 hadoop hadoop  4096 Dec 16 09:12 include
drwxr-xr-x 3 hadoop hadoop  4096 Dec 16 09:12 lib
drwxr-xr-x 2 hadoop hadoop  4096 Dec 16 09:12 libexec
-rw-r--r-- 1 hadoop hadoop 86424 Dec 16 09:12 LICENSE.txt
drwxr-xr-x 2 hadoop hadoop  4096 Mar 27 14:51 logs
-rw-r--r-- 1 hadoop hadoop 14978 Dec 16 09:12 NOTICE.txt
-rw-r--r-- 1 hadoop hadoop  1366 Dec 16 09:12 README.txt
drwxr-xr-x 2 hadoop hadoop  4096 Dec 16 09:12 sbin
drwxr-xr-x 4 hadoop hadoop  4096 Dec 16 09:12 share
[root@OPS01-LINTEST01 hadoop]# chmod g+w logs
core-site.xml file configuration
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:8020</value>
        <final>true</final>
    </property>
</configuration>
yarn-site.xml file configuration
Note: yarn-site.xml holds the ResourceManager-related configuration. In a production environment this role and the NameNode should be deployed separately, so the master in this file would not be the same machine as the master in core-site.xml. Because I am simulating a distributed deployment in a test environment,
NameNode and ResourceManager are deployed on the same machine, so this file is configured on the NameNode server.
<configuration>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.auxservices.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
</configuration>
hdfs-site.xml file configuration
Set dfs.replication to the number of replicas to keep
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/hdfs/nn</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/hdfs/dn</value>
    </property>
    <property>
        <name>fs.checkpoint.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
    <property>
        <name>fs.checkpoint.edits.dir</name>
        <value>file:///data/hadoop/hdfs/snn</value>
    </property>
</configuration>
mapred-site.xml file configuration
cp mapred-site.xml.template mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
slaves file configuration
node2
node3
node4
hadoop-env.sh file configuration
export JAVA_HOME=/usr/java/jdk1.8.0_151
Configure node2, node3, node4 nodes
# Create the hadoop installation directory, data directory and logs directory, and fix permissions
mkdir -pv /bdapps /data/hadoop/hdfs/{nn,snn,dn}
chown -R hadoop:hadoop /data/hadoop/hdfs/
tar zxf hadoop-2.7.5.tar.gz -C /bdapps/
cd /bdapps
ln -sv hadoop-2.7.5 hadoop
cd hadoop
mkdir logs
chmod g+w logs
chown -R hadoop:hadoop ./*
Configuration file changes
Since we have already modified the hadoop-related configuration files on the master node (node1), we can copy them directly from the master node to node2, node3 and node4
scp /bdapps/hadoop/etc/hadoop/* node2:/bdapps/hadoop/etc/hadoop/
scp /bdapps/hadoop/etc/hadoop/* node3:/bdapps/hadoop/etc/hadoop/
scp /bdapps/hadoop/etc/hadoop/* node4:/bdapps/hadoop/etc/hadoop/
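If you prefer not to repeat the scp command for each slave, a small loop over the node names (a convenience sketch, not part of the original steps) does the same thing:
for n in node2 node3 node4; do
    scp /bdapps/hadoop/etc/hadoop/* ${n}:/bdapps/hadoop/etc/hadoop/
done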
Start hadoop related services
Master node
As in pseudo-distributed mode, the directories used to store data must be initialized before the NameNode of the HDFS cluster is started for the first time. If the directory specified by the dfs.namenode.name.dir property in hdfs-site.xml does not exist,
the format command creates it automatically; if it already exists, make sure its permissions are set correctly, because the format operation clears all data inside it and builds a new file system. Execute the following command on the master node as the user that runs HDFS (here, the hadoop user):
hdfs namenode -format
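As a quick sanity check (not part of the original steps), after a successful format the directory configured in dfs.namenode.name.dir should contain a current/ subdirectory holding a VERSION file and an initial fsimage:
ls /data/hadoop/hdfs/nn/current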
There are two ways to start a cluster node:
1. Log in to each node to start the service manually
2. Start the whole cluster under the control of the master node
When the cluster is large, it is cumbersome to start the services of each node, so hadoop provides start-dfs.sh and stop-dfs.sh to start and stop the entire hdfs cluster, and start-yarn.sh and stop-yarn.sh to start and stop the entire yarn cluster.
[hadoop@node1 hadoop]$ start-dfs.sh
Starting namenodes on [master]
hadoop@master's password:
master: starting namenode, logging to /bdapps/hadoop/logs/hadoop-hadoop-namenode-node1.out
node4: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-node4.out
node2: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-node2.out
node3: starting datanode, logging to /bdapps/hadoop/logs/hadoop-hadoop-datanode-node3.out
Starting secondary namenodes [0.0.0.0]
hadoop@0.0.0.0's password:
0.0.0.0: starting secondarynamenode, logging to /bdapps/hadoop/logs/hadoop-hadoop-secondarynamenode-node1.out
[hadoop@node1 hadoop]$ jps
69127 NameNode
69691 Jps
69566 SecondaryNameNode
Log in to the datanode node to view the process
[root@node2 ~]# jps
66968 DataNode
67436 Jps
[root@node3 ~]# jps
109281 DataNode
109991 Jps
[root@node4 ~]# jps
108753 DataNode
109674 Jps
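Besides running jps on each node, you can ask the NameNode for a cluster-wide summary. Assuming the HDFS daemons above are running, the standard dfsadmin report (run as the hadoop user on the master node) lists every live DataNode and its capacity:
hdfs dfsadmin -report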
Stop the service of the entire cluster
[hadoop@node1 hadoop]$ stop-dfs.sh
Stopping namenodes on [master]
master: stopping namenode
node4: stopping datanode
node2: stopping datanode
node3: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
Test
On the master node, upload a file
[hadoop@node1 ~]$ hdfs dfs -mkdir /test
[hadoop@node1 ~]$ hdfs dfs -put /etc/fstab /test/fstab
[hadoop@node1 ~]$ hdfs dfs -ls -R /test
-rw-r--r--   2 hadoop supergroup        223 2018-03-27 16:48 /test/fstab
Log in to node2
[hadoop@node2]$ ls /data/hadoop/hdfs/dn/current/BP-1194588190-172.16.2.3-1522138946011/current/finalized/
[hadoop@node2]$
No copy of the fstab file is stored here.
Log in to node3 and you can see the fstab file
[hadoop@node3]$ cat /data/hadoop/hdfs/dn/current/BP-1194588190-172.16.2.3-1522138946011/current/finalized/subdir0/subdir0/blk_1073741825
UUID=dbcbab6c-2836-4ecd-8d1b-2da8fd160694 / ext4 defaults 1 1
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/vdb1 none swap sw 0 0
Log in to node4 and you can also see the fstab file
[hadoop@node4 root]$ cat /data/hadoop/hdfs/dn/current/BP-1194588190-172.16.2.3-1522138946011/current/finalized/subdir0/subdir0/blk_1073741825
UUID=dbcbab6c-2836-4ecd-8d1b-2da8fd160694 / ext4 defaults 1 1
tmpfs /dev/shm tmpfs defaults 0 0
devpts /dev/pts devpts gid=5,mode=620 0 0
sysfs /sys sysfs defaults 0 0
proc /proc proc defaults 0 0
/dev/vdb1 none swap sw 0 0
Conclusion: since we keep 2 copies of the data, copies of the file exist only on node3 and node4, not on node2.
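If you want to confirm which DataNodes hold the blocks without logging in to each node, the standard HDFS fsck tool can report block locations for the file; a minimal check (output will vary with your cluster) is:
hdfs fsck /test/fstab -files -blocks -locations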
Start the yarn cluster
Log in to node1 (master) and execute start-yarn.sh
[hadoop@node1 ~]$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-resourcemanager-node1.out
node4: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-node4.out
node2: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-node2.out
node3: starting nodemanager, logging to /bdapps/hadoop/logs/yarn-hadoop-nodemanager-node3.out
[hadoop@node1 ~]$ jps
78115 ResourceManager
71574 NameNode
71820 SecondaryNameNode
78382 Jps
Log in to node2, execute the jps command, and you can see that the NodeManager service has been started
[ansible@node2 ~]$ sudo su - hadoop
[hadoop@node2 ~]$ jps
68800 DataNode
75400 Jps
74856 NodeManager
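To verify that YARN can actually schedule work, a common smoke test is to submit the example job bundled with the distribution. Assuming the examples jar is present under the installation directory used above, run the following from the master node as the hadoop user; it estimates pi with 2 map tasks and 10 samples and should show up as a completed application in the ResourceManager web UI:
yarn jar /bdapps/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar pi 2 10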
View the Web UI console
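Assuming the default HTTP ports for Hadoop 2.x, the NameNode web UI should be reachable at http://master:50070/, and the ResourceManager web UI (yarn.resourcemanager.webapp.address configured above) at http://master:8088/.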
Other reference documents:
http://www.codeceo.com/understand-hadoop-hbase-hive-spark-distributed-system-architecture.html