First, prepare the physical cluster.
1. How to build the physical cluster.
A three-node physical cluster is built by setting up three virtual machines.
2. Virtual machine preparation.
Prepare a freshly installed virtual machine for cloning (ideally one on which nothing has been done yet).
Right-click the virtual machine you want to clone and choose Manage > Clone.
In the pop-up dialog box, do the following:
(1) Click Next.
(2) Select the current state of the virtual machine, then Next.
(3) Select the option to create a full clone, then Next.
(4) Enter the name of the virtual machine (here, slave01), then Next.
(5) The cloning is complete.
(6) Follow the same steps to create a second virtual machine named slave02.
3. Virtual machine network configuration.
Because slave01 and slave02 are cloned virtual machines, their NIC information must be modified.
slave01 is modified as follows:
(1) Edit the udev rules; enter the command: vi /etc/udev/rules.d/70-persistent-net.rules
(2) Edit the NIC configuration; enter the command: vi /etc/sysconfig/network-scripts/ifcfg-eth0
(3) Modify the hostname; enter the command: vi /etc/sysconfig/network
(4) Restart the system; command: reboot
slave02 is modified the same way as slave01. Note: its IPADDR and hostname must be different!
Finally, restart the network service on all nodes (service network restart) so the changes take effect, and make sure every virtual machine can reach the other nodes and the external network.
SELinux also needs to be disabled: vi /etc/selinux/config. For example, the key edits look roughly like this:
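A minimal sketch of the changes on slave01, assuming the NIC is eth0 and the planned address 192.168.169.160; the gateway and the static-IP layout are assumptions and should match your own environment:
# /etc/sysconfig/network-scripts/ifcfg-eth0 (slave01)
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.169.160   # use 192.168.169.161 on slave02
NETMASK=255.255.255.0
GATEWAY=192.168.169.2    # assumption: set to your VM network's gateway
# /etc/sysconfig/network (slave01)
NETWORKING=yes
HOSTNAME=slave01         # use slave02 on the other clone
# /etc/selinux/config
SELINUX=disabled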
Second, cluster planning.
1. Host planning:
master00 / 192.168.169.159: NameNode, DataNode, ResourceManager, JournalNode, ZooKeeper
slave01 / 192.168.169.160: NameNode, DataNode, ResourceManager, JournalNode, ZooKeeper
slave02 / 192.168.169.161: DataNode, JournalNode, ZooKeeper
2. Software planning:
JDK 1.8
CentOS 6.5
ZooKeeper 3.4.6
Hadoop 2.7.3
3. User planning:
The hadoop group and hadoop user must be created manually on each node, for example as shown below:
master00: hadoop:hadoop
slave01: hadoop:hadoop
slave02: hadoop:hadoop
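A minimal sketch of creating the group and user, run as root and repeated on every node (the interactive password prompt is omitted):
groupadd hadoop
useradd -g hadoop hadoop
passwd hadoop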
4. Directory planning:
Software installation directory: /home/hadoop/app/
Data and log directory: /home/hadoop/data/
Third, preparation before installation.
1. Synchronize the current system time and date with NTP:
(1) Install ntp online: yum install ntp
(2) Synchronize the date and time: ntpdate pool.ntp.org
(3) View the current system time: date
Note: the above commands need to be executed on each node!
2. Hosts file check:
All nodes need the following hostname mappings configured: vi /etc/hosts
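Based on the host planning above, the entries appended on every node would be:
192.168.169.159 master00
192.168.169.160 slave01
192.168.169.161 slave02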
3. Disable the firewall: chkconfig iptables off (this disables iptables permanently across reboots; restart the node for it to take effect)
Check: service iptables status
4. Configure passwordless SSH communication
(1) Configure SSH keys; the following uses master00 as the example (slave01 and slave02 need the same steps)
(2) Append the id_rsa.pub of every node to the authorized_keys file on master00
(3) Distribute the authorized_keys file on master00 to all nodes (slave01 and slave02).
If the nodes can now access each other through SSH without a password, the SSH configuration is successful. A sketch of the whole procedure follows:
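A minimal sketch of the key setup as the hadoop user, assuming the default key paths (run ssh-keygen on every node; the remaining commands are run from master00):
ssh-keygen -t rsa                                              # on every node, accept the defaults
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys                # on master00, add its own key
ssh slave01 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # collect slave01's key
ssh slave02 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys    # collect slave02's key
chmod 600 ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys slave01:~/.ssh/                     # distribute the combined file
scp ~/.ssh/authorized_keys slave02:~/.ssh/
ssh slave01 date                                               # verify passwordless login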
5. Helper scripts: these make building the distributed Hadoop cluster more convenient.
(1) Create a /home/hadoop/tools directory on the master00 node
(2) Upload the scripts to this directory (for example with the Xftp tool)
deploy.conf script: https://blog.51cto.com/14572091/2442729
deploy.sh script: https://blog.51cto.com/14572091/2442731
runRemoteCmd.sh script: https://blog.51cto.com/14572091/2442728
(3) Make the scripts executable
[hadoop@master00 tools]$ chmod u+x deploy.sh
[hadoop@master00 tools]$ chmod u+x runRemoteCmd.sh
(4) Configure PATH, for example:
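A sketch of the PATH change, appended to the hadoop user's ~/.bashrc (the exact file is an assumption; /etc/profile works as well):
export PATH=/home/hadoop/tools:$PATH
source ~/.bashrc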
(5) On the master00 node, create the software installation directory on all nodes in one step with the script:
runRemoteCmd.sh "mkdir /home/hadoop/app" all
Note: if your hostnames differ from the ones used here, modify the deploy.conf configuration file accordingly.
6. Installation of Hadoop-related software
(1) Install the JDK: upload the JDK package to the app directory and extract it
(2) Rename the extracted directory to jdk
(3) Add the JDK environment variables: vi /etc/profile (a sketch follows below)
Make the configuration file take effect: source /etc/profile
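A sketch of the lines appended to /etc/profile, assuming the JDK was extracted and renamed to /home/hadoop/app/jdk:
export JAVA_HOME=/home/hadoop/app/jdk
export PATH=$JAVA_HOME/bin:$PATH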
(4) Check whether the JDK is installed successfully: java -version
If the JDK version information is printed, the JDK installation on the master00 node is successful.
(5) Copy the JDK installation directory from the master00 node to the other nodes: deploy.sh jdk /home/hadoop/app/ slave
Then repeat the JDK environment-variable configuration from master00 on the slave01 and slave02 nodes and verify it the same way.
7. Zookeeper installation.
(1) Upload the ZooKeeper package to the app directory and extract it.
(2) Rename the extracted directory to zookeeper
(3) Modify the ZooKeeper configuration file (conf/zoo.cfg), for example:
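A minimal zoo.cfg sketch matching the directory and host planning above (copy conf/zoo_sample.cfg to conf/zoo.cfg first; the tickTime/initLimit/syncLimit values are common defaults, not taken from the original):
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/home/hadoop/data/zookeeper/zkdata
dataLogDir=/home/hadoop/data/zookeeper/zkdatalog
server.1=master00:2888:3888
server.2=slave01:2888:3888
server.3=slave02:2888:3888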
(4) Copy the zookeeper installation directory to the other nodes with the deploy.sh script: deploy.sh zookeeper /home/hadoop/app/ slave
(5) Create the relevant directories on all nodes with the runRemoteCmd.sh script:
runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdata" all
runRemoteCmd.sh "mkdir -p /home/hadoop/data/zookeeper/zkdatalog" all
(6) On each of the three nodes, enter the zkdata directory and create a file named myid containing 1 on master00, 2 on slave01, and 3 on slave02, for example:
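One way to create the file (run each command on the corresponding node):
echo 1 > /home/hadoop/data/zookeeper/zkdata/myid    # on master00
echo 2 > /home/hadoop/data/zookeeper/zkdata/myid    # on slave01
echo 3 > /home/hadoop/data/zookeeper/zkdata/myid    # on slave02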
(7) Configure the ZooKeeper environment variables (a sketch follows below)
Make the configuration take effect: source /etc/profile
Note: every node should be configured!
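A sketch of the variables appended to /etc/profile; the variable name ZOOKEEPER_HOME is the conventional choice and an assumption here:
export ZOOKEEPER_HOME=/home/hadoop/app/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$PATH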
(8) Start ZooKeeper on the master00 node
(9) Use the runRemoteCmd.sh script to start ZooKeeper on all nodes:
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" zookeeper
(10) Check whether the QuorumPeerMain process is running on all nodes:
runRemoteCmd.sh "jps" zookeeper
(11) View the ZooKeeper status on all nodes:
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh status" zookeeper
If one node is the leader and the other nodes are followers, ZooKeeper is installed successfully.
Fourth, build the Hadoop cluster.
1. Hadoop software installation
(1) Upload the Hadoop package to the app directory and decompress it.
(2) Rename the extracted directory to hadoop
2. Hadoop configuration and use of HDFS
(1) Modify JAVA_HOME in hadoop-env.sh to point to the JDK installation directory, for example:
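A sketch of the change in etc/hadoop/hadoop-env.sh, using the JDK path from the earlier step:
export JAVA_HOME=/home/hadoop/app/jdk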
(2) Configure the core-site.xml file. The following is the configuration used here; for the full set of options, refer to the official Hadoop documentation.
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://cluster1</value></property>
  <property><name>hadoop.tmp.dir</name><value>/home/hadoop/data/tmp</value></property>
  <property><name>ha.zookeeper.quorum</name><value>master00:2181,slave01:2181,slave02:2181</value></property>
</configuration>
(3) Configure the hdfs-site.xml file. The following is the configuration used here; for the full set of options, refer to the official Hadoop documentation.
<configuration>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.permissions</name><value>false</value></property>
  <property><name>dfs.permissions.enabled</name><value>false</value></property>
  <property><name>dfs.nameservices</name><value>cluster1</value></property>
  <property><name>dfs.ha.namenodes.cluster1</name><value>master00,slave01</value></property>
  <property><name>dfs.namenode.rpc-address.cluster1.master00</name><value>master00:9000</value></property>
  <property><name>dfs.namenode.http-address.cluster1.master00</name><value>master00:50070</value></property>
  <property><name>dfs.namenode.rpc-address.cluster1.slave01</name><value>slave01:9000</value></property>
  <property><name>dfs.namenode.http-address.cluster1.slave01</name><value>slave01:50070</value></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
  <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://master00:8485;slave01:8485;slave02:8485/cluster1</value></property>
  <property><name>dfs.client.failover.proxy.provider.cluster1</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
  <property><name>dfs.journalnode.edits.dir</name><value>/home/hadoop/data/journaldata/jn</value></property>
  <property><name>dfs.ha.fencing.methods</name><value>shell(/bin/true)</value></property>
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/hadoop/.ssh/id_rsa</value></property>
  <property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>10000</value></property>
  <property><name>dfs.namenode.handler.count</name><value>100</value></property>
</configuration>
(4) Configure the slaves file: it lists the hostnames of the nodes that run a DataNode, for example:
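Per the host planning above, all three nodes run a DataNode, so etc/hadoop/slaves would contain:
master00
slave01
slave02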
(5) Distribute the Hadoop installation directory to all nodes
deploy.sh hadoop /home/hadoop/app/ slave
(6) Configure the Hadoop environment variables (a sketch follows below)
Make the configuration take effect: source /etc/profile
Note: every node should be configured!
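A sketch of the variables appended to /etc/profile; HADOOP_HOME is the conventional variable name and an assumption here:
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH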
(7) start HDFS
1) Start the ZooKeeper process on all nodes:
runRemoteCmd.sh "/home/hadoop/app/zookeeper/bin/zkServer.sh start" zookeeper
2) Start the JournalNode process on all nodes:
runRemoteCmd.sh "/home/hadoop/app/hadoop/sbin/hadoop-daemon.sh start journalnode" all
3) Format on the primary node first (here, master00):
bin/hdfs namenode -format
bin/hdfs zkfc -formatZK
bin/hdfs namenode
4) While the NameNode started above is still running, synchronize its metadata on the standby node (here, slave01):
bin/hdfs namenode -bootstrapStandby
5) After slave01 finishes synchronizing, press Ctrl+C on the master00 node to stop the namenode process, then stop the JournalNode process on all nodes:
runRemoteCmd.sh "/home/hadoop/app/hadoop/sbin/hadoop-daemon.sh stop journalnode" all
6) Start all HDFS-related processes with one command:
sbin/start-dfs.sh
7) verify that HDFS is installed successfully
Enter the URL: http://master00:50070 in the browser to view the Web interface
Enter the URL: http://slave01:50070 in the browser to view the Web interface
8) Check whether HDFS is usable:
hadoop fs -mkdir /test
hadoop fs -put test.txt /test
hadoop fs -ls /test
3. Configure YARN for Hadoop.
(1) Configure the mapred-site.xml file. The following is the configuration used here; for the full set of options, refer to the official Hadoop documentation.
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
(2) Configure the yarn-site.xml file. The following is the configuration used here; for the full set of options, refer to the official Hadoop documentation.
<configuration>
  <property><name>yarn.resourcemanager.connect.retry-interval.ms</name><value>2000</value></property>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.ha.automatic-failover.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.ha.automatic-failover.embedded</name><value>true</value></property>
  <property><name>yarn.resourcemanager.cluster-id</name><value>yarn-rm-cluster</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>master00</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>slave01</value></property>
  <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.zk.state-store.address</name><value>master00:2181,slave01:2181,slave02:2181</value></property>
  <property><name>yarn.resourcemanager.zk-address</name><value>master00:2181,slave01:2181,slave02:2181</value></property>
  <property><name>yarn.resourcemanager.address.rm1</name><value>master00:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address.rm1</name><value>master00:8034</value></property>
  <property><name>yarn.resourcemanager.webapp.address.rm1</name><value>master00:8088</value></property>
  <property><name>yarn.resourcemanager.address.rm2</name><value>slave01:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address.rm2</name><value>slave01:8034</value></property>
  <property><name>yarn.resourcemanager.webapp.address.rm2</name><value>slave01:8088</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
</configuration>
(3) start YARN
1) Execute the YARN start command on master00
sbin/start-yarn.sh
2) Start the standby ResourceManager on slave01
sbin/yarn-daemon.sh start resourcemanager
3) Open the Web interface in the browser to view
http://master00:8088
http://slave01:8088
4) check the status of ResourceManager
bin/yarn rmadmin -getServiceState rm1
bin/yarn rmadmin -getServiceState rm2
5) run the WordCount test
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /test/test.txt /test/out/
View the job execution status and the output, for example:
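One way to check the result from the command line; the part-r-00000 file name is the usual reducer output name and an assumption here:
hadoop fs -ls /test/out/
hadoop fs -cat /test/out/part-r-00000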
If there is no exception, YARN is installed successfully
So far, the Hadoop distributed cluster has been built successfully!