Part I: prepare the Linux environment
The steps for creating the virtual machine and installing the operating system are not covered here; see my other articles for the details.
Start the prepared virtual machine.
1. Modify Hostname
1. Temporarily modify the hostname
hostname bigdata-01.liu.com
This change is lost after the system reboots.
2. Permanently modify hostname
vim /etc/sysconfig/network
After opening it, edit the content as follows:
NETWORKING=yes                  # enable networking
HOSTNAME=bigdata-01.liu.com     # set the hostname
2. Configure Host
vim /etc/hosts
Add the following
172.18.74.172 bigdata-01.liu.com
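To confirm that the mapping works, a quick check (a minimal sketch) is:
ping -c 1 bigdata-01.liu.com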
3. Turn off the firewall
Check the firewall status
service iptables status
Temporarily turn off the firewall
service iptables stop
Permanently turn off the firewall (takes effect after a reboot)
chkconfig iptables off
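As a quick sanity check on CentOS 6 (a sketch, assuming the SysV iptables service used above):
chkconfig --list iptables   # every runlevel should show "off"
service iptables status     # should report the firewall as stopped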
4. Disable SELinux
SELinux is a Linux security subsystem; it can safely be disabled in a learning environment.
vim /etc/sysconfig/selinux
Set SELINUX to disabled
SELINUX=disabled
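The change takes effect after a reboot. To stop enforcement immediately for the current session (a sketch):
setenforce 0   # switch SELinux to permissive mode right now
getenforce     # prints Permissive (and Disabled after the next reboot)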
5. Install JDK
Check whether a JDK is already installed on the system
java -version
If it shows OpenJDK, uninstall it before installing the Oracle JDK (other JDK builds do not support some Hadoop commands)
rpm -qa | grep java
Uninstall the OpenJDK packages (the .noarch packages can be left in place)
rpm -e --nodeps java-1.7.0-openjdk-headless-1.7.0.191-2.6.15.4.el7_5.x86_64
rpm -e --nodeps java-1.7.0-openjdk-1.7.0.191-2.6.15.4.el7_5.x86_64
Then run rpm -qa | grep java again to check whether OpenJDK has been removed; if anything remains, uninstall it as well
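If several OpenJDK packages show up, a one-liner such as this removes them in one pass (a sketch; adjust to the package names on your system):
rpm -qa | grep openjdk | xargs -r rpm -e --nodeps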
Connect to the virtual machine with a remote tool such as Xshell and install the lrzsz command
yum -y install lrzsz
Upload the JDK package and extract it to the /opt/modules directory
rz
tar -zxvf jdk-8u181-linux-x64.tar.gz -C /opt/modules
Add environment variables
To set the JAVA_HOME environment variable for the JDK, edit the configuration file /etc/profile and append
export JAVA_HOME="/opt/modules/jdk1.8.0_181"
export PATH=$JAVA_HOME/bin:$PATH
After the modification is complete, run source /etc/profile to make it take effect
Run java -version again to confirm that the installation is complete
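A quick way to confirm the variables took effect (a sketch, assuming the paths above):
echo $JAVA_HOME   # should print /opt/modules/jdk1.8.0_181
which java        # should resolve to /opt/modules/jdk1.8.0_181/bin/java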
Part II: Hadoop installation
Fully distributed mode means deploying Hadoop across multiple real Linux hosts: the cluster of Linux machines is planned so that each Hadoop component runs on a different machine.
I. Environment preparation
1. Clone a virtual machine
In the VMware sidebar, select the machine to be cloned (here the original BigData01 machine), then choose the Clone command under the Manage submenu of the Virtual Machine menu.
Select "Create a full clone", name the virtual machine BigData02, choose where to save the virtual machine files, and clone.
Repeat the process to clone another virtual machine named BigData03.
2. Configure the network
Rename the network card
Edit the network card information on the BigData02 and BigData03 machines with sudo vim /etc/udev/rules.d/70-persistent-net.rules. Because these machines were cloned from BigData01, the file still contains BigData01's eth0 entry plus a new eth2 entry, and that eth0 carries the same MAC address as BigData01. Duplicate MAC addresses are not allowed, so delete the eth0 entry, keep only eth2, and rename eth2 to eth0. Then copy the MAC address of the renamed eth0 and update the HWADDR property in the corresponding network-scripts file.
vim /etc/sysconfig/network-scripts/ifcfg-eth0
Modify the network parameters (a sketch of the resulting file follows this list):
Change the BigData02 machine IP to 172.18.74.173
Change the BigData03 machine IP to 172.18.74.174
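A minimal sketch of what ifcfg-eth0 might look like on BigData02 after the change; HWADDR must be the MAC copied from the udev rules file, and the netmask/gateway values here are placeholders for your own network:
DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
HWADDR=00:0C:29:XX:XX:XX   # MAC of the renamed network card, copied from 70-persistent-net.rules
IPADDR=172.18.74.173
NETMASK=255.255.255.0      # placeholder, match your network
GATEWAY=172.18.74.1        # placeholder, match your network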
!! If you are building the Hadoop environment on real servers, create two more machines and repeat the steps above from scratch; cloning hosts is not a good idea on physical servers!
3. Configure Hostname and hosts
Set the hostname of BigData02 to bigdata-02.liu.com
Set the hostname of BigData03 to bigdata-03.liu.com
The hosts of BigData01, BigData02 and BigData03 machines are all configured as follows:
172.18.74.172 bigdata-01.liu.com
172.18.74.173 bigdata-02.liu.com
172.18.74.174 bigdata-03.liu.com
4. Configure the SSH client on Windows
In the SSH client on the local Windows machine, add connections to the BigData02 and BigData03 machines
II. Server function planning
bigdata-01.liu.com    bigdata-02.liu.com    bigdata-03.liu.com
NameNode              ResourceManager       SecondaryNameNode
DataNode              DataNode              DataNode
NodeManager           NodeManager           NodeManager
HistoryServer
III. Install a new Hadoop on the first machine
Create a hadoop directory
mkdir -p /opt/modules/app
Upload the Hadoop package and extract it to the Hadoop directory
rz
tar -zxf /opt/sofeware/hadoop-2.7.4-with-centos-6.7.tar.gz -C /opt/modules/app/
Change to the /opt/modules/app/hadoop-2.7.4/etc/hadoop directory and configure the JDK path for Hadoop there by setting it in the hadoop-env.sh, mapred-env.sh, and yarn-env.sh files:
export JAVA_HOME="/opt/modules/jdk1.8.0_181"
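To confirm the value was set in each of the three files (a sketch; the default JAVA_HOME lines differ slightly between the files, so check each match by eye):
grep -n "JAVA_HOME" hadoop-env.sh mapred-env.sh yarn-env.sh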
Configure core-site.xml
vim core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://bigdata-01.liu.com:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/app/hadoop-2.7.4/data/tmp</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data</value>
  </property>
</configuration>
fs.defaultFS is the address of the NameNode.
hadoop.tmp.dir is the Hadoop temporary directory; by default the NameNode and DataNode data files are stored in subdirectories under it. Make sure this directory exists; if it does not, create it first.
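For example, matching the path configured above:
mkdir -p /opt/modules/app/hadoop-2.7.4/data/tmp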
Configure hdfs-site.xml
vim hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>bigdata-03.liu.com:50090</value>
  </property>
</configuration>
dfs.namenode.secondary.http-address is the HTTP access address and port of the SecondaryNameNode. Because the plan puts the SecondaryNameNode on the BigData03 server, it is set to bigdata-03.liu.com:50090.
Configure slaves
vim slaves
bigdata-01.liu.com
bigdata-02.liu.com
bigdata-03.liu.com
The slaves file specifies which nodes run DataNodes in HDFS
Configure yarn-site.xml
vim yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>bigdata-02.liu.com</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>106800</value>
  </property>
</configuration>
According to the plan, yarn.resourcemanager.hostname points the ResourceManager server at bigdata-02.liu.com.
yarn.log-aggregation-enable configures whether log aggregation is enabled.
yarn.log-aggregation.retain-seconds configures how long the aggregated logs are kept on HDFS.
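For reference, 106800 seconds is roughly 29.7 hours; adjust the retention period to whatever suits your cluster.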
Configure mapred-site.xml
Copy a mapred-site.xml file from mapred-site.xml.template
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>bigdata-01.liu.com:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>bigdata-01.liu.com:19888</value>
  </property>
</configuration>
mapreduce.framework.name sets MapReduce jobs to run on YARN.
mapreduce.jobhistory.address sets the address of the MapReduce history server, which is installed on the BigData01 machine.
mapreduce.jobhistory.webapp.address sets the web address and port of the history server.
IV. Set up passwordless SSH login
The machines in the Hadoop cluster access each other over SSH, and entering a password for every access is impractical, so passwordless SSH login must be configured between the machines.
Generate a public key on BigData01
ssh-keygen -t rsa
Press Enter all the way to accept the defaults; the public key file (id_rsa.pub) and private key file (id_rsa) are then generated in the .ssh directory under the current user's home directory. The .ssh directory is hidden; use ls -a in the home directory to see it.
Distribute the public key
ssh-copy-id bigdata-01.liu.com
ssh-copy-id bigdata-02.liu.com
ssh-copy-id bigdata-03.liu.com
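A quick verification that the keys were distributed (a sketch): each command should print the remote hostname without asking for a password.
ssh bigdata-02.liu.com hostname
ssh bigdata-03.liu.com hostname
If you later run start-yarn.sh from BigData02 (the ResourceManager node), repeat ssh-keygen and ssh-copy-id there as well.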
V. Distribute the Hadoop files
First create a directory for Hadoop on the other two machines
mkdir -p /opt/modules/app
Distribute with scp
First check the size of the documentation directory; it is large and can optionally be removed before copying to speed up the transfer
du -sh /opt/modules/app/hadoop-2.7.4/share/doc
scp -r /opt/modules/app/hadoop-2.7.4/ bigdata-02.liu.com:/opt/modules/app
scp -r /opt/modules/app/hadoop-2.7.4/ bigdata-03.liu.com:/opt/modules/app
Format NameNode
Perform formatting on the NameNode machine bigdata-01:
/opt/modules/app/hadoop-2.7.4/bin/hdfs namenode -format
After formatting, a current directory containing a series of files is generated under /opt/modules/app/hadoop-2.7.4/data/tmp/dfs/name/.
Note:
If you need to reformat the NameNode, first delete all files under the original NameNode and DataNode directories, otherwise an error will be reported. These directories are configured in core-site.xml through the hadoop.tmp.dir, dfs.namenode.name.dir, and dfs.datanode.data.dir properties.
Each format creates a new cluster ID by default and writes it into the VERSION files of the NameNode and DataNode (located under dfs/name/current and dfs/data/current). If the old directories are not deleted before reformatting, the NameNode's VERSION file gets the new cluster ID while the DataNode keeps the old one, and the mismatch causes an error.
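A sketch of the cleanup before a reformat, assuming the default layout configured above (run it on every node, and only if you really mean to wipe the existing HDFS data):
rm -rf /opt/modules/app/hadoop-2.7.4/data/tmp/*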
Start the cluster
Change to the /opt/modules/app/hadoop-2.7.4 directory
Start HDFS
/opt/modules/app/hadoop-2.7.4/sbin/start-dfs.sh
Run jps to see which services have started
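As a rough check (a sketch; the exact output varies), jps on each node should show the daemons from the planning table. For example, on bigdata-01 after start-dfs.sh:
jps   # expect a NameNode and a DataNode process, plus Jps itself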
Start YARN
/opt/modules/app/hadoop-2.7.4/sbin/start-yarn.sh
You can also do both in one step with this command.
/opt/modules/app/hadoop-2.7.4/sbin/start-all.sh
Start ResourceManager on BigData02:
sbin/yarn-daemon.sh start resourcemanager
Start the log server
Since the plan runs the MapReduce job history service on the BigData01 server (as configured in mapred-site.xml above), start it on BigData01
/opt/modules/app/hadoop-2.7.4/sbin/mr-jobhistory-daemon.sh start historyserver
View the HDFS Web page
http://bigdata-01.liu.com:50070/
If the domain name does not resolve, enter the IP and port in the address bar instead, for example:
172.18.74.172:50070
View the YARN Web page
http://bigdata-02.liu.com:8088/cluster
172.18.74.173:8088/cluster
That is it for building the Hadoop cluster environment. When setting up a Hadoop environment, I suggest first learning what the various components are; it makes completing the build much smoother. Some of my classmates learned this fairly early, back when almost no one understood what Hadoop was, and I heard it took them ten days to half a month to get it built. Having picked up a little Hadoop under their influence, it took me just over a day, and I still ran into plenty of unexpected situations along the way. With no surprises and some advance understanding, it can be done in half a day. Never give up when you run into difficulties; persist and you will win!
Once upon a time, carriages and horses were slow and letters traveled far, and a lifetime was enough to love only one person!