Platform Construction of Hadoop2.0 distributed Cluster

2025-01-30 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/03 Report--

I. Preparation before installing the Hadoop cluster

Basic environment

Four CentOS 6.5 hosts with IP addresses 192.168.174.128, 192.168.174.129, 192.168.174.130 and 192.168.174.131. On all four hosts, create a hadoop user and set up passwordless SSH login, shut down iptables, and set SELinux to disabled.

1. Modify hostnames and the IP address mapping

For convenience later, change the hostnames to hadoop01, hadoop02, hadoop03 and hadoop04 respectively. To change a host name, you only need to modify the hostname line of the /etc/sysconfig/network file, which will not be repeated here. Then modify the /etc/hosts file to record the mapping between IP addresses and host names, so that the hosts can resolve each other by host name.
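As a sketch, the /etc/hosts mapping for the four nodes above looks like this (written to a scratch copy here; on a real node you would append these lines to /etc/hosts as root):

```shell
# Hostname-to-IP mappings for the four cluster nodes, from the plan above.
# Appended to a scratch file for demonstration; use /etc/hosts on a real node.
cat >> ./hosts.demo <<'EOF'
192.168.174.128 hadoop01
192.168.174.129 hadoop02
192.168.174.130 hadoop03
192.168.174.131 hadoop04
EOF
```

The same four lines go into /etc/hosts on every node so that any host can reach any other by name.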

2. Install JDK

MapReduce, one of Hadoop's core components, is based on Java, so a basic Java environment must be configured first. JDK installation is very simple and has been covered many times before: download the JDK package and extract it to the target directory.

tar -zxvf jdk-8u181-linux-x64.tar.gz -C /usr/local/java

Set the environment variables by editing /etc/profile:

export JAVA_HOME=/usr/local/java/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin

Reload /etc/profile for the environment variables to take effect. JDK needs to be installed and configured on all four nodes.
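Reloading can be sketched as follows, using a scratch file standing in for /etc/profile (on a real node you would append the exports to /etc/profile and run `source /etc/profile`):

```shell
# The same two export lines, written to a scratch file and sourced,
# standing in for editing /etc/profile and running `source /etc/profile`.
cat > ./java_env.sh <<'EOF'
export JAVA_HOME=/usr/local/java/jdk1.8.0_181
export PATH=$PATH:$JAVA_HOME/bin
EOF
. ./java_env.sh
echo "$JAVA_HOME"
```

After sourcing, `java -version` should report 1.8.0_181 on a node where the JDK was actually extracted.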

3. ZooKeeper installation and configuration

ZooKeeper is responsible for coordination and consistency in the Hadoop cluster and is an indispensable component for Hadoop HA. Because ZooKeeper operates on a majority quorum, it should be installed on an odd number of nodes. This article installs ZooKeeper on three nodes: hadoop01, hadoop02 and hadoop03.

Download the ZooKeeper installation package and extract it.

Set the environment variables by modifying /etc/profile:

export ZOOKEEPER_HOME=/usr/local/zookeeper/zookeeper-3.4.6
export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin

Reload /etc/profile for the environment variables to take effect.

Enter the conf directory under the ZooKeeper extraction directory and modify the configuration file zoo.cfg. There is no zoo.cfg file initially; just copy the zoo_sample.cfg file and rename it to zoo.cfg.
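A minimal zoo.cfg for this three-node ensemble might look like the sketch below. The tickTime/initLimit/syncLimit values are the zoo_sample.cfg defaults, 2888/3888 are ZooKeeper's standard quorum and leader-election ports, and the data paths match the directories created in the next step:

```shell
# Sketch of a minimal zoo.cfg for the hadoop01-03 ensemble (written to a
# scratch file here; the real file lives in the ZooKeeper conf directory).
cat > ./zoo.cfg <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
clientPort=2181
dataDir=/opt/zookeeper/data
dataLogDir=/opt/zookeeper/datalog
server.1=hadoop01:2888:3888
server.2=hadoop02:2888:3888
server.3=hadoop03:2888:3888
EOF
```

The number after `server.` is the ID each node must write into its myid file, as described below.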

Create the corresponding data directory and datalog directory

mkdir -p /opt/zookeeper/data /opt/zookeeper/datalog

Create a new myid file in each node's data directory: write 1 into the myid file on hadoop01, 2 on hadoop02, and 3 on hadoop03, i.e. the number after server. in zoo.cfg. Also note that the /opt/zookeeper directory and its subdirectories must be readable and writable by the hadoop user, because ZooKeeper will later be run as the hadoop user.
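The myid step can be sketched as follows, shown against a scratch directory; on a real node the path is /opt/zookeeper/data and the chown requires root:

```shell
# myid setup as it would run on hadoop01 (write 2 on hadoop02, 3 on hadoop03,
# matching the server.N lines in zoo.cfg). Scratch path for demonstration.
DATA=./zk-demo/data
mkdir -p "$DATA"
echo 1 > "$DATA/myid"
# On the real node, as root, also hand the tree to the hadoop user:
# chown -R hadoop:hadoop /opt/zookeeper
cat "$DATA/myid"
```

Each node gets a different number; a mismatch between myid and the server.N lines is a common reason the ensemble fails to form.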

At this point, the basic installation and configuration of ZooKeeper is complete. Start the ZooKeeper service as the hadoop user:

zkServer.sh start

View zookeeper status

zkServer.sh status

II. Hadoop installation and configuration

Download the Hadoop installation package and extract it.

Note that the owner and group of the extracted directory should be hadoop, just like ZooKeeper earlier, because Hadoop will be operated as the hadoop user.

Set the environment variables by modifying /etc/profile:

export HADOOP_HOME=/usr/local/hadoop-2.6.4
export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Note that both bin and sbin of Hadoop must be added to PATH, otherwise many of the commands below will not be found. Reload /etc/profile for the change to take effect.

Then modify Hadoop's configuration files. Go to the etc/hadoop directory under the Hadoop installation directory and modify the configuration files hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml. For mapred-site.xml there is a template mapred-site.xml.template in this directory; copy that file and rename it to mapred-site.xml.

Modify the configuration file hadoop-env.sh, mainly to configure the Java directory:
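The change usually amounts to one line, pointing hadoop-env.sh at the JDK installed earlier:

```shell
# The line that typically needs changing in hadoop-env.sh
# (JDK path from the installation step above).
export JAVA_HOME=/usr/local/java/jdk1.8.0_181
```

Hadoop's daemons do not reliably inherit JAVA_HOME from the login shell, which is why it is set explicitly here.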

Modify the configuration file core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://jsj/</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hdpdata</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
</configuration>

Modify the configuration file hdfs-site.xml, which is about the configuration of HDFS, as can be seen from the name of the configuration file.

<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>jsj</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.jsj</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.jsj.nn1</name>
        <value>hadoop01:9000</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.jsj.nn2</name>
        <value>hadoop02:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.jsj.nn1</name>
        <value>hadoop01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.jsj.nn2</name>
        <value>hadoop02:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hadoop01:8485;hadoop02:8485;hadoop03:8485/jsj</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/usr/local/journaldata</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.jsj</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence
shell(/bin/true)</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>

Modify the configuration file mapred-site.xml, that is, the configuration related to MapReduce.

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop03:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop03:19888</value>
    </property>
</configuration>

Modify the configuration file yarn-site.xml. Related configuration of yarn platform

<configuration>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>abc</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop01</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop02</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Finally, modify the slaves file

hadoop02
hadoop03
hadoop04

At this point, the Hadoop cluster configuration files are complete. Complete the same configuration on all four nodes: hadoop01, hadoop02, hadoop03 and hadoop04.

Modifying the configuration files does not mean the Hadoop installation is finished; several more steps are needed before the cluster can be used.

Start the ZooKeeper service on hadoop01, hadoop02 and hadoop03:

zkServer.sh start

Start the journalnode on hadoop01, hadoop02 and hadoop03:

hadoop-daemon.sh start journalnode

Format HDFS (execute on hadoop01):

hdfs namenode -format

Then check the Hadoop installation directory and make sure the hdpdata and journaldata directories exist on both hadoop01 and hadoop02; if a node is missing them, copy them over from the other node.
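A quick existence check along these lines can be run on each of the two nodes; the paths come from hadoop.tmp.dir and dfs.journalnode.edits.dir in the configuration above:

```shell
# Check that the metadata directories from the configs exist on this node.
# Run on both hadoop01 and hadoop02; results are also written to check.txt.
for d in /usr/local/hdpdata /usr/local/journaldata; do
  if [ -d "$d" ]; then
    echo "$d present"
  else
    echo "$d MISSING"
  fi
done | tee check.txt
```

On a machine that is not part of the cluster, both paths will of course report MISSING.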

Start the namenode on hadoop01:

hadoop-daemon.sh start namenode

Execute on hadoop02:

hdfs namenode -bootstrapStandby

Format ZKFC (execute on hadoop01):

hdfs zkfc -formatZK

Start HDFS on hadoop01:

start-dfs.sh

After completing the above operations, Hadoop should be able to provide services normally. Enter the IP address of hadoop01 with port 50070 in a browser to check whether the HDFS web interface is working properly.

Launch the YARN platform on hadoop01 and hadoop02. (In Hadoop 2.x, start-yarn.sh starts the ResourceManager only on the node where it is run, so on hadoop02 it is common to start just the standby ResourceManager with yarn-daemon.sh start resourcemanager.)

start-yarn.sh

Access port 8088 on the IP address of hadoop01 to check whether the YARN platform is providing services normally.

The Hadoop installation and configuration is now complete; explanations of the configuration files will be added later. The installation packages used in this article were provided by a teacher during the course. Hadoop is open source, and the relevant packages are not hard to find.
