Setting up a simple Hadoop running environment
Hadoop is an open-source distributed computing platform from the Apache Software Foundation. Built on the Hadoop Distributed File System (HDFS) and MapReduce (an open-source implementation of Google's MapReduce), Hadoop gives users a distributed infrastructure whose low-level system details are kept transparent to them.
The roles in a Hadoop cluster fall into two main categories: master and slave. An HDFS cluster consists of one NameNode and several DataNodes. The NameNode acts as the master server, managing the file system namespace and client access to the file system, while the DataNodes manage the data stored on their nodes. The MapReduce framework consists of a single JobTracker running on the master node and a TaskTracker running on each slave node. The master node schedules all the tasks that make up a job, which are distributed across the slave nodes; it monitors their execution and re-runs failed tasks, while the slave nodes only execute the tasks assigned to them. When a job is submitted, the JobTracker receives the job and its configuration information, distributes the configuration to the slave nodes, schedules the tasks, and monitors the TaskTrackers' execution.
As the above shows, HDFS and MapReduce together form the core of the Hadoop distributed architecture: HDFS provides the distributed file system on the cluster, while MapReduce provides distributed computation and task processing on top of it. HDFS supplies file storage and access during MapReduce job processing, and MapReduce handles task distribution, tracking, execution and result collection on top of HDFS; together they accomplish the main work of a Hadoop distributed cluster.
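To make this HDFS/MapReduce interaction concrete, the sketch below shows roughly what submitting a job looks like once a cluster such as the one built in this article is running. It is an illustration only; the example jar name (hadoop-examples-1.0.0.jar, as shipped with Hadoop 1.0.0), the paths, and the output file name are assumptions, not part of the original guide.
# Illustration only: run the bundled WordCount job against a running Hadoop 1.x cluster
hadoop fs -mkdir /input                          # create an input directory in HDFS
hadoop fs -put /etc/hosts /input/                # copy a local file into HDFS
hadoop jar /usr/hadoop/hadoop-examples-1.0.0.jar wordcount /input /output
hadoop fs -cat /output/part-r-00000              # read the results MapReduce wrote back to HDFS (file name may vary)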
I. Preparation before installation
Three Linux virtual machines with the "Software Development Workstation" package group installed
The jdk-6u45-linux-x64.bin and hadoop-1.0.0.tar.gz installation packages
1. Set the IP addresses and hostnames of the Linux virtual machines as follows (a sketch of one way to do this follows the table):
Hostname    IP address
master      192.168.232.129
salve1      192.168.232.130
salve2      192.168.232.131
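The original steps do not show the commands for this. As a rough sketch on a RHEL/CentOS 6 style system (which the "Software Development Workstation" package group suggests), the hostname and a static IP could be set as below; the interface name eth0 and the netmask value are assumptions for illustration.
# Sketch for the master host, assuming RHEL/CentOS 6 and an eth0 interface
hostname master                                    # set the hostname for the current session
vim /etc/sysconfig/network                         # make it permanent: HOSTNAME=master
vim /etc/sysconfig/network-scripts/ifcfg-eth0      # static IP, e.g.:
#   BOOTPROTO=static
#   IPADDR=192.168.232.129
#   NETMASK=255.255.255.0
#   ONBOOT=yes
service network restart                            # apply the network settings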
2. Add entries to the /etc/hosts file so that the virtual machines can ping each other by hostname (very important!)
# You only need to modify the /etc/hosts file on one host, then use scp to copy it to the others
[root@master ~]# vim /etc/hosts
192.168.232.129 master
192.168.232.130 salve1
192.168.232.131 salve2
[root@master ~]# scp /etc/hosts root@salve1:/etc/hosts
[root@master ~]# scp /etc/hosts root@salve2:/etc/hosts
# Test
[root@master ~]# ping master
[root@master ~]# ping salve1
[root@master ~]# ping salve2
3. Set up passwordless SSH authentication among the three hosts
# Generate a key pair on the master host
[root@master ~]# ssh-keygen -t rsa
# Press Enter at the prompts; id_rsa.pub (public key) and id_rsa (private key) are created in /root/.ssh
# Append the public key to /root/.ssh/authorized_keys and test the passwordless login
[root@master ~]# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
[root@master ~]# ssh master
# Copy the public key to the salve1 and salve2 hosts
[root@master ~]# scp .ssh/id_rsa.pub root@salve1:/root/.ssh/
[root@master ~]# scp .ssh/id_rsa.pub root@salve2:/root/.ssh/
# On salve1 and salve2, append the transferred public key to authorized_keys,
# so that the master host can ssh to salve1 and salve2 without a password
[root@salve1 ~]# cat .ssh/id_rsa.pub >> .ssh/authorized_keys
[root@salve2 ~]# cat .ssh/id_rsa.pub >> .ssh/authorized_keys
# Test
[root@master ~]# ssh salve1
[root@master ~]# ssh salve2
# Note: repeat the steps above on the other hosts as well, so that all three hosts can log in to each other without a password
# If login still fails after the above, edit the ssh server configuration on all three hosts, enable the parameters below, and restart the ssh service
[root@master ~]# vim /etc/ssh/sshd_config
RSAAuthentication yes          # enable RSA authentication
PubkeyAuthentication yes       # enable public/private key authentication
[root@master ~]# service sshd restart
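After keys have been exchanged in every direction, a quick way to confirm that all three hosts really accept each other without a password is a small loop like the one below (a sketch, not part of the original steps; BatchMode makes ssh fail instead of prompting if a password would still be required).
# Run on each of the three hosts: every call should print the remote hostname without a password prompt
for h in master salve1 salve2; do
  ssh -o BatchMode=yes root@"$h" hostname
done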
II. Installation and configuration
1. Java environment installation (if the "Software Development Workstation" package group was selected when Linux was installed, this step can be skipped)
1.1. Install the JDK
The JDK has to be installed on all machines. Install it on the master host first; the other hosts then repeat the same steps.
# Create the /usr/java directory, copy jdk-6u45-linux-x64.bin into it, grant execute permission, and run it
[root@master ~]# mkdir /usr/java
[root@master ~]# cp jdk-6u45-linux-x64.bin /usr/java/
[root@master ~]# cd /usr/java/
[root@master java]# chmod +x jdk-6u45-linux-x64.bin
[root@master java]# ./jdk-6u45-linux-x64.bin
[root@master java]# ls
jdk1.6.0_45  jdk-6u45-linux-x64.bin
# When the installer finishes, a jdk1.6.0_45 folder appears in the current directory and the installation is done
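Since the same installer has to be run on all three machines, one option (not shown in the original steps) is to push it to the other hosts over the passwordless ssh set up earlier and then repeat the commands above there; a sketch:
# Sketch: copy the installer to the slave hosts, then repeat the install steps on each
[root@master ~]# scp jdk-6u45-linux-x64.bin root@salve1:/root/
[root@master ~]# scp jdk-6u45-linux-x64.bin root@salve2:/root/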
1.2. Configure environment variables
# Edit the /etc/profile file and append the following to the end
[root@master ~]# vim /etc/profile
# set java environment
export JAVA_HOME=/usr/java/jdk1.6.0_45
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin
# Run the following commands so the configuration takes effect immediately, then test
[root@master ~]# source /etc/profile
[root@master ~]# java -version
java version "1.6.0_45"
OpenJDK Runtime Environment (rhel-2.4.3.3.el6-x86_64 u45-b15)
OpenJDK 64-Bit Server VM (build 24.45-b08, mixed mode)
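If java -version does not report the expected build, a quick optional check (not in the original steps) is to confirm which java binary is actually being picked up:
# Optional check: confirm the variables point at the intended JDK
[root@master ~]# echo $JAVA_HOME
[root@master ~]# which java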
1.3. Install and configure the other hosts (same as above, omitted here)
2. Hadoop cluster installation and configuration
2.1. Install Hadoop
Hadoop has to be present on all machines. Install and configure it on the master host first; the configured folder is then copied to the other hosts (see section 2.4).
# Extract the package to the /usr/ directory and rename it
[root@master ~]# tar -zxvf hadoop-1.0.0.tar.gz -C /usr/
[root@master ~]# cd /usr/
[root@master usr]# mv hadoop-1.0.0 hadoop
# Create a tmp folder under the hadoop directory
[root@master usr]# cd hadoop
[root@master hadoop]# mkdir tmp
2.2. Configure environment variables
# Edit the /etc/profile file and append the following to the end
[root@master ~]# vim /etc/profile
# set hadoop path
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
# Run the following command so the configuration takes effect immediately
[root@master ~]# source /etc/profile
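At this point the hadoop command should be on the PATH; a quick optional check (not part of the original steps) is:
# Optional check: the hadoop launcher should now be found on the PATH
[root@master ~]# hadoop version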
2.3. Configure Hadoop
1) configure hadoop-env.sh
The file is located in the /usr/hadoop/conf directory.
[root@master conf]# vim hadoop-env.sh
# If you installed the JDK yourself, add the following:
# set java environment
export JAVA_HOME=/usr/java/jdk1.6.0_45
# If you installed the Software Development Workstation package, add the following instead:
# set java environment
export JAVA_HOME=/usr
2) configure core-site.xml file
[root@master conf]# vim core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://192.168.232.129:9000</value>
    </property>
</configuration>
3) configure hdfs-site.xml file
[root@master conf]# vim hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
4) configure mapred-site.xml file
[root@master conf]# vim mapred-site.xml
<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>http://192.168.232.129:9001</value>
    </property>
</configuration>
5) configure masters file
# Remove the original localhost line, then add the master's IP address (or its hostname)
[root@master conf]# vim masters
192.168.232.129        # the master host's IP address
# Alternatively, use the hostname "master" (make sure /etc/hosts has been modified first)
6) configure the slaves file
[root@master conf]# vim slaves
192.168.232.130        # salve1's IP address
192.168.232.131        # salve2's IP address
# Alternatively, use the hostnames salve1 and salve2 (make sure /etc/hosts has been modified first)
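Before copying the configuration to the other hosts, it can be worth sanity-checking that the edited files are well-formed XML. One way, assuming the libxml2 xmllint tool happens to be installed, is:
# Optional: verify the edited configuration files are well-formed XML (requires xmllint)
[root@master conf]# xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml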
Master host configuration completed
2.4. Configure the salve1 and salve2 hosts
# Simply copy the configured hadoop folder to salve1 and salve2
[root@master ~]# scp -r /usr/hadoop root@salve1:/usr/
[root@master ~]# scp -r /usr/hadoop root@salve2:/usr/
# Then modify the /etc/profile file on the salve1 and salve2 hosts
[root@salve1 ~]# vim /etc/profile
# set hadoop path
export HADOOP_HOME=/usr/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
# Run the following command so the configuration takes effect immediately (repeat the same edit on salve2)
[root@salve1 ~]# source /etc/profile
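A quick way to confirm that the copy and the profile changes took effect on the slave hosts, without logging in interactively, is something like the sketch below (the explicit source is needed because a non-interactive ssh session does not read /etc/profile):
# Optional check from the master: the hadoop command should now work on both slaves
[root@master ~]# ssh salve1 "source /etc/profile && hadoop version"
[root@master ~]# ssh salve2 "source /etc/profile && hadoop version"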
Configuration complete
III. Startup and verification
1. Format the HDFS file system (this is done only once; it does not need to be repeated on later startups)
[root@master ~]# hadoop namenode -format
2. Start hadoop
[root@master ~]# start-all.sh
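For reference, Hadoop 1.x also ships a matching script to shut the whole cluster down again when needed (not part of the original steps):
# To stop all daemons on the cluster later
[root@master ~]# stop-all.sh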
3. Verify hadoop
3.1. View the processes with the jps tool that ships with the JDK
# View on the master
[root@master ~]# jps
5434 JobTracker
4447 SecondaryNameNode
5221 NameNode
5535 Jps
# View on salve1
[root@salve1 ~]# jps
4313 Jps
4260 TaskTracker
4171 DataNode
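If one of the expected processes is missing from the jps output, the daemon logs are the first place to look; by default Hadoop 1.x writes them under $HADOOP_HOME/logs, with file names containing the user, daemon and hostname (the exact file name below is only an assumed example):
# Sketch: inspect the DataNode log on a slave if the DataNode process is missing
[root@salve1 ~]# ls /usr/hadoop/logs/
[root@salve1 ~]# tail -n 50 /usr/hadoop/logs/hadoop-root-datanode-salve1.log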
3.2.Use "hadoop dfsadmin-report" to check the status of the Hadoop cluster
[root@master ~]# hadoop dfsadmin -report
3.3. View the cluster through the web interfaces
1) visit http://192.168.232.129:50030 (the MapReduce JobTracker web UI)
2) visit http://192.168.232.129:50070 (the HDFS NameNode web UI)