This article describes how to deploy a Hadoop 2.4.1 distributed environment on CentOS 6.5 64-bit. It is fairly detailed and should be a useful reference; readers interested in the topic are encouraged to read through it.
1 Task
Deploy a Hadoop 2.4.1 distributed environment on CentOS 6.5 64-bit.
2 Prepare the virtual machines
We don't have a rack full of physical machines to play with, but that doesn't matter: virtual machines will do. Prepare six virtual machines and install Linux on all of them. CentOS, Debian, and so on are all fine; CentOS is used for the illustration here. There is one master node, three slave nodes, one client node used to deploy the Nutch environment and feed data into HDFS, and one monitor node used later as the monitoring platform, as shown in the following table:
Hostname        IP               System            Role                                    Description
master.hadoop   192.168.122.100  CentOS 6.5 64bit  NameNode                                Manages distributed data and breaks tasks down for execution
slave1.hadoop   192.168.122.101  CentOS 6.5 64bit  DataNode                                Distributed data storage and task execution
slave2.hadoop   192.168.122.102  CentOS 6.5 64bit  DataNode                                Distributed data storage and task execution
slave3.hadoop   192.168.122.103  Debian 7.5 64bit  DataNode                                Distributed data storage and task execution
client.hadoop   192.168.122.200  CentOS 6.5 64bit  Nutch, solr, Chinese word segmentation  Uses the Nutch web crawler to feed data into HDFS
monitor.hadoop  192.168.122.201  Debian 7.5 64bit  Ganglia, Nagios                         Monitoring platform
3 install and configure Hadoop
Hadoop is said to require all nodes to be deployed with the same user and the same directory structure (I have not verified this).
My understanding is:
1. Use the same user on every node. This is easy to understand: when the master manages the slaves, it needs to log in to the slave nodes to run scripts; if the users differ, the scripts cannot tell which user to log in to a slave node as.
2. The deployment path does not strictly have to be the same, because environment variables point to the deployment path, so each node could in principle use a different directory. However, different paths make the deployment awkward to manage, so try to keep them identical.
3.1 preparation work
3.1.1 configure virtual machine IP address and hostname
Configure each virtual machine's IP address and hostname according to the table above, and after that add the following entries to the /etc/hosts file. If you are in a position to run your own DNS server, this step can instead be done by adding the corresponding records to that DNS server and pointing every node's DNS setting at it. Either approach works, as long as hostname-to-IP resolution is consistent on all nodes.
192.168.122.100 master.hadoop
192.168.122.101 slave1.hadoop
192.168.122.102 slave2.hadoop
192.168.122.103 slave3.hadoop
192.168.122.104 client.hadoop
192.168.122.105 monitor.hadoop
Verify that each node resolves the hostnames correctly once the configuration is complete.
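A quick way to run that check (just a convenience sketch; pinging each name works equally well, and this assumes the /etc/hosts entries above are already in place) is to loop over the hostnames with getent:

for h in master.hadoop slave1.hadoop slave2.hadoop slave3.hadoop client.hadoop monitor.hadoop; do
    getent hosts $h || echo "$h does not resolve"    # prints the resolved IP, or the warning on failure
done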
3.1.2 create a user
Create the hadoop user on all nodes, and configure password-less ssh login from the master node to the slave nodes.
When you are done, run ssh slave1.hadoop on master.hadoop to check that you can log in without a password, and test in the same way that master.hadoop can log in to every other node without a password.
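The article does not spell out the commands for this step; the following is a minimal sketch under common defaults (useradd and ssh-copy-id as shipped with CentOS 6), with the user creation run as root on every node and the key exchange run as the hadoop user on master.hadoop:

# as root, on every node
useradd hadoop
passwd hadoop

# as the hadoop user, on master.hadoop
ssh-keygen -t rsa -P ""                      # accept the default key path
for h in slave1.hadoop slave2.hadoop slave3.hadoop; do
    ssh-copy-id hadoop@$h                    # copies the public key; asks for the hadoop password once
done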
3.1.3 install JDK
Download address: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
Here we choose to download the Linux x64 version of the tar package, jdk-7u60-linux-x64.tar.gz.
tar -zxvf jdk-7u60-linux-x64.tar.gz
mv jdk1.7.0_60/ /usr/lib/
vim /etc/profile and add JAVA_HOME and the other environment variables:
# JAVA
export JAVA_HOME=/usr/lib/jdk1.7.0_60
export JRE_HOME=/usr/lib/jdk1.7.0_60/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
source /etc/profile to make the environment variables take effect, then check the Java version. If the version number is displayed, the Java environment has been configured successfully.
[root@master ~]# source /etc/profile
[root@master ~]# java -version
java version "1.7.0_55"
OpenJDK Runtime Environment (rhel-2.4.7.1.el6_5-x86_64 u55-b13)
OpenJDK 64-Bit Server VM (build 24.51-b03, mixed mode)
Every machine must have the JDK installed, and all nodes should use the same JDK version.
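To avoid repeating the JDK steps by hand on each machine, one possible shortcut (purely a convenience sketch; it assumes root ssh access to the other nodes and that the tarball and the edited /etc/profile are sitting on master.hadoop) is to push them out with scp:

for h in slave1.hadoop slave2.hadoop slave3.hadoop client.hadoop monitor.hadoop; do
    scp jdk-7u60-linux-x64.tar.gz root@$h:/root/
    ssh root@$h "tar -zxf /root/jdk-7u60-linux-x64.tar.gz -C /usr/lib/"
    scp /etc/profile root@$h:/etc/profile    # pushes the JAVA_HOME settings added above
done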
3.2 install hadoop
The configuration of the slave nodes is the same as that of the master node, so we only need to set up the master node and then synchronize the configured deployment to the slave nodes with scp or rsync, as sketched below.
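As an illustration of that synchronization (a sketch only, to be run once the master's configuration in the sections below is finished; it assumes the deployment lives under /opt/hadoop-2.4.1 as in 3.2.1, that the same directory exists and is owned by hadoop on the slaves, and that password-less ssh is set up for the hadoop user):

for h in slave1.hadoop slave2.hadoop slave3.hadoop; do
    rsync -az --delete /opt/hadoop-2.4.1/ hadoop@$h:/opt/hadoop-2.4.1/    # mirror the master's deployment onto each slave
done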
There are two ways to install Hadoop. One is to download the pre-compiled Hadoop binary package, configure the environment variables, and edit the Hadoop configuration files. The other is to compile and install from source. Both ways are introduced below; pick whichever suits you when you actually deploy.
3.2.1 extract and install the compiled version
Download the tar package and extract it
# wget http://mirrors.hust.edu.cn/apache/hadoop/common/hadoop-2.4.1/hadoop-2.4.1.tar.gz
# tar zxvf hadoop-2.4.1.tar.gz
# mv hadoop-2.4.1/ /opt/
# cd /opt
# chown -R hadoop.hadoop hadoop-2.4.1/
Configure the environment variables: vim /etc/profile and add HADOOP_HOME and the other variables below. The commented-out HADOOP_ROOT_LOGGER=DEBUG,console line can be uncommented while debugging to get much more detailed DEBUG output.
# HADOOP
export HADOOP_HOME=/opt/hadoop
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_PREFIX
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_PREFIX/lib/native
export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
export HADOOP_HDFS_HOME=$HADOOP_PREFIX
export HADOOP_MAPRED_HOME=$HADOOP_PREFIX
export HADOOP_YARN_HOME=$HADOOP_PREFIX
#export HADOOP_ROOT_LOGGER=DEBUG,console
export LD_LIBRARY_PATH=$HADOOP_PREFIX/lib/native
export PATH=$PATH:$HADOOP_HOME/bin
Note that the native library shipped in the Apache pre-compiled package is 32-bit, so starting the cluster on a 64-bit system produces a WARN message. There are two ways to get rid of it: compile the source package yourself to obtain a 64-bit native library and replace the original 32-bit files, or find a 64-bit native library that someone else has already compiled, download it, and replace the files with it.
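To check which situation you are in (a quick inspection, assuming the layout set by the environment variables above), look at the library's ELF header with file:

file $HADOOP_HOME/lib/native/libhadoop.so.1.0.0
# "ELF 32-bit LSB shared object"  -> still the bundled 32-bit library (the WARN will appear)
# "ELF 64-bit LSB shared object"  -> a 64-bit replacement is in place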
3.2.2 compile and install the source version
First, you need to install some dependent packages.
yum install lzo-devel zlib-devel gcc gcc-c++ autoconf automake libtool ncurses-devel openssl-devel cmake
Once the dependencies are in place, the source tree is built with Maven (Maven and protobuf also need to be installed; the usual command, per the BUILDING.txt that ships with the Hadoop source, is roughly mvn package -Pdist,native -DskipTests -Dtar), and the resulting 64-bit distribution is then deployed the same way as the pre-compiled package in 3.2.1.
3.3 configure Hadoop
3.3.1 core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master.hadoop:9000</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>
3.3.2 hdfs-site.xml
<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>hadoop-cluster1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>master.hadoop:50090</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///home/hadoop/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///home/hadoop/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
3.3.3 mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobtracker.http.address</name>
        <value>master.hadoop:50030</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master.hadoop:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master.hadoop:19888</value>
    </property>
</configuration>
3.3.4 yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master.hadoop:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master.hadoop:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master.hadoop:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master.hadoop:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master.hadoop:8088</value>
    </property>
</configuration>
3.3.5 slaves
slave1.hadoop
slave2.hadoop
slave3.hadoop
The above is all the content of "how to deploy a hadoop2.4.1 distributed environment on centos6.5-64bit". Thank you for reading, and I hope you found it helpful.