Environment preparation
Supported platforms
GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters of up to 4000 nodes.
The Win32 platform is supported as a development platform. Distributed operation has not been fully tested on Win32, so it is not supported as a production platform.
Required software
The software required for both Linux and Windows includes:
Java 1.5.x or later must be installed; the Java version distributed by Sun is recommended.
ssh must be installed and sshd must be kept running so that the Hadoop scripts can manage the remote Hadoop daemons.
Additional software requirements under Windows:
Cygwin - provides shell support in addition to the software listed above.
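Before installing anything, you can quickly check what is already present; a minimal sketch for Ubuntu (the package and service names are assumptions for this environment):
$ java -version                   # should report 1.5+ (this article uses 1.7)
$ dpkg -l | grep openssh-server   # is the ssh server package installed?
$ service ssh status              # is sshd running?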
Installation steps
This article uses Ubuntu as the test environment. Since this is only a test setup, no separate Hadoop user is created; everything is deployed under the current user.
Install software
If the required software is not yet installed on your cluster, install it first.
Update the apt-get sources
$ sudo apt-get update
Install the Java environment
This article uses JDK 1.7.
There are two options. The first is to use OpenJDK and install it directly with apt-get:
$ sudo apt-get install -y openjdk-7-jdk
$ export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
The second is to download the JDK from Oracle's website and extract it:
http://www.oracle.com/technetwork/java/javase/archive-139210.html
Then set JAVA_HOME.
In this article, JAVA_HOME=/usr/local/jdk:
lrwxrwxrwx 1 root root   22 Jun 22 10:20 jdk -> /usr/local/jdk1.7.0_80/
drwxr-xr-x 8 uucp      4096 Apr 11  2015 jdk1.7.0_80/
The environment variables can also be added to ~/.bash_profile.
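If you take the Oracle JDK route, a minimal sketch of extracting the archive and persisting the variables, assuming the jdk-7u80 tarball and the /usr/local paths shown above:
$ sudo tar xzvf jdk-7u80-linux-x64.tar.gz -C /usr/local
$ sudo ln -s /usr/local/jdk1.7.0_80 /usr/local/jdk
# append the variables to ~/.bash_profile so they survive new shells
$ echo 'export JAVA_HOME=/usr/local/jdk' >> ~/.bash_profile
$ echo 'export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bash_profile
$ source ~/.bash_profile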
Configure the SSH environment
Install the SSH server and client:
$ sudo apt-get install -y openssh-server
Start the SSH service
$ sudo service ssh start
Configure passwordless login:
$ ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ''
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 600 ~/.ssh/authorized_keys
Test passwordless login:
$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is SHA256:8PGiorJvZpfFOJkMax6qVaSG8KyRRNnVJGjhNqVqh/k.
Are you sure you want to continue connecting (yes/no)? yes
$ exit
Install Hadoop
$ cd /usr/local
$ sudo wget http://apache.fayea.com/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz
$ sudo tar xzvf hadoop-2.6.4.tar.gz
$ sudo ln -s hadoop-2.6.4 hadoop
# change the directory owner to the current user
$ sudo chown -R XXXXX hadoop*
Configuration
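Before starting any daemons, Hadoop also needs to know where Java lives. A minimal sketch of pointing etc/hadoop/hadoop-env.sh at the JDK installed earlier (the /usr/local/jdk path is the assumption carried over from above):
$ cd /usr/local/hadoop
# replace the JAVA_HOME line in hadoop-env.sh with the JDK path used in this article
$ sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/local/jdk|' etc/hadoop/hadoop-env.sh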
Configure pseudo-distributed:
Modify etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Modify etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Start Hadoop
$ bin/hdfs namenode -format
$ sbin/start-dfs.sh
# view the running processes
$ jps
429 SecondaryNameNode
172 NameNode
1523 Jps
286 DataNode
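If the daemons are up, a quick sanity check of HDFS (a sketch; the exact report depends on your machine):
# the report should show one live datanode and non-zero configured capacity
$ bin/hdfs dfsadmin -report
$ bin/hdfs dfs -ls /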
Namenode web address: http://localhost:50070/
You can run the following commands to test it:
# create input files
$ mkdir input
$ echo "Hello Docker" > input/file2.txt
$ echo "Hello Hadoop" > input/file1.txt
# create input directory on HDFS
$ hadoop fs -mkdir -p input
# put input files to HDFS
$ hdfs dfs -put ./input/* input
# run wordcount
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount input output
# print the input files
$ echo -e "\ninput file1.txt:"
$ hdfs dfs -cat input/file1.txt
$ echo -e "\ninput file2.txt:"
$ hdfs dfs -cat input/file2.txt
# print the output of wordcount
$ echo -e "\nwordcount output:"
$ hdfs dfs -cat output/part-r-00000
The following message may appear while debugging:
WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
According to the available information, this warning occurs when a file is closed while it is still being read ahead; it can also be caused by other bugs. It is ignored here.
You can also temporarily disable readahead with mapreduce.ifile.readahead=false:
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount -D mapreduce.ifile.readahead=false input output
MapReduce is currently running in local mode. To run jobs on YARN instead, you need to configure and start the YARN service.
Single-node YARN
Modify etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Modify etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Start the service:
$ sbin/start-yarn.sh
You can now rerun the example from the previous step.
You can check the ResourceManager web UI at http://localhost:8088/.
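To confirm that the job really went through YARN rather than local mode, you can also list applications from the command line; a small sketch (a fresh output directory name is used because the previous run already created output):
# rerun wordcount against a new output directory, then list YARN applications
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount input output2
$ yarn application -list -appStates ALL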
Configure a test environment with Docker
This assumes a Docker environment is already available.
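A quick check that Docker itself is working before building the images (a sketch):
$ docker version
$ docker info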
Download jdk-7u80-linux-x64.tar.gz and place it in the jdk folder, and put the hadoop-2.6.4.tar.gz package in the dist folder.
Directory structure:
./
├── Dockerfile
├── dist
│   └── hadoop-2.6.4.tar.gz
└── jdk
    └── jdk-7u80-linux-x64.tar.gz
Dockerfile for hainiubl/hadoop-node:apache:
FROM ubuntu:latest
MAINTAINER sandy

# install software
RUN apt-get update
RUN apt-get install -y ssh vim openssh-server

ADD jdk/jdk-7u80-linux-x64.tar.gz /usr/local
RUN ln -s /usr/local/jdk1.7.0_80 /usr/local/jdk && rm -rf /usr/local/jdk-7u80-linux-x64.tar.gz

# install hadoop
ADD dist/hadoop-2.6.4.tar.gz /usr/local/
RUN ln -s /usr/local/hadoop-2.6.4 /usr/local/hadoop

ENV JAVA_HOME=/usr/local/jdk
ENV HADOOP_HOME=/usr/local/hadoop
ENV HADOOP_MAPRED_HOME=$HADOOP_HOME
ENV HADOOP_COMMON_HOME=$HADOOP_HOME
ENV HADOOP_HDFS_HOME=$HADOOP_HOME
ENV HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
ENV YARN_HOME=$HADOOP_HOME
ENV YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
ENV PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# ssh without key
RUN ssh-keygen -t rsa -f ~/.ssh/id_rsa -P '' && \
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Build:
$ docker build -t hainiubl/hadoop-node:apache ./
Directory structure:
./
├── Dockerfile
├── config
│   ├── core-site.xml
│   ├── hadoop-env.sh
│   ├── hdfs-site.xml
│   ├── mapred-site.xml
│   ├── run-wordcount.sh
│   └── yarn-site.xml
Dockerfile for hainiubl/hadoop-pseudo:
FROM hainiubl/hadoop-node:apache
MAINTAINER sandy

ADD config/* /root/
RUN mv /root/core-site.xml $HADOOP_HOME/etc/hadoop/ && mv /root/hadoop-env.sh $HADOOP_HOME/etc/hadoop/
RUN chmod +x ~/run-wordcount.sh && \
    chmod +x $HADOOP_HOME/sbin/start-dfs.sh && \
    chmod +x $HADOOP_HOME/sbin/start-yarn.sh

CMD ["sh", "-c", "/etc/init.d/ssh start; bash"]
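The run-wordcount.sh script copied in above is not reproduced in the source; one possible sketch, simply mirroring the manual wordcount steps from the pseudo-distributed section (its actual contents are an assumption):
#!/bin/bash
# hypothetical run-wordcount.sh: prepare input, run the example job, print the result
mkdir -p input
echo "Hello Docker" > input/file2.txt
echo "Hello Hadoop" > input/file1.txt
hadoop fs -mkdir -p input
hdfs dfs -put ./input/* input
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount input output
echo -e "\nwordcount output:"
hdfs dfs -cat output/part-r-00000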
Build:
$ docker build -t hainiubl/hadoop-pseudo:apache ./
Start the docker node:
$ docker run -itd -p 50070:50070 -p 8088:8088 --name hadoop-pseudo hainiubl/hadoop-pseudo:apache
$ docker exec -it hadoop-pseudo sh -c "/usr/local/hadoop/bin/hdfs namenode -format && /usr/local/hadoop/sbin/start-dfs.sh && bash"
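Once HDFS is formatted and started inside the container, the example can be run there as well; a sketch, assuming the hypothetical run-wordcount.sh above was shipped to /root by the Dockerfile:
# run the wordcount example inside the running container
$ docker exec -it hadoop-pseudo sh -c "cd /root && bash run-wordcount.sh"
# the NameNode UI is then reachable from the host at http://localhost:50070/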