
Building a Hadoop Stand-Alone Environment for Big Data Testing (Super Detailed Version)


Friendly tip: this article is super long, so grab some melon seeds (or other snacks) before you start.

Hadoop's operating modes

Stand-alone mode is Hadoop's default mode. It needs no daemons; all programs run in a single JVM. This mode is mainly used to develop and debug MapReduce application logic.

In pseudo-distributed mode, the Hadoop daemons all run on one machine, simulating a small cluster. On top of stand-alone mode's debugging capabilities, it lets you inspect the behavior of simulated nodes such as the NameNode, DataNode, JobTracker, and TaskTracker.

Both stand-alone and pseudo-distributed modes exist for development and debugging; a real Hadoop cluster runs in fully distributed mode.

Installation steps in stand-alone mode

A clean Linux base environment (important: if this environment has problems, everything that follows will too)

For your convenience, I have prepared one; just download it, import it into your VM software, and use it.

Download address: follow the official account [Test Group Diary] and reply "linux" in the dialog box, or join QQ group 522720170.

Link: https://pan.baidu.com/s/1qXRjaK8 password: xjfk

Turn off the firewall (these commands are for CentOS 7; lower versions differ)

Execute the following two commands:

systemctl stop firewalld.service
systemctl disable firewalld.service
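As a quick sanity check, you can confirm the firewall is off (firewall-cmd ships with CentOS 7's firewalld):

firewall-cmd --state   # should print "not running"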

Modify the host name

vi /etc/hosts

Then append your virtual machine's name to the end of both lines. If you are using the virtual machine we provide, the name is linux. The result after appending is shown below.
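For reference, on a stock CentOS 7 system the two lines typically end up looking like this (assuming the host name linux, as in the provided VM):

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 linux
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6 linux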

Restart the network: /etc/rc.d/init.d/network restart

Set up password-less SSH login (required for starting Hadoop)

cd ~                               # enter the current user's home directory
mkdir -p /root/.ssh                # we are using the root user
cd ~/.ssh/
ssh-keygen -t rsa                  # if prompted, just press Enter
cat id_rsa.pub >> authorized_keys  # add the key to the authorized list
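To verify that password-less login works, SSH into the local machine; it should log you in without asking for a password:

ssh localhost   # type yes if asked to trust the host key; no password prompt should appear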

Install JDK 1.8 and configure environment variables

Extract the package with tar

Copy the extracted files to /usr/lib/java/ (create the java directory if it does not exist)
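A minimal sketch of these two steps, assuming the downloaded tarball is named jdk-8u11-linux-x64.tar.gz (adjust to your actual file name):

tar -zxvf jdk-8u11-linux-x64.tar.gz   # extracts to jdk1.8.0_11/
mkdir -p /usr/lib/java                # create the directory if it does not exist
cp -r jdk1.8.0_11 /usr/lib/java/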

Edit /etc/profile with vi and add the following at the end:

export JAVA_HOME=/usr/lib/java/jdk1.8.0_11
export JRE_HOME=/usr/lib/java/jdk1.8.0_11/jre
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

Execute source /etc/profile to make the environment variables take effect

Verify that the installation succeeded, as shown in the figure below
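For example, checking the version from any directory is a quick verification:

java -version   # should report java version "1.8.0_11"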

Install Hadoop 2.7.4

Extract the package with tar

Copy the extracted files to /usr/lib/hadoop/ (create the hadoop directory if it does not exist)
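As with the JDK, a sketch of these steps, assuming the tarball is named hadoop-2.7.4.tar.gz:

tar -zxvf hadoop-2.7.4.tar.gz         # extracts to hadoop-2.7.4/
mkdir -p /usr/lib/hadoop              # create the directory if it does not exist
cp -r hadoop-2.7.4 /usr/lib/hadoop/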

Set up hadoop-env.sh

vi /usr/lib/hadoop/hadoop-2.7.4/etc/hadoop/hadoop-env.sh

Find the line "# The java implementation to use." and add the following below it:

# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/lib/java/jdk1.8.0_11
export HADOOP_HOME=/usr/lib/hadoop/hadoop-2.7.4
export PATH=$PATH:/usr/lib/hadoop/hadoop-2.7.4/bin
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

Execute source /usr/lib/hadoop/hadoop-2.7.4/etc/hadoop/hadoop-env.sh to make the environment variables take effect

Verify that the installation succeeded, as shown in the figure below
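For example, checking the version verifies both the installation and the PATH set above:

hadoop version   # the first line should read Hadoop 2.7.4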

Configure the related XML files

vi /usr/lib/hadoop/hadoop-2.7.4/etc/hadoop/core-site.xml (Hadoop global configuration)

The contents are as follows:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://127.0.0.1:9000</value>
  </property>
</configuration>

vi /usr/lib/hadoop/hadoop-2.7.4/etc/hadoop/hdfs-site.xml (HDFS configuration)

The contents are as follows:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

cd /usr/lib/hadoop/hadoop-2.7.4/etc/hadoop
cp mapred-site.xml.template mapred-site.xml

vi mapred-site.xml (MapReduce configuration)

The contents are as follows:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

vi yarn-site.xml (YARN configuration)

The contents are as follows:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

Format the HDFS file system

You must do this the first time you run Hadoop, as follows:

/usr/lib/hadoop/hadoop-2.7.4/bin/hadoop namenode -format

During execution you may be asked to confirm whether to continue; if so, type Y and press Enter.

Formatting succeeded if the output ends with a line like exiting with status 0.

If you instead see exiting with status 1, run the following command and then format HDFS again:

mkdir -pv /tmp/hadoop-root/dfs/name

Start Hadoop (HDFS and YARN)

sh /usr/lib/hadoop/hadoop-2.7.4/sbin/start-all.sh
sh /usr/lib/hadoop/hadoop-2.7.4/sbin/stop-all.sh   # to stop
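Note that in Hadoop 2.x start-all.sh is deprecated and simply delegates to two separate scripts, so you can also start HDFS and YARN individually:

sh /usr/lib/hadoop/hadoop-2.7.4/sbin/start-dfs.sh    # start HDFS only
sh /usr/lib/hadoop/hadoop-2.7.4/sbin/start-yarn.sh   # start YARN only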

If no errors are reported, the startup succeeded.

Use the jps command to view the running processes. If you see the NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager processes, the startup definitely succeeded.

PS: if you modify any of the XML files above, you need to restart the services.

Use the web UI to view Hadoop's running status

http://<your server IP address>:50070/

Use the web UI to view the cluster status

http://<your server IP address>:8088/

Problems that may be encountered

If you format HDFS multiple times, the DataNode may fail to start because its cluster ID no longer matches the NameNode's. The usual fix is to make the NameNode clusterID and the DataNode clusterID identical by editing the VERSION files under the name and data directories beneath /tmp/hadoop-root/dfs/.
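A sketch of that fix, assuming the default /tmp/hadoop-root location used in this setup:

cat /tmp/hadoop-root/dfs/name/current/VERSION   # note the NameNode's clusterID
vi /tmp/hadoop-root/dfs/data/current/VERSION    # change clusterID to match, then restart Hadoop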
