Chapter 2 of the introductory Apache Hadoop tutorial

2025-02-24 Update From: SLTechnology News&Howtos shulou


Shulou(Shulou.com)06/03 Report--

Installing and configuring Apache Hadoop on a single node

This chapter demonstrates how to quickly install and configure Hadoop on a single node, so that you can get some hands-on experience with HDFS and the MapReduce framework.

Prerequisites

Supported platforms:

GNU/Linux: Hadoop has been demonstrated on GNU/Linux clusters of up to 2000 nodes.

Windows is also supported. The examples in this article all run on GNU/Linux; to run on Windows, refer to http://wiki.apache.org/hadoop/Hadoop2OnWindows.

Required software:

Java must be installed. Hadoop 2.7 and later require Java 7, which can be either OpenJDK or Oracle's JDK/JRE (HotSpot). For the JDK requirements of other versions, see http://wiki.apache.org/hadoop/HadoopJavaVersions.

ssh must be installed and sshd must be kept running, so that the remote Hadoop daemons can be managed with the Hadoop scripts. The following shows installation on Ubuntu:

$ sudo apt-get install ssh

$ sudo apt-get install rsync

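If sshd is running but connecting to localhost still prompts for a passphrase, the usual remedy is to create and authorize a passphrase-less key. This is a sketch of standard ssh-keygen usage rather than anything Hadoop-specific:

```shell
# Make sure the .ssh directory exists:
mkdir -p ~/.ssh
# Generate a passphrase-less key, unless one is already present:
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa -q
# Authorize the key for logins to this machine:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
```

After this, ssh localhost should log you in without prompting.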

Download

The download address is http://www.apache.org/dyn/closer.cgi/hadoop/common/.

Preparation for running a Hadoop cluster

Extract the downloaded Hadoop distribution. Edit the etc/hadoop/hadoop-env.sh file and define the following parameters:

Set the installation directory for Java:

export JAVA_HOME=/usr/java/latest

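If you are unsure what to set JAVA_HOME to, the path can often be derived from the java binary on the PATH. This is a sketch using standard shell utilities; the resulting directory will differ from system to system:

```shell
# Resolve the java binary through any symlinks, then strip the
# trailing /bin/java component to get a JAVA_HOME candidate:
java_bin=$(command -v java || true)
if [ -n "$java_bin" ]; then
    JAVA_HOME=$(dirname "$(dirname "$(readlink -f "$java_bin")")")
    export JAVA_HOME
    echo "JAVA_HOME=$JAVA_HOME"
else
    echo "java not found on PATH; install a JDK first"
fi
```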

Try the following command:

$ bin/hadoop


The usage document for the hadoop script will be displayed.

Now you can start the Hadoop cluster in one of the three supported modes:

Local (stand-alone) mode

Pseudo-distributed mode

Fully distributed mode

Running in standalone mode

By default, Hadoop is configured to run in non-distributed mode, as a single Java process. This is very useful for debugging.

The following example copies XML files from the unpacked configuration directory to use as input, then finds and displays every entry matching the given regular expression. Output is written to the specified output directory.

$ mkdir input

$ cp etc/hadoop/*.xml input

$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'

$ cat output/*

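To see what the example job is looking for without running Hadoop at all, the same regular expression can be tried with ordinary grep on a toy input. This is a local sketch only; the MapReduce grep example additionally counts how often each match occurs:

```shell
mkdir -p /tmp/grep-demo
# A toy input file in the same spirit as the copied Hadoop XML configs:
printf '<name>dfs.replication</name>\n<name>fs.defaultFS</name>\n' > /tmp/grep-demo/sample.xml
# -o prints each match on its own line; -h suppresses file names:
grep -ohE 'dfs[a-z.]+' /tmp/grep-demo/*.xml
# prints: dfs.replication
```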

Running in pseudo-distributed mode

Hadoop can run on a single node in a so-called pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process.

Configuration

Use the following:

etc/hadoop/core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
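With both files in place, the usual next steps, following the upstream Hadoop single-node guide, are to format HDFS and start the daemons. This is a sketch that assumes you are in the root of the unpacked distribution and that passphrase-less ssh to localhost works:

```shell
# Format the HDFS filesystem (first run only; this erases HDFS metadata):
bin/hdfs namenode -format
# Start the NameNode and DataNode daemons:
sbin/start-dfs.sh
# The NameNode web interface for Hadoop 2.x is at http://localhost:50070/
# Stop the daemons when you are done:
sbin/stop-dfs.sh
```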

Those who are interested can move on to the next chapter.

