
How to build Hadoop on a Linux system


This article shows how to build Hadoop on a Linux system. The walkthrough is concise and easy to follow; I hope you get something out of the detailed steps below.

Hadoop is a distributed computing framework for big-data clusters. One of its core components is a distributed file system, HDFS (Hadoop Distributed File System). HDFS is highly fault-tolerant, is designed to be deployed on low-cost hardware, and provides high-throughput access to application data, making it suitable for applications with very large data sets.

Install the ssh service

Open a shell and check whether the ssh service is already installed. If not, install it with the following command:

sudo apt-get install ssh openssh-server
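
One quick way to check whether it is already installed (an illustrative sketch, not the only way; dpkg is assumed since this is a Debian/Ubuntu-style system) is to look for a running sshd process or ask the package manager:

ps -e | grep sshd
dpkg -l openssh-server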

The installation process is relatively easy and enjoyable.

Log in without password authentication using ssh

1. Create an ssh key. Here we use the rsa method, with the following command:

ssh-keygen -t rsa -P ""

2. A randomart image appears; it is just a visualization of the key's fingerprint, so don't worry about it. Then append the public key to the authorized keys file:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

3. Then you can log in without password authentication, as follows:

ssh localhost

If it succeeds, you are logged in without being prompted for a password.

Download the Hadoop installation package

There are two ways to download the Hadoop installation package:

1. Download directly from the mirror site: http://mirrors.hust.edu.cn/apache/hadoop/core/stable/hadoop-2.7.1.tar.gz

2. Download from the shell, with the following command:

wget http://mirrors.hust.edu.cn/apache/hadoop/core/stable/hadoop-2.7.1.tar.gz

The second method seems faster; after a long wait, the download finally completes.

Extract the Hadoop installation package

Use the following command to extract the Hadoop installation package:

tar -zxvf hadoop-2.7.1.tar.gz

When extraction completes, a hadoop-2.7.1 folder appears.

Configure the corresponding files in Hadoop

The files that need to be configured are hadoop-env.sh, core-site.xml, mapred-site.xml.template, and hdfs-site.xml, all located under hadoop-2.7.1/etc/hadoop. The specific configuration is as follows:

1. The core-site.xml configuration is as follows:
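
A minimal pseudo-distributed sketch (the hadoop.tmp.dir value is an illustrative assumption; fs.defaultFS points HDFS at localhost:9000):

<configuration>
    <!-- Base directory for Hadoop's temporary files (illustrative path) -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/leesf/hadoop/tmp</value>
    </property>
    <!-- Default filesystem URI: a local pseudo-distributed HDFS -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>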

The path of hadoop.tmp.dir can be set according to your own habits.

2. The mapred-site.xml.template configuration is as follows:
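
Note that Hadoop reads mapred-site.xml rather than the .template file, so copy the template to mapred-site.xml if you want a setting to take effect. A common sketch is shown below (an assumption here: setting the framework to yarn also requires starting YARN with sbin/start-yarn.sh; if you leave the file unconfigured, the WordCount job later in this walkthrough simply runs with the local job runner):

<configuration>
    <!-- Run MapReduce jobs on YARN (requires sbin/start-yarn.sh) -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>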

3. The hdfs-site.xml configuration is as follows:
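
A minimal single-node sketch (the two directory values are illustrative assumptions, placed under hadoop.tmp.dir as suggested below; replication is 1 because there is only one DataNode):

<configuration>
    <!-- Single node, so keep one replica per block -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <!-- Where the NameNode stores its metadata (illustrative path) -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/leesf/hadoop/tmp/dfs/name</value>
    </property>
    <!-- Where the DataNode stores its blocks (illustrative path) -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/leesf/hadoop/tmp/dfs/data</value>
    </property>
</configuration>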

The paths of dfs.namenode.name.dir and dfs.datanode.data.dir can be set freely, preferably under the hadoop.tmp.dir directory.

In addition, if Hadoop cannot find the JDK when it runs, you can set the JDK path directly in hadoop-env.sh, as shown below:

export JAVA_HOME="/home/leesf/program/java/jdk1.8.0_60"
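
If you are unsure where your JDK lives, one way to find it (a sketch; paths vary by system) is to resolve the java binary and strip the trailing /bin/java from the result:

readlink -f "$(which java)"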

Run Hadoop

After the configuration is complete, run hadoop.

1. Initialize the HDFS system

Use the following command from the hadoop-2.7.1 directory:

bin/hdfs namenode -format

During initialization you may be asked to confirm; type Y at the prompt to continue.

If formatting succeeds, the output ends with a message saying the storage directory has been successfully formatted, indicating that initialization is complete.

2. Start the NameNode and DataNode daemons

Start them with the following command:

sbin/start-dfs.sh

3. View process information

Use the following command to view process information:

jps
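
For this single-node HDFS setup, the output should look something like the following (process IDs are illustrative):

12546 NameNode
12678 DataNode
12845 SecondaryNameNode
13012 Jps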

This indicates that both the DataNode and the NameNode are running.

4. View Web UI

Enter http://localhost:50070 in the browser to view the relevant information on the NameNode status page.

At this point, the environment for hadoop has been set up. Let's start using hadoop to run a WordCount example.

Run the WordCount Demo

1. Create a new file locally. The author creates a words document under the /home/leesf directory; its contents can be anything, as sketched below.
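
For example, a one-line sketch (the sample text is arbitrary, but the counts shown at the end of this walkthrough assume it):

echo "hello hadoop hello world" > /home/leesf/words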

2. Create a new folder in HDFS into which to upload the local words document. Enter the following command from the hadoop-2.7.1 directory:

bin/hdfs dfs -mkdir /test

This creates a test directory under the root directory of HDFS.

Use the following command to view the directory structure under the HDFS root directory:

bin/hdfs dfs -ls /

The listing should show the test directory, confirming that it has been created under the root directory of HDFS.

3. Upload the local words document to the test directory

Use the following command to upload:

bin/hdfs dfs -put /home/leesf/words /test/

Use the following command to verify the upload:

bin/hdfs dfs -ls /test/

The listing should show the words file, indicating that the local document has been uploaded to the test directory.

4. Run wordcount

Run wordcount using the following command:

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /test/words /test/out

After the job finishes, a directory named out is generated in the /test directory. View the files in /test using the following command:

bin/hdfs dfs -ls /test

The listing should now show a directory called out inside the test directory.

Enter the following command to view the files in the out directory:

bin/hdfs dfs -ls /test/out

This indicates that the job ran successfully; the result is saved in part-r-00000.

5. View the running results

Use the following command to view the running results:

bin/hadoop fs -cat /test/out/part-r-00000

The output lists each word alongside its count.
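
With the illustrative words file created earlier, the result would read (word and count, tab-separated, sorted by word):

hadoop	1
hello	2
world	1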

At this point, the running process is complete.

Hadoop is a software framework capable of distributed processing of large amounts of data. Hadoop processes data in a reliable, efficient, and scalable manner.

The above is how to build Hadoop on a Linux system. Have you learned any new knowledge or skills? If you want to learn more or enrich your knowledge, you are welcome to follow the industry information channel.
