The process of installing and configuring hadoop under linux

2025-04-27 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains the installation and configuration process of hadoop under linux. The content is simple and clear, and it is easy to learn and understand; please follow along to study it.

1. Install linux

This article assumes we start from bare metal, so the first step is to install linux. I am an ubuntu supporter, so this article uses ubuntu. Installation itself is easy; just note that all nodes should be given regular hostnames, such as node0, node1, node2.

2. Prepare

All of the software below is installed with apt, which requires an Internet connection. If your connection is extremely slow, or you cannot reach the external network at all (a very common situation on education networks), you can instead install from the .deb files in the /var/cache/apt/archives folder of another ubuntu machine. (You ask why the package you want is not in that folder? That folder is apt's download cache, so it contains the packages that machine has already installed with apt; I do not need to remind you what to do.)
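As a sketch of that offline route (the source machine and package file name are assumptions, and the block below only prints the two commands rather than running them), you would copy the cached .deb over and install it with dpkg:

```shell
# Hypothetical source machine and package; adjust both to your situation.
SRC="user@some-ubuntu-box:/var/cache/apt/archives"
# Copy the cached .deb file from the other machine's apt cache.
echo "scp ${SRC}/openssh-server_*.deb ."
# dpkg -i installs a local .deb file directly, without contacting apt mirrors.
echo "sudo dpkg -i ./openssh-server_*.deb"
```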

Ubuntu installs openssh-client by default, so you only need to install openssh-server. The command is as follows:

The code is as follows:

sudo apt-get install openssh-server

Then install a jdk:

The code is as follows:

sudo apt-get install default-jdk

3. Configure the network

If your nodes can get an IP address from a dhcp server, I personally suggest doing so, because it is simple and needs no configuration. Alternatively you can use static IPs; a real linux expert should be able to set a static IP with a one-line command, but I won't. I usually set IPs in the network manager provided by gnome. If your linux has no graphical interface, please google.

After setting the IP addresses, remember to give each node a name in the /etc/hosts file, preferably its own hostname; this helps with the configuration and management below.
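For example, with the three nodes node0, node1, node2 on a hypothetical 192.168.1.x network (the addresses are assumptions; use your own), each node's /etc/hosts could contain lines like:

```
192.168.1.100 node0
192.168.1.101 node1
192.168.1.102 node2
```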

Hadoop requires that nodes can log in to each other with ssh without entering a password. I used a simpler method than the official one to set this up, though it is said to be less secure. Here is the method:

Execute the following command on a node:

The code is as follows:

rm -rf ~/.ssh

The code is as follows:

ssh-keygen -t rsa

After running this command you need to press Enter several times until the prompt returns. Of course, this practice is not very secure: one of the prompts asks for a passphrase, and the official suggestion is to enter a line of song lyrics, which is a funny suggestion. In my steps this passphrase is left empty.
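If you prefer not to press Enter repeatedly, ssh-keygen can also be run non-interactively: -N "" sets the empty passphrase explicitly and -f names the output file. The block below writes to /tmp purely for demonstration; for real use you would keep the default ~/.ssh/id_rsa path.

```shell
# Generate a demo RSA keypair with an empty passphrase, with no prompts.
# (/tmp/demo_rsa is only for illustration; normally use ~/.ssh/id_rsa.)
ssh-keygen -q -t rsa -N "" -f /tmp/demo_rsa
ls /tmp/demo_rsa /tmp/demo_rsa.pub
```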

The code is as follows:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Then copy the entire ~/.ssh folder to all the other nodes; the scp command works for this, and the exact command depends on your specific environment.
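As a sketch, assuming the other nodes are named node1 and node2 (hypothetical names; match them to your cluster), a loop like the following prints the scp commands to run. It only echoes them, since the right form depends on your environment:

```shell
# Hypothetical node names; adjust to your cluster.
NODES="node1 node2"
for n in $NODES; do
  # -r copies the whole ~/.ssh directory tree to the node's home directory.
  # Run the printed command for real once the names match your environment.
  echo "scp -r ~/.ssh ${n}:~/"
done
```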

With this, the network is set up.

4. Install hadoop

First, unpack the hadoop archive; this is relatively simple. It is best to unpack hadoop to the same location on every node; I have not tried what the consequences of different locations are.
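The unpack step can be sketched as follows. The archive name hadoop-1.2.1.tar.gz is an assumption; the block fabricates a tiny stand-in archive so the commands can run anywhere, and the second tar line is the one you would run on the real download:

```shell
# Fabricate a tiny stand-in archive (only so this sketch is self-contained).
mkdir -p /tmp/demo/hadoop-1.2.1/conf
echo node1 > /tmp/demo/hadoop-1.2.1/conf/slaves
tar -czf /tmp/hadoop-1.2.1.tar.gz -C /tmp/demo hadoop-1.2.1
# The actual unpack step: -x extract, -z gunzip, -C target directory.
tar -xzf /tmp/hadoop-1.2.1.tar.gz -C /tmp
ls /tmp/hadoop-1.2.1/conf
```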

After that, some configuration is needed. First go into the conf folder of the unpacked directory; the configuration files to modify include hadoop-env.sh, hadoop-site.xml, masters, and slaves.

In hadoop-env.sh, uncomment the following line:

The code is as follows:

export JAVA_HOME=/home/hadoop/jdk1.6.0_16/

Of course, the specific value should also be changed according to the specific situation.

Next is hadoop-site.xml, and here are the contents of my file.

The code is as follows:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://node0:6000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>node0:6001</value>
  </property>
</configuration>

This example is very straightforward and needs no further explanation.

masters contains the hostname or IP address of the node where the jobtracker and namenode are located. My masters file has only one line; of course, you can add more lines if you want to set up multiple master nodes.

slaves contains the hostnames or IP addresses of all the tasktracker and datanode nodes.
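Using the example hostnames node0, node1, node2, the two files could look like this (a minimal sketch; list your own nodes). masters:

```
node0
```

And slaves:

```
node1
node2
```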

5. Run

Enter the hadoop installation folder first. Then run the following commands in turn:

The code is as follows:

bin/hadoop namenode -format

bin/start-all.sh

If nothing goes wrong, hadoop is now ready to use.

Thank you for reading. The above is the content of "the installation and configuration process of hadoop under linux"; after studying this article, I believe you have a deeper understanding of it. Welcome to follow for more related articles!
