
How to install Hadoop stand-alone and fully distributed


This article explains how to install Hadoop in both stand-alone and fully distributed modes. Many readers have questions about these two installations, so the steps below have been collected and organized into a simple, easy-to-follow walkthrough. I hope it helps clear up your doubts about installing Hadoop stand-alone and fully distributed. Follow along and give it a try!

Hadoop is free, open-source software for distributed big-data storage and computation. Readers with a Linux background will find the installation straightforward: write a few configuration files and it starts. I was a beginner myself, so I describe the steps in detail. For convenience, my virtual machine runs Ubuntu 12, with its network connection set to bridged mode so the machine is easy to debug from within the local network. Stand-alone and cluster installation differ very little, so we start with the stand-alone setup and then add the cluster configuration.

Step 1: install the tools.

Editor: vim

The code is as follows:

sudo apt-get install vim

SSH server: openssh. Installing ssh first lets you use a remote terminal tool (putty, xshell, etc.), which makes managing the virtual machine much easier.

The code is as follows:

sudo apt-get install openssh-server

Step 2: some basic settings

It is best to set a fixed IP for the virtual machine.

The code is as follows:

sudo vim /etc/network/interfaces

Add the following:

iface eth0 inet static
address 192.168.0.211
gateway 192.168.0.222
netmask 255.255.255.0
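To apply the new address, restart the networking service (or simply reboot); on Ubuntu 12 the usual command is:

sudo /etc/init.d/networking restart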

Modify the machine name. The name I use here is hadoopmaster; this machine will serve as the namenode later.

The code is as follows:

sudo vim /etc/hostname

Modify hosts, which makes it easier to cope with IP changes and to remember and identify machines:

The code is as follows:

sudo vim /etc/hosts

Add content:

192.168.0.211 hadoopmaster
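A quick way to confirm the name resolves (assuming the entry above):

ping -c 1 hadoopmaster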

Step 3: add a dedicated user for hadoop

The code is as follows:

sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop

Give the hadoop user sudo permissions:

The code is as follows:

sudo vim /etc/sudoers

Below the line root ALL=(ALL:ALL) ALL, add:

hadoop ALL=(ALL:ALL) ALL

Then switch to the hadoop user: su hadoop

Step 4: unpack and install the JDK, Hadoop, and Pig (installing Pig while we are at it)

The code is as follows:

sudo tar zxvf ./jdk-7-linux-i586.tar.gz -C /usr/local/jvm/
sudo tar zxvf ./hadoop-1.0.4.tar.gz -C /usr/local/hadoop
sudo tar zxvf ./pig-0.11.1.tar.gz -C /usr/local/pig

Rename the extracted directories so that the final paths are:

The code is as follows:

jvm: /usr/local/jvm/jdk7
hadoop: /usr/local/hadoop/hadoop (note: hadoop must be installed at the same path on every node)
pig: /usr/local/pig

Set the owner of the directories:

The code is as follows:

sudo chown -R hadoop:hadoop jdk7
sudo chown -R hadoop:hadoop hadoop
sudo chown -R hadoop:hadoop pig

Set the environment variables: edit ~/.bashrc or ~/.profile and add:

The code is as follows:

export JAVA_HOME=/usr/local/jvm/jdk7
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
export HADOOP_INSTALL=/usr/local/hadoop/hadoop
export PATH=${HADOOP_INSTALL}/bin:$PATH

Run source ~/.profile to make the changes take effect.
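To confirm the variables were picked up (assuming the paths above), both of these should print version information:

java -version
hadoop version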

Step 5: passwordless ssh login, that is, ssh to this machine itself should not ask for a password.

The code is as follows:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

If it doesn't work, modify the permissions:

The code is as follows:

chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

authorized_keys acts as a whitelist, and id_rsa.pub is the public key: when authorized_keys contains the public key of the requesting machine, the ssh server lets it in without asking for a password.
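A quick test: the first connection asks you to accept the host key, after which no password should be needed:

ssh localhost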

Step 6: required Hadoop settings

All configuration files are in the hadoop/conf directory.

1. hadoop-env.sh: find the line # export JAVA_HOME, remove the leading comment #, and set it to the actual jdk path.
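With the paths used in this article, the line becomes:

export JAVA_HOME=/usr/local/jvm/jdk7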

2. core-site.xml

The code is as follows:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoopmaster:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>

3. mapred-site.xml

The code is as follows:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoopmaster:9001</value>
  </property>
</configuration>

4. hdfs-site.xml

The code is as follows:

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/usr/local/hadoop/datalog1,/usr/local/hadoop/datalog2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/data1,/usr/local/hadoop/data2</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

5. The files masters and slaves: on a single machine you can simply write localhost in both, as shown below.
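For the stand-alone setup, the two files can be written in one go (assuming the install path above):

echo localhost > /usr/local/hadoop/hadoop/conf/masters
echo localhost > /usr/local/hadoop/hadoop/conf/slaves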

Step 7: start Hadoop

Format the HDFS file system of Hadoop

The code is as follows:

hadoop namenode -format

Run the Hadoop startup script. In a cluster, run it on the master; Hadoop will start the other slave nodes via ssh:

The code is as follows:

start-all.sh

Run the command jps: if the five processes NameNode, SecondaryNameNode, TaskTracker, DataNode, and JobTracker are all present, the startup was successful!
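As an optional smoke test, you can run the pi sample from the examples jar bundled with the hadoop-1.0.4 release (assuming it sits in the install directory as below):

hadoop jar /usr/local/hadoop/hadoop/hadoop-examples-1.0.4.jar pi 2 100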

Step 8: cluster configuration

Every machine in the cluster is installed exactly as above; what follows is only the additional configuration a cluster needs!

It is best to configure one machine completely first; the others can then be copied directly with scp, keeping the same paths, java included!

The hosts for this example are hadoopmaster, hadoopnode1, and hadoopnode2; set them in hosts on every machine, as sketched below.
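Only hadoopmaster's address (192.168.0.211) is fixed earlier in this article; the node addresses below are placeholders to adapt to your network:

192.168.0.211 hadoopmaster
192.168.0.212 hadoopnode1
192.168.0.213 hadoopnode2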

Set up ssh so that the master can log in to the slaves without a password; this is mainly used for starting the slaves.

Copy the id_rsa.pub on hadoopmaster to each child node.

The code is as follows:

scp ~/.ssh/id_rsa.pub hadoopnode1:/home/hadoop/.ssh/id_master
scp ~/.ssh/id_rsa.pub hadoopnode2:/home/hadoop/.ssh/id_master

Then, in the ~/.ssh/ directory on each child node, run:

cat ./id_master >> authorized_keys
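From the master, logging in to a node (assuming the hostnames above) should now succeed without a password:

ssh hadoopnode1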

The masters file holds the hostname of the secondarynamenode or namenode, one per line.

For the cluster, write the master's name, e.g.: hadoopmaster

The slaves file holds the hostnames of the slaves, one per line.

For the cluster, write the child node names, e.g.: hadoopnode1, hadoopnode2
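With the hostnames used in this article, the two files on the master would contain:

conf/masters:
hadoopmaster

conf/slaves:
hadoopnode1
hadoopnode2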

Hadoop management

After hadoop starts, it runs a task management service and a file system management service, two JETTY-based WEB services, so you can check their operation in a browser.

The task management service runs on port 50030, e.g. http://127.0.0.1:50030, and the file system management service runs on port 50070.

Parameter description:

1. dfs.name.dir: the local file system path where the NameNode persists the namespace and transaction log. When this value is a comma-separated list of directories, the name table is replicated into all of the directories for redundancy.

2. dfs.data.dir: the local file system path where the DataNode stores block data. When this value is a comma-separated list of directories, data is stored in all of the directories, which are usually on different devices.

3. dfs.replication: the number of copies kept of each block of data. The default is 3; if this number is greater than the number of machines in the cluster, errors will occur.

At this point, the study of "how to install Hadoop stand-alone and fully distributed" is complete. I hope it has resolved your doubts. Pairing theory with practice is the best way to learn, so go and try it out! If you want to keep learning more, please continue to follow the site; the editor will keep working to bring you more practical articles!
