
How to use VirtualBox to build a fully distributed Ubuntu 15.04 Hadoop 2.7.1 cluster on Windows


Today I will talk about how to use VirtualBox on Windows to build a fully distributed Ubuntu 15.04 Hadoop 2.7.1 cluster. Many people may not know much about this, so I have summarized the steps below; I hope you get something out of this article.

Let's take a look at the screenshot of the configured cluster:

Note: the "default" VM belongs to Docker and can be ignored. The four VMs below it make up the cluster, where Ubuntu_0 is the master and Ubuntu_1, 2 and 3 are the slave nodes.

1. Create a new virtual machine, configure the basic Java environment, and configure network access

Download Ubuntu 15.04, open VirtualBox and create a new Ubuntu virtual machine with the user name linux1 (no screenshot here; 1 GB of memory is enough).

Next, download and install JDK:

Download: get the corresponding version of the JDK from the official website. Here it is jdk-8u60-linux-x64.tar.gz.

Create a new installation directory:

sudo mkdir /usr/local/java

Extract the JDK:

sudo tar xvf ~/Downloads/jdk-8u60-linux-x64.tar.gz -C /usr/local/java

Set the global environment variable:

sudo gedit ~/.bashrc

Add at the end of the file:

export JAVA_HOME=/usr/local/java/jdk1.8.0_60
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

Verification: open a new terminal and run java to check (the new variables do not take effect in the terminal you already had open).
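
For example, a minimal check (assuming the JDK really unpacked into the path above):

java -version # should report something like java version "1.8.0_60"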

Next, configure the network. Why? Because we are building a cluster whose services will be used from outside, machines other than the cluster nodes must be able to reach it. Don't do what many tutorials online do and install Eclipse directly on the master for development; that is not right. For example, my host is Windows and the cluster runs in VirtualBox: I want to program in Eclipse on Windows and call the cluster's Hadoop services from there. Installing Eclipse on the master would save a lot of troubleshooting, but it is the wrong approach; the services are meant to be invoked remotely.

Set the first network card to NAT: this lets the virtual machine use the host's connection to access the Internet, so the VM can install whatever software it is missing. Convenient!

Next, set up the second network card: this enables the host to ping the virtual machine.
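
A quick sanity check of the two adapters, as a rough sketch (interface names may differ on your setup):

ifconfig # inside the guest: the NAT adapter gets a 10.0.2.x address; note the address of the second adapter
ping <second-adapter-address> # from the Windows host, to confirm it can reach the VM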

2. Clone the virtual machine

Select the first VM, Ubuntu_0 (be sure to shut it down first), and you will find the sheep icon on the right.

(You can probably guess why it is a sheep.) The icon is clickable. My cluster is running right now and I am too lazy to shut it down, so the picture below is borrowed from someone else. After clicking it, it looks like this:

Be careful to reinitialize the network card settings (regenerate the MAC addresses), and name the clones whatever you want. I cloned a total of 3 virtual machines, named Ubuntu_1, Ubuntu_2 and Ubuntu_3, chose "full clone", and kept clicking OK.

3. Set static IPs for the virtual machines

Why set this up? The virtual machines use DHCP by default, and you can't have the Hadoop cluster machines getting a different IP every time they start up; that would be a nuisance. Hence static IPs.

The guides on the Internet left me confused, so here is my approach.

The following actions apply to all four virtual machines.

sudo gedit /etc/network/interfaces

Below the existing lines

auto lo
iface lo inet loopback

add the following:

auto eth2 # this is the second network card
iface eth2 inet static
address 192.168.99.101 # run ifconfig in the terminal to check; on each machine increment the last of the four segments by 1 (these four machines use 101 for the master and 100, 102, 103 for the three slaves)
netmask 255.255.255.0 # from ifconfig
gateway 10.0.2.2 # run route and take the gateway shown in the first line
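
To apply the static IP without a full reboot, something along these lines should work on Ubuntu 15.04 (a sketch; simply rebooting the VM achieves the same thing):

sudo ifdown eth2 && sudo ifup eth2 # or: sudo service networking restart
ifconfig eth2 # confirm the new address took effect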

Now that the static IP is ready, the next step is to set the host name.

Command:

sudo gedit /etc/hostname

Command:

sudo gedit /etc/hosts

Change it to the required host name (mine are linux0-cloud, linux1-cloud, linux2-cloud and linux3-cloud). Restart? Hold on, we are not done yet.

Next, modify the hosts file:

Why set up hosts? What is the hosts file for? My understanding is that it lets the system look up an IP address from a host name.

Modify the hosts file on all virtual machines (command: sudo gedit /etc/hosts) and set it as shown:
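
Since the screenshot is not reproduced here, a sketch of what the mapping could look like, assuming the static IPs chosen above (the exact slave-to-address assignment is up to you):

192.168.99.101 linux0-cloud
192.168.99.100 linux1-cloud
192.168.99.102 linux2-cloud
192.168.99.103 linux3-cloud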

The above steps must be applied on all four machines! All right, now restart them!

4. Install SSH so that master can log in to all slave nodes without a password (no explanation of why here)

Install ssh on all hosts.

Command:

sudo apt-get install ssh

On the master node

Command:

ssh-keygen -t rsa -P ""
cat .ssh/id_rsa.pub >> .ssh/authorized_keys

Use ssh localhost to see if you can log in without a password

Next, master connects to the slave node by ssh without a password.

All other nodes execute the command:

ssh-keygen -t rsa -P ""

Next, as long as master's public key is placed on each slave node, master can log in to that slave without a password.

Copy master's .ssh/authorized_keys to the other slave nodes using the scp command, so that master can access the slaves without a password (for a slave to access master, reverse the process).

Execute the command on master:

scp .ssh/authorized_keys linux1@linux1-cloud:~/.ssh/authorized_keys
scp .ssh/authorized_keys linux1@linux2-cloud:~/.ssh/authorized_keys
scp .ssh/authorized_keys linux1@linux3-cloud:~/.ssh/authorized_keys
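
A quick check (sketch): from master, ssh into each slave; none of these should prompt for a password any more:

ssh linux1@linux1-cloud
ssh linux1@linux2-cloud
ssh linux1@linux3-cloud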

5. Install Hadoop 2.7.1

First configure the master node, then copy the configured files to the rest of the machines.

On master

Create a new directory, command: mkdir ~/hadoop

Extract hadoop, command: tar xvf ~/Downloads/hadoop-2.7.1.tar.gz -C ~/hadoop

Create the hdfs folders (do not create them with sudo, or you will run into permission issues):

mkdir ~/dfs
mkdir ~/dfs/name
mkdir ~/dfs/data
mkdir ~/tmp

Modify the hadoop/hadoop-2.7.1/etc/hadoop/hadoop-env.sh configuration file:

export JAVA_HOME=/usr/local/java/jdk1.8.0_60

Modify the etc/hadoop/slaves file (under the Hadoop installation directory):
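
The slaves file is simply a list of the slave host names, one per line; with the host names used above it would presumably read:

linux1-cloud
linux2-cloud
linux3-cloud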

Modify the etc/hadoop/core-site.xml file:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://linux0-cloud:8020</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/linux1/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>

Modify the etc/hadoop/hdfs-site.xml file.

Setting dfs.namenode.secondary.http-address to linux0-cloud:9001 makes the namenode double as the secondary namenode; really you should put the secondary namenode on another machine, e.g. linux1-cloud:9001. You can visit linux0-cloud:50070, or the secondary address (linux0-cloud:9001 here, or whatever you configured), to view the Hadoop status pages (the namenode status is synchronized).

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>linux0-cloud:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/linux1/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/linux1/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>

Modify etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>linux0-cloud:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>linux0-cloud:19888</value>
  </property>
</configuration>

Modify the yarn-site.xml file

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>linux0-cloud:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>linux0-cloud:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>linux0-cloud:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>linux0-cloud:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>linux0-cloud:8088</value>
  </property>
</configuration>

Next, copy the hadoop directory to the other slave nodes:

Command:

sudo scp -r ~/hadoop linux1@linux1-cloud:~/
sudo scp -r ~/hadoop linux1@linux2-cloud:~/
sudo scp -r ~/hadoop linux1@linux3-cloud:~/

Set all node environment variables:

gedit ~/.bashrc

Add:

export HADOOP_HOME=/home/linux1/hadoop/hadoop-2.7.1
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Apply environment variables:

source ~/.bashrc
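
A quick check that the variables are picked up (open a new terminal first):

hadoop version # should report Hadoop 2.7.1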

6. Start Hadoop

First, format the namenode:

hdfs namenode -format

Start hdfs:

start-dfs.sh

Start yarn:

start-yarn.sh
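
To confirm the daemons are running, jps is handy (a sketch; the expected process lists follow from the configuration above):

jps # on master, expect NameNode, SecondaryNameNode and ResourceManager; on each slave, DataNode and NodeManager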

You can also open 192.168.99.101:50070 in the host's browser to check on the cluster. No screenshots here; my host browser already has too many tabs open.
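
As an optional smoke test, you can run the pi example that ships with Hadoop 2.7.1 (a sketch, assuming the examples jar is in its default location under the extracted archive):

hadoop jar ~/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 10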

After reading the above, do you have a better understanding of how to use VirtualBox to build a fully distributed Ubuntu 15.04 Hadoop 2.7.1 cluster on Windows? If you want to learn more, please follow the industry information channel. Thank you for your support.
