2025-01-16 Update. From: SLTechnology News & Howtos
Shulou (Shulou.com) 06/01 report
Today I will explain how to use VirtualBox on Windows to build a fully distributed Hadoop 2.7.1 cluster on Ubuntu 15.04. Many people are not familiar with this, so I have summarized the steps below. I hope you find them useful.
Here is what the finished cluster looks like in VirtualBox: four virtual machines, of which Ubuntu_0 is the master and Ubuntu_1, Ubuntu_2 and Ubuntu_3 are the slave nodes. (The VM named "default" belongs to Docker; ignore it.)
1. Create the first virtual machine, set up the Java environment and configure network access
Download Ubuntu 15.04, open VirtualBox and create a new Ubuntu virtual machine with the user name linux1. 1 GB of memory is enough.
Next, download and install JDK:
Download: get the matching JDK version from the official website. This guide uses jdk-8u60-linux-x64.tar.gz.
Create a new installation directory:
sudo mkdir /usr/local/java
Extract the JDK:
sudo tar xvf ~/Downloads/jdk-8u60-linux-x64.tar.gz -C /usr/local/java
Set the environment variables in your ~/.bashrc (no sudo needed, it is your own file):
gedit ~/.bashrc
Add at the end of the file:
export JAVA_HOME=/usr/local/java/jdk1.8.0_60
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
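If you prefer not to edit the file by hand, the same lines can be appended from a script. The function below is a minimal sketch, assuming bash and the JDK path used above; `add_java_env` and the marker comment it writes are hypothetical names, used only so that running it twice does not duplicate the block.

```shell
# Hypothetical helper: append the JDK variables to an rc file exactly once.
# JAVA_HOME assumes jdk-8u60 was extracted into /usr/local/java as above.
add_java_env() {
  rc_file="$1"
  marker='# >>> java env >>>'   # hypothetical marker line used to detect a previous run
  touch "$rc_file"
  if grep -qxF -e "$marker" "$rc_file"; then
    return 0                    # block already present: nothing to do
  fi
  # Quoted heredoc: ${JAVA_HOME} and $PATH are written literally and only
  # expanded later, when the shell reads the rc file.
  cat >> "$rc_file" <<'EOF'
# >>> java env >>>
export JAVA_HOME=/usr/local/java/jdk1.8.0_60
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
EOF
}
```

Run it as `add_java_env ~/.bashrc`, then open a new terminal.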
Verification: open a new terminal and run java -version (the change does not take effect in the terminal that is already open).
Next, configure the network. Why? You are building a cluster in order to use its services, so machines outside the cluster must be able to reach it. Many online tutorials simply install Eclipse on the master and develop there. That saves a lot of troubleshooting, but it is the wrong approach: the services are meant to be invoked remotely. In my case the host runs Windows and the cluster runs in VirtualBox; I want to write code in Eclipse on Windows and use the cluster's Hadoop services from there.
Set up the first network card as NAT. NAT lets the virtual machine reach the Internet through the host's connection, so you can install whatever software is missing. Convenient!
Next, set up the second network card: this enables the host to ping the virtual machine.
2. Clone the virtual machine
Select the first VM, Ubuntu_0 (make sure it is powered off), and the sheep icon on the right (a cloned sheep, as in Dolly) becomes clickable. Click it to open the clone dialog.
Be sure to tick the option to reinitialize the MAC address of all network cards, and name the clone whatever you like. I cloned three virtual machines in total, named Ubuntu_1, Ubuntu_2 and Ubuntu_3, chose "Full clone", and clicked through to finish.
3. Give each virtual machine a static IP
Why? By default a virtual machine gets its address via DHCP. In a Hadoop cluster you cannot have the machines change IP every time they boot, so each one needs a static IP.
The guides online are confusing, so I will just describe what worked for me.
The following actions apply to all four virtual machines.
sudo gedit /etc/network/interfaces
Below the existing lines
auto lo
iface lo inet loopback
add the following:
# eth2 is the second network card here; check the interface name with ifconfig
auto eth2
iface eth2 inet static
    # Check the current address with ifconfig. Each machine gets its own last octet:
    # 192.168.99.101 for the master; 192.168.99.100, .102 and .103 for the three slaves
    address 192.168.99.101
    # netmask as shown by ifconfig
    netmask 255.255.255.0
    # gateway as shown by route (first line of the table)
    gateway 10.0.2.2
Keep the # notes on lines of their own: /etc/network/interfaces does not support comments at the end of a setting line.
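The stanza differs per machine only in the address, so it can be generated. A minimal sketch, assuming bash; `iface_stanza` is a hypothetical helper, and eth2 and the 192.168.99.x addresses are the ones from this article. The gateway is optional in this sketch; pass it as a fourth argument if your setup needs it.

```shell
# Hypothetical helper: print a static stanza for /etc/network/interfaces.
# Arguments: interface, address, [netmask], [gateway]
iface_stanza() {
  dev="$1"; addr="$2"; mask="${3:-255.255.255.0}"; gw="${4:-}"
  printf 'auto %s\n' "$dev"
  printf 'iface %s inet static\n' "$dev"
  printf '    address %s\n' "$addr"
  printf '    netmask %s\n' "$mask"
  # The gateway line is only emitted when a fourth argument is given.
  if [ -n "$gw" ]; then
    printf '    gateway %s\n' "$gw"
  fi
}
```

For example, on the master: `iface_stanza eth2 192.168.99.101 | sudo tee -a /etc/network/interfaces`.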
Now that the static IP is ready, the next step is to set the host name.
Command:
sudo gedit /etc/hostname
Command:
sudo gedit /etc/hosts
Change it to the desired host name (I use linux0-cloud, linux1-cloud, linux2-cloud and linux3-cloud). Restart now? Wait a minute, we're not done yet.
Next, modify the hosts file:
Why set up hosts? As I understand it, the hosts file maps host names to IP addresses, so the machines can find each other by name.
Modify the hosts file on every virtual machine with sudo gedit /etc/hosts, so that each of the four host names maps to its static IP address.
These steps must be applied on all four machines! All right, now restart.
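The cluster entries are identical on every machine, so one way to keep them consistent is to generate them once. A minimal sketch, assuming bash; `hosts_block` is a hypothetical helper. Only linux0-cloud = 192.168.99.101 is stated explicitly in this article, so the slave pairing in the usage line below is an assumption.

```shell
# Hypothetical helper: print /etc/hosts lines from "ip hostname" argument pairs.
hosts_block() {
  while [ "$#" -ge 2 ]; do
    printf '%s\t%s\n' "$1" "$2"
    shift 2
  done
  # A leftover argument means an IP without a hostname.
  if [ "$#" -ne 0 ]; then
    echo "odd number of arguments" >&2
    return 1
  fi
}
```

Usage (slave addresses assumed): `hosts_block 192.168.99.101 linux0-cloud 192.168.99.100 linux1-cloud 192.168.99.102 linux2-cloud 192.168.99.103 linux3-cloud | sudo tee -a /etc/hosts`. Ubuntu's /etc/hosts usually also maps the machine's own name to 127.0.1.1; that line is commonly removed on the master so that linux0-cloud does not resolve to a loopback address the slaves cannot reach.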
4. Install SSH so the master can log in to all slave nodes without a password (the start scripts use SSH to launch the daemons on the slaves)
Install ssh on all hosts.
Command:
sudo apt-get install ssh
On the master node
Command:
ssh-keygen -t rsa -P ""
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
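The `cat ... >> authorized_keys` step appends the key every time it is run. The sketch below (bash assumed; `authorize_key` is a hypothetical name) adds the key only if it is not already there and sets the usual 700/600 permissions, since with its default StrictModes setting sshd refuses keys from an authorized_keys file or ~/.ssh directory that others can write to.

```shell
# Hypothetical helper: add a public key to <ssh_dir>/authorized_keys once.
# Arguments: path to the .pub file, path to the .ssh directory
authorize_key() {
  pubkey="$1"; ssh_dir="$2"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$ssh_dir/authorized_keys"
  chmod 600 "$ssh_dir/authorized_keys"
  # -x whole-line, -F fixed-string match, so the same key is never added twice
  if ! grep -qxF -e "$(cat "$pubkey")" "$ssh_dir/authorized_keys"; then
    cat "$pubkey" >> "$ssh_dir/authorized_keys"
  fi
}
```

On the master this would be `authorize_key ~/.ssh/id_rsa.pub ~/.ssh`.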
Run ssh localhost to check that you can log in without a password.
Next, set up passwordless ssh from the master to the slave nodes.
Run this on every other node (it also creates the ~/.ssh directory there):
ssh-keygen -t rsa -P ""
The master can then log in to a slave without a password once the master's public key is in that slave's authorized_keys.
Copy the master's .ssh/authorized_keys to each slave node with scp, so that the master can reach the slaves without a password (for slaves to reach the master, do the reverse). Note that this replaces any authorized_keys file already on the slave.
Execute the command on master:
scp .ssh/authorized_keys linux1@linux1-cloud:~/.ssh/authorized_keys
scp .ssh/authorized_keys linux1@linux2-cloud:~/.ssh/authorized_keys
scp .ssh/authorized_keys linux1@linux3-cloud:~/.ssh/authorized_keys
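To confirm the copy worked, each slave should now accept a login without prompting. A minimal sketch, assuming bash; `check_passwordless` and the `SSH_CMD` override are hypothetical and exist only so the loop can be tried without a cluster. `-o BatchMode=yes` makes ssh fail instead of asking for a password.

```shell
# Hypothetical helper: report which hosts accept a non-interactive ssh login.
# SSH_CMD (hypothetical) defaults to ssh; it can be pointed at a stub for testing.
check_passwordless() {
  failed=0
  for host in "$@"; do
    if "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "ok    $host"
    else
      echo "FAIL  $host"
      failed=1
    fi
  done
  return "$failed"
}
```

On the master: `check_passwordless linux1-cloud linux2-cloud linux3-cloud`. Every VM uses the same user, linux1, so no user name is needed in front of the host.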
5. Install Hadoop 2.7.1
First configure the master node, then copy the configured files to the other machines.
On the master:
Create a new directory: mkdir ~/hadoop
Extract Hadoop: tar xvf ~/Downloads/hadoop-2.7.1.tar.gz -C ~/hadoop
Create the HDFS directories without sudo (directories created with sudo are owned by root, and Hadoop, running as linux1, could not write to them):
mkdir ~/dfs
mkdir ~/dfs/name
mkdir ~/dfs/data
mkdir ~/tmp
Edit ~/hadoop/hadoop-2.7.1/etc/hadoop/hadoop-env.sh and set:
export JAVA_HOME=/usr/local/java/jdk1.8.0_60
Edit the etc/hadoop/slaves file (under ~/hadoop/hadoop-2.7.1), one slave host name per line:
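start-dfs.sh and start-yarn.sh read this file to decide where to start DataNodes and NodeManagers. The text does not list its contents, so the sketch below assumes the three slaves are the hosts named linux1-cloud to linux3-cloud. `write_slaves` is a hypothetical helper (bash assumed); it overwrites the file, which also drops the default localhost entry.

```shell
# Hypothetical helper: write <conf_dir>/slaves with one hostname per line.
# Arguments: Hadoop config directory, then the slave hostnames
write_slaves() {
  conf_dir="$1"; shift
  printf '%s\n' "$@" > "$conf_dir/slaves"
}
```

For example: `write_slaves ~/hadoop/hadoop-2.7.1/etc/hadoop linux1-cloud linux2-cloud linux3-cloud`.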
Edit etc/hadoop/core-site.xml:
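<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://linux0-cloud:8020</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/linux1/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>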
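All four *-site.xml files share the same property layout, so they can also be generated instead of typed. A minimal sketch, assuming bash; `hadoop_site_xml` is a hypothetical helper that takes name=value arguments. It does no XML escaping, which is fine for the values in this article but not for values containing &, < or >, and it omits the optional description element.

```shell
# Hypothetical helper: print a Hadoop *-site.xml from name=value arguments.
hadoop_site_xml() {
  printf '<?xml version="1.0"?>\n<configuration>\n'
  for pair in "$@"; do
    name="${pair%%=*}"    # text before the first '='
    value="${pair#*=}"    # text after the first '='
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$name" "$value"
  done
  printf '</configuration>\n'
}
```

For core-site.xml: `hadoop_site_xml fs.defaultFS=hdfs://linux0-cloud:8020 io.file.buffer.size=131072 hadoop.tmp.dir=/home/linux1/tmp > ~/hadoop/hadoop-2.7.1/etc/hadoop/core-site.xml`.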
Edit etc/hadoop/hdfs-site.xml:
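<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>linux0-cloud:9001</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/home/linux1/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/home/linux1/dfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>
With dfs.namenode.secondary.http-address set to linux0-cloud:9001, the NameNode machine also runs the SecondaryNameNode. In practice you should put it on another machine, for example linux1-cloud:9001. The NameNode web UI is then at linux0-cloud:50070 and the SecondaryNameNode UI at linux0-cloud:9001 (or wherever you moved it). Note that the SecondaryNameNode periodically checkpoints the NameNode's metadata; it is not a live standby.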
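dfs.replication is 3 here and there are three slaves, so every DataNode holds a copy of each block. A replication factor larger than the number of DataNodes cannot be satisfied, and HDFS will report the blocks as under-replicated. The sketch below (bash assumed; `check_replication` is a hypothetical helper) compares the two, counting one DataNode per non-blank line of the slaves file.

```shell
# Hypothetical helper: fail if dfs.replication exceeds the DataNode count.
# Arguments: replication factor, path to the slaves file
check_replication() {
  replication="$1"; slaves_file="$2"
  # count non-blank lines (one DataNode hostname per line)
  datanodes=$(grep -c '[^[:space:]]' "$slaves_file")
  if [ "$replication" -gt "$datanodes" ]; then
    echo "dfs.replication=$replication > $datanodes DataNodes" >&2
    return 1
  fi
  echo "ok: replication $replication, DataNodes $datanodes"
}
```

For example: `check_replication 3 ~/hadoop/hadoop-2.7.1/etc/hadoop/slaves`.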
Modify etc/hadoop/mapred-site.xml
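<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>linux0-cloud:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>linux0-cloud:19888</value>
    </property>
</configuration>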
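Hadoop 2.7.1 ships etc/hadoop/mapred-site.xml.template rather than mapred-site.xml, so the file usually has to be created before it can be edited. A minimal sketch, assuming bash; `ensure_mapred_site` is a hypothetical helper that copies the template only when mapred-site.xml does not exist yet, so an already edited file is left alone.

```shell
# Hypothetical helper: create mapred-site.xml from the shipped template if missing.
# Argument: Hadoop config directory
ensure_mapred_site() {
  conf_dir="$1"
  if [ ! -f "$conf_dir/mapred-site.xml" ]; then
    cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
  fi
}
```

For example: `ensure_mapred_site ~/hadoop/hadoop-2.7.1/etc/hadoop`. Also note that start-yarn.sh does not start the JobHistory server configured above; in Hadoop 2.x it is started separately with `mr-jobhistory-daemon.sh start historyserver`.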
Modify the yarn-site.xml file
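<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>linux0-cloud:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>linux0-cloud:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>linux0-cloud:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>linux0-cloud:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>linux0-cloud:8088</value>
    </property>
</configuration>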
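All five ResourceManager addresses above point at linux0-cloud and use Hadoop's default ports. In Hadoop 2.x the same values are derived from yarn.resourcemanager.hostname when the individual *.address keys are not set, so setting that single property to linux0-cloud should be equivalent. The sketch below (bash assumed; `rm_addresses` is a hypothetical helper) just prints the derived values for comparison with the file.

```shell
# Hypothetical helper: print the ResourceManager addresses that follow from
# yarn.resourcemanager.hostname with the default ports (8032, 8030, 8031, 8033, 8088).
rm_addresses() {
  rm_host="$1"
  printf 'yarn.resourcemanager.address=%s:8032\n' "$rm_host"
  printf 'yarn.resourcemanager.scheduler.address=%s:8030\n' "$rm_host"
  printf 'yarn.resourcemanager.resource-tracker.address=%s:8031\n' "$rm_host"
  printf 'yarn.resourcemanager.admin.address=%s:8033\n' "$rm_host"
  printf 'yarn.resourcemanager.webapp.address=%s:8088\n' "$rm_host"
}
```

Running `rm_addresses linux0-cloud` should reproduce the five values set in yarn-site.xml above.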
Copy the hadoop directory to the slave nodes:
Command:
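scp -r ~/hadoop linux1@linux1-cloud:~/
scp -r ~/hadoop linux1@linux2-cloud:~/
scp -r ~/hadoop linux1@linux3-cloud:~/
Run these as linux1 without sudo: under sudo, scp would use root's SSH keys instead of the passwordless key set up for linux1.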
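The same copy can be written as a loop over the slaves. A minimal sketch, assuming bash; `push_hadoop` and the `DRY_RUN` switch are hypothetical. It runs scp as the current user, so the passwordless key set up earlier is the one used.

```shell
# Hypothetical helper: copy ~/hadoop to each host given as an argument.
# DRY_RUN=1 (hypothetical switch) only prints the commands.
push_hadoop() {
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = 1 ]; then
      echo "scp -r $HOME/hadoop $host:~/"
    else
      scp -r "$HOME/hadoop" "$host:~/" || return 1
    fi
  done
}
```

Try `DRY_RUN=1 push_hadoop linux1-cloud linux2-cloud linux3-cloud` first, then run it without `DRY_RUN`.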
Set the environment variables on all nodes:
gedit ~/.bashrc
Add:
export HADOOP_HOME=/home/linux1/hadoop/hadoop-2.7.1
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
Apply environment variables:
source ~/.bashrc
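After sourcing the file, a quick sanity check is that HADOOP_HOME is set and that its bin and sbin directories are on PATH, since that is where hdfs, start-dfs.sh and start-yarn.sh live. A minimal sketch, assuming bash; `path_has` and `check_hadoop_env` are hypothetical helpers.

```shell
# Hypothetical helper: succeed if the given directory is an entry of $PATH.
path_has() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: check HADOOP_HOME and its bin/sbin PATH entries.
check_hadoop_env() {
  [ -n "${HADOOP_HOME:-}" ] || { echo "HADOOP_HOME is not set" >&2; return 1; }
  for d in "$HADOOP_HOME/bin" "$HADOOP_HOME/sbin"; do
    path_has "$d" || { echo "$d is not on PATH" >&2; return 1; }
  done
  echo "ok: $HADOOP_HOME"
}
```

Run `check_hadoop_env` on each node after `source ~/.bashrc`.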
6. Start Hadoop
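First, format the NameNode on the master. Do this only once: formatting again gives the NameNode a new cluster ID that the existing DataNodes will reject.
hdfs namenode -format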
Start HDFS:
start-dfs.sh
Start YARN:
start-yarn.sh
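To check that the daemons came up, run `jps` (shipped with the JDK) on each machine. The expected sets here are an assumption derived from the configuration above: the master should show NameNode, SecondaryNameNode and ResourceManager, and each slave DataNode and NodeManager. `check_daemons` is a hypothetical helper (bash assumed) that reads jps output from stdin.

```shell
# Hypothetical helper: verify that jps output (on stdin) lists every daemon
# named in the arguments; prints the missing ones to stderr.
check_daemons() {
  jps_out=$(cat)
  missing=0
  for daemon in "$@"; do
    # jps prints "<pid> <MainClass>"; -w stops NameNode matching SecondaryNameNode
    if ! printf '%s\n' "$jps_out" | grep -qw -- "$daemon"; then
      echo "missing: $daemon" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

On the master: `jps | check_daemons NameNode SecondaryNameNode ResourceManager`; on each slave: `jps | check_daemons DataNode NodeManager`.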
You can also open 192.168.99.101:50070 in the host's browser to view the NameNode web UI. I won't include screenshots here; my browser has too many tabs open.