
Hadoop 2.7.0 configuration in an Ubuntu 14.04 environment + remote eclipse and hdfs calls under windows

This tutorial deploys a proper hadoop environment on three computers. Instead of creating separate user groups, everything is deployed directly under the current user. It is summarized as follows:

1. The three nodes are 192.168.12.32 Master, 192.168.12.24 Slaver2 and 192.168.12.4 Slaver1, and the user name should be the same on all three hosts; mine is the hadoop user on every machine.

This tutorial uses the vim editor; if you don't have vim, install it with: sudo apt-get install vim

2. Modify the hosts file: switch to the root user with sudo -s and enter your password, then vim /etc/hosts and add the ip address and host name of each of the three computers, ip address first and host name after it:

192.168.12.32 Master

192.168.12.24 Slaver2

192.168.12.4 Slaver1

3. Modify the host name, also as the root user: vim /etc/hostname. Give each machine one of the names used above (Master, Slaver1 or Slaver2), one host name per machine; do not put all three names in the file.

4. The command to view the ip address in Ubuntu is: ifconfig

5. Install ssh. Before installing ssh, update Ubuntu's package sources: sudo apt-get update

Install ssh: sudo apt-get install ssh

Check whether the sshd service is running: ps -e | grep ssh. If a line like "1019 ?  00:00:00 sshd" appears, ssh is already running. If it is not running, enter /etc/init.d/ssh start or sudo start ssh to start it.

6. Set up passwordless ssh login. This should not be done as the root user, otherwise it will not be set up for the current user.

6.1. ssh-keygen -t rsa, then just keep pressing enter.

6.2. after the above operation completes there will be a hidden .ssh folder in the user's home directory, which you can see with: ls -al

The files inside are id_rsa and id_rsa.pub.

6.3. Enter the .ssh directory: cd .ssh, then execute: cat id_rsa.pub >> authorized_keys (this file does not exist in the .ssh directory at first; it is generated automatically after the command runs).

6.4. Log in directly: ssh localhost. If all is well you will log in without being asked for a password. After logging in, a known_hosts file is generated under .ssh.

7. All three hosts need to perform the operations in step 6.

8. Set up passwordless login between nodes: configure Master's passwordless login to Slaver1 and Slaver2 by copying Master's id_rsa.pub to the Slaver1 and Slaver2 nodes. Execute the following commands in the .ssh directory on Slaver1 and Slaver2, respectively:

scp hadoop@Master:~/.ssh/id_rsa.pub ./master_rsa.pub

cat master_rsa.pub >> authorized_keys

9. Perform the corresponding operations for Slaver1 and Slaver2 as well, so that passwordless login works between all three machines. In hadoop the master and the slave nodes need to communicate: the namenode has to manage the datanodes, the datanodes report their status back to the namenode, and the datanodes also need to talk to each other when replicating and storing data. Hadoop stores the data in multiple copies (three by default); during replication a datanode fetches the data from the previous datanode and stores it locally, which is why communication between all three nodes is required.

10. After completing the above, you can log in without a password: ssh Slaver1, ssh Slaver2, ssh Master. Log out again with exit.

11. If you are still asked for a password in steps 6 and 6.4, the passwordless login setup failed. The fix is to uninstall ssh, reinstall it, and start again from step 6. Stop the sshd service before uninstalling: sudo stop ssh, then execute:

sudo apt-get autoremove openssh-server

sudo apt-get autoremove openssh-client

Also delete the .ssh directory: sudo rm -rf ~/.ssh. It is best to reboot as well, and then start over from step 5.

12. Install the java jdk. Create a Java directory under the current user's home: sudo mkdir Java. This is not done as the root user; the jdk is installed under the current user.

12.1 extract it: tar zxvf <jdk archive name> -C ~/Java (note the uppercase C)

12.2 rename the extracted directory inside Java to jdk8: sudo mv <jdk directory name> jdk8

12.3 configure the java path: vim ~/.bashrc and add the following at the end of the file:

export JAVA_HOME=/home/hadoop/Java/jdk8
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

12.4 after exiting, execute: source ~/.bashrc to make it take effect immediately

12.5 check whether the jdk is installed successfully: java -version. If information such as the java version appears, the installation succeeded.

13. Install hadoop: install and configure hadoop on the host Master first, then copy it to Slaver1 and Slaver2. This is not the optimal way to install, but it is the most convenient for beginners; as you go deeper into hadoop you can tune it for different machines and workloads.

13.1 extract hadoop into the user's current directory: tar -zxvf ~/Downloads/<hadoop archive name>. If no target path is given after the archive, an unpacked hadoop directory is created in the user's current directory.

13.2 rename the extracted directory to hadoop: mv <extracted hadoop directory name> hadoop

13.3 configure /etc/profile: vim /etc/profile and add:

export HADOOP_INSTALL=/home/hadoop/hadoop
export PATH=$PATH:${HADOOP_INSTALL}/bin

13.4 go into the hadoop directory: cd hadoop, and run: source /etc/profile to make the file you just configured take effect

14. Configuring the Hadoop configuration files. hadoop1.x and hadoop2.x use different resource management: yarn was added in hadoop2.x to manage the cluster's resources, and the file layouts of hadoop1.x and hadoop2.x also differ quite a bit. Hadoop 2.7.0 is used in this tutorial. One more remark: many people suggest that beginners start with a version like hadoop 0.2.0, but I don't think that is necessary. Hadoop keeps evolving, and frankly the reason we learn hadoop is to use it at work later. Companies have generally upgraded their clusters to a stable hadoop2.x release, and hadoop 0.2.0 is very different from current hadoop; it helps a little with understanding, but there is no need to start from it. You can learn hadoop2.x directly without too much extra effort, and hadoop2.x books also introduce the earlier versions, so there is plenty of material.

14.1 go to the hadoop configuration directory (this is the layout for the hadoop 2.7.0 version):

cd ~/hadoop/etc/hadoop

Using: ls, you can see a lot of configuration information. First, let's configure core-site.xml.

14.2 configure hadoop's core-site.xml: vim core-site.xml

Add the following inside the <configuration> element:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.12.32:9000</value>
</property>

<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hadoop/hadoop/temp</value>
</property>

Be sure to use the host's ip in fs.defaultFS rather than the hostname; it is needed in the eclipse configuration and will come up again when linking eclipse to hadoop, so configure it this way now.

14.3 configure hadoop's hdfs-site.xml: vim hdfs-site.xml

<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>192.168.12.32:50090</value>
  <final>true</final>
</property>

<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/hadoop/hadoop/temp/dfs/name</value>
  <final>true</final>
</property>

<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/hadoop/hadoop/temp/dfs/data</value>
  <final>true</final>
</property>

<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>

In the paths above, home is the top-level directory, the first hadoop is the hadoop user I created, and the second hadoop is the hadoop directory created when installing hadoop. This is where the temporary data is kept; after the file system is initialized you will see a temp directory under the hadoop installation path. You can pick this path however you like, as long as you can find it again later. dfs.replication is 2 because I have three machines (two datanodes); with only two machines, set it to 1. These explanations must not be copied into the xml file.

14.4 configure mapred-site.xml. If this file does not exist at the beginning, you need to make a copy from the template first:

cp mapred-site.xml.template mapred-site.xml

If the file already exists, you can skip this step.

Then configure mapred-site.xml: vim mapred-site.xml

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>

14.5 configure yarn-site.xml: vim yarn-site.xml

<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>Master</value>
</property>

<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>

(Master here is the hostname of the master node.)

14.6 configure the slaves file: vim slaves

Slaver1

Slaver2

15. Configure the hadoop-env.sh file. It contains a line that exports the java directory but is commented out with #; remove the # and point it to the jdk8 directory you just installed: export JAVA_HOME=/home/hadoop/Java/jdk8

16. Copy hadoop to Slaver1 and Slaver2:

scp -r ./hadoop Slaver1:~

scp -r ./hadoop Slaver2:~

17. Because of the configuration in step 13.3, hadoop's bin directory is on the PATH, so you can use the hadoop command from any directory:

17.1 format the file system: hadoop namenode -format

Many lines of output will be displayed. If you see words such as "successfully formatted", the formatting succeeded.

17.2 start hdfs: start-dfs.sh

17.3 start yarn: start-yarn.sh

17.4 check whether the startup succeeded with: jps. If four lines appear on the Master:

5399 Jps

5121 ResourceManager

3975 SecondaryNameNode

4752 NameNode

It indicates that the startup is successful; the order of the lines and the numbers on the left may differ.

If three lines appear on the Slaver nodes:

4645 Jps

4418 DataNode

4531 NodeManager

It indicates success. If the datanode cannot be started on a Slaver, it may be because a pseudo-distributed hadoop cluster was configured on it before. Try deleting the temp folder configured above and reformatting: hadoop namenode -format, then start again and it should work.

At this point, we can use hadoop's shell command to operate:

hadoop dfs -ls

This may give an error message like: ls: `.': No such file or directory

In that case try: hadoop dfs -ls / and there will be no error.

Create the directories:

hadoop dfs -mkdir /user

hadoop dfs -mkdir /user/hadoop

hadoop dfs -mkdir /user/hadoop/input

hadoop dfs -mkdir /user/hadoop/output

When you're done you'll have a three-level directory tree. Then create a text file to run the wordcount program on:

vim a.txt and write a few words in it:

Hadoop

Hadoop

Aaa

Aaa

Spark

Spark

Then upload it to hdfs: hadoop dfs -copyFromLocal a.txt /user/hadoop/input/a.txt

Run the wordcount program:

hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount /user/hadoop/input/a.txt /user/hadoop/output/wordcount

Look at the result of the run: hadoop dfs -cat /user/hadoop/output/wordcount/part-r-00000, and you can see the word counts.
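For reference, assuming wordcount was run on the sample a.txt above: each of the three words appears twice, so part-r-00000 should contain three lines, each consisting of a word, a tab and the count 2.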

18. Static ip settings: there are many static ip guides on the Internet. I followed one and the network icon in the upper right corner of the Master host disappeared, and with some other guides the machine could not get online at all; it took me an afternoon to figure it out. To set up a permanent static ip, three files need to be configured, in three steps.

18.1 set the static ip: sudo vim /etc/network/interfaces. The file will contain the following:

auto lo

iface lo inet loopback

Comment the second line out: # iface lo inet loopback

Then add:

auto eth0

iface eth0 inet static

address 192.168.12.32

netmask 255.255.255.0

gateway 192.168.12.1

network 192.168.0.0

broadcast 192.168.11.255

(address is the machine's ip address, netmask the subnet mask, gateway the gateway and broadcast the broadcast address.)

Then save and exit.

18.2 configure the DNS server. There are many ways described on the Internet, but after restarting Ubuntu the configuration reverts to the default, which makes it impossible to access the Internet normally, so we need to set the DNS server permanently:

sudo vim /etc/resolvconf/resolv.conf.d/base

Add: nameserver 202.96.128.86 (this is my DNS server address). If you don't know your DNS server, you can look it up in the network settings on windows; there are plenty of guides online, so I won't go into it here.

Then save and exit.

18.3 configure the NetworkManager.conf file

sudo vim /etc/NetworkManager/NetworkManager.conf

Find the managed=false line inside. If it is false, change it to true; otherwise save and exit without modification.

18.4 Finally, restart the machine. If you still cannot access the Internet after booting, check whether the managed setting in the file from 18.3 is still false; change it and restart again, and the machine should be able to get online.

19. The cluster for the formal environment has now been built successfully. We usually do not develop on Master itself; we develop and test in eclipse under windows. So the next step is to set up the hadoop development environment under windows7 + eclipse.

19.1. Download the hadoop eclipse plugin. Since hadoop2.x, Apache no longer ships a prebuilt eclipse plugin, only the source code, so you would normally have to build it yourself; I downloaded a prebuilt plugin from the Internet.

With hadoop 2.7.0, the plugin built for hadoop 2.6.0 works: my hadoop version is 2.7.0 and my eclipse plugin is the hadoop 2.6.0 one.

Download component address:

19.2. Choosing an eclipse version: I use spring-tool (STS), which is built on eclipse. I used it during my internship, and from what I have seen online, people who use plain eclipse have to pick the right version carefully and still run into many errors.

Download address: http://spring.io/tools. Download it and simply unpack it to any drive; it can be used directly.

Enter the sts-3.7.1.RELEASE directory and open it via STS.exe.

19.3. Put the hadoop 2.6.0 eclipse plugin you just downloaded into the plugins directory of the STS installation, restart STS, and then click Window -> Preferences; the plugin's Hadoop Map/Reduce settings will appear there.

Then click Window -> Perspective -> Open Perspective -> Other and open the Map/Reduce perspective.

At the bottom of the window the Map/Reduce Locations view will appear.

Right-click in that Locations view and click New Hadoop Location; a configuration dialog will pop up:

Location name: fill in any name you like; mine is myhadoop.

Host: 192.168.12.32, the ip address of the Master where we installed the cluster. Do not fill in Master directly: using the hostname causes a "listing folder content" error when connecting to hdfs. This is also why we recommended using the ip address instead of Master when configuring core-site.xml earlier.

Map/Reduce Master Port: 9001.

DFS Master, Port: 9000, matching the fs.defaultFS port we configured in core-site.xml.

Then save and exit.

19.4. Then expand DFS Locations; an entry will show up with the same name we just set (myhadoop). Click myhadoop and the hdfs directory tree will appear:

These are the hadoop directories I created. At this point the installation of hadoop and remote hadoop development under windows have both been covered in outline. To really develop for hadoop we still need to set up maven in eclipse; I will write about that later.
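Even before maven is set up, the remote hdfs call itself can be tried from windows with a small Java class that uses hadoop's FileSystem API. This is only a minimal sketch: the class name HdfsListDemo is made up for this example, the hadoop client jars (for example those under hadoop/share/hadoop/common and hadoop/share/hadoop/hdfs) are assumed to be on the project's build path, and the address must match the fs.defaultFS value from core-site.xml.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsListDemo {
    public static void main(String[] args) throws Exception {
        // Connect to the namenode configured in core-site.xml, as the remote hadoop user,
        // so the windows user name does not trip hdfs permission checks.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://192.168.12.32:9000"), conf, "hadoop");

        // List the directory created earlier with: hadoop dfs -mkdir /user/hadoop/input
        for (FileStatus status : fs.listStatus(new Path("/user/hadoop/input"))) {
            System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
        }

        fs.close();
    }
}

Run it as a plain Java application in STS; it should print the a.txt file uploaded earlier, the same content that shows up under DFS Locations.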
