This tutorial describes a production-style hadoop deployment across three computers. Instead of creating a dedicated user group, everything is deployed directly under the current user. The steps are summarized as follows:
1. The three nodes and their hostnames are 192.168.12.32 Master, 192.168.12.24 Slaver2, and 192.168.12.4 Slaver1. The user name on all three hosts should be the same; mine is the hadoop user on every machine.
This tutorial uses the vim editor; if you do not have vim, install it with: sudo apt-get install vim
2. Modify the hosts file. Switch to the root user: sudo -s, enter the password, then run vim /etc/hosts and add the hostnames and IP addresses of the three computers. The IP address comes first and the hostname comes after:
192.168.12.32 Master
192.168.12.24 Slaver2
192.168.12.4 Slaver1
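Once the three entries are saved, it is worth a quick check that each hostname resolves before moving on; a minimal verification, assuming the hostnames above, is:
ping -c 2 Master
ping -c 2 Slaver1
ping -c 2 Slaver2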
3. Modify the hostname of each machine, also as the root user: vim /etc/hostname. Put the corresponding name (Master, Slaver1, or Slaver2) in this file, one hostname per machine. Do not put all three names in it.
4. The command to view the ip address in Ubuntu is: ifconfig
5. Install ssh. Before installing, update Ubuntu's package sources: sudo apt-get update
Install ssh: sudo apt-get install ssh
Check whether the sshd service has started: ps -e | grep ssh. If a line like "1019 ? 00:00:00 sshd" appears, ssh is already running. If it is not running, start it with /etc/init.d/ssh start or sudo start ssh.
6. Set up passwordless ssh login. This must be done as the current user, not as root; otherwise the keys will not be set up for the current user.
6.1. Run ssh-keygen -t rsa and keep pressing Enter to accept the defaults.
6.2. After this completes, there will be a hidden .ssh folder under the user's home directory, which can be seen with: ls -al
The files inside are id_rsa and id_rsa.pub.
6.3. Enter the .ssh directory: cd .ssh, then execute: cat id_rsa.pub >> authorized_keys (this file does not exist in .ssh at first; the command creates it automatically).
6.4. Still inside .ssh, log in directly: ssh localhost. If you are logged in without being asked for a password, the setup succeeded. After logging in, a known_hosts file will be created under .ssh.
7. All three hosts need to perform step 6.
8. Set up passwordless login between nodes. To let Master log in to Slaver1 and Slaver2 without a password, copy Master's id_rsa.pub to the Slaver1 and Slaver2 nodes. In the .ssh directory on Slaver1 and on Slaver2, execute:
scp hadoop@Master:~/.ssh/id_rsa.pub ./master_rsa.pub
cat master_rsa.pub >> authorized_keys
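As an aside, the same result can usually be achieved in one step with ssh-copy-id, which appends the local public key to the remote authorized_keys. A minimal sketch, run from Master and assuming the hadoop user and hostnames above:
ssh-copy-id hadoop@Slaver1
ssh-copy-id hadoop@Slaver2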
9. After completing the above, perform the same operations on Slaver1 and Slaver2, so that all three machines can log in to one another without passwords. This matters because in hadoop the master and slave nodes must communicate: the namenode manages the datanodes, the datanodes report their status back to the namenode, and the datanodes also talk to each other during data replication and storage. Hadoop stores the data in three copies; during replication a datanode fetches the data from the previous datanode and stores it locally, so all three nodes need to be able to reach one another.
10. After completing the above, you can log in without a password: ssh Slaver1, ssh Slaver2, ssh Master. The current session can be ended with exit.
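To confirm in one pass that every hop works without prompting for a password, a small loop like the following can be run on each machine (assuming the three hostnames above); each line should print the remote hostname with no password prompt:
for h in Master Slaver1 Slaver2; do ssh "$h" hostname; done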
11. If you are still asked for a password in step 6.4, the passwordless login setup failed. The fix is to uninstall ssh, reinstall it, and redo the steps from 6. Before uninstalling, stop the sshd service: sudo stop ssh, then execute:
sudo apt-get autoremove openssh-server
sudo apt-get autoremove openssh-client
Also delete the .ssh directory: sudo rm -rf ~/.ssh. It is best to reboot, then start over from step 5.
12. Install the Java JDK. Create a directory under the current user's home: sudo mkdir Java. The installation is done under the current user, not under root.
12.1 Extract the archive: tar zxvf <jdk archive name> -C (uppercase C) ~/Java
12.2 Rename the extracted directory inside Java to jdk8: sudo mv <jdk directory name> jdk8
12.3 Configure the Java path: vim ~/.bashrc and add the following at the end of the file:
export JAVA_HOME=/home/hadoop/Java/jdk8
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
12.4 After saving and exiting, execute: source ~/.bashrc so the changes take effect immediately.
12.5 Check whether the JDK is installed successfully: java -version. If the Java version information appears, the installation succeeded.
13. Install hadoop. Install and configure hadoop on the Master host, then copy the hadoop directory to Slaver1 and Slaver2. This is not the optimal way to install, but it is the most convenient for beginners; as you get deeper into hadoop you can tune each machine according to its role and performance.
13.1 Extract hadoop into the user's home directory: tar -zxvf ~/Downloads/<hadoop archive>. If no target path is given, an unpacked hadoop directory is created in the current directory.
13.2 Rename it: mv <hadoop directory name> hadoop
13.3 Configure the profile file: vim /etc/profile and add:
export HADOOP_INSTALL=/home/hadoop/hadoop
export PATH=$PATH:${HADOOP_INSTALL}/bin
13.4 Go to the hadoop directory: cd hadoop, then run: source /etc/profile so the file you just configured takes effect.
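A quick way to confirm that the hadoop binaries are now on the PATH is to print the version (the exact output depends on the release):
hadoop version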
14. Configuration of the hadoop configuration files. hadoop 1.x and hadoop 2.x use different resource management: hadoop 2.x adds yarn to manage cluster resources, and the directory layouts of 1.x and 2.x also differ considerably. Hadoop 2.7.0 is used in this tutorial. One more remark: many people suggest that beginners start with an old release such as hadoop 0.20. I do not think that is necessary. Hadoop keeps evolving, and the point of learning it is to use it at work; companies have generally upgraded their clusters to a stable 2.x release, and the old releases differ greatly from the current versions. Studying them helps a little with understanding, but there is no need to start from scratch; you can learn hadoop 2.x directly without much extra effort, and books on hadoop 2.x also cover the earlier versions and have more material available.
14.1 Go to the hadoop configuration directory (this is the layout for the hadoop 2.7.0 release):
cd ~/hadoop/etc/hadoop
Running ls shows the many configuration files. First, configure core-site.xml.
14.2 Configure hadoop's core-site.xml: vim core-site.xml
Add the following between the <configuration> tags at the end of the file:
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.12.32:9000</value> <!-- be sure to use the host's IP; the eclipse configuration needs it, as discussed later when linking eclipse to hadoop -->
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/hadoop/temp</value>
</property>
When editing the actual file, be sure to remove the explanatory comment above.
14.3 Configure hadoop's hdfs-site.xml: vim hdfs-site.xml
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>192.168.12.32:50090</value>
<final>true</final>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hadoop/temp/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hadoop/temp/dfs/data</value>
<final>true</final>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
About the paths: home is the parent of the home directories, the first hadoop is the hadoop user I created, and the second hadoop is the directory name chosen when installing hadoop. This is where the temporary files go; after initializing the file system you will see a temp directory under the hadoop installation path. You can set this path however you like, as long as you can find it again later.
About dfs.replication: because I have three machines, I can set 2 copies; with only two machines it should be set to 1.
When editing the actual files, be sure to remove any such explanatory comments.
14.4 Configure mapred-site.xml. If this file does not exist yet, copy it from the template first:
cp mapred-site.xml.template mapred-site.xml
If the file already exists, skip that step.
Then configure it: vim mapred-site.xml
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
14.5 Configure yarn-site.xml: vim yarn-site.xml
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value> <!-- this is the hostname -->
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
14.6 Configure the slaves file: vim slaves, and list the slave hostnames:
Slaver1
Slaver2
15. Configure the hadoop-env.sh file. It contains a line exporting the Java directory, commented out with #. Remove the #, and set it to the jdk8 directory you just installed: export JAVA_HOME=/home/hadoop/Java/jdk8
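If you prefer to make this edit non-interactively, a sed one-liner along the following lines can work; it assumes hadoop-env.sh still contains its stock export JAVA_HOME=... line (commented out or not), so check the result with grep afterwards:
sed -i 's|^#\? *export JAVA_HOME=.*|export JAVA_HOME=/home/hadoop/Java/jdk8|' ~/hadoop/etc/hadoop/hadoop-env.sh
grep JAVA_HOME ~/hadoop/etc/hadoop/hadoop-env.sh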
16. Copy hadoop to Slaver1 and Slaver2 (run from the home directory on Master):
scp -r ./hadoop Slaver1:~
scp -r ./hadoop Slaver2:~
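A quick sanity check that the copy landed where expected (assuming the same hadoop user exists on the slaves):
ssh Slaver1 'ls ~/hadoop/etc/hadoop | head'
ssh Slaver2 'ls ~/hadoop/etc/hadoop | head'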
17. Because of the configuration in step 13.3, hadoop's bin directory is on the shell PATH, so the hadoop command can be used from the current directory:
17.1 Format the file system: hadoop namenode -format
Many lines of output will scroll by; if you see words such as "successfully formatted", the formatting succeeded.
17.2 Start hdfs: start-dfs.sh
17.3 Start yarn: start-yarn.sh
17.4 Check whether startup succeeded with jps. If four lines like the following appear on Master:
5399 Jps
5121 ResourceManager
3975 SecondaryNameNode
4752 NameNode
then startup succeeded; the display order and the numbers on the left (the process IDs) may differ.
If three lines like the following appear on each Slaver:
4645 Jps
4418 DataNode
4531 NodeManager
then the slaves started successfully. If the datanode cannot start on a Slaver, it may be because a pseudo-distributed hadoop setup was configured on it before. Try deleting the temp folder configured above and reformatting: hadoop namenode -format, then start again, and the cluster should be usable.
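A minimal sketch of that recovery sequence, assuming the temp path configured in hdfs-site.xml above and that it needs to be cleared on every node:
stop-yarn.sh
stop-dfs.sh
for h in Master Slaver1 Slaver2; do ssh "$h" rm -rf /home/hadoop/hadoop/temp; done
hadoop namenode -format
start-dfs.sh
start-yarn.sh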
At this point we can use hadoop's shell commands:
hadoop dfs -ls
This may give an error like: ls: `.': No such file or directory
In that case try: hadoop dfs -ls / and there will be no error.
Create a directory tree:
hadoop dfs -mkdir /user
hadoop dfs -mkdir /user/hadoop
hadoop dfs -mkdir /user/hadoop/input
hadoop dfs -mkdir /user/hadoop/output
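On hadoop 2.x the same tree can usually be created in one go with the -p flag (hdfs dfs is the newer form of the command); a shortcut worth trying:
hdfs dfs -mkdir -p /user/hadoop/input /user/hadoop/output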
When you are done you will have a three-level directory. Then create a text file to run the wordcount program on:
vim a.txt, and write something simple in it, for example:
hadoop
hadoop
aaa
aaa
spark
spark
Then upload it to hdfs: hadoop dfs -copyFromLocal a.txt /user/hadoop/input/a.txt
Run the wordcount program:
hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount /user/hadoop/input/a.txt /user/hadoop/output/wordcount/
Look at the result: hadoop dfs -cat /user/hadoop/output/wordcount/part-r-00000, and you will see the word counts.
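To see everything the job wrote (a _SUCCESS marker plus the part files), the output directory can also be listed, assuming the output path used above:
hadoop dfs -ls /user/hadoop/output/wordcount/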
18. Static IP settings. There are many static IP guides online; I followed one, and the result was that the network icon in the upper right corner of the Master host disappeared and some machines could no longer get online. After an afternoon I finally worked it out. Setting a permanent static IP takes three steps, configuring three files.
18.1 Set the static IP: sudo vim /etc/network/interfaces. It will contain the following:
auto lo
iface lo inet loopback
Comment the second line out: # iface lo inet loopback
Then add:
auto eth0
iface eth0 inet static
address 192.168.12.32 (the IP address)
netmask 255.255.255.0 (the subnet mask)
gateway 192.168.12.1 (the gateway)
network 192.168.12.0
broadcast 192.168.12.255 (the broadcast address)
Then save and exit.
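To apply the new interface settings without a full reboot, the interface can usually be bounced; command availability varies by Ubuntu release, so treat this as a sketch:
sudo ifdown eth0 && sudo ifup eth0
# or, on some releases:
sudo /etc/init.d/networking restart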
18.2 Configure the DNS server. There are many guides online, but with most of them the setting reverts to the default after Ubuntu restarts, which breaks normal Internet access, so we need a permanent DNS setting:
sudo vim /etc/resolvconf/resolv.conf.d/base
Add: nameserver 202.96.128.86 (this is my DNS address). If you do not know your DNS server, you can look it up from the network settings in Windows; there are many guides for that online, so I will not go into it here.
Then save and exit.
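For the new base entry to be merged into /etc/resolv.conf without rebooting, resolvconf can regenerate the file (assuming the resolvconf package is what manages DNS on this system):
sudo resolvconf -u
cat /etc/resolv.conf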
18.3 Configure the NetworkManager.conf file:
sudo vim /etc/NetworkManager/NetworkManager.conf
Look at the managed= setting inside. If it is false, change it to true; otherwise save and exit without modifying anything.
18.4 Finally, restart the machine. If you cannot get online after it boots, check whether the file in 18.3 is still set to false; if so, change it and restart again, and the network should work.
19. The above covers the production-style cluster, which is now built successfully. Normally we do not develop directly on Master; we develop and test in eclipse under Windows. So the next task is to set up the hadoop development environment under Windows 7 + eclipse.
19.1. Download the hadoop plugin for eclipse. After hadoop 2.x, Apache no longer ships the eclipse plugin, only its source code, so it has to be built yourself; I downloaded a pre-built eclipse hadoop plugin from the Internet.
If you are using hadoop 2.7.0, the plugin built for hadoop 2.6.0 works; my hadoop version is 2.7.0 and my eclipse plugin is the hadoop 2.6.0 one.
Download address for the plugin:
19.2. For the eclipse version, I use spring-tool (Spring Tool Suite), which bundles eclipse. I chose it because I used this version during my internship, and I saw online that many people who use plain eclipse directly run into version-matching problems and errors.
Download address: http://spring.io/tools. Click the download link on the page; after downloading, simply decompress it to a disk and it is ready to use.
Enter the sts-3.7.1.RELEASE directory and open it with STS.exe.
19.3. Put the hadoop 2.6.0 eclipse plugin you just downloaded into the plugins directory, restart STS, and click Window -> Preferences, where the Hadoop Map/Reduce settings will appear.
Then click Window -> Perspective -> Open Perspective -> Other, and the Map/Reduce perspective will be listed in the upper left corner.
Directly below the editor, the Map/Reduce Locations view will appear.
Right-click inside the Locations view; three options appear. Click New Hadoop Location, and a configuration dialog pops up:
Location name: anything you like; mine is myhadoop.
Host: 192.168.12.32, the IP address of the Master where we installed the cluster. Do not fill in Master directly: using the hostname leads to a "listing folder content" error when linking to hdfs, which is exactly why we recommended putting the IP address rather than Master in core-site.xml earlier.
Port: 9001.
In DFS Master, Port: 9000, matching the fs.defaultFS we set in core-site.xml.
Then save and exit.
19.4. Then click DFS Locations, and a location with the name we just set will show up.
Click myhadoop, and the hdfs directory tree appears; this is the hadoop directory I created. At this point the installation of the whole hadoop cluster and remote hadoop development under Windows have been covered in their basics. To really develop for hadoop we still need to learn how to set up maven in eclipse; I will write about that later.