This article mainly explains how to build a CentOS-based Hadoop distributed environment. The content is simple, clear, and easy to learn; follow along with the steps below.
Here are a few things you need to know when building a Hadoop environment:
1. Hadoop runs on Linux, so you need to install a Linux operating system.
2. You need a cluster to run Hadoop, for example several Linux machines that can reach each other on a local network.
3. For the cluster nodes to access each other, passwordless SSH login is required.
4. Hadoop runs on the JVM, so you need to install a Java JDK and configure JAVA_HOME.
5. Hadoop's components are configured through XML files. After downloading Hadoop from the official website and unpacking it, modify the corresponding configuration files in the etc/hadoop directory.
"If you want to do a good job, you must first sharpen your tools." Here are the software and tools used to build the Hadoop environment:
1. VirtualBox -- several machines need to be simulated and physical hardware is limited, so a few virtual machines are created in VirtualBox.
2. CentOS -- download the CentOS 7 ISO image, load it into VirtualBox, then install and run it.
3. An SSH client -- software for accessing the Linux machines remotely over SSH.
4. WinSCP -- for transferring files between Windows and Linux.
5. JDK for Linux -- download it from Oracle's official website, unpack it, and configure it.
6. Hadoop 2.7.1 -- can be downloaded from the Apache website.
All right, let's explain it in three steps.
Linux environment preparation
Configure ip
To allow communication between the host machine and the virtual machines, and among the virtual machines themselves, set the network mode of each CentOS VM in VirtualBox to host-only and assign the IP addresses manually. Note that the gateway of each virtual machine is the IP address of the host-only network adapter on the host. After configuring the IPs, restart the network service for the configuration to take effect. Three Linux virtual machines are set up here.
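As a sketch (assuming the host-only adapter inside the VM is named enp0s3 and that the VirtualBox host-only network uses its default 192.168.56.1 host address; adjust the interface name to your system), the static IP of hadoop01 could be set like this:
cat > /etc/sysconfig/network-scripts/ifcfg-enp0s3 <<'EOF'
TYPE=Ethernet
BOOTPROTO=static
DEVICE=enp0s3
NAME=enp0s3
ONBOOT=yes
IPADDR=192.168.56.101
NETMASK=255.255.255.0
GATEWAY=192.168.56.1
EOF
systemctl restart network    # restart the network service so the new address takes effect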
Configure the host name
Set the host name of 192.168.56.101 to hadoop01, and configure the IPs and host names of the whole cluster in the hosts file. The other two hosts are configured in the same way.
[root@hadoop01 ~]# cat /etc/sysconfig/network
# created by anaconda
NETWORKING=yes
HOSTNAME=hadoop01
[root@hadoop01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.101 hadoop01
192.168.56.102 hadoop02
192.168.56.103 hadoop03
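On CentOS 7 the host name can also be set directly with hostnamectl, which persists it across reboots:
hostnamectl set-hostname hadoop01    # repeat on the other nodes with hadoop02 / hadoop03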
Permanently turn off the firewall
service iptables stop (notes: 1. the firewall will start again the next time the machine reboots, so you need commands that shut it down permanently; 2. since CentOS 7 is used here, the commands to turn off the firewall are as follows)
systemctl stop firewalld.service       # stop the firewall
systemctl disable firewalld.service    # disable the firewall at boot
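A quick check that the change stuck (assuming firewalld is the only firewall in use):
firewall-cmd --state                      # prints "not running" once firewalld is stopped
systemctl is-enabled firewalld.service    # prints "disabled" once it no longer starts at boot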
Shut down the selinux protection system
Change SELINUX to disabled, then reboot the machine for the configuration to take effect.
[root@hadoop02 ~]# cat /etc/sysconfig/selinux
# this file controls the state of selinux on the system.
# SELINUX= can take one of these three values:
#   enforcing - selinux security policy is enforced.
#   permissive - selinux prints warnings instead of enforcing.
#   disabled - no selinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#   targeted - targeted processes are protected,
#   minimum - modification of targeted policy. only selected processes are protected.
#   mls - multi level security protection.
SELINUXTYPE=targeted
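The same edit can be made non-interactively; note that /etc/sysconfig/selinux is a symlink to /etc/selinux/config, so this sketch edits the real file:
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config    # switch SELinux to disabled
reboot                                                                 # the setting only takes effect after a reboot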
Cluster ssh password-free login
First generate an SSH key pair:
ssh-keygen -t rsa
Copy the public key to all three machines:
ssh-copy-id 192.168.56.101
ssh-copy-id 192.168.56.102
ssh-copy-id 192.168.56.103
Now, if the hadoop01 machine wants to log in to hadoop02, just type ssh hadoop02 directly:
ssh hadoop02
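Doing this by hand on every node is repetitive; as a sketch (run on each node after its key is generated, using the host names defined in /etc/hosts above), the copy step can be looped:
for host in hadoop01 hadoop02 hadoop03; do
    ssh-copy-id root@$host    # push this node's public key to every node in the cluster
done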
Configure jdk
Here, create three folders under /home:
tools -- stores installation packages
softwares -- stores installed software
data -- stores data
Upload the downloaded Linux JDK to /home/tools on hadoop01 via WinSCP.
Extract the JDK into softwares:
tar -zxf jdk-7u76-linux-x64.tar.gz -C /home/softwares
The JDK home directory is now visible as /home/softwares/jdk.x.x.x; add this directory to the /etc/profile file and set JAVA_HOME there:
export JAVA_HOME=/home/softwares/jdk0_111
export PATH=$PATH:$JAVA_HOME/bin
Save the changes and execute source /etc/profile to make the configuration take effect.
Check whether the JDK was installed successfully:
java -version
You can then copy the files set up on the current node to the other nodes:
scp -r /home/* root@192.168.56.10x:/home
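As a small sketch (run from hadoop01, assuming root access and the addresses above), the copy to both remaining nodes can be looped:
for ip in 192.168.56.102 192.168.56.103; do
    scp -r /home/* root@$ip:/home    # copy tools, softwares and data to the other nodes
done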
Hadoop cluster installation
The planning of the cluster is as follows:
Node 101 serves as the HDFS NameNode and all three nodes run DataNodes; node 102 runs the YARN ResourceManager and all three nodes run NodeManagers; node 103 runs the SecondaryNameNode. The JobHistoryServer and the WebAppProxyServer are started on nodes 101 and 102 respectively.
Download hadoop-2.7.3
Put it in the /home/softwares folder. Since Hadoop requires a JDK to run, first configure JAVA_HOME in etc/hadoop/hadoop-env.sh.
(PS: I feel the JDK version I am using is a bit too new.)
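A minimal sketch of that change (using the JDK path as written earlier in this article; adjust it to your actual JDK directory):
cd /home/softwares/hadoop-3
# a later export overrides the default "export JAVA_HOME=${JAVA_HOME}" line shipped in hadoop-env.sh
echo 'export JAVA_HOME=/home/softwares/jdk0_111' >> etc/hadoop/hadoop-env.sh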
Next, modify the XML configuration file of each Hadoop component in turn.
Modify core-site.xml:
Specify the NameNode address
Set Hadoop's temporary data directory
Enable Hadoop's trash (deleted-file retention) mechanism
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://101:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/softwares/hadoop-3/data/tmp</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>
</configuration>
Modify hdfs-site.xml:
Set the number of replicas
Disable permission checking
Set the NameNode's HTTP address
Set the SecondaryNameNode's HTTP address
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>101:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>103:50090</value>
    </property>
</configuration>
Change the name of mapred-site.xml.template to mapred-site.xml
Specify yarn as the MapReduce framework, so that jobs are scheduled through YARN
Specify the JobHistory server address
Specify the JobHistory web port
Turn on uber mode, an optimization for small MapReduce jobs
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>101:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>101:19888</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
</configuration>
Modify yarn-site.xml
Specify mapreduce_shuffle as the NodeManager auxiliary service
Specify node 102 as the ResourceManager
Specify the web proxy address on node 102
Enable YARN log aggregation
Specify how long YARN logs are retained before deletion
Specify the NodeManager memory: 8 GB
Specify the NodeManager CPU: 8 cores
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>102</value>
    </property>
    <property>
        <name>yarn.web-proxy.address</name>
        <value>102:8888</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>8</value>
    </property>
</configuration>
Configure slaves
Specify the compute nodes, that is, the nodes running DataNode and NodeManager (a small sketch of writing this file follows the list):
192.168.56.101
192.168.56.102
192.168.56.103
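A sketch of writing that slaves file (the etc/hadoop path follows this article's directory layout):
cat > /home/softwares/hadoop-3/etc/hadoop/slaves <<'EOF'
192.168.56.101
192.168.56.102
192.168.56.103
EOF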
First perform the format on the NameNode node, that is, on node 101:
Go to the Hadoop home directory: cd /home/softwares/hadoop-3
Execute the hadoop script in the bin directory: bin/hadoop namenode -format
The execution is considered successful only when the output reports a successful format (the screenshot is omitted here).
After the above configuration is complete, copy the configured Hadoop directory to the other machines.
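A sketch of that copy (assuming root access and the same directory layout on every node):
for ip in 192.168.56.102 192.168.56.103; do
    scp -r /home/softwares/hadoop-3 root@$ip:/home/softwares/    # ship the configured Hadoop directory to the other nodes
done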
Hadoop environment testing
Go to the hadoop home directory to execute the corresponding script file
The jps command (Java Virtual Machine Process Status) displays the running Java processes.
Start HDFS on the NameNode machine, node 101:
[root@hadoop01 hadoop-3]# sbin/start-dfs.sh
Java HotSpot(TM) Client VM warning: you have loaded library /home/softwares/hadoop-3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c', or link it with '-z noexecstack'.
16-11-07 16:49:19 WARN util.NativeCodeLoader: unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /home/softwares/hadoop-3/logs/hadoop-root-namenode-hadoopout
102: starting datanode, logging to /home/softwares/hadoop-3/logs/hadoop-root-datanode-hadoopout
103: starting datanode, logging to /home/softwares/hadoop-3/logs/hadoop-root-datanode-hadoopout
101: starting datanode, logging to /home/softwares/hadoop-3/logs/hadoop-root-datanode-hadoopout
starting secondary namenodes [hadoop03]
hadoop03: starting secondarynamenode, logging to /home/softwares/hadoop-3/logs/hadoop-root-secondarynamenode-hadoopout
Executing jps on node 101 shows that the NameNode and a DataNode have started:
[root@hadoop01 hadoop-3]# jps
7826 Jps
7270 DataNode
7052 NameNode
Executing jps on nodes 102 and 103 shows that the DataNodes have started:
[root@hadoop02 bin]# jps
4260 DataNode
4488 Jps
[root@hadoop03 ~]# jps
6436 SecondaryNameNode
6750 Jps
6191 DataNode
Start yarn
Execute on node 102:
[root@hadoop02 hadoop-3]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/softwares/hadoop-3/logs/yarn-root-resourcemanager-hadoopout
101: starting nodemanager, logging to /home/softwares/hadoop-3/logs/yarn-root-nodemanager-hadoopout
103: starting nodemanager, logging to /home/softwares/hadoop-3/logs/yarn-root-nodemanager-hadoopout
102: starting nodemanager, logging to /home/softwares/hadoop-3/logs/yarn-root-nodemanager-hadoopout
Check each node with jps:
[root@hadoop02 hadoop-3]# jps
4641 ResourceManager
4260 DataNode
4765 NodeManager
5165 Jps
[root@hadoop01 hadoop-3]# jps
7270 DataNode
8375 Jps
7976 NodeManager
7052 NameNode
[root@hadoop03 ~]# jps
6915 NodeManager
6436 SecondaryNameNode
7287 Jps
6191 DataNode
Start the JobHistoryServer and the WebAppProxyServer daemons on their respective nodes:
[root@hadoop01 hadoop-3]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/softwares/hadoop-3/logs/mapred-root-historyserver-hadoopout
[root@hadoop01 hadoop-3]# jps
8624 Jps
7270 DataNode
7976 NodeManager
8553 JobHistoryServer
7052 NameNode
[root@hadoop02 hadoop-3]# sbin/yarn-daemon.sh start proxyserver
starting proxyserver, logging to /home/softwares/hadoop-3/logs/yarn-root-proxyserver-hadoopout
[root@hadoop02 hadoop-3]# jps
4641 ResourceManager
4260 DataNode
5367 WebAppProxyServer
5402 Jps
4765 NodeManager
Check the status of the nodes through a browser on the hadoop01 node.
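The web UIs can also be probed from the command line; port 50070 comes from dfs.namenode.http-address above, while 8088 is YARN's default ResourceManager web port, which this article does not set explicitly, so treat it as an assumption:
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.56.101:50070    # HDFS NameNode web UI
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.56.102:8088     # YARN ResourceManager web UI (default port, assumed)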
Upload a file to HDFS:
[root@hadoop01 hadoop-3]# bin/hdfs dfs -put /etc/profile /profile
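To confirm the upload, list the HDFS root directory:
bin/hdfs dfs -ls /    # /profile should appear in the listing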
Run the wordcount program
[root@hadoop01 hadoop-3]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-jar wordcount /profile /fll_out
Java HotSpot(TM) Client VM warning: you have loaded library /home/softwares/hadoop-3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c', or link it with '-z noexecstack'.
16-11-07 17:17:10 WARN util.NativeCodeLoader: unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16-11-07 17:17:12 INFO client.RMProxy: connecting to resourcemanager at /102:8032
16-11-07 17:17:18 INFO input.FileInputFormat: total input paths to process: 1
16-11-07 17:17:19 INFO mapreduce.JobSubmitter: number of splits: 1
16-11-07 17:17:19 INFO mapreduce.JobSubmitter: submitting tokens for job: job_1478509135878_0001
16-11-07 17:17:20 INFO impl.YarnClientImpl: submitted application application_1478509135878_0001
16-11-07 17:17:20 INFO mapreduce.Job: the url to track the job: http://102:8888/proxy/application_1478509135878_0001/
16-11-07 17:17:20 INFO mapreduce.Job: running job: job_1478509135878_0001
16-11-07 17:18:34 INFO mapreduce.Job: job job_1478509135878_0001 running in uber mode: true
16-11-07 17:18:35 INFO mapreduce.Job: map 0% reduce 0%
16-11-07 17:18:43 INFO mapreduce.Job: map 100% reduce 0%
16-11-07 17:18:50 INFO mapreduce.Job: map 100% reduce 100%
16-11-07 17:18:55 INFO mapreduce.Job: job job_1478509135878_0001 completed successfully
16-11-07 17:18:59 INFO mapreduce.Job: counters: 52
    file system counters
        file: number of bytes read=4264
        file: number of bytes written=6412
        file: number of read operations=0
        file: number of large read operations=0
        file: number of write operations=0
        hdfs: number of bytes read=3940
        hdfs: number of bytes written=261673
        hdfs: number of read operations=35
        hdfs: number of large read operations=0
        hdfs: number of write operations=8
    job counters
        launched map tasks=1
        launched reduce tasks=1
        other local map tasks=1
        total time spent by all maps in occupied slots (ms)=8246
        total time spent by all reduces in occupied slots (ms)=7538
        total_launched_ubertasks=2
        num_uber_submaps=1
        num_uber_subreduces=1
        total time spent by all map tasks (ms)=8246
        total time spent by all reduce tasks (ms)=7538
        total vcore-milliseconds taken by all map tasks=8246
        total vcore-milliseconds taken by all reduce tasks=7538
        total megabyte-milliseconds taken by all map tasks=8443904
        total megabyte-milliseconds taken by all reduce tasks=7718912
    map-reduce framework
        map input records=78
        map output records=256
        map output bytes=2605
        map output materialized bytes=2116
        input split bytes=99
        combine input records=256
        combine output records=156
        reduce input groups=156
        reduce shuffle bytes=2116
        reduce input records=156
        reduce output records=156
        spilled records=312
        shuffled maps=1
        failed shuffles=0
        merged map outputs=1
        gc time elapsed (ms)=870
        cpu time spent (ms)=1970
        physical memory (bytes) snapshot=243326976
        virtual memory (bytes) snapshot=2666557440
        total committed heap usage (bytes)=256876544
    shuffle errors
        bad_id=0
        connection=0
        io_error=0
        wrong_length=0
        wrong_map=0
        wrong_reduce=0
    file input format counters
        bytes read=1829
    file output format counters
        bytes written=1487
Check the job's running status through the YARN web UI in the browser.
Check the final word frequency statistics.
View the HDFS file system in a browser.
[root@hadoop01 hadoop-3]# bin/hdfs dfs -cat /fll_out/part-r-00000
Java HotSpot(TM) Client VM warning: you have loaded library /home/softwares/hadoop-3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c', or link it with '-z noexecstack'.
16-11-07 17:29:17 WARN util.NativeCodeLoader: unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(the output then lists each word of /etc/profile with its frequency, one word and its count per line; the full listing is omitted here)
Thank you for reading. The above is the content of "how to build a CentOS-based Hadoop distributed environment". After studying this article, I believe you have a deeper understanding of how to build a CentOS-based Hadoop distributed environment.