This article mainly explains how to build a CentOS-based Hadoop distributed environment. The content is simple, clear, and easy to learn; follow along with the steps below.
Here are a few things you need to know when building a Hadoop environment:
1. Hadoop runs on Linux, so you need to install a Linux operating system.
2. You need a cluster to run Hadoop, for example several Linux machines that can reach each other on a local network.
3. For the cluster nodes to access each other, passwordless SSH login is required.
4. Hadoop runs on the JVM, so you need to install a Java JDK and configure JAVA_HOME.
5. Hadoop's components are configured through XML files. After downloading Hadoop from the official website and unpacking it, modify the corresponding configuration files in the etc/hadoop directory.
"If you want to do a good job, you must first sharpen your tools." Here are the software and tools used to build the Hadoop environment:
1. VirtualBox -- several machines need to be simulated and physical hardware is limited, so a few virtual machines are created in VirtualBox.
2. CentOS -- download the CentOS 7 ISO image, load it into VirtualBox, then install and run it.
3. An SSH client -- software for accessing the Linux machines remotely over SSH.
4. WinSCP -- for transferring files between Windows and Linux.
5. JDK for Linux -- download it from Oracle's official website, unpack it, and configure it.
6. Hadoop 2.7.1 -- can be downloaded from the Apache website.
All right, let's explain it in three steps.
Linux environment preparation
Configure ip
To allow communication between the host machine and the virtual machines, and among the virtual machines themselves, set the network mode of each CentOS VM in VirtualBox to host-only and assign the IP addresses manually. Note that the gateway of each virtual machine is the IP address of the host-only network adapter on the host. After configuring the IPs, restart the network service for the configuration to take effect. Three Linux virtual machines are set up here.
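As a sketch (assuming the host-only adapter inside the VM is named enp0s3 and that the VirtualBox host-only network uses its default 192.168.56.1 host address; adjust the interface name to your system), the static IP of hadoop01 could be set like this:
cat > /etc/sysconfig/network-scripts/ifcfg-enp0s3 <<'EOF'
TYPE=Ethernet
BOOTPROTO=static
DEVICE=enp0s3
NAME=enp0s3
ONBOOT=yes
IPADDR=192.168.56.101
NETMASK=255.255.255.0
GATEWAY=192.168.56.1
EOF
systemctl restart network    # restart the network service so the new address takes effect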
Configure the host name
Set the host name of 192.168.56.101 to hadoop01, and configure the IPs and host names of the whole cluster in the hosts file. The other two hosts are configured in the same way.
[root@hadoop01 ~]# cat /etc/sysconfig/network
# created by anaconda
NETWORKING=yes
HOSTNAME=hadoop01
[root@hadoop01 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.56.101 hadoop01
192.168.56.102 hadoop02
192.168.56.103 hadoop03
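On CentOS 7 the host name can also be set directly with hostnamectl, which persists it across reboots:
hostnamectl set-hostname hadoop01    # repeat on the other nodes with hadoop02 / hadoop03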
Permanently turn off the firewall
service iptables stop (notes: 1. the firewall will start again the next time the machine reboots, so you need commands that shut it down permanently; 2. since CentOS 7 is used here, the commands to turn off the firewall are as follows)
systemctl stop firewalld.service       # stop the firewall
systemctl disable firewalld.service    # disable the firewall at boot
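A quick check that the change stuck (assuming firewalld is the only firewall in use):
firewall-cmd --state                      # prints "not running" once firewalld is stopped
systemctl is-enabled firewalld.service    # prints "disabled" once it no longer starts at boot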
Shut down the selinux protection system
Change SELINUX to disabled, then reboot the machine for the configuration to take effect.
[root@hadoop02 ~]# cat /etc/sysconfig/selinux
# this file controls the state of selinux on the system.
# SELINUX= can take one of these three values:
#   enforcing - selinux security policy is enforced.
#   permissive - selinux prints warnings instead of enforcing.
#   disabled - no selinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of three values:
#   targeted - targeted processes are protected,
#   minimum - modification of targeted policy. only selected processes are protected.
#   mls - multi level security protection.
SELINUXTYPE=targeted
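The same edit can be made non-interactively; note that /etc/sysconfig/selinux is a symlink to /etc/selinux/config, so this sketch edits the real file:
sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config    # switch SELinux to disabled
reboot                                                                 # the setting only takes effect after a reboot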
Cluster ssh password-free login
First generate an SSH key pair:
ssh-keygen -t rsa
Copy the public key to all three machines:
ssh-copy-id 192.168.56.101
ssh-copy-id 192.168.56.102
ssh-copy-id 192.168.56.103
Now, if the hadoop01 machine wants to log in to hadoop02, just type ssh hadoop02 directly:
ssh hadoop02
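Doing this by hand on every node is repetitive; as a sketch (run on each node after its key is generated, using the host names defined in /etc/hosts above), the copy step can be looped:
for host in hadoop01 hadoop02 hadoop03; do
    ssh-copy-id root@$host    # push this node's public key to every node in the cluster
done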
Configure jdk
Here, create three folders under /home:
tools -- stores installation packages
softwares -- stores installed software
data -- stores data
Upload the downloaded Linux JDK to /home/tools on hadoop01 via WinSCP.
Extract the JDK into softwares:
tar -zxf jdk-7u76-linux-x64.tar.gz -C /home/softwares
The JDK home directory is now visible as /home/softwares/jdk.x.x.x; add this directory to the /etc/profile file and set JAVA_HOME there:
export JAVA_HOME=/home/softwares/jdk0_111
export PATH=$PATH:$JAVA_HOME/bin
Save the changes and execute source /etc/profile to make the configuration take effect.
Check whether the JDK was installed successfully:
java -version
You can then copy the files set up on the current node to the other nodes:
scp -r /home/* root@192.168.56.10x:/home
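As a small sketch (run from hadoop01, assuming root access and the addresses above), the copy to both remaining nodes can be looped:
for ip in 192.168.56.102 192.168.56.103; do
    scp -r /home/* root@$ip:/home    # copy tools, softwares and data to the other nodes
done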
Hadoop cluster installation
The planning of the cluster is as follows:
Node 101 serves as the HDFS NameNode and all three nodes run DataNodes; node 102 runs the YARN ResourceManager and all three nodes run NodeManagers; node 103 runs the SecondaryNameNode. The JobHistoryServer and the WebAppProxyServer are started on nodes 101 and 102 respectively.
Download hadoop-2.7.3
Put it in the /home/softwares folder. Since Hadoop requires a JDK to run, first configure JAVA_HOME in etc/hadoop/hadoop-env.sh.
(PS: I feel the JDK version I am using is a bit too new.)
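A minimal sketch of that change (using the JDK path as written earlier in this article; adjust it to your actual JDK directory):
cd /home/softwares/hadoop-3
# a later export overrides the default "export JAVA_HOME=${JAVA_HOME}" line shipped in hadoop-env.sh
echo 'export JAVA_HOME=/home/softwares/jdk0_111' >> etc/hadoop/hadoop-env.sh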
Next, modify the XML configuration file of each Hadoop component in turn.
Modify core-site.xml:
Specify the NameNode address
Set Hadoop's temporary data directory
Enable Hadoop's trash (deleted-file retention) mechanism
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://101:8020</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/softwares/hadoop-3/data/tmp</value>
    </property>
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>
</configuration>
Modify hdfs-site.xml:
Set the number of replicas
Disable permission checking
Set the NameNode's HTTP address
Set the SecondaryNameNode's HTTP address
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>101:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>103:50090</value>
    </property>
</configuration>
Change the name of mapred-site.xml.template to mapred-site.xml
Specify yarn as the MapReduce framework, so that jobs are scheduled through YARN
Specify the JobHistory server address
Specify the JobHistory web port
Turn on uber mode, an optimization for small MapReduce jobs
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>101:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>101:19888</value>
    </property>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
</configuration>
Modify yarn-site.xml
Specify mapreduce_shuffle as the NodeManager auxiliary service
Specify node 102 as the ResourceManager
Specify the web proxy address on node 102
Enable YARN log aggregation
Specify how long YARN logs are retained before deletion
Specify the NodeManager memory: 8 GB
Specify the NodeManager CPU: 8 cores
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>102</value>
    </property>
    <property>
        <name>yarn.web-proxy.address</name>
        <value>102:8888</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>8192</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>8</value>
    </property>
</configuration>
Configure slaves
Specify the compute nodes, that is, the nodes running DataNode and NodeManager (a small sketch of writing this file follows the list):
192.168.56.101
192.168.56.102
192.168.56.103
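A sketch of writing that slaves file (the etc/hadoop path follows this article's directory layout):
cat > /home/softwares/hadoop-3/etc/hadoop/slaves <<'EOF'
192.168.56.101
192.168.56.102
192.168.56.103
EOF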
First perform the format on the NameNode node, that is, on node 101:
Go to the Hadoop home directory: cd /home/softwares/hadoop-3
Execute the hadoop script in the bin directory: bin/hadoop namenode -format
The execution is considered successful only when the output reports a successful format (the screenshot is omitted here).
After the above configuration is complete, copy the configured Hadoop directory to the other machines.
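A sketch of that copy (assuming root access and the same directory layout on every node):
for ip in 192.168.56.102 192.168.56.103; do
    scp -r /home/softwares/hadoop-3 root@$ip:/home/softwares/    # ship the configured Hadoop directory to the other nodes
done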
Hadoop environment testing
Go to the hadoop home directory to execute the corresponding script file
The jps command (Java Virtual Machine Process Status) displays the running Java processes.
Start HDFS on the NameNode machine, node 101:
[root@hadoop01 hadoop-3]# sbin/start-dfs.sh
Java HotSpot(TM) Client VM warning: you have loaded library /home/softwares/hadoop-3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c', or link it with '-z noexecstack'.
16-11-07 16:49:19 WARN util.NativeCodeLoader: unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting namenodes on [hadoop01]
hadoop01: starting namenode, logging to /home/softwares/hadoop-3/logs/hadoop-root-namenode-hadoopout
102: starting datanode, logging to /home/softwares/hadoop-3/logs/hadoop-root-datanode-hadoopout
103: starting datanode, logging to /home/softwares/hadoop-3/logs/hadoop-root-datanode-hadoopout
101: starting datanode, logging to /home/softwares/hadoop-3/logs/hadoop-root-datanode-hadoopout
starting secondary namenodes [hadoop03]
hadoop03: starting secondarynamenode, logging to /home/softwares/hadoop-3/logs/hadoop-root-secondarynamenode-hadoopout
Executing jps on node 101 shows that the NameNode and a DataNode have started:
[root@hadoop01 hadoop-3]# jps
7826 Jps
7270 DataNode
7052 NameNode
Executing jps on nodes 102 and 103 shows that the DataNodes have started:
[root@hadoop02 bin]# jps
4260 DataNode
4488 Jps
[root@hadoop03 ~]# jps
6436 SecondaryNameNode
6750 Jps
6191 DataNode
Start yarn
Execute on node 102:
[root@hadoop02 hadoop-3]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/softwares/hadoop-3/logs/yarn-root-resourcemanager-hadoopout
101: starting nodemanager, logging to /home/softwares/hadoop-3/logs/yarn-root-nodemanager-hadoopout
103: starting nodemanager, logging to /home/softwares/hadoop-3/logs/yarn-root-nodemanager-hadoopout
102: starting nodemanager, logging to /home/softwares/hadoop-3/logs/yarn-root-nodemanager-hadoopout
Check each node with jps:
[root@hadoop02 hadoop-3]# jps
4641 ResourceManager
4260 DataNode
4765 NodeManager
5165 Jps
[root@hadoop01 hadoop-3]# jps
7270 DataNode
8375 Jps
7976 NodeManager
7052 NameNode
[root@hadoop03 ~]# jps
6915 NodeManager
6436 SecondaryNameNode
7287 Jps
6191 DataNode
Start the JobHistoryServer and the WebAppProxyServer daemons on their respective nodes:
[root@hadoop01 hadoop-3]# sbin/mr-jobhistory-daemon.sh start historyserver
starting historyserver, logging to /home/softwares/hadoop-3/logs/mapred-root-historyserver-hadoopout
[root@hadoop01 hadoop-3]# jps
8624 Jps
7270 DataNode
7976 NodeManager
8553 JobHistoryServer
7052 NameNode
[root@hadoop02 hadoop-3]# sbin/yarn-daemon.sh start proxyserver
starting proxyserver, logging to /home/softwares/hadoop-3/logs/yarn-root-proxyserver-hadoopout
[root@hadoop02 hadoop-3]# jps
4641 ResourceManager
4260 DataNode
5367 WebAppProxyServer
5402 Jps
4765 NodeManager
Check the status of the nodes through a browser on the hadoop01 node.
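The web UIs can also be probed from the command line; port 50070 comes from dfs.namenode.http-address above, while 8088 is YARN's default ResourceManager web port, which this article does not set explicitly, so treat it as an assumption:
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.56.101:50070    # HDFS NameNode web UI
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.56.102:8088     # YARN ResourceManager web UI (default port, assumed)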
Upload a file to HDFS:
[root@hadoop01 hadoop-3]# bin/hdfs dfs -put /etc/profile /profile
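To confirm the upload, list the HDFS root directory:
bin/hdfs dfs -ls /    # /profile should appear in the listing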
Run the wordcount program
[root@hadoop01 hadoop-3]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-jar wordcount /profile /fll_out
Java HotSpot(TM) Client VM warning: you have loaded library /home/softwares/hadoop-3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c', or link it with '-z noexecstack'.
16-11-07 17:17:10 WARN util.NativeCodeLoader: unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16-11-07 17:17:12 INFO client.RMProxy: connecting to resourcemanager at /102:8032
16-11-07 17:17:18 INFO input.FileInputFormat: total input paths to process: 1
16-11-07 17:17:19 INFO mapreduce.JobSubmitter: number of splits: 1
16-11-07 17:17:19 INFO mapreduce.JobSubmitter: submitting tokens for job: job_1478509135878_0001
16-11-07 17:17:20 INFO impl.YarnClientImpl: submitted application application_1478509135878_0001
16-11-07 17:17:20 INFO mapreduce.Job: the url to track the job: http://102:8888/proxy/application_1478509135878_0001/
16-11-07 17:17:20 INFO mapreduce.Job: running job: job_1478509135878_0001
16-11-07 17:18:34 INFO mapreduce.Job: job job_1478509135878_0001 running in uber mode: true
16-11-07 17:18:35 INFO mapreduce.Job: map 0% reduce 0%
16-11-07 17:18:43 INFO mapreduce.Job: map 100% reduce 0%
16-11-07 17:18:50 INFO mapreduce.Job: map 100% reduce 100%
16-11-07 17:18:55 INFO mapreduce.Job: job job_1478509135878_0001 completed successfully
16-11-07 17:18:59 INFO mapreduce.Job: counters: 52
    file system counters
        file: number of bytes read=4264
        file: number of bytes written=6412
        file: number of read operations=0
        file: number of large read operations=0
        file: number of write operations=0
        hdfs: number of bytes read=3940
        hdfs: number of bytes written=261673
        hdfs: number of read operations=35
        hdfs: number of large read operations=0
        hdfs: number of write operations=8
    job counters
        launched map tasks=1
        launched reduce tasks=1
        other local map tasks=1
        total time spent by all maps in occupied slots (ms)=8246
        total time spent by all reduces in occupied slots (ms)=7538
        total_launched_ubertasks=2
        num_uber_submaps=1
        num_uber_subreduces=1
        total time spent by all map tasks (ms)=8246
        total time spent by all reduce tasks (ms)=7538
        total vcore-milliseconds taken by all map tasks=8246
        total vcore-milliseconds taken by all reduce tasks=7538
        total megabyte-milliseconds taken by all map tasks=8443904
        total megabyte-milliseconds taken by all reduce tasks=7718912
    map-reduce framework
        map input records=78
        map output records=256
        map output bytes=2605
        map output materialized bytes=2116
        input split bytes=99
        combine input records=256
        combine output records=156
        reduce input groups=156
        reduce shuffle bytes=2116
        reduce input records=156
        reduce output records=156
        spilled records=312
        shuffled maps=1
        failed shuffles=0
        merged map outputs=1
        gc time elapsed (ms)=870
        cpu time spent (ms)=1970
        physical memory (bytes) snapshot=243326976
        virtual memory (bytes) snapshot=2666557440
        total committed heap usage (bytes)=256876544
    shuffle errors
        bad_id=0
        connection=0
        io_error=0
        wrong_length=0
        wrong_map=0
        wrong_reduce=0
    file input format counters
        bytes read=1829
    file output format counters
        bytes written=1487
Check the job's running status through the YARN web UI in the browser.
Check the final word frequency statistics.
View the HDFS file system in a browser.
[root@hadoop01 hadoop-3]# bin/hdfs dfs -cat /fll_out/part-r-00000
Java HotSpot(TM) Client VM warning: you have loaded library /home/softwares/hadoop-3/lib/native/libhadoop.so which might have disabled stack guard. The VM will try to fix the stack guard now. It's highly recommended that you fix the library with 'execstack -c', or link it with '-z noexecstack'.
16-11-07 17:29:17 WARN util.NativeCodeLoader: unable to load native-hadoop library for your platform... using builtin-java classes where applicable
(the output then lists each word of /etc/profile with its frequency, one word and its count per line; the full listing is omitted here)
Thank you for reading. The above is the content of "how to build a CentOS-based Hadoop distributed environment". After studying this article, I believe you have a deeper understanding of how to build a CentOS-based Hadoop distributed environment.