Apache Hadoop

It had been more than two years since I last built an Apache Hadoop environment. Yesterday I built one again and recorded the process for future reference.

Host role assignment:

The NameNode and DFSZKFailoverController roles run on the oversea-stable and bus-stable servers; software to install: JDK, Hadoop 2.9.1.

The ResourceManager role runs on the oversea-stable server; software to install: JDK, Hadoop 2.9.1.

The JournalNode, DataNode and NodeManager roles run on the open-stable, permission-stable and sp-stable servers; software to install: JDK, Hadoop 2.9.1.

The QuorumPeerMain role of the ZooKeeper cluster runs on the open-stable, permission-stable and sp-stable servers; software to install: JDK, ZooKeeper 3.4.12.

1. Environment setup

(1) set the hostname and configure local name resolution (the hostname and its resolution entry must match, otherwise journalnode cannot start)

[root@oversea-stable ~]# cat /etc/hosts
192.168.20.68 oversea-stable
192.168.20.67 bus-stable
192.168.20.66 open-stable
192.168.20.65 permission-stable
192.168.20.64 sp-stable
[root@oversea-stable ~]#

And synchronize the file to all machines.
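A minimal way to do that synchronization from oversea-stable, assuming root SSH access to the other hosts (the host list simply mirrors /etc/hosts above):

for h in bus-stable open-stable permission-stable sp-stable; do
    scp /etc/hosts ${h}:/etc/hosts    # push the same resolution file to every node
done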

(2) synchronize the time on each node

(3) distribute the jdk to all nodes and install it (a combined sketch of steps (2) and (3) follows below)
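A sketch of steps (2) and (3) run from oversea-stable; the NTP server (ntp1.aliyun.com) and the JDK package name (jdk-8u172-linux-x64.rpm) are assumptions for illustration, not values recorded in the original setup:

for h in bus-stable open-stable permission-stable sp-stable; do
    ssh ${h} "ntpdate ntp1.aliyun.com"                # one-shot time sync; assumed NTP server
    scp jdk-8u172-linux-x64.rpm ${h}:/tmp/            # assumed JDK rpm name
    ssh ${h} "rpm -ivh /tmp/jdk-8u172-linux-x64.rpm"  # install the JDK on the remote node
done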

(4) configure environment variables

Add the following settings to the /etc/profile file:

export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/opt/hadoop
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
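To make the variables take effect in the current shell and confirm them (a quick sanity check, not part of the original notes):

source /etc/profile
java -version        # should print the installed JDK version
echo $HADOOP_HOME    # should print /opt/hadoop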

2. Configure the SSH key and copy it to the local machine too (password-free login is also required when ssh-ing to the local machine)

On all machines, do the following:

(1) create a hadoop user: useradd hadoop

(2) set the hadoop user's password: echo "xxxxxxxx" | passwd --stdin hadoop

On one of the servers, switch to the hadoop user: su - hadoop

and generate the ssh key: ssh-keygen -b 2048 -t rsa

Synchronize the key to the other servers: scp -r .ssh server_name:~/

On each server, switch to the hadoop user and verify that you can log in to the other servers without a password, as in the sketch below.
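A minimal verification loop under the hadoop user; every command should print the remote hostname without prompting for a password:

for h in oversea-stable bus-stable open-stable permission-stable sp-stable; do
    ssh ${h} hostname    # any password prompt here means key distribution failed
done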

3. Configure zookeeper

Configure the zookeeper cluster on the open-stable, permission-stable and sp-stable servers, as follows:

[root@open-stable ~]# chmod o+w /opt
[root@open-stable ~]# su - hadoop
[hadoop@open-stable ~]$ wget http://mirrors.hust.edu.cn/apache/zookeeper/zookeeper-3.4.12/zookeeper-3.4.12.tar.gz
[hadoop@open-stable ~]$ tar xfz zookeeper-3.4.12.tar.gz -C /opt
[hadoop@open-stable ~]$ cd /opt/
[hadoop@open-stable opt]$ mv zookeeper{-3.4.12,}
[hadoop@open-stable opt]$ cd zookeeper/
[hadoop@open-stable zookeeper]$ cp conf/zoo_sample.cfg conf/zoo.cfg
[hadoop@open-stable zookeeper]$ vim conf/zoo.cfg
[hadoop@open-stable zookeeper]$ grep -Pv "^(#|$)" conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/zkdata
dataLogDir=/opt/zookeeper/zklogs
clientPort=2181
server.6=open-stable:2888:3888
server.5=permission-stable:2888:3888
server.4=sp-stable:2888:3888
[hadoop@open-stable zookeeper]$ mkdir zkdata
[hadoop@open-stable zookeeper]$ mkdir zklogs
[hadoop@open-stable zookeeper]$ echo 6 > zkdata/myid
[hadoop@open-stable zookeeper]$ bin/zkServer.sh start

The other servers are configured the same way. Then check the status on each node:

[hadoop@open-stable zookeeper]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[hadoop@open-stable zookeeper]$
[hadoop@permission-stable zookeeper]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[hadoop@permission-stable zookeeper]$
[hadoop@sp-stable zookeeper]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[hadoop@sp-stable zookeeper]$
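Note that "the same" applies to everything except myid: each node's myid must match its server.N line in zoo.cfg. Based on the config above, that means:

[hadoop@permission-stable zookeeper]$ echo 5 > zkdata/myid    # server.5=permission-stable
[hadoop@sp-stable zookeeper]$ echo 4 > zkdata/myid            # server.4=sp-stable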

4. Configure hadoop

(1) configure hadoop on one of the servers, as follows:

[hadoop@oversea-stable ~]$ wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz
[hadoop@oversea-stable ~]$ tar xfz hadoop-2.9.1.tar.gz -C /opt/
[hadoop@oversea-stable ~]$ cd /opt/
[hadoop@oversea-stable opt]$ ln -s hadoop-2.9.1 hadoop
[hadoop@oversea-stable opt]$ cd hadoop/etc/hadoop
[hadoop@oversea-stable hadoop]$ grep JAVA_HOME hadoop-env.sh
export JAVA_HOME=/usr/java/latest
[hadoop@oversea-stable hadoop]$
[hadoop@oversea-stable hadoop]$ tail -14 core-site.xml
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://inspiryhdfs</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop/tmp</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>open-stable:2181,permission-stable:2181,sp-stable:2181</value>
  </property>
</configuration>
[hadoop@oversea-stable hadoop]$
[hadoop@oversea-stable hadoop]$ tail -50 hdfs-site.xml
  <property>
    <name>dfs.nameservices</name>
    <value>inspiryhdfs</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.inspiryhdfs</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.inspiryhdfs.nn1</name>
    <value>oversea-stable:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.inspiryhdfs.nn1</name>
    <value>oversea-stable:50070</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.inspiryhdfs.nn2</name>
    <value>bus-stable:9000</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.inspiryhdfs.nn2</name>
    <value>bus-stable:50070</value>
  </property>
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://open-stable:8485;permission-stable:8485;sp-stable:8485/inspiryhdfs</value>
  </property>
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/hadoop/journal</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.inspiryhdfs</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
</configuration>

Specify that MapReduce runs on top of the yarn framework:

[hadoop@oversea-stable hadoop]$ cp mapred-site.xml{.template,}
[hadoop@oversea-stable hadoop]$ tail -6 mapred-site.xml
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Specify the DataNode nodes:

[hadoop@oversea-stable hadoop]$ cat slaves
open-stable
permission-stable
sp-stable

[hadoop@oversea-stable hadoop]$ tail -11 yarn-site.xml
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>oversea-stable</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
[hadoop@oversea-stable hadoop]$

(2) synchronize the configured hadoop to other servers

[hadoop@oversea-stable opt]$ rsync -avzoptgl hadoop-2.9.1 bus-stable:/opt/
[hadoop@oversea-stable opt]$ rsync -avzoptgl hadoop-2.9.1 open-stable:/opt/
[hadoop@oversea-stable opt]$ rsync -avzoptgl hadoop-2.9.1 permission-stable:/opt/
[hadoop@oversea-stable opt]$ rsync -avzoptgl hadoop-2.9.1 sp-stable:/opt/

The other servers then create the hadoop soft link as well.
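For example, on bus-stable (the remaining nodes are identical):

[hadoop@bus-stable ~]$ cd /opt && ln -s hadoop-2.9.1 hadoop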

(3) start journalnode

sbin/hadoop-daemons.sh start journalnode

Format the namenode on oversea-stable and start the primary namenode:

hadoop namenode -format
sbin/hadoop-daemon.sh start namenode
[hadoop@oversea-stable hadoop]$ ls /opt/hadoop/tmp/dfs/name/current/
fsimage_0000000000000000000      seen_txid
fsimage_0000000000000000000.md5  VERSION

(4) synchronize data to the standby namenode

After formatting the namenode on the oversea-stable node and starting it, synchronize the namenode metadata to the bus-stable node to avoid having to format the namenode again (and make sure the /opt/hadoop/tmp directory also exists on bus-stable). On bus-stable, do the following:

bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode

5. Format zkfc (so that the namenodes can report their status to zookeeper)

hdfs zkfc -formatZK

(If formatting fails, check that the zookeeper addresses specified in core-site.xml are completely correct.)
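To confirm the formatting worked, you can look for the HA znode in zookeeper; a quick check with the zookeeper CLI (the /hadoop-ha/inspiryhdfs path follows from dfs.nameservices above, and this check is not part of the original procedure):

[hadoop@open-stable zookeeper]$ bin/zkCli.sh -server open-stable:2181
[zk: open-stable:2181(CONNECTED) 0] ls /hadoop-ha
[inspiryhdfs]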

6. Start hdfs

[hadoop@oversea-stable hadoop]$ sbin/start-dfs.sh
Starting namenodes on [oversea-stable bus-stable]
bus-stable: starting namenode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-namenode-bus-stable.out
oversea-stable: starting namenode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-namenode-oversea-stable.out
sp-stable: starting datanode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-datanode-sp-stable.out
permission-stable: starting datanode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-datanode-permission-stable.out
open-stable: starting datanode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-datanode-open-stable.out
Starting journal nodes [open-stable permission-stable sp-stable]
sp-stable: starting journalnode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-journalnode-sp-stable.out
open-stable: starting journalnode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-journalnode-open-stable.out
permission-stable: starting journalnode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-journalnode-permission-stable.out
Starting ZK Failover Controllers on NN hosts [oversea-stable bus-stable]
oversea-stable: starting zkfc, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-zkfc-oversea-stable.out
bus-stable: starting zkfc, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-zkfc-bus-stable.out
[hadoop@oversea-stable hadoop]$

7. Start yarn (if the NameNode and ResourceManager are not on the same machine, you cannot start yarn on the NameNode; you must start yarn on the ResourceManager machine)

[hadoop@oversea-stable hadoop]$ sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.9.1/logs/yarn-hadoop-resourcemanager-oversea-stable.out
sp-stable: starting nodemanager, logging to /opt/hadoop-2.9.1/logs/yarn-hadoop-nodemanager-sp-stable.out
open-stable: starting nodemanager, logging to /opt/hadoop-2.9.1/logs/yarn-hadoop-nodemanager-open-stable.out
permission-stable: starting nodemanager, logging to /opt/hadoop-2.9.1/logs/yarn-hadoop-nodemanager-permission-stable.out
[hadoop@oversea-stable hadoop]$

8. Verify the roles of each node

[hadoop@oversea-stable ~]$ jps
4389 DFSZKFailoverController
5077 ResourceManager
25061 Jps
4023 NameNode
[hadoop@oversea-stable ~]$
[hadoop@bus-stable ~]$ jps
9073 Jps
29956 NameNode
30095 DFSZKFailoverController
[hadoop@bus-stable ~]$
[hadoop@open-stable ~]$ jps
2434 DataNode
421 QuorumPeerMain
2559 JournalNode
2847 NodeManager
11903 Jps
[hadoop@open-stable ~]$
[hadoop@permission-stable ~]$ jps
30489 QuorumPeerMain
32505 JournalNode
9689 Jps
32380 DataNode
303 NodeManager
[hadoop@permission-stable ~]$
[hadoop@sp-stable ~]$ jps
29955 DataNode
30339 NodeManager
30072 JournalNode
6792 Jps
28060 QuorumPeerMain
[hadoop@sp-stable ~]$

Enter: http://oversea-stable:50070/ and http://bus-stable:50070/ in the browser

From those pages you can see that bus-stable is in the active state and oversea-stable is in standby. Next, test namenode high availability: check whether oversea-stable switches over automatically when bus-stable goes down.
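The same states can also be read from the command line with hdfs haadmin, using the nn1/nn2 ids defined in hdfs-site.xml; at this point the output should look like:

[hadoop@oversea-stable ~]$ hdfs haadmin -getServiceState nn1
standby
[hadoop@oversea-stable ~]$ hdfs haadmin -getServiceState nn2
active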

Kill the NameNode process on bus-stable:

[root@bus-stable ~]# jps
1614 NameNode
2500 Jps
1929 DFSZKFailoverController
[root@bus-stable ~]# kill -9 1614

Refresh http://bus-stable:50070/ again; it can no longer be reached. Then refresh http://oversea-stable:50070/.

At this point oversea-stable is already in the active state, which shows the failover works; the highly available hadoop cluster is now complete.
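To restore full redundancy, the killed namenode can simply be started again on bus-stable, where it should come back in the standby state (a sketch, reusing the paths from above):

[hadoop@bus-stable ~]$ /opt/hadoop/sbin/hadoop-daemon.sh start namenode
[hadoop@bus-stable ~]$ hdfs haadmin -getServiceState nn2
standby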

Enter http://oversea-stable:8088 in the browser to view the yarn cluster status.
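The node list is also available from the CLI (output omitted; it should show the three NodeManager nodes):

[hadoop@oversea-stable hadoop]$ yarn node -list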

9. Using the hadoop cluster

[hadoop@oversea-stable hadoop]$ hdfs dfs -ls /
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2018-06-15 10:32 /data
[hadoop@oversea-stable ~]$ hdfs dfs -put /tmp/notepad.txt /data/notepad.txt
[hadoop@oversea-stable ~]$ cd /opt/hadoop
[hadoop@oversea-stable hadoop]$ ls share/hadoop/mapreduce/
hadoop-mapreduce-client-app-2.9.1.jar         hadoop-mapreduce-client-jobclient-2.9.1.jar        lib
hadoop-mapreduce-client-common-2.9.1.jar      hadoop-mapreduce-client-jobclient-2.9.1-tests.jar  lib-examples
hadoop-mapreduce-client-core-2.9.1.jar        hadoop-mapreduce-client-shuffle-2.9.1.jar          sources
hadoop-mapreduce-client-hs-2.9.1.jar          hadoop-mapreduce-examples-2.9.1.jar
hadoop-mapreduce-client-hs-plugins-2.9.1.jar  jdiff
[hadoop@oversea-stable hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar wordcount /data /out1
18/06/15 11:04:53 INFO client.RMProxy: Connecting to ResourceManager at oversea-stable/192.168.20.68:8032
18/06/15 11:04:54 INFO input.FileInputFormat: Total input files to process : 1
18/06/15 11:04:54 INFO mapreduce.JobSubmitter: number of splits:1
18/06/15 11:04:54 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/06/15 11:04:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528979206314_0002
18/06/15 11:04:55 INFO impl.YarnClientImpl: Submitted application application_1528979206314_0002
18/06/15 11:04:55 INFO mapreduce.Job: The url to track the job: http://oversea-stable:8088/proxy/application_1528979206314_0002/
18/06/15 11:04:55 INFO mapreduce.Job: Running job: job_1528979206314_0002
18/06/15 11:05:02 INFO mapreduce.Job: Job job_1528979206314_0002 running in uber mode : false
18/06/15 11:05:02 INFO mapreduce.Job:  map 0% reduce 0%
18/06/15 11:05:08 INFO mapreduce.Job:  map 100% reduce 0%
18/06/15 11:05:14 INFO mapreduce.Job:  map 100% reduce 100%
18/06/15 11:05:14 INFO mapreduce.Job: Job job_1528979206314_0002 completed successfully
18/06/15 11:05:14 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=68428
                FILE: Number of bytes written=535339
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=88922
                HDFS: Number of bytes written=58903
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3466
                Total time spent by all reduces in occupied slots (ms)=3704
                Total time spent by all map tasks (ms)=3466
                Total time spent by all reduce tasks (ms)=3704
                Total vcore-milliseconds taken by all map tasks=3466
                Total vcore-milliseconds taken by all reduce tasks=3704
                Total megabyte-milliseconds taken by all map tasks=3549184
                Total megabyte-milliseconds taken by all reduce tasks=3792896
        Map-Reduce Framework
                Map input records=1770
                Map output records=5961
                Map output bytes=107433
                Map output materialized bytes=68428
                Input split bytes=100
                Combine input records=5961
                Combine output records=2366
                Reduce input groups=2366
                Reduce shuffle bytes=68428
                Reduce input records=2366
                Reduce output records=2366
                Spilled Records=4732
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=145
                CPU time spent (ms)=2730
                Physical memory (bytes) snapshot=505479168
                Virtual memory (bytes) snapshot=4347928576
                Total committed heap usage (bytes)=346554368
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=88822
        File Output Format Counters
                Bytes Written=58903
[hadoop@oversea-stable hadoop]$ hdfs dfs -ls /out1/
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2018-06-15 11:05 /out1/_SUCCESS
-rw-r--r--   3 hadoop supergroup      58903 2018-06-15 11:05 /out1/part-r-00000
[hadoop@oversea-stable hadoop]$ hdfs dfs -cat /out1/part-r-00000

Running the task with custom map and reduce functions (via hadoop streaming) works as follows:

[hadoop@oversea-stable hadoop]$ hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.9.1.jar -file /opt/map.py -mapper /opt/map.py -file /opt/reduce.py -reducer /opt/reduce.py -input /data/notepad.txt -output /out2
18/06/15 14:30:32 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/opt/map.py, /opt/reduce.py, /tmp/hadoop-unjar5706672822735184593/] [] /tmp/streamjob6067385394162603509.jar tmpDir=null
18/06/15 14:30:33 INFO client.RMProxy: Connecting to ResourceManager at oversea-stable/192.168.20.68:8032
18/06/15 14:30:33 INFO client.RMProxy: Connecting to ResourceManager at oversea-stable/192.168.20.68:8032
18/06/15 14:30:34 INFO mapred.FileInputFormat: Total input files to process : 1
18/06/15 14:30:34 INFO mapreduce.JobSubmitter: number of splits:2
18/06/15 14:30:34 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/06/15 14:30:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1529036356241_0004
18/06/15 14:30:35 INFO impl.YarnClientImpl: Submitted application application_1529036356241_0004
18/06/15 14:30:35 INFO mapreduce.Job: The url to track the job: http://oversea-stable:8088/proxy/application_1529036356241_0004/
18/06/15 14:30:35 INFO mapreduce.Job: Running job: job_1529036356241_0004
18/06/15 14:30:42 INFO mapreduce.Job: Job job_1529036356241_0004 running in uber mode : false
18/06/15 14:30:42 INFO mapreduce.Job:  map 0% reduce 0%
18/06/15 14:30:48 INFO mapreduce.Job:  map 100% reduce 0%
18/06/15 14:30:54 INFO mapreduce.Job:  map 100% reduce 100%
18/06/15 14:30:54 INFO mapreduce.Job: Job job_1529036356241_0004 completed successfully
18/06/15 14:30:54 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=107514
                FILE: Number of bytes written=823175
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=93092
                HDFS: Number of bytes written=58903
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=7194
                Total time spent by all reduces in occupied slots (ms)=3739
                Total time spent by all map tasks (ms)=7194
                Total time spent by all reduce tasks (ms)=3739
                Total vcore-milliseconds taken by all map tasks=7194
                Total vcore-milliseconds taken by all reduce tasks=3739
                Total megabyte-milliseconds taken by all map tasks=7366656
                Total megabyte-milliseconds taken by all reduce tasks=3828736
        Map-Reduce Framework
                Map input records=1770
                Map output records=5961
                Map output bytes=95511
                Map output materialized bytes=107520
                Input split bytes=174
                Combine input records=0
                Combine output records=0
                Reduce input groups=2366
                Reduce shuffle bytes=107520
                Reduce input records=5961
                Reduce output records=2366
                Spilled Records=11922
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=292
                CPU time spent (ms)=4340
                Physical memory (bytes) snapshot=821985280
                Virtual memory (bytes) snapshot=6525067264
                Total committed heap usage (bytes)=548929536
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=92918
        File Output Format Counters
                Bytes Written=58903
18/06/15 14:30:54 INFO streaming.StreamJob: Output directory: /out2
[hadoop@oversea-stable hadoop]$ hdfs dfs -ls /out2
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2018-06-15 14:30 /out2/_SUCCESS
-rw-r--r--   3 hadoop supergroup      58903 2018-06-15 14:30 /out2/part-00000
[hadoop@oversea-stable hadoop]$ cat /opt/map.py
#!/usr/bin/python
import sys

for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print "%s\t%s" % (word, 1)
[hadoop@oversea-stable hadoop]$ cat /opt/reduce.py
#!/usr/bin/python
from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print "%s\t%s" % (current_word, current_count)
        current_count = count
        current_word = word

if word == current_word:
    print "%s\t%s" % (current_word, current_count)
[hadoop@oversea-stable hadoop]$
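Because streaming only feeds the scripts over stdin/stdout, they can be sanity-checked locally with an ordinary shell pipeline before submitting the job (a quick test, not part of the original run; sort stands in for the shuffle phase):

chmod +x /opt/map.py /opt/reduce.py
cat /tmp/notepad.txt | /opt/map.py | sort -k1,1 | /opt/reduce.py | head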
