Purpose
This document describes how to install a single-node (pseudo-distributed) Hadoop cluster so that you can get familiar with Hadoop's HDFS and MapReduce.
Environment:
OS: CentOS release 6.5 (Final)
IP: 172.16.101.58
User: root
Hadoop: hadoop-2.9.0.tar.gz
SSH password-less login configuration
Because this installation is performed as the root user, you need to configure password-less SSH login to the local node for root.
[root@sht-sgmhadoopdn-01 ~]# ssh-keygen -t rsa
[root@sht-sgmhadoopdn-01 .ssh]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@sht-sgmhadoopdn-01 ~]# ssh localhost
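If ssh localhost still prompts for a password, the most common cause is permissions on the .ssh directory; a minimal fix (assuming the default key location, not part of the original steps) is:
[root@sht-sgmhadoopdn-01 ~]# chmod 700 ~/.ssh
[root@sht-sgmhadoopdn-01 ~]# chmod 600 ~/.ssh/authorized_keys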
Java installation and configuration
[root@sht-sgmhadoopdn-01 ~]# cd /usr/java
[root@sht-sgmhadoopdn-01 java]# tar xf jdk-8u111-linux-x64.tar.gz
[root@sht-sgmhadoopdn-01 java]# chown -R root:root jdk1.8.0_111/
[root@sht-sgmhadoopdn-01 bin]# /usr/java/jdk1.8.0_111/bin/java -version
java version "1.8.0_111"
[root@sht-sgmhadoopdn-01 ~]# vim ~/.bash_profile
export JAVA_HOME=/usr/java/jdk1.8.0_111
export PATH=$JAVA_HOME/bin:$PATH:$HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
[root@sht-sgmhadoopdn-01 ~]# source .bash_profile
[root@sht-sgmhadoopdn-01 ~]# which java
/usr/java/jdk1.8.0_111/bin/java
Download and extract hadoop
[root@sht-sgmhadoopdn-01 local]# wget http://www-us.apache.org/dist/hadoop/common/hadoop-2.9.0/hadoop-2.9.0.tar.gz
[root@sht-sgmhadoopdn-01 local]# tar xf hadoop-2.9.0.tar.gz
[root@sht-sgmhadoopdn-01 ~]# vim .bash_profile
export HADOOP_HOME=/usr/local/hadoop-2.9.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH
[root@sht-sgmhadoopdn-01 ~]# source .bash_profile
[root@sht-sgmhadoopdn-01 ~]# which hadoop
/usr/local/hadoop-2.9.0/bin/hadoop
[root@sht-sgmhadoopdn-01 local]# hadoop version
Hadoop 2.9.0
.
Hadoop jar command parsing
jar runs a jar file; when the job runs on YARN, the equivalent yarn jar command can be used instead.
The example takes all the files in the input folder as input, filters for words that match the regular expression dfs[a-z.]+, counts the number of occurrences, and finally writes the results to the output folder:
Regular expression:
[a-z] matches any single character in the range a to z
+ matches the preceding item one or more times
[root@sht-sgmhadoopdn-01 ~]# cd /usr/local/hadoop-2.9.0
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# mkdir input
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# cp etc/hadoop/*.xml input/
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar grep input output 'dfs[a-z.]+'
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# cat output/*
1 dfsadmin
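For intuition, the result above is roughly what the following local pipeline would produce on the same files (a sketch only; the MapReduce grep example additionally sorts matches by count):
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# grep -ohE 'dfs[a-z.]+' input/*.xml | sort | uniq -c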
Notes on the Hadoop configuration files
(1) Hadoop's operation mode is determined by its configuration files (which are read when Hadoop runs), so to switch from pseudo-distributed mode back to non-distributed mode you need to delete the configuration items in core-site.xml.
(2) Although pseudo-distributed mode only needs fs.defaultFS and dfs.replication to run (as in the official tutorial), if the hadoop.tmp.dir parameter is not configured, the default temporary directory /tmp/hadoop-<username> is used, and this directory may be cleaned by the system on reboot, which would force you to run the format step again. So we set it, and also specify dfs.namenode.name.dir and dfs.datanode.data.dir, otherwise a later step may fail.
Modify the configuration file
Hadoop can run in pseudo-distributed mode on a single node: each Hadoop daemon runs as a separate Java process, and the node acts as both NameNode and DataNode while reading files from HDFS.
The Hadoop configuration files are located in /usr/local/hadoop-2.9.0/etc/hadoop/, and pseudo-distributed mode requires modifying two of them, core-site.xml and hdfs-site.xml. The configuration files are in XML format, and each item is declared as a property with a name and a value.
[root@sht-sgmhadoopdn-01 hadoop]# cat /usr/local/hadoop-2.9.0/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-2.9.0/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
[root@sht-sgmhadoopdn-01 hadoop]# cat /usr/local/hadoop-2.9.0/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop-2.9.0/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop-2.9.0/tmp/dfs/data</value>
    </property>
</configuration>
[root@sht-sgmhadoopdn-01 hadoop]# vim /usr/local/hadoop-2.9.0/etc/hadoop/hadoop-env.sh
# export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/jdk1.8.0_111
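An optional sanity check (not part of the original steps): rerun hadoop version after editing hadoop-env.sh. If the scripts could not resolve JAVA_HOME, they would abort with an error similar to "Error: JAVA_HOME is not set and could not be found."
[root@sht-sgmhadoopdn-01 hadoop]# hadoop version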
Start the hadoop cluster
# format the NameNode:
[root@sht-sgmhadoopdn-01 hadoop]# hdfs namenode -format
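# if formatting succeeds, the output should contain a line similar to "Storage directory /usr/local/hadoop-2.9.0/tmp/dfs/name has been successfully formatted." and the command exits with status 0 (an optional check, not in the original steps):
[root@sht-sgmhadoopdn-01 hadoop]# echo $?
0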
# start the NameNode and DataNode daemons (this step starts three processes: NameNode, DataNode and SecondaryNameNode)
[root@sht-sgmhadoopdn-01 hadoop]# /usr/local/hadoop-2.9.0/sbin/start-dfs.sh
# check process IDs and names with the jps command
[root@sht-sgmhadoopdn-01 logs]# jps
12704 DataNode
14273 Jps
12580 NameNode
27988 -- process information unavailable
13015 SecondaryNameNode
27832 -- process information unavailable
# the "process information unavailable" entries belong to other, stale JVM processes and are unrelated to the three Hadoop daemons
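If any of the three daemons does not show up in jps, the corresponding log under the logs directory is the place to look; for example (the exact log file name includes the user and hostname, so the pattern below is an assumption):
[root@sht-sgmhadoopdn-01 logs]# tail -n 50 /usr/local/hadoop-2.9.0/logs/hadoop-root-namenode-*.log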
# you can also stop the daemons with stop-dfs.sh (the next time you start Hadoop there is no need to format the NameNode again; just run start-dfs.sh)
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# /usr/local/hadoop-2.9.0/sbin/stop-dfs.sh
After the processes have started successfully, you can view NameNode and DataNode information through a browser and also browse the files in HDFS online:
NameNode http://172.16.101.58:50070
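If no browser is available on the server, the same endpoint can be checked from the command line (an optional check; 50070 is the default NameNode HTTP port in Hadoop 2.x):
[root@sht-sgmhadoopdn-01 ~]# curl -sI http://172.16.101.58:50070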
Run a MapReduce job in pseudo-distributed mode
# create the HDFS directory /user/root/input and copy the local files into HDFS
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# hdfs dfs -mkdir -p /user/root/input
[root@sht-sgmhadoopdn-01 ~]# hdfs dfs -ls
drwxr-xr-x - root supergroup 0 2017-12-24 15:20 input
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# hdfs dfs -put /usr/local/hadoop-2.9.0/etc/hadoop/*.xml /user/root/input
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# hdfs dfs -ls /user/root/input
Found 8 items
-rw-r--r-- 1 root supergroup 7861 2017-12-24 15:20 /user/root/input/capacity-scheduler.xml
-rw-r--r-- 1 root supergroup 1040 2017-12-24 15:20 /user/root/input/core-site.xml
-rw-r--r-- 1 root supergroup 10206 2017-12-24 15:20 /user/root/input/hadoop-policy.xml
-rw-r--r-- 1 root supergroup 1091 2017-12-24 15:20 /user/root/input/hdfs-site.xml
-rw-r--r-- 1 root supergroup 620 2017-12-24 15:20 /user/root/input/httpfs-site.xml
-rw-r--r-- 1 root supergroup 3518 2017-12-24 15:20 /user/root/input/kms-acls.xml
-rw-r--r-- 1 root supergroup 5939 2017-12-24 15:20 /user/root/input/kms-site.xml
-rw-r--r-- 1 root supergroup 690 2017-12-24 15:20 /user/root/input/yarn-site.xml
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# hadoop jar /usr/local/hadoop-2.9.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar grep input output 'dfs[a-z.]+'
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# hdfs dfs -cat output/*
1 dfsadmin
# the result directory is not overwritten by default, so running the above example again will fail with an error that hdfs://localhost:9000/user/root/output already exists; you need to delete output first.
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# hdfs dfs -rm -r /user/root/output
Deleted /user/root/output
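When re-running the example from a script, adding the -f flag keeps the delete from failing when the output directory does not exist yet (an optional convenience, not part of the original steps):
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# hdfs dfs -rm -r -f /user/root/output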
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# hadoop jar /usr/local/hadoop-2.9.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.0.jar grep input output 'dfs[a-z.]+'
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# hdfs dfs -cat output/*
1 dfsadmin
1 dfs.replication
1 dfs.namenode.name.dir
1 dfs.datanode.data.dir
# you can also copy files from HDFS back to the local filesystem
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# hdfs dfs -get /user/root/output /usr/local/hadoop-2.9.0/
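After the copy, the retrieved result can be read like any local file (note: if the local output directory from the earlier standalone example still exists, it may need to be removed before running the get):
[root@sht-sgmhadoopdn-01 hadoop-2.9.0]# cat /usr/local/hadoop-2.9.0/output/*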
Run YARN on a single node
(1) Newer versions of Hadoop use the new MapReduce framework (MapReduce V2), which runs on YARN (Yet Another Resource Negotiator).
(2) YARN was split out from MapReduce and is responsible for resource management and task scheduling; MapReduce now runs on top of YARN, which provides high availability and high scalability.
So far Hadoop has only been started with ./sbin/start-dfs.sh, which starts HDFS alone and runs MapReduce jobs locally. We can additionally start YARN and let it take over resource management and task scheduling.
(3) If you do not want to start YARN, be sure to rename the configuration file mapred-site.xml back to mapred-site.xml.template, and only change it back when you need it. Otherwise, if the configuration file exists but YARN is not running, programs will fail with the error "Retrying connect to server: 0.0.0.0/0.0.0.0:8032", which is why the file initially ships with the name mapred-site.xml.template.
(4) The main purpose of YARN is better resource management and task scheduling for a cluster; on a single machine this brings no visible benefit and can even make jobs run slightly slower, so whether to enable YARN on a single node depends on the actual situation.
[root@sht-sgmhadoopdn-01 hadoop]# mv /usr/local/hadoop-2.9.0/etc/hadoop/mapred-site.xml.template mapred-site.xml
[root@sht-sgmhadoopdn-01 hadoop]# cat mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
[root@sht-sgmhadoopdn-01 hadoop]# cat yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
[root@sht-sgmhadoopdn-01 hadoop]# jps
27988 -- process information unavailable
30341 DataNode
32663 Jps
27832 -- process information unavailable
30188 NameNode
30525 SecondaryNameNode
# run this only after HDFS has already been started with the start-dfs.sh script
[root@sht-sgmhadoopdn-01 hadoop]# /usr/local/hadoop-2.9.0/sbin/start-yarn.sh
# compared with the previous jps output, there are two additional processes: ResourceManager and NodeManager
[root@sht-sgmhadoopdn-01 hadoop]# jps
27988 -- process information unavailable
30341 DataNode
32758 ResourceManager
855 Jps
27832 -- process information unavailable
411 NodeManager
30188 NameNode
30525 SecondaryNameNode
# after startup, the ResourceManager web UI can be accessed through a browser:
ResourceManager: http://172.16.101.58:8088
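The official single-node guide also starts the MapReduce JobHistory server at this point; it was not started in this walkthrough, which is why the stop command below reports "No historyserver to stop". If you want the job history web UI (default port 19888), it can be started with:
[root@sht-sgmhadoopdn-01 hadoop]# /usr/local/hadoop-2.9.0/sbin/mr-jobhistory-daemon.sh start historyserver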
Stop the hadoop cluster
[root@sht-sgmhadoopdn-01 hadoop]# /usr/local/hadoop-2.9.0/sbin/stop-yarn.sh
[root@sht-sgmhadoopdn-01 hadoop]# /usr/local/hadoop-2.9.0/sbin/stop-dfs.sh
[root@sht-sgmhadoopdn-01 hadoop]# /usr/local/hadoop-2.9.0/sbin/mr-jobhistory-daemon.sh stop historyserver
No historyserver to stop
Reference links:
http://www.powerxing.com/install-hadoop/
http://hadoop.apache.org/docs/r2.9.0/hadoop-project-dist/hadoop-common/SingleCluster.html