
Hadoop & Spark installation (part 1)


Hardware environment:

hddcluster1 10.0.0.197 RedHat 7

hddcluster2 10.0.0.228 CentOS 7 (this one acts as the master)

hddcluster3 10.0.0.202 RedHat 7

hddcluster4 10.0.0.181 CentOS 7

Software environment:

All firewalls turned off on every node (firewalld; see the sketch after this list)

openssh-clients

openssh-server

java-1.8.0-openjdk

java-1.8.0-openjdk-devel

hadoop-2.7.3.tar.gz
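
The list above does not spell out which firewall service to stop. A minimal sketch, assuming the default firewalld service that ships with CentOS 7 / RHEL 7, run on every node:

sudo systemctl stop firewalld      # stop the running firewall immediately
sudo systemctl disable firewalld   # prevent it from starting again at boot
sudo systemctl status firewalld    # confirm it now reports inactive (dead)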

Process:

Select a machine as the Master

Create the hadoop user, install an SSH server, and install the Java environment on the Master node (a quick passwordless-SSH check is sketched after this list)

Install Hadoop on the Master node and complete the configuration

Create the hadoop user, install an SSH server, and install the Java environment on the other Slave nodes

Copy the /usr/local/hadoop directory from the Master node to each Slave node

Start Hadoop on the Master node
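
Once the SSH keys have been copied to every node (that step appears in the walkthrough below), a quick optional check, not part of the original procedure, is to confirm that passwordless login from the Master works everywhere:

for h in hddcluster1 hddcluster2 hddcluster3 hddcluster4; do ssh hadoop@$h hostname; done   # each host should print its name without prompting for a password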

# Hostname-to-IP mapping on every node
[hadoop@hddcluster2 ~]$ cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.0.0.228  hddcluster2
10.0.0.197  hddcluster1
10.0.0.202  hddcluster3
10.0.0.181  hddcluster4

# Create the hadoop user (as root, as mentioned above)
su                              # switch to the root user
useradd -m hadoop -s /bin/bash  # create the new user hadoop
passwd hadoop                   # set the hadoop password
visudo                          # below the line "root ALL=(ALL) ALL" add: hadoop ALL=(ALL) ALL
# Then log in as the hadoop user.

# Install SSH and configure passwordless SSH login
[hadoop@hddcluster2 ~]$ rpm -qa | grep ssh
[hadoop@hddcluster2 ~]$ sudo yum install openssh-clients
[hadoop@hddcluster2 ~]$ sudo yum install openssh-server
[hadoop@hddcluster2 ~]$ cd ~/.ssh/                                  # if this directory does not exist, run "ssh localhost" first
[hadoop@hddcluster2 ~]$ ssh-keygen -t rsa                           # press Enter at every prompt
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub localhost  # add the key to the authorized list
[hadoop@hddcluster2 ~]$ chmod 600 ./authorized_keys                 # fix the file permissions
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hddcluster1
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hddcluster3
[hadoop@hddcluster2 ~]$ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@hddcluster4

# Extract the Hadoop archive to /usr/local/hadoop
[hadoop@hddcluster2 ~]$ sudo tar -zxf hadoop-2.7.3.tar.gz -C /usr/local/
[hadoop@hddcluster2 ~]$ sudo mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
[hadoop@hddcluster2 ~]$ sudo chown -R hadoop:hadoop /usr/local/hadoop
cd /usr/local/hadoop
./bin/hadoop version

# Install the Java environment
[hadoop@hddcluster2 ~]$ sudo yum install java-1.8.0-openjdk java-1.8.0-openjdk-devel
[hadoop@hddcluster2 ~]$ rpm -ql java-1.8.0-openjdk-devel | grep '/bin/javac'
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/bin/javac
[hadoop@hddcluster2 ~]$ vim ~/.bashrc
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_PREFIX=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib:$HADOOP_PREFIX/lib/native"

# Test the Java environment
source ~/.bashrc
java -version
$JAVA_HOME/bin/java -version   # should print the same as running "java -version" directly

# Edit the Hadoop configuration files
[hadoop@hddcluster2 hadoop]$ pwd
/usr/local/hadoop/etc/hadoop
[hadoop@hddcluster2 hadoop]$ cat core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hddcluster2:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
</configuration>

[hadoop@hddcluster2 hadoop]$ cat hdfs-site.xml
<configuration>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hddcluster2:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/usr/local/hadoop/tmp/dfs/data</value>
    </property>
</configuration>

[hadoop@hddcluster2 hadoop]$ cat mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hddcluster2:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hddcluster2:19888</value>
    </property>
</configuration>

[hadoop@hddcluster2 hadoop]$ cat yarn-site.xml
<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hddcluster2</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

[hadoop@hddcluster2 hadoop]$ cat slaves
hddcluster1
hddcluster2
hddcluster3
hddcluster4

# Package the configured Hadoop directory and copy it to the slave nodes
$ cd /usr/local
$ sudo rm -r ./hadoop/tmp                    # delete the temporary Hadoop files
$ sudo rm -r ./hadoop/logs/*                 # delete the log files
$ tar -zcf ~/hadoop.master.tar.gz ./hadoop   # compress, then copy
$ cd ~
$ scp ./hadoop.master.tar.gz hddcluster1:/home/hadoop
$ scp ./hadoop.master.tar.gz hddcluster3:/home/hadoop
$ scp ./hadoop.master.tar.gz hddcluster4:/home/hadoop

# On each slave node: install the software environment, configure .bashrc, then unpack Hadoop
sudo tar -zxf ~/hadoop.master.tar.gz -C /usr/local
sudo chown -R hadoop /usr/local/hadoop

# Format the NameNode (only required before the first run; not needed on later starts)
[hadoop@hddcluster2 ~]$ hdfs namenode -format

# Start Hadoop. The start-up commands are run on the Master node:
$ start-dfs.sh
$ start-yarn.sh
$ mr-jobhistory-daemon.sh start historyserver

You can check the processes started on each node with the jps command. If everything is correct, the Master node shows the NameNode, ResourceManager, SecondaryNameNode and JobHistoryServer processes. In addition, check on the Master node whether the DataNodes started properly with "hdfs dfsadmin -report". If Live datanodes is not 0, the cluster has started successfully.

[hadoop@hddcluster2 ~]$ hdfs dfsadmin -report
Configured Capacity: 2125104381952 (1.93 TB)
Present Capacity: 1975826509824 (1.80 TB)
DFS Remaining: 1975824982016 (1.80 TB)
DFS Used: 1527808 (1.46 MB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0
-------------------------------------------------
Live datanodes (4):

You can also view the status of the DataNodes and the NameNode through the web page: http://hddcluster2:50070/. If start-up is not successful, troubleshoot the cause through the start-up logs. Running jps on a Slave node should show the DataNode and NodeManager processes.

Testing a Hadoop distributed instance. First, create a user directory on HDFS:

hdfs dfs -mkdir -p /user/hadoop

Copy the configuration files in /usr/local/hadoop/etc/hadoop to the distributed file system as input files:

hdfs dfs -mkdir input
hdfs dfs -put /usr/local/hadoop/etc/hadoop/*.xml input

By checking the DataNode status (the used space has changed), you can confirm that the input files were indeed copied to the DataNodes. Then run the MapReduce job:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar grep input output 'dfs[a-z.]+'

Wait for the output after execution.

Hadoop start commands:
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver

Hadoop stop commands:
stop-dfs.sh
stop-yarn.sh
mr-jobhistory-daemon.sh stop historyserver
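
The walkthrough above stops at waiting for the grep job to finish. A minimal sketch of reading its result, assuming the job wrote to the output directory on HDFS as in the command above:

hdfs dfs -cat output/*           # print the matched terms and their counts
hdfs dfs -get output ./output    # or copy the result directory back to the local filesystem
cat ./output/*

Note that Hadoop refuses to overwrite an existing output directory, so remove it with "hdfs dfs -rm -r output" before re-running the example.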

PS: if one or two nodes in the cluster fail to start, try deleting the temporary Hadoop files first (a combined sketch follows below):

cd /usr/local

sudo rm -r ./hadoop/tmp

sudo rm -r ./hadoop/logs/*

And then execute

hdfs namenode -format

Then restart the cluster.
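
Putting the PS together, the reset might look like the sketch below: the rm commands on each node that fails to start, the format and the start scripts on the Master. Keep in mind that "hdfs namenode -format" erases the existing HDFS metadata, so only do this on a cluster whose data you can afford to lose.

# on each node that fails to start
cd /usr/local
sudo rm -r ./hadoop/tmp
sudo rm -r ./hadoop/logs/*

# then on the Master node
hdfs namenode -format
start-dfs.sh
start-yarn.sh
mr-jobhistory-daemon.sh start historyserver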

This article is based on the following website; the experiment was carried out successfully:

http://www.powerxing.com/install-hadoop-cluster/
