
Big Data Learning Series Part 6: Building a Hadoop+Spark Environment


Introduction

In the previous article, Big Data Learning Series Part 5: Hive Integration with HBase (http://www.panchengming.com/2017/12/18/pancm62/), Hive was integrated with HBase and tested successfully. Before that, Big Data Learning Series Part 1: Building a Hadoop Environment (stand-alone) (http://www.panchengming.com/2017/11/26/pancm55/) covered building the Hadoop environment. This article mainly describes building a Hadoop+Spark environment. Although this is a stand-alone setup, it is quite easy to convert to a cluster; future articles will cover Hadoop+Spark+HBase+Hive+Zookeeper clusters and more.

I. Environment selection

1. Server selection

Local virtual machine

Operating system: Linux CentOS 7

CPU: 2 cores

Memory: 2G

Hard disk: 40G

2. Configuration selection

JDK: 1.8 (jdk-8u144-linux-x64.tar.gz)

Hadoop: 2.8.2 (hadoop-2.8.2.tar.gz)

Scala: 2.12.2 (scala-2.12.2.tgz)

Spark: 1.6.3 (spark-1.6.3-bin-hadoop2.4-without-hive.tgz)

3. Download addresses

Official website addresses:

JDK:
http://www.oracle.com/technetwork/java/javase/downloads

Hadoop:
http://www.apache.org/dyn/closer.cgi/hadoop/common

Spark:
http://spark.apache.org/downloads.html

Hive on Spark (Spark builds integrated with Hive):
http://mirror.bit.edu.cn/apache/spark/

Scala:
http://www.scala-lang.org/download

Baidu Cloud:

Link: https://pan.baidu.com/s/1geT3A8N password: f7jb

II. Server configuration

Before configuring the Hadoop+Spark integration, some basic server configuration should be done.

For convenience, these configurations are done with root permissions.
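If you are logged in as a normal user, one way to switch to root (assuming you know the root password) is:

su - root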

1. Change the hostname

Change the hostname first in order to facilitate management.

View the current hostname

Enter:

hostname

Change the hostname

Enter:

hostnamectl set-hostname master

Note: the new hostname does not take effect until the machine is rebooted (reboot).

2. Map the hostname to the IP address

Modify the hosts file to create the mapping.

Input

vim /etc/hosts

Add the IP address and hostname of the machine:

192.168.219.128 master
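To verify the mapping, you can ping the hostname (-c 4 just limits the ping to four packets):

ping -c 4 master

If the entry is correct, the replies come from 192.168.219.128.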

3. Turn off the firewall

Turn off the firewall to facilitate external access.

For CentOS 6 and earlier, enter:

Turn off the firewall

service iptables stop

For CentOS 7 and later, enter:

systemctl stop firewalld.service
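Note that firewalld starts again on reboot. For a test environment like this one, you can optionally keep it off across reboots:

systemctl disable firewalld.service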

4. Time setting

Enter:

date

Check whether the server time is correct, and change it if it is not.

Command to change the time:

date -s 'MMDDhhmmYYYY.ss'
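date -s also accepts a plain date string. For example, to set the time to an explicit value (a sample timestamp, substitute your own):

date -s "2017-12-20 14:30:00"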

III. Scala environment configuration

Because the configuration of Spark depends on Scala, Scala needs to be configured first.

1. File preparation

Extract the downloaded Scala file

Input

tar -xvf scala-2.12.2.tgz

Then move it to /opt/scala and rename it scala2.1.

Input

mkdir /opt/scala
mv scala-2.12.2 /opt/scala
cd /opt/scala
mv scala-2.12.2 scala2.1

2. Environment configuration

Edit the /etc/profile file

Enter:

export SCALA_HOME=/opt/scala/scala2.1
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:$PATH

Enter:

source /etc/profile

Make the configuration effective

Enter scala -version to check whether the installation succeeded.
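If the installation succeeded, the output should look roughly like this (the copyright line may vary by build):

Scala code runner version 2.12.2 -- Copyright 2002-2017, LAMP/EPFL and Lightbend, Inc.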

IV. Spark environment configuration

1. File preparation

The download addresses above offer two kinds of Spark packages: one is the plain Spark distribution, and the other is a build integrated with Hadoop and Hive. This article uses the second kind.

Extract the downloaded Spark file

Input

tar -xvf spark-1.6.3-bin-hadoop2.4-without-hive.tgz

Then move it to /opt/spark and rename it spark1.6-hadoop2.4-hive.

Input

mkdir /opt/spark
mv spark-1.6.3-bin-hadoop2.4-without-hive /opt/spark
cd /opt/spark
mv spark-1.6.3-bin-hadoop2.4-without-hive spark1.6-hadoop2.4-hive

2. Environment configuration

Edit the /etc/profile file

Enter:

export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
export PATH=.:${JAVA_HOME}/bin:${SCALA_HOME}/bin:${SPARK_HOME}/bin:$PATH

Enter:

source /etc/profile

Make the configuration effective
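A quick way to confirm the variable took effect:

echo $SPARK_HOME

This should print /opt/spark/spark1.6-hadoop2.4-hive.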

3. Change the configuration files

Switch directories

Enter:

cd /opt/spark/spark1.6-hadoop2.4-hive/conf

4.3.1 modify spark-env.sh

In the conf directory, modify the spark-env.sh file; if there is no spark-env.sh, copy the spark-env.sh.template file and rename the copy to spark-env.sh.

Modify the newly created spark-env.sh file and add the configuration:

export SCALA_HOME=/opt/scala/scala2.1
export JAVA_HOME=/opt/java/jdk1.8
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/opt/spark/spark1.6-hadoop2.4-hive
export SPARK_MASTER_IP=master
export SPARK_EXECUTOR_MEMORY=1G

Note: adjust the paths above to your own installation. SPARK_MASTER_IP is the master host, and SPARK_EXECUTOR_MEMORY is the memory allocated for execution.

V. Hadoop environment configuration

The specific Hadoop configuration is described in detail in Big Data Learning Series Part 1: Building a Hadoop Environment (stand-alone): http://www.panchengming.com/2017/11/26/pancm55. So this article only gives a general overview.

Note: adapt the specific configuration to your own environment.

1. Environment variable settings

Edit the /etc/profile file:

vim /etc/profile

Add the following configuration:

export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH
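As with the Scala and Spark variables, make the configuration take effect:

source /etc/profile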

2. Configuration file changes

First change to the /home/hadoop/hadoop2.8/etc/hadoop/ directory.

5.2.1 modify core-site.xml

Enter:

vim core-site.xml

Add the following inside the <configuration> node:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/root/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
</property>
<property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
</property>

5.2.2 modify hadoop-env.sh

Enter:

vim hadoop-env.sh

Modify ${JAVA_HOME} to your own JDK path

export JAVA_HOME=${JAVA_HOME}

Modified to:

export JAVA_HOME=/home/java/jdk1.8

5.2.3 modify hdfs-site.xml

Enter:

vim hdfs-site.xml

Add the following inside the <configuration> node:

<property>
    <name>dfs.name.dir</name>
    <value>/root/hadoop/dfs/name</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/root/hadoop/dfs/data</value>
    <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
    <name>dfs.replication</name>
    <value>2</value>
</property>
<property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>need not permissions</description>
</property>

5.2.4 modify mapred-site.xml

If there is no mapred-site.xml file, copy the mapred-site.xml.template file and rename the copy to mapred-site.xml.

Enter:

vim mapred-site.xml

Modify the new mapred-site.xml file and add the following inside the <configuration> node:

<property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
</property>
<property>
    <name>mapred.local.dir</name>
    <value>/root/hadoop/var</value>
</property>
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>

3. Hadoop startup

Note: formatting is needed only before the first start; if Hadoop has already been formatted and started successfully, skip this step.

Format the NameNode before starting.

Change to the /home/hadoop/hadoop2.8/bin directory

Enter:

./hadoop namenode -format

After the format succeeds, switch to the /home/hadoop/hadoop2.8/sbin directory

Start HDFS and YARN

Enter:

start-dfs.sh
start-yarn.sh

After the startup succeeds, enter jps to check whether everything started correctly.
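On a single-node setup like this one, jps should list roughly the following processes (the process IDs will differ):

1234 NameNode
1345 DataNode
1456 SecondaryNameNode
1567 ResourceManager
1678 NodeManager
1789 Jps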

In a browser, open http://ip:8088 and http://ip:50070 (replacing ip with the server's IP address) to see whether the pages are accessible.

If both pages load correctly, the startup succeeded.
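As an extra check, you can run a simple HDFS operation from the /home/hadoop/hadoop2.8/bin directory (the directory name /test here is just an example):

./hadoop fs -mkdir /test
./hadoop fs -ls /

If the new directory appears in the listing, HDFS is working.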

VI. Spark startup

Before starting Spark, make sure Hadoop has already been started successfully.

Use the jps command to check which processes are running before the start; after Spark starts successfully, run jps again to confirm the new Spark processes appear.

Change to the Spark directory

Enter:

cd /opt/spark/spark1.6-hadoop2.4-hive/sbin

Then start Spark

Enter:

./start-all.sh
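After Spark starts, running jps again should show two additional processes alongside the Hadoop ones (the process IDs will differ):

2345 Master
2456 Worker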

Then enter the following address in a browser:

http://192.168.219.128:8080/

If the Spark web interface displays correctly, the startup succeeded.

Note: if Spark starts successfully but the web interface cannot be accessed, first check whether the firewall is off, then use jps to inspect the processes. If both look fine, the interface should normally be reachable; if it still is not, check the hadoop, scala, and spark configuration.
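As an optional final smoke test (assuming the example jars shipped with the distribution are present), run the built-in SparkPi example from the Spark home directory:

cd /opt/spark/spark1.6-hadoop2.4-hive
./bin/run-example SparkPi 10

A line like "Pi is roughly 3.14..." near the end of the output means Spark can execute jobs.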

That is the end of this article. Thank you for reading!

If you found it helpful, feel free to like or recommend it.

Copyright notice:

Author: nothingness

Blog Garden (cnblogs): http://www.cnblogs.com/xuwujing

CSDN: http://blog.csdn.net/qazwsxpcm

Personal blog: http://www.panchengming.com

Original content is not easy to produce. If you reprint it, please indicate the source. Thank you!
