
Installation and configuration of Spark on Hadoop (YARN)


The following is the environment used to build the Spark cluster:

Operating system: CentOS 7 (minimal installation)

Hadoop (YARN) version: Cloudera's Hadoop 2.6.0-CDH5.4.0 distribution

Java version: JDK 1.8

Scala version: Scala 2.10.4

Spark version: spark-1.3.1-bin-hadoop2.6

Cluster composition: master   192.168.1.2

slave1   192.168.1.3

slave2   192.168.1.4

slave3   192.168.1.5
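
Since the nodes are referred to by the hostnames master and slave1-3 throughout the rest of this post, an /etc/hosts mapping like the following is assumed on every node (a sketch based on the addresses above):

192.168.1.2   master
192.168.1.3   slave1
192.168.1.4   slave2
192.168.1.5   slave3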

1. Installation of operating system

Whether you install it in a virtual machine or directly on a physical machine, this step is straightforward, so I will not repeat it here.

2. Installation of Java

Please refer to my blog "installation and configuration of Jdk1.8 in CentOS7" for detailed instructions.

3. Installation of Scala

Please refer to my blog "installation and configuration of Scala2.10.4 in CentOS7" for detailed instructions.
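
Once Java and Scala are installed on every node, a quick sanity check with the standard version commands might look like this:

java -version    # should report 1.8.x
scala -version   # should report Scala 2.10.4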

4. Deployment of Yarn

YARN first appeared in Hadoop 2.x as the successor to the JobTracker and TaskTracker of Hadoop 1.x, and it is the resource scheduler of Hadoop 2.x. YARN is set up automatically when you build a Hadoop 2.x environment, so we only need to build the Hadoop environment itself.

For more information on how to build the Hadoop environment, please see my blog entitled "Hadoop2.6.0 Cluster Building in CentOS 7".
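
With the Hadoop environment in place, the build can be verified before moving on (standard Hadoop command; the exact version string depends on your distribution):

hadoop version   # should report something like Hadoop 2.6.0-cdh5.4.0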

5. Cluster construction of Spark

A) First, download the Spark 1.3.1 build for Hadoop 2.6.0 from the official website. (Note: all of the following operations are done as the superuser!)

B) In the /root/app directory of the master node, decompress the downloaded spark-1.3.1-bin-hadoop2.6.tgz:

tar -xzvf spark-1.3.1-bin-hadoop2.6.tgz

C) Configure the environment variables for Spark:

i. vi /etc/profile

ii. Add at the end of the file:

## SPARK

export SPARK_HOME=<absolute path of the Spark installation>   (mine is: /root/app/spark-1.3.1-bin-hadoop2.6)

export PATH=$PATH:$SPARK_HOME/bin
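
After saving the file, the new variables can be applied to the current shell and checked with:

source /etc/profile
echo $SPARK_HOME   # should print the Spark installation path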

D) Spark-related file configuration

i. Configuration of slaves:

vi slaves

Add the hostnames of the slave nodes:

slave1

slave2

slave3
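
Both slaves and spark-env.sh live in the conf directory of the Spark installation; if only the *.template versions are present there, they can be copied into place first (a sketch, using the SPARK_HOME from above):

cd /root/app/spark-1.3.1-bin-hadoop2.6/conf
cp slaves.template slaves               # if slaves does not exist yet
cp spark-env.sh.template spark-env.sh   # if spark-env.sh does not exist yet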

ii. Configuration of spark-env.sh

vi spark-env.sh

Add to the file:

export JAVA_HOME=<absolute path of the Java installation>   (mine is: /root/app/jdk1.8)

export SCALA_HOME=<absolute path of the Scala installation>   (mine is: /root/app/scala2.10)

export HADOOP_CONF_DIR=<absolute path of the etc/hadoop configuration directory under the Hadoop installation>   (mine is: /root/app/hadoop-2.6.0-cdh5.4.0/etc/hadoop)

export SPARK_MASTER_IP=<master node IP, or its mapped hostname>   (mine: master)

export SPARK_MASTER_PORT=<master node port>   (default 7077)

export SPARK_MASTER_WEBUI_PORT=<cluster web monitoring page port>   (default 8080)

export SPARK_WORKER_CORES=<number of CPU cores each worker may use on a slave node>   (default 1)

export SPARK_WORKER_PORT=<worker port on each slave node>   (default 7078)

export SPARK_WORKER_MEMORY=<total memory a worker may allocate to executors>   (default 512m)

export SPARK_WORKER_WEBUI_PORT=<slave node monitoring page port>   (default 8081)

export SPARK_WORKER_INSTANCES=<number of worker instances to run on each slave node>   (default 1). PS: on a very powerful machine where you need multiple Spark worker processes, you can raise this value above 1. If you do, make sure SPARK_WORKER_CORES explicitly limits the cores per worker, otherwise each worker will try to use all cores.
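
Putting the above together, a complete spark-env.sh for this particular cluster might look like the following (a sketch that simply substitutes the paths and values used in this post):

export JAVA_HOME=/root/app/jdk1.8
export SCALA_HOME=/root/app/scala2.10
export HADOOP_CONF_DIR=/root/app/hadoop-2.6.0-cdh5.4.0/etc/hadoop
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_WORKER_CORES=1
export SPARK_WORKER_PORT=7078
export SPARK_WORKER_MEMORY=512m
export SPARK_WORKER_WEBUI_PORT=8081
export SPARK_WORKER_INSTANCES=1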

My YARN deployment uses the defaults from the Spark configuration file. If you want to tune it for your actual situation, you can modify the following options, which appear commented out in spark-env.sh:

# - HADOOP_CONF_DIR, to point Spark towards Hadoop configuration files

# - SPARK_EXECUTOR_INSTANCES, Number of workers to start (Default: 2)

# - SPARK_EXECUTOR_CORES, Number of cores for the workers (Default: 1)

# - SPARK_EXECUTOR_MEMORY, Memory per Worker (e.g. 1000m, 2G) (Default: 1G)

# - SPARK_DRIVER_MEMORY, Memory for Master (e.g. 1000m, 2G) (Default: 512 Mb)

# - SPARK_YARN_APP_NAME, The name of your application (Default: Spark)

# - SPARK_YARN_QUEUE, The hadoop queue to use for allocation requests

# - SPARK_YARN_DIST_FILES, Comma separated list of files to be distributed with the job.

# - SPARK_YARN_DIST_ARCHIVES, Comma separated list of archives to be distributed with the job.
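
For example, to give each YARN executor a bit more to work with, you might uncomment and set a few of these (illustrative values only, not the ones used in this post):

export SPARK_EXECUTOR_INSTANCES=3
export SPARK_EXECUTOR_CORES=2
export SPARK_EXECUTOR_MEMORY=1g
export SPARK_YARN_APP_NAME=my-spark-on-yarn   # hypothetical application name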

iii. Spark file copy:

Copy the configured Spark directory to the corresponding directory on each slave node:

scp -r spark-1.3.1-bin-hadoop2.6 root@slave1:/root/app

scp -r spark-1.3.1-bin-hadoop2.6 root@slave2:/root/app

scp -r spark-1.3.1-bin-hadoop2.6 root@slave3:/root/app
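
Equivalently, a small loop can push the directory to all three slaves in one go (a sketch, assuming the copy is run from /root/app on the master and passwordless SSH to the slaves has been set up):

for host in slave1 slave2 slave3; do
  scp -r spark-1.3.1-bin-hadoop2.6 root@${host}:/root/app
done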

6. Start the Spark on YARN cluster:

A) Start YARN:

i. Enter the Hadoop directory first

ii. ./sbin/start-all.sh

iii. If jps shows a ResourceManager process, the YARN startup is complete

B) Start Spark:

i. Enter the Spark directory first

ii. ./sbin/start-all.sh

iii. If jps on the master node shows a Master process and jps on each slave node shows a Worker process, the Spark startup is complete
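
For reference, once both start scripts have run, jps output on this kind of layout typically looks roughly like the following (process IDs omitted; the exact Hadoop daemons present depend on your Hadoop configuration):

jps   # on master: NameNode, SecondaryNameNode, ResourceManager, Master, Jps
jps   # on a slave: DataNode, NodeManager, Worker, Jps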

C) The Spark monitoring page, which I will not test here, is usually at masterIP:8080. If the monitoring page cannot be opened, the most likely cause is that the firewall has not been disabled. Please refer to my blog post "problems that may be encountered in the process of building Hadoop environment" for details.
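
If the firewall does turn out to be the problem, it can be stopped and disabled on CentOS 7 with the standard systemctl commands (run on every node):

systemctl stop firewalld
systemctl disable firewalld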

7. At this point, the Spark on YARN cluster has been built.
