Steps for building a Spark cluster

Updated 2025-01-17. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/02.

This article walks through the steps for building a Spark cluster: an overview of Spark, deployment of the Scala and Spark environments, cluster configuration and startup, and a small word-count development case.

I. Overview of Spark

1. Introduction to Spark

Spark is a fast, general-purpose, and scalable cluster computing engine designed for large-scale data processing. It implements an efficient DAG execution engine and processes data flows largely in memory, so it typically runs significantly faster than MapReduce.

2. Runtime architecture

Driver

Runs the main() function of the Spark Application and creates a SparkContext. The SparkContext communicates with the ClusterManager and is responsible for requesting resources, allocating tasks, monitoring them, and so on.

ClusterManager

Responsible for requesting and managing the resources needed to run applications on WorkerNodes, so that computation can scale efficiently from one compute node to thousands. Supported cluster managers include Spark's native ClusterManager, Apache Mesos, and Hadoop YARN.

Executor

A process that an Application runs on a WorkerNode. It runs Tasks and stores data in memory or on disk. Each Application has its own independent set of Executors, and the tasks are independent of each other.
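The division of labor between Driver and Executor can be pictured with a toy, single-JVM analogy. The sketch below is not Spark code and makes no claim about Spark internals: ToyDriver, partition() and runJob() are hypothetical names, and a thread pool merely stands in for Executors. The "driver" splits the data into partitions, submits one independent task per partition, and combines the results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy analogy only: hypothetical names, no Spark APIs involved.
class ToyDriver {

    // Split the input into roughly equal partitions, one task per partition.
    static List<List<Integer>> partition(List<Integer> data, int parts) {
        List<List<Integer>> result = new ArrayList<>();
        int size = (data.size() + parts - 1) / parts; // ceiling division
        for (int start = 0; start < data.size(); start += size) {
            result.add(data.subList(start, Math.min(start + size, data.size())));
        }
        return result;
    }

    // "Driver" role: create tasks, hand them to a pool of "executors", collect results.
    static long runJob(List<Integer> data, int executors) {
        ExecutorService pool = Executors.newFixedThreadPool(executors);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (List<Integer> part : partition(data, executors)) {
                // Each task is independent and only sees its own partition.
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int x : part) {
                        sum += x;
                    }
                    return sum;
                };
                futures.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> f : futures) {
                total += f.get(); // wait for every task to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            data.add(i);
        }
        System.out.println(runJob(data, 4)); // sum of 1..100
    }
}
```

In a real cluster the tasks run in separate Executor processes on different WorkerNodes, and the ClusterManager decides where those processes are placed; the analogy only covers the split, run, and combine flow.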

II. Environment deployment

1. Scala environment

Installation package management

[root@hop01 opt]# tar -zxvf scala-2.12.2.tgz
[root@hop01 opt]# mv scala-2.12.2 scala2.12

Configure environment variables

[root@hop01 opt]# vim /etc/profile
export SCALA_HOME=/opt/scala2.12
export PATH=$PATH:$SCALA_HOME/bin
[root@hop01 opt]# source /etc/profile

Check the version

[root@hop01 opt]# scala -version

The Scala environment must be deployed on every service node where Spark runs.

2. Spark basic environment

Installation package management

[root@hop01 opt]# tar -zxvf spark-2.1.1-bin-hadoop2.7.tgz
[root@hop01 opt]# mv spark-2.1.1-bin-hadoop2.7 spark2.1

Configure environment variables

[root@hop01 opt]# vim /etc/profile
export SPARK_HOME=/opt/spark2.1
export PATH=$PATH:$SPARK_HOME/bin
[root@hop01 opt]# source /etc/profile

Check the version

[root@hop01 opt]# spark-shell

3. Spark cluster configuration

Service node

[root@hop01 opt]# cd /opt/spark2.1/conf/
[root@hop01 conf]# cp slaves.template slaves
[root@hop01 conf]# vim slaves
hop01
hop02
hop03

Environment configuration

[root@hop01 conf]# cp spark-env.sh.template spark-env.sh
[root@hop01 conf]# vim spark-env.sh
export JAVA_HOME=/opt/jdk1.8
export SCALA_HOME=/opt/scala2.12
export SPARK_MASTER_IP=hop01
export SPARK_LOCAL_IP=<IP of the node being installed>
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/opt/hadoop2.7/etc/hadoop

Pay attention to SPARK_LOCAL_IP: on each node it must be set to that node's own IP address, so this value differs from node to node.
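To make the per-node difference concrete, here is a small sketch that prints the spark-env.sh export lines for each node. It is illustrative only: SparkEnvSketch and render() are hypothetical helpers, and the IP addresses are made-up placeholders rather than values from this article. The other settings are copied from the configuration above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: hypothetical helper, placeholder IP addresses.
class SparkEnvSketch {

    // Build the export lines of spark-env.sh for one node.
    // Every line is the same across the cluster except SPARK_LOCAL_IP.
    static String render(String localIp) {
        return String.join("\n",
                "export JAVA_HOME=/opt/jdk1.8",
                "export SCALA_HOME=/opt/scala2.12",
                "export SPARK_MASTER_IP=hop01",
                "export SPARK_LOCAL_IP=" + localIp,
                "export SPARK_WORKER_MEMORY=1g",
                "export HADOOP_CONF_DIR=/opt/hadoop2.7/etc/hadoop");
    }

    public static void main(String[] args) {
        Map<String, String> nodes = new LinkedHashMap<>();
        nodes.put("hop01", "192.168.0.101"); // assumed address
        nodes.put("hop02", "192.168.0.102"); // assumed address
        nodes.put("hop03", "192.168.0.103"); // assumed address
        nodes.forEach((host, ip) ->
                System.out.println("# " + host + "\n" + render(ip) + "\n"));
    }
}
```

Generating the file this way is just one option; editing spark-env.sh by hand on each node, as shown above, works equally well for a three-node cluster.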

4. Start Spark

Spark depends on the Hadoop environment here, so start Hadoop first.

Start: /opt/spark2.1/sbin/start-all.sh
Stop: /opt/spark2.1/sbin/stop-all.sh

With this configuration, two processes start on the master node, Master and Worker, while each of the other nodes starts only a Worker process.

5. Access the Spark cluster

The web UI uses port 8080 by default.

http://hop01:8080/

Run a basic example:

[root@hop01 spark2.1]# cd /opt/spark2.1/
[root@hop01 spark2.1]# bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.1.1.jar

Running result: Pi is roughly 3.1455357276786384

III. Development case

1. Core dependency

The case depends on Spark 2.1.1:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
</dependency>

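Add the Scala compiler plug-in:

<plugin>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <version>3.2.2</version>
    <executions>
        <execution>
            <goals>
                <goal>compile</goal>
                <goal>testCompile</goal>
            </goals>
        </execution>
    </executions>
</plugin>

2. Case code development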

Read the file at the specified path, count how often each word occurs in it, and output the result.

import java.io.Serializable;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;
import scala.Tuple2;

@RestController
public class WordWeb implements Serializable {

    @GetMapping("/word/web")
    public String getWeb() {
        // 1. Create the Spark configuration object
        SparkConf sparkConf = new SparkConf()
                .setAppName("LocalCount")
                .setMaster("local[*]");
        // 2. Create the SparkContext object
        JavaSparkContext sc = new JavaSparkContext(sparkConf);
        sc.setLogLevel("WARN");
        // 3. Read the test file
        JavaRDD<String> lineRdd = sc.textFile("/var/spark/test/word.txt");
        // 4. Split each line into words (comma-separated)
        JavaRDD<String> wordsRdd = lineRdd.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String line) throws Exception {
                String[] words = line.split(",");
                return Arrays.asList(words).iterator();
            }
        });
        // 5. Mark each word with an initial count of 1
        JavaPairRDD<String, Integer> wordAndOneRdd = wordsRdd.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String word) throws Exception {
                return new Tuple2<>(word, 1);
            }
        });
        // 6. Count the number of occurrences of each word
        JavaPairRDD<String, Integer> wordAndCountRdd = wordAndOneRdd.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer count1, Integer count2) throws Exception {
                return count1 + count2;
            }
        });
        // 7. Sort by word
        JavaPairRDD<String, Integer> sortedRdd = wordAndCountRdd.sortByKey();
        List<Tuple2<String, Integer>> finalResult = sortedRdd.collect();
        // 8. Print the result
        for (Tuple2<String, Integer> tuple2 : finalResult) {
            System.out.println(tuple2._1 + " ==> " + tuple2._2);
        }
        // 9. Save the statistical result
        sortedRdd.saveAsTextFile("/var/spark/output");
        sc.stop();
        return "success";
    }
}
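For readers who want to check the counting logic without a Spark installation, the same steps can be mirrored with plain Java streams. This is only an analogy running in a single JVM, not a replacement for the Spark job: LocalWordCount and count() are hypothetical names, and the sample lines are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java analogy of the word count above; no Spark dependency.
class LocalWordCount {

    static Map<String, Integer> count(List<String> lines) {
        return lines.stream()
                // like flatMap: split each line on commas into words
                .flatMap(line -> Arrays.stream(line.split(",")))
                // like mapToPair + reduceByKey: start each word at 1 and sum per word;
                // the TreeMap keeps the words sorted, like sortByKey
                .collect(Collectors.toMap(word -> word, word -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spark,hadoop,spark", "hive,spark");
        count(lines).forEach((word, n) -> System.out.println(word + " ==> " + n));
    }
}
```

For the two sample lines this prints hadoop ==> 1, hive ==> 1 and spark ==> 3, one per line. The Spark version does the same work but distributes the split, pair, and reduce steps across Executors.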

Package and run the application, then view the output file:

[root@hop01 output]# vim /var/spark/output/part-00000

IV. Source code address

GitHub address: https://github.com/cicadasmile/big-data-parent

GitEE address: https://gitee.com/cicadasmile/big-data-parent

This concludes the walkthrough of the Spark cluster building steps.
