Spark framework
Comparison between Spark Streaming and Storm
For Storm:
1. Storm is recommended for scenarios that demand pure real-time processing and cannot tolerate even one second of delay, such as financial systems where transactions must be analyzed in real time.
2. Storm is also worth considering when the real-time computation needs a reliable transaction mechanism and strong reliability guarantees, that is, the data must be processed exactly once, no more and no less.
3. Storm is likewise a good fit if you need to dynamically adjust the parallelism of the real-time job between peak and off-peak periods in order to make the most of cluster resources (typically in smaller companies where cluster resources are tight).
4. If the big-data application is purely real-time computation and does not need SQL-style interactive queries, complex transformation operators, and so on, then Storm is the better choice.
For Spark Streaming:
1. If the real-time scenario does not require any of the three Storm strengths above, that is, it does not need pure real-time processing, a powerful and reliable transaction mechanism, or dynamic parallelism adjustment, then Spark Streaming is worth considering.
2. The most important factor when choosing Spark Streaming is the project as a whole: if, besides real-time computation, the project also includes business functions such as offline batch processing and interactive queries, and the real-time computation may itself feed high-latency batch processing or interactive queries, then the Spark ecosystem should be preferred. Offline batch processing can be developed with Spark Core, interactive queries with Spark SQL, and real-time computation with Spark Streaming; the three integrate seamlessly and give the system very high extensibility.
Analysis of advantages and disadvantages of Spark Streaming and Storm
In fact, Spark Streaming is by no means inherently better than Storm. Both frameworks are excellent in the field of real-time computation; they simply excel in different sub-scenarios.
Spark Streaming only beats Storm on throughput, and throughput is the point that has always been stressed by people who champion Spark Streaming and belittle Storm. But is throughput really that important in every real-time scenario? Not necessarily. So claiming that Spark Streaming is better than Storm on the strength of throughput alone is not convincing.
In terms of real-time latency, Storm is far ahead of Spark Streaming: the former is truly real-time, the latter only near real-time. Storm's transaction mechanism, robustness and fault tolerance, dynamic parallelism adjustment, and other features are also stronger than Spark Streaming's.
Spark Streaming, however, has one advantage that Storm can never match: it sits inside the Spark ecosystem, so it integrates seamlessly with Spark Core and Spark SQL. That means the intermediate results of real-time processing can immediately be fed into batch processing, interactive queries, and other operations within the same program, which greatly extends what Spark Streaming can do.
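To make that integration concrete, here is a minimal sketch (not from the original article) of a Spark Streaming job that hands each micro-batch to Spark SQL for a query. The socket source on localhost:9999 and the single-column "word" schema are illustrative assumptions, not part of the cluster built below.

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWithSql {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("StreamingWithSql")
    val ssc = new StreamingContext(conf, Seconds(5))        // 5-second micro-batches
    val lines = ssc.socketTextStream("localhost", 9999)     // hypothetical text source

    lines.flatMap(_.split(" ")).foreachRDD { rdd =>
      // Spark Core RDD -> Spark SQL DataFrame inside the same application
      val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
      import spark.implicits._
      val words = rdd.toDF("word")
      words.createOrReplaceTempView("words")
      spark.sql("SELECT word, count(*) AS cnt FROM words GROUP BY word").show()
    }

    ssc.start()
    ssc.awaitTermination()
  }
}

Obtaining the SparkSession with getOrCreate() inside foreachRDD is the usual way to reuse one session across micro-batches, so the streaming, Core, and SQL layers all run in a single program.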
Download the spark and scala packages
Do the following:
[hadoop@oversea-stable ~]$ wget http://mirrors.hust.edu.cn/apache/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
--2018-06-27 10:07:25--  http://mirrors.hust.edu.cn/apache/spark/spark-2.3.0/spark-2.3.0-bin-hadoop2.7.tgz
Resolving mirrors.hust.edu.cn (mirrors.hust.edu.cn)... 202.114.18.160
Connecting to mirrors.hust.edu.cn (mirrors.hust.edu.cn)|202.114.18.160|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 226128401 (216M) [application/octet-stream]
Saving to: 'spark-2.3.0-bin-hadoop2.7.tgz'
100%[==================================>] 226128401  45.4KB/s   in 68m 12s
2018-06-27 11:15:38 (54.0 KB/s) - 'spark-2.3.0-bin-hadoop2.7.tgz' saved [226128401/226128401]

[hadoop@oversea-stable ~]$ wget https://scala-lang.org/files/archive/nightly/2.12.x/scala-2.12.5-bin-3995c7e.tgz
Resolving scala-lang.org (scala-lang.org)... 128.178.154.159
Connecting to scala-lang.org (scala-lang.org)|128.178.154.159|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20244926 (19M) [application/x-gzip]
Saving to: 'scala-2.12.5-bin-3995c7e.tgz'
100%[==================================>] 20244926   516KB/s   in 4m 39s
2018-06-27 11:54:43 (70.8 KB/s) - 'scala-2.12.5-bin-3995c7e.tgz' saved [20244926/20244926]
Configure environment variables
Do the following:
[hadoop@oversea-stable ~]$ tail -4 .bash_profile
export SCALA_HOME=/opt/scala
export SPARK_HOME=/opt/spark
PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$SCALA_HOME/bin:$SPARK_HOME/bin:$PATH
export PATH
[hadoop@oversea-stable ~]$
Configure and synchronize scala
Do the following:
[hadoop@oversea-stable ~]$ tar xfz scala-2.12.5-bin-3995c7e.tgz -C /opt/
[hadoop@oversea-stable opt]$ ln -s scala-2.12.5-bin-3995c7e scala
[hadoop@oversea-stable opt]$ for i in ... ; do rsync -avzoptlg scala-2.12.5-bin-3995c7e 192.168.20.$i:/opt/; done
Configure and synchronize spark
Do the following:
[hadoop@oversea-stable ~]$ tar xfz spark-2.3.0-bin-hadoop2.7.tgz -C /opt/
[hadoop@oversea-stable ~]$ cd /opt/
[hadoop@oversea-stable opt]$ ln -s spark-2.3.0-bin-hadoop2.7 spark
[hadoop@oversea-stable opt]$ cd spark/conf
[hadoop@oversea-stable conf]$ pwd
/opt/spark/conf
[hadoop@oversea-stable conf]$ cp spark-env.sh{.template,}
[hadoop@oversea-stable conf]$ vim spark-env.sh
[hadoop@oversea-stable conf]$ tail -8 spark-env.sh
export SCALA_HOME=/opt/scala
export JAVA_HOME=/usr/java/latest
export SPARK_MASTER_IP=192.168.20.68
export SPARK_WORKER_MEMORY=1024m
export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)
export SPARK_LOCAL_IP=192.168.20.68    # change to each node's own IP
export SPARK_MASTER_HOST=192.168.20.68
[hadoop@oversea-stable conf]$ cp slaves{.template,}
[hadoop@oversea-stable conf]$ vim slaves
[hadoop@oversea-stable conf]$ tail -3 slaves
open-stable
permission-stable
sp-stable
[hadoop@oversea-stable conf]$ cd /opt
[hadoop@oversea-stable opt]$ for i in ... ; do rsync -avzoptlg spark-2.3.0-bin-hadoop2.7 192.168.20.$i:/opt/; done
Start spark
The operation is as follows:
[hadoop@oversea-stable opt]$ cd spark
[hadoop@oversea-stable spark]$ sbin/start-slaves.sh
open-stable: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-open-stable.out
permission-stable: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-permission-stable.out
sp-stable: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-sp1-stable.out
[hadoop@oversea-stable spark]$ vim conf/slaves
[hadoop@oversea-stable spark]$ sbin/start-slaves.sh
open-stable: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-open-stable.out
permission-stable: starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-hadoop-org.apache.spark.deploy.worker.Worker-1-permission-stable.out
[hadoop@oversea-stable spark]$
Verification
(1) Check the logs and confirm that there are no errors
[hadoop@oversea-stable spark]$ cd logs
[hadoop@oversea-stable logs]$ ls
spark-hadoop-org.apache.spark.deploy.master.Master-1-oversea-stable.out
[hadoop@oversea-stable logs]$
(2) check the status of each server process
[hadoop@oversea-stable logs]$ jps
12480 DFSZKFailoverController
27522 HMaster
6738 Master
7301 Jps
12123 NameNode
12588 ResourceManager
[hadoop@oversea-stable logs]$

[hadoop@open-stable logs]$ jps
15248 JournalNode
15366 NodeManager
16248 Jps
16169 Worker
15131 DataNode
18125 QuorumPeerMain
22781 HRegionServer
[hadoop@open-stable logs]$

[hadoop@permission-stable logs]$ jps
12800 QuorumPeerMain
24391 NodeManager
4647 Jps
24152 DataNode
4568 Worker
2236 HRegionServer
24269 JournalNode
[hadoop@permission-stable logs]$

[hadoop@sp1-stable logs]$ jps
7617 QuorumPeerMain
9233 Jps
21683 NodeManager
21540 JournalNode
28966 HRegionServer
21451 DataNode
8813 Worker
[hadoop@sp1-stable logs]$
(3) run spark-shell
[hadoop@oversea-stable logs]$ spark-shell
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/spark-2.3.0-bin-hadoop2.7/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop-2.9.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2018-06-27 15:15:49 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://oversea-stable:4040
Spark context available as 'sc' (master = local[*], app id = local-1530083761130).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_172)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
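As a quick sanity check (illustrative only, not part of the original transcript; the numbers are arbitrary), a few lines can be pasted at the scala> prompt:

val nums = sc.parallelize(1 to 1000)        // build an RDD on the shell's Spark context
println(nums.sum())                          // expect 500500.0
println(nums.filter(_ % 2 == 0).count())     // expect 500
spark.range(5).show()                        // the Spark SQL session is available as 'spark'

Note that the transcript above shows master = local[*], so this shell ran locally; to attach it to the standalone cluster started earlier you would typically launch it as spark-shell --master spark://192.168.20.68:7077 (7077 is the default standalone master port).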
(4) Check the status of the spark master in a web browser (the standalone master web UI listens on port 8080 by default, e.g. http://192.168.20.68:8080)