Spark Foundation

2025-04-28 Update From: SLTechnology News&Howtos

Official documentation: spark.apache.org/docs/latest

Spark background

Limitations of MapReduce:

1) Complicated to develop

Everything must be expressed as map/reduce (a map join has no reduce phase)

Low-level

Constrained

Every time the code changes, it has to be tested again.

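2) Low computational efficiency

Process-based: each MapTask and ReduceTask runs as a JVM process, so a job can involve hundreds of processes (mitigated by JVM reuse)

IO: chained jobs go through the network and disk

Sort: every job has to sort, whether it needs to or not. Interview question: which interface must the key type implement? (WritableComparable)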

Memory:

...

Not suitable for iterative processing

Not suitable for real-time streaming processing

Many frameworks, each working in isolation from the others.
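As a rough analogy for the map, sort, reduce flow listed under the MapReduce limitations, the shell sketch below counts words with a Unix pipeline. It is not Hadoop code, and the function name wordcount is made up for illustration. The point is only that the sort stage sits between the "map" and "reduce" steps and cannot be skipped, much like the mandatory sort in MapReduce.

```shell
#!/bin/sh
# Illustrative analogy only: a Unix pipeline shaped like a MapReduce word count.
# `wordcount` is a hypothetical helper name, not part of Hadoop or Spark.
wordcount() {
  tr -s ' ' '\n' |          # "map": emit one word per line
    sort |                  # "shuffle/sort": bring identical keys together
    uniq -c |               # "reduce": count each group of identical keys
    awk '{ print $2, $1 }'  # format each result as: word count
}

printf 'spark hadoop spark\nhadoop spark\n' | wordcount
```

Removing the sort stage breaks the count, because uniq -c only merges adjacent identical lines.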

Overview and characteristics of Spark

spark.apache.org

Speed

Memory: in-memory computation

Thread: tasks run as threads rather than processes

Sort: optional (configurable)

DAG: rdd.map.filter....collect

Ease of use

High-level operators: join, group, count, etc.

Generality

Runs Everywhere

Summary:

Fast + general engine

Write code: Java/Scala/Python/R, interactive shell

Run: memory / DAG / thread model / ...

Spark versions and how to choose one

How to learn Spark

Mailing list

user@spark.apache.org

apache-spark-user-list/

Meetup / Summit

Sample source code

github.com/apache/spark

Source code

Environment:

CentOS 6

Hosts: hadoop000 (hadoop), hadoop001, hadoop002

app: directory where the software is installed

software: tarballs of the software packages

data: test data

lib: our own jars

source: source code
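The directory layout above could be created in one step. The sketch below assumes the five directories live under a single base directory (the notes suggest the hadoop user's home, but do not say so explicitly); the function name make_layout is hypothetical.

```shell
#!/bin/sh
# Sketch: create the working directories described in the notes.
# make_layout is a hypothetical helper; the base directory is chosen by the caller.
make_layout() {
  base="$1"
  for dir in app software data lib source; do
    mkdir -p "$base/$dir" || return 1   # -p: no error if it already exists
  done
}
```

For example, make_layout "$HOME" run as the hadoop user would create ~/app, ~/software, ~/data, ~/lib and ~/source.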

Spark installation

Download the source code from the official website and decompress it

Prerequisites for compiling the Spark source code

Java 8+, Python 2.7+/3.4+; Spark 2.3.0 uses Scala 2.11.x

Install the JDK

Apache Maven installation

Extract it, then configure ~/.bash_profile:

export MAVEN_HOME=/home/hadoop/app/apache-maven-3.3.9

export PATH=$MAVEN_HOME/bin:$PATH

Suggestion: change the location of the Maven local repository in $MAVEN_HOME/conf/settings.xml to:

/home/hadoop/maven_repo
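The local repository suggestion corresponds to the localRepository element in Maven's settings file. The sketch below writes a minimal settings file containing only that element; the helper name write_settings and both arguments are assumptions for illustration. In an existing settings.xml it is usually safer to edit the localRepository element in place than to overwrite the file.

```shell
#!/bin/sh
# Sketch: write a minimal Maven settings file that relocates the local repository.
# write_settings is a hypothetical helper; both arguments are chosen by the caller.
write_settings() {
  settings_file="$1"   # where to write the file, e.g. a scratch copy
  repo_dir="$2"        # e.g. /home/hadoop/maven_repo
  cat > "$settings_file" <<EOF
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <localRepository>${repo_dir}</localRepository>
</settings>
EOF
}
```

A file written this way can be tried without touching the installation by passing it to Maven with the -s option, for example mvn -s /path/to/file -v.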

Install scala-2.11.9.tgz

Extract it, then configure ~/.bash_profile:

export SCALA_HOME=/home/hadoop/app/scala-2.11.9

export PATH=$SCALA_HOME/bin:$PATH

source ~/.bash_profile

Verification: mvn -v

Install git: yum install git
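Before starting the build, it may help to confirm that every prerequisite installed above is actually on the PATH. The following is a small sketch; check_tool is a hypothetical helper name, and the list of tools simply mirrors the ones mentioned in these notes.

```shell
#!/bin/sh
# Sketch: report whether each build prerequisite is on PATH.
# check_tool is a hypothetical helper name.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: NOT found on PATH"
    return 1
  fi
}

for tool in java mvn scala git; do
  check_tool "$tool" || true   # keep going so every tool is reported
done
```

If a tool is reported as missing right after editing ~/.bash_profile, the profile has probably not been re-read in the current shell.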

Compilation and installation

export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"

./build/mvn -DskipTests clean package

Modify the default Hadoop version in the source code

pom.xml
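As an alternative to editing pom.xml, the Spark 2.3 build documentation also describes choosing the Hadoop version on the command line with a profile and the hadoop.version property. The sketch below only assembles and prints such a command so it can be reviewed before being run from the source root. The helper name spark_build_cmd is hypothetical, and hadoop-2.7 / 2.7.3 are example values, not the versions used by the author.

```shell
#!/bin/sh
# Sketch: assemble the build command with an explicit Hadoop version.
# spark_build_cmd is a hypothetical helper; it prints the command, it does not run it.
spark_build_cmd() {
  hadoop_profile="$1"   # e.g. hadoop-2.7 (a profile defined in Spark's pom.xml)
  hadoop_version="$2"   # e.g. 2.7.3 (example value; use your cluster's version)
  echo "./build/mvn -Pyarn -P${hadoop_profile} -Dhadoop.version=${hadoop_version} -DskipTests clean package"
}

spark_build_cmd hadoop-2.7 2.7.3
```

Passing the version this way leaves the source tree unmodified, which makes it easier to rebuild against a different Hadoop release later.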
