Today, the editor will share with you the key knowledge points of Apache Spark 2.2.0. The content is detailed and the logic is clear. Since most people still know rather little about this topic, this article is shared for your reference; I hope you will get something out of it after reading. Let's take a look.
Overview of Spark
Apache Spark is a fast, general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, as well as an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools, including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
Download
Get Spark from the downloads page of the project's official website. This documentation is for Spark version 2.2.0. Spark uses Hadoop's client libraries for HDFS and YARN. The downloads are pre-packaged for a handful of popular Hadoop versions. Users can also download a "Hadoop free" binary and run Spark with any version of Hadoop by augmenting Spark's classpath. Scala and Java users can include Spark in their projects through its Maven coordinates, and in the future Python users will also be able to install Spark from PyPI.
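For instance, a pre-built package can be fetched and unpacked from the Apache release archive. This is only an illustrative sketch; the exact file name depends on which Hadoop build you choose on the downloads page:

wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz  # pre-built for Hadoop 2.7
tar -xzf spark-2.2.0-bin-hadoop2.7.tgz
cd spark-2.2.0-bin-hadoop2.7  # the top-level Spark directory used by the commands below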
If you would like to build Spark from source, visit the Building Spark page.
Spark runs on both Windows and UNIX-like systems (for example, Linux, Mac OS). It is easy to run locally on one machine: all you need is to have Java installed on your system PATH, or to have the JAVA_HOME environment variable point to your Java installation.
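As a minimal sketch, assuming an OpenJDK 8 installed under /usr/lib/jvm (the exact path is illustrative and varies by system):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64  # point JAVA_HOME at your Java installation
export PATH="$JAVA_HOME/bin:$PATH"                  # or make java available on the PATH instead
java -version                                       # verify that Java is visible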
Spark runs on Java 8+, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.2.0 uses Scala 2.11; you will need to use a compatible Scala version (2.11.x).
Please note that support for Java 7, Python 2.6 and Hadoop versions older than 2.6.5 has been removed as of Spark 2.2.0.
Please note that support for Scala 2.10 was deprecated as of Spark 2.1.0 and may be removed in Spark 2.3.0.
Running the Examples and the Shell
Spark comes with several sample programs. The Scala, Java, Python and R examples are in the examples/src/main directory. To run one of the Java or Scala sample programs, use bin/run-example <class> [params] in the top-level Spark directory. (Under the hood, this command invokes the more general spark-submit script to launch applications.) For example:
./bin/run-example SparkPi 10
You can also run Spark interactively through a modified version of the Scala shell. This is a great way to learn the framework.
./bin/spark-shell --master local[2]
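A few other illustrative ways to start the shell (the cluster URL below is a placeholder, not a real host):

./bin/spark-shell                             # with no option, defaults to local[*]
./bin/spark-shell --master local[4]           # run locally with 4 worker threads
./bin/spark-shell --master spark://host:7077  # attach to a standalone cluster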
The --master option specifies the master URL for a distributed cluster, or local to run locally with one thread, or local[N] to run locally with N threads. You should start by using local for testing. For a full list of options, run spark-shell with the --help option. Spark also provides a Python API. To run Spark interactively in a Python interpreter, use bin/pyspark:
./bin/pyspark --master local[2]
Example applications are also provided in Python. For example:
./bin/spark-submit examples/src/main/python/pi.py 10
Spark has also provided an experimental R API since 1.4 (only the DataFrames APIs are included). To run Spark interactively in an R interpreter, use bin/sparkR:
./bin/sparkR --master local[2]
Example applications are also provided in R. For example:
./bin/spark-submit examples/src/main/r/dataframe.R
Launching on a Cluster
The Spark cluster mode overview explains the key concepts of running on a cluster. Spark can run both by itself and on several existing cluster managers. It currently provides several options for deployment (illustrative spark-submit commands for each option are sketched after the list):
Standalone Deploy Mode: the easiest way to deploy Spark on a private cluster
Apache Mesos
Hadoop YARN
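As a rough sketch of what submitting the bundled SparkPi example to each of these looks like (the host names and ports in the master URLs are placeholders; the examples jar ships with the pre-built 2.2.0 distribution):

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://host:7077 examples/jars/spark-examples_2.11-2.2.0.jar 100  # Standalone
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master mesos://host:5050 examples/jars/spark-examples_2.11-2.2.0.jar 100  # Mesos
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn examples/jars/spark-examples_2.11-2.2.0.jar 100  # YARN; HADOOP_CONF_DIR must point at your Hadoop configuration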
These are all the contents of the article "What are the knowledge points of Apache Spark 2.2.0". Thank you for reading! I believe you will gain a lot after reading this article.