
Spark Series (2) -- Setting Up the Spark Development Environment


I. Install Spark

1.1 Download and decompress

Official download address: http://spark.apache.org/downloads.html. Select the Spark version and the corresponding Hadoop version before downloading:

Extract the installation package:

# tar -zxvf spark-2.2.3-bin-hadoop2.6.tgz

1.2 Configure environment variables

# vim /etc/profile

Add environment variables:

export SPARK_HOME=/usr/app/spark-2.2.3-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:$PATH

Make the configured environment variables take effect immediately:

# source /etc/profile

1.3 Local mode

Local mode is the simplest way to run Spark: it runs on a single node with multiple threads, requires no deployment, and works out of the box, which makes it suitable for everyday testing and development.

# start spark-shell
spark-shell --master local[2]

local: start only one worker thread;
local[k]: start k worker threads;
local[*]: start as many worker threads as there are CPU cores.

After spark-shell starts, it has already created the context SparkContext automatically, which is equivalent to executing the following Scala code:

val conf = new SparkConf().setAppName("Spark shell").setMaster("local[2]")
val sc = new SparkContext(conf)
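You can check this from the shell; for example, querying which master the context was created with (output shown as local[2] would produce it):

scala> sc.master
res0: String = local[2]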

II. Word frequency statistics example

After the installation is complete, you can run a simple word-count example to get a feel for Spark. Prepare a sample file wc.txt for the word count, with the following contents:

hadoop,spark,hadoop
spark,flink,flink,spark
hadoop,hadoop

Execute the following Scala statements on the interactive command line (in Spark 2.x, spark-shell also creates a SparkSession named spark, which is why the SparkContext is accessed as spark.sparkContext):

val file = spark.sparkContext.textFile("file:///usr/app/wc.txt")
val wordCounts = file.flatMap(line => line.split(",")).map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.collect

The execution process is shown below, and you can see that the word-count result has been output:
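Given the sample wc.txt above, the collected result would look something like this (the order of the pairs may vary):

res0: Array[(String, Int)] = Array((spark,3), (hadoop,4), (flink,2))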

You can also inspect the job's execution through the Web UI; the access port is 4040:
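For reference, the same word count can also be written as a minimal self-contained Scala application rather than run in the shell. This is only a sketch: the object name is illustrative, and it reuses the SparkConf/SparkContext setup from section 1.3 and the file path from above.

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // Run locally with two worker threads, as in the spark-shell example
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[2]")
    val sc = new SparkContext(conf)

    // Same pipeline as the shell session: split each line, pair each word with 1, sum the counts
    val wordCounts = sc.textFile("file:///usr/app/wc.txt")
      .flatMap(line => line.split(","))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    wordCounts.collect().foreach(println)
    sc.stop()
  }
}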

III. Configuring the Scala development environment

Spark is written in Scala and provides APIs for Scala, Java, and Python. If you want to develop in Scala, you need to set up a Scala development environment.

3.1 Prerequisites

Scala runs on the JDK, so you need a suitable JDK version installed locally. The latest Scala 2.12.x requires JDK 1.8+.

3.2 Install the Scala plugin

IDEA does not support Scala development out of the box; support is added through a plugin. Open IDEA, go to File => Settings => Plugins, and search for the Scala plugin (shown below). Install it and restart IDEA for the installation to take effect.

3.3 Create a Scala project

In IDEA, go to File => New => Project, then select Scala => IDEA to create the project:

3.4 Download the Scala SDK

1. Method one

At this point, you can see that the Scala SDK field is empty. Click Create => Download, select the desired version, click OK to download it, and then click Finish to enter the project.

2. Method two

Method one is the approach used in the official Scala installation guide, but the download speed is usually slow, and this kind of installation does not provide the Scala command-line tools. It is therefore recommended to download the installation package from the official website instead. Download address: https://www.scala-lang.org/download/

My system here is Windows, so I downloaded the msi installer and just clicked Next through all the steps. After the installation completes, the environment variables are configured automatically.

Because the environment variables are configured automatically during installation, IDEA automatically selects the corresponding SDK version.

3.5 Create Hello World

Right-click the src directory of the project and choose New => Scala Class to create Hello.scala. Enter the following code, then click the Run button. If it runs successfully, the environment is set up correctly.
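A minimal sketch of what Hello.scala might contain (the object name is illustrative):

object Hello {
  def main(args: Array[String]): Unit = {
    // Printing a line is enough to confirm the Scala setup works
    println("Hello, World!")
  }
}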

3.6 Switch the Scala version

In daily development, a version change in related software (such as Spark) may require switching the Scala version. You can do this in the Global Libraries tab of Project Structure.

3.7 Possible problems

In IDEA, after reopening a project, the right-click menu sometimes has no option to create a new Scala file, or Scala syntax hints are missing while writing code. In that case, delete the SDK configured in Global Libraries and then add it again:

In addition, you do not need local Spark or Hadoop installations to run a Spark project in local mode in IDEA.
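Declaring the Spark dependency in the project build is enough. For example, a minimal build.sbt sketch, assuming sbt is used and Scala 2.11 (the version the Spark 2.2.3 binaries are built against):

// build.sbt -- minimal setup for running Spark in local mode
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.2.3"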

For more articles in the Big Data series, see the GitHub open source project: Big Data Getting Started Guide.
