How to install spark


This article mainly introduces how to install Spark. In daily operation, many people have doubts about installing Spark; the editor has consulted all kinds of materials and sorted out a simple, easy-to-use method, and hopes it helps answer those doubts. Next, please follow the editor and study!

What is RDD?

Problem: find all the lines containing "charterer" in a file with a total of 100 lines. The algorithm is as follows:

1. Read a line and check whether it contains "charterer"; if so, add 1 to the global variable count.
2. Is the file at its end? If not, jump back to step 1 and continue.
3. Print count.
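To make the steps concrete, here is a minimal sketch of that sequential algorithm in plain Scala (the file name input.txt is a placeholder, not from the original example):

```
import scala.io.Source

var count = 0                                            // the global counter
for (line <- Source.fromFile("input.txt").getLines()) {  // step 2: keep reading until the end of the file
  if (line.contains("charterer")) count += 1             // step 1: check the line, bump the counter
}
println(count)                                           // step 3: print the count
```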

The concept of RDD: the full name is Resilient Distributed Dataset, a fault-tolerant, parallel data structure that lets users explicitly store data on disk or in memory and control how the data is partitioned.

In the above example, the 100-line file is an RDD, and each line is one element of that RDD.
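As a preview of the operations introduced below, the same counting job written against that RDD collapses into a short pipeline; a sketch for spark-shell (again with input.txt as a placeholder file name):

```
scala> val lines = sc.textFile("input.txt")           // the 100-line file: one RDD element per line
scala> lines.filter(_.contains("charterer")).count()  // the same check, applied to every element
```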

Two major features of RDD

1. The same operation is performed on every record in the dataset: each line gets the same "string contains" check, and the same "is this the last line" check.
2. The specific behavior of the operation is specified by the user: for a line containing "charterer", add 1 to the counter; for the last line, end; otherwise, go on to the next line.

What operations does RDD have?

1. Create RDD

- Create from a file; each line of README.md is one element of the RDD:

```
val b = sc.textFile("README.md")
```

- Create an RDD from an ordinary array; it contains the nine numbers 1 to 9, spread over three partitions:

```
scala> val a = sc.parallelize(1 to 9, 3)
```

2. map

map applies the specified function to each element of the RDD and generates a new RDD. Any element in the original RDD has one and only one corresponding element in the new RDD.

- Each element in RDD a is doubled:

```
scala> val b = a.map(x => x * 2)
scala> b.collect
res11: Array[Int] = Array(2, 4, 6, 8, 10, 12, 14, 16, 18)
```

3. mapPartitions

mapPartitions is a variant of map. map's input function is applied to each element in the RDD, while mapPartitions' input function is applied to each partition; that is, the contents of each partition are treated as a whole:

```
def mapPartitions[U: ClassTag](f: Iterator[T] => Iterator[U], preservesPartitioning: Boolean = false): RDD[U]
```

f is the input function, which processes the contents of one partition. The content of each partition is passed to f as an Iterator[T], and f's output is an Iterator[U]. The final RDD is the merge of the results of all partitions processed by the input function.

- The function myfunc pairs up adjacent elements within each partition into Tuples:

```
scala> def myfunc[T](iter: Iterator[T]): Iterator[(T, T)] = {
         var res = List[(T, T)]()
         var pre = iter.next
         while (iter.hasNext) {
           val cur = iter.next
           res = (pre, cur) :: res
           pre = cur
         }
         res.iterator
       }
scala> a.mapPartitions(myfunc).collect
res0: Array[(Int, Int)] = Array((2,3), (1,2), (5,6), (4,5), (8,9), (7,8))
```

4. mapValues

mapValues applies the input function to the Value of each Key-Value pair in the RDD; the Key of the original RDD stays unchanged and, together with the new Value, forms the elements of the new RDD. This function therefore applies only to RDDs whose elements are KV pairs.

- The key of RDD b is the string length and the value is the string itself; mapValues adds the character x as a prefix and suffix to each value:

```
scala> val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
scala> val b = a.map(x => (x.length, x))
scala> b.mapValues("x" + _ + "x").collect
res5: Array[(Int, String)] = Array((3,xdogx), (5,xtigerx), (4,xlionx), (3,xcatx), (7,xpantherx), (5,xeaglex))
```

5. mapWith

mapWith is another variant of map. map needs only one input function, while mapWith takes two: the first builds a value from the partition index, and the second is applied to each element together with that value. (mapWith was deprecated in later Spark releases in favor of mapPartitionsWithIndex.)

Spark installation

- Information: [installation procedure](https://spark.apache.org/downloads.html)
- Install:

```
wget http://apache.spinellicreations.com/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
tar zxf spark-1.6.1-bin-hadoop2.6.tgz
mv spark-1.6.1-bin-hadoop2.6 spark
mv -f spark ~/app/
vi ~/.bash_profile
PATH=$PATH:$HOME/bin:/home/solr/app/spark/bin
source ~/.bash_profile
```

- Start Spark: run spark-shell to enter the scala> command line.
- hello world:

```
scala> println("hello world")
hello world
```
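Because spark-shell already creates a SparkContext named sc, a quick sanity check confirms the install works; a sketch, assuming the 1.6.1 build set up above:

```
scala> sc.version                    // should report the installed version, here 1.6.1
scala> sc.parallelize(1 to 5).sum()  // a tiny job through the whole stack; expected result: 15.0
```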

Spark IDE

Download and install JDK

Download and install IDEA

Download and install SCALA

Get spark's lib package ready.

Add the Scala plug-in for IDEA: File -> Settings -> Plugins -> search "Scala", and install the Scala plug-in.

Create a new project: File -> New Project -> select Scala -> Next -> project name & location -> Finish.

Add Spark's lib package: File -> Project Structure -> Libraries, select "+", and import the jar corresponding to your spark-hadoop build.
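As an alternative to importing the jar by hand, a dependency-managed setup also works; a minimal build.sbt sketch, assuming the 1.6.1/Scala 2.10 combination used in this article (the project name is illustrative):

```
name := "sparkPi"

version := "1.0"

scalaVersion := "2.10.6"  // Spark 1.6.1 prebuilt binaries target Scala 2.10

// "provided" because spark-submit supplies Spark's classes at runtime
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.1" % "provided"
```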

Create a new SparkPi class (for the source code, see $SPARK_HOME$/examples/src/main/scala/org/apache/spark/examples):

- New package: org.apache.spark.examples
- New Scala Class: SparkPi

```
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

// scalastyle:off println
package org.apache.spark.examples

import scala.math.random

import org.apache.spark._

/** Computes an approximation to pi */
object SparkPi {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("SparkPi") // for a local run, add .setMaster("local")
    val spark = new SparkContext(conf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val n = math.min(100000L * slices, Int.MaxValue).toInt // avoid overflow
    val count = spark.parallelize(1 until n, slices).map { i =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * count / n)
    spark.stop()
  }
}
// scalastyle:on println
```

Package the code into a jar (see [package](http://blog.sina.com.cn/s/blog_3fe961ae0102uy42.html)); it ends up at code\Spark\test\out\artifacts\sparkPi\sparkPi.jar. Upload it to the linux server and execute the command:

```
$SPARK_HOME$/bin/spark-submit --class "org.apache.spark.examples.SparkPi" --master spark://updev4:7077 /home/solr/sparkPi.jar
```

Output result: Pi is roughly 3.13662

At this point, the study of "how to install spark" is over. I hope it helps resolve your doubts. Combining theory with practice can better help you learn, so go and try it! If you want to continue to learn more related knowledge, please keep following the website; the editor will continue to work hard to bring you more practical articles!
