This article mainly explains how to write Spark programs in spark-shell and IDEA. The content is simple, clear, and easy to learn and understand; please follow the editor's train of thought and study how to write Spark programs in spark-shell and IDEA.
spark-shell is an interactive shell program that ships with Spark and makes it convenient to program interactively: at its command line, users can write Spark programs in Scala. It is generally used for testing and practicing Spark programs. spark-shell is itself a special Spark application, and inside it we can submit jobs directly.
spark-shell can be started in two modes: local mode and cluster mode.
Local mode:
spark-shell
Local mode starts only a SparkSubmit process on the local machine and does not establish a connection with the cluster; although a SparkSubmit process exists, nothing is actually submitted to the cluster.
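For example, a few lines typed at the scala> prompt of a local spark-shell are enough to verify the environment (a minimal sketch; the file path is only an illustration):
// sc is the SparkContext that spark-shell creates automatically
val nums = sc.parallelize(1 to 100)
// in local mode this runs entirely inside the local SparkSubmit JVM
println(nums.sum())
// a quick word count over a local file (the path is hypothetical)
val counts = sc.textFile("file:///tmp/words.txt").flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
counts.collect().foreach(println)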
Cluster mode:
spark-shell \
--master spark://hadoop01:7077 \
--executor-memory 512m \
--total-executor-cores 1
The last two options are not required; the --master option is required (unless the master is already specified inside the program or jar itself, it must be given on the command line).
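Once the shell has started in cluster mode, you can confirm from inside the shell that it is really attached to the cluster (a small sketch, assuming the master URL from the command above):
// should print spark://hadoop01:7077 when the shell is connected to the cluster
println(sc.master)
// this job runs on the executors that were requested at startup
println(sc.parallelize(1 to 1000, 4).map(_ * 2).count())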
Exit shell
Do not exit spark-shell with Ctrl+C; the correct way to exit is :quit. If you do exit with Ctrl+C, the shell process may keep running: check the listening port with netstat -apn | grep 4040 and kill the corresponding process with kill -9 <pid>.
Comparison between the spark2.2 shell and the spark1.6 shell
PS: if you start spark-shell in cluster mode, the web UI will show an application that stays in the running state the whole time.
Create a Spark project through IDEA
PS: the steps before project creation are omitted here, since they have already been explained in the Scala section; we assume the project has already been created.
Configure the pom.xml file of the project:
<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.version>2.11.8</scala.version>
    <spark.version>2.2.0</spark.version>
    <hadoop.version>2.7.1</hadoop.version>
    <scala.compat.version>2.11</scala.compat.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>
Implementing the WordCount program in Spark
Scala version
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object SparkWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SparkWordCount").setMaster("local[*]")
    // create the SparkContext object; data is processed through it
    val sc = new SparkContext(conf)
    // read the files under the given path; the parameter is a String path
    val lines: RDD[String] = sc.textFile("dir/wordcount")
    // split each line into words
    val words: RDD[String] = lines.flatMap(_.split(" "))
    // turn each word into a tuple (word, 1)
    val tuples: RDD[(String, Int)] = words.map((_, 1))
    // Spark provides the reduceByKey operator, which groups by key and sums the values
    val sumed: RDD[(String, Int)] = tuples.reduceByKey(_ + _)
    // sort the result; Spark's sortBy takes different parameters from Scala's sortBy:
    // the second parameter defaults to true (ascending), false means descending
    val sorted: RDD[(String, Int)] = sumed.sortBy(_._2, false)
    // print the result; when the job is submitted to a cluster, the output appears on the executors, not on the driver
    sorted.foreach(println)
    // release resources: stop the SparkContext and end the job
    sc.stop()
  }
}
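When the same program is meant to be packaged and submitted to a cluster, a common variation (sketched below, assuming the input and output paths are passed as command-line arguments) drops setMaster and writes the result back to storage with saveAsTextFile instead of printing it:
import org.apache.spark.{SparkConf, SparkContext}

// hypothetical cluster variant: args(0) is the input path, args(1) is the output path
object SparkWordCountCluster {
  def main(args: Array[String]): Unit = {
    // no setMaster here; the master is supplied by spark-submit --master
    val conf = new SparkConf().setAppName("SparkWordCountCluster")
    val sc = new SparkContext(conf)
    val sumed = sc.textFile(args(0))
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .sortBy(_._2, false)
    // write the result to the output path instead of printing on the executors
    sumed.saveAsTextFile(args(1))
    sc.stop()
  }
}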
Java version
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class JavaWordCount {
    public static void main(String[] args) {
        // 1. Create the conf object; it mainly sets the application name and the run mode
        SparkConf conf = new SparkConf().setAppName("JavaWordCount").setMaster("local");
        // 2. Create the context object
        JavaSparkContext jsc = new JavaSparkContext(conf);
        JavaRDD<String> lines = jsc.textFile("dir/file");
        // split the data; FlatMapFunction is the concrete implementation class
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String s) throws Exception {
                List<String> splited = Arrays.asList(s.split(" "));
                return splited.iterator();
            }
        });
        // turn the words into tuples
        // the first generic parameter is the input type, the last two are the types of the output tuple
        JavaPairRDD<String, Integer> tuples = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) throws Exception {
                return new Tuple2<>(s, 1);
            }
        });
        // aggregation
        JavaPairRDD<String, Integer> sumed = tuples.reduceByKey(new Function2<Integer, Integer, Integer>() {
            // v1 and v2 are two values that belong to the same key
            @Override
            public Integer call(Integer v1, Integer v2) throws Exception {
                return v1 + v2;
            }
        });
        // the Java API does not provide a sortBy operator, so swap the elements of each tuple,
        // sort by key, and then swap them back
        // the first swap is for sorting
        JavaPairRDD<Integer, String> swaped = sumed.mapToPair(new PairFunction<Tuple2<String, Integer>, Integer, String>() {
            @Override
            public Tuple2<Integer, String> call(Tuple2<String, Integer> tup) throws Exception {
                return tup.swap();
            }
        });
        // sort in descending order
        JavaPairRDD<Integer, String> sorted = swaped.sortByKey(false);
        // the second swap restores the (word, count) form for the final result
        JavaPairRDD<String, Integer> res = sorted.mapToPair(new PairFunction<Tuple2<Integer, String>, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(Tuple2<Integer, String> tuple2) throws Exception {
                return tuple2.swap();
            }
        });
        System.out.println(res.collect());
        res.saveAsTextFile("out1");
        jsc.stop();
    }
}
Thank you for reading. The above is the content of "how to write Spark programs in SparkShell and IDEA". After studying this article, I believe you have a deeper understanding of how to write Spark programs in spark-shell and IDEA; the specific usage still needs to be verified in practice.