
What are some useful Spark tips?


Many newcomers are not very clear about these Spark tips. To help with that, the tips below are explained in detail; anyone who needs them can learn from this, and hopefully you will gain something.

1. Set the maximum message size

def main(args: Array[String]) {
  System.setProperty("spark.akka.frameSize", "1024")
}
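
In Spark 1.x, spark.akka.frameSize is the maximum Akka message size in MB. The same limit can also be set on the SparkConf before the context is created; a minimal sketch, assuming a Spark 1.x application (the object and app name are hypothetical):

import org.apache.spark.{SparkConf, SparkContext}

object FrameSizeDemo {
  def main(args: Array[String]): Unit = {
    // spark.akka.frameSize is given in MB (Spark 1.x)
    val conf = new SparkConf()
      .setAppName("FrameSizeDemo") // hypothetical app name
      .set("spark.akka.frameSize", "1024")
    val sc = new SparkContext(conf)
    // ... job logic ...
    sc.stop()
  }
}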

2. Set the queue when running on YARN

val conf = new SparkConf().setAppName("WriteParquet")
conf.set("spark.yarn.queue", "wz111")
val sc = new SparkContext(conf)

3. At run time, have YARN allocate resources and set the --num-executors parameter

nohup /home/SASadm/spark-1.4.1-bin-hadoop2.4/bin/spark-submit \
  --name mergePartition \
  --class main.scala.week2.mergePartition \
  --num-executors 30 \
  --master yarn \
  mergePartition.jar > server.log 2>&1 &

4. Read Parquet files written by Impala and handle binary columns as strings

sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
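
A minimal sketch of the setting in context, assuming an existing SparkContext sc and a hypothetical HDFS path: with binaryAsString enabled, Parquet BINARY columns written by tools such as Impala come back as Spark SQL strings instead of byte arrays.

import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc) // sc: existing SparkContext
// Impala (and some other writers) store strings as plain BINARY without
// a string annotation; this flag tells Spark SQL to read them as strings.
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

val df = sqlContext.read.parquet("hdfs:///path/from/impala") // hypothetical path
df.printSchema() // formerly-binary columns now appear as string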

5. Write a parquet file

case class ParquetFormat(usr_id: Long, install_ids: String)

// Build typed rows from a tab-separated text file, then write a single parquet file.
val appRdd = sc.textFile("hdfs://")
  .map(_.split("\t"))
  .map(r => ParquetFormat(r(0).toLong, r(1)))
sqlContext.createDataFrame(appRdd).repartition(1).write.parquet("hdfs://")

6. Read a parquet file

val parquetFile = sqlContext.read.parquet("hdfs://")
parquetFile.registerTempTable("install_running")
val data = sqlContext.sql("select user_id, install_ids from install_running")
data.map(t => "user_id:" + t(0) + " install_ids:" + t(1)).collect().foreach(println)

7. When writing output, gather all results into one file

repartition(1)
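
In context the call sits on the DataFrame (or RDD) just before the write, so every partition is shuffled into one and a single part file is produced; a minimal sketch with hypothetical paths:

sqlContext.read.parquet("hdfs:///input")  // hypothetical input path
  .repartition(1)                         // collapse to one partition...
  .write.parquet("hdfs:///output")        // ...so a single part file is written

When a full shuffle is not needed, coalesce(1) produces the same single-file output more cheaply.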

8. If an RDD is reused, cache it

cache()
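
A minimal sketch of the pattern, with a hypothetical path: without cache(), every action would recompute the RDD from the source; with it, the first action materializes the data and later actions reuse it.

val errors = sc.textFile("hdfs:///logs") // hypothetical path
  .filter(_.contains("ERROR"))
  .cache() // materialized by the first action below

val total = errors.count()  // computes the RDD and populates the cache
val sample = errors.take(5) // served from the cache, no recomputation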

9. Add dependency jars to spark-shell

spark-1.4.1-bin-hadoop2.4/bin/spark-shell --master local[4] --jars code.jar
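
Once the shell starts, classes from code.jar are on the driver and executor classpaths; a minimal sketch inside the REPL, where com.example.Cleaner and its normalize method are hypothetical names standing in for whatever code.jar provides:

// Inside spark-shell; com.example.Cleaner is a hypothetical class from code.jar.
import com.example.Cleaner

val cleaned = sc.textFile("hdfs:///raw") // hypothetical path
  .map(line => Cleaner.normalize(line))
cleaned.take(3).foreach(println)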

10. Run spark-shell in YARN mode with a queue

spark-1.4.1-bin-hadoop2.4/bin/spark-shell --master yarn-client --queue wz111

Did reading the above help you? If you want to learn more about the relevant knowledge or read more related articles, please follow the industry information channel. Thank you for your support.
