This article gives a detailed walkthrough of how to run the word-count project on the Spark platform from the command line, hoping to help readers who want to solve this problem find a simpler, easier way.
Created by Wang, Jerry, last modified on Sep 22, 2015
Stand-alone mode, that is, local mode
Local mode is very simple to run: assuming the current directory is $SPARK_HOME, just execute the following command:
MASTER=local bin/spark-shell
"MASTER=local" indicates that you are currently running in stand-alone (local) mode.
scala> val textFile = sc.textFile("README.md")
or, reading a custom test file instead:
scala> val textFile = sc.textFile("jerry.test")
15/08/08 19:14:32 INFO MemoryStore: ensureFreeSpace(182712) called with curMem=664070, maxMem=278302556
15/08/08 19:14:32 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 178.4 KB, free 264.6 MB)
15/08/08 19:14:32 INFO MemoryStore: ensureFreeSpace(17237) called with curMem=846782, maxMem=278302556
15/08/08 19:14:32 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 16.8 KB, free 264.6 MB)
15/08/08 19:14:32 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on localhost:37219 (size: 16.8 KB, free: 265.3 MB)
15/08/08 19:14:32 INFO SparkContext: Created broadcast 7 from textFile at <console>:21
textFile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[12] at textFile at <console>:21
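Before kicking off the full job, it can be worth sanity-checking the RDD from the same shell. A minimal sketch using standard RDD actions (the exact values depend on your copy of README.md):
scala> textFile.count // total number of lines in the file
scala> textFile.first // the first line, to confirm the file was read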
Then count the lines that contain "Spark":
scala> textFile.filter(_.contains("Spark")).count
or split each line into words and pair each word with a 1:
scala> textFile.flatMap(_.split(" ")).map((_, 1))
15/08/08 19:16:27 INFO FileInputFormat: Total input paths to process: 1
15/08/08 19:16:27 INFO SparkContext: Starting job: count at <console>:24
15/08/08 19:16:27 INFO DAGScheduler: Got job 0 (count at <console>:24) with 1 output partitions (allowLocal=false)
15/08/08 19:16:27 INFO DAGScheduler: Final stage: ResultStage 0 (count at <console>:24)
15/08/08 19:16:27 INFO DAGScheduler: Parents of final stage: List()
15/08/08 19:16:27 INFO DAGScheduler: Missing parents: List()
15/08/08 19:16:27 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at filter at <console>:24), which has no missing parents
15/08/08 19:16:27 INFO MemoryStore: ensureFreeSpace(3184) called with curMem=156473, maxMem=278302556
15/08/08 19:16:27 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.1 KB, free 265.3 MB)
15/08/08 19:16:27 INFO MemoryStore: ensureFreeSpace(1855) called with curMem=159657, maxMem=278302556
15/08/08 19:16:27 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1855.0 B, free 265.3 MB)
15/08/08 19:16:27 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:42648 (size: 1855.0 B, free: 265.4 MB)
15/08/08 19:16:27 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
15/08/08 19:16:27 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at filter at <console>:24)
15/08/08 19:16:27 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/08/08 19:16:27 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1415 bytes)
15/08/08 19:16:27 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/08/08 19:16:27 INFO HadoopRDD: Input split: file:/root/devExpert/spark-1.4.1/README.md:0+3624
15/08/08 19:16:27 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/08/08 19:16:27 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/08/08 19:16:27 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/08/08 19:16:27 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/08/08 19:16:27 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/08/08 19:16:27 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1830 bytes result sent to driver
15/08/08 19:16:27 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 80 ms on localhost (1/1)
15/08/08 19:16:27 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/08/08 19:16:27 INFO DAGScheduler: ResultStage 0 (count at <console>:24) finished in 0.093 s
15/08/08 19:16:27 INFO DAGScheduler: Job 0 finished: count at <console>:24, took 0.176689 s
res0: Long = 19
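Note that filter(_.contains("Spark")).count only counts the lines containing "Spark"; res0: Long = 19 is that line count, not a word count. To finish the word count started by the flatMap/map pipeline above, a reduceByKey step is still needed. A minimal sketch against the same textFile RDD, using the standard RDD API (these lines are not part of the original session):
scala> val wordCounts = textFile.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _) // sum the 1s per word
scala> wordCounts.take(10) // sample ten (word, count) pairs on the driver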
This concludes the walkthrough of how to run the word-count project on the Spark platform from the command line. I hope the above content has been of some help to you; if you still have questions, you can follow the industry information channel for more related knowledge.