2025-01-17 Update From: SLTechnology News & Howtos (shulou.com)
Shulou (Shulou.com) 06/03 Report --
Before learning any Spark technology, make sure you first understand Spark correctly; see the companion article "Correct Understanding of Spark".
The following walks through configuring an environment for developing Spark applications in Python on macOS.
First, install Python
Spark 2.2.0 requires Python 2.7+ or Python 3.4+.
For installation steps, please refer to:
http://jingyan.baidu.com/article/7908e85c78c743af491ad261.html
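Once Python is installed, a quick sanity check confirms the interpreter meets the version floor. This is a minimal sketch; the 2.7/3.4 minimums follow the Spark 2.2.0 documentation:

import sys

# Check that the current interpreter satisfies Spark 2.2.0's
# documented requirement of Python 2.7+ or Python 3.4+.
ok = sys.version_info >= (3, 4) or (2, 7) <= sys.version_info < (3, 0)
print("Python %d.%d is %s" % (sys.version_info[0], sys.version_info[1],
                              "supported" if ok else "NOT supported"))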
Second, download the Spark binary package and configure environment variables
1. From the official download page http://spark.apache.org/downloads.html, download the package spark-2.2.0-bin-hadoop2.6.tgz, place it on a local disk, and decompress it.
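The decompression step can also be done programmatically. A minimal sketch (the archive name and destination directory are illustrative):

import tarfile

def extract_tgz(archive_path, dest_dir):
    """Extract a .tgz archive (e.g. spark-2.2.0-bin-hadoop2.6.tgz) into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest_dir)

# Usage (illustrative paths):
# extract_tgz("spark-2.2.0-bin-hadoop2.6.tgz", "/Users/tangweiqun/Desktop/bigdata/spark")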
2. Set environment variables:
cd ~
vi .bash_profile
export SPARK_HOME=/Users/tangweiqun/Desktop/bigdata/spark/spark-2.2.0-bin-hadoop2.6
export PATH=$PATH:$SCALA_HOME/bin:$M2_HOME/bin:$JAVA_HOME/bin:$SPARK_HOME/bin
source .bash_profile
3. Run chmod 744 ./* on the files in the bin directory under SPARK_HOME; otherwise a permission-denied error will be reported.
Windows machines should not need this step.
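Whether the chmod step worked can be verified with a small helper. This is a sketch; the Spark home path in the usage comment is the example path from this article:

import os

def bin_files_executable(spark_home):
    """Return True if every file in spark_home/bin is executable by the current user."""
    bin_dir = os.path.join(spark_home, "bin")
    return os.path.isdir(bin_dir) and all(
        os.access(os.path.join(bin_dir, name), os.X_OK)
        for name in os.listdir(bin_dir)
    )

# Usage (illustrative path):
# print(bin_files_executable("/Users/tangweiqun/Desktop/bigdata/spark/spark-2.2.0-bin-hadoop2.6"))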
Third, install PyCharm
1. Download it from the official site https://www.jetbrains.com/pycharm/download/ and install it by simply clicking through the installer.
Fourth, write wordcount.py and run it successfully
1. Create a project
File --> New Project
2. Configure PYTHONPATH for PyCharm
Run --> Edit Configurations, and configure as follows.
Click the "+" above, and then fill in:
PYTHONPATH=/Users/tangweiqun/Desktop/bigdata/spark/spark-2.2.0-bin-hadoop2.6/python/:/Users/tangweiqun/Desktop/bigdata/spark/spark-2.2.0-bin-hadoop2.6/python/lib/py4j-0.10.4-src.zip
This adds the Python-related dependencies from the Spark installation package.
3. Add py4j-some-version.zip and pyspark.zip to the project
To be able to browse the source code, associate the project with the Spark sources as follows:
Click "+ Add Content Root" and add the two zip packages under /Users/tangweiqun/Desktop/bigdata/spark/spark-2.2.0-bin-hadoop2.6/python/lib.
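Instead of (or in addition to) the PyCharm setting, the same paths can be prepended at runtime at the top of a script. A minimal sketch; the helper name is ours, and it assumes SPARK_HOME points at the directory configured in step two:

import glob
import os

def add_pyspark_to_path(spark_home, path_list):
    """Prepend Spark's python dir and its py4j source zip to path_list (e.g. sys.path)."""
    python_dir = os.path.join(spark_home, "python")
    entries = [python_dir] + glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip"))
    for entry in reversed(entries):
        path_list.insert(0, entry)
    return entries

# Usage:
# import sys
# add_pyspark_to_path(os.environ["SPARK_HOME"], sys.path)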
4. Write spark word count and run it successfully
Create a python file, wordcount.py, with the following contents:
from pyspark import SparkContext, SparkConf
import os
import shutil

if __name__ == "__main__":
    conf = SparkConf().setAppName("appName").setMaster("local")
    sc = SparkContext(conf=conf)

    sourceDataRDD = sc.textFile("file:///Users/tangweiqun/test.txt")
    wordsRDD = sourceDataRDD.flatMap(lambda line: line.split())
    keyValueWordsRDD = wordsRDD.map(lambda s: (s, 1))
    wordCountRDD = keyValueWordsRDD.reduceByKey(lambda a, b: a + b)

    # Remove any previous output directory; saveAsTextFile fails if it exists.
    outputPath = "/Users/tangweiqun/wordcount"
    if os.path.exists(outputPath):
        shutil.rmtree(outputPath)
    wordCountRDD.saveAsTextFile("file://" + outputPath)

    print(wordCountRDD.collect())
Right-click the file and choose Run; it should execute successfully.
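The RDD chain above is plain word counting: flatMap splits lines into words, map pairs each word with 1, and reduceByKey sums the pairs. The same logic in pure Python (a sketch for intuition only, without Spark) looks like:

from collections import Counter

def word_count(lines):
    # flatMap(split) -> map((word, 1)) -> reduceByKey(+), collapsed into a Counter.
    words = [w for line in lines for w in line.split()]
    return dict(Counter(words))

print(word_count(["hello spark", "hello python"]))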
For a detailed, systematic treatment of the Spark core RDD API, see the companion article explaining the principles of the Spark core RDD API.