
Example Analysis of Installing Spark on Windows and Debugging TopN in PyCharm IDEA


This article shares a sample analysis of installing Spark on Windows and debugging a TopN example in PyCharm IDEA. The editor finds it very practical, so it is shared here as a reference; follow along and have a look.

1. Install the JDK

The first step in installing Spark, whether on Windows or Linux, is to install a JDK, since Spark depends on the JDK to run. Download the JDK from the Oracle website; the 8u74 Windows x64 build is used here, but you can choose whichever version suits your needs. The JDK installation itself is not shown here; it is mostly a matter of clicking Next and choosing an installation path.
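As a quick sanity check (not part of the original walkthrough), you can confirm from Python that the JDK is reachable on PATH; note that java -version writes its banner to stderr, so it is merged into the captured output here:

import subprocess

# java -version prints to stderr, so redirect stderr into the captured output
output = subprocess.check_output(["java", "-version"], stderr=subprocess.STDOUT)
print(output)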

2. Install Spark

Download Spark from the official Apache Spark™ website and select spark-1.6.0-bin-hadoop2.6.tgz.

Add the SPARK_HOME environment variable pointing at the unpacked directory, and append the following to PATH:

%SPARK_HOME%\bin

%SPARK_HOME%\sbin

With that, Spark is set up on Windows.

Note that there is a common pitfall here:

Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

Spark supports standalone mode and does not depend on Hadoop; however, in a Windows environment Hadoop's winutils.exe is still required. You therefore need a winutils.exe that matches Hadoop 2.6. You can google "hadoop.dll 2.6" or download it from GitHub (repositories exist that collect hadoop.dll and winutils.exe for all Hadoop versions), then copy the downloaded files into Hadoop's bin directory (create the directory if it does not exist, and set the HADOOP_HOME and PATH environment variables accordingly).
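As a hedged illustration (this check is not from the original article, and the bin\winutils.exe layout below is simply the conventional one), a short Python snippet can verify that HADOOP_HOME is set and that winutils.exe is where the error message expects it:

import os

# assumed layout: %HADOOP_HOME%\bin\winutils.exe
hadoop_home = os.environ.get("HADOOP_HOME")
if not hadoop_home:
    print("HADOOP_HOME is not set")
else:
    winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
    print(winutils + (" exists" if os.path.exists(winutils) else " is missing"))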

3. Build the PySpark development environment

Spark supports Scala, Python, and Java; since Python is more widely used than Scala, Python is chosen for the development environment here.

Let's start by setting up the Python environment. Install Python 2.7 or 3.5; the installation process is not shown here. After installation, add the PYTHONPATH environment variable so that it points at Spark's Python libraries. This step is very important.
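If you would rather not set PYTHONPATH globally (or want to double-check it), the sketch below adds Spark's bundled Python libraries to sys.path at runtime instead; the default install path and the py4j-*.zip name are assumptions that depend on where, and which, Spark version you unpacked:

import glob
import os
import sys

# assumed install location; adjust to wherever spark-1.6.0-bin-hadoop2.6 was unpacked
spark_home = os.environ.get("SPARK_HOME", r"C:\spark-1.6.0-bin-hadoop2.6")

# PySpark itself lives in SPARK_HOME\python, and py4j ships as a zip under SPARK_HOME\python\lib
sys.path.insert(0, os.path.join(spark_home, "python"))
for py4j_zip in glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")):
    sys.path.insert(0, py4j_zip)

from pyspark import SparkConf, SparkContext  # should now import cleanly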

If the configuration is correct, open the IDE that ships with Python, enter the following code, and wait for the message that the connection succeeded:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("MY First App")
sc = SparkContext(conf=conf)

You can also start a master and a worker manually to test:

spark-class.cmd org.apache.spark.deploy.master.Master

spark-class.cmd org.apache.spark.deploy.worker.Worker spark://localhost:7077

4. Spark analysis of the CSDN password library: the top 10 most common passwords

# coding=utf-8
# Test utf-8 encoding
from __future__ import division
import decimal
import sys

from pyspark import SparkConf, SparkContext, StorageLevel

reload(sys)
sys.setdefaultencoding('utf-8')

conf = SparkConf().setMaster("local").setAppName("CSDN_PASSWD_Top10")
sc = SparkContext(conf=conf)

# the dump separates fields with "#"; field [1] is the password
file_rdd = sc.textFile("H:\mysql\csdn_database\www.csdn.net.sql")
passwds = file_rdd.map(lambda line: line.split("#")[1].strip()) \
    .map(lambda passwd: (passwd, 1)) \
    .persist(storageLevel=StorageLevel.MEMORY_AND_DISK_SER)
passwd_nums = passwds.count()

# count each password, sort by count in descending order, and keep the top 10
top10_passwd = passwds.reduceByKey(lambda a, b: a + b) \
    .sortBy(lambda item: item[1], ascending=False) \
    .take(10)

for item in top10_passwd:
    # password, its count, and its share of all passwords
    print(item[0] + "\t" + str(item[1]) + "\t" + str(decimal.Decimal(item[1]) / decimal.Decimal(passwd_nums)))
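As a side note that is not from the original article, the sortBy plus take(10) step can also be expressed with the standard RDD method takeOrdered, which avoids sorting the whole RDD; a minimal self-contained sketch, with toy data standing in for the real password pairs:

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("TopN_takeOrdered")
sc = SparkContext(conf=conf)

# toy (password, 1) pairs standing in for the real dump
pairs = sc.parallelize([("123456", 1), ("123456", 1), ("password", 1), ("123456789", 1)])
counts = pairs.reduceByKey(lambda a, b: a + b)

# takeOrdered with a negated key returns the 10 largest counts without a full sort
top10 = counts.takeOrdered(10, key=lambda item: -item[1])
for passwd, count in top10:
    print(passwd + "\t" + str(count))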

5. Scala-Shell version

The code is as follows:

C:\Users\username> spark-shell

scala> val textFile = spark.read.textFile("C:\\Users\\username\\Desktop\\parse_slow_log.py")
textFile: org.apache.spark.sql.Dataset[String] = [value: string]

scala> textFile.count()
res0: Long = 156

scala> textFile.first()
res1: String = # encoding: utf-8

scala> val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark: org.apache.spark.sql.Dataset[String] = [value: string]

scala> textFile.filter(line => line.contains("Spark")).count()
res2: Long = 0

scala> textFile.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)
res3: Int = 27

scala> val wordCounts = textFile.flatMap(line => line.split(" ")).groupByKey(identity).count()
wordCounts: org.apache.spark.sql.Dataset[(String, Long)] = [value: string, count(1): bigint]

scala> wordCounts.collect()
res4: Array[(String, Long)] = Array((self.slowlog,1), (import,3), (False,1), (file_name,1), (flag_word,3), (MySQL,1), (else,1), (*,2), (slowlog,1), (default=script_path,1), (__auther,1), ...)

Thank you for reading! This article on "Example Analysis of Installing Spark on Windows and Debugging TopN in PyCharm IDEA" ends here. I hope the content above has been of some help and has taught you something new. If you found the article useful, please share it so that more people can see it.
