
How to use SPARK to analyze PM2.5 data

2025-02-25 Update From: SLTechnology News&Howtos


Shulou (Shulou.com) 05/31 report

In this article, the editor explains how to use SPARK to analyze PM2.5 data, approaching the topic from a professional point of view. I hope you find it useful.

Prepare the SPARK environment

Today, you can request a SPARK environment on all kinds of public clouds. The easiest way to get started for free is the SPARK service on the Super Energy Cloud (SuperVessel), which costs nothing.

First, open the SuperVessel home page at http://www.ptopenlab.com. If you do not have an account yet, register for one there. The new account will receive an email from manager@ptopenlab.com; click the link in it to activate the account.

After logging in, select "Big Data Lab (Big data service)" on the home page.

On the Big Data service login page, enter your registered user name and password again to reach the Big Data service page.

Click "Create" to open the interface for creating a big data cluster. SuperVessel currently offers both MapReduce and SPARK environments. Select SPARK and the smallest single-node configuration, as shown in the following figure.

After you click "Confirm creation", the single-node SPARK environment is built in about 30 seconds, and the following interface appears.

Click the "Master console" button. A new page opens where you log in to the console, as follows. The default password is "passw0rd".

This gives you a command-line interface on the master node of the SPARK cluster. At this point, the SPARK environment is ready.

PM2.5 data

To make it easier for students to learn SPARK, we have put five months of PM2.5 data on SuperVessel for you to use as experimental data :) This is first-hand, real data measured every day by our five PM2.5 monitoring sensors, which record actual conditions in the Zhongguancun Software Park in Shangdi, Beijing.

Do not underestimate these five PM2.5 air quality sensors: they are the latest research results from IBM Research. Take a look at the picture first. The sensor is small and fully meets industrial outdoor design requirements. It has built-in 3G data transmission and runs on solar power, so neither outdoor nor indoor installation requires running a single cable. It's really cool!

Here is the picture; seeing is believing.

It is a low-cost sensor based on laser scattering (Mie scattering theory). Compared with sensors currently on the market, it is much more accurate and can measure particle sizes from PM0.3 to PM10. Most importantly, it requires no maintenance.

Back to the point: we have organized all of this data so that SuperVessel users can easily try data analysis on it. Obtain the data as follows:

cd /home/opuser
wget http://softrepo.ptopenlab.com/bigdata/pm25_file.tar

Unpack the archive with the tar command:

tar -xf pm25_file.tar

The resulting directory, pm25_file, contains three files. pm25.txt is the data file. Each line holds a date and a reading, so a line such as "08-Nov-2014, 84" means that a value of 84 was measured at some time on November 8, 2014.
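To make the record format concrete, here is a minimal plain-Scala sketch that needs no Spark. It parses one line of pm25.txt into a (date, value) pair. It assumes every line has the "date, value" shape described above. The helper name parseRecord is hypothetical, and malformed lines return None instead of throwing.

```scala
import scala.util.Try

// Hypothetical helper: parse one "date, value" line of pm25.txt.
// Returns None if the line does not have exactly two fields
// or if the second field is not an integer.
def parseRecord(line: String): Option[(String, Int)] =
  line.split(",") match {
    case Array(date, value) =>
      Try(value.trim.toInt).toOption.map(v => (date.trim, v))
    case _ => None
  }

println(parseRecord("08-Nov-2014, 84")) // Some((08-Nov-2014,84))
println(parseRecord("bad line"))        // None
```

Returning Option lets a caller use flatMap to drop bad lines rather than abort the whole job on one malformed record.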

SPARK implementation code

1. Run as a script

pm25_2.10-1.0.jar is a compiled implementation, and run.sh is the script that runs it. To try it out first, run ./run.sh directly. It produces the following results:

gradeOne is 24.77876%
gradeTwo is 25.663715%
gradeThree is 20.353981%
gradeFour is 12.38938%
gradeFive is 15.004249%
gradeSix is 1.7699115%

The results show what percentage of days in the five months of data fell into each of China's national air quality grades 1 to 6. gradeSix covers PM2.5 values above 250 ug/m3, gradeFive covers 150-250 ug/m3, and so on down to gradeOne.
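The article does not include the source code of pm25_2.10-1.0.jar. The following is only a plain-Scala sketch of how such percentages could be computed from daily averages. Only the 150 and 250 ug/m3 boundaries come from the text above. The lower breakpoints (35, 75 and 115 ug/m3) are an assumption, based on the commonly used Chinese AQI breakpoints for 24-hour PM2.5. The names gradeOf and gradePercentages are hypothetical.

```scala
// Assumed upper bounds (ug/m3) for grades 1 to 5; anything above 250 is grade 6.
// Only 150 and 250 are stated in the article; 35, 75 and 115 are assumptions.
val upperBounds = Seq(35.0, 75.0, 115.0, 150.0, 250.0)

// Hypothetical helper: map a daily average PM2.5 value to a grade from 1 to 6.
def gradeOf(dailyAverage: Double): Int = {
  val idx = upperBounds.indexWhere(dailyAverage <= _)
  if (idx == -1) 6 else idx + 1
}

// Hypothetical helper: percentage of days that fall into each grade.
def gradePercentages(dailyAverages: Seq[Double]): Map[Int, Double] = {
  val total = dailyAverages.size.toDouble
  dailyAverages
    .groupBy(gradeOf)
    .map { case (grade, days) => grade -> days.size * 100.0 / total }
}

// Made-up daily averages, for illustration only.
val pct = gradePercentages(Seq(20.0, 60.0, 200.0, 300.0))
(1 to 6).foreach(g => println(s"grade $g is ${pct.getOrElse(g, 0.0)}%"))
```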

2. Steps and code for calculating the mean PM2.5 concentration

Now that you have seen the results, let's write our own SPARK code step by step. First, start the SPARK shell:

$ /opt/spark-1.0.2-bin-hadoop2/bin/spark-shell
scala>

Read input data

scala> val datainput = sc.textFile("pm25.txt")

Read all PM2.5 values into a list. Each line has the form "date, PM2.5 value", so we split it on "," and convert the trimmed second field to an integer.

scala> val Valuelist = datainput.map(_.split(",")).map(x => x(1).trim.toInt)

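Calculate the average PM2.5 over all records in the five-month period. The sum is converted to Double before it is divided by the count, so integer division does not truncate the result.

scala> val AveragePm25 = Valuelist.reduce(_ + _).toDouble / Valuelist.count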
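If you do not have a cluster, the same average can be computed on an in-memory list. This minimal plain-Scala sketch mirrors the split, map, sum and count steps of the spark-shell version. The sample lines are made up for illustration. Dividing an integer sum by an integer count would truncate, so the sketch converts to Double before dividing.

```scala
// Hypothetical sample lines in the "date, value" format of pm25.txt.
val lines = Seq("08-Nov-2014, 84", "08-Nov-2014, 91", "09-Nov-2014, 40")

// Same steps as the spark-shell version: split, take the second field, convert to Int.
val values = lines.map(_.split(",")).map(x => x(1).trim.toInt)

// Sum as Long to avoid Int overflow on large inputs, then divide as Double.
val average = values.map(_.toLong).sum.toDouble / values.size

println("AveragePm25 is " + average + " ug/m3")
```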

Print out the result

scala> println("AveragePm25 is " + AveragePm25 + " ug/m3")

3. Sorting PM2.5 concentration by day

First, key each record by its day (x(0)) and sum that day's PM2.5 values (x(1)) with reduceByKey(_ + _).

scala> val datamap = datainput.map(_.split(",")).map(x => (x(0), x(1).trim.toInt)).reduceByKey(_ + _)

Get the number of records per day

scala> val recordnumber = datainput.map(_.split(",")).map(x => (x(0), 1)).reduceByKey(_ + _)
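reduceByKey(_ + _) combines all values that share a key. With plain Scala collections, the same per-day sums and counts can be sketched using groupBy. This is a local illustration on hypothetical sample lines, not Spark code.

```scala
// Hypothetical sample lines in the "date, value" format of pm25.txt.
val lines = Seq("08-Nov-2014, 84", "08-Nov-2014, 91", "09-Nov-2014, 40")

// (day, value) pairs, as in the spark-shell version.
val pairs = lines.map(_.split(",")).map(x => (x(0), x(1).trim.toInt))

// Local counterpart of datamap: per-day sum of PM2.5 values.
val daySum: Map[String, Int] =
  pairs.groupBy(_._1).map { case (day, ps) => day -> ps.map(_._2).sum }

// Local counterpart of recordnumber: number of records per day.
val dayCount: Map[String, Int] =
  pairs.groupBy(_._1).map { case (day, ps) => day -> ps.size }

println(daySum)
println(dayCount)
```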

Calculate the average PM2.5 for each day

scala> val dayAverage = datamap.join(recordnumber).map(x => (x._1, x._2._1.toDouble / x._2._2))
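The join approach needs two separate reductions (sums and counts) followed by a join. A common alternative, not the one used in this article, carries a (sum, count) pair per key. Both totals are then combined in a single reduction, and no join is needed. The sketch below shows the idea with plain Scala collections on hypothetical sample lines. On an RDD, the same pair-combining function could be passed to reduceByKey.

```scala
// Hypothetical sample lines in the "date, value" format of pm25.txt.
val lines = Seq("08-Nov-2014, 84", "08-Nov-2014, 91", "09-Nov-2014, 40")

// Map each record to (day, (value, 1)) so the sum and count travel together,
// then combine the pairs per day the way reduceByKey would.
val sumCount: Map[String, (Int, Int)] = lines
  .map(_.split(","))
  .map(x => (x(0), (x(1).trim.toInt, 1)))
  .groupBy(_._1)
  .map { case (day, recs) =>
    day -> recs.map(_._2).reduce((a, b) => (a._1 + b._1, a._2 + b._2))
  }

// Daily average as Double, so integer division does not truncate it.
val dayAverage: Map[String, Double] =
  sumCount.map { case (day, (total, count)) => day -> total.toDouble / count }

dayAverage.foreach(println)
```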

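Sort all days by their daily average, in descending order. The code swaps each (day, average) pair so the average becomes the key, sorts with sortByKey(false), and swaps back.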

scala> val sortData = dayAverage.map(x => (x._2, x._1)).sortByKey(false).map(x => (x._2, x._1))
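Swapping the key and value, sorting by key and swapping back is one way to sort by value. For comparison, here is a local plain-Scala sketch of the same descending sort followed by a top-n cut. It works on an in-memory map of hypothetical daily averages. The helper name topDays is hypothetical.

```scala
// Hypothetical daily averages (ug/m3), keyed by date.
val dayAverage = Map("08-Nov-2014" -> 87.5, "09-Nov-2014" -> 40.0, "10-Nov-2014" -> 152.0)

// Hypothetical helper: sort days by average, highest first, and keep the top n.
def topDays(averages: Map[String, Double], n: Int): Seq[(String, Double)] =
  averages.toSeq.sortWith(_._2 > _._2).take(n)

topDays(dayAverage, 10).foreach(println)
```

sortWith with an explicit comparison avoids depending on an implicit Ordering for Double. take(n) simply returns fewer items when there are fewer than n days.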

Print the ten days with the highest averages:

scala> sortData.take(10).foreach(p => println(p))

That is how to use SPARK to analyze PM2.5 data. If you have similar questions, the analysis above should help.
