How to use Spark Streaming SQL for data Statistics based on time window

2025-04-06 Update From: SLTechnology News&Howtos

Shulou (Shulou.com), 06/02

This article explains how to use Spark Streaming SQL to compute statistics over time windows. Many people have questions about this in day-to-day work, so the editor has consulted the available material and put together a simple, practical procedure that should answer them.

1. Background

A very common streaming scenario is processing based on event time, which is widely used in detection, monitoring, and time-based statistics systems. For example, each entry in a tracking (event instrumentation) log records when the tracked operation occurred, and a business system records when a user performed an action. These timestamps are used to count how often operations occur, or to match rules that detect abnormal behavior and raise monitoring alarms. Because the time is carried inside the event data, the time field has to be extracted and the statistics or rule matching applied over a given time range.

Spark Streaming SQL makes it convenient to work with the time field in event data: its time window functions group events by event time at a fixed interval.

This article shows how to work with event time in Spark Streaming SQL through a case study that counts how many times a user clicked on a web page in the past 5 seconds.

2. Time window syntax

Spark Streaming SQL supports two types of window operations: the tumbling window (TUMBLING) and the sliding window (HOPPING).

2.1 Tumbling window

A tumbling window (TUMBLING) assigns each record to a window of fixed size according to the record's time field. The window advances by its own size, so windows never overlap. For example, a 5-minute tumbling window divides the data into the windows [0:00, 0:05), [0:05, 0:10), [0:10, 0:15), and so on.
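The assignment rule can be sketched in a few lines of Python. This is only an illustration of the window arithmetic, assuming windows are aligned to the Unix epoch; it is not how Spark implements TUMBLING, and the function name assign_windows is made up for this example. The optional slide argument anticipates the sliding window described in section 2.2.

```python
from datetime import datetime, timedelta

# Assumption for this sketch: window boundaries are aligned to the Unix epoch.
EPOCH = datetime(1970, 1, 1)


def assign_windows(event_time, window, slide=None):
    """Return the half-open [start, end) windows that contain event_time.

    With slide omitted (or equal to window) the windows tumble and exactly
    one window is returned; a smaller slide yields overlapping windows.
    """
    slide = slide or window
    # Latest window start at or before the event, aligned to the slide step.
    start = event_time - (event_time - EPOCH) % slide
    windows = []
    # Walk backwards for as long as the window still covers the event.
    while start + window > event_time:
        windows.append((start, start + window))
        start -= slide
    return sorted(windows)


if __name__ == "__main__":
    t = datetime(2024, 1, 1, 0, 7, 30)
    for start, end in assign_windows(t, timedelta(minutes=5)):
        print(start.time(), end.time())
```

With a 5-minute window, an event at 0:07:30 falls into the single window [0:05, 0:10). Passing a slide smaller than the window returns every overlapping window that covers the event.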

Syntax

    GROUP BY TUMBLING (colName, windowDuration)

Example

Apply a window to the inv_data_time time column of the inventory table and compute the mean of inv_quantity_on_hand; the window size is 1 minute.

    SELECT avg(inv_quantity_on_hand) qoh
    FROM inventory
    GROUP BY TUMBLING (inv_data_time, interval 1 minute)

2.2 Sliding window

A hopping window (HOPPING) is also known as a sliding window. Unlike a tumbling window, a sliding window has a configurable slide step, so windows can overlap. It takes two parameters: windowDuration, the size of the window, and slideDuration, the step of each slide. When slideDuration < windowDuration, the windows overlap and each element is assigned to more than one window. A tumbling window is therefore a special case of a sliding window: setting slideDuration = windowDuration gives a tumbling window.

Syntax

    GROUP BY HOPPING (colName, windowDuration, slideDuration)

Example

Apply a window to the inv_data_time time column of the inventory table and compute the mean of inv_quantity_on_hand; the window size is 1 minute and the slide step is 30 seconds.

    SELECT avg(inv_quantity_on_hand) qoh
    FROM inventory
    GROUP BY HOPPING (inv_data_time, interval 1 minute, interval 30 second)

3. System architecture

After the business logs are collected into Aliyun SLS (Log Service), Spark connects to SLS, processes the data with Streaming SQL, and writes the statistics to HDFS. The steps below focus on the part where Spark Streaming SQL reads the SLS data and writes it to HDFS. For log collection, refer to the Log Service documentation.

4. Operation flow

4.1 Environment preparation

Create a Hadoop cluster with E-MapReduce version 3.21.0 or later.

Download and compile the E-MapReduce-SDK package:

    git clone git@github.com:aliyun/aliyun-emapreduce-sdk.git
    cd aliyun-emapreduce-sdk
    git checkout -b master-2.x origin/master-2.x
    mvn clean package -DskipTests

After compilation, emr-datasources_shaded_${version}.jar is generated in the assembly/target directory, where ${version} is the version of the SDK.

4.2 Create tables

Start the spark-sql client from the command line:

    spark-sql --master yarn-client --num-executors 2 --executor-memory 2g --executor-cores 2 --jars emr-datasources_shaded_2.11-${version}.jar --driver-class-path emr-datasources_shaded_2.11-${version}.jar

Create the SLS and HDFS tables:

    spark-sql> CREATE DATABASE IF NOT EXISTS default;
    spark-sql> USE default;

    -- Data source table
    spark-sql> CREATE TABLE IF NOT EXISTS sls_user_log
    USING loghub
    OPTIONS (
      sls.project = "${logProjectName}",
      sls.store = "${logStoreName}",
      access.key.id = "${accessKeyId}",
      access.key.secret = "${accessKeySecret}",
      endpoint = "${endpoint}");

    -- Result table
    spark-sql> CREATE TABLE hdfs_user_click_count
    USING org.apache.spark.sql.json
    OPTIONS (path '${hdfsPath}');

The built-in function delay() is used to set the watermark in Streaming SQL; Streaming SQL watermarks will be covered in a separate article.
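For readers unfamiliar with the term, the general idea of a watermark can be sketched in Python as follows. This is a generic illustration, not the behavior of delay() itself, whose exact semantics are not covered here. The sketch assumes the watermark is the largest event time seen so far minus a fixed delay, and that events older than the watermark are discarded; the name filter_late_events is hypothetical. Spark in particular advances its watermark between micro-batches rather than per event, so treat this as a simplification.

```python
from datetime import datetime, timedelta


def filter_late_events(events, delay):
    """Yield (event_time, payload) pairs that are not older than the watermark.

    Assumption: watermark = largest event time seen so far - delay.
    """
    max_seen = None
    for event_time, payload in events:
        if max_seen is None or event_time > max_seen:
            max_seen = event_time
        watermark = max_seen - delay
        if event_time >= watermark:
            yield event_time, payload
        # Otherwise the event arrived too late and is dropped.


if __name__ == "__main__":
    base = datetime(2024, 1, 1)
    stream = [
        (base, "a"),
        (base + timedelta(seconds=120), "b"),
        (base + timedelta(seconds=30), "late"),  # 90 s behind the newest event
        (base + timedelta(seconds=90), "c"),     # 30 s behind, still accepted
    ]
    for _, payload in filter_late_events(stream, timedelta(minutes=1)):
        print(payload)
```

The delay is a trade-off: a larger value tolerates more out-of-order data but keeps window state open longer before results are final.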

4.4 View results

As you can see, the result automatically includes a window column that contains the start and end time of each window.
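To make the shape of such a result concrete, the following Python sketch counts clicks per user in 5-second tumbling windows and attaches a window field with start and end times. It only mimics the statistic described in this article on in-memory data; the field names userId and click, the function count_clicks, and the epoch-aligned windows are assumptions for illustration, not the actual output schema of the job.

```python
from collections import Counter
from datetime import datetime, timedelta

# Assumptions for this sketch: 5-second windows aligned to the Unix epoch.
WINDOW = timedelta(seconds=5)
EPOCH = datetime(1970, 1, 1)


def count_clicks(events, window=WINDOW):
    """Count clicks per (user, tumbling window) and return one row per group.

    events is an iterable of (user_id, event_time) pairs.
    """
    counts = Counter()
    for user_id, event_time in events:
        # Start of the tumbling window that contains this event.
        start = event_time - (event_time - EPOCH) % window
        counts[(user_id, start)] += 1
    return [
        {
            "userId": user_id,
            "click": n,
            "window": {
                "start": start.isoformat(),
                "end": (start + window).isoformat(),
            },
        }
        for (user_id, start), n in sorted(counts.items())
    ]


if __name__ == "__main__":
    clicks = [
        ("u1", datetime(2024, 1, 1, 0, 0, 1)),
        ("u1", datetime(2024, 1, 1, 0, 0, 4)),
        ("u1", datetime(2024, 1, 1, 0, 0, 5)),
        ("u2", datetime(2024, 1, 1, 0, 0, 2)),
    ]
    for row in count_clicks(clicks):
        print(row)
```

Each output row carries its own window boundaries, so downstream consumers can tell which 5-second interval a count belongs to without recomputing it.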

That concludes this look at how to use Spark Streaming SQL for statistics based on time windows. Theory is best learned together with practice, so go and try the steps yourself.
