Flume combined with Spark testing


Recently, we have been experimenting with combining Flume with Kafka and Spark Streaming. Today I put together a simple combination of Flume and Spark, and I am recording it here to save other readers some detours. If anything is not well thought out, corrections from more experienced readers are welcome.

The experiment is fairly simple and has two parts: first, sending data with avro-client; second, sending data with netcat.

First, the Spark program needs two jar packages for Flume:

flume-ng-sdk-1.4.0.jar and spark-streaming-flume_2.11-1.2.0.jar
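
If you build with sbt, a minimal sketch of the corresponding dependencies might look like this (the Maven coordinates are the standard Apache ones; the exact Scala version is an assumption, so adjust it to your environment):

// build.sbt -- sketch; versions taken from the jar names above
scalaVersion := "2.11.4"
libraryDependencies ++= Seq(
  "org.apache.spark" % "spark-streaming_2.11" % "1.2.0",
  "org.apache.spark" % "spark-streaming-flume_2.11" % "1.2.0",
  "org.apache.flume" % "flume-ng-sdk" % "1.4.0"
)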

First, use avro-client to send data

1. Write a Spark program that receives Flume events

import org.apache.log4j.{Level, Logger}
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._

object FlumeEventTest {
  def main(args: Array[String]) {
    Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
    Logger.getLogger("org.apache.eclipse.jetty.server").setLevel(Level.OFF)

    // Host/port the receiver binds to, and the batch interval in milliseconds
    val hostname = args(0)
    val port = args(1).toInt
    val batchInterval = Milliseconds(args(2).toInt)

    val sparkConf = new SparkConf().setAppName("FlumeEventCount").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, batchInterval)

    // Push-based receiver: Flume's avro sink delivers events to this address
    val stream = FlumeUtils.createStream(ssc, hostname, port, StorageLevel.MEMORY_ONLY)

    // Count the events in each batch and print the result
    stream.count().map(cnt => "Received " + cnt + " flume events.").print()

    ssc.start()
    ssc.awaitTermination()
  }
}

2. Flume configuration file parameters

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 9999

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

Here, avro-client sends data to Flume's avro source on port 44444; the avro sink then forwards the events to the Spark receiver on port 9999.

3. Run the Spark program:
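
For example, a local run via spark-submit might look like the following (a sketch: the jar name and the 2000 ms batch interval are assumptions; the host/port arguments must match the sink configuration above):

./bin/spark-submit --class FlumeEventTest flume-event-test.jar localhost 9999 2000

The master is already set to local[2] in the code, so no --master flag is needed. Start the Spark program first, so something is listening on port 9999 when the agent's avro sink begins delivering events.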

4. Start the Flume agent with the configuration file above

../bin/flume-ng agent --conf conf --conf-file ./flume-conf.conf --name a1 -Dflume.root.logger=INFO,console

Spark console output:

5. Use avro-client to send a file:

./flume-ng avro-client --conf conf -H localhost -p 44444 -F <file-to-send> -Dflume.root.logger=DEBUG,console

Flume agent output:

Spark output:

Second, use netcat to send data

1. The Spark program is the same as above

2. Configure Flume parameters

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 9999

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

Here, a netcat source listens on port 44444, so telnet can be used to feed data into Flume.

3. Run the Spark program as above

4. Start the Flume agent with the configuration file above

../bin/flume-ng agent --conf conf --conf-file ./flume-conf.conf --name a1 -Dflume.root.logger=INFO,console

Note: with a netcat source, each line of text received becomes one Flume event, so the behavior differs from using an avro source.

5. Use telnet to send data
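
For example, connect to the netcat source and type a few lines; each line becomes one event counted by the Spark program (the sample text is arbitrary):

telnet localhost 44444
hello flume
hello spark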

Spark output:

These are two relatively simple demos. In a real project that uses Flume to collect data, Kafka as a distributed message queue, and Spark Streaming for real-time computation, you will need to study both Flume and Spark stream processing in much more detail.

Some time ago I gave a training session for my department and demonstrated several Spark Streaming examples: text processing, network data processing, stateful operations, and window operations. I will organize them and share them here when I have time, along with two simple Spark MLlib demos: user classification based on K-Means and a movie recommendation system based on collaborative filtering.

Today I watched Professor Andrew Ng's ML course from Stanford. It is a great lecture; here is the link:

http://open.163.com/special/opencourse/machinelearning.html
