
What is the integration of Flume+Kafka+SparkStreaming?


This article shares how to integrate Flume, Kafka, and Spark Streaming. The editor thinks it is very practical, so it is shared here for your reference; I hope you get something out of it after reading.

1. Architecture

The first step is to connect Flume and Kafka: Flume tails the log and writes it to Kafka.

In the second step, Spark Streaming reads the data from Kafka for real-time analysis.

First, use the console producer and consumer scripts that ship with Kafka to send and receive messages, and verify that the Flume-to-Kafka connection works.
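For example, once the broker is running (a minimal sanity check; the topic name test is only illustrative, and the addresses match the defaults used later in this article), the bundled scripts can produce and consume a test message:

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic test --from-beginning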

2. Install Flume and Kafka

Flume install: http://my.oschina.net/u/192561/blog/692225

Kafka install: http://my.oschina.net/u/192561/blog/692357

3. Integration of Flume and Kafka

3.1 Advantages of integrating the two

Flume focuses on data transport itself, while Kafka is a typical message middleware used to decouple producers and consumers.

Architecturally, agents do not send data directly to Kafka; a forward layer built from Flume sits in front of Kafka. There are two reasons for this:

First, Kafka's API is not friendly to non-JVM languages, while the forward layer can provide a more general HTTP interface. Second, the forward layer can handle logic such as routing and choosing the Kafka topic and partition key, further reducing the logic on the agent side.

Once data flows from the source into Flume and on to Kafka, it can be synchronized to HDFS for offline computation on the one hand, and consumed for real-time computation on the other. This article tests the real-time computation path with Spark Streaming (see the sketch below).
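The article itself stops at the Flume-to-Kafka hop, so here is a minimal sketch of the Spark Streaming consuming side (not from the original article; it assumes the spark-streaming-kafka-0-8 artifact on the classpath, as used with Spark 1.x-era clusters, and reuses this article's topic name and ZooKeeper address):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaStreamingTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaStreamingTest").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))
    // Receiver-based stream: ZooKeeper quorum, consumer group, Map(topic -> receiver threads).
    // The group name "spark-group" is an illustrative choice, not from the article.
    val lines = KafkaUtils.createStream(ssc, "localhost:2181", "spark-group",
      Map("HappyBirthDayToAnYuan" -> 1)).map(_._2) // keep the message body, drop the key
    // Count words in each 5-second batch and print the result.
    lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
    ssc.start()
    ssc.awaitTermination()
  }
}

With the Flume agent running, each line appended to a.log should then show up in the 5-second batches.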

3.2 Installing the Flume-Kafka integration

1. Download the Flume-Kafka integration plugin from:

https://github.com/beyondj2ee/flumeng-kafka-plugin

Copy flumeng-kafka-plugin.jar from the plugin's package directory to the lib directory of the Flume installation.

2. Copy the following jars from the libs directory of the Kafka installation to the lib directory of the Flume installation:

kafka_2.11-0.10.0.0.jar

scala-library-2.11.8.jar

metrics-core-2.2.0.jar

3. Extract the flume-conf.properties file from the plugin and modify it as follows (the Flume source uses exec):

producer.sources.s.type = exec
producer.sources.s.command = tail -F -n +1 /home/eric/bigdata/kafka-logs/a.log
producer.sources.s.channels = c1

Change the topic of the producer agent to HappyBirthDayToAnYuan

Put the configuration into apache-flume-1.6.0-bin/conf/producer.conf

Full producer.conf:

# agent section
producer.sources = s1
producer.channels = c1
producer.sinks = k1
# configure the data source
producer.sources.s1.type = exec
# log output file or directory to be monitored
producer.sources.s1.command = tail -F -n +1 /home/eric/bigdata/kafka-logs/a.log
# configure the data channel
producer.channels.c1.type = memory
producer.channels.c1.capacity = 10000
producer.channels.c1.transactionCapacity = 100
# configure the sink: the Kafka sink that ships with Flume 1.6.0 (this is the trickiest part; pay attention to the version)
producer.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# broker address and port of Kafka
producer.sinks.k1.brokerList = localhost:9092
# Kafka topic
producer.sinks.k1.topic = HappyBirthDayToAnYuan
# serialization
producer.sinks.k1.serializer.class = kafka.serializer.StringEncoder
# wire the source and sink to the channel
producer.sources.s1.channels = c1
producer.sinks.k1.channel = c1

3.3 Starting the Kafka and Flume related services

Start ZooKeeper:

bin/zookeeper-server-start.sh config/zookeeper.properties

Start the Kafka service:

bin/kafka-server-start.sh config/server.properties

Create a topic

bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic HappyBirthDayToAnYuan

List topics

bin/kafka-topics.sh --list --zookeeper localhost:2181

View topic details

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic HappyBirthDayToAnYuan

Delete a topic

bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic test

Start a console consumer

bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic HappyBirthDayToAnYuan --from-beginning

Start Flume

bin/flume-ng agent -n producer -c conf -f conf/producer.conf -Dflume.root.logger=INFO,console

Send data to Flume:

echo "yuhai" >> a.log

The Kafka consumer receives the data:
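If the pipeline is wired correctly, the console consumer started above should print the appended line; the expected output (illustrative, based on the echo command above) is:

yuhai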

Note: if the log file's contents are deleted and the server is restarted, the topic needs to be recreated; however, because consumed content has already been persisted to files on disk, the messages consumed so far do not disappear.

The above is what the integration of Flume+Kafka+Spark Streaming looks like. The editor believes it covers knowledge points we may see or use in daily work; I hope you can learn more from this article.
