This article introduces the basic usage of Flume and Kafka integration. The situations covered here come up often in real work, so the walkthrough below shows how to handle them step by step. I hope you read it carefully and get something out of it!
I. Introduction to Flume
1. Basic description
Flume is a highly available, highly reliable, distributed system provided by Cloudera for collecting, aggregating, and transporting massive amounts of log data. Flume supports customizing all kinds of data senders in the logging system to collect data.
Features: distributed, highly available, and built on a streaming architecture; it is typically used to collect, aggregate, and transport large volumes of logs from different data sources to a data warehouse.
2. Architecture model
An Agent consists of three core components: Source, Channel, and Sink. The Source receives data from the data source and is compatible with multiple source types; the Channel buffers the data; the Sink decides how and where the data is written out.
An Event is the basic unit of data transfer defined by Flume; it carries data from the source to the destination.
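To make these roles concrete, here is a minimal sketch of how an agent declares and wires its components in a configuration file (the names a1, r1, c1, and k1 are arbitrary placeholders, not the ones used in the example later in this article):

# minimal agent skeleton (placeholder names)
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# a source may feed several channels, but a sink reads from exactly one
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1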
Flume can chain multiple Agents together to relay Event data from the initial source all the way to the final sink and its destination storage system. If the chain is too long the transmission rate drops, and the failure of a single node along the way affects the entire transmission channel.
Flume supports fanning a data flow out to one or more destinations. In this mode the same data can be replicated to multiple channels, or different data can be routed to different channels, and each sink can then deliver to a different destination.
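As an illustration only (these component names are not from the example later in this article), a replicating channel selector copies one source's events to two channels, each drained by its own sink; changing selector.type to multiplexing routes events by a header value instead. A rough sketch:

# sketch: one source fanned out to two channels
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2
# replicating copies every event to both channels
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2
# each sink drains its own channel toward a different destination
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2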
Agent1 can be understood as a routing node: it balances the Events in its Channel across multiple Sink components, and each Sink connects to an independent downstream Agent, which provides load balancing and failure recovery.
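A hedged sketch of how such a routing agent might group its sinks (the downstream hosts hop02 and hop03 and port 4141 are assumptions for illustration; using processor.type = failover instead gives failover rather than load balancing):

# sketch: load-balancing sink group on the routing agent
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
# each sink forwards to a separate downstream agent over Avro
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hop02
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hop03
a1.sinks.k2.port = 4141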
For data aggregation Flume is used in combination: each server deploys a Flume node to collect its log data, and the data is then aggregated and forwarded to a storage system such as HDFS or HBase. This is an efficient and stable way to collect data across a cluster.
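As a rough sketch of the aggregation side, assuming the per-server agents forward events over Avro and the aggregator writes to HDFS (the host names, port, NameNode address, and path below are illustrative assumptions, not from this article):

# sketch: aggregator agent receiving Avro events and writing to HDFS
a2.sources = r1
a2.channels = c1
a2.sinks = k1
a2.sources.r1.type = avro
a2.sources.r1.bind = hop01
a2.sources.r1.port = 4141
a2.sources.r1.channels = c1
a2.channels.c1.type = memory
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://hop01:9000/flume/logs/%Y%m%d
a2.sinks.k1.hdfs.useLocalTimeStamp = true
a2.sinks.k1.channel = c1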
II. Installation process
1. Installation package
apache-flume-1.7.0-bin.tar.gz
2. Decompress and rename
[root@hop01 opt]# pwd
/opt
[root@hop01 opt]# tar -zxf apache-flume-1.7.0-bin.tar.gz
[root@hop01 opt]# mv apache-flume-1.7.0-bin flume1.7
3. Configuration file
Configuration path: /opt/flume1.7/conf
mv flume-env.sh.template flume-env.sh
4. Modify the configuration
Add JDK dependency
vim flume-env.sh
export JAVA_HOME=/opt/jdk1.8
5. Environment test
Install netcat tools
sudo yum install -y nc
Create a task configuration
[root@hop01 flume1.7]# cd job/
[root@hop01 job]# vim flume-netcat-test01.conf
Add basic task configuration
Note: a1 is the agent name.
# this agent
a1.sources = sr1
a1.sinks = sk1
a1.channels = sc1
# the source
a1.sources.sr1.type = netcat
a1.sources.sr1.bind = localhost
a1.sources.sr1.port = 55555
# the sink
a1.sinks.sk1.type = logger
# events in memory
a1.channels.sc1.type = memory
a1.channels.sc1.capacity = 1000
a1.channels.sc1.transactionCapacity = 100
# Bind the source and sink
a1.sources.sr1.channels = sc1
a1.sinks.sk1.channel = sc1
Open the flume listening port
/opt/flume1.7/bin/flume-ng agent --conf /opt/flume1.7/conf/ --name a1 --conf-file /opt/flume1.7/job/flume-netcat-test01.conf -Dflume.root.logger=INFO,console
Use the netcat tool to send data to port 55555
[root@hop01 ~]# nc localhost 55555
helloauthorflume
View the output in the Flume console.
III. Application case
1. Case description
Flume collects data from each service in the cluster and transmits it to the Kafka service; the data consumption strategy can then be decided on the Kafka side.
Collection: Flume components make collection convenient; writing to Kafka directly instead would require a large number of instrumentation points that are hard to maintain.
Consumption: Kafka provides temporary storage for the collected data, which prevents a burst of data during peak activity from overwhelming the collection channel, and the data in Kafka can be isolated and processed in a targeted way.
2. Create the Kafka configuration
[root@hop01 job]# pwd
/opt/flume1.7/job
[root@hop01 job]# vim kafka-flume-test01.conf
3. Modify the sink configuration
# the sink
a1.sinks.sk1.type = org.apache.flume.sink.kafka.KafkaSink
# topic
a1.sinks.sk1.topic = kafkatest
# broker address and port
a1.sinks.sk1.kafka.bootstrap.servers = hop01:9092
# serialization mode
a1.sinks.sk1.serializer.class = kafka.serializer.StringEncoder
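The steps above only modify the sink section of kafka-flume-test01.conf; a complete agent definition also needs a source and a channel, which the original steps do not spell out. As a sketch only, assuming the netcat source and memory channel from the earlier test are reused, the full file might look like this:

# sketch of a complete kafka-flume-test01.conf (source and channel are assumed)
a1.sources = sr1
a1.sinks = sk1
a1.channels = sc1
# the source (reusing the netcat setup from the earlier test)
a1.sources.sr1.type = netcat
a1.sources.sr1.bind = localhost
a1.sources.sr1.port = 55555
# the sink
a1.sinks.sk1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.sk1.topic = kafkatest
a1.sinks.sk1.kafka.bootstrap.servers = hop01:9092
a1.sinks.sk1.serializer.class = kafka.serializer.StringEncoder
# the channel
a1.channels.sc1.type = memory
a1.channels.sc1.capacity = 1000
a1.channels.sc1.transactionCapacity = 100
# bind the source and sink to the channel
a1.sources.sr1.channels = sc1
a1.sinks.sk1.channel = sc1

With this agent running (step 6 below), anything sent with nc localhost 55555 should appear in the console consumer started in step 5.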
4. Create the Kafka topic
The topic name in the above configuration file is kafkatest; after executing the create command, check the topic information.
[root@hop01 kafka2.11]# pwd
/opt/kafka2.11
[root@hop01 kafka2.11]# bin/kafka-topics.sh --create --zookeeper hop01:2181 --replication-factor 1 --partitions 1 --topic kafkatest
[root@hop01 kafka2.11]# bin/kafka-topics.sh --describe --zookeeper hop01:2181 --topic kafkatest
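A side note, not from the original steps: the --zookeeper form of kafka-topics.sh matches the Kafka version used here, while newer Kafka releases (roughly 2.2 onward, and exclusively from 3.0) address the brokers directly. On such versions the equivalent commands would look roughly like this, assuming the same broker address hop01:9092 as in the sink configuration:

# equivalent topic management on newer Kafka versions (sketch)
bin/kafka-topics.sh --create --bootstrap-server hop01:9092 --replication-factor 1 --partitions 1 --topic kafkatest
bin/kafka-topics.sh --describe --bootstrap-server hop01:9092 --topic kafkatest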
5. Start a Kafka consumer
[root@hop01 kafka2.11]# bin/kafka-console-consumer.sh --bootstrap-server hop01:9092 --topic kafkatest --from-beginning
Here the topic is specified as kafkatest.
6. Start the Flume agent
/opt/flume1.7/bin/flume-ng agent --conf /opt/flume1.7/conf/ --name a1 --conf-file /opt/flume1.7/job/kafka-flume-test01.conf -Dflume.root.logger=INFO,console
That covers the basic usage of Flume and Kafka integration. Thank you for reading!