This article mainly explains how to deploy Flume. The content is simple, clear, and easy to learn; please follow along to study how to deploy Flume.
1 Introduction to Flume
Flume is a highly available, highly reliable, distributed system for massive log collection, aggregation, and transport, provided by Cloudera. Flume supports customizing various data senders to collect data from log systems; it also provides the ability to do simple processing of the data and write the results to various data receivers.
As a real-time log collection system developed by Cloudera, Flume has been recognized and widely used in industry. In November 2010 Cloudera released the first usable version of Flume, 0.9.2; the 0.9.x line is collectively referred to as Flume-OG, and the refactored version as Flume-NG. Another reason for the change is that Flume was brought under Apache: Cloudera Flume was renamed Apache Flume and became a top-level Apache project.
2 Working principle of Flume
Flume (literally "waterway") takes the agent as its smallest independent unit of operation. An agent is a JVM process. A single agent consists of three core components: Source, Channel, and Sink.
Flume's data flow is carried end to end by Events. An Event is the basic data unit of Flume: it carries the log data (as a byte array) together with header information. Events are generated by a Source from data outside the Agent; when the Source captures an event it formats it and then pushes it into one or more Channels. Think of the Channel as a buffer that holds the event until a Sink has finished processing it. The Sink is responsible for persisting the log or forwarding the event to another Source.
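The configuration in section 4 below gives a concrete single-channel example; as an extra sketch of the "one or more Channels" point, the properties below fan one netcat source out into two channels and two sinks. This sketch is not from the original article: the agent name, port, and output directory are made up.
# one source, two channels, two sinks (replicating fan-out)
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2
# netcat source; the replicating selector (the default) copies each event into both channels
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44445
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2
# two memory channels as buffers
a1.channels.c1.type = memory
a1.channels.c2.type = memory
# one copy goes to the console, the other to rolling files on local disk (the directory must exist)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /tmp/flume-fanout
a1.sinks.k2.channel = c2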
Here are some of the core concepts of Flume:
Agent: runs Flume inside a JVM. Each machine runs one agent, but a single agent can contain multiple sources and sinks.
Client: produces data; runs in a separate thread.
Source: collects data from the Client and passes it to the Channel.
Sink: collects data from the Channel; runs in a separate thread.
Channel: connects Sources and Sinks, somewhat like a queue.
Event: can be a log record, an Avro object, and so on.
3 Flume deployment (as the hadoop user)
1) Download (docs for this release: http://archive-primary.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.7.0/):
wget http://archive-primary.cloudera.com/cdh5/cdh/5/flume-ng-1.6.0-cdh5.7.0.tar.gz
2) Decompress to ~/app and check the owner and group:
tar -xzvf flume-ng-1.6.0-cdh5.7.0.tar.gz -C ~/app/
3) Add Flume to the system environment variables:
vim ~/.bash_profile
export FLUME_HOME=/home/hadoop/app/apache-flume-1.6.0-cdh5.7.0-bin
export PATH=$FLUME_HOME/bin:$PATH
source ~/.bash_profile
4) Configure the JDK path in $FLUME_HOME/conf/flume-env.sh:
cp flume-env.sh.template flume-env.sh
export JAVA_HOME=/usr/java/jdk1.8.0_45
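To confirm that the environment variables from step 3 took effect, a quick sanity check is to ask the flume-ng launcher for its version from a new shell (the exact output depends on the build):
flume-ng version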
4 Flume listening port
Configure the configuration file that Flume loads:
# collect logs from the specified network port and output them to the console
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
5 Flume start
Start flume:
./flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file /home/hadoop/script/flume/telnet-flume.conf \
-Dflume.root.logger=INFO,console \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=34343
Note that the agent name passed with --name must match the name used in the configuration file (a1 above).
-Dflume.root.logger=INFO,console -- log level and destination (log INFO and above to the console)
-Dflume.monitoring.type=http -- expose monitoring metrics over HTTP
-Dflume.monitoring.port=34343 -- port for the HTTP monitoring endpoint
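With flume.monitoring.type=http and flume.monitoring.port set as above, the running agent serves its counters as JSON over HTTP, so they can be polled from another shell. A small illustration, using the port configured above:
curl http://localhost:34343/metrics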
Before testing, install the telnet command:
[root@locahost ~]# yum install -y telnet-server
[root@locahost ~]# yum install -y telnet
Because the telnet service is also managed by xinetd, after installing telnet-server you must restart xinetd to start the telnet service:
[root@locahost ~]# service xinetd restart
Result test:
[root@hadoop001 ~]# telnet localhost 44444
Trying ::1...
Connected to localhost.
Escape character is '^]'.
asd
OK
asd
OK
asd
OK
asd
OK
Console log output:
Each Event in the data flow is printed as [headers] plus [body]: the headers carry metadata, and the body is the content as a byte array (shown as hex with a text preview).
2018-08-09 19:20:21,272 (conf-file-poller-0) [INFO - org.mortbay.log.Slf4jLog.info(Slf4jLog.java:67)] Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2018-08-09 19:20:21,391 (conf-file-poller-0) [INFO - org.mortbay.log.Slf4jLog.info(Slf4jLog.java:67)] Started SelectChannelConnector@0.0.0.0:34343
2018-08-09 19:29:15,337 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 61 73 64 0D    asd. }
2018-08-09 19:29:15,337 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 61 73 64 0D    asd. }
2018-08-09 19:29:15,337 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 61 73 64 0D    asd. }
2018-08-09 19:29:15,338 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{} body: 61 73 64 0D    asd. }
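As an aside on the hex dump above: the body bytes 61 73 64 0D are just the ASCII characters a, s, d followed by the carriage return sent by telnet. You can verify this with standard shell tools (assuming xxd and od are available):
printf '61 73 64 0d' | xxd -r -p | od -c
# prints: 0000000   a   s   d  \r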
HTTP monitoring output:
7 Commonly used types in Flume
Source types:
avro: data source using the Avro protocol.
exec: uses a unix command to monitor a file, e.g. tail -F (see the sketch after this list).
spooldir: monitors a folder; the folder cannot contain subfolders, and Windows folders are not monitored; once a file has been processed you can no longer write data to it, and file names must not conflict.
TAILDIR: can monitor both files and folders and supports resuming from a breakpoint (use this one where possible).
netcat: listens on a port.
kafka: consumes data from Kafka.
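For illustration, this is how the exec and spooldir sources from the list above might be declared. The sketch is not from the original article; the agent name, file paths, and channel name are placeholders:
# exec source: follow a log file with tail -F
a1.sources.r2.type = exec
a1.sources.r2.command = tail -F /home/hadoop/data/app.log
a1.sources.r2.channels = c1
# spooldir source: watch a directory for completed files
a1.sources.r3.type = spooldir
a1.sources.r3.spoolDir = /home/hadoop/data/spool
a1.sources.r3.fileHeader = true
a1.sources.r3.channels = c1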
Sink types:
kafka: writes to Kafka.
HDFS: writes data to HDFS (see the sketch after this list).
logger: outputs to the console.
avro: Avro protocol; typically paired with an avro source on the next hop.
Channel types:
memory: holds events in memory.
kafka: stores events in Kafka.
file: stores events in local disk files.
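As a further sketch, an HDFS sink backed by a file channel from the lists above could be wired like this. None of it is from the original article: the agent name, local directories, and the NameNode address hdfs://hadoop001:9000 are assumptions to be replaced with your own values:
# file channel: durable buffer on local disk
a1.channels.c3.type = file
a1.channels.c3.checkpointDir = /home/hadoop/tmp/flume/checkpoint
a1.channels.c3.dataDirs = /home/hadoop/tmp/flume/data
# HDFS sink: write events under a date/hour partitioned path
a1.sinks.k3.type = hdfs
a1.sinks.k3.hdfs.path = hdfs://hadoop001:9000/flume/events/%Y%m%d/%H
a1.sinks.k3.hdfs.fileType = DataStream
a1.sinks.k3.hdfs.useLocalTimeStamp = true
a1.sinks.k3.channel = c3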
Thank you for reading. The above covers how to deploy Flume. After studying this article, I believe you have a deeper understanding of how to deploy Flume; specific usage still needs to be verified in practice. The editor will continue to publish more related articles; welcome to follow!