Flume Introduction
Flume is a highly available, reliable, distributed system for massive log collection, aggregation, and transport, originally provided by Cloudera. The original version is known as Flume OG (original generation). As Flume's functionality expanded, Flume OG exposed shortcomings such as a bloated codebase, poorly designed core components, and non-standard core configuration; log transmission was especially unstable in Flume OG's final release, 0.94.0. To solve these problems, on October 22, 2011 Cloudera completed FLUME-728, a milestone change: the core components, core configuration, and code architecture were all refactored, and the refactored version is known collectively as Flume NG (next generation). Another reason for the change was to bring Flume under Apache, with Cloudera Flume renamed Apache Flume.
Flume NG
1. NG has only one kind of role node: the agent node.
2. The composition of the agent node has also changed: a Flume NG agent consists of Sources, Channels, and Sinks.
[Figure: composition of a Flume NG node]
[Figure: architecture with multiple agents connected in parallel]
Characteristics of Flume
Flume supports customized data senders in a logging system for collecting data; it also supports simple processing of that data and writing it to a variety of data receivers (such as text files, HDFS, HBase, etc.).
Flume's data flow is carried end to end by events. The Event is Flume's basic unit of data; it carries log data along with header information. Events are generated by sources of data outside the agent; when a Source captures an event, it formats it and pushes it into one or more Channels. A Channel can be thought of as a buffer that holds the event until a Sink has finished processing it.
The Sink is then responsible for persisting the log or pushing the event on to another Source.
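To make the Source → Channel → Sink flow concrete, here is a minimal single-agent sketch in Flume's properties-file configuration format, along the lines of the example in the official user guide. The agent name a1 and the port number are illustrative:

# example.conf: one source, one channel, one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# netcat source: turns each line received on the port into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# memory channel: buffers events between source and sink
a1.channels.c1.type = memory
# logger sink: prints events to the agent's log
a1.sinks.k1.type = logger
# wire them together: the source pushes to the channel, the sink pulls from it
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

An agent with this configuration would typically be started with:
flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console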
Flume has high reliability
When a node fails, logs can be handed off to other nodes without loss. Flume provides three levels of reliability guarantee, from strongest to weakest:
1. End-to-end: the receiving agent first writes the event to disk; it deletes the event only after the data has been delivered successfully, and re-sends it if delivery fails.
2. Store on failure: the strategy also adopted by Scribe; when the data receiver crashes, data is written locally and sending resumes once the receiver recovers.
3. Best effort: data is sent to the receiver without any acknowledgment.
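In Flume NG terms, the end-to-end guarantee is obtained by using a durable channel: the file channel writes every event to disk inside a transaction before acknowledging it. A minimal sketch, with placeholder directory paths:

a1.channels = c1
a1.channels.c1.type = file
# where the channel keeps its checkpoint and event data on disk
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data

If the agent crashes, unconsumed events are recovered from the data directories on restart.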
Flume architecture composition and core concepts:
Client: where the data is produced; it runs in a separate thread.
Event: the data produced, which can be a log record, an Avro object, etc.; for a text file it is usually a single line.
Agent: the core component of Flume; Flume takes the agent as its smallest independent unit of operation. An agent is a JVM process, constructed from Sources, Channels, Sinks, and other components:
3.1 Source: collects data from the client and passes it to the Channel
Different Sources accept different data formats. For example, the spooling directory source monitors a specified folder for new files and reads their contents as soon as they appear (a configuration sketch follows the list below). The Source component can handle log data in many forms: Avro, Thrift, exec, JMS, spooling directory, netcat, sequence generator, syslog, HTTP, legacy, and custom sources.
Avro Source: supports the Avro protocol (actually Avro RPC); built in
Thrift Source: supports the Thrift protocol; built in
Exec Source: produces data from the standard output of a Unix command
JMS Source: reads data (messages, topics) from JMS systems
Spooling Directory Source: monitors a specified directory for new data
Twitter 1% firehose Source: continuously downloads Twitter data via the API; experimental
Netcat Source: listens on a port and turns each line of text passing through it into an Event
Sequence Generator Source: a sequence-generator data source that produces sequence data
Syslog Sources: read syslog data and generate events; both UDP and TCP are supported
HTTP Source: a data source based on HTTP POST or GET, supporting JSON and BLOB representations
Legacy Sources: compatible with older Flume OG Sources (version 0.9.x)
For details, please refer to the official website: flume.apache.org/FlumeUserGuide.html#flume-sources
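As a sketch of the spooling directory source described above, the following configuration watches a folder for completed files and emits each line as an event; the directory path is a placeholder:

a1.sources = r1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/flume-spool
# record the absolute path of the originating file in an event header
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1

Files must be complete (and never modified afterwards) before being dropped into the spool directory; Flume renames each file with a .COMPLETED suffix once it has been ingested.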
3.2 Channel: connects Sources and Sinks
A Channel is a bit like a queue: it is a staging pool that receives the output of a Source and holds each event until a Sink consumes it. Data is deleted from a Channel only once it has entered the next Channel or a Sink, so when a Sink write fails the delivery can simply be retried without losing data. For temporary storage, events can be kept in a memory Channel, JDBC Channel, file Channel, or custom Channel.
Memory Channel: event data is stored in memory
JDBC Channel: event data is stored in a persistent store; the current Flume Channel has built-in Derby support
File Channel: event data is stored in disk files
Spillable Memory Channel: event data is stored in memory and on disk; when the in-memory queue fills up, events are persisted to disk files (see the sketch after this list)
Pseudo Transaction Channel: for testing purposes only
Custom Channel: a custom Channel implementation
For details, please refer to the official website: flume.apache.org/FlumeUserGuide.html#flume-channels
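A sketch of the spillable memory channel mentioned above: it keeps a fast in-memory queue and spills to disk once that queue fills up. The capacities and paths below are illustrative, not recommendations:

a1.channels = c1
a1.channels.c1.type = SPILLABLEMEMORY
# events held in memory before overflowing to disk
a1.channels.c1.memoryCapacity = 10000
# events that may accumulate in the disk overflow
a1.channels.c1.overflowCapacity = 1000000
a1.channels.c1.checkpointDir = /var/flume/spill-checkpoint
a1.channels.c1.dataDirs = /var/flume/spill-data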
3.3 Sink: collects data from the Channel; each Sink runs in a separate thread
The Sink component sends data on to its destination; destinations include HDFS, logger, Avro, Thrift, IPC, file, null, HBase, Solr, and custom sinks.
For details, please refer to the official website: flume.apache.org/FlumeUserGuide.html#flume-sinks
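A typical production sink is HDFS. The sketch below writes events into time-bucketed directories; the path and roll settings are placeholders that would be tuned per deployment:

a1.sinks = k1
a1.sinks.k1.type = hdfs
# %Y-%m-%d is expanded from the event's timestamp header
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
# write plain text rather than the default SequenceFile format
a1.sinks.k1.hdfs.fileType = DataStream
# roll a new file every 5 minutes, regardless of size or event count
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1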
Flume can support:
1. Multi-level flume agents: multiple agents can be chained together, with each agent writing its data to the next one in the chain (see the sketch after this list).
2. Fan-in: a Source can accept input from multiple senders.
3. Fan-out: data can be fanned out to multiple destinations.
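A sketch of a two-level chain combined with fan-out: the first agent replicates its source's events into two channels, and forwards one stream over Avro to a second agent; host names, ports, and agent names are illustrative:

# agent1: fan out one source to two channels, forward one over Avro
agent1.sources = r1
agent1.channels = c1 c2
agent1.sinks = k1
agent1.sources.r1.channels = c1 c2
agent1.sources.r1.selector.type = replicating
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = collector.example.com
agent1.sinks.k1.port = 4545
agent1.sinks.k1.channel = c1
# a second sink draining c2 is omitted for brevity

# agent2: receives events from any number of upstream agents (fan-in)
agent2.sources = r1
agent2.sources.r1.type = avro
agent2.sources.r1.bind = 0.0.0.0
agent2.sources.r1.port = 4545
agent2.sources.r1.channels = c1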
3.4 Several other components
Interceptor: operates on the Source, decorating and filtering events as needed in a preconfigured order.
Channel Selector: lets the Source choose one or more Channels from all available Channels according to preset criteria.
Sink Processor: multiple Sinks can form a Sink Group; the Sink Processor can load-balance across all Sinks in the group, or fail over to another Sink when one fails. A combined sketch follows.
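Putting these three components together, a sketch: a timestamp interceptor on the source, a multiplexing channel selector that routes on a header, and a failover sink group. The header name, mapping values, and priorities are illustrative:

# interceptor: stamp each event with the current time
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# channel selector: route events by the value of the "type" header
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
a1.sources.r1.selector.mapping.error = c1
a1.sources.r1.selector.mapping.info = c2
a1.sources.r1.selector.default = c2

# sink processor: k1 is preferred; fail over to k2 if k1 goes down
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
a1.sinkgroups.g1.processor.maxpenalty = 10000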