
Introduction to the core concepts of flume

2025-02-22 Update From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01

This article introduces the core concepts of Flume: the Event, the Client, the Agent and its Sources, Channels and Sinks, and the supporting components that route and filter events.

Core concept

Event

Client

Agent

Sources, Channels, Sinks

Other components: Interceptors, Channel Selectors, Sink Processors

Core concept: Event

The Event is the basic unit of data transfer in Flume. Flume carries data from its origin to the final destination as events. An Event consists of an optional set of headers and a byte array containing the data.

The data in the body is opaque to Flume.

Headers are an unordered collection of string key-value pairs, and each key is unique within the collection.

Headers can be extended and used for contextual routing.

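import java.util.Map;

public interface Event {
  // Optional string key-value pairs, used for routing
  public Map<String, String> getHeaders();
  public void setHeaders(Map<String, String> headers);
  // Payload bytes, opaque to Flume
  public byte[] getBody();
  public void setBody(byte[] body);
}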
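As a rough illustration of how such an interface might be used, the sketch below wraps one log line into an event and attaches a header. `BasicEvent`, `EventSketch` and the `host` header are hypothetical names invented for this example, and the interface is repeated only so the block compiles on its own. Flume ships its own Event implementations, which this stand-in does not reproduce.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Copy of the interface above, repeated so the sketch is self-contained.
interface Event {
  Map<String, String> getHeaders();
  void setHeaders(Map<String, String> headers);
  byte[] getBody();
  void setBody(byte[] body);
}

// Hypothetical minimal implementation: headers in a HashMap, body as raw bytes.
class BasicEvent implements Event {
  private Map<String, String> headers = new HashMap<>();
  private byte[] body = new byte[0];

  public Map<String, String> getHeaders() { return headers; }
  public void setHeaders(Map<String, String> headers) { this.headers = headers; }
  public byte[] getBody() { return body; }
  public void setBody(byte[] body) { this.body = body; }
}

class EventSketch {
  // Wraps one log line into an event and tags it with an assumed "host" header.
  static Event wrap(String logLine, String host) {
    Event event = new BasicEvent();
    event.setBody(logLine.getBytes(StandardCharsets.UTF_8));
    event.getHeaders().put("host", host);
    return event;
  }

  public static void main(String[] args) {
    Event e = wrap("GET /index.html 200", "web01");
    System.out.println(e.getHeaders() + " " + e.getBody().length + " bytes");
  }
}
```

The body stays a plain byte array, so nothing downstream needs to understand the log format. Only the headers are meant to be inspected for routing.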

Core concept: Client

A Client is an entity that wraps raw log data into events and sends them to one or more Agents.

For example

Flume log4j Appender

You can customize a specific Client using Client SDK (org.apache.flume.api)

The aim is to decouple Flume from the data source system

A Client is not required in a Flume topology

Core concept: Agent

An Agent contains Sources, Channels, Sinks, and other components that it uses to transfer events from one node to another or the final destination.

The Agent is the basic building block of a Flume flow.

Flume provides configuration, life cycle management, and monitoring support for these components.

Core concept: Source

A Source is responsible for receiving events, or for generating them through a special mechanism, and for placing them in batches on one or more Channels. There are two types of Source: event-driven and pollable.

Different types of Source:

Sources that integrate with well-known systems: Syslog, Netcat

Sources that automatically generate events: Exec, SEQ

IPC Sources for Agent-to-Agent communication: Avro

A Source must be associated with at least one Channel

Core concept: Channel

A Channel sits between a Source and a Sink and buffers events. Once a Sink has successfully sent events to the next hop's Channel or to the final destination, those events are removed from the Channel.

Different Channels provide different levels of persistence:

Memory Channel: volatile

File Channel: implemented on a write-ahead log (WAL)

JDBC Channel: implemented on an embedded database

Channels support transactions

Channels provide only weak ordering guarantees

A Channel can work with any number of Sources and Sinks
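The removal rule described above (events leave the Channel only after a successful delivery) can be sketched roughly as follows. `SketchChannel`, `ChannelSketch` and the `deliverOk` flag are hypothetical stand-ins written for this illustration. The sketch is single-threaded and in-memory, and it is not Flume's Memory Channel or its transaction API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical in-memory channel with a single open transaction.
class SketchChannel {
  private final Deque<String> queue = new ArrayDeque<>();
  private final List<String> taken = new ArrayList<>(); // taken but not yet committed

  void put(String event) { queue.addLast(event); }

  // Hands out the next event, or null when empty; it stays recoverable until commit().
  String take() {
    String event = queue.pollFirst();
    if (event != null) taken.add(event);
    return event;
  }

  // Delivery succeeded: the taken events are dropped for good.
  void commit() { taken.clear(); }

  // Delivery failed: the taken events return to the head of the queue in their original order.
  void rollback() {
    for (int i = taken.size() - 1; i >= 0; i--) queue.addFirst(taken.get(i));
    taken.clear();
  }

  int size() { return queue.size(); }
}

class ChannelSketch {
  // Takes up to batchSize events; deliverOk stands in for the result of the downstream write.
  static int drain(SketchChannel channel, int batchSize, boolean deliverOk) {
    int n = 0;
    while (n < batchSize && channel.take() != null) n++;
    if (deliverOk) {
      channel.commit();
      return n;
    }
    channel.rollback();
    return 0;
  }

  public static void main(String[] args) {
    SketchChannel channel = new SketchChannel();
    channel.put("a");
    channel.put("b");
    channel.put("c");
    System.out.println("failed batch delivered " + drain(channel, 2, false) + ", left " + channel.size());
    System.out.println("good batch delivered " + drain(channel, 2, true) + ", left " + channel.size());
  }
}
```

A failed batch costs nothing but a retry, because the events are put back before anything else can observe the gap.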

Core concept: Sink

Sink is responsible for transferring the events to the next hop or final destination, and removes the events from the channel upon successful completion.

Different types of Sinks:

Terminal Sinks that store events at the final destination, for example HDFS, HBase

Sinks that automatically consume events, for example Null Sink

IPC Sinks for inter-Agent communication: Avro

A Sink must be attached to exactly one Channel

Core concept: Interceptor

Interceptors are a set of components attached to a Source that decorate and filter events, where necessary, in a preset order.

The built-in Interceptors let you add event headers such as a timestamp, a hostname, or a static tag.

A custom Interceptor can create specific headers where necessary by inspecting the event payload (reading the original log).
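A rough sketch of such a chain is shown below: each interceptor either returns the (possibly decorated) event or returns null to filter it out, and they run in list order. `LogEvent`, `SketchInterceptor`, the `datacenter` and `level` headers, and the `ERROR` prefix check are all assumptions made for this example rather than Flume's built-in interceptors.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event type with a text body, kept small for the example.
class LogEvent {
  final Map<String, String> headers = new HashMap<>();
  final String body;
  LogEvent(String body) { this.body = body; }
}

// Hypothetical interceptor contract: return the event, or null to drop it.
interface SketchInterceptor {
  LogEvent intercept(LogEvent event);
}

class InterceptorSketch {
  // Decorator: adds a fixed header to every event.
  static SketchInterceptor staticHeader(String key, String value) {
    return e -> { e.headers.put(key, value); return e; };
  }

  // Custom decorator: derives a header by inspecting the payload.
  static SketchInterceptor levelFromBody() {
    return e -> { e.headers.put("level", e.body.startsWith("ERROR") ? "error" : "info"); return e; };
  }

  // Filter: drops events whose body is blank.
  static SketchInterceptor dropBlank() {
    return e -> e.body.trim().isEmpty() ? null : e;
  }

  // Runs every event through the chain in order; a null result stops the chain for that event.
  static List<LogEvent> apply(List<SketchInterceptor> chain, List<LogEvent> events) {
    List<LogEvent> out = new ArrayList<>();
    for (LogEvent event : events) {
      LogEvent current = event;
      for (SketchInterceptor step : chain) {
        current = step.intercept(current);
        if (current == null) break;
      }
      if (current != null) out.add(current);
    }
    return out;
  }

  public static void main(String[] args) {
    List<SketchInterceptor> chain = new ArrayList<>();
    chain.add(dropBlank());
    chain.add(staticHeader("datacenter", "dc1"));
    chain.add(levelFromBody());
    List<LogEvent> events = new ArrayList<>();
    events.add(new LogEvent("ERROR disk full"));
    events.add(new LogEvent("   "));
    for (LogEvent e : apply(chain, events)) System.out.println(e.headers + " " + e.body);
  }
}
```

Putting the filter first means later interceptors never spend work on events that would be dropped anyway, which is one reason the order of the chain matters.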

Core concept: Channel Selector

A Channel Selector lets a Source choose one or more Channels, from all the Channels attached to it, based on preset criteria.

Built-in Channel Selectors:

Replicating: the event is copied to every relevant Channel

Multiplexing: the event is routed to specific Channels based on its headers
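The two selection strategies can be sketched as plain functions over channel names. This is a simplified illustration under assumptions: channels are represented by strings, and the `state` header, the mapping table and the default list are hypothetical values, not taken from a real Flume configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SelectorSketch {
  // Replicating: every attached channel receives a copy of the event.
  static List<String> replicating(List<String> channels) {
    return new ArrayList<>(channels);
  }

  // Multiplexing: the value of one header picks the channels; unmapped or missing values use the default.
  static List<String> multiplexing(Map<String, String> headers, String headerName,
                                   Map<String, List<String>> mapping, List<String> defaultChannels) {
    String value = headers.get(headerName);
    List<String> chosen = (value == null) ? null : mapping.get(value);
    return (chosen != null) ? chosen : defaultChannels;
  }

  public static void main(String[] args) {
    Map<String, List<String>> mapping = new HashMap<>();
    mapping.put("US", Arrays.asList("c2", "c3"));
    mapping.put("CZ", Arrays.asList("c1"));

    Map<String, String> headers = new HashMap<>();
    headers.put("state", "US");

    System.out.println(replicating(Arrays.asList("c1", "c2", "c3")));
    System.out.println(multiplexing(headers, "state", mapping, Arrays.asList("c4")));
  }
}
```

The default list is what keeps an event with an unexpected header value from being silently lost.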

Core concept: Sink Processor

Multiple Sinks can form a Sink Group. A Sink Processor is responsible for activating a Sink from a specified Sink Group. It can balance load across all the Sinks in the group, or fail over to another Sink when one fails.

Flume implements load balancing and failover through Sink Processors

Built-in Sink Processors:

Load Balancing Sink Processor: uses RANDOM, ROUND_ROBIN, or a custom selection algorithm

Failover Sink Processor

Default Sink Processor (single Sink)

All Sinks obtain events by polling their Channel. The polling is driven by the Sink Runner

The Sink Processor acts as a proxy for the Sinks
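The failover and round-robin behaviours can be sketched as below. This is a minimal illustration under assumptions: each Sink is modelled as a `Predicate<String>` that returns true when delivery succeeds, failover priority is simply list order, and `SinkProcessorSketch` is a hypothetical class rather than one of Flume's processors.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class SinkProcessorSketch {
  private int next = 0; // round-robin cursor: index of the sink to try first

  // Failover: try sinks in priority order (here, list order); return the index that accepted, or -1.
  static int failover(List<Predicate<String>> sinksByPriority, String event) {
    for (int i = 0; i < sinksByPriority.size(); i++) {
      if (sinksByPriority.get(i).test(event)) return i;
    }
    return -1;
  }

  // Round robin: start at the cursor, skip sinks that fail, then move the cursor past the one that accepted.
  int roundRobin(List<Predicate<String>> sinks, String event) {
    for (int attempt = 0; attempt < sinks.size(); attempt++) {
      int i = (next + attempt) % sinks.size();
      if (sinks.get(i).test(event)) {
        next = (i + 1) % sinks.size();
        return i;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    List<Predicate<String>> sinks = new ArrayList<>();
    sinks.add(e -> false); // sink 0 is down
    sinks.add(e -> true);
    sinks.add(e -> true);

    System.out.println("failover chose sink " + failover(sinks, "event-1"));
    SinkProcessorSketch processor = new SinkProcessorSketch();
    for (int n = 0; n < 3; n++) {
      System.out.println("round robin chose sink " + processor.roundRobin(sinks, "event-" + n));
    }
  }
}
```

In both strategies the caller sees a single delivery attempt, which is the sense in which the processor stands in front of the individual Sinks.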

