2025-02-22 Update From: SLTechnology News&Howtos
This article introduces the core concepts of Apache Flume. The material is simple and practical, so let's walk through each concept in turn.
Core concept
Event
Client
Agent
Sources, Channels, Sinks
Other components: Interceptors, Channel Selectors, Sink Processors
Core concept: Event
An Event is the basic unit of Flume data transfer. Flume transports data from its source to the final destination in the form of events. An Event consists of optional headers and a byte array containing the data.
The data payload is opaque to Flume.
Headers are an unordered collection of key-value string pairs, and each key is unique within the collection.
Headers can be used by extensions for context-based routing.
public interface Event {
    public Map<String, String> getHeaders();
    public void setHeaders(Map<String, String> headers);
    public byte[] getBody();
    public void setBody(byte[] body);
}
Core concept: Client
A Client is an entity that wraps raw log data into events and sends them to one or more Agents.
For example:
the Flume log4j Appender
a custom Client built with the Client SDK (org.apache.flume.api)
The aim is to decouple Flume from the system that generates the data.
A Client is not a required part of a Flume topology.
Core concept: Agent
An Agent contains Sources, Channels, Sinks, and other components that it uses to transfer events from one node to another or the final destination.
An Agent is the basic building block of a Flume data flow.
Flume provides configuration, life cycle management, and monitoring support for these components.
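To make the Agent's role concrete, here is a minimal sketch of a Flume agent configuration wiring one Source to one Sink through one Channel. The agent name `a1`, the component names, and the port are assumptions for illustration, not values from this article:

```properties
# Hypothetical agent "a1": names the components it manages
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for lines on a TCP port (port 44444 is an assumption)
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1

# Channel: in-memory buffer between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: log events at INFO level (useful for testing)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```

Note that a Source is bound to its Channels with the plural key `channels`, while a Sink drains exactly one Channel via the singular key `channel`.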
Core concept: Source
A Source is responsible for receiving events, or generating events through a special mechanism, and placing them in batches onto one or more Channels. There are two types of Source: event-driven and pollable.
Different types of Sources:
Sources that integrate with well-known systems: Syslog, Netcat
Sources that generate events automatically: Exec, SEQ
IPC Sources for Agent-to-Agent communication: Avro
A Source must be associated with at least one Channel.
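As an illustration of an Exec source, the fragment below tails a log file and feeds each new line into a channel. The agent name, channel name, and log path are assumptions for the example:

```properties
# Exec source: run a command and turn each output line into an event
a1.sources = r1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log
a1.sources.r1.channels = c1
```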
Core concept: Channel
A Channel sits between a Source and a Sink and buffers events. Events are removed from the Channel only after the Sink has successfully delivered them to the next hop's Channel or to the final destination.
Different Channels provide different levels of durability:
Memory Channel: volatile
File Channel: based on a Write-Ahead Log (WAL)
JDBC Channel: based on an embedded database
Channels support transactions
and provide a weak ordering guarantee.
A Channel can work with any number of Sources and Sinks.
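To contrast the volatile Memory Channel above with the durable File Channel, here is a sketch of a File Channel configuration. The directory paths are assumptions; in practice they should live on a disk with enough space for the WAL:

```properties
# File channel: events survive agent restarts via the write-ahead log
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data
```

The trade-off is throughput for durability: the Memory Channel is faster but loses buffered events if the agent dies.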
Core concept: Sink
Sink is responsible for transferring the events to the next hop or final destination, and removes the events from the channel upon successful completion.
Different types of Sinks:
Terminal Sinks that store events at the final destination. For example: HDFS, HBase
Sinks that simply consume events. For example: Null Sink
IPC Sinks for Agent-to-Agent communication: Avro
A Sink must work with exactly one Channel.
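As an example of a terminal Sink, the fragment below sketches an HDFS sink. The namenode address and path are assumptions; the `%Y-%m-%d` escapes require a `timestamp` header on each event (e.g. from a timestamp interceptor) unless `hdfs.useLocalTimeStamp` is enabled:

```properties
# HDFS sink: write events into date-partitioned directories
a1.sinks = k1
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 300
```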
Core concept: Interceptor
A chain of Interceptors attached to a Source that decorates and filters events as needed, in a preset order.
Built-in Interceptors can add event headers such as a timestamp, the hostname, or a static tag.
Custom Interceptors can introspect the event payload (read the raw log) and create specific headers where necessary.
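The built-in header-adding interceptors mentioned above can be chained on a Source like this (source name and header name are assumptions for the example):

```properties
# Interceptor chain on source r1, applied in the listed order
a1.sources.r1.interceptors = i1 i2
# i1: stamp each event with the current time in the "timestamp" header
a1.sources.r1.interceptors.i1.type = timestamp
# i2: record the agent host in a custom header
a1.sources.r1.interceptors.i2.type = host
a1.sources.r1.interceptors.i2.hostHeader = hostname
```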
Core concept: Channel Selector
A Channel Selector allows a Source to select one or more Channels from all configured Channels based on preset criteria.
Built-in Channel Selectors:
Replicating: the event is copied to all associated Channels
Multiplexing: the event is routed to a specific Channel based on a header value
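A multiplexing selector can be sketched as follows. The header name `state` and the mapping values are assumptions for illustration: events whose `state` header is `CZ` go to channel `c1`, `US` goes to `c2`, and anything else falls back to the default:

```properties
# Source r1 feeds two channels, chosen per event by the "state" header
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2
a1.sources.r1.selector.default = c1
```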
Core concept: Sink Processor
Multiple Sinks can form a Sink Group. A Sink Processor is responsible for activating one Sink from a given Sink Group. A Sink Processor can load-balance across all the Sinks in the group, or fail over to another Sink when one fails.
Flume implements load balancing (Load Balancing) and failover (failover) through Sink Processor
Built-in Sink Processors:
Load Balancing Sink Processor-use RANDOM, ROUND_ROBIN, or custom selection algorithms
Failover Sink Processor
Default Sink Processor (single Sink)
All Sinks poll their Channel for events; this polling is driven by the Sink Runner.
The Sink Processor acts as a proxy in front of the Sinks.
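The load-balancing case above can be sketched as a sink group configuration. The group and sink names are assumptions; `k1` and `k2` would be two sinks defined elsewhere in the same agent:

```properties
# Sink group g1: round-robin events across two sinks
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
# temporarily blacklist a sink after a failure
a1.sinkgroups.g1.processor.backoff = true
```

Swapping `processor.type` to `failover` (with per-sink priorities) gives the failover behavior instead.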
At this point, you should have a deeper understanding of the core concepts of Flume. The best way to consolidate them is to try them out in practice.