Flume Introduction
Flume is a highly available, reliable, distributed system for massive log collection, aggregation, and transport, originally provided by Cloudera. The original version is known as Flume OG (original generation). As Flume's functionality expanded, Flume OG exposed shortcomings such as a bloated codebase, poorly designed core components, and non-standard core configuration; log transmission was especially unstable in Flume OG's final release, 0.94.0. To solve these problems, on October 22, 2011 Cloudera completed FLUME-728, a milestone change: the core components, core configuration, and code architecture were all refactored, and the refactored version is known collectively as Flume NG (next generation). Another reason for the change was to bring Flume under Apache, with Cloudera Flume renamed Apache Flume.
Flume NG
1. NG has only one kind of role node: the agent node.
2. The composition of the agent node has also changed: a Flume NG agent consists of Sources, Channels, and Sinks.
[Figure: composition of a Flume NG node]
[Figure: architecture with multiple agents connected in parallel]
Characteristics of Flume
Flume supports customized data senders in a logging system for collecting data; it also supports simple processing of that data and writing it to a variety of data receivers (such as text files, HDFS, HBase, etc.).
Flume's data flow is carried end to end by events. The Event is Flume's basic unit of data; it carries log data along with header information. Events are generated by sources of data outside the agent; when a Source captures an event, it formats it and pushes it into one or more Channels. A Channel can be thought of as a buffer that holds the event until a Sink has finished processing it.
The Sink is then responsible for persisting the log or pushing the event on to another Source.
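To make the Source → Channel → Sink flow concrete, here is a minimal single-agent sketch in Flume's properties-file configuration format, along the lines of the example in the official user guide. The agent name a1 and the port number are illustrative:

# example.conf: one source, one channel, one sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1
# netcat source: turns each line received on the port into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# memory channel: buffers events between source and sink
a1.channels.c1.type = memory
# logger sink: prints events to the agent's log
a1.sinks.k1.type = logger
# wire them together: the source pushes to the channel, the sink pulls from it
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

An agent with this configuration would typically be started with:
flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console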
Flume has high reliability
When a node fails, logs can be handed off to other nodes without loss. Flume provides three levels of reliability guarantee, from strongest to weakest:
1. End-to-end: the receiving agent first writes the event to disk; it deletes the event only after the data has been delivered successfully, and re-sends it if delivery fails.
2. Store on failure: the strategy also adopted by Scribe; when the data receiver crashes, data is written locally and sending resumes once the receiver recovers.
3. Best effort: data is sent to the receiver without any acknowledgment.
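In Flume NG terms, the end-to-end guarantee is obtained by using a durable channel: the file channel writes every event to disk inside a transaction before acknowledging it. A minimal sketch, with placeholder directory paths:

a1.channels = c1
a1.channels.c1.type = file
# where the channel keeps its checkpoint and event data on disk
a1.channels.c1.checkpointDir = /var/flume/checkpoint
a1.channels.c1.dataDirs = /var/flume/data

If the agent crashes, unconsumed events are recovered from the data directories on restart.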
Flume architecture composition and core concepts:
Client: where the data is produced; it runs in a separate thread.
Event: the data produced, which can be a log record, an Avro object, etc.; for a text file it is usually a single line.
Agent: the core component of Flume; Flume takes the agent as its smallest independent unit of operation. An agent is a JVM process, constructed from Sources, Channels, Sinks, and other components:
3.1 Source: collects data from the client and passes it to the Channel
Different Sources accept different data formats. For example, the spooling directory source monitors a specified folder for new files and reads their contents as soon as they appear (a configuration sketch follows the list below). The Source component can handle log data in many forms: Avro, Thrift, exec, JMS, spooling directory, netcat, sequence generator, syslog, HTTP, legacy, and custom sources.
Avro Source: supports the Avro protocol (actually Avro RPC); built in
Thrift Source: supports the Thrift protocol; built in
Exec Source: produces data from the standard output of a Unix command
JMS Source: reads data (messages, topics) from JMS systems
Spooling Directory Source: monitors a specified directory for new data
Twitter 1% firehose Source: continuously downloads Twitter data via the API; experimental
Netcat Source: listens on a port and turns each line of text passing through it into an Event
Sequence Generator Source: a sequence-generator data source that produces sequence data
Syslog Sources: read syslog data and generate events; both UDP and TCP are supported
HTTP Source: a data source based on HTTP POST or GET, supporting JSON and BLOB representations
Legacy Sources: compatible with older Flume OG Sources (version 0.9.x)
For details, please refer to the official website: flume.apache.org/FlumeUserGuide.html#flume-sources
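As a sketch of the spooling directory source described above, the following configuration watches a folder for completed files and emits each line as an event; the directory path is a placeholder:

a1.sources = r1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/flume-spool
# record the absolute path of the originating file in an event header
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1

Files must be complete (and never modified afterwards) before being dropped into the spool directory; Flume renames each file with a .COMPLETED suffix once it has been ingested.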
3.2 Channel: connects Sources and Sinks
A Channel is a bit like a queue: it is a staging pool that receives the output of a Source and holds each event until a Sink consumes it. Data is deleted from a Channel only once it has entered the next Channel or a Sink, so when a Sink write fails the delivery can simply be retried without losing data. For temporary storage, events can be kept in a memory Channel, JDBC Channel, file Channel, or custom Channel.
Memory Channel: event data is stored in memory
JDBC Channel: event data is stored in a persistent store; the current Flume Channel has built-in Derby support
File Channel: event data is stored in disk files
Spillable Memory Channel: event data is stored in memory and on disk; when the in-memory queue fills up, events are persisted to disk files (see the sketch after this list)
Pseudo Transaction Channel: for testing purposes only
Custom Channel: a custom Channel implementation
For details, please refer to the official website: flume.apache.org/FlumeUserGuide.html#flume-channels
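A sketch of the spillable memory channel mentioned above: it keeps a fast in-memory queue and spills to disk once that queue fills up. The capacities and paths below are illustrative, not recommendations:

a1.channels = c1
a1.channels.c1.type = SPILLABLEMEMORY
# events held in memory before overflowing to disk
a1.channels.c1.memoryCapacity = 10000
# events that may accumulate in the disk overflow
a1.channels.c1.overflowCapacity = 1000000
a1.channels.c1.checkpointDir = /var/flume/spill-checkpoint
a1.channels.c1.dataDirs = /var/flume/spill-data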
3.3 Sink: collects data from the Channel; each Sink runs in a separate thread
The Sink component sends data on to its destination; destinations include HDFS, logger, Avro, Thrift, IPC, file, null, HBase, Solr, and custom sinks.
For details, please refer to the official website: flume.apache.org/FlumeUserGuide.html#flume-sinks
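A typical production sink is HDFS. The sketch below writes events into time-bucketed directories; the path and roll settings are placeholders that would be tuned per deployment:

a1.sinks = k1
a1.sinks.k1.type = hdfs
# %Y-%m-%d is expanded from the event's timestamp header
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
# write plain text rather than the default SequenceFile format
a1.sinks.k1.hdfs.fileType = DataStream
# roll a new file every 5 minutes, regardless of size or event count
a1.sinks.k1.hdfs.rollInterval = 300
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1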
Flume can support:
1. Multi-level flume agents: multiple agents can be chained together, with each agent writing its data to the next one in the chain (see the sketch after this list).
2. Fan-in: a Source can accept input from multiple senders.
3. Fan-out: data can be fanned out to multiple destinations.
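A sketch of a two-level chain combined with fan-out: the first agent replicates its source's events into two channels, and forwards one stream over Avro to a second agent; host names, ports, and agent names are illustrative:

# agent1: fan out one source to two channels, forward one over Avro
agent1.sources = r1
agent1.channels = c1 c2
agent1.sinks = k1
agent1.sources.r1.channels = c1 c2
agent1.sources.r1.selector.type = replicating
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = collector.example.com
agent1.sinks.k1.port = 4545
agent1.sinks.k1.channel = c1
# a second sink draining c2 is omitted for brevity

# agent2: receives events from any number of upstream agents (fan-in)
agent2.sources = r1
agent2.sources.r1.type = avro
agent2.sources.r1.bind = 0.0.0.0
agent2.sources.r1.port = 4545
agent2.sources.r1.channels = c1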
3.4 Several other components
Interceptor: operates on the Source, decorating and filtering events as needed in a preconfigured order.
Channel Selector: lets the Source choose one or more Channels from all available Channels according to preset criteria.
Sink Processor: multiple Sinks can form a Sink Group; the Sink Processor can load-balance across all Sinks in the group, or fail over to another Sink when one fails. A combined sketch follows.
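Putting these three components together, a sketch: a timestamp interceptor on the source, a multiplexing channel selector that routes on a header, and a failover sink group. The header name, mapping values, and priorities are illustrative:

# interceptor: stamp each event with the current time
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

# channel selector: route events by the value of the "type" header
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
a1.sources.r1.selector.mapping.error = c1
a1.sources.r1.selector.mapping.info = c2
a1.sources.r1.selector.default = c2

# sink processor: k1 is preferred; fail over to k2 if k1 goes down
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
a1.sinkgroups.g1.processor.maxpenalty = 10000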