2025-01-16 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
How should you carry out an analysis of Flume? Many inexperienced people are unsure where to start, so this article summarizes the key concepts and how they fit together. I hope it helps you answer that question.
What is Flume?
Flume is a real-time log collection system developed by Cloudera that has been widely recognized and used in industry. Its initial releases are now collectively referred to as Flume OG (original generation) and belonged to Cloudera. As Flume's functionality expanded, the shortcomings of Flume OG were exposed: a bloated codebase, unreasonably designed core components, and non-standard core configuration. In the last Flume OG release, 0.94.0, unstable log transmission was particularly serious. To solve these problems, on October 22, 2011 Cloudera completed Flume-728, a milestone change to Flume that refactored the core components, core configuration, and code architecture. The refactored version is collectively referred to as Flume NG (next generation). Another reason for the change was Flume's move under Apache, with Cloudera Flume renamed Apache Flume.
Characteristics of flume:
Flume is a distributed, reliable, and highly available system for massive log collection, aggregation, and transmission. It supports customizing various data senders in the log system to collect data; at the same time, Flume can perform simple processing on the data and write it to various data recipients (such as text files, HDFS, HBase, etc.).
Flume's data flow is carried end to end by Events. An Event is Flume's basic unit of data: it carries the log data (as a byte array) along with header information. Events originate from data outside the Agent; when a Source captures an event, it applies specific formatting and then pushes the event into one or more Channels. A Channel can be thought of as a buffer that holds the event until a Sink has finished processing it. The Sink is responsible for persisting the log or pushing the event on to another Source.
Reliability of flume:
When a node fails, logs can be routed to other nodes without being lost. Flume provides three levels of reliability guarantee, from strongest to weakest:
end-to-end: the agent first writes the event to disk when data is received, and deletes it only after the data has been transferred successfully; if sending fails, the data can be re-sent.
Store on failure: this is also the strategy adopted by Scribe; when the data receiver crashes, data is written locally, and sending resumes after the receiver recovers.
Besteffort: after data is sent to the receiver, no acknowledgment is required.
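The strongest level, end-to-end, can be sketched as "persist before send, delete after acknowledgment." The following is a toy Python model of that idea, not Flume's real implementation; `send_end_to_end` and `flaky_receiver` are hypothetical names for illustration:

```python
import os
import tempfile

def send_end_to_end(event: bytes, receiver, spool_dir: str) -> bool:
    """Write the event to disk first; delete it only after a successful send."""
    path = os.path.join(spool_dir, "event.dat")
    with open(path, "wb") as f:        # 1. persist before attempting delivery
        f.write(event)
    try:
        receiver(event)                # 2. attempt delivery downstream
    except ConnectionError:
        return False                   # 3. send failed: keep the file for re-send
    os.remove(path)                    # 4. delivered: safe to delete
    return True

# Demo: a receiver that fails on the first attempt, then succeeds.
attempts = {"n": 0}
def flaky_receiver(event: bytes):
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise ConnectionError("receiver down")

spool = tempfile.mkdtemp()
first = send_end_to_end(b"log line", flaky_receiver, spool)   # fails, file kept
second = send_end_to_end(b"log line", flaky_receiver, spool)  # succeeds, file deleted
print(first, second)  # False True
```

Store-on-failure differs only in that persistence kicks in after a send fails rather than before every send, and Besteffort skips the acknowledgment step entirely.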
Recoverability of flume:
Recoverability also relies on the Channel. FileChannel is recommended: events are persisted in the local file system, at the cost of lower performance.
Some core concepts of flume:
Agent: a JVM process that runs Flume. Each machine runs one agent, but a single agent can contain multiple sources and sinks.
Client: produces the data; runs in a separate thread.
Source: collects data from the Client and passes it to a Channel.
Sink: collects data from a Channel; runs in a separate thread.
Channel: connects Sources and Sinks; it behaves somewhat like a queue.
Event: the unit of data flow; it can be a log record, an Avro object, and so on.
Flume takes the agent as its smallest independent unit of operation. An agent is a JVM process, and a single agent consists of three major components: Source, Sink, and Channel.
It is worth noting that Flume provides a large number of built-in Source, Channel, and Sink types. Different types of Source, Channel, and Sink can be combined freely; the combination is driven by a configuration file written by the user, which is very flexible. For example, a Channel can hold events temporarily in memory or persist them to the local hard disk, and a Sink can write logs to HDFS, HBase, or even another Source. Flume also allows users to build multi-level flows, that is, multiple agents working together, with support for fan-in, fan-out, contextual routing, and backup routes, which is where Flume really shines.
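Such a configuration file is a plain Java-properties file. Below is a minimal sketch of one possible agent definition; the agent name `a1`, the component names, the port, and the HDFS path are all illustrative:

```properties
# Name the components of agent a1 (names are illustrative)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listen for netcat-style text lines on a local port
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffer events in memory (use type 'file' for the durable FileChannel)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Sink: write events out to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode/flume/events

# Wire the pieces together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Swapping the channel type from `memory` to `file`, or pointing the sink at another agent instead of HDFS, requires only edits to this file, which is the flexibility the paragraph above describes.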
After reading the above, have you mastered how to analyze Flume? If you want to learn more skills or dig deeper, you are welcome to follow the industry information channel. Thank you for reading!