This article mainly introduces what Flume is used for. Many people have questions about using Flume in daily work, so the editor has consulted various materials and organized them into simple, easy-to-follow steps. I hope it helps answer the question "what is the use of flume?" Now, please follow along and study!
1. Basic introduction to flume
(1) Commonly used data collection tools
- Chukwa (Apache)
- Scribe (Facebook)
- Fluentd: developed in C/Ruby; uses JSON to represent log data in a unified format.
- Logstash: the "L" in the well-known open source ELK stack (ElasticSearch, Logstash, Kibana).
- Flume (Apache): an open source data collection system that is highly reliable, highly scalable, easy to manage, and supports custom extensions.
(2) Why use data collection tools?
First, take a look at the overall workflow of a typical Hadoop pipeline:
Data collection -> data cleaning / ETL (extract, transform, load) -> data storage -> data computation and analysis -> data presentation
Among these stages, data collection is essential to every data system; without data, everything else is empty talk.
What are the characteristics of a data collection system?
- It builds a bridge between applications and analysis systems, supporting both real-time online analysis systems and offline analysis systems such as Hadoop.
- It is highly scalable: as data volume grows, it can scale out horizontally by adding nodes.
(3) Why use flume?
- Apache Flume is a distributed, reliable, and highly available system for collecting, aggregating, and transporting massive volumes of log data. Like Sqoop, it is a data collection component, but Sqoop collects data from relational databases, while Flume collects streaming data.
- The name Flume comes from its origin as a near-real-time log collection tool; it is now widely used to collect any kind of streaming event data, and it supports aggregating data from many sources into HDFS.
- Common collection requirements can be met through simple Flume configuration, and Flume can be customized and extended for special scenarios, so it can be applied to most day-to-day data collection needs.
- Flume's advantages: horizontal scaling, extensibility, and reliability.
(4) Introduction to the old and new flume architectures
Next, take a very simple scenario as an example: collecting logs from a web server into HDFS.
NG architecture (next generation):
Application system (web server) -> Flume log collection (source, channel, sink) -> HDFS (data storage)
Where:
source: the data source (reads the original log files)
channel: the data channel (a buffer that smooths out mismatched read and write speeds)
sink: the data destination (writes the collected data to its final destination)
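To make this concrete, here is a minimal configuration sketch of the web-server-to-HDFS scenario. This is an illustration only: the agent name (a1), the log path (/var/log/nginx/access.log), and the HDFS address (hadoop01:9000) are assumptions, not values from this article.
# Sketch: tail a web server log with an exec source and write it to HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/nginx/access.log
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.k1.type = hdfs
# write plain text files, bucketed by date (assumed path)
a1.sinks.k1.hdfs.path = hdfs://hadoop01:9000/flume/weblogs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
# supply the timestamp needed by the %Y-%m-%d escapes
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1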
OG architecture (original generation, before 0.9):
Agent node (agent) -> Collector node (collector) -> Master node (master)
Agents collect log data from various data sources and forward it to a collector; the collector aggregates the data into HDFS. The master is responsible for managing the activity of the agents and collectors.
2. Flume architecture and core components
(1) flume architecture
Flume's data flow is carried by events, the basic data unit of Flume. An event carries log data (in the form of a byte array) together with header information. Events are generated by data sources outside the agent; when the source captures an event, it formats it and then pushes it into one or more channels. Think of a channel as a buffer that holds the event until a sink has finished processing it. The sink is responsible for persisting the log or pushing the event on to another source.
Flume's smallest independent unit of operation is the agent: each agent is a JVM process, and a single agent is composed of source, channel, and sink components.
(2) Core components of flume
Event
An event is the basic unit of Flume data transport: Flume moves data from the source to the final destination in the form of events. An event consists of an optional header and a byte array containing the data payload. The header is an unordered collection of key-value string pairs, and each key is unique within the collection.
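As an illustration of headers, Flume's built-in static interceptor stamps a fixed key-value pair into every event's header. A minimal sketch (the key and value shown, datacenter and bj-01, are made-up examples):
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
# add the header pair datacenter=bj-01 to every event passing through r1
a1.sources.r1.interceptors.i1.key = datacenter
a1.sources.r1.interceptors.i1.value = bj-01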
Agent
An agent is the basic building block of a Flume flow. An agent contains sources, channels, sinks, and other components, and uses them to transfer events from one node to the next or to the final destination.
Source
A source receives events, or generates them through a special mechanism, and places them in batches into one or more channels.
Channel
A channel sits between a source and a sink and buffers events. Once the sink has successfully delivered an event to the next hop or the final destination, the event is removed from the channel.
Sink
A sink is responsible for transporting events to the next hop or the final destination and, on success, removing them from the channel.
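Because the channel is the buffer between source and sink, its sizing is usually the first thing to tune. A sketch of the memory channel's two most common parameters (the numbers are illustrative assumptions, not recommendations):
a1.channels.c1.type = memory
# maximum number of events the channel can hold
a1.channels.c1.capacity = 10000
# maximum number of events per transaction between a source or sink and the channel
a1.channels.c1.transactionCapacity = 1000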
3. Setting up flume
Setting up Flume is very simple; it is essentially just unpacking the archive. However, since Flume is usually connected to a big data platform, the JDK (and Hadoop, if you write to HDFS) should already be installed and working.
Installation
- upload the installation package
- extract the installation package
- configure environment variables
- modify the configuration file:
[hadoop@hadoop01 ~] cd /application/flume/conf
[hadoop@hadoop01 ~] mv flume-env.sh.template flume-env.sh
[hadoop@hadoop01 ~] vim flume-env.sh
# in flume-env.sh, only this one line needs to be modified:
export JAVA_HOME=/application/jdk1.8
- test whether the installation succeeded
[hadoop@hadoop01 ~] flume-ng version
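If the installation is working, output roughly like the following appears (the version and checksum lines below assume Flume 1.8.0 purely for illustration; yours will differ):
Flume 1.8.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: ...
Compiled by ... on ...
From source with checksum ...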
Seeing version information like this indicates that the installation succeeded!
Note: in general, install Flume on every machine from which you need to collect data.
4. Using flume for the first time
All Flume operations are driven by configuration files, so a configuration file must be written first (its name must end in .conf or .properties).
Here is a very simple example of how to use flume:
Configuration file:
# example.conf
# "a1" is the name of the agent. It can be customized, but agent names on the same node must be unique.
# Name the sources, sinks, and channels of this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the source: watch a directory for new files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/flumedata
# Configure the channel
a1.channels.c1.type = memory
# Configure the sink
a1.sinks.k1.type = logger
# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
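Note: by default, once the spooldir source has fully ingested a file it renames the file with a .COMPLETED suffix, and files must not be modified after being placed in the watched directory, otherwise the source reports an error.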
Prepare the test environment by creating the directory that the source will monitor (the value of a1.sources.r1.spoolDir):
mkdir -p /home/hadoop/flumedata
Start the agent:
flume-ng agent --conf conf --conf-file /home/hadoop/example.conf --name a1 -Dflume.root.logger=INFO,console
# --conf: the Flume configuration directory (where flume-env.sh lives)
# --conf-file: the agent configuration file written above
# --name: the agent name; it must match the name used in the file (a1)
# -Dflume.root.logger: log to the console at INFO level so collected events are visible
Then move a file with some content into the directory Flume is watching (/home/hadoop/flumedata), for example:
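A minimal test, assuming the paths from the configuration above (the file name and text are arbitrary):
echo "hello flume" > /home/hadoop/test.log
mv /home/hadoop/test.log /home/hadoop/flumedata/
The logger sink then prints each collected event to the console in a form similar to the following (exact formatting depends on the Flume version):
Event: { headers:{} body: 68 65 6C 6C 6F 20 66 6C 75 6D 65    hello flume }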
At this point, the console window shows the events printed by the logger sink: the content was collected successfully!
At this point, the study of "what is the use of flume" is over. I hope it has resolved your doubts. Pairing theory with practice is the best way to learn, so go and try it! If you want to continue learning more related knowledge, please keep following the site; the editor will keep working hard to bring you more practical articles!