What are the entry-level knowledge points of Flume

2025-04-21 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

This article explains the basic knowledge points of Flume in detail. I found them very practical, so I am sharing them here as a reference. I hope you get something out of reading it.

1. What is Flume?

○ Flume is a real-time log collection system developed by Cloudera.

○ The core concept is the Agent (proxy node): a Java process that runs on the log collection node.

○ Up to and including version 0.9.4, Flume was known as Cloudera Flume OG. Because of various defects in those versions, Flume was redesigned and renamed Apache Flume NG (starting with 1.0.0).

○ Flume NG vs. Flume OG

- All current versions are Flume NG (version 1.0.0 and later)

Architecture:

□ Flume OG has three node roles: the agent (proxy node), the collector (collection node), and the master (primary node).

□ The agent collects log data from various data sources and forwards it to the collector, which then stores it in HDFS. The master manages the activities of the agents and collectors.

□ Agents and collectors are both called nodes. Depending on configuration, nodes are divided into logical nodes and physical nodes, and distinguishing, configuring, and using logical nodes is very complicated.

□ Agents and collectors are each composed of a source and a sink; data on a node flows from the source to the sink.

By comparison, in Flume NG:

□ Flume NG has only one node role: the agent (proxy node).

□ There are no collector or master nodes; this is the core change.

□ The concepts of logical and physical nodes are removed.

□ The composition of the agent node has changed: it now consists of a source, a channel, and a sink.

Regarding ZooKeeper:

□ The stability of Flume OG depends on ZooKeeper, which it needs in order to manage its several classes of nodes. OG can also keep node management information in memory, but users must then accept losing that information if the machine fails.

□ Flume NG reduces the number of node roles from three to one, so there is no multi-role coordination problem. ZooKeeper is no longer needed to coordinate the nodes, and the dependence on it is removed.

2. Three components of Flume

An Agent process consists of three components: the Source component, the Channel component, and the Sink component. The Source collects log data and sends it to the Channel. The Channel acts as a pipeline that buffers the data, and the Sink reads the data from the Channel and sends it on to another target or a file system.

Source component: collects log data and can handle many types of input, such as Avro, Thrift, Exec, JMS, Spooling Directory, Twitter, Kafka, NetCat, Sequence Generator, Syslog, HTTP, Stress, Legacy, Custom, and Scribe.

Channel component: temporarily stores the data in transit. The backing store can be Memory, JDBC, Kafka, File, Spillable Memory, Pseudo Transaction, or Custom.

Sink component: sends the data stored in the Channel component onward. Available sinks include HDFS, Hive, Logger, Thrift, IRC, File Roll, Null, HBase, MorphlineSolr, ElasticSearch, Kite Dataset, Kafka, and Custom.
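As a minimal sketch of how the three components are wired together in an agent's properties file, the fragment below connects a NetCat source to a Logger sink through a Memory channel. The agent and component names (a1, r1, c1, k1), the bind address, the port, and the capacity values are placeholders chosen for illustration and do not come from this article.

```properties
# Name the components of agent a1 (all names here are illustrative)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: listens on a TCP port and turns each line of text into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Channel: buffers events in memory between the source and the sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Sink: writes each event to the agent's log
a1.sinks.k1.type = logger

# Wiring: a source may write to several channels (plural key "channels"),
# while a sink reads from exactly one channel (singular key "channel")
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Swapping a component type (for example, memory to file for the channel) changes only the lines for that component; the wiring lines at the end stay the same.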

Note:

○ For the specific usage of each component, please refer to the official documentation: http://flume.apache.org/FlumeUserGuide.html#flume-sink-processors

○ Data in the Channel is not deleted until the Sink has sent it successfully.

○ What flows through the whole data transmission process is the event, which can be understood as the basic unit of data transmission in Flume. Each event represents one record of data, and transaction guarantees are at the event level.

○ Flume supports multi-level (chained) agents as well as fan-in and fan-out.
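A multi-level flow can be sketched by pointing the Avro sink of one agent at the Avro source of another. In the fragment below, the agent names (edge, central), the host name collector-host, and port 4545 are hypothetical; the two agents would normally run on different machines and be started separately by name. Fan-in follows the same pattern, with several edge agents sending to the same central agent.

```properties
# Agent "edge": runs on the machine that produces the logs
edge.sources = r1
edge.channels = c1
edge.sinks = k1

edge.sources.r1.type = spooldir
edge.sources.r1.spoolDir = /home/data/logs
edge.sources.r1.channels = c1

edge.channels.c1.type = memory

# Avro sink: forwards events over the network to the next agent
edge.sinks.k1.type = avro
edge.sinks.k1.hostname = collector-host
edge.sinks.k1.port = 4545
edge.sinks.k1.channel = c1

# Agent "central": runs on collector-host (hypothetical host name)
central.sources = r1
central.channels = c1
central.sinks = k1

# Avro source: receives the events sent by upstream Avro sinks
central.sources.r1.type = avro
central.sources.r1.bind = 0.0.0.0
central.sources.r1.port = 4545
central.sources.r1.channels = c1

central.channels.c1.type = memory

central.sinks.k1.type = logger
central.sinks.k1.channel = c1
```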

Note:

○ Data can be sent to multiple destinations: a source can write to several channels, each of which is read by its own sink.
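Delivery to multiple destinations can be sketched with a replicating channel selector, which copies every event from one source into each listed channel. The names (a1, r1, c1, c2, k1, k2), the port, and the output directory /tmp/flume-out are assumptions made for illustration.

```properties
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# The source writes every event to both channels
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

a1.channels.c1.type = memory
a1.channels.c2.type = memory

# Each sink drains exactly one channel and delivers to its own destination
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1

a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /tmp/flume-out
a1.sinks.k2.channel = c2
```

Because each destination has its own channel, a slow or failed destination holds back only the events buffered in its own channel.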

3. Installation and configuration of Flume

○ Download

apache-flume-1.6.0-bin.tar.gz

JDK version: 1.6+

○ Extract the archive; the extracted directory serves as FLUME_HOME

tar -zxvf apache-flume-1.6.0-bin.tar.gz

○ Install the JDK and configure JAVA_HOME and FLUME_HOME

vi /etc/profile

export FLUME_HOME=/home/app/flume

export PATH=$PATH:$FLUME_HOME/bin

○ A simple example: monitor the /home/data/logs directory and upload files to HDFS as soon as they appear.

□ First, write a configuration file named example.conf

# agent1 is the agent name
agent1.sources=source1
agent1.sinks=sink1
agent1.channels=channel1

# The Spooling Directory source watches the specified folder for new files.
# When a new file appears, its contents are parsed and written to the channel.
# When the write is complete, the file is marked as completed or deleted.
# configure source1
agent1.sources.source1.type=spooldir
# specify the monitored directory
agent1.sources.source1.spoolDir=/home/data/logs
agent1.sources.source1.channels=channel1
agent1.sources.source1.fileHeader=false
agent1.sources.source1.interceptors=i1
agent1.sources.source1.interceptors.i1.type=timestamp

# configure sink1
agent1.sinks.sink1.type=hdfs
agent1.sinks.sink1.hdfs.path=hdfs://master:9000/flume/data
agent1.sinks.sink1.hdfs.fileType=DataStream
agent1.sinks.sink1.hdfs.writeFormat=Text
agent1.sinks.sink1.hdfs.rollInterval=1
agent1.sinks.sink1.channel=channel1
agent1.sinks.sink1.hdfs.filePrefix=%Y-%m-%d

# configure channel1
agent1.channels.channel1.type=file
# checkpoint directory for the file channel
agent1.channels.channel1.checkpointDir=/home/data/channel_data.backup
# channel data storage directory
agent1.channels.channel1.dataDirs=/home/data/channel_data

□ Put the example.conf file in the $FLUME_HOME/conf folder

□ Start the agent process. The command must specify the agent name, the configuration directory, and the configuration file

Official format:

bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template

For this example, the command is:

bin/flume-ng agent -n agent1 -c conf -f conf/example.conf -Dflume.root.logger=DEBUG,console

The -Dflume.root.logger=DEBUG,console option prints log information to the console.

□ Open another terminal and copy a file into /home/data/logs

□ The file in /home/data/logs is renamed with the .COMPLETED suffix. Check that the file now exists under the configured HDFS path, for example with hdfs dfs -ls /flume/data.

This concludes the article on the basic knowledge points of Flume. I hope the content above is of some help and lets you learn something new. If you found the article useful, please share it so that more people can see it.
