How to collect HDFS by Flume 04/26 Update SLTechnology News&Howtos

How to collect data into HDFS with Flume

2025-04-26 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article introduces, with worked examples, how Flume collects data into HDFS. It covers three setups: tailing a single file, watching a folder, and tailing files and folders with recorded offsets.

1. Requirement:

Collect the contents of a specified file into HDFS.

Technology selection: exec source, memory channel, HDFS sink (exec-memory-hdfs)

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/data/data.log

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://192.168.0.129:9000/user/hadoop/flume
# Number of events written to the file before it is flushed to HDFS
a1.sinks.k1.hdfs.batchSize = 10
# File type: DataStream writes the events without compression
a1.sinks.k1.hdfs.fileType = DataStream
# Write format
a1.sinks.k1.hdfs.writeFormat = Text

# Use a channel which buffers events in memory
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
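
Component names in a Flume configuration are case-sensitive, so a sink bound to C1 does not attach to a channel declared as c1. The following is a minimal Python sketch of a pre-flight check for that kind of mistake. The helper names parse_properties and check_bindings are hypothetical (they are not part of Flume), and the parser only understands simple "key = value" lines.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blanks and '#' comment lines."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_bindings(props, agent="a1"):
    """Return problems where a source or sink names an undeclared channel."""
    declared = set(props.get(f"{agent}.channels", "").split())
    problems = []
    for source in props.get(f"{agent}.sources", "").split():
        key = f"{agent}.sources.{source}.channels"
        for channel in props.get(key, "").split():
            if channel not in declared:
                problems.append(f"{key}: unknown channel {channel!r}")
    for sink in props.get(f"{agent}.sinks", "").split():
        key = f"{agent}.sinks.{sink}.channel"
        channel = props.get(key)
        if channel is None:
            problems.append(f"{key}: sink has no channel")
        elif channel not in declared:
            problems.append(f"{key}: unknown channel {channel!r}")
    return problems


# Hypothetical fragment with a deliberate case mismatch on the sink binding.
EXAMPLE = """
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = C1
"""

if __name__ == "__main__":
    for problem in check_bindings(parse_properties(EXAMPLE)):
        print(problem)
```

Running such a check before starting the agent is only a convenience; Flume itself reports configuration problems in its log at startup.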

Start:

./flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file /home/hadoop/script/flume/exec-memory-hdfs.conf \
-Dflume.root.logger=INFO,console \
-Dflume.monitoring.type=http \
-Dflume.monitoring.port=34343
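
The two flume.monitoring options in the start command turn on Flume's built-in HTTP metrics reporting, which serves counters as JSON under /metrics on the chosen port. As a rough sketch of how those counters might be read, the Python below computes how full each channel is. The host name localhost, the function names, and the sample payload are assumptions for illustration; the sample numbers are made up rather than captured from a running agent.

```python
import json
from urllib.request import urlopen


def fetch_metrics(url="http://localhost:34343/metrics", timeout=5):
    """Fetch the agent's metrics document (host and port are assumptions)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def channel_fill(metrics):
    """Map each CHANNEL.* component to ChannelSize / ChannelCapacity."""
    fill = {}
    for component, counters in metrics.items():
        if not component.startswith("CHANNEL."):
            continue
        # Flume reports counter values as strings, so convert before dividing.
        size = float(counters["ChannelSize"])
        capacity = float(counters["ChannelCapacity"])
        fill[component] = size / capacity if capacity else 0.0
    return fill


# Illustrative payload only; these numbers are invented for the example.
SAMPLE = json.loads(
    '{"CHANNEL.c1": {"Type": "CHANNEL", "ChannelSize": "25", "ChannelCapacity": "100"},'
    ' "SOURCE.r1": {"Type": "SOURCE"}}'
)

if __name__ == "__main__":
    print(channel_fill(SAMPLE))
```

Against a live agent, channel_fill(fetch_metrics()) would be the call to make; a ratio that keeps climbing suggests the sink is not draining the channel as fast as the source fills it.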

Add test data:

[hadoop@hadoop001 data]$ touch data.log
[hadoop@hadoop001 data]$ echo test >> data.log

Check HDFS:

[hadoop@hadoop001 flume]$ hdfs dfs -text hdfs://192.168.0.129:9000/user/hadoop/flume/*
18/08/09 20:59:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
test
test
test
test
test

2. Requirement:

Collect the contents of a specified folder to HDFS or the console.

Files in the folder must not be modified or renamed once they have been placed there.

=> After a file has been fully processed, the .COMPLETED suffix is added to its name.
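
Because the spooling directory source expects every file to be complete and unchanged from the moment it appears, one common way to satisfy that rule is to write the file somewhere else and move it into the folder with a single rename. The Python below is a minimal sketch of that idea, not something prescribed by this article. The function name deliver_to_spool and the staging directory are hypothetical, and the sketch assumes the staging directory is on the same filesystem as the spool directory so that the rename is atomic.

```python
import os
import tempfile


def deliver_to_spool(content, spool_dir, staging_dir, name):
    """Write content in staging_dir, then rename it into spool_dir in one step."""
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(spool_dir, exist_ok=True)
    target = os.path.join(spool_dir, name)
    # Refuse to reuse a name that is pending or already marked as processed.
    if os.path.exists(target) or os.path.exists(target + ".COMPLETED"):
        raise FileExistsError(target)
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        # The file becomes visible in the spool directory only when complete.
        os.replace(tmp_path, target)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return target
```

The name check reflects the warning in Flume's documentation against reusing file names in a spooling directory. The check and the rename are separate steps, so this sketch does not protect against two writers racing on the same name.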

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/hadoop/data/
a1.sources.r1.fileHeader = true

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3. Requirement: (production use, records offsets)

Collect the contents of specified folders and files to the console or HDFS.

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
# Record offsets so that collection resumes where it left off after a restart
a1.sources.r1.positionFile = /home/hadoop/script/flume/taildir_position.json
a1.sources.r1.filegroups = f1 f2
# Monitor the specified log file
a1.sources.r1.filegroups.f1 = /home/hadoop/data/example.log
a1.sources.r1.headers.f1.headerKey1 = value1
# Monitor every file under the test folder whose name matches .*log.*
a1.sources.r1.filegroups.f2 = /home/hadoop/data/test/.*log.*
a1.sources.r1.headers.f2.headerKey1 = value2
a1.sources.r1.headers.f2.headerKey2 = value2-2

# Output to the console
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
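
In a TAILDIR file group the regular expression applies only to the file name, not to the directory part of the path, and it has to match the whole name. The short Python sketch below shows which names the pattern .*log.* from file group f2 would accept. The file names are invented for illustration, and Python's regex engine stands in for Java's here, which should behave the same for a pattern this simple.

```python
import re

# File-name part of file group f2; the directory part is matched literally.
PATTERN = re.compile(r".*log.*")


def matches(name):
    """Return True when the whole file name matches the pattern."""
    return PATTERN.fullmatch(name) is not None


# Hypothetical file names under /home/hadoop/data/test/.
names = ["log1.log", "app.log.1", "catalog.txt", "data.txt"]
matched = [name for name in names if matches(name)]

if __name__ == "__main__":
    print(matched)
```

Note that catalog.txt is picked up as well, because the pattern accepts "log" anywhere in the name. A tighter pattern such as .*\.log would be one way to restrict the group to files with a .log extension.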

Start:

./flume-ng agent \
--name a1 \
--conf $FLUME_HOME/conf \
--conf-file /home/hadoop/script/flume/taildir-memory-logger.conf \
-Dflume.root.logger=INFO,console

Recorded offsets:

[hadoop@hadoop001 flume]$ cat taildir_position.json
[{"inode":679982,"pos":14,"file":"/home/hadoop/data/example.log"},
{"inode":679984,"pos":0,"file":"/home/hadoop/data/test/log1.log"}]
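
The position file is a JSON array with one object per tailed file, holding its inode, the byte offset read so far (pos), and its path. As a rough illustration of how that record could be used for monitoring, the Python sketch below compares each pos with the current file size to estimate how many bytes have not been read yet. The function names are hypothetical, and the sketch assumes the position file is readable while the agent is running.

```python
import json
import os


def read_positions(position_file):
    """Load the position file: a JSON list of {"inode", "pos", "file"} objects."""
    with open(position_file, encoding="utf-8") as handle:
        return json.load(handle)


def unread_bytes(entries):
    """Return {path: bytes not yet consumed}, or None when the file is missing."""
    lag = {}
    for entry in entries:
        path = entry["file"]
        try:
            lag[path] = max(os.path.getsize(path) - int(entry["pos"]), 0)
        except OSError:
            lag[path] = None
    return lag


if __name__ == "__main__":
    entries = read_positions("/home/hadoop/script/flume/taildir_position.json")
    for path, remaining in unread_bytes(entries).items():
        print(path, remaining)
```

A value of 0 means the source has caught up with that file; None means the file recorded in the position file no longer exists at that path.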

That covers how Flume collects data into HDFS. Thank you for reading, and I hope it helps!
