The editor would like to share with you the new features of Flume 1.7. Most people do not know much about them, so this article is offered for your reference; I hope you learn a lot from it!
Before Flume 1.7, if you wanted to monitor a file for new content, the usual approach was an exec source running tail. This has a drawback: when the server goes down and restarts, the data is read from scratch, which is obviously not what we want. Before Flume 1.7 came out, the common workaround was to record the line number of the current record in a separate file as each record is read. After a crash and restart, we first read the line number of the last record from that file and then resume monitoring from there, ensuring the data is neither lost nor duplicated.
The exec source command in the configuration file is modified as follows:
a1.sources.r3.command = tail -n +$(tail -n 1 /root/nnn) -F /root/data/web.log | awk 'ARGIND==1{i=$0;next}{i++;print $0;print i >> "/root/nnn";fflush("")}' /root/nnn -
Here /root/data/web.log is the file being monitored and /root/nnn is the file that stores the number of lines already read.
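To make that one-liner easier to follow, here is a commented sketch of how the pipeline keeps its position, assuming gawk (for the ARGIND variable) and the same paths as above; treat it as illustrative rather than authoritative:

# Resume from the last saved line number, then keep following the log.
# tail -n +K prints from line K onward; -F keeps following across rotation.
tail -n +$(tail -n 1 /root/nnn) -F /root/data/web.log \
| awk 'ARGIND==1 { i = $0; next }   # first input (/root/nnn): load the saved line count
       {
         i++                        # one more line consumed
         print $0                   # pass the line through to the exec source
         print i >> "/root/nnn"     # append the new position to the record file
         fflush("")                 # flush at once so the position survives a crash
       }' /root/nnn -               # awk reads /root/nnn first, then stdin (-)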
Flume 1.7 introduces a new source type, Taildir, which can monitor multiple files in a directory and both read new content in real time and save the read position. Much more powerful! Let's take a look at the description on the official website:
Taildir Source
Note
This source is provided as a preview feature. It does not work on Windows.
Watch the specified files, and tail them in near real time once new lines are detected being appended to each file. If the new lines are still being written, this source will retry reading them while waiting for the write to complete.
This source is reliable and will not miss data even when the tailed files rotate. It periodically writes the last read position of each file to the given position file in JSON format. If Flume is stopped or goes down for some reason, it can restart tailing from the positions written in the existing position file.
In another use case, this source can also start tailing from an arbitrary position for each file, using the given position file. When there is no position file at the specified path, it will start tailing from the first line of each file by default.
Files will be consumed in order of their modification time. The file with the oldest modification time will be consumed first.
This source does not rename, delete, or otherwise modify the files being tailed. Currently this source does not support tailing binary files. It reads text files line by line.
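For reference, the position file is a JSON array with one entry per tailed file, recording the file's inode, the last read position in bytes, and its path. With the three files used in the example below, it might look something like this (the inode and pos values are illustrative):

[{"inode":138283,"pos":1024,"file":"/root/data/access.log"},
 {"inode":138284,"pos":512,"file":"/root/data/nginx.log"},
 {"inode":138285,"pos":2048,"file":"/root/data/web.log"}]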
Requirement: use Flume to monitor the contents of multiple files in a directory, and collect and store them in a Hadoop cluster in real time.
Example configuration:
a1.channels = ch2
a1.sources = s1
a1.sinks = hdfs-sink1
# channel
a1.channels.ch2.type = memory
a1.channels.ch2.capacity = 100000
a1.channels.ch2.transactionCapacity = 50000
# source
a1.sources.s1.channels = ch2
# monitor multiple files in a directory for new content
a1.sources.s1.type = TAILDIR
# save each file's consumed offset in JSON format so consumption does not restart from scratch
a1.sources.s1.positionFile = /var/local/apache-flume-1.7.0-bin/taildir_position.json
a1.sources.s1.filegroups = f1 f2 f3
a1.sources.s1.filegroups.f1 = /root/data/access.log
a1.sources.s1.filegroups.f2 = /root/data/nginx.log
a1.sources.s1.filegroups.f3 = /root/data/web.log
a1.sources.s1.headers.f1.headerKey = access
a1.sources.s1.headers.f2.headerKey = nginx
a1.sources.s1.headers.f3.headerKey = web
a1.sources.s1.fileHeader = true
# sink
a1.sinks.hdfs-sink1.channel = ch2
a1.sinks.hdfs-sink1.type = hdfs
a1.sinks.hdfs-sink1.hdfs.path = hdfs://master:9000/demo/data
a1.sinks.hdfs-sink1.hdfs.filePrefix = event_data
a1.sinks.hdfs-sink1.hdfs.fileSuffix = .log
# roll the HDFS file at 10 MB or every 20 seconds, never by event count
a1.sinks.hdfs-sink1.hdfs.rollSize = 10485760
a1.sinks.hdfs-sink1.hdfs.rollInterval = 20
a1.sinks.hdfs-sink1.hdfs.rollCount = 0
a1.sinks.hdfs-sink1.hdfs.batchSize = 1500
a1.sinks.hdfs-sink1.hdfs.round = true
a1.sinks.hdfs-sink1.hdfs.roundUnit = minute
a1.sinks.hdfs-sink1.hdfs.threadsPoolSize = 25
a1.sinks.hdfs-sink1.hdfs.useLocalTimeStamp = true
a1.sinks.hdfs-sink1.hdfs.minBlockReplicas = 1
a1.sinks.hdfs-sink1.hdfs.fileType = DataStream
a1.sinks.hdfs-sink1.hdfs.writeFormat = Text
a1.sinks.hdfs-sink1.hdfs.callTimeout = 60000
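Assuming the configuration above is saved as conf/taildir-hdfs.conf (the file name here is only an example), the agent can be started with the standard flume-ng command; the --name argument must match the agent name a1 used in the configuration:

bin/flume-ng agent --conf conf --conf-file conf/taildir-hdfs.conf --name a1 -Dflume.root.logger=INFO,console

Because each filegroup sets a headerKey header, every event carries access, nginx, or web as a header value; that header can be referenced as %{headerKey} in hdfs.path, for example, to route each log into its own HDFS directory.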
That is all the content of the article "What are the new features of Flume 1.7?". Thank you for reading! I hope the content shared here helps you; if you want to learn more, welcome to follow the industry information channel!