The editor would like to share with you the new features of Flume 1.7. Most people do not know much about them, so this article is offered for your reference; I hope you learn a lot from it!
Before Flume 1.7, if you wanted to monitor a file for new content, the usual approach was an exec source running tail. This has a drawback: when the server goes down and restarts, the data is read from scratch, which is obviously not what we want. Before Flume 1.7 came out, the common workaround was to record the line number of the current record in a separate file as each record is read. After a crash and restart, we first read the line number of the last record from that file and then resume monitoring from there, ensuring the data is neither lost nor duplicated.
The exec source command in the configuration file is modified as follows:
a1.sources.r3.command = tail -n +$(tail -n 1 /root/nnn) -F /root/data/web.log | awk 'ARGIND==1{i=$0;next}{i++;print $0;print i >> "/root/nnn";fflush("")}' /root/nnn -
Here /root/data/web.log is the file being monitored and /root/nnn is the file that stores the number of lines already read.
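To make that one-liner easier to follow, here is a commented sketch of how the pipeline keeps its position, assuming gawk (for the ARGIND variable) and the same paths as above; treat it as illustrative rather than authoritative:

# Resume from the last saved line number, then keep following the log.
# tail -n +K prints from line K onward; -F keeps following across rotation.
tail -n +$(tail -n 1 /root/nnn) -F /root/data/web.log \
| awk 'ARGIND==1 { i = $0; next }   # first input (/root/nnn): load the saved line count
       {
         i++                        # one more line consumed
         print $0                   # pass the line through to the exec source
         print i >> "/root/nnn"     # append the new position to the record file
         fflush("")                 # flush at once so the position survives a crash
       }' /root/nnn -               # awk reads /root/nnn first, then stdin (-)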
Flume 1.7 introduces a new source type, Taildir, which can monitor multiple files in a directory and both read new content in real time and save the read position. Much more powerful! Let's take a look at the description on the official website:
Taildir Source
Note
This source is provided as a preview feature. It does not work on Windows.
Watch the specified files, and tail them in near real time once new lines are detected being appended to each file. If the new lines are still being written, this source will retry reading them while waiting for the write to complete.
This source is reliable and will not miss data even when the tailed files rotate. It periodically writes the last read position of each file to the given position file in JSON format. If Flume is stopped or goes down for some reason, it can restart tailing from the positions written in the existing position file.
In another use case, this source can also start tailing from an arbitrary position for each file, using the given position file. When there is no position file at the specified path, it will start tailing from the first line of each file by default.
Files will be consumed in order of their modification time. The file with the oldest modification time will be consumed first.
This source does not rename, delete, or otherwise modify the files being tailed. Currently this source does not support tailing binary files. It reads text files line by line.
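For reference, the position file is a JSON array with one entry per tailed file, recording the file's inode, the last read position in bytes, and its path. With the three files used in the example below, it might look something like this (the inode and pos values are illustrative):

[{"inode":138283,"pos":1024,"file":"/root/data/access.log"},
 {"inode":138284,"pos":512,"file":"/root/data/nginx.log"},
 {"inode":138285,"pos":2048,"file":"/root/data/web.log"}]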
Requirement: use Flume to monitor the contents of multiple files in a directory, and collect and store them in a Hadoop cluster in real time.
Example configuration:
a1.channels = ch2
a1.sources = s1
a1.sinks = hdfs-sink1
# channel
a1.channels.ch2.type = memory
a1.channels.ch2.capacity = 100000
a1.channels.ch2.transactionCapacity = 50000
# source
a1.sources.s1.channels = ch2
# monitor multiple files in a directory for new content
a1.sources.s1.type = TAILDIR
# save each file's consumed offset in JSON format so consumption does not restart from scratch
a1.sources.s1.positionFile = /var/local/apache-flume-1.7.0-bin/taildir_position.json
a1.sources.s1.filegroups = f1 f2 f3
a1.sources.s1.filegroups.f1 = /root/data/access.log
a1.sources.s1.filegroups.f2 = /root/data/nginx.log
a1.sources.s1.filegroups.f3 = /root/data/web.log
a1.sources.s1.headers.f1.headerKey = access
a1.sources.s1.headers.f2.headerKey = nginx
a1.sources.s1.headers.f3.headerKey = web
a1.sources.s1.fileHeader = true
# sink
a1.sinks.hdfs-sink1.channel = ch2
a1.sinks.hdfs-sink1.type = hdfs
a1.sinks.hdfs-sink1.hdfs.path = hdfs://master:9000/demo/data
a1.sinks.hdfs-sink1.hdfs.filePrefix = event_data
a1.sinks.hdfs-sink1.hdfs.fileSuffix = .log
# roll the HDFS file at 10 MB or every 20 seconds, never by event count
a1.sinks.hdfs-sink1.hdfs.rollSize = 10485760
a1.sinks.hdfs-sink1.hdfs.rollInterval = 20
a1.sinks.hdfs-sink1.hdfs.rollCount = 0
a1.sinks.hdfs-sink1.hdfs.batchSize = 1500
a1.sinks.hdfs-sink1.hdfs.round = true
a1.sinks.hdfs-sink1.hdfs.roundUnit = minute
a1.sinks.hdfs-sink1.hdfs.threadsPoolSize = 25
a1.sinks.hdfs-sink1.hdfs.useLocalTimeStamp = true
a1.sinks.hdfs-sink1.hdfs.minBlockReplicas = 1
a1.sinks.hdfs-sink1.hdfs.fileType = DataStream
a1.sinks.hdfs-sink1.hdfs.writeFormat = Text
a1.sinks.hdfs-sink1.hdfs.callTimeout = 60000
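Assuming the configuration above is saved as conf/taildir-hdfs.conf (the file name here is only an example), the agent can be started with the standard flume-ng command; the --name argument must match the agent name a1 used in the configuration:

bin/flume-ng agent --conf conf --conf-file conf/taildir-hdfs.conf --name a1 -Dflume.root.logger=INFO,console

Because each filegroup sets a headerKey header, every event carries access, nginx, or web as a header value; that header can be referenced as %{headerKey} in hdfs.path, for example, to route each log into its own HDFS directory.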
That is all the content of the article "What are the new features of Flume 1.7?". Thank you for reading! I hope the content shared here helps you; if you want to learn more, welcome to follow the industry information channel!