
Flume-1.6.0 Learning Notes (5): Sink to HDFS


Lu Chunli's work notes. Who says programmers can't have a literary streak?

Flume reads data from a specified directory, uses memory as the channel, and then writes the data to HDFS.

Spooling Directory Source (http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source)

Memory Channel (http://flume.apache.org/FlumeUserGuide.html#memory-channel)

HDFS Sink (http://flume.apache.org/FlumeUserGuide.html#hdfs-sink)

Flume configuration file

# vim agent-hdfs.conf
# write data to hdfs
agent.sources = sd-source
agent.channels = mem-channel
agent.sinks = hdfs-sink

# define source
agent.sources.sd-source.type = spooldir
agent.sources.sd-source.spoolDir = /opt/flumeSpool
agent.sources.sd-source.fileHeader = true

# define channel
agent.channels.mem-channel.type = memory

# define sink
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://nnode:8020/flume/webdata

# assemble
agent.sources.sd-source.channels = mem-channel
agent.sinks.hdfs-sink.channel = mem-channel

Note: the /opt/flumeSpool directory must be created in advance; otherwise Flume cannot detect it and reports an error. A minimal preparation step (paths assumed from this note):
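# mkdir -p /opt/flumeSpool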

Start Agent

[hadoop@nnode flume1.6.0]$ bin/flume-ng agent --conf conf --name agent --conf-file conf/agent-hdfs.conf -Dflume.root.logger=INFO,console

Copy the data to the /opt/flumeSpool directory

# cp /usr/local/hadoop2.6.0/logs/* /opt/flumeSpool

Flume detects the data changes in this directory and automatically writes the files to HDFS.

View the flume directory on HDFS

[hadoop@nnode flume1.6.0]$ hdfs dfs -ls -R /flume/
drwxr-xr-x   - hadoop hadoop    0 2015-11-21 16:55 /flume/webdata
-rw-r--r--   2 hadoop hadoop 2568 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836223
-rw-r--r--   2 hadoop hadoop 2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836224
-rw-r--r--   2 hadoop hadoop 2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836225
-rw-r--r--   2 hadoop hadoop 2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836226
-rw-r--r--   2 hadoop hadoop 2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836227
-rw-r--r--   2 hadoop hadoop 2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836228
-rw-r--r--   2 hadoop hadoop 2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836229
-rw-r--r--   2 hadoop hadoop 2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836230
-rw-r--r--   2 hadoop hadoop 2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836231
-rw-r--r--   2 hadoop hadoop 2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836232
-rw-r--r--   2 hadoop hadoop 2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836233
-rw-r--r--   2 hadoop hadoop 2163 2015-11-21 16:50 /flume/webdata/FlumeData.1448095836234
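The output shows a dozen small FlumeData files because the HDFS sink rolls files aggressively by default (hdfs.rollInterval=30 seconds, hdfs.rollSize=1024 bytes, hdfs.rollCount=10 events). If fewer, larger files are wanted, the roll thresholds can be raised; the values below are illustrative, not from the original notes:

# appended to agent-hdfs.conf: roll every 5 minutes or at 128 MB,
# and disable count-based rolling
agent.sinks.hdfs-sink.hdfs.rollInterval = 300
agent.sinks.hdfs-sink.hdfs.rollSize = 134217728
agent.sinks.hdfs-sink.hdfs.rollCount = 0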

View the file
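A quick way to inspect one of the written files (the file name is taken from the listing above; hdfs dfs -text, unlike -cat, can decode a SequenceFile into readable records):

[hadoop@nnode flume1.6.0]$ hdfs dfs -cat /flume/webdata/FlumeData.1448095836223
[hadoop@nnode flume1.6.0]$ hdfs dfs -text /flume/webdata/FlumeData.1448095836223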

Description:

When Flume writes data to HDFS, the default file format (hdfs.fileType) is SequenceFile, which cannot be viewed directly. To save the data in text format, set hdfs.fileType to DataStream.
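A sketch of the corresponding sink settings, appended to agent-hdfs.conf (hdfs.writeFormat = Text is the companion setting for plain-text output):

agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.writeFormat = Text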

View the flumeSpool directory

[root@nnode flumeSpool]# ll
total 3028
-rw-r--r-- 1 root root  227893 Nov 21 16:50 hadoop-hadoop-journalnode-nnode.log.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-journalnode-nnode.out.1.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-journalnode-nnode.out.2.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-journalnode-nnode.out.COMPLETED
-rw-r--r-- 1 root root 1993109 Nov 21 16:50 hadoop-hadoop-namenode-nnode.log.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-namenode-nnode.out.1.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-namenode-nnode.out.2.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-namenode-nnode.out.COMPLETED
-rw-r--r-- 1 root root  169932 Nov 21 16:50 hadoop-hadoop-zkfc-nnode.log.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-zkfc-nnode.out.1.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-zkfc-nnode.out.2.COMPLETED
-rw-r--r-- 1 root root     718 Nov 21 16:50 hadoop-hadoop-zkfc-nnode.out.COMPLETED

Note: by default Flume does not delete files it has finished processing; it marks them as processed by appending a .COMPLETED suffix. If you do not need to keep a file after it has been processed, you can set a deletion policy on the source:

deletePolicy    never    When to delete completed files: never or immediate
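For example, one line added to the source definition above makes Flume delete each file as soon as it has been fully ingested:

agent.sources.sd-source.deletePolicy = immediate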
