
How to build 3 Flume Agent collectors + 1 aggregator sinking to HDFS


This article is about how to build 3 Flume agents for log collection plus 1 aggregation agent that sinks to HDFS. The editor thinks it is very practical, so it is shared here for you to learn from. I hope you get something out of it after reading. Without further ado, let's take a look.

[Log collection]:

Machine name       Service name   User

flume-agent-01     namenode       hdfs

flume-agent-02     datanode       hdfs

flume-agent-03     datanode       hdfs

[Log aggregation]:

Machine name                          User

sht-sgmhadoopcm-01 (172.16.101.54)    root

[Sink to HDFS]:

hdfs://172.16.101.56:8020/testwjp/

1. Download apache-flume-1.7.0-bin.tar.gz

[hdfs@flume-agent-01 tmp]$ wget http://www-eu.apache.org/dist/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz

--2017-01-04 20:40:10--  http://www-eu.apache.org/dist/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz

Resolving www-eu.apache.org... 88.198.26.2, 2a01:4f8:130:2192::2

Connecting to www-eu.apache.org|88.198.26.2|:80... connected.

HTTP request sent, awaiting response... 200 OK

Length: 55711670 (53M) [application/x-gzip]

Saving to: "apache-flume-1.7.0-bin.tar.gz"

55711670   473K/s   in 74s

2017-01-04 20:41:25 (733 KB/s) - "apache-flume-1.7.0-bin.tar.gz" saved [55711670/55711670]
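Optionally, before decompressing, you can confirm the download is intact. A minimal sanity check (the 55711670-byte size comes from the wget output above; a truncated download will fail the tar listing):

[hdfs@flume-agent-01 tmp]$ ls -l apache-flume-1.7.0-bin.tar.gz                           # size should be 55711670 bytes
[hdfs@flume-agent-01 tmp]$ tar -tzf apache-flume-1.7.0-bin.tar.gz > /dev/null && echo "archive OK"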

2. Decompress and rename

[hdfs@flume-agent-01 tmp]$

[hdfs@flume-agent-01 tmp]$ tar -xzvf apache-flume-1.7.0-bin.tar.gz

[hdfs@flume-agent-01 tmp]$ mv apache-flume-1.7.0-bin flume-ng

[hdfs@flume-agent-01 tmp]$ cd flume-ng/conf

3. Copy flume environment configuration and agent configuration files

[hdfs@flume-agent-01 conf]$ cp flume-env.sh.template flume-env.sh

[hdfs@flume-agent-01 conf]$ cp flume-conf.properties.template exec_memory_avro.properties

4. Add the hdfs user's environment variable files

[hdfs@flume-agent-01 tmp]$ cd

[hdfs@flume-agent-01 ~]$ ls -la

total 24

drwxr-xr-x   3 hdfs hadoop 4096 Jul  8 14:05 .

drwxr-xr-x. 35 root root   4096 Dec 10  2015 ..

-rw-------   1 hdfs hdfs   4471 Jul  8 17:22 .bash_history

drwxrwxrwt   2 hdfs hadoop 4096 Nov 19  2014 cache

-rw-------   1 hdfs hdfs   3131 Jul  8 14:05 .viminfo

[hdfs@flume-agent-01 ~]$ cp /etc/skel/.* ./

cp: omitting directory `/etc/skel/.'

cp: omitting directory `/etc/skel/..'

[hdfs@flume-agent-01 ~]$ ls -la

total 36

drwxr-xr-x   3 hdfs hadoop 4096 Jan  4 20:49 .

drwxr-xr-x. 35 root root   4096 Dec 10  2015 ..

-rw-------   1 hdfs hdfs   4471 Jul  8 17:22 .bash_history

-rw-r--r--   1 hdfs hdfs     18 Jan  4 20:49 .bash_logout

-rw-r--r--   1 hdfs hdfs    176 Jan  4 20:49 .bash_profile

-rw-r--r--   1 hdfs hdfs    124 Jan  4 20:49 .bashrc

drwxrwxrwt   2 hdfs hadoop 4096 Nov 19  2014 cache

-rw-------   1 hdfs hdfs   3131 Jul  8 14:05 .viminfo

5. Add environment variables for flume

[hdfs@flume-agent-01 ~]$ vi .bash_profile

export FLUME_HOME=/tmp/flume-ng

export FLUME_CONF_DIR=$FLUME_HOME/conf

export PATH=$PATH:$FLUME_HOME/bin

[hdfs@flume-agent-01 ~]$ . .bash_profile
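A quick, optional check that the variables took effect in the current shell (flume-ng version is a standard Flume CLI subcommand; exact output depends on the build):

[hdfs@flume-agent-01 ~]$ echo $FLUME_HOME          # expect /tmp/flume-ng
[hdfs@flume-agent-01 ~]$ which flume-ng            # expect /tmp/flume-ng/bin/flume-ng
[hdfs@flume-agent-01 ~]$ flume-ng version          # should report Flume 1.7.0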

6. Modify the flume environment configuration file

[hdfs@flume-agent-01 conf]$ vi flume-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_25

7. Upload AdvancedExecSource.jar, a custom plug-in developed on top of Flume-ng's ExecSource, to $FLUME_HOME/lib/

http://blog.itpub.net/30089851/viewspace-2131995/

[hdfs@LogshedNameNodeLogcollector lib]$ pwd

/tmp/flume-ng/lib

[hdfs@LogshedNameNodeLogcollector lib]$ ll AdvancedExecSource.jar

-rw-r--r-- 1 hdfs hdfs 10618 Jan  5 23:50 AdvancedExecSource.jar

[hdfs@LogshedNameNodeLogcollector lib]$
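As an optional sanity check, the class name used in the agent configuration below (com.onlinelog.analysis.AdvancedExecSource) should be present inside the jar. A simple way to confirm, assuming the unzip utility is installed:

[hdfs@LogshedNameNodeLogcollector lib]$ unzip -l AdvancedExecSource.jar | grep AdvancedExecSource
# expect an entry like com/onlinelog/analysis/AdvancedExecSource.class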

8. Modify the agent configuration file for flume

[hdfs@flume-agent-01 conf]$ vi exec_memory_avro.properties

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the custom exec source

a1.sources.r1.type = com.onlinelog.analysis.AdvancedExecSource

a1.sources.r1.command = tail -f /var/log/hadoop-hdfs/hadoop-cmf-hdfs1-NAMENODE-flume-agent-01.log.out

a1.sources.r1.hostname = flume-agent-01

a1.sources.r1.servicename = namenode

# Describe the sink

a1.sinks.k1.type = avro

a1.sinks.k1.hostname = 172.16.101.54

a1.sinks.k1.port = 4545

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.keep-alive = 60

a1.channels.c1.capacity = 1000000

a1.channels.c1.transactionCapacity = 2000

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

9. Package flume-ng on flume-agent-01 and scp it to flume-agent-02/03 and sht-sgmhadoopcm-01 (172.16.101.54)

[hdfs@flume-agent-01 tmp]$ zip -r flume-ng.zip flume-ng/*

[jpwu@flume-agent-01 ~]$ scp /tmp/flume-ng.zip flume-agent-02:/tmp/

[jpwu@flume-agent-01 ~]$ scp /tmp/flume-ng.zip flume-agent-03:/tmp/

[jpwu@flume-agent-01 ~]$ scp /tmp/flume-ng.zip sht-sgmhadoopcm-01:/tmp/
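Optionally confirm the archive landed on every target host; this sketch assumes the same user that ran scp above can also ssh to those hosts:

[jpwu@flume-agent-01 ~]$ for h in flume-agent-02 flume-agent-03 sht-sgmhadoopcm-01; do ssh "$h" 'ls -lh /tmp/flume-ng.zip'; done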

10. On flume-agent-02: configure the hdfs user's environment variables, decompress, and modify the agent configuration file

[hdfs@flume-agent-02 ~]$ cp /etc/skel/.* ./

cp: omitting directory `/etc/skel/.'

cp: omitting directory `/etc/skel/..'

[hdfs@flume-agent-02 ~]$ vi .bash_profile

export FLUME_HOME=/tmp/flume-ng

export FLUME_CONF_DIR=$FLUME_HOME/conf

export PATH=$PATH:$FLUME_HOME/bin

[hdfs@flume-agent-02 ~]$ . .bash_profile

[hdfs@flume-agent-02 tmp]$ unzip flume-ng.zip

[hdfs@flume-agent-02 tmp]$ cd flume-ng/conf

## modify the following parameters

[hdfs@flume-agent-02 conf]$ vi exec_memory_avro.properties

a1.sources.r1.command = tail -f /var/log/hadoop-hdfs/hadoop-cmf-hdfs1-DATANODE-flume-agent-02.log.out

a1.sources.r1.hostname = flume-agent-02

a1.sources.r1.servicename = datanode

# check that the JAVA_HOME directory set in flume-env.sh exists on this host (see the check below)
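A minimal sketch of that check, assuming the same JDK path that was configured on flume-agent-01 in step 6 (adjust the path if this host uses a different JDK):

[hdfs@flume-agent-02 conf]$ grep '^export JAVA_HOME' flume-env.sh
[hdfs@flume-agent-02 conf]$ ls -d /usr/java/jdk1.7.0_25 && /usr/java/jdk1.7.0_25/bin/java -version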

11. On flume-agent-03: configure the hdfs user's environment variables, decompress, and modify the agent configuration file

[hdfs@flume-agent-03 ~]$ cp /etc/skel/.* ./

cp: omitting directory `/etc/skel/.'

cp: omitting directory `/etc/skel/..'

[hdfs@flume-agent-03 ~]$ vi .bash_profile

export FLUME_HOME=/tmp/flume-ng

export FLUME_CONF_DIR=$FLUME_HOME/conf

export PATH=$PATH:$FLUME_HOME/bin

[hdfs@flume-agent-03 ~]$ . .bash_profile

[hdfs@flume-agent-03 tmp]$ unzip flume-ng.zip

[hdfs@flume-agent-03 tmp]$ cd flume-ng/conf

## modify the following parameters

[hdfs@flume-agent-03 conf]$ vi exec_memory_avro.properties

a1.sources.r1.command = tail -f /var/log/hadoop-hdfs/hadoop-cmf-hdfs1-DATANODE-flume-agent-03.log.out

a1.sources.r1.hostname = flume-agent-03

a1.sources.r1.servicename = datanode

# check that the JAVA_HOME directory set in flume-env.sh exists on this host (same check as on flume-agent-02)

12. On the aggregation side, sht-sgmhadoopcm-01: configure the root user's environment variables, decompress, and modify the agent configuration file

[root@sht-sgmhadoopcm-01 tmp]# vi /etc/profile

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera

export FLUME_HOME=/tmp/flume-ng

export FLUME_CONF_DIR=$FLUME_HOME/conf

export PATH=$FLUME_HOME/bin:$JAVA_HOME/bin:$PATH

[root@sht-sgmhadoopcm-01 tmp]# source /etc/profile

[root@sht-sgmhadoopcm-01 tmp]#

[root@sht-sgmhadoopcm-01 tmp]# unzip flume-ng.zip

[root@sht-sgmhadoopcm-01 tmp]# cd flume-ng/conf

[root@sht-sgmhadoopcm-01 conf]# vi flume-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera

# Test: aggregate first, then sink to HDFS

[root@sht-sgmhadoopcm-01 conf]# vi avro_memory_hdfs.properties

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = avro

a1.sources.r1.bind = 172.16.101.54

a1.sources.r1.port = 4545

# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = hdfs://172.16.101.56:8020/testwjp/

a1.sinks.k1.hdfs.filePrefix = logs

a1.sinks.k1.hdfs.inUsePrefix = .

a1.sinks.k1.hdfs.rollInterval = 0

# roll size 1 MB = 1048576 bytes

a1.sinks.k1.hdfs.rollSize = 1048576

a1.sinks.k1.hdfs.rollCount = 0

a1.sinks.k1.hdfs.batchSize = 6000

a1.sinks.k1.hdfs.writeFormat = text

a1.sinks.k1.hdfs.fileType = DataStream

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.keep-alive = 90

a1.channels.c1.capacity = 1000000

a1.channels.c1.transactionCapacity = 6000

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1
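Once the aggregation agent has been started (step 13 below), a minimal way to smoke-test this Avro source end to end is the standard flume-ng avro-client: it sends the lines of a local file to the given host and port, after which a new logs.* file should appear under /testwjp/ on HDFS. The test file name here is only illustrative:

[root@sht-sgmhadoopcm-01 conf]# echo "avro smoke test $(date)" > /tmp/avro_test.log
[root@sht-sgmhadoopcm-01 conf]# flume-ng avro-client -H 172.16.101.54 -p 4545 -F /tmp/avro_test.log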

13. Start in the background

[root@sht-sgmhadoopcm-01 flume-ng]# source /etc/profile

[hdfs@flume-agent-01 flume-ng]$ . ~/.bash_profile

[hdfs@flume-agent-02 flume-ng]$ . ~/.bash_profile

[hdfs@flume-agent-03 flume-ng]$ . ~/.bash_profile

[root@sht-sgmhadoopcm-01 flume-ng]# nohup flume-ng agent -c conf -f /tmp/flume-ng/conf/avro_memory_hdfs.properties -n a1 -Dflume.root.logger=INFO,console &

[hdfs@flume-agent-01 flume-ng]$ nohup flume-ng agent -c /tmp/flume-ng/conf -f /tmp/flume-ng/conf/exec_memory_avro.properties -n a1 -Dflume.root.logger=INFO,console &

[hdfs@flume-agent-02 flume-ng]$ nohup flume-ng agent -c /tmp/flume-ng/conf -f /tmp/flume-ng/conf/exec_memory_avro.properties -n a1 -Dflume.root.logger=INFO,console &

[hdfs@flume-agent-03 flume-ng]$ nohup flume-ng agent -c /tmp/flume-ng/conf -f /tmp/flume-ng/conf/exec_memory_avro.properties -n a1 -Dflume.root.logger=INFO,console &
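A quick way to confirm each agent actually came up: look for the Flume JVM and skim the nohup.out written in the directory where the agent was started (the exact log wording varies by component, so this is only a rough check):

[hdfs@flume-agent-01 flume-ng]$ ps -ef | grep '[f]lume-ng'       # the bracket keeps grep from matching itself
[hdfs@flume-agent-01 flume-ng]$ tail -n 50 nohup.out             # look for source/sink started messages and any ERROR lines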

14. Verification: download the aggregated log files from the cluster locally, open them, and have a look (see the sketch below)
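A sketch of that verification with the stock HDFS CLI, run on any node that has the hdfs client; substitute a real file name from the listing for the placeholder below:

[hdfs@flume-agent-01 ~]$ hdfs dfs -ls hdfs://172.16.101.56:8020/testwjp/
[hdfs@flume-agent-01 ~]$ hdfs dfs -get hdfs://172.16.101.56:8020/testwjp/logs.<timestamp> /tmp/
[hdfs@flume-agent-01 ~]$ head /tmp/logs.<timestamp>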

----------------------------------------

[remarks]:

1. Error 1: flume-ng is installed on a machine that does not have a Hadoop environment, so sinking to HDFS requires the HDFS client jar packages.

[ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:146)] Failed to start agent because dependencies were not found in classpath. Error follows.

java.lang.NoClassDefFoundError: org/apache/hadoop/io/SequenceFile$CompressionType

Simply search for the following jar packages on another machine that has Hadoop installed and copy them to the $FLUME_HOME/lib directory (a helper loop follows the list).

Search method: find $HADOOP_HOME/ -name 'commons-configuration*.jar'

commons-configuration-1.6.jar

hadoop-auth-2.7.3.jar

hadoop-common-2.7.3.jar

hadoop-hdfs-2.7.3.jar

hadoop-mapreduce-client-core-2.7.3.jar

protobuf-java-2.5.0.jar

htrace-core-3.1.0-incubating.jar

commons-io-2.4.jar
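The helper loop below is a hedged convenience, not part of the original procedure: run it on the machine that has Hadoop installed (it assumes $HADOOP_HOME is set there) and scp each matching jar to the aggregator's Flume lib directory from the steps above. Prefix matching may pick up a few extra jars, which is harmless here.

for j in commons-configuration hadoop-auth hadoop-common hadoop-hdfs hadoop-mapreduce-client-core protobuf-java htrace-core commons-io; do
  find "$HADOOP_HOME" -name "${j}*.jar" -exec scp {} root@sht-sgmhadoopcm-01:/tmp/flume-ng/lib/ \;
done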

2. Error 2: the custom plug-in class could not be loaded: Unable to load source type: com.onlinelog.analysis.AdvancedExecSource

2017-01-06 21:10:48,278 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:142)] Failed to load configuration data. Exception follows.

org.apache.flume.FlumeException: Unable to load source type: com.onlinelog.analysis.AdvancedExecSource, class: com.onlinelog.analysis.AdvancedExecSource

Fix: re-source the environment variables for the hdfs or root user:

[root@sht-sgmhadoopcm-01 flume-ng]# source /etc/profile

[hdfs@flume-agent-01 flume-ng]$ . ~/.bash_profile

[hdfs@flume-agent-02 flume-ng]$ . ~/.bash_profile

[hdfs@flume-agent-03 flume-ng]$ . ~/.bash_profile
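If re-sourcing the environment does not help, it is also worth confirming that the AdvancedExecSource.jar from step 7 was actually copied into $FLUME_HOME/lib on every collector host, since a missing jar produces the same "Unable to load source type" error:

[hdfs@flume-agent-01 flume-ng]$ ls -l $FLUME_HOME/lib/AdvancedExecSource.jar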

The above is how to build 3 Flume agent collectors plus 1 aggregator sinking to HDFS. The editor believes these are knowledge points you may see or use in daily work, and hopes you can learn more from this article.
