Non-cluster configuration
This is the non-cluster case, which is relatively simple; you can refer directly to my earlier write-up, "Flume Notes", for the details. Its basic structure is a single agent with one source, one channel and one sink.
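As a quick reminder of that structure, here is a minimal single-agent sketch. It is my own illustration rather than a copy of the referenced notes: an exec source that tails the access log, a memory channel, and a logger sink that simply prints events to the agent's console.

# Minimal single-agent example (illustration only)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# exec source: follow the log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/uplooking/data/data-clean/data-access.log

# memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# logger sink: print events to the agent's log/console
a1.sinks.k1.type = logger

# wire source and sink together through the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1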
Structure of a Flume cluster with multiple Agents and a single source
The structure diagram is not reproduced here; the explanation is as follows:
We can deploy Agents on different nodes; the case described here uses two Agents. Agent foo is deployed on the node that generates the logs, for example a web server running Tomcat or Nginx: its source is configured to watch a log file for new data, its channel can be backed by memory or by files, and its sink (where the data lands) is configured as avro, i.e. it forwards events to the next Agent. Agent bar can be deployed on another node, although running it on the same node as foo is also fine, since Flume can run multiple instances on one machine; its main job is to collect the log data arriving from the avro sinks of the upstream nodes. If the web tier is itself a cluster, there will be several web server nodes, each generating logs and each needing its own agent, and bar will then have several upstream feeds, as in a later case; in this section we only discuss multiple Agents with a single source each. For bar's own sink you can again choose among several options (see the official documentation for details); here we use an HDFS sink. Note that agent foo has only one source here. In later cases an agent will be configured with several sources, meaning it collects several different log files on the same node, such as data-access.log, data-ugctail.log and data-ugchead.log; that is what "multiple sources" refers to later on. The essential avro wiring between the two Agents is sketched below.
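To make the chaining concrete, the following lines are extracted from the full configurations given later in this section: foo's avro sink must point at the host and port on which bar's avro source listens (uplooking03 and port 4444 in this article).

# In agent foo (on the web server): the avro sink forwards events to the next agent
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = uplooking03
a1.sinks.k1.port = 4444

# In agent bar (on the collecting node): the avro source listens on the same port
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4444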
Configuration case environment description

The environment is as follows:
That is, there are two nodes:
uplooking01: the log file /home/uplooking/data/data-clean/data-access.log is the user access log generated by the web server, and a new log file is generated every day. On this node we deploy a Flume Agent whose source is this log file and whose sink is avro.

uplooking03: the main purpose of this node is to collect the log output of the different Flume Agents (such as the one above) and write it to HDFS.

Note: my environment has three nodes, uplooking01, uplooking02 and uplooking03, and a Hadoop cluster is configured across all three.

Configuration of uplooking01

###
# Main function: watch the file for new data and, after collecting it, output it to avro.
# Note: operating a Flume agent mainly means configuring its source, channel and sink.
# a1 is the name of the agent; the source is called r1, the channel c1 and the sink k1.
###
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source: watch the file for new data via exec
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/uplooking/data/data-clean/data-access.log

# Sink: use avro to pass the data on
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = uplooking03
a1.sinks.k1.port = 4444

# Channel: use files as the temporary cache of the data (safer than memory)
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/uplooking/data/flume/checkpoint
a1.channels.c1.dataDirs = /home/uplooking/data/flume/data

# Associate source r1 and sink k1 through channel c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Configuration of uplooking03

###
# Main function: listen on avro and, after collecting the data, output it to HDFS.
# Note: operating a Flume agent mainly means configuring its source, channel and sink.
# a1 is the name of the agent; the source is called r1, the channel c1 and the sink k1.
###
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source: listen on avro
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4444

# Sink: write to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /input/data-clean/access/%y/%m/%d
a1.sinks.k1.hdfs.filePrefix = flume
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.inUsePrefix = tmpFlume
a1.sinks.k1.hdfs.inUseSuffix = .tmp
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = second

# With the following two items configured, the data saved to HDFS is plain text;
# otherwise hdfs dfs -text shows compressed binary output
a1.sinks.k1.hdfs.serializer = TEXT
a1.sinks.k1.hdfs.fileType = DataStream

# Channel: use a memory buffer as the temporary cache of the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Associate source r1 and sink k1 through channel c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Test
The first thing to do is to make sure that logs are being generated and written to /home/uplooking/data/data-clean/data-access.log.
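If no real web traffic is available yet, you can append lines to the file yourself. The loop below is a purely hypothetical generator; the line format is made up and does not match the real access-log format, it simply makes the file grow so that the pipeline can be observed end to end.

# hypothetical test-data generator: append one fake line per second
while true; do
    echo "$(date +%s) 127.0.0.1 GET /index HTTP/1.1 200" >> /home/uplooking/data/data-clean/data-access.log
    sleep 1
done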
Start Flume Agent on uplooking03:
[uplooking@uplooking03 flume]$ flume-ng agent -n a1 -c conf --conf-file conf/flume-source-avro.conf -Dflume.root.logger=INFO,console
Start Flume Agent on uplooking01:
[uplooking@uplooking01 flume]$ flume-ng agent -n a1 -c conf --conf-file conf/flume-sink-avro.conf -Dflume.root.logger=INFO,console
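If the agent on uplooking01 cannot connect, one quick check (assuming netstat is installed on uplooking03) is to confirm that the avro source there is actually listening on port 4444:

# on uplooking03: the avro source should be bound on port 4444
netstat -tln | grep 4444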
After a while, you can see the written log files in HDFS:
[uplooking@uplooking02 ~]$ hdfs dfs -ls /input/data-clean/access/18/04/07
18/04/07 08:52:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 26 items
-rw-r--r--   3 uplooking supergroup       1131 2018-04-07 08:50 /input/data-clean/access/18/04/07/flume.1523062248369.log
-rw-r--r--   3 uplooking supergroup       1183 2018-04-07 08:50 /input/data-clean/access/18/04/07/flume.1523062248370.log
-rw-r--r--   3 uplooking supergroup       1176 2018-04-07 08:50 /input/data-clean/access/18/04/07/flume.1523062248371.log
...
View the data in the file:
[uplooking@uplooking02 ~]$ hdfs dfs -text /input/data-clean/access/18/04/07/flume.1523062248369.log
18/04/07 08:55:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
1000 220.194.55.244 null 40604 0 POST /check/init HTTP/1.1 500 null Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.3 1523062236368
1002 221.8.9.6 80 886a1533-38ca-466c-86e1-0b84022f781b 20201 1 GET /top HTTP/1.0 500 null Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.3 1523062236869
1002 61.172.249.96 99fb19c4-ec59-4abd-899c-4059dea39ead 00 POST /updateById?id=21 HTTP/1.1 408 null Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko 1523062237370
1003 61.172.249.96 886a1533-38ca-466c-86e1-0b84022f781b 10022 1 GET /tologin HTTP/1.1 null /update/pass Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20070309 Firefox/2.0.0.3 1523062237871
1003 125.39.129.67 6839fff8-7b3a-48f5-90cd-0f45c7be1aeb 10022 1 GET /tologin HTTP/1.0 408 null Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20070309 Firefox/2.0.0.3 1523062238372
1000 61.172.249.96 89019ae0-6140-4e5a-9061-e3af74f3e4a8 10022 1 POST /stat HTTP/1.1 null /passpword/getById?id=11 Mozilla/4.0 (compatible; MSIE 5.0; WindowsNT) 1523062238873
If hdfs.serializer=TEXT and hdfs.fileType=DataStream are not configured in the Flume agent on uplooking03, the data shown above appears as compressed binary (hexadecimal) instead of plain text.
Structure of a Flume cluster with multiple Agents and multiple sources
The structure diagram is not reproduced here; it extends the previous case, with each upstream Agent now configured with several sources.
Configuration case environment description
In our environment, it is as follows:
That is, in our environment there are three log sources: data-access.log, data-ugchead.log and data-ugctail.log. In the actual configuration below we only use two nodes, uplooking01 and uplooking02, as log-source agents, and both of their sinks output to the avro source on uplooking03.

Configuration
The configuration of uplooking01 and uplooking02 is the same, as follows:
###
# Main function: watch the files for new data and, after collecting it, output it to avro.
# Note: operating a Flume agent mainly means configuring its sources, channel and sink.
# a1 is the name of the agent; the sources are called r1, r2 and r3, the channel c1 and the sink k1.
###
a1.sources = r1 r2 r3
a1.sinks = k1
a1.channels = c1

# Source r1: watch the file for new data via exec
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/uplooking/data/data-clean/data-access.log
# Two interceptors are configured on each source: i1 and i2
# i1 (static) adds a fixed key-value pair to the event header
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = type
a1.sources.r1.interceptors.i1.value = access
# i2 (timestamp): with this configured here, the agent doing centralized collection does not need
# a1.sinks.k1.hdfs.useLocalTimeStamp = true in order to resolve %Y/%m/%d in its HDFS path; the time
# information arrives with the events from the source, which reduces the load on the collecting agent
a1.sources.r1.interceptors.i2.type = timestamp

# Source r2: watch the file for new data via exec
a1.sources.r2.type = exec
a1.sources.r2.command = tail -F /home/uplooking/data/data-clean/data-ugchead.log
a1.sources.r2.interceptors = i1 i2
a1.sources.r2.interceptors.i1.type = static
a1.sources.r2.interceptors.i1.key = type
a1.sources.r2.interceptors.i1.value = ugchead
a1.sources.r2.interceptors.i2.type = timestamp

# Source r3: watch the file for new data via exec
a1.sources.r3.type = exec
a1.sources.r3.command = tail -F /home/uplooking/data/data-clean/data-ugctail.log
a1.sources.r3.interceptors = i1 i2
a1.sources.r3.interceptors.i1.type = static
a1.sources.r3.interceptors.i1.key = type
a1.sources.r3.interceptors.i1.value = ugctail
a1.sources.r3.interceptors.i2.type = timestamp

# Sink: use avro to pass the data on
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = uplooking03
a1.sinks.k1.port = 4444

# Channel: use files as the temporary cache of the data (safer than memory)
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/uplooking/data/flume/checkpoint
a1.channels.c1.dataDirs = /home/uplooking/data/flume/data

# Associate sources r1, r2, r3 and sink k1 through channel c1
a1.sources.r1.channels = c1
a1.sources.r2.channels = c1
a1.sources.r3.channels = c1
a1.sinks.k1.channel = c1
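The reason each source carries a static interceptor is that the downstream agent uses the type header to separate the three log streams. The lines below are extracted from the two configurations in this section (uplooking01/uplooking02 above and uplooking03 below) to show how they pair up:

# On uplooking01/uplooking02: every event from r1 is stamped with type=access
# (r2 and r3 use ugchead and ugctail respectively)
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = type
a1.sources.r1.interceptors.i1.value = access

# On uplooking03: the HDFS sink expands the header value into the output path and file prefix
a1.sinks.k1.hdfs.path = /input/data-clean/%{type}/%Y/%m/%d
a1.sinks.k1.hdfs.filePrefix = %{type}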
The configuration of uplooking03 is as follows:
###
# Main function: listen on avro and, after collecting the data, output it to HDFS.
# Note: operating a Flume agent mainly means configuring its source, channel and sink.
# a1 is the name of the agent; the source is called r1, the channel c1 and the sink k1.
###
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Source: listen on avro
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4444

# Sink: write to HDFS, splitting the output by the type header set upstream
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /input/data-clean/%{type}/%Y/%m/%d
a1.sinks.k1.hdfs.filePrefix = %{type}
a1.sinks.k1.hdfs.fileSuffix = .log
a1.sinks.k1.hdfs.inUseSuffix = .tmp
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollSize = 1048576
# For the rolling policy configured above to take effect, the following item must also be set
# (typically 1, so that rolling is not disturbed by HDFS block replication)
a1.sinks.k1.hdfs.minBlockReplicas = 1

# With the following two items configured, the data saved to HDFS is plain text;
# otherwise hdfs dfs -text shows compressed binary output
a1.sinks.k1.hdfs.serializer = TEXT
a1.sinks.k1.hdfs.fileType = DataStream

# Channel: use a memory buffer as the temporary cache of the data
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Associate source r1 and sink k1 through channel c1
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Test
First of all, you need to make sure that logs are generated properly on both uplooking01 and uplooking02.
Start Agent on uplooking03:
[uplooking@uplooking03 flume]$ flume-ng agent -n a1 -c conf --conf-file conf/flume-source-avro.conf -Dflume.root.logger=INFO,console
Start Agent on uplooking01 and uplooking02, respectively:
flume-ng agent -n a1 -c conf --conf-file conf/flume-sink-avro.conf -Dflume.root.logger=INFO,console
After a period of time, you can view the corresponding log file in HDFS:
$ hdfs dfs -ls /input/data-clean
18/04/08 01:34:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 3 items
drwxr-xr-x   - uplooking supergroup          0 2018-04-07 22:00 /input/data-clean/access
drwxr-xr-x   - uplooking supergroup          0 2018-04-07 22:00 /input/data-clean/ugchead
drwxr-xr-x   - uplooking supergroup          0 2018-04-07 22:00 /input/data-clean/ugctail
View the log files in a log directory:
[uplooking@uplooking02 data-clean]$ hdfs dfs -ls /input/data-clean/access/2018/04/08
18/04/08 01:35:27 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   3 uplooking supergroup    2447752 2018-04-08 01:02 /input/data-clean/access/2018/04/08/access.1523116801502.log
-rw-r--r--   3 uplooking supergroup       5804 2018-04-08 01:02 /input/data-clean/access/2018/04/08/access.1523120538070.log.tmp
You can see that there are only a few log files, because in the agent configured on uplooking03 the HDFS sink rolls files by size: a file is split only after it reaches the configured rollSize (1048576 bytes, about 1 MB).
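If you want to verify the rolling behaviour, you can look at the sizes of the files in one of the output directories; completed files should be close to the configured rollSize, while the .tmp file is the one currently being written. The path below is the one from the listing above.

hdfs dfs -du -h /input/data-clean/access/2018/04/08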