Flume+Kafka integration
I. Preparatory work
Prepare 5 intranet servers to build the Zookeeper and Kafka clusters.
Server addresses:
192.168.2.240
192.168.2.241
192.168.2.242
192.168.2.243
192.168.2.244
Server OS: CentOS 6.5, 64-bit
Download the installation packages:
Zookeeper: http://apache.fayea.com/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
Flume: http://apache.fayea.com/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
Kafka: http://apache.fayea.com/kafka/0.10.0.0/kafka_2.10-0.10.0.0.tgz
Zookeeper, Flume, and Kafka all require a Java environment, so install the JDK first:
yum install java-1.7.0-openjdk-devel
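A quick sanity check that the JDK landed correctly (the exact version string will differ by build):
java -version
# should report a 1.7.0 OpenJDK runtime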
II. Install and configure Zookeeper
Select 3 of the servers for the Zookeeper cluster; their IPs are:
192.168.2.240
192.168.2.241
192.168.2.242
Note: first perform steps (1)-(3) on the first server 192.168.2.240.
(1) Decompress: put zookeeper-3.4.6.tar.gz in the /opt directory, extract it, and rename the result to /opt/zookeeper (the path used throughout this article):
tar zxf zookeeper-3.4.6.tar.gz
mv zookeeper-3.4.6 zookeeper
(2) Create the configuration file: copy conf/zoo_sample.cfg to zoo.cfg in the same conf directory, then set the following values:
tickTime=2000
dataDir=/opt/zookeeper/Data
initLimit=5
syncLimit=2
clientPort=2181
server.1=192.168.2.240:2888:3888
server.2=192.168.2.241:2888:3888
server.3=192.168.2.242:2888:3888
The meaning of each parameter:
tickTime: interval between heartbeats, in milliseconds. Default: 2000.
clientPort: the port on which client applications (such as Solr) connect to ZooKeeper. Default: 2181.
initLimit: time allowed, in ticks, for the initial synchronization phase (when followers first connect to the leader). Default: 10.
syncLimit: time allowed, in ticks, for followers to synchronize with the leader. Default: 5.
dataDir: the path where data (such as snapshots and managed configuration) is stored.
server.X: X is the id of a server in the cluster and must match the id in that server's myid file. Two ports follow the address: the first is used for data synchronization and other communication between followers and the leader, the second for voting during leader election.
(3) Create the /opt/zookeeper/Data snapshot directory and create the myid file, containing 1:
mkdir /opt/zookeeper/Data
vi /opt/zookeeper/Data/myid    # write a single line containing: 1
(4) Copy the configured /opt/zookeeper/ directory from 192.168.2.240 to 192.168.2.241 and 192.168.2.242, then change the contents of the corresponding myid files to 2 and 3, as in the sketch below.
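A minimal sketch of step (4), assuming root SSH access between the nodes (adjust the user and key setup to your environment):
scp -r /opt/zookeeper/ root@192.168.2.241:/opt/
scp -r /opt/zookeeper/ root@192.168.2.242:/opt/
# overwrite the copied myid with each node's own id
ssh root@192.168.2.241 'echo 2 > /opt/zookeeper/Data/myid'
ssh root@192.168.2.242 'echo 3 > /opt/zookeeper/Data/myid'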
(5) Start the Zookeeper cluster
Execute the startup command on each of the three servers:
/opt/zookeeper/bin/zkServer.sh start
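Once all three nodes are started, you can check each node's role; one should report leader and the other two follower (the exact output wording varies slightly across Zookeeper versions):
/opt/zookeeper/bin/zkServer.sh status
# Mode: leader    (or: Mode: follower)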
III. Install and configure the Kafka cluster
A total of 5 servers; the server IP addresses are:
192.168.2.240 node1
192.168.2.241 node2
192.168.2.242 node3
192.168.2.243 node4
192.168.2.244 node5
1. Extract the installation files to the /opt/ directory
cd /opt
tar -zxvf kafka_2.10-0.10.0.0.tgz
mv kafka_2.10-0.10.0.0 kafka
2. Modify the server.properties file
# node1 configuration
broker.id=0
port=9092
advertised.listeners=PLAINTEXT://58.246.xx.xx:9092
advertised.host.name=58.246.xx.xx
# A pitfall I hit: because the online nginx logs are pulled back to the company's local servers, these advertised options must be set to the router's external IP address; otherwise the online Flume reports that it cannot connect to the Kafka nodes or send log messages (see the connectivity check after the node configurations).
advertised.port=9092
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
# node2 configuration
broker.id=1
port=9093
advertised.listeners=PLAINTEXT://58.246.xx.xx:9093
advertised.host.name=58.246.xx.xx
advertised.port=9093
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
# node3 configuration
broker.id=2
port=9094
advertised.listeners=PLAINTEXT://58.246.xx.xx:9094
advertised.host.name=58.246.xx.xx
advertised.port=9094
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
# node4 configuration
broker.id=3
port=9095
advertised.listeners=PLAINTEXT://58.246.xx.xx:9095
advertised.host.name=58.246.xx.xx
advertised.port=9095
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
# node5 configuration
broker.id=4
port=9096
advertised.listeners=PLAINTEXT://58.246.xx.xx:9096
advertised.host.name=58.246.xx.xx
advertised.port=9096
num.network.threads=3
num.io.threads=8
num.partitions=5
zookeeper.connect=192.168.2.240:2181,192.168.2.241:2181,192.168.2.242:2181
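A quick way to reproduce or rule out the advertised-address pitfall noted in the node1 configuration is to test, from the online server, that each broker's external address and port is reachable. This is a plain TCP reachability check, assuming telnet is available on the online host:
telnet 58.246.xx.xx 9092
telnet 58.246.xx.xx 9093
# repeat for ports 9094-9096; each should connect rather than time out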
3. Start the Kafka cluster
Execute the following command on all 5 nodes to start the service:
/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties &
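Before pointing Flume at the cluster, it can help to create the topic explicitly and confirm it is registered; kafka-topics.sh ships with Kafka 0.10, and the replication factor of 3 here is an assumption, so pick whatever fits your durability needs:
/opt/kafka/bin/kafka-topics.sh --create --zookeeper 192.168.2.240:2181 --replication-factor 3 --partitions 5 --topic unilife_nginx_production
/opt/kafka/bin/kafka-topics.sh --list --zookeeper 192.168.2.240:2181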
IV. Install and configure Flume
Install two Flume agents: one on the online server, which ships the online logs back to the local Kafka cluster, and one locally, which transfers the log data from the Kafka cluster to HDFS.
4.1. Install Flume on the online server
This agent collects the nginx logs and sends them to the company's internal Kafka.
1. Extract the installation package
cd /opt
tar -zxvf apache-flume-1.7.0-bin.tar.gz
mv apache-flume-1.7.0-bin flume
2. Create a configuration file
vi /opt/flume/conf/flume-conf.properties and add the following:
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /unilifeData/logs/nginx/access.log
a1.sources.r1.channels = c1
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 100000
# sinks
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = unilife_nginx_production
a1.sinks.k1.kafka.bootstrap.servers = 58.246.xx.xx:9092,58.246.xx.xx:9093,58.246.xx.xx:9094
a1.sinks.k1.brokerList = 58.246.xx.xx:9092,58.246.xx.xx:9093,58.246.xx.xx:9094
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.flumeBatchSize = 2000
a1.sinks.k1.channel = c1
3. Start the flume service
/opt/flume/bin/flume-ng agent --conf /opt/flume/conf/ --conf-file /opt/flume/conf/flume-conf.properties --name a1 -Dflume.root.logger=INFO,LOGFILE &
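To confirm the online agent is actually delivering events, a console consumer on one of the local nodes can tail the topic (kafka-console-consumer.sh is part of the Kafka 0.10 distribution):
/opt/kafka/bin/kafka-console-consumer.sh --zookeeper 192.168.2.240:2181 --topic unilife_nginx_production
# nginx access-log lines should scroll by as requests hit the online server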
4.2. Install Flume locally
This agent dumps the logs to HDFS.
1. Extract the installation package
cd /opt
tar -zxvf apache-flume-1.7.0-bin.tar.gz
mv apache-flume-1.7.0-bin flume
2. Create a configuration file
vi /opt/flume/conf/flume-nginx-log.properties and add the following:
nginx.sources = source1
nginx.channels = channel1
nginx.sinks = sink1
nginx.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
nginx.sources.source1.zookeeperConnect = master:2181,slave1:2181,slave2:2181
nginx.sources.source1.topic = unilife_nginx_production
nginx.sources.source1.groupId = flume_unilife_nginx_production
nginx.sources.source1.channels = channel1
nginx.sources.source1.interceptors = i1
nginx.sources.source1.interceptors.i1.type = timestamp
nginx.sources.source1.kafka.consumer.timeout.ms = 100
nginx.channels.channel1.type = memory
nginx.channels.channel1.capacity = 10000000
nginx.channels.channel1.transactionCapacity = 1000
nginx.sinks.sink1.type = hdfs
nginx.sinks.sink1.hdfs.path = hdfs://192.168.2.240:8020/user/hive/warehouse/nginx_log
nginx.sinks.sink1.hdfs.writeFormat = Text
nginx.sinks.sink1.hdfs.inUsePrefix = _
nginx.sinks.sink1.hdfs.rollInterval = 3600
nginx.sinks.sink1.hdfs.rollSize = 0
nginx.sinks.sink1.hdfs.rollCount = 0
nginx.sinks.sink1.hdfs.fileType = DataStream
nginx.sinks.sink1.hdfs.minBlockReplicas = 1
nginx.sinks.sink1.channel = channel1
3. Start the service
/opt/flume/bin/flume-ng agent --conf /opt/flume/conf/ --conf-file /opt/flume/conf/flume-nginx-log.properties --name nginx -Dflume.root.logger=INFO,LOGFILE &
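To verify that the sink is writing, list the target directory with the standard Hadoop CLI; files still being written carry the _ in-use prefix configured above:
hdfs dfs -ls hdfs://192.168.2.240:8020/user/hive/warehouse/nginx_log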