
Detailed introduction, installation and configuration of Kafka


1. Introduction

Kafka is a distributed, partitioned, replicated commit log service. It provides features similar to JMS, but its design and implementation are completely different, and it is not an implementation of the JMS specification. Kafka classifies messages by Topic when they are saved; the sender of a message is called the Producer and the receiver is called the Consumer. A kafka cluster is composed of multiple kafka instances, and each instance (server) is called a broker. The kafka cluster, producers and consumers all rely on zookeeper, which holds the cluster's metadata, to guarantee system availability.

Kafka is a distributed, publish / subscribe-based messaging system whose architecture includes the following components:

i. The publisher of a message is called the producer, the subscriber is called the consumer, and the intermediate storage array is called the broker.

ii. Multiple brokers work together, and producers, consumers and brokers coordinate requests and forwarding through zookeeper.

iii. The producer generates data and pushes it to the broker; the consumer pulls data from the broker and processes it.

iv. The broker side does not maintain consumption state, which improves performance. Published messages are stored in a set of servers called a Kafka cluster. Each server in the cluster is a Broker. Consumers can subscribe to one or more topics and pull data from the Broker to consume the published messages.

v. Disks are used directly for storage with linear reads and writes, which is fast: it avoids copying data between JVM memory and system memory and reduces the performance cost of object creation and garbage collection.

vi. Kafka is written in Scala and runs on the JVM.

A typical Kafka cluster contains:

Several Producers (these can be page views generated by the web front end, server logs, system CPU or memory metrics, etc.) and several brokers (Kafka supports horizontal scaling; generally, the more brokers there are, the higher the cluster throughput).

Several Consumer Groups and a Zookeeper cluster. Kafka manages the cluster configuration through Zookeeper, elects leaders, and rebalances when a Consumer Group changes. Producers publish messages to brokers using the push pattern, and Consumers subscribe to and consume messages from brokers using the pull pattern.

Topic & Partition

A Topic can be thought of logically as a queue; each message must specify its Topic, which can be simply understood as indicating which queue the message is put into. To scale Kafka's throughput linearly, a Topic is physically divided into one or more Partitions, and each Partition corresponds to a folder on disk that stores all the messages and index files of that Partition. If you create two topics, topic1 and topic2, with 13 and 19 partitions respectively, a total of 32 folders will be generated across the whole cluster.
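As an illustration of how partitions map to folders, the topic1/topic2 example above can be reproduced with the kafka-topics.sh script that ships with this Kafka version (the ZooKeeper address and /kafka chroot below assume the cluster configured later in this guide):

bin/kafka-topics.sh --create --zookeeper hadoop1:2181/kafka --replication-factor 1 --partitions 13 --topic topic1
bin/kafka-topics.sh --create --zookeeper hadoop1:2181/kafka --replication-factor 1 --partitions 19 --topic topic2

# each partition appears as its own folder (topic1-0, topic1-1, ...) under log.dirs on the brokers
ls /tmp/kafka-logs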

Start installing the kafka cluster:

1. Create a user

Add users on all hosts:

groupadd kafka
useradd -g kafka kafka

2. The hosts are hadoop1, hadoop2 and hadoop3.

3. Bind hosts

172.16.1.250 hadoop1

172.16.1.252 hadoop2

172.16.1.253 hadoop3
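A minimal way to add these entries, assuming they are not already present, is to append them to /etc/hosts as root on every machine and then check name resolution:

cat >> /etc/hosts <<'EOF'
172.16.1.250 hadoop1
172.16.1.252 hadoop2
172.16.1.253 hadoop3
EOF

ping -c 1 hadoop1    # each hostname should now resolve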

4. Download and decompress

https://kafka.apache.org/
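If you prefer to download from the command line, something like the following should work (the Apache archive path for this old release is an assumption; adjust it to whatever mirror you use):

wget https://archive.apache.org/dist/kafka/0.8.1.1/kafka_2.9.2-0.8.1.1.tgz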

tar -xzf kafka_2.9.2-0.8.1.1.tgz
cd kafka_2.9.2-0.8.1.1
ln -s /usr/local/hadoop/kafka_2.9.2-0.8.1.1 /usr/local/hadoop/kafka
chown -R kafka:kafka /usr/local/hadoop

Install it on the Hadoop3 machine first

5. Modify the configuration file

cd /usr/local/hadoop/kafka/config

vim server.properties

broker.id=3 (the broker.id of the three machines must be different)
port=9092
num.network.threads=2
num.io.threads=8
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs
num.partitions=2
log.retention.hours=168
log.segment.bytes=536870912
log.retention.check.interval.ms=60000
log.cleaner.enable=false
zookeeper.connect=hadoop1:2181,hadoop2:2181,hadoop3:2181/kafka (the zookeeper cluster)
zookeeper.connection.timeout.ms=1000000

Start

bin/kafka-server-start.sh /usr/local/hadoop/kafka/config/server.properties &
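To confirm the broker started and registered itself in ZooKeeper, a quick check with jps and the bundled zookeeper-shell.sh (the /kafka chroot matches zookeeper.connect above) looks roughly like this:

jps | grep -i kafka                                        # the Kafka process should be listed
bin/zookeeper-shell.sh hadoop1:2181 ls /kafka/brokers/ids  # should show the registered broker id(s)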

6. Configure the Java environment

# java
export JAVA_HOME=/soft/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
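Assuming the lines above were added to /etc/profile (or the kafka user's ~/.bash_profile), reload the file and verify the JDK is picked up:

source /etc/profile
java -version       # should report Java 1.7.0_79
echo $JAVA_HOME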

7. Deploy the kafka cluster

Since the kafka cluster depends on zookeeper, install zookeeper first.

See:

https://taoistwar.gitbooks.io/spark-operationand-maintenance-management/content/spark_relate_software/kafka_install.html
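Before starting any broker it is worth confirming that ZooKeeper answers on every node; one quick check, assuming nc is installed, is ZooKeeper's four-letter ruok command:

echo ruok | nc hadoop1 2181    # a healthy node replies "imok"
echo ruok | nc hadoop2 2181
echo ruok | nc hadoop3 2181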

8. Synchronize the configuration files of the three machines and modify the corresponding broker.id=1, broker.id=2, broker.id=3.

cd /usr/local/hadoop/

Kafka was installed on the hadoop3 machine first; now copy it to the other two machines:

scp -r kafka/ hadoop1:/usr/local/hadoop/
scp -r kafka/ hadoop2:/usr/local/hadoop/
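As an alternative to editing each copy by hand in the next steps, the per-host broker.id can also be patched over ssh with sed, assuming passwordless ssh between the machines:

ssh hadoop1 "sed -i 's/^broker.id=.*/broker.id=1/' /usr/local/hadoop/kafka/config/server.properties"
ssh hadoop2 "sed -i 's/^broker.id=.*/broker.id=2/' /usr/local/hadoop/kafka/config/server.properties"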

On the Hadoop1 machine, modify the configuration file and start

vim config/server.properties

broker.id=1 (the only setting that differs from hadoop3; all other values in server.properties stay the same as shown above)

Start

bin/kafka-server-start.sh /usr/local/hadoop/kafka/config/server.properties &

On the Hadoop2 machine, modify the configuration file and start

vim config/server.properties

broker.id=2 (again, all other values in server.properties stay the same as shown above for hadoop3)

Start

bin/kafka-server-start.sh /usr/local/hadoop/kafka/config/server.properties &

9. Verification

Start the console-based producer and consumer using the scripts that come with Kafka, as sketched below.
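A minimal end-to-end check (the topic name test is just an example; the scripts and flags are those shipped with Kafka 0.8.x):

bin/kafka-topics.sh --create --zookeeper hadoop1:2181/kafka --replication-factor 3 --partitions 2 --topic test

# terminal 1: type a few messages and press Enter
bin/kafka-console-producer.sh --broker-list hadoop1:9092,hadoop2:9092,hadoop3:9092 --topic test

# terminal 2: the messages should appear here
bin/kafka-console-consumer.sh --zookeeper hadoop1:2181/kafka --topic test --from-beginning

If messages typed in the producer terminal show up in the consumer terminal, the cluster is working.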

10. Error summary:


http://blog.csdn.net/wenxuechaozhe/article/details/52664774

http://472053211.blog.51cto.com/3692116/1655844

11. For actual operation, please see:

https://taoistwar.gitbooks.io/spark-operationand-maintenance-management/content/spark_relate_software/kafka_install.html
