
The Concepts, Deployment, and Practice of a Kafka + ZooKeeper Message Middleware Cluster


This article introduces the concepts behind a Kafka + ZooKeeper message middleware cluster and walks through its deployment and practice. The content is quite detailed; interested readers can use it as a reference, and I hope it is helpful to you.

Kafka is a high-throughput distributed publish/subscribe messaging system that can handle all the activity stream data of a consumer-scale website. Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism, and to deliver real-time messages through clustering. The content here is fairly basic, centered on Kafka's architecture and features.

Before the main text begins, let's look at the terms involved in Kafka:

1. Broker -- a Kafka cluster contains one or more servers, each of which is called a broker.

2. Topic -- every message published to a Kafka cluster has a category, called its Topic. (Physically, messages with different Topics are stored separately. Logically, a Topic's messages are stored on one or more brokers, but users only need to specify the message's Topic to produce or consume data, regardless of where the data is stored.)

3. Partition -- a Partition is a physical concept; each Topic contains one or more Partitions.

4. Producer -- responsible for publishing messages to a Kafka broker.

5. Consumer -- the message consumer, a client that reads messages from a Kafka broker.

6. Consumer Group -- each Consumer belongs to a specific Consumer Group (you can specify a group name for each Consumer; if no group name is specified, it belongs to the default group).
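
To make these terms concrete, here is a minimal producer sketch in Java against the standard Kafka clients API. The broker address localhost:9092 and the topic name orders are assumptions for illustration only:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Broker address is an assumption; point this at one of your brokers.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Publish one message to the hypothetical "orders" topic.
            producer.send(new ProducerRecord<>("orders", "order-1001", "created"));
        }
    }
}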

Kafka's topics can be thought of as streams of records ("/orders", "/user-signups"), and each topic has a log, which is stored on disk. Each topic is divided into multiple partitions, and at the storage level each partition is an append-only log file. Any message published to a partition is appended directly to the end of its log file. The Kafka Producer API is used to produce streams of data, and the Kafka Consumer API is used to consume them.
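
The consuming side mirrors this. Below is a minimal consumer sketch using the Kafka Consumer API, again assuming the localhost:9092 broker and the hypothetical orders topic, with a made-up Consumer Group name order-readers:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "order-readers");             // hypothetical Consumer Group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders"));
            while (true) {
                // Poll the topic log; records within a partition arrive in order.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}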

Kafka architecture: Topic, Partition, Producer, Consumer

Typically, the normal workflow is for Kafka producers to write messages to a topic and for consumers to read messages from it. A topic is associated with a log, a data structure stored on disk; Kafka appends a producer's records to the end of the topic log. The topic log consists of many partitions spread across multiple files, which can be distributed across multiple Kafka cluster nodes. Kafka distributes the topic log partitions over different nodes of the cluster to achieve high performance with horizontal scalability. Spreading partitions helps write data quickly, and Kafka replicates partitions to many nodes to provide failover.
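
As a concrete illustration of spreading a topic log over the cluster, a topic can be created with several partitions and replicas using the script that ships with Kafka. The topic name and counts here are assumptions, and newer Kafka releases take --bootstrap-server instead of --zookeeper:

bin/kafka-topics.sh --create \
  --zookeeper localhost:2181 \
  --topic orders \
  --partitions 3 \
  --replication-factor 3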

How does Kafka scale when multiple producers and consumers read and write the same Kafka topic log at the same time? First, Kafka itself is very fast because it writes sequentially to the file system, which takes little time; second, on modern fast drives, Kafka can easily write 700 MB or more of data per second.

Cluster deployment and testing

Kafka uses ZooKeeper to manage the cluster. ZooKeeper coordinates the servers and the cluster topology, and it acts as a consistent file system for configuration information. You can use the ZooKeeper that comes with Kafka, or deploy it separately. A single Linux host can open three ports to build a simple pseudo-ZooKeeper cluster.

ZooKeeper can push topology changes to Kafka. If a server in the cluster goes down, or a topic is added or deleted, every node in the cluster learns when a new server joins, and ZooKeeper provides a synchronized view of the Kafka cluster configuration. Building both Kafka and ZooKeeper requires a Java environment; downloading and installing the JDK is not covered in detail in this article. The installation packages for both can be downloaded from the Apache official website. The configuration process for a self-built ZooKeeper cluster is as follows:

Create a ZooKeeper directory: mkdir zookeeper

Copy out at least three instances; enter the ZooKeeper directory (the other instances are configured the same way):

Create the directories zkdata and zkdatalog

Enter the conf directory

Copy zoo_sample.cfg to zoo.cfg; the detailed configuration is as follows:

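A minimal zoo.cfg sketch for a three-instance pseudo-cluster on a single host might look like this; the paths and port numbers are assumptions, and each instance gets its own copy with a different clientPort, dataDir, and dataLogDir:

# Basic time unit in milliseconds used by ZooKeeper.
tickTime=2000
# Ticks a follower may take to connect and sync to the leader.
initLimit=10
# Ticks a follower may lag behind the leader before being dropped.
syncLimit=5
# Snapshot and transaction-log directories created earlier (assumed paths).
dataDir=/opt/zookeeper/zk1/zkdata
dataLogDir=/opt/zookeeper/zk1/zkdatalog
# Client port; use 2182 and 2183 for the second and third instances.
clientPort=2181
# Quorum members: server.N=host:peerPort:electionPort.
# On one host, each instance needs distinct peer/election ports.
server.1=127.0.0.1:2888:3888
server.2=127.0.0.1:2889:3889
server.3=127.0.0.1:2890:3890

Each instance also needs a myid file in its dataDir containing its server number, for example: echo 1 > /opt/zookeeper/zk1/zkdata/myid. Each instance is then started with bin/zkServer.sh start.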

Use the ZooKeeper cluster that comes with Kafka:

View the configuration file

Enter Kafka's config directory:

To build the zk cluster directly with the ZooKeeper that comes with Kafka, modify the zookeeper.properties file:
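
A sketch of what the modified zookeeper.properties might contain, mirroring the standalone zoo.cfg above (the paths and addresses are assumptions):

# Kafka's bundled ZooKeeper reads the same settings as a standalone one.
dataDir=/opt/kafka/zkdata
dataLogDir=/opt/kafka/zkdatalog
clientPort=2181
tickTime=2000
initLimit=10
syncLimit=5
# Quorum members for the three-node cluster (assumed addresses).
server.1=192.168.1.101:2888:3888
server.2=192.168.1.102:2888:3888
server.3=192.168.1.103:2888:3888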

Kafka server

A Kafka cluster consists of multiple Kafka brokers, each with a unique ID (number). Kafka brokers hold the topic log partitions. If you want fault tolerance, you need at least three to five servers, and a Kafka cluster can scale to 10, 100, or even 1,000 servers. Kafka copies each partition's data to multiple servers; any partition has one leader and zero or more followers, and the number of replicas can be set in the broker configuration file. When the leader fails, a new leader must be elected from the followers. A follower may lag behind the leader at that moment, so an "up-to-date" follower must be chosen.
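
Each broker's identity, its ZooKeeper connection, and the replica count are set in server.properties, one file per broker with a distinct broker.id. A minimal sketch (the addresses and values are assumptions):

# Unique numeric ID for this broker; the other brokers use 2 and 3.
broker.id=1
# Where this broker listens for clients (assumed address).
listeners=PLAINTEXT://192.168.1.101:9092
# The ZooKeeper ensemble built above (assumed addresses).
zookeeper.connect=192.168.1.101:2181,192.168.1.102:2181,192.168.1.103:2181
# Default number of replicas for automatically created topics.
default.replication.factor=3
# Commit log location on disk.
log.dirs=/opt/kafka/kafka-logs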

For example, if you run a Kafka cluster in AWS and one Kafka broker fails, a broker holding an ISR (in-sync replica) can quickly take over serving the data.
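
You can inspect the leader and the ISR set for each partition with the describe option of the topics script that ships with Kafka (the topic name and ZooKeeper address are assumed, and the output line is illustrative):

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic orders
# Example output (illustrative):
# Topic: orders  Partition: 0  Leader: 1  Replicas: 1,2,3  Isr: 1,2,3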

Note that there are no hard and fast rules for how to lay out the Kafka cluster itself. For example, you can place the entire cluster in a single AZ to get higher throughput from AWS enhanced networking and placement groups, and then use MirrorMaker to mirror the cluster to a hot standby AZ in the same region.
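
A MirrorMaker invocation for such a setup might look like the following sketch. The property file names and the whitelist pattern are assumptions; the consumer settings point at the source cluster and the producer settings at the standby cluster:

bin/kafka-mirror-maker.sh \
  --consumer.config config/source-cluster.properties \
  --producer.config config/standby-cluster.properties \
  --whitelist "orders.*"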

That covers the concepts, deployment, and practice of a Kafka + ZooKeeper message middleware cluster. I hope the content above is helpful to you and that you can learn more from it. If you think the article is good, feel free to share it so more people can see it.
