This article explains the relationship between Kafka and Zookeeper. The content is simple and clear, and easy to learn and understand.
1. Introduction to Kafka
Apache Kafka was first developed by LinkedIn and later donated to the Apache Software Foundation.
Kafka is officially defined as a distributed streaming platform and is widely used thanks to its high throughput, persistence, horizontal scalability, and so on. Currently, Kafka has the following characteristics:
Message queue: Kafka provides system decoupling, traffic peak shaving, buffering, asynchronous communication, and so on.
Distributed storage system: Kafka can persist messages and fail over across multiple replicas, so it can serve as a data storage system.
Real-time data processing: Kafka ships with data-processing components such as Kafka Streams and Kafka Connect, giving it real-time processing capabilities.
The following figure shows Kafka's message model: [2]
Using the figure above, let's introduce several key concepts in Kafka:
Producer and consumer: the producers and consumers of the message queue; producers push messages to the queue and consumers pull messages from it.
Consumer group: a set of consumers that can consume messages from different partitions of the same topic in parallel.
Broker: a server in the Kafka cluster.
Topic: the category of a message.
Partition: a physical grouping of a topic; one topic can have multiple partitions, and each message within a partition is assigned an ordered id called an offset. Within a consumer group, each partition can be consumed by only one consumer.
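To make these concepts concrete, below is a minimal sketch using the official Java client. The broker address localhost:9092, the topic demo-topic, and the group id demo-group are illustrative assumptions, not values from this article:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class QuickDemo {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // The producer pushes a message to a topic; Kafka assigns it to one partition.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "hello"));
        }

        Properties c = new Properties();
        c.put("bootstrap.servers", "localhost:9092");
        c.put("group.id", "demo-group"); // consumers sharing a group.id split the partitions
        c.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        c.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // The consumer pulls messages; within one group, a partition is read by one consumer only.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(c)) {
            consumer.subscribe(Collections.singleton("demo-topic"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("partition=%d offset=%d value=%s%n",
                        r.partition(), r.offset(), r.value());
            }
        }
    }
}

If you run several copies of the consumer part with the same group.id, Kafka spreads the topic's partitions among them, one consumer per partition.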
2. The relationship between Kafka and Zookeeper
The Kafka architecture is shown below. As the figure shows, Kafka cannot work without the cooperation of Zookeeper. So how exactly do they work together?
Look at the figure below:
2.1 Registry
2.1.1 Broker registration
As the figure above shows, brokers are deployed in a distributed fashion and need a registry for unified management. Zookeeper uses a dedicated node, /brokers/ids, to hold the list of broker servers.
When a broker starts, it sends a registration request to Zookeeper, which creates a node for it under /brokers/ids, such as /brokers/ids/[0...N], storing the broker's IP address and port.
This node is ephemeral: it is deleted automatically once the broker goes down.
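As a rough illustration of this registry, here is a minimal sketch that reads the broker list with the ZooKeeper Java client; the ensemble address localhost:2181 is an assumption:

import org.apache.zookeeper.ZooKeeper;

public class BrokerRegistry {
    public static void main(String[] args) throws Exception {
        // Connect to the Zookeeper ensemble (address assumed).
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, event -> { });

        // /brokers/ids holds one ephemeral child per live broker.
        for (String id : zk.getChildren("/brokers/ids", false)) {
            // Each child stores the broker's endpoint information as JSON.
            byte[] data = zk.getData("/brokers/ids/" + id, false, null);
            System.out.println("broker " + id + " -> " + new String(data));
        }
        zk.close();
    }
}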
2.1.2 Topic registration
Zookeeper also assigns a dedicated node to each topic, recorded in the form /brokers/topics/[topic_name].
A topic's messages are stored across multiple partitions, and the mapping between these partitions and brokers also needs to be saved in Zookeeper.
Each partition is stored as multiple replicas; the red partition in the figure above is the leader replica. When the broker holding the leader replica fails, the partition must elect a new leader, which is done with the help of Zookeeper.
After a broker starts, it registers its broker id with the partition list of the corresponding topic node.
Let's look at the state of partition 1 of the topic xxx. The command is as follows:
[root@master] get /brokers/topics/xxx/partitions/1/state
{"controller_epoch":15,"leader":11,"version":1,"leader_epoch":2,"isr":[11,12,13]}
When a broker exits, Zookeeper updates the partition lists of its corresponding topics.
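For comparison, the same leader and ISR information can also be fetched through the broker API instead of reading Zookeeper directly. Below is a minimal sketch with the Java AdminClient, reusing the topic xxx from the command above; the broker address is an assumption:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.TopicPartitionInfo;

public class DescribeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc =
                    admin.describeTopics(Collections.singleton("xxx")).all().get().get("xxx");
            for (TopicPartitionInfo part : desc.partitions()) {
                // leader and isr mirror the state kept under /brokers/topics/xxx/partitions
                System.out.printf("partition=%d leader=%d isr=%s%n",
                        part.partition(), part.leader().id(), part.isr());
            }
        }
    }
}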
2.1.3 Consumer registration
Consumer groups also register with Zookeeper, which assigns each group a node to store its data. The node path is /consumers/[group_id], and it has three child nodes (ids, owners, and offsets), as shown below:
In this way, Zookeeper records the relationship between partitions and consumers, as well as each partition's offset. [3]
2.2 Load balancing
After brokers register with Zookeeper, producers perceive changes in the broker list through the broker nodes, which enables dynamic load balancing.
Consumers in a consumer group can pull messages from specific partitions according to the information on the topic nodes, also achieving load balancing.
In fact, Kafka stores a lot of metadata in Zookeeper, as shown in the following figure:
As brokers, topics, and partitions increase, the amount of saved data grows larger and larger.
3. Introduction to the Controller
The previous section showed that Kafka depends on Zookeeper so heavily that it cannot run without it. So how exactly does Kafka interact with Zookeeper?
See the figure below: [4] One broker in the Kafka cluster is elected as Controller to interact with Zookeeper; it is responsible for managing the state of all partitions and replicas in the entire Kafka cluster. The other brokers listen for data changes on the Controller node.
The Controller election depends on Zookeeper: after a successful election, an ephemeral /controller node is created in Zookeeper.
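Conceptually, the election is "the first broker to create the ephemeral node wins". The following is a simplified sketch of that idea with the ZooKeeper client, not Kafka's actual implementation; the address and node payload are assumptions:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ControllerElection {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, event -> { });
        try {
            // All brokers race to create the same ephemeral node; exactly one succeeds.
            zk.create("/controller", "{\"brokerid\":1}".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("won the election: this broker is now the Controller");
        } catch (KeeperException.NodeExistsException e) {
            // Losers watch /controller; when the ephemeral node disappears
            // (the Controller crashed), a new election round begins.
            System.out.println("another broker is already the Controller");
        } finally {
            zk.close(); // closing the session removes the ephemeral node, triggering re-election
        }
    }
}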
The specific responsibilities of the Controller are as follows:
Listen for partition changes
For example, when the leader of a partition fails, the Controller elects a new leader for it. When a change in a partition's ISR set is detected, the Controller notifies all brokers to update their metadata. When partitions are added to a topic, the Controller is responsible for the partition reassignment.
Monitor topic-related changes
Monitor broker-related changes
Cluster metadata management
The following figure shows the details of the interaction between the Controller, Zookeeper, and the brokers:
After the Controller is elected, it pulls the complete metadata from the Zookeeper cluster to initialize a ControllerContext, which is cached on the Controller node. When the cluster changes, for example when a topic gains partitions, the Controller must not only update its locally cached data but also synchronize the change to the other brokers.
The Controller listens for Zookeeper events, scheduled-task events, and other events; these events are queued in order in a LinkedBlockingQueue and processed sequentially by an event-handling thread. Most of the handling requires interacting with Zookeeper, and the Controller also has to update its own metadata.
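The following is a schematic sketch of such a single-threaded event loop; the event interface and class names are illustrative, not Kafka's real internal classes:

import java.util.concurrent.LinkedBlockingQueue;

public class ControllerEventLoop {
    // Hypothetical event type standing in for Kafka's internal controller events.
    interface ControllerEvent { void process(); }

    private final LinkedBlockingQueue<ControllerEvent> queue = new LinkedBlockingQueue<>();

    // Zookeeper listeners and scheduled tasks enqueue events as they fire.
    public void put(ControllerEvent event) { queue.offer(event); }

    // A single dedicated thread drains the queue, so events are handled strictly in order.
    public void start() {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    // Handling an event usually means talking to Zookeeper
                    // and updating the Controller's own metadata cache.
                    queue.take().process();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "controller-event-thread");
        worker.start();
    }
}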
4. Problems brought by Zookeeper
Kafka is itself a distributed system, yet it needs another distributed system to manage it, which undoubtedly adds complexity.
4.1 Operation and maintenance complexity
With Zookeeper, deploying Kafka means deploying two systems, and Kafka's operations staff must also be able to operate and maintain Zookeeper.
4.2 Controller fault handling
Kafka relies on a single Controller node to interact with Zookeeper; if this Controller node fails, a new Controller must be elected from the brokers. As shown in the figure below, the new Controller becomes broker3.
After the new Controller is elected, it pulls the metadata from Zookeeper again for initialization and must notify all the other brokers to update their ActiveControllerId. The old Controller has to shut down its listeners, event-handling thread, and scheduled tasks. When the number of partitions is very large, this process is very time-consuming, and the Kafka cluster cannot work while it is underway.
4.3 Partition bottleneck
As the number of partitions grows, the metadata stored in Zookeeper grows with it, and the pressure on the Zookeeper cluster increases. Beyond a certain point, watch-notification latency rises, which affects Kafka's operation.
The number of partitions a single Kafka cluster can host is therefore a bottleneck, and large partition counts are exactly what some business scenarios require.
5. Upgrade
The architecture diagrams before and after the upgrade are compared as follows:
KIP-500 replaces the previous single Controller with a Quorum Controller, in which every Controller node saves all the metadata and the KRaft protocol guarantees the consistency of the replicas. This way, even if a Quorum Controller node fails, failover to a new Controller is very fast.
Officially, after the upgrade Kafka can easily support millions of partitions.
The Kafka team calls this Kafka Raft Metadata mode, or KRaft for short, which synchronizes data through the Raft protocol.
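For reference, in later KRaft-mode releases a broker's server.properties declares its role and the controller quorum roughly as below; the exact values are illustrative, and the property set may differ across versions:

process.roles=broker,controller
node.id=1
controller.quorum.voters=1@host1:9093,2@host2:9093,3@host3:9093
listeners=PLAINTEXT://host1:9092,CONTROLLER://host1:9093
controller.listener.names=CONTROLLER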
Kafka's user base is so large that the upgrade must happen without stopping the service.
The KIP-500 code that removes Zookeeper has been merged into the trunk branch and is scheduled for release in version 2.8.
Kafka plans to support both the Zookeeper Controller and the Quorum Controller in version 3.0 so that users can run grayscale tests. [5]