In this issue, the editor brings you an explanation of how Kafka's controller works. The article analyzes the topic from a professional point of view; I hope you get something out of reading it.
1. A brief introduction to the controller
The controller (Controller) is a core component of Apache Kafka. Its main role is to manage and coordinate the entire Kafka cluster with the help of Apache ZooKeeper. Any broker in the cluster can take on the controller role, but while the cluster is running, only one broker at a time actually becomes the controller and exercises its management and coordination duties. In other words, every functioning Kafka cluster has exactly one active controller at any given moment. Kafka exposes a JMX metric, ActiveControllerCount, that lets you monitor the controller's liveness in real time; this metric is critical, and in day-to-day operations you should keep an eye on its value. Next, let's go through the controller's principles and internal mechanics in detail.
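To make this concrete, here is a minimal sketch of reading that metric over JMX. It assumes the broker was started with a JMX port exposed (for example JMX_PORT=9999); the host, port, and printing logic are illustrative, not a prescribed monitoring setup.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ControllerCheck {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            ObjectName name = new ObjectName(
                    "kafka.controller:type=KafkaController,name=ActiveControllerCount");
            // 1 on the broker that is currently the controller, 0 elsewhere.
            Object value = conn.getAttribute(name, "Value");
            System.out.println("ActiveControllerCount = " + value);
        }
    }
}
```

In a real deployment you would poll this from your monitoring system and alert whenever the values across all brokers do not sum to exactly 1.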
2. The principle and internal mechanics of the controller
ZooKeeper
Before diving in, let me briefly introduce Apache ZooKeeper, because the controller depends on it heavily and it is worth spending a little time on what ZooKeeper does. Apache ZooKeeper is a highly reliable distributed coordination service framework. Its data model resembles a file system tree, with the root at "/". Each node in this tree is called a znode and holds a small amount of coordination metadata. Classified by persistence, znodes come in two kinds: persistent znodes, which survive ZooKeeper restarts, and ephemeral znodes, which are bound to the ZooKeeper session that created them and are deleted automatically as soon as that session ends.
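To illustrate the two znode types, here is a minimal sketch using the ZooKeeper Java client (org.apache.zookeeper:zookeeper); the paths and payloads are made up for the demo.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> {});

        // A persistent znode survives client disconnects and ZooKeeper restarts.
        zk.create("/demo-persistent", "some metadata".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

        // An ephemeral znode is tied to this session: ZooKeeper deletes it
        // automatically when the session ends (e.g. the client crashes).
        zk.create("/demo-ephemeral", "session-bound".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

        zk.close(); // the ephemeral node disappears here; the persistent one stays
    }
}
```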
ZooKeeper also gives clients the ability to monitor znode changes, the so-called Watch notification mechanism. Once a watched znode is created or deleted, its set of children changes, or the data stored in it changes, ZooKeeper explicitly notifies the client through the registered change handler.
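Here is a minimal sketch of registering such a watch, in this case on the /controller znode (a real path in a Kafka cluster); the reaction logic is illustrative.

```java
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooKeeper;

public class ControllerWatch {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> {});

        // exists() works whether or not the node currently exists, and the
        // attached watcher fires on the next create/delete/data change.
        zk.exists("/controller", event -> {
            if (event.getType() == EventType.NodeDeleted) {
                System.out.println("/controller is gone - a new election can start");
            }
        });

        Thread.sleep(Long.MAX_VALUE); // keep the session alive for the demo
    }
}
```

Note that ZooKeeper watches are one-shot: after a watch fires, the client must re-register it to keep receiving notifications.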
Relying on these capabilities, ZooKeeper is often used to implement cluster membership management, distributed locks, leader election, and similar functions, and the Kafka controller makes extensive use of the Watch mechanism to coordinate the cluster. The following figure shows the znodes Kafka creates in ZooKeeper. You don't need to understand what each znode is for; a general sense of how much Kafka depends on ZooKeeper is enough.
The figure covers almost all the cluster data you can think of. The most important pieces are the following (a sketch of reading some of this data back appears after the list):
All topic information, including per-partition details such as which replica is the leader and which replicas are in the ISR.
All broker information, including which brokers are currently alive and which are shutting down.
All partitions involved in maintenance tasks, including the lists of partitions currently undergoing preferred-leader election or partition reassignment.
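As a rough illustration, the snippet below reads some of this metadata back with the plain ZooKeeper client. The paths are the ones Kafka actually uses; broker id 0 is just an example.

```java
import org.apache.zookeeper.ZooKeeper;

public class ClusterMetadata {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> {});

        // Live brokers register ephemeral znodes under /brokers/ids.
        System.out.println("live brokers: " + zk.getChildren("/brokers/ids", false));

        // All topics live under /brokers/topics.
        System.out.println("topics: " + zk.getChildren("/brokers/topics", false));

        // Each broker znode stores its endpoint metadata as JSON.
        byte[] data = zk.getData("/brokers/ids/0", false, null);
        System.out.println("broker 0: " + new String(data));

        zk.close();
    }
}
```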
It is worth noting that the authoritative copy of this data lives in ZooKeeper. Whenever the controller initializes, it reads the corresponding metadata from ZooKeeper and populates its own cache. With this cache in hand, the controller can serve metadata to the rest of the cluster, which mainly means that it synchronizes the data to the other brokers by sending requests to them.
Controller failover
As emphasized earlier, only one broker serves as controller while a Kafka cluster runs, which creates a risk of single point of failure (Single Point of Failure). How does Kafka deal with this? The answer is to provide failover for the controller, known as Failover.
Failover means that when the running controller crashes or terminates unexpectedly, Kafka can quickly detect it and immediately promote another broker to replace the failed controller. The process is fully automatic and needs no manual intervention.
Suppose Broker 0 starts out as the controller. When Broker 0 goes down, ZooKeeper detects the lost session and deletes the ephemeral /controller node, and the surviving brokers learn of this through the Watch mechanism. They all campaign to become the new controller; say Broker 3 wins the election and successfully recreates the /controller node in ZooKeeper. Broker 3 then reads the cluster metadata from ZooKeeper and initializes it into its own cache. At this point the controller failover is complete, and the new controller can perform its duties as normal.
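The following sketch shows the ephemeral-node election pattern this process relies on. The /controller path is real, but the payload, error handling, and retry logic here are simplified illustrations; Kafka's actual implementation is considerably more involved.

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ControllerElection {
    private final ZooKeeper zk;
    private final int brokerId;

    ControllerElection(ZooKeeper zk, int brokerId) {
        this.zk = zk;
        this.brokerId = brokerId;
    }

    void elect() throws Exception {
        try {
            // Whichever broker creates the ephemeral node first wins the election.
            zk.create("/controller", ("{\"brokerid\":" + brokerId + "}").getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
            System.out.println("broker " + brokerId + " is now the controller");
            // ... next: read cluster metadata from ZooKeeper into the local cache ...
        } catch (KeeperException.NodeExistsException e) {
            // Lost the race: watch the winner's node and re-run the election
            // as soon as it disappears (i.e. the current controller's session ends).
            if (zk.exists("/controller", event -> {
                if (event.getType() == EventType.NodeDeleted) {
                    try { elect(); } catch (Exception ignored) { }
                }
            }) == null) {
                elect(); // node vanished before the watch was set; retry now
            }
        }
    }
}
```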
Internal design of the controller
Before Kafka 0.11, the controller's design was quite cumbersome and its code rather confusing, leaving the community with many controller bugs that could not be fixed. The controller was heavily multithreaded, creating many threads internally. For example, it opened a dedicated Socket connection to each broker and created a dedicated thread per broker to send requests over it; with many brokers in a cluster, the controller side ended up running a great many threads. In addition, the controller's ZooKeeper session had a separate thread to handle Watch notification callbacks, and beyond all of those, the controller also created extra I/O threads for topic deletion.
Worse than the sheer number of threads, all of them accessed shared controller cache data. As we all know, concurrent access to shared mutable data is the hardest part of maintaining thread safety. To keep the data safe, the controller had to rely heavily on ReentrantLock synchronization throughout its code, which further slowed the whole component down.
For these reasons, the community redesigned the controller's internals in version 0.11. The biggest change was to replace the multithreaded scheme with a single thread plus an event queue, as shown in a diagram taken directly from the community.
As the figure shows, the community introduced a dedicated event-handling thread that processes all controller events uniformly: the controller models every operation it performs as an independent event and appends it to a dedicated event queue for that thread to consume. This is how the so-called single thread + queue scheme is implemented.
Note that "single thread" here does not mean all the threads mentioned earlier were eliminated; the controller simply delegates all changes to its cached state to this one thread.
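A minimal sketch of the single thread + event queue pattern follows, assuming nothing about Kafka's real event classes (the names here are invented for illustration):

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ControllerEventLoop {
    interface ControllerEvent { void process(); }

    private final BlockingQueue<ControllerEvent> queue = new LinkedBlockingQueue<>();

    // Any thread may enqueue events (ZooKeeper callbacks, request handlers, ...).
    public void put(ControllerEvent event) { queue.add(event); }

    // ...but only this one thread ever runs process(), so the controller state
    // those events touch needs no locks of its own.
    public void start() {
        Thread eventThread = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    queue.take().process();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "controller-event-thread");
        eventThread.start();
    }
}
```

Because only the event thread ever touches controller state, synchronization is confined to the queue itself.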
The biggest advantage of this scheme is that the state in the controller cache is only ever processed by one thread, so heavyweight synchronization is no longer needed for thread safety and Kafka no longer has to worry about concurrent access, which makes it much easier for the community to locate and diagnose controller problems. Indeed, since the controller code was refactored in 0.11, the number of controller bugs reported in the community has dropped noticeably, which suggests the scheme works.
The second improvement was to turn all the previously synchronous ZooKeeper operations into asynchronous ones. ZooKeeper's own API offers both synchronous and asynchronous variants of its write operations. Previously, the controller used the synchronous API to operate on ZooKeeper, and its performance was poor: when a large number of topic partitions changed at once, ZooKeeper easily became the system's bottleneck. The new version of Kafka abandons the synchronous calls entirely and writes to ZooKeeper with the asynchronous API, which improves performance dramatically; according to community tests, ZooKeeper write throughput increased tenfold after the switch.
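The difference between the two API styles is easy to see in the ZooKeeper client itself; the sketch below is illustrative (the path and payloads are made up).

```java
import org.apache.zookeeper.ZooKeeper;

public class AsyncWriteDemo {
    public static void main(String[] args) throws Exception {
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> {});

        // Synchronous write: the caller blocks until ZooKeeper acknowledges.
        zk.setData("/demo-persistent", "v1".getBytes(), -1);

        // Asynchronous write: returns immediately; the result arrives in a
        // callback, so many writes can be in flight at once.
        zk.setData("/demo-persistent", "v2".getBytes(), -1,
                (rc, path, ctx, stat) -> System.out.println(
                        "async write to " + path + " finished with code " + rc),
                null);

        Thread.sleep(1000); // crude wait for the callback in this demo
        zk.close();
    }
}
```

With the asynchronous form, the caller no longer pays a full round trip per call, which is where the throughput gain comes from.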
3. Community work
Beyond the improvements above, the community recently landed another notable change: request prioritization. Previously, a broker treated every request it received equally, with no discrimination. That design is unfair to requests sent by the controller, which ought to have a higher priority.
As a simple example, suppose we delete a topic. The controller sends a StopReplica request to every broker hosting a replica of that topic. If a broker happens to have a large backlog of Produce requests at that moment, the StopReplica request can only wait in line. And if those Produce requests are writing to the very topic being deleted, the situation becomes ironic: the topic is about to disappear, so what is the point of processing them first? The most reasonable order is to give StopReplica a higher priority so that it can jump the queue.
This was not possible before version 2.2. Since 2.2, Kafka officially supports handling requests at different priorities: put simply, it separates controller-sent requests from ordinary data-plane requests and handles controller requests on a dedicated path. Given that this is still a very new feature, we will have to wait and see how well it works in practice.
One last practical tip: when you suspect a controller problem, such as a topic that cannot be deleted or a partition reassignment that hangs, you don't have to restart the brokers or the controller. A quick and easy remedy is to manually delete the /controller node from ZooKeeper, with the ZooKeeper shell command rmr /controller. This not only triggers a controller re-election but also avoids the interruption to message processing that restarting a broker would cause.
That is how to understand Kafka's controller. If you happen to have similar doubts, the analysis above may help you work through them.