
How Kafka Ensures High Availability

2025-07-10 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains how message queues, and Kafka in particular, ensure high availability. Many people run into these problems in real projects, so the sections below walk through how to handle them.

Interview question

How to ensure high availability of message queues?

What the interviewer is thinking

If an interviewer asks about MQ (message queue) knowledge, high availability is almost certain to come up. As mentioned earlier, introducing an MQ reduces system availability, so once you say you use one, the next questions will be about how you deal with its drawbacks.

If you use an MQ without ever having considered these issues, you are in trouble. The interviewer will conclude that you can only apply simple techniques without thinking them through, and the impression will be poor. Such a candidate might be acceptable as a junior engineer on a salary below 20k, but would be a disaster as a senior engineer on 20k+. A system designed by that person is bound to be full of pitfalls, and when an incident happens the company takes the loss and the whole team shares the blame.

Analysis of interview questions

Asking the question in this general form is reasonable. An interviewer should not ask specifically "How does Kafka guarantee high availability?" or "How does ActiveMQ guarantee high availability?", because that looks unprofessional. The candidate may have used RabbitMQ and never touched Kafka, so asking about Kafka just makes things needlessly hard.

A competent interviewer therefore asks how to guarantee high availability for an MQ in general. You then pick whichever MQ you have used and explain your understanding of its high availability.

High availability of RabbitMQ

RabbitMQ is a representative example because its high availability is based on a master-slave (non-distributed) design, so we use it first to explain how this kind of MQ achieves high availability.

RabbitMQ has three modes: stand-alone mode, normal cluster mode, and mirror cluster mode.

Stand-alone mode

Stand-alone mode is demo level: you start it locally to experiment. Nobody runs stand-alone mode in production.

Normal cluster mode (no high availability)

Normal cluster mode means starting multiple RabbitMQ instances on multiple machines, one per machine. A queue you create lives on only one RabbitMQ instance, but every instance synchronizes the queue's metadata (think of metadata as the queue's configuration information; through it, any instance can find the instance that holds the queue). When you consume, if you are connected to a different instance, that instance pulls the data from the instance that holds the queue.

This approach is cumbersome and not very good. It is not truly distributed; it is just an ordinary cluster. Consumers either connect to a random instance each time and have it pull the data, or always connect to the instance that holds the queue. The former adds data-pulling overhead, and the latter makes that single instance a performance bottleneck.

If the instance that holds the queue goes down, the other instances can no longer pull data from it. If you enabled message persistence so that RabbitMQ stores messages on disk, the messages are not necessarily lost, but you have to wait for that instance to recover before you can continue consuming from the queue.

So this mode is awkward: it provides no real high availability. Its main purpose is to improve throughput by letting multiple nodes in the cluster serve the read and write operations of a queue.
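The behavior described above can be illustrated with a small in-memory model. This is only a sketch, not RabbitMQ code: the `Node` and `Cluster` classes and the node names are hypothetical. It shows that every node knows where a queue lives (metadata), that consuming through a non-owner node requires an extra hop, and that the queue becomes unavailable when its owner is down.

```python
class Node:
    """One RabbitMQ instance in a normal (non-mirrored) cluster."""
    def __init__(self, name):
        self.name = name
        self.up = True
        self.queues = {}  # queue name -> messages; only the owner holds them


class Cluster:
    def __init__(self, node_names):
        self.nodes = {name: Node(name) for name in node_names}
        self.metadata = {}  # queue name -> owner node; known to every node

    def declare_queue(self, queue, owner):
        self.nodes[owner].queues[queue] = []
        self.metadata[queue] = owner

    def _owner(self, queue):
        owner = self.nodes[self.metadata[queue]]
        if not owner.up:
            raise RuntimeError(f"queue {queue!r} unavailable: {owner.name} is down")
        return owner

    def publish(self, queue, message):
        self._owner(queue).queues[queue].append(message)

    def consume(self, queue, via):
        """Consume through any node; a non-owner must pull from the owner."""
        owner = self._owner(queue)
        forwarded = via != owner.name
        return owner.queues[queue].pop(0), forwarded


cluster = Cluster(["rabbit1", "rabbit2", "rabbit3"])
cluster.declare_queue("orders", owner="rabbit1")
cluster.publish("orders", "m1")
cluster.publish("orders", "m2")

# Connected to rabbit2, which has to pull the message from rabbit1.
message, forwarded = cluster.consume("orders", via="rabbit2")

# The owner goes down: the metadata is still known, but the data is unreachable.
cluster.nodes["rabbit1"].up = False
try:
    cluster.consume("orders", via="rabbit2")
    outage = None
except RuntimeError as exc:
    outage = str(exc)
print(message, forwarded, outage)
```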

Mirror cluster mode (high availability)

This mode is RabbitMQ's high availability mode. Unlike normal cluster mode, in mirror cluster mode both the metadata and the messages of a queue you create exist on multiple instances. In other words, every RabbitMQ node that mirrors the queue holds a complete copy of it, containing all of the queue's data. Every time you write a message to the queue, the message is automatically synchronized to the queue's copies on those instances.

So how do you turn on mirror cluster mode? RabbitMQ has a good management console, and you enable the mode there by adding a policy: a mirrored cluster mode policy. The policy can require data to be synchronized to all nodes or to a specified number of nodes. When you then create a queue and this policy applies to it, the data is automatically synchronized to the other nodes.
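As a rough illustration, the policy definition entered in the console is a small JSON document. The sketch below builds it in Python. The keys `ha-mode`, `ha-params`, and `ha-sync-mode` are the classic mirrored queue policy keys; the helper `mirror_policy` is a hypothetical name used only here, and the whole example assumes a RabbitMQ version that still supports classic mirrored queues.

```python
import json


def mirror_policy(mode="all", count=None, auto_sync=True):
    """Build a classic mirrored-queue policy definition (hypothetical helper)."""
    if mode not in ("all", "exactly"):
        raise ValueError("this sketch only covers 'all' and 'exactly'")
    definition = {"ha-mode": mode}
    if mode == "exactly":
        if count is None:
            raise ValueError("'exactly' needs the number of copies")
        definition["ha-params"] = count  # total copies, including the master
    if auto_sync:
        # New mirrors copy existing messages automatically when they join.
        definition["ha-sync-mode"] = "automatic"
    return definition


all_nodes = mirror_policy("all")                 # mirror to every node
two_copies = mirror_policy("exactly", count=2)   # mirror to a fixed number of nodes

print(json.dumps(all_nodes))
print(json.dumps(two_copies))
```

The printed JSON is what would go into the policy's definition field, together with a pattern that selects the queues the policy applies to.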

The advantage is that if any one machine goes down, nothing is lost: the other machines (nodes) still hold the complete data of the queue, and consumers can switch to those nodes.

There are two drawbacks. First, the performance overhead is high, because every message has to be synchronized to all mirror machines, which puts heavy pressure on network bandwidth. Second, this approach is not distributed and does not scale. If a queue is heavily loaded and you add machines, the new machines also hold all of the queue's data, so there is no way to scale the queue linearly. And if the queue grows so large that a single machine cannot hold it, what do you do then?

High Availability of Kafka

The most basic thing to understand about Kafka's architecture: a cluster consists of multiple brokers, and each broker is a node. When you create a topic, it can be divided into multiple partitions, each partition can live on a different broker, and each partition holds part of the topic's data.

This makes Kafka a naturally distributed message queue: the data of one topic is spread across multiple machines, and each machine holds part of it.

RabbitMQ, by contrast, is not a distributed message queue. It is a traditional message queue that provides some clustering and HA (High Availability) mechanisms. However you configure it, the data of a RabbitMQ queue sits on one node, and in a mirror cluster every mirror node holds the complete data of the queue.
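A minimal sketch of this idea follows. The broker names, the round-robin placement in `assign_partitions`, and the CRC32-based `partition_for` are assumptions made for illustration; Kafka's real partitioner and placement logic differ in detail.

```python
import zlib
from collections import defaultdict


def assign_partitions(num_partitions, brokers):
    """Place partition p on broker p mod N (simplified round-robin placement)."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


def partition_for(key, num_partitions):
    """Map a message key to a partition (CRC32 here as a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


brokers = ["broker-0", "broker-1", "broker-2"]
placement = assign_partitions(3, brokers)

# Route nine keyed messages and record which broker ends up storing each one.
stored = defaultdict(list)
for i in range(9):
    key = f"order-{i}"
    stored[placement[partition_for(key, 3)]].append(key)

for broker in brokers:
    print(broker, stored[broker])
```

Each broker stores only the messages of the partitions placed on it, so no single machine needs to hold the whole topic.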

Before Kafka 0.8 there was no HA mechanism. If any broker went down, the partitions on that broker became unavailable and could be neither written nor read, so there was no high availability.

For example, suppose we create a topic with three partitions spread across three machines. If the second machine goes down, one third of the topic's data becomes unavailable, so this is not highly available.

Starting with Kafka 0.8, an HA mechanism was provided: the replica mechanism. Each partition's data is synchronized to other machines, forming multiple replicas of that partition.
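The difference between the two situations can be sketched as follows. The placement rule in `assign_replicas` (replica i of partition p goes to broker (p + i) mod N) is a simplification chosen for illustration, and the function and broker names are hypothetical.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Spread each partition's replicas over distinct brokers (simplified rule)."""
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + i) % n] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def available_partitions(assignment, down):
    """Partitions that still have at least one replica on a live broker."""
    return [p for p, replicas in assignment.items()
            if any(b not in down for b in replicas)]


brokers = ["broker-0", "broker-1", "broker-2"]
down = {"broker-1"}

# One copy per partition, as before 0.8: losing a broker loses its partition.
no_replicas = assign_replicas(3, brokers, replication_factor=1)
# Two copies per partition: every partition survives a single broker failure.
with_replicas = assign_replicas(3, brokers, replication_factor=2)

print(available_partitions(no_replicas, down))
print(available_partitions(with_replicas, down))
```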

The replicas of a partition elect a leader. Producers and consumers deal only with this leader, and the other replicas are followers.

On writes, the leader is responsible for getting the data synchronized to all followers; on reads, data is read directly from the leader. Why read and write only through the leader?

The answer is simple. If you could read and write any follower at will, you would have to deal with data consistency, the system would become far more complex, and mistakes would be easy to make. Kafka also distributes the replicas of a partition evenly across different machines to improve fault tolerance.

This is what gives high availability. If a broker goes down, that is fine, because the partitions on that broker have replicas on other machines. If the broker hosted a partition leader, a new leader is elected from the followers, and everyone continues to read and write through the new leader.
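The failover step can be sketched with a small model. This is an assumption-laden illustration rather than Kafka's controller logic: the `Partition` class is hypothetical, and it simply promotes the first surviving in-sync replica in assignment order.

```python
class Partition:
    """A partition with an ordered replica list and one leader (sketch only)."""
    def __init__(self, replicas):
        self.replicas = list(replicas)  # broker ids; the first one starts as leader
        self.isr = list(replicas)       # replicas currently in sync with the leader
        self.leader = self.replicas[0]

    def broker_failed(self, broker):
        """Drop the failed broker and, if it was the leader, promote a follower."""
        if broker in self.isr:
            self.isr.remove(broker)
        if self.leader == broker:
            # No surviving in-sync replica means the partition goes offline.
            self.leader = self.isr[0] if self.isr else None


partition = Partition(replicas=[1, 2, 3])

partition.broker_failed(1)          # the leader's broker goes down
leader_after_first = partition.leader

partition.broker_failed(3)          # a follower goes down; the leader is unchanged
leader_after_second = partition.leader

partition.broker_failed(2)          # the last replica goes down
leader_after_third = partition.leader
print(leader_after_first, leader_after_second, leader_after_third)
```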

When writing data, the producer sends it to the leader, and the leader writes it to its local log on disk. The followers then actively pull the data from the leader. Once a follower has synchronized the data, it acknowledges this to the leader, and after the leader has received acks from all followers that are in sync with it, it returns a success response to the producer. (This is only one of the modes, and you can adjust the behavior as needed, for example through the producer's acks setting.)
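The write path and the effect of the producer's `acks` setting (0, 1, or all) can be modeled roughly as below. The `PartitionLog` class is hypothetical, and the sketch assumes every follower counts as in sync; a real broker tracks the in-sync replica set dynamically.

```python
class PartitionLog:
    """Leader log plus follower copies for one partition (in-memory sketch)."""
    def __init__(self, followers):
        self.leader_log = []
        self.follower_logs = {name: [] for name in followers}

    def append(self, record):
        """The leader appends the record locally and returns its offset."""
        self.leader_log.append(record)
        return len(self.leader_log) - 1

    def follower_fetch(self, follower):
        """A follower pulls everything it is missing from the leader."""
        log = self.follower_logs[follower]
        log.extend(self.leader_log[len(log):])

    def is_acked(self, offset, acks):
        """Would the producer be told 'success' for this offset?"""
        if acks == 0:
            return True                              # producer does not wait
        on_leader = offset < len(self.leader_log)
        if acks == 1:
            return on_leader                         # leader's write is enough
        if acks == "all":
            return on_leader and all(
                offset < len(log) for log in self.follower_logs.values()
            )
        raise ValueError("acks must be 0, 1, or 'all'")


log = PartitionLog(followers=["f1", "f2"])
offset = log.append("m1")

acked_by_leader = log.is_acked(offset, acks=1)
acked_before_sync = log.is_acked(offset, acks="all")

log.follower_fetch("f1")
acked_after_one = log.is_acked(offset, acks="all")

log.follower_fetch("f2")
acked_after_all = log.is_acked(offset, acks="all")
print(acked_by_leader, acked_before_sync, acked_after_one, acked_after_all)
```

Waiting for all followers trades latency for durability: the weaker modes return sooner but can lose a message if the leader fails before the followers have copied it.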

When consuming, data is by default read only from the leader, and a message becomes visible to consumers only after all in-sync followers have replicated it and acknowledged it to the leader.
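This visibility rule is usually described with a high watermark: the smallest log end offset among the leader and its in-sync followers. The sketch below assumes that definition; the function names and the sample offsets are made up for illustration.

```python
def high_watermark(leader_end_offset, follower_end_offsets):
    """Smallest log end offset across the leader and its in-sync followers."""
    return min([leader_end_offset, *follower_end_offsets])


def visible_to_consumers(leader_log, hw):
    """Consumers may read only the records below the high watermark."""
    return leader_log[:hw]


leader_log = ["m0", "m1", "m2", "m3"]   # leader's log end offset is 4
follower_end_offsets = [3, 2]           # the followers have copied 3 and 2 records

hw = high_watermark(len(leader_log), follower_end_offsets)
visible = visible_to_consumers(leader_log, hw)
print(hw, visible)

# After the slower follower catches up to offset 3, more records become visible.
hw_later = high_watermark(len(leader_log), [3, 3])
visible_later = visible_to_consumers(leader_log, hw_later)
print(hw_later, visible_later)
```

Because consumers never see records above the high watermark, a leader change cannot make an already-consumed message disappear.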
