What is the important mechanism of Kafka consistency? 04/17 Update SLTechnology News&Howtos

2025-04-17 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains the main mechanisms behind Kafka consistency: replicas, how data is replicated, when a message counts as committed, and how replicas recover after a failure.

1. Kafka replica

When the replication-factor of a topic is N and N is greater than 1, each Partition has N copies (Replicas): one leader and N-1 followers. The number of Replicas cannot exceed the number of Brokers, because each Broker holds at most one Replica of a given Partition; this is why a Broker id is enough to identify a Replica of that Partition. By default, the Replicas of all Partitions are distributed evenly across all Brokers.
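To make the placement rule concrete, here is a minimal sketch of spreading Replicas over Brokers so that no Broker holds two Replicas of the same Partition. The function name `assign_replicas` and the plain round-robin rule are assumptions for illustration; Kafka's real assignment logic is more involved.

```python
def assign_replicas(broker_ids, num_partitions, replication_factor):
    """Toy round-robin placement (an illustrative assumption, not Kafka's algorithm).

    Partition p gets the brokers at positions p, p+1, ... (mod broker count),
    so each broker holds at most one replica of any partition.
    """
    n = len(broker_ids)
    if replication_factor > n:
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for p in range(num_partitions):
        # The first broker in each list is treated as the leader in this sketch.
        assignment[p] = [broker_ids[(p + i) % n] for i in range(replication_factor)]
    return assignment


assignment = assign_replicas([0, 1, 2], num_partitions=3, replication_factor=2)
for partition, replicas in assignment.items():
    print(partition, replicas)
```

With three Brokers and a replication-factor of 2, each Broker ends up with two Replicas, and asking for a replication-factor of 4 raises an error.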

2. How does Data Replication Propagate messages?

Each Partition has one leader and multiple followers. When a producer writes data to a Partition, the data is written only to the leader and is then copied to the other Replicas. Is that data pushed by the leader or pulled by the followers?

In Kafka, followers pull the data from the leader periodically (the process is very similar to how a consumer consumes messages). All writes go to the leader, and by default reads are also served only by the leader, not by any follower. A follower is just a backup of the data, so that service can continue if the leader goes down.
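The pull model can be sketched with two toy classes. `Leader`, `Follower`, `fetch`, and `pull_once` are hypothetical names used only for this illustration; they are not Kafka APIs, and the real fetch protocol carries much more state.

```python
class Leader:
    """Toy leader: an in-memory log that followers read by offset."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)

    def fetch(self, offset, max_messages=2):
        # Return at most max_messages entries starting at the given offset.
        return self.log[offset:offset + max_messages]


class Follower:
    """Toy follower: asks the leader for whatever comes after its own log end."""

    def __init__(self):
        self.log = []

    def pull_once(self, leader):
        batch = leader.fetch(len(self.log))
        self.log.extend(batch)
        return len(batch)


leader = Leader()
for message in ["a", "b", "c", "d", "e"]:
    leader.append(message)

follower = Follower()
rounds = 0
while follower.pull_once(leader) > 0:
    rounds += 1

print(rounds, follower.log)  # 3 ['a', 'b', 'c', 'd', 'e']
```

Because the follower drives the loop, a slow follower simply falls behind instead of blocking the leader, which is what makes lag tracking necessary.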

3. When will Data Replication Commit?

Synchronous replication: commit only after all followers have copied the data; consistency is good but availability is low. Asynchronous replication: commit as soon as the leader has the data and let the followers replicate in their own time; availability is high and the call returns immediately, but consistency is poor. Commit means that the leader tells the client the data has been written successfully. Kafka tries to ensure that even if the leader dies immediately after a commit, the other followers already have the data.
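The difference between the two policies comes down to one condition, sketched below. The function `can_commit` and the policy names `"sync"` and `"async"` are assumptions made for this example only.

```python
def can_commit(policy, follower_has_copy):
    """Decide whether the leader may report success to the client.

    follower_has_copy maps a follower id to True once that follower
    has copied the message.
    """
    if policy == "sync":
        # Every follower must have the message before the commit.
        return all(follower_has_copy.values())
    if policy == "async":
        # The leader's own copy is enough; followers catch up later.
        return True
    raise ValueError("unknown policy: " + policy)


state = {"f1": True, "f2": False}
print(can_commit("sync", state))   # False: f2 is still behind
print(can_commit("async", state))  # True: the leader does not wait
```

A single slow follower blocks every synchronous commit, while an asynchronous commit can be lost if the leader fails before any follower has copied the message.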

Kafka is neither fully synchronous nor fully asynchronous; it uses the ISR mechanism:

The leader maintains a list of Replicas that are basically in sync with it, called the ISR (in-sync Replicas). Each Partition has its own ISR, which the leader maintains dynamically.

If a follower lags too far behind the leader, or has not initiated a data replication request for a certain period of time, the leader removes it from the ISR.

The leader commits only when all Replicas in the ISR have sent an ACK to the leader.
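The ISR commit rule can be sketched as taking the minimum replicated offset across the ISR. The name `high_watermark` is borrowed from Kafka terminology that this article does not use, and the dictionary layout is an assumption for illustration.

```python
def high_watermark(log_end_offsets, isr):
    """Offset up to which every ISR member has the data (exclusive).

    Messages below this offset count as committed in this sketch.
    """
    return min(log_end_offsets[replica] for replica in isr)


log_end_offsets = {"leader": 10, "f1": 9, "f2": 4}

# With the slow follower f2 outside the ISR, commits only wait for f1.
print(high_watermark(log_end_offsets, isr={"leader", "f1"}))        # 9

# If f2 were still in the ISR, it would hold every commit back.
print(high_watermark(log_end_offsets, isr={"leader", "f1", "f2"}))  # 4
```

This is why removing a lagging follower from the ISR restores progress: the commit point is bounded by the slowest member of the ISR, not by the slowest Replica overall.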

Since the leader commits only when all Replicas in the ISR have sent an ACK, how can a follower lag so far behind the leader? A producer does not have to send one message at a time; it can send an array of messages as a batch. Synchronous sends can be batched, and asynchronous sends are batched by nature: the underlying queue caches messages and sends them in batches. The broker can therefore receive a lot of data at once (suppose 1000 messages). At that moment the leader has 1000 messages while a follower has not yet caught up. If the gap between a follower and the leader is small, the leader waits for it; if the gap is large, for example because of memory pressure, the leader removes that follower from the ISR.

Commit policy: server configuration

replica.lag.time.max.ms=10000

# If a follower has not sent a fetch request to the leader for more than 10 seconds, the leader assumes something is wrong with that follower,

# or that its resources are too tight for it to keep up, and removes it from the ISR.

replica.lag.max.messages=4000

# If a follower is more than 4000 messages behind the leader, the leader removes it from the ISR.

# (This setting was removed in Kafka 0.9; newer brokers rely on replica.lag.time.max.ms alone.)

# Removing slow followers keeps availability high. Once a follower satisfies both conditions again, it is added back to the ISR.

# This strikes a dynamic balance between availability and consistency.
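The two lag conditions from the server configuration can be sketched as a membership check. The function names and the dictionary fields `last_fetch_ms` and `end_offset` are hypothetical; the default values mirror the settings listed above.

```python
def is_in_sync(follower, leader_end_offset, now_ms,
               max_lag_ms=10000, max_lag_messages=4000):
    """Return True if a follower satisfies both lag conditions."""
    fetched_recently = now_ms - follower["last_fetch_ms"] <= max_lag_ms
    close_enough = leader_end_offset - follower["end_offset"] <= max_lag_messages
    return fetched_recently and close_enough


def update_isr(followers, leader_end_offset, now_ms):
    """Recompute which followers belong in the ISR (the leader is implied)."""
    return {fid for fid, f in followers.items()
            if is_in_sync(f, leader_end_offset, now_ms)}


followers = {
    "f1": {"last_fetch_ms": 99_500, "end_offset": 9_900},  # healthy
    "f2": {"last_fetch_ms": 85_000, "end_offset": 9_900},  # silent for 15 seconds
    "f3": {"last_fetch_ms": 99_900, "end_offset": 5_000},  # 5000 messages behind
}
isr = update_isr(followers, leader_end_offset=10_000, now_ms=100_000)
print(sorted(isr))  # ['f1']
```

Because the check is re-evaluated over time, a follower that catches up passes both conditions again and returns to the ISR.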

Topic configuration

min.insync.replicas=1 # the minimum number of replicas that must be in the ISR
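As a rough sketch of how this setting interacts with the producer acknowledgement levels (0, 1, -1) described under the producer configuration, the check below rejects a write that asks for full acknowledgement when the ISR is too small. `handle_produce` and the `NotEnoughReplicas` class are stand-ins invented for this example, not Kafka's own classes.

```python
class NotEnoughReplicas(Exception):
    """Hypothetical stand-in for the error a broker reports when the ISR is too small."""


def handle_produce(acks, isr_size, min_insync_replicas=1):
    """Return which acknowledgements the producer waits for."""
    if acks == 0:
        return "none"
    if acks == 1:
        return "leader"
    if acks == -1:
        # min.insync.replicas only matters when every ISR member must acknowledge.
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicas(
                "ISR has %d replicas, need %d" % (isr_size, min_insync_replicas))
        return "all in-sync replicas"
    raise ValueError("acks must be 0, 1 or -1")


print(handle_produce(-1, isr_size=3, min_insync_replicas=2))  # all in-sync replicas
try:
    handle_produce(-1, isr_size=1, min_insync_replicas=2)
except NotEnoughReplicas as error:
    print("rejected:", error)
```

Raising the minimum above 1 trades availability for durability: the Partition refuses fully acknowledged writes rather than accept them with too few copies.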

Producer configuration

request.required.acks=0

# 0: equivalent to asynchronous. The producer does not wait for a reply from the leader; it returns immediately and treats the send as successful.

# If the network then times out or the broker crashes (1. the Partition's leader has not yet committed the message; 2. the leader and follower data are out of sync),

# the message may be lost or may be resent.

# 1: the leader sends an ACK once it has received the message. A lost message is retransmitted, so the probability of loss is very small.

# -1: the ACK is sent only after all followers in the ISR have synchronized the message. The possibility of losing a message is relatively low.

4. How does Data Replication handle Replica recovery?

When the leader dies, one of its followers is chosen as the new leader, and the dead leader is removed from the ISR so that data processing can continue. When the old leader restarts some time later, it knows how far its own data went; it fetches from the new leader the data that was processed after it went down, and rejoins the ISR once it has caught up.
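The catch-up step can be sketched as follows. The function `rejoin` and the idea of cutting the old log back to its last committed offset before fetching are assumptions made for this illustration; real brokers track this with more bookkeeping than shown.

```python
def rejoin(restarted_log, committed_upto, new_leader_log):
    """Bring a restarted replica back in line with the new leader.

    committed_upto is the number of entries the replica knows were committed
    before it went down; anything after that may never have been replicated.
    """
    # Keep only the part of the old log that is known to be committed.
    log = restarted_log[:committed_upto]
    # Fetch everything the new leader has beyond that point.
    log.extend(new_leader_log[len(log):])
    return log


old_leader_log = ["a", "b", "c", "x"]       # "x" was never committed
new_leader_log = ["a", "b", "c", "d", "e"]  # written after the failover

recovered = rejoin(old_leader_log, committed_upto=3, new_leader_log=new_leader_log)
print(recovered)  # ['a', 'b', 'c', 'd', 'e']
```

Once the recovered log matches the new leader's log, the replica meets the lag conditions again and can be added back to the ISR.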

5. How does Data Replication handle all Replicas going down?

1. Wait for any Replica in the ISR to recover, and select it as the leader

The wait can be long, so availability is lower

If none of the Replicas in the ISR can recover, or their data is lost, the Partition will never be available again

2. Select the first Replica that recovers as the new leader, whether or not it is in the ISR

That Replica may not contain all messages the previous leader committed, so data can be lost

Availability is high
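The two strategies differ only in whether a Replica outside the ISR may become leader, which the sketch below captures with one flag. `elect_leader` and `allow_unclean` are hypothetical names; the flag loosely corresponds to Kafka's `unclean.leader.election.enable` setting.

```python
def elect_leader(recovered_in_order, isr, allow_unclean):
    """Pick a new leader from the replicas that have come back, in recovery order.

    Returns None if no eligible replica has recovered yet.
    """
    for replica in recovered_in_order:
        if replica in isr or allow_unclean:
            return replica
    return None


isr = {"r1", "r2"}
recovered = ["r3", "r2"]  # r3 (not in the ISR) came back first

print(elect_leader(recovered, isr, allow_unclean=False))  # r2: consistent, but had to wait
print(elect_leader(recovered, isr, allow_unclean=True))   # r3: available sooner, may lose data
print(elect_leader(["r3"], isr, allow_unclean=False))     # None: still waiting for an ISR member
```

The first strategy favors consistency and the second favors availability, so the choice depends on which the topic can least afford to lose.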

Thank you for reading. That covers the important mechanisms of Kafka consistency; how they behave in a specific deployment still needs to be verified in practice.