How to implement replica synchronization in Kafka

2025-07-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article explains in detail how replica synchronization works in Kafka. The editor found it practical and is sharing it here as a reference; hopefully you will take something away from it.

Follower replica synchronization works roughly as follows: the follower sends a fetch request to the leader, the leader responds with data, and the follower then updates its own HW (high watermark) and LEO (log end offset) values. While handling the follower's fetch request, the leader also updates its own HW and the LEO values it maintains. Note that the leader replica maintains not only its own HW and LEO, but also an LEO value for each follower replica. We will call that value RemoteLEO here.

In short, follower replica synchronization comes down to fetching data from the leader replica, writing it to the local log, and then updating the HW and LEO values.
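The state described above can be sketched as two small records. This is only an illustrative model, not Kafka's internal classes: the names `FollowerReplica`, `LeaderReplica`, `remote_leo` and the follower id `"follower-1"` are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class FollowerReplica:
    """State a follower keeps for one partition (hypothetical names)."""
    log: list = field(default_factory=list)  # messages written locally
    hw: int = 0                              # high watermark

    @property
    def leo(self) -> int:
        # LEO (log end offset): the offset the next message will receive
        return len(self.log)


@dataclass
class LeaderReplica(FollowerReplica):
    # The leader additionally tracks one RemoteLEO per follower,
    # keyed here by a made-up follower id.
    remote_leo: dict = field(default_factory=dict)


leader = LeaderReplica(remote_leo={"follower-1": 0})
follower = FollowerReplica()
print(leader.leo, leader.hw, leader.remote_leo, follower.leo, follower.hw)
```

Deriving LEO from the length of the log keeps the two from drifting apart in the sketch; a real broker tracks offsets separately from the stored data.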

Update mechanism of HW and LEO

Suppose a new Kafka cluster has just been set up, with no producers and no messages. A follower sends a fetch request to the leader. Because the leader has no data to return, it parks the request in purgatory, a holding area for requests that cannot be completed immediately; each parked request has a timeout and is forcibly completed when that timeout expires. In this initial state every value is 0: the leader's LEO, HW and RemoteLEO, and the follower's LEO and HW.

Now suppose a producer sends one message to a partition of a topic. The leader replica increments its LEO by 1, while its HW and RemoteLEO stay unchanged. The state is now: leader LEO = 1, HW = 0, RemoteLEO = 0; follower LEO = 0, HW = 0.
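The purgatory idea can be illustrated with a toy holding area. This is a sketch under simplifying assumptions, not the broker's implementation: the class names, the `deadline` field and the rule "complete when data exists past the fetch offset or the deadline has passed" are all made up for this example, and time is passed in explicitly to keep the behavior deterministic.

```python
from dataclasses import dataclass


@dataclass
class DelayedFetch:
    follower_id: str
    fetch_offset: int
    deadline: float  # time after which the request is force-completed


class Purgatory:
    """Holds fetch requests that cannot be answered yet (toy model)."""

    def __init__(self):
        self.pending = []

    def park(self, request):
        self.pending.append(request)

    def complete_ready(self, leader_leo, now):
        """Return requests that can be answered: data arrived or timed out."""
        ready, waiting = [], []
        for req in self.pending:
            has_data = req.fetch_offset < leader_leo
            timed_out = now >= req.deadline
            (ready if has_data or timed_out else waiting).append(req)
        self.pending = waiting
        return ready


purgatory = Purgatory()
purgatory.park(DelayedFetch("follower-1", fetch_offset=0, deadline=0.5))
still_empty = purgatory.complete_ready(leader_leo=0, now=0.1)  # no data yet
after_write = purgatory.complete_ready(leader_leo=1, now=0.2)  # data arrived
print(len(still_empty), len(after_write))
```

In this run the request stays parked while the leader's log is empty and is released as soon as the leader's LEO moves past the fetch offset.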

After receiving the message from the producer, the leader goes through the following steps (assuming the follower has no fetch request in flight for the moment):

The leader writes the data to its underlying log and updates its own LEO.

The leader tries to update its own HW. RemoteLEO is 0 and LEO is 1, and HW takes the smaller of the two, so HW stays at 0 and is not updated.
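The two leader-side steps above can be written as a short function. This is a minimal sketch with a single follower; the dictionary keys and the function name `leader_append` are illustrative, not Kafka identifiers.

```python
def leader_append(state, message):
    """Append one message on the leader, then try to advance HW."""
    state["log"].append(message)          # write to the underlying log
    state["leo"] = len(state["log"])      # LEO moves past the new message
    # HW is the smaller of the leader's LEO and the follower's RemoteLEO
    state["hw"] = min(state["leo"], state["remote_leo"])
    return state


leader = {"log": [], "leo": 0, "hw": 0, "remote_leo": 0}
leader_append(leader, "m0")
print(leader["leo"], leader["hw"], leader["remote_leo"])  # prints: 1 0 0
```

HW cannot move here because the follower has not yet reported any progress, so RemoteLEO still holds it at 0.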

After the message is written, suppose the follower sends a fetch request. Because new data is available, the leader returns it in the response. On receiving the new data, the follower writes it to its underlying log and updates its own LEO. The state is now: leader LEO = 1, HW = 0, RemoteLEO = 0; follower LEO = 1, HW = 0.

From the moment the follower sends the fetch request until it finishes handling the response, the leader and follower go through the following steps:

The follower sends a fetch request that carries its own fetch offset. Because the follower has no data yet, the fetch offset is 0.

After receiving the request, the leader reads the data from its underlying log.

The leader tries to update RemoteLEO. Because the fetch offset in the follower's request is 0, RemoteLEO stays at 0.

The leader tries to update HW by comparing LEO and RemoteLEO and taking the smaller value, so HW stays at 0.

The leader then returns the data and its HW value to the follower.

When the follower receives the response, it writes the data to its underlying log and updates its LEO.

The follower tries to update its HW by comparing its own LEO with the HW in the response and taking the smaller value, so HW stays at 0.
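The first fetch round can be traced with two small functions, one per side. This is a sketch, not broker code: the function names and dictionary layout are invented for the example, and RemoteLEO is assumed to be the smaller of the leader's LEO and the fetch offset, which matches the values in this walkthrough.

```python
def leader_handle_fetch(leader, fetch_offset):
    """Leader side of one fetch round; returns the response for the follower."""
    leo = len(leader["log"])
    data = leader["log"][fetch_offset:]            # read the underlying log
    leader["remote_leo"] = min(leo, fetch_offset)  # try to update RemoteLEO
    leader["hw"] = min(leo, leader["remote_leo"])  # try to update HW
    return {"data": data, "hw": leader["hw"]}


def follower_handle_response(follower, response):
    """Follower side: write the data, then derive HW from the response."""
    follower["log"].extend(response["data"])       # LEO is len(log)
    follower["hw"] = min(len(follower["log"]), response["hw"])


leader = {"log": ["m0"], "hw": 0, "remote_leo": 0}  # one message produced
follower = {"log": [], "hw": 0}

fetch_offset = len(follower["log"])                 # 0: follower has no data
response = leader_handle_fetch(leader, fetch_offset)
follower_handle_response(follower, response)
print(leader["hw"], leader["remote_leo"], len(follower["log"]), follower["hw"])
```

After this round the follower holds the message (its LEO is 1), but both HW values and RemoteLEO are still 0.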

With the steps above, one fetch request has completed. The leader's HW, LEO and RemoteLEO are unchanged, while the follower has written the data to its underlying log and updated its LEO. The HW update only succeeds with the next fetch request. Precisely because HW needs two fetch requests to advance, relying on the watermark for follower synchronization can lead to data loss and data inconsistency (described in the next section). Let's look at the state after the second fetch request.

After the second fetch request, RemoteLEO and HW on the leader are both updated to 1, and the HW on the follower is also updated to 1. The state is now: leader LEO = 1, HW = 1, RemoteLEO = 1; follower LEO = 1, HW = 1.

When the follower sends its second fetch request, the leader and follower go through the same steps as in the first round, except that the data in the request and response has changed:

The follower sends a fetch request again, this time with a fetch offset of 1 instead of 0.

After receiving the request, the leader reads its underlying log.

The leader tries to update RemoteLEO. This time the leader's local LEO and the fetch offset are both 1, so RemoteLEO is updated to 1.

The leader tries to update HW by comparing LEO and RemoteLEO. Both are 1, so HW is updated to 1.

The leader then returns the data (in fact there is none this time) and its HW value to the follower.

When the follower receives the response, there is no new data, so nothing is written to the underlying log and LEO is not updated.

The follower tries to update its HW by comparing its own LEO with the HW in the response. Both are 1, so the follower's HW is updated to 1.
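Running both rounds back to back makes the one-round lag of HW visible. The class below is a compact, single-follower simulation written for this article; `Partition`, `produce` and `fetch_round` are hypothetical names, and each returned tuple is (leader LEO, leader HW, RemoteLEO, follower LEO, follower HW).

```python
class Partition:
    """Toy model of one partition with a leader and a single follower."""

    def __init__(self):
        self.leader_log, self.leader_hw, self.remote_leo = [], 0, 0
        self.follower_log, self.follower_hw = [], 0

    def produce(self, message):
        self.leader_log.append(message)
        self.leader_hw = min(len(self.leader_log), self.remote_leo)

    def fetch_round(self):
        fetch_offset = len(self.follower_log)      # follower's current LEO
        data = self.leader_log[fetch_offset:]
        # Leader: update RemoteLEO first, then HW
        self.remote_leo = min(len(self.leader_log), fetch_offset)
        self.leader_hw = min(len(self.leader_log), self.remote_leo)
        # Follower: write data, then take min(own LEO, HW from response)
        self.follower_log.extend(data)
        self.follower_hw = min(len(self.follower_log), self.leader_hw)
        return (len(self.leader_log), self.leader_hw, self.remote_leo,
                len(self.follower_log), self.follower_hw)


p = Partition()
p.produce("m0")
trace = [p.fetch_round(), p.fetch_round()]
for row in trace:
    print(row)
# (1, 0, 0, 1, 0)  after the first fetch
# (1, 1, 1, 1, 1)  after the second fetch
```

The data arrives in the first round, but the leader only learns about it from the fetch offset of the second request, which is why HW advances one round late.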

Update key points for LEO and HW

Leader

Leader LEO: updated after the message is written to the underlying log.

Leader RemoteLEO: compare the leader's local LEO with the fetch offset in the request and take the smaller value.

Leader HW: compare RemoteLEO with the leader's LEO and take the smaller value.

Update order: the data is written to the underlying log and LEO is updated, then the leader tries to update RemoteLEO, and finally it tries to update HW.

Follower

Follower LEO: updated only if the response contains log data.

Follower HW: compare the HW in the response with the follower's own LEO and take the smaller value.
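The update rules can be condensed into three pure functions. This is a sketch with made-up function names; RemoteLEO is computed as the smaller of the leader's LEO and the fetch offset, as in the second fetch round of the walkthrough. Accepting a list of RemoteLEO values is an extension beyond the single-follower example in this article, based on the assumption that HW is bounded by the slowest tracked replica.

```python
def update_remote_leo(leader_leo, fetch_offset):
    # RemoteLEO: the smaller of the leader's LEO and the follower's fetch offset
    return min(leader_leo, fetch_offset)


def update_leader_hw(leader_leo, remote_leos):
    # Leader HW: the smallest LEO among the leader and the tracked followers
    return min([leader_leo, *remote_leos])


def update_follower_hw(follower_leo, response_hw):
    # Follower HW: the smaller of its own LEO and the HW in the response
    return min(follower_leo, response_hw)


# Second fetch round from the walkthrough: everything converges to 1
remote = update_remote_leo(leader_leo=1, fetch_offset=1)
leader_hw = update_leader_hw(leader_leo=1, remote_leos=[remote])
follower_hw = update_follower_hw(follower_leo=1, response_hw=leader_hw)
print(remote, leader_hw, follower_hw)  # prints: 1 1 1
```

Calling them in this order (RemoteLEO, then leader HW, then follower HW) mirrors the update order listed above.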

This concludes the article on how to achieve replica synchronization in Kafka. I hope it has been of some help to you.
