
How to Understand the Replica Mechanism of Kafka


This article mainly explains Kafka's replica mechanism. The explanations are simple and clear, making them easy to learn and understand.

I. Kafka Cluster

Kafka uses Zookeeper to maintain information about its cluster members. Each broker has a unique identifier, broker.id, which identifies it within the cluster and can either be set in the server.properties configuration file or be generated automatically. The broker cluster assembles itself as follows (an illustrative sketch follows the list):

When each broker starts, it creates an ephemeral node under Zookeeper's /brokers/ids path and writes its broker.id to it, thereby registering itself with the cluster;

When there are multiple brokers, they all compete to create the /controller node on Zookeeper. Since Zookeeper nodes cannot be duplicated, only one broker succeeds; that broker is called the controller broker. In addition to ordinary broker duties, it is responsible for managing the state of topic partitions and their replicas.

When a broker crashes or shuts down voluntarily, its Zookeeper session times out, triggering the watcher events registered on Zookeeper, and Kafka performs the corresponding failover handling; if it was the controller broker that went down, a new controller election is also triggered.
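To illustrate the registration pattern (this is not Kafka's actual code), the following minimal sketch uses the ZooKeeper Java client to create an ephemeral node under /brokers/ids the way a starting broker does; the connection string, session timeout, and broker id are all hypothetical:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class BrokerRegistrationSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string and session timeout
        ZooKeeper zk = new ZooKeeper("localhost:2181", 6000, event -> { });
        // An ephemeral node is deleted automatically when the session ends,
        // which is what fires the watcher events described above
        // (assumes the parent path /brokers/ids already exists)
        zk.create("/brokers/ids/0", "broker-0-metadata".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    }
}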

II. Replica mechanism

To ensure high availability, Kafka partitions are replicated: if one replica is lost, the partition data can still be obtained from another. This requires each replica's data to be complete, which is the basis of Kafka's data consistency and the reason the controller broker must manage replicas specially. Kafka's replica mechanism is explained in detail below.

2.1 Partitions and replicas

Kafka topics are divided into partitions, the most basic storage unit in Kafka. Each partition can have multiple replicas (specified with the replication-factor parameter when creating the topic). One replica is the leader replica, to which all messages are sent directly; the others are follower replicas, which replicate from the leader to keep their data consistent with it. When the leader replica becomes unavailable, one of the follower replicas becomes the new leader.
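The replication factor can also be specified programmatically with Kafka's AdminClient. A minimal sketch, assuming a broker at localhost:9092 and a hypothetical topic name:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Properties;

public class CreateTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 1 partition with replication factor 3: one leader plus two followers
            NewTopic topic = new NewTopic("my-topic", 1, (short) 3);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}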

2.2 ISR mechanism

Each partition maintains an ISR (In-Sync Replicas) list of all synchronized, available replicas. The leader replica is always in sync; a follower replica must satisfy the following conditions to be considered in sync:

It has an active session with Zookeeper, i.e. it sends heartbeats to Zookeeper regularly;

It fetches messages from the leader replica with low latency, within the specified time interval (controlled by the replica.lag.time.max.ms parameter).

If a replica does not meet these criteria, it is removed from the ISR list and is not re-added until it meets them again.

Here is an example of topic creation: use --replication-factor to specify a replication factor of 3. After the topic is created, the --describe command shows that partition 0 has three replicas, 0, 1, and 2, all of which are in the ISR list, with replica 1 as the leader.
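The same leader/ISR information can also be inspected programmatically. A sketch using the AdminClient (topic name and broker address are assumptions):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;
import java.util.Collections;
import java.util.Properties;

public class DescribeTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Collections.singleton("my-topic"))
                    .all().get().get("my-topic");
            // Prints the leader, full replica list, and current ISR for each partition
            desc.partitions().forEach(p ->
                    System.out.printf("partition %d leader=%s replicas=%s isr=%s%n",
                            p.partition(), p.leader(), p.replicas(), p.isr()));
        }
    }
}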

2.3 Unclean leader election

The replica mechanism has an optional broker-level configuration parameter, unclean.leader.election.enable, which defaults to false, meaning unclean leader elections are forbidden. It controls whether a replica that is not fully synchronized may become the leader when the leader replica fails and no other replica is available in the ISR. Allowing this can cause data loss or data inconsistency; in scenarios with strict consistency requirements (such as finance), that may be intolerable, hence the default of false. If some inconsistency is acceptable, it can be set to true.
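For topics where some inconsistency is acceptable, the flag can be overridden per topic. A sketch using incrementalAlterConfigs (this assumes a broker that supports incremental config updates, i.e. Kafka 2.3+; topic name and broker address are also assumptions):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;
import java.util.Collections;
import java.util.Properties;

public class UncleanElectionSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");
            // Allow a not-fully-synchronized replica to become leader for this topic
            AlterConfigOp enable = new AlterConfigOp(
                    new ConfigEntry("unclean.leader.election.enable", "true"),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(
                    Collections.singletonMap(topic, Collections.singleton(enable)))
                    .all().get();
        }
    }
}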

2.4 Minimum in-sync replicas

Another parameter associated with the ISR mechanism is min.insync.replicas, which can be configured at the broker or topic level and specifies the minimum number of available replicas that must be in the ISR list. Suppose it is set to 2: when the number of in-sync replicas drops below this value, the whole partition is considered unavailable and writes are rejected with org.apache.kafka.common.errors.NotEnoughReplicasException: Messages are rejected since there are fewer in-sync replicas than required.
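min.insync.replicas can be set per topic at creation time. A minimal sketch (names and values are illustrative): with replication factor 3 and min.insync.replicas=2, the partition keeps accepting writes even if one replica falls out of the ISR, but rejects them once two are gone:

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Properties;

public class MinIsrTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Require at least 2 in-sync replicas before accepting writes
            NewTopic topic = new NewTopic("my-topic", 1, (short) 3)
                    .configs(Collections.singletonMap("min.insync.replicas", "2"));
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}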

2.5 Send acknowledgements

Kafka producers have an optional acks parameter that specifies how many partition replicas must receive a message before the producer considers the write successful (a producer sketch follows the list):

acks=0: the message is considered successfully sent as soon as it leaves the producer, which does not wait for any response from the server;

acks=1: the producer receives a success response from the server as soon as the leader replica receives the message;

acks=all: the producer receives a success response only after all in-sync replicas have received the message.
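A minimal producer sketch with the strongest setting, acks=all (topic name and broker address are assumptions):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class AcksAllProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Wait until all in-sync replicas have the message before acknowledging
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"),
                    (metadata, exception) -> {
                        if (exception != null) exception.printStackTrace();
                    });
        }
    }
}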

III. Data requests

3.1 Metadata request mechanism

Of all the replicas, only the leader replica can serve reads and writes. Because the leader replicas of different partitions may reside on different brokers, if a broker receives a request for a partition whose leader replica is not on that broker, it returns a Not a Leader for Partition error to the client. To solve this problem, Kafka provides a metadata request mechanism.

First, every broker in the cluster caches the partition replica information for all topics. The client sends metadata requests periodically and caches the results; the refresh interval can be set with the client configuration metadata.max.age.ms. With this metadata, the client knows which broker hosts the leader replica and sends its read/write requests directly to that broker.
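For example, the refresh interval can be shortened on the producer; a sketch reusing the producer setup from section 2.5 (the 60-second value is arbitrary, the default being 5 minutes):

import org.apache.kafka.clients.producer.ProducerConfig;

// In the producer Properties from the acks sketch above:
props.put(ProducerConfig.METADATA_MAX_AGE_CONFIG, "60000"); // refresh metadata at least every 60 s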

If a partition leader election happens within the refresh interval, the cached information may be out of date, and the client may receive a Not a Leader for Partition error. In that case the client requests the metadata again, refreshes its local cache, and retries the operation against the correct broker.

3.2 Data visibility

It should be noted that not all data stored on the partition leader can be read by clients. To guarantee data consistency, only messages that have been stored by all in-sync replicas (every replica in the ISR) are readable; this offset is known as the high watermark.

3.3 Zero-copy

Kafka uses zero-copy to transfer the data it stores on disk out to the network. The difference between the traditional copy path and zero-copy is as follows:

Four copies and four context switches in traditional mode

Consider sending a disk file over the network. In traditional mode, the file data is first read into memory, and then the in-memory data is sent out through a Socket, as in the following pseudocode:

buffer = File.read(fileDescriptor)
Socket.send(buffer)

This process actually involves four data copies. First, the file data is read into a kernel-space buffer via a system call (DMA copy); then the application reads that kernel buffer into a user-space buffer (CPU copy); then, when sending through the Socket, the user program copies the user-space buffer back into a kernel-space buffer (CPU copy); finally, the data is copied to the NIC buffer via DMA. The process is also accompanied by four context switches.

sendfile and transferTo achieve zero-copy

Linux kernels 2.4 and later provide zero-copy through the sendfile system call: after the data has been copied into the kernel buffer by DMA, it is copied directly to the NIC buffer by DMA, with no CPU copy in between. This is where the term zero-copy comes from. Besides reducing data copies, because the entire file read and network send happens in a single sendfile call, the whole process involves only two context switches, which greatly improves performance.

In terms of implementation, Kafka's data transfer is handled by TransportLayer. The transferFrom method of its subclass PlaintextTransportLayer achieves zero-copy by calling the transferTo method of FileChannel in Java NIO, as follows:

@Override
public long transferFrom(FileChannel fileChannel, long position, long count) throws IOException {
    return fileChannel.transferTo(position, count, socketChannel);
}

Note: transferTo and transferFrom do not guarantee zero-copy. Whether zero-copy is actually used depends on the operatingating system: if it provides a zero-copy system call such as sendfile, these two methods take full advantage of it; otherwise they cannot achieve zero-copy by themselves.
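Outside Kafka, the same Java NIO call can be used directly. A minimal sketch that streams a file to a socket via transferTo (file name, host, and port are hypothetical):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class ZeroCopySendSketch {
    public static void main(String[] args) throws IOException {
        try (FileChannel file = FileChannel.open(Paths.get("data.log"), StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {
            long position = 0;
            long remaining = file.size();
            // transferTo may send fewer bytes than requested, so loop until done;
            // on Linux 2.4+ this maps to sendfile and avoids user-space copies
            while (remaining > 0) {
                long sent = file.transferTo(position, remaining, socket);
                position += sent;
                remaining -= sent;
            }
        }
    }
}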

IV. Physical storage

4.1 Partition allocation

When creating a topic, Kafka first decides how to distribute the partition replicas among the brokers, following these principles:

Distribute partition replicas evenly across all brokers;

Ensure that each replica of a partition is placed on a different broker;

If the broker.rack parameter specifies rack information for the brokers, the replicas of each partition are spread across brokers in different racks as far as possible, so that the loss of one rack does not make an entire partition unavailable.

For these reasons, creating a 3-replica topic on a single-node cluster usually throws the following exception:

Error while executing topic command : org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication factor: 3 larger than available brokers: 1.

4.2 Partition data retention rules

Retaining data is a fundamental feature of Kafka, but Kafka does not keep data forever, nor does it wait for all consumers to read a message before deleting it. Instead, Kafka configures a retention policy per topic: how long data may be kept before it is deleted, or how much data may accumulate before old data is cleaned up. These policies correspond to the following four parameters (a per-topic configuration sketch follows the list):

log.retention.bytes: maximum amount of data to retain before deletion; the default value of -1 means no limit;

log.retention.ms: number of milliseconds to retain data files; if not set, the value of log.retention.minutes is used; defaults to null;

log.retention.minutes: number of minutes to retain data files; if not set, the value of log.retention.hours is used; defaults to null;

log.retention.hours: number of hours to retain data files; defaults to 168, i.e. one week.
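Note that the four log.retention.* parameters above are broker-level defaults; the corresponding per-topic overrides are named retention.ms and retention.bytes. A sketch that sets them at topic creation (topic name, broker address, and values are illustrative):

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class RetentionTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            Map<String, String> configs = new HashMap<>();
            configs.put("retention.ms", "604800000");      // keep data for 7 days
            configs.put("retention.bytes", "1073741824");  // or at most 1 GiB per partition
            NewTopic topic = new NewTopic("my-topic", 1, (short) 3).configs(configs);
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}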

Because finding and deleting messages in one large file is slow and error-prone, Kafka splits each partition into segments; the segment currently being written is called the active segment, and active segments are never deleted. If data is kept for the default one week and a new segment is rolled each day, then each day the oldest segment is deleted as a new one starts, so the partition holds roughly seven segments most of the time.

4.3 File format

The format of data stored on disk is usually identical to the format of the messages sent by the producer. If the producer sends compressed messages, the whole batch is compressed together, sent as a "wrapper message", and saved to disk as-is. The consumer later reads and decompresses the wrapper message to obtain the individual messages inside.

Thank you for reading. The above covers how to understand Kafka's replica mechanism; after studying this article, you should have a deeper understanding of the topic. Actual behavior should still be verified in practice.
