What are the common interview questions in Kafka?

2025-02-24 Update From: SLTechnology News&Howtos

What are the common Kafka interview questions? Many newcomers are unsure how to approach them, so this article summarizes the questions along with their causes and solutions. I hope it helps you answer them.

Nowadays, Kafka is no longer just a message queuing system; it is a distributed stream processing platform used by more and more companies. Kafka is suitable for high-performance data pipelines, stream processing and analysis, data integration, and other scenarios. Below are a few common Kafka interview questions.

Q1: How does Kafka ensure that data is not lost?

This question has become a staple of Kafka interviews, much like HashMap in Java interviews. How should we understand it? The question asks how Kafka ensures data is not lost, that is, what mechanisms Kafka's Broker provides to guarantee that data survives.

In fact, for the Kafka Broker, the replication mechanism and the multi-replica partition architecture are the core of Kafka's reliability guarantee. Writing a message to multiple replicas allows Kafka to keep the message durable even if a broker crashes.

Having gotten to the heart of the question, let's look at how to answer it. There are three main aspects:

1. Topic replication factor: replication.factor >= 3

2. Minimum in-sync replicas (ISR): min.insync.replicas = 2

3. Disable unclean leader elections: unclean.leader.election.enable = false

The three configurations above are analyzed one by one below.

Replication factor

A Kafka topic can be divided into partitions, and each partition can be configured with multiple replicas via the replication.factor parameter. Kafka has two types of partition replicas: the leader replica (Leader Replica) and follower replicas (Follower Replica). When a partition is created, one replica is elected as the leader and the remaining replicas automatically become followers.

In Kafka, follower replicas do not serve client traffic: no follower replica responds to read or write requests from consumers or producers. All requests must be handled by the leader replica. In other words, every read and write request is sent to the Broker hosting the leader replica, and that Broker processes it. A follower's only task is to asynchronously pull messages from the leader and append them to its own log, thereby staying in sync with the leader.

In general, a replication factor of 3 satisfies most use cases; some deployments use 5 replicas (banks, for example). With a replication factor of N, the data can still be read from the topic even if N-1 brokers fail, so a higher replication factor brings higher availability and reliability. On the other hand, a replication factor of N requires at least N brokers and stores N copies of the data, meaning N times the disk space. In a real production environment there is generally a tradeoff between availability and storage hardware.

In addition, replica placement also affects availability. By default, Kafka ensures that the replicas of a partition are distributed across different Brokers, but if those Brokers sit in the same rack, the partition becomes unavailable when the rack's switch fails. It is therefore recommended to spread Brokers across different racks; the broker.rack parameter configures the name of the rack a Broker resides in.

In-sync replica list

The in-sync replica set (ISR) contains the replicas that are synchronized with the Leader; any follower not in this list is considered out of sync with the Leader. So which replicas are in the ISR? First, the Leader replica is always in the ISR. Whether a follower replica is in the ISR depends on whether it is "synchronized" with the Leader replica.

The broker-side parameter replica.lag.time.max.ms defines the maximum time a follower replica may lag behind the Leader replica, with a default of 10 seconds. As long as a follower lags behind the leader by no more than 10 seconds, it is considered in sync. So even if a follower is currently a few messages behind the Leader, it will not be kicked out of the ISR as long as it catches up within 10 seconds.

The ISR is therefore dynamic: even if a partition is configured with three replicas, the in-sync replica list may shrink to a single replica (the other replicas are removed from the ISR because they cannot keep up with the leader). If that last in-sync replica becomes unavailable, we must choose between availability and consistency (CAP theory).

According to Kafka's definition of reliability, a message is considered committed only after it has been written to all in-sync replicas. But if the ISR contains only one replica, data is lost when that replica becomes unavailable. To ensure that committed data is written to more than one replica, set the minimum number of in-sync replicas higher. For a topic partition with three replicas, min.insync.replicas=2 requires at least two in-sync replicas before data can be written to the partition.

With the above configuration, at least two replicas must remain in the ISR; if the number of in-sync replicas drops below 2, the Broker stops accepting requests from producers. A producer attempting to send data receives a NotEnoughReplicasException, while consumers can continue reading the existing data.

Disable unclean leader elections

Electing a partition leader from the in-sync replica list is called a clean leader election. Distinguish this from electing a leader from out-of-sync replicas, which is called an unclean leader election. Because the ISR is adjusted dynamically, it can become empty, and out-of-sync replicas generally lag far behind the Leader, so electing one of them as the new Leader may lose data: the messages they hold fall well behind those in the old Leader.

In Kafka, the Broker-side parameter unclean.leader.election.enable controls whether unclean leader elections are allowed. Enabling them may lose data, but the benefit is that the partition always keeps a live Leader and never stops serving, improving availability. Conversely, disabling unclean leader elections preserves data consistency and avoids message loss, at the cost of availability. This is the CAP tradeoff of distributed systems.

Unfortunately, the unclean leader election process can still cause data inconsistency, because "in-sync" replicas are not fully synchronized: replication is asynchronous, so there is no guarantee that a follower has the latest messages. For example, if the last message in the Leader partition has offset 100, a replica's latest offset may be less than 100. This is affected by two parameters:

replica.lag.time.max.ms: the allowed time lag between in-sync replicas and the leader replica

zookeeper.session.timeout.ms: the session timeout with ZooKeeper

In short, if we allow out-of-sync replicas to become leader, we risk data loss and data inconsistency. If we do not allow them to become leader, we accept lower availability, because we must wait for the former leader to recover.

Different scenarios call for different unclean-election settings. Systems that demand high data quality and consistency disable unclean leader elections (banks, for example). Systems with high availability requirements, such as real-time clickstream analysis, generally leave unclean leader elections enabled.

Q2: How to solve the problem of Kafka data loss?

You might ask: how is this different from Q1? It can be understood as a sub-question of the same topic; it is treated separately here because the solutions differ. Q1 examines data loss from the Kafka Broker side, while Q2 examines it from the perspective of Kafka producers and consumers.

First, let's take a look at how to answer this question: it mainly includes two aspects:

Producer

retries = Long.MAX_VALUE

Set retries to a large value. retries is a Producer parameter that enables automatic retries: when transient network jitter causes a send to fail, a Producer configured with retries > 0 automatically retries the send, avoiding message loss.

acks = all

Set acks = all. acks is a Producer parameter that defines when a message counts as "committed". Set to all, it means every in-sync replica must receive the message before it is considered committed. This is the strongest definition of "committed".

max.in.flight.requests.per.connection = 1

This parameter specifies how many requests the producer can send without receiving a response from the server. The higher the value, the more memory it uses, but also the higher the throughput. Setting it to 1 guarantees that messages reach the server in the order they were sent, even when retries occur.

Use the callback variant of the Producer API: instead of producer.send(msg), use producer.send(msg, callback), so that send failures can be detected and handled.
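To illustrate, here is a minimal sketch of the producer-side settings and the callback pattern. The config keys are real Kafka producer parameter names; FakeProducer is a stand-in invented here so the example runs without a broker, but a real client delivers broker errors to the callback the same way:

```python
producer_config = {
    "retries": 2**31 - 1,  # a very large retry count, as suggested above
    "acks": "all",         # all in-sync replicas must receive the message
    "max.in.flight.requests.per.connection": 1,  # preserve order across retries
}

failed_messages = []

def on_delivery(err, msg):
    # Collect failures instead of silently dropping them.
    if err is not None:
        failed_messages.append((msg, err))

class FakeProducer:
    """Stand-in for a real Kafka producer (illustration only, no broker needed)."""
    def send(self, msg, callback):
        # Simulate the broker rejecting one specific message.
        err = "NotEnoughReplicasException" if msg == "bad" else None
        callback(err, msg)

producer = FakeProducer()
for m in ["a", "bad", "b"]:
    producer.send(m, on_delivery)
```

With a real client, the callback is where you would log, alert, or re-enqueue a message whose retries were exhausted.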

Other error handling

Using the producer's built-in retry mechanism, you can easily handle most errors without causing message loss, but you still need to handle other types of errors, such as message size errors, serialization errors, and so on.

Consumer

Disable autocommit: enable.auto.commit=false

Consumers commit the offset only after the message has been processed.

Configure auto.offset.reset

This parameter specifies what the consumer does when it has no committed offset (for example, on first start) or when the requested offset no longer exists on the broker (for example, the data has been deleted).

The parameter has two settings. One is earliest: the consumer reads from the beginning of the partition whenever its offset is invalid, which causes the consumer to re-read a large amount of duplicate data but minimizes data loss. The other is latest (the default): the consumer reads from the end of the partition, which reduces duplicate processing but is likely to miss some messages.
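The "process first, commit afterwards" pattern from above can be sketched in plain Python (simulated so it runs without a broker; with a real consumer and enable.auto.commit=false you would call commit() after processing):

```python
messages = [(0, "order-1"), (1, "order-2"), (2, "order-3")]  # (offset, payload)

processed = []
committed_offset = None  # the offset we would commit back to Kafka

for offset, payload in messages:
    processed.append(payload)      # 1. process the message first
    committed_offset = offset + 1  # 2. only then commit the *next* offset to read

# If the consumer crashes between steps 1 and 2, the message is re-read and
# reprocessed on restart (at-least-once delivery) rather than lost.
```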

Q3: Can Kafka guarantee that data is never lost?

The measures analyzed above can avoid data loss to a certain extent. Note, however, that Kafka only guarantees limited persistence for committed messages. Kafka therefore cannot completely guarantee that data will never be lost; some tradeoffs have to be made.

First, understand what a committed message is: when some number of Kafka Brokers have successfully received a message and written it to their log files, they tell the producer program that the message was successfully committed. At that point the message officially becomes a committed message in Kafka's eyes. Whether acks=all or acks=1, the fact remains that Kafka only promises persistence for committed messages.

Second, the persistence guarantee is conditional: Kafka cannot guarantee that messages survive under all circumstances. The Brokers holding the message must remain available; in other words, if a message is stored on N Kafka Brokers, the prerequisite is that at least one of those N Brokers stays alive. As long as that condition holds, Kafka guarantees the message is never lost.

To sum up, Kafka can avoid losing messages, but only committed messages, and only under these conditions.

Q4: How to ensure that messages in Kafka are ordered?

First, be clear that Kafka topics are ordered per partition: if a topic has multiple partitions, Kafka routes each record to a partition according to its key, so for a given key the records are ordered within their partition.

Kafka guarantees that messages within the same partition are ordered: producers send messages in a certain order, the Broker writes them to the partition in that order, and consumers consume them in the same order.

In some scenarios message order is critical. For example, depositing money and then withdrawing it, versus withdrawing and then depositing, produce very different outcomes.

The parameter max.in.flight.requests.per.connection = 1 mentioned above guarantees write order when retries are enabled. If the parameter is greater than 1, then when the first batch fails and the second batch is written successfully, the Broker accepts the retried first batch afterwards, and the order of the two batches is reversed.

Generally speaking, if message order matters and data must not be lost, set the retry count retries > 0 and set max.in.flight.requests.per.connection to 1, so that while the producer retries the first batch of messages, no other messages are sent to the broker. This hurts throughput but preserves message order.
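The reordering scenario can be shown with a toy simulation (not Kafka client code): with more than one request in flight, a failed first batch is retried after the second batch has already landed on the broker.

```python
def deliver(batches, max_in_flight, fail_first_attempt_of=None):
    """Simulate batch delivery with retries; returns the order batches land."""
    log = []                  # order in which batches land on the "broker"
    pending = list(batches)
    failed_once = set()
    while pending:
        window, pending = pending[:max_in_flight], pending[max_in_flight:]
        for b in window:
            if b == fail_first_attempt_of and b not in failed_once:
                failed_once.add(b)
                # The retry is re-sent as soon as possible, but anything
                # already in flight behind it still lands first.
                pending.insert(0, b)
            else:
                log.append(b)
    return log

reordered = deliver(["batch1", "batch2"], max_in_flight=2, fail_first_attempt_of="batch1")
in_order = deliver(["batch1", "batch2"], max_in_flight=1, fail_first_attempt_of="batch1")
```

With max_in_flight=2 the batches land reversed; with max_in_flight=1 the retry completes before the next batch is sent, preserving order.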

Alternatively, a topic with a single partition can be used, but this severely limits throughput.

Q5: How do I determine the appropriate number of partitions for a Kafka topic?

Choosing an appropriate partition count enables highly parallel reads and writes and load balancing; balancing load across partitions is the key to throughput. The estimate should be based on the expected per-partition throughput of producers and consumers.

For example: suppose the expected read throughput is 1 GB/sec and a single consumer reads at 50 MB/sec; then at least 20 partitions and 20 consumers (in one consumer group) are required. Similarly, if the expected produce throughput is 1 GB/sec and each producer writes at 100 MB/sec, 10 partitions are needed. Setting 20 partitions in this case satisfies both the 1 GB/sec produce rate and the consumers' throughput. The partition count is usually aligned with the number of consumers or producers, so that both producers and consumers reach their target throughput.

A simple formula is: number of partitions = max (number of producers, number of consumers)
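This sizing rule can be sketched in Python (units just need to match; 1 GB/sec is taken as 1000 MB/sec here, matching the example above):

```python
import math

def partition_count(total_mb_per_sec, per_producer_mb, per_consumer_mb):
    """Partitions = max(producers needed, consumers needed), rounded up."""
    producers_needed = math.ceil(total_mb_per_sec / per_producer_mb)
    consumers_needed = math.ceil(total_mb_per_sec / per_consumer_mb)
    return max(producers_needed, consumers_needed)

# The article's example: 1 GB/sec, 100 MB/sec per producer, 50 MB/sec per consumer.
n = partition_count(1000, per_producer_mb=100, per_consumer_mb=50)
```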

Number of producers = overall production throughput / maximum throughput a single producer can write to one partition

Number of consumers = overall consumption throughput / maximum throughput a single consumer can read from one partition

Q6: How to adjust the number of partitions for a Kafka topic in a production environment?

Note that increasing the number of partitions for a topic breaks the guarantee that the same key maps to the same partition. One approach is to create a new topic with more partitions, pause the producers, copy the data from the old topic to the new one, and then switch consumers and producers over to the new topic. This can be tricky.

Q7: How to rebalance a Kafka cluster?

You need to rebalance the cluster when the following occurs:

Topic partitions are unevenly distributed across the cluster, causing unbalanced load.

A Broker goes offline, causing partitions to fall out of sync.

A newly added Broker needs to take over load from the cluster.

Use the kafka-reassign-partitions.sh command to rebalance.

Q8: How to check whether a consumer group is lagging?

We can check with the kafka-consumer-groups.sh command, for example:

$ bin/kafka-consumer-groups.sh --bootstrap-server cdh02:9092 --describe --group my-group

## The output includes the following columns

TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID

(topic, partition, current offset, log-end offset (LEO), lag in messages, consumer id, host, client id)

In general, when consumption is healthy, the CURRENT-OFFSET value stays very close to LOG-END-OFFSET. With this command you can see which partition's consumption is lagging.
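The LAG column in the output is simply LOG-END-OFFSET minus CURRENT-OFFSET; a small sketch over made-up numbers:

```python
def consumer_lag(partitions):
    """partitions maps partition id -> (current_offset, log_end_offset)."""
    return {p: leo - current for p, (current, leo) in partitions.items()}

# Partition 0 is fully caught up; partition 1 is 600 messages behind.
lag = consumer_lag({0: (1500, 1500), 1: (1200, 1800)})
```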

After reading the above, have you mastered these common Kafka interview questions? Thank you for reading!
