How to Prevent Kafka Message Loss and Achieve Exactly-Once Consumption


Today I will talk about how Kafka messages can be lost and how to consume them exactly once. Many people may not know much about this topic, so I have summarized the following for everyone, and I hope you gain something from this article.

Message loss scenarios

If the Kafka Producer sends a message in "fire and forget" fashion, that is, it calls producer.send(msg) and the method returns immediately, the return does not mean the message was sent successfully. (For how messages are sent, see the earlier introduction to the Kafka producer.)

If network jitter occurs while the message is in flight, the message is lost. The message may also be rejected for violating the Broker's limits, for example by being larger than the Broker can handle (oversized messages have actually come up in production; the fix was to split messages into smaller packets before sending and send them in turn).

The solution is for the Producer to send messages with a callback notification: producer.send(msg, callback). The callback tells us whether the message was really committed successfully, and once a send fails, the code can perform fault tolerance and remediation.

For example, if the message was lost due to network jitter, the Producer can retry; if the message was malformed, adjust the format and resend it. With the callback-based send API, the Producer can detect a failed send and handle it accordingly, as in the sketch below.
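The following is a minimal sketch of a callback-based send, assuming an already-configured KafkaProducer<String, String> named producer; the topic name is illustrative.

import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RetriableException;

ProducerRecord<String, String> record = new ProducerRecord<>("demo-topic", "key", "value");
producer.send(record, (metadata, exception) -> {
    if (exception == null) {
        // The Broker acknowledged the message.
        System.out.printf("Sent to %s-%d@%d%n",
                metadata.topic(), metadata.partition(), metadata.offset());
    } else if (exception instanceof RetriableException) {
        // Transient failure such as network jitter: safe to resend.
        System.err.println("Retriable send failure: " + exception.getMessage());
    } else {
        // Non-retriable failure, e.g. a message that is too large:
        // fix the message format (or split it) and send again.
        System.err.println("Send failed: " + exception.getMessage());
    }
});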

Consumers lose data

Data loss on the Consumer side mainly looks like this: the consumer pulls messages and commits the consumed offsets, but crashes or hits some other failure before it finishes processing the messages. After the consumer restarts, it resumes consuming from the position after the previously committed offset, so the messages that were never processed are not processed again; the consumer has lost them.

The fix for message loss on the Consumer side is equally simple: move the offset commit to after message processing completes, i.e., confirm that a batch of messages has been fully consumed before committing the corresponding offsets. That way, even if an exception occurs while processing a message, the offset has not been committed, so the next consumption pulls again from the last committed offset and no message is lost.

Concretely, the Consumer turns off automatic offset commits when consuming messages, and the application commits offsets manually, as sketched below.
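Here is a minimal sketch of manual offset commits, assuming a local broker; the topic name, group id and the process() helper are illustrative placeholders.

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // turn off auto commit
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(Collections.singletonList("demo-topic"));
    while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
        for (ConsumerRecord<String, String> record : records) {
            process(record); // business logic; may throw before the commit below
        }
        // Commit only after the whole batch is processed; if processing
        // fails, the uncommitted batch is pulled again on the next poll.
        consumer.commitSync();
    }
}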

Brokers lose data

Broker data loss mainly occurs in the following situations:

The Broker hosting the Leader goes down, and a Broker whose replica lags too far behind the Leader is elected as the new Leader; the messages that replica never caught up on are lost. Such out-of-sync Brokers can be barred from running for Leader (see unclean.leader.election.enable below);

Kafka uses the page cache: messages are written to the page cache rather than persisted directly to disk, with flushing left to the operating system to schedule, which keeps efficiency and throughput high. If a Broker goes down while some messages are still in the page cache and not yet persisted to disk, that part of the messages is lost after the restart. The multi-replica mechanism guards against this kind of Broker-side message loss.

Best practices to avoid message loss

Instead of producer.send(msg), use producer.send(msg, callback) with a callback;

Set acks = all. acks is a Producer parameter that defines when a message counts as "committed". Set to all, it means all in-sync replicas must receive the message before it is considered committed, the strictest standard of "committed";

Set retries to a large value. retries is the number of times the Producer retries after a failed send. When a transient fault such as network jitter occurs, the retry mechanism resends the message and avoids losing it;

Set unclean.leader.election.enable = false. This Broker-side parameter governs which Brokers are eligible to run for partition Leader. If a Broker whose replica lags too far behind the Leader becomes the new Leader, message loss is inevitable, so set this parameter to false to forbid that situation;

Set replication.factor >= 3. A Broker-side parameter requiring at least 3 replicas of each partition, using redundancy to prevent message loss;

Set min.insync.replicas > 1. A Broker-side parameter controlling how many replicas a message must be written to before it counts as "committed". Setting it greater than 1 improves message durability;

Make sure replication.factor > min.insync.replicas. If the two are equal, the loss of a single replica leaves the whole partition unable to work. Recommended setting: replication.factor = min.insync.replicas + 1;

Make sure offsets are committed only after messages are processed: set the Consumer parameter enable.auto.commit to false and commit offsets manually. The producer-side settings above combine as in the sketch below.
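For reference, a minimal sketch that puts the producer-side settings together; the broker address is illustrative, and the Broker/topic-side settings appear as comments because they are not producer properties.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ACKS_CONFIG, "all");                // strictest "committed"
props.put(ProducerConfig.RETRIES_CONFIG, Integer.MAX_VALUE); // retry transient faults
KafkaProducer<String, String> producer = new KafkaProducer<>(props);

// Broker/topic side (server.properties or per-topic config):
//   unclean.leader.election.enable=false
//   min.insync.replicas=2
// and create topics with replication.factor=3.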

Exactly-once consumption

The default delivery guarantee Kafka provides is "at least once": the message is not lost, but may be duplicated. As we saw in the previous section, a failed Producer send can be fixed by retrying. If the Broker's acknowledgement never reaches the Producer (because of network jitter, say), the Producer retries and sends the same message again. This is why Kafka guarantees at-least-once delivery by default, and also why messages may be sent repeatedly.

If you need to guarantee "at most once" delivery instead, disable Producer retries; but without retries, failed messages are lost forever. Is there a way to send messages with neither loss nor duplication? In other words, even if the Producer sends some messages repeatedly, can the Broker side deduplicate them automatically?

Kafka ensures that messages are consumed exactly once through two mechanisms:

Idempotence

Transactions

Idempotence

Idempotence simply means that calling an interface multiple times yields the same result as calling it once. Kafka's Producer is not idempotent by default; the feature was introduced in version 0.11.0.0. Set the parameter enable.idempotence to true to make the Producer idempotent (see the sketch below). Once the idempotent producer is enabled, Kafka automatically deduplicates resent messages. To implement producer idempotence, Kafka introduced two concepts: the producer id (PID) and the sequence number.
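Enabling it is a one-line configuration change; here is a minimal sketch, with an illustrative broker address.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true"); // deduplicate resends
KafkaProducer<String, String> producer = new KafkaProducer<>(props);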

When a producer instance is created, it is assigned a PID that is completely transparent to the user. For each PID, every partition the producer sends messages to has a corresponding sequence number, which increases monotonically from 0. Each time the producer sends a message, the sequence number for the corresponding partition is incremented by 1.

The Broker maintains in memory a sequence number SN_old for each <PID, partition> pair. For each message the producer sends, the Broker compares the message's sequence number SN_new against SN_old and acts accordingly:

If SN_new = SN_old + 1, the Broker accepts the message;

If SN_new < SN_old + 1, the message has already been written (a duplicate), and the Broker simply discards it;

If SN_new > SN_old + 1, data in between was never written: the messages are out of order and some may have been lost, and the corresponding producer throws OutOfOrderSequenceException.
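In other words, the Broker's decision boils down to a three-way comparison; the following is a conceptual sketch of that check, not actual broker code.

// snOld: last sequence number stored for this <PID, partition>;
// snNew: sequence number of the incoming message.
static String checkSequence(long snOld, long snNew) {
    if (snNew == snOld + 1) {
        return "accept";       // exactly the next message: append it
    } else if (snNew < snOld + 1) {
        return "discard";      // duplicate write: drop silently
    } else {
        return "out of order"; // gap: the producer sees OutOfOrderSequenceException
    }
}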

Note: sequence numbers are scoped to a <PID, partition> pair, which means an idempotent producer only guarantees no duplicates within a single partition of a single topic. Furthermore, idempotence only holds within a single session and cannot span sessions, where a session can be understood as one run of the Producer process. When the Producer process restarts, the idempotence guarantee is broken.

Transactions

Idempotence does not work across multiple partitions, and Kafka transactions make up for this deficiency. Kafka has provided transaction support since version 0.11, mainly at the read_committed isolation level. Transactions ensure that multiple messages are written atomically to the target partitions, and that the Consumer only sees messages from successfully committed transactions.

Producer-side configuration

A transactional Producer guarantees that messages are written atomically to multiple partitions: a batch of messages either all succeed or all fail. Moreover, even after a transactional producer restarts, Kafka still guarantees that it sends messages exactly once. Enable a transactional Producer with the following configuration:

As with the idempotent Producer, set enable.idempotence = true.

Set the Producer-side parameter transactional.id, preferably to a meaningful name. A configuration sketch follows.
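Here is a minimal configuration sketch; the broker address and the transactional.id value are illustrative.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-transactional-id");
KafkaProducer<String, String> producer = new KafkaProducer<>(props);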

A transactional Producer can call transaction APIs such as initTransactions, beginTransaction, commitTransaction and abortTransaction, which correspond to transaction initialization, transaction start, transaction commit and transaction abort respectively:

producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(record1);
    producer.send(record2);
    producer.commitTransaction();
} catch (KafkaException e) {
    producer.abortTransaction();
}

In the code above, the transactional Producer guarantees that record1 and record2 are either both committed successfully or both fail. In fact, even when the transaction fails, Kafka still writes the messages to the underlying log, so the Consumer can still see them; whether it actually reads them depends on how the Consumer is configured to read messages sent by a transactional Producer, as described next.

Consumer configuration

When reading messages sent by a transactional Producer, the Consumer-side parameter isolation.level sets the transaction isolation level, that is, it determines which messages the Consumer reads. The parameter has two values:

read_uncommitted: the default value, indicating that the Consumer reads any message written to Kafka, regardless of whether the transactional Producer committed its transaction. Obviously, if a transactional Producer is enabled, the Consumer should not use this value, otherwise transactions are ineffective.

read_committed: indicates that the Consumer reads only messages written by transactions that transactional producers committed successfully; all messages written by non-transactional producers remain visible to the Consumer. A consumer configuration sketch follows.
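Here is a minimal sketch of a consumer that only sees committed transactional messages; the broker address and group id are illustrative.

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed"); // hide aborted txns
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);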

After reading the above, do you have a better understanding of how to prevent Kafka message loss and achieve exactly-once consumption? If you want to learn more, please follow our channel, and thank you for your support.
