This article looks at the question "Can Kafka, as message middleware, lose messages?". It walks through each stage of message delivery, explains where messages can be lost in practice, and shows how to guard against it. I hope you read it carefully and come away with something useful!
Get to know Kafka
Take a look at Wikipedia's definition:
Kafka is a distributed publish-subscribe messaging system. It was originally developed by LinkedIn and later became part of the Apache project.
Kafka is a distributed, partitioned, replicated, persistent log service. It is mainly used to handle active streaming data.
Kafka architecture
The overall architecture of Kafka is simple and explicitly distributed: it consists mainly of producers, brokers (the Kafka servers), and consumers.
Kafka Architecture (Lite version)
The producer publishes data to the topic of its choice and is responsible for deciding which partition of the topic each record goes to. This can be done with simple round-robin load balancing, or with a semantic partitioning function, for example based on the record key.
Consumers are identified by a consumer group name, and each record published to a topic is delivered to one consumer instance within each subscribing consumer group. Consumer instances can run in separate processes or on separate machines.
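As an illustration of key-based partitioning, here is a minimal sketch using the Java client; the broker address and topic name ("localhost:9092", "orders") are assumptions made for the example.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

public class KeyedProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // with the default partitioner, records that share a key always land in the same partition
            producer.send(new ProducerRecord<>("orders", "user-42", "order created"));
        }
    }
}
```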
Will Kafka lose messages?
Before discussing whether Kafka loses messages, let's first look at what message delivery semantics are.
Message passing semantics
Message delivery semantics describe the guarantees a system gives about message delivery. They fall into three categories:
At most once: a message may be lost, but it is never processed more than once.
At least once: a message is never lost, but it may be processed more than once; duplicates are possible, loss is not.
Exactly once: a message is processed once and only once; no loss, no duplicates.
Ideally, you would want the system to deliver strictly exactly once, guaranteeing that messages are neither lost nor processed more than once, but this is hard to achieve.
Back to our protagonist Kafka. Message delivery in Kafka involves three stages:
The producer sends a message to the Kafka broker.
The Kafka broker synchronizes and persists the message.
The Kafka broker delivers the message to the consumer.
Messages can be lost in each of these three stages. Below is a detailed analysis of why messages are lost at each stage and how to avoid it as far as possible.
Message loss on the producer side
Let's first walk through the general flow of a producer sending a message (some steps depend heavily on specific configuration items and are omitted here):
The producer talks directly to the leader, so it first fetches from the cluster the metadata of the leader of the target topic partition.
Once it has the metadata, it sends the message directly to the leader partition.
The Kafka broker hosting the leader partition persists the message after receiving it.
The followers pull messages from the leader so that their data stays consistent with the leader's.
After pulling the messages, each follower replies to the leader with an ACK.
Once the leader and follower partitions are in sync, the leader partition replies to the producer with an ACK.
Producer sends data flow
The producer publishes data to the broker in push mode; each message is appended to a partition and written sequentially to disk. After a message is written to the leader, the followers actively synchronize it from the leader.
Kafka messages can be sent in two ways: synchronously (sync) or asynchronously (async). The default is synchronous, and the mode can be configured through the producer.type property.
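Note that producer.type belongs to the older Scala producer; with the current Java client, the same sync/async choice is made per send() call. A minimal sketch, reusing the producer from the earlier example (a fragment, assumed to sit in a method that declares throws Exception for the blocking get()):

```java
// assumes the KafkaProducer<String, String> "producer" from the earlier sketch
ProducerRecord<String, String> record = new ProducerRecord<>("orders", "user-42", "order paid");

// synchronous send: block until the broker acknowledges, or fail with an exception
RecordMetadata meta = producer.send(record).get();
System.out.printf("written to partition %d at offset %d%n", meta.partition(), meta.offset());

// asynchronous send: return immediately and handle the outcome in a callback
producer.send(record, (metadata, exception) -> {
    if (exception != null) {
        exception.printStackTrace(); // the send ultimately failed; log it or compensate on the business side
    }
});
```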
Kafka controls how produced messages are acknowledged through the request.required.acks (acks) property:
0 means no acknowledgement that the message was received; there is no guarantee that the send succeeded, so this is not used in production environments.
1 means the send is acknowledged as soon as the leader has received it; as long as the leader is alive the message is not lost, which preserves throughput.
-1 (or all) means the send is acknowledged only after both the leader and the followers have received it; this prevents message loss as far as possible, but throughput is lower.
The default value of the producer acks parameter is 1, so by default the producer provides at-least-once delivery, not exactly-once.
Knock on the blackboard: you may lose messages here!
If acks is set to 0: a network jitter can lose the message, and without an ACK to check, the producer never knows it was lost.
If acks is set to 1: the leader acknowledges the write, but if the leader then fails and a follower that has not yet replicated the message is elected, the message is still lost.
If acks is set to -1/all: both the leader and the followers confirm the write, but if the network is congested and the producer never receives the ACK, retries can produce duplicates.
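As a hedged illustration of the trade-off above, here is how the relevant settings might be tightened with the Java client to favour durability over throughput; the configuration keys are the standard ones, while the broker address is an assumption for the example.

```java
// same imports as in the producer sketch above
Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");            // assumed broker address
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ACKS_CONFIG, "all");                                    // leader and in-sync followers must confirm
props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));   // retry transient failures instead of dropping
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true");                     // broker deduplicates retried sends, addressing the duplicate problem above
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
```

How many replicas "all" actually waits for is determined by the topic's min.insync.replicas setting on the broker side.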
Message loss on the broker side
After receiving the data, the Kafka broker persists it. You might imagine that this looks like:
Message persistence, no cache
But in fact it looks like this:
Message persistence, with cache
The operating system has its own cache, the page cache. When writing to a file on disk, the system first writes the data into this cache, and the operating system decides when the cached data is actually flushed to the file.
Kafka provides the producer.type parameter to control whether it flushes actively: in sync mode, Kafka writes to the mmap'd file, calls flush, and only then returns to the producer; in async mode, it returns to the producer immediately after writing to mmap, without calling flush.
Knock on the blackboard: you may lose messages here!
Kafka relies on its multi-partition, multi-replica mechanism to avoid losing data as far as possible. If data has been written to the system cache but not yet flushed to disk, it can be lost when the machine suddenly crashes or loses power, although this situation is extreme.
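Against this kind of failure the main defence is replication rather than flushing every write. Below is a minimal sketch, assuming the Java AdminClient and a three-broker cluster, that creates a topic with replication factor 3 and min.insync.replicas=2, so that an acks=all write must reach at least two replicas before it is acknowledged; the topic name and broker address are assumptions for the example.

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 3 partitions, each replicated to 3 brokers (one leader, two followers)
            NewTopic topic = new NewTopic("orders", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2")); // acks=all requires at least 2 in-sync replicas
            admin.createTopics(Collections.singleton(topic)).all().get();
        }
    }
}
```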
Message loss on the consumer side
Consumers actively pull messages from the Kafka cluster in pull mode and, like producers, pull them from the leader partition.
Multiple consumers can form a consumer group, and each consumer group has a group id. Consumers in the same group can consume data from different partitions of the same topic, but a single partition is never consumed by more than one consumer in the group.
Consumer group consumption message
Consumer progress is tracked as offsets, which are saved in the __consumer_offsets topic of the Kafka cluster.
Consuming a message consists of two stages:
1. Mark the message as consumed by committing its offset.
2. Process the message.
Knock on the blackboard: you may lose messages here!
Scenario 1: commit first, then process. If an exception occurs while processing the message but the offset has already been committed, the message is lost to the consumer and will never be consumed again.
Scenario 2: process first, then commit. If an exception occurs before the commit, the message will be consumed again next time; the resulting duplicate consumption can be handled by making message processing idempotent. A sketch of this pattern follows below.
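A minimal sketch of scenario 2 with the Java consumer: auto-commit is disabled and the offset is committed only after the whole batch has been processed, giving at-least-once delivery. The topic, group id, and process() helper are assumptions made for the example.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed broker address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors");          // assumed group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");           // commit manually, only after processing

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singleton("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record); // must be idempotent: a crash before commit means redelivery
                }
                consumer.commitSync(); // commit only after the whole batch has been processed
            }
        }
    }

    // hypothetical business logic; replace with real, idempotent processing
    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("offset=%d key=%s value=%s%n", record.offset(), record.key(), record.value());
    }
}
```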
Summary
So, back to the question: will Kafka lose messages? The answer is: yes, it can!
Kafka may lose messages in three phases:
(1) producers send data
(2) Kafka Broker stores data
(3) Consumer consumption data
In practice it is difficult to implement strictly exactly-once delivery in a production environment, and attempting it sacrifices efficiency and throughput. The best practice is to build a good compensation mechanism on the business side, so that any lost message can be recovered.