This article shows how to minimize data loss when a Kafka partition becomes unavailable and the leader replica's data is corrupted.
Below I reproduce the failure and then describe the steps I took to keep the data loss as small as possible.
Reproducing the failure
Let me reproduce a case where the partition becomes unavailable and the leader replica is corrupted (a shell-command sketch of these steps follows the list):
1. Start broker0 with unclean.leader.election.enable = false
2. Start broker1 with unclean.leader.election.enable = false
3. Create topic-1 with partitions = 1 and replication-factor = 2
4. Write a message to topic-1
At this point both replicas are in the ISR, and the replica on broker0 is the leader.
5. Stop broker1. The leader of topic-1 is still the replica on broker0, and the replica on broker1 is removed from the ISR.
6. Stop broker0 and delete the log data on broker0.
7. Restart broker1. topic-1 tries to connect to the leader replica, but broker0 is down, so the partition is unavailable and no messages can be written.
8. Restore broker0. The replica on broker0 takes back the leader role, but its data has been wiped, so its offset is 0. When broker1 tries to rejoin the ISR, its replica has to truncate its log so that its offset is not greater than the leader's, and all data in the partition is lost.
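As a rough sketch, the eight steps above could be run like this against an older, ZooKeeper-based Kafka. The config file names, addresses, and log directory are assumptions, and the stop steps are simplified (in practice you would kill the individual broker processes):

# server-0.properties and server-1.properties both set broker.id, log.dirs, and:
#   unclean.leader.election.enable=false
bin/kafka-server-start.sh -daemon config/server-0.properties      # step 1
bin/kafka-server-start.sh -daemon config/server-1.properties      # step 2
bin/kafka-topics.sh --zookeeper localhost:2181 --create \
  --topic topic-1 --partitions 1 --replication-factor 2           # step 3
echo "hello" | bin/kafka-console-producer.sh \
  --broker-list localhost:9092 --topic topic-1                    # step 4
# step 5: stop broker1 (kill its process); its replica drops out of the ISR
# step 6: stop broker0, then wipe its copy of the partition
rm -rf /tmp/kafka-logs-0/topic-1-0                                # assumed log.dirs location
bin/kafka-server-start.sh -daemon config/server-1.properties      # step 7: partition unavailable
bin/kafka-server-start.sh -daemon config/server-0.properties      # step 8: broker1 truncates, data lost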
My suggestion
When a partition is unavailable, could Kafka offer an option that lets the user manually designate any replica of that partition as the leader?
Once unclean.leader.election.enable = false is set on the cluster, a replica outside the ISR can never be elected leader. In the extreme case, only the leader replica is left in the ISR and the broker hosting it goes down; what if that broker's data is also corrupted? In that situation, could the user be allowed to pick a replica as leader themselves? Some data would be lost, but that is far better than losing the entire partition.
My workaround
First you need an unavailable partition whose leader replica has lost its data. For testing, you can repeat failure steps 1-8 above to produce one (with one extra broker added):
At this point the leader replica is on broker0, which has crashed, so the partition is unavailable. The replica on broker2 cannot be elected leader because it has fallen out of the ISR, and the leader replica's data has been corrupted and wiped. If broker0 were simply restarted now, the follower replica would truncate its log and all data in the partition would be lost.
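The partition state can be checked with kafka-topics.sh; the address and the sample output below are illustrative, assuming the two replicas sit on broker0 and broker2:

bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic topic-1
# Topic: topic-1  Partition: 0  Leader: -1  Replicas: 0,2  Isr: 0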
After a series of tests and experiments, I arrived at the following procedure, which forcibly makes the replica on broker2 the leader and minimizes the data loss:
1. Use the kafka-reassign-partitions.sh script to reassign the topic's partition (you can also do this from the kafka-manager console), as follows:
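A minimal sketch of such a reassignment, assuming ZooKeeper at localhost:2181 and using the broker ids from the scenario above; broker2 is listed first so it becomes the preferred leader, and broker0 is kept so no new empty replica is created (the file name is arbitrary):

cat > reassign.json <<'EOF'
{"version":1,"partitions":[{"topic":"topic-1","partition":0,"replicas":[2,0]}]}
EOF
bin/kafka-reassign-partitions.sh --zookeeper localhost:2181 \
  --reassignment-json-file reassign.json --execute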
After the reassignment, the preferred leader has been changed to the replica on broker2, but the leader is still the replica on broker0. Note that the preferred leader after the reassignment must be the replica that was previously kicked out of the ISR, not a brand-new replica created by the reassignment, because a newly created replica starts at offset 0. If the automatically generated assignment does not satisfy this, write a JSON file and set the assignment by hand.
2. Go into ZooKeeper, check the partition's state node, and modify its contents:
Edit the node content: forcibly set the leader to 2 (the same as the preferred leader after the reassignment), increase leader_epoch by 1, and shrink the ISR list to just the new leader, for example:
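With the zookeeper-shell.sh tool that ships with Kafka, the edit looks roughly like this; the controller_epoch and leader_epoch values are illustrative, and only the leader, leader_epoch, and isr fields are being changed:

bin/zookeeper-shell.sh localhost:2181
# inspect the current partition state
get /brokers/topics/topic-1/partitions/0/state
# e.g. {"controller_epoch":4,"leader":-1,"version":1,"leader_epoch":10,"isr":[0]}
# force the leader to broker2, bump leader_epoch by 1, shrink the ISR to the new leader
set /brokers/topics/topic-1/partitions/0/state {"controller_epoch":4,"leader":2,"version":1,"leader_epoch":11,"isr":[2]}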
At this point the kafka-manager console already reflects the change.
But this alone does not take effect yet; remember that broker0 still has to be restarted.
3. Restart broker0. The partition's lastOffset now becomes the lastOffset of the replica on broker2:
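One way to check this is GetOffsetShell; the broker address is an assumption, and the output line is illustrative, using the figure quoted below:

bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
  --broker-list localhost:9092 --topic topic-1 --time -1
# topic-1:0:46502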
This recovers 46502 messages. 76053 - 46502 = 29551 messages are still lost, but that is far better than losing all of them.
That is how to minimize data loss when a Kafka partition is unavailable and its leader replica is corrupted.