
Understanding Kafka consumer rebalancing through a production error

Shulou (Shulou.com), 06/01. Updated 2025-02-24. From: SLTechnology News&Howtos

This article looks at Kafka's consumer rebalancing through an error seen in production. Many readers may not be familiar with the mechanism, so the relevant details are summarized below.

Problem description

A project running in production logged a Kafka consumer error.

Roughly, the error means the following.

A consumer went longer than max.poll.interval.ms between two calls to poll(), so it was considered to have "hung" and was removed from its group. This triggered a "rebalance" of the partition ownership of the topic.

Analysis

My habit with production problems like this is to think through the technical details involved and write them down once the problem is solved.

A good understanding of those details helps a great deal when locating a problem. This article walks through the ones behind the error above.

Kafka's topic partitions

To make message processing highly available and easy to scale horizontally, Kafka splits each topic into partitions. Consumers that belong to the same consumer group divide the partitions of a topic among themselves, so each message is handled by only one consumer in the group. The load is spread across the group, and messages are processed more efficiently.

For example, suppose topic A has three partitions and three consumers belong to the same group. Each consumer is then responsible for one partition, and the system runs in an orderly way.

In general, we increase a group's consumption capacity by adding consumers to it. Be careful, however, not to let the number of consumers exceed the number of topic partitions: the excess consumers are assigned no partition and sit idle.
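The partition-to-consumer mapping described above can be sketched in a few lines of Python. This is a simplified round-robin model written for illustration, not Kafka's actual assignor implementation; the function name assign_partitions and the consumer names c1 to c4 are hypothetical.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers round-robin (illustrative model only)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(p)
    return assignment


# Three partitions, three consumers: one partition each.
three = assign_partitions([0, 1, 2], ["c1", "c2", "c3"])
# Three partitions, four consumers: the fourth consumer gets nothing.
four = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])

print(three)
print(four)
```

In the second call, c4 ends up with an empty list, which is the idle consumer the paragraph warns about.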

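Heartbeat mechanism

Kafka's server needs to know at all times which consumers are still consuming. It does this through heartbeats that each consumer sends continuously. Two consumer activities are involved: calling poll() (the English word is deliberate, it is the same "poll" as in the max.poll.interval.ms named in the error above) and committing offsets after consumption. In older clients, heartbeats were sent as part of these calls.

In current clients (since Kafka 0.10.1), heartbeats are sent by a separate background thread. Heartbeating and message processing therefore run on two threads that do not interfere with each other.

As long as the consumer sends heartbeats at regular intervals, it is considered alive, meaning it is still reading messages from its partitions. Otherwise it is considered "dead."

Two limits define "regular" here. Heartbeats must arrive within session.timeout.ms, and the gap between two calls to poll() cannot exceed max.poll.interval.ms. If the poll gap is exceeded, the consumer leaves the group even though its heartbeat thread is still running.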
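In current Kafka clients, two separate limits apply to a consumer: session.timeout.ms for heartbeats and max.poll.interval.ms for the gap between poll() calls. The following is a minimal model of that check, not broker or client code; ConsumerState, liveness, and the timeout values passed in are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class ConsumerState:
    last_heartbeat_ms: int  # when the last heartbeat was sent
    last_poll_ms: int       # when poll() was last called


def liveness(state, now_ms, session_timeout_ms, max_poll_interval_ms):
    """Classify a consumer using the two independent timers (toy model)."""
    if now_ms - state.last_heartbeat_ms > session_timeout_ms:
        return "dead: no heartbeat within session.timeout.ms"
    if now_ms - state.last_poll_ms > max_poll_interval_ms:
        return "stalled: no poll() within max.poll.interval.ms"
    return "alive"


# Heartbeats are recent, but processing blocked the poll loop for too long.
busy = ConsumerState(last_heartbeat_ms=299_000, last_poll_ms=0)
print(liveness(busy, now_ms=301_000,
               session_timeout_ms=45_000, max_poll_interval_ms=300_000))
```

The example consumer is still heartbeating but has not polled in time, which matches the situation the error in this article describes.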

Partition rebalancing in Kafka

Consumers maintain their membership in a group and their ownership of partitions by sending heartbeats to the server. If the server decides that a consumer is "dead," it triggers a rebalance.

As mentioned earlier, the consumers in a group jointly read the partitions of a topic.

When a new consumer joins the group, it takes over partitions that other consumers were reading. When a consumer shuts down or crashes, it leaves the group, and the partitions it was reading are taken over by the remaining consumers.

The transfer of ownership of a partition from one consumer to another is called rebalancing.
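To make the ownership transfer concrete, here is a small sketch that compares assignments before and after a consumer leaves. It uses a simplified round-robin assignment as an assumption (real Kafka assignors differ), and the names assign, moved_partitions, and c1 to c3 are hypothetical.

```python
def assign(partitions, consumers):
    """Toy round-robin assignment, standing in for a real Kafka assignor."""
    result = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        result[consumers[i % len(consumers)]].append(p)
    return result


def moved_partitions(before, after):
    """Return {partition: (old_owner, new_owner)} for partitions that changed hands."""
    old = {p: c for c, ps in before.items() for p in ps}
    new = {p: c for c, ps in after.items() for p in ps}
    return {p: (old[p], new.get(p)) for p in old if old[p] != new.get(p)}


before = assign([0, 1, 2], ["c1", "c2", "c3"])
after = assign([0, 1, 2], ["c1", "c2"])  # c3 crashed or was shut down

print(moved_partitions(before, after))
```

Only the partition owned by the departed consumer changes hands in this example; the other two stay with their owners.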

What's the point of rebalancing?

Rebalancing is what lets us add or remove consumers safely. The partitions of a departing consumer are taken over by the others, so their messages still get consumed.

Solving the problem

With the technical details understood, we can follow the clues and investigate step by step. Based on the analysis above, here are several directions to check:

Is the consumer's service down?

If the service is running normally, is the node it runs on out of memory or CPU, so that the consumer cannot poll or send heartbeats in time? My case was the latter: once the memory exhaustion was resolved, the Kafka error no longer appeared.

Consider increasing the value of max.poll.interval.ms to suit your actual business workload.
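When deciding whether to raise max.poll.interval.ms, a rough estimate helps. The sketch below assumes one poll() can return up to max.poll.records records (a real consumer setting, default 500) and that each record takes a fixed processing time. The 800 ms per-record figure and both function names are hypothetical.

```python
def worst_case_batch_ms(max_poll_records, per_record_ms):
    """Upper bound on time spent processing one full poll() batch."""
    return max_poll_records * per_record_ms


def fits_poll_interval(max_poll_records, per_record_ms, max_poll_interval_ms):
    """True if a full batch can be processed before the next poll() is due."""
    return worst_case_batch_ms(max_poll_records, per_record_ms) < max_poll_interval_ms


# 500 records at 800 ms each take 400000 ms, more than a 300000 ms interval.
print(worst_case_batch_ms(500, 800))
print(fits_poll_interval(500, 800, 300_000))  # too slow for the interval
print(fits_poll_interval(500, 800, 600_000))  # fixed by a larger interval
print(fits_poll_interval(300, 800, 300_000))  # or by a smaller batch
```

Either a larger max.poll.interval.ms or a smaller max.poll.records brings the worst case back under the limit; which one fits depends on how slow the processing legitimately is.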

