

Kafka cluster message backlog and how to deal with it

2025-04-05 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

Today I will talk about the backlog of Kafka cluster messages and how to deal with it. Many people may not know much about this topic, so I have summarized the following for you; I hope you get something out of this article.

Usually, enterprises produce data to a Kafka cluster through Kafka producers using round-robin or random partitioning, so that the data is distributed as evenly as possible across the Kafka partitions.

On the premise that partition data is evenly distributed, if the number of topic partitions is designed reasonably according to factors such as the volume of data to be processed, and the consumer side of real-time jobs (such as Spark Streaming / Structured Streaming or Flink applications integrated with Kafka) keeps running and consuming continuously, there is generally no backlog of Kafka data.
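The round-robin idea mentioned above can be sketched as a simple partitioner. This is a minimal illustration, not Kafka's actual implementation; the partition count and record values are made up:

```python
from itertools import count

def make_round_robin_partitioner(num_partitions):
    """Return a function that assigns each record to partitions in turn,
    spreading records evenly regardless of their content."""
    counter = count()
    return lambda record: next(counter) % num_partitions

partition_for = make_round_robin_partitioner(3)
records = ["a", "b", "c", "d", "e", "f"]
assignments = [partition_for(r) for r in records]
print(assignments)  # 6 records over 3 partitions: [0, 1, 2, 0, 1, 2]
```

Because the assignment ignores message content entirely, every partition receives the same share of traffic, which is exactly the property that keeps backlog from concentrating in one partition.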

But all of this rests on those premises; when an accident happens or the partition count is set unreasonably, a backlog is inevitable.

Typical scenarios of Kafka message backlog:

1. The real-time / consumption task fails

For example, a real-time application we wrote fails for some reason, the failure is not caught by a monitoring program that notifies the person responsible, and no script exists to restart the task automatically.

Then, until we restart the real-time application and resume consumption, messages pile up during this period. If the data volume is very large, simply restarting the application and consuming directly will not solve the problem.

2. The Kafka partition count is set unreasonably (too few) and consumers' processing capacity is insufficient

The production rate (QPS) of a single Kafka partition is usually very high. If consumers fall behind for some reason (for example, complex business logic makes processing slow), consumption lag occurs.

In addition, the partition count is the minimum unit of Kafka parallelism tuning; if it is set too low, it limits the throughput of Kafka consumers.
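Why the partition count caps consumer parallelism can be seen from how partitions are assigned to members of a consumer group: each partition goes to exactly one consumer, so consumers beyond the partition count sit idle. A simplified round-robin-style assignment (illustrative only, not Kafka's exact assignor; the partition and consumer names are made up):

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin style.
    Consumers beyond the number of partitions receive nothing."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 2 partitions, 4 consumers in the group: two consumers end up idle,
# so adding more consumers cannot raise throughput past 2 partitions' worth.
result = assign_partitions([0, 1], ["c1", "c2", "c3", "c4"])
idle = [c for c, ps in result.items() if not ps]
print(result, "idle:", idle)
```

This is why "too few partitions" cannot be fixed on the consumer side alone: the extra consumers have nothing to read.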

3. Kafka message keys are uneven, resulting in uneven partition data

When producing Kafka messages, you can specify a key for each message, but the keys need to be evenly distributed; otherwise the data across Kafka partitions will be imbalanced.
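A rough illustration of how a skewed key leads to skewed partitions under hash-based partitioning. Kafka actually hashes keys with murmur2; CRC32 is used here only as a simple deterministic stand-in, and the key names, suffix range, and partition count are made up. Adding a random suffix to the hot key, as suggested later in this article, spreads the load back out:

```python
import random
import zlib

NUM_PARTITIONS = 4

def partition_for(key: str) -> int:
    # Stand-in for Kafka's hash(key) % numPartitions (Kafka uses murmur2).
    return zlib.crc32(key.encode()) % NUM_PARTITIONS

# Every message carries the same hot key -> everything lands in one partition.
hot = [partition_for("user-42") for _ in range(1000)]

# Salting: append a random suffix so the hot key spreads over partitions.
random.seed(0)  # deterministic for the example
salted = [partition_for(f"user-42-{random.randrange(100)}") for _ in range(1000)]

print("partitions used without salt:", len(set(hot)))
print("partitions used with salt:", len(set(salted)))
```

The trade-off of salting is that messages for the same logical key no longer share a partition, so per-key ordering is lost; it is only appropriate when the consumer does not rely on key-level ordering.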

So, for the situations above, are there good ways to deal with the backlog of data?

In general, there are the following targeted solutions:

1. Consumption lag caused by failure of the real-time / consumption task

a. After the task is restarted, consume the latest messages directly, and backfill the "lagging" historical data with an offline program.

In addition, it is recommended to bring the task into a monitoring system so that the person responsible is notified promptly when something goes wrong. Automatic restart scripts are also necessary, and the real-time framework needs strong exception handling, so that malformed data cannot leave the task unable to restart.
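The monitoring idea above boils down to a lag check: compare each partition's log-end offset with the group's committed offset, and alert when total lag crosses a threshold. A minimal sketch; the offsets below are made-up numbers, and in practice they would come from the Kafka admin API or the kafka-consumer-groups tool:

```python
def total_lag(end_offsets, committed_offsets):
    """Lag per partition = log-end offset minus committed offset, summed."""
    return sum(end_offsets[p] - committed_offsets.get(p, 0)
               for p in end_offsets)

def should_alert(end_offsets, committed_offsets, threshold):
    """True when the consumer group has fallen too far behind."""
    return total_lag(end_offsets, committed_offsets) > threshold

# Hypothetical snapshot: three partitions, the consumer far behind on p2.
end = {0: 1_000, 1: 1_200, 2: 9_000}
committed = {0: 990, 1: 1_150, 2: 1_000}
lag = total_lag(end, committed)
print("total lag:", lag, "alert:", should_alert(end, committed, 5_000))
```

Running such a check on a schedule and paging the owner when it fires is usually enough to catch a dead consumer task within minutes rather than hours.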

b. On startup, resume consumption from the last committed offset

If the backlog is large, you need to increase the processing capacity of the task, for example by adding resources, so that it consumes as quickly as possible and catches up with the latest messages.
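For approach (b), the consumer settings that matter are a fixed group id (so committed offsets are found on restart), auto-commit disabled in favor of committing after records are processed, and a reset policy that only applies when no committed offset exists. A sketch using kafka-python-style parameter names; the broker address and group id are hypothetical:

```python
consumer_config = {
    "bootstrap_servers": "broker1:9092",  # hypothetical broker address
    "group_id": "my-streaming-job",       # fixed id -> resume from committed offsets
    "enable_auto_commit": False,          # commit manually after processing succeeds
    "auto_offset_reset": "earliest",      # used only when no committed offset exists
    "max_poll_records": 500,              # raise to pull bigger batches while catching up
}
print(consumer_config)
```

With this setup, a restarted task picks up exactly where the last successful commit left off, and increasing batch size or parallel resources then determines how fast it eats through the backlog.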

2. Too few Kafka partitions

If the data volume is large, increasing the number of Kafka partitions appropriately is key. If you use Spark Streaming with the Kafka direct approach, you can also repartition the KafkaRDD via repartition to increase parallelism.

3. Partition data is uneven due to an unreasonable Kafka message key

On the Kafka producer side, you can add a random suffix to the key to balance the load across partitions.

After reading the above, do you have a better understanding of the backlog of Kafka cluster messages and how to deal with it? If you want to learn more, please follow the industry information channel. Thank you for your support.
