Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to understand the problem that Spark Streaming perceives kafka dynamic partitioning

2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

How to understand the problem of Spark Streaming perceiving kafka dynamic partitioning? for this problem, this article introduces the corresponding analysis and solution in detail, hoping to help more partners who want to solve this problem to find a more simple and feasible method.

The editor mainly explains the problem of new partition detection based on the combination of Spark Streaming and kafka.

Before reading, you need to understand the principle and source structure of Spark Streaming.

Kafka version 0.8

To get to the point, the reason for the confusion of today's topic is that in the 08 version of DirectStream, which is a combination of kafka and Spark Streaming, the new partition or topic detection of kafka is not supported. And this problem, for many companies with obvious business growth, there will be corresponding problems.

For example, if the business growth of the original company is relatively obvious, then the kafka throughput, the number of topic and the number of partitions created at the beginning may not be able to meet the concurrency requirements, and more partitions are needed. The newly added partition will have producers to write data into it, and the API combined with Spark Streaming and kafka version 0.8 can not meet the need to dynamically find new topic or partitions in kafka.

Is there any basis for saying so? We can't follow the example of others when we do a project, so we can verify our ideas from the source code.

We won't go into the Spark Streaming source code here, but we can think about it here, where is Spark Streaming partition detection done?

Obviously for batch Spark Streaming tasks, partition detection should be involved every time job generates a fetch kafkaRDD to determine the number of partitions to kafkaRDD and assign an offset range to each partition, and this code is in the DirectKafkaInputDStream#compute method. (those who have seen the Langjian Spark Streaming source video tutorial will know.)

So let's post this source code to verify our idea, first the first line of the compute method:

Val untilOffsets = clamp (latestLeaderOffsets (maxRetries))

What is obtained here is the maximum offset consumed by each partition of the generated KafkaRDD, so we need to go to latestLeaderOffsets to take a closer look, and we can find the following line of code:

Val o = kc.getLatestLeaderOffsets (currentOffsets.keySet)

This is based on the currentOffsets information to obtain the largest offset, which continues to find that because it only obtains the largest offset based on the currentOffsets information and does not perceive the new partition, the combination of Spark Streaming and kafka 0.8 cannot dynamically perceive the partition.

Kafka version 0.10

Similarly, we can also directly look at the source code of kafka 0.10 to check whether it will dynamically generate kafka partitions.

Enter the compute of DirectKafkaInputDStream, and the first line of code you see is also:

Val untilOffsets = clamp (latestOffsets ())

In latestOffsets, there is a new continent:

This is the answer to the question about how to understand Spark Streaming-aware kafka dynamic partitioning. I hope the above content can be of some help to you. If you still have a lot of doubts to be solved, you can follow the industry information channel for more related knowledge.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report