How to analyze the speculation mechanism (speculative execution) of Spark Streaming

Updated: 2025-02-24. From: SLTechnology News&Howtos

Shulou (Shulou.com) 06/01

This article explains how the speculation mechanism (speculative execution) in Spark Streaming works and what goes wrong when it meets data skew. It is practical knowledge for real-time jobs, and I hope you get something out of it.

What is the speculation mechanism?

Suppose a stage runs many tasks. Most of them finish quickly, but one runs very slowly. In real-time computing, where latency requirements are strict, even two or three extra seconds matter.

Spark therefore has a speculation mechanism, which Spark Streaming jobs can use, specifically to deal with such slow-running tasks. It is off by default and is turned on with spark.speculation=true.

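At a fixed interval (spark.speculation.interval), Spark checks which running tasks should be scheduled again. Suppose there are 10 tasks in total. Once about 0.75 x 10 of them have finished successfully (spark.speculation.quantile, default 0.75), any task that has been running longer than 1.5x the median duration of the successful tasks (spark.speculation.multiplier, default 1.5) is marked for speculation, and a second attempt of it is scheduled. Whichever attempt finishes first wins and the other one is killed.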
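The check described above can be sketched in plain Python. This is only an approximation of the rule, not Spark's scheduler code: the function name `speculatable_tasks` and its arguments are made up for illustration, the two defaults mirror `spark.speculation.quantile` and `spark.speculation.multiplier`, and the slow-task threshold uses the median of the finished durations, which is what the Spark configuration documentation describes.

```python
import math
from statistics import median


def speculatable_tasks(total_tasks, finished_durations, running_durations,
                       quantile=0.75, multiplier=1.5):
    """Return indices of running tasks that qualify for a second attempt.

    finished_durations: seconds taken by tasks that already succeeded.
    running_durations:  seconds each still-running task has been running.
    """
    # Enough of the stage must have finished before any task is judged slow
    # (assumption: the required count is rounded down, with a minimum of 1).
    min_finished = max(math.floor(quantile * total_tasks), 1)
    if len(finished_durations) < min_finished:
        return []

    # A task is "slow" once it has run longer than multiplier x the median
    # duration of the tasks that succeeded.
    threshold = multiplier * median(finished_durations)
    return [i for i, elapsed in enumerate(running_durations)
            if elapsed > threshold]


if __name__ == "__main__":
    # 10 tasks, 8 finished in 1s each, 2 still running for 1.2s and 2.0s.
    print(speculatable_tasks(10, [1.0] * 8, [1.2, 2.0]))
```

With eight of ten tasks done in 1s each, the threshold is 1.5s, so only the task that has already been running for 2.0s is flagged.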

There is a serious problem here, though. I noticed it when I first studied this on my own, and later saw it raised in videos from some training institutions.

The question is, what if the running task encounters data skew?

Suppose there are five tasks and one of them hits data skew. The skew is mild, so the task would still complete, but it takes 6s while the other four need only 1s. With speculation turned on, the skewed task passes the 1.5s threshold and, at around 2s, is flagged and scheduled again. The new attempt has to process the same skewed data, so it is no faster than the original and the work is simply run again. Speculation ends up spending extra resources on repeated attempts without making the stage finish any sooner. This is the problem this article wants to point out.
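The arithmetic of this five-task scenario can be worked through with a small sketch. It rests on assumptions that are not in the text above: the original attempt keeps running next to the speculative copy (as in Spark's documented speculation behaviour), the copy starts the moment the threshold is crossed, and it needs the same 6s because it reads the same skewed partition. The function name is hypothetical.

```python
def skewed_stage_with_speculation(fast_s=1.0, skewed_s=6.0, multiplier=1.5):
    """Model one skewed task among fast tasks when speculation is enabled.

    Returns (threshold, stage_finish, wasted), all in seconds.
    """
    # The four fast tasks all took fast_s, so their median is fast_s.
    threshold = multiplier * fast_s

    # Assumption: the copy is launched as soon as the task is flagged.
    copy_start = threshold

    # Same partition, same amount of work, so the copy also needs skewed_s.
    copy_finish = copy_start + skewed_s

    # The first attempt to finish wins; here that is the original attempt.
    stage_finish = min(skewed_s, copy_finish)

    # Executor time burned by the losing copy before it is killed.
    wasted = stage_finish - copy_start
    return threshold, stage_finish, wasted


if __name__ == "__main__":
    print(skewed_stage_with_speculation())
```

Under these assumptions the stage still finishes at 6s, exactly as it would without speculation, while the extra attempt occupies an executor slot for 4.5s. Speculation helps with a slow machine, not with a partition that is simply bigger.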

Solution

What if we turn on the speculation mechanism and run into data skew?

We can apply the usual remedies for data skew. Here are several of them:

1. If only a few keys cause the skew and they have little impact on the result of the calculation, you can simply filter out those keys.

2. Two-stage aggregation. Add a random prefix to each key so that one key becomes several different keys. The data that was originally processed by a single task is then spread over multiple tasks for local aggregation, which solves the problem of one task handling too much data. After that, remove the random prefix and aggregate again globally to get the final result. This method only applies to aggregation-type shuffle operations, not to join-type shuffle operations.

3. For skew caused by a join where only a few keys are responsible, split those keys out into a separate RDD and add a random prefix to break each of them into n parts before joining. The matching records on the other side must be expanded to carry each of the n prefixes so that the keys still match. The data for these keys is then no longer concentrated in a few tasks but spread across many tasks for the join. This is suitable for joining two tables that both hold a large amount of data.

4. If a large number of keys in the RDD cause skew during the join, splitting out individual keys is pointless, and only the following approach remains. Add a random prefix to every key of the skewed RDD so that the same key becomes several different keys, and expand the other RDD so that each of its records appears once per prefix. The "different keys" can then be spread over multiple tasks instead of having a single task handle a large number of records with the same key.
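The random-prefix ideas in solutions 2 to 4 can be sketched without a cluster. The following is a minimal illustration in plain Python, not Spark code: lists of `(key, value)` pairs stand in for RDDs, the function names `two_stage_count` and `salted_join` and the parameter `num_salts` are made up for the example, and the fixed seed is only there to make runs repeatable.

```python
import random
from collections import Counter


def two_stage_count(records, num_salts=4, seed=0):
    """Two-stage aggregation: sum values per key via salted partial sums."""
    rng = random.Random(seed)

    # Stage 1: prefix each key with a random salt and aggregate locally.
    # One hot key becomes up to num_salts keys, so its records are spread out.
    partial = Counter()
    for key, value in records:
        partial[(rng.randrange(num_salts), key)] += value

    # Stage 2: drop the salt and aggregate the partial results globally.
    total = Counter()
    for (_salt, key), value in partial.items():
        total[key] += value
    return partial, total


def salted_join(left, right, num_salts=4, seed=0):
    """Inner join where the skewed side is salted and the other is expanded."""
    rng = random.Random(seed)

    # Skewed side: every record gets one random prefix.
    salted_left = [((rng.randrange(num_salts), k), v) for k, v in left]

    # Other side: every record is copied once per prefix so keys still match.
    expanded_right = {}
    for k, v in right:
        for salt in range(num_salts):
            expanded_right.setdefault((salt, k), []).append(v)

    # Join on the salted key, then strip the prefix from the result.
    return [(k, (lv, rv))
            for (salt, k), lv in salted_left
            for rv in expanded_right.get((salt, k), [])]


if __name__ == "__main__":
    data = [("hot", 1)] * 1000 + [("cold", 1)] * 10
    partial, total = two_stage_count(data)
    print(dict(total))
    print(salted_join([("hot", 1), ("hot", 2)], [("hot", "x")]))
```

The trade-off shows up in `salted_join`: the non-skewed side grows by a factor of `num_salts`, which is why solution 3 restricts the expansion to the few skewed keys when that is possible.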

That is how the speculation mechanism in Spark Streaming works and how to handle data skew when it is enabled. These are points you are likely to run into in day-to-day work, and I hope this article helps.
