Example Analysis of Back Pressure in flink and spark Streaming

2025-01-28 Update From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article walks through an example analysis of back pressure in Flink and Spark Streaming. The editor finds it very practical and shares it here for reference. I hope you get something out of it.

Back pressure of Spark Streaming

Before we talk about Flink's back pressure, let's talk about Spark Streaming's. As most of us know, Spark Streaming's back pressure exists to cope with short-term data spikes. It was introduced in Spark 1.5. Before that, we could only cap the maximum consumption rate, and that cap had to be estimated by hand from stress tests. For the Receiver-based approach, the spark.streaming.receiver.maxRate parameter limits the maximum number of records each receiver accepts per second. For the Direct Approach, the spark.streaming.kafka.maxRatePerPartition parameter limits the maximum number of records read per second from each Kafka partition.
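To make the static limit concrete, here is a minimal sketch of the arithmetic behind the Direct Approach cap. The function name and the numbers are hypothetical and chosen only for illustration. The formula assumes the per-partition limit is a per-second rate applied over the whole batch interval.

```python
def max_records_per_batch(max_rate_per_partition, num_partitions, batch_interval_seconds):
    """Upper bound on the records one batch can read under a static per-partition cap.

    max_rate_per_partition: value of spark.streaming.kafka.maxRatePerPartition (records/s)
    num_partitions: number of Kafka partitions being read (hypothetical input)
    batch_interval_seconds: length of the streaming batch interval
    """
    return int(max_rate_per_partition * num_partitions * batch_interval_seconds)


# Hypothetical job: 1000 records/s per partition, 8 partitions, 5 s batches.
cap = max_records_per_batch(1000, 8, 5)
print(cap)  # 40000
```

If the cluster could actually process more than this per batch, the extra capacity stays idle, which is the waste that a static limit causes.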

The disadvantages of this static limit are obvious. If the back end can actually process more than the configured maximum, resources are wasted, and every Spark Streaming job needs its own stress test to estimate the limit, which is relatively costly. Back pressure was therefore introduced in 1.5. The mechanism is based on the PID (proportional-integral-derivative) controller from automatic control theory. The idea, briefly: to adjust the ingestion rate automatically, a component named RateController is added to the original architecture. It extends StreamingListener, listens for the onBatchCompleted event of every batch, and estimates a rate from the batch's completion time, processingDelay, schedulingDelay, and the number of records processed in the batch. This rate is then used to update the maximum number of records the stream may ingest per second. When processing capacity is good the maximum goes up, and when capacity drops the maximum goes down, which keeps Spark Streaming running smoothly.

PID rate calculation source code
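The following is a minimal Python sketch of the PID rate calculation, not Spark's own source. It is modeled on the update rule of Spark's PIDRateEstimator as I understand it. The class and method names are Python renderings chosen for illustration, and the default weights mirror the configuration defaults listed in this article.

```python
class PidRateEstimator:
    """Illustrative PID-style rate estimator (a sketch, not Spark's API)."""

    def __init__(self, batch_interval_ms, proportional=1.0, integral=0.2,
                 derivative=0.0, min_rate=100.0):
        self.batch_interval_ms = batch_interval_ms
        self.proportional = proportional
        self.integral = integral
        self.derivative = derivative
        self.min_rate = min_rate
        self.first_run = True
        self.latest_time = -1
        self.latest_rate = -1.0
        self.latest_error = -1.0

    def compute(self, time_ms, num_elements, processing_delay_ms, scheduling_delay_ms):
        """Return a new rate bound in records/s, or None if no estimate is possible."""
        if not (time_ms > self.latest_time and num_elements > 0 and processing_delay_ms > 0):
            return None
        # Seconds since the previous update.
        delay_since_update = (time_ms - self.latest_time) / 1000.0
        # Records per second the job actually processed in this batch.
        processing_rate = num_elements / processing_delay_ms * 1000.0
        # Proportional term: gap between the previous bound and the observed rate.
        error = self.latest_rate - processing_rate
        # Integral term: backlog implied by the scheduling delay, as a rate.
        historical_error = scheduling_delay_ms * processing_rate / self.batch_interval_ms
        # Derivative term: how quickly the error is changing.
        d_error = (error - self.latest_error) / delay_since_update
        new_rate = max(
            self.latest_rate
            - self.proportional * error
            - self.integral * historical_error
            - self.derivative * d_error,
            self.min_rate,
        )
        self.latest_time = time_ms
        if self.first_run:
            # The first batch only seeds the state; no estimate is published.
            self.latest_rate = processing_rate
            self.latest_error = 0.0
            self.first_run = False
            return None
        self.latest_rate = new_rate
        self.latest_error = error
        return new_rate


estimator = PidRateEstimator(batch_interval_ms=1000)
first = estimator.compute(1000, 1000, 500, 0)    # None: the first batch seeds the state
second = estimator.compute(2000, 1000, 1000, 0)  # processing slowed down, so the bound drops
print(first, second)
```

In this hypothetical run the first batch processes 2000 records/s and the second only 1000 records/s, so the estimator lowers the bound toward the observed rate. A nonzero scheduling delay would lower it further through the integral term.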

Configure back pressure for Spark Streaming

spark.streaming.backpressure.initialRate: the initial maximum rate at which each receiver receives data for the first batch when back pressure is enabled. No default value is set.

spark.streaming.backpressure.rateEstimator: the rate estimator class. The default value is pid, which is currently the only one Spark supports; other estimators would have to be implemented yourself.

spark.streaming.backpressure.pid.proportional: weight for the response to the "error" (the change between the last batch and the current batch). The default value is 1, and it can only be set to a non-negative value.

spark.streaming.backpressure.pid.integral: weight for the response to the accumulation of error. This has a dampening effect. The default value is 0.2, and it can only be set to a non-negative value.

spark.streaming.backpressure.pid.derived: weight for the response to the trend in error. This can cause arbitrary, noise-induced fluctuations in batch size, but it can also help react quickly to increased or reduced capacity. The default value is 0, and it can only be set to a non-negative value.

spark.streaming.backpressure.pid.minRate: the lowest rate that can be estimated. The default value is 100, and it can only be set to a non-negative value.
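As a rough illustration, the settings above can be collected in one place and turned into spark-submit flags. This is a sketch under assumptions: the helper name is hypothetical, the initialRate value is an arbitrary example, and the other values simply restate the defaults listed above. Note that none of these settings takes effect unless spark.streaming.backpressure.enabled is set to true; it is false by default.

```python
# Back pressure settings as key/value strings. The values restate the
# defaults from the list above, except where marked as hypothetical.
backpressure_conf = {
    "spark.streaming.backpressure.enabled": "true",       # off by default
    "spark.streaming.backpressure.initialRate": "1000",   # hypothetical example value
    "spark.streaming.backpressure.rateEstimator": "pid",
    "spark.streaming.backpressure.pid.proportional": "1.0",
    "spark.streaming.backpressure.pid.integral": "0.2",
    "spark.streaming.backpressure.pid.derived": "0.0",
    "spark.streaming.backpressure.pid.minRate": "100",
}


def to_spark_submit_args(conf):
    """Render a dict of settings as a flat list of `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


print(" ".join(to_spark_submit_args(backpressure_conf)))
```

The printed flags can be pasted into a spark-submit command line. The same key/value pairs could equally be set on the job's SparkConf.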


Back pressure of Flink

If you see a back pressure warning for a task (for example, high), it means the task is producing data faster than the downstream operators can consume it. Records in your job flow downstream, for example from source to sink, while back pressure propagates in exactly the opposite direction, upstream.

Take a simple example: a job with only two steps, a source and a sink. If you see a warning on the source, it means the sink is consuming data more slowly than the source is producing it. The sink is back-pressuring the upstream source.

Sampling thread

Back pressure monitoring works by repeatedly taking stack trace samples of the running tasks. The JobManager repeatedly triggers calls to Thread.getStackTrace() for the tasks of your job.

If the samples show that a task thread is stuck in a certain internal method call (requesting buffers from the network stack), this indicates back pressure for that task.

By default, the JobManager triggers 100 stack traces, 50 ms apart, for each task in order to determine back pressure. The ratio shown in the web interface tells you what fraction of these stack traces were stuck in the internal method call. For example, 0.01 means that only 1 in 100 was stuck in that method.
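The ratio described above can be sketched in a few lines. This is an illustration only: the function names are hypothetical, the samples are plain booleans rather than real stack traces, and the status thresholds (0.10 and 0.5) are assumptions based on the Flink documentation for this sampling mechanism.

```python
def back_pressure_ratio(samples):
    """Fraction of sampled stack traces that were stuck requesting a network buffer.

    samples: list of booleans, True if that sample was stuck in the internal call.
    """
    if not samples:
        raise ValueError("at least one sample is required")
    return sum(1 for stuck in samples if stuck) / len(samples)


def classify(ratio, low_threshold=0.10, high_threshold=0.5):
    """Map a ratio to a status label. The thresholds are assumed, not taken from this article."""
    if ratio <= low_threshold:
        return "OK"
    if ratio <= high_threshold:
        return "LOW"
    return "HIGH"


# 100 samples, one of which was stuck: the 0.01 example from the text.
samples = [True] + [False] * 99
ratio = back_pressure_ratio(samples)
print(ratio, classify(ratio))  # 0.01 OK
```

A task whose samples are mostly stuck would come out as HIGH, which corresponds to the warning discussed at the start of this section.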

OK: 0
