How to Compare the Performance of Flink, Storm, and Spark Streaming

2025-03-09 Update From: SLTechnology News&Howtos

Shulou (Shulou.com) 05/31

How do Flink, Storm, and Spark Streaming compare in performance? Many newcomers are unclear on this point, so this article walks through a published benchmark and its follow-up tests in detail.

A team has previously published a blog post showing the performance test results of Storm, Flink, and Spark Streaming. This test is of great value to the industry because it is the first benchmark in the field of stream processing based on real applications.

The application consumes ad-impression messages from Kafka, looks up the advertising campaign that each ad belongs to in Redis, groups the events by campaign, and counts the ad views in 10-second windows. The final result of each 10-second window is stored in Redis, and the in-progress state of each window is also written to Redis once per second so that users can query it in real time.
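The windowed count described above can be sketched in plain Python. This is a minimal illustration, not the benchmark's code: the `AD_TO_CAMPAIGN` dictionary is a hypothetical stand-in for the Redis lookup, the `(timestamp, ad_id)` tuples stand in for Kafka messages, and skipping unknown ads is an assumption made for the sketch.

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # window length stated in the benchmark description

# Hypothetical stand-in for the ad -> campaign mapping kept in Redis.
AD_TO_CAMPAIGN = {"ad-1": "campaign-A", "ad-2": "campaign-A", "ad-3": "campaign-B"}


def count_views_per_window(events, ad_to_campaign=AD_TO_CAMPAIGN):
    """Count ad views per (campaign, window start).

    `events` is an iterable of (timestamp_seconds, ad_id) pairs standing in
    for the messages read from Kafka. Windows are tumbling: each event falls
    into exactly one 10-second window.
    """
    counts = defaultdict(int)
    for timestamp, ad_id in events:
        campaign = ad_to_campaign.get(ad_id)
        if campaign is None:
            continue  # unknown ad: skipped in this sketch (an assumption)
        window_start = int(timestamp // WINDOW_SECONDS) * WINDOW_SECONDS
        counts[(campaign, window_start)] += 1
    return dict(counts)


if __name__ == "__main__":
    sample = [(0.5, "ad-1"), (3.0, "ad-2"), (9.9, "ad-3"), (10.0, "ad-1"), (12.4, "ad-9")]
    for key, views in sorted(count_views_per_window(sample).items()):
        print(key, views)
```

In the real application the grouping and windowing are done by the stream processor across many machines; the sketch only shows what is being computed for each event.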

In the initial performance evaluation, because Storm is a stateless stream processor (that is, it cannot define and maintain state), the Flink job was also written in a stateless style. All state was stored in Redis.

In the performance evaluation, Spark Streaming ran into a trade-off between throughput and latency. As the batch size grows, latency increases; if the batch size is reduced to cut latency, throughput drops. Storm and Flink both maintained low latency as throughput increased.

To test Flink's performance further, the testers set up a series of different scenarios and worked through them step by step.

The initial performance evaluation focused on measuring end-to-end latency at relatively low throughput, even at its upper limit, and paid no attention to fault tolerance. In addition, the key cardinality in the application was very small, so the results do not reflect a large user base or a key space that grows over time.

Because the initial results showed poor performance for Spark Streaming, the follow-up tests covered only Storm and Flink, which had performed similarly in the initial test.

The first change was to reimplement the application using the fault-tolerant state provided by Flink, as shown in Figure 5-15. This allows the application to guarantee exactly-once semantics.
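The idea behind fault-tolerant state can be illustrated with a toy sketch: the counter state is snapshotted together with the input position, so that after a failure the job rolls back and replays without counting any record twice. This is not Flink's implementation or API (Flink takes distributed checkpoints across parallel operators); the `CountingJob` class, its methods, and the list-based input log are all hypothetical names invented for this illustration.

```python
import copy


class CountingJob:
    """Toy keyed counter that snapshots its state together with the input offset."""

    def __init__(self):
        self.counts = {}          # keyed state: key -> count
        self.offset = 0           # index of the next input record to read
        self._snapshot = ({}, 0)  # last consistent (state, offset) pair

    def checkpoint(self):
        # State and offset are captured together, so they always agree.
        self._snapshot = (copy.deepcopy(self.counts), self.offset)

    def restore(self):
        counts, offset = self._snapshot
        self.counts = copy.deepcopy(counts)
        self.offset = offset

    def run(self, log, checkpoint_every=3, fail_at=None):
        """Process `log` from the current offset; optionally crash at index `fail_at`."""
        while self.offset < len(log):
            if fail_at is not None and self.offset == fail_at:
                raise RuntimeError("simulated failure")
            key = log[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1
            if self.offset % checkpoint_every == 0:
                self.checkpoint()


def run_with_one_failure(log, fail_at):
    """Run the job, crash once at `fail_at`, recover, and return the final counts."""
    job = CountingJob()
    try:
        job.run(log, fail_at=fail_at)
    except RuntimeError:
        job.restore()  # roll back to the last consistent snapshot
    job.run(log)       # replay from the restored offset
    return job.counts


if __name__ == "__main__":
    print(run_with_one_failure(["a", "b", "a", "c", "a", "b", "c"], fail_at=5))
```

Records processed after the last snapshot are replayed on recovery, but because the state was rolled back to the same point, each record is reflected in the final counts exactly once. This only works when the input can be re-read from a given position, which a replayable log such as Kafka allows.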

The second change is to increase the amount of data in the input stream by using a data generator that generates millions of events per second.

The results are as follows:

Results of using a high-throughput data generator:

(a) When Storm is used with Kafka, the application can sustain 400,000 events per second, and the bottleneck is the CPU. When Flink is used with Kafka, the application can sustain 3 million events per second, and the bottleneck is the network.

(b) With the network bottleneck eliminated, the Flink application can sustain 15 million events per second.

(c) In an additional test, the message queue is provided by MapR Streams and runs on 10 nodes with high-performance networking (different hardware from the first two cases); the Flink application can sustain 10 million events per second.

Storm can sustain 400,000 events per second but is limited by the CPU. Flink can reach 3 million events per second (7.5 times as many) but is limited by the network between the Kafka cluster and the Flink cluster.

To see how Flink performs when there is no network bottleneck, the testers moved the data generator inside the Flink application. Under these conditions, Flink can sustain 15 million events per second (37.5 times as many as Storm).
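The ratios quoted above follow directly from the reported throughput figures. The short check below uses 400,000 events per second for Storm, 3 million for Flink with Kafka, and 15 million for Flink with the built-in generator, the value consistent with the stated 37.5x ratio. The constant and function names are chosen for this example only.

```python
# Sustained throughput in events per second, as reported in the text.
STORM_EPS = 400_000
FLINK_KAFKA_EPS = 3_000_000
FLINK_INTERNAL_GENERATOR_EPS = 15_000_000


def speedup(events_per_second, baseline=STORM_EPS):
    """Return how many times faster a result is than the Storm baseline."""
    return events_per_second / baseline


if __name__ == "__main__":
    print(speedup(FLINK_KAFKA_EPS))               # 7.5
    print(speedup(FLINK_INTERNAL_GENERATOR_EPS))  # 37.5
```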

Integrating data generators into Flink applications can test performance limits, but this is not realistic because real-world data must flow in from outside the application.

It is worth noting that this is by no means the limit of Kafka (Kafka can support higher throughput than this), only the limit of the hardware used for the test: the network connection between the Kafka cluster and the Flink cluster was too slow.

The last change was to increase the key cardinality (the number of advertising campaigns). In the initial test, the key cardinality was only 100. These keys are written to Redis every second so that they can be queried. When the key cardinality increases to 1 million, the overall throughput of the system drops to 280,000 events per second, because writing to Redis becomes the bottleneck. An early prototype of Flink's queryable state eliminates this bottleneck, restoring the system's processing speed to 15 million events per second with 1 million keys available to query.
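A rough calculation shows why Redis becomes the bottleneck as the key cardinality grows. The sketch assumes one Redis write per key per one-second flush, which is the simplest reading of the description above; the benchmark's actual write pattern (for example, any batching) is not given in the text, and the function name is chosen for this example only.

```python
def redis_writes_per_second(key_cardinality, flushes_per_second=1):
    """Estimate Redis write load, assuming one write per key per flush."""
    return key_cardinality * flushes_per_second


if __name__ == "__main__":
    initial = redis_writes_per_second(100)         # initial test: 100 campaigns
    scaled = redis_writes_per_second(1_000_000)    # scaled test: 1 million campaigns
    print(initial, scaled, scaled // initial)
```

Under this assumption the write load grows by a factor of 10,000 between the two tests, independent of how many events per second the stream processor itself can handle.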

By moving the query function into the prototype of Flink's queryable state, the system can sustain 15 million events per second even when the key cardinality is very large.

What does this example illustrate? By avoiding stream-processing bottlenecks and taking advantage of Flink's stateful stream processing, throughput can reach about 30 times that of Storm while still guaranteeing exactly-once semantics and high availability. Roughly speaking, this means that the hardware or cloud computing cost with Flink is only about one-thirtieth of the cost with Storm, or equivalently that the same hardware can handle about 30 times as much data.

© 2024 shulou.com SLNews company. All rights reserved.