Performance comparison of Flink with RocksDB and Gemini: an experimental analysis

2025-04-06 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

This article presents a comparative experimental analysis of the performance of Flink with the RocksDB and Gemini state backends, a topic that many people may not understand well. The content below summarizes the experiments and their results.

Abstract: We run stress tests on RocksDB, Heap and Gemini in the same scenario and compare their resource consumption. The Flink kernel version tested is 1.10.0.

The Weibo machine learning platform uses Flink to implement multi-stream joins that generate the samples needed for online machine learning. The data within the time window is cached in state, and the latency of state access usually determines the performance of the job. State storage in open source Flink mainly includes RocksDB and Heap. At last year's Flink Forward conference, we learned that the Aliyun VVP product had developed Gemini, a higher-performance state storage plug-in, so we tested and trialed it.

Test scenario

We use the real sample stitching business as the test scenario. We union the data of multiple streams and keyBy the specified key. In the aggregation function, we obtain the corresponding fields from each stream, reassemble the required fields into a new object, and store it in value state. A timer is defined for each new object, so the timer function replaces TimeWindow, and the data is emitted to the downstream operator at the end of the window. The main reason for using the timer function is that a timer is more flexible and easier for users to customize, and it serves the platform better in terms of practicality and extensibility.

MemoryStateBackend vs. RocksDBStateBackend
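Before comparing the backends, the stitching logic of the test scenario can be made concrete with a small sketch. This is a framework-free Python simulation, not Flink code: the class `TimerJoin`, the stream names `exposure` and `click`, and the 1000 ms window are all hypothetical and only illustrate the pattern of one value-state entry plus one timer per key.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Sample:
    """Assembled object kept in per-key value state (field names are hypothetical)."""
    key: str
    fields: Dict[str, dict] = field(default_factory=dict)


class TimerJoin:
    """Stand-in for a keyed function that uses value state and per-key timers."""

    def __init__(self, window_ms: int) -> None:
        self.window_ms = window_ms
        self.state: Dict[str, Sample] = {}  # value state: one entry per key
        self.timers: Dict[str, int] = {}    # key -> time at which its timer fires

    def process(self, key: str, stream: str, payload: dict, now_ms: int) -> None:
        """Handle one record of the unioned, keyed stream."""
        sample = self.state.get(key)
        if sample is None:
            # First record for this key: create the object and register its timer.
            sample = Sample(key)
            self.state[key] = sample
            self.timers[key] = now_ms + self.window_ms
        sample.fields[stream] = payload

    def advance(self, now_ms: int) -> List[Sample]:
        """Fire every timer that is due, emit its sample and clear its state."""
        due = sorted((t, k) for k, t in self.timers.items() if t <= now_ms)
        emitted = []
        for _, key in due:
            del self.timers[key]
            emitted.append(self.state.pop(key))
        return emitted


join = TimerJoin(window_ms=1000)
join.process("u1", "exposure", {"item": 42}, now_ms=0)
join.process("u1", "click", {"clicked": True}, now_ms=300)
join.process("u2", "exposure", {"item": 7}, now_ms=600)
first = join.advance(now_ms=1000)   # u1's timer (registered at 0) is due
second = join.advance(now_ms=1600)  # u2's timer (registered at 600) is due
```

Because each key owns its timer, the "window" starts when the first record for that key arrives rather than on a fixed grid, which is the kind of flexibility the scenario description attributes to timers.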

First of all, note that MemoryStateBackend is not recommended for online use. The main purpose here is to quantify, through testing, the resource consumption of using Heap to store state. The checkpoint configuration in our test is as follows:

CheckpointInterval: 10 minutes
CheckpointingMode: EXACTLY_ONCE
CheckpointTimeout: 3 minutes

At the same time, the following configurations have been added to RocksDB:

setCompressionType: LZ4_COMPRESSION
setTargetFileSizeBase: 128 * 1024 * 1024
setMinWriteBufferNumberToMerge: 3
setMaxWriteBufferNumber: 4
setWriteBufferSize: 1G
setBlockCacheSize: 10G
setBlockSize: 4 * 1024
setFilter: BloomFilter(10, false)
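To get a feel for what these settings mean for native memory, the arithmetic can be worked through in a few lines. This is a rough sketch under stated assumptions: `1G` and `10G` are read as GiB, the bound is computed for a single column family, and index blocks, filter blocks and other RocksDB overhead are ignored. Whether the block cache is shared between states depends on the Flink memory setup, which the article does not describe.

```python
GIB = 1024 ** 3

# Values copied from the RocksDB configuration above.
write_buffer_size = 1 * GIB                 # setWriteBufferSize: 1G (assumed GiB)
max_write_buffer_number = 4                 # setMaxWriteBufferNumber: 4
block_cache_size = 10 * GIB                 # setBlockCacheSize: 10G (assumed GiB)
block_size = 4 * 1024                       # setBlockSize: 4 * 1024
target_file_size_base = 128 * 1024 * 1024   # setTargetFileSizeBase

# Memtables: at most max_write_buffer_number buffers exist per column family.
memtable_upper_bound = write_buffer_size * max_write_buffer_number

# Rough native-memory ceiling for one column family with its own block cache.
native_upper_bound = memtable_upper_bound + block_cache_size

# Number of uncompressed 4 KiB data blocks that fit into the block cache.
blocks_in_cache = block_cache_size // block_size

print(memtable_upper_bound // GIB, "GiB of memtables at most")
print(native_upper_bound // GIB, "GiB of memtables plus block cache")
print(target_file_size_base, "bytes per target SST file")
```

Under these assumptions the read cache dominates the budget, which is consistent with the observation in this section that RocksDB trades Heap space for an enlarged Native area.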

The test found that when the same job processes the same amount of data, the throughput with MemoryStateBackend is similar to that with RocksDB (input QPS is 300,000, aggregated output QPS is 20,000), but the required memory (taskmanager.heap.mb) is 8 times that of RocksDB, and the corresponding machine resources are 2 times those of RocksDB.

From this, we come to the following conclusions:

Using MemoryStateBackend requires a lot of Heap space to store the state data (samples) in the window. Compared with putting the data on disk, the advantage is very good processing performance, but the disadvantage is obvious: because Java objects are not stored efficiently in memory, GB-level memory can hold only about 100 MB-level real physical data, so the memory overhead is large. In addition, the pauses caused by heavy JVM GC are relatively long, which affects the overall stability of the job, and hot events bring a risk of OOM.

Using RocksDB requires less Heap space. With the Native area enlarged for the read cache, and combined with the efficient disk read and write strategy of RocksDB, it still performs well.

GeminiStateBackend vs. RocksDBStateBackend

You can specify the use of Gemini state backend in Ververica Platform products by:

state.backend=org.apache.flink.runtime.state.gemini.GeminiStateBackendFactory

At the same time, we have made the following basic configuration for Gemini:

// specify the local directory where Gemini stores its data
kubernetes.taskmanager.replace-with-subdirs.conf-keys=state.backend.gemini.local.dir
state.backend.gemini.local.dir=/mnt/disk3/state,/mnt/disk5/state
// specify Gemini's page compression format (a page is the minimum physical unit of Gemini storage)
state.backend.gemini.compression.in.page=Lz4
// specify the percentage of memory that Gemini is allowed to use
state.backend.gemini.heap.rate=0.7
// specify the size of a single Gemini storage file
state.backend.gemini.log.structure.file.size=134217728
// specify the number of worker threads of Gemini
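The file size in this configuration is given in raw bytes, so it is easy to miss that it equals the 128 * 1024 * 1024 target file size used for RocksDB earlier. The sketch below checks that with a tiny key=value reader. The helper `parse_properties` is hypothetical and is not part of Flink or Gemini; it only assumes one setting per line and `//` comment lines.

```python
def parse_properties(text: str) -> dict:
    """Read key=value lines, skipping blank lines and // comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Three of the settings from the configuration above.
CONF = """
// page compression format
state.backend.gemini.compression.in.page=Lz4
state.backend.gemini.heap.rate=0.7
state.backend.gemini.log.structure.file.size=134217728
"""

props = parse_properties(CONF)
file_size_bytes = int(props["state.backend.gemini.log.structure.file.size"])
file_size_mib = file_size_bytes / (1024 * 1024)
same_as_rocksdb_target = file_size_bytes == 128 * 1024 * 1024

print(f"Gemini file size: {file_size_mib:.0f} MiB")
```

Both backends therefore write files of the same nominal size in this test, so file size is not a variable in the comparison.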

Machine configuration

Resource parameters used by the job

Memory related parameters

Comparison results

Note: 16 machines cannot fully serve the full sample splicing load, so we carry out the stress test by sampling the data at different proportions. When backpressure occurs, we consider the job to have reached its performance bottleneck.
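The article does not say how the data was sampled. One plausible approach, shown here purely as an assumption, is to sample deterministically by join key, so that every input stream keeps exactly the same keys and the join can still find matches. The function names `keep` and `sample` are hypothetical.

```python
import hashlib


def keep(key: str, ratio: float) -> bool:
    """Deterministically keep roughly `ratio` of all keys (0.0 <= ratio <= 1.0)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < round(ratio * 10_000)


def sample(records, ratio):
    """Filter (key, payload) records; the decision depends only on the key."""
    return [r for r in records if keep(r[0], ratio)]


records = [(f"user-{i}", i) for i in range(10_000)]
kept_10 = sample(records, 0.10)
kept_50 = sample(records, 0.50)

print(len(kept_10), len(kept_50))
```

With this scheme the sets are nested: every key kept at 10% is also kept at 50%. Raising the proportion step by step therefore only adds load, which suits a test that increases pressure until backpressure appears.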

From the comparison above, we can see that with the same data, job processing logic and hardware configuration, Gemini successfully processes 2.4 times as much data as RocksDB (17,280 vs. 7,200 entries/s). The comparison of hardware resource consumption also shows that RocksDB reaches the disk IO bottleneck sooner, while Gemini has higher memory and CPU utilization.
