Flink RocksDB State Backend Parameter Tuning: An Example Analysis


This article gives a detailed, example-driven analysis of parameter tuning for the Flink RocksDB state backend. The material is quite practical and is shared here for reference; hopefully you will gain something from reading it.

As of now, Flink jobs have only three state backends to choose from: Memory, FileSystem, and RocksDB, and RocksDB is the only viable choice for large state (GB to TB scale). RocksDB's performance depends heavily on tuning; with all defaults left in place, read and write performance can be poor.

However, RocksDB configuration is also extremely complex, with upwards of 100 tunable parameters and no one-size-fits-all optimization scheme. If we restrict attention to Flink's state storage use case, we can still summarize some relatively universal optimization ideas. This article first introduces the necessary background, then enumerates the tuning methods.

**Note:** This article is based on our experience running Flink 1.9 in production. In version 1.10 and later, due to the TaskManager memory model refactoring, RocksDB memory became part of off-heap managed memory by default, which removes most of the manual tuning described below. If performance is still unsatisfactory and manual intervention is needed, you must first disable RocksDB memory management by setting state.backend.rocksdb.memory.managed to false in flink-conf.yaml.

State R/W on RocksDB

The read and write path of RocksDB when serving as Flink's state backend differs slightly from the general case, as described below.

Each registered state in a Flink job corresponds to a column family, with its own separate collections of memtables and sstables. A write operation first goes to the active memtable; when it fills up, it is converted to an immutable memtable and flushed to disk as an sstable. A read operation looks for the target data in the active memtable, the immutable memtables, the block cache, and the sstables, in that order. In addition, sstables are merged by the compaction policy, ultimately forming the leveled LSM tree storage structure.

In particular, since Flink persists a snapshot of RocksDB's data to the file system at every checkpoint, there is no need for a write-ahead log (WAL), and both the WAL and fsync can be safely disabled.
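
For illustration, this is what those two switches look like at the RocksDB Java API level. This is only a sketch of the relevant knobs, not Flink's actual internal code; Flink's RocksDB state backend applies equivalent settings itself, so no user action is needed here.

```java
import org.rocksdb.WriteOptions;

public class WalSettingsSketch {
    public static void main(String[] args) {
        // Sketch only: Flink's state backend applies equivalent settings internally.
        // Checkpoints already snapshot the full RocksDB state, so the WAL would add
        // durability that Flink never relies on for recovery.
        try (WriteOptions writeOptions = new WriteOptions()
                .setDisableWAL(true)   // skip the write-ahead log entirely
                .setSync(false)) {     // no fsync per write; checkpoints give durability
            System.out.println("WAL disabled: " + writeOptions.disableWAL());
        }
    }
}
```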

The author has previously explained RocksDB's compaction strategies in detail, including the concepts of read amplification, write amplification, and space amplification. Tuning RocksDB is essentially a trade-off among these three factors. For Flink jobs, which run in latency-sensitive, real-time scenarios, read and write amplification deserve the most attention.

Tuning MemTable

The memtable serves as the read/write cache of the LSM tree, so it has a large impact on write performance. Below are the notable parameters. For easy comparison, each RocksDB parameter name is listed alongside the corresponding Flink configuration key, separated by a vertical bar.

write_buffer_size | state.backend.rocksdb.writebuffer.size: The size of a single memtable, 64MB by default. When a memtable reaches this threshold, it is marked immutable. Generally speaking, increasing this parameter appropriately reduces write amplification, but it also increases the pressure on levels L0 and L1 after flush, so the compaction parameters need adjusting as well, as discussed later.

max_write_buffer_number | state.backend.rocksdb.writebuffer.count: The maximum number of memtables (active plus immutable), default 2. When all memtables are full but the flush is slow, writes stall. So if memory is plentiful, or a mechanical hard disk is used, it is recommended to raise this appropriately, for example to 4.

min_write_buffer_number_to_merge | state.backend.rocksdb.writebuffer.number-to-merge: The minimum number of immutable memtables that must accumulate before a flush occurs, default 1. For example, if this is set to 2, a flush happens only when there are at least two immutable memtables (with only one, RocksDB waits). Increasing this value merges more changes before each flush, reducing write amplification, but it may increase read amplification, because reads must check more memtables. In our tests, 2 or 3 works best.
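
As a concrete illustration, here is a minimal sketch of how these three parameters could be set programmatically via the OptionsFactory interface that Flink 1.9 accepts through RocksDBStateBackend.setOptions(). The values are the examples discussed above, not universal recommendations.

```java
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class MemTableTuningFactory implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        return currentOptions; // no DB-level changes needed for memtable tuning
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions
                // write_buffer_size: raise from the 64MB default to 128MB (example value)
                .setWriteBufferSize(128 * 1024 * 1024L)
                // max_write_buffer_number: allow 4 memtables instead of the default 2
                .setMaxWriteBufferNumber(4)
                // min_write_buffer_number_to_merge: merge 2 memtables per flush
                .setMinWriteBufferNumberToMerge(2);
    }
}
```

Since each registered state corresponds to its own column family, these limits apply per state rather than per RocksDB instance as a whole, which is worth keeping in mind when budgeting TaskManager memory.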

Tuning Block/Block Cache

A block is the basic storage unit of an sstable, and the block cache acts as the read cache, retaining the most recently used blocks with an LRU policy; it has a large impact on read performance.

block_size | state.backend.rocksdb.block.blocksize: Default 4KB. In production this is usually increased; 32KB is generally appropriate, and for mechanical hard disks it can go up to 128~256KB to make full use of sequential reads. Note, however, that if the block size grows while the block cache size stays the same, the number of cached blocks drops, which increases read amplification.

block_cache_size | state.backend.rocksdb.block.cache-size: The block cache size, default 8MB. As the read/write path above shows, a larger block cache effectively keeps hot read requests from hitting the sstables, so with sufficient memory headroom, 128MB or even 256MB is recommended; the read performance improvement is significant.
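
Continuing the same pattern, here is a minimal sketch of these block parameters set through the table format config. The values are the suggestions from this section, and setBlockCacheSize is the setter in the older RocksDB Java API bundled with Flink 1.9.

```java
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class BlockTuningFactory implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions.setTableFormatConfig(
                new BlockBasedTableConfig()
                        // block_size: 32KB instead of the 4KB default
                        .setBlockSize(32 * 1024L)
                        // block_cache_size: 256MB instead of the 8MB default
                        .setBlockCacheSize(256 * 1024 * 1024L));
    }
}
```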

Tuning Compaction

Compaction is the most expensive operation in any LSM-tree-based storage engine, and when done poorly it easily blocks both reads and writes. Readers are encouraged to consult the author's earlier article on RocksDB's compaction strategies for background; it is not repeated here.

compaction_style | state.backend.rocksdb.compaction.style: The compaction algorithm. We keep the default LEVEL (i.e. leveled compaction), and the parameters below assume it.

target_file_size_base | state.backend.rocksdb.compaction.level.target-file-size-base: The size threshold for a single sstable file at level L1, default 64MB. At each level up, the threshold is multiplied by the factor target_file_size_multiplier (which defaults to 1, i.e. the maximum sstable size is the same at every level). Increasing this value lowers compaction frequency and reduces write amplification, but old data is then not cleaned up promptly, which increases read amplification. This parameter is not easy to tune, and it is generally not recommended to set it above 256MB.

max_bytes_for_level_base | state.backend.rocksdb.compaction.level.max-size-level-base: The total data size threshold for level L1, default 256MB. At each level up, the threshold is multiplied by the factor max_bytes_for_level_multiplier (default 10). Since the size thresholds of all higher levels derive from it, adjust it carefully: it is recommended to set it to a multiple of target_file_size_base, and not too small, for example 5~10 times.

level_compaction_dynamic_level_bytes | state.backend.rocksdb.compaction.level.use-dynamic-size: Mentioned in the earlier article. When enabled, the per-level multiplier is effectively applied in reverse as a divisor, so each level's data size threshold is derived dynamically downward from the highest level. This lets more data land on the highest level, reduces space amplification, and keeps the overall LSM tree structure more stable. For mechanical hard disk environments, it is strongly recommended to enable it.
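
These compaction parameters can be set the same way. Here is a hedged sketch with example values consistent with the advice in this section: a 128MB target file size, staying under the 256MB ceiling, and max_bytes_for_level_base at 8x that.

```java
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.CompactionStyle;
import org.rocksdb.DBOptions;

public class CompactionTuningFactory implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions
                // keep the default leveled compaction
                .setCompactionStyle(CompactionStyle.LEVEL)
                // target_file_size_base: 128MB (example), under the 256MB ceiling
                .setTargetFileSizeBase(128 * 1024 * 1024L)
                // max_bytes_for_level_base: 8x target_file_size_base = 1GB
                .setMaxBytesForLevelBase(1024 * 1024 * 1024L)
                // dynamic level sizing; strongly recommended on mechanical disks
                .setLevelCompactionDynamicLevelBytes(true);
    }
}
```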

Generic Parameters

max_open_files | state.backend.rocksdb.files.open: As the name implies, the maximum number of files a RocksDB instance may keep open. The default is -1, meaning no limit. Since sstable indexes and bloom filters reside in memory by default and hold file descriptors, setting this value too small prevents indexes and bloom filters from loading properly, which severely degrades read performance.

max_background_compactions / max_background_flushes | state.backend.rocksdb.thread.num: The maximum number of background threads responsible for flush and compaction, default 1. Note that Flink merges these two parameters into one, corresponding to the DBOptions.setIncreaseParallelism() method. Since flush and compaction are relatively heavy operations, it is recommended to increase this value if CPU headroom allows; in our practice it is generally set to 4.
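
Putting it all together, here is a minimal sketch of the DB-level settings plus the wiring into a job. The checkpoint URI is a placeholder, and in a real job the memtable, block, and compaction settings from the previous sections would be merged into createColumnOptions().

```java
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class TunedRocksDBJob {

    public static class GenericTuningFactory implements OptionsFactory {

        @Override
        public DBOptions createDBOptions(DBOptions currentOptions) {
            return currentOptions
                    // max_open_files: keep unlimited so indexes/bloom filters stay loaded
                    .setMaxOpenFiles(-1)
                    // flush/compaction thread pool; Flink folds both limits into this call
                    .setIncreaseParallelism(4);
        }

        @Override
        public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
            // Merge in the memtable/block/compaction settings from earlier sections here.
            return currentOptions;
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // placeholder checkpoint path; 'true' enables incremental checkpoints
        RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints", true);
        backend.setOptions(new GenericTuningFactory());
        env.setStateBackend(backend);
        // ... build the topology, then call env.execute(...)
    }
}
```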

About "Flink RocksDB state backend parameter tuning example analysis" This article is shared here, I hope the above content can be of some help to everyone, so that you can learn more knowledge, if you think the article is good, please share it for more people to see.
