This article introduces how to manage the memory size of RocksDB in Apache Flink. The content is fairly detailed, so interested readers can use it as a reference; we hope you find it helpful.
The RocksDB state backend in Apache Flink
Before delving into the configuration parameters, let's first review how RocksDB is used for state management in Flink. When you choose RocksDB as the state backend, your state is serialized into bytes and stored in off-heap memory or on the local disk. RocksDB is a key-value store organized as a log-structured merge tree (LSM tree). When it is used to store keyed state in Flink, the key consists of the serialized key bytes, while the value consists of the serialized state bytes. Each registered keyed state maps to a column family (similar to a table in a traditional database), and its key-value pairs are stored in RocksDB as serialized bytes. This means that every READ or WRITE operation has to serialize or deserialize data.
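To make this concrete, here is a minimal sketch of a keyed function that registers a ValueState; the class name, state name, and types are illustrative assumptions rather than anything from the original article. When RocksDB is the state backend, this state becomes its own column family and every access to it goes through serialization.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

// Hypothetical keyed function: counts elements per key.
public class CountPerKey extends KeyedProcessFunction<String, String, Long> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        // Registering this descriptor maps the state to a RocksDB column family.
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void processElement(String value, Context ctx, Collector<Long> out) throws Exception {
        // With RocksDB, value() deserializes and update() serializes the state bytes.
        Long current = count.value();
        long updated = (current == null) ? 1L : current + 1L;
        count.update(updated);
        out.collect(updated);
    }
}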
Using RocksDB as the state backend has many advantages: it is not affected by garbage collection, it usually has lower memory overhead than objects on the heap, and it is currently the only option that supports incremental checkpoints. In addition, with RocksDB your state size is limited only by the amount of available local disk space, which makes it well suited for Flink applications with very large state.
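As a small, hedged illustration of the points above, the sketch below shows one way to enable RocksDB as the state backend with incremental checkpoints. The checkpoint URI and interval are placeholders, and in newer Flink releases the equivalent class is EmbeddedRocksDBStateBackend rather than RocksDBStateBackend.

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RocksDBBackendJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(60_000); // checkpoint every 60 seconds (placeholder interval)

        // RocksDB state backend; the second argument enables incremental checkpoints.
        // The checkpoint URI is a placeholder; point it at durable storage in production.
        env.setStateBackend(new RocksDBStateBackend("file:///tmp/flink-checkpoints", true));

        // Minimal pipeline so the job is runnable; replace with your own operators.
        env.fromElements(1, 2, 3)
           .keyBy(value -> value)
           .reduce((a, b) -> a + b)
           .print();

        env.execute("rocksdb-state-backend-example");
    }
}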
If you are not familiar with RocksDB, the following figure illustrates its basic READ and WRITE operations.
A WRITE in RocksDB stores data in the currently active memory table (the Active MemTable). When that memory table is full, it becomes a READ ONLY MemTable and is replaced by a new, empty Active MemTable. READ ONLY MemTables are periodically flushed to disk by background threads as read-only files sorted by key, the so-called SSTables. SSTables, in turn, are immutable and are consolidated by background compaction (a multi-way merge of SSTables). As mentioned earlier, with RocksDB every registered state is a column family, which means that every state has its own MemTables and SSTables.
(Figure: the basic READ and WRITE path in RocksDB)
A READ in RocksDB first queries the Active MemTable. If the key being searched is not found there, the READ searches the READ ONLY MemTables from newest to oldest until the key is found. If the key is not in any MemTable, the READ then accesses the SSTables, again starting from the most recent. SSTable blocks are served from the BlockCache (which holds uncompressed blocks), from the operating system's file cache, or, in the worst case, from the local disk. Optional indexes and SST-level bloom filters can help avoid hitting the disk.
3 configurations to manage your RocksDB memory consumption
Now that we have established how RocksDB works inside Apache Flink, let's take a look at configuration options that can help you manage RocksDB memory size more effectively. Note that the options below are not exhaustive; you can also manage state size with the State TTL (Time-To-Live) feature introduced in Apache Flink 1.6. The following three configurations are a good starting point for keeping RocksDB resource consumption under control (a combined configuration sketch follows the list):
1. block_cache_size
This configuration ultimately controls the maximum amount of memory used to cache uncompressed blocks. As the number of cached blocks grows, so does memory consumption, so configuring this up front lets you hold memory usage at a specific level.
2. write_buffer_size
This configuration establishes and controls the maximum size of a MemTable in RocksDB. Both Active and READ ONLY MemTables count toward RocksDB's memory usage, so adjusting this early may save you some trouble later.
3. max_write_buffer_number
This configuration determines and controls the maximum number of MemTables held in memory before RocksDB flushes them to the local disk as SSTables. It effectively caps the number of READ ONLY MemTables kept in memory.
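As referenced above the list, here is a minimal sketch of how these three settings can be applied through the OptionsFactory hook of Flink's RocksDB state backend. The class name and the concrete sizes are illustrative assumptions, not recommendations, and newer Flink versions replace OptionsFactory with RocksDBOptionsFactory, which has slightly different method signatures.

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Hypothetical factory applying the three settings discussed above.
public class MemoryBoundedOptionsFactory implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        // No DB-level changes are needed for these three settings.
        return currentOptions;
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        return currentOptions
                // write_buffer_size: cap each MemTable at 64 MB (example value).
                .setWriteBufferSize(64 * 1024 * 1024)
                // max_write_buffer_number: keep at most 3 MemTables per column family in memory.
                .setMaxWriteBufferNumber(3)
                // block_cache_size: limit the cache of uncompressed blocks to 256 MB (example value).
                .setTableFormatConfig(
                        new BlockBasedTableConfig().setBlockCacheSize(256 * 1024 * 1024));
    }
}

The factory is then registered on the backend, for example with rocksDbBackend.setOptions(new MemoryBoundedOptionsFactory()). Newer Flink versions also expose similar settings as state.backend.rocksdb.* configuration options in flink-conf.yaml; check the documentation for your version.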
In addition to the configurations above, you can optionally configure indexes and bloom filters, which consume extra memory, as well as the table cache. The table cache not only consumes additional memory in RocksDB, it also holds open file descriptors to SST files; this number is unlimited by default and, if not configured correctly, can conflict with your operating system's limits.
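For completeness, a hypothetical extension of the factory sketched above might bound the open-file limit and enable a bloom filter. The setter names below come from the RocksDB Java API; setFilter in particular is deprecated in recent RocksDB versions, so verify the exact methods against the RocksDB build bundled with your Flink release.

import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.BloomFilter;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

// Hypothetical factory illustrating the table cache and bloom filter knobs.
public class ExtendedOptionsFactory implements OptionsFactory {

    @Override
    public DBOptions createDBOptions(DBOptions currentOptions) {
        // Bound the table cache: limit the number of open SST file descriptors
        // (the RocksDB default of -1 means unlimited).
        return currentOptions.setMaxOpenFiles(5000);
    }

    @Override
    public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
        // A bloom filter skips SST files that cannot contain the key,
        // at the cost of extra memory per SST file.
        return currentOptions.setTableFormatConfig(
                new BlockBasedTableConfig().setFilter(new BloomFilter()));
    }
}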
We have just walked you through some configuration options for using RocksDB as the state backend in Flink that will help you manage its memory size effectively.
That is all we have to share on how to manage the memory size of RocksDB in Apache Flink. We hope the content above has been helpful and that you have learned something new. If you think the article is good, feel free to share it so more people can see it.