What is the structure of LevelDB data file SSTable 07/08 Update SLTechnology News&Howtos

What is the structure of LevelDB data file SSTable

2025-07-08 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

Most people do not understand the knowledge points of this article "LevelDB data file SSTable structure", so the editor summarizes the following contents, detailed contents, clear steps, and has a certain reference value. I hope you can get something after reading this article. Let's take a look at this "LevelDB data file SSTable structure" article.

The content of the SSTable file is divided into five parts, Footer, IndexBlock, MetaIndexBlock, FilterBlock and DataBlock. Which stores the key-value pair of content is DataBlock, the storage of Bloom filter binary data is that there are multiple FilterBlock,DataBlock, FilterBlock can also have multiple, but usually only one, the reason is designed to take into account scalability, may support other types of filters in the future. The other three parts are management blocks, in which IndexBlock records the meta-information related to DataBlock, MetaIndexBlock records the meta-information related to the filter, and Footer points out the offset information of IndexBlock and MetaIndexBlock in the file, which is the meta-information of meta-information, which is located at the end of the sstable file. Let's analyze each structure one by one from the top down.

Footer structure

It takes up only 48 bytes of space and only a few fields are stored inside. Let's use pseudo code to describe its structure.

/ / defines the location and size of the data block

Struct BlockHandler {

Varint offset

Varint size

}

Struct Footer {

File offset and length of BlockHandler metaIndexHandler; / / MetaIndexBlock

File offset and length of BlockHandler indexHandler; / / IndexBlock

Byte [n] padding; / / memory gasket

Int32 magicHighBits; / / the last 32 bits of magic number

Int32 magicLowBits; / / the first 32 bits of magic number

}

The middle part of the Footer structure adds a memory gasket, which is used to prop up the space of the Footer to 48 bytes. There is also a 64-bit magic number 0xdb4775248b80fb57 at the end of the structure. If the 8 bytes at the end of the file are not this number, the file is corrupted. The source of this magic number is interesting. It is the pre-64bit of the string returned below.

$echo http://code.google.com/p/leveldb/ | sha1sum

Db4775248b80fb57d0ce0768d85bcee39c230b61

Both IndexBlock and MetaIndexBlock have a unique one, so a BlockHandler structure is used to store the offset and length, respectively.

Block structure

Except for Footer, all the other parts are Block structures, and their names all end with Block. The so-called Block structure means that in addition to the internal valid data, there will be additional compressed type fields and check code fields.

Struct Block {

Byte [] data

Int8 compressType

Int32 crcValue

}

Each Block tail has a compression type and cyclic redundancy check code (crcValue), which takes up 5 bytes. If it is the compression type, the data data within the block will be compressed. The check code calculates the cyclic redundancy checksum for both the compressed data and the compressed type field. The compression algorithm defaults to snappy, and the check algorithm is crc32.

CrcValue = crc32 (data, compressType)

In all the Block structures described below, we no longer mention compression and parity codes.

DataBlock structure

The default size of DataBlock is 4K bytes (before compression), in which a series of key-value pairs are stored. As mentioned earlier, the Key in the sst file is ordered, which means that there is a good chance that adjacent Key will have a common prefix. It is with this in mind that DataBlock has optimized its structure, which can significantly reduce storage space.

Key = sharedKey + unsharedKey

The Key is divided into two parts, one is sharedKey and the other is unsharedKey. The former represents the common prefix content of the relative benchmark Key, while the latter represents the different suffix parts of the relative benchmark Key.

For example, if the benchmark Key is helloworld, then the Key of hellouniverse is hello,unsharedKey relative to the benchmark Key, and its sharedKey is universe.

What is stored in DataBlock is a continuous series of key-value pairs, which sets a benchmark Key every number of Key. The characteristic of benchmark Key is that its sharedKey part is empty string. The location of the benchmark Key, that is, its offset in the block, is called the "restart point" RestartPoint, and all the "restart point" locations are recorded in the DataBlock. The position of the first "restart point" is zero, which is the first Key in DataBlock.

Struct Entry {

Varint sharedKeyLength

Varint unsharedKeyLength

Varint valueLength

Byte [] unsharedKeyContent

Byte [] valueContent

}

Struct DataBlock {

Entry [] entries

Int32 [] restartPointOffsets

Int32 restartPointCount

}

The benchmark Key in DataBlock is set to one every 16 Key by default. From a space-saving point of view, this is not an intelligent strategy. For example, 26 consecutive Key is only different from the last letter, while DataBlock forces a "restart" every 16 Key, which is obviously not optimal. This also means that the Key whose sharedKey is an empty string is not necessarily the benchmark Key.

The default size of a DataBlock is only 4K bytes, so it usually contains only a few dozen key-value pairs. What if the content of a single key-value pair is too large for a DataBlock?

It must be corrected here that the size of DataBlock is 4K bytes, not its strict size, but after the last record is appended, it is found that it exceeds 4K bytes, and then another DataBlock will be opened. This means that a DataBlock can be larger than 4K bytes, and if the value value is very large, then the corresponding DataBlock will also be very large. DataBlock does not store the same Value value in blocks.

FilterBlock structure

If the Bloom filter is not turned on, the FilterBlock block does not exist. There can be more than one FilterBlock in a SSTable file, and each block holds one filter data. However, as far as the current implementation of LevelDB is concerned, there can only be one filter, and that is the Bloom filter.

The Bloom filter is used to speed up the Key location efficiency of SSTable disk files. If there is no Bloom filter, it needs to do a binary search for SSTable. If Key is not in it, it needs to do several IO readings to determine it. After checking, it turns out that it is nothing. The purpose of the Bloom filter is to avoid wasting IO operations when Key does not exist. By querying the Bloom filter, you can know once and for all whether Key may be in it.

What is stored in a single Bloom filter is a fixed-length bitmap array in which several Key fingerprints are stored. These Key are derived from a contiguous range of DataBlock. There are multiple consecutive arrays of Bloom filter bitmaps in the FilterBlock block, each of which is responsible for fingerprinting part of the data in the SSTable.

Struct FilterEntry {

Byte [] rawbits

}

Struct FilterBlock {

FilterEntry [n] filterEntries

Int32 [n] filterEntryOffsets

Int32 offsetArrayOffset

Int8 baseLg; / / Segmentation coefficient

}

Where baseLg defaults to 11, which means that every 2K bytes (2

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.