2025-01-16 Update — From: SLTechnology News&Howtos, Shulou (Shulou.com), 05/31 Report
This article explains how the storage mechanism of the time series database InfluxDB works. The editor finds it very practical and shares it here in the hope that you get something out of it.
Analysis of Storage Mechanism of InfluxDB
The following describes how InfluxDB designs the storage and indexing of time series data. Since the clustered version of InfluxDB stopped being open source as of version 0.12, everything below refers to the standalone version of InfluxDB unless otherwise noted.
1. Storage engine evolution of InfluxDB
Although only a little more than three years have passed since InfluxDB's release, the technical architecture of its storage engine has gone through several major changes. Here is a brief overview of that evolution.
1.1 A brief history of evolution
Before version 0.9.0
**LSM-Tree scheme based on LevelDB**
Versions 0.9.0 – 0.9.4
**mmap COW B+tree scheme based on BoltDB**
Versions 0.9.5 – 1.2
**Self-developed WAL + TSMFile scheme** (the TSMFile scheme officially landed in version 0.9.6; 0.9.5 only provided a prototype)
Version 1.3 to date
**Self-developed WAL + TSMFile + TSIFile scheme**
1.2 Considerations behind the evolution
InfluxDB's storage engine went through several off-the-shelf solutions, including LevelDB and BoltDB, but none of them could fully satisfy the following requirements:
Time series data is deleted in bulk after downsampling.
=> *Deletions are too expensive in LevelDB's LSM-Tree.*
Storing large amounts of data on a single machine must not consume too many file handles.
=> *LevelDB produces a large number of small files over time.*
Data storage needs hot backup.
=> *LevelDB supports only cold backup.*
Write throughput must keep up in big-data scenarios.
=> *The write throughput of BoltDB's B+tree is a bottleneck.*
Storage needs good compression.
=> *BoltDB does not support compression.*
In addition, for consistency of the technology stack and ease of deployment (especially container deployment), the InfluxDB team wanted the storage engine, like the TSDB engine above it, to be written in Go, which ruled out the otherwise promising RocksDB.
Based on these pain points, the InfluxDB team decided to implement a storage engine of its own.
2. Data model of InfluxDB
Before dissecting InfluxDB's storage engine, let us review InfluxDB's data model.
In InfluxDB, time series data follows a multi-valued model; a typical time-point record looks like this:
Figure 1
Measurement:
A metric object, i.e. a data-source object. Each measurement can carry one or more indicator values, i.e. the **fields** described below. In practice, a monitored real-world object (such as "cpu") is typically defined as a measurement.
Tags:
Equivalent to the tag concept in most time series databases; a data source can usually be uniquely identified by its tags. The key and value of each tag must both be strings.
Field:
A concrete indicator value recorded by the data source. Each indicator is called a "field", and the indicator value is the "value" of that "field".
Timestamp:
The timestamp of the data point. In InfluxDB, timestamps can in principle be precise down to the **nanosecond** (ns).
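As a concrete (hypothetical) illustration of the multi-valued model, a single time point for a "cpu" measurement could be sketched in Python like this; the tag and field names are invented for illustration:

```python
# A hypothetical time point in InfluxDB's multi-valued model:
# one measurement, string-only tags, one or more typed fields,
# and a nanosecond-precision timestamp.
point = {
    "measurement": "cpu",
    "tags": {"host": "server01", "region": "us-west"},    # identify the source
    "fields": {"usage_user": 23.5, "usage_system": 7.2},  # the indicator values
    "timestamp_ns": 1_609_459_200_000_000_000,            # ns since the Unix epoch
}

# Tag keys and values must both be strings.
assert all(isinstance(k, str) and isinstance(v, str)
           for k, v in point["tags"].items())
```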
In addition, above the measurement concept InfluxDB has a Database concept, analogous to a database in a traditional DBMS; logically, each Database can contain multiple measurements. In InfluxDB's standalone implementation, each Database corresponds to a directory in the file system.
2.1 The concept of SeriesKey
What InfluxDB calls a SeriesKey is usually called a timeline in the time series database field. In memory, a SeriesKey is represented as a byte array (see `github.com/influxdata/influxdb/models#MakeKey()`) of the following string form (with commas and spaces escaped):
{measurement name},{tagK1}={tagV1},{tagK2}={tagV2},...
The length of a SeriesKey must not exceed 65535 bytes.
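A minimal sketch of how such a key can be built (the escaping and sorted tag order follow the description above; this is an illustration, not the actual `MakeKey()` implementation):

```python
def make_series_key(measurement: str, tags: dict) -> bytes:
    """Build a SeriesKey-like byte string: the measurement name followed
    by tag pairs in sorted key order, with commas, spaces and '=' escaped."""
    def esc(s: str) -> str:
        return s.replace(",", r"\,").replace(" ", r"\ ").replace("=", r"\=")

    parts = [esc(measurement)]
    for k in sorted(tags):                       # tags are kept in sorted order
        parts.append(f"{esc(k)}={esc(tags[k])}")
    key = ",".join(parts).encode("utf-8")
    assert len(key) <= 65535, "SeriesKey must not exceed 65535 bytes"
    return key
```

For example, `make_series_key("cpu", {"region": "us-west", "host": "a"})` yields `b"cpu,host=a,region=us-west"`.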
2.2 Supported Field types
The field value of InfluxDB supports the following data types:

| Data type | Size in memory | Value range |
| --- | --- | --- |
| Float | 8 bytes | 1.797693134862315708145274237317043567981e+308 ~ 4.940656458412465441765687928682213723651e-324 |
| Integer | 8 bytes | -9223372036854775808 ~ 9223372036854775807 |
| String | 0 ~ 64 KB | strings shorter than 64 KB |
| Boolean | 1 byte | true or false |
In InfluxDB, the data type of a Field must stay the same within the following scope, otherwise a type-conflict error is reported on write:
same SeriesKey + same Field + same Shard
2.3 The concept of Shard
In InfluxDB, a Retention Policy (RP for short) is specified for a Database. The RP sets the retention time (Duration) of the time series data stored in that Database. The Shard concept derives from the Duration: once a Database's Duration is determined, the time series data in the Database is further partitioned by time within that Duration, and each such time slice forms a Shard.
The relationship between a Shard's time span and the Duration is as follows:
| Duration of RP | Shard Duration |
| --- | --- |
| < 2 days | 1 hour |
| >= 2 days and <= 6 months | 1 day |
| > 6 months | 7 days |
For a newly created Database, if no RP is explicitly specified, the default RP is used: the data Duration is infinite, and the Shard span is 7 days.
Note: in the closed-source cluster version of InfluxDB, RP rules also let users shard the data further by SeriesKey on top of the time-based sharding.
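The RP-duration-to-shard-duration mapping can be sketched as a small function; the cutoffs below (< 2 days → 1 hour, <= ~6 months → 1 day, otherwise 7 days, with infinite retention treated as 7 days) are taken from InfluxDB's documented defaults and should be treated as illustrative:

```python
from datetime import timedelta
from typing import Optional

def shard_duration(rp_duration: Optional[timedelta]) -> timedelta:
    """Map a retention-policy duration to a shard duration.

    A None or zero duration means 'retain forever', which falls back
    to the 7-day default, matching the behavior described above."""
    if rp_duration is None or rp_duration == timedelta(0):
        return timedelta(days=7)
    if rp_duration < timedelta(days=2):
        return timedelta(hours=1)
    if rp_duration <= timedelta(days=180):  # roughly 6 months
        return timedelta(days=1)
    return timedelta(days=7)
```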
3. Analysis of InfluxDB's storage engine
The storage engine of a time series database mainly has to deliver high performance in the following three scenarios:
1. Bulk writes of time series data
2. Scans of data within a specified timestamp range directly by timeline (i.e. SeriesKey in InfluxDB)
3. Indirect queries, via measurement plus some of the tags, for all matching time series data within a specified timestamp range
Based on the considerations described in section 1.2, InfluxDB rolled out its own solution, WAL + TSMFile + TSIFile, which is introduced below.
3.1 WAL parsing
To guarantee data integrity and availability, InfluxDB, like most database products, first writes incoming time series data to the WAL, then to the cache, and finally flushes it to disk. The main write path for time series data in InfluxDB is shown in the following figure:
Figure 2
WAL of time series data
Since InfluxDB only ever appends time series data, its WAL entries do not need to distinguish between operation types; they can simply record the written data directly.
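The write path above (append to the WAL, force it to disk, then update the in-memory cache) can be illustrated with a toy log; the file format and class name here are invented and bear no relation to InfluxDB's actual WAL encoding:

```python
import json, os, tempfile

class ToyWAL:
    """Append-only write-ahead log: persist each entry first, then
    apply it to the in-memory cache, so a crash can be replayed."""
    def __init__(self, path: str):
        self.path = path
        self.cache = {}                      # series_key -> [(ts, value), ...]

    def write(self, series_key: str, ts: int, value: float) -> None:
        entry = json.dumps({"k": series_key, "t": ts, "v": value})
        with open(self.path, "a") as f:      # 1. append to the WAL
            f.write(entry + "\n")
            f.flush()
            os.fsync(f.fileno())             # 2. force the entry to disk
        self.cache.setdefault(series_key, []).append((ts, value))  # 3. cache

    def replay(self) -> dict:
        """Rebuild the cache from the log, as recovery would after a crash."""
        cache = {}
        with open(self.path) as f:
            for line in f:
                e = json.loads(line)
                cache.setdefault(e["k"], []).append((e["t"], e["v"]))
        return cache

path = os.path.join(tempfile.mkdtemp(), "toy.wal")
wal = ToyWAL(path)
wal.write("cpu,host=a", 1, 0.5)
wal.write("cpu,host=a", 2, 0.7)
recovered = wal.replay()   # identical to the in-memory cache
```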
Figure 4
3.2 TSMFile parsing
A TSMFile stores the time series data proper (i.e. Timestamp + Field value) in its data area, while the SeriesKey and Field Name information is stored in its index area; based on SeriesKey + FieldKey, a B+tree-like in-file index quickly locates the data blocks holding the time series data.
Note: in current versions, a single TSMFile is at most 2 GB; once it grows past that, a new TSMFile is opened for subsequent data, even within the same Shard. For simplicity, the rest of this article ignores the scenario of one Shard splitting across multiple TSMFiles.
The composition of index blocks
The composition of the index block above is shown in *figure 6*.
The **index entry** is called `directIndex` in InfluxDB's source code. Within a TSMFile, index blocks are **sorted** by SeriesKey + FieldKey. Once you understand the layout of the TSMFile index area, it is natural to see how InfluxDB scans time series data in a TSMFile with high performance:
1. Based on the user-specified timeline (SeriesKey) and Field name, binary-search the **index area** to find the **index data block** for that SeriesKey + FieldKey.
2. Based on the user-specified timestamp range, find which **index entry** (or entries) in that **index data block** the data falls on.
3. Load the **time series data blocks** referenced by the found **index entries** into memory for further scanning.
*Note: steps 1–3 only outline the query mechanism; the real implementation handles a series of more complex cases, such as time ranges that span index blocks.*
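The three-step lookup can be sketched with a sorted in-memory index; this is a deliberate simplification of the on-disk layout, and the entry fields (min/max timestamp, block offset) are invented for illustration:

```python
import bisect

# Index blocks sorted by (series_key, field_key); each block carries
# index entries of (min_ts, max_ts, data_block_offset).
index = [
    ("cpu,host=a", "usage", [(0, 99, 0), (100, 199, 4096), (200, 299, 8192)]),
    ("cpu,host=b", "usage", [(0, 149, 12288)]),
]

def find_blocks(series_key, field_key, t_min, t_max):
    """Step 1: binary-search the index area for (SeriesKey, FieldKey).
    Step 2: keep the index entries whose time range overlaps [t_min, t_max].
    Step 3: return the offsets of the data blocks to load and scan."""
    keys = [(sk, fk) for sk, fk, _ in index]
    i = bisect.bisect_left(keys, (series_key, field_key))
    if i == len(keys) or keys[i] != (series_key, field_key):
        return []
    entries = index[i][2]
    return [off for lo, hi, off in entries if lo <= t_max and hi >= t_min]

# find_blocks("cpu,host=a", "usage", 150, 250) -> [4096, 8192]
```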
Storage of time series data
The structure of a time series data block was introduced in figure 2: all timestamp/field-value pairs of one SeriesKey + FieldKey are split into two regions, a Timestamps area and a Values area. The point of this split is that timestamps and field values can then be stored with different compression algorithms, reducing the size of the data block.
The compression algorithms used are as follows:
Timestamp: delta-of-delta encoding
Field Value: since the field values in a single data block are guaranteed to share one data type, a type-specific algorithm can be applied:
Float: Gorilla float compression
Integer: delta encoding + zigzag conversion + RLE / Simple8b / none
String: Snappy compression
Boolean: bit packing
At query time, once the TSMFile index has located a time series data block in the file, the block is loaded into memory and its Timestamps and Field Values are decompressed before the query proceeds.
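A small sketch of the integer path: delta encoding followed by zigzag conversion, which maps signed deltas to small unsigned integers (the subsequent RLE/Simple8b bit packing is omitted here):

```python
def zigzag(n: int) -> int:
    """Map signed ints to unsigned: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return (n << 1) ^ (n >> 63)

def unzigzag(u: int) -> int:
    return (u >> 1) ^ -(u & 1)

def delta_zigzag_encode(values):
    """Delta-encode against the previous value, then zigzag each delta."""
    out, prev = [], 0
    for v in values:
        out.append(zigzag(v - prev))
        prev = v
    return out

def delta_zigzag_decode(encoded):
    vals, prev = [], 0
    for u in encoded:
        prev += unzigzag(u)
        vals.append(prev)
    return vals

ts = [1000, 1010, 1020, 1025]
enc = delta_zigzag_encode(ts)   # small numbers, well suited to RLE/Simple8b
assert delta_zigzag_decode(enc) == ts
```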
3.3 TSIFile parsing
With TSMFile, scenarios 1 and 2 of the three main scenarios listed at the start of chapter 3 are handled well. But what if the user does not query by SeriesKey as expected and instead specifies more complex conditions? How can query performance be guaranteed then? The typical answer is an inverted index (Inverted Index).
InfluxDB's inverted index relies on the following two data structures:
Map<SeriesID, SeriesKey>
Map<tagkey, Map<tagvalue, List<SeriesID>>>
They appear in memory as follows:
Figure 7
Figure 8
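A minimal in-memory sketch of the two structures and a tag-based lookup follows; the series keys are invented, and the real index additionally handles measurements, deletions, and the on-disk layout:

```python
# Structure 1: SeriesID -> SeriesKey
id_to_key = {
    1: "cpu,host=a,region=us",
    2: "cpu,host=b,region=us",
    3: "cpu,host=a,region=eu",
}

# Structure 2: tag key -> tag value -> list of SeriesIDs (the inverted index)
inverted = {
    "host":   {"a": [1, 3], "b": [2]},
    "region": {"us": [1, 2], "eu": [3]},
}

def series_for(**tag_filters):
    """Intersect the posting lists of every tagkey=tagvalue filter,
    then map the surviving SeriesIDs back to their SeriesKeys."""
    ids = None
    for k, v in tag_filters.items():
        posting = set(inverted.get(k, {}).get(v, []))
        ids = posting if ids is None else ids & posting
    return sorted(id_to_key[i] for i in (ids or set()))

# series_for(host="a", region="us") -> ["cpu,host=a,region=us"]
```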
In a real production environment, however, the number of a user's timelines can grow very large, so this inverted index can consume too much memory; InfluxDB therefore later introduced TSIFile.
TSIFile's overall storage mechanism is similar to TSMFile's: one TSIFile is likewise generated per Shard.
The above is how to analyze the storage mechanism of the time series database InfluxDB. The editor believes these are knowledge points you may encounter or use in daily work, and hopes you have learned more from this article.