SLTechnology News&Howtos > Database, Shulou (Shulou.com), 06/01 Report, 2025-04-14 Update
Introduction
A time series database stores metrics indexed by time, such as a web page's PV (page views) and UV (unique visitors), which are collected at regular intervals and timestamped.
Why do we need a time series database?
What advantages does a time series database offer over a traditional relational database or a NoSQL store? The sections below analyze this in terms of the characteristics of the underlying storage models.
LSM Tree
The merge operation traverses the leaf nodes of the in-memory tree from left to right and merges them with the leaf nodes of the on-disk tree. When the amount of merged data reaches the size of a disk storage page, the merged data is persisted to disk and the parent node's pointers to its leaf nodes are updated.
Because write-back to disk is deferred, reads first query memory to guarantee consistency; only if the key is not found in memory is the disk queried.
When deleting data, the key is first looked up in memory (C0). If it is not found there, a new index entry is created in memory and the key is marked as deleted (a tombstone). Subsequent queries for that key then return "not found" directly; the data is physically removed from the data files during a later compaction.
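The read, delete, and compaction behavior above can be sketched as follows. This is a minimal illustration, not LevelDB's or InfluxDB's actual code: the memtable and disk segments are stand-in dicts, and `TOMBSTONE`, `LSMStore`, and `compact` are hypothetical names.

```python
# Minimal sketch of LSM-style reads and tombstone deletes, assuming a
# dict-backed memtable (C0) and a list of on-disk segment dicts.

TOMBSTONE = object()  # sentinel marking a deleted key

class LSMStore:
    def __init__(self):
        self.memtable = {}   # C0: in-memory tree (a dict for brevity)
        self.segments = []   # older on-disk data, newest segment first

    def put(self, key, value):
        self.memtable[key] = value

    def delete(self, key):
        # Do not touch disk; just record a tombstone in memory.
        self.memtable[key] = TOMBSTONE

    def get(self, key):
        # Read path: memory first, then disk segments newest-to-oldest.
        if key in self.memtable:
            v = self.memtable[key]
            return None if v is TOMBSTONE else v
        for seg in self.segments:
            if key in seg:
                v = seg[key]
                return None if v is TOMBSTONE else v
        return None

    def compact(self):
        # Merge segments; tombstoned keys are physically dropped here.
        merged = {}
        for seg in reversed(self.segments):
            merged.update(seg)
        merged.update(self.memtable)
        self.segments = [{k: v for k, v in merged.items() if v is not TOMBSTONE}]
        self.memtable = {}
```

Note how a deleted key is answered from the tombstone without ever reading disk, matching the behavior described above.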
Compaction
When the log file exceeds a certain size threshold (1MB by default):
Create a new memtable and a new log file, and direct future writes to them
In the background, perform the following:
Write the contents of the old memtable to an SSTable (it is first converted to an immutable memtable, then traversed and written out)
Discard the old memtable
Delete the old log file and the old memtable
Add the new SSTable to level 0
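The rotation steps above can be sketched as follows. This is an illustrative toy, assuming an in-memory list as a stand-in for the on-disk log file; real implementations run the flush in a background thread.

```python
# Sketch of memtable/log rotation on reaching the size threshold.

LOG_SIZE_THRESHOLD = 1 << 20  # 1MB default

class DB:
    def __init__(self):
        self.memtable = {}
        self.log = []        # in-memory stand-in for the write-ahead log file
        self.log_bytes = 0
        self.level0 = []     # list of SSTables (sorted key/value lists)

    def write(self, key, value):
        self.log.append((key, value))        # append to the log first
        self.log_bytes += len(key) + 8
        self.memtable[key] = value           # then apply to the memtable
        if self.log_bytes > LOG_SIZE_THRESHOLD:
            self.rotate()

    def rotate(self):
        immutable = self.memtable            # freeze the old (immutable) memtable
        self.memtable, self.log, self.log_bytes = {}, [], 0  # new memtable + log
        sstable = sorted(immutable.items())  # traverse and write out, key-ordered
        self.level0.append(sstable)          # add the new SSTable to level 0
```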
For time series data, the LSM tree is very efficient for reads and writes, but hot backup and bulk cleanup of expired data are inefficient.
B+ Tree
Many relational databases, such as Berkeley DB, SQLite, and MySQL, use the B+ tree for their indexes. The B+ tree keeps data ordered by the index, sacrificing some write performance in exchange for guaranteed read efficiency. However, when the data volume grows very large (GB scale), query efficiency drops: the larger the data set, the more branches the tree has and the greater the cost of traversal.
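The trade-off described above (keep keys sorted, pay on writes, win on ordered reads) can be illustrated with a toy sorted index built on the stdlib `bisect` module; this is an analogy, not a real B+ tree with pages and internal nodes.

```python
# Toy "ordered by the index" structure: inserts pay a shift cost
# (the write-side penalty), while range scans stay cheap because
# keys remain sorted.

import bisect

class OrderedIndex:
    def __init__(self):
        self.keys, self.vals = [], []

    def insert(self, key, val):
        i = bisect.bisect_left(self.keys, key)
        self.keys.insert(i, key)   # shifting elements = the write cost
        self.vals.insert(i, val)

    def range(self, lo, hi):
        # Ordered keys make a range scan two binary searches plus a slice.
        i = bisect.bisect_left(self.keys, lo)
        j = bisect.bisect_right(self.keys, hi)
        return list(zip(self.keys[i:j], self.vals[i:j]))
```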
TSM
InfluxDB introduced the TSM engine in version 0.9.5; it is a modified form of the LSM tree.
Write-ahead log
When the current log file reaches 2MB, it is closed and a new log file is started.
This design ensures data consistency.
Data file
File structure
The data blocks in a file are arranged in chronological order
Data Block structure
The metric value is stored in a compressed block; the data compression algorithm is described in more detail later.
Index Block structure
Read data
First, based on the query's time range, a binary search over the data files finds the files whose range matches. Then the in-memory mapping table is consulted: the queried metric is hashed to obtain its ID, and the index yields the starting address of its data block. From the timestamps of that data block and the next one, the number of data blocks to fetch can be computed; finally the data in those blocks is decompressed to produce the result.
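The first step, locating matching files by time range, can be sketched like this. The file metadata layout is an assumption for illustration: each file is summarized as a `(min_time, max_time, name)` tuple, sorted by start time.

```python
# Binary-search time-ordered files for those overlapping a query range.

import bisect

def find_files(files, t_start, t_end):
    """files: list of (min_time, max_time, name), sorted by min_time."""
    starts = [f[0] for f in files]
    # Binary search: every candidate must start at or before t_end.
    i = bisect.bisect_right(starts, t_end)
    # Keep only candidates that also extend into the query range.
    return [f for f in files[:i] if f[1] >= t_start]
```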
Update data
If multiple updates fall within the same time range, they are cached in the write-ahead log and applied together.
Delete data
Deletion is processed in two stages. In the first stage, the write-ahead log persists the delete and tells the index to maintain a tombstone in memory; from this point, queries for the data return "not found". In the second stage, the write-ahead log is written to the index file: the delete is applied first, followed by any inserts that came after it (both for the deleted series and for other series), and the in-memory tombstone is cleared.
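The two stages can be sketched as follows. The class and its in-memory structures are illustrative assumptions, not InfluxDB internals.

```python
# Hedged sketch of a two-phase delete with an in-memory tombstone.

class TwoPhaseDelete:
    def __init__(self):
        self.wal = []            # persisted delete intents (stand-in for the WAL)
        self.tombstones = set()  # in-memory tombstones (phase 1)
        self.index = {}          # stand-in for the on-disk index file

    def delete(self, series):
        # Phase 1: persist the intent in the WAL, tombstone in memory.
        self.wal.append(("delete", series))
        self.tombstones.add(series)

    def query(self, series):
        if series in self.tombstones:
            return None          # reads see the delete immediately
        return self.index.get(series)

    def flush(self):
        # Phase 2: apply deletes to the index file ahead of later
        # inserts, then clear the in-memory tombstones.
        for op, series in self.wal:
            if op == "delete":
                self.index.pop(series, None)
        self.wal.clear()
        self.tombstones.clear()
```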
Data compression
The purpose of data compression is to reduce storage space and the overhead of writing to disk.
Each compressed data block contains a series of points (compressed timestamp, compressed value). Because timestamps form a monotonically increasing sequence, only the offset from the previous timestamp needs to be stored during compression.
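The timestamp-offset idea can be shown with plain delta encoding: store the first timestamp, then only the small gaps between consecutive points. This is a simplified sketch; TSM's actual codec additionally bit-packs the deltas.

```python
# Delta-encode a monotonically increasing timestamp sequence.

def compress_timestamps(ts):
    # First element kept as-is; the rest become small offsets.
    return [ts[0]] + [b - a for a, b in zip(ts, ts[1:])]

def decompress_timestamps(deltas):
    # Running sum restores the original timestamps exactly.
    out, t = [], 0
    for d in deltas:
        t += d
        out.append(t)
    return out
```

Regularly collected metrics make the deltas nearly constant and small, which is what makes them cheap to store.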
Summary
InfluxDB's data storage structure provides ordered access to data by series and timestamp, and compresses data to reduce I/O overhead. For the common scenario of fetching a series of data within a time range, this improves processing speed. And because data is merged by time, the Retention operation can work on whole data files, which is more efficient.
© 2024 shulou.com SLNews company. All rights reserved.