What is the time series database TDengine

Updated: 2025-01-18. From: SLTechnology News&Howtos (shulou)

This article takes a look at the time series database TDengine.

Notes on the time series database TDengine

TDengine has been very popular recently, and I have been following it for a long time. Its official benchmark results are impressive, so as soon as it was open-sourced I studied it, and I found that it still has some problems. I look forward to follow-up improvements.

Write problems

Each Tag combination must be given a table name

The price paid:

Users must ensure that the table name for each Tag combination is unique. Once there are many Tag combinations, it is hard for users to remember which table name corresponds to which combination, so queries are basically issued against the super table (STable). For the user, the table name is therefore almost useless, yet the user still bears the cost of naming it.

The ultimate goal of this design is to store the data of the same Tag combination together. However, the system could record a unique id or a unique string as a hidden internal table name for each Tag combination, instead of making users choose the table name themselves, and expose only the super table (STable) to lighten the user's burden.

In effect, this design hands the system's internal uniqueness check over to the user. If the system determined automatically whether a Tag combination is unique, every write would have to check whether the current Tag combination already exists and look up its underlying unique id or string. Letting the user name the table saves that cost: because the user-created table name is already a unique string, write performance is naturally better.
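The alternative discussed above, in which the system derives a hidden table name from the Tag combination, could look roughly like the Python sketch below. It is only an illustration, not TDengine's implementation: `hidden_table_name` and `TableRegistry` are hypothetical names, and the MD5-based naming scheme is an assumption. The lookup performed on every write is the cost that user-supplied table names avoid.

```python
import hashlib


def hidden_table_name(stable, tags):
    """Derive a deterministic internal table name from a Tag combination."""
    # Sort the tags so that the same combination always yields the same name,
    # regardless of the order in which the tags were supplied.
    canonical = ",".join(f"{key}={tags[key]}" for key in sorted(tags))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return f"{stable}_{digest}"


class TableRegistry:
    """Hypothetical registry mapping hidden table names to their rows."""

    def __init__(self):
        self.tables = {}

    def write(self, stable, tags, row):
        # This per-write derivation and lookup is the cost the system would
        # pay if users did not supply a unique table name themselves.
        name = hidden_table_name(stable, tags)
        self.tables.setdefault(name, []).append(row)
        return name


registry = TableRegistry()
first = registry.write("cpu", {"host": "h1", "region": "east"}, (1, 0.5))
second = registry.write("cpu", {"region": "east", "host": "h1"}, (2, 0.7))
third = registry.write("cpu", {"host": "h2", "region": "east"}, (1, 0.9))
```

Here `first` and `second` refer to the same hidden table because they carry the same Tag combination, while `third` gets a table of its own.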

Tag support and management

A maximum of 6 Tags is supported. Supporting more requires recompiling the source code.

The super table's (STable's) index over Tag combinations is held entirely in memory, which will eventually become a bottleneck. InfluxDB has already been down this road, moving from a fully in-memory index to the later TSI.

The super table's index over Tag combinations is a skiplist keyed only on the first Tag. When a query filters on the first Tag, it can use this index; when it does not, it falls back to a brute-force scan. This filtering ability is therefore quite limited once there are many Tag combinations. Other time series databases such as InfluxDB and Druid build inverted indexes over the Tags at write time so that they can filter on any dimension, and their write performance is naturally worse than TDengine's as a result.

Expiring Tag combinations that are no longer in use is also troublesome.
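The difference between an index keyed only on the first Tag and an inverted index can be illustrated with a small Python sketch. Everything here is hypothetical: `FirstTagIndex` and `InvertedIndex` are illustrative names, and a plain dict stands in for the skiplist. The first class also reports how many tables it had to examine, to make the brute-force case visible.

```python
from collections import defaultdict


class FirstTagIndex:
    """Index keyed on one Tag only (a dict stands in for the skiplist)."""

    def __init__(self, first_tag):
        self.first_tag = first_tag
        self.by_first = defaultdict(list)
        self.tables = []

    def add(self, table, tags):
        entry = (table, tags)
        self.tables.append(entry)
        self.by_first[tags[self.first_tag]].append(entry)

    def query(self, filters):
        if self.first_tag in filters:
            # The index narrows the candidates down.
            candidates = self.by_first.get(filters[self.first_tag], [])
        else:
            # No usable index: every table has to be examined.
            candidates = self.tables
        matches = [name for name, tags in candidates
                   if all(tags.get(k) == v for k, v in filters.items())]
        return matches, len(candidates)


class InvertedIndex:
    """One posting set per (tag, value) pair, maintained at write time."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, table, tags):
        for key, value in tags.items():
            self.postings[(key, value)].add(table)

    def query(self, filters):
        sets = [self.postings.get(item, set()) for item in filters.items()]
        return set.intersection(*sets) if sets else set()


rows = [
    ("t1", {"host": "h1", "region": "east"}),
    ("t2", {"host": "h2", "region": "east"}),
    ("t3", {"host": "h1", "region": "west"}),
]
first_tag_index = FirstTagIndex("host")
inverted_index = InvertedIndex()
for table, tags in rows:
    first_tag_index.add(table, tags)
    inverted_index.add(table, tags)

by_host = first_tag_index.query({"host": "h1"})
by_region = first_tag_index.query({"region": "east"})
inverted_by_region = inverted_index.query({"region": "east"})
```

The inverted index does more work per write (one posting update per Tag), which is the write-performance trade-off mentioned above.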

Out-of-order writes are not supported

Each table records the maximum timestamp written to it so far, and any later write with a timestamp smaller than that maximum is rejected. If you accidentally write a data point timestamped 2021-07-24 00:00:00 to a table, no data with an earlier timestamp can be written to that table afterwards.

The benefit is a simpler write path, which is always an append. As a simple example, suppose in-memory data is kept in an array sorted by time. If a later data point's timestamp does not increase, it has to be inserted somewhere in the middle of the array, and all the data after that position has to be shifted back. If the timestamp always increases, the point can simply be placed at the end of the array. Not supporting out-of-order writes therefore simplifies the write path and improves write performance, at the expense of usability.
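The array example can be written down as a short Python sketch. The class names are hypothetical, and treating a timestamp equal to the current maximum as acceptable is an assumption; only strictly smaller timestamps are rejected here.

```python
import bisect


class AppendOnlyTable:
    """Rejects any write older than the newest timestamp seen so far."""

    def __init__(self):
        self.rows = []
        self.max_ts = None

    def write(self, ts, value):
        if self.max_ts is not None and ts < self.max_ts:
            return False  # out-of-order write is refused
        self.rows.append((ts, value))  # always a cheap append at the end
        self.max_ts = ts
        return True


class SortedInsertTable:
    """Accepts out-of-order writes by inserting into a sorted list."""

    def __init__(self):
        self.rows = []

    def write(self, ts, value):
        # insort finds the position by binary search, but the list still
        # has to shift every later element back by one slot.
        bisect.insort(self.rows, (ts, value))
        return True


append_only = AppendOnlyTable()
sorted_insert = SortedInsertTable()
results = []
for ts, value in [(10, 1.0), (20, 2.0), (15, 1.5), (30, 3.0)]:
    results.append(append_only.write(ts, value))
    sorted_insert.write(ts, value)
```

After this loop the append-only table has silently lost the point at timestamp 15, while the sorted table has kept it at the price of a mid-array insert.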

One of the troubles saved by not supporting out-of-order writes is compaction, which is common in LSM-based storage. If out-of-order writes were allowed, the time ranges of two files could overlap, and a RocksDB-style compaction would be needed to eliminate the overlap and so reduce the number of files a query has to read. This is why the heavily engineered compaction found in HBase, RocksDB, InfluxDB and similar systems basically does not exist in TDengine.

To sum up: out-of-order writes are not supported, at the users' expense, in order to improve write performance and simplify the design.
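As a rough illustration of why overlapping files call for compaction, the Python sketch below models each file as a list of rows sorted by timestamp. The helper names are hypothetical, and real LSM compaction is considerably more involved; this only shows the overlap test and the merge step.

```python
import heapq


def time_range(rows):
    """Return the (first, last) timestamp of a file sorted by timestamp."""
    return rows[0][0], rows[-1][0]


def overlaps(file_a, file_b):
    """True when the time ranges of two non-empty files intersect."""
    a_start, a_end = time_range(file_a)
    b_start, b_end = time_range(file_b)
    return a_start <= b_end and b_start <= a_end


def compact(file_a, file_b):
    """Merge two sorted files into one sorted file."""
    return list(heapq.merge(file_a, file_b, key=lambda row: row[0]))


file_a = [(1, "a"), (5, "b"), (9, "c")]
file_b = [(4, "d"), (7, "e")]       # written out of order: overlaps file_a
file_c = [(10, "f"), (12, "g")]     # strictly later: no overlap

merged = compact(file_a, file_b)
```

A query for timestamps 4 to 7 would have to read both `file_a` and `file_b` until they are merged; with append-only writes, files look like `file_a` and `file_c` and never need this step.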

Query problems

Grouped topN

Order by can only sort by time and tag. top and bottom can only return the top N values of a single field.

Grouped topN, which is very common in the time series field (for example, finding the three machines with the highest CPU utilization), is currently not supported.
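For reference, the kind of grouped topN meant here can be expressed in a few lines of Python. This is only a sketch of the desired result computed on the client side; the function name and the sample data are made up for illustration.

```python
import heapq
from collections import defaultdict


def top_n_hosts_by_avg_cpu(samples, n=3):
    """Group CPU samples by host, average them, and keep the n highest."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for host, cpu in samples:
        sums[host] += cpu
        counts[host] += 1
    averages = {host: sums[host] / counts[host] for host in sums}
    # nlargest returns (host, average) pairs sorted from highest to lowest.
    return heapq.nlargest(n, averages.items(), key=lambda item: item[1])


samples = [
    ("h1", 90), ("h1", 70),
    ("h2", 50),
    ("h3", 95),
    ("h4", 20), ("h4", 40),
]
top_hosts = top_n_hosts_by_avg_cpu(samples, n=3)
```

The point is that the ranking is applied to groups (hosts) rather than to individual rows of one field.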

Downsampling and aggregation

Downsampling: aggregates 1s-granularity data on the same timeline into 10s-granularity data

Aggregation: aggregates multiple timelines at the same point in time into one timeline

For example, each appId has multiple machines, and each machine records its number of connections every second. Now suppose we want to plot a curve of the total number of connections for each appId.

In standard SQL, it might look like this:

select sum(avg_host_conn), appid, new_time
from (
    select avg(connection) as avg_host_conn, appid, host, time/10 as new_time
    from T1
    group by appid, host, time/10
) as T2
group by appid, new_time

The inner subquery averages the connection values of each host under each appid within every 10s window, which is the downsampling; the outer query sums those per-host averages under each appid, which is the aggregation.

Such requirements are very common in time series queries, yet writing them with the SQL above is cumbersome. Some systems simplify such queries by allowing nested functions.

At present, a TDengine aggregate query can do either downsampling or aggregation but not both, and subqueries are not supported, so it cannot meet the above requirement.
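The same two steps can be spelled out in Python to make the order of operations explicit. This is a sketch under assumptions: rows are `(appid, host, time_in_seconds, connection)` tuples, the bucket is obtained by integer division, and the function name is hypothetical.

```python
from collections import defaultdict


def downsample_then_aggregate(rows, bucket_seconds=10):
    """Average per (appid, host, bucket), then sum those averages per appid."""
    # Step 1 (downsampling): collect each host's samples per time bucket.
    per_host = defaultdict(list)
    for appid, host, ts, connection in rows:
        per_host[(appid, host, ts // bucket_seconds)].append(connection)

    # Step 2 (aggregation): sum the per-host averages for each appid/bucket.
    totals = defaultdict(float)
    for (appid, _host, bucket), values in per_host.items():
        totals[(appid, bucket)] += sum(values) / len(values)
    return dict(totals)


rows = [
    ("a", "h1", 0, 10),
    ("a", "h1", 5, 20),
    ("a", "h2", 3, 30),
    ("a", "h1", 12, 40),
]
curve = downsample_then_aggregate(rows)
```

In the first bucket, host `h1` averages 15 and host `h2` averages 30, so the total for app `a` is 45; summing the raw samples directly would give a different and misleading number.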

Query aggregation architecture

A query runs in two stages. In the first stage, the client asks the management node for the meta information of all tables that match the tag filter (including which data node each table is on). If millions of tables match, this stage alone is basically unbearable. In the second stage, the data nodes aggregate the data of each table and return it to the client, which then performs the final aggregation.

This query scheme will eventually hit the bottleneck of client-side aggregation, and will still need to move to a distributed query scheme coordinated across multiple machines, as in Presto, Impala and so on.
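The two stages can be mimicked with plain Python data structures. This is a toy model, not TDengine's protocol: the `meta` and `nodes` dictionaries, the function names, and the choice of an average as the aggregate are all assumptions made for illustration.

```python
def find_tables(meta, tag_filter):
    """Stage 1: the management node returns matching tables and their nodes."""
    return [
        (name, info["node"])
        for name, info in meta.items()
        if all(info["tags"].get(k) == v for k, v in tag_filter.items())
    ]


def node_partial(node_data, table):
    """Stage 2, on a data node: reduce one table to a (sum, count) pair."""
    values = node_data[table]
    return sum(values), len(values)


def client_average(meta, nodes, tag_filter):
    """Stage 2, on the client: merge every per-table partial result."""
    total, count = 0.0, 0
    for table, node in find_tables(meta, tag_filter):
        part_sum, part_count = node_partial(nodes[node], table)
        total += part_sum
        count += part_count
    return total / count if count else None


meta = {
    "t1": {"tags": {"app": "a"}, "node": "n1"},
    "t2": {"tags": {"app": "a"}, "node": "n2"},
    "t3": {"tags": {"app": "b"}, "node": "n1"},
}
nodes = {
    "n1": {"t1": [1, 2, 3], "t3": [100]},
    "n2": {"t2": [4, 6]},
}
average_a = client_average(meta, nodes, {"app": "a"})
```

The client loop runs once per matching table, so both the stage-1 result and the final merge grow with the number of tables that match the filter.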
