[Rongyun analysis] From excess storage resources to long-term storage in a distributed time series database

2025-04-09 Update From: SLTechnology News&Howtos

Background:

For an Infra (infrastructure) team, building the management platform and guaranteeing a baseline quality of service are required coursework. How well the team can measure the stability of its many complex underlying platforms, and of the Ops and business systems running on top of them, is an important yardstick of whether the Infra team is competent.

An isolated, discrete metric has little practical value on its own. It becomes meaningful only when it is stored in a way that supports user-friendly querying and aggregation. A TSDB (Time Series Database) that performs well, is distributed, is user-friendly, and is easy for downstream DevOps teams to deploy is therefore indispensable.

Common TSDBs include InfluxDB, OpenTSDB, and Prometheus. The open source version of InfluxDB is excellent, but it does not support cluster deployment, the TICK Stack offers little flexibility for data cleaning, and using the open source version directly means that statistics are collected and reported. OpenTSDB is based on HBase, so its deployment cost is too high, and it is not a complete monitoring system by itself. Building on Prometheus and TiKV, by contrast, keeps the whole system as simple as possible while still benefiting from a very rich ecosystem.

Therefore, based on its actual situation, Rongyun chose TiPrometheus as the storage solution for the Infra department's monitoring platform.

Project overview:

[Figure: official Prometheus system architecture diagram]

TiPrometheus relies on a Prometheus feature that the official architecture diagram does not show: Remote Storage. As the name suggests, its main function is to give Prometheus the ability to write to remote storage. The feature is transparent to queries and is mainly used for long-term storage. TiPrometheus implements Prometheus Remote Storage on top of TiKV and PD.

Core implementation

A Prometheus record has two parts: Label and Samples. Label records identifying attributes, and Samples contains the metric value and its Timestamp.

By combining a Label with a time range, you can query the desired Value.

To query these records, you need to build two indexes, a Label Index and a Time Index, and store each Value under a dedicated Key.

Label Index

Each Label pair is stored with index:label:# as the Key and labelID as the Value. The labelID of each new record is appended to the Value, separated by ",". This is the inverted index commonly used in search.
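As a rough illustration of the inverted index described above, here is a minimal sketch that uses a plain Python dict in place of TiKV. The key layout is an assumption: the text shows only index:label:#, so placing the Label name before "#" and the Label value after it is a guess, and the helper names label_index_key and add_to_label_index are hypothetical. Skipping duplicate IDs is also an assumption, not something the source states.

```python
def label_index_key(name: str, value: str) -> str:
    # Assumed layout: the source shows only "index:label:#", so putting the
    # Label name before "#" and the Label value after it is a guess.
    return f"index:label:{name}#{value}"


def add_to_label_index(store: dict, name: str, value: str, label_id: str) -> None:
    """Append label_id to the comma-separated posting list of one Label pair."""
    key = label_index_key(name, value)
    existing = store.get(key, "")
    ids = existing.split(",") if existing else []
    if label_id not in ids:  # assumption: repeated writes do not duplicate IDs
        ids.append(label_id)
    store[key] = ",".join(ids)


store = {}  # stands in for TiKV in this sketch
add_to_label_index(store, "job", "api", "id1")
add_to_label_index(store, "job", "api", "id2")
add_to_label_index(store, "job", "api", "id1")  # already present, ignored
print(store)
```

In a real deployment the read-modify-write on the Value would need to be atomic, for example inside a transaction, which this sketch does not model.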

Time Index

Each Sample entry takes index:timeseries:: as the Key and Timestamp as the Value, with SplitTime marking the start of the time slice. Additional Timestamps are appended to the Value, also separated by ",".
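The time slicing can be sketched as follows, again with a dict standing in for TiKV. Several details are assumptions rather than facts from the source: the key is taken to be index:timeseries: followed by the labelID and the SplitTime, the slice width SLICE_SECONDS is a hypothetical value, and the function names are made up for illustration.

```python
SLICE_SECONDS = 300  # hypothetical slice width; the source does not state one


def split_time(timestamp: int) -> int:
    """Round a Unix timestamp down to the start of its time slice."""
    return timestamp - timestamp % SLICE_SECONDS


def time_index_key(label_id: str, timestamp: int) -> str:
    # Assumed layout: "index:timeseries:" + labelID + ":" + SplitTime.
    return f"index:timeseries:{label_id}:{split_time(timestamp)}"


def add_to_time_index(store: dict, label_id: str, timestamp: int) -> None:
    """Append the timestamp to the comma-separated list of its slice."""
    key = time_index_key(label_id, timestamp)
    existing = store.get(key, "")
    store[key] = f"{existing},{timestamp}" if existing else str(timestamp)


store = {}
for ts in (1000, 1010, 1300):
    add_to_time_index(store, "id1", ts)
print(store)
```

With a 300-second slice, the timestamps 1000 and 1010 land in the slice starting at 900, while 1300 starts a new entry for the slice starting at 1200.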

Doc storage

We store each Samples record in TiKV with timeseries:doc:: as the Key, where labelID is the hash of the full text of the Label.

To recap:
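A small sketch of how a labelID and a doc Key might be derived. The source says only that labelID is a hash of the full Label text, so the choice of MD5, the canonical "name=value" text with sorted names, and the key layout (labelID followed by the Timestamp) are all assumptions made for illustration.

```python
import hashlib


def label_id(labels: dict) -> str:
    """Hash the full Label text; sorting makes the ID independent of dict order."""
    # Assumption: the canonical text is "name=value" pairs joined by commas.
    text = ",".join(f"{name}={value}" for name, value in sorted(labels.items()))
    # Assumption: MD5 is used here only as an example hash function.
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def doc_key(lid: str, timestamp: int) -> str:
    # Assumed layout: "timeseries:doc:" + labelID + ":" + Timestamp.
    return f"timeseries:doc:{lid}:{timestamp}"


a = label_id({"job": "api", "instance": "10.0.0.1"})
b = label_id({"instance": "10.0.0.1", "job": "api"})
print(a == b)          # same Label set, same labelID
print(doc_key(a, 1000))
```

Hashing a canonical form means that two writers that list the same Labels in a different order still address the same series.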

Writing process

Generate labelID

Build the time index: index:timeseries:: "ts,ts"

Write the time series data: timeseries:doc:: "value"
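The writing steps above can be combined into one function, sketched here against an in-memory dict. This is not the TiPrometheus code: the key layouts, the MD5 hash, the SLICE_SECONDS value, and the names append_csv and write_sample are assumptions. The sketch also updates the Label Index during the write, which the step list does not mention explicitly but the Label Index description implies.

```python
import hashlib

SLICE_SECONDS = 300  # hypothetical slice width


def append_csv(store: dict, key: str, item: str) -> None:
    """Append item to a comma-separated Value unless it is already present."""
    existing = store.get(key, "")
    items = existing.split(",") if existing else []
    if item not in items:
        items.append(item)
    store[key] = ",".join(items)


def write_sample(store: dict, labels: dict, timestamp: int, value: float) -> str:
    # Step 1: generate labelID (assumed: MD5 of the sorted "name=value" text).
    text = ",".join(f"{k}={v}" for k, v in sorted(labels.items()))
    lid = hashlib.md5(text.encode("utf-8")).hexdigest()
    # Keep the Label Index current (assumed key layout "index:label:name#value").
    for name, val in labels.items():
        append_csv(store, f"index:label:{name}#{val}", lid)
    # Step 2: build the time index for the slice containing the timestamp.
    slice_start = timestamp - timestamp % SLICE_SECONDS
    append_csv(store, f"index:timeseries:{lid}:{slice_start}", str(timestamp))
    # Step 3: write the time series data itself.
    store[f"timeseries:doc:{lid}:{timestamp}"] = str(value)
    return lid


store = {}
lid = write_sample(store, {"job": "api"}, 1000, 0.5)
write_sample(store, {"job": "api"}, 1010, 0.7)
for key in sorted(store):
    print(key, "->", store[key])
```

Two samples of the same series produce one Label Index entry, one time slice entry holding both timestamps, and two doc entries.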

Query process

Find the set of labelIDs through the inverted index. When a query specifies multiple Labels, intersect their labelID sets.

Using labelID and the time slices, find the Timestamps that fall within the time range.

Find the required Value based on labelID and Timestamp.
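The three query steps can be sketched as follows. The store is a hand-written dict that mimics what the indexes might contain, and the key layouts, the SLICE_SECONDS value, and the query function itself are assumptions for illustration rather than the actual TiPrometheus implementation.

```python
SLICE_SECONDS = 300  # hypothetical slice width, must match the one used on write


def query(store: dict, matchers: dict, start: int, end: int) -> dict:
    """Return {labelID: [(timestamp, value), ...]} for samples in [start, end]."""
    # Step 1: look up each Label pair and intersect the labelID sets.
    id_sets = []
    for name, value in matchers.items():
        posting = store.get(f"index:label:{name}#{value}", "")
        id_sets.append(set(posting.split(",")) if posting else set())
    label_ids = set.intersection(*id_sets) if id_sets else set()

    results = {}
    first_slice = start - start % SLICE_SECONDS
    for lid in sorted(label_ids):
        points = []
        # Step 2: visit every time slice that overlaps the requested range.
        for slice_start in range(first_slice, end + 1, SLICE_SECONDS):
            stamps = store.get(f"index:timeseries:{lid}:{slice_start}", "")
            for ts in (int(s) for s in stamps.split(",") if s):
                if start <= ts <= end:
                    # Step 3: fetch the Value by labelID and Timestamp.
                    points.append((ts, float(store[f"timeseries:doc:{lid}:{ts}"])))
        results[lid] = points
    return results


store = {
    "index:label:job#api": "a,b",
    "index:label:dc#east": "a",
    "index:timeseries:a:900": "1000,1010",
    "index:timeseries:a:1200": "1300",
    "timeseries:doc:a:1000": "0.5",
    "timeseries:doc:a:1010": "0.7",
    "timeseries:doc:a:1300": "0.9",
}
result = query(store, {"job": "api", "dc": "east"}, 1005, 1400)
print(result)  # {'a': [(1010, 0.7), (1300, 0.9)]}
```

Only series "a" carries both Labels, so "b" drops out at the intersection, and the sample at 1000 is filtered out because it precedes the start of the range.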

Why TiPrometheus

The project originated at a Hackathon organized by PingCAP, where we hoped to realize the ideas we had in mind together with the other participants. Most importantly, the goal was not a simple demo but something practical that could be applied in a production environment and solve real production problems.

At the beginning there were all kinds of ambitious ideas, including building ML and Hadoop over TiKV on top of TiSpark. These ideas were too hard to realize in a project that had to be finished in only two days of work; in other words, getting even a demo working would have required too many hacks. As for GEO full-text search, Rongyun's existing production systems leave no gap to fill, so there was no need to spend effort solving a problem that does not exist.

IM is a compute-intensive service, and quality of service is Rongyun's core competitive strength. Storage resources, meanwhile, are scattered across nodes, and the storage utilization of each node is low. To make the most of these idle resources, Rongyun designed and implemented the TiPrometheus system.

Result

Together, TiKV and Prometheus provide a feasible approach to designing a time series database on top of key-value storage.

The project provides a practical solution for the long-term storage of Prometheus data.
