In-Depth Cloud Practice | Building "Cloud-Edge Integration": The Technical Principles of the TSDB Time-Series and Spatio-Temporal Database Revealed


Updated 2025-07-09. Source: SLTechnology News&Howtos


This article is adapted from a talk given in the next-generation cloud database analysis track of the Yunqi Conference: "TSDB: Cloud-Edge Integrated Time-Series and Spatio-Temporal Database Technology", presented by Zixiu, a senior expert in the Aliyun Intelligent Database Product Division.

First generation: time-series data processing tools

General-purpose relational databases can store time-series data, but they handle it relatively inefficiently because they lack time-specific optimizations, such as storing and retrieving data by time interval. The first generation of time-series data typically came from the monitoring field, and simple storage tools built on flat files became the first way of storing it. Systems of this kind, represented by RRDTool and Whisper, usually handle a single data model, are limited to the capacity of one machine, and are embedded in a monitoring and alerting solution.

With the rise of big data and Hadoop, the volume of time-series data began to grow rapidly, and businesses demanded more scalability in processing it. Purpose-built time-series databases based on general-purpose storage began to appear, such as OpenTSDB and KairosDB, which can store and process this data efficiently by time interval. While inheriting the advantages of general-purpose storage, these databases exploit the characteristics of time-series data to avoid some of its drawbacks, and they introduced many innovations in data modeling and aggregation analysis. For example, OpenTSDB inherits the wide-table property of HBase, designs an offset-based storage model tailored to time series, and uses salting to mitigate hotspots. They also have many shortcomings, such as an inefficient global UID mechanism, uncontrolled data loading during aggregation, and an inability to handle queries on high-cardinality tags.

With the development of Docker, Kubernetes, microservices, and related technologies, and with growing expectations for IoT, time-series data has become one of the fastest-growing data types. High-performance, low-cost vertical time-series databases emerged, and storage engines designed around time-series characteristics, represented by InfluxDB, gradually came to lead the market. They usually offer more advanced data processing capabilities, efficient compression algorithms, and storage engines that fit time-series characteristics: for example, InfluxDB's time-based TSM (Time-Structured Merge Tree) storage, Gorilla compression, time-series-oriented window functions such as p99 and rate, and automatic rollup. At the same time, because the index is separated from the data, they still face major challenges in scenarios such as time-series explosion and out-of-order writes.

Third generation: cloud time-series and spatio-temporal databases

Starting in 2016, the major cloud vendors began investing in TSDB. In April 2017, Microsoft released a preview of Time Series Insights, a fully managed, end-to-end solution for storing and querying highly contextualized IoT time-series data. It offers powerful visualization for asset-based data insights and rich interactive ad hoc analysis. By data type, it is divided into warm data analysis and raw data analysis, billed by storage space and by query volume respectively.

In November 2018, Amazon released a preview of Timestream at the AWS re:Invent conference, aimed at scenarios such as IoT and operational applications. It provides an adaptive query processing engine for fast data analysis and automatically rolls up, retains, tiers, and compresses data. It is billed by write traffic, storage space, and the amount of data queried, and its serverless form keeps management cost to a minimum.

Since the first version of its time-series database arrived in 2016, the Aliyun Intelligent TSDB team has gradually come to serve group businesses such as DBPaaS and Sunfire. After a public beta in mid-2017, the product was officially commercialized at the end of March 2018. Along the way, TSDB has continued to absorb the strengths of other systems in the time-series field and has gradually built product advantages such as high performance at low cost, zero operations and maintenance, steadily improving ease of use, edge-cloud integration, and a rich ecosystem.

Technology revealed

1. Distributed streaming aggregator

Time-series aggregation is what distinguishes a time-series database from a general-purpose database. The main operators of the TSDB aggregator are the computation functions of the OpenTSDB protocol, including interpolation, downsampling, and dimensionality reduction. Borrowing from traditional database execution, TSDB introduces a pipeline execution model (also known as the Volcano or Iterator model). A pipeline is made up of different execution operators. The physical plan generator parses a query and decomposes it into a DAG, or operator tree, composed of these operators. The root operator of the DAG drives execution of the query and returns the results to the caller. At the implementation level, execution is top-down and demand-driven, starting from the root operator.

This execution engine architecture has several advantages. It has been adopted by many database systems and proven effective. Its interfaces are clearly defined, so each operator can be optimized independently without affecting the others. It is also easy to extend: new functionality can be added by adding new operators. For example, the current query protocol defines query conditions only on tags; supporting query conditions on metric values (such as cpu.usage) would be a matter of adding a new operator.

The operators in a pipeline are chained in the order Downsampling -> Interpolation -> Aggregation -> Rate Conversion -> Functions. To distinguish different query scenarios and optimize each one separately, different aggregation operators support either streaming reads or materialization of the result set. Operator results are aggregated in a streaming fashion when the pipeline includes None or dsOp, while some aggregations across time series are still materializing operations.
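The demand-driven pipeline described above can be illustrated with Python generators. This is only a minimal sketch under stated assumptions, not TSDB's actual engine: the operator names (`scan`, `downsample_avg`, `aggregate_sum`, `build_plan`) and the sample data are hypothetical, interpolation and rate conversion are omitted, and input points are assumed to arrive sorted by timestamp. It shows a streaming operator (downsampling keeps only one bucket in memory) next to a materializing one (the cross-series sum must read every input before it can emit).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[int, float]  # (timestamp in seconds, value)


def scan(points: Iterable[Point]) -> Iterator[Point]:
    """Leaf operator: yields the points of one time series in timestamp order."""
    yield from points


def downsample_avg(source: Iterator[Point], interval: int) -> Iterator[Point]:
    """Streaming operator: averages points per time bucket, one bucket at a time."""
    bucket_start = None
    total, count = 0.0, 0
    for ts, value in source:
        start = ts - ts % interval
        if bucket_start is not None and start != bucket_start:
            # The previous bucket is complete, so emit it and start a new one.
            yield bucket_start, total / count
            total, count = 0.0, 0
        bucket_start = start
        total += value
        count += 1
    if count:
        yield bucket_start, total / count


def aggregate_sum(series: List[Iterator[Point]]) -> Iterator[Point]:
    """Materializing operator: sums all series per timestamp before emitting."""
    totals: Dict[int, float] = defaultdict(float)
    for one_series in series:
        for ts, value in one_series:
            totals[ts] += value
    for ts in sorted(totals):
        yield ts, totals[ts]


def build_plan(raw_series: List[List[Point]], interval: int) -> Iterator[Point]:
    """Builds the operator tree and returns its root; no data is read yet."""
    return aggregate_sum([downsample_avg(scan(s), interval) for s in raw_series])


# Hypothetical sample data for two time series of the same metric.
host_a = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
host_b = [(0, 10.0), (45, 20.0), (60, 30.0)]

root = build_plan([host_a, host_b], interval=60)
result = list(root)  # the caller pulls from the root, which pulls from its children
print(result)  # [(0, 17.0), (60, 36.0)]
```

Because every operator exposes the same iterator interface, a new operator (for instance, a filter on metric values) could be inserted between two existing ones without changing either of them.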
2. Query and analysis of spatio-temporal data

Before turning to query analysis, it helps to describe briefly what spatio-temporal data is and what characterizes it. The big data era has produced large volumes of data that carry both time and location and record the behavior of individual objects. Examples include signaling data from personal mobile phones, driver location and order data from ride-sharing, real-time vehicle data from the connected-vehicle and autonomous-driving industries, location streams in logistics, and the delivery tracks of couriers.

One characteristic of spatio-temporal data is the complexity and diversity of analysis goals. There are many spatio-temporal analysis methods, such as clustering, prediction, change detection, frequent pattern mining, anomaly detection, and relationship mining. Another characteristic is exponential data growth, which can be seen as time-series data expanding into a higher-dimensional space. Traditional databases scale poorly and struggle to manage massive spatio-temporal data. Under high concurrency, because storage and compute are not separated, retrieval becomes a major bottleneck: performance can drop sharply and response times can exceed several minutes.

Faced with this volume of data and computation and the challenge of analysis latency, the spatio-temporal database TSDB has made breakthroughs along several technical dimensions: separation of storage and compute, a high-performance spatio-temporal index, a spatio-temporal SQL optimizer, a spatio-temporal computing engine, and spatio-temporal data compression algorithms.

Identification and push-down of spatio-temporal filter conditions

Unlike the comparison relationships (such as =) used on general data, the query filter conditions on spatio-temporal data are usually spatial analysis functions such as st_contains() and st_intersects(). The SQL optimizer therefore parses the filter conditions, identifies the spatio-temporal ones, and decides which of them can be pushed down according to the characteristics of the storage engine. Conditions that cannot be pushed down stay in the Filter operator and are evaluated by the computing engine. When conditions can be pushed down, the optimizer generates a new Filter operator. The relational operators before and after optimization are shown in the following figure:

[Figure: relational operators before and after filter push-down]
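The split between pushed-down and residual conditions can be sketched as follows. This is a hedged illustration, not the TSDB optimizer itself: the classes `Predicate`, `Scan`, and `Filter`, the function `push_down`, and the set `PUSHABLE_FUNCTIONS` are all hypothetical names, and the choice of which functions the storage engine can evaluate is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class Predicate:
    function: str    # name of the filter function, e.g. "st_contains"
    expression: str  # text of the condition, kept for illustration only


@dataclass
class Scan:
    table: str
    pushed: List[Predicate] = field(default_factory=list)  # evaluated in storage


@dataclass
class Filter:
    child: Scan
    residual: List[Predicate]  # evaluated by the computing engine


# Assumption for this sketch: the storage engine's spatio-temporal index
# can evaluate these two functions directly.
PUSHABLE_FUNCTIONS: FrozenSet[str] = frozenset({"st_contains", "st_intersects"})


def push_down(
    table: str,
    predicates: List[Predicate],
    pushable: FrozenSet[str] = PUSHABLE_FUNCTIONS,
) -> Filter:
    """Rewrites Filter(Scan) so that supported predicates move into the scan."""
    pushed = [p for p in predicates if p.function in pushable]
    residual = [p for p in predicates if p.function not in pushable]
    # The new Filter keeps only what the storage engine cannot evaluate.
    return Filter(child=Scan(table, pushed), residual=residual)


plan = push_down(
    "taxi",
    [
        Predicate("st_contains", "st_contains(area, position)"),
        Predicate("st_distance", "st_distance(position, depot) < 1000"),
    ],
)
print([p.function for p in plan.child.pushed])  # ['st_contains']
print([p.function for p in plan.residual])      # ['st_distance']
```

Keeping the residual list in a Filter above the scan preserves correctness: every condition is still evaluated exactly once, and only the place where it runs changes.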

Spatio-temporal computing engine

In a general-purpose database, a JOIN matches equal values in two columns of two tables, and is implemented by algorithms such as NestedLoopJOIN, HashJOIN, and SortMergeJOIN. For spatio-temporal data, it is almost impossible to find two equal geometric objects. Most JOINs are instead based on spatial position relations such as st_contains() or on distance relations such as st_distance(). For example, to find all taxis within 1 kilometer of Yunqi Town, the JOIN condition is that the taxi's location falls inside the circle centered on Yunqi Town with a radius of 1 kilometer. To find the taxi nearest to me, a KNN JOIN is used. These JOINs are beyond what the JOIN algorithms of a general-purpose database can optimize.

In the spatio-temporal database TSDB, they are optimized with a dedicated Scalable Sweeping-Based Spatial Join algorithm, the spatio-temporal index, and a two-level index spanning the storage layer and the computing layer. When the SQL optimizer recognizes that the JOIN condition between two tables is a spatio-temporal analysis function, and the parameters and other conditions meet the requirements, it generates a dedicated spatio-temporal JOIN operator. This operator is implemented by a specialized JOIN algorithm and performs much better than a naive JOIN operator.

Open source ecosystem

TSDB supports open source InfluxDB and open source Prometheus. InfluxDB is the top-ranked time-series database on DB-Engines. Building on open source InfluxDB, Aliyun InfluxDB® provides the following:

1. A horizontally scalable cluster solution
2. Global memory management
3. Full compatibility with the TICK ecosystem

Horizontally scalable cluster solution

Raft is used to make InfluxDB data nodes highly available, and several high-availability options are offered so that users can choose the one that best fits their availability and cost requirements. Aliyun InfluxDB® can dynamically add high-availability groups of InfluxDB data nodes as the data volume grows.

Global memory management

Aliyun InfluxDB® implements global memory management by optimizing the InfluxDB code, which allows memory usage to be adjusted dynamically. With global memory management, Aliyun InfluxDB supports creating any number of databases. It manages memory for both data writes and data queries, which significantly reduces stability problems caused by OOM and improves the availability of the whole system.

TICK ecosystem compatibility

Aliyun InfluxDB is fully compatible with the TICK ecosystem and supports integration with Telegraf, Chronograf, and Kapacitor. In addition, Aliyun InfluxDB supports integration with Grafana, so users can display data with richer graphical tools. Aliyun InfluxDB also provides a "one-click" data collection tool: users can easily install and start data collection tools and manage them on the Aliyun management platform.

Aliyun InfluxDB not only provides high availability, a cluster solution, and a more stable service while embracing the open source ecosystem, but also actively integrates data collection, visualization, and alerting, and offers a fully managed, automatically monitored "no OPS" service.

Prometheus is an open source monitoring and alerting system and time-series database, commonly used with Kubernetes (K8S). Aliyun also provides a Prometheus service. Compared with open source Prometheus, Aliyun Prometheus has the following features:

Native ecosystem integration: Prometheus connects to InfluxDB seamlessly. No code changes are required; only the configuration needs to be modified.

Long-term data storage: InfluxDB stores Prometheus data for the long term through Remote Storage.

"Global" query: InfluxDB remote storage enables a "many writers, one reader" query pattern. Multiple Prometheus instances connect to the same InfluxDB, which allows them to be queried jointly and gives a "global" view of the data.

High availability and high reliability: InfluxDB's highly available cloud disks provide Prometheus with highly available storage, and InfluxDB keeps Prometheus data highly reliable, effectively preventing data loss.

Aliyun Prometheus makes full use of the capabilities of Aliyun InfluxDB to achieve long-term data storage, high availability, and high reliability, together with "global" queries over the data.

Summary

The Aliyun time-series and spatio-temporal database TSDB product family focuses on industries such as the Internet of Things, monitoring and APM, transportation and travel, connected vehicles, and logistics, and is committed to building a cloud-edge integrated time-series and spatio-temporal database. Developers and enterprise customers are welcome to use it and to share their suggestions.

Author: Roin123

This article is original content of the Yunqi Community and may not be reproduced without permission.

