The TTL Feature of the Graph Database Nebula Graph


Introduction

In the current era of big data, the volume of data we handle is measured in TB, PB, or even EB. How to deal with huge data sets is a common problem for anyone working in the database field. The key to this problem is keeping the data stored in the database valid and useful, so how to raise the proportion of effective data and clean out invalid, expired data has become a hot topic in the field. In this article we focus on how to deal with expired data in a database.

There are various ways to clean expired data out of a database, such as stored procedures, events, and so on. Here, the author gives a brief example of cleaning up expired data with stored procedures + events, an approach often used by DBAs.

Cleaning data with stored procedures + events

Stored procedure (procedure)

A stored procedure is a collection of one or more SQL statements. When a series of read and write operations is performed on a database, a stored procedure can encapsulate these complex operations into a reusable code block, which greatly reduces the workload of database developers. A stored procedure is usually compiled once and executed many times, so efficiency is greatly improved.

Stored procedures have the following advantages:

- Simplified operation: highly repetitive operations are encapsulated into a stored procedure, simplifying the SQL calls for them.
- Batch processing: SQL + loops reduce network traffic, i.e. "running batches".
- Unified interface: ensures the safety of the data.
- Compiled once, executed many times: improves efficiency.

Take MySQL as an example. Suppose we want to delete expired data from a table with the following structure:

mysql > SHOW CREATE TABLE person;
+--------+----------------------------------------------------+
| Table  | Create Table                                       |
+--------+----------------------------------------------------+
| person | CREATE TABLE `person` (
  `age` int(11) DEFAULT NULL,
  `inserttime` datetime DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
+--------+----------------------------------------------------+
1 row in set (0.00 sec)

Create a table called person, where the inserttime field is of type datetime; we use the inserttime field to store the time at which each row of data was generated.
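For illustration (the values here are made up), a row can be stamped with its creation time at insert time:

mysql > INSERT INTO person VALUES (25, now());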

Create a stored procedure that deletes the specified table data, as follows:

mysql > delimiter //
mysql > CREATE PROCEDURE del_data (IN `date_inter` int)
    -> BEGIN
    ->   DELETE FROM person WHERE inserttime < date_sub(curdate(), interval date_inter day);
    -> END //
mysql > delimiter ;

This creates a stored procedure called del_data; the parameter date_inter specifies how many days old data may be before it is deleted. When the inserttime field of a row in table person (type datetime) plus date_inter days is less than the current time, the row is considered expired and will be deleted.
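Before scheduling it, you can also sanity-check the procedure by calling it directly; the 30-day retention below is just an example value:

mysql > CALL del_data(30);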

Event (event)

An event is a procedural database object that is invoked at a scheduled time. An event can be invoked once or run periodically, and it is managed by a specific thread called the event scheduler.

Events are similar to triggers: both fire when something happens. A trigger fires when a statement is executed against the database, while an event fires according to a schedule. Because of this similarity, events are also called temporal triggers. The event scheduler can run a task as frequently as once per second.

Create an event that periodically calls the stored procedure at a fixed time to clean up the data:

mysql > CREATE EVENT del_event
    -> ON SCHEDULE
    -> EVERY 1 DAY
    -> STARTS '2020-03-20 12:00:00'
    -> ON COMPLETION PRESERVE ENABLE
    -> DO CALL del_data(1);

This creates an event called del_event, which starts from 2020-03-20 and executes the stored procedure del_data(1) at 12:00:00 every day.

Then execute:

mysql > SET GLOBAL event_scheduler = 1;

This turns on the event scheduler, so that the event del_event is executed automatically in the background at the specified time. With the stored procedure del_data and the event del_event above, expired data is deleted automatically on a regular schedule.
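To confirm that the scheduler is running and the event is registered, you can check with the following statements (the exact output depends on your environment):

mysql > SHOW VARIABLES LIKE 'event_scheduler';
mysql > SHOW EVENTS;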

Cleaning data with TTL (Time To Live)

The combination of stored procedures and events above can clean up expired database data on a schedule. The graph database Nebula Graph provides a simpler and more efficient approach: TTL, which cleans out expired data automatically.

The benefits of using TTL to automatically clean expired data are as follows:

- Simple and convenient.
- Handled by the database system's internal logic, so it is safe and reliable.
- The database decides automatically, based on its own state, whether processing is needed; if so, it is carried out automatically in the background without human intervention.

Introduction to TTL

TTL, short for Time To Live, specifies the life cycle of a piece of data: once the data expires, it is deleted automatically. In the graph database Nebula Graph, we implement the TTL feature so that after the user sets a time-to-live on the data, the system automatically deletes the expired vertices or edges at the appointed time.

With TTL, expired data is deleted at the next compaction; before that compaction happens, queries filter out the expired vertices and edges.

The TTL feature of the graph database Nebula Graph is driven by two fields, ttl_col and ttl_duration. The expiration threshold is the value of the property specified by ttl_col plus the number of seconds given by ttl_duration. The field specified by ttl_col must be of type integer or timestamp, and ttl_duration is measured in seconds.
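As a worked example (the timestamp is hypothetical): for a record whose ttl_col value is 1584441231 and whose ttl_duration is 3600,

expiration threshold = ttl_col value + ttl_duration
                     = 1584441231 + 3600
                     = 1584444831

and the record is considered expired once the current timestamp exceeds 1584444831.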

TTL read filtering

In TTL read filtering, Nebula Graph pushes the filtering logic for tag and edge reads down to the storage layer. The storage layer first obtains the TTL information of the tag or edge, then traverses each vertex or edge in turn, takes the ttl_col field value, adds the ttl_duration value to it, and compares the result with the current timestamp to determine whether the data has expired; expired data is ignored.

TTL compaction

How RocksDB files are organized

The underlying storage of the graph database Nebula Graph uses RocksDB. RocksDB's files on disk are divided into multiple levels, 7 by default, as shown in the following figure:

How SST files are organized on disk

The files at Level 0 are generated by flushing Memtables from memory to disk. Within a single Level 0 SST file, records are sorted by key, but there is no ordering across files. At the other levels, data is sorted by key across the multiple files of the level, and the files themselves are ordered, as shown in the following figure:

File data partition of non-Level 0 layer

RocksDB compaction principle

RocksDB is implemented based on LSM. LSM is not a specific data structure but a design concept for data structures; refer to the LSM paper for details. The most important part of LSM is compaction: data files are written append-only, so expired, duplicate, and deleted data must be cleaned up step by step through compaction.

RocksDB compaction logic

The compaction strategy of RocksDB we adopt is level compaction. When data is written to RocksDB, it is first written to a Memtable; when a Memtable is full, it becomes an immutable Memtable. A background flush thread in RocksDB then flushes the Memtable to disk, generating a Sorted String Table (SST) file that is placed at Level 0. When the number of SST files at Level 0 exceeds a threshold, compaction is performed with Level 1. Usually all the files of Level 0 must be compacted into Level 1, because the key ranges of Level 0 files overlap.

The compaction for Level 0 and Level 1 is as follows:

Compaction of Level 0 and Level 1

Compaction between the other levels follows the same rule; take the compaction of Level 1 and Level 2 as an example:

Compaction of Level 1 and Level 2

When the Level 0 compaction is complete, the total file size or the number of files at Level 1 may exceed the threshold, triggering compaction between Level 1 and Level 2. At least one file is selected from Level 1 and compacted with the files in Level 2 whose key ranges overlap it. The compaction may in turn trigger compaction at the next level, and so on.

Without compaction, writes are very fast, but read performance degrades and space amplification becomes severe. To balance writes, reads, and space, RocksDB runs compaction in the background to merge the SST files of different levels.

TTL compaction principle

In addition to the default compaction operation described above (SST file merging), RocksDB provides the CompactionFilter feature, which lets users customize their own compaction logic. Nebula Graph uses this CompactionFilter to implement the TTL feature discussed in this article: during compaction, each time RocksDB reads a piece of data, it invokes the user-defined Filter function. TTL compaction implements the TTL expired-data deletion logic inside this Filter function, as follows:

First obtain the TTL information of the tag or edge, then traverse each vertex or edge, take the ttl_col field value, add the ttl_duration value to it, and compare the result with the current timestamp to determine whether the data has expired; data that has expired is deleted.

TTL usage

In the graph database Nebula Graph, edges and tags implement the same logic, so here we only take tag as an example to introduce how TTL is used in Nebula Graph.

Create a TTL property

There are two ways to use the TTL attribute in Nebula Graph:

When creating a tag, specify ttl_duration to define the lifetime of the data, in seconds, and ttl_col to name the column that serves as the TTL column. The syntax is as follows:

nebula > CREATE TAG t (id int, ts timestamp) ttl_duration=3600, ttl_col="ts";

When the value of a record's ttl_col field plus the value of ttl_duration is less than the current timestamp, the record has expired; otherwise, it has not.

When the value of ttl_duration is zero or negative, the tag properties of the vertex never expire. ttl_col can only name a column of type int or timestamp.
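For instance, given the type restriction above, a statement like the following (a hypothetical tag t2 with a string column) should be rejected:

nebula > CREATE TAG t2 (name string) ttl_duration=3600, ttl_col="name"; -- expected to fail: name is neither int nor timestamp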

Alternatively, you can omit the TTL attribute when creating the tag and, if you want to use the TTL feature later, set it with ALTER TAG. The syntax is as follows:

nebula > CREATE TAG t (id int, ts timestamp);
nebula > ALTER TAG t ttl_duration=3600, ttl_col="ts";

View TTL properties

After creating a tag, you can use the following statement to view the TTL property of the tag:

nebula > SHOW CREATE TAG t;
==========================================================================
| Tag | Create Tag                                                      |
==========================================================================
| t   | CREATE TAG t (id int, ts timestamp) ttl_duration = 3600, ttl_col = id |
--------------------------------------------------------------------------

Modify the TTL attribute

You can use the ALTER TAG statement to modify the TTL attribute:

nebula > ALTER TAG t ttl_duration=100, ttl_col="id";

Delete the TTL attribute

When you no longer want to use TTL, you can delete the TTL attribute:

You can set the ttl_col field to empty, drop the configured ttl_col field, or set ttl_duration to 0 or -1, as shown below.

Set the ttl_col field to empty:

nebula > ALTER TAG T1 ttl_col = ""; -- drop ttl attribute

Delete the configured ttl_col field:

nebula > ALTER TAG T1 DROP (a); -- drop ttl_col

Set ttl_duration to 0 or -1:

nebula > ALTER TAG T1 ttl_duration = 0; -- keep the ttl but the data never expires

Example

The following example shows that when the TTL feature is in use and the data has expired, the expired data is ignored when querying that tag's data.

nebula > CREATE TAG t (id int) ttl_duration=100, ttl_col="id";
nebula > INSERT VERTEX t(id) values 102:(1584441231);
nebula > FETCH prop on t 102;
Execution succeeded (Time spent: 5.945/7.492 ms)

Note:

Once a column is used as the ttl_col value, changing that column is not allowed; you must remove the TTL property before changing the column (see the sketch below).

For the same tag, index and TTL cannot be used at the same time; even if the index and TTL are created on different columns, they cannot be used together.
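A minimal sketch of that order of operations, reusing tag t from the earlier examples (the type change to string is hypothetical):

nebula > ALTER TAG t ttl_col = ""; -- remove the TTL property first
nebula > ALTER TAG t CHANGE (id string); -- only now can the column be changed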

The logic for edge is the same as for tag, so it is not repeated here.

So much for the introduction to TTL. If you have ideas for improving TTL in the graph database Nebula Graph, or other requirements, please file an issue in the GitHub issue area: https://github.com/vesoft-inc/nebula, or make suggestions under the Feedback category on the official forum: https://discuss.nebula-graph.io/.

The author has something to say: Hi, I am panda sheep, a graph database Nebula Graph R&D engineer. I am very interested in the database field and have some experience of my own; I hope what I have written here helps you, and I welcome corrections for anything that is inaccurate. Thank you.
