
How MongoDB automatically deletes expired data (TTL index)

2025-01-16 Update From: SLTechnology News&Howtos shulou


Preface:

Recently, because of a business requirement at my company, I needed to delete data older than three months in order to free up space and make maintenance easier.

Originally I planned to write a script and run it periodically with crontab, but then I saw that MongoDB itself can delete expired data automatically, so I decided to use that instead.

This feature is the TTL index. I will write up the script-based approach to periodic deletion later. For more examples of using TTL indexes, see this article: https://www.jb51.net/article/126810.htm

Introduction:

A TTL index is a special kind of index in MongoDB that makes documents expire and be deleted automatically after a certain period of time. Currently, a TTL index can only be created on a single field, and that field must hold a date or an array of dates (if the array contains several date values, the earliest one is used to calculate the expiration time).
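The expiration rule described above can be sketched in a few lines of plain JavaScript. This is only an illustrative model: ttlExpiry is a hypothetical helper, not a MongoDB API, and the sample dates are made up. It also assumes the documented behavior that a field holding no date value never expires.

```javascript
// Illustrative model only: when does a document become eligible for TTL deletion?
// `ttlExpiry` is a hypothetical helper, not part of MongoDB.
function ttlExpiry(fieldValue, expireAfterSeconds) {
  // Accept a single Date or an array; ignore anything that is not a valid Date.
  const values = Array.isArray(fieldValue) ? fieldValue : [fieldValue];
  const dates = values.filter((v) => v instanceof Date && !Number.isNaN(v.getTime()));
  if (dates.length === 0) return null; // no date value: the document does not expire

  // With several dates, the earliest one determines the expiration time.
  const earliest = Math.min(...dates.map((d) => d.getTime()));
  return new Date(earliest + expireAfterSeconds * 1000);
}

const single = ttlExpiry(new Date("2016-01-22T10:00:00Z"), 3600);
const multi = ttlExpiry(
  [new Date("2016-01-22T12:00:00Z"), new Date("2016-01-22T10:00:00Z")],
  3600
);
console.log(single.toISOString()); // 2016-01-22T11:00:00.000Z
console.log(multi.toISOString());  // 2016-01-22T11:00:00.000Z
```

Both calls give the same result because the array case falls back to its earliest date.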

Official website introduction link: https://docs.mongodb.com/v3.2/core/index-ttl/

Mechanism:

When you create a TTL index on a field in a collection, a single background thread periodically queries the index (every 60 seconds by default) to find documents that have expired and deletes them. How quickly the deletion happens also depends on the load on the mongod instance: under high load, it may be delayed for a while.

Note also that, in a replica set, the TTL background thread deletes expired documents only on the primary. If the instance becomes a secondary, the thread is idle, and the secondary instead replicates the delete operations from the primary.
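To make the mechanism concrete, here is a toy model of a single pass of the background thread, written as plain JavaScript over an in-memory array. It is a sketch under stated assumptions, not how mongod is implemented: ttlPass, the isPrimary flag, and the sample documents are all hypothetical.

```javascript
// Toy model of one TTL pass. `ttlPass` is hypothetical; mongod's real
// implementation works on the index, not on an in-memory array.
function ttlPass(docs, field, expireAfterSeconds, now, isPrimary) {
  // On a secondary the TTL thread is idle, so nothing is deleted locally.
  if (!isPrimary) return { kept: docs.slice(), deleted: [] };

  const cutoff = now.getTime() - expireAfterSeconds * 1000;
  const kept = [];
  const deleted = [];
  for (const doc of docs) {
    const value = doc[field];
    if (value instanceof Date && value.getTime() <= cutoff) {
      deleted.push(doc); // older than the cutoff: expired
    } else {
      kept.push(doc);
    }
  }
  return { kept, deleted };
}

const now = new Date("2016-01-22T12:00:00Z");
const docs = [
  { _id: 1, createTime: new Date("2016-01-22T10:30:00Z") }, // 90 minutes old
  { _id: 2, createTime: new Date("2016-01-22T11:30:00Z") }, // 30 minutes old
];

const onPrimary = ttlPass(docs, "createTime", 3600, now, true);
const onSecondary = ttlPass(docs, "createTime", 3600, now, false);
console.log(onPrimary.deleted.map((d) => d._id)); // [ 1 ]
console.log(onSecondary.deleted.length);          // 0
```

Because a pass only runs about once a minute, a document can outlive its expiration time by up to the pass interval, and by longer when the instance is busy.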

Create TTL index method:

A TTL index is created in the same way as a normal index, except that an additional option, expireAfterSeconds, is specified.

Example: in the log_events collection, create a TTL index on the createTime field that expires documents one hour after createTime.

> db.log_events.createIndex(
    { "createTime": 1 },              // field name
    { expireAfterSeconds: 60 * 60 }   // expiration time (in seconds)
  )
> db.log_events.getIndexes()          // view the indexes
[
  { "v": 1, "key": { "_id": 1 }, "name": "_id_", "ns": "tt.t1" },
  { "v": 1, "key": { "createTime": 1 }, "name": "createTime_1", "ns": "tt.t1", "expireAfterSeconds": 3600 }
]

Modify the value of the expireAfterSeconds attribute of the TTL index:

Note: to change the expiration time (expireAfterSeconds), use the collMod command. The only alternative is to drop the index with dropIndex() and rebuild it with createIndex(), which is painful on a collection with hundreds of millions of documents.

> db.runCommand({
    collMod: "log_events",              // collection name
    index: {
      keyPattern: { createTime: 1 },    // createTime is the field with the TTL index
      expireAfterSeconds: 7200          // new expiration time (in seconds)
    }
  })

Although the approach above deletes expired data automatically, the deletions run throughout the day. If the business is busy during the day, frequent deletes will add to the load, so I would prefer expired data to be deleted at a fixed time at night (assuming there is less traffic at night).

The methods are as follows:

Add an expireTime field (holding the exact time at which the document should expire), create the TTL index on that field, and set expireAfterSeconds to 0.

Note: with this approach the createTime field does not need a TTL index, and the expireTime value must be supplied when the document is inserted.

> db.log_events.createIndex(
    { "expireTime": 1 },            // field name
    { expireAfterSeconds: 0 }       // expiration time (in seconds)
  )
> db.log_events.insert({
    "expireTime": new Date('Jan 22, 2016 23:00:00'),   // this document is deleted automatically at 23:00 on 2016-01-22
    "logEvent": 2,
    "logMessage": "Success!"
  })

In this way, we realize the action of automatic deletion at a specified time.
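One way to pick the expireTime value at insert time is to add the retention period to the creation time and then move the result forward to the next off-peak hour. The sketch below assumes a 90-day retention period as a stand-in for "3 months", a 23:00 off-peak hour, and UTC timestamps. offPeakExpireTime and the sample dates are hypothetical and not part of MongoDB.

```javascript
// Hypothetical helper: expiry = createTime + retentionDays, moved forward
// to the next occurrence of offPeakHour (UTC). Not a MongoDB API.
function offPeakExpireTime(createTime, retentionDays, offPeakHour) {
  const expire = new Date(createTime.getTime() + retentionDays * 24 * 60 * 60 * 1000);
  const scheduled = new Date(Date.UTC(
    expire.getUTCFullYear(), expire.getUTCMonth(), expire.getUTCDate(),
    offPeakHour, 0, 0
  ));
  // If the off-peak hour on that day has already passed, use the next day.
  if (scheduled.getTime() < expire.getTime()) {
    scheduled.setUTCDate(scheduled.getUTCDate() + 1);
  }
  return scheduled;
}

const morning = offPeakExpireTime(new Date("2015-10-24T09:15:00Z"), 90, 23);
const lateNight = offPeakExpireTime(new Date("2015-10-24T23:30:00Z"), 90, 23);
console.log(morning.toISOString());   // 2016-01-22T23:00:00.000Z
console.log(lateNight.toISOString()); // 2016-01-23T23:00:00.000Z

// The document to insert would then carry both fields:
const doc = {
  createTime: new Date("2015-10-24T09:15:00Z"),
  expireTime: morning,
  logEvent: 2,
  logMessage: "Success!",
};
console.log(doc.expireTime.toISOString());
```

Rounding forward rather than backward means a document is never deleted before its retention period has elapsed.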

Restrictions:

TTL indexes cannot be used in the following situations.

① TTL indexes are single-field indexes. Compound indexes do not support TTL, and the expireAfterSeconds option is ignored on them.

② A TTL index cannot be created on the _id primary key.

③ A TTL index cannot be created on a capped collection, because MongoDB cannot delete documents from a capped collection.

④ You cannot use createIndex() to change the expireAfterSeconds value of an existing TTL index. Use the collMod command instead; otherwise you have to drop the index and rebuild it.

⑤ You cannot create a TTL index on a field that already has an index. To turn a non-TTL index into a TTL index, you have to drop the index and rebuild it.
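The restrictions above can be turned into a small pre-flight check before attempting to create the index. This is only a sketch: ttlIndexProblems, the shape of the collection description object, and the level field are assumptions made for illustration, and a real check would read this information from the server.

```javascript
// Hypothetical pre-flight check mirroring the restrictions listed above.
// `collection` is an assumed description: { capped: boolean, indexedFields: string[] }.
function ttlIndexProblems(keyPattern, collection) {
  const problems = [];
  const fields = Object.keys(keyPattern);

  if (fields.length !== 1) {
    problems.push("compound index: expireAfterSeconds would be ignored");
  }
  if (fields.includes("_id")) {
    problems.push("_id cannot carry a TTL index");
  }
  if (collection.capped) {
    problems.push("capped collections do not support TTL indexes");
  }
  const alreadyIndexed =
    fields.length === 1 &&
    fields[0] !== "_id" &&
    collection.indexedFields.includes(fields[0]);
  if (alreadyIndexed) {
    problems.push("field already indexed: drop the index and recreate it with expireAfterSeconds");
  }
  return problems;
}

const logEvents = { capped: false, indexedFields: ["_id", "level"] };
console.log(ttlIndexProblems({ createTime: 1 }, logEvents).length);           // 0
console.log(ttlIndexProblems({ createTime: 1, level: 1 }, logEvents).length); // 1
console.log(ttlIndexProblems({ level: 1 }, logEvents).length);                // 1
```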

Verify:

Although we can now have the deletion happen at night, I was still worried about the load when a large number of documents are deleted at once. So I ran a simple test to see what it costs for the TTL index to delete 1.4 million expired documents from a collection of over 100 million documents.

Test configuration:

OS: VM (virtual machine)

CPU: 4 cores

Memory: 8 GB

Total number of documents in the collection:

> db.t1.count()

104273617

Because _id increases sequentially in my test data, I looked up the createTime of the document with _id=1500000, calculated the difference between that createTime and the current time, and set expireAfterSeconds based on that difference so that the first 1.5 million documents would expire and be deleted about 5 minutes later.
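The arithmetic behind that step can be written out as follows. The timestamps here are invented for illustration (the article does not give the real values), and expireAfterSecondsFor is a hypothetical helper. The resulting number is what would go into the collMod command for the t1 test collection.

```javascript
// Hypothetical helper: choose expireAfterSeconds so that a document created at
// cutoffCreateTime expires `delaySeconds` after `now`.
// Expiry is createTime + expireAfterSeconds, so we need (now - createTime) + delay.
function expireAfterSecondsFor(cutoffCreateTime, now, delaySeconds) {
  const ageSeconds = Math.floor((now.getTime() - cutoffCreateTime.getTime()) / 1000);
  return ageSeconds + delaySeconds;
}

// Illustrative values only: the cutoff document is exactly two days old.
const cutoffCreateTime = new Date("2016-01-20T08:00:00Z");
const currentTime = new Date("2016-01-22T08:00:00Z");
const seconds = expireAfterSecondsFor(cutoffCreateTime, currentTime, 5 * 60);
console.log(seconds); // 173100

// Command document that could be passed to db.runCommand() in the mongo shell.
const command = {
  collMod: "t1",
  index: { keyPattern: { createTime: 1 }, expireAfterSeconds: seconds },
};
console.log(JSON.stringify(command));
```

Every document older than the cutoff document is already past its expiration time once the new value takes effect, so those documents may be removed on the next TTL pass rather than after the full 5 minutes.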

After modifying expireAfterSeconds, I closely monitored the output of the "vmstat 1" command.

My test results:

The whole delete operation completed in about 90 seconds.

CPU usage peaked at 90% and averaged about 50%.

Memory usage was 3 GB.

This is not a particularly accurate simulation, just a rough look at the resource consumption of the TTL index, to help decide whether deleting expired data this way is appropriate.

[Screenshot of the vmstat monitoring output]

Summary

That is all for this article. I hope it is of some use for your study or work. If you have any questions, feel free to leave a comment. Thank you for your support.
