
Using a Python script to delete queried data from ES




Background

The business system stores various reports and statistics in ES. For historical reasons, the system recomputes its statistics in full every day, so over time the pressure on ES storage space has grown considerably. In addition, because the ES indexes were used without any planning, individual indexes have even exceeded the maximum document count. The challenge for the operators is to solve this problem at minimum cost. Below is an example of using a Python script to solve it in an intranet development and test environment.

Each Elasticsearch shard is a Lucene index. There is a maximum number of documents you can have in a single Lucene index. As of LUCENE-5843, the limit is 2,147,483,519 (= Integer.MAX_VALUE - 128) documents. You can monitor shard sizes using the _cat/shards API.
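As a hedged illustration, a minimal Python sketch that uses the _cat/shards API to flag shards approaching this limit. It reuses the intranet address from the script below; the 80% warning threshold is an arbitrary choice, not part of the original setup:

import requests

LUCENE_MAX_DOCS = 2147483519  # Integer.MAX_VALUE - 128 (LUCENE-5843)

resp = requests.get('http://192.168.1.19:9400/_cat/shards',
                    params={'format': 'json', 'h': 'index,shard,prirep,docs'})
for shard in resp.json():
    docs = int(shard['docs'] or 0)      # 'docs' is null for unassigned shards
    if docs > LUCENE_MAX_DOCS * 0.8:    # warn at 80% of the hard limit
        print('%s shard %s: %d docs' % (shard['index'], shard['shard'], docs))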

Implementation approach

ES itself supports deleting queried data via the _delete_by_query API. First, we fetch all the index information on the current ES service through the _cat/indices endpoint. In its default output:

The first column shows the current health status of the index.

The third column is the name of the index.

The fourth column is the index's storage directory name (its UUID) on the server.

The fifth and sixth columns are the index's number of primary shards and number of replicas, respectively.

The seventh column is the number of documents in the current index.

The last two columns are the index's storage footprint: the penultimate column (total size, replicas included) equals the last column (primary size) multiplied by the number of copies, i.e. replicas plus one.
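For example, a hypothetical line of output (the UUID and the numeric values are made up purely to show the column layout; with one replica, the total size is twice the primary size):

curl 'http://192.168.1.19:9400/_cat/indices'
green open fjhb-surveyor-v2 Xw3aT1hZQxWcR7kQ1Jv2bA 5 1 120396847 0 102.4gb 51.2gb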

Next, we splice together a delete command in curl form and send it to the ES server for execution. The createTime field is the data's generation time, in epoch milliseconds.

curl -X POST "http://192.168.1.19:9400/fjhb-surveyor-v2/_delete_by_query?pretty" -H 'Content-Type: application/json' -d '{"query": {"range": {"createTime": {"lt": 1580400000000, "format": "epoch_millis"}}}}'

Concrete implementation

#!/usr/bin/python
# -*- coding: UTF-8 -*-
# Import the necessary modules
import requests
import time
import datetime
import os

# Fetch the index data from ES; returns a dict mapping index name (third column)
# to its primary storage size (last column)
def getData(env):
    header = {"Content-Type": "application/x-www-form-urlencoded",
              "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                            "AppleWebKit/537.36 (KHTML, like Gecko) "
                            "Chrome/62.0.3202.94 Safari/537.36"}
    data = {}
    with open('result.txt', 'w+') as f:
        req = requests.get(url=env + '/_cat/indices', headers=header).text
        f.write(req)
        f.seek(0)
        for line in f.readlines():
            data[line.split()[2]] = line.split()[-1]
    return data

# Convert a day offset to unix time in milliseconds; the return value is an int
def unixTime(day):
    today = datetime.date.today()
    target_day = today + datetime.timedelta(day)
    unixtime = int(time.mktime(target_day.timetuple())) * 1000
    return unixtime

# Delete ES data by calling the system curl command. Takes the environment and the
# time cutoff in epoch milliseconds (i.e. data older than that many days ago, as
# produced by unixTime). Because there are many indexes, only indexes larger than
# 1 GB are processed.
def delData(env, day):
    header = 'Content-Type: application/json'
    for key, value in getData(env).items():
        if 'gb' in value:
            size = float(value.split('gb')[0])
            if size > 1:
                url = env + '/' + key + '/_delete_by_query?pretty'
                command = ("curl -X POST \"%s\" -H '%s' "
                           "-d '{\"query\": {\"range\": {\"createTime\": "
                           "{\"lt\": %s, \"format\": \"epoch_millis\"}}}}'"
                           % (url, header, day))
                print(command)
                os.system(command)

if __name__ == '__main__':
    dev = 'http://192.168.1.19:9400'
    test1 = 'http://192.168.1.19:9200'
    test2 = 'http://192.168.1.19:9600'
    day = unixTime(-30)
    delData(dev, day)
    delData(test1, day)
    delData(test2, day)
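As a side note, the same delete could be issued directly from Python with requests instead of shelling out to curl. The following is a minimal sketch under that assumption; del_data_http is a hypothetical name, and it reuses getData from the script above:

import requests

# Hypothetical alternative to delData: same query body, sent as JSON over HTTP
def del_data_http(env, cutoff_millis):
    body = {"query": {"range": {"createTime": {"lt": cutoff_millis,
                                               "format": "epoch_millis"}}}}
    for index, size in getData(env).items():
        if 'gb' in size and float(size.split('gb')[0]) > 1:  # same > 1 GB filter
            r = requests.post(env + '/' + index + '/_delete_by_query?pretty',
                              json=body)
            print(index, r.status_code)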

Result verification

Before deletion [screenshot]

After deletion [screenshot]

Notes

1. Currently the script is scheduled with the operating system's crontab and runs once a day (a sample crontab entry is sketched after this list).

2. The first deletion takes a long time because of the large backlog of data; after that, with only one day's worth of data to delete per run, deletion efficiency is acceptable.

3. The script does not handle exceptions such as server errors, nor does it send alarm notifications; these need to be added in a real application scenario.

4. The script also does no logging, which likewise needs to be added in a real application scenario.
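For item 1, a possible crontab entry is sketched below; the script path, log path, and run time are hypothetical, not taken from the original setup:

# Run the cleanup script every day at 02:00 (hypothetical paths)
0 2 * * * /usr/bin/python /opt/scripts/es_clean.py >> /var/log/es_clean.log 2>&1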
