[percona-toolkit manual] pt-archiver data archiving

pt-archiver

Brief introduction

Archives rows from a MySQL table into another table or into a file.

Summary

1. Usage:

pt-archiver [OPTIONS] --source DSN --where WHERE
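For example, a typical invocation (the host names, database, table, and WHERE condition below are hypothetical) archives matching rows from a source table into a destination table in batches:

    # archive rows older than 90 days from db1 to db2, 1000 rows per transaction
    pt-archiver \
      --source h=db1,D=shop,t=orders \
      --dest h=db2,D=shop_archive,t=orders \
      --where "created_at < NOW() - INTERVAL 90 DAY" \
      --limit 1000 --txn-size 1000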

2. Detailed introduction

The tool is designed to have as little impact as possible on query performance in an OLTP environment: it extracts and archives old data in small batches rather than in one large operation.

The tool is extensible through a plugin mechanism, so you can add site-specific features, such as more complex archiving rules, or build a data warehouse while the data is being extracted.

Tip: the tool finds the first row and then ascends an index to fetch subsequent rows that match the conditions in the --where clause; it should be able to rely on an index to locate the qualifying rows instead of scanning the entire table.

You can specify the i key (index) in the --source DSN, which is critical for jobs that fetch rows by index.

Use --dry-run to make the tool print the SQL statements it would execute without actually touching any data, then run EXPLAIN on the generated statements to check their index usage. You can also inspect the Handler status counters (Handler_read_key, Handler_read_rnd_next, Handler_delete, etc., the underlying row-level operations) while the statements run, to make sure they do not perform a full table scan.
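A possible workflow, sketched with hypothetical connection details, is to have the tool print its statements first without touching any data:

    # only print the SQL that would be run
    pt-archiver \
      --source h=db1,D=shop,t=orders,i=idx_created_at \
      --purge \
      --where "created_at < NOW() - INTERVAL 90 DAY" \
      --limit 1000 --dry-run

Each statement it prints can then be checked with EXPLAIN, and the Handler_% status counters compared before and after a test run with SHOW SESSION STATUS LIKE 'Handler%'.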

Note:

At least one of --dest, --file, or --purge must be specified.

--ignore and --replace are mutually exclusive.

--txn-size and --commit-each are mutually exclusive.

--low-priority-insert and --delay-insert are mutually exclusive.

--share-lock and --for-update are mutually exclusive.

--analyze and --optimize are mutually exclusive.

--no-ascend and --no-delete are mutually exclusive.

In copy mode, any value not given in the --dest DSN is inherited from the corresponding --source value by default.
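For instance, in the hypothetical copy-mode command below, the --dest DSN only overrides the database and table; the host and user are inherited from --source:

    pt-archiver \
      --source h=db1,u=archiver,D=shop,t=orders \
      --dest D=shop_archive,t=orders_old \
      --where "created_at < '2024-01-01'" \
      --limit 500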

--analyze

After the data has been transferred, run ANALYZE TABLE to refresh the table's index statistics.

--analyze=d analyzes the destination table

--analyze=s analyzes the source table

--analyze=ds analyzes both the source and destination tables
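As a small illustration (connection details hypothetical), appending --analyze=ds makes the tool run ANALYZE TABLE on both sides once archiving finishes:

    pt-archiver --source h=db1,D=shop,t=orders --dest h=db2,D=shop_archive,t=orders \
      --where "created_at < '2024-01-01'" --analyze=ds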

--no-ascend

Do not use the ascending-index optimization.

By default, pt-archiver uses an ascending index to optimize the repeated SELECT statements issued for each batch: the next batch starts from the index position where the previous batch ended, instead of scanning from the beginning again to reach the rows that still need archiving. With multi-column composite indexes, however, the benefit of this optimization may be reduced.

--ascend-first

Ascend only the first column of the composite index. Related to the option above: when the composite index or the WHERE condition is complex, ascending only the first column still performs better than disabling the ascending-index optimization entirely.
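A sketch, assuming a hypothetical composite index idx_status_created on (status, created_at):

    pt-archiver \
      --source h=db1,D=shop,t=orders,i=idx_status_created \
      --purge \
      --where "status = 'done' AND created_at < '2024-01-01'" \
      --ascend-first --limit 1000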

--ask-pass

Prompt for the password interactively, so that it is not typed on the command line where onlookers could see it.

--buffer

Disable automatic flushing to the file specified by --file: the result set is buffered in the operating system cache and only flushed when the transaction commits (by default, every row is flushed to the file as soon as it is written). Relying on the operating system cache is less reliable, however, and may lead to unexpected data loss.

--buffer can improve the tool's performance by roughly 5-15%.
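A sketch of archiving to a file with buffered writes (the path and connection details are hypothetical; date placeholders such as %Y-%m-%d in --file are expanded by the tool):

    pt-archiver \
      --source h=db1,D=shop,t=orders \
      --file '/backup/orders_%Y-%m-%d.txt' \
      --where "created_at < '2024-01-01'" \
      --buffer --txn-size 1000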

--txn-size

Specifies how many rows each transaction processes. The default is 1. Setting it to 0 disables batching and autocommits every statement.

When the tool has processed the configured number of rows, it commits the transaction on both the source and the destination and flushes the file given by --file. This parameter has a large impact on the performance of the source database, so when archiving from a busy online OLTP server you have to balance data extraction throughput against transaction commit overhead. Increasing the value improves extraction performance, but widens the range of row locks held and may even cause deadlocks. Decreasing the value relieves that pressure, but a value that is too small increases the number of transactions and can affect online business.

The author's own test: on a PC, with the value set to 500, every 1K rows took only about 2 seconds; with the value set to 0 (every row autocommitted), every 1K rows took nearly 38 seconds. For non-transactional storage engines, setting it to 0 is recommended.
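For example (details hypothetical), committing every 500 rows instead of every row:

    pt-archiver --source h=db1,D=shop,t=orders --purge \
      --where "created_at < '2024-01-01'" --txn-size 500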

--commit-each

Commit once per batch of fetched rows; the --txn-size parameter is then ignored. Use --limit to control how many rows each batch operates on.

Suppose you archive a very large table with --limit 1000 and --txn-size 1000: the tool opens a transaction and reads rows matching the WHERE clause, but if, say, only 999 matching rows remain, it does not know that in advance, so it may keep the transaction open while scanning all the way to the last row of the table before committing. Using --commit-each, which commits once per batch, effectively avoids this dilemma (see the sketch below).
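A sketch of the batch-committing variant (details hypothetical):

    pt-archiver --source h=db1,D=shop,t=orders --dest h=db2,D=shop_archive,t=orders \
      --where "created_at < '2024-01-01'" --limit 1000 --commit-each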

--bulk-delete

Delete in bulk: each batch of rows is removed with a single DELETE statement. This also implies --commit-each.

Normally rows are deleted one at a time by primary key; with a simple WHERE condition, bulk deletion gives a significant speed-up.

This option defers the delete until all rows of the batch have been collected, so the per-row before-delete hook is not triggered; the before-bulk-delete hook fires instead.
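A purge-only sketch with batched deletes (details hypothetical):

    pt-archiver --source h=db1,D=shop,t=sessions --purge --bulk-delete \
      --where "last_seen < NOW() - INTERVAL 30 DAY" --limit 5000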

--[no]bulk-delete-limit

Enabled by default.

By default, --bulk-delete appends the --limit clause to the bulk DELETE statement. In some cases you can suppress this with --no-bulk-delete-limit, but you still need to pass --limit manually. Using this parameter is not recommended.

--bulk-insert

Insert with LOAD DATA INFILE (used together with --bulk-delete and --commit-each).

Instead of inserting row by row, the tool creates a temporary file for each batch, writes the rows into it, and runs the import once the whole batch has been fetched.

To avoid rows being deleted from the source one by one while the data only exists in the temporary file and has not actually been archived into the new database, which could leave a data gap or even lose data, --bulk-insert must appear in pairs with --bulk-delete, so that rows are not deleted from the old database until they have been inserted into the new one.

This parameter can be used together with --low-priority-insert and --replace.
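A sketch combining bulk insert and bulk delete (hosts and table names hypothetical):

    pt-archiver \
      --source h=db1,D=shop,t=orders \
      --dest h=db2,D=shop_archive,t=orders \
      --where "created_at < '2024-01-01'" \
      --limit 1000 --bulk-insert --bulk-delete --commit-each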
