Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Production environment uses pt-table-checksum to check MySQL data consistency

2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

The migration of the company's data center from the managed data room to Aliyun requires verification of data consistency after mysql migration (Replication), but it does not affect the use of the production environment. Pt-table-checksum has become an excellent and only checking tool.

Pt-table-checksum is one of the components of Percona-Toolkit, which is used to detect whether the data of MySQL master and slave libraries are consistent. The principle is that the sql statement based on statement is executed in the master database to generate the checksum of the master database block, the same sql statement is passed to the slave database for execution, and the checksum of the same data block is calculated on the slave database. Finally, the checksum value of the same data block on the master-slave database is compared to judge whether the master-slave data is consistent. The detection process divides the table into blocks (chunk) according to the row according to the unique index, which is calculated as a unit, and the locking of the table can be avoided. During the detection, the replication delay and the load of the master will be automatically determined. When the threshold is exceeded, the detection will be automatically suspended to reduce the impact on the online service.

Pt-table-checksum can handle most scenarios by default. Officials say that even if there are thousands of libraries and trillions of rows, it can still work well. This is due to the fact that the design is very simple. One table at a time does not require too much memory and redundant operations. If necessary, pt-table-checksum will dynamically change the chunk size according to the server load to reduce the latency of slave libraries.

To reduce interference with the database, pt-table-checksum also automatically detects and connects to the slave library, and of course, if it fails, you can specify the-- recursion-method option to tell the slave library where it is. Its ease of use is also reflected in that if there is a delay in replication, the checksum in the slave library will be paused until it catches up with the calculation time of the master library (also through the option-set a maximum tolerable delay, which is also considered inconsistent).

To secure the primary database service, the tool implements a number of protection measures:

Automatically set innodb_lock_wait_timeout to 1s to avoid causing

By default, pt-table-checksum pauses when the database has more than 25 concurrent queries. You can set this threshold by setting the-- max-load option

When the task is stopped with Ctrl+C, the tool will complete the current chunk test normally. Next time, use the-- resume option to start to resume the next chunk.

Working process

Take a look directly at the instructions of nettedfish:

1. Connect to the master library: the pt tool connects to the master library and then automatically discovers all slave libraries of the master library. Show full processlist is used by default to find the slave library, but this is valid only if the master and slave instance ports are the same.

3. Find out if the master library or slave library has replication filtering rules: this is the option to check by default for security. You can turn off this check, but this may cause checksum's sql statements to either not be synchronized to the slave library, or to find tables from the slave library that are not going to be checksum, which will cause the slave library to synchronize the card library.

5. Start getting tables and calculate them one by one.

6. If it is the first chunk of the table, then the chunk-size is generally 1000; if it is not the first chunk of the table, then use the results analyzed in step 19.

7. Check the table structure, convert the data type, etc., and generate the sql statement of checksum.

8. According to the index on the table and the distribution of data, choose the most appropriate method of split table.

9. Start the checksum table.

10. By default, delete the calculation results related to the last table before chunk a table. Unless-resume.

14. Based on the results of explain, determine whether the size of chunk exceeds the upper limit of your defined chunk-size. If it is exceeded, the chunk will be ignored in order not to affect online performance.

15. Add the for update lock to the line you want to checksum and calculate.

17-18. Store the results of the calculation in the master_crc master_count column.

19. Resize the next chunk.

20. Waiting for the slave library to catch up with the master library. If there is no delayed backup of the slave library running, it is best to check all slave libraries, and if it is found that the maximum delay is more than max-lag seconds, the pt tool will be paused here.

21. If the max-load of the main library is found to exceed a certain threshold, the pt tool will be paused here.

22. Continue with the next chunk until the table is finished by chunk.

23-24. Wait for the execution of the checksum from the slave library to facilitate the generation of summary statistical results. Each table is summarized and counted once.

25-26. Loop each table until the end.

After the verification, execute the following sql statement on each slave library to see if any master inconsistencies occur:

Select * from percona.checksums where master_cnt this_cnt OR master_crc this_crc OR ISNULL (master_crc) ISNULL (this_crc)\ G options you need to know

-- replicate-check: after executing the checksum query in the percona.checksums table, you may not necessarily check the results immediately-- yes will immediately compare the chunk crc32 value and output the DIFFS column, otherwise it will not be output. The default yes, if specified as-- noreplicate-check, is usually followed by the following-- replicate-check-only to output the DIFF result.

-- replicate-check-only: do not do the checksum query in the master-slave database, only query the results in the original percona.checksums table, and output information with inconsistent data. May be used to periodically check for consistency.

-- nocheck-binlog-format: log format is not detected. This option is important for ROW mode replication, because pt-table-checksum will set binlog_format=STATEMENT on Master and Slave (ensure that the slave library will also execute checksum SQL), and the MySQL restriction slave library cannot be set, so if the row copies the slave library, and then copies the new slave library as the master library (A-> B-> C), B's checksums data will not be transferred. (not verified)

-- replicate= specifies the database table in which the checksum calculation results are stored. If it is not specified, the default is percona.checksums.

However, we check that the mysql user generally does not have create table permission, so you may need to create it manually:

CREATE DATABASE IF NOT EXISTS percona CREATE TABLE IF NOT EXISTS percona.checksums (db CHAR (64) NOT NULL, tbl CHAR (64) NOT NULL, chunk INT NOT NULL, chunk_time FLOAT NULL, chunk_index VARCHAR (200) NULL, lower_boundary TEXT NULL, upper_boundary TEXT NULL, this_crc CHAR (40) NOT NULL, this_cnt INT NOT NULL, master_crc CHAR (40) NULL, master_cnt INT NULL, ts TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP, PRIMARY KEY (db Tbl,chunk), INDEX ts_db_tbl (ts,db,tbl)) ENGINE=InnoDB

Database user rights in the production environment are generally strictly managed. If the connection user is repl_user (that is, directly checked by the replication user), it should be given additional SELECT and LOCK TABLES permissions to other libraries. If you want to use pt-table-sync later, you need to write. Write access to the percona library:

GRANT ALL PRIVILEGEES on percona.* to repl_user@'%' IDENTIFIED BY 'repl_pass'; GRANT SELECT,LOCK TABLES,PROCESS,SUPER on *. * to repl_user@'%'

Note:

To reduce unnecessary hassle, make sure that your repl_user@'xxx' users can log in to both the master and slave libraries

The-- create-replicate-table option automatically creates the percona.checksums table, but it also means that additional CREATE TABLE permissions are given to the percona_tk@'xxx' user. Default yes

PROCESS is used to automatically discover slave library information, and SUPER permission is used for set binlog_format.

-- no-check-replication-filters means there is no need to check whether Filter is specified in the Master configuration. It will be checked by default. If Filter is configured, such as replicate_do_db,replicate-wild-ignore-table,binlog_ignore_db, etc., the checksum from the library will exit with an error because the table does not exist, so the official default is yes (--check-replication-filters), but we actually specify-- databases=, in the test, so there is no problem, just don't detect it.

-- empty-replicate-table: before each table checksum starts, clear the detection data before it (does not affect the checksum data of other tables), the default yes. Of course, if you use-- resume to start the test data will not be emptied.

When-- noempty-replicate-table is not empty, the chunk is not calculated, only.

-- databases=,-d: the database to be checked, separated by commas. Know with your toes-- databases-regex regularly matches the database to be tested,-- ignore-databases [- regex] ignores the checked library. Filter option.

-- tables=,-t: the table to check, separated by commas. If the tables you want to check are distributed in different db, you can use the form of-- tables=dbname1.table1,dbnamd2.table2. In the same way, there is tables-regex,--ignore-tables,--ignore-tables-regex. -- the checksum table specified by replicate is always filtered.

-- recursion-method: discover the way from the library. By default, pt-table-checksum can find the slave library replication process in the processlist of the master library to identify which slave libraries, but if you use a non-standard 3306 port, the slave library information will not be found. At this point, host will be used automatically, but you need to configure report_host and report_port information in slave my.cnf in advance, such as:

Report_host = MASTER_HOST report_port = 13306

The ultimate solution is that dsn,dsn specifies a table (such as percona.dsns), and the table row records the connection information of the master library (multiple) slave libraries. Any of the following situations apply:

The master library cannot automatically discover the slave library

Do not want to add additional configuration from the library (because you want to restart)

Master-slave detection connection user information is different

When multiple slaves only want to verify the consistency of the specified slaves

I prefer to use the DSN approach. This dsns table only needs to be accessible on the server where the pt-table-checksum command is executed. Here to correct an understanding, many people on the Internet say that pt-table-checksum should be executed on the master library, in fact, it is not. I have more mysql instances. I only need to install percona-toolkit on a certain server, and this service can access both the master library and the slave library. For specific usage, see the following example.

Check the master-slave consistency between the detection instance and the network segment

Scene:

Standard port 3306, which only checks the key tables of a certain library

One master, one slave, binlog is not ROW mode

For replication of the same IP address range, percona_tk@'192.168.5.%' has the necessary permissions:

GRANT ALL PRIVILEGEES on repl_user.* to repl_user@'192.168.5.%' IDENTIFIED BY 'repl_pass'; GRANT SELECT,LOCK TABLES,PROCESS,SUPER on *. * to repl_user@'192.168.5.%'

This is the easiest way to explain the information you want to connect and check:

# pt-table-checksum hobby master host host nocheck-replication-filters, upright replicated usertitle, pendant replication passbook, Prun3306\-- databases=d_ts_profile-- master-nocheck-replication-filters

If it is run for the first time, the percona.checksums table is automatically created in the main library.

Output result:

Replica lag is 2307 seconds on mysql-5. Waiting. Checksumming d_ts_profile.t_user_account: 54:48 remain TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE 12-18T16:07:48 00 313641 9 0 146.417 d_ts_profile.t_user 12-18T16:08:00 00 397734 12 0 11.747 d_ts_profile.t_user_detail 12-18T16:08:24 00 1668327 20 23.941 d_ts_profile.t_user_group

TS: the timestamp when the check was completed.

ERRORS: the number of errors and warnings that occurred during the check.

DIFFS: inconsistent number of chunk. When you specify-- no-replicate-check, that is, check but do not immediately output the results, it will always be 0; when you specify-- replicate-check-only, you will not check that only crc32 is calculated from the checksums table, and only inconsistent information will be displayed (after all, most of the output should be consistent and easy to cause interference).

ROWS: the number of table rows compared.

CHUNKS: the number of blocks divided into the table.

SKIPPED: the number of blocks skipped due to errors or warnings or too large.

TIME: time of execution.

TABLE: the name of the table being checked

Cross-data center detection using dsn

Scene:

Non-standard port 13306, only check all libraries starting with d_ts

One master and two slaves. Binlog is in ROW mode, of which one slave is on Aliyun ECS. The master database cannot directly access the slave library.

The account used for testing is different because it is not%.

The following is the situation of my environment

MASTER_HOST:13306 main library

REPLICA_HOST:3306 slave library

Server where PTCHECK_HOST pt-table-checksum is located

DSN_DBHOST, the database that records the dsns from the library (connection)

The best way is for dsn to specify the slave library. Install percona-toolkit in the slave library or the host of the same network segment.

Create the DSNs table on the DSN_DBHOST database instance:

Create database percona; CREATE TABLE `percona`.`dsns` (`id` int (11) NOT NULL AUTO_INCREMENT, `dsn` int (11) DEFAULT NULL, `dsn` varchar (255) NOT NULL, PRIMARY KEY (`id`)); GRANT ALL PRIVILEGEES on percona.* to percona_tk@'PTCHECK_HOST' IDENTIFIED BY 'percona_pass'

If you have multiple instances to check, you can create multiple similar dsns tables. The above percona_tk users are only used to access the dsn library. Insert slave library information:

Use percona; insert into dsns (dsn) values ('hackers replicated host Prune3306 reproduced username pendant replicated pass.'

DSNs records dsn column formats such as hashes replicated host host, uplink replicated userjournal pendant replicated pass

Execute the check command on PTCHECK_HOST:

# pt-table-checksum-- replicate=percona.checksums-- nocheck-replication-filters-- no-check-binlog-format\ hype master host host, uplink replicated username, pendant replicate passbook, databases-regex=d_ts.* 13306-- databases-regex=d_ts.*\-- recursion-method dsn=h=DSN_DBHOST,u=percona_tk,p='percona_pass',P=3306,D=percona,t=dsn

I won't say much about the meaning of the option.

If the test is consistent, it is actually for peace of mind, especially when doing data migration. If it is inconsistent, then you need to use the pt-table-sync tool without introduction.

Common mistakes

Diffs cannot be detected because no slaves were found

Cannot find slave library automatically, make sure that processlist or host or dsns mode is used correctly.

Cannot connect to hackers slave1. Com.com.pause.

You can add PTDEBUG=1 before the pt-table-checksum command to see the detailed execution process, such as port, user name, permission error.

Waiting for the-replicate table to replicate to XXX

The problem is that the percona.checksums table does not exist in the slave library, and the root cause is that it is not synchronized from the master library, so check to see if the slave library has a serious delay.

Pausing because Threads_running=25

Repeatedly print out a message similar to the stop check above. This is because the number of threads running in the current database is greater than the default 25ptMurtableMutual checksum paused to reduce the pressure on the library. Wait for the database pressure to pass, or you can directly Ctrl+C the terminal, next time add-- resume to continue execution, or increase-- max-load= value.

Character set problem

Error checksumming table Error executing checksum query: DBD::mysql::st execute failed: Illegal mix of collations12-17T14:48:04 Error checksumming table d_ec_cs.t_online_cs: Error executing checksum query: DBD::mysql::st execute failed: Illegal mix of collations for operation 'concat_ws' [for Statement "REPLACE INTO `percona`.`ali _ checksum` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT? COUNT (*) AS cnt, COALESCE (CONV (BIT_XOR (CAST (CRC32 (CONCAT_WS ('#', `froomcsid`), `froomcorpid`, `f _ valid`, `froomshowname`, `froomonlinemsg`, `froomoffline` _ msg`, `froomshow _ mobile`, `froomgroupid`, `fqqq`, `froomshowqq`, `fharmmsn`, `froomsmsonline`, `fscheme`, `ftel`, `ftelno`, `froomshowtel`, `fcontact`, `fmobile`, `fposition` `fother1`, `fother2`, `froomothertext1`, `froomothertext2`, `froomqemail`, `froomqqfirst`, `froomaids`, `froomaidsqqq`, `froomaidscrmqqq`, `froomaidscrmqqq`, `froomaidsskype`, `froomaidsaliww`, `froomaidsmsn`, `froomaidsalibaba`, `froomaidsalitdera`, CONCAT (ISNULL (`fattenshowname`), ISNULL (`froomaidscrmqq`), `froomaidsskype`, `froomaidsalibaba`, `froomaidsalibaba`) ISNULL (`franksmsonline`), ISNULL (`fharmothertext1`), ISNULL (`fharmothertext2`), ISNULL (ISNULL (`femail`)) AS UNSIGNED), 10,16)), 0) AS crc FROM `droomeccs`.`t _ online_ cs` / * checksum table*/ "with ParamValues: 0qualified profiles, 1categories, 3=undef, 4=undef, 5=undef] at / usr/bin/pt-table-checksum line 10520

It's a bug, which can't be solved for the time being, Illegal mix of collations for operation 'concat_ws'.

Transferred from: https://segmentfault.com/a/1190000004309169

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report