What is the MySQL master's inconsistent repair process? 07/11 Update SLTechnology News&Howtos

What is the MySQL master's inconsistent repair process?

2025-07-11 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

What this article shares with you is about the repair process of MySQL inconsistency. The editor thinks it is very practical, so I share it with you. I hope you can get something after reading this article. Without saying much, let's take a look at it.

Yesterday, it was found that a 5.7 MySQL slave library reported an error while applying the log. Parallel replication was enabled from the library. The content of Last Error is:

Last_Error: Coordinator stopped because there were error (s) in the worker (s). The most recent failure being: Worker 0 failed executing transaction '8fc8d9ac mura62bMel 11e6Mua3eeMura4badb1b4a00Rose 7649' at master log mysql-bin.000011, end_log_pos 5290535. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.

It still seems unfamiliar to this kind of problem, if you want to see some detailed information, you can see some in binlog. The relay log here is teststd-relay-bin.000013

/ usr/local/mysql/bin/mysqlbinlog-no-defaults-base64-output=DECODE-ROWS-verbose teststd-relay-bin.000013 > / tmp/mysqlbin.log

The repair method is slightly different from the conventional one.

STOP SLAVE

SET @ @ SESSION.GTID_NEXT = '8fc8d9aclya62bmur11e6Mua3eeMura4badb1b4a007649'

BEGIN; COMMIT

SET @ @ SESSION.GTID_NEXT = AUTOMATIC

START SLAVE

Then apply it again, but I find that the problems encountered in my list seem to be a little more troublesome than I thought. You can see from the error log that an error was thrown when the table sys _ user_audit in the backend database was modified.

2016-11-29T00:03:58.754386+08:00 [Note] Slave SQL thread for channel''initialized, starting replication in log' mysql-bin.000011' at position 5290028, relay log'. / teststd-relay-bin.000013' position: 27175

2016-11-29T00:03:58.754987+08:00 162[ ERROR] Slave SQL for channel': Worker 0 failed executing transaction '8fc8d9acmura62bmur11e6lica3eemura4badb1b4a00Vera 7649' at master log mysql-bin.000011, end_log_pos 5290535; Could not execute Update_rows event on table backend.sys_user_audit; Can't find record in' sys_user_audit', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 5290535, Error_code: 1032

After skipping manually several times, it is found that this is not a problem, if there are more such problems, you can directly modify the parameter slave_exec_mode to complete.

Set global slave_exec_mode=IDEMPOTENT

Of course, this way to solve the current problem is more appropriate, to keep up with the changes in the main library, reset to the original value.

Set global slave_exec_mode=STRICT; soon became normal from the state of the library, but there was a new problem. Why are the data in the master-slave database inconsistent? And more directly, I made a comparison between the master and slave of this table, and found that the data was inconsistent, and the slave database contained 9 fewer pieces of data than the master database. In this way, this slave library is not qualified.

One direct way to repair the data is to rebuild the slave library, but this is not a good solution. Is there any other solution? using navicator is also a good one, which can be done with a little bit of graphical interface. Another option is to use the pt tool to fix it.

I've heard about it for a long time, and I finally felt it today.

First of all, the installation is very regular, you can refer to my previous article. The installation and configuration of Percona-toolkit (R8 note day 86) is actually a download and decompression, a basic installation.

Create a temporary user in the master-slave library as a synchronization user, first do checksum, and then repair the data according to checksum, which involves two command-line tools, pt-table-checksum and pt-table-sync. Of course, there are many options for these two tools, and I only do some basic operations.

The way to create users is as follows. You need to compare the database of master-slave checksum to backend.

GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE ON *. * TO 'pt_checksum'@'10.127.%.%' IDENTIFIED BY' pt_checksum'

The tempdb created is percona, and the appropriate permissions need to be given.

Grant all on percona.* to 'pt_checksum'@'10.127.%.%'

The process of checksum is actually very complicated, there are generally the following steps, of course, we can simplify, achieve the goal and then delve into it.

Start doing checksum on the main library, if you encounter the following error.

# pt-table-checksum hobbies 10.127.128.99 mentions ubiquitous ptents checksummpons pamphlets ptacks checksummpons 3306-d backend-- nocheck-replication-filters-- replicate=percona.checksums

Replica teststd.test.com has binlog_format ROW which could cause pt-table-checksum to break replication. Please read "Replicas using row-based replication" in the LIMITATIONS section of the tool's documentation. If you understand the risks, specify-- no-check-binlog-format to disable this check.

The exact meaning of this option will be discussed later, there will be such a warning in row mode, you can ignore this check.

[root@testdb2 bin] # pt-table-checksum hobbies-- 10.127.128.99-- nocheck-replication-filters-- replicate=percona.checksums-- no-check-binlog-format

TS ERRORS DIFFS ROWS CHUNKS SKIPPED TIME TABLE

11-29T17:45:34 0 0105 10 0.017 backend.sys_resource

11-29T17:45:34 0 0 17 1 0 0.015 backend.sys_role

11-29T17:45:34 01 99 1 0 0.017 backend.sys_user

11-29T17:45:34 01 172 1 0 0.017 backend.sys_user_audit

After completion, a table will be generated under percona. The data in it is just some metadata for comparison. If there is a difference, there will be a diffs field marked.

If it is confirmed, you can start to repair the data, with the help of pt-table-sync, do not execute the SQL output, and enter the information of the master library and slave library correctly.

Pt-table-sync-- print-- replicate=percona.checksums hobbies 10.127.128.99 for upright ptacks checksummento pamphlet checksummpons 3306 hats 10.127.130.58 for upright ptweets checksummpons checksumpons 3306

And the principle of this operation is actually replace into.

REPLACE INTO `backend`.`sys _ user` (`id`, `user_ name`, xxxx) VALUES ('100th,' songlijiao@test-inc.com', 'songlijiao', xxxxx) / * percona-toolkit src_db:backend src_tbl:sys_user src_dsn:P=3306,h=10.127.128.99,p=...,u=pt_checksum dst_db:backend dst_tbl:sys_user dst_dsn:P=3306,h=teststd.test.com,p=... U=pt_checksum lock:1 transaction:1 changing_src:percona.checksums replicate:percona.checksums bidirectional:0 pid:28684 user:root host:testdb2.test.com*/

Remember to pay attention to permissions, and the user who synchronizes the data should have permission to operate on the target database.

Grant insert,delete,update,select on backend.* to 'pt_checksum'@'10.127.%.%'

This process does not last long and can be executed quickly, and it is perfectly normal to do checksum again after repair.

This is what the MySQL master's inconsistent repair process is like, and the editor believes that there are some knowledge points that we may see or use in our daily work. I hope you can learn more from this article. For more details, please follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.