2025-04-05 Update · From: SLTechnology News & Howtos (shulou.com) · Database
An incident on an online business system during its final internal testing phase.
Background: MySQL-5.7.12
Symptom:
After an alarm was received, heartbeat detection against business master database A failed, and standby database B was promoted to become the business master.
Root cause:
The data-file directory on business master database A had 0% free disk space.
How the problem was handled:
Once the disk was full, DML statements could no longer flush changes to disk, which made the business master unavailable.
The fix was therefore simple: after freeing some space, PURGE BINARY LOGS was used to remove binlogs that had already been backed up.
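The cleanup can be sketched as follows; the binlog file name is hypothetical and should be replaced with the oldest binlog that has already been backed up:

```sql
-- Check which binlogs exist and which file the slave still needs
SHOW BINARY LOGS;
SHOW SLAVE STATUS\G  -- check Relay_Master_Log_File before purging

-- Remove all binlogs older than the named file (hypothetical name)
PURGE BINARY LOGS TO 'mysql-bin.000123';
```

Purging only files the slave has already fetched (per Relay_Master_Log_File) avoids breaking replication a second time.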
However, during the cleanup a new problem appeared: replication on the slave was reporting errors.
Checking the MySQL error log showed the relevant messages (the original screenshot is not reproduced here).
At the same time, neither START SLAVE nor CHANGE MASTER could complete, and similar error messages kept being written to the error log.
Since business master A was demoted only after its disk filled up, we could confirm that the business operations performed on standby B had not been executed on A, so there would be no consistency problems between the two databases.
So I chose RESET SLAVE ALL plus CHANGE MASTER, which restored replication.
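A minimal sketch of that recovery; host, credentials, and binlog coordinates below are placeholders, not values from the incident:

```sql
-- Discard all existing replication configuration and relay logs
STOP SLAVE;
RESET SLAVE ALL;

-- Re-point replication at the new master B (all values hypothetical)
CHANGE MASTER TO
  MASTER_HOST = '10.0.0.2',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '***',
  MASTER_LOG_FILE = 'mysql-bin.000001',
  MASTER_LOG_POS  = 4;

START SLAVE;
```

RESET SLAVE ALL, unlike plain RESET SLAVE, also forgets the connection parameters, giving a clean slate for the new CHANGE MASTER.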
With the incident report written, a closer look at what causes this behavior:
A matching bug report was found for MySQL 5.6:
http://bugs.mysql.com/bug.php?id=77496
and another for 5.7.12:
https://bugs.mysql.com/bug.php?id=80102
As mentioned in the bug comments, the problem occurs when relay_log_recovery=ON and slave_parallel_type=LOGICAL_CLOCK are both set.
The affected instance happened to use exactly this configuration.
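For reference, a my.cnf fragment with the combination of settings that triggers the bug; the worker count is illustrative:

```ini
[mysqld]
relay_log_recovery     = ON
slave_parallel_type    = LOGICAL_CLOCK
slave_parallel_workers = 8    # any value > 0 enables multi-threaded replication
```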
Why the error occurs:
Building on the Group Commit feature introduced in 5.6 and the Multi-Threaded Slave (MTS) implementation in 5.7, multiple worker threads apply the same group of transactions from the relay log concurrently.
As a result, if a multi-threaded replication slave stops unexpectedly, the exact set of applied transactions cannot be confirmed (there may be gaps), so when relay_log_recovery is enabled, error messages like those described above appear.
Officially recommended recovery steps:
1. Set relay_log_recovery=0
2. Start the slave with a special command: START SLave UNTIL SQL_AFTER_MTS_GAPS
3. Set relay_log_recovery=1
One very important point: relay_log_recovery is not a dynamic parameter, so changing it requires restarting the database instance.
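The steps above can be sketched as follows; the configuration edits are shown as comments because relay_log_recovery cannot be changed with SET GLOBAL:

```sql
-- 1. In my.cnf set relay_log_recovery = 0, then restart mysqld.

-- 2. Let the worker threads fill any transaction gaps, then stop cleanly:
START SLAVE UNTIL SQL_AFTER_MTS_GAPS;
-- wait until the SQL thread stops on its own; monitor with:
SHOW SLAVE STATUS\G

-- 3. In my.cnf set relay_log_recovery = 1 again, restart mysqld,
--    and resume normal replication:
START SLAVE;
```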
This problem was fixed in 5.7.13, where the whole procedure is performed automatically on restart.
Although the scenario differs from the one described in the bug reports and the official documentation, the behavior above should have the same root cause.
Fortunately, we were able to confirm that there were no transaction inconsistencies on the multi-threaded replication slave on A, so we took the blunt approach of clearing the slave information and resynchronizing.
PS: GTID is a great help here.