
MySQL case study (preliminary recovery): a slave crashing endlessly because of an ALTER statement


Scenario:

Database version at the time of the crash: MySQL 5.7.17. The slave crashed while replaying a particular ALTER statement from the master, and on every restart Crash Recovery triggered the same error, crashing the server again.

Conclusion:

Temporarily switch the read-only service to another read-only instance, and rebuild a fresh slave for the business.

Key point!

The real solution to this problem is not to salvage the broken slave but to rebuild a new one, so the operations recorded in this post are mainly for practice and experimentation. Be very careful about doing this in a production environment.

Background:

This happened in a production environment; some sensitive information has been masked.

First, the key excerpts from the error log:


2017-05-25 14:10:23 0x7f9b5b927700 InnoDB: Assertion failure in thread 140305232983808 in file fsp0fsp.cc line 2108
InnoDB: Failing assertion: frag_n_used > 0
/usr/sbin/mysqld(_Z14fseg_free_stepPhbP5mtr_t+0x36e) [0x11d073e]
/usr/sbin/mysqld(_Z23trx_undo_insert_cleanupP14trx_undo_ptr_tb+0x16c) [0x1100b9c]
/usr/sbin/mysqld(_Z19innobase_commit_lowP5trx_t+0x17) [0xf6a0e7]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEPK14Relay_log_infoPKcm+0x68d) [0xdf4c1d]
/usr/sbin/mysqld(_Z27slave_worker_exec_job_groupP12Slave_workerP14Relay_log_info+0x293) [0xe52353]
/usr/sbin/mysqld(handle_slave_worker+0x363) [0xe34f73]
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (xxxxxxxxx): alter table tb1 add primary key (`col1`, `col2`, `col3`)
Connection ID (thread ID): 129443
Status: NOT_KILLED

2017-05-25T14:10:24.858719+08:00 0 [Note] InnoDB: Ignoring data file './dbname/tb0.ibd' with space ID 375, since the redo log references ./dbname/tb0.ibd with space ID 280.
2017-05-25T14:10:24.860628+08:00 0 [Note] InnoDB: Ignoring data file './dbname/tb1.ibd' with space ID 374, since the redo log references ./dbname/tb1.ibd with space ID 279.
2017-05-25T14:10:44.635655+08:00 0 [Note] InnoDB: Ignoring data file './dbname/#sql-7b4e1_9f3a1.ibd' with space ID 375. Another data file called ./dbname/tb0.ibd exists with the same space ID.
2017-05-25T14:10:39.845947+08:00 0 [Note] InnoDB: Ignoring data file './dbname/#sql-7b4e1_9f3a1.ibd' with space ID 374. Another data file called ./report_gamein/t30741_hs_g47_day.ibd exists with the same space ID.
2017-05-25T14:11:02.972181+08:00 0 [Note] Starting crash recovery...
2017-05-25T14:11:02.972227+08:00 0 [Note] Crash recovery finished.

2017-05-25T14:11:40.547210+08:00 0 [Note] InnoDB: Rollback of trx with id 3377056858 completed
2017-05-25T14:11:40.554385+08:00 0 [Note] InnoDB: Rollback of non-prepared transactions completed

2017-05-25T14:11:03.752024+08:00 0 [Warning] Recovery from master pos 43851595 and file binlog.006933 for channel 'amaster'. Previous relay log pos and relay log file had been set to 43851818, /home/mysql/log/relaylog/relaylog-amaster_3537.020527 respectively.
2017-05-25T14:11:03.817050+08:00 0 [Warning] Recovery from master pos 789680988 and file binlog.027468 for channel 'bmaster'. Previous relay log pos and relay log file had been set to 789681243, /home/mysql/log/relaylog/relaylog-bmaster_3637.020768 respectively.
2017-05-25T14:11:04.353863+08:00 0 [Note] /usr/sbin/mysqld: ready for connections.
Version: '5.7.17-log'  socket: '/home/mysql/data/mysqld.sock'  port: 3306  MySQL Community Server (GPL)

2017-05-25 14:11:04 0x7f10762de700 InnoDB: Assertion failure in thread 139708678924032 in file fsp0fsp.cc line 2108

InnoDB: Failing assertion: frag_n_used > 0

The above is the log from the first occurrence of the problem and from the Crash Recovery after mysql was restarted.

You can see that crash recovery had already completed on that first restart, and the mysqld process was already ready for connections.

But the same problem was triggered immediately afterwards, so mysql crashed again, and mysqld_safe "disappeared" along with it.

In the subsequent restart attempts, a message like this appears each time right before the crash:


2017-05-25T15:25:33.025244+08:00 0 [Note] InnoDB: Cleaning up trx with id 3377057419

Using fsp0fsp.cc line 2108 from the stack trace, we can locate the corresponding code in the source (the failing assertion is marked below):


    fsp_free_page(
        const page_id_t&    page_id,
        const page_size_t&  page_size,
        mtr_t*              mtr)
    ...
        if (state == XDES_FULL_FRAG) {
            /* The fragment was full: move it to another list */
            flst_remove(header + FSP_FULL_FRAG, descr + XDES_FLST_NODE,
                        mtr);
            xdes_set_state(descr, XDES_FREE_FRAG, mtr);
            flst_add_last(header + FSP_FREE_FRAG, descr + XDES_FLST_NODE,
                          mtr);
            mlog_write_ulint(header + FSP_FRAG_N_USED,
                             frag_n_used + FSP_EXTENT_SIZE - 1,
                             MLOG_4BYTES, mtr);
        } else {
            ut_a(frag_n_used > 0);  /* <-- fsp0fsp.cc line 2108, the failing assertion */
            mlog_write_ulint(header + FSP_FRAG_N_USED, frag_n_used - 1,
                             MLOG_4BYTES, mtr);
        }
    ...

Combining the cleanup-phase messages in the log with the InnoDB analysis in Alibaba's MySQL kernel monthly report, we know that while dropping an index the ALTER statement calls this function to reclaim data pages and give space back to the tablespace.

So the cause is essentially confirmed: the ALTER statement crashed this mysql, and every restart failed at exactly the same place because cleanup of the dropped index's data pages never completed; the server retried the cleanup on each restart and hit the same crash.

Since the crash is always triggered after Crash Recovery has finished, we inferred that mysql was crashing in the rollback phase, so innodb_force_recovery = 3 was set, which skips transaction rollback after crash recovery.
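For reference, this is a startup option rather than a dynamic variable, so it has to go into the configuration file before the restart. A minimal sketch (the file path and section layout are illustrative):

    # my.cnf -- fragment only; the actual file location varies per installation
    [mysqld]
    # 3 = run crash recovery, but do not run transaction rollbacks afterwards,
    # which is exactly the phase that keeps hitting the failing assertion
    innodb_force_recovery = 3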

After skipping rollback, the database turned out to run normally ~\(≧▽≦)/~

Things get easier from here: at innodb_force_recovery = 3 InnoDB is not yet forced into read-only mode (that starts at 4), so you can still manipulate tables, e.g. drop them.

In the log of the first crash you can see that the ALTERs on both tb0 and tb1 may be affected, because the intermediate tables generated by ALTER (the #sql-* files) are still sitting in the data directory.

So the simplest way to stop the rollback from re-triggering the ALTER cleanup is to drop these two tables ~\(≧▽≦)/~

_(:з」∠)_ Of course you can't just drop them outright... dump them out first.

However, before actually doing that, check the current state of the database:

The slave's replay of the master's transactions stopped at this point: 8fc6463a-f9b1-11e6-b218-ce0e1b052154:1-2241902370:2241902372-2241902383
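A minimal sketch of how to see this state (output formatting varies; the channel name is taken from the log above):

    -- Transactions this slave has executed; the hole in the interval list
    -- (2241902371 is missing) marks the transaction that never committed
    SELECT @@GLOBAL.gtid_executed\G

    -- Per-channel view: compare Retrieved_Gtid_Set with Executed_Gtid_Set
    SHOW SLAVE STATUS FOR CHANNEL 'amaster'\G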

Looking through the relay log, we can find the missing transaction 2241902371, and it is exactly the ALTER statement.
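One way to dig it out without leaving the mysql client is SHOW RELAYLOG EVENTS; the relay-log file name below is the one from the error log, while the position, row count, and channel are illustrative (running mysqlbinlog on the relay-log file works just as well):

    -- The Gtid event carrying 2241902371 is immediately followed by the
    -- Query event containing the offending ALTER statement
    SHOW RELAYLOG EVENTS
        IN 'relaylog-amaster_3537.020527'
        FROM 4 LIMIT 20
        FOR CHANNEL 'amaster';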

Several pieces of information matter here. Besides the GTID and the SQL text, there is one more: last_committed=91691.

Next, look at the transactions after 2241902383.

From that it is basically confirmed that in the current transaction group, 91692, every transaction except 2241902371 was committed successfully, while the next transaction group, 91693, had not started yet.

Check the problematic tb0 and tb1 again and confirm that the primary key from transaction 2241902371 is absent from the table structure, i.e. the statement was never applied, so the current instance should be in a consistent state (on the premise that Crash Recovery succeeded).
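A quick way to confirm this (database name masked as elsewhere in this post):

    -- If transaction 2241902371 had been applied, the added primary key
    -- (col1, col2, col3) would show up in this output
    SHOW CREATE TABLE dbname.tb1\G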

Now let's get mysql properly back in service.

_(:з」∠)_ Dump both tb0 and tb1 first.
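A minimal sketch of the dump, using the socket path from the log; the output path is illustrative:

    # Dump schema + data of the two suspect tables before dropping them
    mysqldump -S /home/mysql/data/mysqld.sock dbname tb0 tb1 > /tmp/tb0_tb1.sql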

~\(≧▽≦)/~ Then drop them (if you need to re-import and resynchronize later, remember to turn off sql_log_bin so the DROP statements are not written to the binlog).
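A sketch of the drop, with binary logging disabled for this session only:

    -- Session-scoped: keep these drops out of the binlog so they
    -- cannot replicate onward
    SET SESSION sql_log_bin = 0;
    DROP TABLE dbname.tb0;
    DROP TABLE dbname.tb1;
    SET SESSION sql_log_bin = 1;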

(⊙o⊙) Remove innodb_force_recovery from the configuration file and restart mysql.

Done, the database is normal.

After that it's easy: re-import the two tables into the database, turn replication back on, and watch the replication status:
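A sketch of these final steps (the dump path matches the hypothetical one above):

    -- Re-import without writing to the binlog, then resume replication
    SET SESSION sql_log_bin = 0;
    SOURCE /tmp/tb0_tb1.sql
    SET SESSION sql_log_bin = 1;
    START SLAVE;          -- starts all channels
    SHOW SLAVE STATUS\G   -- watch Retrieved_Gtid_Set / Executed_Gtid_Set converge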

You can see that the missing transaction 2241902371 has been fetched again, and that 2241902384 and the transactions after it are being fetched and applied normally.

Once replication has caught up, you can verify whether the data is truly consistent.
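One simple spot check, run on both the master and this slave while the tables are quiescent (a sketch; a dedicated tool such as pt-table-checksum is the more rigorous option):

    -- Equal checksums on master and slave suggest, though do not prove,
    -- that the two tables ended up consistent
    CHECKSUM TABLE dbname.tb0, dbname.tb1;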

PS: since both MTR and Crash Recovery succeeded during the restarts, the replication status is normal, and the GTID transaction numbers remain continuous, my personal take is that the data is most likely consistent ~

PPS: can this instance be handed back to the users for continued use? Handing it back like this is not recommended, but if the users say it works for them, then it works.
