What is the principle of mysqldump consistent hot backup?

What is the principle behind mysqldump's consistent hot backup? Many people are unsure how it works, so this article walks through the backup flow step by step, combining the SQL that mysqldump sends to the server with the relevant parts of its source code.
First perform a consistent backup with mysqldump:
$ mysqldump -uroot -p --skip-opt --default-character-set=utf8 --single-transaction --master-data=2 --no-autocommit -B D1 > backup.sql
Explanation of key parameters:
--single-transaction: perform a consistent backup.
--master-data=2: save the binlog position at the time of the backup as a comment in the dump result.
-B: specify the database to dump. Here D1 is a database whose tables use the InnoDB storage engine, and it contains a single table T1.
After the command completes, the general log produced by mysqld records the statements that mysqldump sent to the server during the backup.
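If you want to reproduce this trace yourself, the general query log can be switched on before running the backup (an illustrative setup; the general log records every statement, so turn it off again afterwards):

SET GLOBAL log_output = 'FILE';   -- write the log to the file named by general_log_file
SET GLOBAL general_log = 'ON';
-- ... run mysqldump ...
SET GLOBAL general_log = 'OFF';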
The key steps of the flow are summarized below, and each is explained in detail in the sections that follow.
The main execution flow of mysqldump consistent backup
1. Connect to the server.
2. Flush all tables twice, taking a global read lock the second time.
3. Set the isolation level to REPEATABLE READ, start a transaction, and create a consistent snapshot.
4. Get the current binlog position.
5. Unlock all tables.
6. Dump the specified databases and tables.
The sections below walk through these steps in turn, combining the SQL from the general log with the mysqldump source code.
Process analysis
1. Connect to the server
mysqldump first establishes a connection to the server and initializes a number of session-level variables with SET statements.
The corresponding source code in the main function is a call to the connect_to_db function:
if (connect_to_db(current_host, current_user, opt_password)) {
  free_resources();
  exit(EX_MYSQLERR);
}
2. Flush all tables twice, taking a global read lock the second time
After the connection is established, mysqldump immediately flushes the tables twice, taking a read lock on all tables the second time.
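In the general log this appears as the following pair of statements (reconstructed from the do_flush_tables_read_lock source shown below; the /*!40101 LOCAL */ version comment is present when --master-data is given):

FLUSH /*!40101 LOCAL */ TABLES;
FLUSH TABLES WITH READ LOCK;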
The corresponding source code for this part in the main function is:
if ((opt_lock_all_tables || opt_master_data ||
     (opt_single_transaction && flush_logs)) &&
    do_flush_tables_read_lock(mysql))
  goto err;
The actual work is done by the do_flush_tables_read_lock function, but note the preconditions: the flush-and-lock is performed only in three cases:
1. All tables are explicitly required to be locked via the --lock-all-tables option.
2. The --master-data option requires the dump to include the binlog position in the result.
3. A consistent single-transaction backup is requested via --single-transaction together with a log-file flush via --flush-logs.
Except for the first case, which explicitly requests locking, case 3 requires that no other transaction is writing while the log files are flushed, so it is natural to take a read lock on all tables. Case 2 requires that the binlog position at the time of the dump be recorded accurately in the result; to read a stable binlog position, all tables must be locked against writes so that no parallel transaction can advance the binlog. Hence the flush-and-read-lock step.
There is a detail here: we know the --single-transaction option performs a consistent backup, so why is there no flush-and-lock when only --single-transaction is given? The reason is that the consistency guaranteed by --single-transaction relies on a transactional storage engine such as InnoDB. As described later, mysqldump executes START TRANSACTION WITH CONSISTENT SNAPSHOT to create a snapshot of the database and a new transaction id; all updates made by transactions after that one are filtered out, which keeps the backup consistent. The advantage of this approach is that it does not interfere with the normal operation of other transactions during the backup, hence the name "hot backup"; the drawback is the dependency on a transactional storage engine. For tables using MyISAM or other engines that do not support transactions, --single-transaction cannot guarantee data consistency.
Then look at the source code of the do_flush_tables_read_lock function:
static int do_flush_tables_read_lock(MYSQL *mysql_con) {
  return (mysql_query_with_error_report(
              mysql_con, 0,
              (opt_master_data != 0) ? "FLUSH /*!40101 LOCAL */ TABLES"
                                     : "FLUSH TABLES") ||
          mysql_query_with_error_report(mysql_con, 0,
                                        "FLUSH TABLES WITH READ LOCK"));
}
The logic is simple: two queries are sent to the server in order, FLUSH TABLES and then FLUSH TABLES WITH READ LOCK. The latter is the core action; the preceding FLUSH TABLES exists for performance, to minimize the impact of the read lock on other transactions.
3. Set the isolation level to REPEATABLE READ, start a transaction, and create a snapshot
After flushing the tables, mysqldump starts a new transaction and creates a consistent snapshot; the corresponding SQL is visible in the start_transaction source below.
The corresponding source code for this part in the main function is:
if (opt_single_transaction && start_transaction(mysql)) goto err;
As you can see, this step is performed only when the --single-transaction option is specified. This step is in fact the basis of mysqldump's consistent hot backup. Here is the source of the start_transaction function:
static int start_transaction(MYSQL *mysql_con) {
  /* some non-critical code and comments omitted */
  return (mysql_query_with_error_report(
              mysql_con, 0,
              "SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ") ||
          mysql_query_with_error_report(
              mysql_con, 0,
              "START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */"));
}
The core action is the two queries sent to the server: first, SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ ensures the current session's isolation level is REPEATABLE READ; then START TRANSACTION /*!40100 WITH CONSISTENT SNAPSHOT */ starts a new transaction, generates a new transaction id, and creates the snapshot on which all data read during the dump is based. All updates made by later transactions are filtered out, which guarantees the data consistency of the backup.
However, this hot backup method relies on a storage engine that supports transactions, such as InnoDB; for a non-transactional engine such as MyISAM, data consistency during the backup cannot be guaranteed.
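A minimal two-session sketch of the snapshot behavior (table contents and counts are illustrative):

-- Session A plays the role of mysqldump
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION WITH CONSISTENT SNAPSHOT;
SELECT COUNT(*) FROM D1.T1;   -- suppose this returns 100

-- Session B, a concurrent writer, commits in the meantime
INSERT INTO D1.T1 VALUES (101, 'new row');   -- hypothetical columns

-- Session A still reads from its snapshot
SELECT COUNT(*) FROM D1.T1;   -- still 100 for an InnoDB table
COMMIT;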
4. Get the current binlog position
mysqldump then executes SHOW MASTER STATUS to obtain the current binlog position:
Looking at the corresponding part of the main function, the current binlog position is obtained and recorded only when the --master-data option is specified:
if (opt_master_data && do_show_master_status(mysql)) goto err;
Looking at the implementation of the do_show_master_status function, the core action is sending SHOW MASTER STATUS to the server and writing the resulting binlog position into the dump result.
static int do_show_master_status(MYSQL *mysql_con) {
  MYSQL_ROW row;
  MYSQL_RES *master;
  const char *comment_prefix =
      (opt_master_data == MYSQL_OPT_MASTER_DATA_COMMENTED_SQL) ? "-- " : "";
  if (mysql_query_with_error_report(mysql_con, &master, "SHOW MASTER STATUS")) {
    return 1;
  } else {
    row = mysql_fetch_row(master);
    if (row && row[0] && row[1]) {
      print_comment(md_result_file, 0,
                    "\n--\n-- Position to start replication or point-in-time "
                    "recovery from\n--\n\n");
      /* write the binlog position into the dump result */
      fprintf(md_result_file,
              "%sCHANGE MASTER TO MASTER_LOG_FILE='%s', MASTER_LOG_POS=%s;\n",
              comment_prefix, row[0], row[1]);
      check_io(md_result_file);
    }
    /* ... */
  }
  return 0;
}
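For illustration, the exchange looks roughly like this (file name and position are made up):

mysql> SHOW MASTER STATUS;
+---------------+----------+--------------+------------------+-------------------+
| File          | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+---------------+----------+--------------+------------------+-------------------+
| binlog.000042 |     1570 |              |                  |                   |
+---------------+----------+--------------+------------------+-------------------+

-- and with --master-data=2 the dump result then contains the position as a comment:
-- CHANGE MASTER TO MASTER_LOG_FILE='binlog.000042', MASTER_LOG_POS=1570;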
5. Unlock all tables
Before the actual dump begins, mysqldump unlocks the tables that may have been locked in the earlier steps:
Check the corresponding code in the main function:
if (opt_single_transaction &&
    do_unlock_tables(mysql)) /* unlock but no commit! */
  goto err;
As you can see, all previously locked tables are unlocked only when the --single-transaction option is specified. Combined with the earlier analysis, the backup under --single-transaction already guarantees data consistency through the transaction snapshot, so there is no need to keep the read lock on all tables; unlocking here avoids blocking the progress of other transactions.
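For reference, do_unlock_tables is a thin wrapper that sends a single statement to the server (a sketch, mirroring the style of the other helpers shown above):

static int do_unlock_tables(MYSQL *mysql_con) {
  return mysql_query_with_error_report(mysql_con, 0, "UNLOCK TABLES");
}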
6. Dump the specified databases and tables
With the preparation complete, mysqldump formally begins dumping the selected databases and tables:
The actual dump of the specified databases is performed by the dump_databases function (or by dump_all_databases when --all-databases requests a dump of every database).
View the implementation of the dump_databases function:
static int dump_databases(char **db_names) {
  int result = 0;
  char **db;
  DBUG_TRACE;
  for (db = db_names; *db; db++) {
    if (is_infoschema_db(*db))
      die(EX_USAGE, "Dumping \'%s\' DB content is not supported", *db);
    if (dump_all_tables_in_db(*db)) result = 1;
  }
  if (!result && seen_views) {
    for (db = db_names; *db; db++) {
      if (dump_all_views_in_db(*db)) result = 1;
    }
  }
  return result;
} /* dump_databases */
The logic is clear: first dump all the tables in each specified database, then dump the corresponding views if any exist. Our investigation focuses on the table dump.
The per-table dump logic is also straightforward: first obtain the table's structure and generate its CREATE statement, then read every row of the table and generate the corresponding INSERT statements.
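Per table, the general log therefore contains entries roughly like the following (a sketch; the exact statements and version comments vary across MySQL versions):

SHOW CREATE TABLE T1;
-- ... column metadata lookups ...
SELECT * FROM T1;   -- the returned rows become INSERT statements in the dump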
However, one noteworthy point in the general log is the appearance of SAVEPOINT, which is not present in the mysqldump of MySQL 5.5. Looking at the implementation of the dump_all_tables_in_db function, you can find the corresponding savepoint code:
/* create savepoint */
if (opt_single_transaction && mysql_get_server_version(mysql) >= 50500) {
  verbose_msg("-- Setting savepoint...\n");
  if (mysql_query_with_error_report(mysql, 0, "SAVEPOINT sp")) return 1;
}
while ((table = getTableName(0))) {
  char *end = my_stpcpy(afterdot, table);
  if (include_table(hash_key, end - hash_key)) {
    dump_table(table, database); /* dump one table */
    /* ... part of the code omitted ... */
    /*
      ROLLBACK TO SAVEPOINT in --single-transaction mode to release metadata
      lock on table which was already dumped. This allows to avoid blocking
      concurrent DDL on this table without sacrificing correctness, as we
      won't access table second time and dumps created by --single-transaction
      mode have validity point at the start of transaction anyway.
      Note that this doesn't make --single-transaction mode with concurrent
      DDL safe in general case. It just improves situation for people for
      whom it might be working.
    */
    if (opt_single_transaction && mysql_get_server_version(mysql) >= 50500) {
      verbose_msg("-- Rolling back to savepoint sp...\n");
      if (mysql_query_with_error_report(mysql, 0, "ROLLBACK TO SAVEPOINT sp"))
        maybe_exit(EX_MYSQLERR);
    }
  }
}
You can see that the savepoint is created before the tables are dumped; mysqldump then iterates over every table in the database, and after each table is dumped it executes ROLLBACK TO SAVEPOINT sp. Why? The comment in the code above explains it clearly.
Put simply, once a table has been dumped it will not be accessed again, so DDL from other transactions can no longer affect the correctness of the dumped data. The point of the savepoint is this: before table A is dumped, the savepoint records a state in which the transaction holds no metadata lock (MDL) on table A. While table A is being dumped, the SELECT statements acquire an MDL on it, which prevents concurrent DDL from changing the table structure and corrupting the reads. Once the dump of table A completes, the table is never touched again, so the unreleased MDL serves no purpose except to block other transactions' DDL on table A.
MySQL's solution is to record a savepoint with SAVEPOINT sp before accessing table A, and to return to that state with ROLLBACK TO SAVEPOINT sp after table A is dumped. This releases the MDL taken on table A and unblocks other transactions' DDL on the table.
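In the general log this produces a per-table pattern like the following (sketch):

SAVEPOINT sp;
-- dump table T1: SHOW CREATE TABLE, SELECT ...
ROLLBACK TO SAVEPOINT sp;   -- releases T1's MDL, unblocking concurrent DDL on T1
-- dump table T2: ...
ROLLBACK TO SAVEPOINT sp;   -- releases T2's MDL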
Summary
The above is an introduction to mysqldump's consistent backup principle based on MySQL 8.0. Compared with MySQL 5.5, the implementation in MySQL 8.0 has some improvements: besides the savepoint mechanism described above, there are others, such as GTID support and the dumping of column statistics, which this article does not cover. On the whole, though, the principle behind mysqldump's consistent backup has not changed much.
Further reading: Percona's implementation
Since MySQL first appeared and grew popular, many other excellent distributions have emerged, and MySQL's own implementation of consistent backup is not perfect, so it is worth looking at how another distribution handles it.
Backup Lock
As mentioned earlier, the consistent backup implemented by the --single-transaction option in mysqldump does not need to lock tables, but this relies on a transactional storage engine: only InnoDB tables (or tables of another transactional engine) are guaranteed to filter out updates from parallel transactions during the backup. For tables using MyISAM or another non-transactional engine, --single-transaction cannot guarantee consistency; if a parallel transaction updates such a table during the backup, the update may well end up in the backup.
So what if you want to back up MyISAM tables and still guarantee their consistency? One way is to pass the --lock-all-tables option to mysqldump, which causes a FLUSH TABLES WITH READ LOCK statement to be executed before the dump and keeps the read lock on all tables for the entire dump. But this is clearly overkill: to keep a few non-transactional tables consistent, every table must be locked, so all writes to the server are blocked for the duration (with a large backup this can be disastrous).
MySQL 8.0 offers no good solution to this problem, but Percona does: the mysqldump in the Percona Server distribution accepts a --lock-for-backup option, which makes mysqldump execute a LOCK TABLES FOR BACKUP statement, unique to Percona Server, before the dump. It mainly does the following:
Blocks update operations on MyISAM, MEMORY, CSV, and ARCHIVE tables.
Blocks DDL operations on any table.
Does not block update operations on temporary tables or log tables.
With these properties, passing both --lock-for-backup and --single-transaction lets mysqldump guarantee the data consistency of all tables while disturbing the online business as little as possible.
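A sketch of the invocation, assuming Percona Server's mysqldump:

$ mysqldump -uroot -p --single-transaction --lock-for-backup -B D1 > backup.sql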
This logic can be found in the mysqldump code of Percona Server 8.0, in the main function:
if (opt_lock_all_tables ||
    (opt_master_data &&
     (!has_consistent_binlog_pos || !has_consistent_gtid_executed)) ||
    (opt_single_transaction && flush_logs)) {
  if (do_flush_tables_read_lock(mysql)) goto err;
  ftwrl_done = true;
} else if (opt_lock_for_backup && do_lock_tables_for_backup(mysql))
  goto err;
Careful readers will notice that this is a rewrite of the flush-and-read-lock logic shown earlier: an else if branch has been added that replaces the FLUSH TABLES; FLUSH TABLES WITH READ LOCK sequence. The main purpose is to cooperate better with --single-transaction consistent backups and to block the online business as little as possible.
Looking at the implementation of do_lock_tables_for_backup, it simply sends the Percona-specific LOCK TABLES FOR BACKUP statement to the server:
static int do_lock_tables_for_backup(MYSQL *mysql_con) noexcept {
  return mysql_query_with_error_report(mysql_con, 0, "LOCK TABLES FOR BACKUP");
}
Binlog Snapshot
In the MySQL 8.0 implementation, there is one more common option that still triggers the troublesome FLUSH TABLES WITH READ LOCK: the --master-data option.
As mentioned earlier, --master-data requires the binlog position at the start of the backup to be stored in the dump result, and to obtain a consistent binlog position, read locks must be taken on all tables before SHOW MASTER STATUS executes, blocking all binlog commit events; hence the FLUSH TABLES WITH READ LOCK. Is there a better way? Percona offers its own solution here as well.
Percona Server adds two global status variables, Binlog_snapshot_file and Binlog_snapshot_pos, which record the current binlog file and position; their values can be read with SHOW STATUS LIKE 'binlog_snapshot_%'. So what is the difference between this method and SHOW MASTER STATUS?
The difference is that Binlog_snapshot_file and Binlog_snapshot_pos are transactional. As long as a new transaction and consistent snapshot are created with START TRANSACTION WITH CONSISTENT SNAPSHOT before SHOW STATUS LIKE 'binlog_snapshot_%' executes, the two variables report the binlog file and position as of the start of the transaction, so the consistency of the binlog information is guaranteed. The whole process requires no FLUSH TABLES WITH READ LOCK.
SHOW MASTER STATUS, by contrast, is not transactional; each execution returns the latest binlog position, which is why all tables must be read-locked before it runs.
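A sketch of the sequence, assuming Percona Server (values illustrative):

START TRANSACTION WITH CONSISTENT SNAPSHOT;
SHOW STATUS LIKE 'binlog_snapshot_%';
-- Variable_name          Value
-- Binlog_snapshot_file   binlog.000042
-- Binlog_snapshot_pos    1570
-- No FLUSH TABLES WITH READ LOCK is needed: the values are pinned to the
-- snapshot taken at the start of this transaction.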