What is the method for MySQL to restore backups 07/13 Update SLTechnology News&Howtos

What is the method for MySQL to restore backups

2025-07-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Servers >

Shulou(Shulou.com)05/31 Report--

This article mainly introduces "what is the method of MySQL restore backup". In daily operation, I believe many people have doubts about what is the method of MySQL restore backup. Xiaobian consulted all kinds of information and sorted out simple and easy operation methods. I hope to help you answer the doubts of "what is the method of MySQL restore backup"! Next, please follow the small series to learn together!

Facebook's MySQL database is spread across our global data centers, and we must be able to recover from failures anywhere at any time. After such a disaster, it is important not only to restore service as quickly and reliably as possible, but also to ensure that no data is lost throughout the process. To do this, we built a system that continuously tested our ability to recover databases from backups.

Our restore system consists of two important components:

Continuous Restore Tier (CRT)-Responsible for scheduling and monitoring all restore operations. This component locates databases containing new backups, creates restore jobs for them, monitors the restore process, and ensures that each backup can be restored successfully.

ORC Reduction Coordinator (ORC)-Consists of a work process (Peon) responsible for performing the reduction and a Load Balancer (Warchief). Warchief receives a new restore job from CRT and assigns it to Peon. Peon hosts the local MySQL instance responsible for performing the actual restore process.

Data CRT collects progress information for each restore job to help us understand the resource requirements of database restore operations, and ORC helps us verify the integrity of backups. This article will focus on the internals of ORC, especially the internal Peon State Machine, and the challenges we encounter and overcome in restoring a single database.

backup overview

Before building a continuous restore process, we first need to understand the nature of the various backup options available. There are currently three main types of backups, all of which are stored in HDFS:

Full logical backups are made every few days using mysqldump.

Differential backups are performed on all days when full backups are not performed. A full dump is created again during the backup process, but only the differences from the previous full backup are stored. We record through metadata which full backup each differential backup is based on.

Binlog backups are continuously streamed from the master database to HDFS.

Both full and differential backups pass the--single-transaction option to mysqldump so we can get a consistent snapshot of the database, either from the Slave or Master instance. Differential and full backups are collectively referred to below as dumps.

Since only one dump is performed per day, Binlog backups ensure that every transaction performed by the database since the backup is recorded. Subsequently, as long as the dump content is restored to restore the database to a certain point in time, and then the transaction is replayed through Binlog, the point-in-time recovery of the database can be successfully realized. All of our database servers use Global Transaction IDs (GTIDs), so we gain an extra layer of control when it comes to transaction replay from Binlog backups.

In addition to storing backups in HDFS, we also write them to offsite locations

ORC: ORC Reduction Coordinator

architecture

ORC consists of three components:

Warchief -Load Balancer. This is a Python program that exposes the Thrift interface, which receives new restore requests and dispatches them to available Peons.

ORC DB -The central MySQL database responsible for maintaining information such as the status of jobs assigned to each Peon, the current status of each job, and the health status of Peons. Warchief uses the information stored in this database to decide which Peon to assign a job to and which Peon to use during failover.

Peon -The work process responsible for the restore operation. Peon is also written in Python, exposing the Thrift interface through which various state information about Peon is received. Each Peon periodically synchronizes with the ORC DB, queries for new assignments assigned to it, and reports on its health status. The server running Peon also runs a local MySQL instance to which backups are restored.

Internal Principle: Peon

Peon contains all the logic for taking a backup from HDFS, loading it into your own local MySQL instance, pushing the instance to a certain point via Binlog replay, and so on. Each restore job processed by Peon goes through the following five steps:

SELECT-Decide which backup needs to be restored from (e.g. full or incremental, HDFS or fallout, etc.).

DOWNLOAD-Downloads the selected file to disk. If you want to restore a full backup, just download a file. For differential backups, full and differential backups need to be downloaded first, then differential backups are applied on top of the full backups, and the final results are stored to disk. Whatever the backup type, we now have the contents of a mysqldump output on disk.

LOAD-loads the downloaded backup into Peon's local MySQL instance. Each table in the backup file is parsed according to the statements associated with it, restoring each table in parallel, a process similar to Percona's Mydumper.

VERIFY-Performs a health check on data loaded into MySQL.

REPLAY-If necessary, download a binary log backup and replay the transaction on top of the restored backup. We use the mysqlbinlog program to filter out Binlog events from other collocated databases and empty transactions, and then replay the desired transactions on the same MySQL instance.

Each step has its own failure status, so if a job fails at the DOWNLOAD step, it displays the DOWNLOAD_FAILED status and does not proceed to the LOAD step.

Binlog Selection Logic

Probably the most challenging part of the restore process is determining which binlogs to download and replay. Full and differential backups can be taken from the master or slave database, however we only create Binlog backups from the master database. This means that simple timestamp comparisons cannot be used to determine which binlogs to replay.

We deployed GTID on all servers in 2014, so each transaction gets a globally unique identifier. In addition, each server running MySQL maintains a gtid_executed (GTID set) variable that serves as a counter to the number of transactions currently executed by the instance.

With GTID, transactions replayed from the master database to the slave database can maintain their own GTID, which also means that we can clearly know whether each transaction is contained in a GTID set. Superset/subset comparisons can also be made for the gtid_executed value of the server, since that value is essentially a strictly incrementing counter and mathematical function.

Together, these techniques allow you to record the gtid_executed value of the server at the time the dump is created, as well as the GTID set contained in each Binlog file, thereby performing consistent comparisons and determining which transactions in the Binlog need to be replayed. In addition, once the first transaction to be replayed is identified, all subsequent transactions to be replayed can be identified without re-comparing other GTIDs. We also use mysqlbinlog's--stop-datetime option to determine where the Binlog stream stops.

At this point, about "MySQL restore backup method is what" learning is over, I hope to solve everyone's doubts. Theory and practice can better match to help everyone learn, go and try it! If you want to continue learning more relevant knowledge, please continue to pay attention to the website, Xiaobian will continue to strive to bring more practical articles for everyone!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.