Shulou (Shulou.com), 05/31 report. 2025-04-07 update, from SLTechnology News & Howtos > Database.
What does TiDB use to guarantee the consistency of its backups? This article analyzes and answers that question in detail, in the hope of giving anyone facing it a simple way to understand the answer.
Background
As a MySQL DBA, you will know that MySQL backups, whether logical or physical, rely on FLUSH TABLES WITH READ LOCK (hereinafter FTWRL) to guarantee a consistent backup.
How FTWRL ensures consistency
Take the logical backup tool mysqldump as an example. To guarantee backup consistency, mysqldump needs two options:
--single-transaction --master-data=2
With --single-transaction enabled, mysqldump roughly does the following on the MySQL side:
1. FLUSH TABLES, to close open tables and block DDL operations.
2. FLUSH TABLES WITH READ LOCK, which locks the whole instance so the database is in a consistent state.
3. Set the current session's transaction isolation level to REPEATABLE READ.
4. Record the current binlog position or GTID information.
5. UNLOCK TABLES. # the window from locking to unlocking is very short as long as there is no lock conflict; if there is one, the session enters a lock wait.
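The steps above can be sketched as the statement sequence mysqldump sends to the server (a simplified reconstruction based on the list above, not a verbatim trace):

```sql
FLUSH TABLES;                                              -- step 1: close tables, block DDL
FLUSH TABLES WITH READ LOCK;                               -- step 2: global read lock, consistent state
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;   -- step 3
START TRANSACTION WITH CONSISTENT SNAPSHOT;                -- open the consistent snapshot
SHOW MASTER STATUS;                                        -- step 4: record binlog file/position (or GTID)
UNLOCK TABLES;                                             -- step 5: release the global lock
-- ... the dump itself now reads from the consistent snapshot ...
```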
The physical backup tool xtrabackup holds the FTWRL lock for a comparatively long time. Its locking phase looks like this:
1. Execute the FTWRL lock.
2. Copy the non-InnoDB files (.frm, .MYD, .MYI, etc.).
3. Wait for the redo-log copy to complete.
4. Record the current binlog position or GTID information.
5. Unlock.
xtrabackup holds the lock so that, if the database contains MyISAM tables, their backup consistency can be ensured as far as possible.
# A colleague once concluded that a physical backup holds the FTWRL lock for less time than a logical backup. That is wrong: the physical backup's lock time depends entirely on the number and size of the MyISAM tables in the database.
What does TiDB use to ensure database consistency?
Let's start with mydumper, the logical backup tool officially recommended by TiDB. At first I assumed mydumper also relied on FTWRL for backup consistency; reading the documentation today, I found that assumption was wrong.
The TiDB team has modified and optimized mydumper. The following two points are from the official TiDB documentation:
1. For TiDB, you can set the value of tidb_snapshot to specify the point in time of the backup data, guaranteeing backup consistency without resorting to FLUSH TABLES WITH READ LOCK.
2. TiDB's hidden column `_tidb_rowid` is used to optimize the concurrent export performance of a single table.
Keep in mind that TiDB backs up through tidb_snapshot, not through FTWRL locks. Is anything wrong with this design? Can it still guarantee a consistent backup?
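For reference, tidb_snapshot also accepts a datetime string, so a session can be pointed at a historical moment; the timestamp and table name below are purely illustrative:

```sql
-- read the database as it was at this moment; the session sees that snapshot
SET SESSION tidb_snapshot = '2020-05-31 10:00:00';
SELECT COUNT(*) FROM sbtest1;     -- counts rows as of the snapshot time
-- clear the variable to return to reading the latest data
SET SESSION tidb_snapshot = '';
```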
To answer this question, a brief look at TiDB's architecture.
TiDB's storage node is TiKV, and the discussion below focuses on it. To a first approximation, think of TiKV as one very large key-value store.
(Figure 1 is from the official TiDB documentation.)
That part has nothing to do with backups; it just gives a rough idea of what TiKV stores. What follows is backup-related. TiDB's MVCC (multi-version concurrency control) is implemented in TiKV by appending a version number to each key, so TiKV stores key_version -> value pairs.
I believe the version is the TSO (the globally unique, monotonically increasing timestamp), something I worked out from TiDB's two-phase commit. If it were not, the version information would have to be stored in PD, and that design would add pressure on PD, which seems unrealistic.
From the description above, a small conclusion: TiKV retains historical versions of each key.
Here is a short Q&A to answer the questions above.
Q: What does TiDB use to guarantee data consistency?
A: It relies on the MVCC in TiKV. The backup issues a command based on the current timestamp:
SET SESSION tidb_snapshot = '415599012634951683';
From then on, the session reads the historical version of the data as of that point in time. The next step is simply to scan all the tables and the data in them.
Q: Can a backup taken through MVCC achieve consistency, given that there is no lock?
A: Yes. As explained in my earlier article analyzing TiKV's two-phase commit, a write lands in TiKV only when its transaction commits successfully, and each commit is stamped with a TSO (globally unique, increasing timestamp); only then does the key exist in TiKV. Transactions that commit while the backup is running will therefore not be swept into it, because their commit TSO is greater than the TSO at which the backup was initiated.
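The reasoning in this answer can be illustrated with a toy model (my own sketch, not TiKV's actual implementation): each key keeps every committed version tagged with its commit TSO, and a snapshot read at TSO t returns the newest version with commit TSO <= t, so commits that land after the backup's TSO are invisible to the backup.

```python
class ToyMVCCStore:
    """Toy MVCC key-value store: each key maps to a list of (commit_tso, value)."""

    def __init__(self):
        self.versions = {}   # key -> append-only list of (tso, value)
        self.tso = 0         # globally increasing timestamp counter

    def next_tso(self):
        self.tso += 1
        return self.tso

    def commit(self, key, value):
        # a write becomes visible only once it carries a commit TSO
        self.versions.setdefault(key, []).append((self.next_tso(), value))

    def snapshot_read(self, key, snapshot_tso):
        # return the newest value committed at or before snapshot_tso
        candidates = [(t, v) for t, v in self.versions.get(key, [])
                      if t <= snapshot_tso]
        return max(candidates)[1] if candidates else None


store = ToyMVCCStore()
store.commit("k1", "v1")          # committed before the backup starts
backup_tso = store.next_tso()     # the "backup" fixes its snapshot here
store.commit("k1", "v2")          # committed while the backup is running

print(store.snapshot_read("k1", backup_tso))   # the backup still sees "v1"
print(store.snapshot_read("k1", store.tso))    # a fresh read sees "v2"
```

The key point is the `t <= snapshot_tso` filter: a version committed after the backup's TSO can never satisfy it, which is exactly why a lock-free snapshot scan stays consistent.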
Q: What problems does the MVCC-based backup method have?
A: The biggest one, in my view, is that the old key versions may be garbage-collected (GC) while the backup is still running. The best way around this is to lengthen the GC lifetime:
UPDATE mysql.tidb SET VARIABLE_VALUE = '800h' WHERE VARIABLE_NAME = 'tikv_gc_life_time';
800h is only an example (size it to your backup window), and the value should be changed back after the backup, otherwise storage space is wasted.
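Changing the value back after the backup would look like the following, assuming the cluster was on the TiDB default (10m0s) beforehand:

```sql
-- restore tikv_gc_life_time to the TiDB default once the backup is done
UPDATE mysql.tidb SET VARIABLE_VALUE = '10m0s' WHERE VARIABLE_NAME = 'tikv_gc_life_time';
```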
From the description above, you should now understand the details of how TiDB handles backup consistency.
BR, the distributed backup-and-restore tool in TiDB 4.0, handles this area similarly: consistency is likewise achieved through MVCC.
Finally, a word of recommendation for BR, the TiDB 4.0 backup tool: it is fast and relatively light on resources. The figures below are for reference only; if you are interested, leave a message and I can run a more detailed test.
Machine description: three Tencent Cloud 4C8G nodes with 50 GB SSD; Sysbench test tables with 10 million rows each.
The backup took about 5 minutes overall; the exact start and end timestamps are recorded in the BR log.
In the same environment I tested mydumper, running on the tidb-server node with the default 4 threads.
During the backup it drove tidb-server into OOM.
# This can be avoided with the -r option, which limits how many rows each thread processes at a time.
Most likely my machine spec is simply low, and mydumper was sharing the machine with tidb-server.
That is the answer to the question of what TiDB uses to ensure the consistency of its backups. I hope the content above has been of some help; if you still have questions, keep following this channel to learn more.