Analysis of crash-safe replication and matters needing attention of master and slave 04/21 Update SLTechnology News&Howtos

Analysis of crash-safe replication and matters needing attention of master and slave

2025-04-21 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

I encountered a failure earlier, so crash-safe replication was introduced, which is described in detail below.

Fault description:

The slave library (slave side) is down. After rebooting, mysql synchronously discovers that there is a data primary key conflict.

Fault analysis:

After the downtime of the library, the mysql service from the library was exited abnormally, and some of the data may not be refreshed to disk, but in this case, some data has been flushed to disk. After restarting, the slave library pulled this part of the data from the master database again and applied it again, resulting in data conflict. So the next step is to analyze some parameters related to master and slave.

First of all, talk about the synchronization between master and slave.

The so-called master-slave synchronization is the process that the master library takes the binlog of the master library from the library after generating the binlog, and then applies the data to the slave library.

Synchronization is divided into semi-synchronous and asynchronous synchronization. This case is asynchronous, not to mention semi-synchronous. Asynchronous synchronization refers to the process of getting the binlog of the master library from the library to the slave machine after the binlog is generated by the master library.

Then here we will focus on a few nouns:

IO_thread

SQL_thread

Master-info.log

Relay-info.log

IO_tread and SQL_thread, students who often build master and slave, know that these are the two key threads used for synchronization from above the library (can be seen by show processlist)

Master-info.log and relay-info.log record some information such as location and status of synchronization from the library.

Let's take a look at the relationship between these nouns:

Start from the main library to generate binlog, fetch the binlog from the library to the local machine. After the transfer is completed, IO_thread will record the file name and transaction location (postion and gtid) of the pulled binlog to master-info.log. At the same time, SQL_thread will apply the pulled binlog locally. After the application is completed, the completed binlog file name and transaction location (postion and gtid) will be written to relay-info.log.

This is the whole synchronization process. (note: the binlog-related information recorded in these two files is the information of the master library, and the binlog of the slave library will not record these two files.)

There are two points to note in this process:

1. In the default configuration, the master-info.log and relay-info.log file system level completes the write operation. In other words, when mysql commits the change information to OS's cache, it is up to OS to decide when to swipe the disk.

two。 If there is an outage from the library, the location information of master-info.log will be read after restart. If the relay_log_recovery parameter is enabled after 5.6, the location of relay-info.log will be read instead of master-info.log.

Then combined with this case, we know very clearly what the problem is.

After 1.IO_thread has pulled the data, the change information of master_info is in the OS cache, and the file has not been changed. At the same time, SQL_thread has finished applying the data, so it is down at this time. As a result, the master-info.log is not updated, but the database has already written the data. After reboot, IO_thread pulls the duplicate data based on the unupdated master-info.log.

two。 After relay_log_recovery is enabled, the relay-info.log file is read after restart from the library, similar to the above, it is not updated to relay-info.log, and then reapplied.

The 3.sync_relay_log and master_relay_log parameters are 1, in which case the last transaction is lost. A transaction conflict that leads to the last synchronization.

Solution:

To check whether the conflicting data is consistent, either delete the data from the library and reapply it from the library, or skip the error and continue to synchronize.

Summary:

Master-slave asynchronous synchronization is used in many scenarios, and master-slave non-synchronization often occurs. Fortunately, mysql has been constantly improving and modifying. To mysql5.7, the master-slave architecture has been quite mature. You have to believe that many seniors have stepped on the pit you stepped on many times, so don't worry about using it boldly. In order to reduce the number of pitfalls, I would like to summarize some parameter settings and some pits for using asynchronous synchronization.

Main library: 1.binlog_format=row can choose row format, do not consider the other two, there are too many specific pits.

2. Don't ask me why. Security is the only standard of the database.

3.log_bin_trust_function_creators = 1, turn this on, spare, in case the binlog line mode is changed by that jerk.

4.binlog_rows_query_log_events = 1 this is very useful, record sql,row format, to see binlog is not pretty. With this parameter, leverage.

Slave library: master-info-repository=table

Change relay-log-info-repository=table to table. SQL_thread will put the change relay-log-info into the same transaction of the application transaction, thus realizing the atomicity of transaction and log modification.

Relay_log_recovery = 1 reads the relay-info information, and deletes the unexecuted relay-log and pulls it again.

Sync_relay_log_info=1 this, if relay-info is table, yes or no. If it's file, add insurance.

Sync_master_info=1 can guarantee the consistency of master-info.

Crash-safe, whether it is a master library or a slave library, is very painful. The above is mainly about some parameters of crash-safe replication. In short, under the premise of ensuring safety, go to improve efficiency. If you think you can sacrifice safety to ensure efficiency, ignore it.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.