Analysis and treatment of Master-Slave non-synchronization in MySQL

2025-07-09 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/01 Report--

Previously, we deployed a MySQL master-slave replication environment (MySQL master-slave replication environment deployment [http://blog.itpub.net/31015730/viewspace-2153251/]) and summarized the principles and related knowledge of MySQL master-slave replication (MySQL master-slave replication principle and necessary knowledge summary [http://blog.itpub.net/31015730/viewspace-2154408/]). In practice, however, many problems can arise during master-slave synchronization and leave the data out of sync. There are two main headaches:

1. How to deal with master-slave data when they are out of sync?

2. How to solve the problem of master-slave synchronization delay?

The following contents sort out several possible problems in master-slave synchronization and their solutions:

(1) The slave is too slow to keep up with the master, that is, MySQL master-slave synchronization delay

Slave latency is very common in MySQL. MySQL replication allows SELECT operations on the slave, but in a real production environment slave delay makes it difficult to move read operations to the slaves. This leads to unwritten rules such as "reads with low real-time requirements can go to the slave, reads with high real-time requirements must go to the master" and "the slave can only serve statistical queries on the previous day's data."

Slave lag means that the slave cannot execute all events from the master quickly enough, so updates to the slave's data are delayed.

In MySQL's master-slave architecture, the master typically handles write, update and delete operations, while the slaves handle select operations. There are many possible reasons for slave lag.

The principle of slave synchronization delay

MySQL master-slave replication is single-threaded. The master writes all the logs generated by DDL and DML into the binlog. Because the binlog is written sequentially, this is very efficient.

The slave's IO thread reads the log from the master's binlog.

The slave's SQL thread replays the master's DDL and DML events on the slave. The IO of DML and DDL is random rather than sequential, so it is much more expensive.

Because the SQL thread is also single-threaded, if other queries on the slave cause lock contention, or if one DML statement (a large transaction or large query) is stuck for a few minutes, all subsequent DML must wait for it to finish before continuing, which results in a delay. Some people may wonder: the same statement also takes a few minutes on the master, so why is only the slave delayed? The reason is that the master can execute statements concurrently, while the slave's SQL thread (the one reported by Slave_SQL_Running) cannot.

Possible causes of slave synchronization delay

1--the slave's I/O thread is slow to read events from the log; the most common reason, however, is that the slave executes all transactions in a single thread, while the master has many threads that can execute transactions in parallel.

2--long-running queries caused by inefficient joins, I/O limits on disk reads, lock contention, InnoDB thread synchronization, and so on.

3--Master load

4--Network delay

5--Machine configuration (CPU, memory, hard disk)

How does master-slave synchronization delay occur? In short, when the TPS concurrency on the master is high and the number of DDL statements generated exceeds what the slave's single SQL thread can handle, replication falls behind; a large query on the slave that causes lock waits will also produce a delay.

How to view synchronization delays

1--compare the log positions on the master and the slave

2--check the value of Seconds_Behind_Master in the output of "show slave status\G". This value is the master-slave synchronization delay in seconds. A value of 0 is normal; a positive value indicates a delay, and the larger the number, the further the slave lags behind the master.

3--use the pt-heartbeat tool from percona-toolkit.
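The first two checks can be sketched with the statements below. This is a minimal sketch: the field names are those reported by SHOW SLAVE STATUS in MySQL 5.x, and \G is the mysql command-line client's vertical-output terminator rather than part of SQL.

```sql
-- On the master: current binlog file and write position.
SHOW MASTER STATUS;

-- On the slave: compare these fields with the master's output.
--   Master_Log_File / Read_Master_Log_Pos       : how far the IO thread has read
--   Relay_Master_Log_File / Exec_Master_Log_Pos : how far the SQL thread has executed
--   Seconds_Behind_Master                       : 0 is normal; NULL generally means
--                                                 a replication thread is not running
SHOW SLAVE STATUS\G
```

If the executed position keeps falling further behind the read position, the bottleneck is most likely the SQL thread rather than the network or the IO thread.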

An operation scheme to reduce synchronization delay

1--reduce lock contention

If queries cause a large number of table locks, consider refactoring them to avoid as many locks as possible.

2--load balancing

Building several slaves and using LVS or nginx to load-balance queries across them reduces the number and duration of queries each slave executes, leaving more time for master-slave synchronization.

3--higher slave machine configuration

4--adjust parameters on the slave

To ensure high data safety, the master is configured with settings such as sync_binlog=1 and innodb_flush_log_at_trx_commit=1. On the slave, binlog can be turned off and innodb_flush_log_at_trx_commit can be set to 0 to improve the efficiency of SQL execution (these two parameters have a large effect).
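As a rough sketch of the slave-side adjustment, the statements below relax the redo log flush setting at runtime. Note that SET GLOBAL does not survive a restart, and that log_bin is read-only at runtime, so actually turning the binlog off means changing my.cnf and restarting the slave.

```sql
-- Run on the slave only; the master keeps the durable settings.
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';

-- 0 = write and flush the redo log about once per second instead of at every commit.
SET GLOBAL innodb_flush_log_at_trx_commit = 0;

-- ON means this slave is still writing its own binary log.
SHOW VARIABLES LIKE 'log_bin';
```

The trade-off is durability: with a value of 0, roughly the last second of transactions on the slave can be lost in a crash, which is usually acceptable only because the slave can be resynchronized from the master.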

5--parallel replication

That is to change single-threaded replication to multi-threaded replication.

There are two threads related to replication in the slave library: io_thread is responsible for fetching binlog from the master library and writing to relaylog, and sql_thread is responsible for reading relaylog and executing.

The idea of multithreading is to turn the sql_thread into a distribution thread, which is then executed by a set of worker_thread.

Almost all parallel replication is the same idea, and the difference is the distribution strategy of sql_thread.

MySQL 5.7's enhanced multi-threaded slave (MTS) provides true parallel replication and largely solves the master-slave replication delay problem.
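A minimal sketch of turning on the multi-threaded slave in MySQL 5.7 follows. The worker count of 4 is only an illustrative value, not a recommendation from the source; the variable names are the 5.7 ones.

```sql
-- Run on the slave. The SQL thread must be stopped for the change to take effect.
STOP SLAVE SQL_THREAD;

-- Distribute transactions to workers based on the master's group commit information.
SET GLOBAL slave_parallel_type = 'LOGICAL_CLOCK';

-- Number of worker threads fed by the coordinator (illustrative value).
SET GLOBAL slave_parallel_workers = 4;

START SLAVE SQL_THREAD;

-- Verify the settings.
SHOW VARIABLES LIKE 'slave_parallel%';
```

To keep the settings across restarts, the same two variables would also need to be written to my.cnf on the slave.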

(2) Slave_IO_Running: NO appears in slave synchronization status

Error report: Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Could not find first log file name in binary log index file'

Reason 1: cleaning up data caused the master and slave to fall out of sync (provided that the master's binlog files have not been forcibly or mistakenly deleted, that is, the latest binlog file in use still exists on the master).

Solution:

1) on the slave, execute "stop slave;" to stop replication.

2) on the master, execute "flush logs;" to rotate to a new binlog file

3) then on the master, execute "show master status;" to check the master's status, mainly the log file and position

4) then go back to the slave and execute "CHANGE MASTER TO ..." with that log file and position to resume synchronization
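The four steps above can be sketched as follows. The log file name and position are hypothetical placeholders; use the values that "show master status;" actually returns. CHANGE MASTER TO keeps any option that is not specified, so only the coordinates are given here.

```sql
-- 1) On the slave: stop replication.
STOP SLAVE;

-- 2) On the master: close the current binlog and open a new one.
FLUSH LOGS;

-- 3) On the master: note the File and Position columns.
SHOW MASTER STATUS;

-- 4) On the slave: point replication at those coordinates (hypothetical values).
CHANGE MASTER TO
  MASTER_LOG_FILE = 'mysql-bin.000002',
  MASTER_LOG_POS  = 154;
START SLAVE;
SHOW SLAVE STATUS\G
```

Restarting from new coordinates skips any events the slave had not yet applied, so this is suitable only when that gap is known to be harmless or will be repaired separately.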

Reason 2: this error occurs when the slave's IO thread pulls the log from the master and finds that the first file listed in the master's mysql_bin.index does not exist. This may happen because the slave was stopped for a long time for some reason, so that when replication is restarted the corresponding binlog can no longer be found on the master. It may also happen because some setting caused the binlog on the master to be deleted, so the slave cannot obtain the corresponding binlog file.

Solution:

1) to avoid data loss, the slave needs to be resynchronized.

2) pay attention to the master's binlog cleanup policy, and choose deletion based on expiry time or on space usage.

3) remember that it is best not to delete binlog files with the "rm -rf" command, because the binlog entries recorded in mysql_bin.index are then not updated accordingly. When deleting binlogs, make sure the master retains the binlog file named in Relay_Master_Log_File in the slave's "show slave status\G" output. Never delete the latest binlog file in use; it is best to back binlog files up rather than delete them.
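As an alternative to removing files by hand, the binlog can be cleaned up through the server so that the index file stays consistent. This is a sketch under assumptions: the 7-day retention and the file name are hypothetical, and expire_logs_days is the MySQL 5.x variable (MySQL 8.0 uses binlog_expire_logs_seconds instead).

```sql
-- On the slave: find the oldest master binlog still needed (Relay_Master_Log_File).
SHOW SLAVE STATUS\G

-- On the master: list the binlog files the server knows about.
SHOW BINARY LOGS;

-- Remove files older than a hypothetical 7-day window; the index file is updated too.
PURGE BINARY LOGS BEFORE DATE_SUB(NOW(), INTERVAL 7 DAY);

-- Or remove everything before a named file (hypothetical name).
PURGE BINARY LOGS TO 'mysql-bin.000010';

-- Time-based automatic expiry (MySQL 5.x).
SET GLOBAL expire_logs_days = 7;
```

Before purging, check that no slave still needs a file that falls inside the range being removed.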

In the case of reason 2, the solution for reason 1 obviously does not work. The solution in this case is:

Execute on the slave:

mysql> stop slave;

mysql> reset slave;

mysql> start slave;

mysql> show slave status\G

(3) Slave_IO_Running: Connecting appears in slave synchronization status

The common reasons for this error are:

1--the network is unreachable

2--a permission problem (the user name and password used to connect to the master do not match what the master has authorized)

3--the log file and position used for the connection do not match the result of "show master status"
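The second and third causes can be checked roughly as follows. The account name 'slave' and the address 192.168.1.102 are taken from the commands later in this article; pairing them as the account's host part is an assumption, so adjust it to the account that actually exists.

```sql
-- On the master: confirm the replication account exists and has the right privilege.
SELECT user, host FROM mysql.user WHERE user = 'slave';
SHOW GRANTS FOR 'slave'@'192.168.1.102';
-- The output should include REPLICATION SLAVE ON *.*

-- On the master: the coordinates the slave should be using.
SHOW MASTER STATUS;

-- On the slave: compare Master_Host, Master_User, Master_Port, Master_Log_File and
-- Read_Master_Log_Pos, and read Last_IO_Errno / Last_IO_Error for the exact failure.
SHOW SLAVE STATUS\G
```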

(4) Slave_SQL_Running: No appears in the slave synchronization status, that is, the slave is out of sync!

Solution:

The first method: ignore the error and continue synchronizing.

This method is suitable when the master and slave data differ only slightly, or when the data does not need to be completely identical and the data requirements are not strict (all of the following operations are performed on the slave)

mysql> stop slave;

mysql> set global sql_slave_skip_counter = 1; // skips one event; the number is adjustable. Alternatively, add slave-skip-errors = all to my.cnf (which has been added in the configuration above)

mysql> start slave;

mysql> show slave status\G // check the status

The second method: rebuild the master-slave relationship and resynchronize completely

This method is suitable when the master and slave data differ greatly, or when the data is required to be completely identical.

1--operations on the master

mysql> flush tables with read lock; // locks the tables to prevent writes. Note that the lock makes them read-only; the statement is case-insensitive.

# mysqldump --lock-all-tables --all-databases --flush-logs --master-data=2 > /root/allsql.sql // full backup of the master (if only specific databases are replicated, back up those databases specifically). Make sure database backups are taken regularly so that the data is safe.

mysql> show master status; // check the master status; note the log file and position, which the slave will need

# scp /root/allsql.sql root@192.168.1.102:/tmp/mysql.bak.sql // transfer the backup file to the slave for data recovery

2--operations on the slave

mysql> stop slave;

mysql> source /tmp/mysql.bak.sql

mysql> change master to master_host='192.168.1.101', master_user='slave', master_port=3306, ...;

mysql> start slave;

mysql> show slave status\G

Slave_IO_Running: Yes

Slave_SQL_Running: Yes
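For reference, a complete CHANGE MASTER TO statement might look like the sketch below. The host, user and port come from the command above; the password, log file and position are hypothetical placeholders and must be replaced with the real password and the values from "show master status;" on the master. A dump taken with --master-data=2 also records these coordinates as a commented CHANGE MASTER TO line near the top of the file.

```sql
-- Run on the slave after the dump has been restored.
CHANGE MASTER TO
  MASTER_HOST     = '192.168.1.101',
  MASTER_USER     = 'slave',
  MASTER_PASSWORD = 'replace_me',        -- hypothetical placeholder
  MASTER_PORT     = 3306,
  MASTER_LOG_FILE = 'mysql-bin.000003',  -- hypothetical: File from SHOW MASTER STATUS
  MASTER_LOG_POS  = 154;                 -- hypothetical: Position from SHOW MASTER STATUS

START SLAVE;

-- Both Slave_IO_Running and Slave_SQL_Running should report Yes.
SHOW SLAVE STATUS\G
```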

There are two key steps in this method:

① Lock the tables on the master server, make a full backup, and rotate the logs

② Restore from that backup on the slave server.

(5) The slave's relay log is corrupted

What is a relay log?

The relay log is stored on the slave server. The slave copies the master's binary log events to its own host and stores them in the relay log; the SQL thread then executes the events from the relay log, which is how the data is synchronized.

How to avoid relay log corruption:

From MySQL 5.6 onward, it can be avoided by setting relay_log_recovery=1 in my.cnf.

(6) slave connection timed out and reconnected frequently

If there is more than one slave and server_id is not set, or two slaves have the same server_id, a server ID conflict may occur. In that case, one of the slaves may time out frequently, or repeatedly reconnect after being disconnected.

So make sure that the master and every slave are given different server_id values in my.cnf.
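A quick way to check for a conflict is to read the ID on every server and compare the results. This is a sketch; depending on the version and on the report_host setting, SHOW SLAVE HOSTS may not list every slave with a host name.

```sql
-- Run on the master and on each slave; every server must return a different value.
SELECT @@server_id;

-- On the master: slaves currently registered, with their Server_id.
SHOW SLAVE HOSTS;
```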

(7) out of sync caused by the use of different storage engines between the master and slave libraries
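One way to spot engine mismatches is to list the engine of every user table on both servers and compare the two result sets. This sketch assumes the system schemas named in the filter are the ones to exclude.

```sql
-- Run on both the master and the slave, then compare the output.
SELECT table_schema, table_name, engine
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
  AND table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
ORDER BY table_schema, table_name;
```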

(8) The slave reports that a table does not exist during synchronization

Error: Last_Error: Error executing row event: 'Table 'test.t1' doesn't exist'

Solution: re-create the table on the slave.
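A sketch of re-creating the missing table, using the table name from the error message above:

```sql
-- On the master: print the exact definition of the table.
SHOW CREATE TABLE test.t1\G

-- On the slave: run the CREATE TABLE statement printed above, then resume replication.
START SLAVE;
SHOW SLAVE STATUS\G
```

This creates an empty table. If the table already holds rows on the master, they also have to be copied to the slave (for example with mysqldump of that one table), otherwise later updates and deletes will fail on missing rows.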

(9) too small max_allowed_packet setting causes slave to report an error

The server default for max_allowed_packet is 4M in MySQL 5.6 and 5.7 (sample configurations often set 16M). The problem arises when the value on the master does not match the value on the slave.

In this case, the master may record a packet that the slave considers too large. When the slave fetches that binary log event, it may run into various problems, such as endless error-and-retry loops or relay log corruption.

Specific performance:

The slave's IO thread has stopped. Checking the status shows the following error message:

Got a packet bigger than 'max_allowed_packet' bytes

Clearly max_allowed_packet was set too small. Checking the settings on the master and slave showed that the master's value was larger than the slave's. Because max_allowed_packet is a dynamic parameter, replication returned to normal after setting max_allowed_packet on the slave to the same value as on the master and restarting only the I/O thread.

Principle: binlog events are recorded in RBR (row-based replication) format, and the current event is longer than max_allowed_packet on the slave, so the slave's IO thread cannot read the master's binlog event normally.
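The fix described above can be sketched as follows. The value 67108864 (64M) is a hypothetical example; use whatever value the master reports.

```sql
-- On both master and slave: compare the current values (in bytes).
SHOW VARIABLES LIKE 'max_allowed_packet';

-- On the slave: raise the limit to match the master (hypothetical 64M).
SET GLOBAL max_allowed_packet = 67108864;

-- On the slave: restart only the IO thread so it reconnects with the new limit.
STOP SLAVE IO_THREAD;
START SLAVE IO_THREAD;
SHOW SLAVE STATUS\G
```

SET GLOBAL does not persist across a restart, so the same value should also be written to my.cnf on the slave.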

(10) failure when deleting a record on master

After a record is deleted on the master, the slave reports an error because it cannot find the record.

Solution:

Since the record has already been deleted on the master, the event can simply be skipped on the slave.

This situation means the master and slave data may already be inconsistent, so use pt-table-checksum to compare the databases for consistency.

(11) A record is updated in master, but slave cannot find it.

When the master and slave data are inconsistent, the master may have a record that does not exist on the slave. If that record is updated on the master, the slave may report an error.

Solution:

1--look up the binary log on the master according to the position where the slave's error occurred.

2--find the complete updated record from the master's binary log information.

3--take the record found on the master and insert it on the slave.

4--skip this statement and let the slave resume synchronization.

5--use pt-table-checksum to check whether the master and slave table data are consistent.
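Steps 1 to 4 can be sketched as below. The binlog file name, position, table columns and row values are all hypothetical placeholders. With row-based logging, SHOW BINLOG EVENTS does not display the row contents, so the mysqlbinlog utility may be needed to read the actual values.

```sql
-- On the slave: find where the SQL thread stopped
-- (Relay_Master_Log_File, Exec_Master_Log_Pos, Last_SQL_Error).
SHOW SLAVE STATUS\G

-- 1) and 2) On the master: inspect the events at that position (hypothetical coordinates).
SHOW BINLOG EVENTS IN 'mysql-bin.000010' FROM 1234 LIMIT 10;

-- 3) On the slave: insert the missing row as it should look after the update
--    (hypothetical columns and values).
INSERT INTO test.t1 (id, name) VALUES (42, 'value after update');

-- 4) On the slave: skip the failing event and resume replication.
STOP SLAVE;
SET GLOBAL sql_slave_skip_counter = 1;
START SLAVE;
SHOW SLAVE STATUS\G
```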

Article link: http://www.cnblogs.com/kevingrace/p/6261111.html

Author: SEian.G (hard practice changes in 72, but it is difficult to laugh at 81)
