What is the cause of MySQL master-slave replication delay?

Shulou (Shulou.com) 05/31 report; updated 2025-07-06. From: SLTechnology News&Howtos

This article explains the common causes of MySQL master-slave replication delay and how to deal with each of them.

In asynchronous or semi-synchronous replication topologies, some delay on the slave is quite normal.

Although delay is normal, it is generally up to the business to assess whether it needs attention.

For example, if the business reads from the slave with high consistency requirements and needs the delay to stay below a certain value, then the delay needs attention.

First, a brief overview of the replication logic:

1. The master records changes to the database instance in the binlog.

2. The master has binlog dump threads that monitor binlog changes in real time and push new events to the slave (once everything has been sent, the thread state is "Master has sent all binlog to slave; waiting for more updates").

3. The slave's IO thread receives these events and records them in the relay log.

4. The slave's SQL thread reads the events from the relay log and applies (replays) them to the slave instance.

The above is the default asynchronous replication logic. Semi-synchronous replication is slightly different and is not covered here.
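The four steps above can be pictured with a toy model. This is only an illustrative sketch, not how MySQL is implemented: the class `ToySlave`, its method names, and the key/value events are all made up for the example. It shows that receiving events (IO thread) and applying them (SQL thread) are separate stages, and that delay is whatever sits in the relay log waiting to be applied.

```python
from collections import deque


class ToySlave:
    """Hypothetical stand-in for a slave: a relay log plus the applied data."""

    def __init__(self):
        self.relay_log = deque()  # events received but not yet applied
        self.data = {}            # state of the slave instance

    def io_thread_receive(self, events):
        # Step 3: the IO thread appends pushed events to the relay log.
        self.relay_log.extend(events)

    def sql_thread_apply(self, max_events):
        # Step 4: the SQL thread replays events one at a time, in order.
        applied = 0
        while self.relay_log and applied < max_events:
            key, value = self.relay_log.popleft()
            self.data[key] = value
            applied += 1
        return applied


# Steps 1 and 2: the master's binlog, pushed to the slave by the dump thread.
binlog = [("a", 1), ("b", 2), ("a", 3)]

slave = ToySlave()
slave.io_thread_receive(binlog)
slave.sql_thread_apply(max_events=2)

# One event is still waiting in the relay log: the slave is behind.
print(len(slave.relay_log), slave.data)
```

In this model the slave has received everything but applied only part of it, which is the situation most of the causes below lead to.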

In addition, determining whether the slave is delayed is simple:

Run SHOW SLAVE STATUS on the slave

and check the Seconds_Behind_Master value.
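As a rough sketch of automating that check, the snippet below parses the "Key: value" lines of vertical (\G) SHOW SLAVE STATUS output and pulls out Seconds_Behind_Master. The helper names `parse_slave_status` and `seconds_behind` and the sample text are assumptions made up for illustration; a real script would obtain the text from its own MySQL client.

```python
def parse_slave_status(text):
    # Turn "Key: value" lines into a dict; lines without a colon
    # (such as the "1. row" separator) are skipped.
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def seconds_behind(status):
    # Return the delay in seconds, or None when the field is missing or NULL.
    raw = status.get("Seconds_Behind_Master")
    if raw is None or raw == "" or raw.upper() == "NULL":
        return None
    return int(raw)


# Made-up sample output, trimmed to a few fields.
sample = """\
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 4521
        Seconds_Behind_Master: 37
"""

print(seconds_behind(parse_slave_status(sample)))
```

The NULL case is handled separately because the field can be NULL rather than a number, for example when the SQL thread is not running.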

Causes of delay and how to deal with them

Frequent DML requests on the master (high TPS)

That is, the master receives many write requests: a large number of concurrent INSERT, DELETE, and UPDATE operations generate a large volume of binlog in a short time.

[cause analysis]

The master writes data concurrently, while the slave's SQL thread applies the log in a single thread, so the relay log easily piles up and delay builds.

[solution]

Shard the data and spread write requests by scaling out. Or consider upgrading to MySQL 5.7 and turning on parallel replication based on logical clocks.
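To make the accumulation concrete, here is a back-of-the-envelope model, assuming constant rates. The function names and all of the numbers are hypothetical; real apply rates vary with the workload.

```python
def backlog_after_burst(master_tps, slave_apply_tps, burst_seconds):
    # Transactions still waiting in the relay log when the write burst ends.
    return max(0, (master_tps - slave_apply_tps) * burst_seconds)


def catch_up_seconds(backlog, slave_apply_tps, master_tps_after=0):
    # Time to drain the backlog once the master slows to master_tps_after.
    spare = slave_apply_tps - master_tps_after
    if spare <= 0:
        return float("inf")  # the slave never catches up
    return backlog / spare


# Hypothetical burst: the master commits 5000 tps for 60 s while the
# single SQL thread can only apply 3000 tps.
backlog = backlog_after_burst(5000, 3000, 60)
print(backlog)                                # 120000 transactions queued
print(catch_up_seconds(backlog, 3000, 1000))  # 60.0 seconds to catch up
```

Both remedies act on the same inequality: sharding lowers the write rate each slave has to absorb, and parallel replication raises the rate at which the slave can apply.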

The master executes large transactions

For example, bulk data loads such as INSERT INTO $tb1 SELECT * FROM $tb2 or LOAD DATA INFILE,

or full-table UPDATE and DELETE statements.

On the slave, Exec_Master_Log_Pos does not change and Slave_SQL_Running_State is "Reading event from the relay log".

Analyzing the master's binlog, or looking at the transactions the master is currently executing, also reveals this.

[cause analysis]

If the master takes 200 s to update a large table and the master and slave have similar configurations, the slave needs almost the same time to apply the update. Subsequent events cannot be applied in the meantime, so the slave's delay starts to accumulate.

[solution]

Break up large transactions and commit promptly.
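One common way to break up a large transaction is to walk the primary key in fixed-size ranges and commit after each range. The sketch below only generates the statements; it assumes an integer primary key column named `id`, and the table name `orders_archive`, the ID bounds, and the batch size are hypothetical.

```python
def id_batches(min_id, max_id, batch_size):
    # Yield half-open [lo, hi) primary-key ranges covering min_id..max_id.
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    lo = min_id
    while lo <= max_id:
        hi = min(lo + batch_size, max_id + 1)
        yield lo, hi
        lo = hi


def batched_deletes(table, min_id, max_id, batch_size):
    # One small DELETE per range; each is meant to be run and committed
    # as its own transaction instead of one full-table DELETE.
    return [
        f"DELETE FROM {table} WHERE id >= {lo} AND id < {hi}"
        for lo, hi in id_batches(min_id, max_id, batch_size)
    ]


for stmt in batched_deletes("orders_archive", 1, 10, 4):
    print(stmt)
```

Each small transaction reaches the slave as a short event group, so other events can be applied between batches instead of waiting behind one long replay.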

The master executes DDL statements on large tables

The symptoms are similar to those of a large transaction on the master.

If Exec_Master_Log_Pos is not advancing, a DDL statement may be executing.

Analyzing the master's binlog, or looking at the transactions the master is currently executing, also reveals this.

[cause analysis]

1. The DDL has not started and is blocked. SHOW SLAVE STATUS shows Slave_SQL_Running_State as "Waiting for table metadata lock", and Exec_Master_Log_Pos remains unchanged.

2. The DDL is executing, and because the SQL thread applies events in a single thread, delay increases. Slave_SQL_Running_State is "altering table", and Exec_Master_Log_Pos remains unchanged.

[solution]

Find the query blocking the DDL statement through the processlist or information_schema.innodb_trx, kill it, and let the DDL execute normally on the slave.
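A hedged sketch of that lookup: the query text below reads long-open transactions from information_schema.innodb_trx, and the helper turns the result rows into KILL statements for review. The helper name `kill_statements`, the `age_seconds` alias, the 60-second threshold, and the sample rows are assumptions for illustration; fetching the rows is left to whatever MySQL client you already use.

```python
# Query to run on the slave; age_seconds is an alias chosen for this sketch.
FIND_OPEN_TRX = (
    "SELECT trx_mysql_thread_id, "
    "TIMESTAMPDIFF(SECOND, trx_started, NOW()) AS age_seconds "
    "FROM information_schema.innodb_trx"
)


def kill_statements(trx_rows, min_age_seconds):
    # Build KILL statements for transactions open at least min_age_seconds.
    # KILL <id> terminates the whole connection, so review before running.
    return [
        f"KILL {row['trx_mysql_thread_id']}"
        for row in trx_rows
        if row["age_seconds"] >= min_age_seconds
    ]


# Made-up result rows, as a client library might return them.
rows = [
    {"trx_mysql_thread_id": 41, "age_seconds": 900},
    {"trx_mysql_thread_id": 57, "age_seconds": 3},
]
print(kill_statements(rows, min_age_seconds=60))
```

Generating the statements rather than executing them keeps a human in the loop, which seems prudent when the blocker may be a legitimate long-running report.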

The delay caused by the DDL itself is unavoidable, so consider the following:

① Run the DDL during off-peak business hours

② After SET sql_log_bin=0, execute the DDL manually on the master and on each slave separately (for some DDL operations this can cause data inconsistency, so be sure to test it strictly)

Master and slave configurations are inconsistent

[cause analysis]

Hardware: the master's server uses SSDs while the slave's server uses ordinary SAS disks, the CPU clock speeds differ, and so on.

Configuration: for example, inconsistent RAID card write policies, inconsistent OS kernel parameter settings, or inconsistent MySQL disk settings (such as flush policy).

[solution]

Try to keep the configuration of the DB machines uniform (both hardware and option parameters).

For some OLAP workloads, the slave instance is even given better hardware than the master.

Table lacks a primary key or unique index

With binlog_format=row, if a table lacks a primary key or unique index, UPDATE or DELETE statements can cause slave delay to rise sharply.

At this point, Slave_SQL_Running_State is "Reading event from the relay log".

The table keeps showing up in SHOW OPEN TABLES WHERE in_use=1.

Exec_Master_Log_Pos remains the same.

The mysqld process uses nearly 100% CPU (even with no read traffic), while there is little I/O pressure.

[cause analysis]

In an extreme case, suppose the master updates 200,000 rows in a table of 5 million rows, and the UPDATE statement requires a full table scan.

In row format, 200,000 row updates are recorded in the binlog, and replay by the SQL thread will be very slow because each row update may require a full table scan.
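The scale of the problem follows from simple arithmetic. The helper below is a rough worst-case estimate under the assumption that, without a usable index, locating each changed row costs one full pass over the table; the function name is made up, and real cost depends on where matches are found.

```python
def rows_examined_on_replay(rows_changed, table_rows, has_usable_index):
    # Worst case: each row event needs a full scan when no index can be used;
    # with a primary key or unique index, roughly one lookup per row event.
    per_row_cost = 1 if has_usable_index else table_rows
    return rows_changed * per_row_cost


# The example from the text: 200,000 changed rows in a 5,000,000-row table.
print(rows_examined_on_replay(200_000, 5_000_000, has_usable_index=False))
print(rows_examined_on_replay(200_000, 5_000_000, has_usable_index=True))
```

On the master the statement scans the table once; on the slave, in this worst case, the scan is repeated for every changed row, which matches the CPU-bound symptom described above.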

[solution]

Check the table structure to ensure that every table has an explicit auto-increment primary key, and create appropriate indexes.

Excessive load on the slave itself

[cause analysis]

The slave serves a large number of SELECT requests, or most SELECT requests are routed to this slave instance, or it even carries heavy OLAP workloads, or a backup is running on the slave.

In these cases CPU load or I/O utilization is too high, so the SQL thread applies events too slowly.

[solution]

Build more slaves to spread read requests and reduce the pressure on the existing slave instances.

MyISAM storage engine

In this case the slave's Slave_SQL_Running_State is "Waiting for table level lock".

[cause analysis]

MyISAM supports only table-level locks, so reads and writes cannot run concurrently.

On the master, with a suitable @@concurrent_insert value, INSERTs can run concurrently with SELECTs, but this is not possible when the SQL thread replays them on the slave. If you are interested, take a look at the MyISAM implementation.

[solution]

You can simply live with it: having chosen MyISAM, you should be mentally prepared. (For other reasons as well, MyISAM is not recommended in replication topologies.)

Otherwise, change the tables to InnoDB.
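As a recap, the Slave_SQL_Running_State values mentioned in this article can be collected into a small lookup. This is only a convenience sketch: the names `HINTS` and `diagnose` are made up, and the mapping covers just the states discussed above, not every state MySQL can report.

```python
# State (lowercased) -> likely cause, as described in the sections above.
HINTS = {
    "waiting for table metadata lock":
        "DDL is blocked by another query or open transaction on the slave",
    "altering table":
        "DDL on a large table is being executed by the SQL thread",
    "reading event from the relay log":
        "large transaction, or row events on a table without a primary key "
        "or unique index",
    "waiting for table level lock":
        "MyISAM table-level lock contention with reads on the slave",
}


def diagnose(sql_running_state):
    # Look up a hint for the given state; matching ignores case and padding.
    key = sql_running_state.strip().lower()
    return HINTS.get(key, "no hint in this article; check load and hardware")


print(diagnose("Waiting for table metadata lock"))
print(diagnose("altering table"))
```

A state that is not in the table points back to the causes with no distinctive state, such as high write volume, mismatched hardware, or heavy read load on the slave.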
