How to solve the problem of master-slave delay in MySQL 07/01 Update SLTechnology News&Howtos

How to solve the problem of master-slave delay in MySQL

2025-07-01 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article mainly introduces "how to solve the problem of MySQL master-slave delay". In daily operation, I believe many people have doubts about how to solve the problem of MySQL master-slave delay. The editor consulted all kinds of materials and sorted out simple and easy-to-use operation methods. I hope it will be helpful for you to answer the doubt of "how to solve the problem of MySQL master-slave delay". Next, please follow the editor to study!

The cause of master-slave delay

1. When a user is using the database, there is a large master-slave delay. Show slave status\ G, there is already a shortfall of more than 60 binlog.

2. Observation shows that it should be stuck on a big thing (Retrieved_Gtid_Set has been rising, but Executed_Gtid_Set is stuck at a point). Through the analysis of relay_log, we find this big thing: it is a thing that deletes Table A.

Relay_Log_File: relay-bin.000010Relay_Log_Pos: 95133771

Seeing here, it feels like another example of a master-slave delay caused by a table without a primary key in ROW mode. Look at the table structure to confirm that this table is not small, there are hundreds of fields, there are primary keys, and it is a partitioned table with many partitions. This is interesting! It's not that we've encountered many times the master-slave delay caused by DML without a primary key in ROW mode (PS: why is there a delay in this case? Instead, there is a primary key and a secondary index, so why is playback so slow? ).

Later, I learned that the user calls the Detelestatement in the stored procedure to clean up the archived data. after taking a look at the stored procedure, the problem can be simplified as follows: calling the delete statement in the stored procedure, taking the secondary index to delete the partitioned table with the primary key, and playback delay from the machine.

At this time, we need to disassemble the problem, control the variables, and check one by one:

1. Executing delete,SQL directly will appear in statement format, and there will be no master-slave delay.

2. Call procedure. When the delete statement is executed in procedure, it will change to ROW format and cause delay.

OK, with the above two tests, our questions can be focused on:

1. Why the same delete statement is executed directly and the binlog format of the record executed in procedure is not the same (binlog in ROW format causes the playback to slow down, and the global setting is in mixed mode, this SQL should be in statement format, why execution in procedure becomes ROW format, and how to make the SQL execution in procedure become statement records into binlog).

Delete from xxxxxwhere update_datetime

< DATE_ADD(B_DATE,INTERVAL -1 day)and DATE_FORMAT(update_datetime,'%i') not in ('00','05','10','15','20','25','30'); 通过show processlist，可以看到这条delete在procedure内部执行的时候，被MySQL自动加上了NAME_CONST函数，所以导致了以ROW模式记录binlog格式。那为什么在procedure中会被改写成这样的SQL呢？怎么样才能让这条SQL记录为statement的格式呢？看了MySQL官方在procedure里面的限制描述，MySQL会自动加上NAME_CONST主要是为了从机可以识别到B_DATE这个SP的Local vairable，不至于从机回放的时候报错。 2、为什么ROW模式的binlog在从库回放的时候，即使delete的这张表有主键也很慢。我们先看一下SQL线程回放是卡在哪里了？为什么会慢？通过pstack抓取堆栈，找到SQL_thread线程对应的thread 15，再结合perf信息，可以看到从机回放慢是卡在了bitmap_get_next_set()。看一下bitmap_get_next_set()的代码。 bitmap_get_next_set()都是一些位运算，速度按理来说应该很快。所以不应该是程序卡在了这个函数中，大概率是因为多次调用了这个函数。所以我们再往上层继续看代码。 get_next_used_partition(uint part_id) 直接调用了bitmap_get_next_set（），继续往上看。

There is a suspicious loop in the function try_semi_consistent_read (), where m_tot_parts get_next_used_partition is called. Take a look at defining that m_tot_parts is the total number of partitions in the partition table!

When you see here, the truth comes out.

The number of rows of SQL changes for this delele is about 300W rows, and the total number of partition tables is 7200. Then the number of calls to bitmap_get_next_set here has been magnified to 21.6 billion!

Compared to playback in statement format, the stack information of the slave computer will not enter the bitmap_get_next_set.

Solution

After analyzing for such a long time, how can we deal with such a problem?

Solution 1: finally, we force the binlog_format=statement of session in SP to play back the delete in the slave machine in statement mode, so as to avoid triggering the bug in MySQL.

Scenario 2: repair the kernel.

Solution 3: schedule in shell instead of using stored procedures.

At this point, the study on "how to solve the problem of master-slave delay in MySQL" is over. I hope to be able to solve your doubts. The collocation of theory and practice can better help you learn, go and try it! If you want to continue to learn more related knowledge, please continue to follow the website, the editor will continue to work hard to bring you more practical articles!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.