Synchronization of MySQL Cascade replication (1) 07/13 Update SLTechnology News&Howtos

Synchronization of MySQL Cascade replication (1)

2025-07-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

I encountered a somewhat strange problem today. There is a set of environments. There are some problems when copying from master to slave.

The general process design is as follows:

The three nodes are located in three different areas. Because there is a problem in the network between node 1 and node 3, node 2 is used to transfer. Therefore, delay is inevitable, but the delay cannot be too large. The final data is still subject to statistical analysis query through node 3. The amount of data in this environment is not large, but the data seems to change frequently. The colleague who developed in the morning reported that the node synchronization felt delayed greatly and wanted me to help find out what went wrong.

Looking at Node 1, Node 2 has no latency, the problem is the latency from Node 2 to Node 3.

View slave status in Node 3:

> show slave status\G

*************************** 1. row ***************************

Slave_IO_State: Waiting for master to send event

Master_Host:xxxx

Master_User: repl

Master_Port: 3307

Connect_Retry: 10

Master_Log_File: mysql-bin.000009

Read_Master_Log_Pos: 16186388

Relay_Log_File: relay-bin.000004

Relay_Log_Pos: 13599457

Relay_Master_Log_File: mysql-bin.000009

Slave_IO_Running: Yes

Slave_SQL_Running: No

Replicate_Do_DB:

Replicate_Ignore_DB:

...

Last_Errno: 1032

Last_Error: Could not execute Delete_rows event on table test_mbi.test_dist_online; Can't find record in 'test_dist_o

Skip_Counter: 0

Exec_Master_Log_Pos: 13599294

Relay_Log_Space: 16304336

Until_Condition: None

...

Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 1032

Last_SQL_Error: Could not execute Delete_rows event on table test_mbi.test_dist_online; Can't find record in 'test_dist_o

Replicate_Ignore_Server_Ids:

Master_Server_Id: 23307

Master_UUID: 189a00c4-16a3-11e6-a678-06c76b65c01e

Master_Info_File: mysql.slave_master_info

SQL_Delay: 0

SQL_Remaining_Delay: NULL

Slave_SQL_Running_State:

Master_Retry_Count: 86400

1 row in set (0.00 sec)

Found error 1032 in log application, i.e. deleted data not found in slave repository. Generally speaking, this kind of problem seems to be small, so skip it and find that this is not an expedient measure, because after skipping this problem and then encountering the same problem, so repeatedly modifying skip itself is a matter of scratching the boot, and in fact the data is inconsistent.

Because of the urgent demand, tight time and large delay of data, it is found after simple evaluation that it is still necessary to rebuild the slave library.

Of course, this procedure is very routine. Let me also list briefly:

Because it is a multi-instance scene, the following command is used to export:

/opt/mysql/bin/mysqldump -S /data2/bmbidb/mysql.sock --single-transaction --master-data=2 -B test_ad test_mbi test_sys_mgr |gzip > test.sql.gz

Then, after working around various network levels, I finally copied this dump from node 2 to node 3 of the slave library environment.

Then stop slave at node 3 and start importing data:

gunzip

< test.sql.gz | /opt/mysql/bin/mysql --socket=/home/bmbidb/mysql.sock --port=3307 start slave 接着开始change master,当然这个时候对于MASTER_LOG_FILE，MASTER_LOG_POS可以通过dump来得到这些信息 gunzip < tes.sql.gz | head -50 会发现下面这么一段内容： -- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=241903809; 这就是需要我们关注的地方，然后直接使用即可。 CHANGE MASTER TO MASTER_HOST='xxxx',MASTER_USER='repl',MASTER_PASSWORD='xxxx',MASTER_PORT=3307,MASTER_LOG_FILE='mysql-bin.000008', MASTER_LOG_POS=241903809,MASTER_CONNECT_RETRY=10; 这样从库的设置就完成了。然后在下午的晚些时间又碰到了类似的问题，这可让我很纠结了，不可能一出现这种情况我就重建从库吧。排除了很多潜在的原因，包括sync_binlog,表结构差异，节点中的数据库权限，表的存储引擎等。貌似还是没有找到要领。通过mysqlbinlog去解析relay日志，依旧是无功而返。 /opt/mysql/bin/mysqlbinlog -vv relaylog.05 --base64-output decode-rows >

relay05.tmp

So it's still a tricky question.

With the assistance of colleagues, a temporary plan was used to make the transition first. For this type of DML operation, if the data does not exist, you can choose to ignore it, that is, set slave_exec_mode to IDEMPORT, and the default position STRICT

> set global slave_exec_mode='IDEMPOTENT';

Query OK, 0 rows affected (0.00 sec)

> stop slave;set global sql_slave_skip_counter=1;start slave;

Query OK, 0 rows affected (0.00 sec)

After the revision is completed, such problems will come to an end temporarily, and the root cause needs to be found. In this case, some of the data were compared and no other data conflicts were found, but the solution also required a reasonable explanation. Let's talk about this in the next post, and there should be an answer.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.