MySQL: a case of a fault that produces a large number of small relay logs


2025-07-01 Update From: SLTechnology News&Howtos

My knowledge is limited, so please forgive any mistakes. The source code version is 5.7.22.

You are welcome to follow my series "In-depth understanding of MySQL Master-Slave principle" (32 sections), as follows:

If the picture cannot be displayed, please see the following link:

https://www.jianshu.com/p/d636215d767f

I. Case source and symptoms

This case is a production problem encountered by a friend, @peaceful, who eventually found the clue himself. The symptoms are as follows:

1. There are a large number of very small relay logs, about 2600 in total, as follows:

...
-rw-r----- 1 mysql dba 12827 Oct 11 12:28 mysql-relay-bin.036615
-rw-r----- 1 mysql dba  4908 Oct 11 12:28 mysql-relay-bin.036616
-rw-r----- 1 mysql dba  1188 Oct 11 12:28 mysql-relay-bin.036617
-rw-r----- 1 mysql dba  5823 Oct 11 12:29 mysql-relay-bin.036618
-rw-r----- 1 mysql dba   507 Oct 11 12:29 mysql-relay-bin.036619
-rw-r----- 1 mysql dba  1188 Oct 11 12:29 mysql-relay-bin.036620
-rw-r----- 1 mysql dba  3203 Oct 11 12:29 mysql-relay-bin.036621
-rw-r----- 1 mysql dba 37916 Oct 11 12:30 mysql-relay-bin.036622
-rw-r----- 1 mysql dba   507 Oct 11 12:30 mysql-relay-bin.036623
-rw-r----- 1 mysql dba  1188 Oct 11 12:31 mysql-relay-bin.036624
-rw-r----- 1 mysql dba  4909 Oct 11 12:31 mysql-relay-bin.036625
-rw-r----- 1 mysql dba  1188 Oct 11 12:31 mysql-relay-bin.036626
-rw-r----- 1 mysql dba   507 Oct 11 12:31 mysql-relay-bin.036627
-rw-r----- 1 mysql dba   507 Oct 11 12:32 mysql-relay-bin.036628
-rw-r----- 1 mysql dba  1188 Oct 11 12:32 mysql-relay-bin.036629
-rw-r----- 1 mysql dba   454 Oct 11 12:32 mysql-relay-bin.036630
-rw-r----- 1 mysql dba  6223 Oct 11 12:32 mysql-relay-bin.index

2. The error log of the master contains the following messages:

2019-10-11T12:31:26.517309+08:00 61303425 [Note] While initializing dump thread for slave with UUID, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(61303421).
2019-10-11T12:31:26.517489+08:00 61303425 [Note] Start binlog_dump to master_thread_id(61303425) slave_server(19304313), pos(, 4)
2019-10-11T12:31:44.203747+08:00 61303449 [Note] While initializing dump thread for slave with UUID, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread(61303425).
2019-10-11T12:31:44.203896+08:00 61303449 [Note] Start binlog_dump to master_thread_id(61303449) slave_server(19304313), pos(, 4)

II. slave_net_timeout parameter analysis

This case seemed strange to me at first glance, because very few people set the slave_net_timeout parameter; we had never set it ourselves, so we paid little attention to it. But @peaceful himself found the setting that might be the problem: slave_net_timeout on the slave was set to 10. I followed this clue, so let's first look at what the slave_net_timeout parameter does.

As far as I can tell, slave_net_timeout on the slave has the following two functions:

1. It sets the connection timeout for the IO thread when it is idle (no Event received).

The default is 60 seconds since 5.7.7; before that it was 3600 seconds. After changing it, replication must be restarted for the new value to take effect.

2. If CHANGE MASTER does not specify MASTER_HEARTBEAT_PERIOD, the heartbeat period is set to slave_net_timeout/2.

Generally, a master-slave configuration does not specify the heartbeat period, so it is slave_net_timeout/2. It controls how often the master sends a heartbeat Event to the slave's IO thread to keep the connection alive when the master generates no Events. However, once replication has been configured with CHANGE MASTER, this value no longer follows later changes to the slave_net_timeout parameter. We can find the stored value in the slave_master_info table as follows:

mysql> select Heartbeat from slave_master_info \G
*************************** 1. row ***************************
Heartbeat: 30
1 row in set (0.01 sec)

If we want to change this value, we have to run CHANGE MASTER again.

III. Summary of causes

If the following three conditions are met, the failure in the case will occur:

1. The MASTER_HEARTBEAT_PERIOD value used by replication is greater than slave_net_timeout on the slave.
2. The load on the master is very low, and no new Event is generated for a stretch of slave_net_timeout seconds.
3. Replication already had some delay beforehand.

In this situation, the IO thread disconnects before the master's heartbeat Event reaches it. After disconnecting, the IO thread reconnects, and each reconnection generates a new relay log. Because of the delay, these relay logs cannot be purged, which is exactly what the case shows.
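The interplay of the heartbeat period and slave_net_timeout can be illustrated with a small timeline model. This is only a sketch under simplifying assumptions, not MySQL code: the function name count_reconnects and its parameters are hypothetical, the master is assumed to be completely idle, a reconnect is assumed to be instantaneous and to restart the master's heartbeat timer, and the IO thread is assumed to give up after exactly slave_net_timeout seconds of silence.

```python
import math


def count_reconnects(heartbeat_period, slave_net_timeout, idle_seconds):
    """Count IO-thread reconnections during an idle window (toy model).

    All three arguments are in seconds. A heartbeat_period of 0 means
    heartbeats are disabled. None of this is taken from the MySQL source.
    """
    if slave_net_timeout <= 0:
        raise ValueError("slave_net_timeout must be positive")
    period = heartbeat_period if heartbeat_period > 0 else math.inf

    reconnects = 0
    last_activity = 0.0       # last time the IO thread received anything
    next_heartbeat = period   # when the master would send its next heartbeat
    while True:
        timeout_at = last_activity + slave_net_timeout
        if next_heartbeat <= timeout_at:
            # The heartbeat arrives before the slave gives up.
            now = next_heartbeat
            if now > idle_seconds:
                break
        else:
            # The slave times out first, reconnects, and a new relay log appears.
            now = timeout_at
            if now > idle_seconds:
                break
            reconnects += 1
        last_activity = now
        next_heartbeat = now + period
    return reconnects


print(count_reconnects(30, 10, 60))   # heartbeat 30 s, timeout 10 s, idle for 60 s
print(count_reconnects(30, 60, 600))  # default-like settings
```

With a 30-second heartbeat and a 10-second timeout the model reconnects six times in one idle minute, which corresponds to six new relay logs; with the timeout at 60 seconds it never reconnects.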

The following is the description of this part in the official documentation:

If you are logging master connection information to tables, MASTER_HEARTBEAT_PERIOD can be seen as the value of the Heartbeat column of the mysql.slave_master_info table. Setting interval to 0 disables heartbeats altogether. The default value for interval is equal to the value of slave_net_timeout divided by 2. Setting @@global.slave_net_timeout to a value less than that of the current heartbeat interval results in a warning being issued. The effect of issuing RESET SLAVE on the heartbeat interval is to reset it to the default value.

IV. Case simulation

With the theory in place, the simulation is straightforward. The delay itself is hard to reproduce, so I stopped the SQL thread on the slave to simulate the backlog.

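1. Configure the master and slave in advance, then check the current heartbeat period and the slave_net_timeout parameter.

2. Stop the SQL thread on the slave (stop slave sql_thread) and lower slave_net_timeout below the heartbeat period.

You can see that there is actually a warning here: as the documentation quoted above says, setting slave_net_timeout below the current heartbeat interval issues one.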

3. Restart the IO thread

Only in this way can the new slave_net_timeout value take effect.

mysql> stop slave;
Query OK, 0 rows affected (0.01 sec)
mysql> start slave io_thread;
Query OK, 0 rows affected (0.01 sec)

4. Observe the phenomenon

A relay log file of about 500 bytes is generated roughly every 10 seconds, as follows (modification time and file name):

2019-09-27 23:48:32.655001361 +0800 relay.000142
2019-09-27 23:48:42.943001355 +0800 relay.000143
2019-09-27 23:48:53.293001363 +0800 relay.000144
2019-09-27 23:49:03.502000598 +0800 relay.000145
2019-09-27 23:49:13.799001357 +0800 relay.000146
2019-09-27 23:49:24.055001354 +0800 relay.000147
2019-09-27 23:49:34.280001827 +0800 relay.000148
2019-09-27 23:49:44.496001365 +0800 relay.000149
2019-09-27 23:49:54.789001353 +0800 relay.000150
2019-09-27 23:49:55.485001371 +0800 relay.000151
2019-09-27 23:49:55.910001430 +0800 relay.000152

About every 10 seconds, the error log of the master outputs the following:

2019-10-08T02:27:24.996827+08:00 217 [Note] While initializing dump thread for slave with UUID, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread.
2019-10-08T02:27:24.998297+08:00 217 [Note] Start binlog_dump to master_thread_id(217) slave_server(953340), pos(, 4)
2019-10-08T02:27:35.265961+08:00 218 [Note] While initializing dump thread for slave with UUID, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread.
2019-10-08T02:27:35.266653+08:00 218 [Note] Start binlog_dump to master_thread_id(218) slave_server(953340), pos(, 4)
2019-10-08T02:27:45.588074+08:00 219 [Note] While initializing dump thread for slave with UUID, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread.
2019-10-08T02:27:45.589814+08:00 219 [Note] Start binlog_dump to master_thread_id(219) slave_server(953340), pos(, 4)
2019-10-08T02:27:55.848558+08:00 220 [Note] While initializing dump thread for slave with UUID, found a zombie dump thread with the same UUID. Master is killing the zombie dump thread.
2019-10-08T02:27:55.849442+08:00 220 [Note] Start binlog_dump to master_thread_id(220) slave_server(953340), pos(, 4)

This log is exactly the same as in the case.

Solve the problem

Once the cause is known, the fix is simple: set the slave_net_timeout parameter to twice MASTER_HEARTBEAT_PERIOD, then restart replication so that the new value takes effect.
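As a sketch of this fix, the helper below derives a slave_net_timeout value from a given heartbeat period using the 2x rule and returns the statements one might run on the slave. The helper name fix_statements is hypothetical. SET GLOBAL slave_net_timeout, STOP SLAVE and START SLAVE are standard MySQL 5.7 statements, but the resulting values should be checked against your own environment before anything is run.

```python
import math


def fix_statements(heartbeat_period_seconds):
    """Return SQL statements (as strings) for the slave, using the 2x rule."""
    # slave_net_timeout is a whole number of seconds, so round up.
    timeout = math.ceil(2 * heartbeat_period_seconds)
    return [
        "SET GLOBAL slave_net_timeout = %d;" % timeout,
        "STOP SLAVE;",   # restart replication so that the IO thread
        "START SLAVE;",  # reconnects with the new timeout
    ]


# Heartbeat value taken from the slave_master_info table, 30 in this article.
for statement in fix_statements(30):
    print(statement)
```

The alternative, not shown here, is to keep slave_net_timeout as it is and run CHANGE MASTER again with a smaller MASTER_HEARTBEAT_PERIOD.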

V. Implementation

Here we use a simple source code call analysis to see how the slave_net_timeout parameter and MASTER_HEARTBEAT_PERIOD affect the master and the slave.

1. The slave uses the slave_net_timeout parameter

When the slave starts the IO thread, the timeout is set from the slave_net_timeout parameter:

->connect_to_master
  ->mysql_options
case MYSQL_OPT_CONNECT_TIMEOUT: // MYSQL_OPT_CONNECT_TIMEOUT
  mysql->options.connect_timeout= *(uint*) arg;
  break;

This value is then used when establishing the connection to the master:

->connect_to_master
  ->mysql_real_connect
    ->get_vio_connect_timeout
timeout_sec= mysql->options.connect_timeout;

This also shows why a new slave_net_timeout value takes effect only when the IO thread is restarted.

2. The slave sets the MASTER_HEARTBEAT_PERIOD value

This value is set each time CHANGE MASTER is run on the slave, and it defaults to slave_net_timeout/2:

->change_master
  ->change_receive_options
mi->heartbeat_period= min(SLAVE_MAX_HEARTBEAT_PERIOD, (slave_net_timeout/2.0f));

So only CHANGE MASTER resets this value; restarting replication does not.
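The behaviour described here can be sketched as a toy model. Everything in it is illustrative rather than MySQL code: the class ReplicationSettings and its methods are hypothetical names, and the value 4294967 for SLAVE_MAX_HEARTBEAT_PERIOD is an assumption about the server constant. The warning rule follows the documentation quoted earlier.

```python
# Toy model: the heartbeat period is fixed when CHANGE MASTER runs.
SLAVE_MAX_HEARTBEAT_PERIOD = 4294967  # seconds; assumed value of the server constant


class ReplicationSettings:
    def __init__(self, slave_net_timeout=60):
        self.slave_net_timeout = slave_net_timeout
        self.heartbeat_period = None  # unknown until change_master() runs

    def change_master(self, heartbeat_period=None):
        """Store the heartbeat period; default to slave_net_timeout / 2."""
        if heartbeat_period is None:
            heartbeat_period = min(SLAVE_MAX_HEARTBEAT_PERIOD,
                                   self.slave_net_timeout / 2.0)
        self.heartbeat_period = heartbeat_period

    def set_slave_net_timeout(self, seconds):
        """Change only the timeout; return True if a warning would be expected."""
        self.slave_net_timeout = seconds
        return (self.heartbeat_period is not None
                and seconds < self.heartbeat_period)


settings = ReplicationSettings(slave_net_timeout=60)
settings.change_master()                      # heartbeat becomes 30.0
warned = settings.set_slave_net_timeout(10)   # heartbeat stays 30.0
print(settings.heartbeat_period, warned)
settings.change_master()                      # heartbeat is recomputed from 10
print(settings.heartbeat_period)
```

In this model, lowering the timeout to 10 leaves the heartbeat at 30.0 and flags a warning, which is the misconfiguration from the case; only the second change_master() call brings the heartbeat down to 5.0.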

3. Using the MASTER_HEARTBEAT_PERIOD value

Each time the IO thread starts, this value is passed to the DUMP thread of the master. This is done by building the statement 'SET @master_heartbeat_period', as follows:

->handle_slave_io
  ->get_master_version_and_clock
if (mi->heartbeat_period != 0.0)
{
  char llbuf[22];
  const char query_format[]= "SET @master_heartbeat_period= %s";
  char query[sizeof(query_format) - 2 + sizeof(llbuf)];
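A minimal sketch of how such a statement might be assembled is shown below. The function name heartbeat_query is hypothetical, and the conversion from seconds to nanoseconds is an assumption: it is inferred from the master later passing the value to set_timespec_nsec, and the truncated excerpt above does not show the conversion itself.

```python
def heartbeat_query(heartbeat_period_seconds):
    """Build the statement the IO thread would send (period in nanoseconds)."""
    if heartbeat_period_seconds == 0.0:
        return None  # heartbeats disabled, so no statement is built
    # Assumed unit conversion: seconds on the slave, nanoseconds on the wire.
    nanoseconds = int(heartbeat_period_seconds * 1000000000)
    return "SET @master_heartbeat_period= %d" % nanoseconds


print(heartbeat_query(30.0))
```

The zero check mirrors the `if (mi->heartbeat_period != 0.0)` test in the excerpt: a period of 0 means no heartbeat request is sent at all.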

When the master starts the DUMP thread, it looks up this value as follows:

->Binlog_sender::init
  ->Binlog_sender::init_heartbeat_period
user_var_entry *entry= (user_var_entry*) my_hash_search(&m_thd->user_vars, (uchar*) name.str, name.length);
m_heartbeat_period= entry ? entry->val_int(&null_value) : 0;

4. The DUMP thread uses MASTER_HEARTBEAT_PERIOD to send heartbeat Events

This is mainly done through a timed wait, as follows:

->Binlog_sender::wait_new_events
  ->Binlog_sender::wait_with_heartbeat
set_timespec_nsec(&ts, m_heartbeat_period); // heartbeat timeout
ret= mysql_bin_log.wait_for_update_bin_log(m_thd, &ts); // wait
if (ret != ETIMEDOUT && ret != ETIME) // a signal was received normally, so a new Event has arrived; otherwise send a heartbeat Event
  break; // a normal return is 0; on timeout ETIMEDOUT is returned and the loop continues
if (send_heartbeat_event(log_pos)) // send heartbeat Event
  return 1;

5. Reconnecting kills any DUMP thread that may still exist

The comparison is based on the UUID, as follows:

->kill_zombie_dump_threads
Find_zombie_dump_thread find_zombie_dump_thread(slave_uuid);
THD *tmp= Global_THD_manager::get_instance()->find_thd(&find_zombie_dump_thread);
if (tmp)
{
  /*
    Here we do not call kill_one_thread() as it will be slow because it
    will iterate through the list again. We just to do kill the thread
    ourselves.
  */
  if (log_warnings > 1)
  {
    if (slave_uuid.length())
    {
      sql_print_information("While initializing dump thread for slave with "
                            "UUID <%s>, found a zombie dump thread with the "
                            "same UUID. Master is killing the zombie dump "
                            "thread(%u).", slave_uuid.c_ptr(), tmp->thread_id());
    } // this is the log in this case

Here we see the log in the case.
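The UUID-based replacement of an old DUMP thread can be modelled in a few lines. This is a simplified sketch, not the server's implementation: the class DumpThreadRegistry, its method and the UUID string are hypothetical, and only the thread ids are borrowed from the error log in the case.

```python
class DumpThreadRegistry:
    """Toy registry that keeps at most one dump thread per slave UUID."""

    def __init__(self):
        self._by_uuid = {}  # slave UUID -> thread id of its dump thread
        self.log = []       # messages the master would write

    def start_dump_thread(self, slave_uuid, thread_id):
        """Register a new dump thread and return the id of the zombie it replaced."""
        zombie = self._by_uuid.get(slave_uuid)
        if zombie is not None:
            self.log.append(
                "found a zombie dump thread with the same UUID. "
                "Master is killing the zombie dump thread(%d)." % zombie)
        self._by_uuid[slave_uuid] = thread_id
        return zombie


registry = DumpThreadRegistry()
registry.start_dump_thread("slave-uuid-1", 61303421)  # first connection
registry.start_dump_thread("slave-uuid-1", 61303425)  # reconnect replaces 61303421
registry.start_dump_thread("slave-uuid-1", 61303449)  # reconnect replaces 61303425
print(len(registry.log))
```

Each reconnection from the same slave produces one log line, which is why the master's error log in the case grows in step with the relay logs on the slave.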

6. The DUMP thread flow chart

Finally, here is the flow chart of the DUMP thread from section 17 of "In-depth understanding of MySQL Master-Slave principle" (32 sections):

You can see where the heartbeat Event is sent in the figure.
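The timed-wait loop of step 4, where the heartbeat Event is sent, can be sketched as follows. This is an illustration only: the callables wait_for_update and send_heartbeat are hypothetical stand-ins for the server's binlog wait and send_heartbeat_event, and the return convention (0 for a new Event, 1 for a send failure) only loosely follows the excerpt.

```python
def wait_with_heartbeat(wait_for_update, send_heartbeat, heartbeat_period):
    """Wait for a new Event, sending a heartbeat after every timed-out wait.

    wait_for_update(timeout) returns True if a new Event arrived within
    `timeout` seconds and False if the wait timed out.
    send_heartbeat() returns True on success.
    """
    while True:
        if wait_for_update(heartbeat_period):
            return 0  # new Event available: the caller goes on to send it
        if not send_heartbeat():
            return 1  # the heartbeat could not be sent


# Scripted example: two timed-out waits, then a new Event arrives.
outcomes = iter([False, False, True])
sent = []


def scripted_wait(timeout):
    return next(outcomes)


def record_heartbeat():
    sent.append("heartbeat")
    return True


result = wait_with_heartbeat(scripted_wait, record_heartbeat, 30)
print(result, len(sent))
```

In a threaded program, threading.Event.wait(timeout) has the same True/False contract and could play the role of wait_for_update; the scripted version is used here so that the example runs instantly and deterministically.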

Author WeChat: gp_22389860
