2025-01-18 Update From: SLTechnology News&Howtos
This article mainly analyzes the causes of the system lock state in the SQL thread. The author has not yet studied the master-slave replication code systematically (that is a goal for 2018), so if there are any mistakes, please point them out; treat this as a note for later study. The article also lists the causes of replication delay currently known to the author, and how the delay is calculated.
This article is based on the 5.7.17 source code, considers only row-format binlog, and mainly covers DML statements; DDL statements are comparatively simple and are not covered.

1. How the delay is calculated
In fact, every time the show slave status command is issued, the server calls the function show_slave_status_send_data to compute the delay on the spot; the delay is not stored anywhere. The stack frames are as follows:
#0  show_slave_status_send_data (thd=0x7fffd8000cd0, mi=0x38ce2e0, io_gtid_set_buffer=0x7fffd800eda0 "e859a28b-b66d-11e7-8371-000c291f347d:42-100173", sql_gtid_set_buffer=0x7fffd8011ac0 "e859a28b-b66d-11e7-8371-000c291f347d:1-100173") at /mysql/mysql-5.7.17/sql/rpl_slave.cc:3602
#1  0x0000000001867749 in show_slave_status (thd=0x7fffd8000cd0) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:3982
#2  0x0000000001867bfa in show_slave_status_cmd (thd=0x7fffd8000cd0) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:4102
Its calculation essentially comes down to this line of code:

time_diff = ((long)(time(0) - mi->rli->last_master_timestamp) - mi->clock_diff_with_master)
To explain briefly:

time(0): the current system time on the slave server.
mi->rli->last_master_timestamp: the timestamp in the event common header plus exe_time. exe_time exists only in query events and is 0 everywhere else, which is why with row-format binlog the maximum delay is about twice the master's execution time, while for a DDL statement (carried in a query event) the maximum delay is about once the master's execution time.
mi->clock_diff_with_master: the clock difference between the slave and the master.

So the timestamp in the event common header and the slave's local time are the deciding factors. Because time(0) is taken anew each time the command is issued, the displayed delay keeps growing even while the timestamp in the event common header stays the same.
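As a rough sketch of this computation (a Python model with hypothetical values, not the actual MySQL code, which lives in show_slave_status_send_data):

```python
import time

def seconds_behind_master(last_master_timestamp, clock_diff_with_master, now=None):
    """Python model of the C expression above:
    time_diff = ((long)(time(0) - mi->rli->last_master_timestamp)
                 - mi->clock_diff_with_master)
    """
    if now is None:
        now = int(time.time())  # time(0): current system time on the slave
    return (now - last_master_timestamp) - clock_diff_with_master

# Hypothetical values: the slave clock runs 2s ahead of the master, and the
# last applied event carries a timestamp 10s in the past.
print(seconds_behind_master(last_master_timestamp=1000,
                            clock_diff_with_master=2,
                            now=1010))  # prints 8
```

Note how the result grows as `now` advances even while last_master_timestamp stays fixed, which is exactly the behavior described above.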
There is also a well-known piece of pseudo code:
/*
  The pseudo code to compute Seconds_Behind_Master:
  if (SQL thread is running)
  {
    if (SQL thread processed all the available relay log)
    {
      if (IO thread is running)
        print 0;
      else
        print NULL;
    }
    else
      compute Seconds_Behind_Master;
  }
  else
    print NULL;
*/
It, too, comes from the function show_slave_status_send_data; anyone interested can read it for themselves, so I will not explain it further.
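The pseudo code above can be rendered as a small Python sketch (the names and the `compute` callback are illustrative, not MySQL code; None stands for NULL):

```python
def show_seconds_behind_master(sql_thread_running, io_thread_running,
                               relay_log_fully_applied, compute):
    """Python rendering of the pseudo code above (a sketch, not MySQL code).
    `compute` stands in for the time_diff computation."""
    if not sql_thread_running:
        return None  # SQL thread stopped: NULL
    if relay_log_fully_applied:
        # caught up: 0 if the IO thread is still feeding us, else NULL
        return 0 if io_thread_running else None
    return compute()  # otherwise compute Seconds_Behind_Master

print(show_seconds_behind_master(True, True, True, lambda: 5))    # 0
print(show_seconds_behind_master(True, False, True, lambda: 5))   # None
print(show_seconds_behind_master(True, True, False, lambda: 5))   # 5
print(show_seconds_behind_master(False, True, False, lambda: 5))  # None
```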
For this part you can also refer to the Taobao kernel monthly report, https://www.kancloud.cn/taobaomysql/monthly/140089, which I also plan to study later.

2. When the binlog is written to the binlog file and when events are generated
I have noticed that some readers have questions about this, so here is a brief explanation.
The binlog is actually written to the binlog file after commit, and transfer to the slave happens after the flush stage of ordered commit. As for event generation time: if the transaction is not explicitly opened (autocommit), the GTID event, query event, map event, DML event and XID event all carry the time the statement was issued. If the transaction is explicitly started, the GTID event and XID event carry the time the COMMIT command was issued, while the other events carry the time the DML statement was issued.
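The timestamp rules just described can be summarized in a small sketch (simplified event names, not MySQL code):

```python
def event_timestamps(explicit_txn, stmt_time, commit_time):
    """Common-header timestamp carried by each event of a single-statement
    transaction, following the rules above (a simplified model)."""
    if not explicit_txn:
        # autocommit: every event carries the statement's initiation time
        return dict.fromkeys(["gtid", "query", "map", "dml", "xid"], stmt_time)
    # explicit BEGIN ... COMMIT: GTID and XID events carry the COMMIT time,
    # the other events carry the DML statement's initiation time
    return {"gtid": commit_time, "query": stmt_time, "map": stmt_time,
            "dml": stmt_time, "xid": commit_time}

print(event_timestamps(False, 100, 105))
print(event_timestamps(True, 100, 105))
```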
Of course, when the binlog is written to the binlog file, or when it is transferred to the slave, has nothing to do with when events are generated. Here is a typical event life cycle for a small transaction:
> Gtid Event:   Pos:234(0Xea)   N_pos:299(0X12b)  Time:1513135186  Event_size:65(bytes)  Gtid:31704d8a-da74-11e7-b6bf-52540a7d243:100009  last_committed=0  sequence_number=1
-> Query Event:  Pos:299(0X12b)  N_Pos:371(0X173)  Time:1513135186  Event_size:72(bytes)  Exe_time:0  Use_db:test  Statment(35b-trun):BEGIN  /*!Trx begin*/  Gno:100009
-> Map Event:    Pos:371(0X173)  N_pos:415(0X19f)  Time:1513135186  Event_size:44(bytes)  TABLE_ID:108  DB_NAME:test  TABLE_NAME:a  Gno:100009
-> Insert Event: Pos:415(0X19f)  N_pos:455(0X1c7)  Time:1513135186  Event_size:40(bytes)  Dml on table: test.a  table_id:108  Gno:100009
-> Xid Event:    Pos:455(0X1c7)  N_Pos:486(0X1e6)  Time:1513135186  Event_size:31(bytes)  COMMIT  /*!Trx end*/  Gno:100009

3. Possible causes of delay
Here are the causes I currently know of:
- A large transaction: delay of roughly 2 × the execution time; state: reading event from the relay log
- DDL on a large table: delay of roughly 1 × the execution time; state: altering table
- A transaction left uncommitted for a long time: causes an instantaneous jump in the delay when it finally commits
- A table without a primary key or unique key; state: system lock or reading event from the relay log
- Blocking by InnoDB-layer locks; state: system lock or reading event from the relay log
- Slave parameter settings such as sync_binlog, sync_relay_log and innodb_flush_log_at_trx_commit
These are all causes I have run into before. Next I want to analyze how the system lock state forms on the slave.
4. How the problem arises
The problem mainly occurred on the slaves of our production databases. We often found that on instances with a large data volume, sql_thread was frequently in the system lock state, roughly as follows:
mysql> show processlist;
+----+-------------+-----------+------+---------+------+----------------------------------+------------------+
| Id | User        | Host      | db   | Command | Time | State                            | Info             |
+----+-------------+-----------+------+---------+------+----------------------------------+------------------+
|  3 | root        | localhost | test | Sleep   |  426 | NULL                             | NULL             |
|  4 | system user |           | NULL | Connect | 5492 | Waiting for master to send event | NULL             |
|  5 | system user |           | NULL | Connect |      | System lock                      | NULL             |
|  6 | root        | localhost | test | Query   |    0 | starting                         | show processlist |
+----+-------------+-----------+------+---------+------+----------------------------------+------------------+
The official description of this state is as follows:
The thread has called mysql_lock_tables() and the thread state has not been updated since. This is a very general state that can occur for many reasons. For example, the thread is going to request or is waiting for an internal or external system lock for the table. This can occur when InnoDB waits for a table-level lock during execution of LOCK TABLES. If this state is being caused by requests for external locks and you are not using multiple mysqld servers that are accessing the same MyISAM tables, you can disable external system locks with the --skip-external-locking option. However, external locking is disabled by default, so it is likely that this option will have no effect. For SHOW PROFILE, this state means the thread is requesting the lock (not waiting for it).
Obviously this did not solve my problem, and for a moment I was at a loss. Today, while testing, I reproduced this state by manually taking a row lock on the slave so that it conflicted with sql_thread, so I combined that with gdb debugging to do the following analysis. I hope it is useful to you; it also serves as a note for my later study.
5. Reasons for system lock delay
Here is the conclusion first, for direct reference.

Necessary condition: a large number of small transactions such as UPDATE/DELETE ... WHERE touching a single row of data; such a statement contains only one row DML event, and the table is a large table. On top of that, one of the following holds:

- The table has no primary key or unique key.
- Something like InnoDB lock blocking occurs, i.e. a session on the slave modifies the same data that sql_thread is modifying.

If this really cannot be avoided, you can try modifying parameters. If there are a large number of tables without a primary key or unique key, consider changing the parameter slave_rows_search_algorithms. But in InnoDB, not using a primary key, or choosing a bad one, is tantamount to suicide.
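To see why a missing key hurts so much when applying row events, here is a toy cost model (not MySQL code): without a usable key, each row event forces a scan of the table, the pattern behind Rows_log_event::do_table_scan_and_update, while a HASH_SCAN-style algorithm (one of the options of slave_rows_search_algorithms) hashes the row images once and makes a single pass:

```python
def apply_with_table_scan(table, row_images):
    """One scan from the start of the table per row event (no usable key)."""
    comparisons = 0
    for image in row_images:
        for row in table:
            comparisons += 1
            if row == image:
                break
    return comparisons

def apply_with_hash_scan(table, row_images):
    """HASH_SCAN-style: hash all row images once, then one pass over the table."""
    wanted = set(row_images)
    comparisons = 0
    for row in table:
        comparisons += 1
        wanted.discard(row)
    return comparisons

table = list(range(10000))             # a toy "table" of 10,000 rows
images = [9990 + i for i in range(5)]  # 5 row events near the end of the table

print(apply_with_table_scan(table, images))  # prints 49965
print(apply_with_hash_scan(table, images))   # prints 10000
```

The gap widens with the table size and the number of row events, which matches the observation that large keyless tables hit by many small transactions are the worst case.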
6. Analysis of system lock delay
The following analysis is my conclusion from stepping through the code with gdb, and it may be incorrect. We know that every such state is a state of the MySQL upper layer; any state change must go through a call to THD::enter_stage, and system lock is the state entered by calling mysql_lock_tables. At the same time there is another important state for the slave's SQL_THREAD: reading event from the relay log.
Below is a small part of the handle_slave_sql function in rpl_slave.cc, which is the main basis for my analysis:
/* Read queries from the IO/THREAD until this thread is killed */
while (!sql_slave_killed(thd, rli))  // the big loop
{
  THD_STAGE_INFO(thd, stage_reading_event_from_the_relay_log);  // enter the 'reading event from the relay log' state
  if (exec_relay_log_event(thd, rli))
  // exec_relay_log_event first calls the next_event function to read an event and then applies it.
  // Applying the event calls lock_tables, but if this is not the first call (i.e. not the first
  // event of the transaction), mysql_lock_tables does not need to be called again. When lock_tables
  // does call mysql_lock_tables, the state is set to 'System lock', and the thread then enters
  // the InnoDB layer to find and modify the data.
  ...
}
I also specifically asked brother Yinfeng at Alibaba to verify this: mysql_lock_tables is the function through which MyISAM implements table locks; for InnoDB it only sets a shared lock.
Here, let us set aside the query event, map event and so on, and consider only DML events.
If a small transaction contains only one DML event, the flow is:

-> enter the reading event from the relay log state
-> read an event (see the next_event function)
-> enter the System lock state
-> find and modify data in the InnoDB layer

If it is a large transaction containing multiple DML events, the flow is:

-> enter the reading event from the relay log state
-> read an event (see the next_event function)
-> enter the System lock state
-> find and modify data in the InnoDB layer
-> enter the reading event from the relay log state
-> read an event (see the next_event function)
-> find and modify data in the InnoDB layer
-> enter the reading event from the relay log state
-> read an event (see the next_event function)
-> find and modify data in the InnoDB layer
... until all events of the transaction have been executed
So we can see that for a small transaction, sql_thread looks up and modifies the data while still showing the System lock state, which is exactly my conclusion above. Likewise, if sql_thread is blocked by an InnoDB-layer lock, it will also sit in the System lock state. For a large transaction, however, although the same problem exists, its state is reading event from the relay log. So when System lock appears, consider the conclusions given above; but the conditions above do not necessarily manifest as System lock, since that depends on whether the transaction is large.
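The state behavior described above can be modeled in a few lines (a simplified model of my reading of the code, not MySQL itself):

```python
def displayed_states(n_dml_events):
    """Processlist state shown while sql_thread works inside InnoDB for each
    DML event of one transaction (a simplified model of my reading of the
    code, not MySQL itself)."""
    states = []
    for i in range(n_dml_events):
        if i == 0:
            # only the first event triggers mysql_lock_tables, which sets the
            # state to 'System lock'; it is not reset before entering InnoDB
            states.append("System lock")
        else:
            # later events only pass THD_STAGE_INFO at the top of the big loop,
            # so the state still reads 'reading event from the relay log'
            states.append("reading event from the relay log")
    return states

print(displayed_states(1))  # a small transaction: ['System lock']
print(displayed_states(3))  # a large transaction shows 'System lock' only once
```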
The sections below list the breakpoints and stack frames I used during the gdb session, mainly as a record for myself.
7. Breakpoints used in analysis
- mysql_lock_tables: the function that changes the state to System lock. gdb print: p tables[0]->s->table_name
- THD::enter_stage: the function that changes the state. gdb print: p new_stage->m_name
- ha_innobase::index_read: InnoDB data lookup interface. gdb print: p index->table_name
- ha_innobase::delete_row: InnoDB row delete interface
- exec_relay_log_event: fetches an event and applies it. gdb print: ev->get_type_code()
8. Two stack frames

Entering the System lock state:

#0  THD::enter_stage (this=0x7fffec000970, new_stage=0x2ccd180, old_stage=0x0, calling_func=0x2216fd0 "mysql_lock_tables", calling_file=0x22167d8 "/mysql/mysql-5.7.17/sql/lock.cc", calling_line=323) at /mysql/mysql-5.7.17/sql/sql_class.cc:731
#1  0x00000000017451a6 in mysql_lock_tables (thd=0x7fffec000970, tables=0x7fffec005e38, count=1, flags=0) at /mysql/mysql-5.7.17/sql/lock.cc:323
#2  0x00000000014fe8da in lock_tables (thd=0x7fffec000970, tables=0x7fffec012b70, count=1, flags=0) at /mysql/mysql-5.7.17/sql/sql_base.cc:6630
#3  0x00000000014fe321 in open_and_lock_tables (thd=0x7fffec000970, tables=0x7fffec012b70, flags=0, prelocking_strategy=0x7ffff14e2360) at /mysql/mysql-5.7.17/sql/sql_base.cc:6448
#4  0x0000000000eee1d2 in open_and_lock_tables (thd=0x7fffec000970, tables=0x7fffec012b70, flags=0) at /mysql/mysql-5.7.17/sql/sql_base.h:477
#5  0x000000000180e7c5 in Rows_log_event::do_apply_event (this=0x7fffec024790, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/log_event.cc:10626
#6  0x00000000017f7b7b in Log_event::apply_event (this=0x7fffec024790, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/log_event.cc:3324
#7  0x00000000018690ff in apply_event_and_update_pos (ptr_ev=0x7ffff14e2818, thd=0x7fffec000970, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:4709
#8  0x000000000186a7f2 in exec_relay_log_event (thd=0x7fffec000970, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:5224  // here you can see the handling of the different events
#9  0x0000000001870db6 in handle_slave_sql (arg=0x357fc50) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:7332  // here is the entire main logic
#10 0x0000000001d5442c in pfs_spawn_thread (arg=0x7fffd88fb870) at /mysql/mysql-5.7.17/storage/perfschema/pfs.cc:2188
#11 0x00007ffff7bc7851 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff672890d in clone () from /lib64/libc.so.6

Looking up data while in the System lock state:

#0  ha_innobase::index_read (this=0x7fffec0294c0, buf=0x7fffec0297b0 "\375\311y", key_ptr=0x0, key_len=0, find_flag=HA_READ_AFTER_KEY) at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:8540
#1  0x000000000192126c in ha_innobase::index_first (this=0x7fffec0294c0, buf=0x7fffec0297b0 "\375\311y") at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:9051
#2  0x00000000019214ba in ha_innobase::rnd_next (this=0x7fffec0294c0, buf=0x7fffec0297b0 "\375\311y") at /mysql/mysql-5.7.17/storage/innobase/handler/ha_innodb.cc:9149
#3  0x0000000000f4972c in handler::ha_rnd_next (this=0x7fffec0294c0, buf=0x7fffec0297b0 "\375\311y") at /mysql/mysql-5.7.17/sql/handler.cc:2947
#4  0x000000000180e1a9 in Rows_log_event::do_table_scan_and_update (this=0x7fffec035c20, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/log_event.cc:10475
#5  0x000000000180f453 in Rows_log_event::do_apply_event (this=0x7fffec035c20, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/log_event.cc:10941
#6  0x00000000017f7b7b in Log_event::apply_event (this=0x7fffec035c20, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/log_event.cc:3324
#7  0x00000000018690ff in apply_event_and_update_pos (ptr_ev=0x7ffff14e2818, thd=0x7fffec000970, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:4709
#8  0x000000000186a7f2 in exec_relay_log_event (thd=0x7fffec000970, rli=0x393b9c0) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:5224
#9  0x0000000001870db6 in handle_slave_sql (arg=0x357fc50) at /mysql/mysql-5.7.17/sql/rpl_slave.cc:7332
#10 0x0000000001d5442c in pfs_spawn_thread (arg=0x7fffd88fb870) at /mysql/mysql-5.7.17/storage/perfschema/pfs.cc:2188
#11 0x00007ffff7bc7851 in start_thread () from /lib64/libpthread.so.0
#12 0x00007ffff672890d in clone () from /lib64/libc.so.6