In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-15 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article introduces how to solve the waiting timeout of the row lock in MySQL. The content is very detailed. Interested friends can use it for reference. I hope it will be helpful to you.
1. Background # 20191219 10 com.alibaba.druid.filter.logging.Log4jFilter.statementLogError (Log4jFilter.java:152) | ERROR | {conn-10593, pstmt-38675} execute error. Update operation_service set offlinemark =?, resourcestatus =? Where RowGuid =? Com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
Students who have come into contact with MySQL should have encountered this mistake more or less. Professionally speaking, this error is called lock wait timeout. According to the type of lock, it is mainly subdivided into:
Row lock wait timeout
When SQL times out because it waits for a row lock, it times out for row lock waiting, which often occurs in multiple concurrent transaction scenarios.
Metadata lock wait timeout
When SQL times out because it waits for a metadata lock, it times out for the metadata lock wait, which often occurs during a DDL operation.
This article only describes how to effectively resolve row lock wait timeouts, because most projects are such errors, and metadata lock wait timeouts do not involve explanation.
Second, waiting for row locks
Before introducing how to solve the problem of row lock waiting, let's briefly introduce the causes of such problems. Brief description of the cause: when multiple transactions operate (add, delete and modify) a row of data at the same time, in order to maintain the ACID feature, MySQL will use locks to prevent multiple transactions from operating a row of data at the same time to avoid data inconsistencies. Only the transaction assigned to the row lock has the right to operate on the row, and the row lock is not released until the end of the transaction, while other transactions that are not assigned to the row lock generate row lock waiting. If the waiting time exceeds the configured value (that is, the value of the innodb_lock_wait_timeout parameter, and the default is 50s when the personal custom is configured as 5sMagneMySQL), a row lock wait timeout error will be thrown.
As shown in the figure above, transaction An and transaction B will Insert a data with a primary key value of 1 at the same time. Because transaction A first acquires a row lock with a primary key value of 1, transaction B waits because it is unable to acquire the row lock. Transaction B does not acquire the row lock until transaction A commits and completes the commit. The concept of row lock is emphasized here. Although transaction B repeatedly inserts the primary key, the transaction is always in the state of row lock waiting before the row lock is acquired, and the error of primary key conflict will be reported only after the row lock is acquired. Of course, this kind of Insert row lock conflict is relatively rare, and it only occurs in a large number of concurrent insertion scenarios. What is really common on projects is line lock waiting between update&delete. This is only used as an example, and the principle is the same.
Third, the causes
According to the problems I have been exposed to before, they can be roughly divided into the following reasons:
1. Transaction hangs due to non-database interaction in the program
If non-database interactive operations such as interface calls or file operations are embedded in the SQL transaction code, the whole transaction is likely to hang (the interface is not available for timeout or upload and download large attachments).
two。 The transaction contains a poor-performing query SQL
There is a slow query in the transaction, so that other DML in the same transaction cannot release the occupied row lock in time, causing the row lock to wait.
3. A single transaction contains a large number of SQL
It is usually caused by adding a for loop to the transaction code, although a single SQL runs fast, but when the number of SQL is large, the transaction will be slow.
4. Cascading update SQL takes longer to execute
This kind of SQL is easy to give people the wrong impression, such as: update A set. Cascading updates such as where... in (select B) will occupy not only the row lock on table A, but also the row lock on table B. when SQL is executed for a long time, it is easy to cause row lock waiting on table B.
5. Transaction suspended due to disk issu
In rare cases, such as when the storage is suddenly offline, the SQL execution gets stuck on the step of the kernel calling the disk, waiting and the transaction cannot be committed. To sum up, if the transaction is not committed for a long time and the DML operation is included in the transaction, then it is possible to cause a row lock wait and cause an error.
Fourth, the difficulty of positioning
After the row lock timeout error occurs in the web log, many developers will come to me to troubleshoot the problem. Here is the difficulty of locating the problem!
1. MySQL itself does not actively record the information about row lock waiting, so it cannot effectively analyze after the event.
two。 There are many reasons for lock contention, and it is difficult to determine what kind of problem scenario it is afterwards, especially when the problem cannot be reproduced afterwards.
3. After finding the problem SQL, the developer can not effectively mine the complete transaction from the code, which is also related to the architecture of the company framework-product-project, which needs to rely on DBA to collect the complete transaction SQL for analysis.
5. Common methods
First introduce how individuals usually solve this kind of problem, the premise of problem solving here is that the problem can be repeated, as long as it does not appear suddenly, and then never appear again, the source of the problem can generally be found.
The recurrence of the problem here is divided into two scenarios:
1. Manual reproduction
As long as you follow a certain operation, you can repeat the error report, this kind of scene is relatively simple!
two。 Random recurrence
Do not know when will suddenly report an error, can not be manually reproduced, this kind of scene is more difficult!
The following is to write down a unified simulation scenario to reproduce the row lock timeout problem for everyone to understand:
-- the table structure CREATE TABLE `emp` (`id` int (11) NOT NULL, KEY `idx_ id` (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 inserts 100w rows of records from 1 to 100w. -- testing process: transaction 1: start transaction; delete from emp where id = 1; select * from emp where id in (select id from emp);-- > simulate slow query, which takes a long time to execute, so the transaction is not committed and the row lock is not released. Commit; transaction 2: start transaction; delete from emp where id
< 10; -->The row lock is waiting for id=1. When the row lock timeout is reached (here I have configured a timeout of 5s), the return row lock timeout error rollback is returned.
5.1 manually recreate the scene
This scenario usually only needs to know the innodb row lock waiting situation of the current MySQL through the innodb row lock waiting script. For example, we simulate the above error scenario (simulate the page operation) and use the script query (you need to query before the timeout, otherwise you will not see the error after the timeout).
/ * innodb line lock waiting script * / SELECT r.trx_mysql_thread_id waiting_thread,r.trx_query waiting_query, concat (timestampdiff (SECOND,r.trx_wait_started,CURRENT_TIMESTAMP ()),'s') AS duration, b.trx_mysql_thread_id blocking_thread,t.processlist_command state,b.trx_query blocking_current_query E.sql_text blocking_last_query FROM information_schema.innodb_lock_waits w JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id JOIN performance_schema.threads t on t.processlist_id = b.trx_mysql_thread_id JOIN performance_schema.events_statements_current e USING (thread_id)
As we can see above, the thread id of transaction 2 is 76, which has been blocked for 3s by transaction 1, that is, the transaction with thread id 75, and we can see that the SQL currently executed by transaction 1 is a SELECT. This also explains why SELECT blocks other sessions, as many developers often ask me. If you encounter such a situation, then the treatment is actually very simple. You just need to optimize the SELECT, but you can't. Just throw the query out of the transaction, and you don't even need to dig out the whole transaction. The above problem simulation, in fact, corresponds to the second point in the cause of the problem in section 3 (the transaction contains the query SQL with poor performance). Let's also simulate the first point (transaction suspension caused by non-database interactive operations in the program) and compare the phenomenon. We just need to change the process of transaction 1 to the following.
Transaction 1: start transaction; delete from emp where id = 1; select * from emp where id in (select id from emp); wait 60s (do nothing)-> simulate interface call timeout, transaction tamping, and then execute commit. Commit
Looking at it again with a script, you can see that the phenomenon is different, except that the blocking transaction is in the sleep state, that is, the transaction is not currently running SQL. From the DBA point of view, it can be concluded that this kind of phenomenon is probably caused by the transaction hang caused by the code embedding other interactions in the transaction (or it may also be caused by the transaction deadlock caused by network problems), because the program is not like a human, it will not be lazy, there will not be transactions in the middle of execution, take a break and then commit again.
If it is the problem of this kind of phenomenon, because the essence is not the transaction hanging caused by slow SQL, you must find the corresponding point in the code to see what interaction is stuck.
Here, developers need to troubleshoot the code to find the source, but the only available information is the last SQL executed by the transaction, which is the last column in the figure above. From my previous experience (most of the time), it is very difficult for developers to find the specific location in the code from this SQL alone, especially if this SQL is a very common SQL!
When faced with this situation, it should be much easier for DBA to mine all the SQL executed by the transaction and then let the developer troubleshoot the code. Here you need to use MySQL's general_log, which is used to record all running SQL in MySQL.
-- check whether general_log is enabled and the file name mysql > show variables like'% general_log%' +-+ | Variable_name | Value | +-+- -+ | general_log | OFF | | general_log_file | / data/mysql_data/192-168188-155.log | +-- -+-temporarily enable general_log mysql > set global general_log = 1 Query OK, 0 rows affected (0.00 sec)-temporarily closes general_log mysql > set global general_log = 0; Query OK, 0 rows affected (0.00 sec)
After general_log is enabled, when manually reproducing, you can query the thread ID in the result of the innodb row lock waiting script, and go to general_log to find the corresponding transaction analysis, as follows:
According to the thread ID, you can easily find the transaction operation at the corresponding point in time from the general_log (in the actual scenario, you may need to filter through the pipe command). As shown in the figure above, all SQL of transaction 1 and transaction 2 can be found, and then the corresponding location can be found in the code through these SQL. For example, for a transaction with thread ID of 111in the figure above, after executing select * from emp where id in (select id from emp) to the actual commit, the reason is either that the SQL query is slow, or that the code is performing other interactive operations.
PS:general_log will record all SQL, so it has a great impact on the performance of MySQL, and it is easy to soar, so it is only temporarily turned on when troubleshooting problems. Please close it in time after troubleshooting!
5.2 randomly reproduce the scene
Compared with the manual reproduction of the scene, this scenario is random, so it is impossible to simulate the error while querying the specific blocking situation through the script, so it is necessary to monitor the blocking of MySQL in other ways. I usually record MySQL blocking by running a monitoring script (innodb_lock_monitor.sh) on Linux. The script is as follows:
#! / bin/bash # account, password, monitoring log user= "root" password= "Gepoint" logfile= "/ root/innodb_lock_monitor.log" while true do num= `mysql-u ${user}-p$ {password}-e "select count (*) from information_schema.innodb_lock_waits" | grep-v count`if [[$num-gt 0]] Then date > > / root/innodb_lock_monitor.log mysql-u ${user}-p$ {password}-e "SELECT r.trx_mysql_thread_id waiting_thread,r.trx_query waiting_query,\ concat (timestampdiff (SECOND,r.trx_wait_started,CURRENT_TIMESTAMP ()),'s') AS duration,\ b.trx_mysql_thread_id blocking_thread,t.processlist_command state,b.trx_query blocking_query E.sql_text\ FROM information_schema.innodb_lock_waits w\ JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id\ JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id\ JOIN performance_schema.threads t on t.processlist_id = b.trx_mysql_thread_id\ JOIN performance_schema.events_statements_current e USING (thread_id)\ G "> ${logfile } fi sleep 5 done
Check again
-- use the nohup command to run the monitoring script [root@192-16818815555] # nohup sh innodb_lock_monitor.sh & [2] 31464 nohup: ignoring input and appending output to 'nohup.out'-- check whether nohup.out has reported an error [root@192-168188155155] # tail-f nohup.out mysql: [Warning] Using a password on the command line interface can be insecure. Mysql: [Warning] Using a password on the command line interface can be insecure. Mysql: [Warning] Using a password on the command line interface can be insecure. -- regularly check whether the monitoring log has output (if there is no output, the log will not be generated!) # tail-f innodb_lock_monitor.log Wed Feb 5 11:30:11 CST 2020 * * 1. Row * * waiting_thread: 112waiting_query: delete from emp where id
< 10 duration: 3s blocking_thread: 111 state: Sleep blocking_query: NULL sql_text: select * from emp where id in (select id from emp) 当监控日志有输出阻塞信息时,后续解决方案就和之前的手动复现场景一致。 如果是事务卡在慢 SQL,那么就需要优化 SQL。 如果是事务挂起,那么就通过 general_log 分析事务,然后找到具体的代码位置。 PS:问题排查完成后,请及时关闭后台监控进程,通过 kill+pid 的方式直接关闭即可! 六、Performance_Schema 之前的方法感觉不是很方便,因为 general_log 需要访问服务器,且过滤分析也较难,需要一定的 MySQL 基础及 Linux 基础才适用,因此想寻找一种更为简便的方法。 6.1 方法介绍 个人想法是利用 MySQL 5.5 开始提供的 performance_schema 性能引擎来进行分析,Performance_Schema 是 MySQL 提供的在系统底层监视 MySQL 服务器性能的一个特性,其提供了大量监控项,包括:锁、IO、事务、内存使用等。 介绍下主要原理: 1. 主要用的表有 2 张 events_transactions_history_long 和 events_statements_history_long。 2. transactions_history_long 会记录历史事务信息,events_statements_history_long 则记录历史 SQL。 3. 从 transactions_history_long 中得到回滚事务的线程 ID,再根据时间范围去筛选出可疑的事务,最后从 events_statements_history_long 得到事务对应的 SQL,从中排查哪个为源头。 优点: 1. 不需要通过 general_log 来获取事务 SQL。 2. 不需要监控脚本来获取到行锁等待情况。 3. 只需要访问 MySQL 就可以实现,而不需要访问服务器。 4. 性能开销较小,且不会暴涨,因为是循环覆盖写入的。 5. 可以知道每条 SQL 的运行时长。 缺点: 1. history_long 相关表默认保留记录有限,可能会把有用的数据刷掉,尤其是在 SQL 运行较多的系统。 2. 如果要加大 history_long 相关表的最大保留行数,需要重启 MySQL,无法在线修改参数。 3. history_long 相关表记录中的时间均为相对时间,也就是距离 MySQL 启动的时长,看起来不是很方便。 4. history_long 相关表不会主动记录行锁等待的信息,所以只能通过先根据时间范围刷选出可疑的事务,再进一步分析,不如脚本监控定位的准。 /*开启performance_schema相关监控项,需要提前开启performance_schema*/ UPDATE performance_schema.setup_instruments SET ENABLED = 'YES', TIMED = 'YES' where name = 'transaction'; UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' where name like '%events_transactions%'; UPDATE performance_schema.setup_consumers SET ENABLED = 'YES' where name like '%events_statements%'; /*查看回滚事务SQL,确认是否是日志里报错的事务*/ SELECT a.THREAD_ID ,b.EVENT_ID ,a.EVENT_NAME ,CONCAT (b.TIMER_WAIT / 1000000000000,'s') AS trx_druation ,CONCAT (a.TIMER_WAIT / 1000000000000,'s') sql_druation ,a.SQL_TEXT,b.STATE,a.MESSAGE_TEXT FROM performance_schema.events_statements_history_long a JOIN performance_schema.events_transactions_history_long b ON a.THREAD_ID = b.THREAD_ID AND (a.NESTING_EVENT_ID = b.EVENT_ID OR a.EVENT_ID = b.NESTING_EVENT_ID) WHERE b.autocommit = 'NO' AND a.SQL_TEXT IS NOT NULL AND b.STATE = 'ROLLED BACK' /*查看该时间段内可疑事务即超过5s的事务SQL,这里默认innodb_lock_wait_timeout为5s*/ SELECT a.THREAD_ID ,b.EVENT_ID ,a.EVENT_NAME ,CONCAT (b.TIMER_WAIT / 1000000000000,'s') AS trx_druation ,CONCAT (a.TIMER_WAIT / 1000000000000,'s') sql_druation ,a.SQL_TEXT,b.STATE,a.MESSAGE_TEXT,a.ROWS_AFFECTED,a.ROWS_EXAMINED,a.ROWS_SENT FROM performance_schema.events_statements_history_long a JOIN performance_schema.events_transactions_history_long b ON a.THREAD_ID = b.THREAD_ID AND (a.NESTING_EVENT_ID = b.EVENT_ID OR a.EVENT_ID = b.NESTING_EVENT_ID) WHERE b.autocommit = 'NO' AND SQL_TEXT IS NOT NULL AND b.STATE = 'COMMITTED' AND b.TIMER_WAIT / 1000000000000 >5 AND b.TIMER_START
< (SELECT TIMER_START FROM performance_schema.events_transactions_history_long WHERE THREAD_ID = 70402 /*上述SQL查询结果中的线程ID*/ AND EVENT_ID = 518) /*上述SQL查询结果中的事件ID*/ AND b.TIMER_END >(SELECT TIMER_END FROM performance_schema.events_transactions_history_long WHERE THREAD_ID = 70402 / * Threads in the above SQL query results ID*/ AND EVENT_ID = 518) / * event ID*/ ORDER BY a.THREAD_ID in the above SQL query results
6.2 Test Simulation
If this method is used, then there is no need to break up to reproduce or randomly reproduce. The operation method is the same. The following simulation shows how to do it:
1. First of all, enable the monitoring items related to performance_schema by using the above method, which will take effect directly without restarting MySQL.
two。 Then reproduce the problem, it is best to reproduce it manually (because if you do not check it in time, the monitoring data may be brushed out), if not, you will have to wait for a random repeat.
3. After the problem is repeated, the above script is used to query whether there is a rollback transaction (that is, a transaction that is rolled back because the row lock timed out).
4. Then, according to the thread ID and event ID that rolls back the transaction, it is brought into the last script to view the suspicious transaction and analyze it.
Because it is a test environment simulation, so the results are very clear, there may be many actual output results on the project, we need to analyze whether there is a problem with the transaction one by one!
VII. Summary
After the actual test, it is found that there are many restrictions on troubleshooting row lock waiting timeout through performance_schema, and the final analysis is also a technical task, which is not as simple as it was supposed at first, and it goes against one's wishes.
There are three points that are the most difficult to deal with when troubleshooting problems through performance_schema:
1. The question of time, how to convert relative time into absolute time, this has been unable to find a good way.
two。 The information of row lock waiting will not be actively recorded, so it can only be further analyzed after being brushed by the time node.
3. The record is brushed, because it is a memory table, the setting is very large and easy to overflow, and if the setting is very small, it is easy to be brushed out quickly.
On the MySQL row lock wait timeout how to solve here, I hope the above content can be of some help to you, can learn more knowledge. If you think the article is good, you can share it for more people to see.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
Gc buffer busyThis wait event, also known as global cache buffer busy prior to Oracle 10g,specifies
© 2024 shulou.com SLNews company. All rights reserved.