This article walks through troubleshooting and resolving the MySQL master-slave replication error Errno 1205 ("Lock wait timeout exceeded"), from the initial Nagios alarm to the root-cause analysis. It should be a useful reference for anyone who hits the same error.
The Nagios alarms read roughly as follows. At 14:39, the VIP drifted:
***** Nagios *****
Notification Type: PROBLEM
Service: check_ha_mysqld
Host: prod-bjuc-mysql1-vip
Address: 192.168.87.74
State: CRITICAL
Date/Time: Wed Mar 28 14:39:27 CST 2018
Additional Info: CRITICAL: HA Failover from 192.168.87.72 to 192.168.87.123, 192.168.87.123 result: PROCS OK: 1 processes with command name /usr/sbin/mysqld, args VIP=192.168.87.74 192.168.87.72: PROCS OK: 1 processes with command name /usr/sbin/mysqld
At 14:46, master-slave replication was interrupted:
***** Nagios *****
Notification Type: PROBLEM
Service: check_mysql_health_slave-sql-running
Host: bjuc-mysql1
Address: 192.168.87.72
State: CRITICAL
Date/Time: Wed Mar 28 14:46:06 CST 2018
Additional Info: CRITICAL - Slave sql is not running
Logging in to the database server and checking the replication status shows that error 1205 is being reported:
mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 192.168.87.123
                  Master_User: zhu
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000481
          Read_Master_Log_Pos: 218802975
               Relay_Log_File: mysql-relay-bin.000428
                Relay_Log_Pos: 2428110
        Relay_Master_Log_File: mysql-bin.000481
             Slave_IO_Running: Yes
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB: performance_schema,information_schema,test,mysql
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table: %.%
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 1205
                   Last_Error: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN'
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 182088298
              Relay_Log_Space: 39142986
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 1205
               Last_SQL_Error: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN'
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 6
1 row in set (0.00 sec)
Checking the MySQL error log, the cause of the error and the suggested remedy are recorded there:

tail -70 error.log
180328 14:36:05 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN', Error_code: 1205
180328 14:36:21 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN', Error_code: 1205
180328 14:36:38 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN', Error_code: 1205
180328 14:36:56 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN', Error_code: 1205
180328 14:37:15 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN', Error_code: 1205
180328 14:37:35 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN', Error_code: 1205
180328 14:37:56 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN', Error_code: 1205
180328 14:38:17 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN', Error_code: 1205
180328 14:38:38 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN', Error_code: 1205
180328 14:38:59 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN', Error_code: 1205
180328 14:39:20 [ERROR] Slave SQL: Error 'Lock wait timeout exceeded; try restarting transaction' on query. Default database: 'statusnet'. Query: 'UPDATE notice_status_new set valid = 0 WHERE conversation = 384667 AND user_id IN', Error_code: 1205
180328 14:39:20 [ERROR] Slave SQL thread retried transaction 10 time(s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
180328 14:39:20 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
180328 14:39:20 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000481' position 182088298
Combining the last few lines of the error log with two MySQL parameters, the reason the replication error occurred becomes clear:
180328 14:39:20 [ERROR] Slave SQL thread retried transaction 10 time (s) in vain, giving up. Consider raising the value of the slave_transaction_retries variable.
180328 14:39:20 [Warning] Slave: Lock wait timeout exceeded; try restarting transaction Error_code: 1205
180328 14:39:20 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000481' position 182088298
The UPDATE statement was blocked because locks were being held on the table notice_status_new.
The parameter innodb_lock_wait_timeout is set to 15 seconds, so each attempt reports error 1205 after waiting 15 seconds: the lock wait timed out and the transaction was retried.
The parameter slave_transaction_retries is set to 10, so replication stops once the transaction has been retried 10 times without success (the error log shows 11 failed attempts, i.e. the original execution plus 10 retries).
mysql> show variables like '%innodb_lock_wait_timeout%';
+--------------------------+-------+
| Variable_name            | Value |
+--------------------------+-------+
| innodb_lock_wait_timeout | 15    |
+--------------------------+-------+
1 row in set (0.01 sec)

mysql> show variables like '%slave_transaction_retries%';
+---------------------------+-------+
| Variable_name             | Value |
+---------------------------+-------+
| slave_transaction_retries | 10    |
+---------------------------+-------+
1 row in set (0.00 sec)
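For reference, both variables are dynamic and can be changed online; a minimal sketch with purely illustrative values (raising them only buys more waiting time and more retries, it does not remove the underlying contention):

mysql> set global innodb_lock_wait_timeout = 50;    -- 50 is MySQL's default; this instance had lowered it to 15
mysql> set global slave_transaction_retries = 64;   -- illustrative value, not a recommendation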
Solution: restart the slave threads to resume master-slave replication, and let the slave start chasing the logs.
mysql> stop slave;
Query OK, 0 rows affected (0.02 sec)

mysql> start slave;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
    Slave_IO_Running: Yes
   Slave_SQL_Running: Yes
1 row in set (0.00 sec)
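As a quick check that the slave is actually chasing the logs (and not merely showing both threads as Yes), Seconds_Behind_Master can be watched; it was NULL while the SQL thread was stopped:

mysql> show slave status\G

Repeating this every few seconds and watching Seconds_Behind_Master fall back toward 0 is usually enough to confirm the relay-log backlog is being worked off.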
The VIP will be switched back in the evening.
-------- I'm the dividing line --------
Although the problem has been resolved for now, we still have to find the root cause whenever a problem occurs, so that the issue can be properly closed.
Why was this lock generated? The tables that record lock information only reflect the current state, so the lock details from the time of the incident can no longer be retrieved. The storage engine status, however, still reveals something: the database had hit a deadlock, and UPDATEs on this table were clearly left waiting.
A suspicious SQL statement was found, and the locked table was again notice_status_new:
UPDATE notice_status_new SET count = count + 1, push_count = push_count + 1 WHERE conversation = 2609759 AND user_id != 7384662 AND user_id IN (7359498)
show engine innodb status\G (excerpting the deadlock section):

------------------------
LATEST DETECTED DEADLOCK
------------------------
180328 14:33:52
*** (1) TRANSACTION:
TRANSACTION 1B285E9F9, ACTIVE 0 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 1
MySQL thread id 3972036, OS thread handle 0x7fc0c3464700, query id 2525495433 192.168.87.42 dstatusnet Updating
UPDATE notice_status_new SET count = count + 1, push_count = push_count + 1 WHERE conversation = 2609759 AND user_id != 7384662 AND user_id IN (7359498)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 3535 page no 65360 n bits 448 index `conv_user` of table `statusnet`.`notice_status_new` trx id 1B285E9F9 lock_mode X locks rec but not gap waiting
Record lock, heap no 241 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 000000000027d25f; asc     '_;;
 1: len 4; hex 80704c0a; asc  pL ;;
 2: len 4; hex 80b1a0fe; asc     ;;
*** (2) TRANSACTION:
TRANSACTION 1B285E9FD, ACTIVE 0 sec starting index read, thread declared inside InnoDB 500
mysql tables in use 1, locked 1
4 lock struct(s), heap size 1248, 3 row lock(s), undo log entries 1
MySQL thread id 3867908, OS thread handle 0x7fbfc6d4f700, query id 2525495437 192.168.87.42 dstatusnet Updating
UPDATE notice_status_new SET count = count + 1, push_count = push_count + 1 WHERE conversation = 2609759 AND user_id != 7359498 AND user_id IN (7384662)
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 3535 page no 65360 n bits 448 index `conv_user` of table `statusnet`.`notice_status_new` trx id 1B285E9FD lock_mode X locks rec but not gap
Record lock, heap no PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 000000000027d25f; asc     '_;;
 1: len 4; hex 80704c0a; asc  pL ;;
 2: len 4; hex 80b1a0fe; asc     ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 3535 page no 65360 n bits 448 index `conv_user` of table `statusnet`.`notice_status_new` trx id 1B285E9FD lock_mode X locks rec but not gap waiting
Record lock, heap no PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 8; hex 000000000027d25f; asc     '_;;
 1: len 4; hex 8070ae56; asc  p V;;
 2: len 4; hex 80b1a0fc; asc     ;;
*** WE ROLL BACK TRANSACTION (2)
Having identified the problem, we contacted the development team for this business, discussed it, and agreed to release a new version on Saturday to fix it completely.
We could also write a script that monitors locks in real time and records its output, which would make it easier to pin down such problems afterwards; a sketch follows.
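A minimal sketch of such a check, assuming a MySQL 5.5/5.6-era server where the information_schema lock tables (innodb_trx, innodb_lock_waits) exist; these tables were removed in MySQL 8.0. It could be run from cron every minute or so, e.g. mysql < lock_waits.sql >> /var/log/lock_waits.log, with the file name and log path chosen to taste:

-- lock_waits.sql (hypothetical file name): who is waiting on whom right now
SELECT r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query,
       b.trx_started         AS blocking_since
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id;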
Thinking it over, the master-slave replication error was most likely triggered by the VIP drift.
The production architecture is master-master (MM) replication plus Heartbeat.
Why the VIP drifted is still to be determined. After the drift, writes started going to the database on the 192.168.87.123 server, and the 192.168.87.72 database became the standby while still replicating data from 123. Because the deadlock on table notice_status_new had not cleared on 192.168.87.72, the slave SQL thread ran into a lock wait when applying the relay log received from the 123 server. Each attempt timed out after 15 seconds and the transaction was retried; after 10 retries the slave SQL thread stopped and master-slave synchronization was interrupted, with the table still locked.
--------
Review with the development team on March 29
Because the alarm times are not very precise, the actual sequence has to be based on the Heartbeat log and the MySQL log. The logs show that the Heartbeat failover (VIP drift) happened first, and the master-slave replication error followed.
Two main questions were discussed.
① Why did Heartbeat fail over?
The Heartbeat logs show heartbeat checks between 72 and 123: packets were lost when 72 pinged 123, while 123 pinging 72 was normal, so node 72 was presumably considered dead(?). The VIP drifted from 72 to 123 at 14:35:26:
Mar 28 14:35:26 bjuc-mysql1 pengine: [14176]: notice: LogActions: Move VIP_192_168_87_74 (Started bjuc-mysql1 -> prod-uc-yewu-db-slave2)
② Why did the deadlock occur?
From the storage engine status, the two SQL statements reached the database at the same time. Because InnoDB locks at row granularity, each statement locks a set of rows as it executes; when the rows locked by the second statement overlap with those already locked by the first, the two statements end up waiting on each other, and a deadlock results.
The WHERE conditions of the two statements look like this:
user_id IN (6032694, 5824664, 4344233, 5450840, 6032678, 4951596, 6032668, 4344453, 6038786, 434434756, 60754768)
user_id != 6185996
These two kinds of conditions can overlap logically, which is what leads to the deadlock.
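A simplified reconstruction of the collision, using the two statements captured in the deadlock dump above (the session framing is illustrative):

-- session 1
UPDATE notice_status_new SET count = count + 1, push_count = push_count + 1
 WHERE conversation = 2609759 AND user_id != 7384662 AND user_id IN (7359498);

-- session 2, arriving at the same moment
UPDATE notice_status_new SET count = count + 1, push_count = push_count + 1
 WHERE conversation = 2609759 AND user_id != 7359498 AND user_id IN (7384662);

Because the user_id != condition cannot narrow the scan, each statement may walk many entries of the conv_user index for conversation = 2609759 and take X record locks as it goes; when the two lock sets overlap but are acquired in different orders, each session waits for a lock the other already holds, and InnoDB breaks the cycle by rolling one transaction back ("WE ROLL BACK TRANSACTION (2)" in the output above).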
That is the solution to the MySQL master-slave replication error Errno 1205. Hopefully readers now have some understanding of it; for more industry knowledge and information, you can keep following our industry information column.