In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-22 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
Author: Su Po
Kangaroo Cloud Service Department-DBA team
Database engineer
Communicate with a customer two days ago
The client put forward some
Questions about mysql isolation level and locks
Then asked about the troubleshooting idea of the phenomenon of lock waiting.
It reminds me of
Long long ago
A diagnostic case I've done.
I was not an old driver at that time.
It took two or three days to get it done.
There is still some reference value in retrospect.
Therefore, share ~
1 problem description
Database example: main library XXXX:3306
Details of the problem: the customer reported that the callback operation of the program related to the user_ site table is very slow, and some of the operations will time out and report an error.
The operation of placing an order is slow and even fails to report an error directly.
The error message on the program side is as follows:
General error: 1205 Lock wait timeout exceeded; try restarting transaction "
2 analysis
This is a typical lock waiting timeout exit error, which may be due to the fact that some transactions are not committed in time and the lock resources are not released. 1. Slow SQL analysis
Idea: check the recent slow SQL through the rds console to see if there are DML or select lock reads related to the user_ site table.
View slow SQL as shown in the following figure
Unfortunately, the relevant slow SQL is not found, which means that the SQL executes quickly and there is a problem in the processing of the transaction. Therefore, we believe that it is because some programs do not commit the transaction in time.
However, there are still some gains, and we found some potential problems, that is, the slow SQL in the past few years of top. They are all unlocked select, so they have nothing to do with this phenomenon, but these query operations take nearly 100 seconds at most, consume a lot of CPU and memory resources, and need to be optimized.
2. Transaction blocking analysis
Idea: through a script, regularly collect blocking transaction information, as long as the relevant operations are carried out again, the transaction information of blocking events will be recorded in the log and can be analyzed in detail.
The collection script is as follows:
About half an hour later, the relevant information was crawled in the log.
Through the analysis of the log, we find that there are more serious interlock waiting phenomena every once in a while, mainly in the following two situations:
Explain the log output above:
Waiting_trx_id: blocked transaction ID
Waiting_thread: blocked MySQL thread ID
Waiting_query: blocked SQL statement
Blocking_trx_id: transaction ID of the blocker
Blocking_thread: the MySQL thread ID of the blocker
Blocking_query: SQL statement of the blocker
We find that the blocker is always inactive, and the subsequent operations will enter a waiting queue. Blocking occurs because the empty transaction holds the lock resources needed by the blocked transaction, that is, access to the same record in a locked way.
Referring to the results of the previous slow SQL analysis, it is further inferred that the SQL execution within the transaction should be faster, so the program may not commit after processing the SQL.
Positioning program
After communicating with the customer development classmates, reproduce the scene. Output the log in real time, get the thread ID of the blocker, and query the program accessing the mysql database through the information_schema.processlist view.
The customer developer carefully examined the code, but unfortunately did not find the code for the uncommitted transaction. Then the problem may lie in the program itself, which is slow to process, not SQL. Before committing a transaction, the program is likely to go through a slow process, and further verification is needed as to what to handle.
Next, verify the above inference ~
3. SQL audit analysis
Idea: if you want to find out which piece of code it is, blindly looking for a needle in a haystack is undoubtedly a needle in a haystack. Fortunately, RDS provides SQL audit, and as long as you find the SQL operation before the transaction commit, the location of this problem should be clear.
Search for the relevant SQL audit records through the thread ID that was previously checked. The SQL audit can be viewed in the following figure.
Click Database Security → SQL Audit. Enter filter criteria: database name, user name of the program to access the database, keyword input thread ID, and select a valid time range.
This time, finally some gains, and should be able to make a reasonable explanation for this phenomenon. The following are suspicious database operations
We found that several SQL operations were performed immediately after the transaction was opened, and none of these SQL were counted in the slow log, so the SQL execution was fast (less than 2S). From the previous log output, we can also see that the SQL execution is instantaneous, because we can't grab the transaction's SQL statement at all.
However, the commit operation of the transaction takes place more than ten minutes later, so the question now is the key to solving the problem. More than ten minutes of transactions, while SQL executes a total of less than 2S, so what is the program doing the rest of the time?
After having a targeted purpose, the client developer immediately located the program and found the problem. Before committing, the program will insert the queue and delete the cache, and these operations take up the lock and inactivity time of the transaction.
At this point, the analysis process of the problem is all over, and the cause of the problem lies in the non-database operations within the transaction.
3 treatment scheme
Root cause analysis
Not only database operations are carried out within the transaction, the insertion queue and cache deletion operations carried out by the program extend a transaction completed within 2S to more than ten minutes, which means that within these ten minutes, all SQL operations that request lock resources within the transaction must enter the lock waiting state.
Treatment suggestion
1. Only SQL operations are performed within the transaction, and the processing of the program is placed after the transaction is committed or before the transaction begins.
two。 If there is a logic problem that cannot be modified, you can design a processing queue for the relevant processing process and throw an instruction without having to wait for the process to finish processing before committing the transaction.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.