How to use pt-stalk to analyze and locate short-term MySQL performance problems


This article shows how to use pt-stalk to analyze and locate short-term MySQL performance problems. The content is concise and easy to understand, and I hope you get something out of the detailed introduction below.

[background]

MySQL occasionally suffers short performance problems lasting 3-30 seconds. General monitoring tools find it difficult to capture the scene of such problems, which makes it hard to accurately locate the cause.

For this type of requirement, our day-to-day MySQL analysis tools have some deficiencies:

1. Performance monitoring tools. The current granularity is at the minute level and cannot reflect second-level performance fluctuations.

2. performance_schema collection. Our collection persists 10,000 rows of statement records every 3 seconds, so on servers with QPS above roughly 3,000 some data is lost during collection.

performance_schema data is usually used to analyze statement-level performance problems, such as statements with high CPU consumption or large numbers of rows examined (see the digest-query sketch after this list), but it cannot locate contention for resources such as mutexes, locks, and threads inside the server.

3. The table DML statistics tool (5-minute granularity).

4. The slow log records queries that take more than 1 second, which may reflect the effect rather than the cause.

5. The MySQL Guard tool is triggered by the alarm system; generally it can only capture the scene for problems that last more than 1 minute.
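
For reference, the statement-level analysis mentioned in point 2 is typically done against the performance_schema digest summary table. A minimal sketch, assuming performance_schema is enabled with the default statement-digest instrumentation (the LIMIT and column choices here are arbitrary):

# top 10 statement digests by total latency; the TIMER columns are in picoseconds
mysql -e "
SELECT DIGEST_TEXT,
       COUNT_STAR                    AS exec_count,
       ROUND(SUM_TIMER_WAIT/1e12, 2) AS total_latency_s,
       SUM_ROWS_EXAMINED             AS rows_examined
FROM   performance_schema.events_statements_summary_by_digest
ORDER  BY SUM_TIMER_WAIT DESC
LIMIT  10;"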

Previously, a function was added to monitor high CPU, with granularity of roughly 10 seconds.

The pt-stalk tool can collect fault-scene data at a finer granularity and can run as a daemon; it is worth trying and can help us solve some of these problems.

[use of pt-stalk tool]

Let's try using the pt-stalk tool to capture snapshots of the fault scene.

1. Write a custom script that defines CPU usage as the trigger condition:

function trg_plugin() {
    # CPU busy % = 100 - %idle; %idle is the 8th field of the "Average:" line in sar output
    a=$(sar 1 1 | grep -i "Average:" | awk '{print $8}')
    echo "100-$a" | bc
}
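
As a quick sanity check, the script can be sourced and the function called by hand to confirm it prints the current CPU busy percentage (pt-stalk sources the file given to --function and calls trg_plugin):

# manual test of the plugin file used in step 2
source /tmp/pt_cpu.sh
trg_plugin        # prints, for example, 42.50 when the CPU is about 42% busy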

2. Start pt-stalk as a daemon. The following command uses the custom pt_cpu.sh script as the trigger condition: the CPU value (100-%idle) is checked every 1 second, snapshot collection is triggered when the value is greater than 50 for 3 checks in a row, and pt-stalk then sleeps for 60 seconds after a trigger before checking again.

pt-stalk --daemonize --dest=/tmp/log/pt-stalk --user= --password= --port= --function=/tmp/pt_cpu.sh --variable=highcpu --cycles=3 --interval=1 --threshold=50 --sleep=60 --log=/var/log/pt-stalk.log

For more information, please see man pt-stalk.
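
Once the daemon is running, collected snapshots land under the --dest directory and progress is written to the --log file. A minimal way to confirm it is working, using the paths from the command above:

# confirm the daemon is alive and follow its log
ps -ef | grep pt-stalk | grep -v grep
tail -f /var/log/pt-stalk.log
# after a trigger, each collection run leaves a set of timestamp-prefixed files here
ls -l /tmp/log/pt-stalk/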

[case study]

A server had intermittent thread and CPU alarms: CPU alarms fired around 9:00 every day, but the duration was short, so the MySQL Guard tool had difficulty collecting the scene.

Based on the earlier performance counters, it had been speculated that increased IO from the binlog backup caused the backlog of threads; this turned out not to be the reason, and the overlap with the binlog backup window was merely a coincidence.

After the pt-stalk daemon was started on the server, collection was triggered this morning when the CPU alarm fired.

The snapshot information captured is as follows:

Based on the fault snapshot information, combined with details from the slow log and performance_schema statement data, there was enough information to locate the cause of the problem.

1. CPU usage rose at 9:01.

2. The CPU information collected by pt-stalk covers 30 seconds at a finer granularity, and sys CPU accounted for more than 80% throughout those 30 seconds. This is usually caused by a high number of concurrent threads and excessive context switching (a way to check this is sketched after this list).

3. Threads_running was indeed high for 30 seconds in a row.

4. Further analysis made the problem easy to find: a job runs at 9:00 every day, and a highly concurrent slow-query SQL statement caused a backlog of threads.

5. The slow-query SQL was caused by a missing index; after rebuilding the index we will continue to observe.
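
As a rough sketch of how to confirm the context-switch explanation in point 2, the sys CPU share and the context-switch rate can be watched with the standard sysstat and procps tools while the problem is occurring (the sampling intervals here are arbitrary):

# %system column: share of CPU time spent in kernel mode, sampled every second for 30 seconds
sar -u 1 30
# cswch/s column: context switches per second
sar -w 1 30
# vmstat shows both at once: "cs" = context switches, "sy" = sys CPU
vmstat 1 30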

[performance of pt-stalk]

Under normal circumstances, the performance overhead of the daemon is small. It is recommended to customize and enable it when you need to troubleshoot. Its processing logic is outlined below.
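
The following is a simplified shell-style sketch of that trigger loop as described in the command above (check every --interval seconds, trigger after --cycles consecutive hits above --threshold, then back off for --sleep seconds); it only illustrates the logic and is not pt-stalk's actual source code:

hits=0
while true; do
    value=$(trg_plugin)                    # e.g. CPU busy % from the custom plugin
    if [ "$(echo "$value > 50" | bc)" -eq 1 ]; then
        hits=$((hits + 1))                 # consecutive checks above --threshold
    else
        hits=0
    fi
    if [ "$hits" -ge 3 ]; then             # --cycles consecutive hits: collect a snapshot
        collect_snapshot                   # hypothetical placeholder for the collection step
        hits=0
        sleep 60                           # --sleep: back off after a trigger
    fi
    sleep 1                                # --interval between checks
done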

The above is how to use pt-stalk to analyze and locate short-term MySQL performance problems. I hope you have gained some knowledge or skills from it.
