We received an alarm that the MySQL instance behind an online business was under relatively high load, so we logged in to the server to check and confirm.
1. First, check things at the OS level to confirm what is going on.
top command
[yejr@imysql.com:~] # top
top - 11:53:04 up 702 days, 56 min, 1 user, load average: 7.18, 6.70, 6.47
Tasks: 576 total, 1 running, 575 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.7%us, 3.4%sy, 0.0%ni, 77.6%id, 11.0%wa, 0.0%hi, 0.3%si, 0.0%st
-- %us and %wa are both relatively high, indicating that the current bottleneck is most likely the CPU consumed by user processes and disk I/O wait.
Mem: 49374024k total, 32018844k used, 17355180k free, 115416k buffers
Swap: 16777208k total, 117612k used, 16659596k free, 5689020k cached
  PID USER  PR NI  VIRT  RES  SHR S  %CPU %MEM    TIME+ COMMAND
14165 mysql 20  0 8822m 3.1g 4672 S 162.3  6.6 89839:59 mysqld
40610 mysql 20  0 25.6g  14g 8336 S 121.7 31.5   280908 mysqld
49023 mysql 20  0 16.9g 5.1g 4772 S   4.6 10.8   349409 mysqld
If %us is too high, the likely causes are:
1: there is a lot of sorting work;
2: the SQL has no reasonable index to use. Check the slow query log, then analyze and optimize the SQL (a sketch of how to make sure the slow log is capturing these statements follows below).
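As a minimal sketch (my addition, not from the original write-up; the 1-second threshold is only illustrative and should be tuned for your workload), you can first confirm that the slow query log is actually capturing the expensive statements:

SET GLOBAL slow_query_log = ON;                 -- capture slow statements
SET GLOBAL long_query_time = 1;                 -- illustrative threshold, in seconds
SET GLOBAL log_queries_not_using_indexes = ON;  -- also log statements that use no index
SHOW VARIABLES LIKE 'slow_query_log_file';      -- where the slow log is being written

The resulting log can then be summarized with a tool such as mysqldumpslow or pt-query-digest to find the statements worth optimizing first.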
If %sy is too high, or disk I/O wait (%wa) is suspected, move down to the disk layer and check with iostat:
[root@HaoDai_App_DB01]# iostat -x -m 2
Linux 2.6.32-504.16.2.el6.x86_64 (HaoDai_App_DB01)  03/18/2016  _x86_64_  (40 CPU)
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
           4.29   0.00     0.20     0.05    0.00  95.46
Device:  rrqm/s   wrqm/s    r/s      w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00     2.97   0.04     0.48    0.00   0.01     72.81      0.00   0.49   0.28   0.01
sdb        0.00   143.43   0.00   131.67    0.00   1.04     16.10      0.01   0.10   0.07   0.91
avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
          12.54   0.00     0.38     0.03    0.00  87.06
Device:  rrqm/s   wrqm/s    r/s      w/s   rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  svctm  %util
sda        0.00     0.00   0.00
sdb        0.00   174.00   0.00   137.50    0.00   1.17     17.40      0.03   0.19   0.14   1.95
If throughput (rMB/s + wMB/s) is low but %util is high, random I/O is particularly heavy (you can use pt-ioprofile to locate the table causing the problem). Note that IOPS = r/s + w/s.
Then use iotop to confirm which processes consume the most disk I/O resources.
The common causes behind this kind of load are:
1. Too much data is read or written in one go, producing very large disk I/O read/write values, for example a single SQL statement that reads or updates tens of thousands of rows or more. It is best to find a way to reduce the amount of data read and written at one time.
2. The SQL query has no appropriate index to use for condition filtering, sorting (ORDER BY), grouping (GROUP BY), or data aggregation (MIN/MAX/COUNT/AVG, etc.). Add an index or rewrite the SQL (see the sketch after this list).
3. A large burst of requests arrives in an instant. This is generally fine as long as the peak can be absorbed; to be on the safe side, raise the server configuration appropriately, because if the peak cannot be handled an avalanche effect may occur.
4. The load increase is caused by scheduled tasks, such as data statistics/analysis and backups, which consume a lot of CPU, memory and disk I/O. It is best to run them on a dedicated slave server.
5. The server's power-saving policy lowers the CPU frequency when load is low and raises it automatically when load increases, but usually not quickly enough, so the CPU cannot deliver enough performance to absorb sudden requests.
6. A RAID card is usually equipped with a BBU (battery backup unit for the cache module). Early units generally used lithium batteries, which require periodic charge/discharge maintenance (every 90 days for DELL servers, 30 days for IBM); you can discharge them in advance of the next scheduled cycle. Most new-generation servers use capacitor-based units, so this problem no longer exists.
7. The filesystem is ext4 or even ext3 instead of xfs. Under heavy pressure this often drives %util to 100% while IOPS can no longer increase; switching to xfs usually brings a large improvement.
8. The kernel's I/O scheduler is cfq instead of deadline or noop. This can be adjusted directly online and usually brings a large improvement.
In short, if %us is too high, or %us and %iowait are both high, it is usually caused by poorly written SQL under concurrency.
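For cause 2, here is a hedged sketch (my addition, assuming MySQL 5.6+ with performance_schema enabled) that ranks statement digests which ran without using any index by the number of rows they examined:

SELECT DIGEST_TEXT, COUNT_STAR, SUM_ROWS_EXAMINED,
       SUM_NO_INDEX_USED, SUM_NO_GOOD_INDEX_USED
FROM performance_schema.events_statements_summary_by_digest
WHERE SUM_NO_INDEX_USED > 0 OR SUM_NO_GOOD_INDEX_USED > 0
ORDER BY SUM_ROWS_EXAMINED DESC
LIMIT 10;

Statements near the top of this list are usually the first candidates for a new index or a rewrite.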
Now let's apply this line of thinking to analyze why the load was high in our production environment (between 19:00 and 19:05):
SELECT COUNT_STAR, SUM_TIMER_WAIT, AVG_TIMER_WAIT, EVENT_NAME
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE COUNT_STAR > 0 AND EVENT_NAME LIKE 'wait/synch/%'
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;
+------------+------------------+----------------+---------------------------------------------+
| COUNT_STAR | SUM_TIMER_WAIT   | AVG_TIMER_WAIT | EVENT_NAME                                  |
+------------+------------------+----------------+---------------------------------------------+
|   36847781 | 1052968694795446 |       28575867 | wait/synch/mutex/innodb/lock_mutex          |
|       8096 |   81663413514785 |    10086883818 | wait/synch/cond/threadpool/timer_cond       |
|         19 |    3219754571347 |   169460766775 | wait/synch/cond/threadpool/worker_cond      |
|   12318491 |    1928008466219 |         156446 | wait/synch/mutex/innodb/trx_sys_mutex       |
|   36481800 |    1294486175099 |          35397 | wait/synch/mutex/innodb/trx_mutex           |
|   14792965 |     459532479943 |          31027 | wait/synch/mutex/innodb/os_mutex            |
|    2457971 |      62564589052 |          25346 | wait/synch/mutex/innodb/mutex_list_mutex    |
|    2457939 |      62188866940 |          24909 | wait/synch/mutex/innodb/rw_lock_list_mutex  |
|     201370 |      32882813144 |         163001 | wait/synch/rwlock/innodb/hash_table_locks   |
|       1555 |      15321632528 |        9853039 | wait/synch/mutex/innodb/dict_sys_mutex      |
+------------+------------------+----------------+---------------------------------------------+
10 rows in set (0.01 sec)
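As an aside (an assumption on my part, not part of the original diagnosis): these summary counters accumulate from server startup, so one way to scope the numbers to a window such as 19:00-19:05 is to reset the summary table just before the window and re-run the query at its end, assuming the wait/synch instruments and the relevant consumers are already enabled:

-- run at about 19:00 to zero the accumulated counters
TRUNCATE TABLE performance_schema.events_waits_summary_global_by_event_name;

-- run at about 19:05; the result then covers only the 5-minute window
SELECT COUNT_STAR, SUM_TIMER_WAIT, AVG_TIMER_WAIT, EVENT_NAME
FROM performance_schema.events_waits_summary_global_by_event_name
WHERE COUNT_STAR > 0 AND EVENT_NAME LIKE 'wait/synch/%'
ORDER BY SUM_TIMER_WAIT DESC LIMIT 10;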
From the wait-summary results above it can be confirmed that lock_mutex (corresponding to lock_sys->mutex in the MySQL source code) has the longest accumulated lock wait time (SUM_TIMER_WAIT). lock_sys represents the global InnoDB lock system; in the source code, whenever InnoDB acquires or releases a record lock (an X lock in this case) it has to maintain lock_sys, and so it requests lock_sys->mutex.
In this case, X locks were frequently acquired and released during the "Searching rows for update" phase, so lock_sys->mutex was requested very frequently. This produced a long total wait time on lock_sys->mutex and burned a lot of CPU while waiting for it.
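As a complementary check (my own addition, not part of the original write-up; it assumes MySQL 5.6/5.7, where these INFORMATION_SCHEMA tables exist, while MySQL 8.0 replaces them with performance_schema.data_locks and data_lock_waits), you can see which transactions are currently waiting for record locks and which transactions are blocking them:

SELECT r.trx_id              AS waiting_trx,
       r.trx_mysql_thread_id AS waiting_thread,
       r.trx_query           AS waiting_query,
       b.trx_id              AS blocking_trx,
       b.trx_mysql_thread_id AS blocking_thread,
       b.trx_query           AS blocking_query
FROM information_schema.innodb_lock_waits w
JOIN information_schema.innodb_trx r ON r.trx_id = w.requesting_trx_id
JOIN information_schema.innodb_trx b ON b.trx_id = w.blocking_trx_id;

Heavy record-lock activity here is consistent with the lock_sys->mutex contention above, keeping in mind that the mutex is hammered by frequent lock acquisition and release rather than by blocked transactions alone.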