Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Analysis of unexpected restart of production Database

2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article introduces the relevant knowledge of "the analysis of unexpected restart of production database". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

Investigation

Let's first explain the database version:

Mysql > status-mysql Ver 14.14 Distrib 5.7.22-22, for Linux (x86 / 64) using 6.2Connection id: 59568Current database:Current user: root@localhostSSL: Not in useCurrent pager: stdoutUsing outfile:''Using delimiter: Server version: 5.7.22-22-log Percona Server (GPL), Release 22, Revision f62d93cProtocol version: 10

Troubleshooting crashes is by no means an interesting task, especially if MySQL does not report the cause of the crash. For example, when MySQL runs out of memory.

Message from database mail alarm reminder:

Type: mysqlTags: production main library Host: 172.16.1.66:3306Level: criticalItem: connectValue: downMessage: mysql server down

Log in to the Grafana monitoring panel, during which time the database connection has increased significantly.

Casually check the previous server mail monitoring alarm record, the last point in time, the memory occupancy rate of 99%, indicating that the increase in database connections may be the last straw to crush the server.

In fact, the direct cause of OOM is not complicated, because the server does not have enough memory, and the kernel needs to recover memory. Reclaiming memory means kill the program that uses the most memory on the server, while the MySQL service may use the most memory, so it is OOM.

Type: osTags: 66 Database Host: 172.16.1.66:Level: criticalItem: memoryValue: 99%Message: too more memory usage View system Log

Let's take a look at the log with this question:

# check the log tail-500f / var/log/messages# the following is oom-killerNov 27 14:55:48 itstyledb1 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0Nov 27 14:55:48 itstyledb1 kernel: mysqld cpuset=/ mems_allowed=0-1Nov 27 14:55:48 itstyledb1 kernel: CPU: 2 PID: 895Comm: mysqld Kdump: loaded Not tainted 3.10.0-862.3.2.el7.x86_64 # 1Nov 27 14:55:48 itstyledb1 kernel: Hardware name: Huawei RH1288 V3/BC11HGSC0 BIOS 3.22 05/16/2016

Friends, continue to look down:

0 pages HighMem/MovableOnlyNov 27 14:55:48 itstyledb1 kernel: 291281 pages reservedNov 27 14:55:48 itstyledb1 kernel: [pid] uid tgid total_vm rss nr_ptes swapents oom_score_adj nameNov 27 14:55:48 itstyledb1 kernel: [468] 0 468 28271 4326 62 55 0 systemd-journalNov 27 14:55:48 itstyledb1 kernel: [490] 0 490 11492 2 24 553-1000 systemd-udevdNov 27 14:55:48 itstyledb1 kernel: 0787 13877 18 27 96-1000 auditdNov 27 14:55:48 itstyledb1 kernel: 81 810 14552 81 34 89-900 dbus-daemonNov 27 14:55:48 itstyledb1 kernel: 0815 55956 1 60 466 0 abrtdNov 27 14:55:48 itstyledb1 kernel: [816] 0 816 55327 9 64 346 0 abrt-watch-logNov 27 14:55:48 itstyledb1 kernel: [818] 0 818 121607 220 90 495 0 NetworkManagerNov 27 14:55:48 itstyledb1 kernel: [822] 0 822 5415 49 16 33 0 irqbalanceNov 27 14:55:48 itstyledb1 kernel: [823] 997 823 134634 97 60 1306 0 polkitdNov 27 14:55:48 itstyledb1 kernel: [825] 0 825 6594 42 20 41 0 systemd-logindNov 27 14:55:48 itstyledb1 kernel: [830] 0 830 31578 28 21 139 0 crondNov 27 14:55:48 itstyledb1 kernel: [839] 0 839 27522 2 10 31 0 agettyNov 27 14:55:48 itstyledb1 kernel: [1142] 0 1142 143454 114 97 2672 0 tunedNov 27 14:55:48 itstyledb1 kernel: [1144] 0 1144 28203 11 59 246 -1000 sshdNov 27 14:55:48 itstyledb1 kernel: [1145] 0 1145 97438 694 103 328 0 rsyslogdNov 27 14:55:48 itstyledb1 kernel: [1369] 0 1369 22526 20 44 256 0 masterNov 27 14:55:48 itstyledb1 kernel: [1371] 89 1371 22 596 32 46 251 0 qmgrNov 27 14:55:48 Itstyledb1 kernel: [5140] 0 5140 5102 1617 15 239 0 mysqld_exporterNov 27 14:55:48 itstyledb1 kernel: [9430] 0 9430 55966 378 62 790 0 snmpdNov 27 14:55:48 itstyledb1 kernel: [30320] 27 30320 22951376 13928375 43437 8163662 0 mysqldNov 27 14:55:48 itstyledb1 kernel: [688] 89 688 22552 271 4600 pickupNov 27 14:55:48 itstyledb1 kernel: Out of memory: Kill process 30320 (mysqld) score 984 or sacrifice childNov 27 14:55:48 itstyledb1 kernel: Killed process 30320 (mysqld) total-vm:91805504kB Anon-rss:55713500kB, file-rss:0kB, shmem-rss:0kBNov 27 14:56:00 itstyledb1 systemd: mysqld.service: main process exited, code=killed, status=9/KILLNov 27 14:56:00 itstyledb1 systemd: Unit mysqld.service entered failed state.Nov 27 14:56:00 itstyledb1 systemd: mysqld.service failed.Nov 27 14:56:00 itstyledb1 systemd: mysqld.service holdoff time over, scheduling restart.Nov 27 14:56:01 itstyledb1 systemd: Starting MySQL Server...

When outofmemory occurs, the outofmemory function selects a process that the kernel considers guilty of allocating too much memory and kills that process. Obviously Mysql is the "sinner".

MySql then restarts automatically. After the restart, the memory came down, but near the end of the day, it was almost full again.

[root@itstyledb1] # free-m total used free shared buff/cache availableMem: 55803 54976 241 10585 349Swap: 32064 25036 7028

Find the MySql process and execute the following top-p pid, using 52.4g memory

PID USER PR NI VIRT RES SHR S% CPU% MEM TIME+ COMMAND935 mysql 20 0 79.7g 52.4g 7336 S 0.396.1 255 MEM TIME+ COMMAND935 mysql 44.76 mysqld computing memory usage

1) check how much memory MySQL takes up globally

SELECT (@ @ innodb_buffer_pool_size+@@innodb_log_buffer_size+@@key_buffer_size) / 1024 / 1024 AS MEMORY_MB

The query results are as follows:

+-| MEMORY_MB | +-+ | 20512.00000000 | +-+

2) check how much memory is consumed by performance_schema

SELECT SUBSTRING_INDEX (event_name,'/',2) AS code_area, sys.format_bytes (SUM (current_alloc)) AS current_alloc FROM sys.x$memory_global_by_current_bytes GROUP BY SUBSTRING_INDEX (event_name,'/',2) ORDER BY SUM (current_alloc) DESC

The query results are as follows:

+-- +-+ | code_area | current_alloc | +-+-+ | memory/performance_schema | 349.80 MiB | +-- -+-+

3) check how much memory is consumed by each thread

SELECT ((@ @ read_buffer_size+ @ @ read_rnd_buffer_size+ @ @ sort_buffer_size+ @ @ join_buffer_size+ @ @ binlog_cache_size+ @ @ thread_stack+ @ @ max_allowed_packet+ @ @ net_buffer_length)) / (1024 / 1024) AS MEMORY_MB

The query results are as follows:

+-| MEMORY_MB | +-+ | 87.5156 | +-+

View current thread

Show full processlist

The end result is:

+-+ | MEMORY_MB | +-+ | 87.51563.37 | +-+

4) check how much memory is consumed by the memory storage engine

SELECT SUM (max_data_length) / 1024 Universe 1024 AS MEMORY_MB FROM information_schema.tables WHERE ENGINE='memory'

The query results are as follows:

+-| MEMORY_MB | +-+ | 3857.37713909 | +-+

The above four items add up to almost as much as 27975MB, which is not bad for 28G, but the MySql process shows that it takes up 52.4G, so where is the remaining 24.4G?

Thread pool

This thread pool is not the other connection pool. In fact, there is a big difference between the two. Connection pool is generally set on the client, while thread pool is configured on the DB server. In addition, connection pool can be used to avoid frequent connection creation and destruction, but cannot reach the goal of controlling the number of active threads of MySQL. In high concurrency scenarios, it cannot be used to protect DB. A better way is to combine connection pooling with thread pooling.

Some parameters about the thread pool:

Mysql > show variables like 'thread%' +-- +-- + | Variable_name | Value | + -+ | thread_handling | one-thread-per-connection | | thread_pool_high_prio_mode | transactions | | thread_pool_high_prio_tickets | 4294967295 | | thread_pool_idle_timeout | 60 | thread_pool_max_threads | 100000 | | thread_pool_oversubscribe | 3 | | thread_pool_size | 12 | | thread_pool_stall_limit | 500 | +-| -+ thread_handling:

This parameter is the configuration thread model, and the default is one-thread-per-connection, that is, thread pools are not enabled. Thread pooling is enabled by setting this parameter to pool-of-threads.

Threadpoolsize:

This parameter sets the number of Group for the thread pool, which defaults to the number of system CPU, making full use of CPU resources.

Threadpooloversubscribe:

This parameter sets the maximum number of threads in group, and the maximum number of threads per group is threadpooloversubscribe+1. Note that listener threads are not included.

Threadpoolhighpriomode:

The control parameters of the high priority queue have three values (transactions/statements/none). The default is transactions. The three values are as follows:

Transactions: put statements that have started transactions in a high priority queue, depending on the following threadpoolhighpriotickets parameter

Statements: in this mode, all statements are placed in the high priority queue, and the low priority queue is not used

None: this mode does not use high priority queues

Threadpoolhighpriotickets:

This parameter controls the maximum number of word orders per connection to be placed in the high priority queue. The default is 4294967295. Note that this parameter has an effect only when threadpoolhighpriomode is transactions.

Threadpoolidle_timeout:

The maximum idle time of the worker thread is 60 seconds by default. If the limit is exceeded, the thread will exit.

Threadpoolmax_threads:

This parameter is used to limit the maximum number of threads in the thread pool. After this limit, no more threads can be created. The default is 100000.

Threadpoolstall_limit:

This parameter sets the interval at which the timer thread detects whether group is abnormal, which defaults to 500ms.

The final configuration is as follows:

# the number of thread poolthread_handling=pool-of-threads#Group, which defaults to the number of system CPU. Make full use of the CPU resource thread_pool_size=24#, the maximum number of threads per group is thread_pool_oversubscribe+1thread_pool_oversubscribe=3performance_schema=off#extra connection, to prevent unable to log in when the thread pool is full. MySQLextra_max_connections = 8extra_port = 33333

Note: thread pooling is available in Percona,MariaDB,Oracle MySQL Enterprise Edition, but not in Oracle MySQL Community Edition.

The appearance of the thread pool does not directly lead to memory non-collection. It is said on the Internet that enabling Thread pool and PS at the same time will lead to memory leakage, but the current version of Percona server 5.7.21-20 + has fixed this problem, which obviously does not exist.

Slow query

Because of the production environment, this problem drags on for a long time, so will slow queries affect memory usage? With this problem, I looked at the slow query background list and found that there were a lot of slow query statements in the period before the database crashed. But this does not explain the problem to a certain extent, because the memory of the server's MySql service has bottomed out before it is killed, the number of connections at this time is not many, only about 30 or 40, most of them are dormant, and most of the Swap space has been occupied at this time. In other words, in the case of limited resources, there must be a lot of slow query statements.

This is the end of the analysis of unexpected restart of production database. Thank you for your reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report