Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

Example Analysis of MySQL instance crash

2025-04-05 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article mainly introduces the example analysis of MySQL example crash, which is very detailed and has certain reference value. Friends who are interested must finish it!

[problem description]

In our production environment, there are multiple MySQL servers in a cluster (MySQL 5.6.21), which are crash from time to time, but only restart information is recorded in error log, and the stack when crash is not recorded:

Mysqld_safe Number of processes running now: 0mysqld_safe mysqld restarted

Next, first check the Syslog / var/log/message file. There is no other abnormal information during crash, and it is not caused by OOM.

[ideas for investigation]

Because there is no valuable information recorded in the log. In order to locate the reason for crash, first enable the function of mysql core dump.

Here are the steps to turn on core dump:

1. Add 2 configuration items to the my.cnf file

[mysqld] core_ file[mysqld _ safe] core-file-size=unlimited

2. Modify system parameters and configure suid_dumpable

Echo 1 > / proc/sys/fs/suid_dumpable

3. Restart the mysql service, and the configuration takes effect.

[problem Analysis]

When core dump is enabled, core file is generated when the server crash again.

Using gdb to analyze the generated core file, you can see that the stack information of crash is as follows:

From the function table_esms_by_digest::delete_all_rows, you can see that it is the truncate table events_statements_summary_by_digest operation that triggers crash.

We have an internal DML analysis tool, which is used to count the number of visits to the database every minute. The data source of the tool is the events_statements_summary_by_digest table, and the acquisition program will collect the data of the table every minute, and perform truncate operation after the collection is completed.

There is no more crash in MySQL after pausing the DML collection program on this group of clusters.

Further analysis of multiple core file shows that the final function calls all take place on the _ lf_pinbox_real_free function.

Combined with the on-site environment, there are two areas worth analyzing:

1. Abnormal value of memory. When printing this variable, the address of the variable here is low, which is not normal:

(gdb) p pins- > pinbox$2 = (LF_PINBOX *) 0x1367208

2. The scarlet letter is used by pfs to release digest records one by one. An error occurred while releasing a row of data:

Void reset_esms_by_digest () {uint index;if (statements_digest_stat_array = = NULL) return;PFS_thread * thread= PFS_thread::get_current_thread (); if (unlikely (thread= = NULL)) return;for (index= 0; index

< digest_max; index++){statements_digest_stat_array[index].reset_index(thread);statements_digest_stat_array[index].reset_data();}digest_index= 1;} 猜测有两种可能导致错误: 1、高并发下,对内存访问出现冲突; 2、某个特殊SQL导致,在处理hash时。 在网上搜索类似的问题,有了进一步的进展,基本确定了这个问题是bug导致 如下Mysql的bug report中讲述了类似问题 https://bugs.mysql.com/bug.php?id=73979 更详细的环境描述如下连接中 https://bugs.launchpad.net/percona-server/+bug/1351148 查到5.6.35上的bug fix的修复内容,和我们碰到的情况非常类似。 对比_lf_pinbox_real_free的修改,该部分确实进行很大的调整。 下面是MySQL 5.6.35函数_lf_pinbox_real_free的代码片段: static void _lf_pinbox_real_free(LF_PINS pins){LF_PINBOX pinbox= pins->

Pinbox;struct st_match_and_save_arg arg = {pins, pinbox, pins- > purgatory}; pins- > purgatory= NULL;pins- > purgatory_count= 0; & pinbox- > pinarray, (lf_dynarray_func) match_and_save, & arg); if (arg.old_purgatory) {void * last= arg.old_purgatory;while (pnext_node (pinbox, last) last= pnext_node (pinbox, last); pinbox- > free_func (arg.old_purgatory, last, pinbox- > free_func_arg);}}

The following is a code snippet of the _ lf_pinbox_real_free of the MySQL 5.6.21 function

Static void _ lf_pinbox_real_free (LF_PINS pins) {int npins;void list;void * * addr= NULL;void first= NULL, last= NULL;LF_PINBOX pinbox= pins- > pinbox;npins= pinbox- > pins_in_array+1;if (pins- > stack_ends_here! = NULL) {int alloca_size= sizeof (void) LF_PINBOX_PINSnpins;if (available_stack_size (& pinbox, * pins- > stack_ends_here) > alloca_size) {struct st_harvester hv;addr= (void * *) alloca (alloca_size); hv.granary= addr Hv.npins= npins;_lf_dynarray_iterate (& pinbox- > pinarray, (lf_dynarray_func) harvest_pins, & hv); npins= hv.granary-addr;if (npins) qsort (addr, npins, sizeof (void *), (qsort_cmp) ptr_cmp);}}

At the same time, it is observed that the index of the cluster with the problem is abnormal, and the QPS is less than 6000 and Threadsconnected is nearly 8000. Compared with other clusters with high concurrency, the QPS is above 20000 and the Threads_connected is only about 300.

Check the connection mode of the application side, and know that one of the applications has nearly 100 application servers, which may initiate requests at the same time, but there is no reasonable reuse connection. Maintaining a large number of connection threads increases the probability of bug trigger.

Bugs Fixed is described as follows:

Miscalculation of memory requirements for qsort operations could result in stack overflow errors in situations with a large number of concurrent server connections. (Bug # 73979, Bug # 19678930, Bug # 23224078)

[solution]

We analyze the core file file of crash, find the trigger condition of crash, pause the DML acquisition program (truncate table events_statements_summary_by_digest operation) and resume it.

It was later learned that this is a bug of MySQL, which has been fixed since version 5.6.35 of MySQL. This bug is easier to trigger when the application side establishes a large number of connections to the database.

The above is all the content of the article "sample Analysis of MySQL instance crash". Thank you for reading! Hope to share the content to help you, more related knowledge, welcome to follow the industry information channel!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report