What should I do if a SQL in MySQL inflates the disk and causes MySQL Crash 07/06 Update SLTechnology News&Howtos

What should I do if a SQL in MySQL inflates the disk and causes MySQL Crash

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

Today, I will talk to you about what to do when a SQL in MySQL causes the disk to soar and lead to MySQL Crash. Many people may not know much about it. In order to make you understand better, the editor has summarized the following content for you. I hope you can get something according to this article.

When you receive a disk alarm for a MySQL instance and see the monitoring chart, the disk utilization is growing very fast, and it has increased by more than 170G in more than two hours.

Check that the binlog directory is not large, but datadir is more than 180g, and it is found that ibtmp1 is very large and continues to grow.

Quickly go up to see processlist, there are not many threads, check that there is a SELECT case. When... The thread, whose state is sending data, has lasted for thousands of seconds.

A little confused, there is no first time kill to drop this thread, when show processlist again. I found that it was probably dead.

Mysql > show processlist

ERROR 2006 (HY000): MySQL server has gone away

No connection. Trying to reconnect...

ERROR 2002 (HY000): Can't connect to local MySQL server through socket'$datadir/mysqld.sock' (2)

ERROR: Can't connect to the server

Well, now that it's dead, let's start looking for a reason.

Take a look at the disk mounted by datadir, it has been released empty, and sure enough it has been rebooted.

Ext4 197G 4.6G 191G 3% $datadir

Check the mysqld process and find that it has been pulled up by mysqld_safe.

Check the error log:

The scarlet letter is information about mysqld crash, and it is clear that the disk mounted by datadir is full.

There is no extra space to write binlog and ibtmp1 temporary tablespace files (5.7new).

And the ibtmp1 file finally reached 201876045824bytes, nearly 190G, while the total size of the mounted disk was less than 200G.

The blue word is about sql thread, and after mysqld comes up, sql thread also has a problem.

Fix this later.

InnoDB: posix_fallocate (): Failed to preallocate data for file $datadir/ibtmp1, desired size 67108864 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/ operating-system-error-codes.html

[ERROR] Disk is full writing'$datadir/mysql-bin.000004' (Errcode: 15870576-No space left on device). Waiting for someone to free space...

[ERROR] Retry in 60 secs. Message reprinted in 600 secs

[Warning] InnoDB: 1048576 bytes should have been written. Only 647168 bytes written. Retrying for the remaining bytes.

[Warning] InnoDB: Retry attempts for writing partial data failed.

[ERROR] InnoDB: Write to file $datadir/ibtmp1 failed at offset 201911697408, 1048576 bytes should have been written, only 647168 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.

[ERROR] InnoDB: Error number 28 means'No space left on device'

[Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html

[Warning] InnoDB: Error while writing 67108864 zeroes to $datadir/ibtmp1 starting at offset 201876045824

[ERROR] $basedir/bin/mysqld: The table'$tmpdir/#sql_37c5_0' is full

InnoDB: posix_fallocate (): Failed to preallocate data for file. / thread_quartz/QRTZ_FIRED_TRIGGERS.ibd, desired size 32768 bytes. Operating system error number 28. Check that the disk is not full or a disk quota exceeded. Make sure the file system supports this function. Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/ operating-system-error-codes.html

[Warning] InnoDB: Retry attempts for writing partial data failed.

[Warning] InnoDB: Error while writing 32768 zeroes to. / thread_quartz/QRTZ_FIRED_TRIGGERS.ibd starting at offset 442362017-07-06T11:49:21.893377Z mysqld_safe Number of processes running now: 0

Mysqld_safe mysqld restarted

[Note] InnoDB: Last MySQL binlog file position 0 690908428, file name mysql-bin.000004

[Note] InnoDB: Starting in background the rollback of uncommitted transactions

[Note] InnoDB: Rollback of non-prepared transactions completed

[Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"

[Note] InnoDB: Creating shared tablespace for temporary tables

[Note] InnoDB: Setting file'$datadir/ibtmp1' size to 12 MB. Physically writing the file full; Please wait...

[Note] InnoDB: File'$datadir/ibtmp1' size is now 12 MB.

[Note] InnoDB: 96 redo rollback segment (s) found. 96 redo rollback segment (s) are active.

InnoDB: 32 non-redo rollback segment (s) are active.

[Note] InnoDB: Waiting for purge to start

[Note] InnoDB: 5.7.12 started; log sequence number 4828513952

[Note] InnoDB: page_cleaner: 1000ms intended loop took 7748ms. The settings might not be optimal. Flushed=0 and evicted=0, during the time.

[Note] InnoDB: Loading buffer pool (s) from $datadir/ib_buffer_pool

[Note] Plugin 'FEDERATED' is disabled.

InnoDB: Buffer pool (s) load completed at 170706 19:49:30

[Note] Recovering after a crash using $basedir/mysql-bin

[ERROR] Error in Log_event::read_log_event (): 'read error', data_len: 579, event_type: 2

[Note] Starting crash recovery...

[Note] InnoDB: Starting recovery for XA transactions...

[Note] InnoDB: Transaction 6729603 in prepared state after recovery

[Note] InnoDB: Transaction contains changes to 1 rows

[Note] InnoDB: 1 transactions in prepared state after recovery

[Note] Found 1 prepared transaction (s) in InnoDB

[Note] Crash recovery finished.

[Note] Crashed binlog file $basedir/mysql-bin.000004 size is 690909184, but recovered up to 690908428. Binlog trimmed to 690908428 bytes.

[Warning] Failed to set up SSL because of the following SSL library error: SSL context is not usable without certificate and private key

[ERROR] Error in Log_event::read_log_event (): 'read error', data_len: 835, event_type: 2

[Warning] Error reading GTIDs from relaylog:-1

[Note] Slave in log O thread: Start asynchronous replication to master'*'in log 'mysql-bin.000014' at position 286486095

[Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.

[Note] Slave thread for channel O thread for channel': connected to master'*', replication started in log 'mysql-bin.000014' at position 286486095

[Warning] Slave SQL for channel'': If a crash happens this configuration does not guarantee that the relay log info will be consistent, Error_code: 0

[Note] Slave SQL thread for channel''initialized, starting replication in log' mysql-bin.000014' at position 286485153, relay log'$datadir/mysql-relay.000013' position: 286485326

[ERROR] Error in Log_event::read_log_event (): 'read error', data_len: 835, event_type: 2

[ERROR] Error reading relay log event for channel'': slave SQL thread aborted because of I take O error

[ERROR] Slave SQL for channel'': Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running' mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave. Error_code: 1594

[ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000014' position 286485153

[Note] Event Scheduler: Loaded 0 events

[Note] $basedir/mysqld: ready for connections.

Version: '5.7.12' socket:'$datadir/mysqld.sock' port: 3306 Source distribution

Because it is dead, and there is no time to fully record the sql executed at that time, I can only see a small part in front of the show processlist.

When it comes to business, the business says that this is a slave, unread business, but occasionally performs report-related SELECT operations manually.

Then check the slow log for two hours before crash:

It is found that there are several SQL messages repeatedly:

...

00VOU 11.392881 select * from tb...

04.779748 select (case when …

04.779748 select (case when t2.col1 …

03.328248 select (case when t2.col1 …

04.276773 select count (t1.id) from tb1

05.039027 select (case when t2.col …

0000VRO 10.263063 select (case when t2.col …

03.131713 select t2.* from tb1 T1 …

00VOUR 15.909456 select t2.* from tb1 T1...

0000Rose 14.367047 select * from tb …

...

Although no one is longer than 20 seconds, don't forget that a second before the disk is blown out, there is a SQL that has been executed for thousands of seconds.

And case when has only one order by.

To take out one of them, check the execution plan:

+- -+

+- -+

+- -+

It doesn't look very good, but it doesn't use a 190g temporary watch.

Although the temporary watch has been accumulated for a long time, looking at the monitoring, disk utilization is indeed accumulated rapidly in this more than an hour.

First of all, this slave library has no business, but through the customer side may be manually connected to do some query services.

After knowing this information, continue to compare with slow log, and find that this sql of slow log about case when is not exactly the same.

This may mean that the customer is still debugging the SQL.

And the case when, which has been executed for thousands of seconds, may be poorly written.

Unfortunately, at the last moment of the mysqld crash, the sql is still not completed, resulting in not being logged to the slow log.

That sql may never be able to reappear.

So how bad SQL, or how big a table, can use so many disk-based temporary tables or file sorting?

Check the database table quickly:

Table 1 ① is 100w, Table 2 is 40w, and there are no fields such as blob,text.

The datadir after ② restart has only 3 or 4 gigabytes of data.

So I have a big idea:

It is possible that the customer generated a SQL like this during debugging:

I compared several sql that have been recorded in slow log, desensitized, formatted and restored as follows:

SELECT (CASE

WHEN t2.coloring infotainment 000' THEN'$a'

WHEN t2.coloring infotainment 001' THEN'$b'

WHEN t2. Collocation infotainment 002 'THEN' $c'

WHEN t2.coloring infotainment 003 'THEN' $d'

WHEN t2.coloring infotainment 004' THEN'$e'

WHEN t2.coloring infotainment 005' THEN'$f'

WHEN t2. Collocation infotainment 006 'THEN' $g'

ELSE t2.col

END) AS source XX

(CASE

WHEN t2.clo2cm infographic 000' THEN'$a'

WHEN t2.clo2cm infotainment 001' THEN'$b'

WHEN t2. Clo2The infographic 002 'THEN' $c'

WHEN t2. Clo2The infographic 003 'THEN' $d'

WHEN t2.clo2cm infographic 004' THEN'$e'

WHEN t2.clo2percent infographic 005' THEN'$f'

WHEN t2.clo2cm infographic 006' THEN'$g'

ELSE t2.col

END) AS target XX

T2.col4 AS XX time

T2.col5 AS amount

FROM $tb1 T1 JOIN $tb2 T2

WHERE t1.col3 = 4 AND (t2.col LIKE'$info%' OR t2.clo2 LIKE'$info%')

ORDER BY t1.col4 DESC

Yes, these two tables make a JOIN, but there are no conditions.

It produces a very large Cartesian product.

Then do the math:

Column: 13 + 19 = 32

Line: 1088181 / 440650 = 479506957650 rows

Test it again:

Dump the two tables into sql files and take a look at the size: 330MB, not big.

Import your own experimental environment, and then run the above SQL that produces Cartesian product.

After a period of waiting:

It has been carried out for more than half an hour. And the SQL is still executing:

You may not believe it when you say it, but the size of the ibtmp1 at this time:

# du-sh *

4.0K auto.cnf

4.0K ib_buffer_pool

2.9G ibdata1

48M ib_logfile0

48M ib_logfile1

54G ibtmp1

12M mysql

1.1M performance_schema

676K sys

633M test

47M test_2

It's probably easy to explain why two not-so-large tables produce such large temporary tables after some queries are executed.

Of course, during the waiting time for the experiment, the copied ERROR has been fixed.

The copied error was reported at that time:

……

Slave_IO_Running: Yes

Slave_SQL_Running: No

……

Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

Last_IO_Errno: 0

Last_IO_Error:

Last_SQL_Errno: 1594

Last_SQL_Error: Relay log read failure: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted (you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running' mysqlbinlog' on the relay log), a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to know their names by issuing 'SHOW SLAVE STATUS' on this slave.

Combined with the error log posted earlier, you can see more information, it is inferred that the relay log is damaged.

Then it will be easy to solve:

STOP SLAVE

-- master_log_file and master_log_pos are the locations recorded when the error was reported.

CHANGE MASTER master_log_file='xxx', master_log_pos=xxx

START SLAVE

I will not repeat it here.

So far, the datadir of my experimental environment is full, but it is different from the formal environment here.

When the disk is full, it does not immediately cause mysql crash, but outputs:

[ERROR] InnoDB: Write to file. / ibtmp1failed at offset 83579895808, 1048576 bytes should have been written, only 958464 were written. Operating system error number 28. Check that your OS and file system support files of this size. Check also that the disk is not full or a disk quota exceeded.

[ERROR] InnoDB: Error number 28 means'No space left on device'

[Note] InnoDB: Some operating system error numbers are described at http://dev.mysql.com/doc/refman/5.7/en/operating-system-error-codes.html

[Warning] InnoDB: Error while writing 67108864 zeroes to. / ibtmp1 starting at offset 83563118592

[ERROR] / data/mysql-base/mysql57/bin/mysqld: The table'# sql_b26_0' is full

[ERROR] Disk is full writing'/ data/mysql-data/mysql57-3357 ax binlog bind MySQL Lue bin.000015'(Errcode: 15868064-No space left on device). Waiting for someone to free space...

[ERROR] Retry in 60 secs. Message reprinted in 600 secs

In fact, under normal circumstances, mysql will check every 1 minute when the disk is full and output the check information to the error log every 10 minutes.

Since mysqld was not killed, I executed that terrible sql again.

Of course. Error log outputs the message of being kill as desired.

And the mysqld process is killed, at this point:

[ERROR] / data/mysql-base/mysql57/bin/mysqld: Binary logging not possible. Message: An error occurred during flush stage of the commit. 'binlog_error_action' is set to 'ABORT_SERVER'. Hence aborting the server.

14:42:48 UTC-mysqld got signal 6

This could be because you hit a bug. It is also possible that this binary

Or one of the libraries it was linked against is corrupt, improperly built

Or misconfigured. This error can also be caused by malfunctioning hardware.

Attempting to collect some information that could help diagnose the problem.

As this is a crash and something is definitely wrong, the informati

Interestingly, the word "information" has not been printed out, so it is probably impossible to write a single character.

An error was also reported when trying to add a few characters to the error log with vim and save it.

To sum up:

For sql executed in online environment, it is recommended to audit, report and other OLAP requirements also need to be more formal.

0 consider whether the size of ibtmp1 can be limited, that is, the maximum value of innodb_temp_data_file_path can be set.

0 may need to restart mysqld periodically to shrink the temporary tablespace.

0 do a good job of monitoring and timely response, this time the response time is too long, reached the threshold, and then to the machine to check, it is already late, helplessly saw mysqld in front of my eyes crash.

After reading the above, do you have any further understanding of what to do about a SQL in MySQL that causes the disk to soar and causes MySQL Crash? If you want to know more knowledge or related content, please follow the industry information channel, thank you for your support.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.