Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

What is the cause of the error problem of iPot O in MySQL

2025-01-30 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

What is the cause of the error problem with Icano in MySQL? This problem may be often seen in our daily study or work. I hope you can gain a lot from this question. The following is the reference content that the editor brings to you, let's take a look at it!

Problem phenomenon

Recently, I used sysbench to test MySQL. Due to the long testing time, I wrote a script to run in the background in the order of prepare- > run- > cleanup. After running, check the log and find a problem. There are several errors similar to the following information in the error log of MySQL service:

[ERROR] InnoDB: Trying to do I/O to a tablespace which does not exist. I/O type: read, page: [page id: space=32, page number=57890], I/O length: 16384 bytes .

It looks like there is an error in iCandle O, but the MySQL process does not crash and the sysbench client does not report an error.

The process of finding problems

According to the time record of the error report and the comparison of the time points of each stage of the script output, it is determined that the command being executed by the script at that time is:

Sysbench-tables=100-table-size=4000000-threads=50-mysql-db=sbtest-time=300 oltp_delete cleanup

The use case is executed manually again, but the same situation does not occur again. But you can still find this error message with script execution. The initial suspicion is that the gap between run and cleanup should not be too long to trigger this problem. Because it takes a long time to execute 100G of data once, and the reproduction cost is high, try to reduce the amount of use case data first. Change the-table-size=4000000 to 2000000, execute the script at this time, and the problem will not be triggered. Finally, the-table-size=3000000 can be triggered stably and reduce part of the recurrence time. To confirm that the interval is too long to cause it to fail to reproduce, modifying the script to sleep for 10 seconds between run and cleanup does not trigger this error message. If you change it to sleep for 5 seconds, it can still trigger, but the number of errors has been reduced.

Problem investigation

Looking at the code for the corresponding version of mysql5.7.22, there is only one place for this error: in the fil_io () function on line 5578 of the fil0fil.cc file. Debug directly with gdb, add a breakpoint at this location, and execute a reproducible script to get the following stack:

(gdb) bt#0 fil_io (type=..., sync=sync@entry=false, page_id=..., page_size=..., byte_offset=byte_offset@entry=0, len=16384, buf=0x7f9ead544000, message=message@entry=0x7f9ea8ce9c78) at mysql-5.7.22/storage/innobase/fil/fil0fil.cc:5580#1 0x00000000010f99fa in buf_read_page_low (err=0x7f9ddaffc72c, sync=, type=0, mode=, page_id=..., page_size=... Unzip=true) at mysql-5.7.22/storage/innobase/buf/buf0rea.cc:195#2 0x00000000010fc5fa in buf_read_ibuf_merge_pages (sync=sync@entry=false, space_ids=space_ids@entry=0x7f9ddaffc7e0, page_nos=page_nos@entry=0x7f9ddaffc7a0, n_stored=2) at mysql-5.7.22/storage/innobase/buf/buf0rea.cc:834#3 0x0000000000f3a86c in ibuf_merge_pages (n_pages=n_pages@entry=0x7f9ddaffce30, sync=sync@entry=false) at mysql-5.7.22/storage/innobase/ibuf/ibuf0ibuf.cc:2552#4 0x0000000000f3a94a in ibuf_merge (sync=false Sync=false N_pages=0x7f9ddaffce30) at mysql-5.7.22/storage/innobase/ibuf/ibuf0ibuf.cc:2656#5 ibuf_merge_in_background (full=full@entry=false) at mysql-5.7.22/storage/innobase/ibuf/ibuf0ibuf.cc:2721#6 0x000000000102bcf4 in srv_master_do_active_tasks () at mysql-5.7.22/storage/innobase/srv/srv0srv.cc:2132#7 srv_master_thread (arg=) at mysql-5.7.22/storage/innobase/srv/srv0srv. Cc:2383#8 0x00007fa003eeddc5 in start_thread () from / lib64/libpthread.so.0#9 0x00007fa002aab74d in clone () from / lib64/libc.so.6

It is obvious that the background thread is doing the insert buffer merge operation. At this point, it is found that space- > stop_new_ops is true, which means that the space to which the page to be processed belongs is being deleted. Why would you manipulate the space that is being deleted? This requires an investigation of the insert buffer function, the process of insert buffer merge, and the process of deleting tables.

Background knowledge of insert buffer

Insert buffer is a special data structure (B + tree) that caches changes when the secondary index page is not in the buffer pool and later merges when the page is loaded into the buffer pool by other read operations. When MySQL first introduced this feature, it can only cache insert operations, so it is called insert buffer. Now these operations can be INSERT, UPDATE, or DELETE (DML), so it is called change buffer (this article is still described in insert buffer), but the source code is still identified by ibuf. This function caches several updates to the same page, merges them into an one-time update operation, reduces IO, and converts random IO into sequential IO, which can avoid the performance loss caused by random IO and improve the write performance of the database.

Related insert buffer merge logic

When buffer page is read into buffer pool, insert buffer merge occurs. There are several main scenarios where the merge process occurs:

When the page is read into the buffer pool, the merge of ibuf is performed after the reading is completed before the page is available

The merge operation is executed as a background task. Parameter innodb_io_capacity sets the upper limit of the number of pages per merge process for InnoDB background tasks.

During crash recovery, the insert buffer merge of the corresponding page is executed when the index page is read into the buffer pool

Insert buffer is persistent, and a system crash will not cause it to fail. After restart, the insert buffer merge operation will return to normal

You can use-innodb-fast-shutdown = 0 to force a full merge of ibuf when the server is shut down.

Our problem this time obviously belongs to the second situation. The innodb main thread (svr_master_thread) actively performs the merge operation of insert buffer every other second. First, determine whether there has been any activity on the server in the past 1 s (inserting tuples into the page, row operations on the undo table, etc.). If so, the maximum number of pages in merge is 5% of the number set by innodb_io_capacity. If not, the maximum number of pages for merge is the value set by innodb_io_capacity.

The main flow of the innodb main thread (svr_master_thread) merge is as follows:

The main thread reads the page number and space number from the leaf node of the ibuf tree and records it into a binary array (unlocked)

The main thread detects whether the space in the tuple is in the tablespace cache. If not, it has been deleted. Delete the record of the corresponding ibuf.

The main thread determines whether to asynchronously read an space being deleted, if so, reports an error, deletes the record of the corresponding ibuf, and goes to procedure 2 to continue the judgment of the next array element.

If everything is judged to be normal, the main thread issues an async io request, and the async reads the indexed page that needs to be merge

The async O handler thread, after receiving the completed merge O, performs the merge operation

When performing merge, call fil_space_acquire to augment space- > n_pending_ops. Avoid concurrency of delete operations

After execution, call fil_space_release to subtract space- > n_pending_ops.

Related logic for deleting tables

Lock fil_system- > mutex, set sp- > stop_new_ops = true, mark space is being deleted, no new operations are allowed, and then unlock fil_system- > mutex

Lock fil_system- > mutex, detect space- > n_pending_ops, and unlock fil_system- > mutex. If it is detected that it is greater than 0, it means that the dependent operation has not been completed. Sleep 20ms and try again.

Lock fil_system- > mutex, detect space- > n_pending_flushes and (* node)-> n_pending, and unlock fil_system- > mutex. If it is detected that it is greater than 0, it means that there is still a dependent I 20ms O that is not completed. Try again after sleeping.

At this point, it is considered that there is no conflict, brush out all dirty pages or delete all pages in a given table space.

Deletes the record of the specified space from the tablespace cache

Delete the corresponding data file.

Problem conclusion

It is clear that there is no lock guarantee between the process of the main thread acquiring ibuf (space,page) and the process of performing the delete operation, and only the merge operation and the delete operation after the completion of the merge operation are mutually exclusive. This error message will occur if the background thread starts ibuf merge and has already performed the detection in step 2, but has not yet performed the detection in step 3, and the user thread starts to delete the table and sets the stop_new_ops flag but has not yet executed step 5 to delete the tablespace cache. The interaction between the two threads is shown in the following figure:

If nothing happens, there must be a thread performing the deletion of the corresponding table when the breakpoint is hit. Sure enough, we can find the following stack:

Thread 118 (Thread 0x7f9de0111700 (LWP 5234)): # 0 0x00007fa003ef1e8e in pthread_cond_broadcast@@GLIBC_2.3.2 () from / lib64/libpthread.so.0#1 0x0000000000f82f41 in broadcast (this=0xd452ef8) at mysql-5.7.22/storage/innobase/os/os0event.cc:184#2 set (this=0xd452ef8) at mysql-5.7.22/storage/innobase/os/os0event.cc:75#3 os_event_set (event=0xd452ef8) at mysql-5.7.22/storage/innobase/os/os0event .cc: 483-4 0x00000000010ec8a4 in signal (this=) at mysql-5.7.22/storage/innobase/include/ut0mutex.ic:105#5 exit (this=) at mysql-5.7.22/storage/innobase/include/ib0mutex.h:690#6 exit (this=) at mysql-5.7.22/storage/innobase/include/ib0mutex.h:961#7 buf_flush_yield (bpage= Buf_pool=) at mysql-5.7.22/storage/innobase/buf/buf0lru.cc:405#8 buf_flush_try_yield (processed=, bpage=, buf_pool=) at mysql-5.7.22/storage/innobase/buf/buf0lru.cc:449#9 buf_flush_or_remove_pages (trx=, flush=, observer=, id=, buf_pool=) at mysql-5.7.22/storage/innobase/buf/buf0lru.cc:632#10 buf_flush_dirty_pages (buf_pool=, id=, observer=, flush= Trx=) at mysql-5.7.22/storage/innobase/buf/buf0lru.cc:693#11 0x00000000010f6de7 in buf_LRU_remove_pages (trx=0x0, buf_remove=BUF_REMOVE_FLUSH_NO_WRITE, id=55, buf_pool=0x31e55e8) at mysql-5.7.22/storage/innobase/buf/buf0lru.cc:893#12 buf_LRU_flush_or_remove_pages (id=id@entry=55, buf_remove=buf_remove@entry=BUF_REMOVE_FLUSH_NO_WRITE) Trx=trx@entry=0x0) at mysql-5.7.22/storage/innobase/buf/buf0lru.cc:951#13 0x000000000114e488 in fil_delete_tablespace (id=id@entry=55, buf_remove=buf_remove@entry=BUF_REMOVE_FLUSH_NO_WRITE) at mysql-5.7.22/storage/innobase/fil/fil0fil.cc:2800#14 0x0000000000fe77bd in row_drop_single_table_tablespace (trx=0x0, is_encrypted=false, is_temp=false, filepath=0x7f9d7c209f38 ". / sbtest/sbtest25.ibd", tablename=0x7f9d7c209dc8 "sbtest/sbtest25" Space_id=55) at mysql-5.7.22/storage/innobase/row/row0mysql.cc:4189#15 row_drop_table_for_mysql (name=name@entry=0x7f9de010e020 "sbtest/sbtest25", trx=trx@entry=0x7f9ff9515750, drop_db=, nonatomic=, nonatomic@entry=true, handler=handler@entry=0x0) at mysql-5.7.22/storage/innobase/row/row0mysql.cc:4741#16 0x0000000000f092f3 in ha_innobase::delete_table (this= Name=0x7f9de010f5e0 ". / sbtest/sbtest25") at mysql-5.7.22/storage/innobase/handler/ha_innodb.cc:12539#17 0x0000000000801a30 in ha_delete_table (thd=thd@entry=0x7f9d7c1f6910, table_type=table_type@entry=0x2ebd100, path=path@entry=0x7f9de010f5e0 ". / sbtest/sbtest25", db=db@entry=0x7f9d7c00e560 "sbtest", alias=0x7f9d7c00df98 "sbtest25", generate_warning=generate_warning@entry=true) at mysql-5.7.22/sql/handler.cc:2586#18 0x0000000000d0a6af in mysql_rm_table_no_locks (thd=thd@entry=0x7f9d7c1f6910, tables=tables@entry=0x7f9d7c00dfe0, if_exists=true, drop_temporary=false Drop_view=drop_view@entry=false, dont_log_query=dont_log_query@entry=false) at mysql-5.7.22/sql/sql_table.cc:2546#19 0x0000000000d0ba58 in mysql_rm_table (thd=thd@entry=0x7f9d7c1f6910, tables=tables@entry=0x7f9d7c00dfe0, if_exists=, drop_temporary=) at mysql-5.7.22/sql/sql_table.cc:2196#20 0x0000000000c9d90b in mysql_execute_command (thd=thd@entry=0x7f9d7c1f6910 First_level=first_level@entry=true) at mysql-5.7.22/sql/sql_parse.cc:3589#21 0x0000000000ca1edd in mysql_parse (thd=thd@entry=0x7f9d7c1f6910, parser_state=parser_state@entry=0x7f9de01107a0) at mysql-5.7.22/sql/sql_parse.cc:5582#22 0x0000000000ca2a20 in dispatch_command (thd=thd@entry=0x7f9d7c1f6910, com_data=com_data@entry=0x7f9de0110e00) Command=COM_QUERY) at mysql-5.7.22/sql/sql_parse.cc:1458#23 0x0000000000ca4377 in do_command (thd=thd@entry=0x7f9d7c1f6910) at mysql-5.7.22/sql/sql_parse.cc:999#24 0x0000000000d5ed00 in handle_connection (arg=arg@entry=0x10b8e910) at mysql-5.7.22/sql/conn_handler/connection_handler_per_thread.cc:300#25 0x0000000001223d74 in pfs_spawn_thread (arg=0x10c48f40) at mysql-5.7.22/storage/perfschema/pfs.cc:2190 # 26 0x00007fa003eeddc5 in start_thread () from / lib64/libpthread.so.0#27 0x00007fa002aab74d in clone () from / lib64/libc.so.6 solution

Add a parameter ignore_missing_space for buf_read_ibuf_merge_pages, buf_read_page_low and fil_io. This means that the space being deleted is ignored. It defaults to false and is set to true when ibuf_merge_pages is called. Determine whether the parameter is true where the fil_io error is reported. If so, do not report the error, and continue other processes.

Or pass in the IORequest::IGNORE_MISSING parameter directly when buf_read_ibuf_merge_pages calls buf_read_page_low.

For more information, please see MariaDB commit:8edbb1117a9e1fd81fbd08b8f1d06c72efe38f44.

Affect the version

Take a look at the relevant information, this problem was introduced when you modified Bug# 19710564 to delete the tablespace version.

MySQL Community Server 5.7.6 was introduced, version 5.7.22 has not been fixed, version 8.0.0 has been fixed.

MariaDB Server 10.2 was affected. MariaDB Server 10.2.9, 10.3.2 fixed

Optimization suggestion

Can optimize the performance: record the error space id in buf_read_ibuf_merge_pages, determine the space id of the next page during the cycle, and delete the record of the corresponding ibuf directly if the space id is the same (the maximum space id record currently allocated is in the system tablespace, space id accounts for 4 bytes, which is lower than 0xFFFFFFF0UL, the value saved in the system tablespace is read during allocation, and then add one, which is unique).

End: I will stop here for the knowledge points, writing a little fast, there may be deficiencies, but also hope to exchange a lot of correction, hoping to help you all.

Thank you for reading! After reading the above, do you have a general idea of the cause of the error in MySQL? I hope the content of the article will be helpful to all of you. If you want to know more about the relevant articles, you are welcome to follow the industry information channel.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report