A case of show engine innodb status trouble caused by a large transaction in MySQL 5.6

Today I ran into a production problem from a friend. Roughly: there is a large online transaction of about 100G that is being rolled back; the rollback is very slow and it is visibly affecting the online business. Can the impact on the business be reduced? The friend had also disabled the "double 1" settings, but nothing changed. The version is MySQL 5.6.

You are welcome to check out my column "In-depth Understanding of MySQL Master-Slave Principles: 32 Lectures."

The first thing to know is that MySQL is not well suited to large transactions. Here are some of the problems large transactions cause in MySQL:

- The binlog for the transaction is written out in one shot at commit, consuming a great deal of IO in the sync phase; this can hang the whole instance on the master, with most sessions stuck in the "query end" state.
- Large transactions cause master-slave (replication) delay.
- Large transactions may cause some backups to hang, because FLUSH TABLES WITH READ LOCK cannot acquire the MDL GLOBAL level lock and waits on the global read lock.
- Large transactions enlarge the range of InnoDB row locks held, leading to row lock wait problems.
- Rollback is slow and painful.

The list above is far from complete, but it is enough to show that we should avoid large transactions in production whenever possible. Now let's get to the problem.

I. The problem

As mentioned above, the friend had already disabled the "double 1" settings, namely sync_binlog=1 and innodb_flush_log_at_trx_commit=1. In production both parameters should be kept at 1: the former guarantees the durability of the binlog and the latter that of the redo log, and both play a key role in crash recovery; running without double 1 risks data loss. I won't go into the parameters further here. The point is that even with double 1 disabled nothing changed, so IO does not seem to be the main bottleneck. Here are a few screenshots:

vmstat screenshot

iostat screenshot

top -Hu screenshot

Focusing on the r and b columns of vmstat, we find nothing wrong with the IO queues, and wa% is not high. Looking at %util and the read/write volumes in iostat, the tps is nowhere near the limit of the SSD. What stands out is that %us is not small, and some threads are saturating an entire CPU core (99.4% CPU).

So we shifted direction to the CPU bottleneck, hoping it would lead us to the answer. From the perf top output the friend provided, we found the following:

The problem narrowed down to the function lock_number_of_rows_locked.

II. lock_number_of_rows_locked

My friend runs 5.6, but I will describe the code as of 5.7.26 here; the next section covers the key difference between the 5.6 and 5.7 algorithms.

I wonder whether you have noticed this item in show engine innodb status output:
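As an illustration (all numbers invented), a transaction in the TRANSACTIONS section prints a line of this kind; the "row lock(s)" figure is the one in question:

4 lock struct(s), heap size 1136, 1234 row lock(s), undo log entries 1234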

This figure comes from the function lock_number_of_rows_locked and means the number of rows locked by the current transaction. The function is called under lock_print_info_all_transactions, which prints the core information we usually read in the transaction section of show engine innodb status. Let's look at the simplified flow:

PrintNotStarted print_not_started(file); // functor used to print not-started transactions
ut_list_map(trx_sys->mysql_trx_list, print_not_started);
// prints transactions whose state is "not started";
// mysql_trx_list contains all transactions

const trx_t* trx;
TrxListIterator trx_iter; // iterator over the trx_sys->rw_trx_list list
const trx_t* prev_trx = 0;

/* Control whether a block should be fetched from the buffer pool. */
bool load_block = true;
bool monitor = srv_print_innodb_lock_monitor && (srv_show_locks_held != 0);

while ((trx = trx_iter.current()) != 0) {
    // iterate the read-write transactions; read-only transactions
    // will never appear here
    ...
    /* If we need to print the locked record contents then we
    need to fetch the containing block from the buffer pool. */
    if (monitor) {
        /* Print the locks owned by the current transaction. */
        TrxLockIterator& lock_iter = trx_iter.lock_iter();

        if (!lock_trx_print_locks( // print the lock details
                file, trx, lock_iter, load_block))

Simply put, it first prints the transactions that have not started, then prints the information of the read-write transactions, which of course includes our rolling-back transaction. Note that show engine does not print read-only transactions.

For a transaction in the rollback state, we can observe the following information in show engine:
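As an illustration (values invented, format from memory of the trx_print output), such a transaction prints roughly:

---TRANSACTION 123456789, ACTIVE 3605 sec rollback
ROLLING BACK 99999 lock struct(s), heap size 11376, 9999999 row lock(s), undo log entries 9999999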

Most of these fields are produced by trx_print_low, so I won't explain them in detail here. What we need to understand for this case is how lock_number_of_rows_locked is calculated, discussed below.

III. Algorithm changes of the lock_number_of_rows_locked function

The lock_number_of_rows_locked function computes the number of rows the current transaction has locked. Let's look at the difference between 5.6 and 5.7.

5.7.26

In fact, it is just one line:

return(trx_lock->n_rec_locks);

We can see that it simply returns a counter, and the counter is incremented each time a row is locked: at the end of the function lock_rec_set_nth_bit you can find ++lock->trx->lock.n_rec_locks. In other words, the count is maintained ahead of time.

The call is therefore computationally trivial, and it does not become expensive just because a transaction holds a huge number of locks.
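To make the 5.7 mechanism concrete, here is a minimal compilable sketch with simplified stand-in types (TrxLock, RecLock and the function names are my own, not the InnoDB source): setting a record-lock bit bumps a per-transaction counter once, so the later report is a constant-time read.

// Sketch of the 5.7-style accounting (simplified stand-ins, not InnoDB code).
#include <cstdint>
#include <cstdio>
#include <vector>

struct TrxLock {
    std::uint64_t n_rec_locks = 0;       // maintained incrementally, as in 5.7
};

struct RecLock {
    std::vector<bool> bitmap;            // one bit per record on the page
    TrxLock*          trx_lock;
    RecLock(std::size_t n_bits, TrxLock* t) : bitmap(n_bits, false), trx_lock(t) {}

    // analogous to lock_rec_set_nth_bit(): set the bit, bump the counter
    void set_nth_bit(std::size_t i) {
        if (!bitmap[i]) {
            bitmap[i] = true;
            ++trx_lock->n_rec_locks;     // cf. ++lock->trx->lock.n_rec_locks
        }
    }
};

// analogous to 5.7's lock_number_of_rows_locked(): an O(1) read
std::uint64_t number_of_rows_locked(const TrxLock& trx_lock) {
    return trx_lock.n_rec_locks;
}

int main() {
    TrxLock trx;
    RecLock lock(100, &trx);
    lock.set_nth_bit(3);
    lock.set_nth_bit(7);
    std::printf("%llu row lock(s)\n",
                (unsigned long long) number_of_rows_locked(trx));
    return 0;
}

The design trades a tiny cost at locking time for a constant-time status report, which is why 5.7 stays cheap no matter how many rows a transaction locks.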

5.6.22

Then I went through the 5.6.22 code and found that it is completely different:

for (lock = UT_LIST_GET_FIRST(trx_lock->trx_locks); // loop over every lock struct the transaction holds
     lock != NULL;
     lock = UT_LIST_GET_NEXT(trx_locks, lock)) {

    if (lock_get_type_low(lock) == LOCK_REC) { // record (row) locks only
        ulint n_bit;
        ulint n_bits = lock_rec_get_n_bits(lock);

        // loop over every bit of each lock struct's bitmap to count rows
        for (n_bit = 0; n_bit < n_bits; n_bit++) {
            if (lock_rec_get_nth_bit(lock, n_bit)) {
                n_records++;
            }
        }
    }
}

return(n_records);

We know that loops are CPU-intensive, and here we have nested loops. So when a large transaction in 5.6 has touched a huge number of rows, merely counting its row locks burns a lot of CPU, as the sketch below illustrates.
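For contrast, here is a minimal compilable sketch of the 5.6-style counting, again with simplified stand-in types of my own (not the InnoDB source): every call rescans every bit of every lock struct's bitmap, so the cost per call grows with the total number of record-lock bits.

// Sketch of the 5.6-style counting (simplified stand-ins, not InnoDB code).
#include <cstdint>
#include <cstdio>
#include <vector>

struct RecLock56 {
    std::vector<bool> bitmap;            // one bit per record on the page
};

struct TrxLock56 {
    std::vector<RecLock56> trx_locks;    // all lock structs held by the trx
};

// analogous to 5.6's lock_number_of_rows_locked(): O(total bits) per call
std::uint64_t number_of_rows_locked_56(const TrxLock56& trx_lock) {
    std::uint64_t n_records = 0;
    for (const RecLock56& lock : trx_lock.trx_locks) {   // outer loop: lock structs
        for (bool bit : lock.bitmap) {                   // inner loop: bitmap bits
            if (bit) {
                ++n_records;
            }
        }
    }
    return n_records;
}

int main() {
    // a trx holding many lock structs: every status call rescans all of them
    TrxLock56 trx;
    trx.trx_locks.assign(1000, RecLock56{std::vector<bool>(512, true)});
    std::printf("%llu row lock(s)\n",
                (unsigned long long) number_of_rows_locked_56(trx));
    return 0;
}

With a transaction holding hundreds of millions of row locks, every show engine innodb status call repeats this full scan, which is exactly what can pin a CPU core.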

IV. Cause summary and solution

With the above analysis, the triggering conditions are clear:

- MySQL version 5.6;
- a large transaction, touching roughly 100 GB of data and holding the corresponding row locks;
- use of show engine innodb status.

Under these conditions, counting the row locks of the large transaction performs an enormous number of loop iterations. As a rough illustration: at about 1 KB per row, 100 GB is on the order of 10^8 rows, and each show engine innodb status call in 5.6 walks at least that many lock-bitmap bits. The visible symptom is threads consuming a full CPU core, with lock_number_of_rows_locked at the top of perf top.

The fix was equally simple: the friend found the monitoring tool that was issuing show engine innodb status frequently and stopped it, and the business returned to normal. IO utilization rose again, as shown below:

Of course, this problem does not occur in newer versions such as 5.7 and 8.0, so upgrading to a higher version is worth considering.

When analyzing a performance problem, first locate the bottleneck and then attack it; in this case the heavy CPU consumption was the clue, and once it was found, the solution took only a moment.

V. Others

The bug report my friend eventually found is here:

https://bugs.mysql.com/bug.php?id=68647

It turns out that Yinfeng (Zhai Weixiang) had raised this problem many years ago and proposed a fix, and that proposal was officially adopted; it is precisely the algorithm change we analyzed above. Yinfeng's (Zhai Weixiang's) test in the bug report includes the following description:

From perf top, function lock_number_of_rows_locked may occupy more than 20% of CPU sometimes

That is, this single function can account for more than 20% of the CPU.

For reference, here is the 5.7.26 call stack:

#0  lock_number_of_rows_locked (trx_lock=0x7fffedc5bdd0) at /mysql/mysql-5.7.26/storage/innobase/lock/lock0lock.cc:1335
#1  0x0000000001bd700f in trx_print_latched (f=0x301cad0, trx=0x7fffedc5bd08, max_query_len=600) at /mysql/mysql-5.7.26/storage/innobase/trx/trx0trx.cc:2633
#2  0x0000000001a3ac40 in lock_trx_print_wait_and_mvcc_state (file=0x301cad0, trx=0x7fffedc5bd08) at /mysql/mysql-5.7.26/storage/innobase/lock/lock0lock.cc:5170
#3  0x0000000001a3b28f in lock_print_info_all_transactions (file=0x301cad0) at /mysql/mysql-5.7.26/storage/innobase/lock/lock0lock.cc:5357
#4  0x0000000001b794b1 in srv_printf_innodb_monitor (file=0x301cad0, nowait=0, trx_start_pos=0x7fffec3f4cc0, trx_end=0x7fffec3f4cb8) at /mysql/mysql-5.7.26/storage/innobase/srv/srv0srv.cc:1250
#5  0x00000000019bd5c9 in innodb_show_status (hton=0x2e85bd0, thd=0x7fffe8000c50, stat_print=0xf66cab) at /mysql/mysql-5.7.26/storage/innobase/handler/ha_innodb.cc:15893
#6  0x00000000019bdf35 in innobase_show_status (hton=0x2e85bd0, thd=0x7fffe8000c50, stat_print=0xf66cab, stat_type=HA_ENGINE_STATUS) at /mysql/mysql-5.7.26/storage/innobase/handler/ha_innodb.cc:16307
