Slave data lookup and the parameter slave_rows_search_algorithms


In this section, we assume that the parameter binlog_row_image is set to 'FULL', which is the default value.

1. Starting from an example

Before we begin, let's assume that the parameter 'slave_rows_search_algorithms' is set to its default, that is:

TABLE_SCAN,INDEX_SCAN

This parameter directly affects how indexes are used when row events are applied.
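A quick way to verify both settings on the slave (a minimal check; these are standard server variables):

mysql> show variables like 'binlog_row_image';
mysql> show variables like 'slave_rows_search_algorithms';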

Let's take a 'Delete' operation as an example; index selection works the same way for an 'Update' operation, because in both cases the row to change is located through the before_image. The table structure, data, and operations I tested are as follows:

mysql> show create table tkkk\G
*************************** 1. row ***************************
       Table: tkkk
Create Table: CREATE TABLE `tkkk` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> select * from tkkk;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    1 |    1 |
|    2 |    2 |    2 |
|    3 |    3 |    3 |
|    4 |    4 |    4 |
|    5 |    5 |    5 |
|    6 |    6 |    6 |
|    7 |    7 |    7 |
|    8 |    8 |    8 |
|    9 |    9 |    9 |
|   10 |   10 |   10 |
|   11 |   11 |   11 |
|   12 |   12 |   12 |
|   13 |   13 |   13 |
|   15 |   15 |   15 |
|   15 |   16 |   16 |
|   15 |   17 |   17 |
+------+------+------+
16 rows in set (2.21 sec)

mysql> delete from tkkk where a=15;
Query OK, 3 rows affected (6.24 sec)

I was running this under a debugger, which is why the statement appears to take so long.

For this 'Delete' statement, the master uses the index KEY `a`. To delete the three rows it only needs to perform index positioning once (see the function btr_cur_search_to_nth_level) and can then scan forward row by row, deleting as it goes. The approximate process is shown in the following figure:
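As a cross-check on the master (hedged: EXPLAIN for DELETE is available from MySQL 5.6 onward), you can confirm that the statement is driven by the index on `a`:

mysql> explain delete from tkkk where a=15;

The key column should report the index `a`, i.e. a range scan rather than a full table scan.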

The before_image of the three deleted rows is recorded in a single DELETE_ROWS_EVENT. Which index to use is re-evaluated when the event is applied on the slave, giving priority to primary and unique keys. An index positioning operation is required for every row in the event, and for a non-unique index the first row returned may not be the row to delete, so the next row has to be scanned. The function Rows_log_event::do_index_scan_and_update contains the following code:

while (record_compare(m_table, &m_cols))    // compare each field; if not equal, scan the next row
{
  while ((error= next_record_scan(false)))  // scan the next row
  {
    /* We just skip records that has already been deleted */
    if (error == HA_ERR_RECORD_DELETED)
      continue;
    DBUG_PRINT("info", ("no record matching the given row found"));
    goto end;
  }
}

These costs are higher than on the master. In this example there is no primary or unique key, so the index KEY `a` is still used, as shown in the following figure:

But if we add a primary key on the slave, the apply process on the slave becomes the following:

Judging from the above process, the main differences between a 'Delete' applied on the master and on the slave are:

Every row in the event requires an index positioning operation on the slave.

With a non-unique index, the first row the slave finds may not be the row to delete, so index positioning and scanning may have to continue.

Generally speaking, the master needs only one positioning search and can then simply step to the next row; the actual delete work itself is much the same on both sides. With a proper primary or unique key, the impact of the two points above is greatly reduced. Where slave lag is concerned, the absence of a reasonable primary or unique key is the more important cause.
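As a sketch of that remedy, with loud caveats (row-based replication tolerates extra trailing columns on the slave, and disabling sql_log_bin here assumes you deliberately want the DDL to stay local to this server), a primary key can be added on the slave only:

mysql> set sql_log_bin = 0;
mysql> alter table tkkk add column id int not null auto_increment primary key;
mysql> set sql_log_bin = 1;

With this in place, the slave locates every row in the event through a single primary-key search.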

Finally, if there is no index on the table at all, the situation becomes even more serious, as the simplified figure below shows:

We can see that every changed row requires a full table scan, which is a very serious problem. In this case the HASH_SCAN option of the parameter 'slave_rows_search_algorithms' may improve performance, as we discuss below.

2. Confirming how the data is found

In the previous example we touched on the parameter 'slave_rows_search_algorithms', which controls how rows are looked up. Its value can be one of the following combinations (from the official documentation); in the source code it is represented as a bitmap:

TABLE_SCAN,INDEX_SCAN (default)

INDEX_SCAN,HASH_SCAN

TABLE_SCAN,HASH_SCAN

TABLE_SCAN,INDEX_SCAN,HASH_SCAN
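A minimal sketch of switching the algorithm at runtime (slave_rows_search_algorithms is a dynamic global variable; restarting the SQL thread is a conservative way of making sure the applier picks up the new value):

mysql> stop slave sql_thread;
mysql> set global slave_rows_search_algorithms = 'INDEX_SCAN,HASH_SCAN';
mysql> start slave sql_thread;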

The following comment is found in the source code, and the same table appears in the official documentation:

/*
  Decision table:
  - I  --> Index scan / search
  - T  --> Table scan
  - Hi --> Hash over index
  - Ht --> Hash over the entire table

  |--------------+-----------+------+------+------|
  | Index\Option | I , T , H | I, T | I, H | T, H |
  |--------------+-----------+------+------+------|
  | PK / UK      | I         | I    | I    | Hi   |
  | K            | Hi        | I    | Hi   | Hi   |
  | No Index     | Ht        | T    | Ht   | Ht   |
  |--------------+-----------+------+------+------|
*/
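Reading the table against our example: with the default TABLE_SCAN,INDEX_SCAN (column "I, T") and table tkkk, which has only the ordinary key KEY `a` (row "K"), the decision is I, a plain index scan. Switching to INDEX_SCAN,HASH_SCAN (column "I, H") moves the same table to Hi, hash over index.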

In fact, there are three ways to find data in the source code, which are:

ROW_LOOKUP_INDEX_SCAN

Corresponding function interface: Rows_log_event::do_index_scan_and_update

ROW_LOOKUP_HASH_SCAN

Corresponding function interface: Rows_log_event::do_hash_scan_and_update

ROW_LOOKUP_TABLE_SCAN

Corresponding function interface: Rows_log_event::do_table_scan_and_update

In the source code it looks like this:

switch (m_rows_lookup_algorithm)  // pick the handler according to the algorithm
{
case ROW_LOOKUP_HASH_SCAN:
  do_apply_row_ptr= &Rows_log_event::do_hash_scan_and_update;
  break;

case ROW_LOOKUP_INDEX_SCAN:
  do_apply_row_ptr= &Rows_log_event::do_index_scan_and_update;
  break;

case ROW_LOOKUP_TABLE_SCAN:
  do_apply_row_ptr= &Rows_log_event::do_table_scan_and_update;
  break;

How the data is looked up, and through which index, is determined jointly by the setting of 'slave_rows_search_algorithms' and by whether the table has a suitable index; it is not decided by 'slave_rows_search_algorithms' alone.

The figure below shows the decision process; you can also refer to the function decide_row_lookup_algorithm_and_key.

3. Data lookup in ROW_LOOKUP_HASH_SCAN mode

In general, this approach differs from ROW_LOOKUP_INDEX_SCAN and ROW_LOOKUP_TABLE_SCAN in that it compares the rows in the table against the rows in the Event, rather than the rows in the Event against the rows in the table. We describe this in detail below.

Assume we set the parameter 'slave_rows_search_algorithms' to INDEX_SCAN,HASH_SCAN and the table has no primary or unique key; the flow in the figure above then sets the lookup method to ROW_LOOKUP_HASH_SCAN.

In ROW_LOOKUP_HASH_SCAN, there are two ways to find data:

Hi --> Hash over index

Ht --> Hash over the entire table

In ROW_LOOKUP_HASH_SCAN, every row in the Event is first read out and stored in a hash structure. If Hi can be used, an additional set is also maintained, holding the index key values as the basis for the index scan. If there is no usable index, this set is not maintained and a full table scan is used directly, that is, Ht.

It is important to note that the unit of this process is the Event. As mentioned earlier, a DELETE_ROWS_EVENT may contain multiple rows, and an event is at most about 8K (compare the binlog-row-event-max-size option, whose default is 8192). Therefore, with Ht --> Hash over the entire table, you go from one full table scan per row to one full table scan per Event: an event carrying 100 rows turns 100 scans into one.

For Hi --> Hash over index, the effect is less obvious. If the deleted rows contain few duplicate key values, you still need about as many index positioning lookups; but if they contain many duplicates, the constructed set has far fewer elements, which reduces the cost of index positioning.

On the other hand, if a delete statement removes only one row at a time rather than a large batch, each DELETE_ROWS_EVENT holds a single row, so ROW_LOOKUP_HASH_SCAN brings no improvement: that one row still needs a full table scan or an index lookup, no different from the default behaviour.
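A hedged illustration of the difference (the comments describe the event shapes you would typically get under row-based logging; exact splitting also depends on event size limits):

mysql> delete from tkkk where a=15;        -- one DELETE_ROWS_EVENT carrying 3 rows: hashing batches the lookups
mysql> delete from tkkk where b=1 limit 1; -- a single-row event: HASH_SCAN brings nothing extra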

For the whole process, refer to the following APIs:

Rows_log_event::do_hash_scan_and_update: the main interface; calls the two interfaces below.

Rows_log_event::do_hash_row: adds row data to the hash structure and, if there is an index, maintains the set of key values.

Rows_log_event::do_scan_and_update: finds and deletes the rows, calling Rows_log_event::next_record_scan for the actual lookup.

Rows_log_event::next_record_scan: implements the concrete search, both Hi --> Hash over index and Ht --> Hash over the entire table.

Let's return to the original example. We deleted three rows, so the DELETE_ROWS_EVENT holds three rows of data. Suppose 'slave_rows_search_algorithms' is set to INDEX_SCAN,HASH_SCAN. Since my table has no primary or unique key, ROW_LOOKUP_HASH_SCAN ends up being used for the lookups; and because we do have the index KEY `a`, the method is Hi --> Hash over index. To better describe Hi and Ht, we also assume a second case in which the table has no index at all. I put the two methods in one diagram to make the differences easier to spot, as shown in the following figure:

4. Summary

I remember a friend once asked me: the master has no primary key; if I build one on the slave, can that reduce the delay? Now we know the answer is yes, because the slave chooses the index by itself based on the row data in the Event. To sum up:

Adding HASH_SCAN to slave_rows_search_algorithms does not necessarily improve performance. It helps only when the following two conditions are met: each Event carries multiple rows of changed data (i.e., statements change rows in batches), and, in the Hi case, those rows contain duplicate key values, so that the set of keys to position on shrinks.

Index usage on the slave is decided by the slave itself, in the order primary key -> unique key -> ordinary index.

If slave_rows_search_algorithms does not include HASH_SCAN and the table has no primary or unique key, performance degrades sharply and causes delay. With no index at all, the situation is even worse, because every changed row triggers a full table scan.

So we have found one more reason to insist on primary keys in MySQL.
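As a closing aside, one standard way to hunt down tables that lack a primary key (an ordinary information_schema query; adjust the schema filter to taste):

mysql> select t.table_schema, t.table_name
    ->   from information_schema.tables t
    ->   left join information_schema.table_constraints c
    ->     on c.table_schema = t.table_schema
    ->    and c.table_name = t.table_name
    ->    and c.constraint_type = 'PRIMARY KEY'
    ->  where t.table_type = 'BASE TABLE'
    ->    and c.constraint_type is null
    ->    and t.table_schema not in ('mysql','information_schema','performance_schema','sys');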
