

Section 24: Data lookup on the slave and the parameter slave_rows_search_algorithms




Note: this article is Section 24 of the 32-section series "An In-depth Understanding of the MySQL Master-Slave Principle".

This section has an accompanying note, available here:

https://www.jianshu.com/p/5183fe0f00d8

We already know that for a DML statement, the changes to its data are placed in the corresponding Event. For example, a 'Delete' statement puts the before_image of every deleted row into a DELETE_ROWS_EVENT. The slave only has to read these before_image records to locate the rows, then call the corresponding 'Delete' operation to remove them. Let's discuss how the slave performs this data lookup.
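If you want to see such an event for yourself, here is a minimal sketch; the binlog file name 'mysql-bin.000001' is just a placeholder for whatever SHOW MASTER STATUS reports on your server:

-- on the master, after running a DELETE under binlog_format=ROW
SHOW BINLOG EVENTS IN 'mysql-bin.000001';
-- the listing contains an entry whose Event_type is 'Delete_rows' (the DELETE_ROWS_EVENT);
-- to decode the before_image values inside it, the mysqlbinlog utility
-- can be run with -vv --base64-output=decode-rows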

In this section we assume that the parameter binlog_row_image is set to 'FULL', which is the default value; the impact of the binlog_row_image parameter is described in Section 11.
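To confirm this assumption on your own server, a quick check (the variable is dynamic, so it can also be set at runtime; the global setting only affects sessions started afterwards):

SHOW VARIABLES LIKE 'binlog_row_image';
-- expected: FULL (the default)
SET GLOBAL binlog_row_image = 'FULL';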

For more on master-slave synchronization, please refer to my 32-section column "An In-depth Understanding of the MySQL Master-Slave Principle":

If the picture cannot be displayed, please see the following link:

https://www.jianshu.com/p/d636215d767f

1. Starting from an example

Before we begin, let's assume the parameter 'slave_rows_search_algorithms' is at its default value, that is:

TABLE_SCAN,INDEX_SCAN

Because this parameter directly affects whether and how indexes are used on the slave.
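A quick way to inspect and set it; note that on the slave it is a global, dynamic variable:

SHOW VARIABLES LIKE 'slave_rows_search_algorithms';
-- restore the default discussed in this section
SET GLOBAL slave_rows_search_algorithms = 'TABLE_SCAN,INDEX_SCAN';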

Let's take a 'Delete' operation as an example; index selection works the same way for an 'Update' operation, since in both cases the row is located through its before_image. The table structure, data, and operations I tested are as follows:

mysql> show create table tkkk \G
*************************** 1. row ***************************
       Table: tkkk
Create Table: CREATE TABLE `tkkk` (
  `a` int(11) DEFAULT NULL,
  `b` int(11) DEFAULT NULL,
  `c` int(11) DEFAULT NULL,
  KEY `a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

mysql> select * from tkkk;
+------+------+------+
| a    | b    | c    |
+------+------+------+
|    1 |    1 |    1 |
|    2 |    2 |    2 |
|    3 |    3 |    3 |
|    4 |    4 |    4 |
|    5 |    5 |    5 |
|    6 |    6 |    6 |
|    7 |    7 |    7 |
|    8 |    8 |    8 |
|    9 |    9 |    9 |
|   10 |   10 |   10 |
|   11 |   11 |   11 |
|   12 |   12 |   12 |
|   13 |   13 |   13 |
|   15 |   15 |   15 |
|   15 |   16 |   16 |
|   15 |   17 |   17 |
+------+------+------+
16 rows in set (2.21 sec)

mysql> delete from tkkk where a = 15;
Query OK, 3 rows affected (6.24 sec)

(The statement times are long because I was running under debug.)

For such a 'Delete' statement, the master uses index KEY `a`. It only needs to position within the index once for the three rows being deleted (see the function btr_cur_search_to_nth_level), and then scans subsequent rows sequentially, deleting as it goes. The approximate process is shown in the following figure:

The before_image of the three deleted rows is recorded in one DELETE_ROWS_EVENT. Which index to use is re-evaluated when the event is applied on the slave, with priority given to primary and unique keys. An index-positioning operation is needed for every row in the Event, and with a non-unique index the first row returned may not be the row to delete, so the next row has to be scanned. The function Rows_log_event::do_index_scan_and_update contains the following code:

while (record_compare(m_table, &m_cols))   /* compare each field; if any differ, scan the next row */
{
  while ((error= next_record_scan(false))) /* scan the next row */
  {
    /* We just skip records that has already been deleted */
    if (error == HA_ERR_RECORD_DELETED)
      continue;
    DBUG_PRINT("info", ("no record matching the given row found"));
    goto end;
  }
}

These costs make the slave more expensive than the master. Since this example has no primary or unique key, index KEY `a` is still used, as shown in the following figure:

But if we add a primary key on the slave, the slave applies the event through the following process:

Comparing the two processes above, the main differences between the 'Delete' on the master and the 'Delete' applied on the slave are:

Every row of data in the Event requires an index-positioning lookup on the slave.
With a non-unique index, the first row the slave finds may not be the row to delete, so positioning and scanning have to continue.

Generally speaking, the master needs only one positioning lookup, after which it simply reads the next row; the actual delete operation itself costs about the same on both sides. Used properly, primary and unique keys reduce the impact of the two points above. When a slave lags, failure to use primary and unique keys sensibly is one of the more important causes.
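As a sketch of that fix: column a cannot be the primary key here (it holds duplicates, such as the three rows with a = 15), but if some existing column, say b, happens to be unique in every row, a primary key can be added on the slave alone. That uniqueness is an assumption about your data, so verify it first:

-- run on the slave only
SET SESSION sql_log_bin = 0;   -- keep the DDL out of the slave's own binlog, if it logs
ALTER TABLE tkkk ADD PRIMARY KEY (b);
SET SESSION sql_log_bin = 1;

Note that the key must be built from columns present in the master's before_image; a brand-new column added only on the slave would not appear in the Event and so could not be used for the lookup.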

Finally, if the table has no index at all, the situation becomes even more serious, as the simple figure shows:

We can see that every changed row triggers a full table scan, which is a very serious problem. In this case the HASH_SCAN option of the 'slave_rows_search_algorithms' parameter may improve performance, as we discuss below.
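A minimal sketch of turning the option on; stopping and restarting the SQL thread is a precaution so that the new setting is picked up cleanly:

STOP SLAVE SQL_THREAD;
SET GLOBAL slave_rows_search_algorithms = 'INDEX_SCAN,HASH_SCAN';
START SLAVE SQL_THREAD;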

2. Determining how the data is looked up

In the previous example we touched on the parameter 'slave_rows_search_algorithms', which determines how the slave looks up the rows to change. Its value can be any of the following combinations (from the official documentation); in the source code it is held as a bitmap:

TABLE_SCAN,INDEX_SCAN (default)
INDEX_SCAN,HASH_SCAN
TABLE_SCAN,HASH_SCAN
TABLE_SCAN,INDEX_SCAN,HASH_SCAN

The following decision table appears in the source code, and also in the official documentation:

/*
  Decision table:
  - I  --> Index scan / search
  - T  --> Table scan
  - Hi --> Hash over index
  - Ht --> Hash over the entire table

  |--------------+-----------+------+------+------|
  | Index\Option | I , T , H | I, T | I, H | T, H |
  |--------------+-----------+------+------+------|
  | PK / UK      | I         | I    | I    | Hi   |
  | K            | Hi        | I    | Hi   | Hi   |
  | No Index     | Ht        | T    | Ht   | Ht   |
  |--------------+-----------+------+------+------|
*/

In fact, there are three ways to find data in the source code, which are:

ROW_LOOKUP_INDEX_SCAN

Corresponding function interface: Rows_log_event::do_index_scan_and_update

ROW_LOOKUP_HASH_SCAN

Corresponding function interface: Rows_log_event::do_hash_scan_and_update

It further includes two variants, both discussed later:

(1) Hi -> Hash over index

(2) Ht -> Hash over the entire table

ROW_LOOKUP_TABLE_SCAN

Corresponding function interface: Rows_log_event::do_table_scan_and_update

In the source code this appears as follows:

switch (m_rows_lookup_algorithm)   /* pick the handler for the chosen algorithm */
{
  case ROW_LOOKUP_HASH_SCAN:
    do_apply_row_ptr= &Rows_log_event::do_hash_scan_and_update;
    break;

  case ROW_LOOKUP_INDEX_SCAN:
    do_apply_row_ptr= &Rows_log_event::do_index_scan_and_update;
    break;

  case ROW_LOOKUP_TABLE_SCAN:
    do_apply_row_ptr= &Rows_log_event::do_table_scan_and_update;
    break;

How the data is looked up, and through which index, is decided jointly by the setting of 'slave_rows_search_algorithms' and by whether the table has a suitable index; it is not decided by the 'slave_rows_search_algorithms' parameter alone.

The following figure shows the decision process; see also the function decide_row_lookup_algorithm_and_key (figure 24-1; the high-definition original is attached at the end of the article).

3. Data lookup in ROW_LOOKUP_HASH_SCAN mode

In general, this method differs from ROW_LOOKUP_INDEX_SCAN and ROW_LOOKUP_TABLE_SCAN in that it compares the rows in the table against the rows in the Event, rather than the rows in the Event against the rows in the table. We describe this in detail below.

Suppose we set 'slave_rows_search_algorithms' to INDEX_SCAN,HASH_SCAN and the table has no primary or unique key; the flow in the figure above then selects ROW_LOOKUP_HASH_SCAN as the lookup method.

In ROW_LOOKUP_HASH_SCAN, there are two ways to find data:

Hi -> Hash over index
Ht -> Hash over the entire table

ROW_LOOKUP_HASH_SCAN first reads each row out of the Event and stores it in a hash structure. If Hi can be used, it additionally maintains a set of the index key values, which later drives the index scan. If there is no usable index, this set is not maintained and a full table scan, that is Ht, is used directly.

With Ht (hash over the entire table), the whole table is scanned, and every row read is probed against the hash structure for comparison. With Hi (hash over index), the index is positioned using the set mentioned above, and every row found is likewise probed against the hash structure.

It is important to note that the unit of this process is the Event. As we said earlier, one DELETE_ROWS_EVENT may contain many rows, and an Event is capped at about 8K (the default binlog-row-event-max-size). Therefore, with Ht (hash over the entire table), the cost drops from one full table scan per row to one full table scan per Event.

For Hi (hash over index) the gain is less obvious: if the deleted rows contain few duplicate key values, nearly as many index-positioning lookups are still needed; but if there are many duplicates, the set will hold far fewer elements, which cuts the cost of index positioning.

On the other hand, if each delete statement removes only one row rather than many, every DELETE_ROWS_EVENT carries a single row, and ROW_LOOKUP_HASH_SCAN brings no improvement: that row still needs a full table scan or an index positioning to be found, no different from the default mode.
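To make the contrast concrete, here is a sketch of the two workload shapes against the test table; the predicates are illustrative only:

-- Shape 1: one statement deletes many rows sharing key values.
-- Rows are packed into ~8K DELETE_ROWS_EVENTs, so HASH_SCAN turns
-- many per-row lookups into roughly one pass per Event.
DELETE FROM tkkk WHERE a = 15;

-- Shape 2: one row per statement. Each DELETE_ROWS_EVENT then carries
-- a single row, and HASH_SCAN costs the same as the default lookup.
DELETE FROM tkkk WHERE a = 1 LIMIT 1;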

For the whole process, refer to the following interfaces:

Rows_log_event::do_hash_scan_and_update: the main interface; calls the two interfaces below.
Rows_log_event::do_hash_row: adds a row to the hash structure and, if there is an index, maintains the set.
Rows_log_event::do_scan_and_update: finds and deletes the rows, calling Rows_log_event::next_record_scan for the lookup.
Rows_log_event::next_record_scan: implements the actual search for both Hi (hash over index) and Ht (hash over the entire table).

Let's return to the earlier example. We delete three rows, so the DELETE_ROWS_EVENT contains three rows of data. Suppose 'slave_rows_search_algorithms' is set to INDEX_SCAN,HASH_SCAN. Because my table has no primary or unique key, ROW_LOOKUP_HASH_SCAN is used for the lookup; and because there is index KEY `a`, the Hi (hash over index) variant applies. To better illustrate Hi and Ht, assume also the alternative case where the table has no index at all. I put the two methods into one diagram to make the differences easier to spot (figure 24-2; the high-definition original is attached at the end of the article):

4. Summary

I remember a friend once asked me: if the master has no primary key, can building one on the slave reduce the delay? Now we know the answer is yes, because the slave chooses which index to use based on the row data in the Event. To sum up:

Setting HASH_SCAN in the slave_rows_search_algorithms parameter does not necessarily improve performance. It helps only when both of the following conditions hold:

(1) The table has no index, or it has an index and the update/delete touches many duplicate key values.

(2) A single update/delete statement changes a large amount of data, producing many UPDATE_ROWS_EVENT/DELETE_ROWS_EVENTs of around 8K each. Statements that change only a little data (say one row per statement) see no improvement.

The slave decides on its own which index to use, in the order primary key -> unique key -> ordinary index.

If HASH_SCAN is not set in slave_rows_search_algorithms and the table has no primary or unique key, performance degrades sharply and delay builds up. The situation is even worse with no index at all, since every changed row then triggers a full table scan.

So we have found one more reason to insist on primary keys in MySQL.
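Since missing primary keys are the root cause, a query along these lines lists tables without one; treat it as a starting point rather than a definitive audit:

SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON  c.table_schema = t.table_schema
       AND c.table_name   = t.table_name
       AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND c.constraint_name IS NULL
  AND t.table_schema NOT IN
      ('mysql', 'information_schema', 'performance_schema', 'sys');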

End of Section 24.
