Learn ICP, MRR and BKA of MYSQL 07/09 Update SLTechnology News&Howtos

Learn ICP, MRR and BKA of MYSQL

2025-07-09 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Index Condition Pushdown (ICP) Index Condition Pushdown (ICP) is an optimized way for mysql to retrieve row data from a table using an index.

ICP principle disables ICP, and the storage engine locates the rows in the base table by traversing the index, then returns them to the MySQL Server layer, and then filters the WHERE conditions for these data rows.

If ICP is enabled, if some WHERE conditions can use the fields in the index, MySQL Server will push this part down to the storage engine layer, and the storage engine will filter through the index to read the rows that meet the requirements from the table. ICP can reduce the number of times the engine layer accesses the base table and the MySQL Server accesses the storage engine.

The goal of ICP is to reduce the number of full record reads from the base table, thereby reducing IO operations

For InnoDB tables, ICP applies only to secondary indexes.

ICP identifies that when ICP optimization is used, the Extra column of the execution plan displays the Using indexcondition prompt

Related parameter optimizer_switch= "index_condition_pushdown=on"; applicable scenario # auxiliary index INDEX (zipcode, lastname, firstname).

SELECT * FROM peopleWHERE zipcode='95054'AND lastname LIKE'% etrunia%'AND address LIKE'% Main Street%'

The People table has a secondary index INDEX (zipcode, lastname, firstname). Users only know a user's zipcode, and approximate lastname and address, and want to query relevant information at this time.

If ICP: is not used, all zipcode='95054' data is fetched from the base table through the value of zipcode in the secondary index, and then the server layer filters lastname LIKE'% etrunia%'AND address LIKE'% Main Street%';.

If you use ICP:, the filtering operation of lastname LIKE'% etrunia%'AND address LIKE'% Main Street%' is completed in the secondary index, and then go to the base table to fetch the relevant data.

Use restriction

L only supports select statements

Only MyISAM and InnoDB engines are supported in l5.6

ICP; that does not support partitioned tables in l5.6 supports ICP of partitioned tables from MySQL 5.7.3.

The optimization strategy of ICP can be used for data access methods of range, ref, eq_ref and ref_or_null types.

L the primary indexed ICP is not supported (for the Innodb clustered index, the complete record has been read to Innodb Buffer, using ICP does not reduce the IO operation)

L when SQL uses an override index but retrieves only part of the data, ICP cannot use the

The acceleration effect of ICP depends on the percentage of data filtered out by ICP within the storage engine.

The full name of Multi-Range Read (MRR) MRR is Multi-Range Read Optimization, which is a means by which the optimizer converts random IO into sequential IO to reduce IO overhead in the query process.

MRR principle select non_key_column from tb where ey_column=x

When the MRR feature is not used, the pseudo code for MySQL to execute the query first gets the collection of secondary indexes and primary keys based on the secondary index in the where condition, and the result set is rest.

Select key_column, pk_column from tb where key_column=x order by key_column

The second step is to obtain the corresponding value through the primary key obtained in the first step.

For each pk_column value in rest do:

Select non_key_column from tb where pk_column=val

When using the MRR feature, the pseudo code for MySQL to execute the query first gets the collection of secondary indexes and primary keys according to the secondary index in the where condition, and the result set is rest

Select key_column, pk_column from tb where key_column = x order by key_column

The second step is to put the result set rest in the buffer (read_rnd_buffer_size size until buffer is full), then sort the result set rest according to pk_column, and get the result set rest_sort.

The third step is to access the data in the table using the sorted result set, which is sequential IO.

Select non_key_column fromtb where pk_column in (rest_sort)

To sum up, when not using MRR, the optimizer needs to "return the table" based on the records returned by the secondary index, which generally has more random IO. When using MRR, the execution process of the SQL statement is as follows:

1) the optimizer puts the records queried by the secondary index into a buffer

2) if the secondary index scans to the end of the file or the buffer is full, use quick sort to sort the contents of the buffer by the primary key

3) the user thread calls the MRR API to fetch cluster index, and then fetches row data according to cluster index

4) when the data is fetched according to the cluster index in the buffer, continue to call procedure 2) 3) until the scan is completed

Through the above process, the optimizer sorts the random IO of the secondary index and transforms it into the ordered arrangement of primary keys, thus realizing the conversion from random IO to sequential IO and improving performance.

In addition, MRR can also split some range queries into key-value pairs for batch data query, as follows: SELECT * FROM tWHERE key_part1 > = 1000 AND key_part1

< 2000AND key_part2 = 10000; 表t上有二级索引(key_part1, key_part2)，索引根据key_part1,key_part2的顺序排序。若不使用MRR：索引扫描会将key_part1在1000到2000的索引元组，而不管key_part2的值，这样对key_part2不等于10000的索引元组也做了额外的扫描。此时扫描的范围是： [{1000, 10000}, {2000, MIN_INT}]此间隔可能包含key_part2不等于10000的部分若使用MRR：扫描则分为多个范围，对于每一个Key_part1(1000,1001…,1999)单个值的扫描只需要扫描索引中key_part2为10000的元组。如果索引中包含很多key_part2不为10000的元组，最终MRR的效果越好。MRR扫描的范围是多个单点间隔[{1000, 10000}], ..., [{1999, 10000}] 此间隔只包含key_part2=10000的部分。 MRR标识当使用ICP优化时，执行计划的Extra列显示Using MRR提示相关参数用optimizer_switch 的标记来控制是否使用MRR.设置mrr=on时，表示启用MRR优化。 mrr_cost_based表示是否通过cost base的方式来启用MRR. 当mrr=on,mrr_cost_based=on,则表示cost base的方式还选择启用MRR优化,当发现优化后的代价过高时就会不使用该项优化当mrr=on,mrr_cost_based=off,则表示总是开启MRR优化 SET @@optimizer_switch='mrr=on,mrr_cost_based=on'; 参数read_rnd_buffer_size 用来控制键值缓冲区的大小。二级索引扫描到文件的末尾或者缓冲区已满，则使用快速排序对缓冲区中的内容按照主键进行排序适用场景#辅助索引key_part1，查询key_part1在1000到2000范围内的数据 SELECT * FROM t WHERE key_part1 >

= 1000 AND key_part1 < 2000

Do not use MRR: first fetch the key_part1,pk_col order by key_part1 that meets the condition through the key_part1 field of the secondary index. Then the data that meets the conditions is extracted from the table through pk_col. At this time, because the extracted pk_col is out of order, and the table stores the data in pk_col, when the data is fetched from the table, a large number of random IO will be generated.

Using MRR: first fetch the key_part1,pk_col order by key_part1 that meets the condition through the key_part1 field of the secondary index. Put it into the cache (read_rnd_buffer_size). When the corresponding cache is full, sort the key values according to pk_col, and then go to fetch the data in the table according to the sorted reset. At this time, the pk_col1 is sequential, the random IO is converted into sequential IO, and the multi-page data records can be read in one time or several times according to the primary key range, so as to reduce the IO operation and improve the query efficiency.

Use restrictions on MRR applicable to queries of range, ref, and eq_ref

Batched Key Access (BKA) and Block Nested-Loop (BNL) Batched Key Access (BKA)-algorithms that improve the performance of table join.

When the table by join can be indexed, arrange the order first, and then retrieve the table by join. It sounds similar to MRR. In fact, MRR can also be thought of as the join of secondary index and primary key.

If there is no index on the table being Join, the old version of BNL policy (BLOCK Nested-loop) is used.

BKA principle

For multi-table join statements, when MySQL accesses the second join table using the index, a join buffer is used to collect the relevant column values generated by the first Operand. After the key is built by BKA, it is passed to the engine layer in batch for index search. The key is submitted to the engine through the MRR interface (mrr is intended to be sequential). In this way, MRR makes the query more efficient.

The general process is as follows:

1 BKA uses join buffer to save eligible data generated by the first operation of join

2 then the BKA algorithm builds key to access the connected table, and uses the MRR interface to submit the keys to the database storage engine for lookup.

3 after submitting the keys, MRR uses the best way to get rows and feedback to BKA

Both BNL and BKA submit some rows to the join table in batches, thus reducing the number of visits, so what's the difference between them?

The first BNL appeared earlier than BKA. BKA did not appear until 5.6, while NBL existed at least in 5.1.

The second BNL is mainly used when there is no index on the table being join.

The third BKA mainly means that there is an index on the join table that can be used, then the rows are sorted by the index field just before they are submitted to the table by join, thus reducing the random IO, sorting this is the biggest difference between the two, but what if the table is not indexed by join? Then use NBL.

BKA and BNL identify Using join buffer (Batched Key Access) and Using join buffer (Block Nested Loop)

Related parameters

BAK uses MRR. If you want to use BAK, you must turn on MRR, while MRR's mrr_cost_based-based cost estimation does not guarantee that MRR will always be used. It is officially recommended to set mrr_cost_based=off to always enable MRR. Turn on the BAK feature (BAK default OFF):

SET optimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on'

BKA uses join buffer size to determine the size of the buffer, and the larger the buffer, the more sequentially the tables / internal tables being join are accessed.

BNL is enabled by default. Set parameters related to BNL:

SET optimizer_switch='block_nested_loop' scenarios support inner join, outer join, semi-join operations,including nested outer joins

BKA is mainly suitable for join. If there is an index on the table, you can only use BNL without an index.

Simple summary

The following partial sources: http://www.cnblogs.com/zhoujinyi/p/4746483.html

ICP Optimization (Index Condition Pushdown)

Index Condition Pushdown (ICP) is an optimization in which MySQL uses the index to fetch data from the table. Before disabling ICP (before MySQL5.6), the engine layer uses the index to find data rows in the base table, then returns them to the MySQL Server layer, and then filters the WHERE conditions for those data rows (back to the table). When ICP is enabled (after MySQL5.6), if some of the WHERE conditions can use the fields in the index, MySQL will push this part down to the engine layer. The storage engine reads the satisfied rows from the table by using the index. ICP reduces the number of times the engine layer accesses the base table and the MySQL Server accesses the storage engine. In a word, the optimization of ICP can filter out a large amount of data at the engine layer, reduce the number of io, and improve the performance of query statements.

MRR Optimization (Multi-Range Read)

Multi-Range Read Multi-range read (MRR), which is based on the query of secondary / second index, reduces random IO, and converts random IO into sequential IO to improve query efficiency. Before there is no MRR (before MySQL5.6), the collection of secondary index and primary key is obtained according to the secondary index in the where condition, and then the corresponding value is obtained by the primary key. The primary key acquired by the secondary index to access the data in the table will lead to random IO (the storage order of the secondary index is not the same as the order of the primary key). When different primary keys are not in the same page, it will inevitably lead to multiple IO and random reads. After using MRR optimization (after MySQL5.6), first get the collection of secondary index and primary key according to the secondary index in the where condition, then put the result set in buffer (read_rnd_buffer_size size until buffer is full), and then sort the result set according to pk_column to get an ordered result set rest_sort. Finally, the data in the table is accessed using the sorted result set, which in this case is the sequential IO. That is, MySQL sorts the result sets obtained according to the secondary index according to the primary key, turns disorder into order, accesses the base table sequentially with the primary key, converts random reading into sequential reading, and multi-page data records can be read in one time or in stages according to the primary key range, so as to reduce IO operations and improve query efficiency.

Nested Loop Join algorithm:

Take the result set of the driving table / external table as the cyclic basic data, then loop the result set, get one piece of data at a time as the filter condition query data of the next table, and then merge the results to get the result set and return it to the client. Nested-Loop passes only one line into the inner loop at a time, so the memory loop is executed as many times as there are as many rows in the outer loop, which is very inefficient.

Block Nested-Loop Join algorithm:

The row / result set of the outer loop is stored in join buffer, and each row of the inner loop is compared with the record in the entire buffer, thus reducing the number of inner loops. Mainly used when there is no index on the table being join.

Batched Key Access algorithm:

When the tables being join can be indexed, they are sorted first, and then the tables that are join are retrieved. These rows are sorted by index fields, thus reducing random IO. If there is no index on the table being Join, the old version of the BNL policy (BLOCK Nested-loop) is used.

Reference:

Http://dev.mysql.com/doc/refman/5.7/en/select-optimization.html

Http://blog.itpub.net/22664653

Http://www.kancloud.cn/taobaomysql/monthly/117959

Http://www.kancloud.cn/taobaomysql/monthly/67181

Http://www.cnblogs.com/zhoujinyi/p/4746483.html

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.