What's the use of MySQL's ref? 04/28 Update SLTechnology News&Howtos

What's the use of MySQL's ref?

2025-04-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article mainly introduces "what is the use of MySQL ref". In daily operation, I believe many people have doubts about what is the use of MySQL ref. Xiaobian consulted all kinds of information and sorted out simple and easy operation methods. I hope to help you answer the question of "what is the use of MySQL ref"! Next, please follow the small series to learn together!

Recall the query cost

For a query, sometimes it can be performed through different indexes or full table scans. MySQL optimizer analyzes how many records need to be scanned when using each index by using pre-generated statistics or a small number of accesses to the B+ tree index, then calculates the query cost of using different indexes, and finally selects the one with the lowest cost to execute the query.

create a scene

Suppose we now have a table t with the following table structure:

CREATE TABLE t ( id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, key1 VARCHAR(100), common_field VARCHAR(100), INDEX idx_key1 (key1) ) ENGINE=InnoDB CHARSET=utf8;

This table contains three columns:

id column is self-increasing primary key

The key1 column is used to store strings, and we create a normal secondary index for the key1 column.

The common_field column is used to store strings

There are now 10000 records in the table:

mysql> SELECT COUNT(*) FROM t; +----------+ | COUNT(*) | +----------+ | 10000 | +----------+ 1 row in set (2.65 sec)

There are 2310 records where key1 is listed as 'a':

mysql> SELECT COUNT(*) FROM t WHERE key1 = 'a'; +----------+ | COUNT(*) | +----------+ | 2310 | +----------+ 1 row in set (0.83 sec)

There are also 2310 records in the key1 column between 'a' and 'i':

mysql> SELECT COUNT(*) FROM t WHERE key1 > 'a' AND key1

< 'i'; +----------+ | COUNT(*) | +----------+ | 2310 | +----------+ 1 row in set (1.31 sec) 现在我们有如下两个查询：查询1：SELECT * FROM t WHERE key1 = 'a'; 查询2：SELECT * FROM t WHERE key1 >

'a' AND key1

< 'i'; 按理说上边两个查询需要扫描的记录数量是一样的，MySQL查询优化器对待它们的态度也应该是一样的，也就是要么都使用二级索引idx_key1执行它们，要么都使用全表扫描的方式来执行它们。不过现实是貌似查询优化器更喜欢查询1，而比较讨厌查询2。查询1的执行计划如下所示： # 查询1的执行计划 mysql>

EXPLAIN SELECT * FROM t WHERE key1 = 'a'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t partitions: NULL type: ref possible_keys: idx_key1 key: idx_key1 key_len: 303 ref: const rows: 2310 filtered: 100.00 Extra: NULL 1 row in set, 1 warning (0.04 sec)

The execution plan for Query 2 is as follows:

#Execution plan for query 2 mysql> EXPLAIN SELECT * FROM t WHERE key1 > 'a' AND key1

< 'i'\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: t partitions: NULL type: ALL possible_keys: idx_key1 key: NULL key_len: NULL ref: NULL rows: 9912 filtered: 23.31 Extra: Using where 1 row in set, 1 warning (0.03 sec) 很显然，查询优化器决定使用idx_key1二级索引执行查询1，而使用全表扫描来执行查询2。为什么?凭什么?同样是扫描相同数量的记录，凭什么我range访问方法就要比你ref低一头?设计MySQL的大叔，你为何这么偏心... 解密偏心原因世界上没有无缘无故的爱，也没有无缘无故的恨。这事儿还得从索引结构说起。比方说idx_key1二级索引结构长这样：原谅我们把索引对应的B+树结构弄了一个极度精简版，我们忽略掉了页的结构，只保留了叶子节点的记录。虽然极度精简，但是我们还是保留了一个极其重要的特性：B+树叶子节点中的记录是按照索引列的值从小到大排序的。对于二级索引idx_key1来说：二级索引叶子节点的记录只保留key1列和id列二级索引记录是先按照key1列的值从小到大的顺序进行排序的。如果key1列的值相同，则按照主键值，也就是id列的值从小到大的顺序进行排序。也就是说，对于所有key1值为'a'的二级索引记录来说，它们都是按照id列的值进行排序的。对于查询1：查询1: SELECT * FROM t WHERE key1 = 'a'; 由于查询列表是* ，也就是说我们需要通过读取到的二级索引记录的id值执行回表操作，到聚簇索引中找到完整的用户记录(为了去获取common_field列的值)后才可以将记录发送到客户端。对于所有key1列值等于'a'的二级索引记录，由于它们是按照id列的值排序的，所以：前一次回表的id值所属的聚簇索引记录和下一次回表的id值所属的聚簇索引记录很大可能在同一个数据页中即使前一次回表的id值所属的聚簇索引记录和下一次回表的id值所属的聚簇索引记录不在同一个数据页中，由于回表的id值是递增的，所以我们很大可能通过顺序I/O的方式找到下一个数据页，也就是说这个过程中很大可能不需要很大幅度的移动磁头就可以找到下一个数据页。这可以减少很多随机I/O带来的性能开销。综上所述，执行语句1时，回表操作带来的性能开销较小。而对于查询2来说：查询2: SELECT * FROM t WHERE key1 >

'a' AND key1 < 'i';

Because the id values corresponding to the secondary index records to be scanned are out of order, when performing the table return operation, the data pages where the clustered index records to be accessed are likely to be out of order, which will cause a lot of random I/O. So if you use idx_key1 to execute query 1 and query 2, the cost of executing query 1 will obviously be lower than query 2, which is why the uncle who designed MySQL prefers ref to range.

Internal implementation of MySQL

MySQL Optimizer calculates the cost of table retrieval. In the case of query execution using secondary index and table retrieval, ref and range are obviously treated differently:

For range, the number of secondary index records that need to be scanned is equivalent to the number of pages that need to be accessed. For each page visited, the I/O cost of returning to the table is increased by 1.

For example, for query 2, the number of records that need to be returned to the table is 2310, and the I/O cost calculated because of the return operation is 2310.

For ref, there is a ceiling on the I/O cost caused by table return overhead, that is, an upper limit is defined:

double worst_seeks;

The upper limit is the smaller of two values:

For example, for query 1, the number of records returned to the table is 2310, and it is reasonable to calculate the I/O cost caused by the table return operation should be 2310. However, since there is a ceiling in calculating the I/O cost of the table return operation for the ref access method, the smaller one, in this case 291, is chosen from one tenth of the total table records (i.e. 9912/10=991, 9912 is the estimate) and three times the number of pages occupied by the cluster index (in this case 97 pages occupied by the cluster index, multiplied by 3 is 291).

1/10 of the total number of records in the table (the total number of records in the table is statistical data and is an estimate)

Cluster index occupies 3 times as many pages

Tip: In the code of cost analysis, range, index and all are classified into a class, ref is a parent son, and a wave is analyzed separately. However, we can also see that when designing MySQL uncle to calculate the cost of the range access method, he directly thinks that each time he goes back to the table, he needs to perform a page I/O, which is very rude, not to mention that our actual clustered index has a total of 97 pages, but it calculates the cost of returning to the table as 2310, which is also very inaccurate.

At this point, the study of "what is the use of MySQL ref" is over, hoping to solve everyone's doubts. Theory and practice can better match to help everyone learn, go and try it! If you want to continue learning more relevant knowledge, please continue to pay attention to the website, Xiaobian will continue to strive to bring more practical articles for everyone!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.