Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

A case thoroughly understands how to use the mysql inndb federated index correctly.

2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)06/01 Report--

There is a business that queries the latest 5 approved data

SELECT `id`, `title`FROM `th_content`WHERE `audit_time`

< 1541984478 AND `status` = 'ONLINE'ORDER BY `audit_time` DESC, `id` DESCLIMIT 5; 查看当时的监控情况 cpu 使用率是超过了100%,show processlist看到很多类似的查询都是处于create sort index的状态。 查看该表的结构 CREATE TABLE `th_content` ( `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT, `title` varchar(500) CHARACTER SET utf8 NOT NULL DEFAULT '' COMMENT '内容标题', `content` mediumtext CHARACTER SET utf8 NOT NULL COMMENT '正文内容', `audit_time` int(11) unsigned NOT NULL DEFAULT '0' COMMENT '审核时间', `last_edit_time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP COMMENT '最近编辑时间', `status` enum('CREATED','CHECKING','IGNORED','ONLINE','OFFLINE') CHARACTER SET utf8 NOT NULL DEFAULT 'CREATED' COMMENT '资讯状态', PRIMARY KEY (`id`), KEY `idx_at_let` (`audit_time`,`last_edit_time`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; 索引有一个audit_time在左边的联合索引,没有关于status的索引。 分析上面的sql执行的逻辑: 从联合索引里找到所有小于该审核时间的主键id(假如在该时间戳之前已经审核了100万条数据,则会在联合索引里取出对应的100万条数据的主键 id)未来如果有一个优化就好了,目前还有:对100个主键 id 排序,然后在下面一步回表操作中挨得近的主键可能一次磁盘 I/O 就都取到了逐个回表,查出100万行记录,筛选出status='ONLINE'的行记录最后对查询的结果进行排序(假如有50万行都是ONLINE,则继续对这50万行进行排序) 最后因为数据量很大,虽然只取5行,但是按照我们刚刚举的极端例子,实际查询了100万行数据,而且最后还在内存中进行了50万行数据库的内存排序。 所以是非常低效的。 画了一个示意图,说明第一步的查询过程,粉红色部分表示最后需要回表查询的数据行。 图中我按照索引存储规律来YY伪造填充了一些数据,如有不对请留言指出。希望通过这张图大家能够看到联合索引存储的方式和索引查询的方式 改进思路 1 范围查找向来不太好使用好索引的,如果我们增加一个audit_time, status的联合索引,会有哪些改进呢? ALTER TABLE `th_content` ADD INDEX `idx_audit_status` (`audit_time`, `status`);mysql>

explain select `id`, `title` from `th_content` where `audit_time` < 1541984478 and `status` = 'ONLINE' order by `audit_time` desc, `id` desc limit 5;+----+-------------+------------+-------+------------------------------------------+------------------+---------+------+--------+-------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+------------+-------+------------------------------------------+------------------+---------+------+--------+-------------+| 1 | SIMPLE | th_content | range | idx_at_ft_pt_let,idx_audit_status | idx_audit_status | 4 | NULL | 209754 | Using where |+----+-------------+------------+-------+------------------------------------------+------------------+---------+------+--------+-------------+

Details: Because audit_time is a range lookup, the index of the second column is not used, only audit_time can be used, so key_len is 4. In Idea 2 below, these two fields key_len is 5.

Or analyze the execution process after adding the index:

Find the union index of the row with the largest audit_time less than the audit time and then look down in turn, because < audit_time is a range lookup, and the second column index values are scattered. So you need to look forward in turn and match the index rows that meet the condition (status='ONLINE') until you get to the fifth row. Back to the table to query the specific data required

In the diagram above, pink identifies rows that meet the index requirements of the first column. Query forward in turn. Three records are filtered on this leaf node. Then you need to continue to the left and continue to query the previous leaf node. Until you find 5 rows that satisfy the record, and finally return to the table.

be desired

Because there is a status value in the index, there is no need to return to the table query when filtering rows that satisfy status='ONLINE'. Only 5 rows of data are queried when returning to the table, which will be greatly reduced on iops.

Disadvantages of this index

If all 5 lines scanned in idx_audit_status are status ONLINE, then only 5 lines need to be scanned;

If only 4 of the first 1 million rows scanned in idx_audit_status are ONLINE, 1 million and 1 row need to be scanned to obtain the required 5 rows. The number of rows the index needs to scan is uncertain.

Improved thinking 2

ALTER TABLE `th_content` DROP INDEX `idx_audit_status`;ALTER TABLE `th_content` ADD INDEX `idx_status_audit` (`status`, `audit_time`);

This way, no matter whether it is sorting or returning to the table, there is no pressure.

summary

The above is all the content of this article, I hope the content of this article for everyone's study or work has a certain reference learning value, thank you for your support. If you want to know more about this, please check out the links below.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report