
How to solve the MySQL deep paging problem?

2025-01-30 Update From: SLTechnology News&Howtos

Today, I will talk to you about how to solve the MySQL deep paging problem, a topic that many people may not understand well. To help you understand it better, the editor has summarized the following content for you. I hope you can gain something from this article.

Preface

When we implement paging requirements, we usually use limit, but when the offset is particularly large, query efficiency drops sharply. This article discusses four schemes for optimizing deep paging over millions of rows of MySQL data, and attaches a recent practical case of optimizing a slow SQL in production.

Why does limit deep paging slow down?

Let's take a look at the structure of the following table:

CREATE TABLE account (
  id int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key Id',
  name varchar(255) DEFAULT NULL COMMENT 'account name',
  balance int(11) DEFAULT NULL COMMENT 'balance',
  create_time datetime NOT NULL COMMENT 'creation time',
  update_time datetime NOT NULL ON UPDATE CURRENT_TIMESTAMP COMMENT 'update time',
  PRIMARY KEY (id),
  KEY idx_name (name),
  KEY idx_update_time (update_time) -- index
) ENGINE=InnoDB AUTO_INCREMENT=1570068 DEFAULT CHARSET=utf8 ROW_FORMAT=REDUNDANT COMMENT='account table';

Suppose the execution SQL of deep paging is as follows:

select id, name, balance from account where update_time > '2020-09-19' limit 100000, 10;

The execution time of this SQL is as follows:

It takes 0.742 seconds to complete. So why does deep paging become slow? If you change it to limit 0, 10, it takes only 0.006 seconds.

Let's first take a look at the execution process of this SQL:

Through the ordinary secondary index tree idx_update_time, filter on the update_time condition and find the record IDs that satisfy it.

Using those IDs, go back to the primary key index tree, locate the matching rows, and then fetch the columns to be displayed (back to the table).

Scan the 100010 rows that meet the criteria in this way, then throw away the first 100000 rows and return the last 10.

(Figure: the execution process of the SQL)

The execution plan is as follows:
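The screenshot of the execution plan from the original test is not reproduced here. As a minimal sketch, you can inspect the plan on your own copy of the table with EXPLAIN (the exact key choice and row estimates depend on your data and MySQL version):

EXPLAIN select id, name, balance
from account
where update_time > '2020-09-19'
limit 100000, 10;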

SQL slows down for two reasons:

The limit statement first scans offset + n rows, then discards the first offset rows and returns the last n rows of data. In other words, limit 100000, 10 scans 100010 rows, while limit 0, 10 scans only 10 rows.

limit 100000, 10 scans more rows, which also means more trips back to the table.

Optimization through subquery

Because the above SQL goes back to the table 100010 times while we actually need only 10 rows of data, 10 trips back to the table would be enough. Therefore, we can optimize by reducing the number of trips back to the table.

Review the structure of B+ tree

So, how do we reduce the number of trips back to the table? Let's first review the B+ tree index structure.

In InnoDB, indexes are divided into primary key index (clustered index) and secondary index.

In the primary key index, the leaf nodes store the whole row of data.

In the secondary index, the leaf nodes hold the value of the primary key.
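As a minimal sketch of why this matters (assuming the account table defined above): a query that selects only the primary key can be answered entirely from the secondary index idx_update_time, because its leaf nodes already contain the id, so no trip back to the primary key index is needed:

-- covered by idx_update_time: returns the id without touching the clustered index
select a.id
from account a
where a.update_time >= '2020-09-19'
limit 100000, 1;

This is exactly the piece onto which the subquery optimization below pushes the expensive offset scan.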

Transfer the condition to the primary key index tree

If we transfer the query condition back to the primary key index tree, we can reduce the number of trips back to the table. But to query on the primary key index tree, the condition has to become the primary key id. What do we do with the original update_time condition? Move it into a subquery.

How does the subquery get the id? Since the secondary index leaf nodes contain the primary key ID, we can look up the primary key ID directly based on update_time. At the same time, we also transfer the limit 100000 condition into the subquery. The complete SQL is as follows:

select id, name, balance FROM account
where id >= (select a.id from account a where a.update_time >= '2020-09-19' limit 100000, 1)
LIMIT 10;

The query result is the same, and the execution time is only 0.038 seconds!

Let's take a look at the execution plan.

According to the execution plan, the subquery on table a uses the idx_update_time index. It obtains the primary key ID directly on that index, which saves the back-to-table operation, and then the outer query simply fetches 10 more rows starting from the ID found by the subquery.

Therefore, this scheme is feasible.
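One design note, a general MySQL fact rather than something stated in the original article: the subquery is compared with >= against a single id because MySQL rejects LIMIT inside an IN (...) subquery, so the seemingly more direct form below does not work:

-- rejected by MySQL with error 1235:
-- "This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery'"
select id, name, balance
from account
where id in (select a.id from account a where a.update_time >= '2020-09-19' limit 100000, 10);

The deferred association in the next section is another common way around this restriction.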

INNER JOIN deferred association

The optimization idea of deferred association is actually the same as that of the subquery: transfer the condition to the primary key index tree, and then reduce the trips back to the table. The difference is that deferred association uses an inner join instead of a subquery.

The optimized SQL is as follows:

SELECT acct1.id, acct1.name, acct1.balance FROM account acct1
INNER JOIN (SELECT a.id FROM account a WHERE a.update_time >= '2020-09-19' ORDER BY a.update_time LIMIT 100000, 10) AS acct2 ON acct1.id = acct2.id;

The query result is just as good, and it takes only 0.034 seconds.

The execution plan is as follows:

The query idea is: first query the primary key IDs that satisfy the condition through the idx_update_time secondary index tree, and then join back to the original table on the primary key ID. This way the primary key index is used directly, and the trips back to the table are reduced as well.

Label recording method

The essence of the limit deep paging problem is that the larger the offset, the more rows MySQL scans and then discards. This leads to a decline in query performance.

In fact, we can use the label recording method: mark where the last query ended, and the next time we query, start scanning down from that entry. It's like reading a book: when you stop, you fold the page or clip in a bookmark, and the next time you pick the book up, you go straight to that spot.

Assuming the id of the last record read was 100000, the SQL can be modified to:

select id, name, balance FROM account where id > 100000 order by id limit 10;

In this way, no matter how many pages are turned later, performance stays good, because the query hits the id index. However, this approach has its limitations: it needs a field similar to a continuous, auto-incrementing one.
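As a minimal sketch of how consecutive pages are fetched with label recording (the id values are illustrative, not taken from the original test):

-- page 1: no cursor yet
select id, name, balance from account order by id limit 10;

-- suppose the last row of page 1 had id = 10; page 2 starts right after it
select id, name, balance from account where id > 10 order by id limit 10;

-- each subsequent page repeats the pattern with the last id it received

The application has to carry the last id between requests, which is why this method cannot jump straight to an arbitrary page number.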

Use between...and...

In many cases, a limit query can be converted into a query over a known position range, so that MySQL can get the corresponding results with a range scan using between ... and ...

If you know that the boundary ids are 100000 and 100010, you can optimize it like this:

select id, name, balance FROM account where id between 100000 and 100010 order by id desc;

Hands-on practical case

Let's take a look at a practical case. Suppose there is now a table with the following structure and 2 million rows of data.

CREATE TABLE account (
  id varchar(32) COLLATE utf8_bin NOT NULL COMMENT 'primary key',
  account_no varchar(64) COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'account',
  amount decimal(20,2) DEFAULT NULL COMMENT 'amount',
  type varchar(10) COLLATE utf8_bin DEFAULT NULL COMMENT 'type A,B',
  create_time datetime DEFAULT NULL COMMENT 'creation time',
  update_time datetime DEFAULT NULL COMMENT 'update time',
  PRIMARY KEY (id),
  KEY idx_account_no (account_no),
  KEY idx_create_time (create_time)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='account table';

The business requirement is as follows: obtain the 2021 type A account data and report it to the big data platform.

The general implementation approach

Many partners, on receiving such a requirement, will implement it directly like this:

// query the total number to report (pageSize is assumed to be defined elsewhere)
Integer total = accountDAO.countAccount();

// the SQL corresponding to the count query
select count(1) from account
where create_time >= '2021-01-01 00:00:00'
and type = 'A';

// calculate the number of pages
int pageNo = total % pageSize == 0 ? total / pageSize : (total / pageSize + 1);

// paginated query and report
for (int i = 0; i < pageNo; i++) {
    int startRow = i * pageSize;
    List list = accountDAO.listAccountByPage(startRow, pageSize);
    // report to the big data platform
    postBigData(list);
}

// paginated query SQL (a limit deep paging problem may exist here, because the account table has millions of rows)
select * from account
where create_time >= '2021-01-01 00:00:00'
and type = 'A'
limit #{startRow}, #{pageSize};

Practical optimization scheme

In the above implementation, the limit deep paging problem will occur, because the account table has millions of rows of data. So how do we optimize it?

In fact, we can use the label recording method. Some partners may have doubts: the id primary key is not continuous, so can label recording really be used?

Of course it can. The id does not have to be continuous; it only has to be ordered, and we can make it ordered with order by. The optimization scheme is as follows:

// query the minimum ID
String lastId = accountDAO.queryMinId();

// the SQL corresponding to the minimum ID query
select MIN(id) from account
where create_time >= '2021-01-01 00:00:00'
and type = 'A';

// number of entries per page
Integer pageSize = 100;

List list;
do {
    list = accountDAO.listAccountByPage(lastId, pageSize);
    if (CollectionUtils.isEmpty(list)) {
        break; // no more data to report
    }
    // label recording method: remember the Id of the last row of this query
    lastId = list.get(list.size() - 1).getId();
    // report to the big data platform
    postBigData(list);
} while (CollectionUtils.isNotEmpty(list));

// paginated query SQL
select * from account
where create_time >= '2021-01-01 00:00:00'
and id > #{lastId}
and type = 'A'
order by id asc
limit #{pageSize};

After reading the above, do you have a better understanding of how to solve the MySQL deep paging problem? If you want to learn more knowledge or related content, please follow the industry information channel. Thank you for your support.
