Optimization | if you use traditional paging SQL, you are dead. 04/25 Update SLTechnology News&Howtos

Optimization | if you use traditional paging SQL, you are dead.

2025-04-25 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

0. Guide reading

In the development of paging function, we are used to using the method of LIMIT OMagin N to fetch data. This method will kill MySQL when it encounters a large paging offset. Please don't write SQL like that again.

Usually, we will use ORDER BY LIMIT start, offset to do paging query. For example, the following SQL:

SELECT * FROM `t1` WHERE ftype=1 ORDER BY id DESC LIMIT 100,10

Or a paging SQL like this without any conditions:

SELECT * FROM `t1` ORDER BY id DESC LIMIT 100,10

Generally speaking, the time consuming of paging SQL increases sharply with the increase of start value. Let's take a look at the following two different starting values of paging SQL execution time:

Yejr@imysql.com > SELECT * FROM `t1` WHERE ftype=1

ORDER BY id DESC LIMIT 500, 10

...

10 rows in set (0.05sec)

Yejr@imysql.com > SELECT * FROM `t1` WHERE ftype=6

ORDER BY id DESC LIMIT 935500, 10

...

10 rows in set (2.39 sec)

As you can see, with the increase in the number of pages, SQL query time has increased dozens of times, which is obviously unscientific.

Today we will analyze how to optimize this paging scheme.

In general, the ultimate solution to optimize paging is: no paging, ha, ha, don't say I talk nonsense, indeed, you can give the paging algorithm to Solr, Lucene, Sphinx and other third-party solutions, especially when there is a need for fuzzy search, there is no need for MySQL to do things it is not good at.

Of course, a partner said, it is too troublesome to use a third party, we just want to use MySQL to do this pagination, what should we do? Don't worry, let us analyze it slowly.

First, take a look at the following table DDL, data volume, query SQL execution plan and other information:

Yejr@imysql.com > SHOW CREATE TABLE `t1`

CREATE TABLE `t1` (

`id`int (10) unsigned NOT NULL AUTO_INCREMENT

...

`ftype` tinyint (3) unsigned NOT NULL

...

PRIMARY KEY (`id`)

) ENGINE=InnoDB DEFAULT CHARSET=utf8

Yejr@imysql.com > select count (*) from T1

+-+

| | count (*) |

+-+

| | 994584 |

+-+

Yejr@imysql.com > EXPLAIN SELECT * FROM `t1` WHERE ftype=1

ORDER BY id DESC LIMIT 500,10\ G

* * 1. Row *

Id: 1

Select_type: SIMPLE

Table: t1

Type: index

Possible_keys: NULL

Key: PRIMARY

Key_len: 4

Ref: NULL

Rows: 510

Extra: Using where

Yejr@imysql.com > EXPLAIN SELECT * FROM `t1` WHERE ftype=1

ORDER BY id DESC LIMIT 935500, 10\ G

* * 1. Row *

Id: 1

Select_type: SIMPLE

Table: t1

Type: index

Possible_keys: NULL

Key: PRIMARY

Key_len: 4

Ref: NULL

Rows: 935510

Extra: Using where

As you can see, although the data is scanned through the primary key index, the number of records that need to be scanned by the second SQL is too large, and you need to scan about 935510 records first, and then fetch 10 records according to the sorting results, which must be very slow.

In view of this situation, our optimization ideas are relatively clear, there are two points:

Get data directly from the index as much as possible to avoid or reduce the number of times to scan row data again (that is, what we usually call avoiding going back to the table)

Minimize the number of records scanned, that is, determine the starting range first, and then take N records later.

According to the above two optimization ideas, there are corresponding SQL rewriting methods: subquery, table join, such as the following:

# method 1

# optimize by subquery, first get the maximum id from the index, then reverse the order, and then take 10 rows of result sets

# Note that the reverse order is used twice here, so when taking the start value of LIMIT, it is 10 times higher than the original value, that is, 935510, otherwise the result will be inconsistent with the original

Yejr@imysql.com > EXPLAIN SELECT * FROM (SELECT * FROM `t1` WHERE

Id > (SELECT id FROM `t1` WHERE ftype=1

ORDER BY id DESC LIMIT 935510, 1) LIMIT 10) t ORDER BY id DESC\ G

* * 1. Row *

Id: 1

Select_type: PRIMARY

Table:

Type: ALL

Possible_keys: NULL

Key: NULL

Key_len: NULL

Ref: NULL

Rows: 10

Extra: Using filesort

* 2. Row * *

Id: 2

Select_type: DERIVED

Table: t1

Type: ALL

Possible_keys: PRIMARY

Key: NULL

Key_len: NULL

Ref: NULL

Rows: 973192

Extra: Using where

* 3. Row * *

Id: 3

Select_type: SUBQUERY

Table: t1

Type: index

Possible_keys: NULL

Key: PRIMARY

Key_len: 4

Ref: NULL

Rows: 935511

Extra: Using where

# method 2

# using INNER JOIN optimization, the JOIN clause also gives priority to getting the ID list from the index, and then directly associating the query to get the final result. There is no need to add 10.

Yejr@imysql.com > EXPLAIN SELECT * FROM `t1` INNER JOIN

(SELECT id FROM `t1` WHERE ftype=1

ORDER BY id DESC LIMIT 935500jue 10) T2 USING (id)\ G

* * 1. Row *

Id: 1

Select_type: PRIMARY

Table:

Type: ALL

Possible_keys: NULL

Key: NULL

Key_len: NULL

Ref: NULL

Rows: 935510

Extra: NULL

* 2. Row * *

Id: 1

Select_type: PRIMARY

Table: t1

Type: eq_ref

Possible_keys: PRIMARY

Key: PRIMARY

Key_len: 4

Ref: t2.id

Rows: 1

Extra: NULL

* 3. Row * *

Id: 2

Select_type: DERIVED

Table: t1

Type: index

Possible_keys: NULL

Key: PRIMARY

Key_len: 4

Ref: NULL

Rows: 973192

Extra: Using where

Then compare the execution time / cost of these two optimizations:

# 1. Subquery optimization: according to the results of profiling, the time consuming is 28.2% less than that of the original.

Yejr@imysql.com > SELECT * FROM (SELECT * FROM `t1` WHERE

Id > (SELECT id FROM `t1` WHERE ftype=1

ORDER BY id DESC LIMIT 935510, 1) LIMIT 10) T ORDER BY id DESC

...

Rows in set (1.86 sec)

# 2. INNER JOIN optimization: from the result of profiling, the time-consuming is reduced by 30.8%.

Yejr@imysql.com > SELECT * FROM `t1` INNER JOIN

(SELECT id FROM `t1` WHERE ftype=1

ORDER BY id DESC LIMIT 935500 jade 10) T2 USING (id)

...

10 rows in set (1.83 sec)

Let's take a look at a paged SQL comparison without filtering conditions:

# 1. Original SQL

Yejr@imysql.com > EXPLAIN SELECT * FROM `t1` ORDER BY id DESC LIMIT 935500, 10\ G

* * 1. Row *

Id: 1

Select_type: SIMPLE

Table: t1

Type: index

Possible_keys: NULL

Key: PRIMARY

Key_len: 4

Ref: NULL

Rows: 935510

Extra: NULL

Yejr@imysql.com > SELECT * FROM `t1` ORDER BY id DESC LIMIT 935500, 10

...

10 rows in set (2.22 sec)

# 2. Adopt subquery optimization, which takes 10.6% less time than the original.

Yejr@imysql.com > EXPLAIN SELECT * FROM (SELECT * FROM `t1` WHERE

Id > (SELECT id FROM `t1` ORDER BY id DESC

LIMIT 935510, 1) LIMIT 10) t ORDER BY id DESC

* * 1. Row *

Id: 1

Select_type: PRIMARY

Table:

Type: ALL

Possible_keys: NULL

Key: NULL

Key_len: NULL

Ref: NULL

Rows: 10

Extra: Using filesort

* 2. Row * *

Id: 2

Select_type: DERIVED

Table: t1

Type: ALL

Possible_keys: PRIMARY

Key: NULL

Key_len: NULL

Ref: NULL

Rows: 973192

Extra: Using where

* 3. Row * *

Id: 3

Select_type: SUBQUERY

Table: t1

Type: index

Possible_keys: NULL

Key: PRIMARY

Key_len: 4

Ref: NULL

Rows: 935511

Extra: Using index

Yejr@imysql.com > SELECT * FROM (SELECT * FROM `t1` WHERE

Id > (SELECT id FROM `t1` ORDER BY id DESC

LIMIT 935510, 1) LIMIT 10) t ORDER BY id DESC

...

10 rows in set (2.01 sec)

# 3. Using INNER JOIN optimization, the time-consuming is reduced by 30.2%.

Yejr@imysql.com > EXPLAIN SELECT * FROM `t1` INNER JOIN

(SELECT id FROM `t1`order BY id DESC

LIMIT 935500jue 10) T2 USING (id)\ G

* * 1. Row *

Id: 1

Select_type: PRIMARY

Table:

Type: ALL

Possible_keys: NULL

Key: NULL

Key_len: NULL

Ref: NULL

Rows: 935510

Extra: NULL

* 2. Row * *

Id: 1

Select_type: PRIMARY

Table: t1

Type: eq_ref

Possible_keys: PRIMARY

Key: PRIMARY

Key_len: 4

Ref: t1.id

Rows: 1

Extra: NULL

* 3. Row * *

Id: 2

Select_type: DERIVED

Table: t1

Type: index

Possible_keys: NULL

Key: PRIMARY

Key_len: 4

Ref: NULL

Rows: 973192

Extra: Using index

Yejr@imysql.com > SELECT * FROM `t1` INNER JOIN

(SELECT id FROM `t1`order BY id DESC

LIMIT 935500 jade 10) T2 USING (id)

...

10 rows in set (1.70 sec)

So far, we have seen a significant improvement after using subqueries or INNER JOIN for optimization, and this method is also suitable for smaller paging.

In conclusion, the efficiency improvement of subquery and INNER JOIN paging optimization methods is as follows:

Paging with WHERE conditions can improve query efficiency: 24.9% and 156.5%, respectively.

Paging without WHERE conditions improves query efficiency: 554.5% and 11.7%, respectively.

In terms of the proportion of the increase alone, it is quite considerable. And these two optimization methods are basically applicable to a variety of paging modes, and it is strongly recommended to change to this habit of SQL writing at the very beginning.

Let's take a look at the corresponding percentage of improvement for each scenario:

Large pages, with WHERE large pages, without WHERE large pages average promotion ratio small pages, with WHERE small pages, without WHERE overall average improvement than example query optimization 28.20%10.60%19.40%24.90%554.40%154.53%INNER JOIN optimization 30.80% 30.20% 30.50% 156.50% 11.70% 57.30%

It is obvious from this point of view, especially in the case of large pages, so we give priority to using INNER JOIN to optimize the paging algorithm.

For each of the above tests, the mysqld instance is restarted and SQL_NO_CACHE is added to ensure that it is read in a direct data file or index file each time. If the data is preheated, the query efficiency will be improved to a certain extent, but the corresponding efficiency improvement ratio mentioned above is basically the same.