How to write SQL statement correctly 02/14 Update SLTechnology News&Howtos

How to write SQL statement correctly

2026-02-14 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Shulou(Shulou.com)06/03 Report--

This article introduces the relevant knowledge of "how to write SQL sentences correctly". In the operation of actual cases, many people will encounter such a dilemma, so let the editor lead you to learn how to deal with these situations. I hope you can read it carefully and be able to achieve something!

The order in which sql statements are executed:

FROM

JOIN

WHERE

GROUP BY

HAVING

SELECT

DISTINCT

ORDER BY

LIMIT

1. LIMIT statement

Paging queries are one of the most common scenarios, but they are also usually the most prone to problems. For example, for the following simple statement, the general idea of DBA is to add a combined index to the type, name, and create_time fields. In this way, conditional sorting can be effectively used to the index, and the performance is improved rapidly.

SELECT *

FROM operation

WHERE type = 'SQLStats'

AND name = 'SlowLog'

ORDER BY create_time

LIMIT 1000, 10

Well, maybe more than 90% of DBA solves the problem so much. But when the LIMIT clause changes to "LIMIT 1000000 million 10", programmers still complain: why am I taking only 10 records? why is it still slow?

To know that the database does not know where the 1000000 record starts, even if there is an index, it needs to be calculated from scratch. In most cases, programmers are lazy when this kind of performance problem occurs.

In scenarios such as front-end data browsing and paging, or big data exporting in batches, the maximum value of the previous page can be used as a parameter as a query condition. SQL has been redesigned as follows:

SELECT *

FROM operation

WHERE type = 'SQLStats'

AND name = 'SlowLog'

AND create_time > '2017-03-16 1400VOG 00'

ORDER BY create_time limit 10

Under the new design, the query time is basically fixed and will not change with the growth of the amount of data.

2. Implicit conversion

Another common mistake is a mismatch between query variables and field definition types in SQL statements. For example, the following statement:

Mysql > explain extended SELECT *

> FROM my_balance b

> WHERE b.bpn = 14000000123

> AND b.isverified IS NULL

Mysql > show warnings

The field bpn is defined as varchar (20), and the strategy of MySQL is to convert a string into a number before comparing. The function acts on the table field and the index is invalid.

The above situation may be the parameters automatically filled in by the application framework, not the original intention of the programmer. Nowadays, many application frameworks are very complicated, so when it is easy to use, you should also be careful that it may dig holes for yourself.

3. Associate updates and deletes

Although MySQL5.6 introduces materialization features, it needs to be noted that it is currently only optimized for query statements. For updates or deletions, you need to manually rewrite to JOIN.

For example, in the following UPDATE statement, MySQL actually executes a circular / nested subquery (DEPENDENT SUBQUERY), and its execution time is conceivable.

UPDATE operation o

SET status = 'applying'

WHERE o.id IN (SELECT id

FROM (SELECT o.id

O.status

FROM operation o

WHERE o.group = 123

AND o.status NOT IN ('done')

ORDER BY o.parent

O.id

LIMIT 1) t)

Execute the plan:

+- -- +

+- -- +

| | 1 | PRIMARY | o | index | | PRIMARY | 8 | | 24 | Using where; Using temporary |

| | 3 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort |

+- -- +

After rewriting to JOIN, the selection mode of the subquery changes from DEPENDENT SUBQUERY to DERIVED, and the execution speed is greatly reduced from 7 seconds to 2 milliseconds.

UPDATE operation o

JOIN (SELECT o.id

O.status

FROM operation o

WHERE o.group = 123

AND o.status NOT IN ('done')

ORDER BY o.parent

O.id

LIMIT 1) t

ON o.id = t.id

SET status = 'applying'

The execution plan is simplified to:

+- -+

+- -+

| | 2 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where; Using filesort |

+- -+

4. Mixed sorting

MySQL cannot use indexes for mixed sorting. However, in some scenarios, there is an opportunity to use special methods to improve performance.

SELECT *

FROM my_order o

INNER JOIN my_appraise an ON a.orderid = o.id

ORDER BY a.is_reply ASC

A.appraise_time DESC

LIMIT 0, 20

The execution plan is displayed as a full table scan:

+-- +

+-- +

| | 1 | SIMPLE | a | ALL | idx_orderid | NULL | 1967647 | Using filesort |

+-- +

Since is_reply has only 0 and 1 states, the execution time is reduced from 1.58 seconds to 2 milliseconds after rewriting as follows.

SELECT *

FROM ((SELECT *

FROM my_order o

INNER JOIN my_appraise a

ON a.orderid = o.id

AND is_reply = 0

ORDER BY appraise_time DESC

LIMIT 0,20)

UNION ALL

(SELECT *

FROM my_order o

INNER JOIN my_appraise a

ON a.orderid = o.id

AND is_reply = 1

ORDER BY appraise_time DESC

LIMIT 0,20) t

ORDER BY is_reply ASC

Appraisetime DESC

LIMIT 20

5. EXISTS statement

When MySQL treats the EXISTS clause, it still executes the nested subquery. Such as the following SQL statement:

SELECT *

FROM my_neighbor n

LEFT JOIN my_neighbor_apply sra

ON n.id = sra.neighbor_id

AND sra.user_id = 'xxx'

WHERE n.topic_status < 4

AND EXISTS (SELECT 1

FROM message_info m

WHERE n.id = m.neighbor_id

AND m.inuser = 'xxx')

AND n.topic_type 5

The implementation plan is:

+-+-- +-+

+-+

| | 1 | PRIMARY | n | ALL | | NULL | 1086041 | Using where |

| | 1 | PRIMARY | sra | ref | | idx_user_id | 123 | const | 1 | Using where |

| | 2 | DEPENDENT SUBQUERY | m | ref | | idx_message_info | 122 | const | 1 | Using index condition; Using where |

+-+

Removing exists and changing it to join can avoid nesting subqueries and reduce the execution time from 1.93 seconds to 1 milliseconds.

SELECT *

FROM my_neighbor n

INNER JOIN message_info m

ON n.id = m.neighbor_id

AND m.inuser = 'xxx'

LEFT JOIN my_neighbor_apply sra

ON n.id = sra.neighbor_id

AND sra.user_id = 'xxx'

WHERE n.topic_status < 4

AND n.topic_type 5

New implementation plan:

+-+-+

+-+-+

| | 1 | SIMPLE | m | ref | | idx_message_info | 122 | const | 1 | Using index condition |

| | 1 | SIMPLE | n | eq_ref | | PRIMARY | 122 | ighbor_id | 1 | Using where |

| | 1 | SIMPLE | sra | ref | | idx_user_id | 123 | const | 1 | Using where |

+-+-+

6. Conditional push-down

Situations where external query conditions cannot be pushed down to complex views or subqueries are:

1, aggregate subquery, 2, subquery with LIMIT, 3, UNION or UNION ALL subquery, 4, subquery in output field

As in the following statement, you can see from the execution plan that its condition acts on the aggregate subquery:

SELECT *

FROM (SELECT target

Count (*)

FROM operation

GROUP BY target) t

WHERE target = 'rm-xxxx'

+-- +

+-- +

| | 1 | PRIMARY | | ref | 514 | const | 2 | Using where |

| | 2 | DERIVED | operation | index | idx_4 | idx_4 | 519 | NULL | 20 | Using index |

+-- +

Make sure that the semantic query condition can be pushed down directly, and rewrite as follows:

SELECT target

Count (*)

FROM operation

WHERE target = 'rm-xxxx'

GROUP BY target

The execution plan is changed to:

+-- +

+-- +

| | 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where; Using index |

+-- +

For a detailed explanation that the external conditions of MySQL cannot be pushed down, please refer to the previous article: push to the physical and chemical table http://mysql.taobao.org/monthly/2016/07/08 under the condition of MySQL performance optimization

7. Narrow the scope ahead of time

Start with the initial SQL statement:

SELECT *

FROM my_order o

LEFT JOIN my_userinfo u

ON o.uid = u.uid

LEFT JOIN my_productinfo p

ON o.pid = p.pid

WHERE (o.display = 0)

AND (o.ostaus = 1)

ORDER BY o.selltime DESC

LIMIT 0, 15

The original intention of the SQL statement is to make a series of left joins, and then sort the first 15 records. As can be seen from the execution plan, the last step estimates that the number of sorted records is 900000 and the time consumption is 12 seconds.

+- -- +

+- -- +

| | 1 | SIMPLE | o | ALL | NULL | 909119 | Using where; Using temporary; Using filesort |

| | 1 | SIMPLE | p | ALL | PRIMARY | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |

+- -- +

Since the final WHERE condition and sorting are for the leftmost main table, we can reduce the amount of data in advance of my_order sorting before making a left join. After SQL rewriting, the execution time is reduced to about 1 millisecond.

SELECT *

FROM (

SELECT *

FROM my_order o

WHERE (o.display = 0)

AND (o.ostaus = 1)

ORDER BY o.selltime DESC

LIMIT 0, 15

) o

LEFT JOIN my_userinfo u

ON o.uid = u.uid

LEFT JOIN my_productinfo p

ON o.pid = p.pid

ORDER BY o.selltime DESC

Limit 0, 15

Check the execution plan again: the subquery is materialized (select_type=DERIVED) to participate in the JOIN. Although the estimated row scan is still 900000, the actual execution time becomes very small after using the index and the LIMIT clause.

+- -- +

+- -- +

| | 1 | PRIMARY | | ALL | NULL | 15 | Using temporary; Using filesort |

| | 1 | PRIMARY | p | ALL | PRIMARY | NULL | 6 | Using where; Using join buffer (Block Nested Loop) |

| | 2 | DERIVED | o | index | NULL | idx_1 | 5 | NULL | 909112 | Using where |

+- -- +

8. Push down the intermediate result set

Let's take a look at the following example that has been initially optimized (the main table precedence query condition in the left join):

SELECT a.*

C.allocated

FROM (

SELECT resourceid

FROM my_distribute d

WHERE isdelete = 0

AND cusmanagercode = '1234567'

ORDER BY salecode limit 20) a

LEFT JOIN

(

SELECT resourcesid, sum (ifnull (allocation, 0) * 12345) allocated

FROM my_resources

GROUP BY resourcesid) c

ON a.resourceid = c.resourcesid

So are there any other problems with this statement? It is not difficult to see that the subquery c is a full table aggregate query, which will degrade the performance of the entire statement in the case of a particularly large number of tables.

In fact, for subquery c, the left concatenation of the final result set is only concerned with the data that can match the main table resourceid. So we can rewrite the statement as follows, and the execution time is reduced from 2 seconds to 2 milliseconds.

SELECT a.*

C.allocated

FROM (

SELECT resourceid

FROM my_distribute d

WHERE isdelete = 0

AND cusmanagercode = '1234567'

ORDER BY salecode limit 20) a

LEFT JOIN

(

SELECT resourcesid, sum (ifnull (allocation, 0) * 12345) allocated

FROM my_resources r

(

SELECT resourceid

FROM my_distribute d

WHERE isdelete = 0

AND cusmanagercode = '1234567'

ORDER BY salecode limit 20) a

WHERE r.resourcesid = a.resourcesid

GROUP BY resourcesid) c

ON a.resourceid = c.resourcesid

But the subquery an appears multiple times in our SQL statement. This way of writing not only has extra overhead, but also makes the whole sentence complicated. Rewrite it again using the WITH statement:

WITH an AS

(

SELECT resourceid

FROM my_distribute d

WHERE isdelete = 0

AND cusmanagercode = '1234567'

ORDER BY salecode limit 20)

SELECT a.*

C.allocated

FROM a

LEFT JOIN

(

SELECT resourcesid, sum (ifnull (allocation, 0) * 12345) allocated

FROM my_resources r

WHERE r.resourcesid = a.resourcesid

GROUP BY resourcesid) c

ON a.resourceid = c.resourcesid, "how to write SQL statements correctly" ends here. Thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.