In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-18 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article shows you the possible miswritten usage of SQL, which is concise and easy to understand, which will definitely brighten your eyes. I hope you can gain something through the detailed introduction of this article.
1. LIMIT statement
Paging queries are one of the most common scenarios, but they are also usually the most prone to problems. For example, for the following simple statement, the general idea of DBA is to add a combined index to the type, name, and create_time fields. In this way, conditional sorting can be effectively used to the index, and the performance is improved rapidly.
SELECT * FROM operation WHERE type = 'SQLStats' AND name =' SlowLog' ORDER BY create_time LIMIT 1000, 10
Well, maybe more than 90% of DBA solves the problem so much. But when the LIMIT clause changes to "LIMIT 1000000 million 10", programmers still complain: why am I taking only 10 records? why is it still slow?
To know that the database does not know where the 1000000 record starts, even if there is an index, it needs to be calculated from scratch. In most cases, programmers are lazy when this kind of performance problem occurs.
In scenarios such as front-end data browsing and paging, or big data exporting in batches, the maximum value of the previous page can be used as a parameter as a query condition. SQL has been redesigned as follows:
SELECT * FROM operation WHERE type = 'SQLStats' AND name =' SlowLog' AND create_time > '2017-03-16 1414 SQLStats' AND name 0000' ORDER BY create_time limit 10
Under the new design, the query time is basically fixed and will not change with the growth of the amount of data.
2. Implicit conversion
Another common mistake is a mismatch between query variables and field definition types in SQL statements. For example, the following statement:
Mysql > explain extended SELECT * > FROM my_balance b > WHERE b.bpn = 14000000123 > AND b.isverified IS NULL; mysql > show warnings; | Warning | 1739 | Cannot use ref access on index 'bpn' due to type or collation conversion on field' bpn'
The field bpn is defined as varchar (20), and the strategy of MySQL is to convert a string into a number before comparing. The function acts on the table field and the index is invalid.
The above situation may be the parameters automatically filled in by the application framework, not the original intention of the programmer. Nowadays, many application frameworks are very complicated, so when it is easy to use, you should also be careful that it may dig holes for yourself.
3. Associate updates and deletes
Although MySQL5.6 introduces materialization features, it needs to be noted that it is currently only optimized for query statements. For updates or deletions, you need to manually rewrite to JOIN.
For example, in the following UPDATE statement, MySQL actually executes a circular / nested subquery (DEPENDENT SUBQUERY), and its execution time is conceivable.
UPDATE operation o SET status = 'applying' WHERE o.id IN (SELECT id FROM (SELECT o.id) O.status FROM operation o WHERE o.group = 123AND o.status NOT IN ('done') ORDER BY o.parent O.id LIMIT 1) t)
Execute the plan:
+- -- + | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + -+-+-- + | 1 | PRIMARY | | o | index | | PRIMARY | 8 | | 24 | Using where | Using temporary | | 2 | DEPENDENT SUBQUERY | | Impossible WHERE noticed after reading const tables | | 3 | DERIVED | o | ref | idx_2,idx_5 | idx_5 | 8 | const | 1 | Using where Using filesort | +-+- -- +
After rewriting to JOIN, the selection mode of the subquery changes from DEPENDENT SUBQUERY to DERIVED, and the execution speed is greatly reduced from 7 seconds to 2 milliseconds.
UPDATE operation o JOIN (SELECT o.id, o.status FROM operation o WHERE o.group = 123AND o.status NOT IN ('done') ORDER BY o.parent O.id LIMIT 1) t ON o.id = t.id SET status = 'applying'
The execution plan is simplified to:
+- -+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +- +-- + | 1 | PRIMARY | | | | Impossible WHERE noticed after reading const tables | | 2 | DERIVED | o | ref | idx_2 | Idx_5 | idx_5 | 8 | const | 1 | Using where Using filesort | +-+- -- +
4. Mixed sorting
MySQL cannot use indexes for mixed sorting. However, in some scenarios, there is an opportunity to use special methods to improve performance.
SELECT * FROM my_order o INNER JOIN my_appraise an ON a.orderid = o.id ORDER BY a.is_reply ASC, a.appraise_time DESC LIMIT 0,20
The execution plan is displayed as a full table scan:
+-+ | id | select_type | table | type | possible_keys | key | | key_len | ref | rows | Extra +-+ | 1 | SIMPLE | A | ALL | idx_orderid | NULL | 1967647 | Using filesort | | 1 | SIMPLE | o | eq_ref | PRIMARY | a.orderid | 1 | NULL | + -- +
Since is_reply has only 0 and 1 states, the execution time is reduced from 1.58 seconds to 2 milliseconds after rewriting as follows.
SELECT * FROM ((SELECT * FROM my_order o INNER JOIN my_appraise an ON a.orderid = o.id AND is_reply = 0 ORDER BY appraise_time DESC LIMIT 0 20) UNION ALL (SELECT * FROM my_order o INNER JOIN my_appraise an ON a.orderid = o.id AND is_reply = 1 ORDER BY appraise_time DESC LIMIT 0,20) t ORDER BY is_reply ASC, appraisetime DESC LIMIT 20
5. EXISTS statement
When MySQL treats the EXISTS clause, it still executes the nested subquery. Such as the following SQL statement:
SELECT * FROM my_neighbor n LEFT JOIN my_neighbor_apply sra ON n.id = sra.neighbor_id AND sra.user_id = 'xxx' WHERE n.topic_status < 4 AND EXISTS (SELECT 1 FROM message_info m WHERE n.id = m.neighbor_id AND m.inuser = 'xxx') AND n.topic_type 5
The implementation plan is:
+- -+-+ | 1 | PRIMARY | n | ALL | | NULL | 1086041 | Using where | 1 | PRIMARY | sra | ref | | idx_user_id | 123 | const | 1 | Using Where | | 2 | DEPENDENT SUBQUERY | m | ref | | idx_message_info | 122 | const | 1 | Using index condition Using where | +-- +-+
Removing exists and changing it to join can avoid nesting subqueries and reduce the execution time from 1.93 seconds to 1 milliseconds.
SELECT * FROM my_neighbor n INNER JOIN message_info m ON n.id = m.neighbor_id AND m.inuser = 'xxx' LEFT JOIN my_neighbor_apply sra ON n.id = sra.neighbor_id AND sra.user_id =' xxx' WHERE n.topic_status < 4 AND n.topic_type 5
New implementation plan:
+-- + | id | select_ Type | table | type | possible_keys | key | key_len | ref | rows | Extra | +-- + -- + | 1 | SIMPLE | m | ref | | idx_message_info | 122 | const | 1 | Using index condition | 1 | SIMPLE | n | eq_ref | PRIMARY | 122 | ighbor_id | 1 | Using where | 1 | SIMPLE | sra | ref | idx_user | _ id | 123 | const | 1 | Using where | +-+-- +-- + -- +
6. Conditional push-down
Situations where external query conditions cannot be pushed down to complex views or subqueries are:
Aggregate subquery
Subquery with LIMIT
UNION or UNION ALL subquery
Subqueries in the output field
As in the following statement, you can see from the execution plan that its condition acts on the aggregate subquery:
SELECT * FROM (SELECT target Count (*) FROM operation GROUP BY target) t WHERE target = 'rm-xxxx'+----+-+- +-+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + -+-+ | 1 | PRIMARY | | ref | | 514 | const | 2 | Using where | | 2 | DERIVED | operation | index | idx_4 | idx_4 | 519 | NULL | 20 | Using index | | +-+
Make sure that the semantic query condition can be pushed down directly, and rewrite as follows:
SELECT target, Count (*) FROM operation WHERE target = 'rm-xxxx' GROUP BY target
The execution plan is changed to:
+-- + | id | select_type | table | type | Possible_keys | key | key_len | ref | rows | Extra | +- -+ | 1 | SIMPLE | operation | ref | idx_4 | idx_4 | 514 | const | 1 | Using where Using index | +-- +
For a detailed explanation that the external conditions of MySQL cannot be pushed down, please refer to the article:
Http://mysql.taobao.org/monthly/2016/07/08
7. Narrow the scope ahead of time
Start with the initial SQL statement:
SELECT * FROM my_order o LEFT JOIN my_userinfo u ON o.uid = u.uid LEFT JOIN my_productinfo p ON o.pid = p.pid WHERE (o.display = 0) AND (o.ostaus = 1) ORDER BY o.selltime DESC LIMIT 0,15
The original intention of the SQL statement is to make a series of left joins, and then sort the first 15 records. As can be seen from the execution plan, the last step estimates that the number of sorted records is 900000 and the time consumption is 12 seconds.
+- -- + | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +-+ -+- -+ | 1 | SIMPLE | o | ALL | NULL | 909119 | Using where Using temporary; Using filesort | | 1 | SIMPLE | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL | 1 | SIMPLE | p | ALL | PRIMARY | NULL | 6 | Using where Using join buffer (Block Nested Loop) | +-- +- -- +
Since the final WHERE condition and sorting are for the leftmost main table, we can reduce the amount of data in advance of my_order sorting before making a left join. After SQL rewriting, the execution time is reduced to about 1 millisecond.
SELECT * FROM (SELECT * FROM my_order o WHERE (o.display = 0) AND (o.ostaus = 1) ORDER BY o.selltime DESC LIMIT 0,15) o LEFT JOIN my_userinfo u ON o.uid = u.uid LEFT JOIN my_productinfo p ON o.pid = p.pid ORDER BY o.selltime DESC limit 0,15
Check the execution plan again: the subquery is materialized (select_type=DERIVED) to participate in the JOIN. Although the estimated row scan is still 900000, the actual execution time becomes very small after using the index and the LIMIT clause.
+- -- + | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | + -+-+-- + | 1 | PRIMARY | | ALL | NULL | 15 | Using temporary Using filesort | | 1 | PRIMARY | u | eq_ref | PRIMARY | PRIMARY | 4 | o.uid | 1 | NULL | 1 | PRIMARY | p | ALL | PRIMARY | NULL | 6 | Using where Using join buffer (Block Nested Loop) | | 2 | DERIVED | o | index | idx_1 | 5 | NULL | 909112 | Using where | + -+-+
8. Push down the intermediate result set
Let's take a look at the following example that has been initially optimized (the main table precedence query condition in the left join):
SELECT A. ORDER BY salecode limit, c.allocated FROM (SELECT resourceid FROM my_distribute d WHERE isdelete = 0 AND cusmanagercode = '1234567' ORDER BY salecode limit 20) a LEFT JOIN (SELECT resourcesid) Sum (ifnull (allocation, 0) * 12345) allocated FROM my_resources GROUP BY resourcesid) c ON a.resourceid = c.resourcesid
So are there any other problems with this statement? It is not difficult to see that the subquery c is a full table aggregate query, which will degrade the performance of the entire statement in the case of a particularly large number of tables.
In fact, for subquery c, the left concatenation of the final result set is only concerned with the data that can match the main table resourceid. So we can rewrite the statement as follows, and the execution time is reduced from 2 seconds to 2 milliseconds.
SELECT A. ORDER BY salecode limit, c.allocated FROM (SELECT resourceid FROM my_distribute d WHERE isdelete = 0 AND cusmanagercode = '1234567' ORDER BY salecode limit 20) a LEFT JOIN (SELECT resourcesid) Sum (ifnull (allocation, 0) * 12345) allocated FROM my_resources r (SELECT resourceid FROM my_distribute d WHERE isdelete = 0 AND cusmanagercode = '1234567' ORDER BY salecode limit 20) a WHERE r.resourcesid = a.resourcesid GROUP BY resourcesid) c ON a.resourceid = c.resourcesid
But the subquery an appears multiple times in our SQL statement. This way of writing not only has extra overhead, but also makes the whole sentence complicated. Rewrite it again using the WITH statement:
WITH an AS (SELECT resourceid FROM my_distribute d WHERE isdelete = 0 AND cusmanagercode = '1234567' ORDER BY salecode limit 20) SELECT a., c.allocated FROM a LEFT JOIN (SELECT resourcesid Sum (ifnull (allocation, 0) * 12345) allocated FROM my_resources r, a WHERE r.resourcesid = a.resourcesid GROUP BY resourcesid) c ON a.resourceid = c.resourcesid
The database compiler generates an execution plan, which determines the actual execution of SQL. But compilers only try their best to serve, and all database compilers are not perfect.
Most of the scenarios mentioned above also have performance problems in other databases. Only by understanding the characteristics of the database compiler can we avoid its shortcomings and write high-performance SQL statements.
Programmers should bring in the idea or consciousness of algorithms when designing data models and writing SQL statements.
To write complex SQL statements, you should get into the habit of using WITH statements. Concise and clear-thinking SQL statements can also reduce the burden on the database.
The above are the possible misspelled uses of SQL. Have you learned any knowledge or skills? If you want to learn more skills or enrich your knowledge reserve, you are welcome to follow the industry information channel.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.