This article introduces ways to make SQL run faster. Many people run into exactly these situations in real cases, so let's walk through how to deal with them. Read carefully and you should come away with something useful!
To illustrate the problem more intuitively, the run time of every SQL statement in the examples has been measured; anything under 1 second is shown as (< 1 second).

Test environment:
Host: HP LH II
Clock speed: 330MHz
Memory: 128MB
Operating system: OpenServer 5.0.4
Database: Sybase 11.0.3

First, unreasonable index design
Example: the table record has 620,000 rows. Observe how the following SQL statements run under different indexes:
1. A non-clustered index on date
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (25 seconds)
select date, sum(amount) from record group by date (55 seconds)
select count(*) from record where date > '19990901' and place in ('BJ','SH') (27 seconds)
Analysis:
date has a large number of duplicate values. Under a non-clustered index the data is stored physically at random on the data pages, so a range lookup has to scan the whole table to find all the rows within the range.
2. A clustered index on date
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (14 seconds)
select date, sum(amount) from record group by date (28 seconds)
select count(*) from record where date > '19990901' and place in ('BJ','SH') (14 seconds)
Analysis:
Under a clustered index the data is physically ordered on the data pages and duplicate values are stored together, so a range lookup can first locate the start and end of the range and then scan only the data pages inside it, avoiding a large-scale scan and speeding up the query.
3. A composite index on place, date, amount
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (26 seconds)
select date, sum(amount) from record group by date (27 seconds)
select count(*) from record where date > '19990901' and place in ('BJ','SH') (< 1 second)
Analysis:
This is not a very reasonable composite index, because its leading column is place. The first and second statements do not reference place, so they cannot use the index; the third statement does use place, and every column it references is contained in the composite index, forming index coverage, so it is very fast.
4. A composite index on date, place, amount
select count(*) from record where date > '19991201' and date < '19991214' and amount > 2000 (< 1 second)
select date, sum(amount) from record group by date (11 seconds)
select count(*) from record where date > '19990901' and place in ('BJ','SH') (< 1 second)
Analysis:
This is a reasonable composite index. It takes date as the leading column, so every statement can use the index, and the first and third statements also form index coverage, so performance is optimal.
5. Summary:
The index created by default is non-clustered, but that is not always the best choice; reasonable index design has to be based on analysis and prediction of the various queries. In general:
①. Columns with a large number of duplicate values that frequently appear in range queries (between, >, <, >=, <=), order by or group by can be considered for a clustered index.
②. When several columns are frequently accessed together and each of them contains duplicate values, consider building a composite index.
③. A composite index should, as far as possible, give the key queries index coverage, and its leading column must be the most frequently used column.
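As a minimal sketch of the index definitions discussed above (the index names are hypothetical; the syntax shown is the Sybase/Transact-SQL form, assumed to match the test environment):
-- case 2: clustered index on date
create clustered index idx_record_date on record (date)
-- case 4: composite index with date as the leading column
create index idx_record_date_place_amount on record (date, place, amount)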
Second, insufficient join conditions
Example: the table card has 7,896 rows with a non-clustered index on card_no; the table account has 191,122 rows with a non-clustered index on account_no. Observe how two SQL statements execute under different join conditions:
select sum(a.amount) from account a, card b where a.card_no = b.card_no (20 seconds)
Change the SQL to:
select sum(a.amount) from account a, card b where a.card_no = b.card_no and a.account_no = b.account_no (< 1 second)
Analysis:
Under the first join condition, the best query plan takes account as the outer table and card as the inner table and uses the index on card; the number of I/O operations can be estimated by the following formula:
22541 pages of the outer table account + (191122 rows of the outer table account * 3 pages looked up on the inner table card for each outer row) = 595907 I/Os
Under the second join condition, the best query plan takes card as the outer table and account as the inner table and uses the index on account; the number of I/O operations can be estimated by the following formula:
1944 pages of the outer table card + (7896 rows of the outer table card * 4 pages looked up on the inner table account for each outer row) = 33528 I/Os
It can be seen that the truly best plan is executed only when the join conditions are fully specified.
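Restating the arithmetic above in general form (my own summary of the nested-loop estimate, not a quotation from the optimizer documentation):
estimated I/O ≈ pages of the outer table + (rows of the outer table * pages read in the inner table per outer row)
so 22541 + 191122 * 3 = 595907, versus 1944 + 7896 * 4 = 33528.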
Summary:
1. Before a multi-table operation is actually executed, the query optimizer lists several possible join plans based on the join conditions and picks the one with the least system overhead. The join conditions should fully involve the tables that have indexes and the tables with large row counts; the choice of inner and outer table can be decided by the formula: matching rows in the outer table * lookups in the inner table for each outer row, and the plan with the smallest product is the best one.
2. To see the chosen execution plan, turn on the showplan option with set showplan on; it shows the join order and which indexes are used. For more detailed information, execute dbcc(3604, 310, 302) with the sa role.
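A minimal sketch of how this might look in an isql session (the dbcc traceon form is my reading of the dbcc(3604, 310, 302) mentioned above; output details vary by Sybase version):
set showplan on
go
select sum(a.amount) from account a, card b where a.card_no = b.card_no and a.account_no = b.account_no
go
set showplan off
go
-- more detailed optimizer diagnostics, run under the sa role
dbcc traceon(3604, 310, 302)
go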
Third, non-optimizable where clause
1. Example: the columns in the where clauses of the following SQL statements all have proper indexes, but the execution speed is very slow:
select * from record where substring(card_no, 1, 4) = '5378' (13 seconds)
select * from record where amount/30 < 1000 (11 seconds)
select * from record where convert(char(10), date, 112) = '19991201' (10 seconds)
Analysis:
The result of any operation on a column in the where clause is computed column by column at run time, so the query has to search the whole table instead of using the index on that column. If those results could be obtained at query compile time, the SQL optimizer could use the index and avoid the table search, so the SQL is rewritten as follows:
select * from record where card_no like '5378%' (< 1 second)
select * from record where amount < 1000*30 (< 1 second)
select * from record where date = '19991201' (< 1 second)
You will find that the SQL is obviously much faster!
2. Example: the table stuff has 200,000 rows and there is a non-clustered index on id_no. Look at the following SQL:
select count(*) from stuff where id_no in ('0', '1') (23 seconds)
Analysis:
The 'in' in the where clause is logically equivalent to 'or', so the parser converts in ('0', '1') into id_no = '0' or id_no = '1' for execution. We expect it to search each or clause separately and then add up the results, so that the index on id_no can be used; in fact (as showplan shows) it adopts the "OR strategy": it first fetches the rows satisfying each or clause into a worktable in tempdb, builds a unique index on it to remove duplicate rows, and finally computes the result from this temporary table. So the actual process does not use the index on id_no, and the completion time is also affected by the performance of the tempdb database.
Practice has shown that the more rows the table has, the worse the worktable performs. When stuff has 620,000 rows, the execution time reaches 220 seconds! It is better to split the or clauses apart:
select count(*) from stuff where id_no = '0'
select count(*) from stuff where id_no = '1'
This gives two results, which just need to be added together. Because each statement uses the index, the execution time is only 3 seconds, and with 620,000 rows it is still only 4 seconds. Or, better still, write a simple stored procedure:
create proc count_stuff as
declare @a int
declare @b int
declare @c int
declare @d char(10)
begin
select @a = count(*) from stuff where id_no = '0'
select @b = count(*) from stuff where id_no = '1'
end
select @c = @a + @b
select @d = convert(char(10), @c)
print @d
It computes the result directly, and the execution time is just as fast as above!
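A minimal usage sketch, assuming the procedure above has been created in the same database:
exec count_stuff
go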
Summary:
It can be seen that 'optimized' means the where clause makes use of an index, while 'non-optimizable' means a table scan or extra overhead.
1. Any operation on a column causes a table scan; this includes database functions, computed expressions and so on. When writing a query, move the operation to the right-hand side of the comparison operator as far as possible.
2. in and or clauses often use worktables, which defeats the index; if they do not produce large numbers of duplicate values, consider splitting them apart; the split clauses should be able to use an index.
3. Make good use of stored procedures, which make SQL more flexible and efficient.
That's all for "what are the ways to make SQL run faster". Thank you for reading.