In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article mainly explains "how to make SQL run faster". The content in the article is simple and clear, easy to learn and understand. Please follow the editor's train of thought to study and learn "how to make SQL run faster".
People often fall into a misunderstanding when using SQL, that is, they pay too much attention to whether the results obtained are correct, while ignoring the possible performance differences between different implementation methods, especially in large or complex database environments (such as online transaction processing OLTP or decision support system DSS). In my working practice, the author finds that poor SQL often comes from inappropriate index design, inadequate join conditions and unoptimizable where clauses. After properly optimizing them, their running speed has been significantly improved! Below, I will sum up from these three aspects:
-to illustrate the problem more intuitively, the SQL run time in all instances has been tested, and those not exceeding 1 second are expressed as' 19991201' and date2000 (25 seconds)
Select date,sum (amount) from record group by date (55 seconds)
Select count (*) from record where date > '19990901' and placein ('BJ','SH') (27 seconds)
Analysis: there are a large number of duplicate values on the date. Under the non-clustered index, the data is randomly stored on the data page. When looking for the range, a table scan must be performed to find all the rows in this range.
two。 A cluster index on date
Select count (*) from record where date > '19991201' and date2000 (14 seconds)
Select date,sum (amount) from record group by date (28 seconds)
Select count (*) from record where date > '19990901' and placein ('BJ','SH') (14 seconds)
Analysis: under the cluster index, the data is physically ordered on the data page, and the duplicate values are also arranged together, so when searching in the range, the beginning and end of the range can be found first, and the data page can only be scanned within this range. large-scale scanning is avoided and query speed is improved.
3. Composite index on place,date,amount
Select count (*) from record where date > '19991201' and date 2000 (26 seconds)
Select date,sum (amount) from record group by date (27 seconds)
Select count (*) from record where date > '19990901' and placein (' BJ','SH') ('19991201' and date 2000 (' 19990901' and placein ('BJ','SH')) (, =
'19991201' and date
< '19991214' and amount >2000 (26 seconds)
Select date,sum (amount) from record group by date
(27 seconds)
Select count (*) from record where date >
'19990901' and place in ('BJ',' SH') (
< 1秒) ---- 分析: ---- 这是一个不很合理的组合索引,因为它的前导列是place,第一和第二条SQL没有引 用place,因此也没有利用上索引;第三个SQL使用了place,且引用的所有列都包含在组 合索引中,形成了索引覆盖,所以它的速度是非常快的。 ---- 4.在date,place,amount上的组合索引 select count(*) from record where date >'19991201' and date
< '19991214' and amount >2000 (
< 1秒) select date,sum(amount) from record group by date (11秒) select count(*) from record where date >'19990901' and place in ('BJ','SH') (
< 1秒) ---- 分析: ---- 这是一个合理的组合索引。它将date作为前导列,使每个SQL都可以利用索引,并 且在第一和第三个SQL中形成了索引覆盖,因而性能达到了最优。 ---- 5.总结: ---- 缺省情况下建立的索引是非群集索引,但有时它并不是最佳的;合理的索引设计要 建立在对各种查询的分析和预测上。一般来说: ---- ①.有大量重复值、且经常有范围查询 (between, > < ,>=
< =)和order by 、group by发生的列,可考虑建立群集索引; ---- ②.经常同时存取多列,且每列都含有重复值可考虑建立组合索引; ---- ③.组合索引要尽量使关键查询形成索引覆盖,其前导列一定是使用最频繁的列。 二、不充份的连接条件: ---- 例:表card有7896行,在card_no上有一个非聚集索引,表account有191122行,在 account_no上有一个非聚集索引,试看在不同的表连接条件下,两个SQL的执行情况: select sum(a.amount) from account a, card b where a.card_no = b.card_no(20秒) ---- 将SQL改为: select sum(a.amount) from account a, card b where a.card_no = b.card_no and a. account_no=b.account_no(< 1秒) ---- 分析: ---- 在第一个连接条件下,最佳查询方案是将account作外层表,card作内层表,利用 card上的索引,其I/O次数可由以下公式估算为: ---- 外层表account上的22541页+(外层表account的191122行*内层表card上对应外层 表第一行所要查找的3页)=595907次I/O ---- 在第二个连接条件下,最佳查询方案是将card作外层表,account作内层表,利用 account上的索引,其I/O次数可由以下公式估算为: ---- 外层表card上的1944页+(外层表card的7896行*内层表account上对应外层表每一 行所要查找的4页)= 33528次I/O ---- 可见,只有充份的连接条件,真正的最佳方案才会被执行。 ---- 总结: ---- 1.多表操作在被实际执行前,查询优化器会根据连接条件,列出几组可能的连接方 案并从中找出系统开销最小的最佳方案。连接条件要充份考虑带有索引的表、行数多的 表;内外表的选择可由公式:外层表中的匹配行数*内层表中每一次查找的次数确定,乘 积最小为最佳方案。 ---- 2.查看执行方案的方法-- 用set showplanon,打开showplan选项,就可以看到连 接顺序、使用何种索引的信息;想看更详细的信息,需用sa角色执行dbcc(3604,310,30 2)。 三、不可优化的where子句 ---- 1.例:下列SQL条件语句中的列都建有恰当的索引,但执行速度却非常慢: select * from record where substring(card_no,1,4)='5378'(13秒) select * from record where amount/30< 1000(11秒) select * from record where convert(char(10),date,112)='19991201'(10秒) ---- 分析: ---- where子句中对列的任何操作结果都是在SQL运行时逐列计算得到的,因此它不得不 进行表搜索,而没有使用该列上面的索引;如果这些结果在查询编译时就能得到,那么 就可以被SQL优化器优化,使用索引,避免表搜索,因此将SQL重写成下面这样: select * from record where card_no like '5378%'(< 1秒) select * from record where amount < 1000*30(< 1秒) select * from record where date= '1999/12/01' (< 1秒) ---- 你会发现SQL明显快起来! ---- 2.例:表stuff有200000行,id_no上有非群集索引,请看下面这个SQL: select count(*) from stuff where id_no in('0','1') (23秒) ---- 分析: ---- where条件中的'in'在逻辑上相当于'or',所以语法分析器会将in ('0','1')转化 为id_no ='0' or id_no='1'来执行。我们期望它会根据每个or子句分别查找,再将结果 相加,这样可以利用id_no上的索引;但实际上(根据showplan),它却采用了"OR策略" ,即先取出满足每个or子句的行,存入临时数据库的工作表中,再建立唯一索引以去掉 重复行,最后从这个临时表中计算结果。因此,实际过程没有利用id_no上索引,并且完 成时间还要受tempdb数据库性能的影响。 ---- 实践证明,表的行数越多,工作表的性能就越差,当stuff有620000行时,执行时 间竟达到220秒!还不如将or子句分开: select count(*) from stuff where id_no='0' select count(*) from stuff where id_no='1' ---- 得到两个结果,再作一次加法合算。因为每句都使用了索引,执行时间只有3秒, 在620000行下,时间也只有4秒。或者,用更好的方法,写一个简单的存储过程: create proc count_stuff as declare @a int declare @b int declare @c int declare @d char(10) begin select @a=count(*) from stuff where id_no='0' select @b=count(*) from stuff where id_no='1' end select @c=@a+@b select @d=convert(char(10),@c) print @d ---- 直接算出结果,执行时间同上面一样快! ---- 总结: ---- 可见,所谓优化即where子句利用了索引,不可优化即发生了表扫描或额外开销。 ---- 1.任何对列的操作都将导致表扫描,它包括数据库函数、计算表达式等等,查询时 要尽可能将操作移至等号右边。 ---- 2.in、or子句常会使用工作表,使索引失效;如果不产生大量重复值,可以考虑把 子句拆开;拆开的子句中应该包含索引。 ---- 3.要善于使用存储过程,它使SQL变得更加灵活和高效。 ---- 从以上这些例子可以看出,SQL优化的实质就是在结果正确的前提下,用优化器可 以识别的语句,充份利用索引,减少表扫描的I/O次数,尽量避免表搜索的发生。其实S QL的性能优化是一个复杂的过程,上述这些只是在应用层次的一种体现,深入研究还会 涉及数据库层的资源配置、网络层的流量控制以及操作系统层的总体设计。 1.合理使用索引 索引是数据库中重要的数据结构,它的根本目的就是为了提高查询效率。现在大多数的数据库产品都采用IBM最先提出的ISAM索引结构。索引的使用要恰到好处,其使用原则如下: ●在经常进行连接,但是没有指定为外键的列上建立索引,而不经常连接的字段则由优化器自动生成索引。 ●在频繁进行排序或分组(即进行group by或order by操作)的列上建立索引。 ●在条件表达式中经常用到的不同值较多的列上建立检索,在不同值少的列上不要建立索引。比如在雇员表的"性别"列上只有"男"与"女"两个不同值,因此就无必要建立索引。如果建立索引不但不会提高查询效率,反而会严重降低更新速度。 ●如果待排序的列有多个,可以在这些列上建立复合索引(compound index)。 ●使用系统工具。如Informix数据库有一个tbcheck工具,可以在可疑的索引上进行检查。在一些数据库服务器上,索引可能失效或者因为频繁操作而使得读取效率降低,如果一个使用索引的查询不明不白地慢下来,可以试着用tbcheck工具检查索引的完整性,必要时进行修复。另外,当数据库表更新大量数据后,删除并重建索引可以提高查询速度。 2.避免或简化排序 应当简化或避免对大型表进行重复的排序。当能够利用索引自动以适当的次序产生输出时,优化器就避免了排序的步骤。以下是一些影响因素: ●索引中不包括一个或几个待排序的列; ●group by或order by子句中列的次序与索引的次序不一样; ●排序的列来自不同的表。 为了避免不必要的排序,就要正确地增建索引,合理地合并数据库表(尽管有时可能影响表的规范化,但相对于效率的提高是值得的)。如果排序不可避免,那么应当试图简化它,如缩小排序的列的范围等。 3.消除对大型表行数据的顺序存取 在嵌套查询中,对表的顺序存取对查询效率可能产生致命的影响。比如采用顺序存取策略,一个嵌套3层的查询,如果每层都查询1000行,那么这个查询就要查询10亿行数据。避免这种情况的主要方法就是对连接的列进行索引。例如,两个表:学生表(学号、姓名、年龄……)和选课表(学号、课程号、成绩)。如果两个表要做连接,就要在"学号"这个连接字段上建立索引。 还可以使用并集来避免顺序存取。尽管在所有的检查列上都有索引,但某些形式的where子句强迫优化器使用顺序存取。下面的查询将强迫对orders表执行顺序操作: Select * FROM orders Where (customer_num=104 AND order_num>1001) or order_num=1008
Although indexes are built on customer_num and order_num, the optimizer scans the entire table using sequential access paths in the above statement. Because this statement retrieves a collection of separated rows, it should be changed to the following statement:
Select * FROM orders Where customer_num=104 AND order_num > 1001
UNION
Select * FROM orders Where order_num=1008
In this way, the query can be processed using the index path. 4. Avoid related subqueries
If the label of a column appears in both the main query and the query in the where clause, it is likely that the subquery will have to be queried again when the column values in the main query have changed. The more levels of query nesting, the lower the efficiency, so subqueries should be avoided as much as possible. If the subquery is inevitable, filter out as many rows as possible in the subquery. 5. Regular expressions to avoid difficulties
The MATCHES and LIKE keywords support wildcard matching, which is technically called regular expressions. But this matching is particularly time-consuming. For example: Select * FROM customer Where zipcode LIKE "98 _ _"
Even if the index is established on the zipcode field, sequential scanning is used in this case. If you change the statement to Select * FROM customer Where zipcode > "98000", the index will be used to query when the query is executed, which obviously increases the speed greatly.
In addition, avoid non-starting substrings. For example, the statement: Select * FROM customer Where zipcode [2jue 3] > "80" uses a non-starting substring in the where clause, so this statement does not use an index. 6. Use temporary tables to speed up queries
Sorting a subset of tables and creating temporary tables can sometimes speed up queries. It helps avoid multiple sorting operations and simplifies the work of the optimizer in other ways. For example:
Select cust.name,rcvbles.balance,... Other columns
FROM cust,rcvbles
Where cust.customer_id = rcvlbes.customer_id
AND rcvblls.balance > 0
AND cust.postcode > "98000"
OrDER BY cust.name
If this query is to be executed more than once, you can find all unpaid customers in a temporary file and sort by customer's name:
Select cust.name,rcvbles.balance,... Other columns
FROM cust,rcvbles
Where cust.customer_id = rcvlbes.customer_id
AND rcvblls.balance > 0
OrDER BY cust.name
INTO TEMP cust_with_balance
Then query in the temporary table as follows:
Select * FROM cust_with_balance
Where postcode > "98000"
There are fewer rows in the temporary table than in the main table, and the physical order is the required order, reducing disk I and O, so the query workload can be greatly reduced.
Note: changes to the primary table are not reflected after the temporary table is created. When the data in the main table is frequently modified, be careful not to lose the data. 7. Replace non-sequential access with sorting
Non-sequential disk access is the slowest operation, as shown by the back and forth movement of the disk access arm. The SQL statement hides this situation, making it easy to write queries that require access to a large number of non-sequential pages when writing an application.
In some cases, using the sorting ability of the database instead of non-sequential access can improve the query. 3. Optimize tempdb performanc
General recommendations for the physical location of the tempdb database and database option settings include:
Enables the tempdb database to be automatically expanded on demand. This ensures that the query is not terminated until the execution is complete, which produces a much larger intermediate result set stored in the tempdb database than expected. Set the initial size of the tempdb database file to a reasonable size to avoid automatic file expansion when more space is needed. If the tempdb database is extended too frequently, performance will be adversely affected. Set the file growth increment percentage to a reasonable size to prevent tempdb database files from growing at too small a value. If the file growth is too small compared to the amount of data written to the tempdb database, the tempdb database may need to scale all the time, which will impair performance. Put the tempdb database on the fast Istroke O subsystem to ensure good performance. Striping tempdb databases on multiple disks for better performance. Place the tempdb database on a disk other than the one used by the user database. For more information, see expanding databases.
4. Optimize servers: optimize server performance using memory configuration options
The memory management component of Microsoft ®SQL Server ™2000 eliminates the need to manually manage the memory available to SQL Server. SQL Server dynamically determines the amount of memory that should be allocated at startup based on the amount of memory currently being used by the operating system and other applications. When the load on the computer and SQL Server changes, so does the allocated memory. For more information, see memory Architecture. The following server configuration options can be used to configure memory usage and affect server performance:
Min server memory
Max server memory
Max worker threads
Index create memory min memory per query
The min server memory server configuration option can be used to ensure that SQL Server does not free memory after reaching this value. This configuration option can be set to a specific value based on the size and activity of the SQL Server. If you choose to set this option, you must set aside enough memory for the operating system and other programs. If the operating system does not have enough memory, it requests memory from SQL Server, which affects SQL Server performance. The max server memory server configuration option can be used to specify the maximum amount of memory that SQL Server can allocate when SQL Server is up and running. If you know that there are multiple applications running at the same time as SQL Server, and you want to ensure that those applications have enough memory to run, you can set this configuration option to a specific value. If these other applications, such as Web servers or e-mail servers, request memory only as needed, SQL Server frees up memory for them as needed, so do not set the max server memory server configuration option. However, applications usually use available memory selectively at startup, and do not request more memory if it is needed. If an application that behaves in this way is running on the same computer as SQL Server, set the max server memory server configuration option to a specific value to ensure that the memory required by the application is not allocated by SQL Server.
Do not set the min server memory and max server memory server configuration options to the same value, which will make the amount of memory allocated to SQL Server fixed. Dynamic memory allocation provides the best overall performance over time. For more information, see Server memory options. The max worker threads server configuration option can be used to specify the number of threads that provide support for users to connect to the SQL Server. The default setting of 255 may be slightly higher for some configurations, depending on the number of concurrent users. Because each worker thread is allocated, even if the thread is not in use (because there are fewer concurrent connections than allocated worker threads), memory resources that can be better utilized by other operations, such as cache, may also be unused. In general, the configuration value should be set to the number of concurrent connections, but not more than 32727. A concurrent connection is different from a user login connection. The worker thread pool for the SQL Server instance only needs to be large enough to serve user connections that are also performing batch processing in the instance. If you increase the number of worker threads beyond the default value, server performance will be degraded. For more information, see max worker threads options.
Indicates that the maximum worker server configuration option does not work when SQL Server is running on Microsoft Windows ®98. The index create memory server configuration option controls the amount of memory used by the sort operation when creating the index. Creating an index on a production system is usually an infrequent task and is usually scheduled for jobs that are executed at off-peak times. Therefore, indexes are created infrequently and when off-peak time, increasing this value can improve the performance of index creation. However, it is best to keep the min memory per query configuration option at a low value so that the index creation job can start even if all requested memory is not available. For more information, see index create memory options.
The min memory per query server configuration option can be used to specify the minimum amount of memory allocated to query execution. Increasing the value of min memory per query can help improve the performance of memory-intensive queries such as large sorting and hashing operations when many queries are executed concurrently in the system. However, do not set the min memory per query server configuration option too high, especially on busy systems, because the query will have to wait until it ensures that the requested minimum memory is occupied, or until it exceeds the value specified in the query wait server configuration option. If there is more available memory than the specified minimum memory needed to execute the query, the extra memory can be used as long as the query makes effective use of the excess memory. For more information, see min memory per query options and query wait options. Optimize server performance using the Icano configuration option
The following server configuration options can be used to configure the use of Ihand O and affect server performance: recovery interval
The recovery interval server configuration option controls when Microsoft ®SQL Server ™2000 issues checkpoints within each database. By default, SQL Server determines the best time to perform a checkpoint operation. However, to determine whether this is an appropriate setting, you need to use Windows NT performance Monitor to monitor disk write activity on the database file. Activity spikes that lead to 100% disk utilization impair performance. Changing this parameter to reduce the occurrence of checkpoint processes can usually improve overall performance in this case. However, you must continue to monitor performance to determine whether the new values have had a positive impact on performance. For more information, see recovery interval options.
Thank you for your reading, the above is the content of "how to make SQL run faster". After the study of this article, I believe you have a deeper understanding of how to make SQL run faster, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.