In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-22 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article mainly explains "the optimization of sql sentences". The content of the explanation is simple and clear, and it is easy to learn and understand. Please follow the editor's train of thought to study and learn "the optimization of sql sentences".
1. Rational use of index
Index is an important data structure, and its fundamental purpose is to improve query efficiency. Nowadays, most database products adopt the ISAM index structure first proposed by IBM. The use of the index should be appropriate, and the principles of its use are as follows:
● indexes columns that are frequently joined but not specified as foreign keys, while fields that are not frequently joined are automatically indexed by the optimizer.
● builds indexes on columns that are sorted or grouped frequently (that is, group by or order by operations).
● builds retrieval on columns with more different values that are often used in conditional expressions, and does not index columns with fewer values. For example, there are only two different values of "male" and "female" in the "gender" column of the employee table, so there is no need to establish an index. If the index is established, it will not improve the query efficiency, but will seriously reduce the update speed.
● if there are multiple columns to be sorted, you can build a composite index (compound index) on those columns.
● uses system tools. For example, the Informix database has a tbcheck tool that can be checked on suspicious indexes. On some database servers, the index may fail or the reading efficiency may be reduced due to frequent operations. if a query using the index slows down inexplicably, you can try to check the integrity of the index with the tbcheck tool and fix it if necessary. In addition, when the database table updates a large amount of data, deleting and rebuilding the index can improve the query speed.
(1) in the following two select statements:
Select * from table1 where field1=0
Select * from table1 where field1 > = 0 and field1=0, the first select statement is much more efficient than the second select statement, because the first condition of the second select statement consumes a lot of system resources.
The first principle: the most restrictive conditions should be put first in the where clause.
(2) in the following select statement:
Select * from tab where a =... And b =... And c =...
If there is an index index, the order of the fields in the where clause should be the same as that of the fields in the index.
Second principle: the order of fields in the where clause should be the same as the order of fields in the index.
The following assumes that there is a unique index I1 on field1 and a non-unique index I2 on field2.
(3) select field3,field4 from tb where field1='sdf' is fast
Select * from tb where field1='sdf' is slow
Because the latter takes one more step to access the ROWID table after the index scan.
(4) select field3,field4 from tb where field1 > = 'sdf' is fast
Select field3,field4 from tb where field1 > 'sdf' slow
Because the former can quickly locate the index.
(5) select field3,field4 from tb where field2 like'R% 'fast
Select field3,field4 from tb where field2 like'% R' slow
Because the latter does not use indexes.
(6) use functions such as:
Select field3,field4 from tb where upper (field2) = 'RMN' does not use an index.
If a table has 20,000 records, it is not recommended to use functions; if a table has more than 50,000 records, the use of functions is strictly prohibited! There is no limit below 20,000 records.
(7) Null values are not stored in the index, so
Select field3,field4 from tb where field2 is [not] null does not use indexes.
(8) inequalities such as
Select field3,field4 from tb where field2 does not use indexes.
Similarly
Select field3,field4 from tb where field2 not in ('Mauremending P') does not use an index.
(9) Multi-column indexes, which can be used only if the first column of the index in the query is used for the condition.
(10) MAX,MIN and other functions, such as
Select max (field2) from tb uses indexes. Therefore, if you need to max,min,sum fields, etc., you should add an index.
Use only one aggregate function at a time, such as:
Select "min" = min (field1), "max" = max (field1) from tb
How about: select "min" = (select min (field1) from tb), "max" = (select max (field1) from tb)
(11) Indexes with too many duplicate values are not used by the query optimizer. And because you have an index, you have to modify the index when you modify the value of the field, so the operation of updating the field is slower than no index.
(12) if the index value is too large (such as building an index on a field of char (40)), it will cause a lot of Imax O overhead (even more than the Imax O cost of table scan). Therefore, try to use integer indexes. Sp_estspace can calculate the cost of tables and indexes.
(13) for multi-column indexes, the order of the order by must be the same as the order of the fields of the index.
(14) in sybase, if the fields of order by form a cluster index, there is no need to do order by. The order of the records is the same as the cluster index.
(15) join multiple tables (specific query scheme needs to be obtained through testing)
The qualification in the where clause uses the associated fields as much as possible, and puts the associated fields first as much as possible.
Select a.field1,b.field2 from a,b where a.field3=b.field3
1. If there is no index on field3:
Do a full table scan and sort the results
Scan the full table of b and sort the results.
The results were merged.
It is suitable for small or large watches.
2. There is an index on field3
According to the order of table join, b is the driven table and an is the driven table
Do a full table scan of b
Scan the index range of a
If there is a match, access it through a's rowid
(16) avoid one to many join. Such as:
Select tb1.field3,tb1.field4,tb2.field2 from tb1,tb2 where tb1.field2=tb2.field2 and tb1.field2='BU1032' and tb2.field2= 'aaa'
How about:
Declare @ a varchar (80)
Select @ a=field2 from tb2 where field2='aaa'
Select tb1.field3,tb1.field4,@a from tb1 where field2= 'aaa'
(16) Sub-query
Use exists/not exists instead of in/not in operation
Compare:
Select a.field1 from a where a.field2 in (select b.field1 from b where b.field2=100)
Select a.field1 from a where exists (select 1 from b where a.field2=b.field1 and b.field2=100)
Select field1 from a where field1 not in (select field2 from b)
Select field1 from a where not exists (select 1 from b where b.field2=a.field1)
(17) Primary and foreign keys are mainly used for data constraints. Indexes are created automatically when primary keys are created in sybase, and foreign keys have nothing to do with indexes. To improve performance, indexes must be rebuilt.
(18) the non-indexing of fields of type char is worse than that of fields of type int. The performance is only slightly worse after indexing.
(19) use count (*) instead of count (column_name) and avoid using count (distinct column_name).
(20) try not to use field names on the right side of the equal sign, such as:
Select * from tb where field1 = field3
(21) avoid using or conditions because or does not use indexes.
two。 Avoid using order by and group by sentences.
Because the use of these two clauses will take up a lot of temporary space (tempspace), if you must, you can use views, manual generation of temporary tables instead.
If you must, check the size of memory and tempdb first.
Tests have proved that, especially to avoid using both join and group by in a query, the speed will be very slow!
3. Use as few subqueries as possible, especially related subqueries. Because it will lead to a decline in efficiency.
If the label of a column appears in both the main query and the query in the where clause, it is likely that the subquery will have to be queried again when the column values in the main query have changed. The more levels of query nesting, the lower the efficiency, so subqueries should be avoided as much as possible. If the subquery is inevitable, filter out as many rows as possible in the subquery.
4. Eliminate sequential access to large table row data
In nested queries, sequential access to tables can have a fatal impact on query efficiency. For example, using a sequential access strategy, a nested three-tier query, if each layer queries 1000 rows, then the query will query 1 billion rows of data. The main way to avoid this is to index the connected columns. For example, two tables: student form (student number, name, age... ) and course schedule (student number, course number, grade). If two tables are to be joined, an index should be established on the connection field "student number".
You can also use union to avoid sequential access. Although there are indexes on all check columns, some forms of where clauses force the optimizer to use sequential access. The following query forces sequential operations on the orders table:
SELECT * FROM orders WHERE (customer_num=104 AND order_num > 1001) OR order_num=1008
Although indexes are built on customer_num and order_num, the optimizer scans the entire table using sequential access paths in the above statement. Because this statement retrieves a collection of separated rows, it should be changed to the following statement:
SELECT * FROM orders WHERE customer_num=104 AND order_num > 1001
UNION
SELECT * FROM orders WHERE order_num=1008
In this way, the query can be processed using the index path.
5. Regular expressions to avoid difficulties
The MATCHES and LIKE keywords support wildcard matching, which is technically called regular expressions. But this matching is particularly time-consuming. For example: SELECT * FROM customer WHERE zipcode LIKE "98 _ _"
Even if the index is established on the zipcode field, sequential scanning is used in this case. If you change the statement to SELECT * FROM customer WHERE zipcode > "98000", the index will be used to query when the query is executed, which obviously increases the speed greatly.
In addition, avoid non-starting substrings. For example, the statement: SELECT * FROM customer WHERE zipcode [2jue 3] > "80" uses a non-starting substring in the where clause, so this statement does not use an index.
6. Use temporary tables to speed up queries
Sorting a subset of tables and creating temporary tables can sometimes speed up queries. It helps avoid multiple sorting operations and simplifies the work of the optimizer in other ways. For example:
SELECT cust.name,rcvbles.balance,... Other columns
FROM cust,rcvbles
WHERE cust.customer_id = rcvlbes.customer_id
AND rcvblls.balance > 0
AND cust.postcode > "98000"
ORDER BY cust.name
If this query is to be executed more than once, you can find all unpaid customers in a temporary file and sort by customer's name:
SELECT cust.name,rcvbles.balance,... Other columns
FROM cust,rcvbles
WHERE cust.customer_id = rcvlbes.customer_id
AND rcvblls.balance > 0
ORDER BY cust.name
INTO TEMP cust_with_balance
Then query in the temporary table as follows:
SELECT * FROM cust_with_balance
WHERE postcode > "98000"
There are fewer rows in the temporary table than in the main table, and the physical order is the required order, reducing disk I and O, so the query workload can be greatly reduced.
Note: changes to the primary table are not reflected after the temporary table is created. When the data in the main table is frequently modified, be careful not to lose the data.
7. Replace non-sequential access with sorting
Non-sequential disk access is the slowest operation, as shown by the back and forth movement of the disk access arm. The SQL statement hides this situation, making it easy to write queries that require access to a large number of non-sequential pages when writing an application.
In some cases, using the sorting ability of the database instead of non-sequential access can improve the query.
Thank you for your reading, the above is the content of "the optimization of sql sentences". After the study of this article, I believe you have a deeper understanding of the optimization of sql sentences, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.