Optimization of sql statement 07/01 Update SLTechnology News&Howtos

Optimization of sql statement

2025-07-01 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article mainly explains "the optimization of sql sentences". The content of the explanation is simple and clear, and it is easy to learn and understand. Please follow the editor's train of thought to study and learn "the optimization of sql sentences".

1. Rational use of index

Index is an important data structure, and its fundamental purpose is to improve query efficiency. Nowadays, most database products adopt the ISAM index structure first proposed by IBM. The use of the index should be appropriate, and the principles of its use are as follows:

● indexes columns that are frequently joined but not specified as foreign keys, while fields that are not frequently joined are automatically indexed by the optimizer.

● builds indexes on columns that are sorted or grouped frequently (that is, group by or order by operations).

● builds retrieval on columns with more different values that are often used in conditional expressions, and does not index columns with fewer values. For example, there are only two different values of "male" and "female" in the "gender" column of the employee table, so there is no need to establish an index. If the index is established, it will not improve the query efficiency, but will seriously reduce the update speed.

● if there are multiple columns to be sorted, you can build a composite index (compound index) on those columns.

● uses system tools. For example, the Informix database has a tbcheck tool that can be checked on suspicious indexes. On some database servers, the index may fail or the reading efficiency may be reduced due to frequent operations. if a query using the index slows down inexplicably, you can try to check the integrity of the index with the tbcheck tool and fix it if necessary. In addition, when the database table updates a large amount of data, deleting and rebuilding the index can improve the query speed.

(1) in the following two select statements:

Select * from table1 where field1=0

Select * from table1 where field1 > = 0 and field1=0, the first select statement is much more efficient than the second select statement, because the first condition of the second select statement consumes a lot of system resources.

The first principle: the most restrictive conditions should be put first in the where clause.

(2) in the following select statement:

Select * from tab where a =... And b =... And c =...

If there is an index index, the order of the fields in the where clause should be the same as that of the fields in the index.

Second principle: the order of fields in the where clause should be the same as the order of fields in the index.

The following assumes that there is a unique index I1 on field1 and a non-unique index I2 on field2.

(3) select field3,field4 from tb where field1='sdf' is fast

Select * from tb where field1='sdf' is slow

Because the latter takes one more step to access the ROWID table after the index scan.

(4) select field3,field4 from tb where field1 > = 'sdf' is fast

Select field3,field4 from tb where field1 > 'sdf' slow

Because the former can quickly locate the index.

(5) select field3,field4 from tb where field2 like'R% 'fast

Select field3,field4 from tb where field2 like'% R' slow

Because the latter does not use indexes.

(6) use functions such as:

Select field3,field4 from tb where upper (field2) = 'RMN' does not use an index.

If a table has 20,000 records, it is not recommended to use functions; if a table has more than 50,000 records, the use of functions is strictly prohibited! There is no limit below 20,000 records.

(7) Null values are not stored in the index, so

Select field3,field4 from tb where field2 is [not] null does not use indexes.

(8) inequalities such as

Select field3,field4 from tb where field2 does not use indexes.

Similarly

Select field3,field4 from tb where field2 not in ('Mauremending P') does not use an index.

(9) Multi-column indexes, which can be used only if the first column of the index in the query is used for the condition.

(10) MAX,MIN and other functions, such as

Select max (field2) from tb uses indexes. Therefore, if you need to max,min,sum fields, etc., you should add an index.

Use only one aggregate function at a time, such as:

Select "min" = min (field1), "max" = max (field1) from tb

How about: select "min" = (select min (field1) from tb), "max" = (select max (field1) from tb)

(11) Indexes with too many duplicate values are not used by the query optimizer. And because you have an index, you have to modify the index when you modify the value of the field, so the operation of updating the field is slower than no index.

(12) if the index value is too large (such as building an index on a field of char (40)), it will cause a lot of Imax O overhead (even more than the Imax O cost of table scan). Therefore, try to use integer indexes. Sp_estspace can calculate the cost of tables and indexes.

(13) for multi-column indexes, the order of the order by must be the same as the order of the fields of the index.

(14) in sybase, if the fields of order by form a cluster index, there is no need to do order by. The order of the records is the same as the cluster index.

(15) join multiple tables (specific query scheme needs to be obtained through testing)

The qualification in the where clause uses the associated fields as much as possible, and puts the associated fields first as much as possible.

Select a.field1,b.field2 from a,b where a.field3=b.field3

1. If there is no index on field3:

Do a full table scan and sort the results

Scan the full table of b and sort the results.

The results were merged.

It is suitable for small or large watches.

2. There is an index on field3

According to the order of table join, b is the driven table and an is the driven table

Do a full table scan of b

Scan the index range of a

If there is a match, access it through a's rowid

(16) avoid one to many join. Such as:

Select tb1.field3,tb1.field4,tb2.field2 from tb1,tb2 where tb1.field2=tb2.field2 and tb1.field2='BU1032' and tb2.field2= 'aaa'

How about:

Declare @ a varchar (80)

Select @ a=field2 from tb2 where field2='aaa'

Select tb1.field3,tb1.field4,@a from tb1 where field2= 'aaa'

(16) Sub-query

Use exists/not exists instead of in/not in operation

Compare:

Select a.field1 from a where a.field2 in (select b.field1 from b where b.field2=100)

Select a.field1 from a where exists (select 1 from b where a.field2=b.field1 and b.field2=100)

Select field1 from a where field1 not in (select field2 from b)

Select field1 from a where not exists (select 1 from b where b.field2=a.field1)

(17) Primary and foreign keys are mainly used for data constraints. Indexes are created automatically when primary keys are created in sybase, and foreign keys have nothing to do with indexes. To improve performance, indexes must be rebuilt.

(18) the non-indexing of fields of type char is worse than that of fields of type int. The performance is only slightly worse after indexing.

(19) use count (*) instead of count (column_name) and avoid using count (distinct column_name).

(20) try not to use field names on the right side of the equal sign, such as:

Select * from tb where field1 = field3

(21) avoid using or conditions because or does not use indexes.

two。 Avoid using order by and group by sentences.

Because the use of these two clauses will take up a lot of temporary space (tempspace), if you must, you can use views, manual generation of temporary tables instead.

If you must, check the size of memory and tempdb first.

Tests have proved that, especially to avoid using both join and group by in a query, the speed will be very slow!

3. Use as few subqueries as possible, especially related subqueries. Because it will lead to a decline in efficiency.

If the label of a column appears in both the main query and the query in the where clause, it is likely that the subquery will have to be queried again when the column values in the main query have changed. The more levels of query nesting, the lower the efficiency, so subqueries should be avoided as much as possible. If the subquery is inevitable, filter out as many rows as possible in the subquery.

4. Eliminate sequential access to large table row data

In nested queries, sequential access to tables can have a fatal impact on query efficiency. For example, using a sequential access strategy, a nested three-tier query, if each layer queries 1000 rows, then the query will query 1 billion rows of data. The main way to avoid this is to index the connected columns. For example, two tables: student form (student number, name, age... ) and course schedule (student number, course number, grade). If two tables are to be joined, an index should be established on the connection field "student number".

You can also use union to avoid sequential access. Although there are indexes on all check columns, some forms of where clauses force the optimizer to use sequential access. The following query forces sequential operations on the orders table:

SELECT * FROM orders WHERE (customer_num=104 AND order_num > 1001) OR order_num=1008

Although indexes are built on customer_num and order_num, the optimizer scans the entire table using sequential access paths in the above statement. Because this statement retrieves a collection of separated rows, it should be changed to the following statement:

SELECT * FROM orders WHERE customer_num=104 AND order_num > 1001

UNION

SELECT * FROM orders WHERE order_num=1008

In this way, the query can be processed using the index path.

5. Regular expressions to avoid difficulties

The MATCHES and LIKE keywords support wildcard matching, which is technically called regular expressions. But this matching is particularly time-consuming. For example: SELECT * FROM customer WHERE zipcode LIKE "98 _ _"

Even if the index is established on the zipcode field, sequential scanning is used in this case. If you change the statement to SELECT * FROM customer WHERE zipcode > "98000", the index will be used to query when the query is executed, which obviously increases the speed greatly.

In addition, avoid non-starting substrings. For example, the statement: SELECT * FROM customer WHERE zipcode [2jue 3] > "80" uses a non-starting substring in the where clause, so this statement does not use an index.

6. Use temporary tables to speed up queries

Sorting a subset of tables and creating temporary tables can sometimes speed up queries. It helps avoid multiple sorting operations and simplifies the work of the optimizer in other ways. For example:

SELECT cust.name,rcvbles.balance,... Other columns

FROM cust,rcvbles

WHERE cust.customer_id = rcvlbes.customer_id

AND rcvblls.balance > 0

AND cust.postcode > "98000"

ORDER BY cust.name

If this query is to be executed more than once, you can find all unpaid customers in a temporary file and sort by customer's name:

SELECT cust.name,rcvbles.balance,... Other columns

FROM cust,rcvbles

WHERE cust.customer_id = rcvlbes.customer_id

AND rcvblls.balance > 0

ORDER BY cust.name

INTO TEMP cust_with_balance

Then query in the temporary table as follows:

SELECT * FROM cust_with_balance

WHERE postcode > "98000"

There are fewer rows in the temporary table than in the main table, and the physical order is the required order, reducing disk I and O, so the query workload can be greatly reduced.

Note: changes to the primary table are not reflected after the temporary table is created. When the data in the main table is frequently modified, be careful not to lose the data.

7. Replace non-sequential access with sorting

Non-sequential disk access is the slowest operation, as shown by the back and forth movement of the disk access arm. The SQL statement hides this situation, making it easy to write queries that require access to a large number of non-sequential pages when writing an application.

In some cases, using the sorting ability of the database instead of non-sequential access can improve the query.

Thank you for your reading, the above is the content of "the optimization of sql sentences". After the study of this article, I believe you have a deeper understanding of the optimization of sql sentences, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.