2025-02-23 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/01 report
Query Optimization for Millions of Rows

1. Rational use of indexes
An index is an important data structure in a database, and its fundamental purpose is to improve query efficiency. Most database products today adopt the ISAM index structure first proposed by IBM. Indexes should be used appropriately, following these principles:
● Build indexes on columns that are frequently joined but are not specified as foreign keys; for fields that are not frequently joined, the optimizer generates indexes automatically.
● Build indexes on columns that are frequently sorted or grouped (that is, used in GROUP BY or ORDER BY operations).
● Build indexes on columns with many distinct values that often appear in conditional expressions; do not index columns with few distinct values. For example, the "gender" column of an employee table has only two values, "male" and "female", so there is no need to index it. An index there would not improve query efficiency and would seriously slow down updates.
● If there are multiple columns to sort on, you can build a composite (compound) index on those columns.
● Use system tools. For example, the Informix database provides a tbcheck tool that can check suspicious indexes. On some database servers an index may become corrupted, or frequent operation may degrade read efficiency; if a query that uses an index slows down inexplicably, try checking the index's integrity with tbcheck and repair it if necessary. In addition, after a table has had a large volume of data updated, dropping and rebuilding the index can improve query speed.
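The selectivity principle above can be observed directly with a query planner. This is a minimal sketch in SQLite via Python's sqlite3 module (the employee table, column names, and index name are illustrative, not from the article): a filter on a high-selectivity column uses the index, while a two-valued column like gender is scanned regardless.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, dept TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(i, f"dept{i % 50}", "M" if i % 2 else "F") for i in range(1000)],
)

# Index the many-valued column that is frequently filtered on.
conn.execute("CREATE INDEX idx_dept ON employee (dept)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry the plan text in column 3.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# High selectivity: the planner does an index SEARCH.
print(plan("SELECT * FROM employee WHERE dept = 'dept7'"))
# Two values only, no index: a full SCAN.
print(plan("SELECT * FROM employee WHERE gender = 'M'"))
```

The same experiment works against any engine that exposes its plan (EXPLAIN in MySQL/PostgreSQL, SET EXPLAIN ON in Informix).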
2. Avoid or simplify sorting
Repeated sorting of large tables should be avoided or simplified. When an index can be used to generate output in the required order automatically, the optimizer skips the sorting step. Sorting cannot be avoided in the following cases:
● One or more of the columns to be sorted are not included in the index.
● The order of the columns in the GROUP BY or ORDER BY clause differs from the order of the columns in the index.
● The sorted columns come from different tables.
To avoid unnecessary sorting, add the right indexes and merge database tables where reasonable (this may sometimes hurt table normalization, but the gain in efficiency is worth it). If sorting is unavoidable, try to simplify it, for example by narrowing the range of sorted columns.
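The sort-elimination effect is easy to see in a plan. A minimal sketch in SQLite (table and index names are made up): before the index exists, ORDER BY forces an explicit sort step; after it exists, the rows come back in index order and the sort disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_num INTEGER, customer_num INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i % 100) for i in range(1000)])

def plan(sql):
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# No index yet: SQLite reports an explicit sort ("USE TEMP B-TREE FOR ORDER BY").
before = plan("SELECT * FROM orders ORDER BY order_num")

conn.execute("CREATE INDEX idx_onum ON orders (order_num)")

# With the index, output is produced in index order and the sort step vanishes.
after = plan("SELECT * FROM orders ORDER BY order_num")

print(before)
print(after)
```

Other engines phrase it differently ("filesort" in MySQL, "Sort" nodes in PostgreSQL), but the principle is the same.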
3. Eliminate sequential access to large table row data
In nested queries, sequential access to tables can be fatal to query efficiency. For example, under a sequential access strategy, a query nested three levels deep, with each level scanning 1,000 rows, examines one billion rows in total. a) The main way to avoid this is to index the joined columns.
b) You can also use UNION to avoid sequential access (rewriting OR as UNION). Even when indexes exist on all the queried columns, some forms of WHERE clause force the optimizer to use sequential access. The following query forces a sequential scan of the orders table:
SELECT * FROM orders WHERE (customer_num=104 AND order_num > 1001) OR order_num=1008
Although indexes exist on both customer_num and order_num, the optimizer uses a sequential access path and scans the entire table for the statement above. Because the statement retrieves a union of separate row sets, it should be rewritten as:
SELECT * FROM orders WHERE customer_num=104 AND order_num > 1001
UNION
SELECT * FROM orders WHERE order_num=1008
Rewritten this way, the query can be processed along index paths.
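A quick result-equivalence check for the OR-to-UNION rewrite above, run in SQLite with made-up orders data (modern engines can often handle the OR form with indexes too, but the rewrite never changes which rows come back, since UNION removes duplicates):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_num INTEGER, order_num INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(104, 1000), (104, 1002), (104, 1003), (105, 1008), (106, 999)])

# Original form with OR.
or_rows = conn.execute(
    "SELECT * FROM orders "
    "WHERE (customer_num = 104 AND order_num > 1001) OR order_num = 1008"
).fetchall()

# Rewritten form with UNION (deduplicates, like the OR form).
union_rows = conn.execute(
    "SELECT * FROM orders WHERE customer_num = 104 AND order_num > 1001 "
    "UNION "
    "SELECT * FROM orders WHERE order_num = 1008"
).fetchall()

assert sorted(or_rows) == sorted(union_rows)
print(sorted(union_rows))
```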
4. Avoid correlated subqueries
If a column appears both in the main query and in a subquery in the WHERE clause, then every time the column's value changes in the main query, the subquery is likely to be re-evaluated. The more deeply queries are nested, the lower the efficiency, so subqueries should be avoided where possible. If a subquery is unavoidable, filter out as many rows as possible inside it.
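One common way to avoid a correlated subquery is to decorrelate it into a join. A minimal sketch in SQLite, reusing the cust/rcvbles tables that appear later in this article (the data is invented): the EXISTS form references the outer row and may be re-evaluated per row, while the join form makes one pass and returns the same customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (customer_id INTEGER, name TEXT)")
conn.execute("CREATE TABLE rcvbles (customer_id INTEGER, balance REAL)")
conn.executemany("INSERT INTO cust VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cid")])
conn.executemany("INSERT INTO rcvbles VALUES (?, ?)",
                 [(1, 50.0), (2, 0.0), (3, 10.0), (1, 0.0)])

# Correlated form: the subquery refers to the outer row (c.customer_id).
corr = conn.execute("""
    SELECT name FROM cust c
    WHERE EXISTS (SELECT 1 FROM rcvbles r
                  WHERE r.customer_id = c.customer_id AND r.balance > 0)
    ORDER BY name""").fetchall()

# Decorrelated join form: DISTINCT compensates for multiple matching rows.
join = conn.execute("""
    SELECT DISTINCT c.name
    FROM cust c JOIN rcvbles r ON r.customer_id = c.customer_id
    WHERE r.balance > 0
    ORDER BY c.name""").fetchall()

assert corr == join
print(corr)
```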
5. Avoid costly regular expressions
The MATCHES and LIKE keywords support wildcard matching, technically known as regular expressions, but this kind of matching is especially time-consuming. For example: SELECT * FROM customer WHERE zipcode LIKE "98___"
Even with an index on the zipcode field, a sequential scan is used in this case. If the statement is changed to SELECT * FROM customer WHERE zipcode > "98000", the index is used to process the query, which obviously speeds it up greatly.
In addition, avoid substrings that do not start at the beginning of the column. For example, the statement SELECT * FROM customer WHERE zipcode[2,3] > "80" uses a non-leading substring in the WHERE clause, so it will not use an index.
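The pattern-versus-range contrast above can be reproduced in SQLite (table data is invented; note that SQLite's LIKE can use an index only under specific collation settings, so with defaults it illustrates the article's "pattern match scans" case directly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (zipcode TEXT, name TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(f"{z:05d}", f"cust{z}") for z in range(90000, 99000, 7)])
conn.execute("CREATE INDEX idx_zip ON customer (zipcode)")

def plan(sql):
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# The wildcard pattern falls back to a scan (SQLite's default BINARY
# collation disables its LIKE optimization).
like_plan = plan("SELECT * FROM customer WHERE zipcode LIKE '98___'")

# The range form is sargable: the planner does an index SEARCH.
range_plan = plan("SELECT * FROM customer WHERE zipcode >= '98000'")

print(like_plan)
print(range_plan)
```

In engines where LIKE with a literal prefix can use an index, the leading-wildcard form ('%abc%') still cannot, which is the case to avoid.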
6. Use temporary tables to speed up queries
Sorting a subset of tables and creating temporary tables can sometimes speed up queries. It helps avoid multiple sorting operations and simplifies the work of the optimizer in other ways. For example:
SELECT cust.name, rcvbles.balance, ...other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
AND cust.postcode > "98000"
ORDER BY cust.name
If this query is to be executed more than once, all the customers with unpaid balances can be found once, placed in a temporary table, and sorted by customer name:
SELECT cust.name, rcvbles.balance, ...other columns
FROM cust, rcvbles
WHERE cust.customer_id = rcvbles.customer_id
AND rcvbles.balance > 0
ORDER BY cust.name
INTO TEMP cust_with_balance
Then query in the temporary table as follows:
SELECT * FROM cust_with_balance
WHERE postcode > "98000"
The temporary table has far fewer rows than the main table, and its physical order is already the required order, which reduces disk I/O, so the query workload is greatly reduced.
Note: once the temporary table is created, changes to the main table are not reflected in it. When the data in the main table is modified frequently, be careful not to work from stale data.
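The INTO TEMP syntax above is Informix-specific; most engines have an equivalent. A minimal sketch of the same pattern in SQLite (CREATE TEMP TABLE ... AS replaces INTO TEMP; the sample data is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cust (customer_id INTEGER, name TEXT, postcode TEXT)")
conn.execute("CREATE TABLE rcvbles (customer_id INTEGER, balance REAL)")
conn.executemany("INSERT INTO cust VALUES (?, ?, ?)",
                 [(1, "Ann", "98001"), (2, "Bob", "97000"), (3, "Cid", "99000")])
conn.executemany("INSERT INTO rcvbles VALUES (?, ?)",
                 [(1, 100.0), (2, 50.0), (3, 0.0)])

# Materialize the unpaid-balance subset once, already sorted by name.
conn.execute("""
    CREATE TEMP TABLE cust_with_balance AS
    SELECT cust.name, cust.postcode, rcvbles.balance
    FROM cust, rcvbles
    WHERE cust.customer_id = rcvbles.customer_id
      AND rcvbles.balance > 0
    ORDER BY cust.name""")

# Subsequent queries hit the much smaller temporary table.
rows = conn.execute(
    "SELECT name FROM cust_with_balance WHERE postcode > '98000'").fetchall()
print(rows)
```

As the note above warns, cust_with_balance is a snapshot: later changes to cust or rcvbles are not reflected in it.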
Summary
Twenty percent of the code takes eighty percent of the time; this well-known law of programming holds for database applications as well. Optimization must target the key issue, and for database applications that is the efficiency of SQL execution. The crux of query optimization is making the database server read less data from disk, and read pages sequentially rather than non-sequentially.
Part 2 (how to make the engine make full use of indexes): thirty query optimization techniques for one million rows
1. To optimize a query, avoid full table scans wherever possible; first consider building indexes on the columns used in WHERE and ORDER BY clauses.
2. Try to avoid testing a field for NULL in the WHERE clause; otherwise the engine abandons the index and performs a full table scan, for example:
Select id from t where num is null
You can instead set 0 as the default value for num, ensure the num column in the table never contains NULL, and query like this:
Select id from t where num=0
3. Try to avoid using the != or <> operators in the WHERE clause; otherwise the engine abandons the index and performs a full table scan.
4. Try to avoid joining conditions with OR in the WHERE clause; use UNION instead, otherwise the engine abandons the index and performs a full table scan, for example:
Select id from t where num=10 or num=20
You can query it like this:
Select id from t where num=10
Union all
Select id from t where num=20
5. IN and NOT IN should also be used with caution, or they will lead to a full table scan, for example:
Select id from t where num in (1, 2, 3)
For consecutive values, use between instead of in:
Select id from t where num between 1 and 3
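A quick check, in SQLite with an invented table t, that the BETWEEN rewrite returns exactly the same rows as the IN form when the values are consecutive:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, num INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(i, i % 10) for i in range(100)])

in_rows = conn.execute(
    "SELECT id FROM t WHERE num IN (1, 2, 3) ORDER BY id").fetchall()
between_rows = conn.execute(
    "SELECT id FROM t WHERE num BETWEEN 1 AND 3 ORDER BY id").fetchall()

# The two predicates select the same set of rows.
assert in_rows == between_rows
print(len(in_rows))
```

The rewrite only applies to consecutive values; IN (1, 5, 9) has no BETWEEN equivalent.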
6. The following fuzzy query will also cause a full table scan:
Select id from t where name like '%abc%'
To improve efficiency, consider full-text retrieval.
7. Using local variables as parameters in the WHERE clause also results in a full table scan. SQL resolves local variables only at run time, but the optimizer cannot defer the choice of an access plan until then; it must choose at compile time, when the variable's value is still unknown and cannot drive index selection. The following performs a full table scan:
Select id from t where num=@num
You can force the query to use the index instead:
Select id from t with (index(index_name)) where num=@num
8. Expression operations on fields in the WHERE clause should be avoided as far as possible; they cause the engine to abandon the index and perform a full table scan. For example:
Select id from t where num/2=100
It should be changed to:
Select id from t where num=100*2
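The effect of moving arithmetic off the column can be seen in the plan. A minimal sketch in SQLite (table and index names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, num INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(i, i) for i in range(500)])
conn.execute("CREATE INDEX idx_num ON t (num)")

def plan(sql):
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Arithmetic on the column hides it from the index: full scan.
print(plan("SELECT id FROM t WHERE num/2 = 100"))
# Arithmetic moved to the constant side: index search.
print(plan("SELECT id FROM t WHERE num = 100*2"))

# Caveat: under integer division, num/2 = 100 also matches num = 201,
# so the rewrite is exactly equivalent only when num is known to be even.
```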
9. Try to avoid applying functions (including built-in functions) to fields in the WHERE clause; this causes the engine to abandon the index and perform a full table scan. For example:
Select id from t where substring(name,1,3) = 'abc' -- ids whose name begins with abc
Select id from t where datediff(day, createdate, '2005-11-30') = 0 -- ids generated on 2005-11-30
It should be changed to:
Select id from t where name like 'abc%'
Select id from t where createdate >= '2005-11-30' and createdate < '2005-12-1'
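The same function-versus-range contrast, sketched in SQLite (which has no datediff, so the built-in date() function stands in for the same anti-pattern; table data is invented and dates are zero-padded TEXT so string comparison orders correctly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, createdate TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "2005-11-29"), (2, "2005-11-30"),
                  (3, "2005-11-30"), (4, "2005-12-01")])
conn.execute("CREATE INDEX idx_cd ON t (createdate)")

def plan(sql):
    return " ".join(r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Function wrapped around the column: full scan.
fn_sql = "SELECT id FROM t WHERE date(createdate) = '2005-11-30'"
# Equivalent half-open date range: index search.
rng_sql = ("SELECT id FROM t WHERE createdate >= '2005-11-30' "
           "AND createdate < '2005-12-01'")

print(plan(fn_sql))
print(plan(rng_sql))
assert conn.execute(fn_sql).fetchall() == conn.execute(rng_sql).fetchall()
```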