How to write better SQL queries: the Ultimate Guide-part 3 07/16 Update SLTechnology News&Howtos

How to write better SQL queries: the Ultimate Guide-part 3

2025-07-16 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

This time we'll learn the last article in the series "how to write better SQL queries."

Time complexity and Big O symbol

Through the first two articles, we already have some understanding of the query plan. Next, we can also use the theory of computational complexity to further explore and think about the improvement of performance. The field of theoretical computer science focuses on classifying computational problems according to difficulty. These computing problems can be algorithm problems or query problems.

For queries, we can classify them not by difficulty, but by the time it takes to run the query and get the results. This approach is also known as classifying according to time complexity.

Using the big O symbol, you can represent the elapsed time according to the growth rate of the input, because the input can be arbitrarily large. The big O symbol does not include coefficients and lower-order terms so that you can focus on an important part of the query run time: the growth rate. In this way, coefficients and low-order items are discarded, and the time complexity is gradually described, which means that the input becomes infinite.

In a database language, complexity measures the length of time a query takes to run.

Note that the size of the database not only increases as the data stored in the table increases, but the indexes in the database also affect the database size.

Estimate the time complexity of the query plan

The execution plan defines the algorithm used for each operation, which also allows the execution time of each query to be logically expressed as a function of the size of the data table in the query plan. In other words, you can use the big O symbol and the execution plan to estimate the complexity and performance of the query.

In the following summary, we will learn about four types of time complexity concepts.

Through these examples, you can see that the time complexity of the query varies depending on the content of the query being run.

For different databases, different indexing methods, different execution plans and different implementation methods need to be considered.

Therefore, the concept of time complexity listed below is very common.

O (1): constant time

There is a query algorithm that takes the same time to execute regardless of the size of the input, which is a constant time query. These types of queries are not common, and here is an example:

SELECT TOP 1 t.*FROM t

The time complexity of this algorithm is constant because only any row is selected from the table. Therefore, the length of time is independent of the size of the table.

Linear time: O (n)

If the execution time of an algorithm is proportional to the input size, the execution time of the algorithm will increase with the increase of the input size. For a database, this means that the query execution time is proportional to the table size: as the number of rows in the table increases, the query time increases accordingly.

This means that every row in the table needs to be read in order to find the correct ID data. Even if the correct data is found in the first row, the query reads each row of data.

If there is no index, the complexity of the query is O (n) i_id:

SELECT i_idFROM item

This also means that count queries such as COUNT (*) FROM TABLE have O (n) time complexity, and full table scans are performed unless the total number of rows of the data table is stored. At this point, the complexity will be more like O (1).

Closely related to linear execution time is the sum of time for all linear execution plans. Here are some examples:

The complexity of hash join (hash join) is O (M + N). The classic hash join algorithm for joining two internal data tables is to first prepare a hash table for the smaller data table. The entry to the hash table consists of join properties and rows. Access to the hash table is achieved by applying the hash function to the join property. Once the hash table is built, the larger table is scanned and the relevant rows in the smaller table are found by looking at the hash table.

The complexity of a merge join (merge join) is O (M + N), but this join relies heavily on the index on the join column, and in the absence of an index, rows are sorted first according to the key used in the join:

If the two tables are sorted according to the key used in the join, the complexity of the query is O (M + N).

If both tables have indexes on joined columns, the indexes maintain the columns sequentially and do not need to sort. In this case, the complexity is O (M + N).

If neither table joins the indexes on the column, you need to sort the two tables first, so the complexity will be O (M log M + N log N).

If one table has an index on its join column and another table does not, the table without an index needs to be sorted first, so the complexity will be O (M + N log N).

For nested joins, the complexity is usually O (MN). This join is especially effective when one or two tables are very small (for example, less than 10 records).

Remember that a nested join is a join that compares each record in one table to each record in another table.

Logarithmic time: O (log (n))

If the execution time of the algorithm is proportional to the logarithm of the input size, the algorithm is called the logarithmic time algorithm; for queries, this means that the execution time is proportional to the logarithm of the database size.

The query plan time complexity of performing an index scan (index Scan) or a clustered index scan is logarithmic time. A clustered index is an index whose leaf level contains the actual rows of data of the table. Clustering is very similar to other indexes: it is defined on one or more columns. This also forms the index primary key. A clustered primary key is the primary key column of a clustered index. Clustered index scanning is the basic operation for RDBMS to read from beginning to end in a clustered index.

In the following example, there is an index of i_id, which also leads to the complexity of O (log (n)):

SELECT i_stockFROM itemWHERE i_id = N

If there is no index, the time complexity is O (n).

Second time: O (n ^ 2)

If the execution time of the algorithm is proportional to the square of the input size, the algorithm is called the logarithmic time algorithm. For databases, this means that the execution time of the query is proportional to the square of the database size.

Examples of queries with quadratic time complexity are as follows:

SELECT * FROM item, authorWHERE item.i_a_id=author.a_id

The minimum complexity is O (n log (n)), but based on the index information of the join property, the maximum complexity will be O (n ^ 2).

The following figure is a chart that estimates query performance based on time complexity, through which you can see the performance of each algorithm.

SQL tuning

Query plan and time complexity can be measured in the following ways, and SQL queries can be further tuned:

Replace unnecessary large data tables with index scans

Make sure that the join order of the table is the best order

Ensure that indexes are used in the best way

Cache the full table scan of the small data table.

All the contents of the tutorial "how to write better SQL queries" are introduced here. I hope that through the introduction of this tutorial, we can help you write better and better SQL queries.

Original link: https://www.datacamp.com/community/tutorials/sql-tutorial-query#importance

Reprint, please indicate from: grape city control

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.