SQL performance part 2: query Analysis and access path Development 07/11 Update SLTechnology News&Howtos

SQL performance part 2: query Analysis and access path Development

2025-07-11 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

Manifesto of the Lady

In the first part of the SQL performance overview, we studied relationship optimization and its influencing factors. In today's article, we turn our attention to query analysis and the way SQL is transformed into executable code. I hope it will be helpful for you to optimize the performance of SQL.

From the upper level, the optimization process consists of four steps:

Receive and validate SQL statements.

Analyze the environment and optimize the method that satisfies the SQL statement.

Create machine-readable instructions to execute optimized SQL.

Execute these instructions or store them for future execution.

The first thing you need to do is to verify that SQL is written correctly. This does not mean that it will do what you want it to do, but it conforms to the required syntax. The SQL will be analyzed and checked. If you encounter any errors, the process will stop and you must modify the SQL until it is correct. After validating the SQL syntax, the next step is to examine semantics, such as data types, reference constraints, check constraints, views, and triggers.

The second step in this process is the most interesting. How does the optimizer decide how to execute a large number of SQL statements that can be sent in its own way? This query analysis step scans the SQL to determine its overall complexity. The expression of the SQL statement is an important factor in determining the access path chosen by the optimizer. The complexity of the query, the number and type of predicates, the existence of functions, and the existence of sorting clauses will all be included in the estimated cost calculated by the optimizer.

The more complex the SQL statement, the more work the query analysis must do to understand the SQL statement. During query analysis, the optimizer analyzes SQL statements and various aspects of the database system, such as

Which tables in which databases are required

Whether any views need to be decomposed into underlying tables

Whether a table join or sub-selection is required

Is UNION, EXCEPT, or INTERSECT required

Which indexes can be used (if any)

How many predicates must be satisfied (WHERE clause)

Which functions must be executed

Whether SQL uses OR or AND

How DBMS handles each component of a SQL statement

How much memory is allocated for the data cache used by the tables in the SQL statement

If the query needs sorting, how much memory is available for sorting

In other words, query analysis breaks down the SQL statement into discrete tasks that must be performed to return the query results.

The modern relational optimizer is cost-based, which means that the optimization process always tries to develop an access path for each query that reduces the overall cost. To achieve this, the optimizer applies a query cost formula to evaluate and weigh multiple factors for each potential access path: CPU costs, Imax O operations, statistics in the system catalog, and actual SQL statement codes.

The optimizer can rewrite the query and convert it to an equivalent version that is easier to compile and optimize. Predicate pushdown and conversion may occur at this time. Then optimize the SQL. Multiple access paths are reviewed and analyzed to select the option with the lowest cost. The final step is to create the actual executable code.

one

Access path

The relational optimizer has many options for creating SQL access paths. At a higher level, there are ways to access data in a single table, and there are ways to combine data in two tables. You can combine these methods into a series of access methods to create an overall access path for SQL statements.

For single table access, you can use scanning or indexing to retrieve data. After the optimizer determines which indexes are available for each predicate, it decides whether to use a single index, multiple indexes, or no indexes at all.

It's easy to say that index access will be better than scan access, but this is not always the case. The optimizer must evaluate the amount of data that must be accessed and the nature of the query. For example, if you are creating a report that contains each row in a table, using an index may be slower than using a scan to read all the data.

Table scanning is the simplest form of data access. Table scans are performed by reading each row of the table. Depending on the DBMS, there may be another type of scan called tablespace scan. The tablespace scans each page in the tablespace, which may contain multiple tables. Obviously, tablespace scans will run more slowly than tablespace scans, because it may result in additional Iripple O reading data that is not applicable.

Another form of scanning is zonal scanning. If DBMS can determine that the data to be accessed exists in some partition of a multi-partitioned table (or tablespace), it can limit the scanned data to the appropriate partition. Partition scans should be better than table scans or tablespace scans because the number of Icano required is reduced.

Typically, the optimizer chooses to scan data for one of the following reasons:

Using an index cannot satisfy the query, perhaps because there is no index available, the predicate does not match the index, or the predicate interferes with the use of the index.

The percentage of eligible rows in the table is high. In this case, using an index may be inefficient because you need to read most of the data rows anyway.

Indexes with matching predicates have a low cluster ratio and are valid for only a small amount of data.

The table is so small that it is actually harmful to use an index. For small tables, adding index access to table access may result in additional Iripple O, rather than fewer.

To help the performance of the scan, the optimizer can invoke data prefetching. Data prefetching causes DBMS to read data pages into the data cache sequentially before requesting them. In essence, data prefetching is a pre-read mechanism-when the data scan starts to request data, it already exists in memory. Prefetch is particularly useful for scanning, but it is useful for any type of sequential data access. You should know how and why a particular DBMS prefetches data.

two

Index access

Most accesses should use indexes, which gives us the option of scanning or indexing access. The optimizer must first discover whether an index exists. Before you write SQL to access columns, you don't have to define an index-you can query any column of any table that the database knows.

In addition, at least one index column must be referenced in an indexable predicate in the SQL statement. DBMS cannot use an index for each WHERE clause. You must know what types of indexes the predicate can use to ensure that the appropriate indexes are created for queries in the database application. Each database management system has a different list of what is indexable and what is not indexable. In addition, indexable content tends to change with each version of DBMS.

The optimizer can choose to use indexes in many different ways. The first and simplest type of index access is direct index lookup. For DBMS to perform a direct index lookup, you must provide a value for each column in the index. To perform a direct index lookup, DBMS compares the value requested in the predicate with the value stored in the root page of the index. Based on this comparison, DBMS will traverse the index to the next page set. If there is an intermediate non-leaf page, read the appropriate non-leaf page and compare the value to determine which leaf page to access. Read the appropriate page; the index page contains pointers to the actual data of the eligible row. Based on the pointer in the page index entry, DBMS reads the appropriate table data page.

However, assume that there are not all the columns that provide the index in the SQL statement. Direct index lookup cannot be selected because DBMS does not match the full index key. Instead, you can choose an index scan. When an index scan is called, the pages of the index are read in turn.

There are two basic types of index scans: matching index scans and mismatched index scans. Matching index scans are sometimes called absolute positioning. The matching index scan starts at the root page of the index and works down to the leaf page in the same way as a direct index lookup. However, because the full key of the index is not available, DBMS must scan the page of the index for available values until all matching values are retrieved.

To use a matching index scan, you must specify a higher-order column in the index key; that is, the first column specified in the index DDL. The high-level column DBMS provides a starting point from the root page to the appropriate leaf page traversal index structure.

Consider the consequences of not specifying higher-order columns in the query. DBMS can deploy mismatched index scans, sometimes called relative positioning. DBMS cannot use the index tree structure when the starting point cannot be determined because the first column in the index key is not specified. However, it can scan index pages. Mismatched index scans start with the first page in the index, and then scan subsequent pages in the order of predicates available.

Mismatched index scans may be more efficient than table or tablespace scans, especially if the data pages that must be accessed are in cluster order. In addition, keep in mind that the index page (or block) contains more entries than the table page, because the index "rows" are shorter than the table rows, making the index page Icano more efficient than scanning the table page.

Summary

In this article, we study the query analysis and access path formula from a high level, and understand the components of query analysis and single table access methods. But there is more to learn. In the next article, we will examine the multi-table access methods that can be used for relational optimization. I hope it will be helpful for you to optimize the performance of SQL.

Discussion on HULK first-line Technology

The official account of technology sharing created by the 360Cloud platform team covers many technical areas, such as cloud computing, database, big data, monitoring, pan-front-end, automated testing, and so on. Through the accumulation of solid technology and rich first-line practical experience, to bring you the best technology sharing.

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.