Sharing several cycles of a database SQL query 07/11 Update SLTechnology News&Howtos

Sharing several cycles of a database SQL query

2025-07-11 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article focuses on "sharing several cycles of a database SQL query". Interested friends may wish to have a look at it. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "sharing several cycles of database SQL queries".

We use the database, the intuitive feeling is that the client sends a SQL, the database executes the SQL, finds out the data and returns it to the client. But in fact, SQL is transformed and optimized behind the back, and it goes through a lot of "hardships" to get the results back.

As shown in the figure above, we see that the execution engine is entered through the parser and optimizer from the query processor.

Today, let's first look at the query manager, and then focus on how the query optimizer is economical.

Query Manager

This part is the embodiment of database function. In this section, a poorly written query is converted into code that can be executed quickly, then executed, and the results are returned to the client. This process involves several steps:

First, parse whether the query is legal.

Then the query is rewritten, the useless operators are removed, and some pre-optimization is done.

Optimize queries to improve performance and convert queries into execution and data access plans

Compile query plan

Execution

In this part, we won't say much about the last two points, and they are relatively less critical.

Query parser

Each SQL statement goes through the parser to verify that the syntax is correct. If you write wrong, the parser will reject the query. For example, if you make a mistake and write SELECT as SLECT, it will stop here.

In addition, it also checks whether the keyword order is correct.

Then, the table and column names in the query SQL are also analyzed, and the parser checks the following through the metadata of the database:

Whether the table exists or not

Whether the corresponding query field exists in the table

Whether the corresponding operator can be applied to a specified column (for example, you cannot compare a number to a string or use substring for an integer)

After that, it will check whether you have the permission to read or write to the corresponding table in the query. After all, these access permissions are assigned by DBA.

During parsing, the query SQL is converted to the internal representation of the database (usually a tree). If everything is OK, the converted content will be sent to the query "rewriter"

Query Rewriter

In this step, we get an internal representation of the query, and the goal of the rewriter is to:

Pre-optimize the query

Avoid useless operations

Help the optimizer find the best solution

The rewriter executes a series of known rules on the query. If the query conforms to the pattern of a rule, the rule is applied to rewrite the query. The following are (optional) rules:

View merge: if a view is used in a query, the view will be converted with the SQL code for that view.

Subquery leveling: queries with subqueries are difficult to optimize, so the rewriter will try to modify the query or even delete the subquery.

For example

SELECT PERSON.* FROM PERSON WHERE PERSON.person_key IN (SELECT MAILS.person_key FROM MAILS WHERE MAILS.mail LIKE 'christophe%')

Will be replaced by this SQL.

SELECT PERSON.* FROM PERSON, MAILS WHERE PERSON.person_key = MAILS.person_key and MAILS.mail LIKE 'christophe%'

Remove useless operators: if you use DISTINCT but you already have a UNIQUE constraint to ensure that the data is unique, the DISTINCT keyword will be deleted.

Eliminate redundant connections: if you have the same connection condition twice, because a connection condition is hidden in the view, or a useless connection due to transitivity, delete it.

Continuous arithmetic evaluation: if the query is something that needs to be evaluated, it will be evaluated during the rewrite process. For example, convert WHERE AGE > 10 + 2 to WHERE AGE > 12, and then convert TODATE ("date") to a date in datetime format

(advanced) Partition Correction: if you use a partition table, the rewriter can find the partition to use.

(advanced) instantiated view rewriting: if there is already an instantiated view that matches the query subset, the rewriter checks whether the view is up-to-date and modifies the query to use the instantiated view instead of the original table.

(advanced) Custom rules: if you create custom rules to rewrite queries, the rewriter will perform these rules (advanced) Olap transformations: analysis / window functions, star joins, summaries. Both are converted (but whether it is done by the rewriter or the optimizer depends on the database, because the two processes are adjacent).

This rewritten query is sent to the query optimizer, and here comes the interesting part.

Statistics

Before entering the database how to optimize the query, we need to talk about statistics, because without statistics, the database would be stupid. If you do not tell the database to analyze its own data, it will not do so and will make the wrong assumptions.

What kind of information does the database need?

Let's talk a little bit about how databases and operating systems store data. The smallest unit they use is called a page or block (the default is 4 or 8 KB). In other words, if you only need 1 KB, it will take up a page. If the page takes up 8 KB, 7 KB will be wasted.

Going back to statistics, when you ask the database to get statistics, it calculates these contents:

The number of rows or pages in a table

Each column in a table

Separate data content

Length of data (minimum, maximum, average)

Data interval information (minimum, maximum, average)

Index information of the table

These statistics will help the optimizer better estimate disk usage of CPU and memory in queries.

The statistics for each column are important. For example, a PERSON table needs to be joined in two columns LAST_NAME and FIRST_NAME. Through statistics, the database can know how many different values there are in the column FIRST_NAME and how many different values LAST_NAME has. So the database uses LAST_NAME,FIRST_NAME to connect instead of FIRST_NAME,LAST_NAME, because the LAST_NAME is unlikely to be the same and produces less data. In most cases, comparing the first two or three characters of the database with LAST_NAME is sufficient.

Of course, these are basic statistics, and you can also let the database calculate higher-level statistics such as histograms. The most commonly used values, quality, and so on, through this additional information, you can help the database find more efficient query plans, especially such as equivalent queries and range queries. Because the database already knows how many records there are in this case.

These statistics are recorded in the metadata of the database. Therefore, it takes time to update constantly. That's why he doesn't update automatically in most databases.

At this point, I believe you have a deeper understanding of "sharing several cycles of a database SQL query". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.