How to write better SQL queries: the Ultimate Guide-part 2

2025-07-03 Update From: SLTechnology News&Howtos

In the previous article, we looked at how SQL queries are executed and what to watch out for when writing them.

In this part, we take a closer look at query approaches and query optimization.

Set-based versus procedural approaches to querying

Implicit in the anti-patterns covered in the previous article is the difference between the set-based and the procedural approach to building queries.

The procedural approach to querying is much like programming: you tell the system what to do and how to do it. Examples include the anti-patterns from the previous article, where you query the database by executing one function and then calling another, or where you reach the final result through logic that contains loops, conditions, and user-defined functions (UDFs). With this approach you often find yourself requesting one subset of the data, then another subset, and so on. This is why it is often called step-by-step or row-by-row querying.

The other is the set-based approach, where you only specify what to do. Your job is to state the conditions and requirements for the result set you want from the query. How the data is retrieved is left to the database: the engine determines the best algorithms and processing logic to execute the query.

Because SQL is set-based, this approach is usually more efficient than the procedural one, which also explains why SQL can, in some cases, work faster than application code.

The set-based approach is also a skill you are expected to master in the data mining and analysis industry, and you will often need to switch between the two approaches. If you find a procedural pattern in one of your queries, consider whether that part should be rewritten.
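To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than the PostgreSQL setup used later in this article. The orders table, its columns, and its rows are hypothetical and exist only for illustration. Both functions compute the same per-customer totals: the first goes row by row from application code, the second states the desired result and leaves the "how" to the engine.

```python
import sqlite3

# Hypothetical table and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 10), ("bob", 5), ("ann", 7), ("bob", 1), ("cy", 3)],
)


def totals_procedural(conn):
    """Row-by-row: list the customers, then run one query per customer."""
    totals = {}
    customers = conn.execute("SELECT DISTINCT customer FROM orders").fetchall()
    for (customer,) in customers:
        total = 0
        rows = conn.execute(
            "SELECT amount FROM orders WHERE customer = ?", (customer,)
        )
        for (amount,) in rows:
            total += amount
        totals[customer] = total
    return totals


def totals_set_based(conn):
    """Set-based: describe the result set and let the engine compute it."""
    rows = conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    )
    return dict(rows.fetchall())


procedural = totals_procedural(conn)
set_based = totals_set_based(conn)
print(procedural == set_based)
```

The procedural version issues one query per customer plus the initial lookup, while the set-based version issues a single statement that the engine is free to execute however it sees fit.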

From query to execution plan

Anti-patterns are not static: they evolve as you grow as a SQL developer, so avoiding query anti-patterns and rewriting queries can be a difficult task. That is why it often helps to use tools that let you optimize your queries in a more structured way.

Thinking about performance requires not only a more structured approach, but also a more in-depth one.

This structured, in-depth approach is mainly based on the query plan. The query is first parsed into a "parse tree"; the resulting query plan defines exactly which algorithm is used for each operation and how the execution of the operations is coordinated.

Query optimization

When optimizing a query, you will most likely need to manually examine the plan generated by the optimizer. In that case, you analyze your query again by looking at its query plan.

To get hold of this plan, you need to use the tools your database management system provides. Tools you may have at your disposal include:

Some packages include tools that generate a graphical representation of the query plan.

Other tools provide a textual description of the query plan.

Note that PostgreSQL distinguishes between EXPLAIN, which only describes how the planner intends to execute the query without running it, and EXPLAIN ANALYZE, which actually executes the query and reports the estimated plan alongside what actually happened. In general, an actual execution plan is produced by really running the query, whereas an estimated execution plan is worked out without executing it. The two are logically equivalent, but the actual execution plan is more useful because it contains additional details and statistics about what really happened when the query ran.

Next you will learn more about EXPLAIN and ANALYZE, and how to use these two commands to learn more about your query plan and query performance. The examples use two tables: one_million and half_million.

You can use EXPLAIN to retrieve the current information about the one_million table: place it at the very start of the query, and running the statement returns the query plan:

EXPLAIN
SELECT *
FROM one_million;

QUERY PLAN
____________________________________________________
Seq Scan on one_million
(cost=0.00..18584.82 rows=1025082 width=36)
(1 row)

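In the example above, the estimated cost of the query is 0.00..18584.82 (startup cost and total cost, in the planner's arbitrary units), the estimated number of rows is 1025082, and the estimated average row width is 36 bytes.

You can also use ANALYZE to update the statistics that the planner relies on. After the update, the row estimate below matches the actual table size: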
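If you want to compare such estimates across plans, the numbers can be pulled out of the plan text. The following is only a sketch in Python: the function name parse_estimate and the dictionary keys are made up for this example, and the pattern covers only the estimate format shown in the plan above.

```python
import re

# Matches the estimate part of a PostgreSQL plan node, for example
# "(cost=0.00..18584.82 rows=1025082 width=36)".
ESTIMATE_RE = re.compile(
    r"\(cost=(?P<startup>\d+\.\d+)\.\.(?P<total>\d+\.\d+)"
    r" rows=(?P<rows>\d+) width=(?P<width>\d+)\)"
)


def parse_estimate(plan_line):
    """Return the planner's estimates from one plan line, or None if absent."""
    match = ESTIMATE_RE.search(plan_line)
    if match is None:
        return None
    return {
        "startup_cost": float(match["startup"]),  # cost before the first row
        "total_cost": float(match["total"]),      # cost to return all rows
        "rows": int(match["rows"]),               # estimated row count
        "width_bytes": int(match["width"]),       # estimated average row width
    }


line = "Seq Scan on one_million  (cost=0.00..18584.82 rows=1025082 width=36)"
estimate = parse_estimate(line)
print(estimate)
```

For anything beyond a quick check, PostgreSQL can also emit plans in structured formats such as EXPLAIN (FORMAT JSON), which avoids parsing text altogether.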

ANALYZE one_million;

EXPLAIN
SELECT *
FROM one_million;

QUERY PLAN
____________________________________________________
Seq Scan on one_million
(cost=0.00..18334.00 rows=1000000 width=37)
(1 row)

In addition to EXPLAIN and ANALYZE, you can also use EXPLAIN ANALYZE to retrieve the actual execution time:

EXPLAIN ANALYZE
SELECT *
FROM one_million;

QUERY PLAN
___________________________________________________________
Seq Scan on one_million
(cost=0.00..18334.00 rows=1000000 width=37)
(actual time=0.015..1207.019 rows=1000000 loops=1)
Total runtime: 2320.146 ms
(2 rows)

The downside of EXPLAIN ANALYZE is that the query is actually executed, so use it with care!

The only algorithm we have seen so far is the sequential scan (Seq Scan), also called a full table scan: every row of the table is read in sequential (serial) order, and each row is checked against the query's conditions. In terms of performance, a sequential scan is usually not the best execution plan, because the entire table has to be read. It is not always bad, though: sequential reads are fairly fast even on slow disks.

There are other algorithms as well. Take, for example, this query plan for a join:

EXPLAIN ANALYZE
SELECT *
FROM one_million JOIN half_million
ON (one_million.counter = half_million.counter);

QUERY PLAN
_________________________________________________________________
Hash Join (cost=15417.00..68831.00 rows=500000 width=42)
(actual time=1241.471..5912.553 rows=500000 loops=1)
Hash Cond: (one_million.counter = half_million.counter)
  -> Seq Scan on one_million
  (cost=0.00..18334.00 rows=1000000 width=37)
  (actual time=0.007..1254.027 rows=1000000 loops=1)
  -> Hash (cost=7213.00..7213.00 rows=500000 width=5)
  (actual time=1241.251..1241.251 rows=500000 loops=1)
  Buckets: 4096 Batches: 16 Memory Usage: 770kB
    -> Seq Scan on half_million
    (cost=0.00..7213.00 rows=500000 width=5)
    (actual time=0.008..601.128 rows=500000 loops=1)
Total runtime: 6468.337 ms
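The plan above builds a hash table from half_million (the Hash node) and probes it with the rows of one_million. The idea behind a hash join can be sketched in a few lines of Python. This is a simplified in-memory illustration, not PostgreSQL's implementation: it ignores batching and spilling to disk (the plan above reports 16 batches), and the sample rows and column names other than counter are invented.

```python
def hash_join(build_rows, probe_rows, key):
    """Join two lists of dicts on `key` using an in-memory hash table."""
    # Build phase: index the (ideally smaller) input by its join key.
    table = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)

    # Probe phase: look up each row of the other input in the hash table.
    joined = []
    for row in probe_rows:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical miniature versions of the two tables.
half = [{"counter": 2, "h": "x"}, {"counter": 4, "h": "y"}]
one = [
    {"counter": 1, "o": "a"},
    {"counter": 2, "o": "b"},
    {"counter": 3, "o": "c"},
    {"counter": 4, "o": "d"},
]

result = hash_join(half, one, "counter")
print(result)
```

Each input is read once: one pass to build the table and one pass to probe it, which is why neither input needs to be sorted or indexed for this algorithm.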

We can see that the query optimizer chose a Hash Join. Keep this operation in mind, because we will need it to assess the time complexity of the query. Note that there is no index on half_million.counter in the example above; we add one in the next example:

CREATE INDEX ON half_million (counter);

EXPLAIN ANALYZE
SELECT *
FROM one_million JOIN half_million
ON (one_million.counter = half_million.counter);

QUERY PLAN
_________________________________________________________________
Merge Join (cost=4.12..37650.65 rows=500000 width=42)
(actual time=0.033..3272.940 rows=500000 loops=1)
Merge Cond: (one_million.counter = half_million.counter)
  -> Index Scan using one_million_counter_idx on one_million
  (cost=0.00..32129.34 rows=1000000 width=37)
  (actual time=0.011..694.466 rows=500001 loops=1)
  -> Index Scan using half_million_counter_idx on half_million
  (cost=0.00..14120.29 rows=500000 width=5)
  (actual time=0.010..683.674 rows=500000 loops=1)
Total runtime: 3833.310 ms
(5 rows)

With the index in place, the query optimizer now chooses a Merge Join whose inputs are read through index scans.
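A merge join relies on both inputs arriving sorted on the join key, which is what the two index scans provide here. The following Python sketch shows the basic idea under that assumption. It is an illustration only, not PostgreSQL's implementation, and the sample rows and column names other than counter are invented.

```python
def merge_join(left, right, key):
    """Join two lists of dicts that are each already sorted by `key`."""
    joined = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1  # left row has no partner yet, advance left
        elif left_key > right_key:
            j += 1  # right row has no partner yet, advance right
        else:
            # Find the run of right rows that share this key ...
            j_end = j
            while j_end < len(right) and right[j_end][key] == left_key:
                j_end += 1
            # ... and pair it with every left row that has the same key.
            while i < len(left) and left[i][key] == left_key:
                for right_row in right[j:j_end]:
                    joined.append({**left[i], **right_row})
                i += 1
            j = j_end
    return joined


# Hypothetical inputs, both sorted by "counter".
one = [
    {"counter": 1, "o": "a"},
    {"counter": 2, "o": "b"},
    {"counter": 2, "o": "c"},
    {"counter": 4, "o": "d"},
]
half = [
    {"counter": 2, "h": "x"},
    {"counter": 3, "h": "y"},
    {"counter": 4, "h": "z"},
]

result = merge_join(one, half, "counter")
print(result)
```

Note that the loop stops as soon as either input is exhausted. That behavior would be consistent with the plan above, where the index scan on one_million reports only 500001 actual rows even though 1000000 were estimated.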

Note the difference between an index scan and a full table scan (sequential scan): the latter (also known as a "table scan") reads every row in the table, while the former uses the index to locate the appropriate rows and reads only the index and table pages it needs.
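You can observe the same switch from a full scan to an index lookup without a PostgreSQL server. The sketch below uses SQLite through Python's built-in sqlite3 module, so the statement is SQLite's EXPLAIN QUERY PLAN rather than PostgreSQL's EXPLAIN, and the output format differs from the plans shown above. The table demo_rows, its columns, and its contents are hypothetical.

```python
import sqlite3

# Hypothetical table with 1000 rows, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_rows (counter INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO demo_rows VALUES (?, ?)",
    [(n, "row %d" % n) for n in range(1000)],
)


def plan_details(conn, sql):
    """Return the textual detail column of SQLite's EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable description is the last column of each plan row.
    return [row[-1] for row in rows]


query = "SELECT * FROM demo_rows WHERE counter = 42"

before = plan_details(conn, query)  # no index yet: full table scan
conn.execute("CREATE INDEX demo_rows_counter_idx ON demo_rows (counter)")
after = plan_details(conn, query)   # index available: index lookup

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should report a SCAN of the table and the second a SEARCH that uses demo_rows_counter_idx.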

That's all for the second part of the tutorial. Stay tuned for the last article in the series "How to write better SQL queries".

Original link: http://www.kdnuggets.com/2017/08/write-better-sql-queries-definitive-guide-part-2.html

When reprinting, please credit the source: GrapeCity Controls
