
How to adjust the query cost in the PostgreSQL database

2025-02-22 Update From: SLTechnology News&Howtos


This article explains in detail how to adjust the query cost in the PostgreSQL database. I hope you come away with a working understanding of the topic.

Most databases do not let you change the cost constants used when evaluating the Cost of a query. Suppose my system moves from 10,000 RPM disks to SSDs that deliver 1366 MB/s. How much does this help the database system's query performance?

So what special abilities does PG have in this respect? Let's look. Before we do, note that among these databases PG is the only one that does not let you embed a hint in the statement to force the planner to use, or avoid, an index. (The pg_hint_plan extension solves this problem.)
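As a sketch of that workaround (assuming the pg_hint_plan extension is installed; the table and column names are illustrative), a comment hint placed in front of the statement can force an index scan:

```sql
LOAD 'pg_hint_plan';

-- The /*+ ... */ comment is read by pg_hint_plan, not by core PostgreSQL.
/*+ IndexScan(bloom_table) */
EXPLAIN SELECT * FROM bloom_table WHERE id = 42;
```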

Here's how to check cost in a query
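For example, EXPLAIN prints the planner's estimate without executing the query (bloom_table is the table used throughout this article; the plan line is indicative):

```sql
EXPLAIN SELECT * FROM bloom_table;
-- e.g.  Seq Scan on bloom_table  (cost=0.00..2235.00 rows=100000 width=...)
--                                      ^startup ^total
```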

Let's go a little deeper, using the two figures below. In the first, the startup cost shown in the query execution plan is 0. In the second, the startup cost and the total cost of the plan are close to each other.

In fact, total cost equals startup cost + run cost.

Also, taking the first plan as an example: a sequential scan has no startup cost, only run cost. The total cost is 2235. Where does this 2235 come from?

Let's look at the number of rows in bloom_table and the number of pages it occupies: 100,000 rows spread over 1,235 pages.

Run cost = cpu run cost + disk run cost = (cpu_tuple_cost + cpu_operator_cost) × number of rows + seq_page_cost × number of pages

Since this query has no WHERE clause there are no operators to evaluate, so the cpu_operator_cost term drops out:

0.01 × 100000 + 1.0 × 1235 = 1000 + 1235 = 2235

The values 0.01, 0.0025, and 1.0 come from the parameters shown in the figure above:

seq_page_cost = 1.0

cpu_tuple_cost = 0.01

cpu_operator_cost = 0.0025
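As a sanity check, the same arithmetic can be run against the live settings (the 100,000-row and 1,235-page figures are this article's bloom_table numbers):

```sql
-- cpu_tuple_cost per row returned, seq_page_cost per heap page read.
SELECT current_setting('cpu_tuple_cost')::numeric * 100000
     + current_setting('seq_page_cost')::numeric  * 1235  AS seq_scan_run_cost;
-- With the defaults: 0.01 * 100000 + 1.0 * 1235 = 2235
```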

This also means that the cost of a statement can be changed by adjusting these parameters in the system; other databases generally do not open this up.
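A minimal sketch of that knob in action: raising seq_page_cost within a session immediately changes the estimate EXPLAIN reports.

```sql
SET seq_page_cost = 2.0;            -- IO term becomes 2.0 * 1235 = 2470 instead of 1235
EXPLAIN SELECT * FROM bloom_table;  -- the reported total cost rises accordingly
RESET seq_page_cost;
```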

Let's take a look at how the cost is calculated when the query goes through an index.

Reading through an index does involve a startup cost: the cost of descending the index to the first tuple it reads.

Startup cost (index descent) = {ceil(log2(number of index tuples)) + (H_index + 1) × 50} × cpu_operator_cost

Plugging in the figures: {ceil(log2 100000) + (2 + 1) × 50} × 0.0025 = (17 + 150) × 0.0025 = 0.4175, which rounds to 0.42.
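The same plug-in can be done in SQL (100,000 index tuples and H_index = 2 are the example's figures):

```sql
SELECT (ceil(log(2, 100000::numeric)) + (2 + 1) * 50) * 0.0025 AS index_startup_cost;
-- ceil(log2 100000) = 17, so (17 + 150) * 0.0025 = 0.4175
```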

One thing to flag here: H_index refers to the height of the index's B-tree. In fact, you can run this formula in reverse to deduce how tall your index tree is.
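Alternatively, the tree height can be read directly with the pageinspect contrib extension (bloom_table_idx is a hypothetical index name):

```sql
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- 'level' in the B-tree meta page is the height of the root, i.e. H_index.
SELECT level FROM bt_metap('bloom_table_idx');
```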

Run cost = (CPU cost of the index + CPU cost of the table) + (IO cost of the index + IO cost of the table)

Calculating the cost of an index scan involves selectivity: an estimate of how often the query's predicates are satisfied.

In effect, the planner computes a "sampling rate" for the value in the SQL statement's predicate: what percentage of the whole table carries that value.
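The statistics behind this estimate are visible in the pg_stats view (table and column names here follow the article's example):

```sql
-- most_common_freqs gives the fraction of the table taken by each common value;
-- n_distinct feeds the selectivity estimate for everything else.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM   pg_stats
WHERE  tablename = 'bloom_table' AND attname = 'id';
```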

Because the full calculation is fiddly, I won't work it through here, but note the following. random_page_cost = 4.0 is the scalar used to compute index_io_cost when a query uses an index: the IO cost of the index = selectivity × number of index pages × random_page_cost. When deciding between the index and a full table scan, the planner compares the two plans' costs; for the full table scan it computes minimum and maximum IO costs, for example max io_cost = number of pages × random_page_cost.

So adjusting the value of random_page_cost influences the choice between an index scan and a full table scan.
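A sketch of that effect (illustrative names; the exact tipping point depends on the table's statistics): inflating random_page_cost makes index IO look expensive enough that the planner may fall back to a sequential scan.

```sql
EXPLAIN SELECT * FROM bloom_table WHERE id = 42;  -- normally an Index Scan

SET random_page_cost = 400;   -- make every random page read look very costly
EXPLAIN SELECT * FROM bloom_table WHERE id = 42;  -- may now show a Seq Scan
RESET random_page_cost;
```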

Here is an example. In the configuration file I adjust random_page_cost and cpu_index_tuple_cost, one smaller and one larger. You can see the result in the figure below: even though the table has 100,000 rows, every value in the filtered column is distinct, and a suitable index exists, the planner still chooses a full table scan.

After restoring the two parameters, the plan goes back to the original index scan.

Having said all that, back to the question at the beginning: if your disk system has been replaced with SSDs, some of these values need to change; otherwise there will be cases where an index would serve the query very well, yet the planner still chooses a full table scan.
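A common community recommendation for SSD-backed storage (an assumption to validate against your own workload, not an official rule) is to bring random_page_cost down near seq_page_cost, since random reads are no longer dramatically slower than sequential ones:

```sql
ALTER SYSTEM SET random_page_cost = 1.1;  -- default is 4.0, tuned for spinning disks
SELECT pg_reload_conf();                  -- apply without a restart
```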

That is all on adjusting the query cost in the PostgreSQL database. I hope the content above is of some help.
