In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)06/01 Report--
This article is from: http://www.023dns.com/Database_mssql/5974.html
The PostgreSQL optimizer is CBO-based (of course, if GEQO is enabled, GEQO will be used when the number of associated tables exceeds a certain threshold, mainly because the exhaustive method may bring huge PLAN overhead in the case of too many relational tables, so the execution plan of GEQO output is not necessarily optimal)
What this article wants to talk about has nothing to do with GEQO, but mainly related to CBO.
When PostgreSQL uses CBO, is it certain to output the optimal execution plan every time?
1. First, let's take a look at what factors CBO looks at and how does it calculate the cost?
The cost is related to the scanning method, association mode, operator, cost factor, data set, etc. The specific calculation method can be found in the following code:
Src/backend/optimizer/path/costsize.c
Here we briefly list which factors will affect the result of costing. For more information, please see costsize.c:
-- how many records the table has, which affects the COST of the CPU processing records of the full table scan.
The number of data blocks in a table affects the cost of scanning data blocks, such as full table scan and index scan.
Cost factor, the cost factor that affects the calculation of cost; for example, the cost factor of continuously or randomly scanning a single data block, CPU processing the cost factor of a record from the HEAP block, the cost factor of an index record from the INDEX block, and the cost factor of executing an operator or function.
The discreteness of physical order and index order of data storage affects the computational cost of index scanning.
Memory size affects the computational cost of index scanning.
Column statistics (column width, null ratio, unique value ratio, high frequency value and its ratio, bucket, discreteness of physical and index order, array statistics, etc.), influence selectivity, that is, the number of rows in the result set, and ultimately affect the computational cost of index scanning.
-- the cost set when creating a function or operator.
two。 Then let's see which factors are not taken into account by CBO and what other factors are taken into account by CBO, but may change at any time.
Can PostgreSQL keep up with these changes dynamically?
2.1 PostgreSQL enables automatic analyze, and the following factors can be updated in a timely manner:
-- number of data blocks in the table, number of records, update pg_class.relpages, pg_class.reltuples
Column statistics, discreteness of data storage physical order and index order, update pg_statistic
2.2 static configuration factors:
-- can actually be used as cache memory, because other programs may be running in the operating system where the database resides, and the memory available for caching may change. Even if no other programs are running, when work_mem is heavily used in database sessions, the memory available for caching will change.
The cost set when creating a function or operator. When the function changes due to internal SQL or processing logic, the processing time of the function itself may change.
2.3 factors not taken into account:
-- pre-reading of block devices. Generally speaking, the data of 128KB will be pre-read on each read.
# blockdev-- getra / dev/sda
two hundred and fifty six
What's the impact of this? If the data you want to read is in contiguous 128KB blocks, you only need to IO the block device once. For a database, no matter how many blocks are scanned when scanning data, the cost will be calculated, so for different block device read-ahead configurations, or for different block devices (such as mechanical disk and SSD), the scanning cost may be different. The performance of PostgreSQL block equipment is reflected in cost calculation, that is, seq_page_cost, random_page_cost.
These two parameters can be set for tablespaces, that is, different values can be set for different tablespaces. For example, we have tablespaces established in SSD and tablespaces created on ordinary mechanical disks. Of course, we need to set different values of seq_page_cost and random_page_cost.
But for pre-reading, if there is a change, it will have a slight impact on the actual performance, and it should not change the read ahead of the block device all the time.
2.4 generic plan cache, that is, the execution plan cache.
PostgreSQL chooses to reschedule the execution plan or use the cached execution plan through choose_custom_plan. When the cost of cached plan is greater than the average cost of custom, custom plan will be selected, so when the statistical information is correct, the problem of cache execution plan can be found and the new execution plan can be planned in time.
For more information, please see: src/backend/utils/cache/plancache.c
2.5 sampling accuracy parameter default_statistics_target, which affects the number of bucket and sampling accuracy.
After some analysis, is it certain that PostgreSQL will be able to output the best execution plan "every time" when CBO is used?
1. The first choice is to ensure that the cost factor is set accurately, and you also need to turn on automatic analyze (update column statistics, blocks, dispersion, etc.)
two。 Some of the factors that affect cost are statically configured: for example, the memory that can be used as BUFFER, the cost of the function.
3. There is still something to consider: pre-reading (very little).
In most cases, if we set up a reasonable configuration, then we rarely need to use hint. Except for the two points mentioned above in 2 and 3.
At the same time, hint also has serious disadvantages. If you write hint in the program code, once you need to change the execution plan, you also need to change the program code, which is not flexible.
Of course, we don't rule out another starting point for using HINT, such as debugging. I just want to see if the execution efficiency is the same as expected under different implementation plans.
(we can also use the switch to control the execution plan, but it is more direct to have HINT.)
In the long run, it makes sense to constantly improve the optimizer of the database itself only from a performance perspective. But for requirements such as debugging, it is more convenient and right to have HINT.
When you get to the topic, most Oracle users will ask PG if there is a SQL hint after they come into contact with PostgreSQL.
In order to make the database output the execution plan according to the user's idea, PostgreSQL generally provides some switches, such as turning off the full table scan and letting it go to the index.
Turn off index scan, let it go to bitmap or full table scan, close nested loop, let it go to hash join or merge join, etc.
But only with these switches, it is not very easy to use, so is there a direct HINT?
There is a plug-in that can solve your problem: pg_hint_plan.
Pg_hint_plan uses the open hook interface of PostgreSQL, so it does not need to change the PG code to realize the function of injecting HINT.
/ *
* Module load callbacks
* /
Void
_ PG_init (void)
{
...
}
Due to different PostgreSQL versions, the plan code may be inconsistent, so pg_hint_plan is also a sub-version of the source code.
For example, I'm going to test this tool in PostgreSQL 9.4.1.
Let's test it next:
Installation
# wget http://iij.dl.sourceforge.jp/pghintplan/62456/pg_hint_plan94-1.1.3.tar.gz
# tar-zxvf pg_hint_plan94-1.1.3.tar.gz
# cd pg_hint_plan94-1.1.3
[root@db-172-16-3-150 pg_hint_plan94-1.1.3] # export PATH=/opt/pgsql/bin:$PATH
[root@db-172-16-3-150 pg_hint_plan94-1.1.3] # which psql
/ opt/pgsql/bin/psql
[root@db-172-16-3-150 pg_hint_plan94-1.1.3] # psql-V
Psql (PostgreSQL) 9.4.1
# gmake clean
# gmake
# gmake install
[root@db-172-16-3-150 pg_hint_plan94-1.1.3] # ll-rt / opt/pgsql/lib | tail-n 1
-rwxr-xr-x 1 root root 78K Feb 18 09:31 pg_hint_plan.so
[root@db-172-16-3-150 pg_hint_plan94-1.1.3] # su-postgres
$vi $PGDATA/postgresql.conf
Shared_preload_libraries = 'pg_hint_plan'
Pg_hint_plan.enable_hint = on
Pg_hint_plan.debug_print = on
Pg_hint_plan.message_level = log
$pg_ctl restart-m fast
Postgres@db-172-16-3-150-> psql
Psql (9.4.1)
Type "help" for help.
Postgres=# create extension pg_hint_plan
CREATE EXTENSION
Examples of usage:
Postgres=# create table a (id int primary key, info text, crt_time timestamp)
CREATE TABLE
Postgres=# create table b (id int primary key, info text, crt_time timestamp)
CREATE TABLE
Postgres=# insert into a select generate_series (1Jing 100000), 'averse' | | md5 (random ():: text), clock_timestamp ()
INSERT 0 100000
Postgres=# insert into b select generate_series (1Jing 100000), 'baked' | | md5 (random ():: text), clock_timestamp ()
INSERT 0 100000
Postgres=# analyze a
ANALYZE
Postgres=# analyze b
ANALYZE
Postgres=# explain select A. from journal b. * from a month b where a.id=b.id and a.id
QUERY PLAN
Nested Loop (cost=0.58..83.35 rows=9 width=94)
-> Index Scan using a_pkey on a (cost=0.29..8.45 rows=9 width=47)
Index Cond: (id
< 10) ->Index Scan using b_pkey on b (cost=0.29..8.31 rows=1 width=47)
Index Cond: (id = a.id)
(5 rows)
In the absence of pg_hint_plan, we need to use switches to change the execution plan of PostgreSQL
Postgres=# set enable_nestloop=off
SET
Postgres=# explain select A. from journal b. * from a month b where a.id=b.id and a.id
QUERY PLAN
Hash Join (cost=8.56..1616.65 rows=9 width=94)
Hash Cond: (b.id = a.id)
-> Seq Scan on b (cost=0.00..1233.00 rows=100000 width=47)
-> Hash (cost=8.45..8.45 rows=9 width=47)
-> Index Scan using a_pkey on a (cost=0.29..8.45 rows=9 width=47)
Index Cond: (id
< 10) (6 rows) postgres=# set enable_nestloop=on; SET postgres=# explain select a.*,b.* from a,b where a.id=b.id and a.id QUERY PLAN ----------------------------------------------------------------------- Nested Loop (cost=0.58..83.35 rows=9 width=94) ->Index Scan using a_pkey on a (cost=0.29..8.45 rows=9 width=47)
Index Cond: (id
< 10) ->Index Scan using b_pkey on b (cost=0.29..8.31 rows=1 width=47)
Index Cond: (id = a.id)
(5 rows)
Use pg_hint_plan to change the execution plan of PostgreSQL, as follows:
Postgres=# / * +
HashJoin (a b)
SeqScan (b)
* / explain select a.m.m.b.* from arecom b where a.id=b.id and a.id
QUERY PLAN
Hash Join (cost=8.56..1616.65 rows=9 width=94)
Hash Cond: (b.id = a.id)
-> Seq Scan on b (cost=0.00..1233.00 rows=100000 width=47)
-> Hash (cost=8.45..8.45 rows=9 width=47)
-> Index Scan using a_pkey on a (cost=0.29..8.45 rows=9 width=47)
Index Cond: (id
< 10) (6 rows) postgres=# /*+ SeqScan(a) */ explain select * from a where id QUERY PLAN ------------------------------------------------------ Seq Scan on a (cost=0.00..1483.00 rows=10 width=47) Filter: (id < 10) (2 rows) postgres=# /*+ BitmapScan(a) */ explain select * from a where id QUERY PLAN --------------------------------------------------------------------- Bitmap Heap Scan on a (cost=4.36..35.17 rows=9 width=47) Recheck Cond: (id < 10) ->Bitmap Index Scan on a_pkey (cost=0.00..4.36 rows=9 width=0)
Index Cond: (id
< 10) (4 rows) 目前pg_hint_plan支持的HINT http://pghintplan.sourceforge.jp/hint_list.html The available hints are listed below.[reference]
1. Http://pghintplan.sourceforge.jp/pg_hint_plan-en.html
2. Http://pghintplan.sourceforge.jp/pg_hint_plan.html
3. Http://pghintplan.sourceforge.jp/hint_list.html
4. Http://pghintplan.sourceforge.jp/
5. Src/backend/optimizer/path/costsize.c
6. Src/backend/utils/cache/plancache.c
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.