What is the PostgreSQL scanning method? 07/06 Update SLTechnology News&Howtos

What is the PostgreSQL scanning method?

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)05/31 Report--

This article mainly explains "what is the PostgreSQL scanning method". The content in the article is simple and clear, and it is easy to learn and understand. Please follow the editor's train of thought to study and learn what the PostgreSQL scanning method is.

Relational databases need to produce an optimal execution plan to minimize the time and resources consumed in the query. Typically, all databases produce an execution plan in the form of a tree: the leaf nodes of the plan tree are called table scan nodes. The query node corresponds to getting data from the base table.

For example, this query: SELECT * FROM TAB1,TAB2 where TAB2.ID > 1000. Suppose the plan tree is as follows:

The plan tree above: "Sequential scan on TBL1" and "Index scan on TBL2" correspond to the table scan methods on table TBL1 and TBL2, respectively. Sequential scan on TBL1: get data sequentially from the corresponding page; index scan: use index scan to access Table 2. Choosing the correct scanning method as part of the plan is important for query performance.

Before delving into the scanning methods of PG, I'll introduce a few important concepts.

HEAP: the storage domain in which the entire row of the table is stored. As shown above, the entire domain is split into multiple pages, and each page size defaults to 8K. In each page, the item pointer (for example, 1J2 in the above page) points to the data within the page.

Index Storage: stores only key values, that is, column values contained in the index. It is also divided into multiple pages, each index page defaults to 8K.

Tuple Identifier (TID): TID is 6 bytes and consists of two parts. The first 4 bytes are page numbers, and the last 2 bytes are intra-page tuple indexes. TID can navigate to specific records.

In the current version, PG supports the following scanning methods: sequential scan, index scan, index coverage scan, bitmap scan, TID scan. Depending on table cardinality, selected tables, disk IO, random IO, sequential IO, and so on, each scanning method is very useful. Let's first create a table and prefabricate the data, and explain these scanning methods.

Postgres=# CREATE TABLE demotable (num numeric, id int); CREATE TABLEpostgres=# CREATE INDEX demoidx ON demotable (num); CREATE INDEXpostgres=# INSERT INTO demotable SELECT random () * 1000, generate_series (1, 1000000); INSERT 0 1000000postgres=# analyze;ANALYZE

In this example, 100 million records are prefabricated and analyze update statistics are performed.

Sequential scanning

As the name implies, a sequential scan of a table is a sequential scan of the item pointers of all pages of the corresponding table. If a table has 100 pages with 1000 records per page, the sequential scan takes 100,1000 records and checks to see if the isolation level and where conditions are matched. Therefore, even if only one record meets the criteria, he will scan 100K records. For the data in the above table, the following query is sequentially scanned because most of the data needs to be selected.

Postgres=# explain SELECT * FROM demotable WHERE num

< 21000; QUERY PLAN-------------------------------------------------------------------- Seq Scan on demotable (cost=0.00..17989.00 rows=1000000 width=15) Filter: (num < '21000'::numeric)(2 rows) 注意，不计算和比较计划耗费，几乎不可能直到选用哪个扫描方法。但是为了使用顺序扫描，至少需要满足以下关键点：谓词部分没有可用的索引键；或者SQL查询获取的行记录占表的大部分。如果只有少数行数据被获取，并且谓词在一个或多个列上，那么久会尝试使用或者不使用索引来评估性能。索引扫描和顺序扫描不同，索引扫描不会顺序获取所有表记录。相反，依赖于不同索引类型并和查询中涉及的索引相对应使用不同的数据结构。然后索引扫描获取的条目直接指向heap域中的数据，然后根据隔离级别判断可见性。因此索引扫描分两步：从索引数据结构中获取数据，返回heap中数据对应的TID；然后定位到对应的heap页直接访问数据。由于以下原因需要执行额外的步骤：查询可能请求可用索引更多的列；索引数据中不维护可见信息，为了判断可见性，需要访问heap数据。此时可能会迷惑，索引扫描如此高效，为什么有时不用呢？原因在于cost。这里的cost涉及IO的类型。索引扫描中，为了获取heap中的对应数据，涉及随机IO；而顺序扫描涉及顺序IO，只有随机IO耗时的1/4。因此只有当顺序IO的代价大于随机IO时，才会选择索引扫描。针对上表和数据，执行下面查询时会使用索引扫描。随机IO代价小，从而查询标记快。 postgres=# explain SELECT * FROM demotable WHERE num = 21000; QUERY PLAN-------------------------------------------------------------------------- Index Scan using demoidx on demotable (cost=0.42..8.44 rows=1 width=15) Index Cond: (num = '21000'::numeric)(2 rows) Index Only Scan 仅索引扫描和索引扫描类似，区别在于第二步，仅仅涉及到扫描索引数据。有两个条件：查询获取的数据只有key列，且该列是索引的一部分；所有获取的数据都是可见的。如下所示： postgres=# explain SELECT num FROM demotable WHERE num = 21000; QUERY PLAN-----------------------------------------------------------------------------Index Only Scan using demoidx on demotable (cost=0.42..8.44 rows=1 Width=11) Index Cond: (num = '21000'::numeric)(2 rows) Bitmap Scan 是索引扫描和顺序扫描的混合体。为了解决索引扫描的缺点并充分利用其优点。正如上面所说，对于索引数据结构中的数据，需要找到heap页中对应的数据。因此需要获取一次索引页，然后获取heap页，从而造成大量随机IO。Bitmap扫描方法平衡了不使用随机IO的索引扫描优点。 Bitmap index scan：首先获取索引数据并为所有TID创建bitmap。为了理解方法，可以认为bitmap包含所有页的哈希（基于页号），每个页的entry包含页内所有偏移的数组。 Bitmap heap scan：从页的bitmap中读取值，然后针对页和偏移扫描数据。最后检查可见性和条件并返回tuple。下面查询使用bitmap扫描，因为他选择的记录很多（比如too much for index scan）但不是大量（too little for sequential scan）。 postgres=# explain SELECT * FROM demotable WHERE num < 210; QUERY PLAN-------------------------------------------------------------------------- Bitmap Heap Scan on demotable (cost=5883.50..14035.53 rows=213042 width=15) Recheck Cond: (num < '210'::numeric) ->

Bitmap Index Scan on demoidx (cost=0.00..5830.24 rows=213042 width=0) Index Cond: (num < '210'::numeric) (4 rows)

Looking at another query, select the same number of records but only index columns. No heap page is required because there is no random IO, so this query chooses index only scan instead of bitmap scan.

Postgres=# explain SELECT num FROM demotable WHERE num < 210, QUERY PLAN---- Index Only Scan using demoidx on demotable (cost=0.42..7784.87 rows=208254 width=11) Index Cond: Num < '210'::numeric) (2 rows)

TID Scan

TID scanning is a very special way in PG, similar to ROWID-based queries in Oracle:

Postgres=# select ctid from demotable where id=21000; ctid- (115. 42) (1 row) postgres=# explain select * from demotable where ctid=' (115. 42)' QUERY PLAN--- Tid Scan on demotable (cost=0.00..4.01 rows=1 width=15) TID Cond: (ctid ='(115 rows 42)':: tid) (2 rows)

In addition, the PG community is discussing other scanning methods: "Loose Index Scan" in MySQL, "index skip scan" in Oracle, and "jump scan" in DB2. This scanning method is used in the specified scenario: the key column values of the selected B-tree index are different. Avoid traversing all equivalent key values, but only iterate through the first unique value and then skip to the next large value. This work PG is under development, also known as "Index skip scan", and this feature can be seen in release in the future.

Thank you for your reading, these are the contents of "what is the PostgreSQL scanning method?" after the study of this article, I believe you have a deeper understanding of what the PostgreSQL scanning method is, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.