B-tree index of PostgreSQL 07/06 Update SLTechnology News&Howtos

B-tree index of PostgreSQL

2025-07-06 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Shulou(Shulou.com)06/01 Report--

structure

B-tree indexes are suitable for storing sorted data. For this data type, you need to define greater than, greater than or equal to, less than, less than or equal to operators.

Typically, B-tree 's index records are stored in data pages. The records in the leaf pages contain index data (keys) and pointers to the heap tuple records (that is, the row records TIDs of the table). Records in internal pages contain pointers to indexed subpages and minimum values in subpages.

B-tree has several important features:

1. B-tree is a balanced tree, that is, there are the same number of internal pages between each leaf page and the root page. So the time to query any value is the same.

2. A node in B-tree has multiple branches, that is, each page (usually 8KB) has many TIDs. Therefore, the height of the B-tree is relatively low, usually 4 to 5 layers can store a large number of row records.

3. The data in the index is stored in a non-decreasing order (between and within pages), and the data pages of the same level are connected by a two-way linked list. So instead of returning root every time, you can get an ordered dataset by traversing the linked list.

Here is a simple example of an index that stores records as integers and has only one field:

The top page of the index is the metadata page, which stores information about the index root page. The internal node is under the root, and the leaf page is at the bottom layer. The down arrow indicates that the leaf node points to the table record (TIDs).

Equivalent query

For example, the value of 49 is queried by a conditional query in the form of "indexed-field = expression".

The root node has three records: (4pm 32pr 64). Start the search from the root node, because 32 ≤ 49

< 64，所以选择32这个值进入其子节点。通过同样的方法继续向下进行搜索一直到叶子节点，最后查询到49这个值。实际上，查询算法远不止看上去的这么简单。比如，该索引是非唯一索引时，允许存在许多相同值的记录，并且这些相同的记录不止存放在一个页中。此时该如何查询？我们返回到上面的的例子，定位到第二层节点(32,43,49)。如果选择49这个值并向下进入其子节点搜索，就会跳过前一个叶子页中的49这个值。因此，在内部节点进行等值查询49时，定位到49这个值，然后选择49的前一个值43，向下进入其子节点进行搜索。最后，在底层节点中从左到右进行搜索。 (另外一个复杂的地方是，查询的过程中树结构可能会改变，比如分裂) 非等值查询通过"indexed-field ≤ expression" (or "indexed-field ≥ expression")查询时，首先通过"indexed-field = expression"形式进行等值（如果存在该值）查询，定位到叶子节点后，再向左或向右进行遍历检索。下图是查询 n ≤ 35的示意图：大于和小于可以通过同样的方法进行查询。查询时需要排除等值查询出的值。范围查询范围查询"expression1 ≤ indexed-field ≤ expression2"时，需要通过 "expression1 ≤ indexed-field =expression2"找到一匹配值，然后在叶子节点从左到右进行检索，一直到不满足"indexed-field ≤ expression2" 的条件为止；或者反过来，首先通过第二个表达式进行检索，在叶子节点定位到该值后，再从右向左进行检索，一直到不满足第一个表达式的条件为止。下图是23 ≤ n ≤ 64的查询示意图: 案例下面是一个查询计划的实例。通过demo database中的aircraft表进行介绍。该表有9行数据，由于整个表只有一个数据页，所以执行计划不会使用索引。为了解释说明问题，我们使用整个表进行说明。 demo=# select * from aircrafts; aircraft_code | model | range---------------+---------------------+------- 773 | Boeing 777-300 | 11100 763 | Boeing 767-300 | 7900 SU9 | Sukhoi SuperJet-100 | 3000 320 | Airbus A320-200 | 5700 321 | Airbus A321-200 | 5600 319 | Airbus A319-100 | 6700 733 | Boeing 737-300 | 4200 CN1 | Cessna 208 Caravan | 1200 CR2 | Bombardier CRJ-200 | 2700(9 rows)demo=# create index on aircrafts(range);demo=# set enable_seqscan = off; （更准确的方式：create index on aircrafts using btree(range)，创建索引时默认构建B-tree索引。）等值查询的执行计划： demo=# explain(costs off) select * from aircrafts where range = 3000; QUERY PLAN --------------------------------------------------- Index Scan using aircrafts_range_idx on aircrafts Index Cond: (range = 3000)(2 rows) 非等值查询的执行计划： demo=# explain(costs off) select * from aircrafts where range < 3000; QUERY PLAN --------------------------------------------------- Index Scan using aircrafts_range_idx on aircrafts Index Cond: (range < 3000)(2 rows) 范围查询的执行计划： demo=# explain(costs off) select * from aircraftswhere range between 3000 and 5000; QUERY PLAN ----------------------------------------------------- Index Scan using aircrafts_range_idx on aircrafts Index Cond: ((range >

= 3000) AND (range Seq Scan on aircrafts (3 rows)

(note that the final execution plan selects sequential scans, ignoring the previously set enable_seqscan = off. Because this setting does not abandon the table scan, it just sets its cost-view costs on's execution plan)

If an index is used, specify the direction of sorting when creating the index:

Demo=# create index aircrafts_case_asc_model_desc_idx on aircrafts ((case when range

< 4000 then 1 when range < 10000 then 2 else 3 end) ASC, model DESC); demo=# explain(costs off)select class, model from aircrafts_v order by class ASC, model DESC; QUERY PLAN ----------------------------------------------------------------- Index Scan using aircrafts_case_asc_model_desc_idx on aircrafts(1 row)列的顺序当使用多列索引时与列的顺序有关的问题会显示出来。对于B-tree，这个顺序非常重要：页中的数据先以第一个字段进行排序，然后再第二个字段，以此类推。下图是在range和model列上构建的索引：当然，上图这么小的索引在一个root页足以存放。但是为了清晰起见，特意将其分成几页。从图中可见，通过类似的谓词class = 3（仅按第一个字段进行搜索）或者class = 3 and model = 'Boeing 777-300'（按两个字段进行搜索）将非常高效。然而，通过谓词model = 'Boeing 777-300'进行搜索的效率将大大降低：从root开始，判断不出选择哪个子节点进行向下搜索，因此会遍历所有子节点向下进行搜索。这并不意味着永远无法使用这样的索引----它的效率有问题。例如，如果aircraft有3个classes值，每个class类中有许多model值，此时不得不扫描索引1/3的数据，这可能比全表扫描更有效。但是，当创建如下索引时： demo=# create index on aircrafts( model, (case when range < 4000 then 1 when range < 10000 then 2 else 3 end)); 索引字段的顺序会改变：通过这个索引，model = 'Boeing 777-300'将会很有效，但class = 3则没这么高效。 NULLs PostgreSQL的B-tree支持在NULLs上创建索引，可以通过IS NULL或者IS NOT NULL的条件进行查询。考虑flights表，允许NULLs： demo=# create index on flights(actual_arrival);demo=# explain(costs off) select * from flights where actual_arrival is null; QUERY PLAN ------------------------------------------------------- Bitmap Heap Scan on flights Recheck Cond: (actual_arrival IS NULL) ->

Bitmap Index Scan on flights_actual_arrival_idx Index Cond: (actual_arrival IS NULL) (4 rows)

The NULLs is located at one end or the other end of the leaf node, depending on how the index is created (NULLS FIRST or NULLS LAST). This is important if the query includes sorting: if the SELECT statement specifies in the ORDER BY clause that the sequential index of NULLs is built in the same order (NULLS FIRST or NULLS LAST), the entire index can be used.

In the following example, they are in the same order, so indexes can be used:

Demo=# explain (costs off) select * from flights order by actual_arrival NULLS LAST; QUERY PLAN-Index Scan using flights_actual_arrival_idx on flights (1 row)

In the following example, in a different order, the optimizer selects a sequential scan and then sorts it:

Demo=# explain (costs off) select * from flights order by actual_arrival NULLS FIRST; QUERY PLAN-- Sort Sort Key: actual_arrival NULLS FIRST-> Seq Scan on flights (3 rows)

The NULLs must be at the beginning to use the index:

Demo=# create index flights_nulls_first_idx on flights (actual_arrival NULLS FIRST); demo=# explain (costs off) select * from flights order by actual_arrival NULLS FIRST; QUERY PLAN-Index Scan using flights_nulls_first_idx on flights (1 row)

Problems like this are caused by NULLs rather than unsorted, which means that the results of NULL and other comparisons are unpredictable:

Demo=#\ pset null NULLdemo=# select null < 42;? column?- NULL (1 row)

This runs counter to the concept of B-tree and does not conform to the general pattern. However, NULLs plays an important role in the database, so special settings have to be made for NULL.

Because NULLs can be indexed, indexes can be used even if there are no tags on the table. Because this index contains all the information about the navigation record of the table. If the query requires sorted data, and the index ensures the desired order, then this may be meaningful. In this case, the query plan tends to get the data through the index.

Attribute

The features of the btree access method are described below.

As you can see, B-tree can sort data and support uniqueness. Multi-column indexes are also supported, but other access methods also support such indexes. We will discuss the EXCLUDE condition next time.

The Btree access method can obtain data in two ways: index scan and bitmap scan. As you can see, tree allows you to traverse forward and backward.

The first four properties specify how specific columns are precisely sorted. In this case, the values are sorted in ascending order (asc) and NULLs is followed by (nulls_last). There can also be other combinations.

The features of search_array support expressions such as:

Demo=# explain (costs off) select * from aircrafts where aircraft_code in ("733", "763", "763", "773") QUERY PLAN-Index Scan using aircrafts_pkey on aircrafts Index Cond: (aircraft_code = ANY ('{733763773}':: Bpchar [])) (2 rows)

The returnable attribute supports index-only scan, which is reasonable because the index itself also stores index values. The following is a brief introduction to the B-tree-based overlay index.

Unique index with additional columns

As discussed earlier: the override index contains all the values required by the query and does not need to go back to the table. A unique index can be an overlay index.

Assuming that the columns needed by our query are added to the unique index, the new combined unique key may no longer be unique, and two indexes will be required on the same column: one unique to support integrity constraints, and the other non-unique to overwrite the index. This is, of course, inefficient.

In our company Anastasiya Lubennikova @ lubennikovaav has improved btree so that additional non-unique columns can be included in a unique index. We hope this patch will be adopted by the community. In fact, PostgreSQL11 has already incorporated the patch.

Consider table bookings:

Demo=# begin;demo=# alter table bookings drop constraint bookings_pkey cascade;demo=# alter table bookings add primary key using index bookings_pkey2;demo=# alter table tickets add foreign key (book_ref) references bookings (book_ref); demo=# commit

Then the table structure:

Demo=#\ d bookings Table "bookings.bookings" Column | Type | Modifiers-+--+- book_ref | character (6) | not null book_date | timestamp with time zone | not null Total_amount | numeric (10jue 2) | not nullIndexes: "bookings_pkey2" PRIMARY KEY Btree (book_ref) INCLUDE (book_date) Referenced by:TABLE "tickets" CONSTRAINT "tickets_book_ref_fkey" FOREIGN KEY (book_ref) REFERENCES bookings (book_ref)

At this point, the index can work as a unique index or as an overlay index:

Demo=# explain (costs off) select book_ref, book_date from bookings where book_ref = '059FC4indexes; QUERY PLAN-Index Only Scan using bookings_pkey2 on bookings Index Cond: (book_ref =' 059FC4'::bpchar) (2 rows) create an index

It is well known that for large tables, it is best to load data without an index; create the index after the load is complete. This will not only improve efficiency but also save space.

Creating an B-tree index is more efficient than inserting data into the index. All the data is roughly sorted and the leaf pages of the data have been created, and then you just need to build the inner page until the root page is built into a complete B-tree.

The speed of this method depends on the size of RAM and is limited by the parameter maintenance_work_mem. Therefore, increasing the parameter value can increase the speed. For unique indexes, in addition to allocating memory for maintenance_work_mem, memory of work_mem is also allocated.

Compare

Earlier, it was mentioned that PG needs to know which function to call for different types of values, and this associated method is stored in the hash access method. Again, the system must figure out how to sort it. This is involved in sorting, grouping (sometimes), and merge join. PG does not bind itself to operator names because users can customize their data types and give different operator names.

For example, the comparison operator in the bool_ops operator set:

Postgres=# select amop.amopopr::regoperator as opfamily_operator, amop.amopstrategyfrom pg_am am, pg_opfamily opf, pg_amop amopwhere opf.opfmethod = am.oidand amop.amopfamily = opf.oidand am.amname = 'btree'and opf.opfname =' bool_ops'order by amopstrategy Opfamily_operator | amopstrategy-+- (boolean,boolean) | 5 (5 rows)

You can see that there are five operators, but you should not rely on their names. In order to specify which operators do what, the concept of policy is introduced. To describe operator semantics, five strategies are defined:

1-less

2-less or equal

3-equal

4-greater or equal

5-greater

Postgres=# select amop.amopopr::regoperator as opfamily_operatorfrom pg_am am, pg_opfamily opf, pg_amop amopwhere opf.opfmethod = am.oidand amop.amopfamily = opf.oidand am.amname = 'btree'and opf.opfname =' integer_ops'and amop.amopstrategy = 1order by opfamily_operator; pfamily_operator--

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.