In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-04-02 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >
Share
Shulou(Shulou.com)05/31 Report--
This article mainly explains "what is the basic knowledge and structure of BRIN index in PostgreSQL". The content in the article is simple and clear, and it is easy to learn and understand. Please follow the editor's train of thought to study and learn "what is the basic knowledge and structure of BRIN index in PostgreSQL".
Brief introduction
BRIN is the abbreviation of block range index Block Range Index. Its design idea is to avoid searching in absolutely mismatched Page, instead of finding matching rows quickly, and to use very little space relative to Btree to obtain relatively good query efficiency. BRIN can create indexes on tables above the TB level with little index and maintenance overhead.
BRIN works well when column values are highly correlated with their physical position in the table, such as serial values, but poorly on column values with little correlation, such as random values.
The working principle of BRIN index is as follows: firstly, the data table is divided into multiple ranges, each range has multiple pages, and the index stores the summary information of range, such as maximum and minimum values. If the query condition contains the column and the condition value is not in the range, the whole range is skipped, otherwise, all rows in all pages need to be scanned to match.
BRIN can be seen as an accelerator for partition & sequential scanning rather than Index.
structure
Page 1 (0 to be exact) stores metadata, which locates the pages with summary information. Each index line on these pages contains summary information for each range.
The pages between meta page and summary information is called reverse range mapping revmap (reverse range map), and revmap is actually an array of TIDs pointers to index rows.
For some ranges, the pointer in revmap does not point to the index row, which means that there is still no summary information.
Index scan
Unlike other AM pointing to data rows, BRIN is not accessed by TID returning rows one by one, but by building bitmap. There are two types of bitmap pages, one is to point to a row precisely, and the other is to execute an imprecise page, which uses an imprecise page.
The scanning algorithm is not complicated, the ranges map is accessed sequentially, the index rows containing range summary information are determined by the pointer, and all the pages of range will be added to the bitmap. As a result, the use of bitmap is the same as that of ordinary bitmap.
Index update
To add a new version line in page, you need to determine the range containing the line and use the ranges mapping to find the index row with summary information. These operations are simple arithmetic operations. For example, if range size is 4, add a row on page 13 with the number of 42 range (starting at 0) and 13 shock 4 = 3, so use a pointer with offset 3 in revmap.
The minimum value of the range is 31 and the maximum value is 40. Because the new value 42 is out of range, the maximum update value is 42, but if the new value is within the range, the index does not need to be updated.
After the index is created, summary information for all available ranges is calculated, but after the table is expanded, the new pages may exceed the limit, and there are two methods available:
1. Do not update the index right away. Update only when you vacuum, or through the function brin_summarize_new_values
two。 If you create an index with the autosummarize option, the index is updated immediately, and this parameter defaults to off.
When a new range emerges, the size of the revmap increases. When growing through other page, the existing line version will be moved to other pages.
No action is required when deleting a row. The maximum and minimum values may no longer exist, but to detect this, you need to traverse all the values in the range, which is not worth it. The correctness of the index does not affect the correctness of the scan, but only has an impact on performance: more pages.PG needs to be scanned than it really is. Brin_desummarize_range and brin_summarize_new_values functions provide manual recalculation summary information, but how can such a requirement be detected? At least traditional methods can't do it.
To update the row, just delete the expired version and add a new one.
Use
Test data:
Testdb=# create table t_brin (id int,randomnum int); CREATE TABLEtestdb=# testdb=# truncate table tweak brinability truncate TABLEtestdb=# insert into t_brin select x random () * 10000000 from generate_series (1meme 10000000) as x insert 0 10000000testdb=# testdb=# create index idx_t_brin_id on t_brin using brin (id); CREATE INDEXtestdb=# create index idx_t_brin_randomnum on t_brin using brin (randomnum); CREATE INDEXtestdb=# create index idx_t_brin_btreeid on t_brin using btree (id) CREATE INDEXtestdb=# testdb=# select pg_size_pretty (pg_table_size ('idx_t_brin_id'')); pg_size_pretty-346 MB (1 row) testdb=# select pg_size_pretty (pg_table_size ('idx_t_brin_id')) Pg_size_pretty-48 kB (1 row) testdb=# select pg_size_pretty (pg_table_size ('idx_t_brin_btreeid')); pg_size_pretty-214 MB (1 row)
The brin index has only 48KB, while the Btree index is 214m, which is 4565 times the size of the BRIN index!
Execute the query on the sequential column:
Testdb=# analyze tasking Brin and ANALZEtestdbac # explain analyze verbose select * from t_brin where id = 102345 QUERY PLAN- -Index Scan using idx_t_brin_btreeid on public.t_brin (cost=0.43..8.45 rows=1 width=8) (actual time=0.221..0.222 rows=1 loops=1) Output: id Randomnum Index Cond: (t_brin.id = 102345) Planning Time: 0.521 ms Execution Time: 0.266 ms (5 rows) testdb=# drop index idx_t_brin_btreeid DROP INDEXtestdb=# explain analyze verbose select * from t_brin where id = 102345 QUERY PLAN- -Bitmap Heap Scan on public.t_brin (cost=12.03..41657.45 rows=1 width=8) (actual time=4.049..7.630 rows=1 loops=1) Output: id Randomnum Recheck Cond: (t_brin.id = 102345) Rows Removed by Index Recheck: 28927 Heap Blocks: lossy=128-> Bitmap Index Scan on idx_t_brin_id (cost=0.00..12.03 rows=28902 width=0) (actual time=0.137..0.137 rows=1280 loops=1) Index Cond: (t_brin.id = 102345) Planning Time: 0.344 ms Execution Time: 7.666 ms (9 rows)
For equivalent query, PG selects Btree index, cost is 8.45and execution time is 0.266 ms.
Delete Btree index, use BRIN index, cost is 41657.45, execution time is 7.666 ms, 28 times as much as Btree. Although it is 1 order of magnitude slower, the absolute time is not long and the space consumed is 3 orders of magnitude less.
Execute the query on a random value column:
Testdb=# explain verbose select * from t_brin where randomnum = 102345 QUERY PLAN-Gather (cost=1000 .00.. 97331.41 rows=2 width=8) Output: id Randomnum Workers Planned: 2-> Parallel Seq Scan on public.t_brin (cost=0.00..96331.21 rows=1 width=8) Output: id, randomnum Filter: (t_brin.randomnum = 102345) (6 rows) testdb=# set enable_seqscan=off SETtestdb=# explain verbose select * from t_brin where randomnum = 102345 QUERY PLAN -Gather (cost=1023.07..97354.49 rows=2 width=8) Output: id Randomnum Workers Planned: 2-> Parallel Bitmap Heap Scan on public.t_brin (cost=23.07..96354.29 rows=1 width=8) Output: id, randomnum Recheck Cond: (t_brin.randomnum = 102345)-> Bitmap Index Scan on idx_t_brin_randomnum (cost=0.00..23.07 rows=9999977 width=0) Index Cond: (t_brin.randomnum = 102345) (8 rows)
PG does not use the BRIN index on this column but chooses full table scan. After disabling sequential scanning, the cost of using brin index is about the same as that of full table scan, indicating that the BRIN index on random value column has no effect.
Testdb=# select attname, correlation from pg_stats where tablename='t_brin' order by correlation desc nulls last; attname | correlation-+-id | 1 randomnum | 0.0016428155 (2 rows)
Query statistics, id column correlation is 1, while random value column correlation is 0.0016428155, the difference is huge.
Thank you for your reading, these are the contents of "what is the basic knowledge and structure of BRIN index in PostgreSQL". After the study of this article, I believe you have a deeper understanding of what the basic knowledge and structure of BRIN index in PostgreSQL is, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.