Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use the Bloom index in PostgreSQL

2025-01-28 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article mainly explains "how to use the Bloom index in PostgreSQL". The content in the article is simple and clear, and it is easy to learn and understand. Please follow the editor's train of thought to study and learn "how to use the Bloom index in PostgreSQL".

Brief introduction

Bloom Index originates from Bloom filter (Bloom filter). The Bloom filter is used to quickly determine whether a value is in the set when using a small amount of space. Its disadvantage is that there is a false positive False Positives, so it needs Recheck to determine whether the value is in the set, but the Bloom filter does not have false negative, that is to say, for a value, if the filter does not exist, it does not exist.

structure

Its structure is shown in the following figure:

The first page is metadata, and then each line will have a bit array (signature) and TID corresponding to it.

Example

Create a data table and insert data

Testdb=# drop table if exists tasking bloom.DROP TABLEtestdb=# CREATE TABLEt _ bloom (id int, dept int, id2 int, id3 int, id4 int, id5 int,id6 int,id7 int,details text, zipcode int) CREATE TABLEtestdb=# testdb=# INSERT INTO t_bloom testdb-# SELECT (random () * 1000000):: int, (random () * 1000000):: int,testdb-# (random () * 1000000):: int, (random () * 1000000):: int,testdb-# (random () * 1000000):: int, (random () * 1000000):: int,md5 (g::text) Floor (random () * (20000-9999 + 1) + 9999) testdb-# from generate_series (1meme 16024024g) g INSERT 0 16777216testdb=# testdb=# analyze tasking bloom.ANALYZEtestDB # testdb=# select pg_size_pretty (pg_table_size ('tweebloom')); pg_size_pretty-1619 MB (1 row)

Create a Btree index

Testdb=# testdb=# create index idx_t_bloom_btree on t_bloom using btree (id,dept,id2,id3,id4,id5,id6,id7,zipcode) CREATE INDEXtestdb=#\ di+ idx_t_bloom_btree List of relations Schema | Name | Type | Owner | Table | Size | Description-+-- -public | idx_t_bloom_btree | index | pg12 | t_bloom | 940 MB | (1 row)

Execute query

Testdb=# EXPLAIN ANALYZE select * from t_bloom where id4 = 305294 and zipcode = 13266 QUERY PLAN -Index Scan using idx_t_bloom_btree on t_bloom (cost=0.56..648832.73 rows=1 width=69) (actual time=2648.215..2648.215 rows=0 loops=1) Index Cond: (id4 = 305294) AND (zipcode = 13266) Planning Time: 3.244 ms Execution Time: 2659.804 ms (4 rows) ) testdb=# EXPLAIN ANALYZE select * from t_bloom where id5 = 241326 and id6 = 354198 QUERY PLAN -Index Scan using idx_t_bloom_btree on t_bloom (cost=0.56..648832.73 rows=1 width=69) (actual time=2365.533..2365.533 rows=0 loops=1) Index Cond: (id5 = 241326) AND (id6 = 354198) Planning Time: 1.918 ms Execution Time: 2365.629 ms (4 rows)

Create a Bloom index

Testdb=# create extension bloom;CREATE EXTENSIONtestdb=# CREATE INDEX idx_t_bloom_bloom ONt _ bloom USING bloom (id, dept, id2, id3, id4, id5, id6, id7, zipcode) testdb-# WITH (length=64, col1=4, col2=4, col3=4, col4=4, col5=4, col6=4, col7=4, col8=4, col9=4) CREATE INDEXtestdb=#\ di+ idx_t_bloom_bloom List of relations Schema | Name | Type | Owner | Table | Size | Description-+-- -public | idx_t_bloom_bloom | index | pg12 | t_bloom | 225 MB | (1 row)

Execute query

Testdb=# EXPLAIN ANALYZE select * from t_bloom where id4 = 305294 and zipcode = 13266 QUERY PLAN -- Bitmap Heap Scan on t_bloom (cost=283084.16..283088.18 rows=1 width=69) (actual time=998.727..998.727 rows=0 loops=1) Recheck Cond: (id4 = 305294) AND (zipcode = 13266) Rows Removed by Index Recheck: 12597 Heap Blocks: exact=12235-> Bitmap Index Scan on idx_t_bloom_bloom (cost=0.00) .. 283084.16 rows=1 width=0) (actual time=234.893..234.893 rows=12597 loops=1) Index Cond: (id4 = 305294) AND (zipcode = 13266) Planning Time: 31.482 ms Execution Time: 998.975 ms (8 rows) testdb=# EXPLAIN ANALYZE select * from t_bloom where id5 = 241326 and id6 = 354198 QUERY PLAN -- Bitmap Heap Scan on t_bloom (cost=283084.16..283088.18 rows=1 width=69) (actual time=1019.621..1019.621 rows=0 loops=1) Recheck Cond: (id5 = 241326) AND (id6 = 354198) Rows Removed by Index Recheck: 13033 Heap Blocks: exact=12633-> Bitmap Index Scan on idx_t_bloom_bloom (cost=0.00) .. 283084.16 rows=1 width=0) (actual time=204.873..204.873 rows=13033 loops=1) Index Cond: (id5 = 241326) AND (id6 = 354198) Planning Time: 0.441 ms Execution Time: 1019.811 ms (8 rows)

From the point of view of the execution result, if there is no non-leading column in the query condition (id1 in the example above), bloom index will outperform btree index in the case of arbitrary combination of multiple columns.

Thank you for reading, the above is the content of "how to use the Bloom index in PostgreSQL". After the study of this article, I believe you have a deeper understanding of how to use the Bloom index in PostgreSQL, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report