What is the use of BLOOM INDEX in PostgreSQL

2025-04-10 Update. From: SLTechnology News&Howtos, Shulou (Shulou.com), 06/01 report

This article explains what the bloom index in PostgreSQL is for and when it is worth using.

I want to talk about Bloom filters, so why this title? Generally speaking, indexing a large table is a headache when the queries against it filter on many columns. PostgreSQL has an index type called the bloom index. What are its advantages? Let's take a look.

First, what is a Bloom filter? I read some material online that is very well written, full of advanced talk about X values, K hash functions, limits and so on. If I wrote the same way, I suspect many readers would close the page.

So I will explain it in plain terms. Some of it may be imprecise, and experts are welcome to correct me.

Take a talent show as an example, full of "fresh meat" and "little fairies" (young male and female idols), with three judges: Ke Yimin, Jin Xing and Feng Xiaogang. The first contestant to come in is Li Yugang.

Ke Yimin says yes, Jin Xing says yes, and Feng Xiaogang says get out.

On our score sheet that becomes:

1 1 0

Next comes a little fairy, Li Yuchun. Ke Yimin says no, Jin Xing says no, and Feng Xiaogang says yes.

This time the score sheet reads:

0 0 1

And so on: Wu Yifan is 101, Shi Yan is 100. We can use these numbers to identify a person, or at least a group of similar people.

But the identification does not always hold. Suppose the next contestant is Zhou Shen, and his score is 110, the same as Li Yugang's.

Then it would be wrong to conclude that 110 means Li Yugang. (Note: the scores are the same because both sing in a very high register, and the judges' votes alone cannot tell them apart.)

Now let's leave the entertainment world and return to Bloom filters. Each value is run through N hash functions, and each hash marks one position in a list of bits. A value is therefore identified by the combination of positions its hashes produce, which is the essence of a Bloom filter. As with the score sheet, two different values can end up with the same marks, so a match is never conclusive. Exclusion, however, is 100% reliable: if the marks do not match, the value is definitely not the one you are looking for.

Picture it as a diagram: we have a set of values at the top and, below them, a list of bits that starts out as all zeros. Each value is run through several hash functions, and every hash result sets one position of the list to 1. After VALUE1 has been hashed, the list reads, for example, 1000000100010010000001.

Of course, positions will very likely repeat: a different hash function, or a later value such as VALUE2, may want to write a 1 where there is already a 1. In that case nothing changes and the bit simply stays 1. After VALUES 1, 2, 3 and 4 have been processed in turn, the list ends up as 1010010101010101010011101.
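The bit-setting step described above can be sketched in Python. This is a minimal illustration, not PostgreSQL's implementation: the list length of 32 bits, the three hash functions derived from salted SHA-256, and the names `bit_positions` and `add` are all assumptions made for the example.

```python
import hashlib

NUM_BITS = 32    # length of the bit list (assumed for illustration)
NUM_HASHES = 3   # number of hash functions (assumed for illustration)


def bit_positions(value):
    """Map a value to NUM_HASHES positions by salting SHA-256 with the hash number."""
    positions = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % NUM_BITS)
    return positions


def add(bits, value):
    """Set the value's positions to 1; a position that is already 1 stays 1."""
    for pos in bit_positions(value):
        bits[pos] = 1


bits = [0] * NUM_BITS
for value in ["value1", "value2", "value3", "value4"]:
    add(bits, value)

# Print the combined list as a string of 0s and 1s.
print("".join(str(b) for b in bits))
```

Adding the same value twice leaves the list unchanged, which is why repeated 1s can simply be ignored.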

So what is the point of this value? It lets us test other values against it. When we hash value5, 6, 7 or 8 and find that one of its positions is still 0 in the combined list, we are 100% sure that it is not any of values 1 to 4. But if all of its positions are already 1, we cannot be 100% sure that it is one of them, because those bits may have been set by different values. This is the familiar process of elimination. To push the accuracy toward 100%, we can lengthen the saved list and tune the number of hash functions, at the cost of more space and computation.
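The elimination check can be sketched as follows. The hashing scheme (salted SHA-256), the sizes, and the function names are assumptions for illustration only; the false positive estimate uses the standard approximation (1 - e^(-kn/m))^k for m bits, k hash functions and n stored values.

```python
import hashlib
import math


def bit_positions(value, num_bits, num_hashes):
    """Derive num_hashes positions from salted SHA-256 digests (an assumed scheme)."""
    return [
        int.from_bytes(hashlib.sha256(f"{seed}:{value}".encode()).digest()[:8], "big") % num_bits
        for seed in range(num_hashes)
    ]


def build(values, num_bits, num_hashes):
    """Merge all values into one bit list."""
    bits = [0] * num_bits
    for value in values:
        for pos in bit_positions(value, num_bits, num_hashes):
            bits[pos] = 1
    return bits


def might_contain(bits, value, num_hashes):
    """False means definitely absent; True only means possibly present."""
    return all(bits[pos] for pos in bit_positions(value, len(bits), num_hashes))


def estimated_false_positive_rate(num_bits, num_hashes, num_values):
    """Standard approximation (1 - e^(-k*n/m))^k."""
    return (1 - math.exp(-num_hashes * num_values / num_bits)) ** num_hashes


stored = ["value1", "value2", "value3", "value4"]
bits = build(stored, num_bits=64, num_hashes=3)

for probe in stored + ["value5", "value6", "value7", "value8"]:
    verdict = "possibly present" if might_contain(bits, probe, 3) else "definitely absent"
    print(probe, verdict)

# A longer list lowers the estimated false positive rate for the same data.
for num_bits in (32, 64, 128):
    print(num_bits, round(estimated_false_positive_rate(num_bits, 3, len(stored)), 4))
```

Stored values are always reported as possibly present, so the filter never wrongly excludes a value; only the "possibly present" answers need a second look.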

So what advantages does a Bloom filter have over other index types when it is used as an index?

Consider a table with many columns, where queries filter on many different combinations of those columns. Serving them all with ordinary indexes requires many indexes, and maintaining them is expensive for the database and hurts performance on larger datasets.

If you create a single bloom index on all of these columns, a hash is calculated for each column value, and the hashes for each row are merged into one index entry (a signature) of a specified length. This lets the database quickly discard rows that cannot match; the remaining candidates are then checked against the table. It is a good choice when a query matches only a few rows, or a single row, of a large table.
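The per-row signature idea can be sketched in Python. This is an illustration under stated assumptions, not the bloom module's actual code: the hashing scheme and the names `column_bits`, `row_signature` and `search` are hypothetical. The values 80 and 2 mirror the defaults documented for the module's `length` and `colN` storage parameters.

```python
import hashlib

SIGNATURE_BITS = 80   # signature length per row (mirrors the documented default)
BITS_PER_COLUMN = 2   # bits set per column value (mirrors the documented default)


def column_bits(column_index, value):
    """Return a bit mask with BITS_PER_COLUMN positions set for one column value."""
    mask = 0
    for seed in range(BITS_PER_COLUMN):
        digest = hashlib.sha256(f"{column_index}:{seed}:{value}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % SIGNATURE_BITS)
    return mask


def row_signature(row):
    """Merge the masks of all columns of a row into one fixed-length signature."""
    signature = 0
    for column_index, value in enumerate(row):
        signature |= column_bits(column_index, value)
    return signature


def search(rows, signatures, conditions):
    """conditions maps a column index to the value it must equal."""
    wanted = 0
    for column_index, value in conditions.items():
        wanted |= column_bits(column_index, value)
    matches = []
    for row, signature in zip(rows, signatures):
        if signature & wanted != wanted:
            continue  # a required bit is missing: the row cannot match
        # Recheck the real values to drop false positives.
        if all(row[i] == v for i, v in conditions.items()):
            matches.append(row)
    return matches


rows = [(1, 20, 300, 4000), (5, 20, 700, 4000), (1, 90, 300, 8000)]
signatures = [row_signature(row) for row in rows]

# Any combination of columns can be queried with the same set of signatures.
print(search(rows, signatures, {1: 20, 3: 4000}))
print(search(rows, signatures, {0: 1, 2: 300}))
```

One set of signatures serves every column combination, which is what removes the need for a separate index per combination.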

Let's see how the bloom index performs in PostgreSQL.

1 Install the bloom extension in PostgreSQL

CREATE EXTENSION bloom;

2 Create a test table and insert 10,000,000 rows of data.

3 Build a composite B-tree index on this table. It takes about 31 seconds.

4 Run some test queries. With the B-tree index in place, they are quite fast.

5 Drop the B-tree index and build a bloom index instead. The whole build takes about 8 seconds, roughly a quarter of the 31 seconds the B-tree index needed.

6 The queries are also faster than with the ordinary B-tree index, by a little under 10 times.

That raises the obvious question: if it is this fast, does it have no shortcomings? It does have limits on where it fits:

1 A bloom index is suited to indexes that cover multiple columns.

2 A bloom index only supports equality comparisons.
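A short sketch of why hashing lends itself to equality tests but not to range tests. The function `hash_bucket` and its salted-free SHA-256 scheme are assumptions for illustration and are not what PostgreSQL uses internally.

```python
import hashlib


def hash_bucket(value, num_bits=80):
    """Map a value to one bit position (an assumed scheme, for illustration)."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_bits


values = list(range(100, 110))
buckets = [hash_bucket(v) for v in values]

# Neighbouring values generally land on unrelated positions, so a range such as
# "between 100 and 109" has no compact representation in the signature.
print(list(zip(values, buckets)))

# An equality probe recomputes exactly the same position, so it can be checked.
print(hash_bucket(105) == buckets[5])
```

Because the hash discards ordering, only a probe for one exact value can be turned into bits to look for.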

That is all on the use of the bloom index in PostgreSQL. I hope the content above is of some help to you.
