How to implement a Bloom filter with Redis

2025-07-19 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)05/31 Report--

The Bloom filter was proposed by Burton Howard Bloom in 1970. It consists of a very long bit vector and a series of random mapping (hash) functions, and it is used to test whether an element is in a set. Its advantage is that its space efficiency and query time are far better than those of general-purpose algorithms; its disadvantages are a certain false-positive rate and the difficulty of deleting elements.

This article introduces the principle of the Bloom filter and how to use a Bloom filter in Redis.

Application scenarios

1. Given 5 billion existing phone numbers and 100,000 new phone numbers, how can we tell quickly whether each of the 100,000 already exists among the 5 billion? (Possible solutions: a database, a set, HyperLogLog.)

2. A news client keeps recommending new content and must not repeat items the user has already seen. How can push de-duplication be implemented?

3. URL de-duplication in a web crawler?

4. Reducing the number of IO requests in NoSQL databases?

5. Spam filtering in a mail system?

The Bloom filter is designed for exactly these kinds of problems. It de-duplicates while saving more than 90% of the space, at the cost of a certain probability of misjudgment.

Getting to know the Bloom filter

A Bloom filter is a data structure similar to a set, but it is not fully accurate. When bf.exists reports that an element exists, the element may not actually exist; when it reports that the element does not exist, the element definitely does not exist. There is therefore a certain probability of misjudgment when checking whether an element is a duplicate.

Misjudgment only occurs for elements that have not been added to the filter. Elements that have been added are never misjudged.

Features: efficient insertion and query, a small space footprint, and probabilistic results.

Principle of the Bloom filter

In Redis, each Bloom filter is a large bit array combined with several different unbiased hash functions. Unbiased means that the hash values are uniformly distributed.

When adding a key, each hash function hashes the key to an integer, and that integer modulo the length of the bit array gives a position. Each hash function yields a different position, and setting all of these positions to 1 completes the add operation.

Querying a key works the same way: the positions are computed with the same hash functions and the bits at those positions are checked. If any bit is 0, the key definitely does not exist. If all bits are 1, the key probably exists but is not guaranteed to, because those bits may have been set by other keys.
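The add and query steps described above can be sketched in a few lines of Python. This is only an in-memory illustration, not how Redis implements its filter: the class name SimpleBloomFilter, the salted SHA-256 hashing, and the parameter names are all assumptions made for the example.

```python
import hashlib


class SimpleBloomFilter:
    """In-memory Bloom filter: a bit array plus k hash functions."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte stores 8 bits, so round the byte count up.
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Assumption: simulate k hash functions by salting SHA-256 with
        # the function index, then take each value modulo the array length.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        # Set every position computed for the key to 1.
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def exists(self, key):
        # Any 0 bit means the key was never added. All 1 bits means the
        # key was probably added (a false positive is still possible).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

A key that has been added always answers True, because its bits are never cleared; a key that has not been added usually answers False, but can answer True when other keys happen to have set all of its positions.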

Space occupancy estimation

The formula for estimating the space a Bloom filter occupies is simple, although its derivation is complicated. It takes two inputs, the expected number of elements n and the error rate f, and produces two outputs, the bit array length L (the storage size in bits) and the optimal number of hash functions k.

k = 0.7 * (L/n)

f = 0.6185 ^ (L/n)

1. The larger the bit array is relative to the number of elements (L/n), the lower the error rate.

2. The larger the bit array is relative to the number of elements (L/n), the more hash functions are needed.

3. When each element gets on average one byte (8 bits) of fingerprint space (L/n = 8), the error rate is about 2%.
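The two formulas can be turned around to size a filter from n and f. The sketch below uses the exact constants behind the rounded ones (0.7 approximates ln 2, and 0.6185 approximates 0.5 ^ ln 2); the function names are hypothetical and chosen only for this example.

```python
import math


def bloom_parameters(n, f):
    """Return (L, k): bit array length and number of hash functions."""
    # Solving f = 0.6185 ^ (L/n) for L/n gives -ln(f) / (ln 2)^2.
    bits_per_element = -math.log(f) / (math.log(2) ** 2)
    length = math.ceil(n * bits_per_element)
    # k = 0.7 * (L/n), using ln 2 instead of the rounded 0.7.
    k = max(1, round(bits_per_element * math.log(2)))
    return length, k


def error_rate(length, n):
    """Error rate of a filter with `length` bits holding n elements."""
    return 0.6185 ** (length / n)
```

For instance, error_rate(8000, 1000) evaluates the L/n = 8 case from point 3 and comes out at roughly 2%.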

How does the misjudgment rate change when the actual number of elements exceeds the expected number?

f = (1 - 0.5 ^ t) ^ k, where t is the ratio of the actual number of elements to the expected number

1. When the designed error rate is 10% and t = 2, the actual error rate is close to 40%.

2. When the designed error rate is 1% and t = 2, the actual error rate is about 15%.

3. When the designed error rate is 0.1% and t = 2, the actual error rate is about 5%.
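The overfill formula is easy to evaluate directly. The sketch below assumes the filter was built with the optimal k, in which case the designed error rate equals 0.5 ^ k and k can be recovered from it; the function name is hypothetical.

```python
import math


def overfilled_error_rate(designed_rate, t):
    """Error rate once the filter holds t times the expected elements."""
    # Assumption: optimal k, so designed_rate = 0.5 ^ k and therefore
    # k = log2(1 / designed_rate).
    k = math.log2(1 / designed_rate)
    return (1 - 0.5 ** t) ** k


for rate in (0.1, 0.01, 0.001):
    print(rate, overfilled_error_rate(rate, 2))
```

With t = 1 the function returns the designed rate unchanged, and with t = 2 the printed values come out close to the three figures in the list.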

Implementing a simple Bloom filter with Redis

To use the Bloom filter provided by Redis, you need Redis 4.0 or later with the Bloom filter plug-in (module) installed. Please refer to the installation steps available online.

The Bloom filter has two basic commands: bf.add adds an element and bf.exists queries whether an element exists. To handle several elements at once, use bf.madd to add multiple elements and bf.mexists to query multiple elements.

> bf.add spiderurl www.baidu.com

> bf.exists spiderurl www.baidu.com

> bf.madd spiderurl www.sougou.com www.jd.com

> bf.mexists spiderurl www.jd.com www.taobao.com
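The same commands can be sent from application code. The sketch below assumes the redis-py client and uses its generic execute_command method to pass the commands through as-is; the helper names add_url, add_urls, unseen_urls and demo are hypothetical, and demo() needs a running Redis server with the Bloom filter module loaded.

```python
def add_url(client, key, url):
    """BF.ADD: truthy if the URL is new to the filter, falsy otherwise."""
    return client.execute_command("BF.ADD", key, url)


def add_urls(client, key, urls):
    """BF.MADD: add several URLs, returns one flag per URL."""
    return client.execute_command("BF.MADD", key, *urls)


def unseen_urls(client, key, urls):
    """BF.MEXISTS: keep only the URLs the filter has definitely not seen."""
    flags = client.execute_command("BF.MEXISTS", key, *urls)
    return [url for url, flag in zip(urls, flags) if not flag]


def demo():
    # Assumption: redis-py is installed and a Redis server with the
    # Bloom filter module is listening on localhost:6379.
    import redis

    client = redis.Redis(host="localhost", port=6379)
    add_url(client, "spiderurl", "www.baidu.com")
    add_urls(client, "spiderurl", ["www.sougou.com", "www.jd.com"])
    # www.jd.com was added, so only www.taobao.com can be returned here
    # (and it is dropped too if it happens to be a false positive).
    print(unseen_urls(client, "spiderurl", ["www.jd.com", "www.taobao.com"]))
```

Taking the client as a parameter keeps the helpers independent of how the connection is created, which also makes them easy to exercise with a stand-in object.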

On the first add, Redis automatically creates a filter with default parameters. Redis also lets you create a Bloom filter with custom parameters.

To do so, create the filter explicitly with the bf.reserve command before the first add. It takes three parameters: key, error_rate, and initial_size. error_rate is the expected error rate; the lower it is, the more space is needed. initial_size is the number of elements you expect to store. When the actual number exceeds this value the error rate rises, so set a sufficiently large value in advance.

The default error_rate is 0.01 and the default initial_size is 100.
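A minimal sketch of creating the filter explicitly from Python follows. It assumes a redis-py style client that exposes execute_command; the helper name reserve_filter and the argument checks are additions for this example, not part of Redis.

```python
def reserve_filter(client, key, error_rate, initial_size):
    """Issue BF.RESERVE so the filter exists before the first add."""
    if not 0 < error_rate < 1:
        raise ValueError("error_rate must be between 0 and 1")
    if initial_size <= 0:
        raise ValueError("initial_size must be positive")
    # Argument order follows the command: key, error_rate, initial_size.
    return client.execute_command("BF.RESERVE", key, error_rate, initial_size)
```

A crawler that expects about one million URLs might call reserve_filter(client, "spiderurl", 0.001, 1000000) once at start-up; these numbers are illustrative only.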

Bloom filters can be used to reduce disk IO or network requests: once the filter reports that a value definitely does not exist, the subsequent expensive query can be skipped.
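This guard pattern can be sketched as follows. The function name lookup and the expensive_query callback are hypothetical, and the client is again assumed to be a redis-py style object with execute_command.

```python
def lookup(client, key, item, expensive_query):
    """Run expensive_query(item) only if the filter may contain item."""
    if not client.execute_command("BF.EXISTS", key, item):
        # Definitely absent: skip the disk or network round trip.
        return None
    # Possibly present (this may be a false positive), so query for real.
    return expensive_query(item)
```

Because the filter never misjudges an element that was added, skipping the query on a negative answer cannot hide existing data, provided every stored item was also added to the filter.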
