2025-04-04 Update From: SLTechnology News&Howtos shulou
Shulou (Shulou.com), 05/31
How is a Bloom filter implemented in Redis? Many newcomers find this confusing, so this article explains what a Bloom filter is, what problem it solves, and how to use one in Redis.
What is a "Bloom filter"?
A Bloom filter is a clever data structure for testing whether an element is in a set. One of its most common uses is deduplication. A typical crawler requirement: the target site has tens of millions of URLs, so how do you tell whether a URL has already been crawled? The simple approach is to store every URL the crawler collects in a database and, whenever a new URL arrives, query the database to see whether it has been visited.
SELECT id FROM table WHERE url = 'https://jaychen.cc'
But as the crawler collects more and more URLs, every request requires a database lookup first, and SQL queries on strings like this are not efficient. Besides a database, a Redis set can also meet this requirement, and it performs better than the database. But Redis has its own problem: it consumes too much memory. This is where the Bloom filter steps in: let me handle this one.
Compared with a database or a Redis set, a Bloom filter avoids both the performance problem and the memory footprint problem.
A Bloom filter is essentially a bit array, meaning each element of the array takes up only 1 bit and can only be 0 or 1. Allocating a 10000-element array therefore takes only 10000 / 8 = 1250 B of space. In addition to the bit array, a Bloom filter has K hash functions. When an element is added to the Bloom filter, the following steps are performed:
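As a rough illustration of the space calculation above, a bit array can be packed into a Python bytearray. The helper names set_bit and get_bit are made up for this sketch and are not part of any library.

```python
# A bit array packed into bytes: 10,000 bits fit in 10,000 / 8 = 1,250 bytes.
NUM_BITS = 10_000
bits = bytearray(NUM_BITS // 8)  # every bit starts at 0


def set_bit(buf: bytearray, index: int) -> None:
    """Set the bit at `index` to 1."""
    buf[index // 8] |= 1 << (index % 8)


def get_bit(buf: bytearray, index: int) -> int:
    """Return the bit at `index` as 0 or 1."""
    return (buf[index // 8] >> (index % 8)) & 1


set_bit(bits, 42)
print(len(bits))                             # size in bytes: 1250
print(get_bit(bits, 42), get_bit(bits, 43))  # 1 0
```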
Each of the K hash functions is applied to the element, producing K hash values.
For each hash value, the bit at the corresponding index in the bit array (the hash value modulo the array length) is set to 1.
For example, suppose the Bloom filter has three hash functions f1, f2, f3 and a bit array arr. Now insert https://jaychen.cc into the Bloom filter:
Hashing the value with the three functions gives three values n1, n2, n3.
Set the three elements arr[n1], arr[n2] and arr[n3] in the bit array to 1.
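The insertion example can be sketched in Python as follows. The article does not say which hash functions f1, f2, f3 are, so this sketch assumes a salted SHA-256 as a stand-in; the function name bit_positions is hypothetical.

```python
import hashlib


def bit_positions(item: str, k: int, m: int) -> list[int]:
    """Return k positions in [0, m) for `item`, one per salted hash."""
    positions = []
    for i in range(k):
        # Assumption: prefixing the index i to the input stands in for
        # k independent hash functions f1..fk.
        digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


arr = [0] * 10_000  # one int per bit, for readability rather than compactness
for n in bit_positions("https://jaychen.cc", k=3, m=len(arr)):
    arr[n] = 1      # arr[n1], arr[n2], arr[n3] = 1
```

Real implementations usually prefer fast non-cryptographic hashes; SHA-256 is used here only because it ships with the standard library.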
To determine whether a value is in the Bloom filter, hash the element again in the same way and check whether each corresponding position in the bit array is 1. If all of them are 1, the value is considered to be in the Bloom filter. If any of them is not 1, the element is not in the Bloom filter.
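Putting insertion and lookup together, a toy Bloom filter might look like the sketch below. It is not how Redis implements its filter; the class name, the method names, and the salted SHA-256 hashing are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: m bits and k salted SHA-256 hashes (illustrative only)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True only if all k bits are set; a single 0 bit means the item
        # was definitely never added.
        return all(
            (self.bits[pos // 8] >> (pos % 8)) & 1
            for pos in self._positions(item)
        )


seen = BloomFilter(m=10_000, k=3)
seen.add("https://jaychen.cc")
print(seen.might_contain("https://jaychen.cc"))  # True: added items are always found
```

The lookup method is named might_contain rather than contains because a True result only means "possibly present".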
[Figure: hand-drawn illustration of the insert and lookup steps described above]
The description above naturally raises a question: the more elements are inserted, the more positions in the bit array are set to 1. When an element that is not in the Bloom filter is hashed and the resulting positions are looked up in the bit array, those positions may all happen to be 1 already. An element that was never added may therefore be misjudged as present. But if the Bloom filter determines that an element is not present, then it is definitely not present. To put it simply:
If the Bloom filter says an element is present, that may be a misjudgment.
If the Bloom filter says an element is not present, it is definitely not present.
Applied to the crawler requirement above, this limitation means that some unvisited URLs may be misjudged as visited, but a visited URL will never be misjudged as unvisited.
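To get a feel for how the misjudgment rate grows as the filter fills up, the standard textbook approximation (1 - e^(-kn/m))^k can be evaluated directly. This formula is not from the article; it assumes ideal, independent hash functions, with m bits, k hash functions and n inserted elements.

```python
import math


def estimated_false_positive_rate(m: int, k: int, n: int) -> float:
    """Approximate chance that a never-inserted item is reported as present."""
    return (1.0 - math.exp(-k * n / m)) ** k


# Same bit array size and hash count, increasingly many inserted URLs.
for n in (100, 1_000, 5_000):
    print(n, estimated_false_positive_rate(m=10_000, k=3, n=n))
```

The printed rate rises with n, which is the effect described above: more inserted elements mean more bits set to 1 and more accidental matches.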
Bloom filter in Redis
Redis added module support in version 4.0, and the Bloom filter is available to Redis as a module, so with Redis 4.0 or later you can use a Bloom filter by loading the module. The easiest way to try it, though, is to run a Redis with the Bloom filter module directly in Docker.
> docker run -d -p 6379:6379 --name bloomfilter redislabs/rebloom
> docker exec -it bloomfilter redis-cli
The Redis Bloom filter has two main commands:
bf.add adds an element to the Bloom filter: bf.add urls https://jaychen.cc
bf.exists determines whether an element is in the filter: bf.exists urls https://jaychen.cc
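From application code, the same two commands can be sent through a client library. The sketch below assumes a redis-py style client that exposes execute_command; the wrapper names mark_visited and probably_visited are hypothetical.

```python
def mark_visited(client, url: str, key: str = "urls") -> bool:
    """Run BF.ADD; True if the URL was newly added, False if it probably existed."""
    return bool(client.execute_command("BF.ADD", key, url))


def probably_visited(client, url: str, key: str = "urls") -> bool:
    """Run BF.EXISTS; True means possibly seen, False means definitely not seen."""
    return bool(client.execute_command("BF.EXISTS", key, url))


# Usage, assuming the redis-py package and the Docker container started above:
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   mark_visited(client, "https://jaychen.cc")
#   probably_visited(client, "https://jaychen.cc")
```

Passing the client in as a parameter keeps the helpers independent of how the connection is created.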
Two values in Redis determine the accuracy of the Bloom filter:
error_rate: the error rate the Bloom filter is allowed to have. The lower the value, the larger the filter's bit array and the more space it takes up.
initial_size: the number of elements the Bloom filter is expected to store. When the actual number of stored elements exceeds this value, the filter's accuracy decreases.
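To see why a lower error_rate needs a larger bit array, the textbook sizing formulas can be evaluated for a given capacity. This is only the theoretical estimate for a classic Bloom filter; the sizes the Redis module actually allocates may differ, and the function name theoretical_size is made up for this sketch.

```python
import math


def theoretical_size(capacity: int, error_rate: float) -> tuple[int, int]:
    """Return (bits, hash_count) from the textbook Bloom filter formulas."""
    # m = -n * ln(p) / (ln 2)^2, rounded up to a whole number of bits
    bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    # k = (m / n) * ln 2, rounded to the nearest whole hash function
    hash_count = max(1, round(bits / capacity * math.log(2)))
    return bits, hash_count


print(theoretical_size(100, 0.01))   # 100 elements at a 1% error rate
print(theoretical_size(100, 0.001))  # same capacity, lower error rate, more bits
```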
There is a command in redis to set these two values:
bf.reserve urls 0.01 100
The meaning of the three parameters:
The first value is the name of the filter.
The second value is the value of error_rate.
The third value is the value of initial_size.
One thing to note when using this command: the filter name must not already exist when the command is executed. If it does, the command reports an error: (error) ERR item exists
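One way to avoid that error from application code is to check for the key before reserving. The sketch below assumes a redis-py style client with exists and execute_command; the helper name reserve_if_missing is hypothetical, and the check followed by the reserve is not atomic, so two clients racing each other could still hit the error.

```python
def reserve_if_missing(client, key: str, error_rate: float, capacity: int) -> bool:
    """Create the filter with BF.RESERVE unless `key` already exists.

    Returns True if the filter was created, False if the key was already there.
    """
    if client.exists(key):
        return False
    client.execute_command("BF.RESERVE", key, error_rate, capacity)
    return True


# Usage, assuming a connected client:
#   reserve_if_missing(client, "urls", 0.01, 100)
```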