Shulou (Shulou.com) 06/02 report, updated 2025-03-31. From: SLTechnology News&Howtos
This article explains what a Bloom filter is. Many people run into the problem it solves in real projects, so let's walk through how to handle it step by step. I hope you read it carefully and find it useful.
Before we formally explain the Bloom filter, let's take a look at this business scenario:
Redis is a common component in software architectures, and its most common use is caching hot data to reduce the load on the database. In the usual query flow, the application first queries Redis and returns the result directly on a hit; if the data is not in Redis, it falls back to querying the database.
This reduces the number of database accesses, but "query the database whenever the data is not in the cache" is still risky under high concurrency. For example, if 90% of the requested data is not in the cache, for instance because it does not exist at all, all of those requests fall through to the database. This is called cache penetration.
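The query flow and the penetration problem above can be sketched as follows. This is a minimal illustration, assuming two `HashMap`s stand in for Redis and the database; the class and field names are hypothetical. Because a key that exists nowhere is never written to the cache, every request for it reaches the database.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal cache-aside sketch. Assumptions: HashMaps stand in for Redis (cache)
// and the database; CacheAsideDemo and its members are hypothetical names.
class CacheAsideDemo {
    final Map<String, String> cache = new HashMap<>();    // stand-in for Redis
    final Map<String, String> database = new HashMap<>(); // stand-in for the database
    int dbQueries = 0;                                     // requests that reached the database

    String query(String key) {
        String cached = cache.get(key);
        if (cached != null) {
            return cached;              // cache hit: return directly
        }
        dbQueries++;                    // cache miss: fall back to the database
        String value = database.get(key);
        if (value != null) {
            cache.put(key, value);      // populate the cache for later requests
        }
        return value;                   // null when the data exists nowhere
    }

    public static void main(String[] args) {
        CacheAsideDemo demo = new CacheAsideDemo();
        demo.database.put("13800000000", "customer A");
        demo.query("13800000000");      // miss: loaded from the database
        demo.query("13800000000");      // hit: served from the cache
        // A nonexistent key is never cached, so each request penetrates to the database.
        for (int i = 0; i < 9; i++) {
            demo.query("19900000000");
        }
        System.out.println("database queries: " + demo.dbQueries); // prints "database queries: 10"
    }
}
```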
Is there a way to solve this problem? This is where the Bloom filter comes in: it can determine that "some data definitely does not exist."
01. The concept of Bloom filter
The Bloom filter was proposed by Burton Howard Bloom in 1970. It consists of a very long binary vector (think of it as a bit array) and a series of random mapping functions (think of them as several hash functions). Each position in the binary vector stores either 0 or 1 (before learning the Bloom filter, it helps to understand the BitMap algorithm).
For example, to look up customer information by mobile phone number, you would normally use the phone number as the cache key. Suppose we set up a Bloom filter of length 16.
Initially, every bit of the Bloom filter is 0.
Applying hash2(), hash3() and hash4() to 13800000000 gives three results, 5, 9 and 12, and the bits at those positions are set to 1.
Applying hash2(), hash3() and hash4() to 18900000000 gives three results, 2, 8 and 12, and the bits at those positions are set to 1. Now bits 2, 5, 8, 9 and 12 are 1, and all other bits are 0.
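The two insertions above can be reproduced with `java.util.BitSet`. This sketch assumes the article's hash2(), hash3() and hash4() are illustrative, so their outputs are hard-coded rather than computed; `WorkedExampleAdd` is a hypothetical name.

```java
import java.util.BitSet;

// Reproduces the 16-bit worked example. Assumption: the hash outputs are
// hard-coded because the article's hash functions are only illustrative.
class WorkedExampleAdd {
    static BitSet buildFilter() {
        BitSet bits = new BitSet(16);          // all 16 bits start at 0
        int[] positionsFor138 = {5, 9, 12};    // hash results for 13800000000
        int[] positionsFor189 = {2, 8, 12};    // hash results for 18900000000
        for (int p : positionsFor138) bits.set(p);
        for (int p : positionsFor189) bits.set(p); // bit 12 is already 1; setting it again changes nothing
        return bits;
    }

    public static void main(String[] args) {
        System.out.println(buildFilter()); // prints {2, 5, 8, 9, 12}
    }
}
```

Note that the shared position 12 is set only once; the bit array records no trace of how many elements mapped to it.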
How, then, do we check whether a phone number exists?
Apply hash2(), hash3() and hash4() to 13700000000 to get three results, 1, 9 and 13, and then check whether the bits at positions 1, 9 and 13 are all 1. They are not (bits 1 and 13 are 0), so 13700000000 is definitely not in the Bloom filter. This is how the filter determines that "some data definitely does not exist".
This also reveals a limitation of the Bloom filter: it cannot guarantee that data exists. For example, if applying hash2(), hash3() and hash4() to 180000000 gives 5, 8 and 9, those bits all happen to be 1, yet this number was never added. The Bloom filter therefore has a certain false-positive rate.
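Both lookups can be checked against the filter state after the two insertions (bits 2, 5, 8, 9 and 12 set). This is a sketch with the article's hash outputs hard-coded; `mightContain` is a hypothetical helper name.

```java
import java.util.BitSet;

// Membership check for the worked example. Assumption: hash outputs are
// hard-coded as in the article; mightContain is a hypothetical helper.
class WorkedExampleQuery {
    // false: some position is 0, so the element was definitely never added.
    // true: all positions are 1, so the element is only *possibly* present.
    static boolean mightContain(BitSet bits, int... positions) {
        for (int p : positions) {
            if (!bits.get(p)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(16);
        for (int p : new int[] {2, 5, 8, 9, 12}) bits.set(p); // state after adding both numbers
        System.out.println(mightContain(bits, 1, 9, 13)); // 13700000000: prints false (definitely absent)
        System.out.println(mightContain(bits, 5, 8, 9));  // 180000000: prints true (a false positive)
    }
}
```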
Moreover, because different elements can map to the same position (both 13800000000 and 18900000000 set bit 12), deleting an element from a Bloom filter is difficult. The workaround is to keep a counter for each position: adding an element increments its counters, deleting it decrements them, and a position goes back to 0 only when its counter reaches 0.
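The counter-based deletion described above is usually called a counting Bloom filter. Below is a minimal sketch, assuming positions are passed in directly (the article's hard-coded hash outputs) instead of being computed by real hash functions; the class and method names are hypothetical.

```java
// Minimal counting Bloom filter: one int counter per position instead of one bit.
// Assumptions: positions are supplied directly as in the worked example;
// CountingBloomFilter and its methods are hypothetical names.
class CountingBloomFilter {
    private final int[] counters;

    CountingBloomFilter(int size) {
        counters = new int[size];
    }

    void add(int... positions) {
        for (int p : positions) counters[p]++;
    }

    // Only remove elements that were actually added; removing anything else would
    // decrement counters owned by other elements and cause false negatives.
    void remove(int... positions) {
        for (int p : positions) {
            if (counters[p] > 0) counters[p]--;
        }
    }

    boolean mightContain(int... positions) {
        for (int p : positions) {
            if (counters[p] == 0) return false; // a zero counter plays the role of a 0 bit
        }
        return true;
    }

    public static void main(String[] args) {
        CountingBloomFilter f = new CountingBloomFilter(16);
        f.add(5, 9, 12);     // 13800000000
        f.add(2, 8, 12);     // 18900000000; position 12 now has count 2
        f.remove(5, 9, 12);  // delete 13800000000
        System.out.println(f.mightContain(5, 9, 12)); // prints false: positions 5 and 9 are back to 0
        System.out.println(f.mightContain(2, 8, 12)); // prints true: position 12 still has count 1
    }
}
```

The price is memory: each position now needs several bits for its counter instead of a single bit.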
02. Summary of characteristics
You can be sure that an element doesn't exist, but you can't be sure that an element does exist.
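The longer the binary vector, the lower the false-positive rate. The number of mapping functions also matters: up to an optimal number, adding hash functions lowers the rate, but beyond that point each insertion sets so many bits that the rate rises again. If the acceptable false-positive rate is fixed in advance, the required length of the Bloom filter can be derived.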
You can add elements, but you cannot delete them (unless you add counters)
It has significant advantages in storage space and in the time complexity of insertions and queries (each costs O(k) for k hash functions).
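The sizing relationship in the list above can be made concrete with the standard textbook approximations, for n expected elements and a target false-positive rate p: bits m = -n·ln(p) / (ln 2)², hash functions k = (m/n)·ln 2, and estimated rate p ≈ (1 - e^(-kn/m))^k. This is a sketch of those approximations only; the class name and the example values (one million keys, 1%) are hypothetical.

```java
// Textbook Bloom filter sizing approximations (assumed uniform, independent hashes):
//   m = -n * ln(p) / (ln 2)^2      bits needed for n elements at rate p
//   k = (m / n) * ln 2             hash functions that minimize the rate
//   p ≈ (1 - e^(-k*n/m))^k         estimated false-positive rate
class BloomSizing {
    static long optimalBits(long n, double p) {
        return (long) Math.ceil(-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    static int optimalHashCount(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    static double falsePositiveRate(long n, long m, int k) {
        return Math.pow(1 - Math.exp(-(double) k * n / m), k);
    }

    public static void main(String[] args) {
        long n = 1_000_000;  // hypothetical: one million phone numbers
        double p = 0.01;     // hypothetical target: 1% false positives
        long m = optimalBits(n, p);
        int k = optimalHashCount(n, m);
        System.out.println("bits = " + m + ", hash functions = " + k
                + ", estimated fp rate = " + falsePositiveRate(n, m, k));
    }
}
```

For one million keys at 1%, this gives roughly 9.6 million bits (about 1.2 MB) and 7 hash functions. Calling `falsePositiveRate` with the same m but a much larger k (say 20) returns a higher rate, which is why more hash functions only help up to the optimum.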
Going back to the business scenario at the beginning of this article: to prevent cache penetration, a Bloom filter can filter out requests for data that definitely does not exist. Requests that are misjudged (false positives) still reach the database, but the number of penetrating requests is greatly reduced.
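Putting the filter in front of the cache might look like the sketch below. Assumptions: `HashMap`s stand in for Redis and the database, and the filter size (2^20 bits), the three hash functions derived by double hashing from two polynomial string hashes, and all names are hypothetical choices. The filter must be preloaded with every existing key (and updated on writes), otherwise real data would be rejected.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Bloom filter guarding a cache-aside lookup against cache penetration.
// Assumptions: HashMaps stand in for Redis and the database; SIZE, HASHES,
// the hashing scheme and all names are hypothetical.
class PenetrationGuard {
    static final int SIZE = 1 << 20;  // bits in the filter
    static final int HASHES = 3;      // hash functions per key
    final BitSet bits = new BitSet(SIZE);
    final Map<String, String> cache = new HashMap<>();
    final Map<String, String> database = new HashMap<>();
    int dbQueries = 0;

    // Polynomial string hash with a configurable multiplier.
    private static int hash(String key, int seed) {
        int h = 0;
        for (int i = 0; i < key.length(); i++) h = h * seed + key.charAt(i);
        return h;
    }

    // i-th position via double hashing: h1 + i * h2, mapped into [0, SIZE).
    private static int position(String key, int i) {
        int h1 = hash(key, 31);
        int h2 = hash(key, 131) | 1;
        return Math.floorMod(h1 + i * h2, SIZE);
    }

    void addToFilter(String key) {
        for (int i = 0; i < HASHES; i++) bits.set(position(key, i));
    }

    boolean mightContain(String key) {
        for (int i = 0; i < HASHES; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }

    String query(String key) {
        if (!mightContain(key)) return null;  // definitely absent: skip cache and database
        String cached = cache.get(key);
        if (cached != null) return cached;
        dbQueries++;                          // only possible members (incl. false positives) get here
        String value = database.get(key);
        if (value != null) cache.put(key, value);
        return value;
    }

    public static void main(String[] args) {
        PenetrationGuard g = new PenetrationGuard();
        g.database.put("13800000000", "customer A");
        g.addToFilter("13800000000");          // preload the filter with every existing key
        g.query("13800000000");
        for (int i = 0; i < 1000; i++) g.query("199" + i); // keys that were never added
        System.out.println("database queries: " + g.dbQueries);
    }
}
```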
03. A handwritten Bloom filter
Code is not the goal in itself; writing it is a way to deepen understanding.
First, we need to define a bitmap. The JDK already provides a data structure for this, java.util.BitSet:
// set a Bloom filter
private int DEFAULT_SIZE = 1
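Since the original listing is cut off, here is one possible complete handwritten Bloom filter on top of java.util.BitSet. It is a sketch, not the author's code: the size (2^25 bits), the seed values and the seeded polynomial `SimpleHash` are assumptions chosen for illustration. Each seed yields one hash function, and masking with `cap - 1` maps the hash into the bit array because the size is a power of two.

```java
import java.util.BitSet;

// Handwritten Bloom filter sketch. Assumptions (not from the original listing):
// DEFAULT_SIZE, SEEDS and the SimpleHash scheme are illustrative choices.
class SimpleBloomFilter {
    private static final int DEFAULT_SIZE = 1 << 25;                 // bits, a power of two
    private static final int[] SEEDS = {3, 5, 7, 11, 13, 31, 37, 61}; // one hash function per seed

    private final BitSet bits = new BitSet(DEFAULT_SIZE);
    private final SimpleHash[] functions = new SimpleHash[SEEDS.length];

    SimpleBloomFilter() {
        for (int i = 0; i < SEEDS.length; i++) {
            functions[i] = new SimpleHash(DEFAULT_SIZE, SEEDS[i]);
        }
    }

    void add(String value) {
        for (SimpleHash f : functions) bits.set(f.hash(value));
    }

    // false: definitely never added; true: possibly added (false positives are possible).
    boolean mightContain(String value) {
        for (SimpleHash f : functions) {
            if (!bits.get(f.hash(value))) return false;
        }
        return true;
    }

    // Seeded polynomial string hash, masked into [0, cap).
    private static final class SimpleHash {
        private final int cap;
        private final int seed;

        SimpleHash(int cap, int seed) {
            this.cap = cap;
            this.seed = seed;
        }

        int hash(String value) {
            int result = 0;
            for (int i = 0; i < value.length(); i++) {
                result = seed * result + value.charAt(i);
            }
            return (cap - 1) & result; // keeps the low bits; non-negative even if result overflowed
        }
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter();
        filter.add("13800000000");
        filter.add("18900000000");
        System.out.println(filter.mightContain("13800000000")); // prints true
        System.out.println(filter.mightContain("13700000000")); // false unless it happens to be a false positive
    }
}
```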
© 2024 shulou.com SLNews company. All rights reserved.