2025-02-23 Update. From: SLTechnology News&Howtos (shulou)
Shulou(Shulou.com)06/01 Report--
Today I would like to share how Java can quickly determine whether an element is in a collection. The content is detailed and the logic is clear. Many people are probably not very familiar with this topic, so this article is offered for your reference. I hope you get something out of it. Let's take a look.
1. What is a Bloom filter
The Bloom filter was proposed by Burton Howard Bloom in 1970.
It can be regarded as a data structure composed of a binary vector (a bit array) and a series of random mapping functions (hash functions).
Its advantage is that its space efficiency and query time are much better than those of general-purpose collection structures. Its disadvantages are a certain false-positive rate and the difficulty of deleting elements.
2. Implementation principle
The main idea of the Bloom filter algorithm is to hash each element with k hash functions to get k hash values, map those values to index positions in a bit array (which may be very long), and set the bit at each of those positions to 1.
To judge whether an element is in the collection, compute the same k hash values and check whether the bits at the corresponding index positions are all 1. If any of them is not 1, the element is definitely not in the collection.
However, the filter may also report that an element is in the collection when it is not: every one of its index positions may already have been set to 1 by other elements. This gives a certain probability of misjudgment (a false positive). It is the root cause of why the filter can only say that an element may be in the collection, and it comes from hash collisions.
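The idea above can be sketched in a few dozen lines of Java. This is a minimal illustration, not how any particular library does it: the class name `SimpleBloomFilter` is hypothetical, and deriving the k index positions from two base hashes (`String.hashCode()` and a 32-bit FNV-1a) is an assumption made to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Minimal Bloom filter sketch: one bit array plus k derived hash functions. */
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the array
    private final int hashCount; // k, the number of hash functions

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Sets the k bits selected by the element's hash values. */
    void put(String element) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(indexFor(element, i));
        }
    }

    /** false: definitely absent. true: possibly present. */
    boolean mightContain(String element) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(indexFor(element, i))) {
                return false;
            }
        }
        return true;
    }

    // Assumption: the i-th hash is h1 + i * h2, reduced to a valid index.
    private int indexFor(String element, int i) {
        int h1 = element.hashCode();
        int h2 = fnv1a(element);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a over the UTF-8 bytes, used as the second base hash.
    private static int fnv1a(String s) {
        int hash = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 0x01000193;
        }
        return hash;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1024, 3);
        filter.put("Redis");
        System.out.println(filter.mightContain("Redis")); // true: inserted elements are always found
        System.out.println(filter.mightContain("Java"));  // usually false, but a false positive is possible
    }
}
```

Note that there is no `remove` method: clearing a bit could also erase evidence of other elements that share it, which is why deletion is difficult.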
Note: the lower the misjudgment rate, the lower the corresponding performance, because a lower rate needs a larger bit array and more hash computations per operation.
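This trade-off can be made concrete with the standard sizing formulas for a Bloom filter: for n expected elements and a target false-positive rate p, the optimal bit count is m = -n ln(p) / (ln 2)^2 and the optimal number of hash functions is k = (m / n) ln 2. The sketch below simply evaluates them; the class and method names are hypothetical, and real libraries may round differently.

```java
class BloomSizing {
    /** Optimal number of bits: m = -n * ln(p) / (ln 2)^2, rounded up. */
    static long optimalBits(long expectedInsertions, double falsePositiveRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(
                -expectedInsertions * Math.log(falsePositiveRate) / (ln2 * ln2));
    }

    /** Optimal number of hash functions: k = (m / n) * ln 2, at least 1. */
    static int optimalHashCount(long expectedInsertions, long bits) {
        double k = (double) bits / expectedInsertions * Math.log(2);
        return Math.max(1, (int) Math.round(k));
    }

    public static void main(String[] args) {
        long n = 100_000_000L; // expected number of elements
        for (double p : new double[] {0.01, 0.001, 0.0001}) {
            long m = optimalBits(n, p);
            int k = optimalHashCount(n, m);
            System.out.printf("p=%.4f -> %d bits (about %d MiB), k=%d%n",
                    p, m, m / 8 / 1024 / 1024, k);
        }
    }
}
```

Each tenfold reduction in p adds roughly 4.8 bits per element and about three more hash functions, so both memory and per-operation work grow as the misjudgment rate shrinks.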
3. Function
The Bloom filter can be used to determine whether an element is (possibly) in a collection, and it has a huge advantage in both space and time over other data structures.
Pay attention to the word above: possibly. A Bloom filter can say for certain that an element is absent, but it can only say that an element may be present. The reason is analyzed in detail below.
Typical uses:
Determining whether a given piece of data exists.
Preventing cache penetration (checking whether the requested data is valid, so that requests do not bypass the cache and hit the database directly), spam filtering for mailboxes, blacklists, and so on.
4. Concrete implementation
With the algorithm idea of the Bloom filter covered, let's walk through how it works in practice.
Let me first give an example. Suppose there are two strings, Wangcai and Xiaoqiang. Each is run through three hash functions, and the bits at the resulting index positions of the array (assume the array length is 16) are set to 1. Take Wangcai first:
After the three hashes, Wangcai yields the values 2, 4 and 6, so the bits at indexes 2, 4 and 6 are set to 1 and the rest stay 0. Now suppose you need to look up Wangcai: the bits at indexes 2, 4 and 6 are all 1, so you can judge that Wangcai may exist.
Next, Xiaoqiang is inserted into the Bloom filter. The process is the same as above; assume the resulting indexes are 1, 3 and 5.
Leaving Wangcai aside, Xiaoqiang alone sets the bits at indexes 1, 3 and 5. With both Wangcai and Xiaoqiang inserted, the bits at indexes 1 through 6 are 1 and all other bits are 0.
Now take another piece of data, 9527, and determine whether it exists. Suppose 9527 hashes to the indexes 5, 6 and 7. The bit at index 7 is 0, so it is certain that 9527 does not exist.
Then comes "domestic 007". After the three hashes its indexes are 2, 3 and 5. The bits at indexes 2, 3 and 5 are all 1, so the filter judges that domestic 007 may exist. But as the walkthrough shows, domestic 007 was never inserted; the bits at indexes 2, 3 and 5 are 1 only because other data set them.
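The walkthrough can be replayed in code. In this sketch the index positions are hard-coded to the values assumed in the example (they stand in for three real hash functions), and the class name `WorkedExample` is hypothetical.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

class WorkedExample {
    static final int SIZE = 16;

    // Assumption: fixed positions taken from the example replace real hash functions.
    static final Map<String, int[]> POSITIONS = new LinkedHashMap<>();
    static {
        POSITIONS.put("Wangcai", new int[] {2, 4, 6});
        POSITIONS.put("Xiaoqiang", new int[] {1, 3, 5});
        POSITIONS.put("9527", new int[] {5, 6, 7});
        POSITIONS.put("domestic 007", new int[] {2, 3, 5});
    }

    static void put(BitSet bits, String element) {
        for (int index : POSITIONS.get(element)) {
            bits.set(index);
        }
    }

    static boolean mightContain(BitSet bits, String element) {
        for (int index : POSITIONS.get(element)) {
            if (!bits.get(index)) {
                return false; // one 0 bit proves the element was never inserted
            }
        }
        return true;
    }

    /** Renders the array as a string of 0s and 1s, index 0 first. */
    static String render(BitSet bits) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < SIZE; i++) {
            sb.append(bits.get(i) ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet(SIZE);
        put(bits, "Wangcai");
        put(bits, "Xiaoqiang");
        System.out.println(render(bits));                       // 0111111000000000
        System.out.println(mightContain(bits, "9527"));         // false: bit 7 is 0
        System.out.println(mightContain(bits, "domestic 007")); // true: a false positive
    }
}
```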
At this point, the role of the Bloom filter should be clear.
5. Code implementation
As Java programmers we are fortunate: many frameworks and tools already encapsulate what we need. For the Bloom filter, we use the utility classes provided by Google's Guava. Of course, there are other options that you can explore.
Add the dependency first:
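```xml
<dependency>
    <groupId>com.google.guava</groupId>
    <artifactId>guava</artifactId>
    <version>25.1-jre</version>
</dependency>
```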
The implementation:
```java
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

import java.nio.charset.Charset;

public class BloomFilterDemo {
    public static void main(String[] args) {
        /*
         * Create a Bloom filter sized for 100 million insertions with a
         * false-positive rate of 0.01%.
         * If the filter reports an element as absent, it is definitely absent;
         * if it reports it as present, it may still be absent.
         *
         * BloomFilter.create arguments: Funnel object, expected number of
         * elements, false-positive rate.
         * mightContain: determines whether an element may exist.
         */
        BloomFilter<String> bloomFilter = BloomFilter.create(
                Funnels.stringFunnel(Charset.forName("utf-8")), 100000000, 0.0001);
        bloomFilter.put("death");
        bloomFilter.put("knock");
        bloomFilter.put("Redis");
        System.out.println(bloomFilter.mightContain("Redis"));
        System.out.println(bloomFilter.mightContain("Java"));
    }
}
```
The details are explained in the comments. By now you should understand the Bloom filter and how to use it.
6. In practice
Let's simulate a scenario where cache penetration is solved with a Bloom filter.
First of all, you know what cache penetration is, right?
Cache penetration means that users request data that is neither in the cache nor in the database. Because the data is not in the cache, every such request goes on to the database, and under high concurrency this can easily overwhelm the database.
So how does the Bloom filter solve this problem?
The principle is this: put all the keys that can be queried in the database into the Bloom filter. When a query request arrives, first check it against the Bloom filter. If the filter judges that the requested value may exist, continue with the lookup; if it judges that the value does not exist, discard the request directly.
The code is as follows:
```java
String get(String key) {
    String value = redis.get(key);
    if (value == null) {
        // Cache miss: ask the Bloom filter before touching the database.
        if (!bloomfilter.mightContain(key)) {
            return null; // definitely not in the database
        } else {
            value = db.get(key);
            // A false positive can still reach here with no row in the database.
            if (value != null) {
                redis.set(key, value);
            }
        }
    }
    return value;
}
```
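To try the lookup flow without Redis or a real database, the following self-contained sketch swaps in stand-ins. Everything here is an assumption for illustration: two in-memory maps play the roles of the cache and the database, the class name `CachePenetrationDemo` is hypothetical, and a tiny `BitSet`-based filter with an ad hoc hash mix replaces Guava so that the example has no dependencies.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class CachePenetrationDemo {
    private static final int SIZE = 1 << 16; // bits in the filter
    private static final int HASHES = 3;     // hash functions per key

    // Hypothetical stand-ins for Redis and the database.
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> database = new HashMap<>();
    private final BitSet bits = new BitSet(SIZE);
    int databaseReads = 0;

    CachePenetrationDemo(Map<String, String> rows) {
        database.putAll(rows);
        // Load every existing key into the filter up front.
        for (String key : rows.keySet()) {
            for (int i = 0; i < HASHES; i++) {
                bits.set(index(key, i));
            }
        }
    }

    // Assumption: a simple seeded mix of String.hashCode(), adequate for a demo only.
    private static int index(String key, int i) {
        int h = key.hashCode() * 31 + i * 0x9E3779B9;
        h ^= (h >>> 16);
        return Math.floorMod(h, SIZE);
    }

    private boolean mightContain(String key) {
        for (int i = 0; i < HASHES; i++) {
            if (!bits.get(index(key, i))) {
                return false;
            }
        }
        return true;
    }

    String get(String key) {
        String value = cache.get(key);
        if (value != null) {
            return value;               // cache hit
        }
        if (!mightContain(key)) {
            return null;                // rejected before reaching the database
        }
        databaseReads++;
        value = database.get(key);
        if (value != null) {
            cache.put(key, value);      // cache only real rows
        }
        return value;
    }

    public static void main(String[] args) {
        Map<String, String> rows = new HashMap<>();
        rows.put("user:1", "Alice");
        CachePenetrationDemo demo = new CachePenetrationDemo(rows);

        System.out.println(demo.get("user:1")); // Alice (read from the database, then cached)
        System.out.println(demo.get("user:1")); // Alice (served from the cache)
        for (int i = 1000; i < 2000; i++) {
            demo.get("user:" + i);              // requests for keys that do not exist
        }
        System.out.println("database reads: " + demo.databaseReads);
    }
}
```

Almost all of the thousand requests for nonexistent keys should be rejected by the filter, so the database read count stays close to 1; any excess comes from false positives.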