2025-03-30 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)05/31 Report--
This article introduces the Bloom filter as used in HBase. It is intended as a reference for interested readers.
1. Bloom filter
Scope of application: it can be used to implement a data dictionary, to detect duplicate data, or to compute the intersection of sets.
Basic principles and main points:
The principle is simple: a bit array plus k independent hash functions. To insert an element, set the bits at the positions given by the k hash functions to 1. To look up an element, check the same positions: if all of them are 1, the element is reported as present. Clearly this process does not guarantee that the result is 100% correct, because bits set by other elements can produce a false positive. Deleting an inserted keyword is also not supported, because clearing its bits would affect other keywords that share them. A simple improvement is the counting Bloom filter, which supports deletion by using an array of counters instead of a bit array.
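The insert and lookup steps described above can be sketched in a few lines of Python. This is a minimal illustration, not HBase's implementation: the class name `BloomFilter`, the method names, and the choice of deriving the k hash functions by salting SHA-256 with an index are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Bit array of m bits plus k hash functions (illustrative sketch)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)  # m bits, all initially 0

    def _positions(self, item: str):
        # Assumption: simulate k hash functions by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # Insert: set every bit selected by the k hash functions to 1.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # Lookup: True only if all k bits are 1 (may be a false positive).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


bf = BloomFilter(m=1024, k=7)
bf.add("http://example.com/a")
print(bf.might_contain("http://example.com/a"))  # True: no false negatives
```

A lookup that returns False is always correct; only a True answer can be wrong.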
Another important question is how to choose the bit array size m and the number of hash functions k for a given number of input elements n. The error rate is lowest when k = (ln 2) * (m/n). For the error rate to be no greater than E, m must be at least n * lg(1/E) to represent any set of n elements. But m should be larger, because at least half of the bit array should remain 0, which requires m >= n * lg(1/E) * lg(e), about 1.44 * n * lg(1/E) (lg denotes the base-2 logarithm).
For example, with an error rate of 0.01, lg(1/E) is about 6.64, so m should be about 1.44 * 6.64, or roughly 9.6 times n, and k = (ln 2) * (m/n) is about 7.
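The sizing formulas can be evaluated directly. The sketch below is a hedged illustration: the function name `bloom_parameters` is hypothetical, and it simply computes m = n * lg(1/E) * lg(e) and k = (ln 2) * (m/n), rounding k to the nearest integer.

```python
import math


def bloom_parameters(n: int, error_rate: float) -> tuple[int, int]:
    """Return (m, k): bit array size and hash count for n elements."""
    # m >= n * lg(1/E) * lg(e), where lg is the base-2 logarithm.
    bits_per_element = math.log2(1 / error_rate) * math.log2(math.e)
    m = math.ceil(n * bits_per_element)
    # The error rate is lowest at k = (ln 2) * (m / n).
    k = max(1, round(math.log(2) * m / n))
    return m, k


m, k = bloom_parameters(n=1_000_000, error_rate=0.01)
print(f"bits per element: {m / 1_000_000:.2f}, hash functions: {k}")
```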
Note that m and n are in different units: m is measured in bits, while n is a count of elements (more precisely, of distinct elements). A single element is usually many bits long, so using a Bloom filter usually saves memory.
Extensions:
A Bloom filter maps the elements of a set into a bit array and uses whether all k mapped bits are 1 (k being the number of hash functions) to indicate whether an element is in the set. The counting Bloom filter (CBF) extends each bit in the array to a counter, and thus supports deleting elements. The Spectral Bloom Filter (SBF) associates the counters with the number of occurrences of each element, using the minimum of an element's counters as an approximation of its frequency.
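A counting variant can be sketched by replacing each bit with an integer counter. The code below is an illustrative assumption, not a reference implementation: the class and method names are hypothetical, the hash functions are again simulated with salted SHA-256, and `estimate_count` only demonstrates the minimum-counter idea mentioned for SBF, not the full SBF structure.

```python
import hashlib


class CountingBloomFilter:
    """Counter array instead of a bit array, so deletion is possible."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _positions(self, item: str):
        # Assumption: simulate k hash functions by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.counters[pos] += 1

    def might_contain(self, item: str) -> bool:
        return all(self.counters[pos] > 0 for pos in self._positions(item))

    def remove(self, item: str) -> None:
        # Only meaningful for items that were added; refuse obvious misuse.
        if not self.might_contain(item):
            raise KeyError(item)
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def estimate_count(self, item: str) -> int:
        # Minimum counter: never underestimates the true insert count.
        return min(self.counters[pos] for pos in self._positions(item))


cbf = CountingBloomFilter(m=1024, k=4)
cbf.add("a")
cbf.remove("a")
print(cbf.might_contain("a"))  # False: all counters are back to 0
```

Removing an item that was never added would corrupt the counters of other items, which is why the sketch raises `KeyError` when the lookup fails.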
Problem example: you are given two files, A and B, each storing 5 billion URLs. Each URL occupies 64 bytes and the memory limit is 4 GB. Find the URLs common to both files. What if there are three, or even n, files?
For this problem, let's calculate the memory footprint. 4 GB = 2^32 bytes, about 4.3 billion bytes, which is about 34 billion bits; n = 5 billion. At an error rate of 0.01, m = 1.44 * n * lg(1/E) comes to about 9.6 bits per element, or roughly 48 billion bits. Only 34 billion are available. The shortfall is not large, but it may increase the error rate. In addition, if the URLs correspond one-to-one to IP addresses, they can be converted to IPs, which makes the problem much simpler.
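The memory arithmetic can be checked with a short calculation. This is a sketch under stated assumptions: it uses the sizing formula m = n * lg(1/E) * lg(e), and the helper `error_rate_at_optimal_k` is a hypothetical name that applies the standard relation E = (1/2)^k with k = (ln 2) * (m/n).

```python
import math

GIB = 2**30

available_bits = 4 * GIB * 8  # 4 GB = 2^32 bytes = 2^35 bits
n = 5_000_000_000             # URLs in one file
target_error = 0.01

# Bits needed for the target error rate: n * lg(1/E) * lg(e).
required_bits = math.ceil(
    n * math.log2(1 / target_error) * math.log2(math.e)
)


def error_rate_at_optimal_k(m: int, n: int) -> float:
    # With k = (ln 2) * (m / n), the false positive rate is (1/2)^k.
    return 0.5 ** (math.log(2) * m / n)


achievable_error = error_rate_at_optimal_k(available_bits, n)

print(f"available bits: {available_bits:,}")
print(f"required bits:  {required_bits:,}")
print(f"error rate with available memory: {achievable_error:.4f}")
```

The last value shows how much the error rate rises when the whole 4 GB is given to a single filter instead of the ideal size.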