In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-01-17 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/02 Report--
How to understand, many novices are not very clear about this, in order to help you solve this problem, the following editor will explain for you in detail, people with this need can come to learn, I hope you can gain something.
The Bloom filter (Bloom Filter) was proposed by Bloom in 1970. It is actually a very long binary vector and a series of random mapping functions. The Bloom filter can be used to retrieve whether an element is in a collection. Its advantage is that the space efficiency and query time are far higher than the general algorithm, and the disadvantage is that it has a certain error recognition rate and deletion difficulties.
If you want to determine whether an element is in a collection, the general idea is to save all the elements and then determine by comparison. Linked lists, trees and other data structures are all this way of thinking. However, with the increase of elements in the collection, we need more and more storage space, and the retrieval speed is getting slower and slower (O (n), O (logn)). However, there is also a data structure called hash table (Hash table) in the world. It can map an element to a point in a bit array (Bit array) through a Hash function. In this way, we only need to see whether this point is 1 or not to know if it is in the collection. This is the basic idea of the Bloom filter. [see Baidu encyclopedia for details]
In general: the Bloom filter is implemented using a hash algorithm + bitmap.
So at the bottom of the Bloom filter, we encapsulate it with a bitmap.
Those who are not familiar with hashing algorithms can see the blogger http://helloleex.blog.51cto.com/10728491/1770568.
If you are not familiar with bitmaps, you can check the blogger's blog http://helloleex.blog.51cto.com/10728491/1772827.
# pragma once#include#include "Bit_Map.h" # include "HashFunc.h" imitating function templateclass BloomFilter {public: BloomFilter (size_t size) {/ / size_t newsize = _ GetNextPrime (size); / / _ capacity = newsize / / size is not necessarily a prime, take a prime in the prime table and open a space with large primes _ capacity = _ bm.Resize (size);} void Set (const K & key) {size_t index1 = HashFunc1 () (key); size_t index2 = HashFunc2 () (key) Size_t index3 = HashFunc3 () (key); size_t index4 = HashFunc4 () (key); size_t index5 = HashFunc5 () (key); _ bm.Set (index1); _ bm.Set (index2); _ bm.Set (index3); _ bm.Set (index4) The _ bm.Set (index5);} / Bloom filter cannot delete because there are different hashing algorithms, and different key / / tags may be in the same bit. Deletion will delete other people's records and affect their correctness. / / void Reset (const K & key); / / Test does not exist necessarily, bool Test (const K & key) {size_t index1 = HashFunc1 () (key); size_t index2 = HashFunc2 () (key); size_t index3 = HashFunc3 () (key); size_t index4 = HashFunc4 () (key) Size_t index5 = HashFunc5 () (key); if (_ bm.Test (index1) | | _ bm.Test (index2) | | _ bm.Test (index3) | | _ bm.Test (index4) | | _ bm.Test (index5)) return true; else return false } protected: BitMap _ bm; size_t _ capacity;}; HashFunc.h//5 efficient hash algorithm, all invented by the great god. # pragma oncetemplate//BKDR Hash Function is a simple and fast hash algorithm. / / it is also the string Hash algorithm currently used by Java (cumulative factor is 31) struct _ HashFunc1 {size_t operator () (const T & str) {register size_t hash = 0; const char* tmp = str.c_str () While (size_t ch = (size_t) * tmp++) {hash = hash * 131 + ch;} return hash;}}; template//RS Hash Function. It gets its name from the display of Robert Sedgwicks in his book Algorithms in C. Struct _ HashFunc2 {size_t operator () (const T & str) {register size_t hash = 0; size_t magic = 63689; const char* tmp = str.c_str (); while (size_t ch = (size_t) * tmp++) {hash = hash * magic + ch Magic * = 378551;} return hash;}}; template//AP Hash Function A hash algorithm invented by Arash Partow. Struct _ HashFunc3 {size_t operator () (const T & str) {register size_t hash = 0; size_t ch; const char* tmp = str.c_str (); for (long I = 0; ch = (size_t) * tmp++ ) {if ((I & 1) = = 0) {hash ^ = ((hash > 3)) } else {hash ^ = (~ ((hash > 5));}} return hash;}}; template// DJB Hash Function 2. Another hash algorithm invented by Daniel J. Bernstein. Struct _ HashFunc4 {size_t operator () (const T & str) {const char* tmp = str.c_str (); if (! * tmp) / / this is added by myself to ensure that the empty string returns a hash value of 0 return 0; register size_t hash = 5381 While (size_t ch = (size_t) * tmp++) {hash = hash * 33 ^ ch;} return hash;}; template//JS Hash Function. A hash algorithm invented by Justin Sobel. Struct _ HashFunc5 {size_t operator () (const T & str) {const char* tmp = str.c_str (); if (! * tmp) / / this is added by myself to ensure that the empty string returns a hash value of 0 return 0; register size_t hash = 1315423911 While (size_t ch = (size_t) * tmp++) {hash ^ = (hash > 2);} return hash;}}
The Bloom filter has a false recognition rate, although the probability is very low, but if one or two of the five hash functions are mistakenly identified, an error will occur.
And the Bloom filter does not support deleting Reset. Because there is a good chance that the records identified by other hash functions will be deleted, the probability of false recognition is greatly increased.
Is it helpful for you to read the above content? If you want to know more about the relevant knowledge or read more related articles, please follow the industry information channel, thank you for your support.
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.