Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to realize Bloom filter in programming development

2025-03-13 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >

Share

Shulou(Shulou.com)06/03 Report--

This article is about how to implement Bloom filters in programming development. The editor thinks it is very practical, so share it with you as a reference and follow the editor to have a look.

Give 4 billion unrepeated unsigned integers, unsorted. Give an unsigned integer, how to quickly determine whether a number is in these 4 billion numbers.

When we get this topic, the first thing we think of is to traverse the 4 billion numbers, and then look for them one by one. It's obviously not going to work. Because these 4 billion numbers are put into memory, about 16 gigabytes of memory is needed.

If we convert it to bitmap processing, it will be much easier to handle. We can subdivide a × ×, an int type can be programmed with 32 bits, and each bit uses 0meme 1 to indicate whether there is a value in the current location, which is also a method of hash storage. Only this kind of storage can reduce a lot of space, for example, the memory used in the previous question can be reduced from 16G to 500m. The utilization of space has decreased by more than one point.

You can implement the above code according to my method, today I mainly introduce the Bloom filter, because the Bloom filter also uses bitmap (bitmap), bitmap implementation idea:

1. Change one int type to 32 bits.

two。 Initialize them all to 0.

3. If there is a value on the current bit, set 0 to 1.

Here I give the implementation code of the bitmap:

In BitMap.h

# include class BitMap {public: BitMap (size_t size = 0): _ size (0) {/ / Open up space with resize, the values in _ a will be initialized to 0 / / plus 1 in order to put all the values into the array, if there are 36 numbers, 36 will be 32 seconds 1 over 4 And / / the more open space ensures that these four numbers can be put down / / _ a.resize (size/32+1) It has the same property as the following code, except that the shift operator is more efficient than division _ a.resize ((size > > 5) + 1);} / / insert void Set (size_t x) {size_t index = x > > 5; size_t num = x% 32 / / if the current position is equal to 0 and there is no value, you can insert if (!) (1 5; size_t num = x% 32; / / if the current position is 1, there is a value. You can delete if (_ a [index] & (1 5; size_t num = x% 32) If (_ a [index] & (1 size) {return _ PrimeList [I]; / set the capacity size according to the prime table}} / / when the required capacity exceeds the maximum capacity of the prime table, we expand return _ PrimeList according to the maximum capacity [_ PrimeSize-1] } template struct _ HashFunc1 {size_t BKDRHash (const T * str) {register size_t hash = 0; while (size_t ch = (size_t) * str++) {hash = hash * 131 + ch;} return hash } size_t operator () (const T & key) {return BKDRHash (key.c_str ());}}; template struct _ HashFunc2 {size_t SDBMHash (const T * str) {register size_t hash = 0 While (size_t ch = (size_t) * str++) {hash = 65599 * hash + ch; / / hash = (size_t) ch + (hash 5));}} return hash } size_t operator () (const T & key) {return APHash (key.c_str ());}} Template struct _ _ HashFunc5 {size_t JSHash (const T * str) {if (! * str) / / this is added by myself to ensure that the empty string returns a hash value of 0 return 0; register size_t hash = 1315423911 While (size_t ch = (size_t) * str++) {hash ^ = (hash > 2);} return hash;} size_t operator () (const T & key) {return JSHash (key.c_str ());}}

I used five Hash functions to reduce hash conflicts. Depending on the situation, set the number of hash functions by yourself.

In BoolmFilter.h

# pragma once#include#include "BitMap.h" # include "common.h" templateclass BoolmFilter {public: BoolmFilter (size_t capatity = 0) {_ capatity = NewSize (capatity); _ bm.Resize (_ capatity);} void Set (const T & key) {size_t index1 = HashFunc1 () (key) Size_t index2 = HashFunc2 () (key); size_t index3 = HashFunc3 () (key); size_t index4 = HashFunc4 () (key); size_t index5 = HashFunc5 () (key); _ bm.Set (index1%_capatity); _ bm.Set (index2%_capatity) _ bm.Set (index3%_capatity); _ bm.Set (index4%_capatity); _ bm.Set (index5%_capatity);} bool Test (const T & key) {size_t index1 = HashFunc1 () (key) If (! _ bm.BitMapTest (index1%_capatity)) {return false;} size_t index2 = HashFunc2 () (key); if (! _ bm.BitMapTest (index2%_capatity)) {return false } size_t index3 = HashFunc3 () (key); if (! _ bm.BitMapTest (index3%_capatity)) {return false;} size_t index4 = HashFunc4 () (key) If (! _ bm.BitMapTest (index4%_capatity)) {return false;} size_t index5 = HashFunc5 () (key); if (! _ bm.BitMapTest (index5%_capatity)) {return false } return true;} protected: BitMap _ bm; size_t _ capatity;}

In test.cpp

# include using namespace std;#include "BoolmFilter.h" void BoolTest () {BoolmFilterbf (100); bf.Set ("she is girl"); bf.Set ("I am a good man"); bf.Set ("chive/2012/05/31/2528153.html"); cout

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Development

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report