Analysis of Bloom filter 10/17 Update SLTechnology News&Howtos



Shulou(Shulou.com)06/03 Report--

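The Bloom filter was proposed by Burton Howard Bloom in 1970. It consists of a long binary (bit) vector and a series of random mapping (hash) functions, and it can be used to test whether an element is a member of a set. Its advantage is that it needs far less space, and answers queries much faster, than general-purpose approaches that store the elements themselves. Its drawbacks are a certain false-recognition rate and the difficulty of deleting elements. The errors are false positives: the Bloom filter reports that an element is in the set when it actually is not. False negatives never occur: if an element really is in the set, the Bloom filter will never report it as absent, so nothing is missed.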

In short: a Bloom filter's "present" is not reliable (because of hash collisions), but its "absent" is always accurate.

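Trade-off: a larger bit array (more bits per stored element) takes more space but lowers the false-positive rate. Adding hash functions helps only up to an optimal number; beyond it, each insertion sets so many bits that the array fills up and the rate rises again.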
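This trade-off is usually quantified with the standard approximation below, which assumes the k hash functions behave like independent, uniformly random functions; m is the number of bits in the array and n the number of stored elements. The second formula gives the k that minimizes p, and the third the number of bits needed for a target rate p when k is chosen that way.

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad
k_{\text{opt}} = \frac{m}{n}\ln 2, \qquad
m = -\frac{n \ln p}{(\ln 2)^{2}}
```

For example, m/n = 10 bits per element with k = 7 gives p of roughly 0.8%.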

Deletion can be supported by replacing each bit with a counter (a counting Bloom filter).

Underlying structure: a bitmap.

Principle

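To decide whether an element belongs to a set, the usual approach is to store every element of the set and compare against them. Linked lists, trees, and hash tables all follow this idea. But as the set grows, the storage required keeps growing, and lookups in structures such as lists and trees become slower as well.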

A Bloom filter is a highly space-efficient probabilistic data structure. It can be viewed as an extension of the bitmap (bit-map), and its principle is as follows:

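When an element is added to the set, K hash functions map it to K positions in a bit array, and those bits are set to 1. To look an element up, we only need to check whether all of its K bits are 1 to know (approximately) whether it is in the set: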

If any of these bits is 0, the element is definitely not in the set;

If all of them are 1, the element is very likely in the set.
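The insert and lookup rules above can be sketched in C++ as follows. This is a minimal illustration, not the article's implementation: the class name SimpleBloom is hypothetical, and instead of K separate hash functions it derives the K positions by double hashing (h1 + i·h2 mod m, with h1 from std::hash and h2 from FNV-1a), which is an assumed substitute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Minimal Bloom filter sketch: m bits, k probe positions per key.
// Positions use double hashing, g_i(x) = h1(x) + i * h2(x) (mod m),
// as an assumed stand-in for k independent hash functions.
class SimpleBloom {
public:
    SimpleBloom(std::size_t m, std::size_t k) : bits_(m, false), k_(k) {
        assert(m > 0 && k > 0);
    }

    // Insertion only ever sets bits to 1.
    void insert(const std::string& key) {
        for (std::size_t i = 0; i < k_; ++i) bits_[position(key, i)] = true;
    }

    // false => definitely absent; true => possibly present.
    bool mightContain(const std::string& key) const {
        for (std::size_t i = 0; i < k_; ++i)
            if (!bits_[position(key, i)]) return false;  // any 0 bit: not in the set
        return true;                                      // all 1: probably in the set
    }

private:
    // 64-bit FNV-1a, used here only as the second hash h2.
    static std::uint64_t fnv1a(const std::string& s) {
        std::uint64_t h = 14695981039346656037ULL;
        for (unsigned char c : s) { h ^= c; h *= 1099511628211ULL; }
        return h;
    }

    std::size_t position(const std::string& key, std::size_t i) const {
        std::uint64_t h1 = std::hash<std::string>{}(key);
        std::uint64_t h2 = fnv1a(key) | 1;  // "| 1" keeps the step non-zero
        return static_cast<std::size_t>((h1 + i * h2) % bits_.size());
    }

    std::vector<bool> bits_;
    std::size_t k_;
};
```

Because insertion never clears a bit, a key that was inserted always finds all of its bits set, which is why there are no false negatives; a key that was never inserted can still land only on 1s set by other keys, which is the false-positive case.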

Advantages

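Compared with storing the elements themselves, it needs far less space and answers queries quickly: insertion and lookup each take O(k) time, independent of how many elements are stored, and the storage is a fixed-size bit array. In addition, the hash functions are independent of one another, so they lend themselves to parallel implementation in hardware. A Bloom filter also does not need to store the elements themselves, which is an advantage in scenarios with very strict confidentiality requirements.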

Disadvantages

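The disadvantages of a Bloom filter are just as clear as its advantages. The false-positive rate is one of them: it rises as more elements are stored. If the number of elements is small, however, a plain hash table is sufficient.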
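How quickly the false-positive rate climbs with the number of stored elements can be estimated with the standard approximation p ≈ (1 − e^(−kn/m))^k. The helpers below are a small sketch of that estimate; the function names are hypothetical, and the formula assumes the hash functions behave like independent uniform random functions.

```cpp
#include <cassert>
#include <cmath>

// Estimated false-positive rate of a Bloom filter with m bits, n stored
// elements and k hash functions (standard approximation, assuming the
// hash functions behave like independent uniform ones):
//   p ~= (1 - e^(-k*n/m))^k
double falsePositiveRate(double m, double n, double k) {
    assert(m > 0 && k > 0 && n >= 0);
    return std::pow(1.0 - std::exp(-k * n / m), k);
}

// Number of hash functions that minimizes p for given m and n: (m/n) ln 2.
double optimalHashCount(double m, double n) {
    assert(m > 0 && n > 0);
    return m / n * std::log(2.0);
}
```

With m = 10,000 bits and k = 7, the estimate is roughly 0.8% at n = 1,000 but roughly 14% at n = 2,000, so a filter sized for one load degrades quickly when overfilled.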

(One remedy for false positives is to build an additional small whitelist that stores the entries likely to be misjudged.)
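The whitelist remedy can be sketched as below, under stated assumptions: the Bloom filter holds a blacklist, and the whitelist is a small exact set of keys that have been verified not to be blacklisted but that the filter flags anyway. The class and method names are hypothetical, and deriving several hash functions by salting std::hash with the probe index is an illustrative choice, not the article's.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_set>

// Hypothetical blacklist check: a Bloom filter answers "possibly blacklisted",
// and a small exact whitelist overrides keys known to be false positives.
class BlacklistWithWhitelist {
public:
    void addToBlacklist(const std::string& key) {
        for (std::size_t i = 0; i < kHashes; ++i) bits_.set(pos(key, i));
    }

    // Only add keys verified NOT to be in the blacklist.
    void addToWhitelist(const std::string& key) { whitelist_.insert(key); }

    bool isBlocked(const std::string& key) const {
        if (whitelist_.count(key)) return false;  // known false positive
        for (std::size_t i = 0; i < kHashes; ++i)
            if (!bits_.test(pos(key, i))) return false;  // definitely not blacklisted
        return true;                                      // possibly blacklisted
    }

private:
    static constexpr std::size_t kBits = 1 << 16;  // filter size (illustrative)
    static constexpr std::size_t kHashes = 4;      // probes per key (illustrative)

    // Assumed scheme: salt std::hash with the probe index to get kHashes positions.
    static std::size_t pos(const std::string& key, std::size_t i) {
        return std::hash<std::string>{}(key + '#' + std::to_string(i)) % kBits;
    }

    std::bitset<kBits> bits_;
    std::unordered_set<std::string> whitelist_;
};
```

The whitelist stays small because it only needs the false positives that actually show up in practice, not every key outside the blacklist.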

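In addition, elements generally cannot be deleted from a Bloom filter: a bit may be shared by several elements, so clearing it for one element would make the others look absent. An obvious idea is to turn the bit array into an array of integers, incrementing the corresponding counters on every insertion so that a deletion can simply decrement them. Deleting safely is not that simple, however. First, we must be sure that the element being deleted really is in the filter, and the filter alone cannot guarantee this. Second, counter wraparound (overflow) also causes problems.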
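A counting Bloom filter along these lines might look like the sketch below. Its choices are assumptions (8-bit counters, salted std::hash, hypothetical names): to avoid wraparound, counters saturate at 255 instead of overflowing to 0, and a saturated counter is never decremented again because its true count is lost. It still cannot check that a removed key was actually inserted; removing a key that was never added can decrement counters other keys rely on and cause false negatives.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Counting Bloom filter sketch: each bit is replaced by an 8-bit counter.
class CountingBloom {
public:
    CountingBloom(std::size_t m, std::size_t k) : counters_(m, 0), k_(k) {
        assert(m > 0 && k > 0);
    }

    void insert(const std::string& key) {
        for (std::size_t i = 0; i < k_; ++i) {
            std::uint8_t& c = counters_[pos(key, i)];
            if (c < kMax) ++c;  // saturate instead of wrapping around to 0
        }
    }

    // Caller must guarantee the key was inserted; the filter cannot verify it.
    void remove(const std::string& key) {
        for (std::size_t i = 0; i < k_; ++i) {
            std::uint8_t& c = counters_[pos(key, i)];
            if (c > 0 && c < kMax) --c;  // saturated counters are left alone
        }
    }

    bool mightContain(const std::string& key) const {
        for (std::size_t i = 0; i < k_; ++i)
            if (counters_[pos(key, i)] == 0) return false;
        return true;
    }

private:
    static constexpr std::uint8_t kMax = 255;

    // Assumed scheme: salt std::hash with the probe index.
    std::size_t pos(const std::string& key, std::size_t i) const {
        return std::hash<std::string>{}(key + '#' + std::to_string(i)) % counters_.size();
    }

    std::vector<std::uint8_t> counters_;
    std::size_t k_;
};
```

The price is space: each position now costs 8 bits instead of 1.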

A simple implementation follows:

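```cpp
#pragma once
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>
using namespace std;

// Reconstructed from an extraction-damaged listing. Assumptions: the header
// names, the template parameter list, the bodies of HashFunc1-3 (BKDR, SDBM
// and RS hashes as stand-ins), the method names Reset/Test, and
// BloomFilter::Test. The original breaks off inside BloomFilter::Set.

class BitMap // stores each value as a single bit
{
public:
    BitMap(size_t len) { _array.resize((len >> 5) + (len % 32 ? 1 : 0)); }
    // With this constructor, pass (num - minLen) to Set/Reset/Test.
    BitMap(size_t minLen, size_t maxLen) : BitMap(maxLen - minLen + 1) {}

    void Set(size_t num) { _array[num >> 5] |= (1u << (num % 32)); }
    // Fixed: the original's !(1 << count) evaluates to 0 and cleared the whole word.
    void Reset(size_t num) { _array[num >> 5] &= ~(1u << (num % 32)); }
    bool Test(size_t num) const { return _array[num >> 5] & (1u << (num % 32)); }
private:
    vector<uint32_t> _array; // 32 bits per word, hence >> 5 and % 32
};

struct HashFunc1 { // BKDRHash (stand-in)
    size_t operator()(const string& key) const {
        size_t hash = 0;
        for (char c : key) hash = hash * 131 + (size_t)c;
        return hash;
    }
};
struct HashFunc2 { // SDBMHash (stand-in)
    size_t operator()(const string& key) const {
        size_t hash = 0;
        for (char c : key) hash = hash * 65599 + (size_t)c;
        return hash;
    }
};
struct HashFunc3 { // RSHash (stand-in)
    size_t operator()(const string& key) const {
        size_t hash = 0, magic = 63689;
        for (char c : key) { hash = hash * magic + (size_t)c; magic *= 378551; }
        return hash;
    }
};
struct HashFunc4 { // APHash
    size_t operator()(const string& key) const {
        size_t hash = 0;
        for (size_t i = 0; i < key.size(); ++i) {
            size_t ch = (size_t)key[i];
            if ((i & 1) == 0) hash ^= ((hash << 7) ^ ch ^ (hash >> 3));
            else hash ^= (~((hash << 11) ^ ch ^ (hash >> 5)));
        }
        return hash;
    }
};
struct HashFunc5 { // JSHash
    size_t operator()(const string& key) const {
        if (key.empty()) return 0; // author's addition: the empty string hashes to 0
        size_t hash = 1315423911;
        for (char c : key) hash ^= ((hash << 5) + (size_t)c + (hash >> 2));
        return hash;
    }
};

template<class K = string,
         class Func1 = HashFunc1, class Func2 = HashFunc2, class Func3 = HashFunc3,
         class Func4 = HashFunc4, class Func5 = HashFunc5>
class BloomFilter
{
public:
    BloomFilter(size_t cap = 100) : _bitmap(cap), _capacity(cap) {}

    void Set(const K& key) // set the bit chosen by each of the five hashes
    {
        _bitmap.Set(Func1()(key) % _capacity);
        _bitmap.Set(Func2()(key) % _capacity);
        _bitmap.Set(Func3()(key) % _capacity);
        _bitmap.Set(Func4()(key) % _capacity);
        _bitmap.Set(Func5()(key) % _capacity);
    }
    bool Test(const K& key) const // false: definitely absent; true: probably present
    {
        return _bitmap.Test(Func1()(key) % _capacity)
            && _bitmap.Test(Func2()(key) % _capacity)
            && _bitmap.Test(Func3()(key) % _capacity)
            && _bitmap.Test(Func4()(key) % _capacity)
            && _bitmap.Test(Func5()(key) % _capacity);
    }
private:
    BitMap _bitmap;
    size_t _capacity;
};
```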

