RSS FeedReview of Bloom Filters paper
My review of the SPAA 11 paper “Understanding Bloom Filter Intersection for Lazy Address-Set Disambiguation” by Mark Jeffrey and Gregory Steffan has been published and is available here (requires access to Computing Reviews).
Here is the list of other review activities that I am usually involved with.
Bloom Filters, Random and Hash Functions
Today, I tested Bloom Filters on a database with 50,000 records. Some preliminaries: Considering a false positive rate of 1%, good bitset size is about 500,000 and number of hash functions is less than 7. You can of course confirm all this here:
Given an expected false positive rate of 1%, here is the number of bits in the bit array:
- 50000 * ln(0.01) / ln(2)^2
And then, given that, the number of hash functions is 0.7 (m/n), which becomes about 7.
This of course is all old news (since 1970), but a more recent paper by Almeida et al titled “Scalable Bloom Filters” provides some insights into choices of parameters to control the error probability. Another element to consider is if a single hash function can be used to replace multiple hash functions by separating its multiple parts of output. That part requires some more simulation on my end – still working on it.
Apps