**jm + bloom-filters**
19

A general purpose counting filter

cqf
counting-quotient-filters
data-structures
via:acolyer
coding
approximate
bloom-filters

august 2017 by jm

This paper introduces a new AMQ data structure, a Counting Quotient Filter, which addresses all of these shortcomings and performs extremely well in both time and space: CQF performs in-memory inserts and queries up to an order of magnitude faster than the original quotient filter structure from which it takes its inspiration, several times faster than a Bloom filter, and similarly to a cuckoo filter. The CQF structure is comparable or more space efficient than all of them too. Moreover, CQF does all of this while supporting counting, outperforming all of the other forms in both dimensions even though they do not. In short, CQF is a big deal!

august 2017 by jm

Fast Forward Labs: Probabilistic Data Structure Showdown: Cuckoo Filters vs. Bloom Filters

november 2016 by jm

Nice comparison of a counting Bloom filter and a Cuckoo Filter, implemented in Python:

algorithms
probabilistic
approximation
bloom-filters
cuckoo-filters
sets
estimation
python
This post provides an update by exploring Cuckoo filters, a new probabilistic data structure that improves upon the standard Bloom filter. The Cuckoo filter provides a few advantages: 1) it enables dynamic deletion and addition of items 2) it can be easily implemented compared to Bloom filter variants with similar capabilities, and 3) for similar space constraints, the Cuckoo filter provides lower false positives, particularly at lower capacities. We provide a python implementation of the Cuckoo filter here, and compare it to a counting Bloom filter (a Bloom filter variant).

november 2016 by jm

"Cuckoo Filter: Practically Better Than Bloom"

march 2015 by jm

'We propose a new data structure called the cuckoo filter that can replace Bloom filters for approximate set membership

tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than

Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have

lower space overhead than space-optimized Bloom filters. Our experimental results also show that cuckoo filters outperform previous data structures that extend Bloom filters to support deletions substantially in both time and space.'

algorithms
paper
bloom-filters
cuckoo-filters
cuckoo-hashing
data-structures
false-positives
big-data
probabilistic
hashing
set-membership
approximation
tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than

Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have

lower space overhead than space-optimized Bloom filters. Our experimental results also show that cuckoo filters outperform previous data structures that extend Bloom filters to support deletions substantially in both time and space.'

march 2015 by jm

Cuckoo Filters

october 2014 by jm

'In many networking systems, Bloom filters are used for high-speed set membership tests. They permit a small fraction of false positive answers with very good space efficiency. However, they do not permit deletion of items from the set, and previous attempts to extend “standard” Bloom filters to support deletion all degrade either space or performance. We propose a new data structure called the cuckoo filter that can replace Bloom filters for approximate set member- ship tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Our experimental results also show that cuckoo filters out-perform previous data structures that extend Bloom filters to support deletions substantially in both time and space.'

algorithms
cs
coding
cuckoo-filters
bloom-filters
sets
data-structures
october 2014 by jm

"Invertible Bloom Lookup Tables" [paper]

september 2014 by jm

'We present a version of the Bloom filter data structure that supports not only the insertion, deletion, and lookup of key-value pairs, but also allows a complete listing of the pairs it contains with high probability, as long the number of key- value pairs is below a designed threshold. Our structure allows the number of key-value pairs to greatly exceed this threshold during normal operation. Exceeding the threshold simply temporarily prevents content listing and reduces the probability of a successful lookup. If entries are later deleted to return the structure below the threshold, everything again functions appropriately. We also show that simple variations of our structure are robust to certain standard errors, such as the deletion of a key without a corresponding insertion or the insertion of two distinct values for a key. The properties of our structure make it suitable for several applications, including database and networking applications that we highlight.'

iblt
bloom-filters
data-structures
performance
algorithms
coding
papers
probabilistic
september 2014 by jm

3 Rules of thumb for Bloom Filters

But see also http://stackoverflow.com/a/9554448 , http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf (thanks Tony Finch!)
bloom-filters
algorithm
probabilistic
rules
reasoning
via:norman-maurer
false-positives
hashing
coding

august 2014 by jm

I often need to do rough back-of-the-envelope reasoning about things, and I find that doing a bit of work to develop an intuition for how a new technique performs is usually worthwhile. So, here are three broad rules of thumb to remember when discussing Bloom filters down the pub:

One byte per item in the input set gives about a 2% false positive rate.

The optimal number of hash functions is about 0.7 times the number of bits per item.

3 - The number of hashes dominates performance.

But see also http://stackoverflow.com/a/9554448 , http://www.eecs.harvard.edu/~kirsch/pubs/bbbf/esa06.pdf (thanks Tony Finch!)

august 2014 by jm

Redis adds support for HyperLogLog

april 2014 by jm

good comment thread on HN, discussing hlld and bloomd as well

hll
bloom-filters
hyperloglog
redis
data-structures
estimation
cardinality
probabilistic
probability
hashing
random
april 2014 by jm

bloomd

via:fanf
memcached
bloomd
open-source
bloom-filters

march 2013 by jm

a high-performance C server which is used to expose bloom filters and operations over them to networked clients. It uses a simple ASCII protocol which is human readable, and similar to memcached.(via Tony Finch)

march 2013 by jm

clearspring / stream-lib

february 2013 by jm

ASL-licensed open source library of stream-processing/approximation algorithms: count-min sketch, space-saving top-k, cardinality estimation, LogLog, HyperLogLog, MurmurHash, lookup3 hash, Bloom filters, q-digest, stochastic top-k

algorithms
coding
streams
cep
stream-processing
approximation
probabilistic
space-saving
top-k
cardinality
estimation
bloom-filters
q-digest
loglog
hyperloglog
murmurhash
lookup3
february 2013 by jm

Google Guava BloomFIlter

march 2012 by jm

neat, Guava now has a builtin Bloom filter implementation using the murmur hash. that'll potentially save a little hassle in the future

guava
coding
java
bloom-filters
data-structures
sets
march 2012 by jm

Golomb-coded sets

september 2011 by jm

'a probabilistic data structure conceptually similar to a Bloom filter, but with a more compact in-memory representation, and a slower query time.' could come in handy

gcs
bloom-filters
probabilistic
data-structures
memory
algorithms
september 2011 by jm

Lucene Utilities and Bloom Filters - Greplin:tech

april 2011 by jm

'Storing 50,000 2.5KB items in a traditional hash set requires over 125MB, but if you're willing to accept a 1-in-10,000 false positive rate on lookups, [this] bloom filter requires under 500KB' - interesting variation on the basic concept. Java, Apache-licensed

search
bloom-filters
greplin
open-source
apache
false-positives
from delicious
april 2011 by jm

Tony Finch - Some notes on Bloom filters

november 2010 by jm

more good Bloom Filter tips. he says: 'I take a slightly different tack, starting with a target population in mind which determines the size of the filter. Also there's a minor error regarding performance in the corte.si post. You only need to calculate two hash functions, and use a linear combination of them to index the Bloom filter. This simplifies the coding a lot, and if hash calculation dominates filter indexing, it's also a lot faster.'

bloom-filters
tips
coding
via:fanf
false-positives
from delicious
november 2010 by jm

3 Rules of thumb for Bloom Filters

november 2010 by jm

good to know (via Jeremy)

via:jzawodny
bloom-filters
hashing
algorithms
coding
tips
false-positives
from delicious
november 2010 by jm

/~colmmacc/ » Prime and Proper

september 2010 by jm

algorithm to perform set membership tests on enumerated sets quickly and memory-efficiently, using multiplication by primes. Nice trick

hacks
colmmacc
prime-numbers
set-membership
bloom-filters
bignums
algorithms
programming
from delicious
september 2010 by jm

_Fast Cache for Your Text: Accelerating Exact Pattern Matching with Feed-Forward Bloom Filters_ [PDF]

september 2010 by jm

intriguing application of a Bloom Filter optimised for modern CPUs (2-level, with a cache-partitioned first level), providing massive speedups vs GNU grep or trie-based approaches like Aho-Corasick -- or possibly re2c, as used in "sa-compile". On the other hand, a perl implementation of Rabin-Karp, which is similar, didn't perform as well. Still, may be worth investigating

bloom-filters
grep
filtering
spamassassin
sa-compile
text-matching
caches
aho-corasick
from delicious
september 2010 by jm

**related tags**

Copy this bookmark: