jm + sets   8

Fast Forward Labs: Probabilistic Data Structure Showdown: Cuckoo Filters vs. Bloom Filters
Nice comparison of a counting Bloom filter and a Cuckoo Filter, implemented in Python:
This post provides an update by exploring Cuckoo filters, a new probabilistic data structure that improves upon the standard Bloom filter. The Cuckoo filter provides a few advantages: 1) it enables dynamic deletion and addition of items 2) it can be easily implemented compared to Bloom filter variants with similar capabilities, and 3) for similar space constraints, the Cuckoo filter provides lower false positives, particularly at lower capacities. We provide a python implementation of the Cuckoo filter here, and compare it to a counting Bloom filter (a Bloom filter variant).
algorithms  probabilistic  approximation  bloom-filters  cuckoo-filters  sets  estimation  python 
november 2016 by jm
Roaring Bitmaps
Bitsets, also called bitmaps, are commonly used as fast data structures. Unfortunately, they can use too much memory. To compensate, we often use compressed bitmaps. Roaring bitmaps are compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. In some instances, they can be hundreds of times faster and they often offer significantly better compression.

Roaring bitmaps are used in Apache Lucene (as of version 5.0 using an independent implementation) and Apache Spark (as of version 1.2).
bitmaps  bitsets  sets  data-structures  bits  compression  lucene  spark  daniel-lemire  algorithms 
november 2014 by jm
Cuckoo Filters
'In many networking systems, Bloom filters are used for high-speed set membership tests. They permit a small fraction of false positive answers with very good space efficiency. However, they do not permit deletion of items from the set, and previous attempts to extend “standard” Bloom filters to support deletion all degrade either space or performance. We propose a new data structure called the cuckoo filter that can replace Bloom filters for approximate set member- ship tests. Cuckoo filters support adding and removing items dynamically while achieving even higher performance than Bloom filters. For applications that store many items and target moderately low false positive rates, cuckoo filters have lower space overhead than space-optimized Bloom filters. Our experimental results also show that cuckoo filters out-perform previous data structures that extend Bloom filters to support deletions substantially in both time and space.'
algorithms  cs  coding  cuckoo-filters  bloom-filters  sets  data-structures 
october 2014 by jm
HyperLogLog - Intersection Arithmetic
'In general HLL intersection in StreamLib works.  |A INTERSECT B|
= |A| + |B| - |A UNION B|.  Timon's article on intersection is
important to read though.  The usefulness of HLL intersection depends
on the features of the HLLs you are intersecting.'
hyperloglog  hll  hyperloglogplus  streamlib  intersections  sets  estimation  algorithms 
april 2014 by jm
Sketch of the Day: K-Minimum Values
Another sketching algorithm -- this one supports set union and intersection operations more easily than HyperLogLog when there are more than 2 sets
algorithms  coding  space-saving  cardinality  streams  stream-processing  estimation  sets  sketching 
june 2013 by jm
Good UI for exploration of HyperLogLog set intersections and unions.
One of the first things that we wanted to do with HyperLogLog when we first started playing with it was to support and expose it natively in the browser. The thought of allowing users to directly interact with these structures -- perform arbitrary unions and intersections on effectively unbounded sets all on the client -- was exhilarating to us. [...] we are pleased to announce the open-source release of AK’s HyperLogLog implementation for JavaScript, js-hll. We are releasing this code under the Apache License, Version 2.0.

We knew that we couldn’t just release a bunch of JavaScript code without allowing you to see it in action — that would be a crime. We passed a few ideas around and the one that kept bubbling to the top was a way to kill two birds with one stone. We wanted something that would showcase what you can do with HLL in the browser and give us a tool for explaining HLLs. It is typical for us to explain how HLL intersections work using a Venn diagram. You draw some overlapping circles with a border that represents the error and you talk about how if that border is close to or larger than the intersection then you can’t say much about the size of that intersection. This works just ok on a whiteboard but what you really want is to just build a visualization that allows you to select from some sets and see the overlap. Maybe even play with the precision a little bit to see how that changes the result. Well, we did just that!
javascript  ui  hll  hyperloglog  algorithms  sketching  js  sets  intersection  union  apache  open-source 
june 2013 by jm
aaw/hyperloglog-redis - GitHub
'This gem is a pure Ruby implementation of the HyperLogLog algorithm for estimating cardinalities of sets observed via a stream of events. A Redis instance is used for storing the counters.'
cardinality  sets  redis  algorithms  ruby  gems  hyperloglog 
january 2013 by jm
Google Guava BloomFIlter
neat, Guava now has a builtin Bloom filter implementation using the murmur hash. that'll potentially save a little hassle in the future
guava  coding  java  bloom-filters  data-structures  sets 
march 2012 by jm

Copy this bookmark: