jm + counting   6

jomsdev notes:

'Last year, in the AofA’16 conference Robert Sedgewick proposed a new algorithm for cardinality estimation. Robert Sedgwick is a professor at Princeton with a long track of publications on combinatorial/randomized algorithms. He was a good friend of Philippe Flajolet (creator of Hyperloglog) and HyperBitBit it's based on the same ideas. However, it uses less memory than Hyperloglog and can provide the same results. On practical data, HyperBitBit, for N < 2^64 estimates cardinality within 10% using only 128 + 6 bits.'
algorithms  programming  cs  hyperloglog  estimation  cardinality  counting  hyperbitbit 
9 weeks ago by jm
Counting with domain specific databases — The Smyte Blog — Medium
whoa, pretty heavily engineered scalable counting system with Kafka, RocksDB and Kubernetes
kafka  rocksdb  kubernetes  counting  databases  storage  ops 
april 2016 by jm
A probabilistic data structure for frequency/k-occurrence cardinality estimation of multisets. Sample implementation

(via Patrick McFadin)
via:patrickmcfadin  hyperloglog  cardinality  data-structures  algorithms  hyperlogsandwich  counting  estimation  lossy  multisets 
may 2015 by jm
Druid | How We Scaled HyperLogLog: Three Real-World Optimizations
3 optimizations have made to the HLL algorithm to scale it up for production use in Metamarkets: compacting registers (fixes a bug with unions of multiple HLLs); a sparse storage format (to optimize space); faster lookups using a lookup table.  metamarkets  scaling  hyperloglog  hll  algorithms  performance  optimization  counting  estimation 
april 2014 by jm
_An Improved Construction For Counting Bloom Filters_
'A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally'
bloom-filter  data-structures  algorithms  counting  cbf  storage  false-positives  d-left-hashing  hashing 
september 2013 by jm

Copy this bookmark: