**jm + cardinality**
11

HyperBitBit

march 2017 by jm

jomsdev notes:

'Last year, in the AofA’16 conference Robert Sedgewick proposed a new algorithm for cardinality estimation. Robert Sedgwick is a professor at Princeton with a long track of publications on combinatorial/randomized algorithms. He was a good friend of Philippe Flajolet (creator of Hyperloglog) and HyperBitBit it's based on the same ideas. However, it uses less memory than Hyperloglog and can provide the same results. On practical data, HyperBitBit, for N < 2^64 estimates cardinality within 10% using only 128 + 6 bits.'

algorithms
programming
cs
hyperloglog
estimation
cardinality
counting
hyperbitbit
'Last year, in the AofA’16 conference Robert Sedgewick proposed a new algorithm for cardinality estimation. Robert Sedgwick is a professor at Princeton with a long track of publications on combinatorial/randomized algorithms. He was a good friend of Philippe Flajolet (creator of Hyperloglog) and HyperBitBit it's based on the same ideas. However, it uses less memory than Hyperloglog and can provide the same results. On practical data, HyperBitBit, for N < 2^64 estimates cardinality within 10% using only 128 + 6 bits.'

march 2017 by jm

hyperlogsandwich

(via Patrick McFadin)
via:patrickmcfadin
hyperloglog
cardinality
data-structures
algorithms
hyperlogsandwich
counting
estimation
lossy
multisets

may 2015 by jm

A probabilistic data structure for frequency/k-occurrence cardinality estimation of multisets. Sample implementation

(via Patrick McFadin)

may 2015 by jm

Redis adds support for HyperLogLog

april 2014 by jm

good comment thread on HN, discussing hlld and bloomd as well

hll
bloom-filters
hyperloglog
redis
data-structures
estimation
cardinality
probabilistic
probability
hashing
random
april 2014 by jm

Recordinality

august 2013 by jm

a new, and interesting, sketching algorithm, with a Java implementation:

sketching
coding
algorithms
recordinality
cardinality
estimation
hll
hashing
murmurhash
java
Recordinality is unique in that it provides cardinality estimation like HLL, but also offers "distinct value sampling." This means that Recordinality can allow us to fetch a random sample of distinct elements in a stream, invariant to cardinality. Put more succinctly, given a stream of elements containing 1,000,000 occurrences of 'A' and one occurrence each of 'B' - 'Z', the probability of any letter appearing in our sample is equal. Moreover, we can also efficiently store the number of times elements in our distinct sample have been observed. This can help us to understand the distribution of occurrences of elements in our stream. With it, we can answer questions like "do the elements we've sampled present in a power law-like pattern, or is the distribution of occurrences relatively even across the set?"

august 2013 by jm

Sketch of the Day: K-Minimum Values

june 2013 by jm

Another sketching algorithm -- this one supports set union and intersection operations more easily than HyperLogLog when there are more than 2 sets

algorithms
coding
space-saving
cardinality
streams
stream-processing
estimation
sets
sketching
june 2013 by jm

hlld

(via:cscotta)
hyper-log-log
hlld
hll
data-structures
memcached
daemons
sketching
estimation
big-data
cardinality
algorithms
via:cscotta

june 2013 by jm

a high-performance C server which is used to expose HyperLogLog sets and operations over them to networked clients. It uses a simple ASCII protocol which is human readable, and similar to memcached.

HyperLogLog's are a relatively new sketching data structure. They are used to estimate cardinality, i.e. the unique number of items in a set. They are based on the observation that any bit in a "good" hash function is indepedenent of any other bit and that the probability of getting a string of N bits all set to the same value is 1/(2^N). There is a lot more in the math, but that is the basic intuition. What is even more incredible is that the storage required to do the counting is log(log(N)). So with a 6 bit register, we can count well into the trillions. For more information, its best to read the papers referenced at the end. TL;DR: HyperLogLogs enable you to have a set with about 1.6% variance, using 3280 bytes, and estimate sizes in the trillions.

(via:cscotta)

june 2013 by jm

Approximate Heavy Hitters -The SpaceSaving Algorithm

may 2013 by jm

nice, readable intro to SpaceSaving (which I've linked to before) -- a simple stream-processing cardinality top-K estimation algorithm with bounded error.

algorithms
coding
space-saving
cardinality
streams
stream-processing
estimation
may 2013 by jm

HyperLogLog++: Google’s Take On Engineering HLL

february 2013 by jm

Google and AggregateKnowledge's improvements to the HyperLogLog cardinality estimation algorithm

hyperloglog
cardinality
estimation
streaming
stream-processing
cep
february 2013 by jm

clearspring / stream-lib

february 2013 by jm

ASL-licensed open source library of stream-processing/approximation algorithms: count-min sketch, space-saving top-k, cardinality estimation, LogLog, HyperLogLog, MurmurHash, lookup3 hash, Bloom filters, q-digest, stochastic top-k

algorithms
coding
streams
cep
stream-processing
approximation
probabilistic
space-saving
top-k
cardinality
estimation
bloom-filters
q-digest
loglog
hyperloglog
murmurhash
lookup3
february 2013 by jm

aaw/hyperloglog-redis - GitHub

january 2013 by jm

'This gem is a pure Ruby implementation of the HyperLogLog algorithm for estimating cardinalities of sets observed via a stream of events. A Redis instance is used for storing the counters.'

cardinality
sets
redis
algorithms
ruby
gems
hyperloglog
january 2013 by jm

**related tags**

Copy this bookmark: