**jm + hash-tables**
3

Fibonacci Hashing: The Optimization that the World Forgot (or: a Better Alternative to Integer Modulo)

Apparently this is binary multiplicative hashing, and Google's brotli, webp, and Snappy libs all use a constant derived heuristically from a compression test corpus along the same lines (see comments).

(Via Michael Fogleman)
algorithms
hashing
hash
fibonacci
golden-ratio
coding
hacks
brotli
webp
snappy
hash-tables
hashmaps
load-distribution

27 days ago by jm

Turns out I was wrong. This is a big one. And everyone should be using it. Hash tables should not be prime number sized and they should not use an integer modulo to map hashes into slots. Fibonacci hashing is just better. Yet somehow nobody is using it and lots of big hash tables (including all the big implementations of std::unordered_map) are much slower than they should be because they don’t use Fibonacci Hashing.

Apparently this is binary multiplicative hashing, and Google's brotli, webp, and Snappy libs all use a constant derived heuristically from a compression test corpus along the same lines (see comments).

(Via Michael Fogleman)

27 days ago by jm

Special encoding of small aggregate data types in Redis

november 2012 by jm

Nice performance trick in Redis on hash storage:

'In theory in order to guarantee that we perform lookups in constant time (also known as O(1) in big O notation) there is the need to use a data structure with a constant time complexity in the average case, like an hash table. But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains will grow too much (you can configure the limit in redis.conf). This does not work well just from the point of view of time complexity, but also from the point of view of constant times, since a linear array of key value pairs happens to play very well with the CPU cache (it has a better cache locality than an hash table).'

memory
redis
performance
big-o
hash-tables
storage
coding
cache
arrays
'In theory in order to guarantee that we perform lookups in constant time (also known as O(1) in big O notation) there is the need to use a data structure with a constant time complexity in the average case, like an hash table. But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains will grow too much (you can configure the limit in redis.conf). This does not work well just from the point of view of time complexity, but also from the point of view of constant times, since a linear array of key value pairs happens to play very well with the CPU cache (it has a better cache locality than an hash table).'

november 2012 by jm

experimental CPU-cache-aware hash table implementations in Cloudera's Impala

october 2012 by jm

via Todd Lipcon -- https://twitter.com/tlipcon/status/261113382642532352

'another cool piece of cloudera impala source: cpu-cache-aware hash table implementations by @jackowayed'. 'L1-sized hash table that hopes to use cache well. Each bucket is a chunk list of tuples. Each chunk is a cache line.'

hashing
hash-tables
data-structures
performance
c++
l1
cache
cpu
'another cool piece of cloudera impala source: cpu-cache-aware hash table implementations by @jackowayed'. 'L1-sized hash table that hopes to use cache well. Each bucket is a chunk list of tuples. Each chunk is a cache line.'

october 2012 by jm

Copy this bookmark: