jm + hash-tables   3

Fibonacci Hashing: The Optimization that the World Forgot (or: a Better Alternative to Integer Modulo)
Turns out I was wrong. This is a big one. And everyone should be using it. Hash tables should not be prime number sized and they should not use an integer modulo to map hashes into slots. Fibonacci hashing is just better. Yet somehow nobody is using it and lots of big hash tables (including all the big implementations of std::unordered_map) are much slower than they should be because they don’t use Fibonacci Hashing.

Apparently this is binary multiplicative hashing, and Google's brotli, webp, and Snappy libs all use a constant derived heuristically from a compression test corpus along the same lines (see comments).

(Via Michael Fogleman)
algorithms  hashing  hash  fibonacci  golden-ratio  coding  hacks  brotli  webp  snappy  hash-tables  hashmaps  load-distribution
june 2018 by jm
Special encoding of small aggregate data types in Redis
Nice performance trick in Redis on hash storage:

'In theory in order to guarantee that we perform lookups in constant time (also known as O(1) in big O notation) there is the need to use a data structure with a constant time complexity in the average case, like an hash table. But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains will grow too much (you can configure the limit in redis.conf). This does not work well just from the point of view of time complexity, but also from the point of view of constant times, since a linear array of key value pairs happens to play very well with the CPU cache (it has a better cache locality than an hash table).'
memory  redis  performance  big-o  hash-tables  storage  coding  cache  arrays
november 2012 by jm
experimental CPU-cache-aware hash table implementations in Cloudera's Impala