jm + distributions   3

Good advice on running large-scale database stress tests
I've been bitten by poor key distribution in tests in the past, so this is spot on: 'I'd run it with Zipfian, Pareto, and Dirac delta distributions, and I'd choose read-modify-write transactions.'

And of course, a dataset bigger than all combined RAM.
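
For reference, a rough sketch of what key generators for those three distributions could look like in a homegrown load generator. This is my own illustration, not code from the linked post: the function names, the `s` and `alpha` defaults, and the clamping of the Pareto tail are all assumptions.

```python
import bisect
import itertools
import random


def zipfian_keys(n_keys, s=1.0, seed=0):
    """Yield key ids in [0, n_keys) with P(rank k) proportional to 1 / (k + 1) ** s."""
    rng = random.Random(seed)
    weights = [1.0 / (k + 1) ** s for k in range(n_keys)]
    cumulative = list(itertools.accumulate(weights))
    total = cumulative[-1]
    while True:
        # Inverse-CDF sampling over the precomputed cumulative weights.
        idx = bisect.bisect_left(cumulative, rng.random() * total)
        yield min(idx, n_keys - 1)


def pareto_keys(n_keys, alpha=1.16, seed=0):
    """Yield key ids from a Pareto draw, folded into [0, n_keys)."""
    rng = random.Random(seed)
    while True:
        # paretovariate() returns values >= 1; shift to 0 and clamp the long
        # tail onto the last key (an assumption, this adds a small spike there).
        yield min(int(rng.paretovariate(alpha)) - 1, n_keys - 1)


def dirac_keys(hot_key=0):
    """Yield the same key forever, so every operation hits one row."""
    while True:
        yield hot_key
```

Something like `itertools.islice(zipfian_keys(10_000_000), n_ops)` would then drive the read-modify-write transactions; the Zipfian table costs O(n_keys) memory, so a very large keyspace would want a smarter sampler.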

Also: the "Biebermark", where just a single row out of the entire db is contended in a read/modify/write transaction: "the inspiration for this is maintaining counts for [highly contended] popular entities like Justin Bieber and One Direction."
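
A toy version of that single-hot-row workload, to make the shape concrete. Everything here is hypothetical: `VersionedRow` is an in-memory stand-in for one database row, and the optimistic compare-and-set retry loop is my assumption about how the transaction is done, not something the linked post specifies.

```python
import threading


class VersionedRow:
    """Hypothetical stand-in for one database row with optimistic concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0
        self.version = 0

    def read(self):
        with self._lock:
            return self.value, self.version

    def compare_and_set(self, expected_version, new_value):
        """Commit only if nobody else has written since our read."""
        with self._lock:
            if self.version != expected_version:
                return False
            self.value = new_value
            self.version += 1
            return True


def biebermark(n_threads=8, increments_per_thread=1000):
    """Have every thread increment the same row; return (final value, retries)."""
    row = VersionedRow()
    retries = [0] * n_threads

    def worker(idx):
        for _ in range(increments_per_thread):
            while True:
                value, version = row.read()                  # read
                if row.compare_and_set(version, value + 1):  # modify + write
                    break
                retries[idx] += 1                            # lost the race, retry

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return row.value, sum(retries)
```

The final value should always equal `n_threads * increments_per_thread`; the retry count is the interesting number, since it shows how much work is wasted on contention.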
biebermark  benchmarks  testing  performance  stress-tests  databases  storage  mongodb  innodb  foundationdb  aphyr  measurement  distributions  keys  zipfian 
december 2014 by jm
Nassim Taleb: retire Standard Deviation
'Use the mean absolute deviation [...] it corresponds to "real life" much better than the first—and to reality. In fact, whenever people make decisions after being supplied with the standard deviation number, they act as if it were the expected mean deviation.'

Graydon Hoare in turn recommends the median absolute deviation. I prefer percentiles, anyway ;)
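
To see how differently these measures behave, here's a small stdlib-only sketch comparing them on one made-up latency sample with a single outlier. The sample values and helper names are mine, and the percentile helper uses the nearest-rank method, which is only one of several definitions.

```python
import math
import statistics


def mean_absolute_deviation(xs):
    """Average distance from the mean."""
    m = statistics.fmean(xs)
    return statistics.fmean([abs(x - m) for x in xs])


def median_absolute_deviation(xs):
    """Median distance from the median."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])


def percentile(xs, p):
    """Nearest-rank percentile for 0 < p <= 100."""
    ordered = sorted(xs)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up sample: nine ordinary requests and one very slow one.
latencies_ms = [10, 11, 9, 10, 12, 10, 11, 9, 10, 500]

summary = {
    "stddev": statistics.pstdev(latencies_ms),
    "mean_abs_dev": mean_absolute_deviation(latencies_ms),
    "median_abs_dev": median_absolute_deviation(latencies_ms),
    "p50": percentile(latencies_ms, 50),
    "p99": percentile(latencies_ms, 99),
}
```

On this sample the standard deviation comes out around 147, the mean absolute deviation at 88.16, and the median absolute deviation at 1, while p50 is 10 and p99 is 500. The percentiles are the only pair that tells you both what a typical request looks like and how bad the worst one was.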
statistics  standard-deviation  stddev  maths  nassim-taleb  deviation  volatility  rmse  distributions 
january 2014 by jm
Fat Tails
Nice d3.js demo of the fat-tailed distribution:
A fat-tailed distribution looks normal but the parts far away from the average are thicker, meaning a higher chance of huge deviations. [...] Fat tails don't mean more variance; just different variance. For a given variance, a higher chance of extreme deviations implies a lower chance of medium ones.
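
The "same variance, different shape" point can be checked with closed-form tail probabilities. A minimal sketch, assuming a unit-variance Laplace distribution as the heavier-tailed stand-in; that choice is mine, and it isn't necessarily the distribution the demo uses.

```python
import math


def normal_tail(x):
    """P(|X| > x) for a normal distribution with mean 0 and variance 1."""
    return math.erfc(x / math.sqrt(2))


def laplace_tail(x):
    """P(|X| > x) for a Laplace distribution with mean 0 and variance 1."""
    b = 1 / math.sqrt(2)  # Laplace variance is 2 * b**2, so this gives variance 1
    return math.exp(-x / b)


def band(tail, lo, hi):
    """P(lo < |X| <= hi), derived from a two-sided tail function."""
    return tail(lo) - tail(hi)


extreme_normal = normal_tail(3)            # deviations beyond 3 sigma
extreme_laplace = laplace_tail(3)
medium_normal = band(normal_tail, 1, 2)    # deviations between 1 and 2 sigma
medium_laplace = band(laplace_tail, 1, 2)
```

With both variances pinned at 1, the Laplace puts roughly five times as much probability beyond 3 sigma as the normal does, but noticeably less in the 1 to 2 sigma band, which is the trade-off the quote describes.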
dataviz  via:hn  statistics  visualization  distributions  fat-tailed  kurtosis  d3.js  javascript  variance  deviation 
july 2013 by jm
