jm + aphyr + performance   2

Good advice on running large-scale database stress tests
I've been bitten by poor key distribution in tests in the past, so this is spot on: 'I'd run it with Zipfian, Pareto, and Dirac delta distributions, and I'd choose read-modify-write transactions.'

And of course, a dataset bigger than all combined RAM.

Also: -- the "Biebermark", where just a single row out of the entire db is contended on in a read/modify/write transaction: "the inspiration for this is maintaining counts for [highly contended] popular entities like Justin Bieber and One Direction."
biebermark  benchmarks  testing  performance  stress-tests  databases  storage  mongodb  innodb  foundationdb  aphyr  measurement  distributions  keys  zipfian 
december 2014 by jm
Profiling Hadoop jobs with Riemann
I’ve built a very simple distributed profiler for soft-real-time telemetry from hundreds to thousands of JVMs concurrently. It’s nowhere near as comprehensive in its analysis as, say, Yourkit, but it can tell you, across a distributed system, which functions are taking the most time, and what their dominant callers are.

Potentially useful.
riemann  profiling  aphyr  hadoop  emr  performance  monitoring 
august 2014 by jm

Copy this bookmark: