jm + via:codeslinger   3

consistent hashing with bounded loads
'an algorithm that combined consistent hashing with an upper limit on any one server’s load, relative to the average load of the whole pool.'

Lovely post on Vimeo's eng blog about a new variation on consistent hashing -- incorporating a concept of overload-avoidance -- and about adding it to HAProxy and running it in production at Vimeo. All sounds pretty nifty! (via Toby DiPasquale)
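
To make the quoted idea concrete, here's a rough Python sketch of consistent hashing with a load cap relative to the pool average. This is not Vimeo's or HAProxy's code: the class name BoundedLoadRing, the balance_factor and replicas parameters, and the use of MD5 for ring placement are all assumptions for illustration. Each key walks clockwise around the ring from its hash point and takes the first server whose load is still under ceil(balance_factor * (total_load + 1) / server_count).

```python
import bisect
import hashlib
import math


def _hash(value: str) -> int:
    """Map a string to a 64-bit ring position (MD5 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


class BoundedLoadRing:
    """Hypothetical sketch: consistent hashing with a per-server load cap."""

    def __init__(self, servers, balance_factor=1.25, replicas=100):
        # balance_factor is the cap as a multiple of the average load (must be >= 1).
        self.balance_factor = balance_factor
        self.loads = {s: 0 for s in servers}
        # Place several virtual points per server on the ring.
        self.ring = sorted(
            (_hash(f"{s}#{i}"), s) for s in servers for i in range(replicas)
        )
        self.points = [p for p, _ in self.ring]

    def _capacity(self) -> int:
        # Cap counts the request being placed, hence the +1.
        total = sum(self.loads.values()) + 1
        return math.ceil(self.balance_factor * total / len(self.loads))

    def assign(self, key: str) -> str:
        cap = self._capacity()
        start = bisect.bisect_left(self.points, _hash(key))
        # Walk clockwise until a server below the cap turns up.
        for step in range(len(self.ring)):
            _, server = self.ring[(start + step) % len(self.ring)]
            if self.loads[server] < cap:
                self.loads[server] += 1
                return server
        raise RuntimeError("no server below capacity")  # not reachable when factor >= 1

    def release(self, server: str) -> None:
        # Call when a request finishes so the server's load drops again.
        self.loads[server] -= 1


ring = BoundedLoadRing(["cache-a", "cache-b", "cache-c", "cache-d"])
for i in range(1000):
    ring.assign(f"video-{i}")
print(ring.loads)
```

With a factor of 1.25 and four servers, no server should end up holding more than 313 of the 1000 keys, however uneven the plain hash distribution happens to be.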
via:codeslinger  algorithms  networking  performance  haproxy  consistent-hashing  load-balancing  lbs  vimeo  overload  load 
5 weeks ago by jm
The Aggregate Magic Algorithms
Obscure, low-level bit-twiddling tricks -- specifically:
Absolute Value of a Float, Alignment of Pointers, Average of Integers, Bit Reversal, Comparison of Float Values, Comparison to Mask Conversion, Divide Rounding, Dual-Linked List with One Pointer Field, GPU Any, GPU SyncBlocks, Gray Code Conversion, Integer Constant Multiply, Integer Minimum or Maximum, Integer Power, Integer Selection, Is Power of 2, Leading Zero Count, Least Significant 1 Bit, Log2 of an Integer, Next Largest Power of 2, Most Significant 1 Bit, Natural Data Type Precision Conversions, Polynomials, Population Count (Ones Count), Shift-and-Add Optimization, Sign Extension, Swap Values Without a Temporary, SIMD Within A Register (SWAR) Operations, Trailing Zero Count.

Many of these would be insane to use in anything other than the hottest of hot-spots, but good to have on file. (via Toby DiPasquale)
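
For flavour, here's a small Python sketch of three of the listed tricks (Is Power of 2, Least Significant 1 Bit, Population Count). These are common textbook formulations and not necessarily the exact code on that page; the function names are my own, and the popcount assumes 32-bit values.

```python
def is_power_of_two(x: int) -> bool:
    # A power of two has exactly one bit set, so clearing the lowest set bit leaves 0.
    return x > 0 and (x & (x - 1)) == 0


def lowest_set_bit(x: int) -> int:
    # Two's-complement negation flips every bit above the lowest set bit.
    return x & -x


def popcount32(x: int) -> int:
    # SWAR ones-count: sum bits in 2-bit, 4-bit, then 8-bit fields in parallel.
    x &= 0xFFFFFFFF
    x -= (x >> 1) & 0x55555555
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333)
    x = (x + (x >> 4)) & 0x0F0F0F0F
    # The multiply adds the four byte counts into the top byte.
    return ((x * 0x01010101) & 0xFFFFFFFF) >> 24


print(is_power_of_two(64), lowest_set_bit(12), popcount32(0b1011))
```

In Python these are only a curiosity (bin(x).count("1") is plenty), but they show the shape of the thing.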
hot-spots  optimisation  bit-twiddling  algorithms  via:codeslinger  snippets 
december 2012 by jm
Practical machine learning tricks from the KDD 2011 best industry paper
Wow, this is a fantastic paper. It's a Google paper on detecting scam/spam ads using machine learning -- but not just that, it's how to build out such a classifier to production scale, and make it operationally resilient, and, indeed, operable.

I've come across a few of these ideas before, and I'm happy to say I might have reinvented a few (particularly around the feature space), but all of them together make extremely good sense. If I wind up working on large-scale classification again, this is the first paper I'll go back to. Great info! (via Toby DiPasquale.)
classification  via:codeslinger  training  machine-learning  google  ops  kdd  best-practices  anti-spam  classifiers  ensemble  map-reduce 
july 2012 by jm
