jm + load   9

Linux Load Averages: Solving the Mystery
Nice bit of OS archaeology by Brendan Gregg.
In 1993, a Linux engineer found a nonintuitive case with load averages, and with a three-line patch changed them forever from "CPU load averages" to what one might call "system load averages." His change included tasks in the uninterruptible state, so that load averages reflected demand for disk resources and not just CPUs. These system load averages count the number of threads working and waiting to work, and are summarized as a triplet of exponentially-damped moving sum averages that use 1, 5, and 15 minutes as constants in an equation. This triplet of numbers lets you see if load is increasing or decreasing, and their greatest value may be for relative comparisons with themselves.
load  monitoring  linux  unix  performance  ops  brendan-gregg  history  cpu 
4 weeks ago by jm
consistent hashing with bounded loads
'an algorithm that combined consistent hashing with an upper limit on any one server’s load, relative to the average load of the whole pool.'

Lovely blog post from Vimeo's eng blog on a new variation on consistent hashing -- incorporating a concept of overload-avoidance -- and adding it to HAProxy and using it in production in Vimeo. All sounds pretty nifty! (via Toby DiPasquale)
via:codeslinger  algorithms  networking  performance  haproxy  consistent-hashing  load-balancing  lbs  vimeo  overload  load 
5 weeks ago by jm
Gil Tene on benchmarking
'I would strongly encourage you to avoid repeating the mistakes of testing methodologies that focus entirely on max achievable throughput and then report some (usually bogus) latency stats at those max throughout modes. The techempower numbers are a classic example of this in play, and while they do provide some basis for comparing a small aspect of behavior (what I call the "how fast can this thing drive off a cliff" comparison, or "pedal to the metal" testing), those results are not very useful for comparing load carrying capacities for anything that actually needs to maintain some form of responsiveness SLA or latency spectrum requirements.'

Some excellent advice here on how to measure and represent stack performance.

Also: 'DON'T use or report standard deviation for latency. Ever. Except if you mean it as a joke.'
performance  benchmarking  testing  speed  gil-tene  latency  measurement  hdrhistogram  load-testing  load 
april 2016 by jm
Brownout: building more robust cloud applications
Applications can saturate – i.e. become unable to serve users in a timely manner. Some users may experience high latencies, while others may not receive any service at all. The authors argue that it is better to downgrade the user experience and continue serving a larger number of clients with reasonable latency.

"We define a cloud application as brownout compliant if it can gradually downgrade user experience to avoid saturation."

This is actually very reminiscent of circuit breakers, as described in Nygard’s ‘Release It!’ and popularized by Netflix. If you’re already designing with circuit breakers, you’ve probably got all the pieces you need to add brownout support to your application relatively easily.

"Our work borrows from the concept of brownout in electrical grids. Brownouts are an intentional voltage drop often used to prevent blackouts through load reduction in case of emergency. In such a situation, incandescent light bulbs dim, hence originating the term."
"To lower the maintenance effort, brownouts should be automatically triggered. This enables cloud applications to rapidly and robustly avoid saturation due to unexpected environmental changes, lowering the burden on human operators."


This is really similar to the Circuit Breaker pattern -- in fact it feels to me like a variation on that, driven by measured latencies of operations/requests.

See also http://blog.acolyer.org/2014/10/27/improving-cloud-service-resilience-using-brownout-aware-load-balancing/ .
circuit-breaker  patterns  brownout  robustness  reliability  load  latencies  degradation 
october 2014 by jm
New Low Cost EC2 Instances with Burstable Performance
Oh, very neat. New micro, small, and medium-class instances with burstable CPU scaling:
The T2 instances are built around a processing allocation model that provides you a generous, assured baseline amount of processing power coupled with the ability to automatically and transparently scale up to a full core when you need more compute power. Your ability to burst is based on the concept of "CPU Credits" that you accumulate during quiet periods and spend when things get busy. You can provision an instance of modest size and cost and still have more than adequate compute power in reserve to handle peak demands for compute power.
ec2  aws  hosting  cpu  scaling  burst  load  instances 
july 2014 by jm
"H" in cron syntax
This is something Jenkins have come up to randomize and distribute load, in order to avoid the "thundering-herd" bug. Good call
jenkins  randomization  load-balancing  load  thundering-herd  ops  capacity  sleep 
april 2014 by jm
Scaling lessons learned at Dropbox
website-scaling tips and suggestions, "particularly for a resource-constrained, fast-growing environment that can’t always afford to do things “the right way” (i.e., any real-world engineering project". I really like the "run with fake load" trick; add additional queries/load which you can quickly turn off if the service starts browning out, giving you a few days breathing room to find a real fix before customers start being affected. Neat
dropbox  scalability  webdev  load  scaling-up 
july 2012 by jm
BufferBloat: What's Wrong with the Internet? - ACM Queue
'A discussion with Vint Cerf, Van Jacobson, Nick Weaver, and Jim Gettys' -- the big guns! Great discussion (via Tony Finch)
via:fanf  bufferbloat  networking  buffers  buffering  performance  load  tcp  ip 
december 2011 by jm
Deployment is just a part of dev/ops cooperation, not the whole thing
metrics, monitoring, instrumentation, fault tolerance, load mitigation called out as other factors by Allspaw
ops  deployment  operations  engineering  metrics  devops  monitoring  fault-tolerance  load  from delicious
december 2009 by jm

Copy this bookmark:



description:


tags: