jm + failsafe   2

Ironies of automation
Wow, this is a great paper recommendation from Adrian Colyer - 'Ironies of automation', Bainbridge, Automatica, Vol. 19, No. 6, 1983.
In an automated system, two roles are left to humans: monitoring that the automated system is operating correctly, and taking over control if it isn’t. An operator that doesn’t routinely operate the system will have atrophied skills if ever called on to take over.

Unfortunately, physical skills deteriorate when they are not used, particularly the refinements of gain and timing. This means that a formerly experienced operator who has been monitoring an automated process may now be an inexeperienced one.

Not only are the operator’s skills declining, but the situations when the operator will be called upon are by their very nature the most demanding ones where something is deemed to be going wrong. Thus what we really need in such a situation is a more, not a lesser skilled operator! To generate successful strategies for unusual situtations, an operator also needs good understanding of the process under control, and the current state of the system. The former understanding develops most effectively through use and feedback (which the operator may no longer be getting the regular opportunity for), the latter takes some time to assimilate.


(via John Allspaw)
via:allspaw  automation  software  reliability  debugging  ops  design  failsafe  failure  human-interfaces  ui  ux  outages 
17 days ago by jm
Fault Tolerance in a High Volume, Distributed System
Netflix's "DependencyCommand", a resiliency system for SOA inter-service network calls, offering builtin support for threadpools, timeouts, retries and graceful failover. Very nice
netflix  architecture  concurrency  distributed  failover  ha  resiliency  fail-fast  failsafe  soa  fault-tolerance 
march 2012 by jm

Copy this bookmark:



description:


tags: