jm + modelling   3

Locking, Little's Law, and the USL
Excellent explanatory mailing list post by Martin Thompson to the mechanical-sympathy group, discussing Little's Law vs the USL:
Little's law can be used to describe a system in steady state from a queuing perspective, i.e. arrival and leaving rates are balanced. In this case it is a crude way of modelling a system with a contention percentage of 100% under Amdahl's law, in that throughput is one over latency.

However this is an inaccurate way to model a system with locks. Amdahl's law does not account for coherence costs. For example, if you wrote a microbenchmark with a single thread to measure the lock cost then it is much lower than in a multi-threaded environment where cache coherence, other OS costs such as scheduling, and lock implementations need to be considered.

Universal Scalability Law (USL) accounts for both the contention and the coherence costs.

When modelling locks it is necessary to consider how contention and coherence costs vary given how they can be implemented. Consider in Java how we have biased locking, thin locks, fat locks, inflation, and revoking biases which can cause safe points that bring all threads in the JVM to a stop with a significant coherence component.
usl  scaling  scalability  performance  locking  locks  java  jvm  amdahls-law  littles-law  system-dynamics  modelling  systems  caching  threads  schedulers  contention 
4 weeks ago by jm
Model checking for highly concurrent code
Applied formal methods in order to test distributed systems -- specifically GlusterFS:

I'll use an example from my own recent experience. I'm developing a new kind of replication for GlusterFS. To make sure the protocol behaves correctly even across multiple failures, I developed a Murphi model for it. [...]

I added a third failure [to the simulated model]. I didn't expect a three-node system to continue working if more than one of those were concurrent (the model allows the failures to be any mix of sequential and concurrent), but I expected it to fail cleanly without reaching an invalid state. Surprise! It managed to produce a case where a reader can observe values that go back in time. This might not make much sense without knowing the protocol involved, but it might give some idea of the crazy conditions a model checker will find that you couldn't possibly have considered. [...]

So now I have a bug to fix, and that's a good thing. Clearly, it involves a very specific set of ill-timed reads, writes, and failures. Could I have found it by inspection or ad-hoc analysis? Hell, no. Could I have found it by testing on live systems? Maybe, eventually, but it probably would have taken months for this particular combination to occur on its own. Forcing it to occur would require a lot of extra code, plus an exerciser that would amount to a model checker running 100x slower across machines than Murphi does. With enough real deployments over enough time it would have happened, but the only feasible way to prevent that was with model checking. These are exactly the kinds of bugs that are hardest to fix in the field, and that make users distrust distributed systems, so those of us who build such systems should use every tool at our disposal to avoid them.
model-checking  formal-methods  modelling  murphi  distcomp  distributed-systems  glusterfs  testing  protocols 
september 2013 by jm
2-day volcanic ash forecasts over Europe
using SNAP, the Norwegian Meterological Institute's nuclear-fallout modelling system. Very useful, much niftier than the UK version, and much more *publicly available* than the Irish version
europe  volcano  norway  weather  ash  snap  modelling  forecasts  from delicious
may 2010 by jm

Copy this bookmark: