jm + formal-methods   5

Combining static model checking with dynamic enforcement using the Statecall Policy Language
This looks quite nice -- a model-checker "for regular programmers". Example model for ping(1):

<pre>01 automaton ping (int max_count, int count, bool can_timeout) {
02 Initialize;
03 during {
04 count = 0;
05 do {
06 Transmit_Ping;
07 either {
08 Receive_Ping;
09 } or (can_timeout) {
10 Timeout_Ping;
11 };
12 count = count + 1;
13 } until (count &gt;= max_count);
14 } handle {
16 Print_Summary;
17 };</pre>
ping  model-checking  models  formal-methods  verification  static  dynamic  coding  debugging  testing  distcomp  papers 
march 2015 by jm
Proving that Android’s, Java’s and Python’s sorting algorithm is broken (and showing how to fix it)
Wow, this is excellent work. A formal verification of Tim Peters' TimSort failed, resulting in a bugfix:
While attempting to verify TimSort, we failed to establish its instance invariant. Analysing the reason, we discovered a bug in TimSort’s implementation leading to an ArrayOutOfBoundsException for certain inputs. We suggested a proper fix for the culprit method (without losing measurable performance) and we have formally proven that the fix actually is correct and that this bug no longer persists.
timsort  algorithms  android  java  python  sorting  formal-methods  proofs  openjdk 
february 2015 by jm
Use of Formal Methods at Amazon Web Services
Chris Newcombe, Marc Brooker, et al. writing about their experience using formal specification and model-checking languages (TLA+) in production in AWS:

The success with DynamoDB gave us enough evidence to present TLA+ to the broader engineering community at Amazon. This raised a challenge; how to convey the purpose and benefits of formal methods to an audience of software engineers? Engineers think in terms of debugging rather than ‘verification’, so we called the presentation “Debugging Designs”.

Continuing that metaphor, we have found that software engineers more readily grasp the concept and practical value of TLA+ if we dub it 'Exhaustively-testable pseudo-code'.

We initially avoid the words ‘formal’, ‘verification’, and ‘proof’, due to the widespread view that formal methods are impractical. We also initially avoid mentioning what the acronym ‘TLA’ stands for, as doing so would give an incorrect impression of complexity.

More slides at ; proggit discussion at
formal-methods  model-checking  tla  tla+  programming  distsys  distcomp  ebs  s3  dynamodb  aws  ec2  marc-brooker  chris-newcombe 
june 2014 by jm
'Experience of software engineers using TLA+, PlusCal and TLC' [slides] [pdf]
by Chris Newcombe, an AWS principal engineer. Several Amazonians sharing their results in simulating tricky distributed-systems problems using formal methods
tla+  pluscal  tlc  formal-methods  simulation  proving  aws  amazon  architecture  design 
october 2013 by jm
Model checking for highly concurrent code
Applied formal methods in order to test distributed systems -- specifically GlusterFS:

I'll use an example from my own recent experience. I'm developing a new kind of replication for GlusterFS. To make sure the protocol behaves correctly even across multiple failures, I developed a Murphi model for it. [...]

I added a third failure [to the simulated model]. I didn't expect a three-node system to continue working if more than one of those were concurrent (the model allows the failures to be any mix of sequential and concurrent), but I expected it to fail cleanly without reaching an invalid state. Surprise! It managed to produce a case where a reader can observe values that go back in time. This might not make much sense without knowing the protocol involved, but it might give some idea of the crazy conditions a model checker will find that you couldn't possibly have considered. [...]

So now I have a bug to fix, and that's a good thing. Clearly, it involves a very specific set of ill-timed reads, writes, and failures. Could I have found it by inspection or ad-hoc analysis? Hell, no. Could I have found it by testing on live systems? Maybe, eventually, but it probably would have taken months for this particular combination to occur on its own. Forcing it to occur would require a lot of extra code, plus an exerciser that would amount to a model checker running 100x slower across machines than Murphi does. With enough real deployments over enough time it would have happened, but the only feasible way to prevent that was with model checking. These are exactly the kinds of bugs that are hardest to fix in the field, and that make users distrust distributed systems, so those of us who build such systems should use every tool at our disposal to avoid them.
model-checking  formal-methods  modelling  murphi  distcomp  distributed-systems  glusterfs  testing  protocols 
september 2013 by jm

Copy this bookmark: