**nhaliday + formal-methods**
45

Woboq Code Browser - Explore C++ code on the web

10 days ago by nhaliday

→ Browse the source code of Qt | GLibc | LLVM | Boost | GCC | Linux

repo
oss
info-foraging
code-dive
search
tools
aggregator
software
ui
compilers
gcc
linux
c(pp)
static-dynamic
graphs
formal-methods
llvm
10 days ago by nhaliday

graydon2 | "What next?"

28 days ago by nhaliday

"After memory safety, what do you think is the next big step for compiled languages to take?"

techtariat
programming
pls
plt
compilers
design
speculation
frontier
state-of-art
rust
expert-experience
composition-decomposition
coupling-cohesion
ocaml-sml
functional
types
error
error-handling
concurrency
state
formal-methods
hmm
idk
cost-benefit
time-complexity
space-complexity
static-dynamic
oss
open-closed
prediction
research
28 days ago by nhaliday

A Formal Verification of Rust's Binary Search Implementation

4 weeks ago by nhaliday

Part of the reason for this is that it’s quite complicated to apply mathematical tools to something unmathematical like a functionally unpure language (which, unfortunately, most programs tend to be written in). In mathematics, you don’t expect a variable to suddenly change its value, and it only gets more complicated when you have pointers to those dang things:

“Dealing with aliasing is one of the key challenges for the verification of imperative programs. For instance, aliases make it difficult to determine which abstractions are potentially affected by a heap update and to determine which locks need to be acquired to avoid data races.” 1

While there are whole logics focused on trying to tackle these problems, a master’s thesis wouldn’t be nearly enough time to model a formal Rust semantics on top of these, so I opted for a more straightforward solution: Simply make Rust a purely functional language!

Electrolysis: Simple Verification of Rust Programs via Functional Purification

If you know a bit about Rust, you may have noticed something about that quote in the previous section: There actually are no data races in (safe) Rust, precisely because there is no mutable aliasing. Either all references to some datum are immutable, or there is a single mutable reference. This means that mutability in Rust is much more localized than in most other imperative languages, and that it is sound to replace a destructive update like

p.x += 1

with a functional one – we know there’s no one else around observing p:

let p = Point { x: p.x + 1, ..p };
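
[ed.: a minimal runnable sketch of the equivalence being described, assuming nothing from the thesis beyond the Point example above; the function names are mine. The &mut borrow is exclusive in safe Rust, which is exactly why the two versions cannot be told apart by any observer.]

#[derive(Clone, Copy, PartialEq, Debug)]
struct Point { x: i32, y: i32 }

// destructive update: safe because &mut is an exclusive borrow,
// so no one else can observe p mid-update
fn bump_mut(p: &mut Point) {
    p.x += 1;
}

// "purified" version: a functional record update with the same result
fn bump_pure(p: Point) -> Point {
    Point { x: p.x + 1, ..p }
}

fn main() {
    let mut a = Point { x: 1, y: 2 };
    let b = bump_pure(a);
    bump_mut(&mut a);
    assert_eq!(a, b); // the two styles agree observably
}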

techtariat
plt
programming
formal-methods
rust
arrows
simplification
reduction
divide-and-conquer
correctness
project
state
functional
concurrency
direct-indirect
pls
examples
4 weeks ago by nhaliday

Leslie Lamport: Thinking Above the Code - YouTube

heavyweights cs distributed systems system-design formal-methods rigor correctness rhetoric contrarianism presentation video detail-architecture engineering programming thinking writing technical-writing concurrency protocol-metadata

4 weeks ago by nhaliday

Purpose of proof: semi-formal methods : Inside 245-5D

techtariat reflection rhetoric contrarianism programming engineering formal-methods rigor proofs communication collaboration c(pp) concurrency plt correctness pragmatic math lens abstraction flux-stasis flexibility documentation system-design impetus ends-means

5 weeks ago by nhaliday

CakeML

5 weeks ago by nhaliday

some interesting job openings in Sydney listed here

programming
pls
plt
functional
ocaml-sml
formal-methods
rigor
compilers
types
numerics
accuracy
estimate
research-program
homepage
anglo
jobs
tech
cool
5 weeks ago by nhaliday

Epigrams in Programming | Computer Science

6 weeks ago by nhaliday

- Alan Perlis

nibble
quotes
aphorism
list
cs
computation
programming
pls
hi-order-bits
synthesis
lens
data-structures
arrows
algorithms
iteration-recursion
intricacy
strings
types
math
formal-methods
pic
visuo
visual-understanding
systems
state
structure
turing
cost-benefit
lisp
performance
software
language
plt
invariance
ends-means
ai
nitty-gritty
sci-comp
composition-decomposition
tradeoffs
grokkability
assembly
internet
egalitarianism-hierarchy
functional
impetus
roots
path-dependence
heavyweights
6 weeks ago by nhaliday

OCaml For the Masses | November 2011 | Communications of the ACM

7 weeks ago by nhaliday

Straight out of the box, OCaml is pretty good at catching bugs, but it can do even more if you design your types carefully. Consider as an example the following types for representing the state of a network connection as illustrated in Figure 4.

that one excellent example of using algebraic data types
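
[ed.: the article's Figure 4 is OCaml; here is a sketch of the same idea in Rust, whose enums are also algebraic data types. The field names are my invention, not the article's — the point is that each state carries only the data meaningful in that state, so illegal states are unrepresentable.]

use std::time::Instant;

enum ConnectionState {
    Connecting { initiated: Instant },
    Connected { session_id: u64, last_heartbeat: Instant },
    Disconnected { since: Instant },
}

// the compiler forces every state to be handled, and no state
// can be asked for fields it doesn't have
fn describe(c: &ConnectionState) -> &'static str {
    match c {
        ConnectionState::Connecting { .. } => "connecting",
        ConnectionState::Connected { .. } => "connected",
        ConnectionState::Disconnected { .. } => "disconnected",
    }
}

fn main() {
    let c = ConnectionState::Connecting { initiated: Instant::now() };
    println!("{}", describe(&c));
}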

techtariat
rhetoric
programming
pls
engineering
pragmatic
carmack
quotes
aphorism
functional
ocaml-sml
types
formal-methods
correctness
finance
tip-of-tongue
examples
characterization
invariance
networking
7 weeks ago by nhaliday

mypy - Optional Static Typing for Python

7 weeks ago by nhaliday

developed by Dropbox in Python; past contributors include Greg Price and Reid Barton

https://pyre-check.org

developed by Facebook in OCaml, seems less complete atm

python
types
programming
pls
devtools
compilers
formal-methods
dropbox
multi
facebook
ocaml-sml
libraries
the-prices
oly
static-dynamic
7 weeks ago by nhaliday

International Solver Competition

8 weeks ago by nhaliday

a contest for dependency solvers

homepage
academia
research
cs
programming
build-packaging
formal-methods
constraint-satisfaction
contest
unix
linux
terminal
yak-shaving
lol
8 weeks ago by nhaliday

Panel: Systems Programming in 2014 and Beyond | Lang.NEXT 2014 | Channel 9

10 weeks ago by nhaliday

- Bjarne Stroustrup, Niko Matsakis, Andrei Alexandrescu, Rob Pike

- 2014 so pretty outdated but rare to find a discussion with people like this together

- pretty sure Jonathan Blow asked a couple questions

- Rob Pike compliments Rust at one point. Also kinda softly rags on dynamic typing at one point ("unit testing is what they have instead of static types").

video
presentation
debate
programming
pls
c(pp)
systems
os
rust
d-lang
golang
computer-memory
legacy
devtools
formal-methods
concurrency
compilers
syntax
parsimony
google
intricacy
thinking
cost-benefit
degrees-of-freedom
facebook
performance
people
rsc
cracker-prog
critique
types
checking
api
flux-stasis
engineering
time
wire-guided
worse-is-better/the-right-thing
static-dynamic
10 weeks ago by nhaliday

The Existential Risk of Math Errors - Gwern.net

10 weeks ago by nhaliday

How big is this upper bound? Mathematicians have often made errors in proofs. But it’s rarer for ideas to be accepted for a long time and then rejected. But we can divide errors into 2 basic cases corresponding to type I and type II errors:

1. Mistakes where the theorem is still true, but the proof was incorrect (type I)

2. Mistakes where the theorem was false, and the proof was also necessarily incorrect (type II)

Before someone comes up with a final answer, a mathematician may have many levels of intuition in formulating & working on the problem, but we’ll consider the final end-product where the mathematician feels satisfied that he has solved it. Case 1 is perhaps the most common case, with innumerable examples; this is sometimes due to mistakes in the proof that anyone would accept as a mistake, but many of these cases are due to changing standards of proof. For example, when David Hilbert discovered errors in Euclid’s proofs which no one noticed before, the theorems were still true, and the gaps more due to Hilbert being a modern mathematician thinking in terms of formal systems (which of course Euclid did not think in). (David Hilbert himself turns out to be a useful example of the other kind of error: his famous list of 23 problems was accompanied by definite opinions on the outcome of each problem and sometimes timings, several of which were wrong or questionable [5].) Similarly, early calculus used ‘infinitesimals’ which were sometimes treated as being 0 and sometimes treated as an indefinitely small non-zero number; this was incoherent and strictly speaking, practically all of the calculus results were wrong because they relied on an incoherent concept - but of course the results were some of the greatest mathematical work ever conducted [6] and when later mathematicians put calculus on a more rigorous footing, they immediately re-derived those results (sometimes with important qualifications), and doubtless as modern math evolves other fields have sometimes needed to go back and clean up the foundations and will in the future. [7]

...

Isaac Newton, incidentally, gave two proofs of the same solution to a problem in probability, one via enumeration and the other more abstract; the enumeration was correct, but the other proof totally wrong and this was not noticed for a long time, leading Stigler to remark:

...

TYPE I > TYPE II?

“Lefschetz was a purely intuitive mathematician. It was said of him that he had never given a completely correct proof, but had never made a wrong guess either.”

- Gian-Carlo Rota [13]

Case 2 is disturbing, since it is a case in which we wind up with false beliefs and also false beliefs about our beliefs (we no longer know that we don’t know). Case 2 could lead to extinction.

...

Except, errors do not seem to be evenly & randomly distributed between case 1 and case 2. There seem to be far more case 1s than case 2s, as already mentioned in the early calculus example: far more than 50% of the early calculus results were correct when checked more rigorously. Richard Hamming attributes to Ralph Boas a comment that, while editing Mathematical Reviews, “of the new results in the papers reviewed most are true but the corresponding proofs are perhaps half the time plain wrong”.

...

Gian-Carlo Rota gives us an example with Hilbert:

...

Olga labored for three years; it turned out that all mistakes could be corrected without any major changes in the statement of the theorems. There was one exception, a paper Hilbert wrote in his old age, which could not be fixed; it was a purported proof of the continuum hypothesis, you will find it in a volume of the Mathematische Annalen of the early thirties.

...

Leslie Lamport advocates for machine-checked proofs and a more rigorous style of proofs similar to natural deduction, noting a mathematician acquaintance guesses at a broad error rate of 1/3 [29] and that he routinely found mistakes in his own proofs and, worse, believed false conjectures [30].

[more on these "structured proofs":

https://academia.stackexchange.com/questions/52435/does-anyone-actually-publish-structured-proofs

https://mathoverflow.net/questions/35727/community-experiences-writing-lamports-structured-proofs

]

We can probably add software to that list: early software engineering work found that, dismayingly, bug rates seem to be simply a function of lines of code, and one would expect diseconomies of scale. So one would expect that in going from the ~4,000 lines of code of the Microsoft DOS operating system kernel to the ~50,000,000 lines of code in Windows Server 2003 (with full systems of applications and libraries being even larger: the comprehensive Debian repository in 2007 contained ~323,551,126 lines of code) that the number of active bugs at any time would be… fairly large. Mathematical software is hopefully better, but practitioners still run into issues (eg Durán et al 2014, Fonseca et al 2017) and I don’t know of any research pinning down how buggy key mathematical systems like Mathematica are or how much published mathematics may be erroneous due to bugs. This general problem led to predictions of doom and spurred much research into automated proof-checking, static analysis, and functional languages [31].

[related:

https://mathoverflow.net/questions/11517/computer-algebra-errors

I don't know any interesting bugs in symbolic algebra packages but I know a true, enlightening and entertaining story about something that looked like a bug but wasn't.

Define sinc(x) = (sin x)/x.

Someone found the following result in an algebra package: ∫₀^∞ sinc(x) dx = π/2

They then found the following results:

...

So of course when they got:

∫₀^∞ sinc(x) sinc(x/3) sinc(x/5) ⋯ sinc(x/15) dx = (467807924713440738696537864469/935615849440640907310521750000) π

hmm:

Which means that nobody knows Fourier analysis nowadays. Very sad and discouraging story... – fedja Jan 29 '10 at 18:47

--

Because the most popular systems are all commercial, they tend to guard their bug database rather closely -- making them public would seriously cut their sales. For example, for the open source project Sage (which is quite young), you can get a list of all the known bugs from this page. 1582 known issues on Feb.16th 2010 (which includes feature requests, problems with documentation, etc).

That is an order of magnitude less than the commercial systems. And it's not because it is better, it is because it is younger and smaller. It might be better, but until SAGE does a lot of analysis (about 40% of CAS bugs are there) and a fancy user interface (another 40%), it is too hard to compare.

I once ran a graduate course whose core topic was studying the fundamental disconnect between the algebraic nature of CAS and the analytic nature of what it is mostly used for. There are issues of logic -- CASes work more or less in an intensional logic, while most of analysis is stated in a purely extensional fashion. There is no well-defined 'denotational semantics' for expressions-as-functions, which strongly contributes to the deeper bugs in CASes.]

...

Should such widely-believed conjectures as P≠NP or the Riemann hypothesis turn out to be false, then because they are assumed by so many existing proofs, a far larger math holocaust would ensue [38] - and our previous estimates of error rates will turn out to have been substantial underestimates. But it may be a cloud with a silver lining, if it doesn’t come at a time of danger.

https://mathoverflow.net/questions/338607/why-doesnt-mathematics-collapse-down-even-though-humans-quite-often-make-mista

more on formal methods in programming:

https://www.quantamagazine.org/formal-verification-creates-hacker-proof-code-20160920/

https://intelligence.org/2014/03/02/bob-constable/

https://softwareengineering.stackexchange.com/questions/375342/what-are-the-barriers-that-prevent-widespread-adoption-of-formal-methods

Update: measured effort

In the October 2018 issue of Communications of the ACM there is an interesting article about Formally verified software in the real world with some estimates of the effort.

Interestingly (based on OS development for military equipment), it seems that producing formally proved software requires 3.3 times more effort than with traditional engineering techniques. So it's really costly.

On the other hand, it requires 2.3 times less effort to get high security software this way than with traditionally engineered software if you add the effort to make such software certified at a high security level (EAL 7). So if you have high reliability or security requirements there is definitively a business case for going formal.

WHY DON'T PEOPLE USE FORMAL METHODS?: https://www.hillelwayne.com/post/why-dont-people-use-formal-methods/

You can see examples of how all of these look at Let’s Prove Leftpad. HOL4 and Isabelle are good examples of “independent theorem” specs, SPARK and Dafny have “embedded assertion” specs, and Coq and Agda have “dependent type” specs. [6]

If you squint a bit it looks like these three forms of code spec map to the three main domains of automated correctness checking: tests, contracts, and types. This is not a coincidence. Correctness is a spectrum, and formal verification is one extreme of that spectrum. As we reduce the rigour (and effort) of our verification we get simpler and narrower checks, whether that means limiting the explored state space, using weaker types, or pushing verification to the runtime. Any means of total specification then becomes a means of partial specification, and vice versa: many consider Cleanroom a formal verification technique, which primarily works by pushing code review far beyond what’s humanly possible.
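
[ed.: a toy Rust rendering of the "embedded assertion" style, with runtime debug_assert postconditions standing in for the contracts that SPARK or Dafny would discharge statically; the three properties are the ones the Let's Prove Leftpad spec asks for.]

fn leftpad(pad: char, n: usize, s: &str) -> String {
    let fill = n.saturating_sub(s.chars().count());
    let out: String = std::iter::repeat(pad).take(fill).chain(s.chars()).collect();
    // postconditions, checked at runtime here rather than proved:
    debug_assert_eq!(out.chars().count(), s.chars().count().max(n)); // length spec
    debug_assert!(out.ends_with(s)); // the input is a suffix of the output
    debug_assert!(out.chars().take(fill).all(|c| c == pad)); // the prefix is padding
    out
}

fn main() {
    assert_eq!(leftpad('!', 5, "foo"), "!!foo");
}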

...

The question, then: “is 90/95/99% correct significantly cheaper than 100% correct?” The answer is very yes. We all are comfortable saying that a codebase we’ve well-tested and well-typed is mostly correct modulo a few fixes in prod, and we’re even writing more than four lines of code a day. In fact, the vast… [more]

ratty
gwern
analysis
essay
realness
truth
correctness
reason
philosophy
math
proofs
formal-methods
cs
programming
engineering
worse-is-better/the-right-thing
intuition
giants
old-anglo
error
street-fighting
heuristic
zooming
risk
threat-modeling
software
lens
logic
inference
physics
differential
geometry
estimate
distribution
robust
speculation
nonlinearity
cost-benefit
convexity-curvature
measure
scale
trivia
cocktail
history
early-modern
europe
math.CA
rigor
news
org:mag
org:sci
miri-cfar
pdf
thesis
comparison
examples
org:junk
q-n-a
stackex
pragmatic
tradeoffs
cracker-prog
techtariat
invariance
DSL
chart
ecosystem
grokkability
heavyweights
CAS
static-dynamic
lower-bounds
complexity
tcs
open-problems
big-surf
ideas
certificates-recognition
proof-systems
PCP
mediterranean
SDP
meta:prediction
epistemic
questions
guessing
distributed
overflow
nibble
soft-question
track-record
big-list
hmm
frontier
state-of-art
10 weeks ago by nhaliday

Mutation testing - Wikipedia

11 weeks ago by nhaliday

Mutation testing involves modifying a program in small ways.[1] Each mutated version is called a mutant and tests detect and reject mutants by causing the behavior of the original version to differ from the mutant. This is called killing the mutant. Test suites are measured by the percentage of mutants that they kill. New tests can be designed to kill additional mutants.
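
[ed.: a hand-rolled illustration of the idea; real mutation-testing tools generate and run the mutants automatically.]

fn max_of(a: i32, b: i32) -> i32 {
    if a > b { a } else { b }
}

// a "mutant": the comparison operator flipped from > to <
fn max_of_mutant(a: i32, b: i32) -> i32 {
    if a < b { a } else { b }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn kills_the_mutant() {
        // max_of(2, 1) == 2 but max_of_mutant(2, 1) == 1, so this test
        // "kills" the mutant; a suite is scored by the fraction of
        // mutants it kills
        assert_eq!(max_of(2, 1), 2);
        assert_ne!(max_of_mutant(2, 1), max_of(2, 1));
    }
}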

wiki
reference
concept
mutation
selection
analogy
programming
checking
formal-methods
debugging
random
list
libraries
links
functional
haskell
javascript
jvm
c(pp)
python
dotnet
oop
perturbation
static-dynamic
11 weeks ago by nhaliday

What every computer scientist should know about floating-point arithmetic

may 2019 by nhaliday

Floating-point arithmetic is considered an esoteric subject by many people. This is rather surprising, because floating-point is ubiquitous in computer systems: Almost every language has a floating-point datatype; computers from PCs to supercomputers have floating-point accelerators; most compilers will be called upon to compile floating-point algorithms from time to time; and virtually every operating system must respond to floating-point exceptions such as overflow. This paper presents a tutorial on the aspects of floating-point that have a direct impact on designers of computer systems. It begins with background on floating-point representation and rounding error, continues with a discussion of the IEEE floating point standard, and concludes with examples of how computer system builders can better support floating point.

https://stackoverflow.com/questions/2729637/does-epsilon-really-guarantees-anything-in-floating-point-computations

"you must use an epsilon when dealing with floats" is a knee-jerk reaction of programmers with a superficial understanding of floating-point computations, for comparisons in general (not only to zero).

This is usually unhelpful because it doesn't tell you how to minimize the propagation of rounding errors, it doesn't tell you how to avoid cancellation or absorption problems, and even when your problem is indeed related to the comparison of two floats, it doesn't tell you what value of epsilon is right for what you are doing.
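
[ed.: a small Rust illustration of the scale problem with any fixed epsilon:]

fn main() {
    let eps = 1e-9;
    // near 1e16 adjacent f64s are 2.0 apart, so |a - b| < eps
    // degenerates into exact equality:
    let a = 1e16_f64;
    let b = a + 2.0; // the very next representable float
    println!("{}", (a - b).abs() < eps); // false
    // near 1e-12 the same eps is enormous:
    let c = 1e-12_f64;
    let d = 2e-12_f64; // twice c
    println!("{}", (c - d).abs() < eps); // true, despite a 100% difference
}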

...

Regarding the propagation of rounding errors, there exist specialized analyzers that can help you estimate it, because it is a tedious thing to do by hand.

https://www.di.ens.fr/~cousot/projects/DAEDALUS/synthetic_summary/CEA/Fluctuat/index.html

This was part of HW1 of CS24:

https://en.wikipedia.org/wiki/Kahan_summation_algorithm

In particular, simply summing n numbers in sequence has a worst-case error that grows proportional to n, and a root mean square error that grows as √n for random inputs (the roundoff errors form a random walk).[2] With compensated summation, the worst-case error bound is independent of n, so a large number of values can be summed with an error that only depends on the floating-point precision.[2]
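
[ed.: compensated summation is short enough to sketch in full; this is the textbook Kahan loop, not any particular library's implementation.]

fn kahan_sum(xs: &[f64]) -> f64 {
    let mut sum = 0.0;
    let mut c = 0.0; // running compensation for lost low-order bits
    for &x in xs {
        let y = x - c; // apply the correction from the previous step
        let t = sum + y; // the low-order bits of y are lost in this add...
        c = (t - sum) - y; // ...and recovered here (zero in exact arithmetic)
        sum = t;
    }
    sum
}

fn main() {
    // summing 0.1 ten million times: the naive loop drifts visibly,
    // Kahan stays within a few ULPs of the exact sum
    let xs = vec![0.1_f64; 10_000_000];
    println!("naive: {:.10}", xs.iter().sum::<f64>());
    println!("kahan: {:.10}", kahan_sum(&xs));
}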

cf:

https://en.wikipedia.org/wiki/Pairwise_summation

In numerical analysis, pairwise summation, also called cascade summation, is a technique to sum a sequence of finite-precision floating-point numbers that substantially reduces the accumulated round-off error compared to naively accumulating the sum in sequence.[1] Although there are other techniques such as Kahan summation that typically have even smaller round-off errors, pairwise summation is nearly as good (differing only by a logarithmic factor) while having much lower computational cost—it can be implemented so as to have nearly the same cost (and exactly the same number of arithmetic operations) as naive summation.

In particular, pairwise summation of a sequence of n numbers x_n works by recursively breaking the sequence into two halves, summing each half, and adding the two sums: a divide and conquer algorithm. Its worst-case roundoff errors grow asymptotically as at most O(ε log n), where ε is the machine precision (assuming a fixed condition number, as discussed below).[1] In comparison, the naive technique of accumulating the sum in sequence (adding each x_i one at a time for i = 1, ..., n) has roundoff errors that grow at worst as O(εn).[1] Kahan summation has a worst-case error of roughly O(ε), independent of n, but requires several times more arithmetic operations.[1] If the roundoff errors are random, and in particular have random signs, then they form a random walk and the error growth is reduced to an average of O(ε √(log n)) for pairwise summation.[2]
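
[ed.: and the corresponding divide-and-conquer sketch:]

fn pairwise_sum(xs: &[f64]) -> f64 {
    // below some block size, plain accumulation is fine and
    // keeps the recursion overhead negligible
    if xs.len() <= 8 {
        return xs.iter().sum();
    }
    let (lo, hi) = xs.split_at(xs.len() / 2);
    pairwise_sum(lo) + pairwise_sum(hi) // error grows O(ε log n) rather than O(εn)
}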

A very similar recursive structure of summation is found in many fast Fourier transform (FFT) algorithms, and is responsible for the same slow roundoff accumulation of those FFTs.[2][3]

https://eng.libretexts.org/Bookshelves/Electrical_Engineering/Book%3A_Fast_Fourier_Transforms_(Burrus)/10%3A_Implementing_FFTs_in_Practice/10.8%3A_Numerical_Accuracy_in_FFTs

However, these encouraging error-growth rates only apply if the trigonometric “twiddle” factors in the FFT algorithm are computed very accurately. Many FFT implementations, including FFTW and common manufacturer-optimized libraries, therefore use precomputed tables of twiddle factors calculated by means of standard library functions (which compute trigonometric constants to roughly machine precision). The other common method to compute twiddle factors is to use a trigonometric recurrence formula—this saves memory (and cache), but almost all recurrences have errors that grow as O(√n), O(n) or even O(n²) which lead to corresponding errors in the FFT.

...

There are, in fact, trigonometric recurrences with the same logarithmic error growth as the FFT, but these seem more difficult to implement efficiently; they require that a table of Θ(log n) values be stored and updated as the recurrence progresses. Instead, in order to gain at least some of the benefits of a trigonometric recurrence (reduced memory pressure at the expense of more arithmetic), FFTW includes several ways to compute a much smaller twiddle table, from which the desired entries can be computed accurately on the fly using a bounded number (usually <3) of complex multiplications. For example, instead of a twiddle table with n entries ω_n^k, FFTW can use two tables with Θ(√n) entries each, so that ω_n^k is computed by multiplying an entry in one table (indexed with the low-order bits of k) by an entry in the other table (indexed with the high-order bits of k).
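
[ed.: a sketch of the two-table trick as I read this passage — the table layout is my assumption, not FFTW's actual code. It relies on ω_n^k = ω_n^(k mod r) · ω_n^(r·⌊k/r⌋).]

use std::f64::consts::PI;

// twiddle factor ω_n^k = exp(-2πik/n) as a (re, im) pair
fn w(k: usize, n: usize) -> (f64, f64) {
    let t = -2.0 * PI * (k as f64) / (n as f64);
    (t.cos(), t.sin())
}

// two tables of about √n entries each, instead of one table of n entries
fn build_tables(n: usize, r: usize) -> (Vec<(f64, f64)>, Vec<(f64, f64)>) {
    let lo = (0..r).map(|k| w(k, n)).collect();
    let hi = (0..=(n - 1) / r).map(|j| w(j * r, n)).collect();
    (lo, hi)
}

// one complex multiplication recovers any ω_n^k from the two tables
fn twiddle(k: usize, r: usize, lo: &[(f64, f64)], hi: &[(f64, f64)]) -> (f64, f64) {
    let (a, b) = lo[k % r]; // low-order part of k
    let (c, d) = hi[k / r]; // high-order part of k
    (a * c - b * d, a * d + b * c)
}

fn main() {
    let (n, r) = (1024, 32); // r ≈ √n
    let (lo, hi) = build_tables(n, r);
    let (re, im) = twiddle(100, r, &lo, &hi);
    let (re2, im2) = w(100, n);
    assert!((re - re2).abs() < 1e-12 && (im - im2).abs() < 1e-12);
}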

[ed.: Nicholas Higham's "Accuracy and Stability of Numerical Algorithms" seems like a good reference for this kind of analysis.]

nibble
pdf
papers
programming
systems
numerics
nitty-gritty
intricacy
approximation
accuracy
types
sci-comp
multi
q-n-a
stackex
hmm
oly-programming
accretion
formal-methods
yak-shaving
wiki
reference
algorithms
yoga
ground-up
divide-and-conquer
fourier
books
tidbits
chart
caltech
nostalgia
may 2019 by nhaliday

Frama-C

may 2019 by nhaliday

Frama-C is organized with a plug-in architecture (comparable to that of the Gimp or Eclipse). A common kernel centralizes information and conducts the analysis. Plug-ins interact with each other through interfaces defined by the kernel. This makes for robustness in the development of Frama-C while allowing a wide functionality spectrum.

...

Three heavyweight plug-ins that are used by the other plug-ins:

- Eva (Evolved Value analysis)

This plug-in computes variation domains for variables. It is quite automatic, although the user may guide the analysis in places. It handles a wide spectrum of C constructs. This plug-in uses abstract interpretation techniques. [ed.: a toy sketch of the interval idea appears after this plug-in list.]

- Jessie and Wp, two deductive verification plug-ins

These plug-ins are based on weakest precondition computation techniques. They make it possible to prove that C functions satisfy their specification as expressed in ACSL. These proofs are modular: the specifications of the called functions are used to establish the proof without looking at their code.

For browsing unfamiliar code:

- Impact analysis

This plug-in highlights the locations in the source code that are impacted by a modification.

- Scope & Data-flow browsing

This plug-in allows the user to navigate the dataflow of the program, from definition to use or from use to definition.

- Variable occurrence browsing

Also provided as a simple example for new plug-in development, this plug-in allows the user to reach the statements where a given variable is used.

- Metrics calculation

This plug-in allows the user to compute various metrics from the source code.

For code transformation:

- Semantic constant folding

This plug-in makes use of the results of the evolved value analysis plug-in to replace, in the source code, the constant expressions by their values. Because it relies on EVA, it is able to do more of these simplifications than a syntactic analysis would.

- Slicing

This plug-in slices the code according to a user-provided criterion: it creates a copy of the program, but keeps only those parts which are necessary with respect to the given criterion.

- Spare code: remove "spare code", code that does not contribute to the final results of the program.

- E-ACSL: translate annotations into C code for runtime assertion checking.

For verifying functional specifications:

- Aoraï: verify specifications expressed as LTL (Linear Temporal Logic) formulas

Other functionalities documented together with the EVA plug-in can be considered as verifying low-level functional specifications (inputs, outputs, dependencies,…)

For test-case generation:

- PathCrawler automatically finds test-case inputs to ensure coverage of a C function. It can be used for structural unit testing, as a complement to static analysis or to study the feasible execution paths of the function.

For concurrent programs:

- Mthread

This plug-in automatically analyzes concurrent C programs, using the EVA plug-in, taking into account all possible thread interactions. At the end of its execution, the concurrent behavior of each thread is over-approximated, resulting in precise information about shared variables, which mutex protects a part of the code, etc.

Front-end for other languages

- Frama-Clang

This plug-in provides a C++ front-end to Frama-C, based on the clang compiler. It transforms C++ code into a Frama-C AST, which can then be analyzed by the plug-ins above. Note however that it is very experimental and only supports a subset of C++11.
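
[ed.: promised above — a toy flavor of the value analysis that Eva performs, nothing like its actual machinery: track each variable as an interval and join intervals where control flow merges.]

#[derive(Clone, Copy, Debug)]
struct Interval { lo: i64, hi: i64 }

impl Interval {
    fn add(self, other: Interval) -> Interval {
        Interval { lo: self.lo + other.lo, hi: self.hi + other.hi }
    }
    // join: least upper bound, used where control-flow paths merge
    fn join(self, other: Interval) -> Interval {
        Interval { lo: self.lo.min(other.lo), hi: self.hi.max(other.hi) }
    }
}

fn main() {
    // x in [0, 10]; after "if c { x += 1 } else { x += 5 }" the analysis
    // joins the two branches into a sound over-approximation
    let x = Interval { lo: 0, hi: 10 };
    let then_branch = x.add(Interval { lo: 1, hi: 1 });
    let else_branch = x.add(Interval { lo: 5, hi: 5 });
    println!("{:?}", then_branch.join(else_branch)); // Interval { lo: 1, hi: 15 }
}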

tools
devtools
formal-methods
programming
software
c(pp)
systems
memory-management
ocaml-sml
debugging
checking
rigor
oss
code-dive
graphs
state
metrics
llvm
gallic
cool
worrydream
impact
flux-stasis
correctness
computer-memory
structure
static-dynamic
may 2019 by nhaliday

One week of bugs

may 2019 by nhaliday

If I had to guess, I'd say I probably work around hundreds of bugs in an average week, and thousands in a bad week. It's not unusual for me to run into a hundred new bugs in a single week. But I often get skepticism when I mention that I run into multiple new (to me) bugs per day, and that this is inevitable if we don't change how we write tests. Well, here's a log of one week of bugs, limited to bugs that were new to me that week. After a brief description of the bugs, I'll talk about what we can do to improve the situation. The obvious answer is to spend more effort on testing, but everyone already knows we should do that and no one does it. That doesn't mean it's hopeless, though.

...

Here's where I'm supposed to write an appeal to take testing more seriously and put real effort into it. But we all know that's not going to work. It would take 90k LOC of tests to get Julia to be as well tested as a poorly tested prototype (falsely assuming linear complexity in size). That's two person-years of work, not even including time to debug and fix bugs (which probably brings it closer to four or five years). Who's going to do that? No one. Writing tests is like writing documentation. Everyone already knows you should do it. Telling people they should do it adds zero information. [1]

Given that people aren't going to put any effort into testing, what's the best way to do it?

Property-based testing. Generative testing. Random testing. Concolic Testing (which was done long before the term was coined). Static analysis. Fuzzing. Statistical bug finding. There are lots of options. Some of them are actually the same thing because the terminology we use is inconsistent and buggy. I'm going to arbitrarily pick one to talk about, but they're all worth looking into.
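
[ed.: a hand-rolled flavor of the property-based style in Rust, without the shrinking and generator machinery that QuickCheck or Hypothesis provide:]

fn reverse<T: Clone>(xs: &[T]) -> Vec<T> {
    xs.iter().rev().cloned().collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    // tiny deterministic PRNG so the sketch needs no external crates
    fn lcg(state: &mut u64) -> u64 {
        *state = state.wrapping_mul(6364136223846793005).wrapping_add(1);
        *state >> 33
    }

    #[test]
    fn reversing_twice_is_identity() {
        let mut seed = 42_u64;
        for _ in 0..1000 {
            let len = (lcg(&mut seed) % 32) as usize;
            let xs: Vec<u64> = (0..len).map(|_| lcg(&mut seed)).collect();
            // the property: for all xs, reverse(reverse(xs)) == xs
            assert_eq!(reverse(&reverse(&xs)), xs);
        }
    }
}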

...

There are a lot of great resources out there, but if you're just getting started, I found this description of types of fuzzers to be one of the most helpful (and simplest) things I've read.

John Regehr has a udacity course on software testing. I haven't worked through it yet (Pablo Torres just pointed to it), but given the quality of Dr. Regehr's writing, I expect the course to be good.

For more on my perspective on testing, there's this.

https://hypothesis.works/articles/the-purpose-of-hypothesis/

From the perspective of a user, the purpose of Hypothesis is to make it easier for you to write better tests.

From my perspective as the primary author, that is of course also a purpose of Hypothesis. I write a lot of code, it needs testing, and the idea of trying to do that without Hypothesis has become nearly unthinkable.

But, on a large scale, the true purpose of Hypothesis is to drag the world kicking and screaming into a new and terrifying age of high quality software.

Software is everywhere. We have built a civilization on it, and it’s only getting more prevalent as more services move online and embedded and “internet of things” devices become cheaper and more common.

Software is also terrible. It’s buggy, it’s insecure, and it’s rarely well thought out.

This combination is clearly a recipe for disaster.

The state of software testing is even worse. It’s uncontroversial at this point that you should be testing your code, but it’s a rare codebase whose authors could honestly claim that they feel its testing is sufficient.

Much of the problem here is that it’s too hard to write good tests. Tests take up a vast quantity of development time, but they mostly just laboriously encode exactly the same assumptions and fallacies that the authors had when they wrote the code, so they miss exactly the same bugs that the authors missed when they wrote the code.

Preventing the Collapse of Civilization [video]: https://news.ycombinator.com/item?id=19945452

- Jonathan Blow

NB: DevGAMM is a game industry conference

- loss of technological knowledge (Antikythera mechanism, aqueducts, etc.)

- hardware driving most gains, not software

- software's actually less robust, often poorly designed and overengineered these days

- *list of bugs he's encountered recently*:

https://youtu.be/pW-SOdj4Kkk?t=1387

- knowledge of trivia becomes more valued than general, deep knowledge

- does at least acknowledge value of DRY, reusing code, abstraction saving dev time

techtariat
dan-luu
tech
software
error
list
debugging
linux
github
robust
checking
oss
troll
lol
aphorism
webapp
email
google
facebook
games
julia
pls
compilers
communication
mooc
browser
rust
programming
engineering
random
jargon
formal-methods
expert-experience
prof
c(pp)
course
correctness
hn
commentary
video
presentation
carmack
pragmatic
contrarianism
pessimism
sv
unix
rhetoric
critique
worrydream
hardware
performance
trends
multiplicative
roots
impact
comparison
history
iron-age
the-classics
mediterranean
conquest-empire
gibbon
technology
the-world-is-just-atoms
flux-stasis
increase-decrease
graphics
hmm
idk
systems
os
abstraction
intricacy
worse-is-better/the-right-thing
build-packaging
microsoft
osx
apple
reflection
assembly
things
knowledge
detail-architecture
thick-thin
trivia
info-dynamics
caching
frameworks
generalization
systematic-ad-hoc
universalism-particularism
analytical-holistic
structure
tainter
libraries
tradeoffs
prepping
threat-modeling
network-structure
writing
risk
local-glob
...

Here's where I'm supposed to write an appeal to take testing more seriously and put real effort into it. But we all know that's not going to work. It would take 90k LOC of tests to get Julia to be as well tested as a poorly tested prototype (falsely assuming linear complexity in size). That's two person-years of work, not even including time to debug and fix bugs (which probably brings it closer to four of five years). Who's going to do that? No one. Writing tests is like writing documentation. Everyone already knows you should do it. Telling people they should do it adds zero information1.

Given that people aren't going to put any effort into testing, what's the best way to do it?

Property-based testing. Generative testing. Random testing. Concolic Testing (which was done long before the term was coined). Static analysis. Fuzzing. Statistical bug finding. There are lots of options. Some of them are actually the same thing because the terminology we use is inconsistent and buggy. I'm going to arbitrarily pick one to talk about, but they're all worth looking into.

...

There are a lot of great resources out there, but if you're just getting started, I found this description of types of fuzzers to be one of those most helpful (and simplest) things I've read.

John Regehr has a udacity course on software testing. I haven't worked through it yet (Pablo Torres just pointed to it), but given the quality of Dr. Regehr's writing, I expect the course to be good.

For more on my perspective on testing, there's this.

https://hypothesis.works/articles/the-purpose-of-hypothesis/

From the perspective of a user, the purpose of Hypothesis is to make it easier for you to write better tests.

From my perspective as the primary author, that is of course also a purpose of Hypothesis. I write a lot of code, it needs testing, and the idea of trying to do that without Hypothesis has become nearly unthinkable.

But, on a large scale, the true purpose of Hypothesis is to drag the world kicking and screaming into a new and terrifying age of high quality software.

Software is everywhere. We have built a civilization on it, and it’s only getting more prevalent as more services move online and embedded and “internet of things” devices become cheaper and more common.

Software is also terrible. It’s buggy, it’s insecure, and it’s rarely well thought out.

This combination is clearly a recipe for disaster.

The state of software testing is even worse. It’s uncontroversial at this point that you should be testing your code, but it’s a rare codebase whose authors could honestly claim that they feel its testing is sufficient.

Much of the problem here is that it’s too hard to write good tests. Tests take up a vast quantity of development time, but they mostly just laboriously encode exactly the same assumptions and fallacies that the authors had when they wrote the code, so they miss exactly the same bugs that you missed when they wrote the code.

Preventing the Collapse of Civilization [video]: https://news.ycombinator.com/item?id=19945452

- Jonathan Blow

NB: DevGAMM is a game industry conference

- loss of technological knowledge (Antikythera mechanism, aqueducts, etc.)

- hardware driving most gains, not software

- software's actually less robust, often poorly designed and overengineered these days

- *list of bugs he's encountered recently*:

https://youtu.be/pW-SOdj4Kkk?t=1387

- knowledge of trivia comes to count for more than general, deep knowledge

- he does at least acknowledge the value of DRY, code reuse, and abstraction in saving dev time

may 2019 by nhaliday

Rust Creator Graydon Hoare Recounts the History of Compilers - The New Stack

techtariat presentation links commentary summary slides pdf programming pls plt compilers reflection history comparison cost-benefit c(pp) performance lisp functional ocaml-sml haskell formal-methods llvm tradeoffs measurement intricacy troll aphorism software hardware roots impact expert-experience jvm constraint-satisfaction pareto rust gcc

may 2019 by nhaliday

Why is Software Engineering so difficult? - James Miller

may 2019 by nhaliday

basic message: No silver bullet!

most interesting nuggets:

Scale and Complexity

- Windows 7 > 50 million LOC

Expect a staggering number of bugs.

Bugs?

- Well-written C and C++ code contains some 5 to 10 errors per 100 LOC after a clean compile, but before inspection and testing.

- At a 5% rate any 50 MLOC program will start off with some 2.5 million bugs.

Bug removal

- Testing typically exercises only half the code.

Better bug removal?

- There are better ways to do testing that do produce fantastic programs.

- Are we sure about this fact?

* No, it's only an opinion!

* In general Software Engineering has ....

NO FACTS!

So why not do this?

- The costs are unbelievable.

- It’s not unusual for the qualification process to produce a half page of documentation for each line of code.
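A quick back-of-the-envelope check of the arithmetic above (my sketch; the halving uses the "testing typically exercises only half the code" claim):

loc = 50_000_000               # Windows 7 scale: > 50 million LOC
for rate in (0.05, 0.10):      # 5 to 10 errors per 100 LOC
    initial = loc * rate
    reachable = initial * 0.5  # if testing exercises only half the code
    print(f"{rate:.0%}: ~{initial:,.0f} initial bugs, ~{reachable:,.0f} in code tests even reach")
# 5%: ~2,500,000 initial bugs, ~1,250,000 in code tests even reach
# 10%: ~5,000,000 initial bugs, ~2,500,000 in code tests even reach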

pdf
slides
engineering
nitty-gritty
programming
best-practices
roots
comparison
cost-benefit
software
systematic-ad-hoc
structure
error
frontier
debugging
checking
formal-methods
context
detail-architecture
intricacy
big-picture
system-design
correctness
scale
scaling-tech
shipping
money
data
stylized-facts
street-fighting
objektbuch
pro-rata
estimate
pessimism
degrees-of-freedom
volo-avolo
no-go
things
thinking
summary
quality
density

may 2019 by nhaliday

AFL + QuickCheck = ?

may 2019 by nhaliday

Adventures in fuzzing. Also differences between testing culture in software and hardware.
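The obvious way to combine the two (a hand-rolled sketch of mine, not from the post): let the coverage-guided fuzzer own input generation, decode its byte stream into structured data, and assert a QuickCheck-style property so violations register as crashes.

import sys

def decode_ints(data):
    # interpret the fuzzer's raw bytes as a list of small signed integers
    return [b - 128 for b in data]

def program_under_test(xs):
    return sorted(xs)  # stand-in for the real code under test

def check_property(xs):
    ys = program_under_test(xs)
    assert len(ys) == len(xs)                        # no elements lost
    assert all(a <= b for a, b in zip(ys, ys[1:]))   # output is ordered

if __name__ == "__main__":
    # AFL-style entry point: bytes on stdin, assertion failure = finding
    check_property(decode_ints(sys.stdin.buffer.read()))

The coverage feedback is the part QuickCheck-style random generation lacks: the fuzzer mutates bytes toward unexplored paths instead of sampling blindly.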

techtariat
dan-luu
programming
engineering
checking
random
haskell
path-dependence
span-cover
heuristic
libraries
links
tools
devtools
software
hardware
culture
formal-methods
local-global
golang
correctness
may 2019 by nhaliday

Delta debugging - Wikipedia

may 2019 by nhaliday

good overview with examples: https://www.csm.ornl.gov/~sheldon/bucket/Automated-Debugging.pdf

Not as useful for my use cases (mostly contest programming) as QuickCheck. Input is generally pretty structured, and I don't have a long history of code in VCS. And when I do have the latter, git-bisect is probably enough.

good book tho: http://www.whyprogramsfail.com/toc.php

WHY PROGRAMS FAIL: A Guide to Systematic Debugging
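The core algorithm is compact enough to sketch (a simplified rendition of Zeller's ddmin, assuming a deterministic fails predicate; my code, not the book's):

def ddmin(inp, fails):
    # shrink a failing input: repeatedly try single chunks and
    # complements of chunks, keeping whichever still fails
    n = 2  # current granularity: number of chunks
    while len(inp) >= 2:
        chunk = len(inp) // n
        subsets = [inp[i:i + chunk] for i in range(0, len(inp), chunk)]
        reduced = False
        for i, subset in enumerate(subsets):
            complement = [x for j, s in enumerate(subsets) if j != i for x in s]
            if fails(subset):       # one chunk alone reproduces the failure
                inp, n, reduced = subset, 2, True
                break
            if fails(complement):   # dropping one chunk keeps the failure
                inp, n, reduced = complement, max(n - 1, 2), True
                break
        if not reduced:
            if n >= len(inp):       # already at single-element granularity
                break
            n = min(n * 2, len(inp))  # otherwise refine and retry
    return inp

# toy failure: the bug triggers whenever both 3 and 7 are present
print(ddmin(list(range(10)), lambda xs: 3 in xs and 7 in xs))  # -> [3, 7]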

wiki
reference
programming
systems
debugging
c(pp)
python
tools
devtools
links
hmm
formal-methods
divide-and-conquer
vcs
git
search
yak-shaving
pdf
white-paper
multi
examples
stories
books
unit
caltech
recommendations
advanced
correctness

may 2019 by nhaliday

Negative Results in Empirical Soft Eng - Journal Special Issue

techtariat programming engineering pragmatic software tech list links study summary commentary carmack empirical evidence-based shipping null-result replication expert-experience ability-competence metrics correlation degrees-of-freedom devtools formal-methods best-practices 🖥 working-stiff measure estimate

november 2017 by nhaliday

In Computers We Trust? | Quanta Magazine

january 2017 by nhaliday

As math grows ever more complex, will computers reign?

Shalosh B. Ekhad is a computer. Or, rather, it is any of a rotating cast of computers used by the mathematician Doron Zeilberger, from the Dell in his New Jersey office to a supercomputer whose services he occasionally enlists in Austria. The name — Hebrew for “three B one” — refers to the AT&T 3B1, Ekhad’s earliest incarnation.

“The soul is the software,” said Zeilberger, who writes his own code using a popular math programming tool called Maple.

news
org:mag
org:sci
popsci
math
culture
academia
automation
formal-methods
ai
debate
interdisciplinary
rigor
proofs
nibble
org:inst
calculation
bare-hands
heavyweights
contrarianism
computation
correctness
oss
replication
logic
frontier
state-of-art

january 2017 by nhaliday

Notes on concurrency bugs

dan-luu tech programming concurrency debugging data engineering techtariat intricacy structure characterization composition-decomposition distribution empirical correctness tools devtools formal-methods state meta-analysis analysis objektbuch volo-avolo degrees-of-freedom

january 2017 by nhaliday

What's worked in computer science

reflection engineering dan-luu list analysis history programming formal-methods carmack contrarianism pragmatic comparison len:short 🖥 2015 techtariat evidence-based data empirical shipping security distributed concurrency internet web pls plt functional working-stiff best-practices correctness worse-is-better/the-right-thing quotes aphorism critique essay papers ubiquity cost-benefit s:** types scala ocaml-sml haskell dotnet minimalism intricacy static-dynamic protocol-metadata big-picture

october 2016 by nhaliday

Hacker-Proof Code Confirmed | Hacker News

september 2016 by nhaliday

interesting Quanta article w/ bad title

commentary
hn
security
plt
news
org:mag
org:sci
formal-methods
september 2016 by nhaliday

On proof and progress in mathematics

pdf thurston math writing thinking synthesis papers essay unit nibble intuition worrydream communication proofs the-trenches reflection geometry meta:math better-explained stories virtu 🎓 scholar metameta wisdom narrative p:whenever inference cs programming rigor formal-methods meta:research info-dynamics elegance technical-writing heavyweights guessing

august 2016 by nhaliday

They Write the Right Stuff | Fast Company | Business + Innovation

june 2016 by nhaliday

doesn't seem like a very in-depth article from a quick scan

engineering
management
stories
profile
org:biz
space
news
scaling-tech
org:lite
formal-methods
correctness
june 2016 by nhaliday

Static vs. dynamic languages: a literature review

june 2016 by nhaliday

would be a good test case for doing a meta-analysis using RevMan

programming
study
worrydream
plt
critique
data
hmm
comparison
pls
carmack
dan-luu
techtariat
evidence-based
types
engineering
jvm
c(pp)
python
time
measure
expectancy
moments
productivity
pro-rata
input-output
documentation
meta-analysis
summary
ocaml-sml
formal-methods
causation
endogenous-exogenous
econometrics
correlation
best-practices
error
open-closed
dotnet
correctness
static-dynamic
june 2016 by nhaliday

Cyclomatic complexity - Wikipedia, the free encyclopedia

april 2016 by nhaliday

note: equivalent to # of basis paths (dimensionality of cycle space), and linear in size of code

https://en.wikipedia.org/wiki/Basis_path_testing

https://en.wikipedia.org/wiki/Cycle_space
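Both equivalent definitions fit in a few lines (my own toy control-flow graph for one if followed by one while, not from the article):

# cyclomatic complexity two ways, for the CFG of: if (a) {...}; while (b) {...}
edges = [
    ("entry", "if"), ("if", "then"), ("if", "join"), ("then", "join"),
    ("join", "loop"), ("loop", "body"), ("body", "loop"), ("loop", "exit"),
]
nodes = {v for e in edges for v in e}
E, N, P = len(edges), len(nodes), 1   # P = number of connected components
print(E - N + 2 * P)                  # M = 8 - 7 + 2 = 3

decisions = 2                         # the if-test and the while-test
print(decisions + 1)                  # 3 again; also the cycle-space dimension
                                      # once a back edge exit->entry is added:
                                      # 9 - 7 + 1 = 3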

programming
metrics
devtools
engineering
pragmatic
intricacy
structure
graphs
formal-methods
dimensionality
magnitude
multi
linear-algebra
span-cover
checking
debugging
algebra
topology
wormholes

april 2016 by nhaliday

Lean

january 2016 by nhaliday

https://lean-forward.github.io

The goal of the Lean Forward project is to collaborate with number theorists to formally prove theorems about research mathematics and to address the main usability issues hampering the adoption of proof assistants in mathematical circles. The theorems will be selected together with our collaborators to guide the development of formal libraries and verified tools.

mostly happening in the Netherlands

https://formalabstracts.github.io

A Review of the Lean Theorem Prover: https://jiggerwit.wordpress.com/2018/09/18/a-review-of-the-lean-theorem-prover/

- Thomas Hales

seems like Coq might be a better starter if I ever try to get into proof assistants/theorem provers
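for flavor, the kind of artifact these systems check: two trivial machine-checked proofs, in Lean 3 syntax (my own examples, not from the linked posts):

-- nat addition recurses on its second argument, so `n + 0 = n` holds
-- definitionally and `rfl` closes the goal
theorem add_zero' (n : ℕ) : n + 0 = n := rfl

-- implication is transitive: compose the two hypotheses
theorem imp_trans {p q r : Prop} (h₁ : p → q) (h₂ : q → r) : p → r :=
λ hp, h₂ (h₁ hp)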

An Argument for Controlled Natural Languages in Mathematics: https://jiggerwit.wordpress.com/2019/06/20/an-argument-for-controlled-natural-languages-in-mathematics/

By controlled natural language for mathematics (CNL), we mean an artificial language for the communication of mathematics that is (1) designed in a deliberate and explicit way with precise computer-readable syntax and semantics, (2) based on a single natural language (such as Chinese, Spanish, or English), and (3) broadly understood at least in an intuitive way by mathematically literate speakers of the natural language.

The definition of controlled natural language is intended to exclude invented languages such as Esperanto and Logjam that are not based on a single natural language. Programming languages are meant to be excluded, but a case might be made for TeX as the first broadly adopted controlled natural language for mathematics.

Perhaps it is best to start with an example. Here is a beautifully crafted CNL text created by Peter Koepke and Steffen Frerix. It reproduces a theorem and proof in Rudin’s Principles of mathematical analysis almost word for word. Their automated proof system is able to read and verify the proof.

https://github.com/Naproche/Naproche-SAD

research
math
formal-methods
msr
multi
homepage
research-program
skunkworks
math.NT
academia
ux
CAS
mathtariat
expert-experience
cost-benefit
nitty-gritty
review
critique
rant
types
learning
intricacy
functional
performance
c(pp)
ocaml-sml
comparison
ecosystem
DSL
tradeoffs
composition-decomposition
interdisciplinary
europe
germanic
grokkability
nlp
language
heavyweights
inference
rigor
automata-languages
repo
software
tools
syntax
frontier
state-of-art
pls

january 2016 by nhaliday

ImperialViolet - A shallow survey of formal methods for C code

september 2014 by nhaliday

Conclusion

The conclusion is a bit disappointing really: Curve25519 has no side effects and performs no allocation, it's a pure function that should be highly amenable to verification and yet I've been unable to find anything that can get even 20 lines into it. Some of this might be my own stupidity, but I put a fair amount of work into trying to find something that worked.

There seems to be a lot of promise in the area and some pieces work well (SMT solvers are often quite impressive, the Frama-C framework appears to be solid, Isabelle is quite pleasant) but nothing I found worked well together, at least for verifying C code. That makes efforts like SeL4 and Ironsides even more impressive. However, if you're happy to work at a higher level I'm guessing that verifying functional programs is a lot easier going.

formal-methods
programming
functional
techtariat
c(pp)
review
critique
systems
survey
analysis
correctness
pessimism
rigor
static-dynamic
frontier
state-of-art

september 2014 by nhaliday
