nhaliday + correctness   57

Software Testing Anti-patterns | Hacker News
I haven't read this but both the article and commentary/discussion look interesting from a glance

hmm: https://news.ycombinator.com/item?id=16896390
In small companies where there is no time to "waste" on tests, my view is that 80% of the problems can be caught with 20% of the work by writing integration tests that cover large areas of the application. Writing unit tests would be ideal, but time-consuming. For a web project, that would involve testing all pages for HTTP 200 (< 1 hour bash script that will catch most major bugs), automatically testing most interfaces to see if filling data and clicking "save" works. Of course, for very important/dangerous/complex algorithms in the code, unit tests are useful, but generally, that represents a very low fraction of a web application's code.
hn  commentary  techtariat  discussion  programming  engineering  methodology  best-practices  checklists  thinking  correctness  api  interface-compatibility  jargon  list  metabuch  objektbuch  workflow  documentation  debugging  span-cover  checking  metrics  abstraction  within-without  characterization  error  move-fast-(and-break-things)  minimum-viable  efficiency  multi  poast  pareto  coarse-fine 
8 weeks ago by nhaliday
The Future of Mathematics? [video] | Hacker News
https://news.ycombinator.com/item?id=20909404
Kevin Buzzard (the Lean guy)

- general reflection on proof asssistants/theorem provers
- Kevin Hale's formal abstracts project, etc
- thinks of available theorem provers, Lean is "[the only one currently available that may be capable of formalizing all of mathematics eventually]" (goes into more detail right at the end, eg, quotient types)
hn  commentary  discussion  video  talks  presentation  math  formal-methods  expert-experience  msr  frontier  state-of-art  proofs  rigor  education  higher-ed  optimism  prediction  lens  search  meta:research  speculation  exocortex  skunkworks  automation  research  math.NT  big-surf  software  parsimony  cost-benefit  intricacy  correctness  programming  pls  python  functional  haskell  heavyweights  research-program  review  reflection  multi  pdf  slides  oly  experiment  span-cover  git  vcs  teaching  impetus  academia  composition-decomposition  coupling-cohesion  database  trust  types  plt  lifts-projections  induction  critique  beauty  truth  elegance  aesthetics 
8 weeks ago by nhaliday
Linus's Law - Wikipedia
Linus's Law is a claim about software development, named in honor of Linus Torvalds and formulated by Eric S. Raymond in his essay and book The Cathedral and the Bazaar (1999).[1][2] The law states that "given enough eyeballs, all bugs are shallow";

--

In Facts and Fallacies about Software Engineering, Robert Glass refers to the law as a "mantra" of the open source movement, but calls it a fallacy due to the lack of supporting evidence and because research has indicated that the rate at which additional bugs are uncovered does not scale linearly with the number of reviewers; rather, there is a small maximum number of useful reviewers, between two and four, and additional reviewers above this number uncover bugs at a much lower rate.[4] While closed-source practitioners also promote stringent, independent code analysis during a software project's development, they focus on in-depth review by a few and not primarily the number of "eyeballs".[5][6]

Although detection of even deliberately inserted flaws[7][8] can be attributed to Raymond's claim, the persistence of the Heartbleed security bug in a critical piece of code for two years has been considered as a refutation of Raymond's dictum.[9][10][11][12] Larry Seltzer suspects that the availability of source code may cause some developers and researchers to perform less extensive tests than they would with closed source software, making it easier for bugs to remain.[12] In 2015, the Linux Foundation's executive director Jim Zemlin argued that the complexity of modern software has increased to such levels that specific resource allocation is desirable to improve its security. Regarding some of 2014's largest global open source software vulnerabilities, he says, "In these cases, the eyeballs weren't really looking".[11] Large scale experiments or peer-reviewed surveys to test how well the mantra holds in practice have not been performed.

Given enough eyeballs, all bugs are shallow? Revisiting Eric Raymond with bug bounty programs: https://academic.oup.com/cybersecurity/article/3/2/81/4524054

https://hbfs.wordpress.com/2009/03/31/how-many-eyeballs-to-make-a-bug-shallow/
wiki  reference  aphorism  ideas  stylized-facts  programming  engineering  linux  worse-is-better/the-right-thing  correctness  debugging  checking  best-practices  security  error  scale  ubiquity  collaboration  oss  realness  empirical  evidence-based  multi  study  info-econ  economics  intricacy  plots  manifolds  techtariat  cracker-prog  os  systems  magnitude  quantitative-qualitative  number  threat-modeling 
8 weeks ago by nhaliday
A Formal Verification of Rust's Binary Search Implementation
Part of the reason for this is that it’s quite complicated to apply mathematical tools to something unmathematical like a functionally unpure language (which, unfortunately, most programs tend to be written in). In mathematics, you don’t expect a variable to suddenly change its value, and it only gets more complicated when you have pointers to those dang things:

“Dealing with aliasing is one of the key challenges for the verification of imperative programs. For instance, aliases make it difficult to determine which abstractions are potentially affected by a heap update and to determine which locks need to be acquired to avoid data races.” 1

While there are whole logics focused on trying to tackle these problems, a master’s thesis wouldn’t be nearly enough time to model a formal Rust semantics on top of these, so I opted for a more straightforward solution: Simply make Rust a purely functional language!

Electrolysis: Simple Verification of Rust Programs via Functional Purification
If you know a bit about Rust, you may have noticed something about that quote in the previous section: There actually are no data races in (safe) Rust, precisely because there is no mutable aliasing. Either all references to some datum are immutable, or there is a single mutable reference. This means that mutability in Rust is much more localized than in most other imperative languages, and that it is sound to replace a destructive update like

p.x += 1
with a functional one – we know there’s no one else around observing p:

let p = Point { x = p.x + 1, ..p };
techtariat  plt  programming  formal-methods  rust  arrows  reduction  divide-and-conquer  correctness  project  state  functional  concurrency  direct-indirect  pls  examples  simplification-normalization  compilers 
august 2019 by nhaliday
testing - Is there a reason that tests aren't written inline with the code that they test? - Software Engineering Stack Exchange
The only advantage I can think of for inline tests would be reducing the number of files to be written. With modern IDEs this really isn't that big a deal.

There are, however, a number of obvious drawbacks to inline testing:
- It violates separation of concerns. This may be debatable, but to me testing functionality is a different responsibility than implementing it.
- You'd either have to introduce new language features to distinguish between tests/implementation, or you'd risk blurring the line between the two.
- Larger source files are harder to work with: harder to read, harder to understand, you're more likely to have to deal with source control conflicts.
- I think it would make it harder to put your "tester" hat on, so to speak. If you're looking at the implementation details, you'll be more tempted to skip implementing certain tests.
q-n-a  stackex  programming  engineering  best-practices  debate  correctness  checking  code-organizing  composition-decomposition  coupling-cohesion  psychology  cog-psych  attention  thinking  neurons  contiguity-proximity  grokkability  grokkability-clarity 
august 2019 by nhaliday
OCaml For the Masses | November 2011 | Communications of the ACM
Straight out of the box, OCaml is pretty good at catching bugs, but it can do even more if you design your types carefully. Consider as an example the following types for representing the state of a network connection as illustrated in Figure 4.

that one excellent example of using algebraic data types
techtariat  rhetoric  programming  pls  engineering  pragmatic  carmack  quotes  aphorism  functional  ocaml-sml  types  formal-methods  correctness  finance  tip-of-tongue  examples  characterization  invariance  networking 
july 2019 by nhaliday
The Existential Risk of Math Errors - Gwern.net
How big is this upper bound? Mathematicians have often made errors in proofs. But it’s rarer for ideas to be accepted for a long time and then rejected. But we can divide errors into 2 basic cases corresponding to type I and type II errors:

1. Mistakes where the theorem is still true, but the proof was incorrect (type I)
2. Mistakes where the theorem was false, and the proof was also necessarily incorrect (type II)

Before someone comes up with a final answer, a mathematician may have many levels of intuition in formulating & working on the problem, but we’ll consider the final end-product where the mathematician feels satisfied that he has solved it. Case 1 is perhaps the most common case, with innumerable examples; this is sometimes due to mistakes in the proof that anyone would accept is a mistake, but many of these cases are due to changing standards of proof. For example, when David Hilbert discovered errors in Euclid’s proofs which no one noticed before, the theorems were still true, and the gaps more due to Hilbert being a modern mathematician thinking in terms of formal systems (which of course Euclid did not think in). (David Hilbert himself turns out to be a useful example of the other kind of error: his famous list of 23 problems was accompanied by definite opinions on the outcome of each problem and sometimes timings, several of which were wrong or questionable5.) Similarly, early calculus used ‘infinitesimals’ which were sometimes treated as being 0 and sometimes treated as an indefinitely small non-zero number; this was incoherent and strictly speaking, practically all of the calculus results were wrong because they relied on an incoherent concept - but of course the results were some of the greatest mathematical work ever conducted6 and when later mathematicians put calculus on a more rigorous footing, they immediately re-derived those results (sometimes with important qualifications), and doubtless as modern math evolves other fields have sometimes needed to go back and clean up the foundations and will in the future.7

...

Isaac Newton, incidentally, gave two proofs of the same solution to a problem in probability, one via enumeration and the other more abstract; the enumeration was correct, but the other proof totally wrong and this was not noticed for a long time, leading Stigler to remark:

...

TYPE I > TYPE II?
“Lefschetz was a purely intuitive mathematician. It was said of him that he had never given a completely correct proof, but had never made a wrong guess either.”
- Gian-Carlo Rota13

Case 2 is disturbing, since it is a case in which we wind up with false beliefs and also false beliefs about our beliefs (we no longer know that we don’t know). Case 2 could lead to extinction.

...

Except, errors do not seem to be evenly & randomly distributed between case 1 and case 2. There seem to be far more case 1s than case 2s, as already mentioned in the early calculus example: far more than 50% of the early calculus results were correct when checked more rigorously. Richard Hamming attributes to Ralph Boas a comment that while editing Mathematical Reviews that “of the new results in the papers reviewed most are true but the corresponding proofs are perhaps half the time plain wrong”.

...

Gian-Carlo Rota gives us an example with Hilbert:

...

Olga labored for three years; it turned out that all mistakes could be corrected without any major changes in the statement of the theorems. There was one exception, a paper Hilbert wrote in his old age, which could not be fixed; it was a purported proof of the continuum hypothesis, you will find it in a volume of the Mathematische Annalen of the early thirties.

...

Leslie Lamport advocates for machine-checked proofs and a more rigorous style of proofs similar to natural deduction, noting a mathematician acquaintance guesses at a broad error rate of 1/329 and that he routinely found mistakes in his own proofs and, worse, believed false conjectures30.

[more on these "structured proofs":
https://academia.stackexchange.com/questions/52435/does-anyone-actually-publish-structured-proofs
https://mathoverflow.net/questions/35727/community-experiences-writing-lamports-structured-proofs
]

We can probably add software to that list: early software engineering work found that, dismayingly, bug rates seem to be simply a function of lines of code, and one would expect diseconomies of scale. So one would expect that in going from the ~4,000 lines of code of the Microsoft DOS operating system kernel to the ~50,000,000 lines of code in Windows Server 2003 (with full systems of applications and libraries being even larger: the comprehensive Debian repository in 2007 contained ~323,551,126 lines of code) that the number of active bugs at any time would be… fairly large. Mathematical software is hopefully better, but practitioners still run into issues (eg Durán et al 2014, Fonseca et al 2017) and I don’t know of any research pinning down how buggy key mathematical systems like Mathematica are or how much published mathematics may be erroneous due to bugs. This general problem led to predictions of doom and spurred much research into automated proof-checking, static analysis, and functional languages31.

[related:
https://mathoverflow.net/questions/11517/computer-algebra-errors
I don't know any interesting bugs in symbolic algebra packages but I know a true, enlightening and entertaining story about something that looked like a bug but wasn't.

Define sinc𝑥=(sin𝑥)/𝑥.

Someone found the following result in an algebra package: ∫∞0𝑑𝑥sinc𝑥=𝜋/2
They then found the following results:

...

So of course when they got:

∫∞0𝑑𝑥sinc𝑥sinc(𝑥/3)sinc(𝑥/5)⋯sinc(𝑥/15)=(467807924713440738696537864469/935615849440640907310521750000)𝜋

hmm:
Which means that nobody knows Fourier analysis nowdays. Very sad and discouraging story... – fedja Jan 29 '10 at 18:47

--

Because the most popular systems are all commercial, they tend to guard their bug database rather closely -- making them public would seriously cut their sales. For example, for the open source project Sage (which is quite young), you can get a list of all the known bugs from this page. 1582 known issues on Feb.16th 2010 (which includes feature requests, problems with documentation, etc).

That is an order of magnitude less than the commercial systems. And it's not because it is better, it is because it is younger and smaller. It might be better, but until SAGE does a lot of analysis (about 40% of CAS bugs are there) and a fancy user interface (another 40%), it is too hard to compare.

I once ran a graduate course whose core topic was studying the fundamental disconnect between the algebraic nature of CAS and the analytic nature of the what it is mostly used for. There are issues of logic -- CASes work more or less in an intensional logic, while most of analysis is stated in a purely extensional fashion. There is no well-defined 'denotational semantics' for expressions-as-functions, which strongly contributes to the deeper bugs in CASes.]

...

Should such widely-believed conjectures as P≠NP or the Riemann hypothesis turn out be false, then because they are assumed by so many existing proofs, a far larger math holocaust would ensue38 - and our previous estimates of error rates will turn out to have been substantial underestimates. But it may be a cloud with a silver lining, if it doesn’t come at a time of danger.

https://mathoverflow.net/questions/338607/why-doesnt-mathematics-collapse-down-even-though-humans-quite-often-make-mista

more on formal methods in programming:
https://www.quantamagazine.org/formal-verification-creates-hacker-proof-code-20160920/
https://intelligence.org/2014/03/02/bob-constable/

https://softwareengineering.stackexchange.com/questions/375342/what-are-the-barriers-that-prevent-widespread-adoption-of-formal-methods
Update: measured effort
In the October 2018 issue of Communications of the ACM there is an interesting article about Formally verified software in the real world with some estimates of the effort.

Interestingly (based on OS development for military equipment), it seems that producing formally proved software requires 3.3 times more effort than with traditional engineering techniques. So it's really costly.

On the other hand, it requires 2.3 times less effort to get high security software this way than with traditionally engineered software if you add the effort to make such software certified at a high security level (EAL 7). So if you have high reliability or security requirements there is definitively a business case for going formal.

WHY DON'T PEOPLE USE FORMAL METHODS?: https://www.hillelwayne.com/post/why-dont-people-use-formal-methods/
You can see examples of how all of these look at Let’s Prove Leftpad. HOL4 and Isabelle are good examples of “independent theorem” specs, SPARK and Dafny have “embedded assertion” specs, and Coq and Agda have “dependent type” specs.6

If you squint a bit it looks like these three forms of code spec map to the three main domains of automated correctness checking: tests, contracts, and types. This is not a coincidence. Correctness is a spectrum, and formal verification is one extreme of that spectrum. As we reduce the rigour (and effort) of our verification we get simpler and narrower checks, whether that means limiting the explored state space, using weaker types, or pushing verification to the runtime. Any means of total specification then becomes a means of partial specification, and vice versa: many consider Cleanroom a formal verification technique, which primarily works by pushing code review far beyond what’s humanly possible.

...

The question, then: “is 90/95/99% correct significantly cheaper than 100% correct?” The answer is very yes. We all are comfortable saying that a codebase we’ve well-tested and well-typed is mostly correct modulo a few fixes in prod, and we’re even writing more than four lines of code a day. In fact, the vast… [more]
ratty  gwern  analysis  essay  realness  truth  correctness  reason  philosophy  math  proofs  formal-methods  cs  programming  engineering  worse-is-better/the-right-thing  intuition  giants  old-anglo  error  street-fighting  heuristic  zooming  risk  threat-modeling  software  lens  logic  inference  physics  differential  geometry  estimate  distribution  robust  speculation  nonlinearity  cost-benefit  convexity-curvature  measure  scale  trivia  cocktail  history  early-modern  europe  math.CA  rigor  news  org:mag  org:sci  miri-cfar  pdf  thesis  comparison  examples  org:junk  q-n-a  stackex  pragmatic  tradeoffs  cracker-prog  techtariat  invariance  DSL  chart  ecosystem  grokkability  heavyweights  CAS  static-dynamic  lower-bounds  complexity  tcs  open-problems  big-surf  ideas  certificates-recognition  proof-systems  PCP  mediterranean  SDP  meta:prediction  epistemic  questions  guessing  distributed  overflow  nibble  soft-question  track-record  big-list  hmm  frontier  state-of-art  move-fast-(and-break-things)  grokkability-clarity  technical-writing  trust 
july 2019 by nhaliday
Cleaner, more elegant, and harder to recognize | The Old New Thing
Really easy
Writing bad error-code-based code
Writing bad exception-based code

Hard
Writing good error-code-based code

Really hard
Writing good exception-based code

--

Really easy
Recognizing that error-code-based code is badly-written
Recognizing the difference between bad error-code-based code and
not-bad error-code-based code.

Hard
Recognizing that error-code-base code is not badly-written

Really hard
Recognizing that exception-based code is badly-written
Recognizing that exception-based code is not badly-written
Recognizing the difference between bad exception-based code
and not-bad exception-based code

https://ra3s.com/wordpress/dysfunctional-programming/2009/07/15/return-code-vs-exception-handling/
https://nedbatchelder.com/blog/200501/more_exception_handling_debate.html
techtariat  org:com  microsoft  working-stiff  pragmatic  carmack  error  error-handling  programming  rhetoric  debate  critique  pls  search  structure  cost-benefit  comparison  summary  intricacy  certificates-recognition  commentary  multi  contrarianism  correctness  quality  code-dive  cracker-prog 
july 2019 by nhaliday
C++ Core Guidelines
This document is a set of guidelines for using C++ well. The aim of this document is to help people to use modern C++ effectively. By “modern C++” we mean effective use of the ISO C++ standard (currently C++17, but almost all of our recommendations also apply to C++14 and C++11). In other words, what would you like your code to look like in 5 years’ time, given that you can start now? In 10 years’ time?

https://isocpp.github.io/CppCoreGuidelines/
“Within C++ is a smaller, simpler, safer language struggling to get out.” – Bjarne Stroustrup

...

The guidelines are focused on relatively higher-level issues, such as interfaces, resource management, memory management, and concurrency. Such rules affect application architecture and library design. Following the rules will lead to code that is statically type safe, has no resource leaks, and catches many more programming logic errors than is common in code today. And it will run fast - you can afford to do things right.

We are less concerned with low-level issues, such as naming conventions and indentation style. However, no topic that can help a programmer is out of bounds.

Our initial set of rules emphasize safety (of various forms) and simplicity. They may very well be too strict. We expect to have to introduce more exceptions to better accommodate real-world needs. We also need more rules.

...

The rules are designed to be supported by an analysis tool. Violations of rules will be flagged with references (or links) to the relevant rule. We do not expect you to memorize all the rules before trying to write code.

contrary:
https://aras-p.info/blog/2018/12/28/Modern-C-Lamentations/
This will be a long wall of text, and kinda random! My main points are:
1. C++ compile times are important,
2. Non-optimized build performance is important,
3. Cognitive load is important. I don’t expand much on this here, but if a programming language or a library makes me feel stupid, then I’m less likely to use it or like it. C++ does that a lot :)
programming  engineering  pls  best-practices  systems  c(pp)  guide  metabuch  objektbuch  reference  cheatsheet  elegance  frontier  libraries  intricacy  advanced  advice  recommendations  big-picture  novelty  lens  philosophy  state  error  types  concurrency  memory-management  performance  abstraction  plt  compilers  expert-experience  multi  checking  devtools  flux-stasis  safety  system-design  techtariat  time  measure  dotnet  comparison  examples  build-packaging  thinking  worse-is-better/the-right-thing  cost-benefit  tradeoffs  essay  commentary  oop  correctness  computer-memory  error-handling  resources-effects  latency-throughput 
june 2019 by nhaliday
Hardware is unforgiving
Today, anyone with a CS 101 background can take Geoffrey Hinton's course on neural networks and deep learning, and start applying state of the art machine learning techniques in production within a couple months. In software land, you can fix minor bugs in real time. If it takes a whole day to run your regression test suite, you consider yourself lucky because it means you're in one of the few environments that takes testing seriously. If the architecture is fundamentally flawed, you pull out your copy of Feathers' “Working Effectively with Legacy Code” and you apply minor fixes until you're done.

This isn't to say that software isn't hard, it's just a different kind of hard: the sort of hard that can be attacked with genius and perseverance, even without experience. But, if you want to build a ship, and you "only" have a decade of experience with carpentry, milling, metalworking, etc., well, good luck. You're going to need it. With a large ship, “minor” fixes can take days or weeks, and a fundamental flaw means that your ship sinks and you've lost half a year of work and tens of millions of dollars. By the time you get to something with the complexity of a modern high-performance microprocessor, a minor bug discovered in production costs three months and five million dollars. A fundamental flaw in the architecture will cost you five years and hundreds of millions of dollars2.

Physical mistakes are costly. There's no undo and editing isn't simply a matter of pressing some keys; changes consume real, physical resources. You need enough wisdom and experience to avoid common mistakes entirely – especially the ones that can't be fixed.
techtariat  comparison  software  hardware  programming  engineering  nitty-gritty  realness  roots  explanans  startups  tech  sv  the-world-is-just-atoms  examples  stories  economics  heavy-industry  hard-tech  cs  IEEE  oceans  trade  korea  asia  recruiting  britain  anglo  expert-experience  growth-econ  world  developing-world  books  recommendations  intricacy  dan-luu  age-generation  system-design  correctness  metal-to-virtual  psycho-atoms  move-fast-(and-break-things)  kumbaya-kult 
june 2019 by nhaliday
Frama-C
Frama-C is organized with a plug-in architecture (comparable to that of the Gimp or Eclipse). A common kernel centralizes information and conducts the analysis. Plug-ins interact with each other through interfaces defined by the kernel. This makes for robustness in the development of Frama-C while allowing a wide functionality spectrum.

...

Three heavyweight plug-ins that are used by the other plug-ins:

- Eva (Evolved Value analysis)
This plug-in computes variation domains for variables. It is quite automatic, although the user may guide the analysis in places. It handles a wide spectrum of C constructs. This plug-in uses abstract interpretation techniques.
- Jessie and Wp, two deductive verification plug-ins
These plug-ins are based on weakest precondition computation techniques. They allow to prove that C functions satisfy their specification as expressed in ACSL. These proofs are modular: the specifications of the called functions are used to establish the proof without looking at their code.

For browsing unfamiliar code:
- Impact analysis
This plug-in highlights the locations in the source code that are impacted by a modification.
- Scope & Data-flow browsing
This plug-in allows the user to navigate the dataflow of the program, from definition to use or from use to definition.
- Variable occurrence browsing
Also provided as a simple example for new plug-in development, this plug-in allows the user to reach the statements where a given variable is used.
- Metrics calculation
This plug-in allows the user to compute various metrics from the source code.

For code transformation:
- Semantic constant folding
This plug-in makes use of the results of the evolved value analysis plug-in to replace, in the source code, the constant expressions by their values. Because it relies on EVA, it is able to do more of these simplifications than a syntactic analysis would.
- Slicing
This plug-in slices the code according to a user-provided criterion: it creates a copy of the program, but keeps only those parts which are necessary with respect to the given criterion.
- Spare code: remove "spare code", code that does not contribute to the final results of the program.
- E-ACSL: translate annotations into C code for runtime assertion checking.
For verifying functional specifications:

- Aoraï: verify specifications expressed as LTL (Linear Temporal Logic) formulas
Other functionalities documented together with the EVA plug-in can be considered as verifying low-level functional specifications (inputs, outputs, dependencies,…)
For test-case generation:

- PathCrawler automatically finds test-case inputs to ensure coverage of a C function. It can be used for structural unit testing, as a complement to static analysis or to study the feasible execution paths of the function.
For concurrent programs:

- Mthread
This plug-in automatically analyzes concurrent C programs, using the EVA plug-in, taking into account all possible thread interactions. At the end of its execution, the concurrent behavior of each thread is over-approximated, resulting in precise information about shared variables, which mutex protects a part of the code, etc.
Front-end for other languages

- Frama-Clang
This plug-in provides a C++ front-end to Frama-C, based on the clang compiler. It transforms C++ code into a Frama-C AST, which can then be analyzed by the plug-ins above. Note however that it is very experimental and only supports a subset of C++11
tools  devtools  formal-methods  programming  software  c(pp)  systems  memory-management  ocaml-sml  debugging  checking  rigor  oss  code-dive  graphs  state  metrics  llvm  gallic  cool  worrydream  impact  flux-stasis  correctness  computer-memory  structure  static-dynamic 
may 2019 by nhaliday
One week of bugs
If I had to guess, I'd say I probably work around hundreds of bugs in an average week, and thousands in a bad week. It's not unusual for me to run into a hundred new bugs in a single week. But I often get skepticism when I mention that I run into multiple new (to me) bugs per day, and that this is inevitable if we don't change how we write tests. Well, here's a log of one week of bugs, limited to bugs that were new to me that week. After a brief description of the bugs, I'll talk about what we can do to improve the situation. The obvious answer to spend more effort on testing, but everyone already knows we should do that and no one does it. That doesn't mean it's hopeless, though.

...

Here's where I'm supposed to write an appeal to take testing more seriously and put real effort into it. But we all know that's not going to work. It would take 90k LOC of tests to get Julia to be as well tested as a poorly tested prototype (falsely assuming linear complexity in size). That's two person-years of work, not even including time to debug and fix bugs (which probably brings it closer to four of five years). Who's going to do that? No one. Writing tests is like writing documentation. Everyone already knows you should do it. Telling people they should do it adds zero information1.

Given that people aren't going to put any effort into testing, what's the best way to do it?

Property-based testing. Generative testing. Random testing. Concolic Testing (which was done long before the term was coined). Static analysis. Fuzzing. Statistical bug finding. There are lots of options. Some of them are actually the same thing because the terminology we use is inconsistent and buggy. I'm going to arbitrarily pick one to talk about, but they're all worth looking into.

...

There are a lot of great resources out there, but if you're just getting started, I found this description of types of fuzzers to be one of those most helpful (and simplest) things I've read.

John Regehr has a udacity course on software testing. I haven't worked through it yet (Pablo Torres just pointed to it), but given the quality of Dr. Regehr's writing, I expect the course to be good.

For more on my perspective on testing, there's this.

Everything's broken and nobody's upset: https://www.hanselman.com/blog/EverythingsBrokenAndNobodysUpset.aspx
https://news.ycombinator.com/item?id=4531549

https://hypothesis.works/articles/the-purpose-of-hypothesis/
From the perspective of a user, the purpose of Hypothesis is to make it easier for you to write better tests.

From my perspective as the primary author, that is of course also a purpose of Hypothesis. I write a lot of code, it needs testing, and the idea of trying to do that without Hypothesis has become nearly unthinkable.

But, on a large scale, the true purpose of Hypothesis is to drag the world kicking and screaming into a new and terrifying age of high quality software.

Software is everywhere. We have built a civilization on it, and it’s only getting more prevalent as more services move online and embedded and “internet of things” devices become cheaper and more common.

Software is also terrible. It’s buggy, it’s insecure, and it’s rarely well thought out.

This combination is clearly a recipe for disaster.

The state of software testing is even worse. It’s uncontroversial at this point that you should be testing your code, but it’s a rare codebase whose authors could honestly claim that they feel its testing is sufficient.

Much of the problem here is that it’s too hard to write good tests. Tests take up a vast quantity of development time, but they mostly just laboriously encode exactly the same assumptions and fallacies that the authors had when they wrote the code, so they miss exactly the same bugs that you missed when they wrote the code.

Preventing the Collapse of Civilization [video]: https://news.ycombinator.com/item?id=19945452
- Jonathan Blow

NB: DevGAMM is a game industry conference

- loss of technological knowledge (Antikythera mechanism, aqueducts, etc.)
- hardware driving most gains, not software
- software's actually less robust, often poorly designed and overengineered these days
- *list of bugs he's encountered recently*:
https://youtu.be/pW-SOdj4Kkk?t=1387
- knowledge of trivia becomes more than general, deep knowledge
- does at least acknowledge value of DRY, reusing code, abstraction saving dev time
techtariat  dan-luu  tech  software  error  list  debugging  linux  github  robust  checking  oss  troll  lol  aphorism  webapp  email  google  facebook  games  julia  pls  compilers  communication  mooc  browser  rust  programming  engineering  random  jargon  formal-methods  expert-experience  prof  c(pp)  course  correctness  hn  commentary  video  presentation  carmack  pragmatic  contrarianism  pessimism  sv  unix  rhetoric  critique  worrydream  hardware  performance  trends  multiplicative  roots  impact  comparison  history  iron-age  the-classics  mediterranean  conquest-empire  gibbon  technology  the-world-is-just-atoms  flux-stasis  increase-decrease  graphics  hmm  idk  systems  os  abstraction  intricacy  worse-is-better/the-right-thing  build-packaging  microsoft  osx  apple  reflection  assembly  things  knowledge  detail-architecture  thick-thin  trivia  info-dynamics  caching  frameworks  generalization  systematic-ad-hoc  universalism-particularism  analytical-holistic  structure  tainter  libraries  tradeoffs  prepping  threat-modeling  network-structure  writing  risk  local-glob 
may 2019 by nhaliday
Why is Software Engineering so difficult? - James Miller
basic message: No silver bullet!

most interesting nuggets:
Scale and Complexity
- Windows 7 > 50 million LOC
Expect a staggering number of bugs.

Bugs?
- Well-written C and C++ code contains some 5 to 10 errors per 100 LOC after a clean compile, but before inspection and testing.
- At a 5% rate any 50 MLOC program will start off with some 2.5 million bugs.

Bug removal
- Testing typically exercises only half the code.

Better bug removal?
- There are better ways to do testing that do produce fantastic programs.”
- Are we sure about this fact?
* No, its only an opinion!
* In general Software Engineering has ....
NO FACTS!

So why not do this?
- The costs are unbelievable.
- It’s not unusual for the qualification process to produce a half page of documentation for each line of code.
pdf  slides  engineering  nitty-gritty  programming  best-practices  roots  comparison  cost-benefit  software  systematic-ad-hoc  structure  error  frontier  debugging  checking  formal-methods  context  detail-architecture  intricacy  big-picture  system-design  correctness  scale  scaling-tech  shipping  money  data  stylized-facts  street-fighting  objektbuch  pro-rata  estimate  pessimism  degrees-of-freedom  volo-avolo  no-go  things  thinking  summary  quality  density  methodology 
may 2019 by nhaliday
AFL + QuickCheck = ?
Adventures in fuzzing. Also differences between testing culture in software and hardware.
techtariat  dan-luu  programming  engineering  checking  random  haskell  path-dependence  span-cover  heuristic  libraries  links  tools  devtools  software  hardware  culture  formal-methods  local-global  golang  correctness  methodology 
may 2019 by nhaliday
Teach debugging
A friend of mine and I couldn't understand why some people were having so much trouble; the material seemed like common sense. The Feynman Method was the only tool we needed.

1. Write down the problem
2. Think real hard
3. Write down the solution

The Feynman Method failed us on the last project: the design of a divider, a real-world-scale project an order of magnitude more complex than anything we'd been asked to tackle before. On the day he assigned the project, the professor exhorted us to begin early. Over the next few weeks, we heard rumors that some of our classmates worked day and night without making progress.

...

And then, just after midnight, a number of our newfound buddies from dinner reported successes. Half of those who started from scratch had working designs. Others were despondent, because their design was still broken in some subtle, non-obvious way. As I talked with one of those students, I began poring over his design. And after a few minutes, I realized that the Feynman method wasn't the only way forward: it should be possible to systematically apply a mechanical technique repeatedly to find the source of our problems. Beneath all the abstractions, our projects consisted purely of NAND gates (woe to those who dug around our toolbox enough to uncover dynamic logic), which outputs a 0 only when both inputs are 1. If the correct output is 0, both inputs should be 1. The input that isn't is in error, an error that is, itself, the output of a NAND gate where at least one input is 0 when it should be 1. We applied this method recursively, finding the source of all the problems in both our designs in under half an hour.

How To Debug Any Program: https://www.blinddata.com/blog/how-to-debug-any-program-9
May 8th 2019 by Saketh Are

Start by Questioning Everything

...

When a program is behaving unexpectedly, our attention tends to be drawn first to the most complex portions of the code. However, mistakes can come in all forms. I've personally been guilty of rushing to debug sophisticated portions of my code when the real bug was that I forgot to read in the input file. In the following section, we'll discuss how to reliably focus our attention on the portions of the program that need correction.

Then Question as Little as Possible

Suppose that we have a program and some input on which its behavior doesn’t match our expectations. The goal of debugging is to narrow our focus to as small a section of the program as possible. Once our area of interest is small enough, the value of the incorrect output that is being produced will typically tell us exactly what the bug is.

In order to catch the point at which our program diverges from expected behavior, we must inspect the intermediate state of the program. Suppose that we select some point during execution of the program and print out all values in memory. We can inspect the results manually and decide whether they match our expectations. If they don't, we know for a fact that we can focus on the first half of the program. It either contains a bug, or our expectations of what it should produce were misguided. If the intermediate state does match our expectations, we can focus on the second half of the program. It either contains a bug, or our understanding of what input it expects was incorrect.

Question Things Efficiently

For practical purposes, inspecting intermediate state usually doesn't involve a complete memory dump. We'll typically print a small number of variables and check whether they have the properties we expect of them. Verifying the behavior of a section of code involves:

1. Before it runs, inspecting all values in memory that may influence its behavior.
2. Reasoning about the expected behavior of the code.
3. After it runs, inspecting all values in memory that may be modified by the code.

Reasoning about expected behavior is typically the easiest step to perform even in the case of highly complex programs. Practically speaking, it's time-consuming and mentally strenuous to write debug output into your program and to read and decipher the resulting values. It is therefore advantageous to structure your code into functions and sections that pass a relatively small amount of information between themselves, minimizing the number of values you need to inspect.

...

Finding the Right Question to Ask

We’ve assumed so far that we have available a test case on which our program behaves unexpectedly. Sometimes, getting to that point can be half the battle. There are a few different approaches to finding a test case on which our program fails. It is reasonable to attempt them in the following order:

1. Verify correctness on the sample inputs.
2. Test additional small cases generated by hand.
3. Adversarially construct corner cases by hand.
4. Re-read the problem to verify understanding of input constraints.
5. Design large cases by hand and write a program to construct them.
6. Write a generator to construct large random cases and a brute force oracle to verify outputs.
techtariat  dan-luu  engineering  programming  debugging  IEEE  reflection  stories  education  higher-ed  checklists  iteration-recursion  divide-and-conquer  thinking  ground-up  nitty-gritty  giants  feynman  error  input-output  structure  composition-decomposition  abstraction  systematic-ad-hoc  reduction  teaching  state  correctness  multi  oly  oly-programming  metabuch  neurons  problem-solving  wire-guided  marginal  strategy  tactics  methodology  simplification-normalization 
may 2019 by nhaliday
Delta debugging - Wikipedia
good overview of with examples: https://www.csm.ornl.gov/~sheldon/bucket/Automated-Debugging.pdf

Not as useful for my usecases (mostly contest programming) as QuickCheck. Input is generally pretty structured and I don't have a long history of code in VCS. And when I do have the latter git-bisect is probably enough.

good book tho: http://www.whyprogramsfail.com/toc.php
WHY PROGRAMS FAIL: A Guide to Systematic Debugging\
wiki  reference  programming  systems  debugging  c(pp)  python  tools  devtools  links  hmm  formal-methods  divide-and-conquer  vcs  git  search  yak-shaving  pdf  white-paper  multi  examples  stories  books  unit  caltech  recommendations  advanced  correctness 
may 2019 by nhaliday
quality - Is the average number of bugs per loc the same for different programming languages? - Software Engineering Stack Exchange
Contrary to intuition, the number of errors per 1000 lines of does seem to be relatively constant, reguardless of the specific language involved. Steve McConnell, author of Code Complete and Software Estimation: Demystifying the Black Art goes over this area in some detail.

I don't have my copies readily to hand - they're sitting on my bookshelf at work - but a quick Google found a relevant quote:

Industry Average: "about 15 - 50 errors per 1000 lines of delivered code."
(Steve) further says this is usually representative of code that has some level of structured programming behind it, but probably includes a mix of coding techniques.

Quoted from Code Complete, found here: http://mayerdan.com/ruby/2012/11/11/bugs-per-line-of-code-ratio/

If memory serves correctly, Steve goes into a thorough discussion of this, showing that the figures are constant across languages (C, C++, Java, Assembly and so on) and despite difficulties (such as defining what "line of code" means).

Most importantly he has lots of citations for his sources - he's not offering unsubstantiated opinions, but has the references to back them up.

[ed.: I think this is delivered code? So after testing, debugging, etc. I'm more interested in the metric for the moment after you've gotten something to compile.

edit: cf https://pinboard.in/u:nhaliday/b:0a6eb68166e6]
q-n-a  stackex  programming  engineering  nitty-gritty  error  flux-stasis  books  recommendations  software  checking  debugging  pro-rata  pls  comparison  parsimony  measure  data  objektbuch  speculation  accuracy  density  correctness  estimate  street-fighting  multi  quality  stylized-facts  methodology 
april 2019 by nhaliday
Book review: "Working Effectively with Legacy Code" by Michael C. Feathers - Eli Bendersky's website
The basic premise of the book is simple, and can be summarized as follows:

To improve some piece of code, we must be able to refactor it.
To be able to refactor code, we must have tests that prove our refactoring didn't break anything.
To have reasonable tests, the code has to be testable; that is, it should be in a form amenable to test harnessing. This most often means breaking implicit dependencies.
... and the author spends about 400 pages on how to achieve that. This book is dense, and it took me a long time to plow through it. I started reading linerarly, but very soon discovered this approach doesn't work. So I began hopping forward and backward between the main text and the "dependency-breaking techniques" chapter which holds isolated recipes for dealing with specific kinds of dependencies. There's quite a bit of repetition in the book, which makes it even more tedious to read.

The techniques described by the author are as terrible as the code they're up against. Horrible abuses of the preprocessor in C/C++, abuses of inheritance in C++ and Java, and so on. Particularly the latter is quite sobering. If you love OOP beware - this book may leave you disenchanted, if not full of hate.

To reiterate the conclusion I already presented earlier - get this book if you have to work with old balls of mud; it will be effort well spent. Otherwise, if you're working on one of those new-age continuously integrated codebases with a 2/1 test to code ratio, feel free to skip it.
techtariat  books  review  summary  critique  engineering  programming  intricacy  code-dive  best-practices  checklists  checking  working-stiff  retrofit  oop  code-organizing  legacy  correctness  coupling-cohesion  composition-decomposition  tricks  metabuch  nitty-gritty  move-fast-(and-break-things)  methodology 
july 2017 by nhaliday
In Computers We Trust? | Quanta Magazine
As math grows ever more complex, will computers reign?

Shalosh B. Ekhad is a computer. Or, rather, it is any of a rotating cast of computers used by the mathematician Doron Zeilberger, from the Dell in his New Jersey office to a supercomputer whose services he occasionally enlists in Austria. The name — Hebrew for “three B one” — refers to the AT&T 3B1, Ekhad’s earliest incarnation.

“The soul is the software,” said Zeilberger, who writes his own code using a popular math programming tool called Maple.
news  org:mag  org:sci  popsci  math  culture  academia  automation  formal-methods  ai  debate  interdisciplinary  rigor  proofs  nibble  org:inst  calculation  bare-hands  heavyweights  contrarianism  computation  correctness  oss  replication  logic  frontier  state-of-art  technical-writing  trust 
january 2017 by nhaliday
Memory Leaks are Memory Safe | Huon on the internet
The best programs lie in the 👍 cells only: they manipulate valid things, and don’t manipulate invalid ones. Passable programs might also have some valid data that they don’t use (leak memory), but bad ones will try to use invalid data.

When a language, such as Rust, advertises itself as memory safe, it isn’t saying anything about whether memory leaks are impossible.

...

However, this was not correct in practice, as things like reference cycles and thread deadlock could cause memory to leak. Rust decided to make forget safe, focusing its guarantees on just preventing memory unsafety and instead making only best-effort attempts towards preventing memory leaks (like essentially all other languages, memory safe and otherwise).
systems  rust  programming  memory-management  explanation  compilers  pls  techtariat  computer-memory  safety  correctness  comparison 
april 2016 by nhaliday
ImperialViolet - A shallow survey of formal methods for C code
Conclusion
The conclusion is a bit disappointing really: Curve25519 has no side effects and performs no allocation, it's a pure function that should be highly amenable to verification and yet I've been unable to find anything that can get even 20 lines into it. Some of this might be my own stupidity, but I put a fair amount of work into trying to find something that worked.

There seems to be a lot of promise in the area and some pieces work well (SMT solvers are often quite impressive, the Frama-C framework appears to be solid, Isabelle is quite pleasant) but nothing I found worked well together, at least for verifying C code. That makes efforts like SeL4 and Ironsides even more impressive. However, if you're happy to work at a higher level I'm guessing that verifying functional programs is a lot easier going.
formal-methods  programming  functional  techtariat  c(pp)  review  critique  systems  survey  analysis  correctness  pessimism  rigor  static-dynamic  frontier  state-of-art 
september 2014 by nhaliday

related tags

ability-competence  abstraction  academia  accuracy  advanced  advice  aesthetics  age-generation  ai  algorithms  analogy  analysis  analytical-holistic  anglo  aphorism  api  app  apple  arrows  asia  assembly  attention  automation  backup  bare-hands  beauty  best-practices  bifl  big-list  big-picture  big-surf  blowhards  books  bounded-cognition  britain  browser  build-packaging  business  c(pp)  caching  calculation  caltech  career  carmack  CAS  causation  certificates-recognition  characterization  chart  cheatsheet  checking  checklists  civilization  cloud  coarse-fine  cocktail  cocoa  code-dive  code-organizing  cog-psych  collaboration  commentary  communication  community  comparison  compilers  complex-systems  complexity  composition-decomposition  computation  computer-memory  concept  conceptual-vocab  concurrency  conquest-empire  context  contiguity-proximity  contrarianism  convexity-curvature  cool  coordination  cornell  correctness  correlation  cost-benefit  counterexample  coupling-cohesion  course  cracker-prog  critique  cs  culture  cynicism-idealism  dan-luu  data  data-structures  database  dbs  debate  debugging  decision-making  degrees-of-freedom  density  design  desktop  detail-architecture  developing-world  devops  devtools  differential  diogenes  direct-indirect  discussion  distributed  distribution  divide-and-conquer  documentation  dotnet  draft  DSL  duplication  early-modern  econometrics  economics  ecosystem  editors  education  efficiency  elegance  email  empirical  endogenous-exogenous  ends-means  engineering  epistemic  erlang  error  error-handling  essay  estimate  europe  evidence-based  examples  exocortex  expectancy  experiment  expert-experience  explanans  explanation  facebook  faq  feynman  finance  flexibility  flux-stasis  formal-methods  frameworks  frontier  functional  gallic  games  generalization  geometry  giants  gibbon  git  github  golang  google  gotchas  graphics  graphs  greedy  grokkability  grokkability-clarity  ground-up  growth-econ  guessing  guide  gwern  hard-tech  hardware  haskell  hci  healthcare  heavy-industry  heavyweights  heuristic  higher-ed  history  hmm  hn  homepage  homo-hetero  howto  huge-data-the-biggest  ide  ideas  idk  IEEE  impact  impetus  incentives  increase-decrease  induction  inference  info-dynamics  info-econ  input-output  institutions  integration-extension  interdisciplinary  interface-compatibility  internet  intricacy  intuition  invariance  investing  ios  iron-age  iteration-recursion  jargon  javascript  judgement  julia  jvm  knowledge  korea  kumbaya-kult  latency-throughput  latex  learning  legacy  len:short  lens  let-me-see  libraries  lifts-projections  links  linux  lisp  list  llvm  local-global  logic  lol  long-term  lower-bounds  machine-learning  magnitude  management  manifolds  marginal  math  math.CA  math.NT  mathtariat  measure  measurement  medicine  mediterranean  memory-management  mental-math  meta-analysis  meta:prediction  meta:research  metabuch  metal-to-virtual  methodology  metrics  microsoft  minimalism  minimum-viable  miri-cfar  mit  mobile  models  moments  money  mooc  move-fast-(and-break-things)  msr  multi  multiplicative  network-structure  networking  neurons  news  nibble  nitty-gritty  no-go  nonlinearity  notation  novelty  number  numerics  objektbuch  ocaml-sml  oceans  old-anglo  oly  oly-programming  oop  open-closed  open-problems  optimism  optimization  org:biz  org:com  org:inst  org:junk  org:lite  org:mag  org:med  org:sci  organizing  os  oss  osx  outcome-risk  overflow  papers  pareto  parsimony  path-dependence  PCP  pdf  performance  pessimism  philosophy  physics  plots  pls  plt  poast  popsci  pragmatic  prediction  prepping  presentation  prioritizing  privacy  pro-rata  problem-solving  productivity  prof  profile  programming  project  proof-systems  proofs  protocol-metadata  psycho-atoms  psychology  python  q-n-a  qra  quality  quantitative-qualitative  questions  quotes  r-lang  rand-approx  random  rant  ratty  realness  reason  recommendations  recruiting  reduction  reference  reflection  regularizer  replication  repo  research  research-program  resources-effects  retrofit  review  rhetoric  rigor  risk  robust  roots  rsc  rust  s:**  safety  scala  scale  scaling-tech  SDP  search  security  sentiment  shipping  signal-noise  simplification-normalization  skunkworks  slides  slippery-slope  soft-question  software  space  span-cover  speculation  stackex  stanford  startups  state  state-of-art  static-dynamic  stories  strategy  street-fighting  structure  study  stylized-facts  summary  survey  sv  synchrony  syntax  system-design  systematic-ad-hoc  systems  tactics  tainter  talks  tcs  teaching  tech  tech-infrastructure  technical-writing  technology  techtariat  terminal  the-classics  the-world-is-just-atoms  thesis  thick-thin  things  thinking  threat-modeling  tidbits  time  tip-of-tongue  tools  top-n  track-record  trade  tradeoffs  tradition  trends  tricki  tricks  trivia  troll  trust  truth  tutorial  types  ubiquity  uncertainty  unintended-consequences  unit  universalism-particularism  unix  ux  vcs  video  volo-avolo  water  web  webapp  white-paper  whole-partial-many  wiki  wire-guided  within-without  workflow  working-stiff  world  worrydream  worse-is-better/the-right-thing  writing  wtf  yak-shaving  yoga  zooming  🖥 

Copy this bookmark:



description:


tags: