nhaliday + heavyweights + chart   3

The Existential Risk of Math Errors - Gwern.net
How big is this upper bound? Mathematicians have often made errors in proofs. But it’s rarer for ideas to be accepted for a long time and then rejected. But we can divide errors into 2 basic cases corresponding to type I and type II errors:

1. Mistakes where the theorem is still true, but the proof was incorrect (type I)
2. Mistakes where the theorem was false, and the proof was also necessarily incorrect (type II)

Before someone comes up with a final answer, a mathematician may have many levels of intuition in formulating & working on the problem, but we’ll consider the final end-product where the mathematician feels satisfied that he has solved it. Case 1 is perhaps the most common case, with innumerable examples; this is sometimes due to mistakes in the proof that anyone would accept is a mistake, but many of these cases are due to changing standards of proof. For example, when David Hilbert discovered errors in Euclid’s proofs which no one noticed before, the theorems were still true, and the gaps more due to Hilbert being a modern mathematician thinking in terms of formal systems (which of course Euclid did not think in). (David Hilbert himself turns out to be a useful example of the other kind of error: his famous list of 23 problems was accompanied by definite opinions on the outcome of each problem and sometimes timings, several of which were wrong or questionable5.) Similarly, early calculus used ‘infinitesimals’ which were sometimes treated as being 0 and sometimes treated as an indefinitely small non-zero number; this was incoherent and strictly speaking, practically all of the calculus results were wrong because they relied on an incoherent concept - but of course the results were some of the greatest mathematical work ever conducted6 and when later mathematicians put calculus on a more rigorous footing, they immediately re-derived those results (sometimes with important qualifications), and doubtless as modern math evolves other fields have sometimes needed to go back and clean up the foundations and will in the future.7


Isaac Newton, incidentally, gave two proofs of the same solution to a problem in probability, one via enumeration and the other more abstract; the enumeration was correct, but the other proof totally wrong and this was not noticed for a long time, leading Stigler to remark:


“Lefschetz was a purely intuitive mathematician. It was said of him that he had never given a completely correct proof, but had never made a wrong guess either.”
- Gian-Carlo Rota13

Case 2 is disturbing, since it is a case in which we wind up with false beliefs and also false beliefs about our beliefs (we no longer know that we don’t know). Case 2 could lead to extinction.


Except, errors do not seem to be evenly & randomly distributed between case 1 and case 2. There seem to be far more case 1s than case 2s, as already mentioned in the early calculus example: far more than 50% of the early calculus results were correct when checked more rigorously. Richard Hamming attributes to Ralph Boas a comment that while editing Mathematical Reviews that “of the new results in the papers reviewed most are true but the corresponding proofs are perhaps half the time plain wrong”.


Gian-Carlo Rota gives us an example with Hilbert:


Olga labored for three years; it turned out that all mistakes could be corrected without any major changes in the statement of the theorems. There was one exception, a paper Hilbert wrote in his old age, which could not be fixed; it was a purported proof of the continuum hypothesis, you will find it in a volume of the Mathematische Annalen of the early thirties.


Leslie Lamport advocates for machine-checked proofs and a more rigorous style of proofs similar to natural deduction, noting a mathematician acquaintance guesses at a broad error rate of 1/329 and that he routinely found mistakes in his own proofs and, worse, believed false conjectures30.

[more on these "structured proofs":

We can probably add software to that list: early software engineering work found that, dismayingly, bug rates seem to be simply a function of lines of code, and one would expect diseconomies of scale. So one would expect that in going from the ~4,000 lines of code of the Microsoft DOS operating system kernel to the ~50,000,000 lines of code in Windows Server 2003 (with full systems of applications and libraries being even larger: the comprehensive Debian repository in 2007 contained ~323,551,126 lines of code) that the number of active bugs at any time would be… fairly large. Mathematical software is hopefully better, but practitioners still run into issues (eg Durán et al 2014, Fonseca et al 2017) and I don’t know of any research pinning down how buggy key mathematical systems like Mathematica are or how much published mathematics may be erroneous due to bugs. This general problem led to predictions of doom and spurred much research into automated proof-checking, static analysis, and functional languages31.

I don't know any interesting bugs in symbolic algebra packages but I know a true, enlightening and entertaining story about something that looked like a bug but wasn't.

Define sinc𝑥=(sin𝑥)/𝑥.

Someone found the following result in an algebra package: ∫∞0𝑑𝑥sinc𝑥=𝜋/2
They then found the following results:


So of course when they got:


Which means that nobody knows Fourier analysis nowdays. Very sad and discouraging story... – fedja Jan 29 '10 at 18:47


Because the most popular systems are all commercial, they tend to guard their bug database rather closely -- making them public would seriously cut their sales. For example, for the open source project Sage (which is quite young), you can get a list of all the known bugs from this page. 1582 known issues on Feb.16th 2010 (which includes feature requests, problems with documentation, etc).

That is an order of magnitude less than the commercial systems. And it's not because it is better, it is because it is younger and smaller. It might be better, but until SAGE does a lot of analysis (about 40% of CAS bugs are there) and a fancy user interface (another 40%), it is too hard to compare.

I once ran a graduate course whose core topic was studying the fundamental disconnect between the algebraic nature of CAS and the analytic nature of the what it is mostly used for. There are issues of logic -- CASes work more or less in an intensional logic, while most of analysis is stated in a purely extensional fashion. There is no well-defined 'denotational semantics' for expressions-as-functions, which strongly contributes to the deeper bugs in CASes.]


Should such widely-believed conjectures as P≠NP or the Riemann hypothesis turn out be false, then because they are assumed by so many existing proofs, a far larger math holocaust would ensue38 - and our previous estimates of error rates will turn out to have been substantial underestimates. But it may be a cloud with a silver lining, if it doesn’t come at a time of danger.


more on formal methods in programming:

Update: measured effort
In the October 2018 issue of Communications of the ACM there is an interesting article about Formally verified software in the real world with some estimates of the effort.

Interestingly (based on OS development for military equipment), it seems that producing formally proved software requires 3.3 times more effort than with traditional engineering techniques. So it's really costly.

On the other hand, it requires 2.3 times less effort to get high security software this way than with traditionally engineered software if you add the effort to make such software certified at a high security level (EAL 7). So if you have high reliability or security requirements there is definitively a business case for going formal.

WHY DON'T PEOPLE USE FORMAL METHODS?: https://www.hillelwayne.com/post/why-dont-people-use-formal-methods/
You can see examples of how all of these look at Let’s Prove Leftpad. HOL4 and Isabelle are good examples of “independent theorem” specs, SPARK and Dafny have “embedded assertion” specs, and Coq and Agda have “dependent type” specs.6

If you squint a bit it looks like these three forms of code spec map to the three main domains of automated correctness checking: tests, contracts, and types. This is not a coincidence. Correctness is a spectrum, and formal verification is one extreme of that spectrum. As we reduce the rigour (and effort) of our verification we get simpler and narrower checks, whether that means limiting the explored state space, using weaker types, or pushing verification to the runtime. Any means of total specification then becomes a means of partial specification, and vice versa: many consider Cleanroom a formal verification technique, which primarily works by pushing code review far beyond what’s humanly possible.


The question, then: “is 90/95/99% correct significantly cheaper than 100% correct?” The answer is very yes. We all are comfortable saying that a codebase we’ve well-tested and well-typed is mostly correct modulo a few fixes in prod, and we’re even writing more than four lines of code a day. In fact, the vast… [more]
ratty  gwern  analysis  essay  realness  truth  correctness  reason  philosophy  math  proofs  formal-methods  cs  programming  engineering  worse-is-better/the-right-thing  intuition  giants  old-anglo  error  street-fighting  heuristic  zooming  risk  threat-modeling  software  lens  logic  inference  physics  differential  geometry  estimate  distribution  robust  speculation  nonlinearity  cost-benefit  convexity-curvature  measure  scale  trivia  cocktail  history  early-modern  europe  math.CA  rigor  news  org:mag  org:sci  miri-cfar  pdf  thesis  comparison  examples  org:junk  q-n-a  stackex  pragmatic  tradeoffs  cracker-prog  techtariat  invariance  DSL  chart  ecosystem  grokkability  heavyweights  CAS  static-dynamic  lower-bounds  complexity  tcs  open-problems  big-surf  ideas  certificates-recognition  proof-systems  PCP  mediterranean  SDP  meta:prediction  epistemic  questions  guessing  distributed  overflow  nibble  soft-question  track-record  big-list  hmm  frontier  state-of-art  move-fast-(and-break-things)  grokkability-clarity  technical-writing  trust 
july 2019 by nhaliday
One week of bugs
If I had to guess, I'd say I probably work around hundreds of bugs in an average week, and thousands in a bad week. It's not unusual for me to run into a hundred new bugs in a single week. But I often get skepticism when I mention that I run into multiple new (to me) bugs per day, and that this is inevitable if we don't change how we write tests. Well, here's a log of one week of bugs, limited to bugs that were new to me that week. After a brief description of the bugs, I'll talk about what we can do to improve the situation. The obvious answer to spend more effort on testing, but everyone already knows we should do that and no one does it. That doesn't mean it's hopeless, though.


Here's where I'm supposed to write an appeal to take testing more seriously and put real effort into it. But we all know that's not going to work. It would take 90k LOC of tests to get Julia to be as well tested as a poorly tested prototype (falsely assuming linear complexity in size). That's two person-years of work, not even including time to debug and fix bugs (which probably brings it closer to four of five years). Who's going to do that? No one. Writing tests is like writing documentation. Everyone already knows you should do it. Telling people they should do it adds zero information1.

Given that people aren't going to put any effort into testing, what's the best way to do it?

Property-based testing. Generative testing. Random testing. Concolic Testing (which was done long before the term was coined). Static analysis. Fuzzing. Statistical bug finding. There are lots of options. Some of them are actually the same thing because the terminology we use is inconsistent and buggy. I'm going to arbitrarily pick one to talk about, but they're all worth looking into.


There are a lot of great resources out there, but if you're just getting started, I found this description of types of fuzzers to be one of those most helpful (and simplest) things I've read.

John Regehr has a udacity course on software testing. I haven't worked through it yet (Pablo Torres just pointed to it), but given the quality of Dr. Regehr's writing, I expect the course to be good.

For more on my perspective on testing, there's this.

Everything's broken and nobody's upset: https://www.hanselman.com/blog/EverythingsBrokenAndNobodysUpset.aspx

From the perspective of a user, the purpose of Hypothesis is to make it easier for you to write better tests.

From my perspective as the primary author, that is of course also a purpose of Hypothesis. I write a lot of code, it needs testing, and the idea of trying to do that without Hypothesis has become nearly unthinkable.

But, on a large scale, the true purpose of Hypothesis is to drag the world kicking and screaming into a new and terrifying age of high quality software.

Software is everywhere. We have built a civilization on it, and it’s only getting more prevalent as more services move online and embedded and “internet of things” devices become cheaper and more common.

Software is also terrible. It’s buggy, it’s insecure, and it’s rarely well thought out.

This combination is clearly a recipe for disaster.

The state of software testing is even worse. It’s uncontroversial at this point that you should be testing your code, but it’s a rare codebase whose authors could honestly claim that they feel its testing is sufficient.

Much of the problem here is that it’s too hard to write good tests. Tests take up a vast quantity of development time, but they mostly just laboriously encode exactly the same assumptions and fallacies that the authors had when they wrote the code, so they miss exactly the same bugs that you missed when they wrote the code.

Preventing the Collapse of Civilization [video]: https://news.ycombinator.com/item?id=19945452
- Jonathan Blow

NB: DevGAMM is a game industry conference

- loss of technological knowledge (Antikythera mechanism, aqueducts, etc.)
- hardware driving most gains, not software
- software's actually less robust, often poorly designed and overengineered these days
- *list of bugs he's encountered recently*:
- knowledge of trivia becomes more than general, deep knowledge
- does at least acknowledge value of DRY, reusing code, abstraction saving dev time
techtariat  dan-luu  tech  software  error  list  debugging  linux  github  robust  checking  oss  troll  lol  aphorism  webapp  email  google  facebook  games  julia  pls  compilers  communication  mooc  browser  rust  programming  engineering  random  jargon  formal-methods  expert-experience  prof  c(pp)  course  correctness  hn  commentary  video  presentation  carmack  pragmatic  contrarianism  pessimism  sv  unix  rhetoric  critique  worrydream  hardware  performance  trends  multiplicative  roots  impact  comparison  history  iron-age  the-classics  mediterranean  conquest-empire  gibbon  technology  the-world-is-just-atoms  flux-stasis  increase-decrease  graphics  hmm  idk  systems  os  abstraction  intricacy  worse-is-better/the-right-thing  build-packaging  microsoft  osx  apple  reflection  assembly  things  knowledge  detail-architecture  thick-thin  trivia  info-dynamics  caching  frameworks  generalization  systematic-ad-hoc  universalism-particularism  analytical-holistic  structure  tainter  libraries  tradeoffs  prepping  threat-modeling  network-structure  writing  risk  local-glob 
may 2019 by nhaliday
soft question - How do you not forget old math? - MathOverflow
Terry Tao:
I find that blogging about material that I would otherwise forget eventually is extremely valuable in this regard. (I end up consulting my own blog posts on a regular basis.) EDIT: and now I remember I already wrote on this topic: terrytao.wordpress.com/career-advice/write-down-what-youve-d‌​one

The only way to cope with this loss of memory I know is to do some reading on systematic basis. Of course, if you read one paper in algebraic geometry (or whatever else) a month (or even two months), you may not remember the exact content of all of them by the end of the year but, since all mathematicians in one field use pretty much the same tricks and draw from pretty much the same general knowledge, you'll keep the core things in your memory no matter what you read (provided it is not patented junk, of course) and this is about as much as you can hope for.

Relating abstract things to "real life stuff" (and vice versa) is automatic when you work as a mathematician. For me, the proof of the Chacon-Ornstein ergodic theorem is just a sandpile moving over a pit with the sand falling down after every shift. I often tell my students that every individual term in the sequence doesn't matter at all for the limit but somehow together they determine it like no individual human is of any real importance while together they keep this civilization running, etc. No special effort is needed here and, moreover, if the analogy is not natural but contrived, it'll not be helpful or memorable. The standard mnemonic techniques are pretty useless in math. IMHO (the famous "foil" rule for the multiplication of sums of two terms is inferior to the natural "pair each term in the first sum with each term in the second sum" and to the picture of a rectangle tiled with smaller rectangles, though, of course, the foil rule sounds way more sexy).

One thing that I don't think the other respondents have emphasized enough is that you should work on prioritizing what you choose to study and remember.

Timothy Chow:
As others have said, forgetting lots of stuff is inevitable. But there are ways you can mitigate the damage of this information loss. I find that a useful technique is to try to organize your knowledge hierarchically. Start by coming up with a big picture, and make sure you understand and remember that picture thoroughly. Then drill down to the next level of detail, and work on remembering that. For example, if I were trying to remember everything in a particular book, I might start by memorizing the table of contents, and then I'd work on remembering the theorem statements, and then finally the proofs. (Don't take this illustration too literally; it's better to come up with your own conceptual hierarchy than to slavishly follow the formal hierarchy of a published text. But I do think that a hierarchical approach is valuable.)

Organizing your knowledge like this helps you prioritize. You can then consciously decide that certain large swaths of knowledge are not worth your time at the moment, and just keep a "stub" in memory to remind you that that body of knowledge exists, should you ever need to dive into it. In areas of higher priority, you can plunge more deeply. By making sure you thoroughly internalize the top levels of the hierarchy, you reduce the risk of losing sight of entire areas of important knowledge. Generally it's less catastrophic to forget the details than to forget about a whole region of the big picture, because you can often revisit the details as long as you know what details you need to dig up. (This is fortunate since the details are the most memory-intensive.)

Having a hierarchy also helps you accrue new knowledge. Often when you encounter something new, you can relate it to something you already know, and file it in the same branch of your mental tree.
thinking  math  growth  advice  expert  q-n-a  🎓  long-term  tradeoffs  scholar  overflow  soft-question  gowers  mathtariat  ground-up  hi-order-bits  intuition  synthesis  visual-understanding  decision-making  scholar-pack  cartoons  lens  big-picture  ergodic  nibble  zooming  trees  fedja  reflection  retention  meta:research  wisdom  skeleton  practice  prioritizing  concrete  s:***  info-dynamics  knowledge  studying  the-trenches  chart  expert-experience  quixotic  elegance  heavyweights 
june 2016 by nhaliday

bundles : abstractmetametapredictionthinkingvague

related tags

abstraction  advice  analysis  analytical-holistic  aphorism  apple  assembly  big-list  big-picture  big-surf  browser  build-packaging  c(pp)  caching  carmack  cartoons  CAS  certificates-recognition  chart  checking  civilization  cocktail  commentary  communication  comparison  compilers  complex-systems  complexity  composition-decomposition  concrete  conquest-empire  contrarianism  convexity-curvature  correctness  cost-benefit  coupling-cohesion  course  cracker-prog  critique  cs  dan-luu  debugging  decision-making  desktop  detail-architecture  differential  diogenes  distributed  distribution  DSL  early-modern  ecosystem  elegance  email  engineering  epistemic  ergodic  error  essay  estimate  europe  examples  expert  expert-experience  facebook  fedja  flux-stasis  formal-methods  frameworks  frontier  games  generalization  geometry  giants  gibbon  github  google  gowers  graphics  grokkability  grokkability-clarity  ground-up  growth  guessing  gwern  hardware  heavyweights  heuristic  hi-order-bits  history  hmm  hn  ideas  idk  impact  increase-decrease  inference  info-dynamics  intricacy  intuition  invariance  iron-age  jargon  julia  knowledge  lens  libraries  linux  list  local-global  logic  lol  long-term  lower-bounds  math  math.CA  mathtariat  measure  mediterranean  meta:prediction  meta:research  metal-to-virtual  methodology  microsoft  miri-cfar  mooc  move-fast-(and-break-things)  multi  multiplicative  network-structure  news  nibble  nonlinearity  old-anglo  open-problems  org:junk  org:mag  org:sci  os  oss  osx  overflow  parsimony  PCP  pdf  performance  pessimism  philosophy  physics  pls  practice  pragmatic  prepping  presentation  prioritizing  prof  programming  proof-systems  proofs  q-n-a  questions  quixotic  random  ratty  realness  reason  reflection  retention  rhetoric  rigor  risk  robust  roots  rust  s:***  scale  scholar  scholar-pack  SDP  skeleton  soft-question  software  speculation  stackex  state-of-art  static-dynamic  street-fighting  structure  studying  sv  synthesis  system-design  systematic-ad-hoc  systems  tainter  tcs  tech  technical-writing  technology  techtariat  the-classics  the-trenches  the-world-is-just-atoms  thesis  thick-thin  things  thinking  threat-modeling  track-record  trade  tradeoffs  trees  trends  trivia  troll  trust  truth  universalism-particularism  unix  video  visual-understanding  webapp  wisdom  worrydream  worse-is-better/the-right-thing  writing  zooming  🎓 

Copy this bookmark: