nhaliday + debugging   79

Linus's Law - Wikipedia
Linus's Law is a claim about software development, named in honor of Linus Torvalds and formulated by Eric S. Raymond in his essay and book The Cathedral and the Bazaar (1999).[1][2] The law states that "given enough eyeballs, all bugs are shallow";


In Facts and Fallacies about Software Engineering, Robert Glass refers to the law as a "mantra" of the open source movement, but calls it a fallacy due to the lack of supporting evidence and because research has indicated that the rate at which additional bugs are uncovered does not scale linearly with the number of reviewers; rather, there is a small maximum number of useful reviewers, between two and four, and additional reviewers above this number uncover bugs at a much lower rate.[4] While closed-source practitioners also promote stringent, independent code analysis during a software project's development, they focus on in-depth review by a few and not primarily the number of "eyeballs".[5][6]

Although detection of even deliberately inserted flaws[7][8] can be attributed to Raymond's claim, the persistence of the Heartbleed security bug in a critical piece of code for two years has been considered as a refutation of Raymond's dictum.[9][10][11][12] Larry Seltzer suspects that the availability of source code may cause some developers and researchers to perform less extensive tests than they would with closed source software, making it easier for bugs to remain.[12] In 2015, the Linux Foundation's executive director Jim Zemlin argued that the complexity of modern software has increased to such levels that specific resource allocation is desirable to improve its security. Regarding some of 2014's largest global open source software vulnerabilities, he says, "In these cases, the eyeballs weren't really looking".[11] Large scale experiments or peer-reviewed surveys to test how well the mantra holds in practice have not been performed.

Given enough eyeballs, all bugs are shallow? Revisiting Eric Raymond with bug bounty programs: https://academic.oup.com/cybersecurity/article/3/2/81/4524054

wiki  reference  aphorism  ideas  stylized-facts  programming  engineering  linux  worse-is-better/the-right-thing  correctness  debugging  checking  best-practices  security  error  scale  ubiquity  collaboration  oss  realness  empirical  evidence-based  multi  study  info-econ  economics  intricacy  plots  manifolds  techtariat  cracker-prog  os  systems  magnitude  quantitative-qualitative 
7 days ago by nhaliday
How to come up with the solutions: techniques - Codeforces
Technique 1: "Total Recall"
Technique 2: "From Specific to General"
Let's say that you've found the solution for the problem (hurray!). Let's consider some particular case of a problem. Of course, you can apply the algorithm/solution to it. That's why, in order to solve a general problem, you need to solve all of its specific cases. Try solving some (or multiple) specific cases and then try and generalize them to the solution of the main problem.
Technique 3: "Bold Hypothesis"
Technique 4: "To solve a problem, you should think like a problem"
Technique 5: "Think together"
Technique 6: "Pick a Method"
Technique 7: "Print Out and Look"
Technique 8: "Google"
oly  oly-programming  problem-solving  thinking  expert-experience  retention  metabuch  visual-understanding  zooming  local-global  collaboration  tactics  debugging  bare-hands  let-me-see  advice 
9 weeks ago by nhaliday
history - Why are UNIX/POSIX system call namings so illegible? - Unix & Linux Stack Exchange
It's due to the technical constraints of the time. The POSIX standard was created in the 1980s and referred to UNIX, which was born in the 1970. Several C compilers at that time were limited to identifiers that were 6 or 8 characters long, so that settled the standard for the length of variable and function names.

We carried out a family of controlled experiments to investigate whether the use of abbreviated identifier names, with respect to full-word identifier names, affects fault fixing in C and Java source code. This family consists of an original (or baseline) controlled experiment and three replications. We involved 100 participants with different backgrounds and experiences in total. Overall results suggested that there is no difference in terms of effort, effectiveness, and efficiency to fix faults, when source code contains either only abbreviated or only full-word identifier names. We also conducted a qualitative study to understand the values, beliefs, and assumptions that inform and shape fault fixing when identifier names are either abbreviated or full-word. We involved in this qualitative study six professional developers with 1--3 years of work experience. A number of insights emerged from this qualitative study and can be considered a useful complement to the quantitative results from our family of experiments. One of the most interesting insights is that developers, when working on source code with abbreviated identifier names, adopt a more methodical approach to identify and fix faults by extending their focus point and only in a few cases do they expand abbreviated identifiers.
q-n-a  stackex  trivia  programming  os  systems  legacy  legibility  ux  libraries  unix  linux  hacker  cracker-prog  multi  evidence-based  empirical  expert-experience  engineering  study  best-practices  comparison  quality  debugging  efficiency  time  code-organizing  grokkability  grokkability-clarity 
july 2019 by nhaliday
Mutation testing - Wikipedia
Mutation testing involves modifying a program in small ways.[1] Each mutated version is called a mutant and tests detect and reject mutants by causing the behavior of the original version to differ from the mutant. This is called killing the mutant. Test suites are measured by the percentage of mutants that they kill. New tests can be designed to kill additional mutants.
wiki  reference  concept  mutation  selection  analogy  programming  checking  formal-methods  debugging  random  list  libraries  links  functional  haskell  javascript  jvm  c(pp)  python  dotnet  oop  perturbation  static-dynamic 
july 2019 by nhaliday
Interview with Donald Knuth | Interview with Donald Knuth | InformIT
Andrew Binstock and Donald Knuth converse on the success of open source, the problem with multicore architecture, the disappointing lack of interest in literate programming, the menace of reusable code, and that urban legend about winning a programming contest with a single compilation.

Reusable vs. re-editable code: https://hal.archives-ouvertes.fr/hal-01966146/document
- Konrad Hinsen

I think whether code should be editable or in “an untouchable black box” depends on the number of developers involved, as well as their talent and motivation. Knuth is a highly motivated genius working in isolation. Most software is developed by large teams of programmers with varying degrees of motivation and talent. I think the further you move away from Knuth along these three axes the more important black boxes become.
nibble  interview  giants  expert-experience  programming  cs  software  contrarianism  carmack  oss  prediction  trends  linux  concurrency  desktop  comparison  checking  debugging  stories  engineering  hmm  idk  algorithms  books  debate  flux-stasis  duplication  parsimony  best-practices  writing  documentation  latex  intricacy  structure  hardware  caching  workflow  editors  composition-decomposition  coupling-cohesion  exposition  technical-writing  thinking  cracker-prog  code-organizing  grokkability  multi  techtariat  commentary  pdf  reflection  essay  examples  python  data-science  libraries  grokkability-clarity 
june 2019 by nhaliday
c++ - How to check if LLDB loaded debug symbols from shared libraries? - Stack Overflow
Now this question is also answered in official LDDB documentation in "Troubleshooting LLDB" section, please see "How do I check if I have debug symbols?": https://lldb.llvm.org/use/troubleshooting.html#how-do-i-check-if-i-have-debug-symbols It gives a slightly different approach, even though the approach from the accepted answer worked quite fine for me. – dying_sphynx Nov 3 '18 at 10:58

One fairly simple way to do it is:
(lldb) image lookup -vn <SomeFunctionNameThatShouldHaveDebugInfo>
q-n-a  stackex  programming  yak-shaving  gotchas  howto  debugging  build-packaging  llvm  multi  documentation 
may 2019 by nhaliday
Fossil: Home
VCS w/ builtin issue tracking and wiki used by SQLite
tools  devtools  software  vcs  wiki  debugging  integration-extension  oss  dbs 
may 2019 by nhaliday
Frama-C is organized with a plug-in architecture (comparable to that of the Gimp or Eclipse). A common kernel centralizes information and conducts the analysis. Plug-ins interact with each other through interfaces defined by the kernel. This makes for robustness in the development of Frama-C while allowing a wide functionality spectrum.


Three heavyweight plug-ins that are used by the other plug-ins:

- Eva (Evolved Value analysis)
This plug-in computes variation domains for variables. It is quite automatic, although the user may guide the analysis in places. It handles a wide spectrum of C constructs. This plug-in uses abstract interpretation techniques.
- Jessie and Wp, two deductive verification plug-ins
These plug-ins are based on weakest precondition computation techniques. They allow to prove that C functions satisfy their specification as expressed in ACSL. These proofs are modular: the specifications of the called functions are used to establish the proof without looking at their code.

For browsing unfamiliar code:
- Impact analysis
This plug-in highlights the locations in the source code that are impacted by a modification.
- Scope & Data-flow browsing
This plug-in allows the user to navigate the dataflow of the program, from definition to use or from use to definition.
- Variable occurrence browsing
Also provided as a simple example for new plug-in development, this plug-in allows the user to reach the statements where a given variable is used.
- Metrics calculation
This plug-in allows the user to compute various metrics from the source code.

For code transformation:
- Semantic constant folding
This plug-in makes use of the results of the evolved value analysis plug-in to replace, in the source code, the constant expressions by their values. Because it relies on EVA, it is able to do more of these simplifications than a syntactic analysis would.
- Slicing
This plug-in slices the code according to a user-provided criterion: it creates a copy of the program, but keeps only those parts which are necessary with respect to the given criterion.
- Spare code: remove "spare code", code that does not contribute to the final results of the program.
- E-ACSL: translate annotations into C code for runtime assertion checking.
For verifying functional specifications:

- Aoraï: verify specifications expressed as LTL (Linear Temporal Logic) formulas
Other functionalities documented together with the EVA plug-in can be considered as verifying low-level functional specifications (inputs, outputs, dependencies,…)
For test-case generation:

- PathCrawler automatically finds test-case inputs to ensure coverage of a C function. It can be used for structural unit testing, as a complement to static analysis or to study the feasible execution paths of the function.
For concurrent programs:

- Mthread
This plug-in automatically analyzes concurrent C programs, using the EVA plug-in, taking into account all possible thread interactions. At the end of its execution, the concurrent behavior of each thread is over-approximated, resulting in precise information about shared variables, which mutex protects a part of the code, etc.
Front-end for other languages

- Frama-Clang
This plug-in provides a C++ front-end to Frama-C, based on the clang compiler. It transforms C++ code into a Frama-C AST, which can then be analyzed by the plug-ins above. Note however that it is very experimental and only supports a subset of C++11
tools  devtools  formal-methods  programming  software  c(pp)  systems  memory-management  ocaml-sml  debugging  checking  rigor  oss  code-dive  graphs  state  metrics  llvm  gallic  cool  worrydream  impact  flux-stasis  correctness  computer-memory  structure  static-dynamic 
may 2019 by nhaliday
Should I go for TensorFlow or PyTorch?
Honestly, most experts that I know love Pytorch and detest TensorFlow. Karpathy and Justin from Stanford for example. You can see Karpthy's thoughts and I've asked Justin personally and the answer was sharp: PYTORCH!!! TF has lots of PR but its API and graph model are horrible and will waste lots of your research time.



Updated Mar 12
Update after 2019 TF summit:

TL/DR: previously I was in the pytorch camp but with TF 2.0 it’s clear that Google is really going to try to have parity or try to be better than Pytorch in all aspects where people voiced concerns (ease of use/debugging/dynamic graphs). They seem to be allocating more resources on development than Facebook so the longer term currently looks promising for Google. Prior to TF 2.0 I thought that Pytorch team had more momentum. One area where FB/Pytorch is still stronger is Google is a bit more closed and doesn’t seem to release reproducible cutting edge models such as AlphaGo whereas FAIR released OpenGo for instance. Generally you will end up running into models that are only implemented in one framework of the other so chances are you might end up learning both.
q-n-a  qra  comparison  software  recommendations  cost-benefit  tradeoffs  python  libraries  machine-learning  deep-learning  data-science  sci-comp  tools  google  facebook  tech  competition  best-practices  trends  debugging  expert-experience  ecosystem  theory-practice  pragmatic  wire-guided  static-dynamic  state  academia  frameworks  open-closed 
may 2019 by nhaliday
One week of bugs
If I had to guess, I'd say I probably work around hundreds of bugs in an average week, and thousands in a bad week. It's not unusual for me to run into a hundred new bugs in a single week. But I often get skepticism when I mention that I run into multiple new (to me) bugs per day, and that this is inevitable if we don't change how we write tests. Well, here's a log of one week of bugs, limited to bugs that were new to me that week. After a brief description of the bugs, I'll talk about what we can do to improve the situation. The obvious answer to spend more effort on testing, but everyone already knows we should do that and no one does it. That doesn't mean it's hopeless, though.


Here's where I'm supposed to write an appeal to take testing more seriously and put real effort into it. But we all know that's not going to work. It would take 90k LOC of tests to get Julia to be as well tested as a poorly tested prototype (falsely assuming linear complexity in size). That's two person-years of work, not even including time to debug and fix bugs (which probably brings it closer to four of five years). Who's going to do that? No one. Writing tests is like writing documentation. Everyone already knows you should do it. Telling people they should do it adds zero information1.

Given that people aren't going to put any effort into testing, what's the best way to do it?

Property-based testing. Generative testing. Random testing. Concolic Testing (which was done long before the term was coined). Static analysis. Fuzzing. Statistical bug finding. There are lots of options. Some of them are actually the same thing because the terminology we use is inconsistent and buggy. I'm going to arbitrarily pick one to talk about, but they're all worth looking into.


There are a lot of great resources out there, but if you're just getting started, I found this description of types of fuzzers to be one of those most helpful (and simplest) things I've read.

John Regehr has a udacity course on software testing. I haven't worked through it yet (Pablo Torres just pointed to it), but given the quality of Dr. Regehr's writing, I expect the course to be good.

For more on my perspective on testing, there's this.

Everything's broken and nobody's upset: https://www.hanselman.com/blog/EverythingsBrokenAndNobodysUpset.aspx

From the perspective of a user, the purpose of Hypothesis is to make it easier for you to write better tests.

From my perspective as the primary author, that is of course also a purpose of Hypothesis. I write a lot of code, it needs testing, and the idea of trying to do that without Hypothesis has become nearly unthinkable.

But, on a large scale, the true purpose of Hypothesis is to drag the world kicking and screaming into a new and terrifying age of high quality software.

Software is everywhere. We have built a civilization on it, and it’s only getting more prevalent as more services move online and embedded and “internet of things” devices become cheaper and more common.

Software is also terrible. It’s buggy, it’s insecure, and it’s rarely well thought out.

This combination is clearly a recipe for disaster.

The state of software testing is even worse. It’s uncontroversial at this point that you should be testing your code, but it’s a rare codebase whose authors could honestly claim that they feel its testing is sufficient.

Much of the problem here is that it’s too hard to write good tests. Tests take up a vast quantity of development time, but they mostly just laboriously encode exactly the same assumptions and fallacies that the authors had when they wrote the code, so they miss exactly the same bugs that you missed when they wrote the code.

Preventing the Collapse of Civilization [video]: https://news.ycombinator.com/item?id=19945452
- Jonathan Blow

NB: DevGAMM is a game industry conference

- loss of technological knowledge (Antikythera mechanism, aqueducts, etc.)
- hardware driving most gains, not software
- software's actually less robust, often poorly designed and overengineered these days
- *list of bugs he's encountered recently*:
- knowledge of trivia becomes more than general, deep knowledge
- does at least acknowledge value of DRY, reusing code, abstraction saving dev time
techtariat  dan-luu  tech  software  error  list  debugging  linux  github  robust  checking  oss  troll  lol  aphorism  webapp  email  google  facebook  games  julia  pls  compilers  communication  mooc  browser  rust  programming  engineering  random  jargon  formal-methods  expert-experience  prof  c(pp)  course  correctness  hn  commentary  video  presentation  carmack  pragmatic  contrarianism  pessimism  sv  unix  rhetoric  critique  worrydream  hardware  performance  trends  multiplicative  roots  impact  comparison  history  iron-age  the-classics  mediterranean  conquest-empire  gibbon  technology  the-world-is-just-atoms  flux-stasis  increase-decrease  graphics  hmm  idk  systems  os  abstraction  intricacy  worse-is-better/the-right-thing  build-packaging  microsoft  osx  apple  reflection  assembly  things  knowledge  detail-architecture  thick-thin  trivia  info-dynamics  caching  frameworks  generalization  systematic-ad-hoc  universalism-particularism  analytical-holistic  structure  tainter  libraries  tradeoffs  prepping  threat-modeling  network-structure  writing  risk  local-glob 
may 2019 by nhaliday
Dump include paths from g++ - Stack Overflow
g++ -E -x c++ - -v < /dev/null
clang++ -E -x c++ - -v < /dev/null
q-n-a  stackex  trivia  howto  programming  c(pp)  debugging 
may 2019 by nhaliday
c++ - Debugging template instantiations - Stack Overflow
Yes, there is a template metaprogramming debugger. Templight

Seems to be dead now, though :( [ed.: Partially true. They've merged pull requests recently tho.]
Metashell is still in active development though: github.com/metashell/metashell
q-n-a  stackex  nitty-gritty  pls  types  c(pp)  debugging  devtools  tools  programming  howto  advice  checklists  multi  repo  wire-guided  static-dynamic  compilers  performance  measurement  time  latency-throughput 
may 2019 by nhaliday
Why is Software Engineering so difficult? - James Miller
basic message: No silver bullet!

most interesting nuggets:
Scale and Complexity
- Windows 7 > 50 million LOC
Expect a staggering number of bugs.

- Well-written C and C++ code contains some 5 to 10 errors per 100 LOC after a clean compile, but before inspection and testing.
- At a 5% rate any 50 MLOC program will start off with some 2.5 million bugs.

Bug removal
- Testing typically exercises only half the code.

Better bug removal?
- There are better ways to do testing that do produce fantastic programs.”
- Are we sure about this fact?
* No, its only an opinion!
* In general Software Engineering has ....

So why not do this?
- The costs are unbelievable.
- It’s not unusual for the qualification process to produce a half page of documentation for each line of code.
pdf  slides  engineering  nitty-gritty  programming  best-practices  roots  comparison  cost-benefit  software  systematic-ad-hoc  structure  error  frontier  debugging  checking  formal-methods  context  detail-architecture  intricacy  big-picture  system-design  correctness  scale  scaling-tech  shipping  money  data  stylized-facts  street-fighting  objektbuch  pro-rata  estimate  pessimism  degrees-of-freedom  volo-avolo  no-go  things  thinking  summary  quality  density  methodology 
may 2019 by nhaliday
Why is reverse debugging rarely used? - Software Engineering Stack Exchange
(time travel)

For one, running in debug mode with recording on is very expensive compared to even normal debug mode; it also consumes a lot more memory.

It is easier to decrease the granularity from line level to function call level. For example, the standard debugger in eclipse allows you to "drop to frame," which is essentially a jump back to the start of the function with a reset of all the parameters (nothing done on the heap is reverted, and finally blocks are not executed, so it is not a true reverse debugger; be careful about that).

Note that this has been available for several years now and works hand in hand with hot-code replacement.
As mentioned already, performance is key e.g. with gdb's reversible debugging, running something like gzip sees a slowdown of 50,000x compared to running natively. There are commercial alternatives however: I work for Undo undo.io, and our UndoDB product does the same but with a slowdown of less than 2x. There are other commercial reversible debuggers available too.

Based on GDB, UndoDB supports source-level debugging for applications written in any language supported by GDB, including C/C++, Rust and Ada.
q-n-a  stackex  programming  engineering  impetus  debugging  time  increase-decrease  worrydream  hci  devtools  direction  roots  money-for-time  review  comparison  critique  tools  software  multi  systems  c(pp)  rust  state 
may 2019 by nhaliday
unix - How can I profile C++ code running on Linux? - Stack Overflow
If your goal is to use a profiler, use one of the suggested ones.

However, if you're in a hurry and you can manually interrupt your program under the debugger while it's being subjectively slow, there's a simple way to find performance problems.

Just halt it several times, and each time look at the call stack. If there is some code that is wasting some percentage of the time, 20% or 50% or whatever, that is the probability that you will catch it in the act on each sample. So that is roughly the percentage of samples on which you will see it. There is no educated guesswork required. If you do have a guess as to what the problem is, this will prove or disprove it.

You may have multiple performance problems of different sizes. If you clean out any one of them, the remaining ones will take a larger percentage, and be easier to spot, on subsequent passes. This magnification effect, when compounded over multiple problems, can lead to truly massive speedup factors.

Caveat: Programmers tend to be skeptical of this technique unless they've used it themselves. They will say that profilers give you this information, but that is only true if they sample the entire call stack, and then let you examine a random set of samples. (The summaries are where the insight is lost.) Call graphs don't give you the same information, because they don't summarize at the instruction level, and
they give confusing summaries in the presence of recursion.
They will also say it only works on toy programs, when actually it works on any program, and it seems to work better on bigger programs, because they tend to have more problems to find. They will say it sometimes finds things that aren't problems, but that is only true if you see something once. If you see a problem on more than one sample, it is real.


gprof, Valgrind and gperftools - an evaluation of some tools for application level CPU profiling on Linux: http://gernotklingler.com/blog/gprof-valgrind-gperftools-evaluation-tools-application-level-cpu-profiling-linux/
gprof is the dinosaur among the evaluated profilers - its roots go back into the 1980’s. It seems it was widely used and a good solution during the past decades. But its limited support for multi-threaded applications, the inability to profile shared libraries and the need for recompilation with compatible compilers and special flags that produce a considerable runtime overhead, make it unsuitable for using it in today’s real-world projects.

Valgrind delivers the most accurate results and is well suited for multi-threaded applications. It’s very easy to use and there is KCachegrind for visualization/analysis of the profiling data, but the slow execution of the application under test disqualifies it for larger, longer running applications.

The gperftools CPU profiler has a very little runtime overhead, provides some nice features like selectively profiling certain areas of interest and has no problem with multi-threaded applications. KCachegrind can be used to analyze the profiling data. Like all sampling based profilers, it suffers statistical inaccuracy and therefore the results are not as accurate as with Valgrind, but practically that’s usually not a big problem (you can always increase the sampling frequency if you need more accurate results). I’m using this profiler on a large code-base and from my personal experience I can definitely recommend using it.
q-n-a  stackex  programming  engineering  performance  devtools  tools  advice  checklists  hacker  nitty-gritty  tricks  lol  multi  unix  linux  techtariat  analysis  comparison  recommendations  software  measurement  oly-programming  concurrency  debugging  metabuch 
may 2019 by nhaliday
Teach debugging
A friend of mine and I couldn't understand why some people were having so much trouble; the material seemed like common sense. The Feynman Method was the only tool we needed.

1. Write down the problem
2. Think real hard
3. Write down the solution

The Feynman Method failed us on the last project: the design of a divider, a real-world-scale project an order of magnitude more complex than anything we'd been asked to tackle before. On the day he assigned the project, the professor exhorted us to begin early. Over the next few weeks, we heard rumors that some of our classmates worked day and night without making progress.


And then, just after midnight, a number of our newfound buddies from dinner reported successes. Half of those who started from scratch had working designs. Others were despondent, because their design was still broken in some subtle, non-obvious way. As I talked with one of those students, I began poring over his design. And after a few minutes, I realized that the Feynman method wasn't the only way forward: it should be possible to systematically apply a mechanical technique repeatedly to find the source of our problems. Beneath all the abstractions, our projects consisted purely of NAND gates (woe to those who dug around our toolbox enough to uncover dynamic logic), which outputs a 0 only when both inputs are 1. If the correct output is 0, both inputs should be 1. The input that isn't is in error, an error that is, itself, the output of a NAND gate where at least one input is 0 when it should be 1. We applied this method recursively, finding the source of all the problems in both our designs in under half an hour.

How To Debug Any Program: https://www.blinddata.com/blog/how-to-debug-any-program-9
May 8th 2019 by Saketh Are

Start by Questioning Everything


When a program is behaving unexpectedly, our attention tends to be drawn first to the most complex portions of the code. However, mistakes can come in all forms. I've personally been guilty of rushing to debug sophisticated portions of my code when the real bug was that I forgot to read in the input file. In the following section, we'll discuss how to reliably focus our attention on the portions of the program that need correction.

Then Question as Little as Possible

Suppose that we have a program and some input on which its behavior doesn’t match our expectations. The goal of debugging is to narrow our focus to as small a section of the program as possible. Once our area of interest is small enough, the value of the incorrect output that is being produced will typically tell us exactly what the bug is.

In order to catch the point at which our program diverges from expected behavior, we must inspect the intermediate state of the program. Suppose that we select some point during execution of the program and print out all values in memory. We can inspect the results manually and decide whether they match our expectations. If they don't, we know for a fact that we can focus on the first half of the program. It either contains a bug, or our expectations of what it should produce were misguided. If the intermediate state does match our expectations, we can focus on the second half of the program. It either contains a bug, or our understanding of what input it expects was incorrect.

Question Things Efficiently

For practical purposes, inspecting intermediate state usually doesn't involve a complete memory dump. We'll typically print a small number of variables and check whether they have the properties we expect of them. Verifying the behavior of a section of code involves:

1. Before it runs, inspecting all values in memory that may influence its behavior.
2. Reasoning about the expected behavior of the code.
3. After it runs, inspecting all values in memory that may be modified by the code.

Reasoning about expected behavior is typically the easiest step to perform even in the case of highly complex programs. Practically speaking, it's time-consuming and mentally strenuous to write debug output into your program and to read and decipher the resulting values. It is therefore advantageous to structure your code into functions and sections that pass a relatively small amount of information between themselves, minimizing the number of values you need to inspect.


Finding the Right Question to Ask

We’ve assumed so far that we have available a test case on which our program behaves unexpectedly. Sometimes, getting to that point can be half the battle. There are a few different approaches to finding a test case on which our program fails. It is reasonable to attempt them in the following order:

1. Verify correctness on the sample inputs.
2. Test additional small cases generated by hand.
3. Adversarially construct corner cases by hand.
4. Re-read the problem to verify understanding of input constraints.
5. Design large cases by hand and write a program to construct them.
6. Write a generator to construct large random cases and a brute force oracle to verify outputs.
techtariat  dan-luu  engineering  programming  debugging  IEEE  reflection  stories  education  higher-ed  checklists  iteration-recursion  divide-and-conquer  thinking  ground-up  nitty-gritty  giants  feynman  error  input-output  structure  composition-decomposition  abstraction  systematic-ad-hoc  reduction  teaching  state  correctness  multi  oly  oly-programming  metabuch  neurons  problem-solving  wire-guided  marginal  simplification  strategy  tactics  methodology 
may 2019 by nhaliday
Delta debugging - Wikipedia
good overview of with examples: https://www.csm.ornl.gov/~sheldon/bucket/Automated-Debugging.pdf

Not as useful for my usecases (mostly contest programming) as QuickCheck. Input is generally pretty structured and I don't have a long history of code in VCS. And when I do have the latter git-bisect is probably enough.

good book tho: http://www.whyprogramsfail.com/toc.php
WHY PROGRAMS FAIL: A Guide to Systematic Debugging\
wiki  reference  programming  systems  debugging  c(pp)  python  tools  devtools  links  hmm  formal-methods  divide-and-conquer  vcs  git  search  yak-shaving  pdf  white-paper  multi  examples  stories  books  unit  caltech  recommendations  advanced  correctness 
may 2019 by nhaliday
quality - Is the average number of bugs per loc the same for different programming languages? - Software Engineering Stack Exchange
Contrary to intuition, the number of errors per 1000 lines of does seem to be relatively constant, reguardless of the specific language involved. Steve McConnell, author of Code Complete and Software Estimation: Demystifying the Black Art goes over this area in some detail.

I don't have my copies readily to hand - they're sitting on my bookshelf at work - but a quick Google found a relevant quote:

Industry Average: "about 15 - 50 errors per 1000 lines of delivered code."
(Steve) further says this is usually representative of code that has some level of structured programming behind it, but probably includes a mix of coding techniques.

Quoted from Code Complete, found here: http://mayerdan.com/ruby/2012/11/11/bugs-per-line-of-code-ratio/

If memory serves correctly, Steve goes into a thorough discussion of this, showing that the figures are constant across languages (C, C++, Java, Assembly and so on) and despite difficulties (such as defining what "line of code" means).

Most importantly he has lots of citations for his sources - he's not offering unsubstantiated opinions, but has the references to back them up.

[ed.: I think this is delivered code? So after testing, debugging, etc. I'm more interested in the metric for the moment after you've gotten something to compile.

edit: cf https://pinboard.in/u:nhaliday/b:0a6eb68166e6]
q-n-a  stackex  programming  engineering  nitty-gritty  error  flux-stasis  books  recommendations  software  checking  debugging  pro-rata  pls  comparison  parsimony  measure  data  objektbuch  speculation  accuracy  density  correctness  estimate  street-fighting  multi  quality  stylized-facts  methodology 
april 2019 by nhaliday
Peter Norvig, the meaning of polynomials, debugging as psychotherapy | Quomodocumque
He briefly showed a demo where, given values of a polynomial, a machine can put together a few lines of code that successfully computes the polynomial. But the code looks weird to a human eye. To compute some quadratic, it nests for-loops and adds things up in a funny way that ends up giving the right output. So has it really ”learned” the polynomial? I think in computer science, you typically feel you’ve learned a function if you can accurately predict its value on a given input. For an algebraist like me, a function determines but isn’t determined by the values it takes; to me, there’s something about that quadratic polynomial the machine has failed to grasp. I don’t think there’s a right or wrong answer here, just a cultural difference to be aware of. Relevant: Norvig’s description of “the two cultures” at the end of this long post on natural language processing (which is interesting all the way through!)
mathtariat  org:bleg  nibble  tech  ai  talks  summary  philosophy  lens  comparison  math  cs  tcs  polynomials  nlp  debugging  psychology  cog-psych  complex-systems  deep-learning  analogy  legibility  interpretability  composition-decomposition  coupling-cohesion  apollonian-dionysian  heavyweights 
march 2017 by nhaliday
Reproducing bugs is awful. You get an issue like “Problem with Sidebar” that vaguely describes some odd behavior. Now you must somehow reproduce it exactly. Was it the specific timing of events? Was it bad data from the server? Was it specific to a certain user? Was it a recently updated dependency? As you slog through all these possibilities, the most annoying thing is that the person who opened the bug report already had all this information! In an ideal world, you could just replay their exact session.

Elm 0.18 lets you do exactly that! In debug mode, Elm lets you import and export the exact sequence of events from a program. You get all the information necessary to reproduce the session exactly, from mouse clicks to HTTP requests.
worrydream  functional  pls  announcement  debugging  frontend  web  javascript  time  traces  sequential  roots  explanans  replication  duplication  live-coding 
november 2016 by nhaliday
natural language processing blog: Debugging machine learning
I've been thinking, mostly in the context of teaching, about how to specifically teach debugging of machine learning. Personally I find it very helpful to break things down in terms of the usual error terms: Bayes error (how much error is there in the best possible classifier), approximation error (how much do you pay for restricting to some hypothesis class), estimation error (how much do you pay because you only have finite samples), optimization error (how much do you pay because you didn't find a global optimum to your optimization problem). I've generally found that trying to isolate errors to one of these pieces, and then debugging that piece in particular (eg., pick a better optimizer versus pick a better hypothesis class) has been useful.
machine-learning  debugging  checklists  best-practices  pragmatic  expert  init  system-design  data-science  acmtariat  error  engineering  clarity  intricacy  model-selection  org:bleg  nibble  noise-structure  signal-noise  knowledge  accuracy  expert-experience  checking  grokkability-clarity  methodology 
september 2016 by nhaliday

bundles : techie

related tags

ability-competence  abstraction  academia  accuracy  acmtariat  advanced  advice  ai  algebra  algorithms  analogy  analysis  analytical-holistic  announcement  aphorism  api  apollonian-dionysian  app  apple  assembly  atoms  attention  automata-languages  automation  backup  bangbang  bare-hands  being-right  benchmarks  best-practices  big-picture  biohacking  blog  blowhards  books  bounded-cognition  browser  build-packaging  business  c(pp)  caching  calculation  calculator  caltech  career  carmack  characterization  chart  cheatsheet  checking  checklists  civilization  clarity  clever-rats  cloud  coarse-fine  cocoa  code-dive  code-organizing  cog-psych  collaboration  commentary  communication  comparison  competition  compilers  complex-systems  composition-decomposition  computer-memory  computer-vision  concept  conceptual-vocab  concurrency  config  conquest-empire  context  contrarianism  cool  coordination  core-rats  correctness  cost-benefit  counterexample  coupling-cohesion  course  cracker-prog  critique  cs  dan-luu  data  data-science  data-structures  dataviz  dbs  debate  debugging  decision-making  decision-theory  deep-learning  degrees-of-freedom  density  dependence-independence  design  desktop  detail-architecture  devops  devtools  dimensionality  diogenes  direction  discussion  distributed  distribution  divide-and-conquer  documentation  dotnet  draft  DSL  duplication  dynamic  economics  ecosystem  editors  education  efficiency  elegance  email  embodied  embodied-cognition  embodied-street-fighting  empirical  engineering  ensembles  epistemic  error  essay  estimate  evidence-based  examples  expectancy  experiment  expert  expert-experience  explanans  explanation  exposition  facebook  features  feynman  flux-stasis  form-design  formal-methods  frameworks  frontend  frontier  functional  gallic  games  generalization  giants  gibbon  git  github  gnu  golang  google  gotchas  grad-school  gradient-descent  graph-theory  graphics  graphs  greedy  grokkability  grokkability-clarity  ground-up  guide  hacker  hardware  haskell  hci  health  heavyweights  heuristic  higher-ed  history  hmm  hn  homepage  howto  huge-data-the-biggest  ide  ideas  idk  IEEE  impact  impetus  increase-decrease  info-dynamics  info-econ  init  input-output  integration-extension  interdisciplinary  interface-compatibility  internet  interpretability  interview  intricacy  investing  ios  iron-age  iteration-recursion  jargon  javascript  judgement  julia  jvm  knowledge  latency-throughput  latex  learning  lecture-notes  legacy  legibility  lens  lesswrong  let-me-see  libraries  linear-algebra  links  linux  list  live-coding  llvm  local-global  lol  long-term  machine-learning  magnitude  management  manifolds  marginal  math  math.NT  mathtariat  measure  measurement  mediterranean  memory-management  mental-math  meta-analysis  metabolic  metabuch  metal-to-virtual  methodology  metrics  microsoft  minimalism  minimum-viable  mit  mobile  model-class  model-selection  models  money  money-for-time  mooc  move-fast-(and-break-things)  multi  multiplicative  mutation  network-structure  networking  neurons  nibble  nitty-gritty  nlp  no-go  noise-structure  nootropics  nostalgia  notation  numerics  objektbuch  ocaml-sml  occam  oly  oly-programming  oop  open-closed  optimization  org:bleg  org:com  org:junk  org:med  os  oss  osx  overflow  pareto  parsimony  paying-rent  pdf  performance  perturbation  pessimism  phd  philosophy  plan9  plots  pls  plt  poast  polynomials  pragmatic  prediction  prepping  presentation  prioritizing  pro-rata  problem-solving  prof  programming  project  protocol-metadata  psychology  python  q-n-a  qra  quality  quantified-self  quantitative-qualitative  rand-approx  random  rant  rationality  ratty  realness  recommendations  reduction  reference  reflection  regularization  regularizer  replication  repo  research  retention  review  rhetoric  rigor  risk  robust  roots  rsc  rust  safety  scale  scaling-tech  sci-comp  search  security  selection  sequential  shipping  signal-noise  simplification  slides  social  software  span-cover  speculation  spock  stackex  stanford  state  static-dynamic  stories  strategy  stream  street-fighting  strings  structure  study  stylized-facts  summary  summer-2014  sv  system-design  systematic-ad-hoc  systems  tactics  tainter  talks  tcs  teaching  tech  tech-infrastructure  technical-writing  technology  techtariat  terminal  tetlock  the-classics  the-world-is-just-atoms  theory-practice  thick-thin  things  thinking  threat-modeling  time  tools  top-n  topology  traces  trade  tradeoffs  tradition  trees  trends  tricks  trivia  troll  truth  tutorial  types  ubiquity  ui  uncertainty  unit  universalism-particularism  unix  unsupervised  ux  vcs  video  virtualization  visual-understanding  visualization  volo-avolo  web  webapp  white-paper  whole-partial-many  wiki  wire-guided  within-without  workflow  working-stiff  wormholes  worrydream  worse-is-better/the-right-thing  writing  yak-shaving  zooming  🎓  🖥 

Copy this bookmark: