nhaliday + checking   54

testing - Is there a reason that tests aren't written inline with the code that they test? - Software Engineering Stack Exchange
The only advantage I can think of for inline tests would be reducing the number of files to be written. With modern IDEs this really isn't that big a deal.

There are, however, a number of obvious drawbacks to inline testing:
- It violates separation of concerns. This may be debatable, but to me testing functionality is a different responsibility than implementing it.
- You'd either have to introduce new language features to distinguish between tests/implementation, or you'd risk blurring the line between the two.
- Larger source files are harder to work with: harder to read, harder to understand, you're more likely to have to deal with source control conflicts.
- I think it would make it harder to put your "tester" hat on, so to speak. If you're looking at the implementation details, you'll be more tempted to skip implementing certain tests.
q-n-a  stackex  programming  engineering  best-practices  debate  correctness  checking  code-organizing  composition-decomposition  coupling-cohesion  psychology  cog-psych  attention  thinking  neurons  contiguity-proximity  grokkability 
21 days ago by nhaliday
Panel: Systems Programming in 2014 and Beyond | Lang.NEXT 2014 | Channel 9
- Bjarne Stroustrup, Niko Matsakis, Andrei Alexandrescu, Rob Pike
- 2014 so pretty outdated but rare to find a discussion with people like this together
- pretty sure Jonathan Blow asked a couple questions
- Rob Pike compliments Rust at one point. Also kinda softly rags on dynamic typing at one point ("unit testing is what they have instead of static types").
video  presentation  debate  programming  pls  c(pp)  systems  os  rust  d-lang  golang  computer-memory  legacy  devtools  formal-methods  concurrency  compilers  syntax  parsimony  google  intricacy  thinking  cost-benefit  degrees-of-freedom  facebook  performance  people  rsc  cracker-prog  critique  types  checking  api  flux-stasis  engineering  time  wire-guided  worse-is-better/the-right-thing  static-dynamic 
6 weeks ago by nhaliday
Integrated vs type based shrinking - Hypothesis
The big difference is whether shrinking is integrated into generation.

In Haskell’s QuickCheck, shrinking is defined based on types: Any value of a given type shrinks the same way, regardless of how it is generated. In Hypothesis, test.check, etc. instead shrinking is part of the generation, and the generator controls how the values it produces shrink (this works differently in Hypothesis and test.check, and probably differently again in EQC, but the user visible result is largely the same).

This is not a trivial distinction. Integrating shrinking into generation has two large benefits:
- Shrinking composes nicely, and you can shrink anything you can generate regardless of whether there is a defined shrinker for the type produced.
- You can _guarantee that shrinking satisfies the same invariants as generation_.
The first is mostly important from a convenience point of view: Although there are some things it lets you do that you can’t do in the type based approach, they’re mostly of secondary importance. It largely just saves you from the effort of having to write your own shrinkers.

But the second is really important, because the lack of it makes your test failures potentially extremely confusing.

...

[example: even_numbers = integers().map(lambda x: x * 2)]

...

In this example the problem was relatively obvious and so easy to work around, but as your invariants get more implicit and subtle it becomes really problematic: In Hypothesis it’s easy and convenient to generate quite complex data, and trying to recreate the invariants that are automatically satisfied with that in your tests and/or your custom shrinkers would quickly become a nightmare.

I don’t think it’s an accident that the main systems to get this right are in dynamic languages. It’s certainly not essential - the original proposal that led to the implementation for test.check was for Haskell, and Jack is an alternative property based system for Haskell that does this - but you feel the pain much more quickly in dynamic languages because the typical workaround for this problem in Haskell is to define a newtype, which lets you turn off the default shrinking for your types and possibly define your own.

But that’s a workaround for a problem that shouldn’t be there in the first place, and using it will still result in your having to encode the invariants into your shrinkers, which is more work and more brittle than just having it work automatically.

So although (as far as I know) none of the currently popular property based testing systems for statically typed languages implement this behaviour correctly, they absolutely can and they absolutely should. It will improve users’ lives significantly.

https://hypothesis.works/articles/compositional-shrinking/
In my last article about shrinking, I discussed the problems with basing shrinking on the type of the values to be shrunk.

In writing it though I forgot that there was a halfway house which is also somewhat bad (but significantly less so) that you see in a couple of implementations.

This is when the shrinking is not type based, but still follows the classic shrinking API that takes a value and returns a lazy list of shrinks of that value. Examples of libraries that do this are theft and QuickTheories.

This works reasonably well and solves the major problems with type directed shrinking, but it’s still somewhat fragile and importantly does not compose nearly as well as the approaches that Hypothesis or test.check take.

Ideally, as well as not being based on the types of the values being generated, shrinking should not be based on the actual values generated at all.

This may seem counter-intuitive, but it actually works pretty well.

...

We took a strategy and composed it with a function mapping over the values that that strategy produced to get a new strategy.

Suppose the Hypothesis strategy implementation looked something like the following:
...
i.e. we can generate a value and we can shrink a value that we’ve previously generated. By default we don’t know how to generate values (subclasses have to implement that) and we can’t shrink anything, which subclasses are able to fix if they want or leave as is if they’re fine with that.

(This is in fact how a very early implementation of it looked)

This is essentially the approach taken by theft or QuickTheories, and the problem with it is that under this implementation the ‘map’ function we used above is impossible to define in a way that preserves shrinking: In order to shrink a generated value, you need some way to invert the function you’re composing with (which is in general impossible even if your language somehow exposed the facilities to do it, which it almost certainly doesn’t) so you could take the generated value, map it back to the value that produced it, shrink that and then compose with the mapping function.

...

The key idea for fixing this is as follows: In order to shrink outputs it almost always suffices to shrink inputs. Although in theory you can get functions where simpler input leads to more complicated output, in practice this seems to be rare enough that it’s OK to just shrug and accept more complicated test output in those cases.

Given that, the _way to shrink the output of a mapped strategy is to just shrink the value generated from the first strategy and feed it to the mapping function_.

Which means that you need an API that can support that sort of shrinking.
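A minimal sketch of the idea above, using only the Python standard library. This is not Hypothesis's actual implementation or API: the `Strategy` class, `shrink_int`, and `find_minimal` are hypothetical names invented for illustration. It shows a `map` that shrinks by shrinking the underlying input and re-applying the mapping function, and contrasts it with shrinking the output integer directly as a type-based shrinker would.

```python
import random


class Strategy:
    """Toy strategy: a raw input we can draw and shrink, plus a function that builds the value."""

    def __init__(self, draw_raw, shrink_raw, build=lambda raw: raw):
        self.draw_raw = draw_raw      # rng -> raw input
        self.shrink_raw = shrink_raw  # raw input -> list of simpler raw inputs
        self.build = build            # raw input -> final value

    def example(self, rng):
        return self.build(self.draw_raw(rng))

    def map(self, f):
        # Only the build step changes; shrinking still acts on the raw input,
        # so no inverse of f is ever needed.
        inner = self.build
        return Strategy(self.draw_raw, self.shrink_raw, lambda raw: f(inner(raw)))


def shrink_int(n):
    """Simpler candidates for n: zero, half of n, and one step toward zero."""
    if n == 0:
        return []
    sign = 1 if n > 0 else -1
    candidates = [0, sign * (abs(n) // 2), n - sign]
    unique = []
    for c in candidates:
        if c != n and c not in unique:
            unique.append(c)
    return unique


def find_minimal(strategy, raw, fails):
    """Greedily shrink the raw input while the built value still fails."""
    improved = True
    while improved:
        improved = False
        for candidate in strategy.shrink_raw(raw):
            if fails(strategy.build(candidate)):
                raw = candidate
                improved = True
                break
    return strategy.build(raw)


integers = Strategy(lambda rng: rng.randint(-1000, 1000), shrink_int)
even_numbers = integers.map(lambda x: x * 2)
sample = even_numbers.example(random.Random(0))


def fails(n):
    # Stand-in for a failing test: the property "n <= 10" is violated.
    return n > 10


# Integrated: shrink the raw input 500 (which produced 1000), then re-apply the mapping.
integrated_minimum = find_minimal(even_numbers, 500, fails)
# Type-based: shrink the output 1000 as a plain integer, ignoring how it was made.
type_based_minimum = find_minimal(integers, 1000, fails)
```

In this sketch the integrated shrinker reports 12, which is still even, while the type-based one reports 11, a value the `even_numbers` generator could never have produced.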

https://hypothesis.works/articles/types-and-properties/
This happens a lot: Frequently there are properties that only hold in some restricted domain, and so you want more specific tests for that domain to complement your other tests for the larger range of data.

When this happens you need tools to generate something more specific, and those requirements don’t map naturally to types.

[ed.: Some examples of how this idea can be useful:
Have a type but want to test different distributions on it for different purposes. Eg, comparing worst-case and average-case guarantees for benchmarking time/memory complexity. Comparing a slow and fast implementation on small input sizes, then running some sanity checks for the fast implementation on large input sizes beyond what the slow implementation can handle.]
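One way the slow-versus-fast comparison in the note above might look, as a standard-library sketch. The problem (maximum subarray sum), the function names, and the size limits are all assumptions chosen for illustration. The same value type, a list of integers, is generated from two different distributions: short lists for exact comparison against the slow reference, and one long list for a cheap sanity check of the fast version alone.

```python
import random


def max_subarray_slow(xs):
    """Brute-force reference: sum every non-empty contiguous slice."""
    return max(
        sum(xs[i:j])
        for i in range(len(xs))
        for j in range(i + 1, len(xs) + 1)
    )


def max_subarray_fast(xs):
    """Linear-time version (Kadane's algorithm)."""
    best = current = xs[0]
    for x in xs[1:]:
        current = max(x, current + x)
        best = max(best, current)
    return best


def small_list(rng, max_len=8):
    # Distribution 1: short lists the slow reference can handle.
    return [rng.randint(-10, 10) for _ in range(rng.randint(1, max_len))]


def large_list(rng, n=100_000):
    # Distribution 2: far beyond what the slow reference can handle.
    return [rng.randint(-10, 10) for _ in range(n)]


rng = random.Random(0)

# Small inputs: the fast implementation must agree exactly with the slow one.
for _ in range(500):
    xs = small_list(rng)
    assert max_subarray_fast(xs) == max_subarray_slow(xs), xs

# Large input: only sanity bounds are checked, since no reference answer is available.
big = large_list(rng)
big_result = max_subarray_fast(big)
assert max(big) <= big_result <= sum(x for x in big if x > 0)
```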

...

In Haskell, traditionally we would fix this with a newtype declaration which wraps the type. We could find a newtype NonEmptyList and a newtype FiniteFloat and then say that we actually wanted a NonEmptyList[FiniteFloat] there.

...

But why should we bother? Especially if we’re only using these in one test, we’re not actually interested in these types at all, and it just adds a whole bunch of syntactic noise when you could just pass the data generators directly. Defining new types for the data you want to generate is purely a workaround for a limitation of the API.

If you were working in a dependently typed language where you could already naturally express this in the type system it might be OK (I don’t have any direct experience of working in type systems that strong), but I’m sceptical of being able to make it work well - you’re unlikely to be able to automatically derive data generators in the general case, because the needs of data generation “go in the opposite direction” from types (a type is effectively a predicate which consumes a value, where a data generator is a function that produces a value, so in order to produce a generator for a type automatically you need to basically invert the predicate). I suspect most approaches here will leave you with a bunch of sharp edges, but I would be interested to see experiments in this direction.

https://www.reddit.com/r/haskell/comments/646k3d/ann_hedgehog_property_testing/dg1485c/
techtariat  rhetoric  rant  programming  libraries  pls  types  functional  haskell  python  random  checking  design  critique  multi  composition-decomposition  api  reddit  social  commentary  system-design  arrows  lifts-projections  DSL  static-dynamic 
7 weeks ago by nhaliday
Mutation testing - Wikipedia
Mutation testing involves modifying a program in small ways.[1] Each mutated version is called a mutant and tests detect and reject mutants by causing the behavior of the original version to differ from the mutant. This is called killing the mutant. Test suites are measured by the percentage of mutants that they kill. New tests can be designed to kill additional mutants.
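A toy illustration of the kill percentage described above, using only the standard library. Real mutation testing tools rewrite source code or bytecode; here the mutation is simulated by swapping a comparison operator, and `make_is_adult`, `MUTANT_OPERATORS`, and `mutation_score` are hypothetical names for this sketch only.

```python
import operator


def make_is_adult(compare=operator.ge):
    """Build the function under test; passing a different operator yields a mutant."""
    def is_adult(age):
        return compare(age, 18)
    return is_adult


# Each entry replaces the original `>=` with another comparison.
MUTANT_OPERATORS = {
    "gt": operator.gt,
    "lt": operator.lt,
    "le": operator.le,
    "eq": operator.eq,
}


def passes(func, cases):
    return all(func(arg) == expected for arg, expected in cases)


def mutation_score(cases):
    """Return (fraction of mutants killed, names of surviving mutants)."""
    assert passes(make_is_adult(), cases), "suite must pass on the original"
    survivors = [
        name for name, op in MUTANT_OPERATORS.items()
        if passes(make_is_adult(op), cases)
    ]
    killed = len(MUTANT_OPERATORS) - len(survivors)
    return killed / len(MUTANT_OPERATORS), survivors


weak_suite = [(30, True), (5, False)]
strong_suite = weak_suite + [(18, True)]  # new test aimed at the boundary

weak_score, weak_survivors = mutation_score(weak_suite)
strong_score, strong_survivors = mutation_score(strong_suite)
```

The weak suite lets the `>` mutant survive because it never probes the boundary value; adding the boundary case kills it, which is the "new tests can be designed to kill additional mutants" step.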
wiki  reference  concept  mutation  selection  analogy  programming  checking  formal-methods  debugging  random  list  libraries  links  functional  haskell  javascript  jvm  c(pp)  python  dotnet  oop  perturbation  static-dynamic 
7 weeks ago by nhaliday
C++ Core Guidelines
This document is a set of guidelines for using C++ well. The aim of this document is to help people to use modern C++ effectively. By “modern C++” we mean effective use of the ISO C++ standard (currently C++17, but almost all of our recommendations also apply to C++14 and C++11). In other words, what would you like your code to look like in 5 years’ time, given that you can start now? In 10 years’ time?

https://isocpp.github.io/CppCoreGuidelines/
“Within C++ is a smaller, simpler, safer language struggling to get out.” – Bjarne Stroustrup

...

The guidelines are focused on relatively higher-level issues, such as interfaces, resource management, memory management, and concurrency. Such rules affect application architecture and library design. Following the rules will lead to code that is statically type safe, has no resource leaks, and catches many more programming logic errors than is common in code today. And it will run fast - you can afford to do things right.

We are less concerned with low-level issues, such as naming conventions and indentation style. However, no topic that can help a programmer is out of bounds.

Our initial set of rules emphasize safety (of various forms) and simplicity. They may very well be too strict. We expect to have to introduce more exceptions to better accommodate real-world needs. We also need more rules.

...

The rules are designed to be supported by an analysis tool. Violations of rules will be flagged with references (or links) to the relevant rule. We do not expect you to memorize all the rules before trying to write code.

contrary:
https://aras-p.info/blog/2018/12/28/Modern-C-Lamentations/
This will be a long wall of text, and kinda random! My main points are:
1. C++ compile times are important,
2. Non-optimized build performance is important,
3. Cognitive load is important. I don’t expand much on this here, but if a programming language or a library makes me feel stupid, then I’m less likely to use it or like it. C++ does that a lot :)
programming  engineering  pls  best-practices  systems  c(pp)  guide  metabuch  objektbuch  reference  cheatsheet  elegance  frontier  libraries  intricacy  advanced  advice  recommendations  big-picture  novelty  lens  philosophy  state  error  types  concurrency  memory-management  performance  abstraction  plt  compilers  expert-experience  multi  checking  devtools  flux-stasis  safety  system-design  techtariat  time  measure  dotnet  comparison  examples  build-packaging  thinking  worse-is-better/the-right-thing  cost-benefit  tradeoffs  essay  commentary  oop  correctness  computer-memory  error-handling  resources-effects 
8 weeks ago by nhaliday
Interview with Donald Knuth | InformIT
Andrew Binstock and Donald Knuth converse on the success of open source, the problem with multicore architecture, the disappointing lack of interest in literate programming, the menace of reusable code, and that urban legend about winning a programming contest with a single compilation.

Reusable vs. re-editable code: https://hal.archives-ouvertes.fr/hal-01966146/document
- Konrad Hinsen

https://www.johndcook.com/blog/2008/05/03/reusable-code-vs-re-editable-code/
I think whether code should be editable or in “an untouchable black box” depends on the number of developers involved, as well as their talent and motivation. Knuth is a highly motivated genius working in isolation. Most software is developed by large teams of programmers with varying degrees of motivation and talent. I think the further you move away from Knuth along these three axes the more important black boxes become.
nibble  interview  giants  expert-experience  programming  cs  software  contrarianism  carmack  oss  prediction  trends  linux  concurrency  desktop  comparison  checking  debugging  stories  engineering  hmm  idk  algorithms  books  debate  flux-stasis  duplication  parsimony  best-practices  writing  documentation  latex  intricacy  structure  hardware  caching  workflow  editors  composition-decomposition  coupling-cohesion  exposition  technical-writing  thinking  cracker-prog  code-organizing  grokkability  multi  techtariat  commentary  pdf  reflection  essay  examples  python  data-science  libraries 
10 weeks ago by nhaliday
Frama-C
Frama-C is organized with a plug-in architecture (comparable to that of the Gimp or Eclipse). A common kernel centralizes information and conducts the analysis. Plug-ins interact with each other through interfaces defined by the kernel. This makes for robustness in the development of Frama-C while allowing a wide functionality spectrum.

...

Three heavyweight plug-ins that are used by the other plug-ins:

- Eva (Evolved Value analysis)
This plug-in computes variation domains for variables. It is quite automatic, although the user may guide the analysis in places. It handles a wide spectrum of C constructs. This plug-in uses abstract interpretation techniques.
- Jessie and Wp, two deductive verification plug-ins
These plug-ins are based on weakest precondition computation techniques. They make it possible to prove that C functions satisfy their specification as expressed in ACSL. These proofs are modular: the specifications of the called functions are used to establish the proof without looking at their code.

For browsing unfamiliar code:
- Impact analysis
This plug-in highlights the locations in the source code that are impacted by a modification.
- Scope & Data-flow browsing
This plug-in allows the user to navigate the dataflow of the program, from definition to use or from use to definition.
- Variable occurrence browsing
Also provided as a simple example for new plug-in development, this plug-in allows the user to reach the statements where a given variable is used.
- Metrics calculation
This plug-in allows the user to compute various metrics from the source code.

For code transformation:
- Semantic constant folding
This plug-in makes use of the results of the evolved value analysis plug-in to replace, in the source code, the constant expressions by their values. Because it relies on EVA, it is able to do more of these simplifications than a syntactic analysis would.
- Slicing
This plug-in slices the code according to a user-provided criterion: it creates a copy of the program, but keeps only those parts which are necessary with respect to the given criterion.
- Spare code: remove "spare code", code that does not contribute to the final results of the program.
- E-ACSL: translate annotations into C code for runtime assertion checking.

For verifying functional specifications:
- Aoraï: verify specifications expressed as LTL (Linear Temporal Logic) formulas
Other functionalities documented together with the EVA plug-in can be considered as verifying low-level functional specifications (inputs, outputs, dependencies,…)

For test-case generation:
- PathCrawler automatically finds test-case inputs to ensure coverage of a C function. It can be used for structural unit testing, as a complement to static analysis or to study the feasible execution paths of the function.

For concurrent programs:
- Mthread
This plug-in automatically analyzes concurrent C programs, using the EVA plug-in, taking into account all possible thread interactions. At the end of its execution, the concurrent behavior of each thread is over-approximated, resulting in precise information about shared variables, which mutex protects a part of the code, etc.

Front-end for other languages:
- Frama-Clang
This plug-in provides a C++ front-end to Frama-C, based on the clang compiler. It transforms C++ code into a Frama-C AST, which can then be analyzed by the plug-ins above. Note however that it is very experimental and only supports a subset of C++11.
tools  devtools  formal-methods  programming  software  c(pp)  systems  memory-management  ocaml-sml  debugging  checking  rigor  oss  code-dive  graphs  state  metrics  llvm  gallic  cool  worrydream  impact  flux-stasis  correctness  computer-memory  structure  static-dynamic 
12 weeks ago by nhaliday
One week of bugs
If I had to guess, I'd say I probably work around hundreds of bugs in an average week, and thousands in a bad week. It's not unusual for me to run into a hundred new bugs in a single week. But I often get skepticism when I mention that I run into multiple new (to me) bugs per day, and that this is inevitable if we don't change how we write tests. Well, here's a log of one week of bugs, limited to bugs that were new to me that week. After a brief description of the bugs, I'll talk about what we can do to improve the situation. The obvious answer is to spend more effort on testing, but everyone already knows we should do that and no one does it. That doesn't mean it's hopeless, though.

...

Here's where I'm supposed to write an appeal to take testing more seriously and put real effort into it. But we all know that's not going to work. It would take 90k LOC of tests to get Julia to be as well tested as a poorly tested prototype (falsely assuming linear complexity in size). That's two person-years of work, not even including time to debug and fix bugs (which probably brings it closer to four or five years). Who's going to do that? No one. Writing tests is like writing documentation. Everyone already knows you should do it. Telling people they should do it adds zero information.

Given that people aren't going to put any effort into testing, what's the best way to do it?

Property-based testing. Generative testing. Random testing. Concolic Testing (which was done long before the term was coined). Static analysis. Fuzzing. Statistical bug finding. There are lots of options. Some of them are actually the same thing because the terminology we use is inconsistent and buggy. I'm going to arbitrarily pick one to talk about, but they're all worth looking into.

...

There are a lot of great resources out there, but if you're just getting started, I found this description of types of fuzzers to be one of the most helpful (and simplest) things I've read.

John Regehr has a udacity course on software testing. I haven't worked through it yet (Pablo Torres just pointed to it), but given the quality of Dr. Regehr's writing, I expect the course to be good.

For more on my perspective on testing, there's this.

https://hypothesis.works/articles/the-purpose-of-hypothesis/
From the perspective of a user, the purpose of Hypothesis is to make it easier for you to write better tests.

From my perspective as the primary author, that is of course also a purpose of Hypothesis. I write a lot of code, it needs testing, and the idea of trying to do that without Hypothesis has become nearly unthinkable.

But, on a large scale, the true purpose of Hypothesis is to drag the world kicking and screaming into a new and terrifying age of high quality software.

Software is everywhere. We have built a civilization on it, and it’s only getting more prevalent as more services move online and embedded and “internet of things” devices become cheaper and more common.

Software is also terrible. It’s buggy, it’s insecure, and it’s rarely well thought out.

This combination is clearly a recipe for disaster.

The state of software testing is even worse. It’s uncontroversial at this point that you should be testing your code, but it’s a rare codebase whose authors could honestly claim that they feel its testing is sufficient.

Much of the problem here is that it’s too hard to write good tests. Tests take up a vast quantity of development time, but they mostly just laboriously encode exactly the same assumptions and fallacies that the authors had when they wrote the code, so they miss exactly the same bugs that the authors missed when they wrote the code.

Preventing the Collapse of Civilization [video]: https://news.ycombinator.com/item?id=19945452
- Jonathan Blow

NB: DevGAMM is a game industry conference

- loss of technological knowledge (Antikythera mechanism, aqueducts, etc.)
- hardware driving most gains, not software
- software's actually less robust, often poorly designed and overengineered these days
- *list of bugs he's encountered recently*:
https://youtu.be/pW-SOdj4Kkk?t=1387
- knowledge of trivia becomes more valued than general, deep knowledge
- does at least acknowledge value of DRY, reusing code, abstraction saving dev time
techtariat  dan-luu  tech  software  error  list  debugging  linux  github  robust  checking  oss  troll  lol  aphorism  webapp  email  google  facebook  games  julia  pls  compilers  communication  mooc  browser  rust  programming  engineering  random  jargon  formal-methods  expert-experience  prof  c(pp)  course  correctness  hn  commentary  video  presentation  carmack  pragmatic  contrarianism  pessimism  sv  unix  rhetoric  critique  worrydream  hardware  performance  trends  multiplicative  roots  impact  comparison  history  iron-age  the-classics  mediterranean  conquest-empire  gibbon  technology  the-world-is-just-atoms  flux-stasis  increase-decrease  graphics  hmm  idk  systems  os  abstraction  intricacy  worse-is-better/the-right-thing  build-packaging  microsoft  osx  apple  reflection  assembly  things  knowledge  detail-architecture  thick-thin  trivia  info-dynamics  caching  frameworks  generalization  systematic-ad-hoc  universalism-particularism  analytical-holistic  structure  tainter  libraries  tradeoffs  prepping  threat-modeling  network-structure  writing  risk  local-glob 
may 2019 by nhaliday
Continuous Code Quality | SonarSource
they have cyclomatic complexity rule
$150/year for dev edition (needed for C++ but not Java/Python)
devtools  software  ruby  saas  programming  python  checking  c(pp)  jvm  structure  intricacy  graphs  golang  scala  metrics  javascript  dotnet  quality  static-dynamic 
may 2019 by nhaliday
Why is Software Engineering so difficult? - James Miller
basic message: No silver bullet!

most interesting nuggets:
Scale and Complexity
- Windows 7 > 50 million LOC
Expect a staggering number of bugs.

Bugs?
- Well-written C and C++ code contains some 5 to 10 errors per 100 LOC after a clean compile, but before inspection and testing.
- At a 5% rate any 50 MLOC program will start off with some 2.5 million bugs.
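The arithmetic behind the 2.5 million figure, as a short sketch; `initial_defects` is a hypothetical helper name, and the 10-per-100 case simply applies the upper end of the quoted 5 to 10 range to the same program size.

```python
def initial_defects(lines_of_code, errors_per_100_loc):
    """Expected defects after a clean compile, before inspection and testing."""
    return lines_of_code * errors_per_100_loc // 100


PROGRAM_SIZE = 50_000_000  # 50 MLOC, the size quoted above

low_estimate = initial_defects(PROGRAM_SIZE, 5)    # 5 errors per 100 LOC
high_estimate = initial_defects(PROGRAM_SIZE, 10)  # 10 errors per 100 LOC
```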

Bug removal
- Testing typically exercises only half the code.

Better bug removal?
- “There are better ways to do testing that do produce fantastic programs.”
- Are we sure about this fact?
* No, it's only an opinion!
* In general Software Engineering has ....
NO FACTS!

So why not do this?
- The costs are unbelievable.
- It’s not unusual for the qualification process to produce a half page of documentation for each line of code.
pdf  slides  engineering  nitty-gritty  programming  best-practices  roots  comparison  cost-benefit  software  systematic-ad-hoc  structure  error  frontier  debugging  checking  formal-methods  context  detail-architecture  intricacy  big-picture  system-design  correctness  scale  scaling-tech  shipping  money  data  stylized-facts  street-fighting  objektbuch  pro-rata  estimate  pessimism  degrees-of-freedom  volo-avolo  no-go  things  thinking  summary  quality  density 
may 2019 by nhaliday
AFL + QuickCheck = ?
Adventures in fuzzing. Also differences between testing culture in software and hardware.
techtariat  dan-luu  programming  engineering  checking  random  haskell  path-dependence  span-cover  heuristic  libraries  links  tools  devtools  software  hardware  culture  formal-methods  local-global  golang  correctness 
may 2019 by nhaliday
quality - Is the average number of bugs per loc the same for different programming languages? - Software Engineering Stack Exchange
Contrary to intuition, the number of errors per 1000 lines of code does seem to be relatively constant, regardless of the specific language involved. Steve McConnell, author of Code Complete and Software Estimation: Demystifying the Black Art goes over this area in some detail.

I don't have my copies readily to hand - they're sitting on my bookshelf at work - but a quick Google found a relevant quote:

Industry Average: "about 15 - 50 errors per 1000 lines of delivered code."
(Steve) further says this is usually representative of code that has some level of structured programming behind it, but probably includes a mix of coding techniques.

Quoted from Code Complete, found here: http://mayerdan.com/ruby/2012/11/11/bugs-per-line-of-code-ratio/

If memory serves correctly, Steve goes into a thorough discussion of this, showing that the figures are constant across languages (C, C++, Java, Assembly and so on) and despite difficulties (such as defining what "line of code" means).

Most importantly he has lots of citations for his sources - he's not offering unsubstantiated opinions, but has the references to back them up.

[ed.: I think this is delivered code? So after testing, debugging, etc. I'm more interested in the metric for the moment after you've gotten something to compile.

edit: cf https://pinboard.in/u:nhaliday/b:0a6eb68166e6]
q-n-a  stackex  programming  engineering  nitty-gritty  error  flux-stasis  books  recommendations  software  checking  debugging  pro-rata  pls  comparison  parsimony  measure  data  objektbuch  speculation  accuracy  density  correctness  estimate  street-fighting  multi  quality  stylized-facts 
april 2019 by nhaliday
Book review: "Working Effectively with Legacy Code" by Michael C. Feathers - Eli Bendersky's website
The basic premise of the book is simple, and can be summarized as follows:

To improve some piece of code, we must be able to refactor it.
To be able to refactor code, we must have tests that prove our refactoring didn't break anything.
To have reasonable tests, the code has to be testable; that is, it should be in a form amenable to test harnessing. This most often means breaking implicit dependencies.
... and the author spends about 400 pages on how to achieve that. This book is dense, and it took me a long time to plow through it. I started reading linearly, but very soon discovered this approach doesn't work. So I began hopping forward and backward between the main text and the "dependency-breaking techniques" chapter which holds isolated recipes for dealing with specific kinds of dependencies. There's quite a bit of repetition in the book, which makes it even more tedious to read.
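As a rough illustration of what "breaking implicit dependencies" in the premise above can mean, here is a small Python sketch. It is not taken from the book: the greeting functions and `fake_clock` are hypothetical, and the technique shown is simply turning a hidden dependency (the system clock) into a parameter so a test can substitute its own.

```python
import datetime


def greeting_legacy():
    # Implicit dependency: the result depends on the real clock, so a test
    # cannot control which branch runs.
    hour = datetime.datetime.now().hour
    return "Good morning" if hour < 12 else "Good afternoon"


def greeting(now=datetime.datetime.now):
    # Same logic, but the clock is passed in. Production callers use the
    # default; tests pass a fake.
    hour = now().hour
    return "Good morning" if hour < 12 else "Good afternoon"


def fake_clock(hour):
    """Return a zero-argument callable that always reports the given hour."""
    return lambda: datetime.datetime(2020, 1, 1, hour, 0)


morning = greeting(fake_clock(9))
afternoon = greeting(fake_clock(15))
```

Existing callers of `greeting()` keep working unchanged because the real clock remains the default, which matters when retrofitting tests onto code that is already in use.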

The techniques described by the author are as terrible as the code they're up against. Horrible abuses of the preprocessor in C/C++, abuses of inheritance in C++ and Java, and so on. Particularly the latter is quite sobering. If you love OOP beware - this book may leave you disenchanted, if not full of hate.

To reiterate the conclusion I already presented earlier - get this book if you have to work with old balls of mud; it will be effort well spent. Otherwise, if you're working on one of those new-age continuously integrated codebases with a 2/1 test to code ratio, feel free to skip it.
techtariat  books  review  summary  critique  engineering  programming  intricacy  code-dive  best-practices  checklists  checking  working-stiff  retrofit  oop  code-organizing 
july 2017 by nhaliday
Validation is a Galilean enterprise
We contend that Frey's analyses actually have little bearing on the external validity of the PGG. Evidence from recent experiments using modified versions of the PGG and stringent comprehension checks indicate that individual differences in people's tendencies to contribute to the public good are better explained by individual differences in participants' comprehension of the game's payoff structure than by individual differences in cooperativeness (Burton-Chellew, El Mouden, & West, 2016). For example, only free riders reliably understand right away that complete defection maximizes one's own payoff, regardless of how much other participants contribute. This difference in comprehension alone explains the so-called free riders' low PGG contributions. These recent results also provide a new interpretation of why conditional cooperators often contribute generously in early rounds, and then less in later rounds (Fischbacher et al., 2001). Fischbacher et al. (2001) attribute the relatively high contributions in the early rounds to cooperativeness and the subsequent decline in contributions to conditional cooperators' frustration with free riders. In reality, the decline in cooperation observed over the course of PGGs occurs because so-called conditional cooperators initially believe that their payoff-maximizing decision depends on whether others contribute, but eventually learn that contributing never benefits the contributor (Burton-Chellew, Nax, & West, 2015). Because contributions in the PGG do not actually reflect cooperativeness, there is no real-world cooperative setting to which inferences about contributions in the PGG can generalize.
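The claim that complete defection maximizes one's own payoff regardless of others' contributions follows from the payoff structure of the standard linear public goods game. The sketch below assumes that textbook form; the group size of 4, endowment of 20, and multiplier of 1.6 are illustrative assumptions, not parameters taken from the studies cited above.

```python
def payoff(own, others, endowment=20, multiplier=1.6):
    """Payoff in a linear public goods game.

    Each player keeps whatever they do not contribute; all contributions are
    multiplied and the pot is split equally among all players.
    """
    n = len(others) + 1
    pot = multiplier * (own + sum(others))
    return endowment - own + pot / n


# With multiplier < group size, each unit contributed returns only
# multiplier / n (here 0.4) to the contributor, so contributing always costs.
for others in ([0, 0, 0], [10, 10, 10], [20, 20, 20]):
    best = max(range(0, 21), key=lambda own: payoff(own, others))
    assert best == 0

all_cooperate = payoff(20, [20, 20, 20])  # everyone contributes everything
free_ride = payoff(0, [20, 20, 20])       # defect while the others contribute
```

Everyone contributing fully still leaves each player better off than everyone defecting, which is what makes the game a social dilemma even though the individually payoff-maximizing contribution is always zero.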
study  behavioral-econ  economics  psychology  social-psych  coordination  cooperate-defect  piracy  altruism  bounded-cognition  error  lol  pdf  map-territory  GT-101  realness  free-riding  public-goodish  decision-making  microfoundations  descriptive  values  interests  generalization  measurement  checking 
june 2017 by nhaliday
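The payoff claim in the excerpt above (complete defection maximizes one's own payoff whatever the others contribute) can be checked with a minimal sketch of a standard linear public goods game. The endowment of 20, multiplier of 1.6, and group size of 4 are illustrative assumptions, not the parameters of the cited experiments, and the function names are hypothetical.

```python
# Minimal sketch of a linear public goods game (PGG).
# ASSUMPTIONS: endowment=20, multiplier=1.6, n_players=4 are illustrative
# values, not taken from the cited studies.

def payoff(own, others_total, endowment=20, multiplier=1.6, n_players=4):
    """Payoff to one player: what they keep plus an equal share of the pot."""
    pot = multiplier * (own + others_total)
    return endowment - own + pot / n_players


def best_contribution(others_total, endowment=20, multiplier=1.6, n_players=4):
    """Whole-unit contribution that maximizes the player's own payoff."""
    return max(
        range(endowment + 1),
        key=lambda c: payoff(c, others_total, endowment, multiplier, n_players),
    )


if __name__ == "__main__":
    # Whatever the other three players put in, contributing nothing is best,
    # because each unit contributed returns only multiplier / n_players < 1.
    for others_total in (0, 30, 60):
        print(others_total, best_contribution(others_total))
    # Yet universal contribution beats universal defection for everyone.
    print(payoff(20, 60) > payoff(0, 0))
```

Under these assumptions each contributed unit returns 0.4 units to the contributor, so the best response does not depend on the others' contributions, while the group as a whole still does better when everyone contributes.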
natural language processing blog: Debugging machine learning
I've been thinking, mostly in the context of teaching, about how to specifically teach debugging of machine learning. Personally I find it very helpful to break things down in terms of the usual error terms: Bayes error (how much error is there in the best possible classifier), approximation error (how much do you pay for restricting to some hypothesis class), estimation error (how much do you pay because you only have finite samples), optimization error (how much do you pay because you didn't find a global optimum to your optimization problem). I've generally found that trying to isolate errors to one of these pieces, and then debugging that piece in particular (e.g., pick a better optimizer versus pick a better hypothesis class) has been useful.
machine-learning  debugging  checklists  best-practices  pragmatic  expert  init  system-design  data-science  acmtariat  error  engineering  clarity  intricacy  model-selection  org:bleg  nibble  noise-structure  signal-noise  knowledge  accuracy  expert-experience  checking 
september 2016 by nhaliday
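The four error terms in the excerpt above can be written as one telescoping sum. The notation here is an assumption introduced for illustration, not the blog post's own: R(f) is the expected risk of a predictor f, R^* is the Bayes risk, f_F is the best predictor in the hypothesis class F, f_n is the empirical risk minimizer over F on the finite sample, and \hat{f} is what the optimizer actually returned.

```latex
R(\hat{f})
  = \underbrace{R^{*}}_{\text{Bayes error}}
  + \underbrace{R(f_{F}) - R^{*}}_{\text{approximation error}}
  + \underbrace{R(f_{n}) - R(f_{F})}_{\text{estimation error}}
  + \underbrace{R(\hat{f}) - R(f_{n})}_{\text{optimization error}}
```

The intermediate terms cancel in pairs, so the identity holds for any choice of F and any optimizer; debugging in this framing amounts to estimating which of the bracketed differences dominates.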
Rumor has it : snopes.com
https://www.nytimes.com/2017/07/24/business/media/snopes-crowdfunding-proper-media.html
Proper Media and its lawyers tell a starkly different story. They say that Snopes employees will continue to be paid from the advertising revenue, and that Mr. Mikkelson should be removed from the company because of wasteful spending.

The two sides, which have sued each other in separate claims, present entirely conflicting descriptions of who owns the company and what is being withheld from whom. The earliest chance for resolution appears to be a court hearing scheduled for next week.
politics  news  tools  wiki  media  organization  letters  drama  sleuthin  current-events  multi  org:rec  law  checking 
july 2016 by nhaliday
Modafinil - Gwern.net
I would describe its advantages over other common stimulants as: more powerful and less addictive & tolerating than caffeine or khat; much longer-lasting than nicotine; less likely to alter mood or produce ‘tweaking’ behavior than Adderall or Vyvanse; and much more legal & with almost no side-effects compared to methamphetamine or cocaine.
nootropics  drugs  analysis  longform  comparison  data  gwern  ratty  faq  guide  chart  cost-benefit  money  internet  brands  list  top-n  ranking  checking  chemistry 
may 2016 by nhaliday

