nhaliday + data-science   251

Ask HN: Getting into NLP in 2018? | Hacker News
syllogism (spaCy author):
I think it's probably a bad strategy to try to be the "NLP guy" to potential employers. You'd be much better off being a software engineer on a project with people with ML or NLP expertise.

NLP projects fail a lot. If you line up a job as a company's first NLP person, you'll probably be setting yourself up for failure. You'll get handed an idea that can't work, you won't know enough about how to push back to change it into something that might, etc. After the project fails, you might get a chance to fail at a second one, but maybe not a third. This isn't a great way to move into any new field.

I think a cunning plan would be to angle to be the person who "productionises" models.
...
--
...

Basically, don't just work on having more powerful solutions. Make sure you've tried hard to have easier problems as well --- that part tends to be higher leverage.

https://news.ycombinator.com/item?id=14008752
https://news.ycombinator.com/item?id=12916498
https://algorithmia.com/blog/introduction-natural-language-processing-nlp
hn  q-n-a  discussion  tech  programming  machine-learning  nlp  strategy  career  planning  human-capital  init  advice  books  recommendations  course  unit  links  automation  project  examples  applications  multi  mooc  lectures  video  data-science  org:com  roadmap  summary  error  applicability-prereqs  ends-means  telos-atelos  cost-benefit 
3 days ago by nhaliday
Ask HN: What's a promising area to work on? | Hacker News
hn  discussion  q-n-a  ideas  impact  trends  the-bones  speedometer  technology  applications  tech  cs  programming  list  top-n  recommendations  lens  machine-learning  deep-learning  security  privacy  crypto  software  hardware  cloud  biotech  CRISPR  bioinformatics  biohacking  blockchain  cryptocurrency  crypto-anarchy  healthcare  graphics  SIGGRAPH  vr  automation  universalism-particularism  expert-experience  reddit  social  arbitrage  supply-demand  ubiquity  cost-benefit  compensation  chart  career  planning  strategy  long-term  advice  sub-super  commentary  rhetoric  org:com  techtariat  human-capital  prioritizing  tech-infrastructure  working-stiff  data-science 
3 days ago by nhaliday
How good are decisions?
A statement I commonly hear in tech-utopian circles is that some seeming inefficiency can’t actually be inefficient because the market is efficient and inefficiencies will quickly be eliminated. A contentious example of this is the claim that companies can’t be discriminating because the market is too competitive to tolerate discrimination. A less contentious example is that when you see a big company doing something that seems bizarrely inefficient, maybe it’s not inefficient and you just lack the information necessary to understand why the decision was efficient.

Unfortunately, arguments like this are difficult to settle because, even in retrospect, it’s usually not possible to get enough information to determine the precise “value” of a decision. Even in cases where the decision led to an unambiguous success or failure, there are so many factors that led to the result that it’s difficult to figure out precisely why something happened.

One nice thing about sports is that they often have detailed play-by-play data and well-defined win criteria, which lets us tell, on average, what the expected value of a decision is. In this post, we’ll look at the cost of bad decision making in one sport and then briefly discuss why decision quality in sports might be the same as or better than decision quality in other fields.

Just to have a concrete example, we’re going to look at baseball, but you could do the same kind of analysis for football, hockey, basketball, etc., and my understanding is that you’d get a roughly similar result in all of those cases.

We’re going to model baseball as a state machine, both because that makes it easy to understand the expected value of particular decisions and because this lets us talk about the value of decisions without having to go over most of the rules of baseball.
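
a toy version of that framing (my own sketch, with made-up run-expectancy numbers, not Dan Luu's data): states are (runners, outs) pairs, each state carries an expected-runs value, and a decision is scored by the change in expected runs:

RUN_EXP = {                      # illustrative values only, not real play-by-play estimates
    ("empty", 0): 0.50, ("1st", 0): 0.85, ("2nd", 0): 1.10,
    ("empty", 1): 0.27, ("1st", 1): 0.50, ("2nd", 1): 0.66,
    ("empty", 2): 0.10, ("1st", 2): 0.22, ("2nd", 2): 0.32,
}

def decision_value(before, after, runs_scored=0):
    """Change in expected runs from moving between states (plus any runs scored)."""
    return RUN_EXP[after] + runs_scored - RUN_EXP[before]

# a "successful" sacrifice bunt: runner on 1st, 0 outs -> runner on 2nd, 1 out
print(decision_value(("1st", 0), ("2nd", 1)))   # negative => the play loses expected runs

with a real run-expectancy table estimated from play-by-play data, the same lookup gives the expected cost of any in-game decision.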

exactly the kinda thing Dad likes
techtariat  dan-luu  data  analysis  examples  nitty-gritty  sports  street-fighting  automata-languages  models  optimization  arbitrage  data-science  cost-benefit  tactics  baseball  low-hanging 
august 2019 by nhaliday
Three best practices for building successful data pipelines - O'Reilly Media
Drawn from their experiences and my own, I’ve identified three key areas that are often overlooked in data pipelines, and those are making your analysis:
1. Reproducible
2. Consistent
3. Productionizable

...

Science that cannot be reproduced by an external third party is just not science — and this does apply to data science. One of the benefits of working in data science is the ability to apply the existing tools from software engineering. These tools let you isolate all the dependencies of your analyses and make them reproducible.

Dependencies fall into three categories:
1. Analysis code ...
2. Data sources ...
3. Algorithmic randomness ...
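
a minimal sketch (mine, not the article's code) of pinning the third category, algorithmic randomness: set every seed explicitly and record it next to the results so a re-run reproduces the same splits and initializations:

import json
import random
import numpy as np

SEEDS = {"python": 0, "numpy": 0}

def set_seeds(seeds=SEEDS):
    random.seed(seeds["python"])
    np.random.seed(seeds["numpy"])

set_seeds()
test_idx = np.random.permutation(100)[:20]      # reproducible "test set" indices
with open("run_seeds.json", "w") as f:          # record the seeds alongside the outputs
    json.dump(SEEDS, f)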

...

Establishing consistency in data
...

There are generally two ways of establishing the consistency of data sources. The first is by checking in all code and data into a single revision control repository. The second method is to reserve source control for code and build a pipeline that explicitly depends on external data being in a stable, consistent format and location.

Checking data into version control is generally considered verboten for production software engineers, but it has a place in data analysis. For one thing, it makes your analysis very portable by isolating all dependencies into source control. Here are some conditions under which it makes sense to have both code and data in source control:
Small data sets ...
Regular analytics ...
Fixed source ...
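
one way to implement the second approach (my sketch, not the article's code): keep code in git, keep data external, and pin each data file's checksum in a small manifest that is itself checked into source control:

import hashlib
import json
from pathlib import Path

def sha256(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def verify(manifest_path="data_manifest.json"):
    # manifest maps e.g. "data/raw/events.csv" -> its pinned sha256 hash
    manifest = json.loads(Path(manifest_path).read_text())
    for path, expected in manifest.items():
        actual = sha256(path)
        assert actual == expected, f"{path} drifted: {actual} != {expected}"

# calling verify() at the start of the pipeline fails fast if upstream data changed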

Productionizability: Developing a common ETL
...

1. Common data format ...
2. Isolating library dependencies ...

https://blog.koresoftware.com/blog/etl-principles
Rigorously enforce the idempotency constraint
For efficiency, seek to load data incrementally
Always ensure that you can efficiently process historic data
Partition ingested data at the destination
Rest data between tasks
Pool resources for efficiency
Store all metadata together in one place
Manage login details in one place
Specify configuration details once
Parameterize sub flows and dynamically run tasks where possible
Execute conditionally
Develop your own workflow framework and reuse workflow components
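
a toy sketch of the first two principles above (idempotency + incremental loads), not from the post: re-running the load for the same partition replaces it instead of duplicating rows:

import sqlite3

def load_partition(conn, day, rows):
    """Idempotent load of one day's partition: delete-then-insert in one transaction."""
    with conn:
        conn.execute("DELETE FROM events WHERE day = ?", (day,))
        conn.executemany(
            "INSERT INTO events (day, user, value) VALUES (?, ?, ?)",
            [(day, u, v) for u, v in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (day TEXT, user TEXT, value REAL)")
load_partition(conn, "2019-08-01", [("alice", 1.0), ("bob", 2.0)])
load_partition(conn, "2019-08-01", [("alice", 1.0), ("bob", 2.0)])   # re-run: no duplicates
print(conn.execute("SELECT COUNT(*) FROM events").fetchone())        # (2,)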

more focused on details of specific technologies:
https://medium.com/@rchang/a-beginners-guide-to-data-engineering-part-i-4227c5c457d7

https://www.cloudera.com/documentation/director/cloud/topics/cloud_de_best_practices.html
techtariat  org:com  best-practices  engineering  code-organizing  machine-learning  data-science  yak-shaving  nitty-gritty  workflow  config  vcs  replication  homo-hetero  multi  org:med  design  system-design  links  shipping  minimalism  volo-avolo  causation  random  invariance  structure  arrows  protocol-metadata  interface-compatibility 
august 2019 by nhaliday
Interview with Donald Knuth | Interview with Donald Knuth | InformIT
Andrew Binstock and Donald Knuth converse on the success of open source, the problem with multicore architecture, the disappointing lack of interest in literate programming, the menace of reusable code, and that urban legend about winning a programming contest with a single compilation.

Reusable vs. re-editable code: https://hal.archives-ouvertes.fr/hal-01966146/document
- Konrad Hinsen

https://www.johndcook.com/blog/2008/05/03/reusable-code-vs-re-editable-code/
I think whether code should be editable or in “an untouchable black box” depends on the number of developers involved, as well as their talent and motivation. Knuth is a highly motivated genius working in isolation. Most software is developed by large teams of programmers with varying degrees of motivation and talent. I think the further you move away from Knuth along these three axes the more important black boxes become.
nibble  interview  giants  expert-experience  programming  cs  software  contrarianism  carmack  oss  prediction  trends  linux  concurrency  desktop  comparison  checking  debugging  stories  engineering  hmm  idk  algorithms  books  debate  flux-stasis  duplication  parsimony  best-practices  writing  documentation  latex  intricacy  structure  hardware  caching  workflow  editors  composition-decomposition  coupling-cohesion  exposition  technical-writing  thinking  cracker-prog  code-organizing  grokkability  multi  techtariat  commentary  pdf  reflection  essay  examples  python  data-science  libraries  grokkability-clarity 
june 2019 by nhaliday
Should I go for TensorFlow or PyTorch?
Honestly, most experts that I know love Pytorch and detest TensorFlow. Karpathy and Justin from Stanford for example. You can see Karpathy's thoughts and I've asked Justin personally and the answer was sharp: PYTORCH!!! TF has lots of PR but its API and graph model are horrible and will waste lots of your research time.

--

...

Updated Mar 12
Update after 2019 TF summit:

TL/DR: previously I was in the pytorch camp but with TF 2.0 it’s clear that Google is really going to try to have parity or try to be better than Pytorch in all aspects where people voiced concerns (ease of use/debugging/dynamic graphs). They seem to be allocating more resources to development than Facebook so the longer term currently looks promising for Google. Prior to TF 2.0 I thought that the Pytorch team had more momentum. One area where FB/Pytorch is still stronger is openness: Google is a bit more closed and doesn’t seem to release reproducible cutting edge models such as AlphaGo whereas FAIR released OpenGo for instance. Generally you will end up running into models that are only implemented in one framework or the other so chances are you might end up learning both.
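
the "dynamic graph / ease of debugging" point in miniature (my own sketch; assumes torch is installed, and TF 2.x eager mode now allows a very similar workflow): model code is just Python, so you can print or breakpoint intermediate tensors as you go:

import torch

x = torch.randn(4, 3)
w = torch.randn(3, 1, requires_grad=True)

h = x @ w                      # ordinary Python line: set a breakpoint or print here
print(h.shape, h.mean().item())

loss = (h ** 2).mean()
loss.backward()                # gradients available immediately, no session/graph compile step
print(w.grad)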
q-n-a  qra  comparison  software  recommendations  cost-benefit  tradeoffs  python  libraries  machine-learning  deep-learning  data-science  sci-comp  tools  google  facebook  tech  competition  best-practices  trends  debugging  expert-experience  ecosystem  theory-practice  pragmatic  wire-guided  static-dynamic  state  academia  frameworks  open-closed 
may 2019 by nhaliday
python - Does pandas iterrows have performance issues? - Stack Overflow
Generally, iterrows should only be used in very very specific cases. This is the general order of precedence for performance of various operations:

1) vectorization
2) using a custom cython routine
3) apply
a) reductions that can be performed in cython
b) iteration in python space
4) itertuples
5) iterrows
6) updating an empty frame (e.g. using loc one-row-at-a-time)
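
a tiny benchmark of that ordering on a simple column sum (my own sketch; exact numbers depend on machine and pandas version):

import time
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10_000, 2), columns=["a", "b"])

def bench(label, fn):
    t0 = time.perf_counter()
    fn()
    print(f"{label:>10}: {time.perf_counter() - t0:.4f}s")

bench("vectorized", lambda: df["a"] + df["b"])                            # 1)
bench("apply", lambda: df.apply(lambda r: r["a"] + r["b"], axis=1))       # 3)
bench("itertuples", lambda: [r.a + r.b for r in df.itertuples()])         # 4)
bench("iterrows", lambda: [r["a"] + r["b"] for _, r in df.iterrows()])    # 5)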
q-n-a  stackex  programming  python  libraries  gotchas  data-science  sci-comp  performance  checklists  objektbuch  best-practices  DSL  frameworks 
may 2019 by nhaliday
Burrito: Rethinking the Electronic Lab Notebook
Seems very well-suited for ML experiments (if you can get it to work); also the nilfs aspect is cool and basically implements exactly one of my project ideas (mini-VCS for competitive programming). Unfortunately, the gnarly installation instructions specify running it in a Linux VM: https://github.com/pgbovine/burrito/blob/master/INSTALL. Linux is a hard requirement due to nilfs.
techtariat  project  tools  devtools  linux  programming  yak-shaving  integration-extension  nitty-gritty  workflow  exocortex  scholar  software  python  app  desktop  notetaking  state  machine-learning  data-science  nibble  sci-comp  oly  vcs  multi  repo  paste  homepage  research 
may 2019 by nhaliday
Workshop Abstract | Identifying and Understanding Deep Learning Phenomena
ICML 2019 workshop, June 15th 2019, Long Beach, CA

We solicit contributions that view the behavior of deep nets as natural phenomena, to be investigated with methods inspired from the natural sciences like physics, astronomy, and biology.
unit  workshop  acm  machine-learning  science  empirical  nitty-gritty  atoms  deep-learning  model-class  icml  data-science  rigor  replication  examples  ben-recht  physics 
april 2019 by nhaliday
Stack Overflow Developer Survey 2018
Rust, Python, Go in top most loved
F#/OCaml highest-paying globally; Erlang/Scala/OCaml highest-paying in the US (F# still in top 10)
ML specialists high-paid
editor usage: VSCode > VS > Sublime > Vim > Intellij >> Emacs
ranking  list  top-n  time-series  data  database  programming  engineering  pls  trends  stackex  poll  career  exploratory  network-structure  ubiquity  ocaml-sml  rust  golang  python  dotnet  money  jobs  compensation  erlang  scala  jvm  ai  ai-control  risk  futurism  ethical-algorithms  data-science  machine-learning  editors  devtools  tools  pro-rata  org:com  software  analysis  article  human-capital  let-me-see  expert-experience  complement-substitute 
december 2018 by nhaliday
The Gelman View – spottedtoad
I have read Andrew Gelman’s blog for about five years, and gradually, I’ve decided that among his many blog posts and hundreds of academic articles, he is advancing a philosophy not just of statistics but of quantitative social science in general. I am not a statistician myself, but here is how I would articulate the Gelman View:

A. Purposes

1. The purpose of social statistics is to describe and understand variation in the world. The world is a complicated place, and we shouldn’t expect things to be simple.
2. The purpose of scientific publication is to allow for communication, dialogue, and critique, not to “certify” a specific finding as absolute truth.
3. The incentive structure of science needs to reward attempts to independently investigate, reproduce, and refute existing claims and observed patterns, not just to advance new hypotheses or support a particular research agenda.

B. Approach

1. Because the world is complicated, the most valuable statistical models for the world will generally be complicated. The result of statistical investigations will only rarely be to give a stamp of truth on a specific effect or causal claim, but will generally show variation in effects and outcomes.
2. Whenever possible, the data, analytic approach, and methods should be made as transparent and replicable as possible, and should be fair game for anyone to examine, critique, or amend.
3. Social scientists should look to build upon a broad shared body of knowledge, not to “own” a particular intervention, theoretic framework, or technique. Such ownership creates incentive problems when the intervention, framework, or technique fail and the scientist is left trying to support a flawed structure.

Components

1. Measurement. How and what we measure is the first question, well before we decide on what the effects are or what is making that measurement change.
2. Sampling. Who we talk to or collect information from always matters, because we should always expect effects to depend on context.
3. Inference. While models should usually be complex, our inferential framework should be simple enough for anyone to follow along. And no p values.

He might disagree with all of this, or how it reflects his understanding of his own work. But I think it is a valuable guide to empirical work.
ratty  unaffiliated  summary  gelman  scitariat  philosophy  lens  stats  hypothesis-testing  science  meta:science  social-science  institutions  truth  is-ought  best-practices  data-science  info-dynamics  alt-inst  academia  empirical  evidence-based  checklists  strategy  epistemic 
november 2017 by nhaliday
self study - Looking for a good and complete probability and statistics book - Cross Validated
I never had the opportunity to take a stats course from a math faculty. I am looking for a probability theory and statistics book that is complete and self-sufficient. By complete I mean that it contains all the proofs and not just states results.
nibble  q-n-a  overflow  data-science  stats  methodology  books  recommendations  list  top-n  confluence  proofs  rigor  reference  accretion 
october 2017 by nhaliday
The Downside of Baseball’s Data Revolution—Long Games, Less Action - WSJ
After years of ‘Moneyball’-style quantitative analysis, major-league teams are setting records for inactivity
news  org:rec  trends  sports  data-science  unintended-consequences  quantitative-qualitative  modernity  time  baseball  measure 
october 2017 by nhaliday
Atrocity statistics from the Roman Era
Christian Martyrs [make link]
Gibbon, Decline & Fall v.2 ch.XVI: < 2,000 k. under Roman persecution.
Ludwig Hertling ("Die Zahl der Märtyrer bis 313", 1944) estimated 100,000 Christians killed between 30 and 313 CE. (cited -- unfavorably -- by David Henige, Numbers From Nowhere, 1998)
Catholic Encyclopedia, "Martyr": number of Christian martyrs under the Romans unknown, unknowable. Origen says not many. Eusebius says thousands.

...

General population decline during The Fall of Rome: 7,000,000 [make link]
- Colin McEvedy, The New Penguin Atlas of Medieval History (1992)
- From 2nd Century CE to 4th Century CE: Empire's population declined from 45M to 36M [i.e. 9M]
- From 400 CE to 600 CE: Empire's population declined by 20% [i.e. 7.2M]
- Paul Bairoch, Cities and economic development: from the dawn of history to the present, p.111
- "The population of Europe except Russia, then, having apparently reached a high point of some 40-55 million people by the start of the third century [ca.200 C.E.], seems to have fallen by the year 500 to about 30-40 million, bottoming out at about 20-35 million around 600." [i.e. ca.20M]
- Francois Crouzet, A History of the European Economy, 1000-2000 (University Press of Virginia: 2001) p.1.
- "The population of Europe (west of the Urals) in c. AD 200 has been estimated at 36 million; by 600, it had fallen to 26 million; another estimate (excluding ‘Russia’) gives a more drastic fall, from 44 to 22 million." [i.e. 10M or 22M]

also:
The geometric mean of these two extremes would come to 4½ per day, which is a credible daily rate for the really bad years.

why geometric mean? can you get it as the MLE given min{X1, ..., Xn} and max{X1, ..., Xn} for {X_i} iid Poissons? some kinda limit? think it might just be a rule of thumb.

yeah, it's a rule of thumb. found it in his book (epub).
org:junk  data  let-me-see  scale  history  iron-age  mediterranean  the-classics  death  nihil  conquest-empire  war  peace-violence  gibbon  trivia  multi  todo  AMT  expectancy  heuristic  stats  ML-MAP-E  data-science  estimate  magnitude  population  demographics  database  list  religion  christianity  leviathan 
september 2017 by nhaliday
All models are wrong - Wikipedia
Box repeated the aphorism in a paper that was published in the proceedings of a 1978 statistics workshop.[2] The paper contains a section entitled "All models are wrong but some are useful". The section is copied below.

Now it would be very remarkable if any system existing in the real world could be exactly represented by any simple model. However, cunningly chosen parsimonious models often do provide remarkably useful approximations. For example, the law PV = RT relating pressure P, volume V and temperature T of an "ideal" gas via a constant R is not exactly true for any real gas, but it frequently provides a useful approximation and furthermore its structure is informative since it springs from a physical view of the behavior of gas molecules.

For such a model there is no need to ask the question "Is the model true?". If "truth" is to be the "whole truth" the answer must be "No". The only question of interest is "Is the model illuminating and useful?".
thinking  metabuch  metameta  map-territory  models  accuracy  wire-guided  truth  philosophy  stats  data-science  methodology  lens  wiki  reference  complex-systems  occam  parsimony  science  nibble  hi-order-bits  info-dynamics  the-trenches  meta:science  physics  fluid  thermo  stat-mech  applicability-prereqs  theory-practice  elegance  simplification-normalization 
august 2017 by nhaliday
trees are harlequins, words are harlequins — bayes: a kinda-sorta masterpost
lol, gwern: https://www.reddit.com/r/slatestarcodex/comments/6ghsxf/biweekly_rational_feed/diqr0rq/
> What sort of person thinks “oh yeah, my beliefs about these coefficients correspond to a Gaussian with variance 2.5″? And what if I do cross-validation, like I always do, and find that variance 200 works better for the problem? Was the other person wrong? But how could they have known?
> ...Even ignoring the mode vs. mean issue, I have never met anyone who could tell whether their beliefs were normally distributed vs. Laplace distributed. Have you?
I must have spent too much time in Bayesland because both those strike me as very easy and I often think them! My beliefs usually are Laplace distributed when it comes to things like genetics (it makes me very sad to see GWASes with flat priors), and my Gaussian coefficients are actually a variance of 0.70 (assuming standardized variables w.l.o.g.) as is consistent with field-wide meta-analyses indicating that d>1 is pretty rare.
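
one concrete way to read the prior-variance talk (my own illustration, not gwern's code): a Gaussian prior with variance tau^2 on standardized coefficients is ridge regression with penalty lambda = sigma^2 / tau^2, and a Laplace prior is the lasso, so "variance 2.5 vs 0.70" is a statement about how hard you shrink:

import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:2] = [1.5, -1.0]
y = X @ beta_true + rng.normal(size=n)     # noise variance 1.0

def ridge_map(X, y, tau2, sigma2=1.0):
    """MAP estimate under beta ~ N(0, tau2 * I): closed-form ridge with lam = sigma2/tau2."""
    lam = sigma2 / tau2
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for tau2 in (2.5, 0.70, 0.05):             # looser prior -> less shrinkage
    b = ridge_map(X, y, tau2)
    print(f"prior variance {tau2}: ||beta_hat|| = {np.linalg.norm(b):.2f}")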
ratty  ssc  core-rats  tumblr  social  explanation  init  philosophy  bayesian  thinking  probability  stats  frequentist  big-yud  lesswrong  synchrony  similarity  critique  intricacy  shalizi  scitariat  selection  mutation  evolution  priors-posteriors  regularization  bias-variance  gwern  reddit  commentary  GWAS  genetics  regression  spock  nitty-gritty  generalization  epistemic  🤖  rationality  poast  multi  best-practices  methodology  data-science 
august 2017 by nhaliday
Analysis of variance - Wikipedia
Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means and their associated procedures (such as "variation" among and between groups), developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. It is conceptually similar to multiple two-sample t-tests, but is more conservative (results in less type I error) and is therefore suited to a wide range of practical problems.

good pic: https://en.wikipedia.org/wiki/Analysis_of_variance#Motivating_example

tutorial by Gelman: http://www.stat.columbia.edu/~gelman/research/published/econanova3.pdf

so one way to think of partitioning the variance:
y_ij = alpha_i + beta_j + eps_ij
Var(y_ij) = Var(alpha_i) + Var(beta_j) + 2 Cov(alpha_i, beta_j) + Var(eps_ij)
and alpha_i, beta_j are independent, so Cov(alpha_i, beta_j) = 0

can you make this work w/ interaction effects?
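
quick numeric check of the decomposition above (my own sketch; and if an interaction term gamma_ij is independent of the main effects, its variance just adds on the right-hand side too):

import numpy as np

rng = np.random.default_rng(0)
n_i, n_j = 1000, 1000
alpha = rng.normal(0, 1.0, size=n_i)          # Var ~ 1.0
beta = rng.normal(0, 2.0, size=n_j)           # Var ~ 4.0
eps = rng.normal(0, 0.5, size=(n_i, n_j))     # Var ~ 0.25

y = alpha[:, None] + beta[None, :] + eps
print(np.var(y))                                      # ~ 5.25
print(np.var(alpha) + np.var(beta) + np.var(eps))     # ~ the same (cross terms ~ 0)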
data-science  stats  methodology  hypothesis-testing  variance-components  concept  conceptual-vocab  thinking  wiki  reference  nibble  multi  visualization  visual-understanding  pic  pdf  exposition  lecture-notes  gelman  scitariat  tutorial  acm  ground-up  yoga 
july 2017 by nhaliday
How accurate are population forecasts?
2 The Accuracy of Past Projections: https://www.nap.edu/read/9828/chapter/4
good ebook:
Beyond Six Billion: Forecasting the World's Population (2000)
https://www.nap.edu/read/9828/chapter/2
Appendix A: Computer Software Packages for Projecting Population
https://www.nap.edu/read/9828/chapter/12
PDE Population Projections looks most relevant for my interests but it's also *ancient*
https://applieddemogtoolbox.github.io/Toolbox/
This Applied Demography Toolbox is a collection of applied demography computer programs, scripts, spreadsheets, databases and texts.

How Accurate Are the United Nations World Population Projections?: http://pages.stern.nyu.edu/~dbackus/BCH/demography/Keilman_JDR_98.pdf

cf. Razib on this: https://pinboard.in/u:nhaliday/b:d63e6df859e8
news  org:lite  prediction  meta:prediction  tetlock  demographics  population  demographic-transition  fertility  islam  world  developing-world  africa  europe  multi  track-record  accuracy  org:ngo  pdf  study  sociology  measurement  volo-avolo  methodology  estimate  data-science  error  wire-guided  priors-posteriors  books  guide  howto  software  tools  recommendations  libraries  gnxp  scitariat 
july 2017 by nhaliday
Econometric Modeling as Junk Science
The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics: https://www.aeaweb.org/articles?id=10.1257/jep.24.2.3

On data, experiments, incentives and highly unconvincing research – papers and hot beverages: https://papersandhotbeverages.wordpress.com/2015/10/31/on-data-experiments-incentives-and-highly-unconvincing-research/
In my view, it has just to do with the fact that academia is a peer monitored organization. In the case of (bad) data collection papers, issues related to measurement are typically boring. They are relegated to appendices, no one really has an incentive to monitor it seriously. The problem is similar in formal theory: no one really goes through the algebra in detail, but it is in principle feasible to do it, and, actually, sometimes these errors are detected. If discussing the algebra of a proof is almost unthinkable in a seminar, going into the details of data collection, measurement and aggregation is not only hard to imagine, but probably intrinsically infeasible.

Something different happens for the experimentalist people. As I was saying, I feel we have come to a point in which many papers are evaluated based on the cleverness and originality of the research design (“Using the World Cup qualifiers as an instrument for patriotism!? Woaw! how cool/crazy is that! I wish I had had that idea”). The sexiness of the identification strategy has too often become a goal in itself. When your peers monitor you paying more attention to the originality of the identification strategy than to the research question, you probably have an incentive to mine reality for ever crazier discontinuities. It is true methodologists have been criticized in the past for analogous reasons, such as being guided by the desire to increase mathematical complexity without a clear benefit. But, if you work with pure formal theory or statistical theory, your work is not meant to immediately answer question about the real world, but instead to serve other researchers in their quest. This is something that can, in general, not be said of applied CI work.

https://twitter.com/pseudoerasmus/status/662007951415238656
This post should have been entitled “Zombies who only think of their next cool IV fix”
https://twitter.com/pseudoerasmus/status/662692917069422592
massive lust for quasi-natural experiments, regression discontinuities
barely matters if the effects are not all that big
I suppose even the best of things must reach their decadent phase; methodological innov. to manias……

https://twitter.com/cblatts/status/920988530788130816
Following this "collapse of small-N social psych results" business, where do I predict econ will collapse? I see two main contenders.
One is lab studies. I dallied with these a few years ago in a Kenya lab. We ran several pilots of N=200 to figure out the best way to treat
and to measure the outcome. Every pilot gave us a different stat sig result. I could have written six papers concluding different things.
I gave up more skeptical of these lab studies than ever before. The second contender is the long run impacts literature in economic history
We should be very suspicious since we never see a paper showing that a historical event had no effect on modern day institutions or dvpt.
On the one hand I find these studies fun, fascinating, and probably true in a broad sense. They usually reinforce a widely believed history
argument with interesting data and a cute empirical strategy. But I don't think anyone believes the standard errors. There's probably a HUGE
problem of nonsignificant results staying in the file drawer. Also, there are probably data problems that don't get revealed, as we see with
the recent Piketty paper (http://marginalrevolution.com/marginalrevolution/2017/10/pikettys-data-reliable.html). So I take that literature with a vat of salt, even if I enjoy and admire the works
I used to think field experiments would show little consistency in results across place. That external validity concerns would be fatal.
In fact the results across different samples and places have proven surprisingly similar across places, and added a lot to general theory
Last, I've come to believe there is no such thing as a useful instrumental variable. The ones that actually meet the exclusion restriction
are so weird & particular that the local treatment effect is likely far different from the average treatment effect in non-transparent ways.
Most of the other IVs don't plausibly meet the exclusion restriction. I mean, we should be concerned when the IV estimate is always 10x
larger than the OLS coefficient. Thus I find myself much more persuaded by simple natural experiments that use OLS, diff in diff, or
discontinuities, alongside randomized trials.

What do others think are the cliffs in economics?
PS All of these apply to political science too. Though I have a special extra target in poli sci: survey experiments! A few are good. I like
Dan Corstange's work. But it feels like 60% of dissertations these days are experiments buried in a survey instrument that measure small
changes in response. These at least have large N. But these are just uncontrolled labs, with negligible external validity in my mind.
The good ones are good. This method has its uses. But it's being way over-applied. More people have to make big and risky investments in big
natural and field experiments. Time to raise expectations and ambitions. This expectation bar, not technical ability, is the big advantage
economists have over political scientists when they compete in the same space.
(Ok. So are there any friends and colleagues I haven't insulted this morning? Let me know and I'll try my best to fix it with a screed)

HOW MUCH SHOULD WE TRUST DIFFERENCES-IN-DIFFERENCES ESTIMATES?∗: https://economics.mit.edu/files/750
Most papers that employ Differences-in-Differences estimation (DD) use many years of data and focus on serially correlated outcomes but ignore that the resulting standard errors are inconsistent. To illustrate the severity of this issue, we randomly generate placebo laws in state-level data on female wages from the Current Population Survey. For each law, we use OLS to compute the DD estimate of its “effect” as well as the standard error of this estimate. These conventional DD standard errors severely understate the standard deviation of the estimators: we find an “effect” significant at the 5 percent level for up to 45 percent of the placebo interventions. We use Monte Carlo simulations to investigate how well existing methods help solve this problem. Econometric corrections that place a specific parametric form on the time-series process do not perform well. Bootstrap (taking into account the auto-correlation of the data) works well when the number of states is large enough. Two corrections based on asymptotic approximation of the variance-covariance matrix work well for moderate numbers of states and one correction that collapses the time series information into a “pre” and “post” period and explicitly takes into account the effective sample size works well even for small numbers of states.
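
a rough Monte Carlo in the same spirit (my own simplified illustration, not the paper's CPS design): serially correlated state-level outcomes, a randomly assigned placebo law, OLS DD with conventional iid standard errors, and the resulting over-rejection:

import numpy as np

rng = np.random.default_rng(0)
n_states, n_years, rho = 50, 20, 0.8
n_sims, rejections = 500, 0

for _ in range(n_sims):
    # AR(1) errors within each state -> serially correlated outcomes
    y = np.zeros((n_states, n_years))
    y[:, 0] = rng.normal(size=n_states)
    for t in range(1, n_years):
        y[:, t] = rho * y[:, t - 1] + rng.normal(size=n_states)

    treated = rng.permutation(n_states) < n_states // 2       # placebo "law"
    post = np.arange(n_years) >= n_years // 2
    D = np.outer(treated, post).astype(float).ravel()          # treated x post

    # OLS with intercept, treated, post, and the DD interaction
    X = np.column_stack([
        np.ones(n_states * n_years),
        np.repeat(treated.astype(float), n_years),
        np.tile(post.astype(float), n_states),
        D,
    ])
    yv = y.ravel()
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ yv
    resid = yv - X @ beta
    sigma2 = resid @ resid / (len(yv) - X.shape[1])
    se_dd = np.sqrt(sigma2 * XtX_inv[3, 3])                    # conventional (iid) SE
    if abs(beta[3] / se_dd) > 1.96:                            # 5% two-sided test
        rejections += 1

print(f"placebo rejection rate: {rejections / n_sims:.2f}  (nominal 0.05)")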

‘METRICS MONDAY: 2SLS–CHRONICLE OF A DEATH FORETOLD: http://marcfbellemare.com/wordpress/12733
As it turns out, Young finds that
1. Conventional tests tend to overreject the null hypothesis that the 2SLS coefficient is equal to zero.
2. 2SLS estimates are falsely declared significant one third to one half of the time, depending on the method used for bootstrapping.
3. The 99-percent confidence intervals (CIs) of those 2SLS estimates include the OLS point estimate over 90 percent of the time. They include the full OLS 99-percent CI over 75 percent of the time.
4. 2SLS estimates are extremely sensitive to outliers. Removing simply one outlying cluster or observation, almost half of 2SLS results become insignificant. Things get worse when removing two outlying clusters or observations, as over 60 percent of 2SLS results then become insignificant.
5. Using a Durbin-Wu-Hausman test, less than 15 percent of regressions can reject the null that OLS estimates are unbiased at the 1-percent level.
6. 2SLS has considerably higher mean squared error than OLS.
7. In one third to one half of published results, the null that the IVs are totally irrelevant cannot be rejected, and so the correlation between the endogenous variable(s) and the IVs is due to finite sample correlation between them.
8. Finally, fewer than 10 percent of 2SLS estimates reject instrument irrelevance and the absence of OLS bias at the 1-percent level using a Durbin-Wu-Hausman test. It gets much worse (fewer than 5 percent) if you add in the requirement that the 2SLS CI excludes the OLS estimate.
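
a small sketch of why weak, barely-relevant instruments make 2SLS so erratic relative to OLS (my own toy data-generating process, not Young's analysis): true effect 1, endogenous x, an instrument that barely moves x:

import numpy as np

rng = np.random.default_rng(1)
n, true_beta, pi = 500, 1.0, 0.05            # pi: (very weak) first-stage strength

def one_draw():
    z = rng.normal(size=n)                   # instrument
    u = rng.normal(size=n)                   # unobserved confounder
    x = pi * z + u + rng.normal(size=n)      # endogenous regressor
    y = true_beta * x + 2.0 * u + rng.normal(size=n)
    ols = np.cov(x, y)[0, 1] / np.cov(x, y)[0, 0]
    iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]   # Wald estimator = 2SLS with one IV
    return ols, iv

draws = np.array([one_draw() for _ in range(200)])
print("OLS  mean, sd:", draws[:, 0].mean().round(2), draws[:, 0].std().round(2))  # biased but stable
print("2SLS mean, sd:", draws[:, 1].mean().round(2), draws[:, 1].std().round(2))  # huge spread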

Methods Matter: P-Hacking and Causal Inference in Economics*: http://ftp.iza.org/dp11796.pdf
Applying multiple methods to 13,440 hypothesis tests reported in 25 top economics journals in 2015, we show that selective publication and p-hacking is a substantial problem in research employing DID and (in particular) IV. RCT and RDD are much less problematic. Almost 25% of claims of marginally significant results in IV papers are misleading.

https://twitter.com/NoamJStein/status/1040887307568664577
Ever since I learned social science is completely fake, I've had a lot more time to do stuff that matters, like deadlifting and reading about Mediterranean haplogroups
--
Wait, so, from fakest to realest IV>DD>RCT>RDD? That totally matches my impression.

https://twitter.com/wwwojtekk/status/1190731344336293889
https://archive.is/EZu0h
Great (not completely new but still good to have it in one place) discussion of RCTs and inference in economics by Deaton, my favorite sentences (more general than just about RCT) below
Randomization in the tropics revisited: a theme and eleven variations: https://scholar.princeton.edu/sites/default/files/deaton/files/deaton_randomization_revisited_v3_2019.pdf
org:junk  org:edu  economics  econometrics  methodology  realness  truth  science  social-science  accuracy  generalization  essay  article  hmm  multi  study  🎩  empirical  causation  error  critique  sociology  criminology  hypothesis-testing  econotariat  broad-econ  cliometrics  endo-exo  replication  incentives  academia  measurement  wire-guided  intricacy  twitter  social  discussion  pseudoE  effect-size  reflection  field-study  stat-power  piketty  marginal-rev  commentary  data-science  expert-experience  regression  gotchas  rant  map-territory  pdf  simulation  moments  confidence  bias-variance  stats  endogenous-exogenous  control  meta:science  meta-analysis  outliers  summary  sampling  ensembles  monte-carlo  theory-practice  applicability-prereqs  chart  comparison  shift  ratty  unaffiliated  garett-jones 
june 2017 by nhaliday