hypothesis-testing   224

Robustness checks are a joke - Statistical Modeling, Causal Inference, and Social Science
The problem as I see it is that robustness checks are supposed to be for exploration but are typically used for confirmation.

Maybe another way to put it is: As long as we recognize that robustness checks are typically used for confirmation, we can interpret them in that way. Thus, instead of taking a robustness check as evidence that a claimed finding is robust, we should take a robustness check as providing evidence on particular directions the model can be perturbed without changing the main conclusions.
hypothesis-testing  robustness  sensitivity 
november 2018 by arsyed
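The post suggests reading a robustness check as evidence about one particular direction of perturbation, not as a blanket certificate of robustness. The Python below is a minimal sketch of that idea. It fits one simulated regression under three hypothetical specifications:

- the baseline,
- a half-sample,
- a version that drops a control correlated with the regressor of interest.

The data-generating process, the variable names (`x`, `w`, `y`) and the chosen perturbations are illustrative assumptions, not taken from the linked post.

```python
import numpy as np

def ols_coef(y, X):
    """Least-squares coefficients for y ~ X (X already includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)                  # hypothetical regressor of interest
w = 0.8 * x + rng.normal(size=n)        # hypothetical control, correlated with x
y = 1.0 + 0.5 * x + 0.3 * w + rng.normal(size=n)

ones = np.ones(n)
half = slice(0, n // 2)

# Each "robustness check" is one specific perturbation of the baseline model;
# the number reported is the estimated coefficient on x.
estimates = {
    "baseline": ols_coef(y, np.column_stack([ones, x, w]))[1],
    "first_half": ols_coef(y[half], np.column_stack([ones[half], x[half], w[half]]))[1],
    "drop_control": ols_coef(y, np.column_stack([ones, x]))[1],
}
for name, b in estimates.items():
    print(f"{name:>13}: coefficient on x = {b:.3f}")
```

Splitting the sample leaves the estimate essentially where it was. Dropping the correlated control shifts it substantially. Each check speaks only to its own direction of perturbation.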
Methods Matter: P-Hacking and Causal Inference in Economics | IZA - Institute of Labor Economics
The economics 'credibility revolution' has promoted the identification of causal relationships using difference-in-differences (DID), instrumental variables (IV), randomized control trials (RCT) and regression discontinuity design (RDD) methods. The extent to which a reader should trust claims about the statistical significance of results proves very sensitive to method. Applying multiple methods to 13,440 hypothesis tests reported in 25 top economics journals in 2015, we show that selective publication and p-hacking is a substantial problem in research employing DID and (in particular) IV. RCT and RDD are much less problematic. Almost 25% of claims of marginally significant results in IV papers are misleading.
p-hacking  p-values  hypothesis-testing  statistics  economics 
october 2018 by arsyed
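A caliper test is one common diagnostic for the kind of p-hacking described above. It checks whether test statistics bunch just above a significance threshold. Absent selection, |z| values in a narrow window around the threshold should fall above and below it about equally often, so the count above can be compared with a Binomial(n, 1/2) distribution.

The sketch below makes several assumptions:

- a ±0.2 window around z = 1.96;
- an exact one-sided binomial test;
- hypothetical input z-values invented purely for illustration.

It does not reproduce the paper's own settings or its other methods.

```python
from math import comb

def caliper_test(z_stats, threshold=1.96, width=0.2):
    """Count |z| just above vs. just below a significance threshold.

    Under no selection, the count above is treated as Binomial(n, 0.5).
    A |z| exactly at the threshold is counted as below.
    Returns (above, below, one-sided p-value for an excess above).
    """
    zs = [abs(z) for z in z_stats]
    above = sum(threshold < z <= threshold + width for z in zs)
    below = sum(threshold - width <= z <= threshold for z in zs)
    n = above + below
    # P(X >= above) for X ~ Binomial(n, 1/2)
    p_value = sum(comb(n, k) for k in range(above, n + 1)) / 2 ** n
    return above, below, p_value

# Hypothetical test statistics, for illustration only (3.40 falls outside the window).
zs = [1.80, 1.85, 1.97, 2.00, 2.01, 2.05, 2.10, 2.12, 2.15, 3.40]
print(caliper_test(zs))
```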
Machine Learning Trick of the Day (7): Density Ratio Trick ← The Spectator
"Comparisons are the drivers of learning. And the density ratio trick is a generic tool that makes comparison a statistical operation that can be used widely—by replacing density ratios where we see them with classifiers—and using it in conjunction with other tricks. It is the importance of comparison that makes Bayesian statistical approaches interesting, since, by learning entire distributions rather than point-estimates, we always strive to make the widest set of comparisons possible. And this trick also highlights the power of other principles of learning, in particular of likelihood-free estimation."
machine-learning  tricks  density  ratio  classification  hypothesis-testing 
june 2018 by arsyed
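The density ratio trick works in three steps:

1. Label samples by which distribution they came from.
2. Fit a probabilistic classifier to those labels.
3. Read the ratio off the classifier's odds. With equal sample sizes, p(x)/q(x) = D(x)/(1 - D(x)).

A minimal sketch, assuming two one-dimensional Gaussians p = N(0, 1) and q = N(1, 1) chosen for illustration. For these the exact log-ratio is 0.5 - x. Logistic regression on the raw feature is used here because it can represent that particular log-ratio exactly. In general, any well-calibrated classifier plays this role.

```python
import numpy as np

def fit_logistic(X, y, iters=25):
    """Logistic regression by Newton's method (IRLS); X includes an intercept column."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (y - prob)                              # log-likelihood gradient
        hess = X.T @ (X * (prob * (1.0 - prob))[:, None])    # negative Hessian
        w += np.linalg.solve(hess, grad)
    return w

rng = np.random.default_rng(0)
n = 20_000
xp = rng.normal(0.0, 1.0, n)   # samples from p = N(0, 1)
xq = rng.normal(1.0, 1.0, n)   # samples from q = N(1, 1)

x = np.concatenate([xp, xq])
y = np.concatenate([np.ones(n), np.zeros(n)])   # label 1 = "came from p"
X = np.column_stack([np.ones_like(x), x])

w = fit_logistic(X, y)

def density_ratio(t):
    # Equal sample sizes, so p(t)/q(t) = D(t) / (1 - D(t)) = exp(logit D(t)).
    return np.exp(w[0] + w[1] * t)

print("fitted log-ratio coefficients:", w, "(exact: [0.5, -1.0])")
print("estimated p/q at 0:", density_ratio(0.0), "exact:", np.exp(0.5))
```

With unequal sample sizes, the odds would need to be multiplied by n_q / n_p.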
Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations | SpringerLink
Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so—and yet these misinterpretations dominate much of the scientific literature. In light of this problem, we provide definitions and a discussion of basic statistics that are more general and critical than typically found in traditional introductory expositions. Our goal is to provide a resource for instructors, researchers, and consumers of statistics whose knowledge of statistical theory and technique may be limited but who wish to avoid and spot misinterpretations. We emphasize how violation of often unstated analysis protocols (such as selecting analyses for presentation based on the P values they produce) can lead to small P values even if the declared test hypothesis is correct, and can lead to large P values even if that hypothesis is incorrect. We then provide an explanatory list of 25 misinterpretations of P values, confidence intervals, and power. We conclude with guidelines for improving statistical interpretation and reporting.
statistics  hypothesis-testing 
june 2018 by arsyed
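The abstract's warning about selecting analyses by their P values can be checked by simulation. The sketch below assumes a hypothetical study with five independent outcomes, all with a true null, so each z statistic is standard normal. It compares two reporting strategies:

- testing one prespecified outcome;
- reporting whichever outcome gives the smallest P value.

The five outcomes, their independence and the use of z tests are illustrative assumptions, not taken from the paper. Under them, the selective strategy's false-positive rate should come out near 1 - 0.95^5 ≈ 0.23, against roughly 0.05 for the prespecified one.

```python
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def two_sided_p(z):
    return 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))

def false_positive_rates(n_studies=20_000, n_outcomes=5, alpha=0.05, seed=1):
    """Every null hypothesis is true, so each z statistic is drawn from N(0, 1).

    'prespecified' tests only the first outcome; 'selective' reports whichever
    of the n_outcomes analyses produced the smallest P value.
    """
    rng = random.Random(seed)
    prespecified = selective = 0
    for _ in range(n_studies):
        ps = [two_sided_p(rng.gauss(0.0, 1.0)) for _ in range(n_outcomes)]
        prespecified += ps[0] < alpha
        selective += min(ps) < alpha
    return prespecified / n_studies, selective / n_studies

print(false_positive_rates())
```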
Some good "Statistics for programmers" resources
This post is basically a list of books & other resources that teach statistics using programming.
statistics  Programming  learning  hypothesis-testing  confidence-intervals  probability  t-tests  normal-distribution  boostrapping 
february 2018 by rishaanp
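In the spirit of the resources in that list (bootstrapping is one of its tags), here is a minimal percentile-bootstrap sketch in plain Python. The sample values are hypothetical. The percentile method is only one of several ways to build a bootstrap interval.

```python
import random
from statistics import mean

def bootstrap_ci(data, stat=mean, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    # Resample with replacement, recompute the statistic, and sort the results.
    boots = sorted(stat(rng.choices(data, k=len(data))) for _ in range(n_resamples))
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = int((1 + level) / 2 * n_resamples) - 1
    return boots[lo_idx], boots[hi_idx]

# Hypothetical measurements, for illustration only.
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.4, 6.2, 4.9]
print(mean(sample), bootstrap_ci(sample))
```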
The Gelman View – spottedtoad
I have read Andrew Gelman’s blog for about five years, and gradually I’ve decided that, across his many blog posts and hundreds of academic articles, he is advancing a philosophy not just of statistics but of quantitative social science in general. I am not a statistician myself, but here is how I would articulate the Gelman View:

A. Purposes

1. The purpose of social statistics is to describe and understand variation in the world. The world is a complicated place, and we shouldn’t expect things to be simple.
2. The purpose of scientific publication is to allow for communication, dialogue, and critique, not to “certify” a specific finding as absolute truth.
3. The incentive structure of science needs to reward attempts to independently investigate, reproduce, and refute existing claims and observed patterns, not just to advance new hypotheses or support a particular research agenda.

B. Approach

1. Because the world is complicated, the most valuable statistical models for the world will generally be complicated. Statistical investigations will only rarely put a stamp of truth on a specific effect or causal claim; generally they will show variation in effects and outcomes.
2. Whenever possible, the data, analytic approach, and methods should be made as transparent and replicable as possible, and should be fair game for anyone to examine, critique, or amend.
3. Social scientists should look to build upon a broad shared body of knowledge, not to “own” a particular intervention, theoretic framework, or technique. Such ownership creates incentive problems when the intervention, framework, or technique fails and the scientist is left trying to support a flawed structure.


1. Measurement. How and what we measure is the first question, well before we decide on what the effects are or what is making that measurement change.
2. Sampling. Who we talk to or collect information from always matters, because we should always expect effects to depend on context.
3. Inference. While models should usually be complex, our inferential framework should be simple enough for anyone to follow along. And no p values.

He might disagree with all of this, or with how it reflects his understanding of his own work. But I think it is a valuable guide to empirical work.
ratty  unaffiliated  summary  gelman  scitariat  philosophy  lens  stats  hypothesis-testing  science  meta:science  social-science  institutions  truth  is-ought  best-practices  data-science  info-dynamics  alt-inst  academia  empirical  evidence-based  checklists  strategy  epistemic 
november 2017 by nhaliday
Use and Interpretation of LD Score Regression
LD Score regression distinguishes confounding from polygenicity in genome-wide association studies: https://sci-hub.bz/10.1038/ng.3211
- Po-Ru Loh, Nick Patterson, et al.


Both polygenicity (i.e. many small genetic effects) and confounding biases, such as cryptic relatedness and population stratification, can yield inflated distributions of test statistics in genome-wide association studies (GWAS). However, current methods cannot distinguish between inflation from bias and true signal from polygenicity. We have developed an approach that quantifies the contributions of each by examining the relationship between test statistics and linkage disequilibrium (LD). We term this approach LD Score regression. LD Score regression provides an upper bound on the contribution of confounding bias to the observed inflation in test statistics and can be used to estimate a more powerful correction factor than genomic control. We find strong evidence that polygenicity accounts for the majority of test statistic inflation in many GWAS of large sample size.

Supplementary Note: https://images.nature.com/original/nature-assets/ng/journal/v47/n3/extref/ng.3211-S1.pdf

An atlas of genetic correlations across human diseases and traits: https://sci-hub.bz/10.1038/ng.3406

Supplementary Note: https://images.nature.com/original/nature-assets/ng/journal/v47/n11/extref/ng.3406-S1.pdf

ldsc is a command-line tool for estimating heritability and genetic correlation from GWAS summary statistics. ldsc also computes LD Scores.
nibble  pdf  slides  talks  bio  biodet  genetics  genomics  GWAS  genetic-correlation  correlation  methodology  bioinformatics  concept  levers  🌞  tutorial  explanation  pop-structure  gene-drift  ideas  multi  study  org:nat  article  repo  software  tools  libraries  stats  hypothesis-testing  biases  confounding  gotchas  QTL  simulation  survey  preprint  population-genetics 
november 2017 by nhaliday
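Below is a minimal simulation of LD Score regression as described in the abstract. It assumes the paper's model, E[χ²_j] = N h² ℓ_j / M + N a + 1. Here ℓ_j is the LD Score of SNP j, N is the GWAS sample size, M is the number of SNPs, and a measures confounding. Under this model:

- the slope carries the polygenic signal;
- the intercept carries the inflation due to bias.

The LD Scores, N, M and h² below are hypothetical, and the SNPs are simulated as independent. The fit is plain OLS. The real ldsc tool uses regression weights and a block jackknife for standard errors, which this sketch does not attempt.

```python
import numpy as np

def ld_score_regression(chi2, ld_scores, n, m):
    """Unweighted OLS of chi^2 on LD Scores.

    Under E[chi2_j] = n * h2 * l_j / m + n * a + 1, slope * m / n estimates h2
    and the intercept estimates 1 + n * a.
    """
    X = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(X, chi2, rcond=None)
    return intercept, slope * m / n

def simulate_chi2(ld_scores, n, m, h2, a, rng):
    # Draw z_j ~ N(0, E[chi2_j]) so that z_j**2 has the model's expectation.
    expected = n * h2 * ld_scores / m + n * a + 1.0
    return rng.normal(0.0, np.sqrt(expected)) ** 2

rng = np.random.default_rng(0)
m, n, h2 = 500_000, 20_000, 0.3
ld_scores = rng.uniform(1.0, 100.0, m)   # hypothetical LD Scores

for a in (0.0, 1e-5):   # no confounding vs. confounding that adds n * a = 0.2
    chi2 = simulate_chi2(ld_scores, n, m, h2, a, rng)
    intercept, h2_hat = ld_score_regression(chi2, ld_scores, n, m)
    print(f"a={a:g}: intercept={intercept:.3f}, h2_hat={h2_hat:.3f}")
```

With a = 0 the intercept should sit near 1. With a > 0 it rises to about 1 + N a, while the slope-based h² estimate stays put. That is how the regression separates confounding from polygenicity.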
Fitting a Structural Equation Model
seems rather unrigorous: nonlinear optimization, possibility of nonconvergence, doesn't even mention local vs. global optimality...
pdf  slides  lectures  acm  stats  hypothesis-testing  graphs  graphical-models  latent-variables  model-class  optimization  nonlinearity  gotchas  nibble  ML-MAP-E  iteration-recursion  convergence 
november 2017 by nhaliday
Ancient Admixture in Human History
- Patterson, Reich et al., 2012
Population mixture is an important process in biology. We present a suite of methods for learning about population mixtures, implemented in a software package called ADMIXTOOLS, that support formal tests for whether mixture occurred and make it possible to infer proportions and dates of mixture. We also describe the development of a new single nucleotide polymorphism (SNP) array consisting of 629,433 sites with clearly documented ascertainment that was specifically designed for population genetic analyses and that we genotyped in 934 individuals from 53 diverse populations. To illustrate the methods, we give a number of examples that provide new insights about the history of human admixture. The most striking finding is a clear signal of admixture into northern Europe, with one ancestral population related to present-day Basques and Sardinians and the other related to present-day populations of northeast Asia and the Americas. This likely reflects a history of admixture between Neolithic migrants and the indigenous Mesolithic population of Europe, consistent with recent analyses of ancient bones from Sweden and the sequencing of the genome of the Tyrolean “Iceman.”
nibble  pdf  study  article  methodology  bio  sapiens  genetics  genomics  population-genetics  migration  gene-flow  software  trees  concept  history  antiquity  europe  roots  gavisti  🌞  bioinformatics  metrics  hypothesis-testing  levers  ideas  libraries  tools  pop-structure 
november 2017 by nhaliday
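One formal admixture test of the kind the abstract mentions is the 3-population test. Its statistic, f3(C; A, B), is the average over SNPs of (c - a)(c - b) for allele frequencies a, b, c. A significantly negative value is evidence that C is a mixture of populations related to A and B. The reason: for a pure mixture c = αa + (1 - α)b, each term equals -α(1 - α)(a - b)².

Below is a minimal sketch with simulated allele frequencies. The Gaussian "drift" model, the 50/50 mixture and every parameter are illustrative assumptions, not the paper's data or estimator. It also uses true frequencies, so the finite-sample bias correction that ADMIXTOOLS applies is omitted.

```python
import numpy as np

def f3(target, a, b):
    """f3(target; a, b) = mean over SNPs of (target - a) * (target - b)."""
    return float(np.mean((target - a) * (target - b)))

rng = np.random.default_rng(0)
n_snps = 100_000
ancestral = rng.uniform(0.05, 0.95, n_snps)

def drift(freqs, sd):
    # Crude stand-in for genetic drift: Gaussian perturbation, clipped to [0, 1].
    return np.clip(freqs + rng.normal(0.0, sd, freqs.shape), 0.0, 1.0)

pop_a = drift(ancestral, 0.05)
pop_b = drift(ancestral, 0.05)
admixed = drift(0.5 * pop_a + 0.5 * pop_b, 0.01)  # 50/50 mixture of A and B
sister_of_a = drift(pop_a, 0.05)                   # descends from A alone

print(f3(admixed, pop_a, pop_b))      # negative: admixture signal
print(f3(sister_of_a, pop_a, pop_b))  # positive: no admixture signal
```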

related tags

2014  academia  accretion  accuracy  acm  acmtariat  active-learning  adversarial  advice  albion  algorithms  alt-inst  analysis  antiquity  aphorism  applicability-prereqs  article  assortative-mating  audio  autism  backup  bayesian  behavioral-gen  best-practices  better-explained  bias-variance  biases  big-peeps  big-picture  bio  biodet  bioinformatics  bits  bonferroni  books  boostrapping  bounded-cognition  brain-scan  broad-econ  calculator  causation  chart  cheatsheet  checking  checklists  clarity  classification  cliometrics  cmu  code  cog-psych  commentary  comparison  complex-systems  concentration-of-measure  concept  conceptual-vocab  confidence-intervals  confidence  confluence  confounding  confusion  control  convergence  correlation  counterexample  cracker-econ  crime  criminology  critique  culture  curiosity  data-analysis  data-science  data  database  dataviz  debate  decision-making  definition  degrees-of-freedom  density  dependence-independence  detection  differential-privacy  dimensionality  direct-indirect  direction  discovery  discussion  disease  distribution  draft  econometrics  economics  econotariat  education  effect-size  empirical  encyclopedic  endo-exo  endogenous-exogenous  engineering  ensembles  epistemic  equilibrium  error  essay  ethics  europe  evidence-based  examples  expectancy  expert-experience  expert  explanation  exposition  faq  field-study  flowchart  foreign-policy  frank-harrell  frequentist  garett-jones  gavisti  gelman  gene-drift  gene-flow  generalization  genetic-correlation  genetic-load  genetics  genomics  giants  gnon  gotchas  gradient-descent  graphical-models  graphs  ground-up  gwas  gwern  gxe  history  hmm  hn  howto  hsu  human-ml  ideas  identity  iidness  incentives  info-dynamics  information-theory  innovation  institutions  integrity  interdisciplinary  intersection-connectedness  intersection  intervention  intricacy  ioannidis  iq  is-ought  iteration-recursion  jargon  
journos-pundits  latent-variables  learning-theory  learning  lecture-notes  lectures  left-wing  lens  levers  libraries  lifts-projections  limits  liner-notes  links  list  localization  looking-to-see  lower-bounds  machine-learning  magnitude  manifolds  map-territory  marginal-rev  matrix-factorization  measure  measurement  medicine  mental-math  meta-analysis  meta:prediction  meta:rhetoric  meta:science  metabuch  metameta  methodology  metric-space  metrics  michael-nielsen  migration  missing-heritability  mit  ml-map-e  model-class  model-organism  models  moments  monte-carlo  mostly-modern  motivation  mrtz  multi  natural-experiment  networks  neuro  news  nibble  nitty-gritty  no-go  noise-structure  nonlinearity  nonparametric  normal-distribution  normality  null-result  objektbuch  old-anglo  online-learning  optimization  orders  org:bleg  org:data  org:econlib  org:edu  org:junk  org:nat  org:ngo  org:sci  oss  outliers  overflow  p-hacking  p-value  p-values  p:someday  p:whenever  papers  paradox  parametric  pdf  pennsylvania  performance  personality  perturbation  philosophical  philosophy-of-science  philosophy  pic  piketty  piracy  plots  poast  policy  polisci  pop-structure  population-genetics  population  postmortem  pragmatic  pre-ww2  preprint  presentation  priors-posteriors  probability  programming  project  proofs  propaganda  pseudoe  psychiatry  psychology  psychometrics  publishing  q-n-a  qra  qtl  quixotic  quora  quotes  r-project  rant  ratio  ratty  reading  realness  recommendations  reddit  reference  reflection  regression-to-mean  regression  regularizer  replication  repo  research-program  research  rhetoric  rigor  roadmap  robust  robustness  roots  s:**  sampling  sapiens  scale  scaling-up  science  scitariat  search  sensitivity  shift  sib-study  signal-noise  similarity  simulation  slides  social-choice  social-psych  social-science  social  sociology  software  solid-study  spatial  spearhead  ssc  
stackexchangefavs  stat-power  statistics  stats  stories  strategy  street-fighting  study  stylized-facts  subjective-objective  summary  survey  syllabus  synchrony  systems  t-tests  talks  tcstariat  techtariat  tests  the-trenches  theory-practice  thesis  thick-thin  things  thinking  time-complexity  to-read  to-write-about  todo  tools  top-n  trees  trends  tricks  truth  tutorial  twin-study  twitter  unaffiliated  unit  usa  variance-components  virginia-dc  visual-understanding  visualization  volo-avolo  wiki  winter-2016  wire-guided  wut  yoga  🌞  🎩  👳  👽  🔬 
