Experiments at Airbnb – Airbnb Engineering & Data Science – Medium
How did we know not to stop when the p-value hit 0.05? It turns out that this pattern of hitting “significance” early and then converging back to a neutral result is quite common in our system. There are several reasons for this. Users often take a long time to book, so the early converters have a disproportionately large influence at the beginning of the experiment. Also, even small sample sizes in online experiments are massive by the standards of the classical statistics in which these methods were developed. Since the test statistic is a function of both the sample size and the effect size, if an early effect size is large through natural variation, the p-value is likely to fall below 0.05 early. But the most important reason is that you are performing a statistical test every time you compute a p-value, and the more often you do it, the more likely you are to find an effect that is not really there.
AB  testing  stopping 
15 days ago by foodbaby
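
The last point in the excerpt, that every extra look at the p-value is another chance to cross 0.05, can be illustrated with a small simulation. The sketch below is not Airbnb's method. It is a hypothetical A/A test in which both arms share the same true conversion rate, so any "significant" result is a false positive. The function names, the 10% conversion rate, the 20 looks, and the batch size are all assumptions chosen for illustration.

```python
import math
import random


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test p-value using the pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no variation observed yet, so no evidence of a difference
    z = (conv_a / n_a - conv_b / n_b) / se
    # P(|Z| > |z|) for a standard normal variable
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(n_experiments=500, n_looks=20, users_per_look=200,
                      rate=0.1, alpha=0.05, seed=0):
    """Return (share of experiments significant at ANY look,
               share significant at the FINAL look only)."""
    rng = random.Random(seed)
    any_look_hits = 0
    final_look_hits = 0
    for _ in range(n_experiments):
        conv_a = conv_b = n = 0
        ever_significant = False
        p = 1.0
        for _ in range(n_looks):
            # Both arms draw from the same conversion rate (A/A test).
            conv_a += sum(rng.random() < rate for _ in range(users_per_look))
            conv_b += sum(rng.random() < rate for _ in range(users_per_look))
            n += users_per_look
            p = two_sided_p_value(conv_a, n, conv_b, n)
            if p < alpha:
                ever_significant = True  # a peeker would have stopped here
        any_look_hits += ever_significant
        final_look_hits += p < alpha
    return any_look_hits / n_experiments, final_look_hits / n_experiments


if __name__ == "__main__":
    peek_rate, final_rate = simulate_aa_tests()
    print(f"false positives when stopping at first p < 0.05: {peek_rate:.1%}")
    print(f"false positives when testing once at the end:    {final_rate:.1%}")
```

Under these assumptions, the share of experiments that cross 0.05 at some look should come out well above the share that are significant at the single final look, which stays near the nominal 5%. Deciding the sample size in advance and testing once is one way to avoid the inflation.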