nhaliday + acm + variance-components   4

Analysis of variance - Wikipedia
Analysis of variance (ANOVA) is a collection of statistical models used to analyze the differences among group means and their associated procedures (such as "variation" among and between groups), developed by statistician and evolutionary biologist Ronald Fisher. In the ANOVA setting, the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether or not the means of several groups are equal, and therefore generalizes the t-test to more than two groups. ANOVAs are useful for comparing (testing) three or more means (groups or variables) for statistical significance. It is conceptually similar to multiple two-sample t-tests, but is more conservative (results in less type I error) and is therefore suited to a wide range of practical problems.

good pic: https://en.wikipedia.org/wiki/Analysis_of_variance#Motivating_example

tutorial by Gelman: http://www.stat.columbia.edu/~gelman/research/published/econanova3.pdf

so one way to think of partitioning the variance:
y_ij = alpha_i + beta_j + eps_ij
Var(y_ij) = Var(alpha_i) + Var(beta_j) + Cov(alpha_i, beta_j) + Var(eps_ij)
and alpha_i, beta_j are independent, so Cov(alpha_i, beta_j) = 0

can you make this work w/ interaction effects?
data-science  stats  methodology  hypothesis-testing  variance-components  concept  conceptual-vocab  thinking  wiki  reference  nibble  multi  visualization  visual-understanding  pic  pdf  exposition  lecture-notes  gelman  scitariat  tutorial  acm  ground-up  yoga 
july 2017 by nhaliday
Pearson correlation coefficient - Wikipedia
what does this mean?: https://twitter.com/GarettJones/status/863546692724858880
deleted but it was about the Pearson correlation distance: 1-r
I guess it's a metric


A less misleading way to think about the correlation R is as follows: given X,Y from a standardized bivariate distribution with correlation R, an increase in X leads to an expected increase in Y: dY = R dX. In other words, students with +1 SD SAT score have, on average, roughly +0.4 SD college GPAs. Similarly, students with +1 SD college GPAs have on average +0.4 SAT.

this reminds me of the breeder's equation (but it uses r instead of h^2, so it can't actually be the same)

stats  science  hypothesis-testing  correlation  metrics  plots  regression  wiki  reference  nibble  methodology  multi  twitter  social  discussion  best-practices  econotariat  garett-jones  concept  conceptual-vocab  accuracy  causation  acm  matrix-factorization  todo  explanation  yoga  hsu  street-fighting  levers  🌞  2014  scitariat  variance-components  meta:prediction  biodet  s:**  mental-math  reddit  commentary  ssc  poast  gwern  data-science  metric-space  similarity  measure  dependence-independence 
may 2017 by nhaliday
Information Processing: Assortative mating, regression and all that: offspring IQ vs parental midpoint
Assuming parental midpoint of n SD above the population average, the kids' IQ will be normally distributed about a mean which is around +.6n with residual SD of about 12 points. (The .6 could actually be anywhere in the range (.5, .7), but the SD doesn't vary much from choice of empirical inputs.)

possible to calculate the residual variance from first principles?

Some data on regression: http://infoproc.blogspot.com/2010/10/some-data-on-regression.html
hsu  parenting  iq  regression-to-mean  street-fighting  explanation  methodology  assortative-mating  scitariat  variance-components  biodet  nibble  behavioral-gen  multi  data  stories  education  acm 
november 2016 by nhaliday

bundles : academeacmframesci

Copy this bookmark: