bias-variance   40

trees are harlequins, words are harlequins — bayes: a kinda-sorta masterpost
lol, gwern: https://www.reddit.com/r/slatestarcodex/comments/6ghsxf/biweekly_rational_feed/diqr0rq/
> What sort of person thinks “oh yeah, my beliefs about these coefficients correspond to a Gaussian with variance 2.5″? And what if I do cross-validation, like I always do, and find that variance 200 works better for the problem? Was the other person wrong? But how could they have known?
> ...Even ignoring the mode vs. mean issue, I have never met anyone who could tell whether their beliefs were normally distributed vs. Laplace distributed. Have you?
I must have spent too much time in Bayesland because both of those strike me as very easy and I often think them! My beliefs usually are Laplace distributed when it comes to things like genetics (it makes me very sad to see GWASes with flat priors), and my Gaussian coefficients actually have a variance of 0.70 (assuming standardized variables w.l.o.g.), as is consistent with field-wide meta-analyses indicating that d>1 is pretty rare.
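A minimal sketch of the prior-to-penalty correspondence behind this exchange, assuming standardized predictors and a known noise sd; the numbers (prior variance 0.70, sigma = 1) are illustrative. The MAP estimate under a zero-mean Gaussian prior is ridge regression with lambda = sigma^2/tau^2; a Laplace prior would give the lasso instead.

# Hedged sketch: Gaussian prior on coefficients == ridge penalty; numbers illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.standard_normal((n, p))                  # standardized predictors
beta_true = rng.normal(0, np.sqrt(0.70), p)      # "prior variance 0.70" as in the note
sigma = 1.0                                      # residual sd (assumed known here)
y = X @ beta_true + sigma * rng.standard_normal(n)

tau2 = 0.70                                      # prior variance on each coefficient
lam = sigma**2 / tau2                            # equivalent ridge penalty
beta_map = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)   # Gaussian-MAP / ridge
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)                     # flat-prior / OLS
print("shrinkage toward 0:", np.linalg.norm(beta_map) < np.linalg.norm(beta_ols))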
ratty  ssc  core-rats  tumblr  social  explanation  init  philosophy  bayesian  thinking  probability  stats  frequentist  big-yud  lesswrong  synchrony  similarity  critique  intricacy  shalizi  scitariat  selection  mutation  evolution  priors-posteriors  regularization  bias-variance  gwern  reddit  commentary  GWAS  genetics  regression  spock  nitty-gritty  generalization  epistemic  🤖  rationality  poast  multi  best-practices  methodology  data-science 
august 2017 by nhaliday
POPULATION STRUCTURE AND QUANTITATIVE CHARACTERS
The variance of among-group variance is substantial and does not depend on the number of loci contributing to variance in the character. It is just as large for polygenic characters as for single loci with the same additive variance. This implies that one polygenic character contains exactly as much information about population relationships as one single-locus marker.

same is true of expectation apparently (so drift has same impact on polygenic and single-locus traits)
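A rough Wright-Fisher toy check of the claim (my own setup, not the paper's model; loci drift independently here, and population size, generation count, and number of groups are arbitrary): once the additive variance is matched, the spread of the among-group variance across evolutionary replicates is of the same order for 1 locus and 100 loci, rather than shrinking like 1/L.

# Hedged Monte Carlo sketch; parameters are made up, not from the paper.
import numpy as np

rng = np.random.default_rng(1)
N, G, K, reps = 100, 50, 20, 300   # pop size, generations, groups, replicates
p0 = 0.5

def among_group_variances(L):
    a = np.sqrt(1.0 / (2 * L * p0 * (1 - p0)))       # scale effects so additive variance = 1
    out = np.empty(reps)
    for r in range(reps):
        freq = np.full((K, L), p0)
        for _ in range(G):
            freq = rng.binomial(2 * N, freq) / (2 * N)   # binomial drift each generation
        group_means = (2 * a * freq).sum(axis=1)          # mean genotypic value per group
        out[r] = group_means.var(ddof=1)                  # among-group variance this replicate
    return out

for L in (1, 100):
    v = among_group_variances(L)
    print(f"L={L:>3}  mean among-group variance = {v.mean():.3f}   sd across replicates = {v.std():.3f}")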
pdf  study  west-hunter  scitariat  bio  genetics  genomics  sapiens  QTL  correlation  null-result  magnitude  nibble  🌞  models  population-genetics  methodology  regularizer  moments  bias-variance  pop-diff  pop-structure  gene-drift 
may 2017 by nhaliday
probability - Variance of maximum of Gaussian random variables - Cross Validated
In full generality it is rather hard to find the right order of magnitude of the variance of a Gaussian supremum, since the tools from concentration theory are always suboptimal for the maximum function.

order ~ 1/log n
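Quick Monte Carlo check of the order of magnitude (sample sizes are arbitrary; maxima drawn in chunks to keep memory modest):

import numpy as np

rng = np.random.default_rng(0)
reps = 20_000
for n in (10, 100, 1_000, 10_000):
    maxima = np.concatenate([
        rng.standard_normal((2_000, n)).max(axis=1) for _ in range(reps // 2_000)
    ])
    print(f"n={n:>6}  Var(max) ~= {maxima.var():.4f}   1/log(n) = {1/np.log(n):.4f}")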
q-n-a  overflow  stats  probability  acm  orders  tails  bias-variance  moments  concentration-of-measure  magnitude  tidbits  distribution  yoga  structure  extrema  nibble 
february 2017 by nhaliday
bounds - What is the variance of the maximum of a sample? - Cross Validated
- sum of variances is always a bound (quick numerical check in the sketch below)
- can't do better even for iid Bernoulli
- looks like nice argument from well-known probabilist (using E[(X-Y)^2] = 2Var X), but not clear to me how he gets to sum_i instead of sum_{i,j} in the union bound?
edit: argument is that, if max_k X_k - max_k Y_k > r, then taking i = argmax_k X_k and j = argmax_k Y_k we get r < X_i - Y_j <= X_i - Y_i (since Y_j >= Y_i), so a union bound over i alone suffices
- different proof here (later pages): http://www.ism.ac.jp/editsec/aism/pdf/047_1_0185.pdf
Var(X_n:n) <= sum Var(X_k:n) + 2 sum_{i < j} Cov(X_i:n, X_j:n) = Var(sum X_k:n) = Var(sum X_k) = nσ^2
why are the covariances nonnegative? (are they?). intuitively seems true.
- for that, see https://pinboard.in/u:nhaliday/b:ed4466204bb1
- note that this proof shows more generally that sum Var(X_k:n) <= sum Var(X_k)
- apparently that holds for dependent X_k too? http://mathoverflow.net/a/96943/20644
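Quick numerical check of the bounds above, using iid exponentials (so sigma^2 = 1 and the bound is n); it also checks that the order-statistic covariances come out nonnegative, per the question above. My own sketch, not from the thread.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 200_000
X = rng.exponential(scale=1.0, size=(reps, n))   # iid, sigma^2 = 1, so n*sigma^2 = 5
order_stats = np.sort(X, axis=1)                  # columns are X_(1:n), ..., X_(n:n)

var_max = order_stats[:, -1].var()
sum_var_order = order_stats.var(axis=0).sum()
sum_var_plain = X.var(axis=0).sum()

print(f"Var(max)         = {var_max:.3f}  (bound n*sigma^2 = {n})")
print(f"sum Var(X_(k:n)) = {sum_var_order:.3f}")
print(f"sum Var(X_k)     = {sum_var_plain:.3f}  (should dominate the line above)")
# smallest entry of the order-statistic covariance matrix, consistent with the
# covariances being nonnegative
print("min entry of Cov matrix of order stats:", np.cov(order_stats.T).min().round(3))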
q-n-a  overflow  stats  acm  distribution  tails  bias-variance  moments  estimate  magnitude  probability  iidness  tidbits  concentration-of-measure  multi  orders  levers  extrema  nibble  bonferroni  coarse-fine  expert  symmetry  s:*  expert-experience  proofs 
february 2017 by nhaliday
Count–min sketch - Wikipedia
- estimates frequency vector (f_i)
- idea (runnable sketch below):
d = O(log 1/δ) hash functions h_j: [n] -> [w] (w = O(1/ε))
d*w counters a[r, c]
for each event i, increment counters a[1, h_1(i)], a[2, h_2(i)], ..., a[d, h_d(i)]
estimate for f_i is min_j a[j, h_j(i)]
- never underestimates but upward-biased
- pf: Markov to get constant probability of success, then exponential decrease with repetition
lecture notes: http://theory.stanford.edu/~tim/s15/l/l2.pdf
- note this can work w/ negative updates. just use median instead of min. pf still uses markov on the absolute value of error.
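A minimal runnable version of the idea above; the salted built-in hash stands in for the pairwise-independent hash families used in the analysis, and the eps/delta values are illustrative.

import math
import random

class CountMinSketch:
    def __init__(self, eps=0.01, delta=0.01):
        self.w = math.ceil(math.e / eps)          # width w = O(1/eps)
        self.d = math.ceil(math.log(1 / delta))   # depth d = O(log 1/delta)
        self.counts = [[0] * self.w for _ in range(self.d)]
        self.salts = [random.random() for _ in range(self.d)]   # stand-in for hash family

    def _col(self, j, item):
        return hash((self.salts[j], item)) % self.w

    def add(self, item, count=1):
        # increment one counter per row
        for j in range(self.d):
            self.counts[j][self._col(j, item)] += count

    def estimate(self, item):
        # min over rows: never underestimates, upward-biased by collisions
        return min(self.counts[j][self._col(j, item)] for j in range(self.d))

cms = CountMinSketch()
stream = ["a"] * 1000 + ["b"] * 50 + ["c"] * 3
random.shuffle(stream)
for x in stream:
    cms.add(x)
print(cms.estimate("a"), cms.estimate("b"), cms.estimate("c"))  # each >= true count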
algorithms  data-structures  sublinear  hashing  wiki  reference  bias-variance  approximation  random  tcs  multi  stanford  lecture-notes  pdf  tim-roughgarden  nibble  pigeonhole-markov  PAC 
february 2017 by nhaliday
teaching - Intuitive explanation for dividing by $n-1$ when calculating standard deviation? - Cross Validated
The standard deviation calculated with a divisor of n-1 is a standard deviation calculated from the sample as an estimate of the standard deviation of the population from which the sample was drawn. Because the observed values fall, on average, closer to the sample mean than to the population mean, the standard deviation which is calculated using deviations from the sample mean underestimates the desired standard deviation of the population. Using n-1 instead of n as the divisor corrects for that by making the result a little bit bigger.

Note that the correction has a larger proportional effect when n is small than when it is large, which is what we want because when n is larger the sample mean is likely to be a good estimator of the population mean.

...

A common one is that the definition of variance (of a distribution) is the second moment recentered around a known, definite mean, whereas the estimator uses an estimated mean. This loss of a degree of freedom (given the mean, you can reconstitute the dataset with knowledge of just n−1 of the data values) requires the use of n−1 rather than n to "adjust" the result.
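Tiny simulation of the point (distribution and sample size arbitrary): the n divisor underestimates the population variance, and n-1 roughly removes the bias.

import numpy as np

rng = np.random.default_rng(0)
n, reps, true_var = 5, 200_000, 4.0
samples = rng.normal(0, np.sqrt(true_var), size=(reps, n))

print("mean of n   divisor estimates:", samples.var(axis=1, ddof=0).mean())  # ~ true_var*(n-1)/n
print("mean of n-1 divisor estimates:", samples.var(axis=1, ddof=1).mean())  # ~ true_var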
q-n-a  overflow  stats  acm  intuition  explanation  bias-variance  methodology  moments  nibble  degrees-of-freedom  sampling-bias  generalization  dimensionality  ground-up  intricacy 
january 2017 by nhaliday
Nuts and Bolts of Applying Deep Learning
"1. When the available data is not enough, hand-craft work (like feature design) is really important.
2. Andrew also mentioned the reason why End-2-End learning is less likely to plateau, according to my interpretation(NOT VERY SURE), end2end system neglects the human-design intermediate structures (eg. phonemes, which may be the bottleneck for performance improvement), so as more data comes, the true mechanism will be better learned, with better performance achieved."
machine-learning  best-practices  nips  2016  andrew-ng  bias-variance  overiftting  e2e 
january 2017 by arsyed
Understanding the Pseudo-Truth as an Optimal Approximation
for a mis-specified model class m1 and the true model m2 (i.e. m2 generated the data), the "pseudo-truth" is the member of m1 that is closest to m2.

Bias (and variance) as an "asymptotic property of a model class" vs. "a finite-sample property of an estimator". Paul Mineiro describes the "asymptotic property" view, and White adds, "But Paul Mineiro is not, I think, interested in these finite-sample properties of estimators. I believe he’s concerned about the intrinsic error introduced by approximating one function with another. And that’s a very important topic that I haven’t seen discussed as often as I’d like."
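Toy illustration of the distinction (my own example, not from the post): fit a straight line to quadratic data. The mean fitted line converges to the pseudo-truth (slope ~ 0, intercept ~ E[x^2] = 1/3), not to the true curve, and only the sampling scatter shrinks with n.

import numpy as np

rng = np.random.default_rng(0)

def fit_line(n):
    x = rng.uniform(-1, 1, n)
    y = x**2 + 0.1 * rng.standard_normal(n)   # true model m2 is quadratic
    return np.polyfit(x, y, deg=1)            # mis-specified class m1: straight lines

for n in (50, 5_000):
    fits = np.array([fit_line(n) for _ in range(500)])
    print(f"n={n:>5}  mean fit [slope, intercept] = {fits.mean(axis=0).round(3)}  "
          f"sd across fits = {fits.std(axis=0).round(3)}")
# the approximation gap (intercept 1/3 line vs. the quadratic) persists as n grows;
# only the sd column, the finite-sample estimator variance, shrinks.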
stats  bias  variance  bias-variance  machine-learning  statistics  prediction  modeling 
january 2017 by tarakc02
pr.probability - Google question: In a country in which people only want boys - MathOverflow
- limits to 1/2 as number of families -> ∞ (quick simulation below)
- proportion of girls in one family is biased estimator of proportion in general population (larger families w/ more girls count more)
- interesting comment on Douglas Zare's answer (whether process has stopped or not)
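Quick simulation of both points, assuming fair and independent births: the pooled proportion of girls tends to 1/2, while the average of per-family girl proportions is biased below 1/2 (it converges to 1 - ln 2 ≈ 0.307).

import numpy as np

rng = np.random.default_rng(0)
families = 1_000_000
girls = rng.geometric(0.5, families) - 1     # girls born before the first boy
kids = girls + 1                              # each family stops after one boy

print("pooled proportion of girls    :", girls.sum() / kids.sum())
print("mean of per-family proportions:", (girls / kids).mean())
print("1 - ln 2                      :", 1 - np.log(2))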
puzzles  math  google  thinking  probability  q-n-a  gotchas  tidbits  math.CO  overflow  nibble  paradox  gender  bias-variance  stochastic-processes 
december 2016 by nhaliday

