linear-models

Common statistical tests are linear models (or: how to teach stats)
Most of the common statistical tests (t-test, correlation, ANOVA, chi-square, etc.) are special cases of linear models, or very close approximations. This beautiful simplicity means that there is less to learn. In particular, it all comes down to y = a⋅x + b, which most students know from high school. Unfortunately, intro stats courses are usually taught as if each test were an independent tool, needlessly making life more complicated for students and teachers alike.
statistics  linear-models  inference  statistical-tests 
11 weeks ago by tarakc02
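A minimal sketch of the claim, illustrating one case: the equal-variance two-sample t-test gives the same t-statistic as testing the slope of y = a⋅x + b, where x is a 0/1 group indicator. The data here are simulated for illustration and are not from the bookmarked post.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
g0 = rng.normal(0.0, 1.0, 40)   # group 0
g1 = rng.normal(0.5, 1.0, 40)   # group 1

# Classic two-sample t-test (equal variances assumed)
t_classic, p_classic = stats.ttest_ind(g0, g1)

# Same test as a linear model: y = a*x + b, with x a group indicator
y = np.concatenate([g0, g1])
x = np.concatenate([np.zeros(40), np.ones(40)])
X = np.column_stack([np.ones_like(x), x])           # [intercept, slope] design
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
n, k = X.shape
resid = y - X @ beta
sigma2 = resid @ resid / (n - k)                    # residual variance
se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1]) # std. error of the slope
t_lm = beta[1] / se

# The two t-statistics agree up to sign (group ordering)
print(abs(t_classic), abs(t_lm))
```

The same pattern extends to the other tests: ANOVA is a linear model with several dummy-coded indicators, and the Pearson correlation test is the slope test on standardized variables.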
[1902.06720] Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
A longstanding goal in deep learning research has been to precisely characterize training and generalization. However, the often complex loss landscapes of neural networks have made a theory of learning dynamics elusive. In this work, we show that for wide neural networks the learning dynamics simplify considerably and that, in the infinite width limit, they are governed by a linear model obtained from the first-order Taylor expansion of the network around its initial parameters. Furthermore, mirroring the correspondence between wide Bayesian neural networks and Gaussian processes, gradient-based training of wide neural networks with a squared loss produces test set predictions drawn from a Gaussian process with a particular compositional kernel. While these theoretical results are only exact in the infinite width limit, we nevertheless find excellent empirical agreement between the predictions of the original network and those of the linearized version even for finite practically-sized networks. This agreement is robust across different architectures, optimization methods, and loss functions.
neural-net  linear-models  gradient-descent 
february 2019 by arsyed
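A rough illustration of the paper's central object: the linearized network obtained from a first-order Taylor expansion around the initial parameters. This toy NumPy sketch (my own construction, not the paper's code; width, scaling, and step size are arbitrary choices) shows that for a wide one-hidden-layer ReLU network, a small parameter step barely leaves the linear regime.

```python
import numpy as np

rng = np.random.default_rng(1)
width = 2048                       # a "wide" hidden layer
x = rng.normal(size=3)             # a single fixed input

# f(x) = v @ relu(W x) / sqrt(width), parameters at initialization
W0 = rng.normal(size=(width, 3))
v0 = rng.normal(size=width)

def f(W, v):
    return v @ np.maximum(W @ x, 0.0) / np.sqrt(width)

# Gradients of f with respect to the parameters, evaluated at (W0, v0)
h = np.maximum(W0 @ x, 0.0)
dW = (v0 * (h > 0)).reshape(-1, 1) * x / np.sqrt(width)  # df/dW at init
dv = h / np.sqrt(width)                                  # df/dv at init

def f_lin(W, v):
    """First-order Taylor expansion of f around (W0, v0)."""
    return f(W0, v0) + np.sum(dW * (W - W0)) + dv @ (v - v0)

# A small, gradient-descent-sized parameter perturbation
W1 = W0 + 0.01 * rng.normal(size=W0.shape) / np.sqrt(width)
v1 = v0 + 0.01 * rng.normal(size=v0.shape)

print(f(W1, v1), f_lin(W1, v1))   # the two outputs nearly coincide
```

The paper's result is much stronger: in the infinite-width limit the linearization is exact along the whole gradient-descent trajectory, not just for one small step.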

