generalization   226


Do smoother areas of the error surface lead to better generalization?
In the first lecture of the outstanding Deep Learning Course (linking to version 1, which is also superb; v2 becomes available in early 2018), we learned how to train a state-of-the-art model using…
deep-learning  generalization  Neural-Networks 
7 days ago by rishaanp
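The question this entry asks can be made concrete with a toy experiment. The sketch below is not from the linked course; it is a minimal illustration of one common way to operationalize "smoothness": perturb the parameters around a minimum and measure the average rise in loss. A flat minimum rises less than a sharp one, which is the property often linked to better generalization.

```python
import random

# Minimal sketch (illustrative, not from the linked course): compare the
# "sharpness" of two 1-D minima by averaging the loss increase under small
# random parameter perturbations. Flatter valleys show a smaller increase.

def sharp_loss(w):   # narrow valley around w = 0
    return 100.0 * w * w

def flat_loss(w):    # wide valley around w = 0
    return 0.1 * w * w

def sharpness(loss, w_star, eps=0.1, trials=1000, seed=0):
    """Mean loss increase over random perturbations of radius eps."""
    rng = random.Random(seed)
    base = loss(w_star)
    rises = [loss(w_star + rng.uniform(-eps, eps)) - base for _ in range(trials)]
    return sum(rises) / trials

print(sharpness(sharp_loss, 0.0) > sharpness(flat_loss, 0.0))  # True
```

Both losses have the same minimum value at the same point; only the curvature differs, so the sharpness score isolates the geometry of the basin.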
Information Processing: Mathematical Theory of Deep Neural Networks (Princeton workshop)
"Recently, long-past-due theoretical results have begun to emerge. These results, and those that will follow in their wake, will begin to shed light on the properties of large, adaptive, distributed learning architectures, and stand to revolutionize how computer science and neuroscience understand these systems."
hsu  scitariat  commentary  links  research  research-program  workshop  events  princeton  sanjeev-arora  deep-learning  machine-learning  ai  generalization  explanans  off-convex  nibble  frontier  speedometer  state-of-art  big-surf  announcement 
28 days ago by nhaliday
[1711.00350] Still not systematic after all these years: On the compositional skills of sequence-to-sequence recurrent networks
"Humans can understand and produce new utterances effortlessly, thanks to their systematic compositional skills. Once a person learns the meaning of a new verb "dax," he or she can immediately understand the meaning of "dax twice" or "sing and dax." In this paper, we introduce the SCAN domain, consisting of a set of simple compositional navigation commands paired with the corresponding action sequences. We then test the zero-shot generalization capabilities of a variety of recurrent neural networks (RNNs) trained on SCAN with sequence-to-sequence methods. We find that RNNs can generalize well when the differences between training and test commands are small, so that they can apply "mix-and-match" strategies to solve the task. However, when generalization requires systematic compositional skills (as in the "dax" example above), RNNs fail spectacularly. We conclude with a proof-of-concept experiment in neural machine translation, supporting the conjecture that lack of systematicity is an important factor explaining why neural networks need very large training sets."
papers  deep-learning  rnn  seq2seq  generalization  brenden-lake 
5 weeks ago by arsyed
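The "dax" example in the abstract is easy to reproduce in miniature. The sketch below is not the paper's code or the SCAN dataset itself; it is a hypothetical toy version showing why compositional rules generalize zero-shot to a held-out combination while pure memorization cannot.

```python
# Hypothetical SCAN-style toy (not the paper's code): commands map
# compositionally to action sequences. A rule-based interpreter handles a
# novel combination involving "dax" zero-shot; a lookup-table "memorizer"
# trained without that combination cannot.

PRIMS = {"walk": "WALK", "run": "RUN", "jump": "JUMP", "dax": "DAX"}

def interpret(cmd):
    """Compositional semantics: 'X and Y' concatenates; 'X twice' repeats X."""
    if " and " in cmd:
        left, right = cmd.split(" and ", 1)
        return interpret(left) + interpret(right)
    words = cmd.split()
    if len(words) == 2 and words[1] == "twice":
        return [PRIMS[words[0]]] * 2
    return [PRIMS[words[0]]]

# A memorizer sees only these training pairs (no compositions with "dax"):
train = {c: interpret(c) for c in ["walk", "walk twice", "run and jump", "dax"]}
memorizer = train.get

print(interpret("dax twice"))   # ['DAX', 'DAX'] -- composition generalizes
print(memorizer("dax twice"))   # None -- memorization does not
```

The paper's finding is that seq2seq RNNs behave much more like the memorizer than the interpreter once the train/test gap requires systematic composition.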
[1711.11561] Measuring the tendency of CNNs to Learn Surface Statistical Regularities
"Deep CNNs are known to exhibit the following peculiarity: on the one hand they generalize extremely well to a test set, while on the other hand they are extremely sensitive to so-called adversarial perturbations. The extreme sensitivity of high performance CNNs to adversarial examples casts serious doubt that these networks are learning high level abstractions in the dataset. We are concerned with the following question: How can a deep CNN that does not learn any high level semantics of the dataset manage to generalize so well? The goal of this article is to measure the tendency of CNNs to learn surface statistical regularities of the dataset. To this end, we use Fourier filtering to construct datasets which share the exact same high level abstractions but exhibit qualitatively different surface statistical regularities. For the SVHN and CIFAR-10 datasets, we present two Fourier filtered variants: a low frequency variant and a randomly filtered variant. Each of the Fourier filtering schemes is tuned to preserve the recognizability of the objects. Our main finding is that CNNs exhibit a tendency to latch onto the Fourier image statistics of the training dataset, sometimes exhibiting up to a 28% generalization gap across the various test sets. Moreover, we observe that significantly increasing the depth of a network has a very marginal impact on closing the aforementioned generalization gap. Thus we provide quantitative evidence supporting the hypothesis that deep CNNs tend to learn surface statistical regularities in the dataset rather than higher-level abstract concepts."
papers  neural-net  convnet  generalization  via:csantos 
5 weeks ago by arsyed
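The paper's low-frequency dataset variant can be approximated with a few lines of FFT code. The sketch below is a simplified stand-in for the paper's filtering scheme (the radius and grayscale input are assumptions for illustration): keep only Fourier components within a radius of the spectrum's center, which preserves coarse structure while changing the surface statistics.

```python
import numpy as np

# Simplified sketch of a low-frequency Fourier-filtered image variant
# (illustrative; the radius and grayscale input are assumptions, not the
# paper's exact scheme): zero out all spectral components outside a disc
# around the center of the shifted 2-D spectrum.

def low_pass(img, radius):
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

rng = np.random.default_rng(0)
img = rng.random((32, 32))          # stand-in for an SVHN/CIFAR-10 image
filtered = low_pass(img, radius=8)  # high frequencies removed, image smoother
print(filtered.shape)  # (32, 32)
```

Training on `img`-like data and testing on `filtered`-like data (or vice versa) is the kind of split the paper uses to expose the generalization gap.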
[1703.11008] Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
"One of the defining properties of deep learning is that models are chosen to have many more parameters than available training data. In light of this capacity for overfitting, it is remarkable that simple algorithms like SGD reliably return solutions with low test error. One roadblock to explaining these phenomena in terms of implicit regularization, structural properties of the solution, and/or easiness of the data is that many learning bounds are quantitatively vacuous when applied to networks learned by SGD in this "deep learning" regime. Logically, in order to explain generalization, we need nonvacuous bounds. We return to an idea by Langford and Caruana (2001), who used PAC-Bayes bounds to compute nonvacuous numerical bounds on generalization error for stochastic two-layer two-hidden-unit neural networks via a sensitivity analysis. By optimizing the PAC-Bayes bound directly, we are able to extend their approach and obtain nonvacuous generalization bounds for deep stochastic neural network classifiers with millions of parameters trained on only tens of thousands of examples. We connect our findings to recent and old work on flat minima and MDL-based explanations of generalization."
papers  neural-net  learning-theory  generalization 
10 weeks ago by arsyed
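The bound being optimized is numerically concrete. The sketch below is not the paper's optimization procedure; it shows the standard McAllester-style PAC-Bayes computation it builds on: with probability at least 1 − δ, the true error e of the stochastic classifier satisfies kl(ê ‖ e) ≤ (KL(Q‖P) + ln(2√n/δ))/n, and inverting the binary KL numerically yields an upper bound on e. The input numbers in the example are made up for illustration.

```python
import math

# Sketch of a McAllester-style PAC-Bayes bound (not the paper's optimizer):
# invert kl(emp_err || p) <= rhs by bisection to upper-bound the true error.
# The example inputs (3% error, KL = 5000 nats, n = 60000) are hypothetical.

def binary_kl(q, p):
    eps = 1e-12
    q, p = min(max(q, eps), 1 - eps), min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def pac_bayes_bound(emp_err, kl_qp, n, delta=0.05):
    rhs = (kl_qp + math.log(2 * math.sqrt(n) / delta)) / n
    lo, hi = emp_err, 1.0
    for _ in range(100):               # bisection on the increasing tail of kl
        mid = (lo + hi) / 2
        if binary_kl(emp_err, mid) > rhs:
            hi = mid
        else:
            lo = mid
    return hi

print(round(pac_bayes_bound(0.03, 5000.0, 60000), 3))
```

The bound is "nonvacuous" whenever the returned value is meaningfully below 1.0, which is exactly what is hard to achieve for networks with millions of parameters and what the paper accomplishes by optimizing KL(Q‖P) directly.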
Genome-wide association analysis identifies 30 new susceptibility loci for schizophrenia | Nature Genetics
We conducted a genome-wide association study (GWAS) with replication in 36,180 Chinese individuals and performed further transancestry meta-analyses with data from the Psychiatry Genomics Consortium (PGC2). Approximately 95% of the genome-wide significant (GWS) index alleles (or their proxies) from the PGC2 study were overrepresented in Chinese schizophrenia cases, including ∼50% that achieved nominal significance and ∼75% that continued to be GWS in the transancestry analysis. The Chinese-only analysis identified seven GWS loci; three of these also were GWS in the transancestry analyses, which identified 109 GWS loci, thus yielding a total of 113 GWS loci (30 novel) in at least one of these analyses. We observed improvements in the fine-mapping resolution at many susceptibility loci. Our results provide several lines of evidence supporting candidate genes at many loci and highlight some pathways for further research. Together, our findings provide novel insight into the genetic architecture and biological etiology of schizophrenia.
study  biodet  behavioral-gen  psychiatry  disease  GWAS  china  asia  race  generalization  genetics  replication 
november 2017 by nhaliday
The weirdest people in the world?
Abstract: Behavioral scientists routinely publish broad claims about human psychology and behavior in the world’s top journals based on samples drawn entirely from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies. Researchers – often implicitly – assume that either there is little variation across human populations, or that these “standard subjects” are as representative of the species as any other population. Are these assumptions justified? Here, our review of the comparative database from across the behavioral sciences suggests both that there is substantial variability in experimental results across populations and that WEIRD subjects are particularly unusual compared with the rest of the species – frequent outliers. The domains reviewed include visual perception, fairness, cooperation, spatial reasoning, categorization and inferential induction, moral reasoning, reasoning styles, self-concepts and related motivations, and the heritability of IQ. The findings suggest that members of WEIRD societies, including young children, are among the least representative populations one could find for generalizing about humans. Many of these findings involve domains that are associated with fundamental aspects of psychology, motivation, and behavior – hence, there are no obvious a priori grounds for claiming that a particular behavioral phenomenon is universal based on sampling from a single subpopulation. Overall, these empirical patterns suggest that we need to be less cavalier in addressing questions of human nature on the basis of data drawn from this particularly thin, and rather unusual, slice of humanity. We close by proposing ways to structurally re-organize the behavioral sciences to best tackle these challenges.
pdf  study  microfoundations  anthropology  cultural-dynamics  sociology  psychology  social-psych  cog-psych  iq  biodet  behavioral-gen  variance-components  psychometrics  psych-architecture  visuo  spatial  morality  individualism-collectivism  n-factor  justice  egalitarianism-hierarchy  cooperate-defect  outliers  homo-hetero  evopsych  generalization  henrich  europe  the-great-west-whale  occident  organizing  🌞  universalism-particularism  applicability-prereqs  hari-seldon 
november 2017 by nhaliday
[1710.05468] Generalization in Deep Learning
This paper explains why deep learning can generalize well, despite large capacity and possible algorithmic instability, nonrobustness, and sharp minima, effectively addressing an open problem in the literature. Based on our theoretical insight, this paper also proposes a family of new regularization methods. Its simplest member was empirically shown to improve base models and achieve state-of-the-art performance on MNIST and CIFAR-10 benchmarks. Moreover, this paper presents both data-dependent and data-independent generalization guarantees with improved convergence rates. Our results suggest several new open areas of research.
via:numerous  deep-learning  generalization  one-way-to-look-at-it  formal-models  neural-networks  statistics  consider:the-other-way-too 
november 2017 by Vaguery

