generalization   252


[1808.05563] Learning Invariances using the Marginal Likelihood
Generalising well in supervised learning tasks relies on correctly extrapolating the training data to a large region of the input space. One way to achieve this is to constrain the predictions to be invariant to transformations on the input that are known to be irrelevant (e.g. translation). Commonly, this is done through data augmentation, where the training set is enlarged by applying hand-crafted transformations to the inputs. We argue that invariances should instead be incorporated in the model structure, and learned using the marginal likelihood, which correctly rewards the reduced complexity of invariant models. We demonstrate this for Gaussian process models, due to the ease with which their marginal likelihood can be estimated. Our main contribution is a variational inference scheme for Gaussian processes containing invariances described by a sampling procedure. We learn the sampling procedure by back-propagating through it to maximise the marginal likelihood.
machine-learning  generalization  representation  rather-interesting  HOWEVER  consider:genetic-programming  consider:evolution-of-code  to-write-about 
4 weeks ago by Vaguery
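The entry above turns on building invariance from a sampling procedure rather than a fixed augmentation set. The paper's variational GP machinery is too involved for a snippet, but the core idea — a predictor made approximately invariant by Monte-Carlo averaging over sampled transformations — can be sketched as follows. `invariant_predict`, `f`, and `flip` are illustrative names for this sketch, not the authors' code:

```python
import numpy as np

def invariant_predict(f, x, sample_transform, n_samples=200, rng=None):
    """Monte-Carlo estimate of a prediction made (approximately) invariant
    by averaging f over transformations drawn from a sampling procedure."""
    rng = np.random.default_rng(0) if rng is None else rng
    preds = [f(sample_transform(x, rng)) for _ in range(n_samples)]
    return np.mean(preds, axis=0)

# toy example: f is sensitive to sign; the sampled transform is a random
# sign flip, so the averaged predictor is sign-invariant in expectation
f = lambda x: x
flip = lambda x, rng: x * rng.choice([-1.0, 1.0])
res = invariant_predict(f, 3.0, flip)
```

In the paper the parameters of the sampling procedure itself are learned, by back-propagating through the samples to maximise the marginal likelihood; in this sketch the transformation distribution is fixed.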
[1805.12076] Towards Understanding the Role of Over-Parametrization in Generalization of Neural Networks
"Despite existing work on ensuring generalization of neural networks in terms of scale sensitive complexity measures, such as norms, margin and sharpness, these complexity measures do not offer an explanation of why neural networks generalize better with over-parametrization. In this work we suggest a novel complexity measure based on unit-wise capacities resulting in a tighter generalization bound for two layer ReLU networks. Our capacity bound correlates with the behavior of test error with increasing network sizes, and could potentially explain the improvement in generalization with over-parametrization. We further present a matching lower bound for the Rademacher complexity that improves over previous capacity lower bounds for neural networks."
neural-net  generalization 
6 weeks ago by arsyed
[1710.09412] mixup: Beyond Empirical Risk Minimization
Large deep neural networks are powerful, but exhibit undesirable behaviors such as memorization and sensitivity to adversarial examples. In this work, we propose mixup, a simple learning principle to alleviate these issues. In essence, mixup trains a neural network on convex combinations of pairs of examples and their labels. By doing so, mixup regularizes the neural network to favor simple linear behavior in-between training examples. Our experiments on the ImageNet-2012, CIFAR-10, CIFAR-100, Google commands and UCI datasets show that mixup improves the generalization of state-of-the-art neural network architectures. We also find that mixup reduces the memorization of corrupt labels, increases the robustness to adversarial examples, and stabilizes the training of generative adversarial networks.
ML  generalization 
6 weeks ago by foodbaby
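The mixup rule described above — training on convex combinations of pairs of examples and their labels — is simple enough to sketch directly. Assuming one-hot labels and a Beta-distributed mixing coefficient (as in the paper), a minimal NumPy version might look like:

```python
import numpy as np

def mixup_batch(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Convex-combine two batches of inputs and their one-hot labels."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)        # mixing coefficient ~ Beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2     # mixed inputs
    y = lam * y1 + (1.0 - lam) * y2     # mixed (soft) labels
    return x, y, lam

# tiny illustration: two examples, three classes
x1 = np.array([[0.0, 0.0]]); y1 = np.array([[1.0, 0.0, 0.0]])
x2 = np.array([[1.0, 1.0]]); y2 = np.array([[0.0, 1.0, 0.0]])
x, y, lam = mixup_batch(x1, y1, x2, y2)
```

The mixed label `y` is no longer one-hot but still a valid distribution, which is what nudges the network toward linear behaviour between training examples.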
[1702.00025] An Experimental Analysis of the Entanglement Problem in Neural-Network-based Music Transcription Systems
"Several recent polyphonic music transcription systems have utilized deep neural networks to achieve state of the art results on various benchmark datasets, pushing the envelope on framewise and note-level performance measures. Unfortunately we can observe a sort of glass ceiling effect. To investigate this effect, we provide a detailed analysis of the particular kinds of errors that state of the art deep neural transcription systems make, when trained and tested on a piano transcription task. We are ultimately forced to draw a rather disheartening conclusion: the networks seem to learn combinations of notes, and have a hard time generalizing to unseen combinations of notes. Furthermore, we speculate on various means to alleviate this situation."
music  music-transcription  deep-learning  entanglement  generalization 
10 weeks ago by arsyed
[1806.00451] Do CIFAR-10 Classifiers Generalize to CIFAR-10?
Machine learning is currently dominated by largely experimental work focused on improvements in a few key tasks. However, the impressive accuracy numbers of the best performing models are questionable because the same test sets have been used to select these models for multiple years now. To understand the danger of overfitting, we measure the accuracy of CIFAR-10 classifiers by creating a new test set of truly unseen images. Although we ensure that the new test set is as close to the original data distribution as possible, we find a large drop in accuracy (4% to 10%) for a broad range of deep learning models. Yet more recent models with higher original accuracy show a smaller drop and better overall performance, indicating that this drop is likely not due to overfitting based on adaptivity. Instead, we view our results as evidence that current accuracy numbers are brittle and susceptible to even minute natural variations in the data distribution.
neural-net  deep-learning  computer-vision  generalization  cifar  overfitting 
june 2018 by arsyed
[1805.06822] DNN or k-NN: That is the Generalize vs. Memorize Question
This paper studies the relationship between the classification performed by deep neural networks and the k-NN decision at the embedding space of these networks. This simple but important connection provides a better understanding of the relationship between the ability of neural networks to generalize and their tendency to memorize the training data, which are traditionally considered to contradict each other and are here shown to be compatible and complementary. Our results support the conjecture that deep neural networks approach Bayes optimal error rates.
neural-net  knn  generalization  memorization 
may 2018 by arsyed
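The DNN-vs-k-NN comparison above can be illustrated with a minimal nearest-neighbour vote over embeddings; here the "embedding space" is just a toy 2-D array standing in for a network's penultimate-layer features, and `knn_predict` is a hypothetical helper, not the paper's code:

```python
import numpy as np

def knn_predict(train_emb, train_labels, query_emb, k=3):
    """Classify queries by majority vote among the k nearest training embeddings."""
    # pairwise Euclidean distances: (n_queries, n_train)
    d = np.linalg.norm(query_emb[:, None, :] - train_emb[None, :, :], axis=-1)
    nearest = np.argsort(d, axis=1)[:, :k]            # indices of k nearest neighbours
    votes = train_labels[nearest]                     # their labels
    return np.array([np.bincount(v).argmax() for v in votes])

# toy "embedding space": two well-separated clusters
train = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])
pred = knn_predict(train, labels, np.array([[0.2, 0.4], [5.1, 5.5]]), k=3)
# pred → [0, 1]
```

The paper's point is that the deep network's own decision closely tracks this kind of k-NN rule applied at its embedding layer.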
[1805.01445] The Fine Line between Linguistic Generalization and Failure in Seq2Seq-Attention Models
Seq2Seq based neural architectures have become the go-to architecture to apply to sequence to sequence language tasks. Despite their excellent performance on these tasks, recent work has noted that these models usually do not fully capture the linguistic structure required to generalize beyond the dense sections of the data distribution (Ettinger et al., 2017), and as such, are likely to fail on samples from the tail end of the distribution (such as inputs that are noisy [belkinovnmtbreak] or of different lengths [bentivoglinmtlength]). In this paper, we look at a model's ability to generalize on a simple symbol rewriting task with a clearly defined structure. We find that the model's ability to generalize this structure beyond the training distribution depends greatly on the chosen random seed, even when performance on the standard test set remains the same. This suggests that a model's ability to capture generalizable structure is highly sensitive. Moreover, this sensitivity may not be apparent when evaluating it on standard test sets.
nlp  seq2seq  rnn  generalization 
may 2018 by arsyed

