generalization   298


The Disproportional Power of Anecdotes
Increase your sample size when making decisions. You need enough information to be able to plot the range of possibilities, identify the outliers, and define the average.

“The Canadian Encyclopedia states, “If the nearly 29 million (figure unadjusted) in gold that was recovered during the heady years of 1897 to 1899 [in the Klondike] was divided equally among all those who participated in the gold rush, the amount would fall far short of the total they had invested in time and money.”

How did this happen? Because those miners took anecdotes as being representative of a broader reality. Quite literally, they learned mining from rumor, and didn’t develop any real knowledge. Most people fought for claims along the creeks, where easy gold had been discovered, while rejecting the bench claims on the hillsides above, which often had just as much gold.

You may be thinking that these men must have been desperate if they packed themselves up, heading into unknown territory, facing multiple dangers along the way, to chase a dream of easy money. But most of us aren’t that different. How many times have you invested in a “hot stock” on a tip from one person, only to have the company go under within a year? Ultimately, the smaller the sample size, the greater role the factors of chance play in determining an outcome.”

Many scientific psychology studies use college students who are predominantly Western, Educated, Industrialized, Rich, and Democratic (WEIRD), and then draw conclusions about the entire human race from these outliers. The researchers reviewed scientific literature from domains such as “visual perception, fairness, cooperation, spatial reasoning, categorization and inferential induction, moral reasoning, and the heritability of IQ. The findings suggest that members of WEIRD societies, including young children, are among the least representative populations one could find for generalizing about humans.”
“…this particular subpopulation is highly unrepresentative of the species.”
decision  making  against  generalization  bias  gold 
8 days ago by dandv
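The Klondike excerpt closes on a statistical claim: the smaller the sample, the larger the role chance plays in the outcome. The simulation below illustrates that claim. It uses fair-coin flips as a stand-in for any noisy outcome. The function name `spread_of_sample_proportions` and the trial count are illustrative choices, not anything taken from the excerpt. The function measures how widely the observed share of heads varies across repeated samples of a given size.

```python
import random
import statistics

# Illustrative sketch: fair-coin flips stand in for any noisy outcome.


def spread_of_sample_proportions(sample_size, trials=2000, seed=0):
    """Standard deviation, across `trials` repeated samples, of the
    proportion of heads in `sample_size` fair-coin flips."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(trials):
        heads = sum(rng.random() < 0.5 for _ in range(sample_size))
        proportions.append(heads / sample_size)
    return statistics.pstdev(proportions)


for n in (5, 50, 500):
    # The spread shrinks roughly like 0.5 / sqrt(n): small samples swing widely.
    print(n, round(spread_of_sample_proportions(n), 3))
```

With five flips, about one sample in five shows 80% heads or more, which looks like a "hot" streak. With five hundred flips, the observed rate stays close to the true 50%.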
[1902.10286] On Multi-Cause Causal Inference with Unobserved Confounding: Counterexamples, Impossibility, and Alternatives
Unobserved confounding is a central barrier to drawing causal inferences from observational data. Several authors have recently proposed that this barrier can be overcome in the case where one attempts to infer the effects of several variables simultaneously. In this paper, we present two simple, analytical counterexamples that challenge the general claims that are central to these approaches. In addition, we show that nonparametric identification is impossible in this setting. We discuss practical implications, and suggest alternatives to the methods that have been proposed so far in this line of work: using proxy variables and shifting focus to sensitivity analysis.
via:cshalizi  machine-learning  whereof-one-cannot-speak  generalization  dreaming  algorithms  induction?  counterexamples 
5 weeks ago by Vaguery
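The abstract starts from the problem that an unobserved confounder can make a variable look causal when it is not. A toy simulation shows that single-cause bias. It does not reproduce the paper's multi-cause setting or its counterexamples, and the variable names and coefficients are invented for illustration. In this setup `x` has no effect on `y`, yet regressing `y` on `x` gives a slope near 1, because both variables depend on the hidden `u`.

```python
import random

# Illustrative sketch of single-cause confounding bias, not the paper's setting.


def naive_slope_under_confounding(n=20000, seed=0):
    """Least-squares slope of y on x when a hidden u drives both.

    The true causal effect of x on y is zero in this simulation.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)                     # unobserved confounder
        xs.append(u + rng.gauss(0.0, 1.0))          # x depends on u
        ys.append(2.0 * u + rng.gauss(0.0, 1.0))    # y depends on u, not on x
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys)) / n
    var_x = sum((a - mean_x) ** 2 for a in xs) / n
    return cov_xy / var_x


print(round(naive_slope_under_confounding(), 2))
```

In expectation the slope is Cov(x, y) / Var(x) = 2 / 2 = 1, and all of it comes from u. The alternatives the abstract proposes are proxy variables and sensitivity analysis.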
[1706.06083] Towards Deep Learning Models Resistant to Adversarial Attacks
Recent work has demonstrated that neural networks are vulnerable to adversarial examples, i.e., inputs that are almost indistinguishable from natural data and yet classified incorrectly by the network. In fact, some of the latest findings suggest that the existence of adversarial attacks may be an inherent weakness of deep learning models. To address this problem, we study the adversarial robustness of neural networks through the lens of robust optimization. This approach provides us with a broad and unifying view on much of the prior work on this topic. Its principled nature also enables us to identify methods for both training and attacking neural networks that are reliable and, in a certain sense, universal. In particular, they specify a concrete security guarantee that would protect against any adversary. These methods let us train networks with significantly improved resistance to a wide range of adversarial attacks. They also suggest the notion of security against a first-order adversary as a natural and broad security guarantee. We believe that robustness against such well-defined classes of adversaries is an important stepping stone towards fully resistant deep learning models.
machine-learning  adversarial-learning  coevolution  generalization  optical-illusions  feature-extraction  feature-construction  neural-networks  rather-interesting  to-write-about 
7 weeks ago by Vaguery
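The paper treats robustness as a min-max (robust optimization) problem. Training minimizes the expected loss under the worst perturbation inside an allowed set, and projected gradient descent (PGD) serves as the inner attack. The sketch below is a toy version built on several assumptions:

- It replaces the paper's neural networks with a linear logistic model in NumPy.
- It uses an L∞ ball of radius `eps` as the perturbation set.
- All function names and hyperparameters are made up for illustration.

The sketch shows the two pieces: PGD with a random start and a projection step, and training on the PGD outputs.

```python
import numpy as np

# Illustrative sketch: a linear logistic model stands in for a deep network.


def _dloss_dscore(w, b, x, y):
    """d/dscore of log(1 + exp(-y * score)) for labels y in {-1, +1},
    computed stably as -y * sigmoid(-margin)."""
    margin = y * (x @ w + b)
    return -y * np.exp(-np.logaddexp(0.0, margin))


def pgd_attack(w, b, x, y, eps=0.3, alpha=0.1, steps=10, rng=None):
    """Projected gradient ascent on the loss inside an L-infinity ball."""
    rng = np.random.default_rng(0) if rng is None else rng
    x_adv = x + rng.uniform(-eps, eps, size=x.shape)   # random start in the ball
    for _ in range(steps):
        grad_x = _dloss_dscore(w, b, x_adv, y)[:, None] * w[None, :]
        x_adv = x_adv + alpha * np.sign(grad_x)        # signed ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)       # project back onto the ball
    return x_adv


def train(x, y, eps=0.3, epochs=200, lr=0.1, adversarial=True, seed=0):
    """Full-batch gradient descent; if adversarial, each step first replaces
    the batch by its PGD perturbation (the inner maximization)."""
    rng = np.random.default_rng(seed)
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        xb = pgd_attack(w, b, x, y, eps=eps, rng=rng) if adversarial else x
        g = _dloss_dscore(w, b, xb, y)
        w -= lr * (g @ xb) / len(y)
        b -= lr * g.mean()
    return w, b


def accuracy(w, b, x, y):
    return float(np.mean(np.sign(x @ w + b) == y))


rng = np.random.default_rng(1)
x = np.vstack([rng.normal(1.0, 1.0, (100, 2)), rng.normal(-1.0, 1.0, (100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])
w_std, b_std = train(x, y, adversarial=False)
w_rob, b_rob = train(x, y, adversarial=True)
for name, (w, b) in {"standard": (w_std, b_std), "adversarial": (w_rob, b_rob)}.items():
    x_adv = pgd_attack(w, b, x, y)
    print(name, "clean", accuracy(w, b, x, y), "under PGD", accuracy(w, b, x_adv, y))
```

For a linear model the sign of the input gradient never changes, so PGD simply pushes every point to the ball's corner `x - eps * y * sign(w)`. On a neural network the iterations matter much more.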
OSF | Near and Far Transfer in Cognitive Training: A Second-Order Meta-Analysis
In Models 1 (k = 99) and 2 (k = 119), we investigated the impact of working-memory training on near-transfer (i.e., memory) and far-transfer (e.g., reasoning, speed, and language) measures, respectively, and whether it is mediated by the type of population. Model 3 (k = 233) extended Model 2 by adding six meta-analyses assessing the far-transfer effects of other cognitive-training programs (video-games, music, chess, and exergames). Model 1 showed that working-memory training does induce near transfer, and that the size of this effect is moderated by the type of population. By contrast, Models 2 and 3 highlighted that far-transfer effects are small or null.
study  preprint  psychology  cog-psych  intelligence  generalization  dimensionality  psych-architecture  intervention  enhancement  practice 
8 weeks ago by nhaliday
[1901.10861] A Simple Explanation for the Existence of Adversarial Examples with Small Hamming Distance
The existence of adversarial examples in which an imperceptible change in the input can fool well trained neural networks was experimentally discovered by Szegedy et al. in 2013, who called them "Intriguing properties of neural networks". Since then, this topic had become one of the hottest research areas within machine learning, but the ease with which we can switch between any two decisions in targeted attacks is still far from being understood, and in particular it is not clear which parameters determine the number of input coordinates we have to change in order to mislead the network. In this paper we develop a simple mathematical framework which enables us to think about this baffling phenomenon from a fresh perspective, turning it into a natural consequence of the geometry of ℝⁿ with the L₀ (Hamming) metric, which can be quantitatively analyzed. In particular, we explain why we should expect to find targeted adversarial examples with Hamming distance of roughly m in arbitrarily deep neural networks which are designed to distinguish between m input classes.
machine-learning  robustness  adversarial-examples  rather-interesting  good-explanations  to-write-about  computational-vs-systems  generalization  feature-construction  saliency 
9 weeks ago by Vaguery
Modern Neural Networks Generalize on Small Data Sets
In this paper, we use a linear program to empirically decompose fitted neural networks into ensembles of low-bias sub-networks. We show that these sub-networks are relatively uncorrelated which leads to an internal regularization process, very much like a random forest, which can explain why a neural network is surprisingly resistant to overfitting. We then demonstrate this in practice by applying large neural networks, with hundreds of parameters per training observation, to a collection of 116 real-world data sets from the UCI Machine Learning Repository. This collection of data sets contains a much smaller number of training examples than the types of image classification tasks generally studied in the deep learning literature, as well as non-trivial label noise. We show that even in this setting deep neural nets are capable of achieving superior classification accuracy without overfitting.
neural-net  generalization  small-data  richard-berk 
december 2018 by arsyed
[1808.01204] Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data
Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.
neural-net  sgd  generalization 
december 2018 by arsyed
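The abstract studies plain SGD from random initialization on a two-layer ReLU network that has far more capacity than the data require, trained on well-separated mixtures. The NumPy sketch below imitates that setup on a toy problem; it is not the paper's analysis. The following are assumptions made for illustration:

- the width, learning rate and cluster separation;
- freezing the output layer `A` at random signs scaled by 1/sqrt(width);
- all function names.

```python
import numpy as np

# Illustrative sketch: toy mixture data, output layer frozen by assumption.


def sample_mixture(centers, per_class, rng):
    """Draw `per_class` points from a unit-variance Gaussian around each center."""
    x = np.concatenate([c + rng.normal(size=(per_class, centers.shape[1])) for c in centers])
    y = np.repeat(np.arange(len(centers)), per_class)
    return x, y


def train_two_layer_relu(x, y, k, width=512, lr=0.01, epochs=30, seed=0):
    """SGD on softmax cross-entropy for f(x) = A @ relu(W @ x).

    W (width x d) starts at a random Gaussian initialization and is trained;
    A (k x width) stays fixed at random signs scaled by 1/sqrt(width).
    """
    rng = np.random.default_rng(seed)
    n, d = x.shape
    W = rng.normal(size=(width, d)) / np.sqrt(d)
    A = rng.choice([-1.0, 1.0], size=(k, width)) / np.sqrt(width)
    for _ in range(epochs):
        for i in rng.permutation(n):            # one example per SGD step
            h = W @ x[i]
            logits = A @ np.maximum(h, 0.0)
            p = np.exp(logits - logits.max())
            p /= p.sum()
            p[y[i]] -= 1.0                      # cross-entropy gradient w.r.t. logits
            grad_h = (A.T @ p) * (h > 0)        # back-propagate through the ReLU
            W -= lr * np.outer(grad_h, x[i])
    return W, A


def predict(W, A, x):
    return np.argmax(np.maximum(x @ W.T, 0.0) @ A.T, axis=1)


rng = np.random.default_rng(0)
k, d = 3, 10
centers = 5.0 * rng.normal(size=(k, d))         # well separated relative to unit noise
x_train, y_train = sample_mixture(centers, 100, rng)
x_test, y_test = sample_mixture(centers, 100, rng)
W, A = train_two_layer_relu(x_train, y_train, k)
print("train acc", np.mean(predict(W, A, x_train) == y_train))
print("test acc", np.mean(predict(W, A, x_test) == y_test))
```

The network has about 5,000 trainable weights for 300 training points. Even so, on this toy mixture it typically classifies fresh draws about as well as the training set.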
[1810.12282] Assessing Generalization in Deep Reinforcement Learning
Deep reinforcement learning (RL) has achieved breakthrough results on many tasks, but has been shown to be sensitive to system changes at test time. As a result, building deep RL agents that generalize has become an active research area. Our aim is to catalyze and streamline community-wide progress on this problem by providing the first benchmark and a common experimental protocol for investigating generalization in RL. Our benchmark contains a diverse set of environments and our evaluation methodology covers both in-distribution and out-of-distribution generalization. To provide a set of baselines for future research, we conduct a systematic evaluation of deep RL algorithms, including those that specifically tackle the problem of generalization.
reinforcement-learning  generalization 
november 2018 by arsyed
Generalization guides human exploration in vast decision spaces | bioRxiv
From foraging for food to learning complex games, many aspects of human behaviour can be framed as a search problem with a vast space of possible actions. Under finite search horizons, optimal solutions are generally unobtainable. Yet how do humans navigate vast problem spaces, which require intelligent exploration of unobserved actions? Using a variety of bandit tasks with up to 121 arms, we study how humans search for rewards under limited search horizons, where the spatial correlation of rewards (in both generated and natural environments) provides traction for generalization. Across a variety of different probabilistic and heuristic models, we find evidence that Gaussian Process function learning--combined with an optimistic Upper Confidence Bound sampling strategy--provides a robust account of how people use generalization to guide search. Our modelling results and parameter estimates are recoverable, and can be used to simulate human-like performance, providing novel insights about human behaviour in complex environments.
decision  human  exploration  generalization 
november 2018 by arsyed
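The abstract's best-fitting account combines Gaussian Process function learning with an Upper Confidence Bound rule. The GP predicts each arm's reward and uncertainty from rewards observed at nearby arms. The UCB rule then picks the arm that maximizes mean + β·sd. The NumPy sketch below runs that strategy on an 11 × 11 grid of 121 arms, matching the largest task size mentioned. The RBF kernel, length scale, β, noise level, reward surface and function names are all assumptions for illustration; they are not the paper's fitted parameters.

```python
import numpy as np

# Illustrative sketch: kernel, beta, noise and reward surface are assumptions.


def rbf_kernel(a, b, length_scale=2.0):
    """Squared-exponential kernel; nearby arms get correlated rewards."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))


def gp_posterior(x_obs, y_obs, x_all, length_scale=2.0, noise_var=0.01):
    """Posterior mean and standard deviation of a zero-mean GP at every arm."""
    k_oo = rbf_kernel(x_obs, x_obs, length_scale) + noise_var * np.eye(len(x_obs))
    k_oa = rbf_kernel(x_obs, x_all, length_scale)
    chol = np.linalg.cholesky(k_oo)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    mean = k_oa.T @ alpha
    v = np.linalg.solve(chol, k_oa)
    var = 1.0 - (v ** 2).sum(axis=0)            # prior variance k(x, x) = 1
    return mean, np.sqrt(np.clip(var, 0.0, None))


def gp_ucb(reward_fn, arms, budget=25, beta=2.0, noise_sd=0.1, seed=0):
    """Pick arms by maximizing mean + beta * sd; observe noisy rewards."""
    rng = np.random.default_rng(seed)
    chosen, rewards = [], []
    for _ in range(budget):
        if chosen:
            mean, sd = gp_posterior(arms[chosen], np.array(rewards), arms,
                                    noise_var=noise_sd ** 2)
        else:
            mean, sd = np.zeros(len(arms)), np.ones(len(arms))  # prior, no data yet
        idx = int(np.argmax(mean + beta * sd))  # optimistic choice
        chosen.append(idx)
        rewards.append(reward_fn(arms[idx]) + noise_sd * rng.normal())
    return chosen, rewards


def smooth_reward(arm):
    """Made-up reward surface: one broad peak and one narrower, lower peak."""
    x, y = arm
    return (np.exp(-((x - 7) ** 2 + (y - 3) ** 2) / 8.0)
            + 0.5 * np.exp(-((x - 2) ** 2 + (y - 8) ** 2) / 4.0))


gx, gy = np.meshgrid(np.arange(11), np.arange(11))
arms = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)   # 121 arms
chosen, rewards = gp_ucb(smooth_reward, arms)
true_rewards = np.array([smooth_reward(a) for a in arms])
print("best arm found:", arms[chosen[int(np.argmax(rewards))]],
      "true best:", arms[np.argmax(true_rewards)])
```

The spatial correlation in the kernel is what lets a handful of observations inform all 121 arms, the "traction for generalization" described in the abstract.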
[1811.03270] An Optimal Transport View on Generalization
We derive upper bounds on the generalization error of learning algorithms based on their algorithmic transport cost: the expected Wasserstein distance between the output hypothesis and the output hypothesis conditioned on an input example. The bounds provide a novel approach to study the generalization of learning algorithms from an optimal transport view and impose less constraints on the loss function, such as sub-gaussian or bounded. We further provide several upper bounds on the algorithmic transport cost in terms of total variation distance, relative entropy (or KL-divergence), and VC dimension, thus further bridging optimal transport theory and information theory with statistical learning theory. Moreover, we also study different conditions for loss functions under which the generalization error of a learning algorithm can be upper bounded by different probability metrics between distributions relating to the output hypothesis and/or the input data. Finally, under our established framework, we analyze the generalization in deep learning and conclude that the generalization error in deep neural networks (DNNs) decreases exponentially to zero as the number of layers increases. Our analyses of generalization error in deep learning mainly exploit the hierarchical structure in DNNs and the contraction property of f-divergence, which may be of independent interest in analyzing other learning models with hierarchical structure.
optimal-transport  generalization  machine-learning 
november 2018 by arsyed
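The bounds in this abstract are stated in terms of the Wasserstein distance between distributions. As background, the sketch below computes the 1-Wasserstein distance in the simplest case: two equal-size samples of real numbers. In that case optimal transport reduces to pairing the sorted values. The sketch does not compute the paper's algorithmic transport cost or its bounds, and the function name is made up.

```python
# Illustrative sketch: W1 between two equal-size empirical samples on the line.


def wasserstein1_1d(a, b):
    """1-Wasserstein (earth mover's) distance between two equal-size samples.

    In one dimension the optimal transport plan matches the i-th smallest
    value of `a` to the i-th smallest value of `b`.
    """
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("samples must be non-empty and the same size")
    return sum(abs(p - q) for p, q in zip(sorted(a), sorted(b))) / len(a)


print(wasserstein1_1d([0.0, 1.0, 2.0], [3.0, 4.0, 5.0]))  # 3.0: every point moves by 3
print(wasserstein1_1d([0.0, 1.0, 2.0], [2.0, 0.0, 1.0]))  # 0.0: same values, order irrelevant
```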


related tags

2016  :/  absolute-relative  abstraction  academia  accuracy  acm  acmtariat  adam  adversarial-examples  adversarial-learning  adversarial  against  ai-control  ai  algorithm  algorithms  alien-character  analysis  analytical-holistic  anecdotes  anglo  anglosphere  announcement  anthropology  apollonian-dionysian  applicability-prereqs  arms  article  asia  audio  bare-hands  batch-size  bayesian  behavioral-econ  behavioral-gen  ben-recht  best-practices  bias-variance  bias  biases  big-peeps  big-picture  big-surf  bio  biodet  biotech  bits  books  brenden-lake  broad-econ  by:yoshuabengio  capacity  causal_inference  causation  chart  china  cifar  class  classification  cliometrics  coevolution  cog-psych  commandline  commentary  community  comparison  competition  complex-systems  complexity  composition-decomposition  compression  computation  computational-vs-systems  computer-vision  concept  conceptual-vocab  confidence  consider:evolution-of-code  consider:feature-discovery  consider:genetic-programming  consider:the-other-way-too  contracts  control  convexity-curvature  convnet  cool  cooperate-defect  coordination  cost-benefit  counterexample  counterexamples  course  cracker-econ  creative  criminology  critique  cs  cultural-dynamics  culture  cybernetics  cycles  data-science  data  data_augmentation  debate  decision-making  decision-theory  decision  deep-learning  deep  deep_learning  deepgoog  deeplearning  definite-planning  definition  dennett  density  descriptive  detail-architecture  developing-world  dignity  dimensionality  dirty-hands  discovery  discussion  disease  dl  domain  domestication  dreaming  duty  early-stopping  earth  ecology  econometrics  economics  econotariat  effect-size  egalitarianism-hierarchy  egt  emergent  empirical  endo-exo  endogenous-exogenous  enhancement  ensembles  entanglement  epistemic  ergodic  error  essay  ethics  europe  events  evopsych  existence  expert-experience  expert  explanans  
explanation  exploration  extrema  farmers-and-foragers  feature-construction  feature-extraction  features  field-study  finance  flexibility  flux-stasis  forgetting  formal-models  fourier  free-riding  frequency  frontier  futurism  games  garett-jones  gaussianprocesses  gavisti  generalisation  genetics  geo  geojson  gis  github  gold  good-evil  good-explanations  gotchas  gradient-descent  gt-101  gwas  haidt  hard-tech  hari-seldon  heavy-industry  henrich  history  hmm  homo-hetero  housing  how_we_learn  how_we_work  however  howto  hsu  human-ml  human  humanity  humility  hypocrisy  hypothesis-testing  icml  ideas  illusion  impetus  incentives  individualism-collectivism  induction?  industrial-org  info-dynamics  infographics  information-theory  informationtheory  initialization  intelligence  intervention  interview  intricacy  invariance  iq  jargon  javascript  justice  kernels  knn  kr  language  large-factor  learning-theory  learning  lens  lesswrong  limitations  linear-models  liner-notes  links  list  local-global  lower-bounds  machine-learning  machine_learning  machinelearning  macro  making  map-territory  mapping  maps  marginal-rev  marginal  matching  math.ds  mathematics  matrix-factorization  maxim-gun  measurement  mechanics  memorization  mena  meta-analysis  meta:rhetoric  meta:science  metameta  methodology  metrics  microfoundations  minima  mit  mixup  ml-map-e  ml  model-class  models  modernity  moments  monetary-fiscal  monte-carlo  morality  mostly-modern  multi  multiobjective-optimization  music-transcription  music  n-factor  nature  network-structure  neural-net  neural-networks  neural_networks  neuralnetworks  neurons  nibble  nitty-gritty  nlp  nn  no-free-lunch  nonlinearity  nordic  nudge-targets  occam  occident  off-convex  ok-not-surprising?  
one-way-to-look-at-it  ontology  optical-illusions  optimal-transport  optimism  optimization  order-disorder  org:bleg  org:econlib  org:edu  org:junk  org:mat  organization  organizing  orient  orthogonal-initialization  oscillation  osm  outliers  overfitting  overflow  pac  paper  papers  parsimony  pdf  performance-measure  personality  perturbation  pessimism  phalanges  philosophy  piketty  policy  population  practice  pragmatic  prediction  preprint  princeton  priors-posteriors  problem-solving  programming  properties  pseudoe  psych-architecture  psychiatry  psychology  psychometrics  q-n-a  qra  quotes  race  rademacher  random  ranking  rant  rather-interesting  rationality  ratty  realness  reason  reflection  regression  regularization  reinforcement-learning  reinforcement  religion  replication  representation  research-program  research  review  reviews  richard-berk  rigor  risk  rnn  robotics  robust  robustness  roots  s:*  saliency  sample-complexity  sampling  sanctity-degradation  sanjeev-arora  sapiens  sasha-rakhlin  science  scitariat  search  self-interest  sentiment  seq2seq  sgd  shapefile  shift  signal-noise  signum  simplify  simulation  singularity  sinosphere  skeleton  skunkworks  small-data  social-norms  social-psych  social-science  social  sociality  sociology  spatial  speculation  speedometer  spock  stability  stackex  stat-power  state-of-art  statistical  statistical_mechanics  statistics  stats  stereotypes  stiffness  structure  study  stylized-facts  subjective-objective  summary  supply-demand  survey  syntax  synthesis  systematic-ad-hoc  talks  technology  telos-atelos  terminal  the-great-west-whale  the-self  the-trenches  theory-of-mind  theory-practice  theory  thick-thin  things  thinking  time  to-write-about  tool  trade-offs  training  truth  turing  twitter  unaffiliated  uncertainty  uniqueness  unit  universalism-particularism  usa  values  variance-components  variance  vc-dimension  visuo  volo-avolo  
waves  whereof-one-cannot-speak  wire-guided  within-without  wonkish  workshop  world  yak-shaving  🌞  🎩 
