generalization   215

Genome-wide association analysis identifies 30 new susceptibility loci for schizophrenia | Nature Genetics
We conducted a genome-wide association study (GWAS) with replication in 36,180 Chinese individuals and performed further transancestry meta-analyses with data from the Psychiatry Genomics Consortium (PGC2). Approximately 95% of the genome-wide significant (GWS) index alleles (or their proxies) from the PGC2 study were overrepresented in Chinese schizophrenia cases, including ∼50% that achieved nominal significance and ∼75% that continued to be GWS in the transancestry analysis. The Chinese-only analysis identified seven GWS loci; three of these also were GWS in the transancestry analyses, which identified 109 GWS loci, thus yielding a total of 113 GWS loci (30 novel) in at least one of these analyses. We observed improvements in the fine-mapping resolution at many susceptibility loci. Our results provide several lines of evidence supporting candidate genes at many loci and highlight some pathways for further research. Together, our findings provide novel insight into the genetic architecture and biological etiology of schizophrenia.
study  biodet  behavioral-gen  psychiatry  disease  GWAS  china  asia  race  generalization  genetics  replication 
4 weeks ago by nhaliday
The weirdest people in the world?
Abstract: Behavioral scientists routinely publish broad claims about human psychology and behavior in the world’s top journals based on samples drawn entirely from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) societies. Researchers – often implicitly – assume that either there is little variation across human populations, or that these “standard subjects” are as representative of the species as any other population. Are these assumptions justified? Here, our review of the comparative database from across the behavioral sciences suggests both that there is substantial variability in experimental results across populations and that WEIRD subjects are particularly unusual compared with the rest of the species – frequent outliers. The domains reviewed include visual perception, fairness, cooperation, spatial reasoning, categorization and inferential induction, moral reasoning, reasoning styles, self-concepts and related motivations, and the heritability of IQ. The findings suggest that members of WEIRD societies, including young children, are among the least representative populations one could find for generalizing about humans. Many of these findings involve domains that are associated with fundamental aspects of psychology, motivation, and behavior – hence, there are no obvious a priori grounds for claiming that a particular behavioral phenomenon is universal based on sampling from a single subpopulation. Overall, these empirical patterns suggest that we need to be less cavalier in addressing questions of human nature on the basis of data drawn from this particularly thin, and rather unusual, slice of humanity. We close by proposing ways to structurally re-organize the behavioral sciences to best tackle these challenges.
pdf  study  microfoundations  anthropology  cultural-dynamics  sociology  psychology  social-psych  cog-psych  iq  biodet  behavioral-gen  variance-components  psychometrics  psych-architecture  visuo  spatial  morality  individualism-collectivism  n-factor  justice  egalitarianism-hierarchy  cooperate-defect  outliers  homo-hetero  evopsych  generalization  henrich  europe  the-great-west-whale  occident  organizing  🌞  universalism-particularism  applicability-prereqs 
5 weeks ago by nhaliday
[1710.05468] Generalization in Deep Learning
This paper explains why deep learning can generalize well, despite large capacity and possible algorithmic instability, nonrobustness, and sharp minima, effectively addressing an open problem in the literature. Based on our theoretical insight, this paper also proposes a family of new regularization methods. Its simplest member was empirically shown to improve base models and achieve state-of-the-art performance on MNIST and CIFAR-10 benchmarks. Moreover, this paper presents both data-dependent and data-independent generalization guarantees with improved convergence rates. Our results suggest several new open areas of research.
via:numerous  deep-learning  generalization  one-way-to-look-at-it  formal-models  neural-networks  statistics  consider:the-other-way-too 
5 weeks ago by Vaguery
[1710.09553] Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
We describe an approach to understand the peculiar and counterintuitive generalization properties of deep neural networks. The approach involves going beyond worst-case theoretical capacity control frameworks that have been popular in machine learning in recent years to revisit old ideas in the statistical mechanics of neural networks. Within this approach, we present a prototypical Very Simple Deep Learning (VSDL) model, whose behavior is controlled by two control parameters, one describing an effective amount of data, or load, on the network (that decreases when noise is added to the input), and one with an effective temperature interpretation (that increases when algorithms are early stopped). Using this model, we describe how a very simple application of ideas from the statistical mechanics theory of generalization provides a strong qualitative description of recently-observed empirical results regarding the inability of deep neural networks not to overfit training data, discontinuous learning and sharp transitions in the generalization properties of learning algorithms, etc.
generalization  learning  machine_learning  statistical_mechanics  deep_learning  neural_networks  via:droy 
6 weeks ago by rvenkat
[1710.06451] Understanding Generalization and Stochastic Gradient Descent
"This paper tackles two related questions at the heart of machine learning: how can we predict if a minimum will generalize to the test set, and why does stochastic gradient descent find minima that generalize well? Our work is inspired by Zhang et al. (2017), who showed deep networks can easily memorize randomly labeled training data, despite generalizing well when shown real labels of the same inputs. We show here that the same phenomenon occurs in small linear models. These observations are explained by evaluating the Bayesian evidence in favor of each model, which penalizes sharp minima. Next, we explore the 'generalization gap' between small and large batch training, identifying an optimum batch size which maximizes the test set accuracy. Noise in the gradient updates is beneficial, driving the dynamics towards robust minima for which the evidence is large. Interpreting stochastic gradient descent as a stochastic differential equation, we predict the optimum batch size is proportional to both the learning rate and the size of the training set, and verify these predictions empirically."
papers  generalization  sgd  deep-learning 
7 weeks ago by arsyed
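The abstract above predicts that the optimum batch size is proportional to both the learning rate and the training set size. A minimal sketch of how that proportionality could be used to carry a tuned batch size over to a new setting follows. The function name `rescale_batch_size` and all example numbers are hypothetical, and the proportionality constant is not given in the abstract, so it is taken here from a reference run that is assumed to have been tuned empirically.

```python
def rescale_batch_size(ref_batch, ref_lr, ref_n, new_lr, new_n):
    """Rescale a tuned batch size using B_opt proportional to lr * N.

    Only the proportionality comes from the abstract; the constant is
    implied by the (assumed already tuned) reference configuration.
    """
    if min(ref_batch, ref_lr, ref_n, new_lr, new_n) <= 0:
        raise ValueError("all arguments must be positive")
    scaled = ref_batch * (new_lr / ref_lr) * (new_n / ref_n)
    # Batch sizes are whole numbers, and at least one example per batch.
    return max(1, round(scaled))


# Hypothetical reference run: batch 128 at learning rate 0.25 on 50,000 examples.
doubled_lr = rescale_batch_size(128, 0.25, 50_000, new_lr=0.5, new_n=50_000)
doubled_data = rescale_batch_size(128, 0.25, 50_000, new_lr=0.25, new_n=100_000)
print(doubled_lr, doubled_data)
```

The result is only a starting point for a search around the rescaled value, since the abstract states a prediction verified empirically in the authors' experiments rather than a guarantee for every model.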
[1710.05468] Generalization in Deep Learning
"This paper explains why deep learning can generalize well, despite large capacity and possible algorithmic instability, nonrobustness, and sharp minima, effectively addressing an open problem in the literature. Based on our theoretical insight, this paper also proposes a family of new regularization methods. Its simplest member was empirically shown to improve base models and achieve state-of-the-art performance on MNIST and CIFAR-10 benchmarks. Moreover, this paper presents both data-dependent and data-independent generalization guarantees with improved convergence rates. Our results suggest several new open areas of research."
papers  deep-learning  generalization 
8 weeks ago by arsyed
The Two Phases of Gradient Descent in Deep Learning
Good article that reviews recent papers on the theory behind SGD in deep learning. The links to other papers in this article are also very helpful.
deeplearning  ai  theory  sgd  compression  generalization  informationtheory 
10 weeks ago by drmeme
New Theory Cracks Open the Black Box of Deep Learning | Quanta Magazine
Great review article of a paper explaining the results of a new theory on how deep learning works. They describe SGD as having two distinct phases, a drift phase and a diffusion phase. SGD begins in the first phase, basically exploring the multidimensional space of solutions. When it begins converging, it arrives at the diffusion phase where it is extremely chaotic and the convergence rate slows to a crawl. Also, read the original article at and a video of a talk at
deeplearning  ai  theory  sgd  compression  generalization  informationtheory 
10 weeks ago by drmeme
What does ">" really mean?
This Snapshot is about the generalization of ">" from ordinary numbers to so-called fields. At the end, I will touch on some ideas in recent research.
mathematics  generalization  rather-interesting  summary 
11 weeks ago by Vaguery
[1703.09580] Early Stopping without a Validation Set
"Early stopping is a widely used technique to prevent poor generalization performance when training an over-expressive model by means of gradient-based optimization. To find a good point to halt the optimizer, a common practice is to split the dataset into a training and a smaller validation set to obtain an ongoing estimate of the generalization performance. We propose a novel early stopping criterion that is based on fast-to-compute local statistics of the computed gradients and entirely removes the need for a held-out validation set. Our experiments show that this is a viable approach in the setting of least-squares and logistic regression, as well as neural networks."
papers  machine-learning  early-stopping  generalization 
august 2017 by arsyed
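For context, the "common practice" that the abstract above contrasts itself with (hold out a validation set and stop when its loss stops improving) can be sketched as below. This is not the paper's gradient-statistics criterion. The model (a one-dimensional least-squares fit), the `patience` rule, the synthetic data, and every name in the block are assumptions made for illustration.

```python
import random


def mse(params, data):
    """Mean squared error of the line y = w * x + b on (x, y) pairs."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_step(params, data, lr):
    """One full-batch gradient descent step on the mean squared error."""
    w, b = params
    n = len(data)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
    return (w - lr * grad_w, b - lr * grad_b)


def fit_with_early_stopping(train, val, lr=0.1, max_epochs=1000, patience=10):
    """Train on `train`; stop once `val` loss has not improved for `patience` epochs."""
    params = (0.0, 0.0)
    best_params, best_val, best_epoch = params, mse(params, val), 0
    for epoch in range(1, max_epochs + 1):
        params = gradient_step(params, train, lr)
        val_loss = mse(params, val)
        if val_loss < best_val:
            best_params, best_val, best_epoch = params, val_loss, epoch
        elif epoch - best_epoch >= patience:
            break  # validation loss has stalled; keep the best parameters seen
    return best_params, best_val, best_epoch


# Synthetic data (hypothetical): y = 2x + 1 plus Gaussian noise, split 40 / 10.
rng = random.Random(0)
points = []
for _ in range(50):
    x = rng.uniform(-1.0, 1.0)
    points.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
train, val = points[:40], points[40:]

initial_val = mse((0.0, 0.0), val)
best_params, best_val, best_epoch = fit_with_early_stopping(train, val)
print(best_params, best_val, best_epoch)
```

The ten held-out points are never used for gradient steps, which is the cost the abstract's proposed criterion aims to remove.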
