ben-recht (21 bookmarks)

Workshop Abstract | Identifying and Understanding Deep Learning Phenomena
ICML 2019 workshop, June 15th 2019, Long Beach, CA

We solicit contributions that view the behavior of deep nets as natural phenomena, to be investigated with methods inspired by the natural sciences, such as physics, astronomy, and biology.
unit  workshop  acm  machine-learning  science  empirical  nitty-gritty  atoms  deep-learning  model-class  icml  data-science  rigor  replication  examples  ben-recht  physics 
8 weeks ago by nhaliday
[1806.09460] A Tour of Reinforcement Learning: The View from Continuous Control
This manuscript surveys reinforcement learning from the perspective of optimization and control with a focus on continuous control applications. It surveys the general formulation, terminology, and typical experimental implementations of reinforcement learning and reviews competing solution paradigms. In order to compare the relative merits of various techniques, this survey presents a case study of the Linear Quadratic Regulator (LQR) with unknown dynamics, perhaps the simplest and best studied problem in optimal control. The manuscript describes how merging techniques from learning theory and control can provide non-asymptotic characterizations of LQR performance and shows that these characterizations tend to match experimental behavior. In turn, when revisiting more complex applications, many of the observed phenomena in LQR persist. In particular, theory and experiment demonstrate the role and importance of models and the cost of generality in reinforcement learning algorithms. This survey concludes with a discussion of some of the challenges in designing learning systems that safely and reliably interact with complex and uncertain environments and how tools from reinforcement learning and controls might be combined to approach these challenges.
reinforcement-learning  control  ben-recht 
july 2018 by arsyed
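
A minimal sketch of the LQR baseline the survey builds its case study on, assuming the dynamics are known for simplicity (the paper's point is what changes when they are not); the toy double-integrator values, cost weights, and horizon are illustrative choices, not the paper's.

import numpy as np
from scipy.linalg import solve_discrete_are

# Toy double-integrator dynamics x_{t+1} = A x_t + B u_t with quadratic cost.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)            # state cost
R = np.array([[1.0]])    # control cost

# Optimal static feedback u_t = -K x_t from the discrete Riccati equation,
# computed from the true (A, B); the survey asks what to do when A, B are unknown.
P = solve_discrete_are(A, B, Q, R)
K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def rollout_cost(K, x0, horizon=100):
    """Accumulated quadratic cost of the policy u = -K x starting from x0."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += float(x @ Q @ x + u @ R @ u)
        x = A @ x + B @ u
    return cost

print(rollout_cost(K, np.array([1.0, 0.0])))
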
Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented *without any locking*. We present an update scheme called Hogwild which allows processors access to shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates only modify small parts of the decision variable, then Hogwild achieves a nearly optimal rate of convergence. We demonstrate experimentally that Hogwild outperforms alternative schemes that use locking by an order of magnitude.
papers  optimization  parallel  ben-recht  sgd  gradient-descent 
june 2017 by arsyed
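
A rough sketch of the lock-free update scheme described in the abstract, under toy assumptions: a sparse least-squares problem, Python threads, and an unsynchronized shared numpy array. CPython's GIL serializes much of the work; the point is the absence of locking, not speed.

import numpy as np
import threading

rng = np.random.default_rng(0)
d, n = 1_000, 10_000
w = np.zeros(d)          # shared parameter vector; no lock protects it

# Sparsity assumption: each example touches only a handful of coordinates.
idx = [rng.choice(d, size=5, replace=False) for _ in range(n)]
x_vals = [rng.normal(size=5) for _ in range(n)]
y = rng.normal(size=n)

def worker(examples, lr=0.01):
    # Plain SGD on a sparse least-squares loss; reads may be stale and writes
    # may overwrite another thread's work, exactly as the scheme allows.
    for i in examples:
        j, xv = idx[i], x_vals[i]
        err = w[j] @ xv - y[i]
        w[j] -= lr * err * xv

threads = [threading.Thread(target=worker, args=(range(k, n, 4),)) for k in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("mean squared error after unsynchronized passes:",
      np.mean([(w[idx[i]] @ x_vals[i] - y[i]) ** 2 for i in range(n)]))
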
[1509.01240] Train faster, generalize better: Stability of stochastic gradient descent
"We show that parametric models trained by a stochastic gradient method (SGM) with few iterations have vanishing generalization error. We prove our results by arguing that SGM is algorithmically stable in the sense of Bousquet and Elisseeff. Our analysis only employs elementary tools from convex and continuous optimization. We derive stability bounds for both convex and non-convex optimization under standard Lipschitz and smoothness assumptions.
Applying our results to the convex case, we provide new insights for why multiple epochs of stochastic gradient methods generalize well in practice. In the non-convex case, we give a new interpretation of common practices in neural networks, and formally show that popular techniques for training large deep models are indeed stability-promoting. Our findings conceptually underscore the importance of reducing training time beyond its obvious benefit."
papers  optimization  generalization  gradient-descent  sgd  ben-recht 
june 2017 by arsyed
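
A small numerical illustration of the stability argument, under assumptions made for the sketch (quadratic loss, synthetic data): run the same SGD trajectory on two datasets that differ in a single example and watch the parameter gap grow with the number of iterations.

import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# A neighboring dataset: identical except for the first example.
X2, y2 = X.copy(), y.copy()
X2[0], y2[0] = rng.normal(size=d), rng.normal()

def sgd(X, y, steps, lr=0.01, seed=2):
    # Same initialization and same sampling order on both datasets.
    order = np.random.default_rng(seed).integers(0, len(y), size=steps)
    w = np.zeros(X.shape[1])
    for i in order:
        w -= lr * (X[i] @ w - y[i]) * X[i]   # least-squares SGD step
    return w

for steps in (100, 1_000, 10_000):
    gap = np.linalg.norm(sgd(X, y, steps) - sgd(X2, y2, steps))
    print(steps, gap)   # the gap (a stability proxy) grows with the step count
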
[1611.03530] Understanding deep learning requires rethinking generalization
Despite their massive size, successful deep artificial neural networks can exhibit a remarkably small difference between training and test performance. Conventional wisdom attributes small generalization error either to properties of the model family, or to the regularization techniques used during training.
Through extensive systematic experiments, we show how these traditional approaches fail to explain why large neural networks generalize well in practice. Specifically, our experiments establish that state-of-the-art convolutional networks for image classification trained with stochastic gradient methods easily fit a random labeling of the training data. This phenomenon is qualitatively unaffected by explicit regularization, and occurs even if we replace the true images by completely unstructured random noise. We corroborate these experimental findings with a theoretical construction showing that simple depth two neural networks already have perfect finite sample expressivity as soon as the number of parameters exceeds the number of data points as it usually does in practice.
We interpret our experimental findings by comparison with traditional models.
papers  deep-learning  generalization  machine-learning  ben-recht 
december 2016 by arsyed
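
A scaled-down version of the paper's randomization test, with synthetic data and a small scikit-learn MLP standing in for the CIFAR-10/ImageNet networks used in the paper: random labels, more parameters than data points, and training accuracy that should still approach 100%.

import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n, d = 500, 32
X = rng.normal(size=(n, d))        # inputs: pure noise
y = rng.integers(0, 10, size=n)    # labels: drawn uniformly at random

# More parameters than data points (32*512 + 512*10 weights vs 500 examples),
# echoing the paper's finite-sample expressivity argument.
net = MLPClassifier(hidden_layer_sizes=(512,), max_iter=2000, tol=1e-6)
net.fit(X, y)
print("training accuracy on random labels:", net.score(X, y))
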
Bottoming Out – arg min blog
Now, I’ve been hammering the point in my previous posts that saddle points are not what makes non-convex optimization difficult. Here, when specializing to deep learning, even local minima are not getting in my way. Deep neural nets are just very easy to minimize.
machine-learning  deep-learning  optimization  rhetoric  speculation  research  hmm  research-program  acmtariat  generalization  metabuch  local-global  off-convex  ben-recht  extrema  org:bleg  nibble  sparsity  curvature  ideas  aphorism  convexity-curvature  explanans  volo-avolo  hardness 
june 2016 by nhaliday
The News on Auto-tuning – arg min blog
Bayesian optimization is not obviously better than randomized search on all fronts
critique  bayesian  optimization  machine-learning  expert  hmm  liner-notes  rhetoric  debate  acmtariat  ben-recht  mrtz  gwern  random  org:bleg  nibble  expert-experience 
june 2016 by nhaliday
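
A sketch of the random-search baseline the post defends; the ridge-regression objective and the log-uniform search range are assumptions chosen for illustration, not the post's benchmark.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=30, noise=5.0, random_state=0)
rng = np.random.default_rng(0)

best_alpha, best_score = None, -np.inf
for _ in range(50):                        # 50 independent random trials
    alpha = 10 ** rng.uniform(-4, 4)       # log-uniform over the search range
    score = cross_val_score(Ridge(alpha=alpha), X, y, cv=5).mean()
    if score > best_score:
        best_alpha, best_score = alpha, score

print("best alpha:", best_alpha, "cv score:", best_score)
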

related tags

2016  2017  accretion  acm  acmtariat  advanced  aphorism  approximation  atoms  bare-hands  bayesian  benchmarks  books  characterization  coarse-fine  composition-decomposition  compressed-sensing  concept  confluence  control  convergence  convexity-curvature  cool  critique  curvature  data-science  debate  decision-making  decision-theory  deep-learning  descriptive  direction  empirical  estimate  examples  expert-experience  expert  explanans  explanation  exposition  extrema  frontier  generalization  geometry  gradient-descent  ground-up  guide  gwern  hardness  hi-order-bits  hmm  iclr  icml  ideas  info-foraging  init  intricacy  isotropy  iteration-recursion  iterative-methods  kernels  learning-theory  levers  limits  linear-algebra  linear-regression  linearity  liner-notes  list  local-global  machine-learning  math.ds  math.mg  math  mathtariat  matrix-factorization  metabuch  michael-jordan  model-class  motivation  mrtz  music  nibble  nitty-gritty  nonlinearity  norms  numerics  off-convex  openai  optimization  org:bleg  org:inst  p:someday  p:whenever  papers  parallel  parsimony  perturbation  philosophy  physics  polynomials  presentation  quixotic  random  ranking  reading  realness  recommendations  reduction  reflection  reinforcement-learning  reinforcement  replication  repo  research-program  research  review  reviews  rhetoric  rigor  roadmap  robust  rounding  science  search  sequential  sgd  signal-noise  slides  sparsity  speculation  stability  stories  success  summary  synthesis  systematic-ad-hoc  talks  tcs  the-trenches  thinking  tightness  toolkit  top-n  tricks  tutorial  unit  unsupervised  values  video  volo-avolo  workshop  yoga 
