
How to Escape Saddle Points Efficiently – Off the convex path
A core emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown that gradient descent (GD) generically escapes saddle points asymptotically (see Rong Ge’s and Ben Recht’s blog posts), the critical open problem is one of efficiency — is GD able to move past saddle points quickly, or can it be slowed down significantly? How does the rate of escape scale with the ambient dimensionality? In this post, we describe our recent work with Rong Ge, Praneeth Netrapalli, and Sham Kakade, which provides the first provable positive answer to the efficiency question, showing that, rather surprisingly, GD augmented with suitable perturbations escapes saddle points efficiently; indeed, in terms of rate and dimension dependence it is almost as if the saddle points aren’t there!
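The idea of perturbed gradient descent can be illustrated with a minimal sketch: run plain GD, and whenever the gradient is small (a sign of being near a critical point, possibly a saddle), add a small random perturbation. The function, step sizes, and thresholds below are illustrative assumptions, not the tuning from the paper.

```python
import numpy as np

def grad(z):
    # Toy objective f(x, y) = x**4/4 - x**2/2 + y**2/2
    # (illustrative choice): saddle at (0, 0), minima at (+-1, 0).
    x, y = z
    return np.array([x**3 - x, y])

def perturbed_gd(z0, eta=0.1, radius=1e-2, g_thresh=1e-3, steps=500, seed=0):
    """Sketch of GD with random perturbations (hypothetical parameter names)."""
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        g = grad(z)
        if np.linalg.norm(g) < g_thresh:
            # Small gradient: possibly stuck at a saddle, so add a
            # random kick sampled from a sphere of the given radius.
            d = rng.normal(size=z.shape)
            z = z + radius * d / np.linalg.norm(d)
        else:
            z = z - eta * g
    return z

# Started exactly at the saddle (0, 0), plain GD would never move;
# the perturbed variant drifts into one of the minima (+-1, 0).
z = perturbed_gd([0.0, 0.0])
```

Plain GD has zero gradient at the saddle and stays put forever; the random kick gives the iterate a component along the negative-curvature direction, which GD then amplifies until the iterate falls into a minimum.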
acmtariat  org:bleg  nibble  liner-notes  machine-learning  acm  optimization  gradient-descent  local-global  off-convex  time-complexity  random  perturbation  michael-jordan  iterative-methods  research  learning-theory  math.DS  iteration-recursion 
July 2017 by nhaliday

bundles: academe, acm

