nhaliday + acm + off-convex   19

How to Escape Saddle Points Efficiently – Off the convex path
A core emerging problem in nonconvex optimization is the escape of saddle points. While recent research has shown that gradient descent (GD) generically escapes saddle points asymptotically (see Rong Ge’s and Ben Recht’s blog posts), the critical open problem is one of efficiency — is GD able to move past saddle points quickly, or can it be slowed down significantly? How does the rate of escape scale with the ambient dimensionality? In this post, we describe our recent work with Rong Ge, Praneeth Netrapalli, and Sham Kakade, which provides the first provable positive answer to the efficiency question, showing that, rather surprisingly, GD augmented with suitable perturbations escapes saddle points efficiently; indeed, in terms of rate and dimension dependence it is almost as if the saddle points aren’t there!
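The perturbed-GD idea described above can be sketched in a few lines: run ordinary gradient descent, and whenever the gradient is small (a possible saddle point), add a small random kick. This is a minimal sketch under assumed parameter names and thresholds (`r`, `g_thresh`, `t_noise` are all hypothetical), not the exact algorithm from the paper:

```python
import numpy as np

def perturbed_gd(grad, x0, eta=0.1, r=1e-2, g_thresh=1e-3,
                 t_noise=10, n_iters=1000, seed=0):
    """Gradient descent with occasional random perturbations near
    small-gradient (possibly saddle) points. Parameter names and
    thresholds are illustrative, not from the paper."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    last_perturb = -t_noise - 1
    for t in range(n_iters):
        g = grad(x)
        # A small gradient may indicate a saddle point: kick x by a
        # small random vector, at most once every t_noise iterations.
        if np.linalg.norm(g) < g_thresh and t - last_perturb > t_noise:
            x = x + r * rng.standard_normal(x.shape)
            last_perturb = t
        x = x - eta * grad(x)
    return x

# Toy objective f(x, y) = x^2/2 + y^4/4 - y^2/4:
# the origin is a saddle point; the minima sit at (0, +-1/sqrt(2)).
def grad_f(v):
    x, y = v
    return np.array([x, y**3 - 0.5 * y])

# Plain GD started at the saddle (0, 0) has zero gradient and never moves;
# the perturbed variant escapes toward one of the minima.
x_final = perturbed_gd(grad_f, np.zeros(2))
```

The kick only fires when the gradient is nearly zero, so away from saddle regions this behaves exactly like plain GD.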
acmtariat  org:bleg  nibble  liner-notes  machine-learning  acm  optimization  gradient-descent  local-global  off-convex  time-complexity  random  perturbation  michael-jordan  iterative-methods  research  learning-theory  math.DS  iteration-recursion 
july 2017 by nhaliday
Unsupervised learning, one notion or many? – Off the convex path
(Task A) Learning a distribution from samples. (Examples: Gaussian mixtures, topic models, variational autoencoders, …)

(Task B) Understanding latent structure in the data. This is not the same as (Task A); for example, principal component analysis, clustering, and manifold learning identify latent structure but don’t learn a distribution per se.

(Task C) Feature Learning. Learn a mapping from datapoint → feature vector such that classification tasks are easier to carry out on feature vectors than on datapoints. For example, unsupervised feature learning could lower the amount of labeled samples needed to learn a classifier, or be useful for domain adaptation.
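The three tasks can be made concrete with PCA, which identifies latent structure (Task B) and at the same time yields a feature map in the sense of Task C. A minimal NumPy sketch (the data and dimensions here are made up for illustration; real pipelines would use e.g. scikit-learn):

```python
import numpy as np

def pca_features(X, k):
    """Minimal PCA feature map: project centered data onto the
    top-k principal directions."""
    Xc = X - X.mean(axis=0)                # center the data
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                   # k-dimensional feature vectors

rng = np.random.default_rng(1)
# Synthetic cloud: large variance along the first axis, tiny along the second.
X = rng.standard_normal((200, 2)) * np.array([3.0, 0.1])
Z = pca_features(X, 1)                     # 1-D features capturing most variance
```

Note that `pca_features` learns no probability distribution over `X` — it only maps each datapoint to a lower-dimensional feature vector, which is exactly the distinction between Tasks A and B/C drawn above.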

Task B is often a subcase of Task C, as the intended user of “structure found in data” is a human (a scientist) who pores over the representation of the data to gain some intuition about its properties, and these “properties” can often be phrased as a classification task.

This post explains the relationship between Tasks A and C, and why they get mixed up in students’ minds. We hope there is also some food for thought here for experts, namely our discussion of the fragility of the usual “perplexity” definition of unsupervised learning. It explains why Task A doesn’t in practice lead to a good enough solution for Task C. For example, it has been believed for many years that, for deep learning, unsupervised pretraining should help supervised training, but this has been hard to show in practice.
acmtariat  org:bleg  nibble  machine-learning  acm  thinking  clarity  unsupervised  conceptual-vocab  concept  explanation  features  bayesian  off-convex  deep-learning  latent-variables  generative  intricacy  distribution  sampling 
june 2017 by nhaliday
