off-convex (34)

Information Processing: Mathematical Theory of Deep Neural Networks (Princeton workshop)
"Recently, long-past-due theoretical results have begun to emerge. These results, and those that will follow in their wake, will begin to shed light on the properties of large, adaptive, distributed learning architectures, and stand to revolutionize how computer science and neuroscience understand these systems."
hsu  scitariat  commentary  links  research  research-program  workshop  events  princeton  sanjeev-arora  deep-learning  machine-learning  ai  generalization  explanans  off-convex  nibble  frontier  speedometer  state-of-art  big-surf  announcement 
january 2018 by nhaliday
How to Escape Saddle Points Efficiently – Off the convex path
A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown that gradient descent (GD) generically escapes saddle points asymptotically (see Rong Ge's and Ben Recht's blog posts), the critical open problem is one of efficiency — is GD able to move past saddle points quickly, or can it be slowed down significantly? How does the rate of escape scale with the ambient dimensionality? In this post, we describe our recent work with Rong Ge, Praneeth Netrapalli and Sham Kakade, which provides the first provable positive answer to the efficiency question, showing that, rather surprisingly, GD augmented with suitable perturbations escapes saddle points efficiently; indeed, in terms of rate and dimension dependence it is almost as if the saddle points aren't there!
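The perturbed-GD idea can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's exact algorithm: the function name, step size, perturbation radius, and the quadratic test function f(x, y) = x² − y² (which has a strict saddle at the origin) are all illustrative choices.

```python
import numpy as np

def perturbed_gradient_descent(grad, x0, step=0.1, noise_radius=0.01,
                               grad_tol=1e-3, n_steps=200, seed=0):
    """Toy sketch of perturbed GD: take plain gradient steps, but when the
    gradient is small (a candidate saddle point), add a small random
    perturbation so the iterate can slide off strict saddle directions."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        g = grad(x)
        if np.linalg.norm(g) < grad_tol:
            # Near a stationary point: jump to a random point in a small ball.
            d = rng.normal(size=x.shape)
            x = x + noise_radius * d / np.linalg.norm(d)
        else:
            x = x - step * g
    return x

# f(x, y) = x^2 - y^2: strict saddle at the origin, escape direction is y.
grad = lambda p: np.array([2 * p[0], -2 * p[1]])

# Started exactly at the saddle, plain GD would be stuck forever;
# one small perturbation lets GD escape along the negative-curvature direction.
x_final = perturbed_gradient_descent(grad, [0.0, 0.0])
```

The key point the post makes precise is that the cost of such perturbations is mild: the escape rate has almost no dependence on the ambient dimension.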
acmtariat  org:bleg  nibble  liner-notes  machine-learning  acm  optimization  gradient-descent  local-global  off-convex  time-complexity  random  perturbation  michael-jordan  iterative-methods  research  learning-theory  math.DS  iteration-recursion 
july 2017 by nhaliday
Unsupervised learning, one notion or many? – Off the convex path
(Task A) Learning a distribution from samples. (Examples: Gaussian mixtures, topic models, variational autoencoders, …)

(Task B) Understanding latent structure in the data. This is not the same as (Task A); for example, principal component analysis, clustering, and manifold learning identify latent structure but don't learn a distribution per se.

(Task C) Feature Learning. Learn a mapping from datapoint → feature vector such that classification tasks are easier to carry out on feature vectors than on raw datapoints. For example, unsupervised feature learning could lower the number of labeled samples needed to learn a classifier, or be useful for domain adaptation.

Task B is often a subcase of Task C, as the intended users of "structure found in data" are humans (scientists) who pore over the representation of the data to gain intuition about its properties, and these properties can often themselves be phrased as classification tasks.
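PCA makes the Task B/C distinction concrete: it finds latent directions and a feature map without ever modeling a probability distribution over the data (contrast with Task A). A minimal NumPy sketch, where the toy line-plus-noise dataset is invented purely for illustration:

```python
import numpy as np

# Hypothetical toy data: 200 points near a 1-D line in 3-D space, plus noise.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = t @ np.array([[1.0, 2.0, 3.0]]) + 0.05 * rng.normal(size=(200, 3))

# Task B/C via PCA: identify the latent direction (structure) and map each
# datapoint to a low-dimensional feature. No distribution over X is learned.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
features = Xc @ Vt[0]          # one scalar feature per datapoint

# Fraction of variance captured by the first principal direction.
explained = S[0] ** 2 / (S ** 2).sum()
```

Here `features` would be the input to a downstream classifier (Task C), while inspecting `Vt[0]` is the Task B activity of reading off the latent structure.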

This post explains the relationship between Tasks A and C, and why they get mixed up in students' minds. We hope there is also some food for thought here for experts, namely our discussion of the fragility of the usual "perplexity" definition of unsupervised learning. The post explains why Task A doesn't in practice lead to good enough solutions for Task C. For example, it has been believed for many years that for deep learning, unsupervised pretraining should help supervised training, but this has been hard to demonstrate in practice.
acmtariat  org:bleg  nibble  machine-learning  acm  thinking  clarity  unsupervised  conceptual-vocab  concept  explanation  features  bayesian  off-convex  deep-learning  latent-variables  generative  intricacy  distribution  sampling 
june 2017 by nhaliday
Why is the Lin and Tegmark paper 'Why does deep and cheap learning work so well?' important? - Quora
To take the analogy further than I probably should, the resolution to the magic key problem might be not that the key is magical, but that the locks are particularly simple. For deep learning, my guess is that it's a bit of both.
q-n-a  qra  papers  liner-notes  deep-learning  off-convex  machine-learning  explanation  nibble  big-picture  explanans 
february 2017 by nhaliday

related tags

academia  accretion  accuracy  acm  acmtariat  ai  announcement  aphorism  atoms  bandits  bayesian  ben-recht  big-picture  big-surf  blog  boltzmann  characterization  clarity  commentary  composition-decomposition  computer-vision  concept  conceptual-vocab  conference  constraint-satisfaction  convexity-curvature  cool  correlation  course  critique  curvature  database  deep-learning  descriptive  dimensionality  direction  discussion  distribution  dynamical  egt  embeddings  events  evolution  expert-experience  expert  explanans  explanation  exposition  extrema  fall-2015  features  frontier  generalization  generative  gradient-descent  graphical-models  ground-up  hardness  hi-order-bits  high-dimension  hmm  homepage  hsu  ideas  init  interdisciplinary  intricacy  isotropy  iteration-recursion  iterative-methods  kernels  language  latent-variables  learning-theory  lecture-notes  lectures  levers  linear-algebra  linearity  liner-notes  links  list  local-global  machine-learning  markov  math.ds  math  mathtariat  matrix-factorization  metabuch  metrics  michael-jordan  mit  mixing  model-class  monte-carlo  motivation  mrtz  multi  neuro  nibble  nlp  no-go  nonlinearity  oly  online-learning  openai  optimization  org:bleg  org:mat  overflow  p:***  p:**  p:someday  pac  papers  pdf  people  perturbation  preprint  princeton  priors-posteriors  probability  prof  publishing  q-n-a  qra  quixotic  random  reflection  regularization  reinforcement  research-program  research  rhetoric  rigor  robust  roots  sample-complexity  sampling  sanjeev-arora  scitariat  sebastien-bubeck  seminar  sensitivity  slides  soft-question  sparsity  spectral  speculation  speedometer  stanford  state-of-art  stochastic-processes  stream  summary  synthesis  talks  tcs  tensors  things  thinking  time-complexity  toolkit  topics  tutorial  unit  unsupervised  vc-dimension  video  volo-avolo  workshop  yoga  👳 
