nhaliday + unsupervised   22

Unsupervised learning, one notion or many? – Off the convex path
(Task A) Learning a distribution from samples. (Examples: Gaussian mixtures, topic models, variational autoencoders, ...)

(Task B) Understanding latent structure in the data. This is not the same as Task A; for example, principal component analysis, clustering, manifold learning, etc. identify latent structure but don’t learn a distribution per se.

(Task C) Feature learning. Learn a mapping from datapoint → feature vector such that classification tasks are easier to carry out on feature vectors than on datapoints. For example, unsupervised feature learning could help lower the number of labeled samples needed to learn a classifier, or be useful for domain adaptation.
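As a concrete instance of Task A (the Gaussian-mixture example), the sketch below fits a one-dimensional Gaussian mixture by expectation-maximization (EM), one standard fitting method among several. It is a minimal pure-Python illustration under simplifying assumptions (1-D data, a fixed number of components `k`, quantile initialization); `fit_gmm_1d` and `normal_pdf` are hypothetical names, not code from the post.

```python
import math
import random  # only used for the synthetic data below


def normal_pdf(x, mu, var):
    """Density of N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=100):
    """Fit a k-component 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    n = len(xs)
    ordered = sorted(xs)
    # Initialize means at evenly spaced quantiles, variances at the overall variance.
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    overall_mean = sum(xs) / n
    vars_ = [sum((x - overall_mean) ** 2 for x in xs) / n] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, mus, vars_)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            vars_[j] = max(var_j, 1e-6)  # guard against collapse onto a single point
    return weights, mus, vars_


rng = random.Random(1)
data = [rng.gauss(-5, 1) for _ in range(200)] + [rng.gauss(5, 1) for _ in range(200)]
weights, means, variances = fit_gmm_1d(data, k=2, iters=50)
print(sorted(means))  # roughly [-5, 5]
```

The output is an explicit density (weights, means, variances), which is what distinguishes Task A from the structure-finding of Task B.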

Task B is often a subcase of Task C, as the intended users of “structure found in data” are humans (scientists) who pore over the representation of the data to gain some intuition about its properties, and these “properties” can often be phrased as a classification task.
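For Task B, the sketch below finds the top principal component of a point cloud by power iteration on the sample covariance. It recovers a latent direction without ever writing down a density, and projecting onto that direction yields a one-dimensional feature in the sense of Task C. This is a minimal illustration assuming small, dense data; `top_principal_component` and `project` are hypothetical helpers.

```python
import math
import random


def top_principal_component(points, iters=100, seed=0):
    """Return (mean, unit vector) of the top principal component, via power iteration."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    centered = [[p[i] - mean[i] for i in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]  # random start: an orthogonal start is very unlikely
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def project(points, mean, v):
    """1-D feature: coordinate of each centered point along v."""
    return [sum((p[i] - mean[i]) * v[i] for i in range(len(v))) for p in points]


rng = random.Random(2)
pts = []
for _ in range(300):
    t = rng.uniform(-10, 10)  # hidden 1-D latent variable
    pts.append((t + rng.gauss(0, 0.3), t + rng.gauss(0, 0.3)))
mean, v = top_principal_component(pts)
features = project(pts, mean, v)
```

The recovered `v` points (up to sign) along the diagonal where the latent `t` lives, and `features` approximately recovers `t` up to scale and sign.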

This post explains the relationship between Tasks A and C, and why they get mixed up in students’ minds. We hope there is also some food for thought here for experts, namely our discussion of the fragility of the usual “perplexity” definition of unsupervised learning. That discussion explains why, in practice, Task A doesn’t lead to good enough solutions for Task C. For example, it has been believed for many years that in deep learning, unsupervised pretraining should help supervised training, but this has been hard to show in practice.
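In its usual definition, perplexity is the exponential of the mean negative log-likelihood a model assigns to held-out samples. The sketch computes it and checks one toy way to see why such a score can be fragile (our illustration, which may differ from the post's own argument): a model that puts 99% of its mass on a useless symbol and only 1% on a copy of the true distribution is penalized by exactly log 100 nats per sample, however large the true entropy. The distributions and names are hypothetical.

```python
import math


def perplexity(prob, samples):
    """exp(mean negative log-likelihood) of held-out samples under the model prob(x)."""
    nll = -sum(math.log(prob(x)) for x in samples) / len(samples)
    return math.exp(nll)


N = 2 ** 20  # hypothetical alphabet size: about 20 bits of entropy per sample


def true_model(x):
    """Toy 'true' distribution: uniform over the symbols 1..N."""
    return 1.0 / N if 1 <= x <= N else 0.0


def mostly_garbage(x):
    """99% of the mass on one useless symbol 0, 1% on a copy of the true model."""
    return 0.01 * true_model(x) + (0.99 if x == 0 else 0.0)


held_out = list(range(1, 1001))  # held-out samples in the true support (fixed for the sketch)
p_true = perplexity(true_model, held_out)
p_garbage = perplexity(mostly_garbage, held_out)
# Gap in log-perplexity: log(100), about 4.605 nats, independent of N.
print(math.log(p_garbage) - math.log(p_true))
```

Since 99% of the garbage model's samples are useless, a score whose gap shrinks relative to the entropy (log N) as N grows says little about whether the model's representation would be useful for Task C.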
acmtariat  org:bleg  nibble  machine-learning  acm  thinking  clarity  unsupervised  conceptual-vocab  concept  explanation  features  bayesian  off-convex  deep-learning  latent-variables  generative  intricacy  distribution  sampling  grokkability-clarity  org:popup 
june 2017 by nhaliday
Predicting with confidence: the best machine learning idea you never heard of | Locklin on science
The advantages of conformal prediction are manifold. These ideas assume very little about the thing you are trying to forecast, the tool you’re using to forecast, or how the world works, and they still produce a pretty good confidence interval. Even if you’re an unrepentant Bayesian, using some of the machinery of conformal prediction, you can tell when things have gone wrong with your prior. The learners work online and, with some modifications and considerations, with batch learning. One of the nice things about calculating confidence intervals as part of your learning process is that they can actually lower error rates or be used in semi-supervised learning as well. Honestly, I think this is the best bag of tricks since boosting; everyone should know about and use these ideas.

The essential idea is that a “conformity function” exists. Effectively, you are constructing a sort of multivariate cumulative distribution function for your machine learning gizmo using the conformity function. Such CDFs exist for classical stuff like ARIMA and linear regression under the correct circumstances; CP brings the idea to machine learning in general, and to models like ARIMA when the standard parametric confidence intervals won’t work. Within the framework, the conformity function, whatever it may be, when used correctly can be guaranteed to give confidence intervals to within a probabilistic tolerance. The original formulation of conformal prediction, defined for sequences, is extremely computationally inefficient. The conditions can be relaxed in many cases, and the conformity function is in principle arbitrary, though good ones will produce narrower confidence regions. Somewhat confusingly, these good conformity functions are referred to as “efficient”, though they may not be computationally efficient.
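As a hedged illustration of the idea, here is split (also called inductive) conformal prediction for regression, a computationally cheap variant of the original sequential method. Assumptions: a least-squares line as the underlying predictor, the absolute residual as the score (by convention larger means stranger, hence "nonconformity"), and exchangeable data. `fit_line`, `split_conformal`, and `draw` are hypothetical names; any predictor and any score could be swapped in, with better scores giving narrower intervals.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; stands in for any point predictor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x


def split_conformal(train, calib, alpha=0.1):
    """Return x -> (lo, hi) with marginal coverage >= 1 - alpha if the data are exchangeable."""
    xs, ys = zip(*train)
    model = fit_line(xs, ys)
    # Nonconformity score: absolute residual on calibration data the model never saw.
    scores = sorted(abs(y - model(x)) for x, y in calib)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))  # rank of the conformal quantile
    q = scores[k - 1] if k <= n else math.inf  # too few calibration points: infinite interval
    return lambda x: (model(x) - q, model(x) + q)


rng = random.Random(3)


def draw(m):
    """Hypothetical data: y = 2x + 1 + Gaussian noise."""
    out = []
    for _ in range(m):
        x = rng.uniform(0, 10)
        out.append((x, 2 * x + 1 + rng.gauss(0, 1)))
    return out


interval = split_conformal(draw(200), draw(200), alpha=0.1)
test_points = draw(2000)
hits = 0
for x, y in test_points:
    lo, hi = interval(x)
    hits += lo <= y <= hi
coverage = hits / len(test_points)  # should land near 0.9
```

Only one model fit and one sort are needed, which is why the split variant avoids the cost of the original method that refits for every candidate label.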
techtariat  acmtariat  acm  machine-learning  bayesian  stats  exposition  research  online-learning  probability  decision-theory  frontier  unsupervised  confidence 
february 2017 by nhaliday

bundles : abstract  academe  acm  frame

related tags

abstraction  academia  acm  acmtariat  adversarial  ai  ai-control  algorithms  analogy  analysis  announcement  applications  approximation  arms  asia  atoms  attention  audio  automation  average-case  bandits  bare-hands  bayesian  ben-recht  benchmarks  best-practices  big-picture  biotech  bits  checking  checklists  china  clarity  classic  clever-rats  coarse-fine  commentary  comparison  competition  complement-substitute  composition-decomposition  computer-vision  concept  conceptual-vocab  conference  confidence  confusion  convexity-curvature  cooperate-defect  coordination  cost-benefit  crux  data-science  dataviz  debate  debugging  decision-making  decision-theory  deep-learning  deepgoog  definition  descriptive  developmental  dimensionality  direct-indirect  discrete  discussion  distribution  economics  empirical  engineering  enhancement  ensembles  entropy-like  equilibrium  error  events  evolution  experiment  expert-experience  explanation  exploratory  explore-exploit  exposition  extrema  facebook  features  flexibility  frontier  futurism  games  generalization  generative  google  gotchas  gradient-descent  graph-theory  graphs  grokkability-clarity  ground-up  guide  gwern  heuristic  hi-order-bits  high-dimension  hmm  homepage  howto  hsu  humanity  idk  incentives  information-theory  init  insight  intelligence  interdisciplinary  interview  intricacy  iteration-recursion  land  language  latent-variables  learning-theory  lectures  lens  lesswrong  linear-algebra  linearity  liner-notes  linguistics  links  list  local-global  machine-learning  markets  math  math.DS  matrix-factorization  methodology  mit  model-class  models  moloch  nature  network-structure  neuro  neuro-nitgrit  nibble  nitty-gritty  nlp  nonlinearity  number  off-convex  offense-defense  oly  online-learning  openai  operational  optimization  org:bleg  org:com  org:med  org:popup  orourke  overflow  p:someday  papers  pdf  performance  plots  podcast  
prediction  princeton  probability  prof  programming  q-n-a  questions  random  ratty  realness  reduction  reference  regularization  reinforcement  replication  research  research-program  retention  rhetoric  rigor  risk  robotics  saas  sample-complexity  sampling  sanjeev-arora  science  scifi-fantasy  scitariat  search  sequential  signal-noise  similarity  singularity  slides  smoothness  soft-question  sparsity  speculation  speedometer  stanford  startups  state-of-art  stats  street-fighting  summary  supply-demand  survey  synthesis  systems  talks  tcs  technology  techtariat  telos-atelos  the-self  thesis  thinking  threat-modeling  time  todo  top-n  track-record  trends  turing  tutorial  unit  unsupervised  vague  values  VC-dimension  video  volo-avolo  wiki  yoga 
