nhaliday + acm + acmtariat   64

Sequence Modeling with CTC
A visual guide to Connectionist Temporal Classification, an algorithm used to train deep neural networks in speech recognition, handwriting recognition and other sequence problems.
acmtariat  techtariat  org:bleg  nibble  better-explained  machine-learning  deep-learning  visual-understanding  visualization  analysis  let-me-see  research  sequential  audio  classification  model-class  exposition  language  acm  approximation  comparison  markov  iteration-recursion  concept  atoms  distribution  orders  DP  heuristic  optimization  trees  greedy  matching  gradient-descent 
december 2017 by nhaliday
[1709.06560] Deep Reinforcement Learning that Matters
I’ve been experimenting w/ various kinds of value function approaches to RL lately, and it’s striking how primitive and bad things seem to be.
At first I thought it was just that my code sucks, but then I played with the OpenAI baselines and nope, it’s the children that are wrong.
And now, what comes across my desk but this fantastic paper: https://arxiv.org/abs/1709.06560 How long until the replication crisis hits AI?

Seriously I’m not blown away by the PhDs’ records over the last 30 years. I bet you’d get better payoff funding eccentrics and amateurs.
There are essentially zero fundamentally new ideas in AI, the papers are all grotesquely hyperparameter tuned, nobody knows why it works.

Deep Reinforcement Learning Doesn't Work Yet: https://www.alexirpan.com/2018/02/14/rl-hard.html
Once, on Facebook, I made the following claim.

Whenever someone asks me if reinforcement learning can solve their problem, I tell them it can’t. I think this is right at least 70% of the time.
papers  preprint  machine-learning  acm  frontier  speedometer  deep-learning  realness  replication  state-of-art  survey  reinforcement  multi  twitter  social  discussion  techtariat  ai  nibble  org:mat  unaffiliated  ratty  acmtariat  liner-notes  critique  sample-complexity  cost-benefit  todo 
september 2017 by nhaliday
How to Escape Saddle Points Efficiently – Off the convex path
A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown that gradient descent (GD) generically escapes saddle points asymptotically (see Rong Ge’s and Ben Recht’s blog posts), the critical open problem is one of efficiency — is GD able to move past saddle points quickly, or can it be slowed down significantly? How does the rate of escape scale with the ambient dimensionality? In this post, we describe our recent work with Rong Ge, Praneeth Netrapalli and Sham Kakade, that provides the first provable positive answer to the efficiency question, showing that, rather surprisingly, GD augmented with suitable perturbations escapes saddle points efficiently; indeed, in terms of rate and dimension dependence it is almost as if the saddle points aren’t there!
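A minimal sketch of the idea in the excerpt above, not the paper's exact algorithm: plain gradient descent started on the stable direction of a saddle converges to the saddle, while the same iteration with a small random perturbation added whenever the gradient is tiny slides off toward a minimum. The toy objective, the function names, and the fixed `wait` between perturbations are all assumptions made for illustration.

```python
import numpy as np

def f(x):
    # Toy objective (an assumption): saddle at (0, 0), minima at (0, +1) and (0, -1).
    return 0.5 * x[0] ** 2 + 0.25 * x[1] ** 4 - 0.5 * x[1] ** 2

def grad_f(x):
    return np.array([x[0], x[1] ** 3 - x[1]])

def perturbed_gd(x0, step=0.1, grad_tol=1e-3, radius=1e-2,
                 wait=20, n_iters=1000, seed=0):
    """Gradient descent that adds noise from a small ball when the gradient is tiny.

    Setting radius=0.0 recovers plain gradient descent.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    last_perturb = -wait
    for t in range(n_iters):
        if np.linalg.norm(grad_f(x)) < grad_tol and t - last_perturb >= wait:
            direction = rng.normal(size=x.shape)
            direction /= np.linalg.norm(direction)
            # U**(1/d) makes the sample uniform over the ball's volume.
            x = x + radius * rng.uniform() ** (1.0 / x.size) * direction
            last_perturb = t
        x = x - step * grad_f(x)
    return x

# Start exactly on the saddle's stable direction (y = 0).
x_plain = perturbed_gd([1.0, 0.0], radius=0.0)  # plain GD: stays on y = 0
x_pert = perturbed_gd([1.0, 0.0])               # perturbed GD: leaves the saddle
print("plain GD:    ", x_plain, f(x_plain))
print("perturbed GD:", x_pert, f(x_pert))
```

The starting point is chosen adversarially so that plain GD never sees the escape direction; the excerpt's claim concerns how quickly the perturbed version gets away in general, which this toy does not measure.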
acmtariat  org:bleg  nibble  liner-notes  machine-learning  acm  optimization  gradient-descent  local-global  off-convex  time-complexity  random  perturbation  michael-jordan  iterative-methods  research  learning-theory  math.DS  iteration-recursion 
july 2017 by nhaliday
Unsupervised learning, one notion or many? – Off the convex path
(Task A) Learning a distribution from samples. (Examples: Gaussian mixtures, topic models, variational autoencoders, ...)

(Task B) Understanding latent structure in the data. This is not the same as Task A; for example, principal component analysis, clustering, manifold learning, etc. identify latent structure but don’t learn a distribution per se.

(Task C) Feature Learning. Learn a mapping from datapoint → feature vector such that classification tasks are easier to carry out on feature vectors rather than datapoints. For example, unsupervised feature learning could help lower the amount of labeled samples needed for learning a classifier, or be useful for domain adaptation.

Task B is often a subcase of Task C, as the intended users of “structure found in data” are humans (scientists) who pore over the representation of data to gain some intuition about its properties, and these “properties” can often be phrased as a classification task.

This post explains the relationship between Tasks A and C, and why they get mixed up in students’ minds. We hope there is also some food for thought here for experts, namely, our discussion about the fragility of the usual “perplexity” definition of unsupervised learning. It explains why Task A doesn’t in practice lead to a good enough solution for Task C. For example, it has been believed for many years that for deep learning, unsupervised pretraining should help supervised training, but this has been hard to show in practice.
acmtariat  org:bleg  nibble  machine-learning  acm  thinking  clarity  unsupervised  conceptual-vocab  concept  explanation  features  bayesian  off-convex  deep-learning  latent-variables  generative  intricacy  distribution  sampling 
june 2017 by nhaliday
Prékopa–Leindler inequality | Academically Interesting
Consider the following statements:
1. The shape with the largest volume enclosed by a given surface area is the n-dimensional sphere.
2. A marginal or sum of log-concave distributions is log-concave.
3. Any Lipschitz function of a standard n-dimensional Gaussian distribution concentrates around its mean.
What do these all have in common? Despite being fairly non-trivial and deep results, they all can be proved in less than half of a page using the Prékopa–Leindler inequality.

i.e., Brunn-Minkowski
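For reference, the inequality the excerpt refers to, in its standard form (the notation here is an assumption, not taken from the linked post), together with the indicator-function substitution that connects it to Brunn-Minkowski:

```latex
% Prékopa–Leindler: let 0 < \lambda < 1 and let f, g, h : \mathbb{R}^n \to [0, \infty)
% be measurable functions satisfying
\[
  h\bigl((1-\lambda)x + \lambda y\bigr) \;\ge\; f(x)^{1-\lambda}\, g(y)^{\lambda}
  \qquad \text{for all } x, y \in \mathbb{R}^n .
\]
% Then
\[
  \int_{\mathbb{R}^n} h \;\ge\;
  \Bigl(\int_{\mathbb{R}^n} f\Bigr)^{1-\lambda}
  \Bigl(\int_{\mathbb{R}^n} g\Bigr)^{\lambda} .
\]
% Taking f = 1_A, g = 1_B, h = 1_{(1-\lambda)A + \lambda B} for measurable sets A, B
% (with measurable Minkowski combination) gives the multiplicative
% Brunn--Minkowski inequality:
\[
  \operatorname{vol}\bigl((1-\lambda)A + \lambda B\bigr)
  \;\ge\; \operatorname{vol}(A)^{1-\lambda}\, \operatorname{vol}(B)^{\lambda} .
\]
```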
acmtariat  clever-rats  ratty  math  acm  geometry  measure  math.MG  estimate  distribution  concentration-of-measure  smoothness  regularity  org:bleg  nibble  brunn-minkowski  curvature  convexity-curvature 
february 2017 by nhaliday
Predicting with confidence: the best machine learning idea you never heard of | Locklin on science
The advantages of conformal prediction are manifold. These ideas assume very little about the thing you are trying to forecast, the tool you’re using to forecast or how the world works, and they still produce a pretty good confidence interval. Even if you’re an unrepentant Bayesian, using some of the machinery of conformal prediction, you can tell when things have gone wrong with your prior. The learners work online, and with some modifications and considerations, with batch learning. One of the nice things about calculating confidence intervals as a part of your learning process is that they can actually lower error rates, or be used in semi-supervised learning as well. Honestly, I think this is the best bag of tricks since boosting; everyone should know about and use these ideas.

The essential idea is that a “conformity function” exists. Effectively you are constructing a sort of multivariate cumulative distribution function for your machine learning gizmo using the conformity function. Such CDFs exist for classical stuff like ARIMA and linear regression under the correct circumstances; CP brings the idea to machine learning in general, and to models like ARIMA when the standard parametric confidence intervals won’t work. Within the framework, the conformity function, whatever it may be, when used correctly can be guaranteed to give confidence intervals to within a probabilistic tolerance. The original proofs and treatments of conformal prediction, defined for sequences, are extremely computationally inefficient. The conditions can be relaxed in many cases, and the conformity function is in principle arbitrary, though good ones will produce narrower confidence regions. Somewhat confusingly, these good conformity functions are referred to as “efficient”, though they may not be computationally efficient.
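To make the excerpt concrete, here is a hedged sketch of the cheap split (inductive) variant of conformal prediction, which the excerpt does not describe in detail. It uses the absolute residual on a held-out calibration set as the (non)conformity score. The function name, the synthetic data, and the linear model are assumptions made for illustration.

```python
import numpy as np

def split_conformal_interval(residuals_cal, y_pred_new, alpha=0.1):
    """Interval around y_pred_new with about (1 - alpha) coverage.

    residuals_cal: calibration residuals (y - prediction) on held-out data.
    The coverage guarantee assumes calibration and new points are exchangeable.
    """
    scores = np.sort(np.abs(residuals_cal))
    n = len(scores)
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = scores[k - 1] if k <= n else np.inf
    return y_pred_new - q, y_pred_new + q

# Synthetic example (an assumption): linear signal with heavy-tailed noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=3000)
y = 2.0 * x + rng.standard_t(df=3, size=3000)
x_tr, y_tr = x[:500], y[:500]            # fit the model
x_cal, y_cal = x[500:1000], y[500:1000]  # calibrate the interval width
x_te, y_te = x[1000:], y[1000:]          # check coverage

slope, intercept = np.polyfit(x_tr, y_tr, deg=1)

def predict(v):
    return slope * v + intercept

lower, upper = split_conformal_interval(y_cal - predict(x_cal), predict(x_te), alpha=0.1)
coverage = np.mean((y_te >= lower) & (y_te <= upper))
print(f"empirical coverage: {coverage:.3f}")
```

Nothing in the interval construction depends on the model being linear or the noise being Gaussian; a better-fitting model simply yields smaller residuals and hence narrower intervals, which is the sense of “efficient” in the excerpt.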
techtariat  acmtariat  acm  machine-learning  bayesian  stats  exposition  research  online-learning  probability  decision-theory  frontier  unsupervised  confidence 
february 2017 by nhaliday
A Fervent Defense of Frequentist Statistics - Less Wrong
Short summary. This essay makes many points, each of which I think is worth reading, but if you are only going to understand one point I think it should be “Myth 5” below, which describes the online learning framework as a response to the claim that frequentist methods need to make strong modeling assumptions. Among other things, online learning allows me to perform the following remarkable feat: if I’m betting on horses, and I get to place bets after watching other people bet but before seeing which horse wins the race, then I can guarantee that after a relatively small number of races, I will do almost as well overall as the best other person, even if the number of other people is very large (say, 1 billion), and their performance is correlated in complicated ways.

If you’re only going to understand two points, then also read about the frequentist version of Solomonoff induction, which is described in “Myth 6”.


If you are like me from, say, two years ago, you are firmly convinced that Bayesian methods are superior and that you have knockdown arguments in favor of this. If this is the case, then I hope this essay will give you an experience that I myself found life-altering: the experience of having a way of thinking that seemed unquestionably true slowly dissolve into just one of many imperfect models of reality. This experience helped me gain more explicit appreciation for the skill of viewing the world from many different angles, and of distinguishing between a very successful paradigm and reality.

If you are not like me, then you may have had the experience of bringing up one of many reasonable objections to normative Bayesian epistemology, and having it shot down by one of many “standard” arguments that seem wrong but not for easy-to-articulate reasons. I hope to lend some reprieve to those of you in this camp, by providing a collection of “standard” replies to these standard arguments.
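The horse-betting guarantee in the short summary above is the kind of result given by the exponential weights ("Hedge") algorithm from online learning. The sketch below is one standard instance, not necessarily the construction used in the essay: with per-round losses in [0, 1] and N experts over T rounds, the learner's cumulative expected loss exceeds the best expert's by at most sqrt(T ln N / 2), for any loss sequence. The random losses, the sizes, and the "good" expert are assumptions made for illustration.

```python
import numpy as np

def hedge(losses, eta):
    """Exponential weights over experts.

    losses: (T, N) array with entries in [0, 1]; row t is revealed only after
    the learner commits to its distribution for round t.
    Returns the learner's expected loss in each round.
    """
    T, N = losses.shape
    log_w = np.zeros(N)  # log-weights, for numerical stability
    learner_loss = np.empty(T)
    for t in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()                  # current distribution over experts
        learner_loss[t] = p @ losses[t]
        log_w -= eta * losses[t]      # downweight experts that did badly
    return learner_loss

rng = np.random.default_rng(0)
T, N = 2000, 1000
losses = rng.random((T, N))
losses[:, 7] *= 0.5                   # hypothetical: expert 7 is better on average

eta = np.sqrt(8 * np.log(N) / T)      # learning rate that yields the bound below
learner = hedge(losses, eta)
regret = learner.sum() - losses.sum(axis=0).min()
bound = np.sqrt(T * np.log(N) / 2)
print(f"regret {regret:.1f} <= bound {bound:.1f}")
```

The bound grows only with ln N, which is why a very large pool of other bettors (the essay's 1 billion) costs little, and it holds without any assumption on how the experts' losses are generated or correlated.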
bayesian  philosophy  stats  rhetoric  advice  debate  critique  expert  lesswrong  commentary  discussion  regularizer  essay  exposition  🤖  aphorism  spock  synthesis  clever-rats  ratty  hi-order-bits  top-n  2014  acmtariat  big-picture  acm  iidness  online-learning  lens  clarity  unit  nibble  frequentist  s:**  expert-experience  subjective-objective 
september 2016 by nhaliday
