nhaliday + sample-complexity   8

[1709.06560] Deep Reinforcement Learning that Matters
https://twitter.com/WAWilsonIV/status/912505885565452288
I’ve been experimenting w/ various kinds of value function approaches to RL lately, and it’s striking how primitive and bad things seem to be
At first I thought it was just that my code sucks, but then I played with the OpenAI baselines and nope, it’s the children that are wrong.
And now, what comes across my desk but this fantastic paper: https://arxiv.org/abs/1709.06560 How long until the replication crisis hits AI?

https://twitter.com/WAWilsonIV/status/911318326504153088
Seriously, I’m not blown away by the PhDs’ records over the last 30 years. I bet you’d get better payoff funding eccentrics and amateurs.
There are essentially zero fundamentally new ideas in AI; the papers are all grotesquely hyperparameter-tuned, and nobody knows why it works.

Deep Reinforcement Learning Doesn't Work Yet: https://www.alexirpan.com/2018/02/14/rl-hard.html
Once, on Facebook, I made the following claim.

Whenever someone asks me if reinforcement learning can solve their problem, I tell them it can’t. I think this is right at least 70% of the time.
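As a rough illustration of the seed-sensitivity problem that [1709.06560] documents (this is a toy sketch, not the paper's actual benchmark setup): the same agent, with the same hyperparameters, trained on a fixed small MDP under ten different random seeds. The MDP, the agent, and every constant below are invented for illustration.

# Toy sketch only: same agent, same hyperparameters, ten random seeds.
# Everything here is made up for illustration; it is not the setup from
# arxiv.org/abs/1709.06560.
import numpy as np

N_STATES, N_ACTIONS, HORIZON = 10, 2, 50
ENV_RNG = np.random.default_rng(0)                         # fixed environment,
MEAN_REWARD = ENV_RNG.normal(size=(N_STATES, N_ACTIONS))   # identical across runs

def run_agent(seed, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    """Tabular Q-learning on the fixed toy MDP; only the run's randomness varies."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    returns = []
    for _ in range(episodes):
        s, total = 0, 0.0
        for _ in range(HORIZON):
            # epsilon-greedy action selection
            a = int(rng.integers(N_ACTIONS)) if rng.random() < eps else int(q[s].argmax())
            r = MEAN_REWARD[s, a] + rng.normal(scale=0.5)    # noisy reward
            s_next = (s + (1 if a == 1 else -1)) % N_STATES  # walk on a ring of states
            q[s, a] += alpha * (r + gamma * q[s_next].max() - q[s, a])
            s, total = s_next, total + r
        returns.append(total)
    return float(np.mean(returns[-50:]))  # average return over the last 50 episodes

scores = [run_agent(seed) for seed in range(10)]
print(f"final return over 10 seeds: {np.mean(scores):.2f} +/- {np.std(scores):.2f}")

Even on a problem this small the seed-to-seed spread is visible; the paper's argument is that on standard continuous-control benchmarks that spread can be comparable to the reported gaps between algorithms.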
papers  preprint  machine-learning  acm  frontier  speedometer  deep-learning  realness  replication  state-of-art  survey  reinforcement  multi  twitter  social  discussion  techtariat  ai  nibble  org:mat  unaffiliated  ratty  acmtariat  liner-notes  critique  sample-complexity  cost-benefit  todo 
september 2017 by nhaliday

bundles : acm

related tags

academia  acm  acmtariat  advanced  ai  ai-control  arms  audio  automation  bandits  bayesian  biotech  classification  comparison  computer-vision  concept  convexity-curvature  cost-benefit  course  critique  deep-learning  deepgoog  descriptive  dimensionality  discussion  enhancement  examples  expert  expert-experience  exposition  extrema  frontier  futurism  games  generalization  generative  gradient-descent  graphical-models  ground-up  guide  heuristic  hsu  init  interdisciplinary  interview  language  learning-theory  lecture-notes  liner-notes  linguistics  local-global  machine-learning  markov  math  metabuch  metrics  model-class  monte-carlo  multi  neuro  neuro-nitgrit  nibble  nlp  off-convex  offense-defense  online-learning  openai  operational  optimization  org:bleg  org:mat  p:***  PAC  papers  pdf  podcast  preprint  princeton  priors-posteriors  random  ratty  realness  reduction  reinforcement  replication  research  risk  robotics  sample-complexity  sanjeev-arora  scifi-fantasy  scitariat  sebastien-bubeck  signal-noise  singularity  social  speedometer  state-of-art  summary  survey  technology  techtariat  todo  toolkit  trends  turing  twitter  unaffiliated  unit  unsupervised  vc-dimension  volo-avolo  yoga  👳 
