nhaliday + gradient-descent   39

Sequence Modeling with CTC
A visual guide to Connectionist Temporal Classification, an algorithm used to train deep neural networks in speech recognition, handwriting recognition and other sequence problems.
acmtariat  techtariat  org:bleg  nibble  better-explained  machine-learning  deep-learning  visual-understanding  visualization  analysis  let-me-see  research  sequential  audio  classification  model-class  exposition  language  acm  approximation  comparison  markov  iteration-recursion  concept  atoms  distribution  orders  DP  heuristic  optimization  trees  greedy  matching  gradient-descent 
december 2017 by nhaliday
New Theory Cracks Open the Black Box of Deep Learning | Quanta Magazine
A new idea called the “information bottleneck” is helping to explain the puzzling success of today’s artificial-intelligence algorithms — and might also explain how human brains learn.

sounds like he's just talking about autoencoders?
news  org:mag  org:sci  popsci  announcement  research  deep-learning  machine-learning  acm  information-theory  bits  neuro  model-class  big-surf  frontier  nibble  hmm  signal-noise  deepgoog  expert  ideas  wild-ideas  summary  talks  video  israel  roots  physics  interdisciplinary  ai  intelligence  shannon  giants  arrows  preimage  lifts-projections  composition-decomposition  characterization  markov  gradient-descent  papers  liner-notes  experiment  hi-order-bits  generalization  expert-experience  explanans  org:inst  speedometer 
september 2017 by nhaliday
Superintelligence Risk Project Update II

For example, I asked him what he thought of the idea that we could get AGI with current techniques, primarily deep neural nets and reinforcement learning, without learning anything new about how intelligence works or how to implement it ("Prosaic AGI" [1]). He didn't think this was possible, and believes there are deep conceptual issues we still need to get a handle on. He's also less impressed with deep learning than he was before he started working in it: in his experience it's a much more brittle technology than he had been expecting. Specifically, when trying to replicate results, he's often found that they depend on a bunch of parameters being in just the right range, and without that the systems don't perform nearly as well.

The bottom line, to him, was that since we are still many breakthroughs away from getting to AGI, we can't productively work on reducing superintelligence risk now.

He told me that he worries that the AI risk community is not solving real problems: they're making deductions and inferences that are self-consistent but not being tested or verified in the world. Since we can't tell if that's progress, it probably isn't. I asked if he was referring to MIRI's work here, and he said their work was an example of the kind of approach he's skeptical about, though he wasn't trying to single them out. [2]

Earlier this week I had a conversation with an AI researcher [1] at one of the main industry labs as part of my project of assessing superintelligence risk. Here's what I got from them:

They see progress in ML as almost entirely constrained by hardware and data, to the point that if today's hardware and data had existed in the mid-1950s researchers would have gotten to approximately our current state within ten to twenty years. They gave the example of backprop: we saw how to train multi-layer neural nets decades before we had the computing power to actually train these nets to do useful things.

Similarly, people talk about AlphaGo as a big jump, where Go went from being "ten years away" to "done" within a couple years, but they said it wasn't like that. If Go work had stayed in academia, with academia-level budgets and resources, it probably would have taken nearly that long. What changed was a company seeing promising results, realizing what could be done, and putting way more engineers and hardware on the project than anyone had previously done. AlphaGo couldn't have happened earlier because the hardware wasn't there yet, and was only able to be brought forward by massive application of resources.

Summary: I'm not convinced that AI risk should be highly prioritized, but I'm also not convinced that it shouldn't. Highly qualified researchers in a position to have a good sense of the field have massively different views on core questions like how capable ML systems are now, how capable they will be soon, and how we can influence their development. I do think these questions are possible to get a better handle on, but I think this would require much deeper ML knowledge than I have.
ratty  core-rats  ai  risk  ai-control  prediction  expert  machine-learning  deep-learning  speedometer  links  research  research-program  frontier  multi  interview  deepgoog  games  hardware  performance  roots  impetus  chart  big-picture  state-of-art  reinforcement  futurism  🤖  🖥  expert-experience  singularity  miri-cfar  empirical  evidence-based  speculation  volo-avolo  clever-rats  acmtariat  robust  ideas  crux  atoms  detail-architecture  software  gradient-descent 
july 2017 by nhaliday
How to Escape Saddle Points Efficiently – Off the convex path
A core, emerging problem in nonconvex optimization involves the escape of saddle points. While recent research has shown that gradient descent (GD) generically escapes saddle points asymptotically (see Rong Ge’s and Ben Recht’s blog posts), the critical open problem is one of efficiency — is GD able to move past saddle points quickly, or can it be slowed down significantly? How does the rate of escape scale with the ambient dimensionality? In this post, we describe our recent work with Rong Ge, Praneeth Netrapalli and Sham Kakade, that provides the first provable positive answer to the efficiency question, showing that, rather surprisingly, GD augmented with suitable perturbations escapes saddle points efficiently; indeed, in terms of rate and dimension dependence it is almost as if the saddle points aren’t there!
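
A rough sketch of the idea in the excerpt above, not the algorithm from the post: plain gradient descent started exactly at a saddle point never moves, while gradient descent that adds a small random kick whenever the gradient is nearly zero slides off the saddle and reaches a minimum. Everything here is an assumption made for illustration: the toy function f(x, y) = x^2/2 - y^2/2 + y^4/4 (saddle at the origin, minima at (0, 1) and (0, -1)), the names grad and gradient_descent, and all parameter values. The published method also includes a stopping test that this sketch leaves out, so the sketch keeps kicking (harmlessly, within the kick radius) once it sits near a minimum.

```python
import math
import random


def grad(p):
    """Gradient of the toy function f(x, y) = x^2/2 - y^2/2 + y^4/4."""
    x, y = p
    return (x, -y + y ** 3)


def gradient_descent(start, step=0.1, iters=500, perturb=False,
                     grad_tol=1e-3, radius=1e-2, wait=20, seed=0):
    """Plain GD, or GD with occasional random kicks when perturb=True.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    x, y = start
    last_kick = -wait  # allow a kick on the very first iteration
    for t in range(iters):
        gx, gy = grad((x, y))
        nearly_flat = math.hypot(gx, gy) <= grad_tol
        if perturb and nearly_flat and t - last_kick >= wait:
            # Kick: sample a point uniformly from a small disc around (x, y).
            angle = rng.uniform(0.0, 2.0 * math.pi)
            r = radius * math.sqrt(rng.random())
            x += r * math.cos(angle)
            y += r * math.sin(angle)
            last_kick = t
            gx, gy = grad((x, y))
        # Ordinary gradient step.
        x -= step * gx
        y -= step * gy
    return x, y


if __name__ == "__main__":
    # Without kicks the iterate stays at the saddle (0, 0).
    print(gradient_descent((0.0, 0.0)))
    # With kicks it ends close to one of the minima, (0, 1) or (0, -1).
    print(gradient_descent((0.0, 0.0), perturb=True))
```

The wait parameter spaces kicks apart so that each one has time to either carry the iterate away from a saddle or die out; which of the two minima is reached depends on the random direction of the first kick.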
acmtariat  org:bleg  nibble  liner-notes  machine-learning  acm  optimization  gradient-descent  local-global  off-convex  time-complexity  random  perturbation  michael-jordan  iterative-methods  research  learning-theory  math.DS  iteration-recursion 
july 2017 by nhaliday

