gradient-descent   105

« earlier    

3.17.3 Gradient Descent only Converges to Minimizers on Vimeo
"We show that gradient descent generically does not converge to saddle points. This is proved using the Stable Manifold theorem from dynamical systems theory."
talks  optimization  gradient-descent  saddle-point 
august 2017 by arsyed
The Gradient: A Visual Descent
In this post I aim to visually, mathematically and programatically explain the gradient, and how its understanding is crucial for gradient descent.
july 2017 by Tafkas
Decoding the Enigma with Recurrent Neural Networks
I am blown away by this -- given that Recurrent Neural Networks are Turing-complete, they can actually automate cryptanalysis given sufficient resources, at least to the degree of simulating the internal workings of the Enigma algorithm given plaintext, ciphertext and key:
The model needed to be very large to capture all the Enigma’s transformations. I had success with a single-celled LSTM model with 3000 hidden units. Training involved about a million steps of batched gradient descent: after a few days on a k40 GPU, I was getting 96-97% accuracy!
machine-learning  deep-learning  rnns  enigma  crypto  cryptanalysis  turing  history  gpus  gradient-descent 
july 2017 by jm

« earlier    

related tags

academia  acm  acmtariat  adam  adaptive-control  adaptive  adversarial  ai-control  ai  algorithm  algorithms  analogy  analysis  ankur-moitra  applications  approximation  arxiv  async  atoms  attention  auto-learning  average-case  bandits  bare-hands  batch-norm  bayesian  because-proof  ben-recht  benchmarks  best-practices  better-explained  bias-variance  big-picture  boltzmann  by:sebastian-ruder  c++  characterization  cheatsheet  checklists  circuits  classification  clever-rats  coarse-fine  code  coding-theory  combo-optimization  commentary  comparison  competition  complement-substitute  composition-decomposition  compressed-sensing  computational-complexity  computer-vision  concentration-of-measure  concept  concurrency  conditional  conjugate-gradient-descent  consider:looking-to-see  consider:representation  convexity-curvature  cool  cooperate-defect  coordination  cost-benefit  counting  course  crux  cryptanalysis  crypto  curvature  debate  debugging  decision-making  decision-theory  deep-learning  deepgoog  descent  descriptive  details  developmental  differential  dimensionality  direct-indirect  discrete  discussion  distributed  dl  duality  dynamic  dynamical-systems  dynamical  economics  embeddings  empirical  engineering  enigma  ensembles  equilibrium  error  evolution  expanders  expectation-maximization  experiment  expert-experience  expert  explanans  explanation  explore-exploit  exposition  extrema  fads-and-fallacies  flexibility  fourier  frontier  game-theory  games  gbm  generalization  generative  google  gpus  gradient  graph-theory  graphical-models  graphs  ground-up  guide  gwern  hacks  hadoop  hashing  heuristic  hi-order-bits  high-dimension  higher-ed  history  howto  hsu  huge-data-the-biggest  humanity  ideas  incentives  information-geometry  init  insight  intelligence  interdisciplinary  intricacy  iteration-recursion  iterative-methods  jobs  justin-domke  land  language  latent-variables  learning-theory  learning  lecture-notes  lens  lesswrong  let-me-see  levers  library  libs  linear-algebra  linear-models  linear-programming  linear-regression  linearity  liner-notes  links  list  local-global  machine-learning  machine.learning  machine_learning  machinelearning  mapreduce  marginal  markets  markov  math.ds  math  mathematics  matrix-factorization  matrix  meta-learning  metabuch  metropolis  michael-jordan  mihai  minimization  mit  ml  mltheory  model-class  models  moloch  moments  momentum  monte-carlo  motivation  mrtz  multi  multithreading  natural-gradient  nature  neural-networks  neuralnet  neuro-nitgrit  neuro  nibble  nitty-gritty  nlp  nn  norms  nudge-targets  number  numerics  off-convex  online-learning  openai  opt  optimisation  optimization  org:bleg  org:inst  org:mat  org:med  oscillation  overview  p:***  p:someday  p:whenever  pac  papers  parallel  pdf  pegasos  performance  perturbation  philosophy  polynomials  prediction  preprint  presentation  princeton  programming  project  python  q-n-a  r  random-matrices  random-networks  random  rather-interesting  ratty  read  reading  realness  reasonable-approach  reduction  ref  reference  reflection  regression  regularization  reinforcement  replication  representation  research-article  research  retention  rhetoric  rigorous-crypto  risk  rnns  rounding  s:*  saas  saddle-point  sample-complexity  sampling  sanjeev-arora  science  scitariat  sdp  search-engine  search  sebastien-bubeck  sequential  sgd  signal-noise  silos-in-action  similarity  slides  smoothness  software  sparsity  spectral-projected-gradient  spectral  speculation  speedometer  stanford  stat-mech  state-of-art  statistics  stochastic-approximation  stochastic-processes  stochastic  stories  sublinear  submodular  summary  supply-demand  survey  svm  synthesis  systems  talks  tcs  teaching  techtariat  telos-atelos  tensorflow  the-self  thinking  this-analysis-justifies  threat-modeling  tim-roughgarden  time-complexity  time  to:watch  todo  toolkit  track-record  turing  tutorial  tutorials  unit  unsupervised  valiant  values  video  visual-understanding  visualization  volo-avolo  vowpal  wiki  wire-guided  word2vec  wrapper  yoga  👳 

Copy this bookmark: