seq2seq   117


Why you should care about byte-level sequence-to-sequence models in NLP
This blog post explains how byte-level models work, where their benefits come from, and how they relate to other models, in particular character-level and word-level ones.
rnn  seq2seq  byte  bytes  deep-learning  oov  nlp 
20 days ago by nharbour
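The core appeal of byte-level models described in the post can be seen in a few lines: every Unicode string maps onto a fixed vocabulary of 256 byte IDs, so out-of-vocabulary tokens cannot occur. A minimal sketch (function names are my own, not from the post):

```python
# Byte-level "tokenization": any Unicode string becomes a sequence of
# byte IDs drawn from a fixed vocabulary of 256, so there is no OOV.
def to_byte_ids(text: str) -> list[int]:
    return list(text.encode("utf-8"))

def from_byte_ids(ids: list[int]) -> str:
    return bytes(ids).decode("utf-8")

ids = to_byte_ids("naïve")        # 'ï' expands to two bytes under UTF-8
assert max(ids) < 256             # vocabulary size is always 256
assert from_byte_ids(ids) == "naïve"  # encoding is lossless
```

The trade-off, as usual, is longer sequences: one character can cost up to four byte tokens.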
[1804.07915] A Stable and Effective Learning Strategy for Trainable Greedy Decoding
Beam search is a widely used approximate search strategy for neural network decoders, and it generally outperforms simple greedy decoding on tasks like machine translation. However, this improvement comes at substantial computational cost. In this paper, we propose a flexible new method that allows us to reap nearly the full benefits of beam search with nearly no additional computational cost. The method revolves around a small neural network actor that is trained to observe and manipulate the hidden state of a previously-trained decoder. To train this actor network, we introduce the use of a pseudo-parallel corpus built using the output of beam search on a base model, ranked by a target quality metric like BLEU. Our method is inspired by earlier work on this problem, but requires no reinforcement learning, and can be trained reliably on a range of models. Experiments on three parallel corpora and three architectures show that the method yields substantial improvements in translation quality and speed over each base system.
seq2seq  rnn  decoding  beam-search  via:chl 
september 2018 by arsyed
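The abstract's key mechanism, an actor that observes and manipulates the frozen decoder's hidden state before each greedy step, can be sketched as follows. This is a toy illustration, not the paper's implementation; the actor here is a single randomly initialized layer, and all names are hypothetical:

```python
import numpy as np

# Toy sketch: a small "actor" reads the base decoder's hidden state h
# and returns a nudged state h + f(h). In the paper, f is trained on a
# pseudo-parallel corpus built from beam-search outputs ranked by BLEU,
# so that greedy decoding from the adjusted state matches beam quality.
rng = np.random.default_rng(0)
hidden_size = 8

# Stand-in for the actor's learned weights (the base decoder stays frozen).
W_actor = rng.normal(scale=0.01, size=(hidden_size, hidden_size))

def actor_adjust(h: np.ndarray) -> np.ndarray:
    """Observe h and return the manipulated hidden state h + tanh(h W)."""
    return h + np.tanh(h @ W_actor)

h = rng.normal(size=hidden_size)
h_adj = actor_adjust(h)
assert h_adj.shape == h.shape  # same state shape, adjusted in place
```

Because only the tiny actor runs per step, decoding stays near greedy speed while recovering most of beam search's quality gain.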
Attention? Attention!
Attention has been a fairly popular concept and a useful tool in the deep learning community in recent years. In this post, we are going to look into how attention...
tutorial  tutorials  attention  deep-learning  seq2seq 
september 2018 by nharbour
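The attention mechanism the bookmarked post surveys reduces, in its scaled dot-product form, to softmax(QKᵀ/√d_k)V. A minimal numpy sketch of that formula (shapes and names chosen for illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Return softmax(Q K^T / sqrt(d_k)) V and the attention weights."""
    d_k = K.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))   # 2 queries, dimension 4
K = rng.normal(size=(3, 4))   # 3 keys
V = rng.normal(size=(3, 4))   # 3 values
out, w = scaled_dot_product_attention(Q, K, V)
assert out.shape == (2, 4)                 # one output per query
assert np.allclose(w.sum(axis=-1), 1.0)    # weights per query sum to 1
```

Each output row is a convex combination of the value rows, weighted by query-key similarity; that is the whole trick.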


related tags

ai  algorithms  amazing  asr-error  asr  ast  attention  aug1  autoencoder  bayesian  beam-search  blog  brenden-lake  byte  bytes  causal-convnet  chatbot  chollet  cnn  code  colab  convnet  convolutional  critique  debugging  decoding  deep-learning  deep  deep_learning  deeplearning  discrete  discussion  dothis  dtw  e2e  encoder-decoder  encoding  finance  financial  forecast  forecasting  generalization  geometry  github  google  graves  history  howto  icml2017  inference  information  jit  keras  layperson  learning  legaldeck  library  libs  links  lstm  machine-translation  machine_learning  machine_translation  machinecomprehension  machinelearning  metric  ml  models  mt  nearest  neighbor  neural-mt  neural-net  nlp  nlproc  nmt  nn  oov  paper  papers  production  python  pytorch  q&a  reading-lists  readit  reddit  reinforcementlearning  research  rnn  rouge  script  search  seqm  sequence  sequential-modeling  speech  speech_recognition  summarisation  summariser  summarization  summarizer  summary  tensorflow  tf  theano  time-series  tips  tnn  trace  transducer  transformer  translation  tts  tutorial  tutorials  uber  visualization  vqvae  wavenet 
