rnn   1531

Why you should care about byte-level sequence-to-sequence models in NLP
This blog post explains how byte-level models work, where their benefits come from, and how they relate to other models, in particular character-level and word-level models.
rnn  seq2seq  byte  bytes  deep-learning  oov  nlp 
yesterday by nharbour
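As a rough illustration of the byte-level idea in the entry above (not code from the linked post), the sketch below encodes text as UTF-8 bytes, so the vocabulary is fixed at 256 symbols and no input can be out of vocabulary. The helper names `encode_bytes` and `decode_bytes` are hypothetical.

```python
# Minimal sketch: byte-level tokenization, assuming UTF-8 as the encoding.
# Function names are illustrative and not taken from the linked post.

VOCAB_SIZE = 256  # every possible byte value, so there is no OOV token


def encode_bytes(text: str) -> list[int]:
    """Map text to a sequence of byte IDs in the range 0..255."""
    return list(text.encode("utf-8"))


def decode_bytes(ids: list[int]) -> str:
    """Invert encode_bytes; raises UnicodeDecodeError on invalid byte sequences."""
    return bytes(ids).decode("utf-8")


sample = "naïve café"
ids = encode_bytes(sample)

# Non-ASCII characters expand to several bytes, so the byte sequence
# is longer than the character sequence.
print(len(sample), len(ids))
print(decode_bytes(ids) == sample)
```

The trade-off this makes visible is that sequences get longer than at the character or word level, in exchange for a small, closed vocabulary.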
[1711.05408] Recurrent Neural Networks as Weighted Language Recognizers
We investigate the computational complexity of various problems for simple recurrent neural networks (RNNs) as formal models for recognizing weighted languages. We focus on the single-layer, ReLU-activation, rational-weight RNNs with softmax, which are commonly used in natural language processing applications. We show that most problems for such RNNs are undecidable, including consistency, equivalence, minimization, and the determination of the highest-weighted string. However, for consistent RNNs the last problem becomes decidable, although the solution length can surpass all computable bounds. If additionally the string is limited to polynomial length, the problem becomes NP-complete and APX-hard. In summary, this shows that approximations and heuristic algorithms are necessary in practical applications of those RNNs.
rnn  automata 
8 days ago by arsyed
[1805.04908] On the Practical Computational Power of Finite Precision RNNs for Language Recognition
While Recurrent Neural Networks (RNNs) are famously known to be Turing complete, this relies on infinite precision in the states and unbounded computation time. We consider the case of RNNs with finite precision whose computation time is linear in the input length. Under these limitations, we show that different RNN variants have different computational power. In particular, we show that the LSTM and the Elman-RNN with ReLU activation are strictly stronger than the RNN with a squashing activation and the GRU. This is achieved because LSTMs and ReLU-RNNs can easily implement counting behavior. We show empirically that the LSTM does indeed learn to effectively use the counting mechanism.
nlp  rnn  complexity 
8 days ago by arsyed
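The counting behavior mentioned in the abstract above can be sketched with a single hand-wired ReLU unit. This is an illustrative toy, not the paper's construction: the weights `W_REC` and `W_IN` are assumptions chosen by hand, and the tanh unit is included only to show how a squashing activation stays bounded.

```python
import math

# Hypothetical hand-set weights: recurrent weight 1, +1 for 'a', -1 for 'b'.
W_REC = 1.0
W_IN = {"a": 1.0, "b": -1.0}


def relu(x: float) -> float:
    return max(0.0, x)


def relu_counter(string: str) -> float:
    """One ReLU unit: the state rises by 1 on 'a' and falls by 1 on 'b' (floored at 0)."""
    h = 0.0
    for symbol in string:
        h = relu(W_REC * h + W_IN[symbol])
    return h


def squashed_counter(string: str) -> float:
    """Same wiring with tanh: the state stays inside (-1, 1), so it cannot hold a count."""
    h = 0.0
    for symbol in string:
        h = math.tanh(W_REC * h + W_IN[symbol])
    return h


print(relu_counter("aaabbb"))      # balanced string returns to 0.0
print(relu_counter("aaaab"))       # three unmatched 'a' symbols remain
print(squashed_counter("a" * 50))  # bounded below 1 regardless of length
```

Because the ReLU state is unbounded above, it can track how many `a` symbols are still unmatched, whereas the squashed state saturates.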
[1808.09357] Rational Recurrences
Despite the tremendous empirical success of neural models in natural language processing, many of them lack the strong intuitions that accompany classical machine learning approaches. Recently, connections have been shown between convolutional neural networks (CNNs) and weighted finite state automata (WFSAs), leading to new interpretations and insights. In this work, we show that some recurrent neural networks also share this connection to WFSAs. We characterize this connection formally, defining rational recurrences to be recurrent hidden state update functions that can be written as the Forward calculation of a finite set of WFSAs. We show that several recent neural models use rational recurrences. Our analysis provides a fresh view of these models and facilitates devising new neural architectures that draw inspiration from WFSAs. We present one such model, which performs better than two recent baselines on language modeling and text classification. Our results demonstrate that transferring intuitions from classical models like WFSAs can be an effective approach to designing and understanding neural models.
rnn  automata  wfsa 
4 weeks ago by arsyed
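The Forward calculation of a WFSA, which the abstract above uses to define rational recurrences, can be sketched as a running vector of state scores updated once per input symbol. The two-state automaton here is a made-up example (not one from the paper), and `forward_score` is a hypothetical helper name.

```python
# Minimal sketch of the Forward calculation for a weighted finite-state automaton.
# The automaton below is an assumption chosen for illustration only.

def forward_score(string, initial, transitions, final):
    """Sum the weights of all accepting paths for `string`.

    initial[i]           -- weight of starting in state i
    transitions[s][i][j] -- weight of moving from state i to j on symbol s
    final[j]             -- weight of ending in state j
    """
    n = len(initial)
    alpha = list(initial)
    for symbol in string:
        matrix = transitions[symbol]
        # One recurrent update: new score of state j sums over predecessor states i.
        alpha = [sum(alpha[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return sum(a * f for a, f in zip(alpha, final))


# Two states: on 'a' the automaton may stay in place or move 0 -> 1;
# on 'b' it only self-loops. All weights are 1, so the score counts paths.
initial = [1.0, 0.0]
final = [0.0, 1.0]
transitions = {
    "a": [[1.0, 1.0], [0.0, 1.0]],
    "b": [[1.0, 0.0], [0.0, 1.0]],
}

print(forward_score("abab", initial, transitions, final))
```

With these weights the score equals the number of `a` symbols in the input, since each `a` offers exactly one position at which to take the 0 -> 1 transition. The per-symbol update of `alpha` is the part that resembles a recurrent hidden-state update.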
Surprisingly Easy Hard-Attention for Sequence to Sequence Learning
In this paper we show that a simple beam approximation of the joint distribution between attention and output is an easy, accurate, and efficient attention mechanism for sequence to sequence learning. The method combines the advantage of sharp focus in hard attention and the implementation ease of soft attention. On five translation and two morphological inflection tasks we show effortless and consistent gains in BLEU compared to existing attention mechanisms.
rnn  sequential-modeling  attention  sunita-sarawagi 
7 weeks ago by arsyed
[1606.03402] Length bias in Encoder Decoder Models and a Case for Global Conditioning
Encoder-decoder networks are popular for modeling sequences probabilistically in many applications. These models use the power of the Long Short-Term Memory (LSTM) architecture to capture the full dependence among variables, unlike earlier models like CRFs that typically assumed conditional independence among non-adjacent variables. However, in practice encoder-decoder models exhibit a bias towards short sequences that surprisingly gets worse with increasing beam size.
In this paper we show that this phenomenon is due to a discrepancy between the full sequence margin and the per-element margin enforced by the locally conditioned training objective of an encoder-decoder model. The discrepancy more adversely impacts long sequences, explaining the bias towards predicting short sequences.
For the case where the predicted sequences come from a closed set, we show that a globally conditioned model alleviates the above problems of encoder-decoder models. From a practical point of view, our proposed model also eliminates the need for a beam-search during inference, which reduces to an efficient dot-product based search in a vector-space.
rnn  sequential-modeling  encoder-decoder  sunita-sarawagi 
7 weeks ago by arsyed
