sequential-modeling   14

[1805.03714] Foundations of Sequence-to-Sequence Modeling for Time Series
The availability of large amounts of time series data, paired with the performance of deep-learning algorithms on a broad class of problems, has recently led to significant interest in the use of sequence-to-sequence models for time series forecasting. We provide the first theoretical analysis of this time series forecasting framework. We include a comparison of sequence-to-sequence modeling to classical time series models, and as such our theory can serve as a quantitative guide for practitioners choosing between different modeling methodologies.
time-series  sequential-modeling 
april 2019 by arsyed
Transcribing real-valued sequences with deep neural networks
Speech recognition and arrhythmia detection from electrocardiograms are examples of problems which can be formulated as transcribing real-valued sequences. These problems have traditionally been solved with frameworks like the Hidden Markov Model. To generalize well, these models rely on carefully hand engineered building blocks. More general, end-to-end neural networks capable of learning from much larger datasets can achieve lower error rates. However, getting these models to work well in practice has other challenges. In this work, we present end-to-end models for transcribing real-valued sequences and discuss several applications of these models. The first is detecting abnormal heart activity in electrocardiograms. The second is large vocabulary continuous speech recognition. Finally, we investigate the tasks of keyword spotting and voice activity detection. In all cases we show how to scale high capacity models to unprecedentedly large datasets. With these techniques we can achieve performance comparable to that of human experts for both arrhythmia detection and speech recognition and state-of-the-art error rates in speech recognition for multiple languages.
thesis  awni-hannun  deep-learning  sequential-modeling  asr 
october 2018 by arsyed
[1810.01398] Optimal Completion Distillation for Sequence Learning
We present Optimal Completion Distillation (OCD), a training procedure for optimizing sequence to sequence models based on edit distance. OCD is efficient, has no hyper-parameters of its own, and does not require pretraining or joint optimization with conditional log-likelihood. Given a partial sequence generated by the model, we first identify the set of optimal suffixes that minimize the total edit distance, using an efficient dynamic programming algorithm. Then, for each position of the generated sequence, we use a target distribution that puts equal probability on the first token of all the optimal suffixes. OCD achieves the state-of-the-art performance on end-to-end speech recognition, on both Wall Street Journal and Librispeech datasets, achieving 9.3% WER and 4.5% WER respectively.
asr  seq2seq  sequential-modeling  knowledge-distillation 
october 2018 by arsyed
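The OCD target construction above can be sketched with a plain Levenshtein DP: for a generated prefix, find every target position whose prefix is reachable at minimum edit distance, and put equal probability on the token that follows each such position. This is an illustrative reimplementation, not the paper's code; the `</s>` end-of-sequence token name is an assumption.

```python
def edit_dist_row(prefix, target):
    # Last row of the Levenshtein DP: dist(prefix, target[:j]) for j = 0..len(target).
    prev = list(range(len(target) + 1))
    for i, pc in enumerate(prefix, 1):
        cur = [i]
        for j, tc in enumerate(target, 1):
            cur.append(min(prev[j] + 1,                 # delete from prefix
                           cur[j - 1] + 1,              # insert into prefix
                           prev[j - 1] + (pc != tc)))   # substitute (free if equal)
        prev = cur
    return prev

def ocd_targets(prefix, target, eos="</s>"):
    # Tokens that start optimal completions: target[j] for every j where
    # dist(prefix, target[:j]) is minimal; j == len(target) contributes EOS.
    row = edit_dist_row(prefix, target)
    best = min(row)
    toks = {target[j] if j < len(target) else eos
            for j, d in enumerate(row) if d == best}
    return sorted(toks)
```

For example, with target `sunday` and prefix `x`, both continuing with `s` (treating `x` as an insertion) and with `u` (treating `x` as a substituted `s`) lead to completions at the minimal total edit distance, so the OCD target spreads probability over both.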
Surprisingly Easy Hard-Attention for Sequence to Sequence Learning
In this paper we show that a simple beam approximation of the joint distribution between attention and output is an easy, accurate, and efficient attention mechanism for sequence to sequence learning. The method combines the advantage of sharp focus in hard attention and the implementation ease of soft attention. On five translation and two morphological inflection tasks we show effortless and consistent gains in BLEU compared to existing attention mechanisms.
rnn  sequential-modeling  attention  sunita-sarawagi 
september 2018 by arsyed
[1606.03402] Length bias in Encoder Decoder Models and a Case for Global Conditioning
Encoder-decoder networks are popular for modeling sequences probabilistically in many applications. These models use the power of the Long Short-Term Memory (LSTM) architecture to capture the full dependence among variables, unlike earlier models like CRFs that typically assumed conditional independence among non-adjacent variables. However in practice encoder-decoder models exhibit a bias towards short sequences that surprisingly gets worse with increasing beam size.
In this paper we show that this phenomenon is due to a discrepancy between the full sequence margin and the per-element margin enforced by the locally conditioned training objective of an encoder-decoder model. The discrepancy more adversely impacts long sequences, explaining the bias towards predicting short sequences.
For the case where the predicted sequences come from a closed set, we show that a globally conditioned model alleviates the above problems of encoder-decoder models. From a practical point of view, our proposed model also eliminates the need for a beam-search during inference, which reduces to an efficient dot-product based search in a vector-space.
rnn  sequential-modeling  encoder-decoder  sunita-sarawagi 
september 2018 by arsyed
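The closed-set inference step described above, where beam search reduces to a dot-product search, can be sketched as follows; shapes and names are illustrative, and the candidate embeddings are assumed to come from a trained globally conditioned model.

```python
import numpy as np

def closed_set_decode(q, C):
    # q: encoder output vector, shape (d,)
    # C: embeddings of all candidate output sequences, shape (num_candidates, d)
    # Global conditioning scores every full candidate at once, so decoding is
    # a single matrix-vector product followed by an argmax.
    scores = C @ q
    return int(np.argmax(scores))
```

Because every candidate is a complete sequence, there is no per-step search and hence no length bias from comparing partial hypotheses of different lengths.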
[1807.00868] Exploring End-to-End Techniques for Low-Resource Speech Recognition
In this work we present a simple grapheme-based system for low-resource speech recognition using Babel data for Turkish spontaneous speech (80 hours). We investigated the performance of different neural network architectures, including fully-convolutional, recurrent, and ResNet with GRU. Different features and normalization techniques are compared as well. We also propose a CTC-loss modification using segmentation during training, which leads to an improvement when decoding with a small beam size. Our best model achieved a word error rate of 45.8%, which, to the best of our knowledge, is the best reported result for end-to-end systems using in-domain data on this task.
asr  sequential-modeling  low-resource  babel  e2e 
july 2018 by arsyed
[1206.6392] Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription
We investigate the problem of modeling symbolic sequences of polyphonic music in a completely general piano-roll representation. We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences. Our approach outperforms many traditional models of polyphonic music on a variety of realistic datasets. We show how our musical language model can serve as a symbolic prior to improve the accuracy of polyphonic transcription.
deep-learning  sequential-modeling  music  music-transcription 
july 2018 by arsyed
[1611.02796] Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity. An RNN is first pre-trained on data using maximum likelihood estimation (MLE), and the probability distribution over the next token in the sequence learned by this model is treated as a prior policy. Another RNN is then trained using reinforcement learning (RL) to generate higher-quality outputs that account for domain-specific incentives while retaining proximity to the prior policy of the MLE RNN. To formalize this objective, we derive novel off-policy RL methods for RNNs from KL-control. The effectiveness of the approach is demonstrated on two applications: 1) generating novel musical melodies, and 2) computational molecular generation. For both problems, we show that the proposed method improves the desired properties and structure of the generated sequences, while maintaining information learned from data.
reinforcement-learning  supervised-learning  sequential-modeling  generative  rnn  generative-models 
may 2018 by arsyed
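A common way to write the per-token objective in a KL-control setup like the one above is to shape the task reward with a penalty for diverging from the pretrained prior; this is a minimal sketch with an illustrative coefficient `c`, not the paper's exact off-policy formulation.

```python
import math

def kl_shaped_reward(task_reward, log_pi, log_prior, c=0.5):
    # Task reward minus c * (log pi(a) - log prior(a)): in expectation over the
    # policy this subtracts c * KL(pi || prior), keeping the RL policy close to
    # the MLE-trained prior while it pursues the domain-specific reward.
    return task_reward - c * (log_pi - log_prior)
```

When the policy assigns a token more probability than the prior does, the shaped reward drops, discouraging drift away from what was learned from data.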
[1803.01271] An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling
"For most deep learning practitioners, sequence modeling is synonymous with recurrent networks. Yet recent results indicate that convolutional architectures can outperform recurrent networks on tasks such as audio synthesis and machine translation. Given a new sequence modeling task or dataset, which architecture should one use? We conduct a systematic evaluation of generic convolutional and recurrent architectures for sequence modeling. The models are evaluated across a broad range of standard tasks that are commonly used to benchmark recurrent networks. Our results indicate that a simple convolutional architecture outperforms canonical recurrent networks such as LSTMs across a diverse range of tasks and datasets, while demonstrating longer effective memory. We conclude that the common association between sequence modeling and recurrent networks should be reconsidered, and convolutional networks should be regarded as a natural starting point for sequence modeling tasks."
neural-net  sequential-modeling  rnn  convnet  lstm 
march 2018 by arsyed
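The convolutional architectures evaluated above rest on causal convolutions: each output depends only on current and past inputs, with dilation widening the effective memory. A minimal single-filter sketch (NumPy, illustrative rather than the paper's TCN code):

```python
import numpy as np

def causal_conv1d(x, w, dilation=1):
    # Left-pad by (k-1)*dilation zeros so the output at time t only sees
    # inputs at times t, t - dilation, t - 2*dilation, ...
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    return np.array([sum(w[i] * xp[t + pad - i * dilation] for i in range(k))
                     for t in range(len(x))])
```

Stacking such layers with exponentially growing dilations (1, 2, 4, ...) gives the long effective memory the paper credits for the convolutional models' results.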
[1709.07432] Dynamic Evaluation of Neural Sequence Models
We present methodology for using dynamic evaluation to improve neural sequence models. Models are adapted to recent history via a gradient descent based mechanism, causing them to assign higher probabilities to re-occurring sequential patterns. Dynamic evaluation outperforms existing adaptation approaches in our comparisons. Dynamic evaluation improves the state-of-the-art word-level perplexities on the Penn Treebank and WikiText-2 datasets to 51.1 and 44.3 respectively, and the state-of-the-art character-level cross-entropies on the text8 and Hutter Prize datasets to 1.19 bits/char and 1.08 bits/char respectively.
neural-net  dynamic-evaluation  sequential-modeling 
december 2017 by arsyed
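The adaptation loop behind dynamic evaluation can be sketched as: score each segment of the test stream first, then take a gradient step on that same segment before moving on. The toy scalar model below is purely illustrative (the paper adapts an RNN language model with a more elaborate update rule).

```python
class MeanModel:
    # Toy stand-in for a sequence model: predicts a scalar, squared-error loss.
    def __init__(self):
        self.params = {"mu": 0.0}
    def loss(self, x):
        return (x - self.params["mu"]) ** 2
    def grad(self, x):
        return {"mu": -2.0 * (x - self.params["mu"])}

def dynamic_eval(model, segments, lr=0.1):
    # Evaluate each segment BEFORE adapting on it, so no test information
    # leaks into its own score; adaptation only helps later segments.
    total_loss = 0.0
    for seg in segments:
        total_loss += model.loss(seg)
        grads = model.grad(seg)
        for name, g in grads.items():
            model.params[name] -= lr * g
    return total_loss / len(segments)
```

On a stream with re-occurring patterns (here, a repeated value), the adapted model's average loss falls below the static model's, which is the effect the perplexity gains above come from.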
[1705.03122] Convolutional Sequence to Sequence Learning
"The prevalent approach to sequence to sequence learning maps an input sequence to a variable length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training and optimization is easier since the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT'14 English-German and WMT'14 English-French translation at an order of magnitude faster speed, both on GPU and CPU."
seq2seq  papers  neural-net  convnet  sequential-modeling  causal-convnet 
september 2017 by arsyed
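The gated linear units mentioned above halve the channel dimension: one half carries the signal, the other is squashed through a sigmoid and acts as a gate, giving a linear path for gradients. A minimal sketch:

```python
import numpy as np

def glu(x):
    # Gated linear unit: split channels in two along the last axis;
    # the first half is modulated element-wise by sigmoid(second half).
    a, b = np.split(np.asarray(x, dtype=float), 2, axis=-1)
    return a * (1.0 / (1.0 + np.exp(-b)))
```

The linear (ungated) half is what eases gradient propagation relative to tanh-style activations.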
[1708.06742] Twin Networks: Using the Future as a Regularizer
We propose a simple technique for encouraging generative RNNs to plan ahead. We train a "backward" recurrent network to generate a given sequence in reverse order, and we encourage states of the forward model to predict cotemporal states of the backward model. The backward network is used only during training, and plays no role during sampling or inference. We hypothesize that our approach eases modeling of long-term dependencies by implicitly forcing the forward states to hold information about the longer-term future (as contained in the backward states). We show empirically that our approach achieves 9% relative improvement for a speech recognition task, and achieves significant improvement on a COCO caption generation task.
sequential-modeling  twin-net  rnn  regularization 
august 2017 by arsyed
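The twin-network regularizer above is an auxiliary L2 term matching each forward state to the cotemporal backward state. The sketch below assumes the backward states are stored in the order the backward RNN produced them (i.e., over the reversed sequence), so reversing aligns timesteps; in the paper the forward state passes through a learned map before the comparison, which is omitted here.

```python
import numpy as np

def twin_penalty(forward_states, backward_states):
    # forward_states:  (T, d), states over the sequence left-to-right
    # backward_states: (T, d), states over the sequence right-to-left
    # Reversing the backward states aligns step t of both passes; the penalty
    # is the mean squared distance between cotemporal state pairs.
    diffs = np.asarray(forward_states) - np.asarray(backward_states)[::-1]
    return float(np.mean(np.sum(diffs ** 2, axis=-1)))
```

Added to the usual likelihood loss during training only, this pushes forward states to encode information about the future; at inference the backward network is discarded.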

related tags

asr  attention  awni-hannun  babel  causal-convnet  convnet  deep-learning  dynamic-evaluation  e2e  encoder-decoder  generative-models  generative  knowledge-distillation  low-resource  lstm  music-transcription  music  neural-net  papers  regularization  reinforcement-learning  rnn  seq2seq  sunita-sarawagi  supervised-learning  thesis  time-series  twin-net 
