convnet   284


Experiments with Convolutional Neural Network Models for Answer Selection
In recent years, neural networks have been applied to many text processing problems. One example is learning a similarity function between pairs of text, which has applications to paraphrase extraction, plagiarism detection, question answering, and ad hoc retrieval. Within the information retrieval community, the convolutional neural network model proposed by Severyn and Moschitti in a SIGIR 2015 paper has gained prominence. This paper focuses on the problem of answer selection for question answering: we attempt to replicate the results of Severyn and Moschitti using their open-source code as well as to reproduce their results via a de novo (i.e., from scratch) implementation using a completely different deep learning toolkit. Our de novo implementation is instructive in ascertaining whether reported results generalize across toolkits, each of which has its own idiosyncrasies. We were able to successfully replicate and reproduce the reported results of Severyn and Moschitti, albeit with minor differences in effectiveness, thereby affirming the overall design of their model. Additional ablation experiments break down the components of the model to show their contributions to overall effectiveness. Interestingly, we find that removing one component actually increases effectiveness and that a simplified model with only four word overlap features performs surprisingly well, even better than convolution feature maps alone.
ir  convnet  question-answering 
8 weeks ago by arsyed
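The paper above reports that a simplified model with only four word-overlap features performs surprisingly well. A minimal sketch of what such features might look like — the exact feature definitions here are an assumption, not taken from the paper:

```python
def overlap_features(question, answer, stopwords=frozenset(), idf=None):
    """Four simple word-overlap features between a question and a
    candidate answer: plain and IDF-weighted overlap, each computed
    over all words and over non-stopwords only (hypothetical variants)."""
    q, a = set(question.lower().split()), set(answer.lower().split())
    shared = q & a
    # 1) fraction of question words that also appear in the answer
    overlap = len(shared) / len(q) if q else 0.0
    # 2) the same, restricted to non-stopwords
    q_c, shared_c = q - stopwords, shared - stopwords
    overlap_nonstop = len(shared_c) / len(q_c) if q_c else 0.0
    # 3-4) IDF-weighted variants: rare shared words count for more
    idf = idf or {}
    weight = lambda words: sum(idf.get(t, 1.0) for t in words)
    idf_overlap = weight(shared) / weight(q) if q else 0.0
    idf_overlap_nonstop = weight(shared_c) / weight(q_c) if q_c else 0.0
    return [overlap, overlap_nonstop, idf_overlap, idf_overlap_nonstop]
```

With no stopword list and no IDF table, the weighted features reduce to the plain ones; the point is only that such a feature vector is trivial to compute compared with a convolutional encoder.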
[1710.06554] Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting
We describe Honk, an open-source PyTorch reimplementation of convolutional neural networks for keyword spotting that are included as examples in TensorFlow. These models are useful for recognizing "command triggers" in speech-based interfaces (e.g., "Hey Siri"), which serve as explicit cues for audio recordings of utterances that are sent to the cloud for full speech recognition. Evaluation on Google's recently released Speech Commands Dataset shows that our reimplementation is comparable in accuracy and provides a starting point for future work on the keyword spotting task.
convnet  pytorch  kws 
8 weeks ago by arsyed
[1808.05587] Deep Convolutional Networks as shallow Gaussian Processes
"We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed efficiently; the cost of evaluating the kernel for a pair of images is similar to a single forward pass through the original CNN with only one filter per layer. The kernel equivalent to a 32-layer ResNet obtains 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters."
neural-net  convnet  gaussian-processes 
august 2018 by arsyed
Depthwise Separable Convolutions for Neural Machine Translation | OpenReview
Depthwise separable convolutions reduce the number of parameters and computation used in convolutional operations while increasing representational efficiency.
They have been shown to be successful in image classification models, both in obtaining better models than previously possible for a given parameter count (the Xception architecture) and in considerably reducing the number of parameters required to perform at a given level (the MobileNets family of architectures). Recently, convolutional sequence-to-sequence networks have been applied to machine translation tasks with good results. In this work, we study how depthwise separable convolutions can be applied to neural machine translation. We introduce a new architecture inspired by Xception and ByteNet, called SliceNet, which enables a significant reduction of the parameter count and amount of computation needed to obtain results like ByteNet, and, with a similar parameter count, achieves better results.
In addition to showing that depthwise separable convolutions perform well for machine translation, we investigate the architectural changes that they enable: we observe that thanks to depthwise separability, we can increase the length of convolution windows, removing the need for filter dilation. We also introduce a new super-separable convolution operation that further reduces the number of parameters and computational cost of the models.
deep-learning  convnet  separable  nmt 
august 2018 by arsyed
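The parameter savings described above are easy to see by counting weights. A sketch comparing a standard convolution with a depthwise-separable one (bias terms omitted):

```python
def conv_params(c_in, c_out, k):
    """Parameters in a standard 2-D convolution with a k x k kernel."""
    return c_in * c_out * k * k

def separable_conv_params(c_in, c_out, k):
    """Depthwise separable: one k x k filter per input channel
    (depthwise), then a 1x1 pointwise convolution mixing channels."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise
```

For 256 input and output channels with a 3x3 kernel, the standard convolution has 589,824 parameters and the separable one 67,840 — a reduction of roughly 8.7x, which is where the headroom for longer convolution windows comes from.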
[1807.08666] ASR-free CNN-DTW keyword spotting using multilingual bottleneck features for almost zero-resource languages
We consider multilingual bottleneck features (BNFs) for nearly zero-resource keyword spotting. This forms part of a United Nations effort using keyword spotting to support humanitarian relief programmes in parts of Africa where languages are severely under-resourced. We use 1920 isolated keywords (40 types, 34 minutes) as exemplars for dynamic time warping (DTW) template matching, which is performed on a much larger body of untranscribed speech. These DTW costs are used as targets for a convolutional neural network (CNN) keyword spotter, giving a much faster system than direct DTW. Here we consider how available data from well-resourced languages can improve this CNN-DTW approach. We show that multilingual BNFs trained on ten languages improve the area under the ROC curve of a CNN-DTW system by 10.9% absolute relative to the MFCC baseline. By combining low-resource DTW-based supervision with information from well-resourced languages, CNN-DTW is a competitive option for low-resource keyword spotting.
dtw  neural-net  convnet  asr  kws 
july 2018 by arsyed
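The CNN-DTW system above uses DTW alignment costs as training targets. A bare-bones dynamic-time-warping implementation (1-D sequences for brevity; the paper aligns sequences of acoustic feature vectors such as BNFs):

```python
def dtw_cost(x, y, dist=lambda a, b: abs(a - b)):
    """Classic DTW alignment cost between two sequences via dynamic
    programming: each cell extends the cheapest of the three
    predecessor alignments (insertion, deletion, match)."""
    n, m = len(x), len(y)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(x[i - 1], y[j - 1])
            D[i][j] = c + min(D[i - 1][j],      # skip a frame of x
                              D[i][j - 1],      # skip a frame of y
                              D[i - 1][j - 1])  # align the two frames
    return D[n][m]
```

Direct template matching evaluates this O(nm) recurrence for every keyword exemplar against every search window, which is why distilling the costs into a CNN gives such a large speed-up.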
[1610.08927] Voice Conversion using Convolutional Neural Networks
The human auditory system is able to distinguish the vocal source of thousands of speakers, yet not much is known about what features the auditory system uses to do this. Fourier Transforms are capable of capturing the pitch and harmonic structure of the speaker but this alone proves insufficient at identifying speakers uniquely. The remaining structure, often referred to as timbre, is critical to identifying speakers, but we understand little about it. In this paper we use recent advances in neural networks in order to manipulate the voice of one speaker into another by transforming not only the pitch of the speaker, but also the timbre. We review generative models built with neural networks as well as architectures for creating neural networks that learn analogies. Our preliminary results converting voices from one speaker to another are encouraging.
neural-net  convnet  speech  speech-synthesis  voice-conversion 
july 2018 by arsyed
GitHub - vdumoulin/conv_arithmetic: A technical report on convolution arithmetic in the context of deep learning
cnn  deep-learning  deeplearning  visualization  convolution  convnet  convolutional  convolutions  deconvolution  deep_learning 
july 2018 by ohnice
On to the next thing
“But then one of the most exciting recent developments in ray-tracing has been the rapid advancement of deep convnet-based denoisers; among others, researchers at Pixar and NVIDIA have done really impressive work in this area. Out of nowhere, we now have the prospect of being able to generate high-quality images with just a handful of samples per pixel. There’s still much more work to be done, but the results so far have been stunning. And then Marco Salvi’s fantastic SIGGRAPH talk on deep learning and the future of real time rendering really got the gears turning in my head; there’s a lot more beyond just denoising that deep learning has to offer graphics.”
raytracing  dxr  nvidia  denoising  pathtracing  convnet  research  deeplearning  ml 
july 2018 by danhon
[1702.01923] Comparative Study of CNN and RNN for Natural Language Processing
"Deep neural networks (DNN) have revolutionized the field of natural language processing (NLP). Convolutional neural network (CNN) and recurrent neural network (RNN), the two main types of DNN architectures, are widely explored to handle various NLP tasks. CNN is supposed to be good at extracting position-invariant features and RNN at modeling units in sequence. The state of the art on many NLP tasks often switches due to the battle between CNNs and RNNs. This work is the first systematic comparison of CNN and RNN on a wide range of representative NLP tasks, aiming to give basic guidance for DNN selection."
convnet  rnn  nlp 
june 2018 by arsyed
Machined Learnings: ICML 2017 Thoughts
Multitask regularization to mitigate sample complexity in RL. Both in video games and in dialog, it is useful to add extra (auxiliary) tasks in order to accelerate learning.
Leveraging knowledge and memory. Our current models are powerful function approximators, but in NLP especially we need to go beyond "the current example" in order to exhibit competence.
Gradient descent as inference. Whether it's inpainting with a GAN or BLEU score maximization with an RNN, gradient descent is an unreasonably good inference algorithm.
Careful initialization is important. I suppose traditional optimization people would say "of course", but we're starting to appreciate the importance of good initialization for deep learning. In particular, start close to linear with eigenvalues close to 1. (Balduzzi et al., Poole et al.)
Convolutions are as good as, and faster than, recurrent models for NLP. Nice work out of Facebook on causal convolutions for seq2seq. This aligns with my personal experience: we use convolutional NLP models in production for computational performance reasons.
Neural networks are overparameterized. They can be made much sparser without losing accuracy (Molchanov et al., Lobacheva et al.).
icml  2017  reinforcement-learning  multitask-learning  neural-net  convnet 
june 2018 by arsyed
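On the overparameterization point above: the cited papers sparsify networks with variational methods, but even crude magnitude pruning illustrates the idea. A sketch (a simplification of, not the method from, those papers):

```python
def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of a flat weight list.
    Ties at the threshold are all pruned, so slightly more than the
    requested fraction may be dropped."""
    k = int(len(weights) * sparsity)      # number of weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

In practice one prunes, fine-tunes, and repeats; the surprising empirical finding is how high `sparsity` can go before accuracy degrades.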
[1806.05393] Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
In recent years, state-of-the-art methods in computer vision have utilized increasingly deep convolutional neural network architectures (CNNs), with some of the most successful models employing hundreds or even thousands of layers. A variety of pathologies such as vanishing/exploding gradients make training such deep networks challenging. While residual connections and batch normalization do enable training at these depths, it has remained unclear whether such specialized architecture designs are truly necessary to train deep CNNs. In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme. We derive this initialization scheme theoretically by developing a mean field theory for signal propagation and by characterizing the conditions for dynamical isometry, the equilibration of singular values of the input-output Jacobian matrix. These conditions require that the convolution operator be an orthogonal transformation in the sense that it is norm-preserving. We present an algorithm for generating such random initial orthogonal convolution kernels and demonstrate empirically that they enable efficient training of extremely deep architectures.
deep-learning  convnet  mean-field-theory 
june 2018 by arsyed
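The initialization the abstract above describes can be sketched as a "delta-orthogonal" kernel: a random orthogonal channel-mixing matrix placed at the spatial centre of the kernel, zeros elsewhere, so the convolution is norm-preserving at initialization. This is a simplified reading of the paper's scheme and assumes at least as many output as input channels:

```python
import numpy as np

def delta_orthogonal(k, c_in, c_out, rng=None):
    """Delta-orthogonal initializer sketch for a k x k convolution.
    Only the centre tap is nonzero, and it holds a matrix with
    orthonormal rows, so the map preserves input norms."""
    assert c_out >= c_in, "orthogonal embedding needs c_out >= c_in"
    rng = rng or np.random.default_rng(0)
    # Random matrix with orthonormal columns via QR of a Gaussian matrix
    q, _ = np.linalg.qr(rng.standard_normal((c_out, c_in)))
    kernel = np.zeros((k, k, c_in, c_out))
    kernel[k // 2, k // 2] = q.T          # centre tap does the channel mixing
    return kernel
```

Every off-centre tap is zero, so at initialization the network acts like a stack of orthogonal linear maps — which is what keeps the singular values of the input-output Jacobian near 1 at extreme depth.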
Modeling Relational Data with Graph Convolutional Networks | SpringerLink
Knowledge graphs enable a wide variety of applications, including question answering and information retrieval. Despite the great effort invested in their creation and maintenance, even the largest (e.g., Yago, DBPedia or Wikidata) remain incomplete. We introduce Relational Graph Convolutional Networks (R-GCNs) and apply them to two standard knowledge base completion tasks: Link prediction (recovery of missing facts, i.e. subject-predicate-object triples) and entity classification (recovery of missing entity attributes). R-GCNs are related to a recent class of neural networks operating on graphs, and are developed specifically to handle the highly multi-relational data characteristic of realistic knowledge bases. We demonstrate the effectiveness of R-GCNs as a stand-alone model for entity classification. We further show that factorization models for link prediction such as DistMult can be significantly improved through the use of an R-GCN encoder model to accumulate evidence over multiple inference steps in the graph, demonstrating a large improvement of 29.8% on FB15k-237 over a decoder-only baseline.
neural-net  convnet  graph  relational 
june 2018 by arsyed
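The R-GCN update behind the abstract above can be sketched as one propagation step with a separate weight matrix per relation, neighbour-count normalization, and a self-loop transform; the dense-matrix shapes and the ReLU here are assumptions of this sketch:

```python
import numpy as np

def rgcn_layer(H, adj_per_relation, W_rel, W_self):
    """One R-GCN propagation step:
    h_i' = relu( sum_r sum_{j in N_i^r} (1 / c_{i,r}) W_r h_j + W_0 h_i ).
    H: (n_nodes, d) features; adj_per_relation: list of (n, n) 0/1
    adjacency matrices, one per relation; W_rel: matching weight list."""
    out = H @ W_self                                # self-loop term
    for A, W in zip(adj_per_relation, W_rel):
        deg = A.sum(axis=1, keepdims=True)          # c_{i,r} per node
        norm = np.divide(A, deg,
                         out=np.zeros_like(A, dtype=float),
                         where=deg > 0)             # row-normalize, 0 if isolated
        out = out + (norm @ H) @ W                  # averaged relation messages
    return np.maximum(out, 0.0)                     # ReLU
```

Stacking such layers is what lets the encoder "accumulate evidence over multiple inference steps in the graph" before a decoder like DistMult scores candidate triples.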
[1805.07883] How Many Samples are Needed to Learn a Convolutional Neural Network?
"A widespread folklore for explaining the success of convolutional neural network (CNN) is that CNN is a more compact representation than the fully connected neural network (FNN) and thus requires fewer samples for learning. We initiate the study of rigorously characterizing the sample complexity of learning convolutional neural networks. We show that for learning an m-dimensional convolutional filter with linear activation acting on a d-dimensional input, the sample complexity of achieving population prediction error of ε is Õ(m/ε²), whereas its FNN counterpart needs at least Ω(d/ε²) samples. Since m ≪ d, this result demonstrates the advantage of using CNN. We further consider the sample complexity of learning a one-hidden-layer CNN with linear activation where both the m-dimensional convolutional filter and the r-dimensional output weights are unknown. For this model, we show the sample complexity is Õ((m+r)/ε²) when the ratio between the stride size and the filter size is a constant. For both models, we also present lower bounds showing our sample complexities are tight up to logarithmic factors. Our main tools for deriving these results are localized empirical process and a new lemma characterizing the convolutional structure. We believe these tools may inspire further developments in understanding CNN."
neural-net  convnet  complexity  learning-theory 
may 2018 by arsyed


