**convnet** (284 bookmarks)

CS231n Convolutional Neural Networks for Visual Recognition

21 days ago by whlteXbread

good writing for when you can't matrix math

cnn
dnn
convnet
deep
learning
21 days ago by whlteXbread

Experiments with Convolutional Neural Network Models for Answer Selection

8 weeks ago by arsyed

In recent years, neural networks have been applied to many text processing problems. One example is learning a similarity function between pairs of text, which has applications to paraphrase extraction, plagiarism detection, question answering, and ad hoc retrieval. Within the information retrieval community, the convolutional neural network model proposed by Severyn and Moschitti in a SIGIR 2015 paper has gained prominence. This paper focuses on the problem of answer selection for question answering: we attempt to replicate the results of Severyn and Moschitti using their open-source code as well as to reproduce their results via a de novo (i.e., from scratch) implementation using a completely different deep learning toolkit. Our de novo implementation is instructive in ascertaining whether reported results generalize across toolkits, each of which has its idiosyncrasies. We were able to successfully replicate and reproduce the reported results of Severyn and Moschitti, albeit with minor differences in effectiveness, affirming the overall design of their model. Additional ablation experiments break down the components of the model to show their contributions to overall effectiveness. Interestingly, we find that removing one component actually increases effectiveness and that a simplified model with only four word overlap features performs surprisingly well, even better than convolution feature maps alone.
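The "four word overlap features" give a feel for how little machinery that strong baseline needs. Below is a minimal, hypothetical sketch of such features; the exact four in the paper and their normalization are assumptions here (commonly: raw overlap and IDF-weighted overlap, computed with and without stopwords):

```python
def overlap_features(question, answer, idf, stopwords):
    """Hypothetical sketch: four overlap features between two token lists.
    The exact feature definitions and normalization are assumptions, not
    the paper's verbatim recipe."""
    def pair_feats(q_tokens, a_tokens):
        q, a = set(q_tokens), set(a_tokens)
        shared = q & a
        # raw overlap, normalized by the sizes of both sets
        overlap = len(shared) / (len(q) + len(a)) if (q or a) else 0.0
        # IDF-weighted overlap: rare shared words count more
        idf_num = sum(idf.get(w, 0.0) for w in shared)
        idf_den = sum(idf.get(w, 0.0) for w in q | a) or 1.0
        return [overlap, idf_num / idf_den]
    content = lambda toks: [w for w in toks if w not in stopwords]
    return (pair_feats(question, answer)
            + pair_feats(content(question), content(answer)))

feats = overlap_features(["who", "wrote", "hamlet"],
                         ["shakespeare", "wrote", "hamlet"],
                         idf={"wrote": 1.2, "hamlet": 3.5},
                         stopwords={"who"})
```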

ir
convnet
question-answering
8 weeks ago by arsyed

[1710.06554] Honk: A PyTorch Reimplementation of Convolutional Neural Networks for Keyword Spotting

8 weeks ago by arsyed

We describe Honk, an open-source PyTorch reimplementation of convolutional neural networks for keyword spotting that are included as examples in TensorFlow. These models are useful for recognizing "command triggers" in speech-based interfaces (e.g., "Hey Siri"), which serve as explicit cues for audio recordings of utterances that are sent to the cloud for full speech recognition. Evaluation on Google's recently released Speech Commands Dataset shows that our reimplementation is comparable in accuracy and provides a starting point for future work on the keyword spotting task.

convnet
pytorch
kws
8 weeks ago by arsyed

[1808.05587] Deep Convolutional Networks as shallow Gaussian Processes

august 2018 by arsyed

"We show that the output of a (residual) convolutional neural network (CNN) with an appropriate prior over the weights and biases is a Gaussian process (GP) in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed efficiently; the cost of evaluating the kernel for a pair of images is similar to a single forward pass through the original CNN with only one filter per layer. The kernel equivalent to a 32-layer ResNet obtains 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters."

neural-net
convnet
gaussian-processes
august 2018 by arsyed

Depthwise Separable Convolutions for Neural Machine Translation | OpenReview

august 2018 by arsyed

Depthwise separable convolutions reduce the number of parameters and computation used in convolutional operations while increasing representational efficiency.

They have been shown to be successful in image classification models, both in obtaining better models than previously possible for a given parameter count (the Xception architecture) and considerably reducing the number of parameters required to perform at a given level (the MobileNets family of architectures). Recently, convolutional sequence-to-sequence networks have been applied to machine translation tasks with good results. In this work, we study how depthwise separable convolutions can be applied to neural machine translation. We introduce a new architecture inspired by Xception and ByteNet, called SliceNet, which enables a significant reduction of the parameter count and amount of computation needed to obtain results like ByteNet, and, with a similar parameter count, achieves better results.

In addition to showing that depthwise separable convolutions perform well for machine translation, we investigate the architectural changes that they enable: we observe that thanks to depthwise separability, we can increase the length of convolution windows, removing the need for filter dilation. We also introduce a new super-separable convolution operation that further reduces the number of parameters and computational cost of the models.
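As a concrete reference point, a depthwise separable convolution factors a full convolution into a per-channel spatial filter followed by a 1x1 channel-mixing filter. A minimal PyTorch sketch (sizes and names are hypothetical, not SliceNet's actual blocks):

```python
import torch.nn as nn

class SeparableConv1d(nn.Module):
    """Depthwise separable 1-D convolution: roughly c*k + c*c parameters
    instead of c*c*k for a full convolution."""
    def __init__(self, channels, kernel_size):
        super().__init__()
        # depthwise: one spatial filter per channel (groups=channels)
        self.depthwise = nn.Conv1d(channels, channels, kernel_size,
                                   padding=kernel_size // 2, groups=channels)
        # pointwise: 1x1 convolution mixes information across channels
        self.pointwise = nn.Conv1d(channels, channels, 1)

    def forward(self, x):  # x: (batch, channels, length)
        return self.pointwise(self.depthwise(x))
```

For example, at channels=512 and kernel_size=9 this is roughly 0.27M parameters versus about 2.36M for the full convolution, which is what makes the longer convolution windows mentioned above affordable.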

deep-learning
convnet
separable
nmt
august 2018 by arsyed

[1807.08666] ASR-free CNN-DTW keyword spotting using multilingual bottleneck features for almost zero-resource languages

july 2018 by arsyed

We consider multilingual bottleneck features (BNFs) for nearly zero-resource keyword spotting. This forms part of a United Nations effort using keyword spotting to support humanitarian relief programmes in parts of Africa where languages are severely under-resourced. We use 1920 isolated keywords (40 types, 34 minutes) as exemplars for dynamic time warping (DTW) template matching, which is performed on a much larger body of untranscribed speech. These DTW costs are used as targets for a convolutional neural network (CNN) keyword spotter, giving a much faster system than direct DTW. Here we consider how available data from well-resourced languages can improve this CNN-DTW approach. We show that multilingual BNFs trained on ten languages improve the area under the ROC curve of a CNN-DTW system by 10.9% absolute relative to the MFCC baseline. By combining low-resource DTW-based supervision with information from well-resourced languages, CNN-DTW is a competitive option for low-resource keyword spotting.
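To make the supervision signal concrete, here is a minimal DTW sketch of the sort of template-matching cost the CNN is trained to predict; the distance function and path normalization are assumptions, not the paper's exact setup:

```python
import numpy as np

def dtw_cost(template, utterance):
    """Normalized DTW alignment cost between two feature sequences
    (frames x dims), e.g. MFCCs or bottleneck features."""
    n, m = len(template), len(utterance)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(template[i - 1] - utterance[j - 1])
            # standard step pattern: match, insertion, deletion
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)  # rough path-length normalization

cost = dtw_cost(np.random.randn(40, 13), np.random.randn(120, 13))
```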

dtw
neural-net
convnet
asr
kws
july 2018 by arsyed

[1610.08927] Voice Conversion using Convolutional Neural Networks

july 2018 by arsyed

The human auditory system is able to distinguish the vocal source of thousands of speakers, yet not much is known about what features the auditory system uses to do this. Fourier Transforms are capable of capturing the pitch and harmonic structure of the speaker but this alone proves insufficient at identifying speakers uniquely. The remaining structure, often referred to as timbre, is critical to identifying speakers, but little is understood about it. In this paper we use recent advances in neural networks in order to manipulate the voice of one speaker into another by transforming not only the pitch of the speaker, but the timbre. We review generative models built with neural networks as well as architectures for creating neural networks that learn analogies. Our preliminary results converting voices from one speaker to another are encouraging.

neural-net
convnet
speech
speech-synthesis
voice-conversion
july 2018 by arsyed

GitHub - vdumoulin/conv_arithmetic: A technical report on convolution arithmetic in the context of deep learning

cnn
deep-learning
deeplearning
visualization
convolution
convnet
convolutional
convolutions
deconvolution
deep_learning

july 2018 by ohnice

On to the next thing

july 2018 by danhon

“But then one of the most exciting recent developments in ray-tracing has been the rapid advancement of deep convnet-based denoisers; among others, researchers at Pixar and NVIDIA have done really impressive work in this area. Out of nowhere, we now have the prospect of being able to generate high-quality images with just a handful of samples per pixel. There’s still much more work to be done, but the results so far have been stunning. And then Marco Salvi’s fantastic SIGGRAPH talk on deep learning and the future of real time rendering really got the gears turning in my head; there’s a lot more beyond just denoising that deep learning has to offer graphics.”

raytracing
dxr
nvidia
denoising
pathtracing
convnet
research
deeplearning
ml
july 2018 by danhon

[1702.01923] Comparative Study of CNN and RNN for Natural Language Processing

june 2018 by arsyed

"Deep neural networks (DNN) have revolutionized the field of natural language processing (NLP). Convolutional neural network (CNN) and recurrent neural network (RNN), the two main types of DNN architectures, are widely explored to handle various NLP tasks. CNN is supposed to be good at extracting position-invariant features and RNN at modeling units in sequence. The state of the art on many NLP tasks often switches due to the battle between CNNs and RNNs. This work is the first systematic comparison of CNN and RNN on a wide range of representative NLP tasks, aiming to give basic guidance for DNN selection."

convnet
rnn
nlp
june 2018 by arsyed

Machined Learnings: ICML 2017 Thoughts

june 2018 by arsyed

Multitask regularization to mitigate sample complexity in RL. Both in video games and in dialog, it is useful to add extra (auxiliary) tasks in order to accelerate learning.

Leveraging knowledge and memory. Our current models are powerful function approximators, but in NLP especially we need to go beyond "the current example" in order to exhibit competence.

Gradient descent as inference. Whether it's inpainting with a GAN or BLEU score maximization with an RNN, gradient descent is an unreasonably good inference algorithm.

Careful initialization is important. I suppose traditional optimization people would say "of course", but we're starting to appreciate the importance of good initialization for deep learning. In particular, start close to linear with eigenvalues close to 1 (Balduzzi et al., Poole et al.); a sketch follows these notes.

Convolutions are as good as, and faster than, recurrent models for NLP. Nice work out of Facebook on causal convolutions for seq2seq. This aligns with my personal experience: we use convolutional NLP models in production for computational performance reasons.

Neural networks are overparameterized. They can be made much sparser without losing accuracy (Molchanov et al., Lobacheva et al.).
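On the initialization point, a minimal sketch of what "eigenvalues close to 1" can mean in practice: a random orthogonal weight matrix, whose singular values are exactly 1 (a QR-based construction; the sign fix keeps the distribution well spread):

```python
import numpy as np

def orthogonal_init(n, rng=np.random.default_rng(0)):
    """Random n x n orthogonal matrix: all singular values equal 1,
    so the layer neither amplifies nor shrinks signals at init."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # fix column signs
```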

icml
2017
reinforcement-learning
multitask-learning
neural-net
convnet
june 2018 by arsyed

[1806.05393] Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

june 2018 by arsyed

In recent years, state-of-the-art methods in computer vision have utilized increasingly deep convolutional neural network architectures (CNNs), with some of the most successful models employing hundreds or even thousands of layers. A variety of pathologies such as vanishing/exploding gradients make training such deep networks challenging. While residual connections and batch normalization do enable training at these depths, it has remained unclear whether such specialized architecture designs are truly necessary to train deep CNNs. In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme. We derive this initialization scheme theoretically by developing a mean field theory for signal propagation and by characterizing the conditions for dynamical isometry, the equilibration of singular values of the input-output Jacobian matrix. These conditions require that the convolution operator be an orthogonal transformation in the sense that it is norm-preserving. We present an algorithm for generating such random initial orthogonal convolution kernels and demonstrate empirically that they enable efficient training of extremely deep architectures.
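One concrete instance of such an initialization is the "delta-orthogonal" construction: put a random orthogonal matrix at the kernel's center tap and zeros everywhere else, so the convolution acts as an orthogonal map. A simplified numpy sketch (fan handling and variance scaling are glossed over):

```python
import numpy as np

def delta_orthogonal(c_out, c_in, k, rng=np.random.default_rng(0)):
    """k x k conv kernel that is norm-preserving at init
    (requires c_out >= c_in)."""
    assert c_out >= c_in
    q, r = np.linalg.qr(rng.standard_normal((c_out, c_in)))
    q *= np.sign(np.diag(r))  # orthonormal columns
    w = np.zeros((c_out, c_in, k, k))
    w[:, :, k // 2, k // 2] = q  # only the center tap is nonzero
    return w
```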

deep-learning
convnet
mean-field-theory
june 2018 by arsyed

Modeling Relational Data with Graph Convolutional Networks | SpringerLink

june 2018 by arsyed

Knowledge graphs enable a wide variety of applications, including question answering and information retrieval. Despite the great effort invested in their creation and maintenance, even the largest (e.g., Yago, DBPedia or Wikidata) remain incomplete. We introduce Relational Graph Convolutional Networks (R-GCNs) and apply them to two standard knowledge base completion tasks: Link prediction (recovery of missing facts, i.e. subject-predicate-object triples) and entity classification (recovery of missing entity attributes). R-GCNs are related to a recent class of neural networks operating on graphs, and are developed specifically to handle the highly multi-relational data characteristic of realistic knowledge bases. We demonstrate the effectiveness of R-GCNs as a stand-alone model for entity classification. We further show that factorization models for link prediction such as DistMult can be significantly improved through the use of an R-GCN encoder model to accumulate evidence over multiple inference steps in the graph, demonstrating a large improvement of 29.8% on FB15k-237 over a decoder-only baseline.
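The core of the model is a per-relation message-passing step; a minimal dense numpy sketch of one R-GCN layer (basis decomposition and the paper's exact normalization constant c_{i,r} are simplified to a neighbor-count mean):

```python
import numpy as np

def rgcn_layer(H, adj_per_rel, W_rel, W_self):
    """H: (nodes, d_in); adj_per_rel: one 0/1 (nodes, nodes) matrix per
    relation; W_rel: one (d_in, d_out) weight matrix per relation."""
    out = H @ W_self  # self-loop transform
    for A, W in zip(adj_per_rel, W_rel):
        deg = np.maximum(A.sum(axis=1, keepdims=True), 1.0)
        out += (A / deg) @ (H @ W)  # mean over r-neighbors, then W_r
    return np.maximum(out, 0.0)  # ReLU
```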

neural-net
convnet
graph
relational
june 2018 by arsyed

[1805.07883] How Many Samples are Needed to Learn a Convolutional Neural Network?

may 2018 by arsyed

"A widespread folklore for explaining the success of convolutional neural network (CNN) is that CNN is a more compact representation than the fully connected neural network (FNN) and thus requires fewer samples for learning. We initiate the study of rigorously characterizing the sample complexity of learning convolutional neural networks. We show that for learning an m-dimensional convolutional filter with linear activation acting on a d-dimensional input, the sample complexity of achieving population prediction error of ϵ is O˜(m/ϵ2), whereas its FNN counterpart needs at least Ω(d/ϵ2) samples. Since m≪d, this result demonstrates the advantage of using CNN. We further consider the sample complexity of learning a one-hidden-layer CNN with linear activation where both the m-dimensional convolutional filter and the r-dimensional output weights are unknown. For this model, we show the sample complexity is O˜((m+r)/ϵ2) when the ratio between the stride size and the filter size is a constant. For both models, we also present lower bounds showing our sample complexities are tight up to logarithmic factors. Our main tools for deriving these results are localized empirical process and a new lemma characterizing the convolutional structure. We believe these tools may inspire further developments in understanding CNN."

neural-net
convnet
complexity
learning-theory
may 2018 by arsyed
