arsyed + audio   356

"Unseen is an audio comic created by Chad Allen. Written by a blind person, with a blind heroine, for blind (and sighted) audiences.

Unseen is the story of Afsana, a blind assassin living in a chaotic world in which she is invisible to society. Discounting her abilities is her enemies’ gravest mistake.

Available to stream below for a limited time."
comics  audio  blind 
26 days ago by arsyed
All Together Now: The Living Audio Dataset
The ongoing focus in speech technology research on machine-learning-based approaches leaves the community hungry for data. However, datasets tend to be recorded once and then released, sometimes behind registration requirements or paywalls. In this paper we describe our Living Audio Dataset. The aim is to provide audio data that is in the public domain, multilingual, and expandable by communities. We discuss the role of linguistic resources, given the success of systems such as Tacotron which use direct text-to-speech mappings, and consider how data provenance could be built into such resources. So far the data has been collected for TTS purposes; however, it is also suitable for ASR. At the time of publication, audio resources already exist for Dutch, R.P. English, Irish, and Russian.
datasets  audio 
4 weeks ago by arsyed
Rare Sound Event Detection Using Deep Learning and Data Augmentation
There is increasing interest in smart environments and a growing adoption of smart devices. Smart assistants such as Google Home and Amazon Alexa, although focused on speech, could be extended to identify domestic events in real time and so provide more and better smart functions. Sound event detection aims to detect multiple target sound events that may happen simultaneously. The task is challenging due to the overlapping of sound events, the highly imbalanced nature of target and non-target data, and complicated real-world background noise. In this paper, we propose a unified approach that takes advantage of both deep learning and data augmentation. A convolutional neural network (CNN) is combined with a feed-forward neural network (FNN) to improve detection performance, and a dynamic time warping based data augmentation (DA) method is proposed to address the data imbalance problem. Experiments on several datasets showed a more than 7% increase in accuracy compared to state-of-the-art approaches.
audio  classification  data-augmentation 
4 weeks ago by arsyed
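A minimal sketch of warping-based augmentation in NumPy: plain linear interpolation along time stands in for the paper's DTW-based procedure, so the function and warp factors below are illustrative assumptions, not the published method.

```python
import numpy as np

def time_warp(feats, factor):
    """Stretch or compress a (time, freq) feature matrix along the
    time axis by linear interpolation -- a simple stand-in for
    warping-style augmentation of rare-event examples."""
    t_in = np.arange(feats.shape[0])
    n_out = int(round(feats.shape[0] * factor))
    t_out = np.linspace(0, feats.shape[0] - 1, n_out)
    return np.stack([np.interp(t_out, t_in, feats[:, k])
                     for k in range(feats.shape[1])], axis=1)

# One rare-event example becomes several warped variants.
example = np.random.default_rng(0).random((100, 40))
augmented = [time_warp(example, f) for f in (0.8, 1.0, 1.25)]
```

Each variant keeps the spectral content of the original event but changes its duration, which is the kind of variation a detector of rare events rarely sees enough of in training.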
Acoustic Scene Classification Using Teacher-Student Learning with Soft-Labels
Acoustic scene classification assigns an input segment to one of a set of pre-defined classes using spectral information. The spectral information of acoustic scenes may not be mutually exclusive, due to acoustic properties common across different classes, such as the babble noise present in both airports and shopping malls. However, the conventional training procedure based on one-hot labels does not consider the similarities between different acoustic scenes. We exploit teacher-student learning to derive soft labels that capture acoustic properties shared among different acoustic scenes. In teacher-student learning, the teacher network produces soft labels, on which the student network is trained. We investigate various methods to extract soft labels that better represent similarities across different scenes, including extracting soft labels from multiple audio segments that are labeled as the same acoustic scene. Experimental results demonstrate the potential of our approach, showing a classification accuracy of 77.36% on the DCASE 2018 task 1 validation set.
audio  classification  teacher-student 
4 weeks ago by arsyed
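The soft-label idea above can be sketched as a temperature-scaled distillation loss in plain NumPy; the temperature value and toy logits here are assumptions for illustration, not values from the paper.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; a higher T spreads probability
    # mass onto acoustically similar classes (airport vs. mall).
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Cross-entropy of the student's prediction against the
    # teacher's soft labels, averaged over the batch.
    soft_labels = softmax(teacher_logits, T)
    log_student = np.log(softmax(student_logits, T) + 1e-12)
    return -np.mean(np.sum(soft_labels * log_student, axis=-1))
```

The loss is minimized when the student reproduces the teacher's full distribution, including the mass it assigns to similar scenes, rather than just the one-hot argmax.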
GitHub - HoerTech-gGmbH/openMHA: The open Master Hearing Aid (openMHA)
"The software contains the source code of the openMHA Toolbox library, of the openMHA framework and command line application, and of a selection of algorithm plugins forming a basic hearing aid processing chain featuring

bilateral adaptive differential microphones for noise suppression [1]
binaural coherence filter for feedback reduction and dereverberation [2]
multi-band dynamic range compressor for hearing loss compensation [3]
spatial filtering algorithms:
a delay-and-sum beamformer
a MVDR beamformer [4]
single-channel noise reduction [5]
resampling and filter plugins
STFT cyclic aliasing prevention
adaptive feedback cancellation [6]
probabilistic sound source localization [7]"
libs  algorithms  real-time  dsp  audio 
july 2019 by arsyed
The Smart Audio Report: Spring 2019
"The Spring 2019 Smart Audio Report revealed new smart speaker user behaviors and trends among the 21% of U.S. adults — 53M people — who own smart speakers today. The latest research from NPR and Edison uncovers security concerns among both smart speaker owners and non-owners, and challenges in discovery of new smart speaker functions or skills. But despite these challenges, smart speaker owners continue to integrate the device in their everyday routines: 69% of smart speaker owners use their device daily."
audio  voice  alexa  ui  machine-learning 
july 2019 by arsyed
[1906.00654] Continual Learning of New Sound Classes using Generative Replay
Continual learning consists of incrementally training a model on a sequence of datasets and testing on the union of all of them. In this paper, we examine continual learning for sound classification, in which we wish to refine already-trained models to learn new sound classes. In practice one does not want to keep all past training data and retrain from scratch, but naively updating a model with new data(sets) degrades performance on already-learned tasks, which is referred to as "catastrophic forgetting." We develop a generative replay procedure for generating training audio spectrogram data, in place of keeping older training datasets. We show that by incrementally refining a classifier with generative replay, a generator that is 4% of the size of all previous training data matches the performance of refining the classifier while keeping 20% of all previous training data. We thus conclude that a trained sound classifier can be extended to learn new classes without keeping previously used datasets.
audio  catastrophic-forgetting  continual-learning 
june 2019 by arsyed
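A toy sketch of the replay setup, with a per-class Gaussian standing in for the paper's spectrogram generator (an assumption for brevity): the old data is discarded, and sampled "replayed" features are mixed with the new class when retraining.

```python
import numpy as np

rng = np.random.default_rng(0)

class GaussianReplay:
    """Toy stand-in for the paper's audio generator: memorize a
    per-class mean/std instead of storing the raw training set."""
    def __init__(self):
        self.stats = {}          # class label -> (mean, std)

    def fit_class(self, label, X):
        self.stats[label] = (X.mean(axis=0), X.std(axis=0) + 1e-8)

    def sample(self, label, n):
        mu, sd = self.stats[label]
        return rng.normal(mu, sd, size=(n,) + mu.shape)

# Learn class 0, then extend to class 1 without keeping class-0 data.
replay = GaussianReplay()
X0 = rng.normal(0.0, 1.0, size=(200, 8))         # "old" class features
replay.fit_class(0, X0)
del X0                                           # old data is gone

X1 = rng.normal(3.0, 1.0, size=(200, 8))         # "new" class features
X_train = np.vstack([replay.sample(0, 200), X1]) # replayed + new
y_train = np.array([0] * 200 + [1] * 200)
```

Training the classifier on `X_train` rather than on `X1` alone is what prevents the old class from being catastrophically forgotten.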
[1906.01083] MelNet: A Generative Model for Audio in the Frequency Domain
Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps. While long-range dependencies are difficult to model directly in the time domain, we show that they can be more tractably modelled in two-dimensional time-frequency representations such as spectrograms. By leveraging this representational advantage, in conjunction with a highly expressive probabilistic model and a multiscale generation procedure, we design a model capable of generating high-fidelity audio samples which capture structure at timescales that time-domain models have yet to achieve. We apply our model to a variety of audio generation tasks, including unconditional speech generation, music generation, and text-to-speech synthesis---showing improvements over previous approaches in both density estimates and human judgments.
audio  generative  neural-net  via:chl 
june 2019 by arsyed
[1706.02361] The Effects of Noisy Labels on Deep Convolutional Neural Networks for Music Tagging
Deep neural networks (DNNs) have been successfully applied to music classification, including music tagging. However, there are several open questions regarding the training, evaluation, and analysis of DNNs. In this article, we investigate one specific aspect, the effect of noisy labels, to deepen our understanding of their properties. We analyse and (re-)validate a large music tagging dataset to investigate the reliability of training and evaluation. Using a trained network, we compute label vector similarities, which are compared to ground-truth similarity.
The results highlight several important aspects of music tagging and neural networks. We show that networks can be effective despite relatively large error rates in ground-truth datasets, and conjecture that label noise may be the cause of varying tag-wise performance differences. Lastly, the analysis of our trained network provides valuable insight into the relationships between music tags. These results highlight the benefit of using data-driven methods to address automatic music tagging.
music  audio  convnet  autotagging 
may 2019 by arsyed
[1903.00142] A Unified Neural Architecture for Instrumental Audio Tasks
"Within Music Information Retrieval (MIR), prominent tasks -- including pitch-tracking, source-separation, super-resolution, and synthesis -- typically call for specialised methods, despite their similarities. Conditional Generative Adversarial Networks (cGANs) have been shown to be highly versatile in learning general image-to-image translations, but have not yet been adapted across MIR. In this work, we present an end-to-end supervisable architecture to perform all aforementioned audio tasks, consisting of a WaveNet synthesiser conditioned on the output of a jointly-trained cGAN spectrogram translator. In doing so, we demonstrate the potential of such flexible techniques to unify MIR tasks, promote efficient transfer learning, and converge research to the improvement of powerful, general methods. Finally, to the best of our knowledge, we present the first application of GANs to guided instrument synthesis."
audio  music  gan  mir 
april 2019 by arsyed
[1904.07944] Expediting TTS Synthesis with Adversarial Vocoding
Recent approaches in text-to-speech (TTS) synthesis employ neural network strategies to vocode perceptually-informed spectrogram representations directly into listenable waveforms. Such vocoding procedures create a computational bottleneck in modern TTS pipelines. We propose an alternative approach which utilizes generative adversarial networks (GANs) to learn mappings from perceptually-informed spectrograms to simple magnitude spectrograms which can be heuristically vocoded. Through a user study, we show that our approach significantly outperforms naïve vocoding strategies while being hundreds of times faster than neural network vocoders used in state-of-the-art TTS systems. We also show that our method can be used to achieve state-of-the-art results in unsupervised synthesis of individual words of speech.
gan  audio  vocoder  speech-synthesis 
april 2019 by arsyed
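Griffin-Lim is the classic heuristic for vocoding a magnitude spectrogram of the kind the abstract refers to; this SciPy sketch is illustrative of that step, not the paper's pipeline.

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_iter=50, nperseg=256):
    """Recover a waveform from a magnitude-only spectrogram by
    alternating projections (Griffin & Lim, 1984): repeatedly
    resynthesize with the current phase estimate, re-analyze,
    and keep only the new phase."""
    rng = np.random.default_rng(0)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        _, x = istft(mag * phase, nperseg=nperseg)
        _, _, Z = stft(x, nperseg=nperseg)
        phase = np.exp(1j * np.angle(Z[:, : mag.shape[1]]))
    _, x = istft(mag * phase, nperseg=nperseg)
    return x
```

The GAN in the paper maps perceptual spectrograms to simple magnitude spectrograms precisely so that a cheap procedure like this, rather than a neural vocoder, can produce the waveform.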
Audioburst - Organizing the World’s Audio
"At Audioburst, we’re building the world’s largest library of audio content. Every day, our technology listens to, understands, segments and indexes millions of minutes of audio information from top radio stations and podcasts."
audio  speech  search-engines  podcasts 
february 2019 by arsyed
Homepage — Essentia 2.1-beta5-dev documentation
"Essentia is an open-source C++ library for audio analysis and audio-based music information retrieval. It contains an extensive collection of algorithms including audio input/output functionality, standard digital signal processing blocks, statistical characterization of data, and a large set of spectral, temporal, tonal and high-level music descriptors. [...] The library is also wrapped in Python and includes a number of command-line tools and third-party extensions, which facilitate its use for fast prototyping and allow setting up research experiments very rapidly."
python  libs  audio  dsp  music  mir 
february 2019 by arsyed
Convert video to audio, Catch up on your video backlog — Listen Later
"Listen Later is a free service for converting videos into an audio podcast, which makes it easier to catch up on your video backlog during chores, errands and commutes. Listen Later works with services supported by youtube-dl."
podcasts  video  audio 
february 2019 by arsyed
Connecting Deep Neural Networks to Physical, Perceptual, and Electrophysiological Auditory Signals
Deep neural networks have been recently shown to capture intricate information transformation of signals from the sensory profiles to semantic representations that facilitate recognition or discrimination of complex stimuli. In this vein, convolutional neural networks (CNNs) have been used very successfully in image and audio classification. Designed to imitate the hierarchical structure of the nervous system, CNNs reflect activation with increasing degrees of complexity that transform the incoming signal onto object-level representations. In this work, we employ a CNN trained for large-scale audio object classification to gain insights about the contribution of various audio representations that guide sound perception. The analysis contrasts activation of different layers of a CNN with acoustic features extracted directly from the scenes, perceptual salience obtained from behavioral responses of human listeners, as well as neural oscillations recorded by electroencephalography (EEG) in response to the same natural scenes. All three measures are tightly linked quantities believed to guide percepts of salience and object formation when listening to complex scenes. The results paint a picture of the intricate interplay between low-level and object-level representations in guiding auditory salience that is very much dependent on context and sound category.
auditory  neural-net  eeg  audio  classification 
december 2018 by arsyed
[1805.07820] Targeted Adversarial Examples for Black Box Audio Systems
The application of deep recurrent networks to audio transcription has led to impressive gains in automatic speech recognition (ASR) systems. Many have demonstrated that small adversarial perturbations can fool deep neural networks into incorrectly predicting a specified target with high confidence. Current work on fooling ASR systems has focused on white-box attacks, in which the model architecture and parameters are known. In this paper, we adopt a black-box approach to adversarial generation, combining genetic algorithms and gradient estimation to solve the task. We achieve an 89.25% targeted attack similarity after 3000 generations while maintaining 94.6% audio file similarity.
adversarial-examples  audio  black-box 
december 2018 by arsyed
howler.js - JavaScript audio library for the modern web
howler.js makes working with audio in JavaScript easy and reliable across all platforms.
javascript  libs  audio  html5 
november 2018 by arsyed
interactiveaudiolab/VocalSketchDataSet: vocal imitations of everyday and musical audio concepts
A natural way of communicating an audio concept is to imitate it with one's voice. This creates an approximation of the imagined sound (e.g. a particular owl's hoot), much like how a visual sketch approximates a visual concept (e.g. a drawing of the owl). If a machine could understand vocal imitations, users could communicate with software in this natural way, enabling new interactions (e.g. programming a music synthesizer by imitating the desired sound with one's voice). This data set contains thousands of crowd-sourced vocal imitations of a large set of diverse sounds, along with data on the crowd's ability to correctly label these vocal imitations. This data set will help the research community understand which audio concepts can be effectively communicated with this approach. We have released this data so the community can study the related issues and build systems that leverage vocal imitation as an interaction modality.
datasets  audio  vocal  imitation  onomatopoeia 
october 2018 by arsyed
interactiveaudiolab/nussl: A simple audio source separation library built in python
At its core, nussl contains implementations of the following source separation algorithms:
Spatialization algorithms:
Degenerate Unmixing Estimation Technique (DUET)
Repetition algorithms:
REpeating Pattern Extraction Technique (REPET)
REPET using the cosine similarity matrix (REPET-SIM)
Separation via 2DFT
General matrix decomposition/Component Analysis:
Non-negative Matrix Factorization with MFCC clustering (NMF)
Robust Principal Component Analysis (RPCA)
Independent Component Analysis (ICA)
Ideal Mask
High/Low Pass Filtering
Composite Methods
Overlap Add
Algorithm Picker (multicue separation)
Other Foreground/Background Decompositions
Harmonic/Percussive Source Separation (HPSS)
Melody Tracking separation (Melodia)
Deep Learning
Deep Clustering
python  libs  audio  source-separation  deep-clustering 
october 2018 by arsyed
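As an illustration of one algorithm family on this list, here is a standalone median-filtering HPSS sketch (Fitzgerald, 2010); it shows the underlying idea only and is not nussl's API.

```python
import numpy as np
from scipy.ndimage import median_filter

def hpss_masks(S, kernel=17):
    """Harmonic/percussive separation by median filtering
    (Fitzgerald, 2010): harmonic energy is smooth along time,
    percussive onsets are smooth along frequency.
    S: magnitude spectrogram of shape (freq, time)."""
    H = median_filter(S, size=(1, kernel))   # smooth across time
    P = median_filter(S, size=(kernel, 1))   # smooth across frequency
    total = H + P + 1e-12
    return H / total, P / total              # soft masks in [0, 1]
```

Multiplying the mixture spectrogram by each mask (keeping the mixture phase) yields the harmonic and percussive estimates.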
[1712.01120] Wavenet based low rate speech coding
Traditional parametric coding of speech facilitates low rate but provides poor reconstruction quality because of the inadequacy of the model used. We describe how a WaveNet generative speech model can be used to generate high quality speech from the bit stream of a standard parametric coder operating at 2.4 kb/s. We compare this parametric coder with a waveform coder based on the same generative model and show that approximating the signal waveform incurs a large rate penalty. Our experiments confirm the high performance of the WaveNet based coder and show that the speech produced by the system is able to additionally perform implicit bandwidth extension and does not significantly impair recognition of the original speaker for the human listener, even when that speaker has not been used during the training of the generative model.
speech  audio  codec  wavenet 
june 2018 by arsyed
[1806.07098] End-to-End Speech Recognition From the Raw Waveform
State-of-the-art speech recognition systems rely on fixed, hand-crafted features such as mel-filterbanks to preprocess the waveform before the training pipeline. In this paper, we study end-to-end systems trained directly from the raw waveform, building on two alternatives for trainable replacements of mel-filterbanks that use a convolutional architecture. The first one is inspired by gammatone filterbanks (Hoshen et al., 2015; Sainath et al., 2015), and the second one by the scattering transform (Zeghidour et al., 2017). We propose two modifications to these architectures and systematically compare them to mel-filterbanks, on the Wall Street Journal dataset. The first modification is the addition of an instance normalization layer, which greatly improves on the gammatone-based trainable filterbanks and speeds up the training of the scattering-based filterbanks. The second one relates to the low-pass filter used in these approaches. These modifications consistently improve performance for both approaches, and remove the need for careful initialization in scattering-based trainable filterbanks. In particular, we show a consistent improvement in word error rate for the trainable filterbanks relative to comparable mel-filterbanks. It is the first time end-to-end models trained from the raw signal significantly outperform mel-filterbanks on a large-vocabulary task under clean recording conditions.
speech  audio  asr  e2e  wave  facebook 
june 2018 by arsyed
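The instance-normalization modification amounts to normalizing each filterbank channel over time, separately per utterance; the (channels, time) layout below is an assumed convention for illustration.

```python
import numpy as np

def instance_norm(feats, eps=1e-5):
    """Normalize each channel of one utterance over time to
    zero mean and unit variance (instance normalization).
    feats: (channels, time) filterbank output for one utterance."""
    mu = feats.mean(axis=1, keepdims=True)
    var = feats.var(axis=1, keepdims=True)
    return (feats - mu) / np.sqrt(var + eps)
```

Because the statistics are computed per utterance, the layer removes channel-level gain differences without needing corpus-wide mean/variance estimates.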
A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy: Neuron
"A core goal of auditory neuroscience is to build quantitative models that predict cortical responses to natural sounds. Reasoning that a complete model of auditory cortex must solve ecologically relevant tasks, we optimized hierarchical neural networks for speech and music recognition. The best-performing network contained separate music and speech pathways following early shared processing, potentially replicating human cortical organization. The network performed both tasks as well as humans and exhibited human-like errors despite not being optimized to do so, suggesting common constraints on network and human performance. The network predicted fMRI voxel responses substantially better than traditional spectrotemporal filter models throughout auditory cortex. It also provided a quantitative signature of cortical representational hierarchy—primary and non-primary responses were best predicted by intermediate and late network layers, respectively. The results suggest that task optimization provides a powerful set of tools for modeling sensory systems."
auditory  speech  audio  neural-net  neuroscience 
may 2018 by arsyed
[1802.04208] Synthesizing Audio with Generative Adversarial Networks
While Generative Adversarial Networks (GANs) have seen wide success at the problem of synthesizing realistic images, they have seen little application to the problem of unsupervised audio generation. Unlike for images, a barrier to success is that the best discriminative representations for audio tend to be non-invertible, and thus cannot be used to synthesize listenable outputs. In this paper, we introduce WaveGAN, a first attempt at applying GANs to raw audio synthesis in an unsupervised setting. Our experiments on speech demonstrate that WaveGAN can produce intelligible words from a small vocabulary of human speech, as well as synthesize audio from other domains such as bird vocalizations, drums, and piano. Qualitatively, we find that human judges prefer the generated examples from WaveGAN over those from a method which naively applies GANs on image-like audio feature representations.
papers  gan  audio  wave 
february 2018 by arsyed
Descript - Transcription and Audio Editing
"Get a near-perfect transcript, and edit your media by editing text. Descript makes working with audio and video fast, easy, and fun."
transcription  audio  asr 
february 2018 by arsyed
Deep Complex Networks | OpenReview
Abstract: At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggests that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks. More precisely, we rely on complex convolutions and present algorithms for complex batch-normalization and complex weight initialization strategies for complex-valued neural nets, and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are competitive with their real-valued counterparts. We test deep complex models on several computer vision tasks, on music transcription using the MusicNet dataset, and on speech spectrum prediction using TIMIT. We achieve state-of-the-art performance on these audio-related tasks.
papers  neural-net  complex  audio 
january 2018 by arsyed
VGGish model released - Google Groups
We want a 25 ms window.  The reasons for this aren't particularly rigid; we inherited this convention from speech recognition, in which you want a short window to capture local variation in the signal, but long enough to smooth out too much fluctuation.  25 ms is a good compromise because it is long enough to smooth across the pitch pulses of typical voiced speech.  But it has also worked well, empirically, in a wide range of audio recognition applications.
speech  audio  feature-extraction  window  fft 
november 2017 by arsyed
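The quoted convention translates into concrete frame sizes at a 16 kHz sample rate; note the 10 ms hop below is the usual companion convention in speech feature extraction, not something stated in the quoted post.

```python
import numpy as np

sample_rate = 16000                       # Hz, a common speech rate
win_ms, hop_ms = 25, 10                   # window and hop durations
win = int(sample_rate * win_ms / 1000)    # 400 samples per window
hop = int(sample_rate * hop_ms / 1000)    # 160 samples between windows

def frame(signal, win, hop):
    """Slice a 1-D signal into overlapping analysis windows."""
    n = 1 + (len(signal) - win) // hop
    idx = np.arange(win)[None, :] + hop * np.arange(n)[:, None]
    return signal[idx]

frames = frame(np.zeros(sample_rate), win, hop)  # 1 s of audio
```

One second of 16 kHz audio thus yields 98 overlapping 400-sample frames, each long enough to span several pitch periods of voiced speech.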
[1504.04658] Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network
"Identification and extraction of singing voice from within musical mixtures is a key challenge in source separation and machine audition. Recently, deep neural networks (DNN) have been used to estimate 'ideal' binary masks for carefully controlled cocktail party speech separation problems. However, it is not yet known whether these methods are capable of generalizing to the discrimination of voice and non-voice in the context of musical mixtures. Here, we trained a convolutional DNN (of around a billion parameters) to provide probabilistic estimates of the ideal binary mask for separation of vocal sounds from real-world musical mixtures. We contrast our DNN results with more traditional linear methods. Our approach may be useful for automatic removal of vocal sounds from musical mixtures for 'karaoke' type applications."
paper  deep-learning  convnet  audio  music  source-separation  cocktail-party-problem 
september 2017 by arsyed
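The "ideal" binary mask the abstract mentions is simple to state: a time-frequency cell belongs to the vocal if the vocal is louder there than the accompaniment. A minimal sketch (the network in the paper is trained to *predict* this mask from the mixture alone):

```python
import numpy as np

def ideal_binary_mask(vocal_mag, accomp_mag):
    """1 where the vocal dominates a time-frequency cell, else 0.
    Computable only when the isolated sources are known, i.e. at
    training time."""
    return (vocal_mag > accomp_mag).astype(float)

def apply_mask(mix_stft, mask):
    # Keep the mixture's phase; zero out the non-vocal cells.
    return mix_stft * mask
```

Inverting the masked STFT then gives the vocal estimate (or, with the complementary mask, the karaoke track).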
[1705.08168] Look, Listen and Learn
We consider the question: what can be learnt by looking at and listening to a large number of unlabelled videos? There is a valuable, but so far untapped, source of information contained in the video itself -- the correspondence between the visual and the audio streams, and we introduce a novel "Audio-Visual Correspondence" learning task that makes use of this. Training visual and audio networks from scratch, without any additional supervision other than the raw unconstrained videos themselves, is shown to successfully solve this task, and, more interestingly, result in good visual and audio representations. These features set the new state-of-the-art on two sound classification benchmarks, and perform on par with the state-of-the-art self-supervised approaches on ImageNet classification. We also demonstrate that the network is able to localize objects in both modalities, as well as perform fine-grained recognition tasks.
papers  audio  audiovisual  speech  neural-net 
september 2017 by arsyed
[1609.09430] CNN Architectures for Large-Scale Audio Classification
"Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both training set and label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task."
papers  convnet  neural-net  architecture  audio  classification 
august 2017 by arsyed
Overview — mutagen
"Mutagen is a Python module to handle audio metadata. It supports ASF, FLAC, MP4, Monkey’s Audio, MP3, Musepack, Ogg Opus, Ogg FLAC, Ogg Speex, Ogg Theora, Ogg Vorbis, True Audio, WavPack, OptimFROG, and AIFF audio files. All versions of ID3v2 are supported, and all standard ID3v2.4 frames are parsed. It can read Xing headers to accurately calculate the bitrate and length of MP3s. ID3 and APEv2 tags can be edited regardless of audio format. It can also manipulate Ogg streams on an individual packet/page level."
python  libs  audio  metadata  tagging 
july 2017 by arsyed