jm + speech   13

Snowboy Hotword Detection
Open-source, Apache-license hotword detection library for homebrew IoT:

'Snowboy is a highly customizable hotword detection engine that is embedded real-time and is always listening (even when off-line) compatible with Raspberry Pi, (Ubuntu) Linux, and Mac OS X.

Currently, Snowboy supports:

all versions of Raspberry Pi (with Raspbian based on Debian Jessie 8.0)
64bit Mac OS X
64bit Ubuntu (12.04 and 14.04)
Android with ARMv7 CPUs
Pine 64 with Debian Jessie 8.5 (3.10.102)
Intel Edison with Ubilinux (Debian Wheezy 7.8)'
audio  iot  hardware  hotwords  speech-recognition  speech  devices 
10 weeks ago by jm
Google AI Blog: An All-Neural On-Device Speech Recognizer
This is very impressive. Can't wait to see it rolled out widely.
Today, we're happy to announce the rollout of an end-to-end, all-neural, on-device speech recognizer to power speech input in Gboard. In our recent paper, "Streaming End-to-End Speech Recognition for Mobile Devices", we present a model trained using RNN transducer (RNN-T) technology that is compact enough to reside on a phone. This means no more network latency or spottiness — the new recognizer is always available, even when you are offline. The model works at the character level, so that as you speak, it outputs words character-by-character, just as if someone was typing out what you say in real-time, and exactly as you'd expect from a keyboard dictation system.

(Via Nelson)
asr  voice  ai  google  rnn-t  rnn  neural-networks  via:nelson  offline  gboard  speech 
11 weeks ago by jm
Facing the Great Reckoning Head-On - danah boyd - Medium
“Move fast and break things” is an abomination if your goal is to create a healthy society. Taking short-cuts may be financially profitable in the short-term, but the cost to society is too great to be justified. In a healthy society, we accommodate differently abled people through accessibility standards, not because it’s financially prudent but because it’s the right thing to do. In a healthy society, we make certain that the vulnerable amongst us are not harassed into silence because that is not the value behind free speech. In a healthy society, we strategically design to increase social cohesion because binaries are machine logic not human logic.
medialab  mit  speech  tech  society  danah-boyd 
september 2019 by jm
An Algorithmic Investigation of the Highfalutin 'Poet Voice' - Atlas Obscura
'It’s easy to make fun of Poet Voice. But its proliferation across the space of academic poetry may have more serious implications as well. In a 2014 essay, “Poet Voice and Flock Mentality,” the poet Lisa Marie Basile connects it to an overall lack of diversity in the field, and a fear of breaking the mold. The consistent use of it, she writes, “delivers two messages: I am educated, I am taught, I am part-of a group … I am afraid to tell my own story in my own voice.”'
poet-voice  talking  speech  voices  intonation  droning  poetry 
may 2018 by jm
'DolphinAttack: Inaudible Voice Commands' [pdf]
'Speech recognition (SR) systems such as Siri or Google Now have become an increasingly popular human-computer interaction method, and have turned various systems into voice controllable systems (VCS). Prior work on attacking VCS shows that the hidden voice commands that are incomprehensible to people can control the systems. Hidden voice commands, though hidden, are nonetheless audible. In this work, we design a completely inaudible attack, DolphinAttack, that modulates voice commands on ultrasonic carriers (e.g., f > 20 kHz) to achieve inaudibility. By leveraging the nonlinearity of the microphone circuits, the modulated low frequency audio commands can be successfully demodulated, recovered, and more importantly interpreted by the speech recognition systems. We validate DolphinAttack on popular speech recognition systems, including Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana and Alexa. By injecting a sequence of inaudible voice commands, we show a few proof-of-concept attacks, which include activating Siri to initiate a FaceTime call on iPhone, activating Google Now to switch the phone to the airplane mode, and even manipulating the navigation system in an Audi automobile. We propose hardware and software defense solutions. We validate that it is feasible to detect DolphinAttack by classifying the audios using a support vector machine (SVM), and suggest to re-design voice controllable systems to be resilient to inaudible voice command attacks.'
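
A rough sketch of why the demodulation trick in the abstract works, not the authors' code. It assumes a toy microphone whose output is the input plus a small squared term, and uses a single 400 Hz tone as a stand-in for the voice command. The sample rate, modulation depth, and nonlinearity strength are all made-up illustration values.

```python
import cmath
import math

FS = 192_000        # sample rate in Hz (assumed; high enough to represent the carrier)
F_CARRIER = 25_000  # ultrasonic carrier in Hz, above the ~20 kHz limit of hearing
F_VOICE = 400       # a single tone standing in for the voice command (assumed)
DEPTH = 0.5         # AM modulation depth (assumed)
A2 = 0.1            # strength of the microphone's squared term (assumed)
N = 9_600           # 50 ms of samples, so every tone completes whole cycles


def tone_amplitude(samples, freq):
    """Amplitude of one frequency component, via a single-bin DFT."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / FS)
              for n, x in enumerate(samples))
    return 2 * abs(acc) / len(samples)


def transmitted():
    """Amplitude-modulate the voice tone onto the ultrasonic carrier."""
    out = []
    for n in range(N):
        t = n / FS
        voice = math.cos(2 * math.pi * F_VOICE * t)
        out.append((1 + DEPTH * voice) * math.cos(2 * math.pi * F_CARRIER * t))
    return out


def microphone(samples):
    """Toy nonlinear microphone: linear response plus a small squared term."""
    return [x + A2 * x * x for x in samples]


sent = transmitted()
heard = microphone(sent)
in_air = tone_amplitude(sent, F_VOICE)
in_mic = tone_amplitude(heard, F_VOICE)
print(f"400 Hz component in the air:    {in_air:.6f}")
print(f"400 Hz component after the mic: {in_mic:.6f}")
```

In this toy model the transmitted signal has essentially no energy at 400 Hz (it all sits around 25 kHz), while the squared term puts a 400 Hz component of amplitude about DEPTH * A2 back into the audible band. A real attack would still need the microphone's low-pass filtering and a full speech signal, neither of which is modelled here.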

(Via Zeynep)
alexa  siri  attacks  security  exploits  google-now  speech-recognition  speech  audio  acm  papers  cortana 
january 2018 by jm
Targeted Audio Adversarial Examples
This is phenomenal:
We have constructed targeted audio adversarial examples on speech-to-text transcription neural networks: given an arbitrary waveform, we can make a small perturbation that when added to the original waveform causes it to transcribe as any phrase we choose.

In prior work, we constructed hidden voice commands, audio that sounded like noise but transcribed to any phrases chosen by an adversary. With our new attack, we are able to improve this and make an arbitrary waveform transcribe as any target phrase.

The audio examples on this page are impressive -- a little bit of background noise, such as you might hear on a telephone call with high compression, hard to perceive if you aren't listening out for it.

Paper here:

(Via Parker Higgins)
papers  audio  adversarial-classification  neural-networks  speech-to-text  speech  recognition  voice  attacks  exploits  via:xor 
january 2018 by jm
Where do 'mama'/'papa' words come from?
The sounds came first — as experiments in vocalization — and parents adopted them as pet names for themselves.

If you open your mouth and make a sound, it will probably be an open vowel like /a/ unless you move your tongue or lips. The easiest consonants are perhaps the bilabials /m/, /p/, and /b/, requiring no movement of the tongue, followed by consonants made by raising the front of the tongue: /d/, /t/, and /n/. Add a dash of reduplication, and you get mama, papa, baba, dada, tata, nana.

That such words refer to people (typically parents or other guardians) is something we have imposed on the sounds and incorporated into our languages and cultures; the meanings don’t inhere in the sounds as uttered by babies, which are more likely calls for food or attention.
sounds  voice  speech  babies  kids  phonetics  linguist  language 
october 2015 by jm
How the NSA Converts Spoken Words Into Searchable Text - The Intercept
This hits the nail on the head, IMO:
To Phillip Rogaway, a professor of computer science at the University of California, Davis, keyword-search is probably the “least of our problems.” In an email to The Intercept, Rogaway warned that “When the NSA identifies someone as ‘interesting’ based on contemporary NLP methods, it might be that there is no human-understandable explanation as to why beyond: ‘his corpus of discourse resembles those of others whom we thought interesting'; or the conceptual opposite: ‘his discourse looks or sounds different from most people’s.' If the algorithms NSA computers use to identify threats are too complex for humans to understand, it will be impossible to understand the contours of the surveillance apparatus by which one is judged.  All that people will be able to do is to try your best to behave just like everyone else.”
privacy  security  gchq  nsa  surveillance  machine-learning  liberty  future  speech  nlp  pattern-analysis  cs 
may 2015 by jm
Sirius: An open end-to-end voice and vision personal assistant and its implications for future warehouse scale computers
How to build an Intelligent Personal Assistant:

'Sirius is an open end-to-end standalone speech and vision based intelligent personal assistant (IPA) similar to Apple’s Siri, Google’s Google Now, Microsoft’s Cortana, and Amazon’s Echo. Sirius implements the core functionalities of an IPA including speech recognition, image matching, natural language processing and a question-and-answer system. Sirius is developed by Clarity Lab at the University of Michigan. Sirius is published at the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2015.'
sirius  siri  cortana  google-now  echo  ok-google  ipa  assistants  search  video  audio  speech  papers  clarity  nlp  wikipedia 
april 2015 by jm
How Emoji Get Lost In Translation
I recently texted a friend to say how I was excited to meet her new boyfriend, and, because "excited" doesn't look so exciting on an iPhone screen, I editorialized with what seemed then like an innocent "[dancer]". (Translation: Can't wait for the fun night out!) On an Android phone, I realized later, that panache would have been a put-down: The dancers become "[playboy bunny]." (Translation: You’re a Playboy bunny who gets around!)
emoji  icons  graphics  text  speech  phones 
june 2014 by jm
JS1k, 1k demo submission
a speech synthesizer in 1 KB of javascript. truly awesome, nice work by @p01
js1k  javascript  demos  speech  hacks  coding 
march 2012 by jm
simon listens
open-source speech recognition for Linux and Windows. must give this a go! (Via Alexander Seewald)
speech-recognition  floss  free-software  kde  speech  recognition  linux  audio  accessibility  from delicious
october 2010 by jm
