jm + speech   9

'DolphinAttack: Inaudible Voice Commands' [pdf]
'Speech recognition (SR) systems such as Siri or Google Now have become an increasingly popular human-computer interaction method, and have turned various systems into voice controllable systems(VCS). Prior work on attacking VCS shows that the hidden voice commands that are incomprehensible to people can control the systems. Hidden voice commands, though hidden, are nonetheless audible. In this work, we design a completely inaudible attack, DolphinAttack, that modulates voice commands on ultrasonic carriers (e.g., f > 20 kHz) to achieve inaudibility. By leveraging the nonlinearity of the microphone circuits, the modulated low frequency audio commands can be successfully demodulated, recovered, and more importantly interpreted by the speech recognition systems. We validate DolphinAttack on popular speech recognition systems, including Siri, Google Now, Samsung S Voice, Huawei HiVoice, Cortana and Alexa. By injecting a sequence of inaudible voice commands, we show a few proof-of-concept attacks, which include activating Siri to initiate a FaceTime call on iPhone, activating Google Now to switch the phone to the airplane mode, and even manipulating the navigation system in an Audi automobile. We propose hardware and software defense solutions. We validate that it is feasible to detect DolphinAttack by classifying the audios using supported vector machine (SVM), and suggest to re-design voice controllable systems to be resilient to inaudible voice command attacks.'

via Zeynep (
alexa  siri  attacks  security  exploits  google-now  speech-recognition  speech  audio  acm  papers  cortana 
7 weeks ago by jm
Targeted Audio Adversarial Examples
This is phenomenal:
We have constructed targeted audio adversarial examples on speech-to-text transcription neural networks: given an arbitrary waveform, we can make a small perturbation that when added to the original waveform causes it to transcribe as any phrase we choose.

In prior work, we constructed hidden voice commands, audio that sounded like noise but transcribed to any phrases chosen by an adversary. With our new attack, we are able to improve this and make an arbitrary waveform transcribe as any target phrase.

The audio examples on this page are impressive -- a little bit of background noise, such as you might hear on a telephone call with high compression, hard to perceive if you aren't listening out for it.

Paper here:

(Via Parker Higgins, )
papers  audio  adversarial-classification  neural-networks  speech-to-text  speech  recognition  voice  attacks  exploits  via:xor 
8 weeks ago by jm
Where do 'mama'/'papa' words come from?
The sounds came first — as experiments in vocalization — and parents adopted them as pet names for themselves.

If you open your mouth and make a sound, it will probably be an open vowel like /a/ unless you move your tongue or lips. The easiest consonants are perhaps the bilabials /m/, /p/, and /b/, requiring no movement of the tongue, followed by consonants made by raising the front of the tongue: /d/, /t/, and /n/. Add a dash of reduplication, and you get mama, papa, baba, dada, tata, nana.

That such words refer to people (typically parents or other guardians) is something we have imposed on the sounds and incorporated into our languages and cultures; the meanings don’t inhere in the sounds as uttered by babies, which are more likely calls for food or attention.
sounds  voice  speech  babies  kids  phonetics  linguist  language 
october 2015 by jm
How the NSA Converts Spoken Words Into Searchable Text - The Intercept
This hits the nail on the head, IMO:
To Phillip Rogaway, a professor of computer science at the University of California, Davis, keyword-search is probably the “least of our problems.” In an email to The Intercept, Rogaway warned that “When the NSA identifies someone as ‘interesting’ based on contemporary NLP methods, it might be that there is no human-understandable explanation as to why beyond: ‘his corpus of discourse resembles those of others whom we thought interesting'; or the conceptual opposite: ‘his discourse looks or sounds different from most people’s.' If the algorithms NSA computers use to identify threats are too complex for humans to understand, it will be impossible to understand the contours of the surveillance apparatus by which one is judged.  All that people will be able to do is to try your best to behave just like everyone else.”
privacy  security  gchq  nsa  surveillance  machine-learning  liberty  future  speech  nlp  pattern-analysis  cs 
may 2015 by jm
Sirius: An open end-to-end voice and vision personal assistant and its implications for future warehouse scale computers
How to build an Intelligent Personal Assistant:

'Sirius is an open end-to-end standalone speech and vision based intelligent personal assistant (IPA) similar to Apple’s Siri, Google’s Google Now, Microsoft’s Cortana, and Amazon’s Echo. Sirius implements the core functionalities of an IPA including speech recognition, image matching, natural language processing and a question-and-answer system. Sirius is developed by Clarity Lab at the University of Michigan. Sirius is published at the International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS) 2015.'
sirius  siri  cortana  google-now  echo  ok-google  ipa  assistants  search  video  audio  speech  papers  clarity  nlp  wikipedia 
april 2015 by jm
How Emoji Get Lost In Translation
I recently texted a friend to say how I was excited to meet her new boyfriend, and, because "excited" doesn't look so exciting on an iPhone screen, I editorialized with what seemed then like an innocent "[dancer]". (Translation: Can't wait for the fun night out!) On an Android phone, I realized later, that panache would have been a put-down: The dancers become "[playboy bunny]." (Translation: You’re a Playboy bunny who gets around!)
emoji  icons  graphics  text  speech  phones 
june 2014 by jm
JS1k, 1k demo submission
a speech synthesizer in 1 KB of javascript. truly awesome, nice work by @p01
js1k  javascript  demos  speech  hacks  coding 
march 2012 by jm
simon listens
open-source speech recognition for Linux and Windows. must give this a go! (Via Alexander Seewald)
speech-recognition  floss  free-software  kde  speech  recognition  linux  audio  accessibility  from delicious
october 2010 by jm

Copy this bookmark: