
Bringing Live Transcribe’s Speech Engine to Everyone

Earlier this year, Google launched Live Transcribe, an Android application that provides real-time automated captions for people who are deaf or hard of hearing. Through many months of user testing, we've learned that robustly delivering good captions for long-form conversations isn't so easy, and we want to make it easier for developers to build upon what we've learned. Live Transcribe's speech recognition is provided by Google's state-of-the-art Cloud Speech API, which under most conditions delivers pretty impressive transcript accuracy. However, relying on the cloud introduces several complications—most notably robustness to ever-changing network connections, data costs, and latency. Today, we are sharing our transcription engine with the world so that developers everywhere can build applications with robust transcription.

Those who have worked with our Cloud Speech API know that it does not currently support sending an infinitely long stream of audio. To work around this limit, we close and restart streaming requests before they hit the timeout: we restart the session during long periods of silence and close it whenever a pause in the speech is detected, so that the cut never falls in the middle of a word or sentence. In between sessions, we buffer audio locally and send it upon reconnection. This reduces the amount of text lost mid-conversation, whether from restarting speech requests or from switching between wireless networks.
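
Below is a minimal sketch of that restart-and-buffer loop using the Cloud Speech Java client. The class name, the four-minute restart margin, and the externally supplied speechDetected flag are illustrative assumptions, not the actual Live Transcribe implementation.

import com.google.api.gax.rpc.ClientStream;
import com.google.api.gax.rpc.ResponseObserver;
import com.google.cloud.speech.v1.RecognitionConfig;
import com.google.cloud.speech.v1.SpeechClient;
import com.google.cloud.speech.v1.StreamingRecognitionConfig;
import com.google.cloud.speech.v1.StreamingRecognizeRequest;
import com.google.cloud.speech.v1.StreamingRecognizeResponse;
import com.google.protobuf.ByteString;

import java.util.ArrayDeque;
import java.util.Deque;

/** Keeps recognition running by restarting the streaming session before the server-side timeout. */
class RestartingRecognizer {
  // Restart comfortably before the streaming limit (illustrative margin, not the real limit).
  private static final long MAX_SESSION_MS = 4 * 60 * 1000;

  private final SpeechClient client;
  private final ResponseObserver<StreamingRecognizeResponse> observer;
  // Audio captured while no session is open; flushed on reconnection.
  private final Deque<byte[]> pendingAudio = new ArrayDeque<>();

  private ClientStream<StreamingRecognizeRequest> stream;
  private long sessionStartMs;

  RestartingRecognizer(SpeechClient client,
                       ResponseObserver<StreamingRecognizeResponse> observer) {
    this.client = client;
    this.observer = observer;
  }

  /** Called for every chunk of microphone audio. */
  void onAudio(byte[] chunk, boolean speechDetected) {
    boolean timeoutNear = stream != null
        && System.currentTimeMillis() - sessionStartMs > MAX_SESSION_MS;
    if (stream == null || (timeoutNear && !speechDetected)) {
      // Only cut the session during a pause so no word or sentence is truncated.
      closeSession();
      pendingAudio.add(chunk);
      openSession();
      return;
    }
    stream.send(audioRequest(chunk));
  }

  private void openSession() {
    stream = client.streamingRecognizeCallable().splitCall(observer);
    sessionStartMs = System.currentTimeMillis();
    // The first request carries the config; subsequent requests carry audio.
    stream.send(StreamingRecognizeRequest.newBuilder()
        .setStreamingConfig(StreamingRecognitionConfig.newBuilder()
            .setConfig(RecognitionConfig.newBuilder()
                .setEncoding(RecognitionConfig.AudioEncoding.LINEAR16)
                .setSampleRateHertz(16000)
                .setLanguageCode("en-US"))
            .setInterimResults(true))
        .build());
    // Replay audio buffered while the connection was down.
    while (!pendingAudio.isEmpty()) {
      stream.send(audioRequest(pendingAudio.poll()));
    }
  }

  private void closeSession() {
    if (stream != null) {
      stream.closeSend();
      stream = null;
    }
  }

  private static StreamingRecognizeRequest audioRequest(byte[] chunk) {
    return StreamingRecognizeRequest.newBuilder()
        .setAudioContent(ByteString.copyFrom(chunk))
        .build();
  }
}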

Endlessly streaming audio comes with its own challenges. In many countries, network data is quite expensive and in spots with poor internet, bandwidth may be limited. After much experimentation with audio codecs (in particular, we evaluated the FLAC, AMR-WB, and Opus codecs), we were able to achieve a 10x reduction in data usage without compromising accuracy. FLAC, a lossless codec, preserves accuracy completely, but doesn't save much data. It also has noticeable codec latency. AMR-WB, on the other hand, saves a lot of data, but delivers much worse accuracy in noisy environments. Opus was a clear winner, allowing data rates many times lower than most music streaming services while still preserving the important details of the audio signal—even in noisy environments. Beyond relying on codecs to keep data usage to a minimum, we also support using speech detection to close the network connection during extended periods of silence. That means if you accidentally leave your phone on and running Live Transcribe when nobody is around, it stops using your data.
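
To give a sense of what the codec setup looks like, here is a small sketch that configures Android's MediaCodec for low-bitrate Opus. It assumes the device exposes an Opus encoder (only newer Android releases do); Live Transcribe itself ships a custom Opus encoder rather than relying on the platform codec, and the 24 kbps figure is an illustrative choice, roughly an order of magnitude below raw 16-bit PCM at the same sample rate.

import android.media.MediaCodec;
import android.media.MediaFormat;

import java.io.IOException;

/** Configures a low-bitrate Opus encoder for 16 kHz mono speech. */
class OpusEncoderFactory {
  static MediaCodec createSpeechEncoder() throws IOException {
    // 16 kHz mono is plenty for speech; music streaming typically runs far higher.
    MediaFormat format =
        MediaFormat.createAudioFormat(MediaFormat.MIMETYPE_AUDIO_OPUS, 16000, 1);
    // A low bitrate keeps data usage roughly 10x below raw 16-bit PCM at this sample rate.
    format.setInteger(MediaFormat.KEY_BIT_RATE, 24_000);
    MediaCodec encoder = MediaCodec.createEncoderByType(MediaFormat.MIMETYPE_AUDIO_OPUS);
    encoder.configure(format, /* surface= */ null, /* crypto= */ null,
        MediaCodec.CONFIGURE_FLAG_ENCODE);
    return encoder;
  }
}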

Finally, we know that if you are relying on captions, you want them immediately, so we've worked hard to keep latency to a minimum. Though most of the credit for speed goes to the Cloud Speech API, Live Transcribe's final trick lies in our custom Opus encoder: at the cost of only a minor increase in bitrate, we achieve latency that is visually indistinguishable from sending uncompressed audio.

Today, we are excited to make all of this available to developers everywhere. We hope you'll join us in trying to build a world that is more accessible for everyone.

By Chet Gnegy, Alex Huang, and Ausmus Chang from the Live Transcribe Team

Running Android Things on the AIY Voice Kit

Posted by Ryan Bae, Android Things

A major benefit of using Android Things is the ability to prototype connected devices and quickly scale to full commercial products. To further that goal, the Android Things team is partnering with AIY Projects, a new initiative to bring do-it-yourself artificial intelligence to makers. Today, the AIY Projects team launched their first open source reference project: a Raspberry Pi-based Voice Kit with instructions to build a Voice User Interface (VUI) that can use cloud services (like the new Google Assistant SDK or Cloud Speech API) or run completely on-device with TensorFlow. We are releasing a special Android Things Developer Preview 3.1 build for Raspberry Pi 3 to support the Voice Kit. Developers can run Android Things on the Voice Kit with full functionality, including integration with the Google Assistant SDK. To get started, visit the AIY website, download the latest Android Things Developer Preview, and follow the instructions.
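
As a taste of what running Android Things on the kit looks like, here is a small sketch that mirrors the Voice Kit's arcade-button state onto its LED using Android Things Peripheral I/O. The GPIO pin names are assumptions based on the Voice HAT wiring (verify them against the kit's pinout), and the button polarity depends on how the kit wires the switch.

import android.app.Activity;
import android.os.Bundle;

import com.google.android.things.pio.Gpio;
import com.google.android.things.pio.GpioCallback;
import com.google.android.things.pio.PeripheralManagerService;

import java.io.IOException;

/** Mirrors the Voice Kit's arcade-button level onto the button's LED. */
public class ButtonLedActivity extends Activity {
  // Assumed pin names for the Voice HAT; check the kit's pinout before use.
  private static final String BUTTON_PIN = "BCM23";
  private static final String LED_PIN = "BCM25";

  private Gpio button;
  private Gpio led;

  @Override
  protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    PeripheralManagerService pio = new PeripheralManagerService();
    try {
      led = pio.openGpio(LED_PIN);
      led.setDirection(Gpio.DIRECTION_OUT_INITIALLY_LOW);

      button = pio.openGpio(BUTTON_PIN);
      button.setDirection(Gpio.DIRECTION_IN);
      button.setEdgeTriggerType(Gpio.EDGE_BOTH);
      button.registerGpioCallback(new GpioCallback() {
        @Override
        public boolean onGpioEdge(Gpio gpio) {
          try {
            // Copy the button's level to the LED; polarity depends on the wiring.
            led.setValue(gpio.getValue());
          } catch (IOException e) {
            // Ignore transient I/O errors in this sketch.
          }
          return true; // Keep receiving edge events.
        }
      });
    } catch (IOException e) {
      throw new IllegalStateException("Unable to open GPIO", e);
    }
  }

  @Override
  protected void onDestroy() {
    super.onDestroy();
    try {
      if (button != null) button.close();
      if (led != null) led.close();
    } catch (IOException e) {
      // Ignore errors on shutdown.
    }
  }
}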

The Voice Kit ships to all MagPi Magazine subscribers on May 4, 2017, and the parts list, assembly instructions, source code, and suggested extensions are available on the AIY Projects website. The complete kit is also for sale at over 500 Barnes & Noble stores nationwide, as well as at UK retailers WH Smith, Tesco, Sainsbury's, and Asda.

We are excited to see what you build with the Voice Kit on Android Things. We also encourage you to join Google's IoT Developers Community and the Google Assistant SDK Developers group on Google+, great resources for keeping up to date and discussing ideas with other developers.