Since its launch in 2016, businesses have used the Google Cloud Speech API to improve speech recognition for everything from voice-activated commands to call center routing to data analytics. And since then, we’ve gotten a lot of feedback that our users would like even more functionality and control. That’s why today we’re announcing Cloud Speech API features that expand support for long-form audio and further extend our language support to help even more customers inject AI into their businesses.
Here’s more on what the updated Cloud Speech API can do:
Word-level timestamps
Our most requested feature has been timestamp information for each word in the transcript. Word-level timestamps let users jump to the moment in the audio where the text was spoken, or display the relevant text while the audio is playing. You can find more information on timestamps here.
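As a rough sketch of how this looks in practice, a request opts into word-level timestamps via the `enableWordTimeOffsets` flag in the recognition config, and each word in the response then carries `startTime`/`endTime` values. Field names below follow the Speech API's REST JSON format; the bucket path and response fragment are illustrative, not real API output:

```python
# Sketch: request word-level timestamps and read them from a response.
# Field names follow the Cloud Speech REST JSON format; the sample
# response below is illustrative, not captured from the API.

request_body = {
    "config": {
        "encoding": "LINEAR16",
        "sampleRateHertz": 16000,
        "languageCode": "en-US",
        "enableWordTimeOffsets": True,  # ask for per-word timestamps
    },
    "audio": {"uri": "gs://my-bucket/interview.wav"},  # hypothetical file
}

# An illustrative response fragment in the documented shape. Durations
# are JSON-encoded as seconds with a trailing "s".
response = {
    "results": [{
        "alternatives": [{
            "transcript": "hello world",
            "words": [
                {"word": "hello", "startTime": "0.400s", "endTime": "0.800s"},
                {"word": "world", "startTime": "0.900s", "endTime": "1.300s"},
            ],
        }]
    }]
}

def word_timestamps(resp):
    """Yield (word, start_seconds, end_seconds) tuples from a response."""
    for result in resp.get("results", []):
        best = result["alternatives"][0]  # top-ranked alternative
        for w in best.get("words", []):
            start = float(w["startTime"].rstrip("s"))
            end = float(w["endTime"].rstrip("s"))
            yield w["word"], start, end

for word, start, end in word_timestamps(response):
    print(f"{start:5.2f}-{end:5.2f}  {word}")
```

A transcription UI can use these tuples to highlight the current word during playback, or to seek the audio player to the moment a clicked word was spoken.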
Happy Scribe uses Cloud Speech API to power its easy-to-use and affordable voice-to-text transcription service, helping professionals such as reporters and researchers transcribe interviews.
“Having the ability to map the audio to the text with timestamps significantly reduces the time spent proofreading transcripts.”
— Happy Scribe Co-founder, André Bastie

VoxImplant enables companies to build voice and video applications, including IVR and speech analytics applications.
“Now with Google Cloud Speech API timestamps, we can accurately analyze phone call conversations between two individuals with real-time speech-to-text transcription, helping our customers drive business impact. The ability to easily find the place in a call when something was said using timestamps makes Cloud Speech API much more useful and will save our customers time.”
— VoxImplant CEO, Alexey Aylarov
Support for files up to 3 hours long
To help our users with long-form audio needs, we’re increasing the maximum length of supported files from 80 minutes to 3 hours. Files longer than 3 hours can be supported on a case-by-case basis by applying for a quota extension through Cloud Support.
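For long recordings like these, the usual pattern is to point the API at a file in Cloud Storage rather than sending the audio inline. A minimal sketch of such a request body, using the REST JSON field names (the bucket and file name are hypothetical):

```python
# Sketch: long audio is typically submitted by Cloud Storage URI to the
# asynchronous (long-running) recognition endpoint rather than inlined
# in the request. Bucket and file names here are hypothetical.

def build_long_audio_request(gcs_uri, language_code="en-US"):
    """Build a recognition request body referencing a GCS audio file."""
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": 16000,
            "languageCode": language_code,
        },
        # The recording is referenced by URI instead of embedded content,
        # so the length limit applies to the stored file, not the payload.
        "audio": {"uri": gcs_uri},
    }

request = build_long_audio_request("gs://my-bucket/town-hall-recording.wav")
print(request["audio"]["uri"])
```

Because the operation is asynchronous, the caller receives an operation handle and polls (or waits) for the finished transcript instead of blocking on a single HTTP response.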
Expanded language coverage
Cloud Speech API already supports 89 language varieties. Today, coinciding with the broader announcement this morning, we’re adding 30 more, from Bengali to Latvian to Swahili, covering more than one billion additional speakers. This expanded language support helps Cloud Speech API customers reach more users in more countries, for almost global reach. It also enables users in more countries to use speech to access products and services that until now have never been available to them.
You can find our complete list of supported languages here.
We hope these updates will help our users do more with Cloud Speech API. To learn more, visit Cloud.google.com/speech/.