Tag Archives: Audio

Fast Pair Update

Posted by Seang Chau (VP Engineering)

Last year we announced Fast Pair, a set of specs that make it easier to connect Bluetooth headsets and speakers to Android devices.

Today, we're making it easier for people to connect Fast Pair compatible accessories to devices associated with the same Google Account. Fast Pair will connect accessories to a user's current and future Android phones (6.0+), and we're adding support for Chromebooks in 2019.

Fast Pair provides stress-free Bluetooth pairing for your Android phone.

We have been working closely with dozens of manufacturers, many of which are bringing new Fast Pair compatible devices to market over the coming months. This includes Jaybird, who is already selling the Tarah Wireless Sport Headphones, as well as upcoming products from prominent brands such as Anker SoundCore, Bose, and many more.

The Jaybird Tarah, Fast Pair compatible sport headphones already available in the market.

We also want to make it easy for manufacturers to ship compatible products with minimal additional engineering effort. We collaborated with industry leading Bluetooth audio companies such as Airoha Technology Corp., BES and Qualcomm Technologies International, Ltd. (QTIL) to add native Fast Pair support to their software development kits.

If you are a manufacturer interested in creating Fast Pair compatible Bluetooth devices, just head to our Nearby Devices console to register your product and validate that it has correctly implemented the Fast Pair specs.

Introducing Oboe: A C++ library for low latency audio

Posted by Don Turner, Developer Advocate, Android Audio Framework

This week we released the first production-ready version of Oboe - a C++ library for building real-time audio apps. Oboe provides the lowest possible audio latency across the widest range of Android devices, as well as several other benefits.

Single API

Oboe takes advantage of the improved performance and features of AAudio on Oreo MR1 (API 27+) whilst maintaining backward compatibility (using OpenSL ES) on API 16+. It's kind of like AndroidX for native audio.

Diagram showing the underlying audio API which Oboe will use

Less code to write and maintain

Using Oboe you can create an audio stream in just 3 lines of code (vs 50+ lines in OpenSL ES):

AudioStreamBuilder builder;
AudioStream *stream = nullptr;
Result result = builder.openStream(&stream);
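
For a fuller picture, here is a hedged sketch of the same idea grown into a low-latency playback stream driven by a data callback. It follows the Oboe 1.x builder and callback APIs, but the sine-wave details (frequency, amplitude, mono output) are illustrative choices rather than anything the library prescribes:

#include <cmath>
#include <oboe/Oboe.h>

// Illustrative callback that fills each output buffer with a 440 Hz sine wave.
class SineCallback : public oboe::AudioStreamCallback {
public:
    oboe::DataCallbackResult onAudioReady(oboe::AudioStream *stream,
                                          void *audioData,
                                          int32_t numFrames) override {
        constexpr float kTwoPi = 6.283185307f;
        auto *out = static_cast<float *>(audioData);
        const float phaseIncrement = kTwoPi * 440.0f / stream->getSampleRate();
        for (int i = 0; i < numFrames; ++i) {
            out[i] = 0.2f * std::sin(phase_);
            phase_ += phaseIncrement;
            if (phase_ >= kTwoPi) phase_ -= kTwoPi;
        }
        return oboe::DataCallbackResult::Continue;
    }
private:
    float phase_ = 0.0f;
};

SineCallback sineCallback;
oboe::AudioStream *toneStream = nullptr;

void StartTonePlayback() {
    oboe::AudioStreamBuilder builder;
    builder.setDirection(oboe::Direction::Output)
            ->setPerformanceMode(oboe::PerformanceMode::LowLatency)
            ->setSharingMode(oboe::SharingMode::Exclusive)
            ->setFormat(oboe::AudioFormat::Float)
            ->setChannelCount(1)                  // mono
            ->setCallback(&sineCallback);

    if (builder.openStream(&toneStream) == oboe::Result::OK) {
        toneStream->requestStart();               // audio now flows via onAudioReady()
    }
}

Requesting the LowLatency performance mode and Exclusive sharing mode asks the underlying API (AAudio or OpenSL ES) for its fastest available path; if a request can't be honoured, Oboe opens the closest match and the stream reports what it actually got.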

Other benefits

  • Convenient C++ API (uses the C++11 standard)
  • Fast release process: because Oboe is supplied as a source library, bug fixes can be rolled out in days, quite a bit faster than the Android platform release cycle
  • Less guesswork: Provides workarounds for known audio bugs and has sensible default behaviour for stream properties, such as sample rate and audio data formats
  • Open source and maintained by Google engineers (although we welcome outside contributions)

Getting started

Take a look at the short video introduction:

Check out the documentation, code samples and API reference. There's even a codelab which shows you how to build a rhythm-based game.

If you have any issues, please file them here; we'd love to hear how you get on.

Open sourcing Resonance Audio

Spatial audio adds to your sense of presence when you’re in VR or AR, making it feel and sound like you’re surrounded by a virtual or augmented world. And regardless of the display hardware you’re using, spatial audio makes it possible to hear sounds coming from all around you.

Resonance Audio, our spatial audio SDK launched last year, enables developers to create more realistic VR and AR experiences on mobile and desktop. We’ve seen a number of exciting experiences emerge across a variety of platforms using our SDK. Recent examples include apps like Pixar’s Coco VR for Gear VR, Disney’s Star Wars™: Jedi Challenges AR app for Android and iOS, and Runaway’s Flutter VR for Daydream, which all used Resonance Audio technology.

To accelerate adoption of immersive audio technology and strengthen the developer community around it, we’re opening Resonance Audio to a community-driven development model. By creating an open source spatial audio project optimized for mobile and desktop computing, any platform or software development tool provider can easily integrate with Resonance Audio. More cross-platform and tooling support means more distribution opportunities for content creators, without the worry of investing in costly porting projects.

What’s Included in the Open Source Project

As part of our open source project, we’re providing a reference implementation of YouTube’s Ambisonic-based spatial audio decoder, compatible with the same Ambisonics format (Ambix ACN/SN3D) used by others in the industry. Using our reference implementation, developers can easily render Ambisonic content in their VR media and other applications, while benefiting from Ambisonics’ open source, royalty-free model. The project also includes encoding, sound field manipulation and decoding techniques, as well as head-related transfer functions (HRTFs) that we’ve used to achieve rich spatial audio that scales across a wide spectrum of device types and platforms. Lastly, we’re making our entire library of highly optimized DSP classes and functions open to all. This includes resamplers, convolvers, filters, delay lines and other DSP capabilities. Additionally, developers can now use Resonance Audio’s brand new Spectral Reverb, an efficient, high quality, constant complexity reverb effect, in their own projects.
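
To make the Ambisonics piece of this concrete, here is a small illustration, not code taken from the library, of how a mono sample is encoded into a first-order soundfield in the AmbiX convention mentioned above: ACN channel ordering (W, Y, Z, X) with SN3D normalization, where the first-order gains reduce to simple direction cosines.

#include <array>
#include <cmath>

// Encode one mono sample into first-order AmbiX (ACN order W, Y, Z, X; SN3D).
// azimuth: radians counterclockwise from straight ahead; elevation: radians upward.
std::array<float, 4> EncodeFirstOrderAmbiX(float sample, float azimuth, float elevation) {
    const float cosEl = std::cos(elevation);
    return {
        sample,                              // W: omnidirectional component
        sample * std::sin(azimuth) * cosEl,  // Y: left/right
        sample * std::sin(elevation),        // Z: up/down
        sample * std::cos(azimuth) * cosEl   // X: front/back
    };
}

Encoding every source this way and summing the channels produces a single soundfield that a decoder, such as the reference implementation above, can then render binaurally in one pass.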

We’ve open sourced Resonance Audio as a standalone library and associated engine plugins, VST plugin, tutorials, and examples with the Apache 2.0 license. This means Resonance Audio is yours, so you’re free to use Resonance Audio in your projects, no matter where you work. And if you see something you’d like to improve, submit a GitHub pull request to be reviewed by the Resonance Audio project committers. While the engine plugins for Unity, Unreal, FMOD, and Wwise will remain open source, going forward they will be maintained by project committers from our partners, Unity, Epic, Firelight Technologies, and Audiokinetic, respectively.

If you’re interested in learning more about Resonance Audio, check out the documentation on our developer site. If you want to get more involved, visit our GitHub to access the source code, build the project, download the latest release, or even start contributing. We’re looking forward to building the future of immersive audio with all of you.

By Eric Mauskopf, Google AR/VR Team

Tacotron 2: Generating Human-like Speech from Text



Generating very natural sounding speech from text (text-to-speech, TTS) has been a research goal for decades. There has been great progress in TTS research over the last few years and many individual pieces of a complete TTS system have greatly improved. Incorporating ideas from past work such as Tacotron and WaveNet, we added more improvements to end up with our new system, Tacotron 2. Our approach does not use complex linguistic and acoustic features as input. Instead, we generate human-like speech from text using neural networks trained using only speech examples and corresponding text transcripts.

A full description of our new system can be found in our paper “Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.” In a nutshell it works like this: We use a sequence-to-sequence model optimized for TTS to map a sequence of letters to a sequence of features that encode the audio. These features, an 80-dimensional audio spectrogram with frames computed every 12.5 milliseconds, capture not only pronunciation of words, but also various subtleties of human speech, including volume, speed and intonation. Finally these features are converted to a 24 kHz waveform using a WaveNet-like architecture.

A detailed look at Tacotron 2's model architecture. The lower half of the image describes the sequence-to-sequence model that maps a sequence of letters to a spectrogram. For technical details, please refer to the paper.
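
To put those numbers together: one spectrogram frame every 12.5 milliseconds works out to 80 frames per second, so the WaveNet-like vocoder has to generate 24,000 × 0.0125 = 300 waveform samples for every predicted frame.
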
You can listen to some of the Tacotron 2 audio samples that demonstrate the results of our state-of-the-art TTS system. In an evaluation where we asked human listeners to rate the naturalness of the generated speech, we obtained a score that was comparable to that of professional recordings.

While our samples sound great, there are still some difficult problems to be tackled. For example, our system has difficulties pronouncing complex words (such as “decorum” and “merlot”), and in extreme cases it can even randomly generate strange noises. Also, our system cannot yet generate audio in real time. Furthermore, we cannot yet control the generated speech, such as directing it to sound happy or sad. Each of these is an interesting research problem on its own.

Acknowledgements
Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu, Sound Understanding team, TTS Research team, and TensorFlow team.

Resonance Audio: Multi-platform spatial audio at scale

Posted by Eric Mauskopf, Product Manager

As humans, we rely on sound to guide us through our environment, help us communicate with others and connect us with what's happening around us. Whether walking along a busy city street or attending a packed music concert, we're able to hear hundreds of sounds coming from different directions. So when it comes to AR, VR, games and even 360 video, you need rich sound to create an engaging immersive experience that makes you feel like you're really there. Today, we're releasing a new spatial audio software development kit (SDK) called Resonance Audio. It's based on technology from Google's VR Audio SDK, and it works at scale across mobile and desktop platforms.

Experience spatial audio in our Audio Factory VR app for Daydream and SteamVR

Performance that scales on mobile and desktop

Bringing rich, dynamic audio environments into your VR, AR, gaming, or video experiences without affecting performance can be challenging. There are often few CPU resources allocated for audio, especially on mobile, which can limit the number of simultaneous high-fidelity 3D sound sources for complex environments. The SDK uses highly optimized digital signal processing algorithms based on higher order Ambisonics to spatialize hundreds of simultaneous 3D sound sources, without compromising audio quality, even on mobile. We're also introducing a new feature in Unity for precomputing highly realistic reverb effects that accurately match the acoustic properties of the environment, reducing CPU usage significantly during playback.

Using geometry-based reverb by assigning acoustic materials to a cathedral in Unity
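
Part of why the Ambisonics-based approach above scales so well is that an order-N soundfield has a fixed size of (N+1)² channels (third order is just 16), so each source is mixed into that soundfield with a handful of encoding gains, and the comparatively expensive HRTF-based binaural decode runs once on the mix rather than once per source. That is the general reasoning behind Ambisonic spatializers rather than a precise account of the SDK's internals, but it is why hundreds of simultaneous sources remain affordable even on mobile.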

Multi-platform support for developers and sound designers

We know how important it is that audio solutions integrate seamlessly with your preferred audio middleware and sound design tools. With Resonance Audio, we've released cross-platform SDKs for the most popular game engines, audio engines, and digital audio workstations (DAWs) to streamline workflows, so you can focus on creating more immersive audio. The SDKs run on Android, iOS, Windows, macOS and Linux platforms and provide integrations for Unity, Unreal Engine, FMOD, Wwise and DAWs. We also provide native APIs for C/C++, Java, Objective-C and the web. This multi-platform support enables developers to implement sound designs once, and easily deploy their project with consistent sounding results across the top mobile and desktop platforms. Sound designers can save time by using our new DAW plugin for accurately monitoring spatial audio that's destined for YouTube videos or apps developed with Resonance Audio SDKs. Web developers get the open source Resonance Audio Web SDK that works in the top web browsers by using the Web Audio API.

DAW plugin for sound designers to monitor audio destined for YouTube 360 videos or apps developed with the SDK
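
As a rough sketch of what the native C/C++ path mentioned above can look like, the snippet below creates a binaural renderer, spatializes one mono source and pulls a stereo output buffer. The type and method names follow the open source C++ API (resonance_audio_api.h) as we understand it, so treat the exact signatures as assumptions to check against the headers rather than as a definitive reference:

#include "api/resonance_audio_api.h"

// Buffer parameters chosen for illustration only.
constexpr size_t kNumOutputChannels = 2;   // binaural stereo
constexpr size_t kFramesPerBuffer = 256;
constexpr int kSampleRateHz = 48000;

void RenderOneBuffer(const float* monoInput,   // kFramesPerBuffer samples
                     float* stereoOutput) {    // kFramesPerBuffer * 2 samples
    // In a real app the renderer and source would be created once, not per buffer.
    vraudio::ResonanceAudioApi* api = vraudio::CreateResonanceAudioApi(
        kNumOutputChannels, kFramesPerBuffer, kSampleRateHz);

    // Create a sound object and place it one meter to the listener's left.
    auto source = api->CreateSoundObjectSource(vraudio::RenderingMode::kBinauralHighQuality);
    api->SetSourcePosition(source, -1.0f, 0.0f, 0.0f);

    // Feed this buffer of mono input and render the binaural mix.
    api->SetInterleavedBuffer(source, monoInput, /*num_channels=*/1, kFramesPerBuffer);
    api->FillInterleavedOutputBuffer(kNumOutputChannels, kFramesPerBuffer, stereoOutput);

    api->DestroySource(source);
    delete api;
}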

Cutting edge features to model complex sound environments

By providing powerful tools for accurately modeling complex sound environments, Resonance Audio goes beyond basic 3D spatialization. The SDK enables developers to control the direction acoustic waves propagate from sound sources. For example, when standing behind a guitar player, it can sound quieter than when standing in front. And when facing the direction of the guitar, it can sound louder than when your back is turned.

Controlling sound wave directivity for an acoustic guitar using the SDK
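
A common way to think about this control is as a cardioid-family pattern: a shape parameter blends between an omnidirectional and a figure-eight response, and a sharpness exponent narrows the resulting lobe. The sketch below is our own illustration of that model, not code from the SDK:

#include <cmath>

// Illustrative directivity gain. alpha = 0 is omnidirectional, 0.5 is cardioid,
// 1 is figure-eight; sharpness >= 1 narrows the pattern. theta is the angle (radians)
// between the source's forward axis and the direction toward the listener.
float DirectivityGain(float alpha, float sharpness, float theta) {
    return std::pow(std::fabs((1.0f - alpha) + alpha * std::cos(theta)), sharpness);
}

With alpha around 0.5, a listener standing behind the guitar (theta near π) gets a much lower gain than one standing in front (theta near 0), which is exactly the behaviour described above.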

Another SDK feature is automatically rendering near-field effects when sound sources get close to a listener's head, providing an accurate perception of distance, even when sources are close to the ear. The SDK also enables sound source spread, by specifying the width of the source, allowing sound to be simulated from a tiny point in space up to a wall of sound. We've also released an Ambisonic recording tool to spatially capture your sound design directly within Unity, save it to a file, and use it anywhere Ambisonic soundfield playback is supported, from game engines to YouTube videos.

If you're interested in creating rich, immersive soundscapes using cutting-edge spatial audio technology, check out the Resonance Audio documentation on our developer site, let us know what you think through GitHub, and show us what you build with #ResonanceAudio on social media; we'll be resharing our favorites.

Bringing Real-time Spatial Audio to the Web with Songbird

Posted by Jamieson Brettle and Drew Allen, Chrome Media Team

For a virtual scene to be truly immersive, stunning visuals need to be accompanied by true spatial audio to create a realistic and believable experience. Spatial audio tools allow developers to include sounds that can come from any direction, and that are associated in 3D space with audio sources, thus completely enveloping the user in 360-degree sound.

Spatial audio helps draw the user into a scene and creates the illusion of entering an entirely new world. To make this possible, the Chrome Media team has created Songbird, an open source, spatial audio encoding engine that works in any web browser by using the Web Audio API.

The Songbird library takes in any number of mono audio streams and allows developers to programmatically place them in 3D space around the user. Songbird allows you to create immersive soundscapes, realistically reproducing reflection and reverb for the space you describe. Sounds bounce off walls and reflect off materials just as they would in real life, capturing truly 360-degree sound. Songbird creates an ambisonic soundfield that can then be rendered in real-time for use in your application. We've partnered with the Omnitone project, which we blogged about last year, to add higher-order ambisonic support to Omnitone's binaural renderer to produce far more accurate sounding audio than ever before.

Songbird encapsulates Omnitone, and with it developers can now add interactive, full-sphere audio to any web-based application. Songbird can scale to any ambisonic order, thereby creating more realistic sound and higher performance than what is achievable through the standard Web Audio API.

Songbird Audio Processing Diagram

The implementation of Songbird is based on the Google spatial media specification. It expects mono input and outputs ambisonic (multichannel) ACN channel layout with SN3D normalization. Detailed documentation may be found here.

As the web emerges as an important VR platform for delivering content, spatial audio will play a vital role in users’ embrace of this new medium. Songbird and Omnitone are key tools in enabling spatial audio on the web platform and establishing it as a preeminent platform for compelling VR experiences. Combining these audio experiences with 3D JavaScript libraries like three.js gives a glimpse into the future on the web.

Demo combining spatial sound in 3D environment

This project was made possible through close collaboration with Google's Daydream and Web Audio teams. This collaboration allowed us to deliver similar audio capabilities to the web as are available to developers creating Daydream applications.

We look forward to seeing what people do with Songbird now that it's open source. Check out the code on GitHub and let us know what you think. Also available are a number of demos on creating full spherical audio with Songbird.


Omnitone: Spatial audio on the web


Spatial audio is a key element for an immersive virtual reality (VR) experience. By bringing spatial audio to the web, the browser can be transformed into a complete VR media player with incredible reach and engagement. That’s why the Chrome WebAudio team has created and is releasing the Omnitone project, an open source spatial audio renderer with cross-browser support.

Our challenge was to introduce the audio spatialization technique called ambisonics so the user can hear full-sphere surround sound in the browser. In order to achieve this, we implemented ambisonic decoding with binaural rendering using web technology. There are several paths for introducing a new feature into the web platform, but we chose to use only the Web Audio API. In doing so, we can reach a larger audience with this cross-browser technology, and we can also avoid the lengthy standardization process for introducing a new Web Audio component. This is possible because the Web Audio API provides all the necessary building blocks for this audio spatialization technique.



Omnitone Audio Processing Diagram

The AmbiX format recording, which is the target of the Omnitone decoder, contains 4 channels of audio that are encoded using ambisonics, which can then be decoded into an arbitrary speaker setup. Instead of an actual speaker array, Omnitone uses 8 virtual speakers based on head-related transfer function (HRTF) convolution to render the final audio stream binaurally. This binaurally-rendered audio can convey a sense of space when it is heard through headphones.
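
To give a feel for the decoding step, here is a simplified sketch of projecting one first-order AmbiX frame (channels W, Y, Z, X in ACN/SN3D order) onto a single virtual speaker. Omnitone performs the equivalent operation in JavaScript for its 8 virtual speakers and then convolves each feed with that speaker's HRTF pair; the sketch leaves out the per-order weighting a production decoder would add:

#include <cmath>

// Simplified "sampling" decode of one first-order AmbiX frame [W, Y, Z, X]
// for a virtual speaker at the given azimuth/elevation (radians).
float DecodeForSpeaker(const float ambix[4], float azimuth, float elevation) {
    const float cosEl = std::cos(elevation);
    const float y = std::sin(azimuth) * cosEl;   // left/right direction cosine
    const float z = std::sin(elevation);         // up/down
    const float x = std::cos(azimuth) * cosEl;   // front/back
    return ambix[0] + ambix[1] * y + ambix[2] * z + ambix[3] * x;
}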

The beauty of this mechanism lies in the sound-field rotation applied to the incoming spatial audio stream. The orientation sensor of a VR headset or a smartphone can be linked to Omnitone’s decoder to seamlessly rotate the entire sound field. The rest of the spatialization process will be handled automatically by Omnitone. A live demo can be found at the project landing page.
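
For a first-order stream the rotation itself is cheap: the omnidirectional W channel is untouched and the three directional channels transform like an ordinary 3D vector, so applying the listener's head orientation amounts to one 3×3 matrix multiply per audio frame (higher orders use larger per-order rotation matrices). A hedged sketch of the first-order case, assuming a matrix that maps world directions into the listener's head frame:

// Rotate one first-order AmbiX frame [W, Y, Z, X] in place by a 3x3 rotation
// matrix R. W is unchanged; (X, Y, Z) rotate like a 3D direction vector.
void RotateFirstOrder(const float R[3][3], float ambix[4]) {
    const float x = ambix[3], y = ambix[1], z = ambix[2];
    ambix[3] = R[0][0] * x + R[0][1] * y + R[0][2] * z;  // X'
    ambix[1] = R[1][0] * x + R[1][1] * y + R[1][2] * z;  // Y'
    ambix[2] = R[2][0] * x + R[2][1] * y + R[2][2] * z;  // Z'
}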

Throughout the project, we worked closely with the Google VR team for their VR audio expertise. Not only was their knowledge of spatial audio a tremendous help for the project, but the collaboration also ensured identical audio spatialization across all of Google’s VR applications - both on the web and Android (e.g. Google VR SDK, YouTube Android app). The Spatial Media Specification and HRTF sets are great examples of the Google VR team’s efforts, and Omnitone is built on top of them.

With emerging web-based VR projects like WebVR, Omnitone’s audio spatialization can play a critical role in a more immersive VR experience on the web. Web-based VR applications will also benefit from high-quality streaming spatial audio, as the Chrome Media team has recently added first-order ambisonics (FOA) compression to the open source audio codec Opus. More exciting things like VR view integration, higher-order ambisonics and mobile web support will also be coming soon to Omnitone.

We look forward to seeing what people do with Omnitone now that it's open source. Feel free to reach out to us or leave a comment with your thoughts and feedback on the issue tracker on GitHub.

By Hongchan Choi and Raymond Toy, Chrome Team

Due to the incomplete implementation of multichannel audio decoding on various browsers, Omnitone does not support mobile web at the time of writing.

A new method to measure touch and audio latency

Posted by Mark Koudritsky, software engineer

There is a new addition to the arsenal of instruments used by the Android and ChromeOS teams in the battle to measure and minimize touch and audio latency: the WALT Latency Timer.

When you use a mobile device, you expect it to respond instantly to your touch or voice: the more immediate the response, the more you feel directly connected to the device. Over the past few years, we have been trying to measure, understand, and reduce latency in our Chromebook and Android products.

Before we can reduce latency, we must first understand where it comes from. In the case of tapping a touchscreen, the time for a response includes the touch-sensing hardware and driver, the application, and the display and graphics output. For a voice command, there is time spent in sampling input audio, the application, and in audio output. Sometimes we have a mixture of these (for example, a piano app would include touch input and audio output).

Most previous work to study latency has focused on measuring a single round-trip latency number. For example, to measure audio latency, an app would measure the time from app to speaker/mic and back to the app using the Dr. Rick O'Rang loopback audio dongle together with an appropriate app such as the Dr. Rick O'Rang Loopback app or the Superpowered Mobile Audio Latency Test App. Similarly, the TouchBot uses a fast camera to measure the round-trip delay from physical touch until a change on the screen is visible. While valuable, such setups make it very difficult to break the latency down into input and output components.

An important innovation in WALT (a descendant of QuickStep) is that it synchronizes an external hardware clock with the Android device or Chromebook to within a millisecond. This allows it to measure input and output latencies separately as opposed to measuring a round-trip latency.
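
The synchronization is conceptually similar to NTP: the host repeatedly exchanges timestamps with the WALT microcontroller and uses the shortest observed round trip to bound the offset between the two clocks. The sketch below shows that general idea under our own simplified assumptions; it is not the exact WALT protocol:

#include <cstdint>

// One timestamp exchange: the host sends a query at t1, the device replies with
// its local time tDevice, and the host receives the reply at t2 (microseconds,
// each on its own clock).
struct Exchange {
    int64_t t1Us;
    int64_t tDeviceUs;
    int64_t t2Us;
};

// Estimate (device clock - host clock), assuming the device timestamp was taken
// midway through the shortest round trip; that round trip also bounds the error
// of the estimate to +/- half its duration.
int64_t EstimateClockOffsetUs(const Exchange* exchanges, int count) {
    int64_t bestRoundTrip = INT64_MAX;
    int64_t bestOffset = 0;
    for (int i = 0; i < count; ++i) {
        const int64_t roundTrip = exchanges[i].t2Us - exchanges[i].t1Us;
        if (roundTrip < bestRoundTrip) {
            bestRoundTrip = roundTrip;
            bestOffset = exchanges[i].tDeviceUs - (exchanges[i].t1Us + roundTrip / 2);
        }
    }
    return bestOffset;
}

Once the two clocks agree to within a fraction of a millisecond, an event timestamped on the WALT hardware, such as a physical touch or an audio signal it detects, can be compared directly with timestamps recorded on the device, which is what makes it possible to attribute latency to the input path or the output path separately.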

WALT is simple. The parts cost less than $50 and with some basic hobby electronics skills, you can build it yourself.

We’ve been using WALT within Google for Nexus and Chromebook development. We’re now opening this tool to app developers and anyone who wants to precisely measure real-world latencies. We hope that having readily accessible tools will help the industry as a whole improve and make all our devices more responsive to touch and voice.