Tag Archives: Audio

Audio and Visual Quality Measurement using Fréchet Distance

Posted by Kevin Kilgour, Software Engineer and Thomas Unterthiner, Research Software Engineer, Google Research, Zurich

"I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.”

William Thomson

Popular Lectures Vol. I, p. 73

The rate of scientific progress in machine learning has often been determined by the availability of good datasets, and metrics. In deep learning, benchmark datasets such as ImageNet or Penn Treebank were among the driving forces that established deep artificial neural networks for image recognition and language modeling. Yet, while the available “ground-truth” datasets lend themselves nicely as measures of performance on these prediction tasks, determining the “ground-truth” for comparison to generative models is not so straightforward. Imagine a model that generates videos of StarCraft video game sequences — how does one determine which model is best? Clearly some of the videos shown below look more realistic than others, but can the differences between them be quantified? Access to robust metrics for evaluation of generative models is crucial for measuring (and making) progress in the fields of audio and video understanding, but currently no such metrics exist.

Videos generated from various models trained on sequences from the StarCraft Video (SCV) dataset.

In “Fréchet Audio Distance: A Metric for Evaluating Music Enhancement Algorithms” and “Towards Accurate Generative Models of Video: A New Metric & Challenges”, we present two such metrics — the Fréchet Audio Distance (FAD) and Fréchet Video Distance (FVD). We document our large-scale human evaluations using 10k video and 69k audio clip pairwise comparisons that demonstrate high correlations between our metrics and human perception. We are also releasing the source code for both Fréchet Video Distance and Fréchet Audio Distance on github (FVD; FAD).

General Description of Fréchet Distance
The goal of a generative model is to learn to produce samples that look similar to the ones on which it has been trained, such that it knows what properties and features are likely to appear in the data, and which ones are unlikely. In other words, a generative model must learn the probability distribution of the training data. In many cases, the target distributions for generative models are very high-dimensional. For example, a single image of 128x128 pixels with 3 color channels has almost 50k dimensions, while a second-long video clip might consist of dozens (or hundreds) of such frames with audio that may have 16,000 samples. Calculating distances between such high dimensional distributions in order to quantify how well a given model succeeds at a task is very difficult. In the case of pictures, one could look at a few samples to gauge visual quality, but doing so for every model trained is not feasible.

In addition, generative adversarial networks (GANs) tend to focus on a few modes of the overall target distribution, while completely ignoring others. For example, they may learn to generate only one type of object or only a select few viewing angles. As a consequence, looking only at a limited number of samples from the model may not indicate whether the network learned the entire distribution successfully. To remedy this, a metric is needed that aligns well with human judgement of quality, while also taking the properties of the target distribution into account.

One common solution for this problem is the so-called Fréchet Inception Distance (FID) metric, which was specifically designed for images. The FID takes a large number of images from both the target distribution and the generative model, and uses the Inception object-recognition network to embed each image into a lower-dimensional space that captures the important features. Then it computes the so-called Fréchet distance between these samples, which is a common way of calculating distances between distributions that provides a quantitative measure of how similar the two distributions actually are.

A key component for both metrics is a pre-trained model that converts the video or audio clip into an N-dimensional embedding.

Fréchet Audio Distance and Fréchet Video Distance
Building on the principles of FID that have been successfully applied to the image domain, we propose both Fréchet Video Distance (FVD) and Fréchet Audio Distance (FAD). Unlike popular metrics such as peak signal-to-noise ratio or the structural similarity index, FVD looks at videos in their entirety, and thereby avoids the drawbacks of framewise metrics.

Examples of videos of a robot arm, judged by the new FVD metric. FVD values were found to be approximately 2000, 1000, 600, 400, 300 and 150 (left-to-right; top-to-bottom). A lower FVD clearly correlates with higher video quality.

In the audio domain, existing metrics either require a time-aligned ground truth signal, such as source-to-distortion ratio (SDR), or only target a specific domain, like speech quality. FAD on the other hand is reference-free and can be used on any type of audio.

Below is a 2-D visualization of the audio embedding vectors from which we compute the FAD. Each point corresponds to the embedding of a 5-second audio clip, where the blue points are from clean music and other points represent audio that has been distorted in some way. The estimated multivariate Gaussian distributions are presented as concentric ellipses. As the magnitude of the distortions increase, the overlap between their distributions and that of the clean audio decreases. The separation between these distributions is what the Fréchet distance is measuring.

In the animation, we can see that as the magnitude of the distortions increases, the Gaussian distributions of the distorted audio overlaps less with the clean distribution. The magnitude of this separation is what the Fréchet distance is measuring.

Evaluation
It is important for FAD and FVD to closely track human judgement, since that is the gold standard for what looks and sounds “realistic”. So, we performed a large-scale human study to determine how well our new metrics align with qualitative human judgment of generated audio and video. For the study, human raters examined 10,000 video pairs and 69,000 5-second audio clips. For the FAD we asked human raters to compare the effect of two different distortions on the same audio segment, randomizing both the pair of distortions that they compared and the order in which they appeared. The raters were asked “Which audio clip sounds most like a studio-produced recording?” The collected set of pairwise evaluations was then ranked using a Plackett-Luce model, which estimates a worth value for each parameter configuration. Comparison of the worth values to the FAD demonstrates that the FAD correlates quite well with human judgement.

This figure compares the FAD calculated between clean background music and music distorted by a variety of methods (e.g., pitch down, Gaussian noise, etc.) to the associated worth values from human evaluation. Each type of distortion has two data points representing high and low extremes in the distortion applied. The quantization distortion (purple circles), for example, limits the audio to a specific number of bits per sample, where the two data points represent two different bitrates. Both human raters and the FAD assigned higher values (i.e., “less realistic”) to the lower bitrate quantization. Overall log FAD correlates well with human judgement — a perfect correlation between the log FAD and human perception would result in a straight line.

Conclusion
We are currently making great strides in generative models. FAD and FVD will help us keeping this progress measurable, and will hopefully lead us to improve our models for audio and video generation.

Acknowledgements
There are many people who contributed to this large effort, and we’d like to highlight some of the key contributors: Sjoerd van Steenkiste, Karol Kurach, Raphael Marinier, Marcin Michalski, Sylvain Gelly, Mauricio Zuluaga, Dominik Roblek, Matthew Sharifi as well as the extended Google Brain team in Zurich.

Source: Google AI Blog

What’s new with Fast Pair

Posted by Catherina Xu (Product Manager)

Last November, we released Fast Pair with the Jaybird Tarah Bluetooth headphones. Since then, we’ve engaged with dozens of OEMs, ODMs, and silicon partners to bring Fast Pair to even more devices. Last month, we held a talk at I/O announcing 10+ certified devices, support for Qualcomm’s Smart Headset Development Kit, and upcoming experiences for Fast Pair devices.

The Fast Pair team presenting at I/O 2019.

The Fast Pair team presenting at I/O 2019.

Upcoming experiences

Fast Pair makes pairing seamless across Android phones - this year, we are introducing additional features to improve Bluetooth device management.

True Wireless Features. As True Wireless Stereo (TWS) headphones continue to gain momentum in the market and with users, it is important to build system-wide support for TWS. Later this year, TWS headsets with Fast Pair will be able to broadcast individual battery information for the case and buds. This enables features such as case open and close battery notifications and per-component battery reporting throughout the UI.

Detailed battery level notifications surfaced during “case open” for TWS headphones.

Detailed battery level notifications surfaced during “case open” for TWS headphones.

• Find My Device. Fast Pair devices will soon be surfaced in the Find My Device app and website, allowing users to easily track down lost devices. Headset owners can view the location and time of last use, as well as unpair or ring the buds to locate when they are in range.

• Connected Device Details. In Android Q, Fast Pair devices will have an enhanced Bluetooth device details page to centralize management and key settings. This includes links to Find My Device, Assistant settings (if available), and additional OEM-specified settings that will link to the OEM’s companion app.

The updated Device details screen in Q allows easy access to key settings and the headphone’s companion app.

The updated Device details screen in Q allows easy access to key settings and the headphone’s companion app.

Compatible Devices

Below is a list of devices that were showcased during our I/O talk:

Anker Spirit Pro GVA
Anker SoundCore Flare+ (Speaker)
JBL Live 220BT
JBL Live 400BT
JBL Live 500BT
JBL Live 650BT
Jaybird Tarah
1More Dual Driver BT ANC
LG HBS-SL5
LG HBS-PL6S
LG HBS-SL6S
LG HBS-PL5
Cleer Ally Plus

Interested in Fast Pair?

If you are interested in creating Fast Pair compatible Bluetooth devices, please take a look at:

Once you have selected devices to integrate, head to our Nearby Devices console to register your product. Reach out to us at [email protected] if you have any questions.

Source: Android Developers Blog

Fast Pair Update

Posted by Seang Chau (VP Engineering)

Last year we announced Fast Pair, a set of specs that make it easier to connect Bluetooth headsets and speakers to Android devices.

Today, we're making it easier for people to connect Fast Pair compatible accessories to devices associated with the same Google Account. Fast Pair will connect accessories to a user's current and future Android phones (6.0+), and we're adding support for Chromebooks in 2019.

Fast Pair provides stress-free Bluetooth pairing for your Android phone.

We have been working closely with dozens of manufacturers, many of which are bringing new Fast Pair compatible devices to market over the coming months. This includes Jaybird, who is already selling the Tarah Wireless Sport Headphones, as well as upcoming products from prominent brands such as Anker SoundCore, Bose, and many more.

The Jaybird Tarah, Fast Pair compatible sport headphones already available in the market.

We also want to make it easy for manufacturers to ship compatible products with minimal additional engineering effort. We collaborated with industry leading Bluetooth audio companies such as Airoha Technology Corp., BES and Qualcomm Technologies International, Ltd. (QTIL) to add native Fast Pair support to their software development kits.

If you are a manufacturer interested in creating Fast Pair compatible Bluetooth devices, just head to our Nearby Devices console to register your product and validate that it has correctly implemented the Fast Pair specs.

Source: Android Developers Blog

Introducing Oboe: A C++ library for low latency audio

Posted by Don Turner, Developer Advocate, Android Audio Framework

This week we released the first production-ready version of Oboe - a C++ library for building real-time audio apps. Oboe provides the lowest possible audio latency across the widest range of Android devices, as well as several other benefits.

Single API

Oboe takes advantage of the improved performance and features of AAudio on Oreo MR1 (API 27+) whilst maintaining backward compatibility (using OpenSL ES) on API 16+. It's kind of like AndroidX for native audio.

Less code to write and maintain

Using Oboe you can create an audio stream in just 3 lines of code (vs 50+ lines in OpenSL ES):

AudioStreamBuilder builder;
AudioStream *stream = nullptr;
Result result = builder.openStream(&stream);

Other benefits

Convenient C++ API (uses the C++11 standard)
Fast release process: supplied as a source library, bug fixes can be rolled out in days, quite a bit faster than the Android platform release cycle
Less guesswork: Provides workarounds for known audio bugs and has sensible default behaviour for stream properties, such as sample rate and audio data formats
Open source and maintained by Google engineers (although we welcome outside contributions)

Getting started

Take a look at the short video introduction:

Check out the documentation, code samples and API reference. There's even a codelab which shows you how to build a rhythm-based game.

If you have any issues, please file them here, we'd love to hear how you get on.

Source: Android Developers Blog

Marketing with podcasts

Reading time: 5 minutes

The podcast industry is booming, so it’s no wonder savvy marketers are weaving them into their content marketing plans. They are a great way to reach your audience, and podcast listeners are also a loyal bunch: 80% will listen to a full podcast once they’ve started it, and 45% will follow calls to action after hearing them.*

With the marketplace experiencing a growth spurt and over half a million podcasts to choose from, there are no signs of podcasts losing popularity. But, it takes more than a microphone and an idea to attract such a highly engaged crowd. So, here are some tips on creating a podcast that survives the marketing ecosystem and stands out from the crowd.

Be remarkable

It can be tempting to follow in the footsteps of the podcast greats, but if you want to generate interest and excitement, you should venture off the beaten path and find your niche topic. To do that, become the market and start researching what’s already out there. Start with something that interests you. Show up in the comments on some of the podcasts that you find useful, listen to what could be improved, and participate in the forums. Once you gather all your insights, sit down and deep-dive into your findings - an opportunity will be hiding in there, you just need to listen for it.

Be committed and consistent

Team up with like-minded people who enjoy the same type of content, to ensure you are all aligned on tone, topics and what you want to achieve. Building up interest and listenership takes time and effort, so having people pulling in the same direction is key to getting across the finish line. Consistency is key, so commit to a schedule you are comfortable with and stick with it. To set expectations, make sure you post regularly and at the same time.

Take risks

Don’t be afraid to amp up your promotions or content to be heard. Recycling is great, but reciting the same old repertoire will only get you so far. However, this doesn’t mean that to succeed you have to reinvent the wheel and come up with something completely original. Instead of thinking brand-new, think better-new. Take something that’s already out there and put a spin on it; introduce a new way of looking at the topic, mix it up with a twist on delivery, or delight with a surprising host choice.

For example, the gaming podcast What’s Good Games is hosted by four female experts who are taking the predominantly male gaming industry by storm. The topic itself isn’t new, but the hosts’ unique insights put them at the top of the leaderboard.

Reward your listeners

Making your audience feel special and appreciated is a great way to encourage your audience to tune in to your podcast. So, reach out to them on social channels, and reward your audience with exclusive offers, content, and freebies. Shout outs and special mentions at the end of each podcast can also be an excellent incentive for people to stay tuned in until the very end, and it can help you connect with your listeners on a more personal level. Let them know that you’re listening to their thoughts, suggestions, and concerns. Not listening to feedback is the fastest way to isolate your fans, so ensure you’re actively taking notes on what’s working and what could be amplified.

Podcast promotion

Be loud and proud about your podcast! Learn where your audience lives online, to know how to reach them on channels they love to follow. Make sure to create a call to action - what is it that you would like them to do after listening to your podcast? Visit your website, download your e-book, subscribe to your newsletter? Choose one and sprinkle it at the end of each episode.

Create a buzz around upcoming episodes by teasing your audience with a short audio snippet, sharing fun facts about the topic, and what they will learn if they tune in. Harness the power and reach of social media to let your followers know about your podcast. Don’t be shy about sharing your podcast more than once - the social media landscaping is overflowing with content, and it is more than likely that some people will have missed your original post.

A podcast should be like a pop song - it should be playing on all stations. Make your podcast available on all main directories, such as Google Podcasts, iTunes, Spotify, SoundCloud, Stitcher, TuneIn, Podbay and Podtail to make it easier to find for your audience.

Repurpose podcast content

It takes a lot of time and effort to record a podcast. Repurpose that content and publish it across different mediums to make the most of it. Write a blog post using key takeaways from your podcast, publish videos of your podcast recordings on YouTube, create social media posts using quotes from guests, and promote your other content within your podcast. Look at your content marketing engine as an ecosystem that supports each other and helps it thrive.

Starting a podcast is no easy feat - it requires a lot of time to build up your audience, and hard work to cut through the white noise. But with a little planning, commitment, and research, this could become your path to a regular, highly engaged audience. Check out the video below for more tips on podcasts that resonate, and the Google Partners Podcast for more marketing insights and these tips in action.

*Source: Podcast Insights

Source: AdWords Agency Blog

Open sourcing Resonance Audio

Spatial audio adds to your sense of presence when you’re in VR or AR, making it feel and sound, like you’re surrounded by a virtual or augmented world. And regardless of the display hardware you’re using, spatial audio makes it possible to hear sounds coming from all around you.

Resonance Audio, our spatial audio SDK launched last year, enables developers to create more realistic VR and AR experiences on mobile and desktop. We’ve seen a number of exciting experiences emerge across a variety of platforms using our SDK. Recent examples include apps like Pixar’s Coco VR for Gear VR, Disney’s Star Wars™: Jedi Challenges AR app for Android and iOS, and Runaway’s Flutter VR for Daydream, which all used Resonance Audio technology.

To accelerate adoption of immersive audio technology and strengthen the developer community around it, we’re opening Resonance Audio to a community-driven development model. By creating an open source spatial audio project optimized for mobile and desktop computing, any platform or software development tool provider can easily integrate with Resonance Audio. More cross-platform and tooling support means more distribution opportunities for content creators, without the worry of investing in costly porting projects.

What’s Included in the Open Source Project

As part of our open source project, we’re providing a reference implementation of YouTube’s Ambisonic-based spatial audio decoder, compatible with the same Ambisonics format (Ambix ACN/SN3D) used by others in the industry. Using our reference implementation, developers can easily render Ambisonic content in their VR media and other applications, while benefiting from Ambisonics open source, royalty-free model. The project also includes encoding, sound field manipulation and decoding techniques, as well as head related transfer functions (HRTFs) that we’ve used to achieve rich spatial audio that scales across a wide spectrum of device types and platforms. Lastly, we’re making our entire library of highly optimized DSP classes and functions, open to all. This includes resamplers, convolvers, filters, delay lines and other DSP capabilities. Additionally, developers can now use Resonance Audio’s brand new Spectral Reverb, an efficient, high quality, constant complexity reverb effect, in their own projects.

We’ve open sourced Resonance Audio as a standalone library and associated engine plugins, VST plugin, tutorials, and examples with the Apache 2.0 license. This means Resonance Audio is yours, so you’re free to use Resonance Audio in your projects, no matter where you work. And if you see something you’d like to improve, submit a GitHub pull request to be reviewed by the Resonance Audio project committers. While the engine plugins for Unity, Unreal, FMOD, and Wwise will remain open source, going forward they will be maintained by project committers from our partners, Unity, Epic, Firelight Technologies, and Audiokinetic, respectively.

If you’re interested in learning more about Resonance Audio, check out the documentation on our developer site. If you want to get more involved, visit our GitHub to access the source code, build the project, download the latest release, or even start contributing. We’re looking forward to building the future of immersive audio with all of you.

By Eric Mauskopf, Google AR/VR Team

Source: Google Open Source Blog

Tacotron 2: Generating Human-like Speech from Text

Posted by Jonathan Shen and Ruoming Pang, Software Engineers, on behalf of the Google Brain and Machine Perception Teams

Generating very natural sounding speech from text (text-to-speech, TTS) has been a research goal for decades. There has been great progress in TTS research over the last few years and many individual pieces of a complete TTS system have greatly improved. Incorporating ideas from past work such as Tacotron and WaveNet, we added more improvements to end up with our new system, Tacotron 2. Our approach does not use complex linguistic and acoustic features as input. Instead, we generate human-like speech from text using neural networks trained using only speech examples and corresponding text transcripts.

A full description of our new system can be found in our paper “Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.” In a nutshell it works like this: We use a sequence-to-sequence model optimized for TTS to map a sequence of letters to a sequence of features that encode the audio. These features, an 80-dimensional audio spectrogram with frames computed every 12.5 milliseconds, capture not only pronunciation of words, but also various subtleties of human speech, including volume, speed and intonation. Finally these features are converted to a 24 kHz waveform using a WaveNet-like architecture.

A detailed look at Tacotron 2's model architecture. The lower half of the image describes the sequence-to-sequence model that maps a sequence of letters to a spectrogram. For technical details, please refer to the paper.

You can listen to some of the Tacotron 2 audio samples that demonstrate the results of our state-of-the-art TTS system. In an evaluation where we asked human listeners to rate the naturalness of the generated speech, we obtained a score that was comparable to that of professional recordings.

While our samples sound great, there are still some difficult problems to be tackled. For example, our system has difficulties pronouncing complex words (such as “decorum” and “merlot”), and in extreme cases it can even randomly generate strange noises. Also, our system cannot yet generate audio in realtime. Furthermore, we cannot yet control the generated speech, such as directing it to sound happy or sad. Each of these is an interesting research problem on its own.

Acknowledgements
Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu, Sound Understanding team, TTS Research team, and TensorFlow team.

Source: Google Research Blog

Resonance Audio: Multi-platform spatial audio at scale

Posted by Eric Mauskopf, Product Manager

As humans, we rely on sound to guide us through our environment, help us communicate with others and connect us with what's happening around us. Whether walking along a busy city street or attending a packed music concert, we're able to hear hundreds of sounds coming from different directions. So when it comes to AR, VR, games and even 360 video, you need rich sound to create an engaging immersive experience that makes you feel like you're really there. Today, we're releasing a new spatial audio software development kit (SDK) called Resonance Audio. It's based on technology from Google's VR Audio SDK, and it works at scale across mobile and desktop platforms.

Experience spatial audio in our Audio Factory VR app for Daydreamand SteamVR

Performance that scales on mobile and desktop

Bringing rich, dynamic audio environments into your VR, AR, gaming, or video experiences without affecting performance can be challenging. There are often few CPU resources allocated for audio, especially on mobile, which can limit the number of simultaneous high-fidelity 3D sound sources for complex environments. The SDK uses highly optimized digital signal processing algorithms based on higher order Ambisonics to spatialize hundreds of simultaneous 3D sound sources, without compromising audio quality, even on mobile. We're also introducing a new feature in Unity for precomputing highly realistic reverb effects that accurately match the acoustic properties of the environment, reducing CPU usage significantly during playback.

Using geometry-based reverb by assigning acoustic materials to a cathedral in Unity

Multi-platform support for developers and sound designers

We know how important it is that audio solutions integrate seamlessly with your preferred audio middleware and sound design tools. With Resonance Audio, we've released cross-platform SDKs for the most popular game engines, audio engines, and digital audio workstations (DAW) to streamline workflows, so you can focus on creating more immersive audio. The SDKs run on Android, iOS, Windows, MacOS and Linux platforms and provide integrations for Unity, Unreal Engine, FMOD, Wwise and DAWs. We also provide native APIs for C/C++, Java, Objective-C and the web. This multi-platform support enables developers to implement sound designs once, and easily deploy their project with consistent sounding results across the top mobile and desktop platforms. Sound designers can save time by using our new DAW plugin for accurately monitoring spatial audio that's destined for YouTube videos or apps developed with Resonance Audio SDKs. Web developers get the open source Resonance Audio Web SDK that works in the top web browsers by using the Web Audio API.

DAW plugin for sound designers to monitor audio destined for YouTube 360 videos or apps developed with the SDK

Model complex Sound Environments Cutting edge features

By providing powerful tools for accurately modeling complex sound environments, Resonance Audio goes beyond basic 3D spatialization. The SDK enables developers to control the direction acoustic waves propagate from sound sources. For example, when standing behind a guitar player, it can sound quieter than when standing in front. And when facing the direction of the guitar, it can sound louder than when your back is turned.

Controlling sound wave directivity for an acoustic guitar using the SDK

Another SDK feature is automatically rendering near-field effects when sound sources get close to a listener's head, providing an accurate perception of distance, even when sources are close to the ear. The SDK also enables sound source spread, by specifying the width of the source, allowing sound to be simulated from a tiny point in space up to a wall of sound. We've also released an Ambisonic recording tool to spatially capture your sound design directly within Unity, save it to a file, and use it anywhere Ambisonic soundfield playback is supported, from game engines to YouTube videos.

If you're interested in creating rich, immersive soundscapes using cutting-edge spatial audio technology, check out the Resonance Audio documentation on our developer site, let us know what you think through GitHub, and show us what you build with #ResonanceAudio on social media; we'll be resharing our favorites.

Source: Google Developers Blog

Bringing Real-time Spatial Audio to the Web with Songbird

Posted by Jamieson Brettle and Drew Allen, Chrome Media Team

Spatial audio helps draw the user into a scene and creates the illusion of entering an entirely new world. To make this possible, the Chrome Media team has created Songbird, an open source, spatial audio encoding engine that works in any web browser by using the Web Audio API.

The Songbird library takes in any number of mono audio streams and allows developers to programmatically place them in 3D space around the user. Songbird allows you to create immersive soundscapes, realistically reproducing reflection and reverb for the space you describe. Sounds bounce off walls and reflect off materials just as they would in real-life, capturing truly 360-degree sound. Songbird creates an ambisonic soundfield that can then be rendered in real-time for use in your application. We've partnered with the Omnitoneproject, which we blogged about last year, to add higher-order ambisonic support to Omnitone's binaural rendererto produce far more accurate sounding audio than ever before.

Songbird encapsulates Omnitone and with it, developers can now add interactive, full-sphere audio to any web based application. Songbird can scale to any order ambisonics, thereby creating a more realistic sound and higher performance than what is achievable through standard Web Audio API.

Songbird Audio Processing Diagram

As the web emerges as an important VR platformfor delivering content, spatial audio will play a vital role in users' embrace of this new medium. Songbird and Omnitone are key tools in enabling spatial audio on the web platform and establishing it as a preeminent platform for compelling VR experiences. Combining these audio experiences with 3D JavaScript libraries like three.js gives a glimpseinto the future on the web.

Demo combining spatial sound in 3D environment

This project was made possible through close collaboration with Google's Daydream and Web Audio teams. This collaboration allowed us to deliver similar audio capabilities to the web as are available to developers creating Daydream applications.

We look forward to seeing what people do with Songbird now that it's open source. Check out the code on GitHub and let us know what you think. Also available are a number of demoson creating full spherical audio with Songbird.

Source: Google Developers Blog

Bringing Real-time Spatial Audio to the Web with Songbird

For a virtual scene to be truly immersive, stunning visuals need to be accompanied by true spatial audio to create a realistic and believable experience. Spatial audio tools allow developers to include sounds that can come from any direction, and that are associated in 3D space with audio sources, thus completely enveloping the user in 360-degree sound.

Spatial audio helps draw the user into a scene and creates the illusion of entering an entirely new world. To make this possible, the Chrome Media team has created Songbird, an open source, spatial audio encoding engine that works in any web browser by using the Web Audio API.

The Songbird library takes in any number of mono audio streams and allows developers to programmatically place them in 3D space around the user. Songbird allows you to create immersive soundscapes, realistically reproducing reflection and reverb for the space you describe. Sounds bounce off walls and reflect off materials just as they would in real-life, capturing truly 360-degree sound. Songbird creates an ambisonic soundfield that can then be rendered in real-time for use in your application. We’ve partnered with the Omnitone project, which we blogged about last year, to add higher-order ambisonic support to Omnitone’s binaural renderer to produce far more accurate sounding audio than ever before.

Songbird encapsulates Omnitone and with it, developers can now add interactive, full-sphere audio to any web based application. Songbird can scale to any order ambisonics, thereby creating a more realistic sound and higher performance than what is achievable through standard Web Audio API.

Songbird Audio Processing Diagram

The implementation of Songbird is based on the Google spatial media specification. It expects mono input and outputs ambisonic (multichannel) ACN channel layout with SN3D normalization. Detailed documentation may be found here.

As the web emerges as an important VR platform for delivering content, spatial audio will play a vital role in users’ embrace of this new medium. Songbird and Omnitone are key tools in enabling spatial audio on the web platform and establishing it as a preeminent platform for compelling VR experiences. Combining these audio experiences with 3D JavaScript libraries like three.js gives a glimpse into the future on the web.

Demo combining spatial sound in 3D environment

This project was made possible through close collaboration with Google’s Daydream and Web Audio teams. This collaboration allowed us to deliver similar audio capabilities to the web as are available to developers creating Daydream applications.

We look forward to seeing what people do with Songbird now that it's open source. Check out the code on GitHub and let us know what you think. Also available are a number of demos on creating full spherical audio with Songbird.

By Jamieson Brettle and Drew Allen, Chrome Media Team

googblogs.com

All Google blogs and Press in one site

Tag Archives: Audio

Audio and Visual Quality Measurement using Fréchet Distance

Source: Google AI Blog

What’s new with Fast Pair

Upcoming experiences

Source: Android Developers Blog

Fast Pair Update

Source: Android Developers Blog

Introducing Oboe: A C++ library for low latency audio

Single API

Less code to write and maintain

Other benefits

Getting started

Source: Android Developers Blog

Marketing with podcasts

Reading time: 5 minutes

Be remarkable

Be committed and consistent

Take risks

Reward your listeners

Podcast promotion

Repurpose podcast content

Source: AdWords Agency Blog

Open sourcing Resonance Audio

What’s Included in the Open Source Project

Source: Google Open Source Blog

Tacotron 2: Generating Human-like Speech from Text

Source: Google Research Blog

Resonance Audio: Multi-platform spatial audio at scale

Performance that scales on mobile and desktop

Multi-platform support for developers and sound designers

Model complex Sound Environments Cutting edge features

Source: Google Developers Blog

Bringing Real-time Spatial Audio to the Web with Songbird

Source: Google Developers Blog

Bringing Real-time Spatial Audio to the Web with Songbird

Source: Google Open Source Blog