Tag Archives: Virtual Reality

Google Cardboard XR Plugin for Unity

Late in 2019, we open sourced Google Cardboard. Since then, our developer community has used it to create a plethora of experiences on both iOS and Android, reaching millions of users around the world. While this release has been a success with our developer community, we also promised a plugin for Unity. Our users have long preferred building Cardboard experiences in Unity, so delivering a Unity SDK was a priority. Today, we have fulfilled that promise: the Google Cardboard open source plugin for Unity is now available via the Unity Asset Store.

What's Included in the Cardboard Unity SDK

Today, we’re releasing the Cardboard Unity SDK to our users so that they can continue creating smartphone XR experiences using Unity. Unity is one of the most popular 3D and XR development platforms in the world, and our release of this SDK will give our content creators a smoother workflow with Unity when developing for Cardboard.

In addition to the Unity SDK, we are also providing a sample application for iOS/Android, which will be a great aid for developers trying to debug their own creations. This release not only fulfills a promise we made to our Cardboard community, but also shows our support, as we move away from smartphone VR and leave it in the more-than-capable hands of our development community.



If you’re interested in learning how to develop with the Cardboard open source project, please see our developer documentation or visit the Google VR GitHub repo to access source code, build the project, and download the latest release.

By Jonathan Goodlow, Product Manager, AR & VR

Open sourcing Google Cardboard

Posted by Jeffrey Chen, Product Manager, AR & VR

Five years ago, we launched Google Cardboard—a simple cardboard viewer that anyone can use to experience virtual reality (VR). From a giveaway at Google I/O to more than 15 million units worldwide, Cardboard has played an important role in introducing people to VR through experiences like YouTube and Expeditions. In many cases, it provided access to VR to people who otherwise couldn’t have afforded it.

With Cardboard and the Google VR software development kit (SDK), developers have created and distributed VR experiences across both Android and iOS devices, giving them the ability to reach millions of users. While we’ve seen overall usage of Cardboard decline over time and we’re no longer actively developing the Google VR SDK, we still see consistent usage around entertainment and education experiences, like YouTube and Expeditions, and want to ensure that Cardboard’s no-frills, accessible-to-everyone approach to VR remains available.

Today, we’re releasing the Cardboard open source project to let the developer community continue to build Cardboard experiences and add support to their apps for an ever increasing diversity of smartphone screen resolutions and configurations. We think that an open source model—with additional contributions from us—is the best way for developers to continue to build experiences for Cardboard. We’ve already seen success with this approach with our Cardboard Manufacturer Kit—an open source project to enable third-party manufacturers to design and build their own unique compatible VR viewers—and we’re excited to see where the developer community takes Cardboard in the future.

What's Included in the open source project

We're releasing libraries for developers to build their Cardboard apps for iOS and Android and render VR experiences on Cardboard viewers. The open source project provides APIs for head tracking, lens distortion rendering and input handling. We’ve also included an Android QR code library, so that apps can pair any Cardboard viewer without depending on the Cardboard app.

An open source model will enable the community to continue to improve Cardboard support and expand its capabilities, for example adding support for new smartphone display configurations and Cardboard viewers as they become available. We’ll continue to contribute to the Cardboard open source project by releasing new features, including an SDK package for Unity.

If you’re interested in learning how to develop with the Cardboard open source project, please see our developer documentation, or visit the Cardboard GitHub repo to access source code, build the project and download the latest release.

Diagnose and understand your app’s GPU behavior with GAPID

Posted by Andrew Woloszyn, Software Engineer

Developing for 3D is complicated. Whether you're using a native graphics API or enlisting the help of your favorite game engine, there are thousands of graphics commands that have to come together perfectly to produce beautiful 3D images on your phone, desktop, or VR headset.

GAPID (Graphics API Debugger) is a new tool that helps developers diagnose rendering and performance issues with their applications. With GAPID, you can capture a trace of your application and step through each graphics command one-by-one. This lets you visualize how your final image is built and isolate problematic calls, so you spend less time debugging through trial-and-error.

GAPID supports OpenGL ES on Android, and Vulkan on Android, Windows and Linux.

Debugging in action, one draw call at a time

GAPID not only enables you to diagnose issues with your rendering commands, but also lets you run quick experiments and immediately see how your changes would affect the presented frame.

Here are a few examples where GAPID can help you isolate and fix issues with your application:

What's the GPU doing?

Why isn't my text appearing?!

Working with a graphics API can be frustrating when you get an unexpected result, whether it's a blank screen, an upside-down triangle, or a missing mesh. As an offline debugger, GAPID lets you take a trace of these applications, and then inspect the calls afterwards. You can track down exactly which command produced the incorrect result by looking at the framebuffer, and inspect the state at that point to help you diagnose the issue.

What happens if I do X?

Using GAPID to edit shader code

Even when a program is working as expected, sometimes you want to experiment. GAPID allows you to modify API calls and shaders at will, so you can test things like:

  • What if I used a different texture on this object?
  • What if I changed the calculation of bloom in this shader?

With GAPID, you can now iterate on the look and feel of your app without having to recompile your application or rebuild your assets.

Whether you're building a stunning new desktop game with Vulkan or a beautifully immersive VR experience on Android, we hope that GAPID will save you both time and frustration and help you get the most out of your GPU. To get started with GAPID and see just how powerful it is, download it, take your favorite application, and capture a trace!

Introducing a New Foveation Pipeline for Virtual/Mixed Reality



Virtual Reality (VR) and Mixed Reality (MR) offer a novel way to immerse people in new and compelling experiences, from gaming to professional training. However, current VR/MR technologies present a fundamental challenge: presenting images at the extremely high resolutions required for immersion places enormous demands on the rendering engine and the transmission process. Headsets often have insufficient display resolution, which can limit the field of view and worsen the experience. But driving a higher-resolution headset with the traditional rendering pipeline requires processing power that even high-end mobile processors cannot deliver. As research continues to deliver promising new techniques to increase display resolution, the challenges of driving those displays will continue to grow.

To further improve the visual experience in VR and MR, we introduce a pipeline that takes advantage of the characteristics of human visual perception to enable a compelling visual experience at low compute and power cost. The pipeline proposed in this article considers the full system, including the rendering engine, memory bandwidth, and the capabilities of the display module itself. We determined that the current limitation lies not just in content creation, but also in transmitting data, handling latency, and enabling interaction with real objects (for mixed reality applications). The pipeline consists of (1) foveated rendering, focused on reducing compute per pixel; (2) foveated image processing, focused on reducing visual artifacts; and (3) foveated transmission, focused on reducing the bits per pixel transmitted.

Foveated Rendering
In the human visual system, the fovea centralis allows us to see in high fidelity at the center of our vision, while our brain pays less attention to things in our peripheral vision. Foveated rendering takes advantage of this characteristic to improve the performance of the rendering engine by reducing the spatial or bit-depth resolution of objects in our peripheral vision. To make this work, the location of the High Acuity (HA) region needs to be updated with eye-tracking to align with eye saccades, which preserves the perception of constant high resolution across the field of view. In contrast, systems without eye-tracking may need to render a much larger HA region.
The left image is rendered at full resolution. The right image uses two layers of foveation — one rendered at high resolution (inside the yellow region) and one at lower resolution (outside).
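As a rough back-of-the-envelope sketch of the savings (the resolutions and inset size below are hypothetical, chosen only for illustration, not measurements from our system), a two-layer foveated frame touches far fewer pixels than a full-resolution render:

```python
# Hypothetical pixel budget for two-layer foveation (illustration only).
full_w, full_h = 1440, 1600        # assumed per-eye display resolution
ha_w, ha_h = 640, 640              # high-acuity inset rendered at full density
la_scale = 0.25                    # low-acuity layer at 1/4 linear resolution

full_pixels = full_w * full_h
foveated_pixels = ha_w * ha_h + int(full_w * la_scale) * int(full_h * la_scale)

print(f"full render:     {full_pixels:,} px")
print(f"foveated render: {foveated_pixels:,} px "
      f"({foveated_pixels / full_pixels:.0%} of full)")
```

With these assumed numbers, the foveated frame needs roughly a quarter of the pixels of the full render.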
A traditional foveation technique may divide a frame buffer into multiple spatial resolution regions. Aliasing introduced by rendering to lower spatial resolution may cause perceptible temporal artifacts when there is motion in the content due to head motion or animation. Below we show an example of temporal artifacts introduced by head rotation.
A smooth full rendering (image on the left). The image on the right shows temporal artifacts introduced by motion in the foveated region.
In the following sections, we present two methods we use to reduce these artifacts: Phase-Aligned Foveated Rendering and Conformal Foveated Rendering. Each of these methods provides different visual-quality benefits during rendering and is useful under different conditions.

Phase-Aligned Rendering
Aliasing occurs in the Low-Acuity (LA) region during foveated rendering due to the subsampling of rendered content. In the traditional foveated rendering discussed above, these aliasing artifacts flicker from frame to frame, since the display pixel grid moves across the virtual scene as the user moves their head. The motion of these pixels relative to the scene causes any existing aliasing artifacts to flicker, which is highly perceptible to the user, even in the periphery.

In phase-aligned rendering, we force the LA region frustums to be aligned rotationally to the world (e.g. always facing north, east, south, etc.) rather than to the current frame's head rotation. The aliasing artifacts are then mostly invariant to head pose and therefore much less detectable. After upsampling, these regions are reprojected onto the final display screen to compensate for the user's head rotation, which reduces temporal flicker. As with traditional foveation, we render the high-acuity region in a separate pass and overlay it onto the merged image at the location of the fovea. The figure below compares traditional foveated rendering with phase-aligned rendering, both at the same level of foveation.
Temporal artifacts in non-world aligned foveated rendered content (left) and the phase-aligned method (right).
This method greatly reduces the severity of visual artifacts during foveated rendering. Although phase-aligned rendering is more expensive to compute than traditional foveation at the same level of acuity reduction, we can still achieve a net savings by pushing foveation to more aggressive levels that would otherwise have produced too many artifacts.
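The following toy sketch (self-contained Python with made-up numbers; it is not our renderer) illustrates the key property: for a fixed feature in the world, the texel that samples it in a head-locked low-acuity raster drifts as the head turns, so aliasing shimmers, whereas in a world-aligned raster it stays put and only the final reprojection uses the head pose.

```python
# Toy illustration of the phase-aligned idea (hypothetical numbers).
import numpy as np

LA_RES = 64                 # assumed low-acuity raster width, in texels
FOV = np.radians(90.0)      # assumed horizontal field of view of the frustum

def texel_coord(scene_yaw, frustum_yaw):
    """Horizontal texel coordinate of a scene direction in the frustum."""
    angle_in_frustum = scene_yaw - frustum_yaw
    u = 0.5 + np.tan(angle_in_frustum) / (2.0 * np.tan(FOV / 2.0))
    return u * LA_RES

scene_yaw = np.radians(5.0)                 # a fixed feature in the world
for head_yaw_deg in (10.0, 10.7, 11.4):     # small head rotations between frames
    head_yaw = np.radians(head_yaw_deg)
    print(f"head {head_yaw_deg:4.1f} deg | "
          f"head-locked texel {texel_coord(scene_yaw, head_yaw):6.2f} | "
          f"world-aligned texel {texel_coord(scene_yaw, 0.0):6.2f}")
```

The head-locked texel coordinate drifts by a fraction of a texel every frame, which is exactly the sub-pixel phase change that makes aliasing flicker; the world-aligned coordinate does not move.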

Conformal Rendering
Another approach for foveated rendering is to render content in a space that matches the smoothly varying reduction in resolution of our visual acuity, based on a nonlinear mapping of the screen distance from the visual fixation point.

This method gives two main benefits. First, by more closely matching the visual-fidelity fall-off of the human eye, we can reduce the total number of pixels computed compared to other foveation techniques. Second, the smooth fall-off in fidelity prevents the user from seeing a clear dividing line between the high-acuity and low-acuity regions, which is often one of the first artifacts to be noticed. Together, these benefits allow aggressive foveation while preserving the same quality levels, yielding more savings.

We perform this method by warping the vertices of the virtual scene into non-linear space. This scene is then rasterized at a reduced resolution, then unwarped into linear space as a post-processing effect combined with lens distortion correction.
Comparison of traditional foveation (left) to conformal rendering (right), where content is rendered to a space matched to visual perception acuity and HMD lens characteristics. Both methods use the same number of total pixels.
A major benefit of this method over the phase-aligned method above is that conformal rendering only requires a single pass of rasterization. For scenes with lots of vertices, this difference can provide major savings. Additionally, although phase-aligned rendering reduces flicker, it still produces a distinct boundary between the high- and low-acuity regions, whereas conformal rendering does not show this artifact. However, a downside of conformal rendering compared to phase-alignment is that aliasing artifacts still flicker in the periphery, which may be less desirable for applications that require high visual fidelity.
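A minimal sketch of the kind of radial warp used in conformal rendering is shown below. The specific mapping (a simple power law around the fixation point) and the exponent are assumptions for illustration only; the actual mapping is chosen to match visual acuity and the HMD lens characteristics.

```python
# Illustrative radial warp around the fixation point (assumed power-law form).
import numpy as np

ALPHA = 0.5   # hypothetical warp exponent; smaller values foveate more aggressively

def warp(p, fixation):
    """Map screen points (roughly in [-1, 1]^2) into the nonlinear rendering space."""
    d = p - fixation
    r = np.linalg.norm(d, axis=-1, keepdims=True) + 1e-9
    return fixation + d / r * r**ALPHA          # r -> r^alpha, direction preserved

def unwarp(q, fixation):
    """Inverse mapping, applied after rasterization (combined with lens correction)."""
    d = q - fixation
    r = np.linalg.norm(d, axis=-1, keepdims=True) + 1e-9
    return fixation + d / r * r**(1.0 / ALPHA)

fixation = np.array([0.0, 0.0])
pts = np.array([[0.1, 0.0], [0.5, 0.0], [0.9, 0.0]])
print("warped:  ", np.round(warp(pts, fixation), 3))    # foveal points are expanded most
print("restored:", np.round(unwarp(warp(pts, fixation), fixation), 3))
```

Because points near the fixation are expanded most in warped space, a reduced-resolution raster still spends proportionally more of its pixels on the fovea, and the unwarp restores the original geometry.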

Foveated Image Processing
HMDs often require image processing steps to be performed after rendering, such as local tone mapping, lens distortion correction, or lighting blending. With foveated image processing, different operations are applied to different foveation regions. For example, lens distortion correction, including chromatic aberration correction, may not require the same spatial accuracy for every part of the display. Running lens distortion correction on foveated content before upscaling yields significant computational savings without introducing perceptible artifacts.
Correction of head-mounted-display lens chromatic aberration in foveated space. The top image shows the conventional pipeline; the bottom image (in green) shows the operation in foveated space.
The left image shows reconstructed foveated content after lens distortion correction. The right image shows the image difference when lens distortion correction is instead performed in a foveated manner: minimal error is introduced, close to the edges of the frame buffer, and these errors are imperceptible in an HMD.
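The ordering matters because the cost of a per-pixel pass scales with the number of pixels it touches. The sketch below uses a stand-in per-pixel "correction" and a nearest-neighbour upscale (both placeholders, with assumed buffer sizes) simply to make that bookkeeping concrete: correcting the small foveated buffer and then upscaling does a fraction of the per-pixel work of correcting the fully reconstructed frame.

```python
# Stand-in demonstration of "correct before upscale" vs "upscale before correct".
import numpy as np

def correction(img):
    """Placeholder per-pixel pass; cost is proportional to the number of pixels."""
    gains = np.array([1.01, 1.00, 0.99])        # hypothetical per-channel gains
    return np.clip(img * gains, 0.0, 1.0)

def upscale(img, factor):
    """Nearest-neighbour upscale standing in for display-side reconstruction."""
    return np.repeat(np.repeat(img, factor, axis=0), factor, axis=1)

la = np.random.rand(360, 400, 3)                # assumed low-acuity buffer
factor = 4                                      # assumed reconstruction factor

foveated_first = upscale(correction(la), factor)     # per-pixel work on 360x400
upscale_first = correction(upscale(la, factor))      # per-pixel work on 1440x1600

print("per-pixel work, foveated-first:", la.shape[0] * la.shape[1])
print("per-pixel work, upscale-first :",
      foveated_first.shape[0] * foveated_first.shape[1])
```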

Foveated Transmission
A non-trivial source of power consumption for standalone HMDs is data transmission from the system-on-a-chip (SoC) to the display module. Foveated transmission aims to save power and bandwidth by transmitting the minimum amount of data necessary to the display, as shown in the figure below.
Rather than streaming upscaled foveated content (left image), foveated transmission enables streaming content pre-reconstruction (right image) and reducing the number of bits transmitted.
This change requires moving the simple upscaling and blending operations to the display side and transmitting only the foveated rendered content. Complexity arises if the foveal region (the red box in the figure above) moves with eye-tracking. Such motion may cause temporal artifacts (figure below), since the Display Stream Compression (DSC) used between the SoC and the display is not designed for foveated content.
Comparison of full integration of foveation and compression techniques (left) versus typical flickering artifacts that may be introduced by applying DSC to foveated content (right).
Toward a New Pipeline
We have focused on a few components of a “foveation pipeline” for MR and VR applications. By considering the impact of foveation in every part of a display system — rendering, processing and transmission — we can enable the next generation of lightweight, low-power, and high resolution MR/VR HMDs. This topic has been an active area of research for many years and it seems reasonable to expect the appearance of VR and MR headsets with foveated pipelines in the coming years.

Acknowledgements
We would like to recognize the work done by the following collaborators:
  • Haomiao Jiang and Carlin Vieri on display compression and foveated transmission
  • Brian Funt and Sylvain Vignaud on the development of new foveated rendering algorithms

ARCore: Augmented reality at Android scale

Posted by Dave Burke, VP, Android Engineering

With more than two billion active devices, Android is the largest mobile platform in the world. And for the past nine years, we've worked to create a rich set of tools, frameworks and APIs that deliver developers' creations to people everywhere. Today, we're releasing a preview of a new software development kit (SDK) called ARCore. It brings augmented reality capabilities to existing and future Android phones. Developers can start experimenting with it right now.

We've been developing the fundamental technologies that power mobile AR over the last three years with Tango, and ARCore is built on that work. But, it works without any additional hardware, which means it can scale across the Android ecosystem. ARCore will run on millions of devices, starting today with the Pixel and Samsung's S8, running 7.0 Nougat and above. We're targeting 100 million devices at the end of the preview. We're working with manufacturers like Samsung, Huawei, LG, ASUS and others to make this possible with a consistent bar for quality and high performance.

ARCore works with Java/OpenGL, Unity and Unreal and focuses on three things:

  • Motion tracking: Using the phone's camera to observe feature points in the room and IMU sensor data, ARCore determines both the position and orientation (pose) of the phone as it moves. Virtual objects remain accurately placed.
  • Environmental understanding: It's common for AR objects to be placed on a floor or a table. ARCore can detect horizontal surfaces using the same feature points it uses for motion tracking.
  • Light estimation: ARCore observes the ambient light in the environment and makes it possible for developers to light virtual objects in ways that match their surroundings, making their appearance even more realistic.

Alongside ARCore, we've been investing in apps and services which will further support developers in creating great AR experiences. We built Blocks and Tilt Brush to make it easy for anyone to quickly create great 3D content for use in AR apps. As we mentioned at I/O, we're also working on Visual Positioning Service (VPS), a service which will enable world scale AR experiences well beyond a tabletop. And we think the Web will be a critical component of the future of AR, so we're also releasing prototype browsers for web developers so they can start experimenting with AR, too. These custom browsers allow developers to create AR-enhanced websites and run them on both Android/ARCore and iOS/ARKit.

ARCore is our next step in bringing AR to everyone, and we'll have more to share later this year. Let us know what you think through GitHub, and check out our new AR Experiments showcase where you can find some fun examples of what's possible. Show us what you build on social media with #ARCore; we'll be resharing some of our favorites.

Expressions in Virtual Reality



Recently, Google Machine Perception researchers, in collaboration with Daydream Labs and YouTube Spaces, presented a solution for virtual headset ‘removal’ in mixed reality to create a richer and more engaging VR experience. While that work could infer eye-gaze directions and blinks, enabled by a headset modified with eye-tracking technology, a richer set of facial expressions — which are key to understanding a person's experience in VR, as well as to conveying important social engagement cues — was missing.

Today we present an approach to infer select facial action units and expressions entirely by analyzing a small part of the face while the user is engaged in a virtual reality experience. Specifically, we show that images of the user’s eyes captured from an infrared (IR) gaze-tracking camera within a VR headset are sufficient to infer at least a subset of facial expressions without the use of any external cameras or additional sensors.
Left: A user wearing a VR HMD modified with eye-tracking used for expression classification (Note that no external camera is used in our method; this is just for visualization). Right: inferred expression from eye images using our model. A video demonstrating the work can be seen here.
We use supervised deep learning to classify facial expressions from images of the eyes and surrounding areas, which typically contain the iris, sclera, and eyelids, and may include parts of the eyebrows and the top of the cheeks. Obtaining large-scale annotated data from such novel sensors is challenging, so we collected training data by capturing 46 subjects performing a set of facial expressions.

To perform expression classification, we fine-tuned a variant of the widely used Inception architecture with TensorFlow, starting from the weights of a model trained to convergence on ImageNet. We attempted to partially remove the variance due to differences in participant appearance (i.e., individual differences that do not depend on expression), inspired by the standard practice of mean-image subtraction. Since this variance removal occurs within-subject, it is effectively a form of personalization. Further details, along with examples of eye images and results, are presented in our accompanying paper.
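For readers who want a concrete starting point, here is a hedged sketch of that general recipe. It is not the model from the paper: a stock tf.keras InceptionV3 backbone with ImageNet weights stands in for "a variant of Inception", random arrays stand in for the IR eye crops, and the image size, number of expression classes, and the simple within-subject mean-image subtraction are illustrative assumptions.

```python
# Hedged sketch: fine-tune a pretrained backbone on eye crops with
# within-subject mean-image subtraction as a crude personalization step.
import numpy as np
import tensorflow as tf

NUM_EXPRESSIONS = 5        # hypothetical number of expression classes
IMG_SIZE = 160             # hypothetical eye-image crop size

def personalize(images_for_one_subject):
    """Within-subject mean-image subtraction (effectively personalization)."""
    mean_image = images_for_one_subject.mean(axis=0, keepdims=True)
    return images_for_one_subject - mean_image

backbone = tf.keras.applications.InceptionV3(
    weights="imagenet", include_top=False, pooling="avg",
    input_shape=(IMG_SIZE, IMG_SIZE, 3))

model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(NUM_EXPRESSIONS, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

# Toy stand-in data: IR eye crops replicated to 3 channels for the backbone.
x = personalize(np.random.rand(32, IMG_SIZE, IMG_SIZE, 3).astype("float32"))
y = np.random.randint(0, NUM_EXPRESSIONS, size=32)
model.fit(x, y, epochs=1, batch_size=8)
```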

Results and Extensions
We demonstrate that the information required to classify a variety of facial expressions is reliably present in IR eye images captured by a commercial HMD sensor, and that this information can be decoded using a CNN-based method, even though classifying facial expressions from eye images alone is non-trivial even for humans. Model inference can be performed in real time, and we show it can be used to generate expressive avatars in real time, which can function as surrogates for users engaged in VR. This interaction mechanism also yields a more intuitive interface for sharing expressions in VR than gestures or keyboard inputs.

The ability to capture a user’s facial expressions using existing eye-tracking cameras enables a fully mobile solution to facial performance capture in VR, without additional external cameras. This technology extends beyond animating cartoon avatars; it could be used to provide a richer headset removal experience, enhancing communication and social interaction in VR by transmitting far more authentic and emotionally textured information.

Acknowledgements
The research described in this post was performed by Steven Hickson (as an intern), Nick Dufour, Avneesh Sud, Vivek Kwatra and Irfan Essa. We also thank Hayes Raffle and Alex Wong from Daydream, and Chris Bregler, Sergey Ioffe and authors of TF-Slim from Google Research for their guidance and suggestions.

This technology, along with headset removal, will be demonstrated at Siggraph 2017 Emerging Technologies.

Experimenting with VR Ad formats at Area 120

Posted by Aayush Upadhyay and Neel Rao, Area 120

At Area 120, Google's internal workshop for experimental ideas, we work on early-stage projects and iterate quickly to test concepts. We heard from developers that they're looking for ways to make money to fund their VR applications, so we started experimenting with what a native, mobile VR ad format might look like.

Developers and users have told us they want to avoid disruptive, hard-to-implement ad experiences in VR. So our first idea for a potential format presents a cube to users, with the option to engage with it and then see a video ad. When the user taps on the cube or gazes at it for a few seconds, the cube opens a video player where the user can watch, and then easily close, the video. Here's how it works:

Our work focuses on a few key principles - VR ad formats should be easy for developers to implement, native to VR, flexible enough to customize, and useful and non-intrusive for users. Our Area 120 team has seen some encouraging results with a few test partners, and would love to work with the developer community as this work evolves - across Cardboard (on Android and iOS), Daydream and Samsung Gear VR.

If you're a VR developer (or want to be one) and are interested in testing this format with us, please fill out this form to apply for our early access program. We have an early-stage SDK available and you can get up and running easily. We're excited to continue experimenting with this format and hope you'll join us for the ride!

Headset “Removal” for Virtual and Mixed Reality



Virtual Reality (VR) enables remarkably immersive experiences, offering new ways to view the world and the ability to explore novel environments, both real and imaginary. However, compared to physical reality, sharing these experiences with others can be difficult, as VR headsets make it challenging to create a complete picture of the people participating in the experience.

Some of this disconnect is alleviated by Mixed Reality (MR), a related medium that shares the virtual context of a VR user in a two dimensional video format allowing other viewers to get a feel for the user’s virtual experience. Even though MR facilitates sharing, the headset continues to block facial expressions and eye gaze, presenting a significant hurdle to a fully engaging experience and complete view of the person in VR.

Google Machine Perception researchers, in collaboration with Daydream Labs and YouTube Spaces, have been working on solutions to address this problem, in which we reveal the user’s face by virtually “removing” the headset, creating a realistic see-through effect.
A VR user captured in front of a green screen is blended with the virtual environment to generate the MR output: traditional MR output has the user's face occluded, while our result reveals the face. Note how the headset is modified with a marker to aid tracking.
Our approach uses a combination of 3D vision, machine learning and graphics techniques, and is best explained in the context of enhancing Mixed Reality video (also discussed in the Google-VR blog). It consists of three main components:

Dynamic face model capture
The core idea behind our technique is to use a 3D model of the user’s face as a proxy for the hidden face. This proxy is used to synthesize the face in the MR video, thereby creating an impression of the headset being removed. First, we capture a personalized 3D face model for the user with what we call gaze-dependent dynamic appearance. This initial calibration step requires the user to sit in front of a color+depth camera and a monitor, and then track a marker on the monitor with their eyes. We use this one-time calibration procedure — which typically takes less than a minute — to acquire a 3D face model of the user, and learn a database that maps appearance images (or textures) to different eye-gaze directions and blinks. This gaze database (i.e. the face model with textures indexed by eye-gaze) allows us to dynamically change the appearance of the face during synthesis and generate any desired eye-gaze, thus making the synthesized face look natural and alive.
On the left, the user’s face is captured by a camera as she tracks a marker on the monitor with her eyes. On the right, we show the dynamic nature of reconstructed 3D face model: by moving or clicking on the mouse, we are able to simulate both apparent eye gaze and blinking.
Calibration and Alignment
Creating a Mixed Reality video requires a specialized setup consisting of an external camera, calibrated and time-synced with the headset. The camera captures a video stream of the VR user in front of a green screen and then composites a cutout of the user with the virtual world to create the final MR video. An important step here is to accurately estimate the calibration (the fixed 3D transformation) between the camera and headset coordinate systems. These calibration techniques typically involve significant manual intervention and are done in multiple steps. We simplify the process by adding a physical marker to the front of the headset and tracking it visually in 3D, which allows us to optimize for the calibration parameters automatically from the VR session.

For headset “removal”, we need to align the 3D face model with the visible portion of the face in the camera stream so that the two blend seamlessly with each other. A reasonable proxy for this alignment is to place the face model just behind the headset. The calibration described above, coupled with VR headset tracking, provides sufficient information to determine this placement, allowing us to modify the camera stream by rendering the virtual face into it.

Compositing and Rendering
Having tackled the alignment, the last step involves producing a suitable rendering of the 3D face model, consistent with the content in the camera stream. We are able to reproduce the true eye-gaze of the user by combining our dynamic gaze database with an HTC Vive headset that has been modified by SMI to incorporate eye-tracking technology. Images from these eye trackers lack sufficient detail to directly reproduce the occluded face region, but are well suited to provide fine-grained gaze information. Using the live gaze data from the tracker, we synthesize a face proxy that accurately represents the user’s attention and blinks. At run-time, the gaze database, captured in the preprocessing step, is searched for the most appropriate face image corresponding to the query gaze state, while also respecting aesthetic considerations such as temporal smoothness. Additionally, to account for lighting changes between gaze database acquisition and run-time, we apply color correction and feathering, such that the synthesized face region matches with the rest of the face.
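A minimal sketch of that lookup step is shown below. The data layout, penalty form, and weights are assumptions for illustration, not the system's actual cost function: pick the database entry whose gaze state is closest to the live gaze query, with a penalty that discourages large jumps from the previously selected entry so the synthesized face stays temporally smooth.

```python
# Illustrative gaze-database lookup with a simple temporal-smoothness penalty.
import numpy as np

rng = np.random.default_rng(0)
db_gaze = rng.uniform(-1, 1, size=(500, 3))     # (gaze_x, gaze_y, blink) per entry
SMOOTHNESS = 0.3                                # hypothetical temporal penalty weight

def select_face_index(query_gaze, previous_index=None):
    distance = np.linalg.norm(db_gaze - query_gaze, axis=1)
    if previous_index is not None:
        # Penalise entries whose gaze state differs a lot from the previous pick.
        distance += SMOOTHNESS * np.linalg.norm(
            db_gaze - db_gaze[previous_index], axis=1)
    return int(np.argmin(distance))

prev = None
for query in rng.uniform(-1, 1, size=(5, 3)):   # stand-in live gaze samples
    prev = select_face_index(query, prev)
    print("selected gaze-database entry:", prev)
```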

Humans are highly sensitive to artifacts on faces, and even small imperfections in the synthesis of the occluded face region can feel unnatural and distracting, a phenomenon known as the “uncanny valley.” To mitigate this problem, we do not remove the headset completely; instead, we have chosen a user experience that conveys a ‘scuba mask effect’ by compositing the color-corrected face proxy with a translucent headset. Reminding the viewer of the presence of the headset helps us avoid the uncanny valley, and also makes our algorithm robust to small errors in alignment and color correction.

This modified camera stream, displaying a see-through headset, with the user’s face revealed and their true eye-gaze recreated, is subsequently merged with the virtual environment to create the final MR video.

Results and Extensions
We have used our headset removal technology to enhance Mixed Reality, allowing the medium to not only convey a VR user’s interaction with the virtual environment but also show their face in a natural and convincing fashion. The example below demonstrates our tech applied to an artist using Google Tilt Brush in a virtual environment:
An artist creates 3D art using Google Tilt Brush, shown in Mixed Reality. On the top is the traditional MR result where the face is hidden behind the headset. On the bottom is our result, which reveals the entire face and eyes for a more natural and engaging experience.
While we have shown the potential of our technology, its applications extend beyond Mixed Reality. Headset removal is poised to enhance communication and social interaction in VR itself with diverse applications like VR video conference meetings, multiplayer VR gaming, and exploration with friends and family. Going from an utterly blank headset to being able to see, with photographic realism, the faces of fellow VR users promises to be a significant transition in the VR world, and we are excited to be a part of it.

YouTube VR: A whole new way to watch… and create

We’re inspired every day by the videos people share on YouTube. The idea that anybody, from any part of the globe, can create something that all of us can watch, laugh at, cry to, or relate to. Not only has this turned YouTube into a home for just about any idea imaginable, it’s made it the place where new video formats, new ways to connect, and new ways to tell stories are born.

We want to continue to provide you with new ways to engage with the world and with your community, and we believe virtual reality will play an important role in the future of storytelling. More than just an amazing new technology, VR allows us to make deep, human connections with people, places and stories. That’s why we’re committed to giving creators the space and resources they need to learn about, experiment with, and create virtual reality video. In fact, we’ve already started working with some awesome creators, recording artists, and partners who are producing VR videos across a wide variety of genres and interest areas on YouTube.

Want to spend some time with beauty vlogger Meredith Foster? Check out an immersive tour of her apartment. More of a foodie? Tastemade’s VR cooking videos breathe new life into learning new recipes. Rooster Teeth reimagines their gaming comedy “Red vs. Blue” for some fresh laughs. You can even watch breaking news in VR from HuffPost RYOT.

We’ve also been working to allow you to have experiences or visit places you might not be able to (or might not dare to!) in real life. Go swimming with sharks thanks to Curiscope, get a first-hand look at a living, breathing dinosaur at the Natural History Museum in London, travel to Belize with StyleHaul, hike a trail a thousand miles away with Daniel and Kelli at Fitness Blender, or watch Tritonal in concert no matter where you are.


And today you can experience all of this amazing content in a more immersive way with the brand new YouTube VR app, available first on Daydream. This new standalone app was built from the ground up and optimized for VR. You just need a Daydream-ready phone like Pixel and the new Daydream View headset and controller to get started. Every single video on the platform becomes an immersive VR experience, from 360-degree videos that let you step inside the content to standard videos shown on a virtual movie screen in the new theater mode. The app even includes some familiar features like voice search and a signed in experience so you can follow the channels you subscribe to, check out your playlists and more.



We can’t tell you how inspired we all are after watching the amazing VR videos made by creators on YouTube. We can't wait to see what you dream up next!

Erin Teague, Product Manager, and Jamie Byrne, Director of YouTube Creators, recently watched "Tour of Man At Arms: Reforged Shop in 360° - The Forging Room!"

Source: YouTube Blog


Omnitone: Spatial audio on the web


Spatial audio is a key element of an immersive virtual reality (VR) experience. By bringing spatial audio to the web, the browser can be transformed into a complete VR media player with incredible reach and engagement. That’s why the Chrome WebAudio team has created and is releasing the Omnitone project, an open source spatial audio renderer with cross-browser support.

Our challenge was to introduce the audio spatialization technique called ambisonics so the user can hear full-sphere surround sound in the browser. To achieve this, we implemented ambisonic decoding with binaural rendering using web technology. There are several paths for introducing a new feature into the web platform, but we chose to use only the Web Audio API. In doing so, we can reach a larger audience with this cross-browser technology, and we can also avoid the lengthy standardization process of introducing a new Web Audio component. This is possible because the Web Audio API provides all the necessary building blocks for this audio spatialization technique.



Omnitone Audio Processing Diagram

An AmbiX-format recording, which is the target of the Omnitone decoder, contains 4 channels of audio encoded using ambisonics, which can then be decoded into an arbitrary speaker setup. Instead of an actual speaker array, Omnitone uses 8 virtual speakers based on head-related transfer function (HRTF) convolution to render the final audio stream binaurally. This binaurally rendered audio can convey a sense of space when heard through headphones.

The beauty of this mechanism lies in the sound-field rotation applied to the incoming spatial audio stream. The orientation sensor of a VR headset or a smartphone can be linked to Omnitone’s decoder to seamlessly rotate the entire sound field. The rest of the spatialization process will be handled automatically by Omnitone. A live demo can be found at the project landing page.
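To make the rotation step concrete, here is a small numpy sketch of first-order sound-field rotation outside the browser. Omnitone performs the equivalent operation inside the Web Audio graph; the yaw-only rotation and buffer size here are simplifications for illustration, with AmbiX's ACN channel order [W, Y, Z, X] assumed.

```python
# Toy sketch of first-order ambisonic sound-field rotation (illustration only).
import numpy as np

def yaw_rotation(deg):
    """Rotation about the vertical (Z) axis."""
    r = np.radians(deg)
    c, s = np.cos(r), np.sin(r)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rotate_foa(ambix_frames, head_yaw_deg):
    """ambix_frames: (num_samples, 4) array in ACN order [W, Y, Z, X].
    W is omnidirectional and unchanged; the directional components rotate."""
    w = ambix_frames[:, 0:1]
    y, z, x = ambix_frames[:, 1], ambix_frames[:, 2], ambix_frames[:, 3]
    xyz = np.stack([x, y, z], axis=1)
    xyz_rotated = xyz @ yaw_rotation(-head_yaw_deg).T   # compensate head rotation
    x_r, y_r, z_r = xyz_rotated[:, 0], xyz_rotated[:, 1], xyz_rotated[:, 2]
    return np.concatenate([w, y_r[:, None], z_r[:, None], x_r[:, None]], axis=1)

frames = np.random.rand(480, 4).astype(np.float32)      # 10 ms of 4-channel FOA at 48 kHz
rotated = rotate_foa(frames, head_yaw_deg=30.0)
print(rotated.shape)
```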

Throughout the project, we worked closely with the Google VR team for their VR audio expertise. Not only was their knowledge of spatial audio a tremendous help for the project, but the collaboration also ensured identical audio spatialization across all of Google’s VR applications - both on the web and on Android (e.g. the Google VR SDK and the YouTube Android app). The Spatial Media Specification and HRTF sets are great examples of the Google VR team’s efforts, and Omnitone is built on top of this specification and these HRTF sets.

With emerging web-based VR projects like WebVR, Omnitone’s audio spatialization can play a critical role in a more immersive VR experience on the web. Web-based VR applications will also benefit from high-quality streaming spatial audio, as the Chrome Media team has recently added first-order ambisonics (FOA) compression to the open source audio codec Opus. More exciting things like VR view integration, higher-order ambisonics and mobile web support will also be coming soon to Omnitone.

We look forward to seeing what people do with Omnitone now that it's open source. Feel free to reach out to us or leave a comment with your thoughts and feedback on the issue tracker on GitHub.

By Hongchan Choi and Raymond Toy, Chrome Team

Due to the incomplete implementation of multichannel audio decoding on various browsers, Omnitone does not support mobile web at the time of writing.