Tag Archives: Virtual Reality

VALID: A perceptually validated virtual avatar library for inclusion and diversity

As virtual reality (VR) and augmented reality (AR) technologies continue to grow in popularity, virtual avatars are becoming an increasingly important part of our digital interactions. In particular, virtual avatars are at the center of many social VR and AR interactions, as they are key to representing remote participants and facilitating collaboration.

In the last decade, interdisciplinary scientists have dedicated a significant amount of effort to better understand the use of avatars, and have made many interesting observations, including the capacity of the users to embody their avatar (i.e., the illusion that the avatar body is their own) and the self-avatar follower effect, which creates a binding between the actions of the avatar and the user strong enough that the avatar can actually affect user behavior.

The use of avatars in experiments isn’t just about how users will interact and behave in VR spaces, but also about discovering the limits of human perception and neuroscience. In fact, some VR social experiments often rely on recreating scenarios that can’t be reproduced easily in the real world, such as bar crawls to explore ingroup vs. outgroup effects, or deception experiments, such as the Milgram obedience to authority inside virtual reality. Other studies try to explore deep neuroscientific phenomena, like the human mechanisms for motor control. This perhaps follows the trail of the rubber hand illusion on brain plasticity, where a person can start feeling as if they own a rubber hand while their real hand is hidden behind a curtain. There is also an increased number of possible therapies for psychiatric treatment using personalized avatars. In these cases, VR becomes an ecologically valid tool that allows scientists to explore or treat human behavior and perception.

None of these experiments and therapies could exist without good access to research tools and libraries that can enable easy experimentation. As such, multiple systems and open source tools have been released around avatar creation and animation over recent years. However, existing avatar libraries have not been validated systematically on the diversity spectrum. Societal bias and dynamics also transfer to VR/AR when interacting with avatars, which could lead to incomplete conclusions for studies on human behavior inside VR/AR.

To partially overcome this problem, we partnered with the University of Central Florida to create and release the open-source Virtual Avatar Library for Inclusion and Diversity (VALID). Described in our recent paper, published in Frontiers in Virtual Reality, this library of avatars is readily available for usage in VR/AR experiments and includes 210 avatars of seven different races and ethnicities recognized by the US Census Bureau. The avatars have been perceptually validated and designed to advance diversity and inclusion in virtual avatar research.

Headshots of all 42 base avatars available on the VALID library were created in extensive interaction with members of the 7 ethnic and racial groups from the Federal Register, which include (AIAN, Asian, Black, Hispanic, MENA, NHPI and White).

Creation and validation of the library

Our initial selection of races and ethnicities for the diverse avatar library follows the most recent guidelines of the US Census Bureau that as of 2023 recommended the use of 7 ethnic and racial groups representing a large demographic of the US society, which can also be extrapolated to the global population. These groups include Hispanic or Latino, American Indian or Alaska Native (AIAN), Asian, Black or African American, Native Hawaiian or Other Pacific Islander (NHPI), White, Middle East or North Africa (MENA). We envision the library will continue to evolve to bring even more diversity and representation with future additions of avatars.

The avatars were hand modeled and created using a process that combined average facial features with extensive collaboration with representative stakeholders from each racial group, where their feedback was used to artistically modify the facial mesh of the avatars. Then we conducted an online study with participants from 33 countries to determine whether the race and gender of each avatar in the library are recognizable. In addition to the avatars, we also provide labels statistically validated through observation of users for the race and gender of all 42 base avatars (see below).

Example of the headshots of a Black/African American avatar presented to participants during the validation of the library.

We found that all Asian, Black, and White avatars were universally identified as their modeled race by all participants, while our American Indian or Native Alaskan (AIAN), Hispanic, and Middle Eastern or North African (MENA) avatars were typically only identified by participants of the same race. This also indicates that participant race can improve identification of a virtual avatar of the same race. The paper accompanying the library release highlights how this ingroup familiarity should also be taken into account when studying avatar behavior in VR.

Confusion matrix heatmap of agreement rates for the 42 base avatars separated by other-race participants and same-race participants. One interesting aspect visible in this matrix, is that participants were significantly better at identifying the avatars of their own race than other races.

Dataset details

Our models are available in FBX format, are compatible with previous avatar libraries like the commonly used Rocketbox, and can be easily integrated into most game engines such as Unity and Unreal. Additionally, the avatars come with 69 bones and 65 facial blendshapes to enable researchers and developers to easily create and apply dynamic facial expressions and animations. The avatars were intentionally made to be partially cartoonish to avoid extreme look-a-like scenarios in which a person could be impersonated, but still representative enough to be able to run reliable user studies and social experiments.

Images of the skeleton rigging (bones that allow for animation) and some facial blend shapes included with the VALID avatars.

The avatars can be further combined with variations of casual attires and five professional attires, including medical, military, worker and business. This is an intentional improvement from prior libraries that in some cases reproduced stereotypical gender and racial bias into the avatar attires, and provided very limited diversity to certain professional avatars.

Images of some sample attire included with the VALID avatars.

Get started with VALID

We believe that the Virtual Avatar Library for Inclusion and Diversity (VALID) will be a valuable resource for researchers and developers working on VR/AR applications. We hope it will help to create more inclusive and equitable virtual experiences. To this end, we invite you to explore the avatar library, which we have released under the open source MIT license. You can download the avatars and use them in a variety of settings at no charge.


Acknowledgements

This library of avatars was born out of a collaboration with Tiffany D. Do, Steve Zelenty and Prof. Ryan P McMahan from the University of Central Florida.

Source: Google AI Blog


Google Cardboard XR Plugin for Unity

Late in 2019, we decided to open source Google Cardboard. Since then, our developer community has had access to create a plethora of experiences on both iOS and Android platforms, while reaching millions of users around the world. While this release has been considered a success by our developer community, we also promised that we would release a plugin for Unity. Our users have long preferred developing Cardboard experiences in Unity, so we made it a priority to develop a Unity SDK. Today, we have fulfilled that promise, and the Google Cardboard open source plugin for Unity is now available via the Unity Asset Store

What's Included in the Cardboard Unity SDK

Today, we’re releasing the Cardboard Unity SDK to our users so that they can continue creating smartphone XR experiences using Unity. Unity is one of the most popular 3D and XR development platforms in the world, and our release of this SDK will give our content creators a smoother workflow with Unity when developing for Cardboard.

In addition to the Unity SDK, we are also providing a sample application for iOS/Android, which will be a great aid for developers trying to debug their own creations. This release not only fulfills a promise we made to our Cardboard community, but also shows our support, as we move away from smartphone VR and leave it in the more-than-capable hands of our development community.



If you’re interested in learning how to develop with the Cardboard open source project, please see our developer documentation or visit the Google VR GitHub repo to access source code, build the project, and download the latest release.

By Jonathan Goodlow, Product Manager, AR & VR

Open sourcing Google Cardboard

Posted by Jeffrey Chen, Product Manager, AR & VRGoogle CardboardFive years ago, we launched Google Cardboard—a simple cardboard viewer that anyone can use to experience virtual reality (VR). From a giveaway at Google I/O to more than 15 million units worldwide, Cardboard has played an important role in introducing people to VR through experiences like YouTube and Expeditions. In many cases, it provided access to VR to people who otherwise couldn’t have afforded it.

With Cardboard and the Google VR software development kit (SDK), developers have created and distributed VR experiences across both Android and iOS devices, giving them the ability to reach millions of users. While we’ve seen overall usage of Cardboard decline over time and we’re no longer actively developing the Google VR SDK, we still see consistent usage around entertainment and education experiences, like YouTube and Expeditions, and want to ensure that Cardboard’s no-frills, accessible-to-everyone approach to VR remains available.

Today, we’re releasing the Cardboard open source project to let the developer community continue to build Cardboard experiences and add support to their apps for an ever increasing diversity of smartphone screen resolutions and configurations. We think that an open source model—with additional contributions from us—is the best way for developers to continue to build experiences for Cardboard. We’ve already seen success with this approach with our Cardboard Manufacturer Kit—an open source project to enable third-party manufacturers to design and build their own unique compatible VR viewers—and we’re excited to see where the developer community takes Cardboard in the future.

What's Included in the open source project

We're releasing libraries for developers to build their Cardboard apps for iOS and Android and render VR experiences on Cardboard viewers. The open source project provides APIs for head tracking, lens distortion rendering and input handling. We’ve also included an Android QR code library, so that apps can pair any Cardboard viewer without depending on the Cardboard app.

An open source model will enable the community to continue to improve Cardboard support and expand its capabilities, for example adding support for new smartphone display configurations and Cardboard viewers as they become available. We’ll continue to contribute to the Cardboard open source project by releasing new features, including an SDK package for Unity.

If you’re interested in learning how to develop with the Cardboard open source project, please see our developer documentation, or visit the Cardboard GitHub repo to access source code, build the project and download the latest release.

Diagnose and understand your app’s GPU behavior with GAPID

Posted by Andrew Woloszyn, Software Engineer

Developing for 3D is complicated. Whether you're using a native graphics API or enlisting the help of your favorite game engine, there are thousands of graphics commands that have to come together perfectly to produce beautiful 3D images on your phone, desktop or VR headsets.

GAPID (Graphics API Debugger) is a new tool that helps developers diagnose rendering and performance issues with their applications. With GAPID, you can capture a trace of your application and step through each graphics command one-by-one. This lets you visualize how your final image is built and isolate problematic calls, so you spend less time debugging through trial-and-error.

GAPID supports OpenGL ES on Android, and Vulkan on Android, Windows and Linux.

Debugging in action, one draw call at a time

GAPID not only enables you to diagnose issues with your rendering commands, but also acts as a tool to run quick experiments and see immediately how these changes would affect the presented frame.

Here are a few examples where GAPID can help you isolate and fix issues with your application:

What's the GPU doing?

Why isn't my text appearing?!

Working with a graphics API can be frustrating when you get an unexpected result, whether it's a blank screen, an upside-down triangle, or a missing mesh. As an offline debugger, GAPID lets you take a trace of these applications, and then inspect the calls afterwards. You can track down exactly which command produced the incorrect result by looking at the framebuffer, and inspect the state at that point to help you diagnose the issue.

What happens if I do X?

Using GAPID to edit shader code

Even when a program is working as expected, sometimes you want to experiment. GAPID allows you to modify API calls and shaders at will, so you can test things like:

  • What if I used a different texture on this object?
  • What if I changed the calculation of bloom in this shader?

With GAPID, you can now iterate on the look and feel of your app without having to recompile your application or rebuild your assets.

Whether you're building a stunning new desktop game with Vulkan or a beautifully immersive VR experience on Android, we hope that GAPID will save you both time and frustration and help you get the most out of your GPU. To get started with GAPID and see just how powerful it is, download it, take your favorite application, and capture a trace!

Introducing a New Foveation Pipeline for Virtual/Mixed Reality



Virtual Reality (VR) and Mixed Reality (MR) offer a novel way to immerse people into new and compelling experiences, from gaming to professional training. However, current VR/MR technologies present a fundamental challenge: to present images at the extremely high resolution required for immersion places enormous demands on the rendering engine and transmission process. Headsets often have insufficient display resolution, which can limit the field of view, worsening the experience. But, to drive a higher resolution headset, the traditional rendering pipeline requires significant processing power that even high-end mobile processors cannot achieve. As research continues to deliver promising new techniques to increase display resolution, the challenges of driving those displays will continue to grow.

In order to further improve the visual experience in VR and MR, we introduce a pipeline that takes advantage of the characteristics of human visual perception to enable a amazing visual experience at low compute and power cost. The pipeline proposed in this article considers the full system dependency including the rendering engine, memory bandwidth and capability of display module itself. We determined that the current limitation is not just in the content creation, but it also may be in transmitting data, handling latency and enabling interaction with real objects (mixed reality applications). The pipeline consists of 1. Foveated Rendering with a focus on reducing of compute per pixel. 2. Foveated Image Processing with a focus on the reduction of visual artifacts and 3. Foveated Transmission with a focus on bits per pixel transmitted.

Foveated Rendering
In the human visual system, the fovea centralis allows us to see at high-fidelity in the center of our vision, allowing our brain to pay less attention to things in our peripheral vision. Foveated rendering takes advantage of this characteristic to improve the performance of the rendering engine by reducing the spatial or bit-depth resolution of objects in our peripheral vision. To make this work, the location of the High Acuity (HA) region needs to be updated with eye-tracking to align with eye saccades, which preserves the perception of a constant high-resolution across the field of view. In contrast, systems with no eye-tracking may need to render a much larger HA region.
The left image is rendered at full resolution. The right image uses two layers of foveation — one rendered at high resolution (inside the yellow region) and one at lower resolution (outside).
A traditional foveation technique may divide a frame buffer into multiple spatial resolution regions. Aliasing introduced by rendering to lower spatial resolution may cause perceptible temporal artifacts when there is motion in the content due to head motion or animation. Below we show an example of temporal artifacts introduced by head rotation.
A smooth full rendering (image on the left). The image on the right shows temporal artifacts introduced by motion in foveated region.
In the following sections, we present two different methods we use aimed at reducing these artifacts: Phase-Aligned Foveated Rendering and Conformal Foveated Rendering. Each of these methods provide different benefits for visual quality during rendering and are useful under different conditions.

Phase-Aligned Rendering
Aliasing occurs in the Low-Acuity (LA) region during foveated rendering due to the subsampling of rendered content. In traditional foveated rendering discussed above, these aliasing artifacts flicker from frame to frame, since the display pixel grid moves across the virtual scene as the user moves their head. The motion of these pixels relative to the scene cause any existing aliasing artifacts to flicker, which is highly perceptible to the user, even in the periphery.

In Phase-Aligned rendering, we force the LA region frustums to be aligned rotationally to the world (e.g. always facing north, east, south, etc.), not the current frame's head-rotation. The aliasing artifacts are mostly invariant to head pose and therefore much less detectable. After upsampling, these regions are then reprojected onto the final display screen to compensate for the user's head rotation, which reduces temporal flicker. As with traditional foveation, we render the high-acuity region in a separate pass, and overlay it onto the merged image at the location of the fovea. The figure below compares traditional foveated rendering with phase-aligned rendering, both at the same level of foveation.
Temporal artifacts in non-world aligned foveated rendered content (left) and the phase-aligned method (right).
This method gives a major benefit to reducing the severity of visual artifacts during foveated rendering. Although phase-aligned rendering is more expensive to compute than traditional foveation under the same level of acuity reduction, we can still yield a net savings by pushing foveation to more aggressive levels that would otherwise have yielded too many artifacts.

Conformal Rendering
Another approach for foveated rendering is to render content in a space that matches the smoothly varying reduction in resolution of our visual acuity, based on a nonlinear mapping of the screen distance from the visual fixation point.

This method gives two main benefits. First, by more closely matching the visual fidelity fall-off of the human eye, we can reduce the total number of pixels computed compared to other foveation techniques. Second, by using a smooth fall-off in fidelity, we prevent the user from seeing a clear dividing line between High-Acuity and Low-Acuity, which is often one of the first artifacts that is noticed. These benefits allow for aggressive foveation to be used while preserving the same quality levels, yielding more savings.

We perform this method by warping the vertices of the virtual scene into non-linear space. This scene is then rasterized at a reduced resolution, then unwarped into linear space as a post-processing effect combined with lens distortion correction.
Comparison of traditional foveation (left) to conformal rendering (right), where content is rendered to a space matched to visual perception acuity and HMD lens characteristics. Both methods use the same number of total pixels.
A major benefit of this method over the phase-aligned method above is that conformal rendering only requires a single pass of rasterization. For scenes with lots of vertices, this difference can provide major savings. Additionally, although phase-aligned rendering reduces flicker, it still produces a distinct boundary between the high- and low-acuity regions, whereas conformal rendering does not show this artifact. However, a downside of conformal rendering compared to phase-alignment is that aliasing artifacts still flicker in the periphery, which may be less desirable for applications that require high visual fidelity.

Foveated Image Processing
HMDs often require image processing steps to be performed after rendering, such as local tone mapping, lens distortion correction, or lighting blending. With foveated image-processing, different operations are applied for different foveation regions. As an example, lens distortion correction, including chromatic aberration correction, may not require the same spatial accuracy for each part of the display. By running lens distortion correction on foveated content before upscaling, significant savings are gained in computation. This technique does not introduce perceptible artifacts.
Correction for head-mounted-display lens chromatic aberration in foveated space. Top image shows the conventional pipeline. The bottom image (in Green) shows the operation in the foveated space.
The left image shows reconstructed foveated content after lens distortion. The right image shows image difference when lens distortion correction is performed in a foveated manner. The right image shows that minimal error is introduced close to edges of frame buffer. These errors are imperceptible in an HMD.

Foveated Transmission
A non-trivial source of power consumption for standalone HMDs is data transmission from the system-on-a-chip (SoC) to the display module. Foveated transmission aims to save power and bandwidth by transmitting the minimum amount of data necessary to the display as shown in figure below.
Rather than streaming upscaled foveated content (left image), foveated transmission enables streaming content pre-reconstruction (right image) and reducing the number of bits transmitted.
This change requires moving the simple upscaling and blending operations to the display side and transmitting only the foveated rendered content. Complexity arises if the foveal region, the red box in above figure, moves with eyetracking. Such motion may cause temporal artifacts (figure below) since Display Stream Compression (DSC) used between SoC and the display is not designed for foveated content.
Comparison of full integration of foveation and compression techniques (left) versus typical flickering artifacts that may be introduced by applying DSC to foveated content (right).
Toward a New Pipeline
We have focused on a few components of a “foveation pipeline” for MR and VR applications. By considering the impact of foveation in every part of a display system — rendering, processing and transmission — we can enable the next generation of lightweight, low-power, and high resolution MR/VR HMDs. This topic has been an active area of research for many years and it seems reasonable to expect the appearance of VR and MR headsets with foveated pipelines in the coming years.

Acknowledgements
We would like to recognize the work done by the following collaborators:
  • Haomiao Jiang and Carlin Vieri on display compression and foveated transmission
  • Brian Funt and Sylvain Vignaud on the development of new foveated rendering algorithms

ARCore: Augmented reality at Android scale

Posted by Dave Burke, VP, Android Engineering

With more than two billion active devices, Android is the largest mobile platform in the world. And for the past nine years, we've worked to create a rich set of tools, frameworks and APIs that deliver developers' creations to people everywhere. Today, we're releasing a preview of a new software development kit (SDK) called ARCore. It brings augmented reality capabilities to existing and future Android phones. Developers can start experimenting with it right now.

We've been developing the fundamental technologies that power mobile AR over the last three years with Tango, and ARCore is built on that work. But, it works without any additional hardware, which means it can scale across the Android ecosystem. ARCore will run on millions of devices, starting today with the Pixel and Samsung's S8, running 7.0 Nougat and above. We're targeting 100 million devices at the end of the preview. We're working with manufacturers like Samsung, Huawei, LG, ASUS and others to make this possible with a consistent bar for quality and high performance.

ARCore works with Java/OpenGL, Unity and Unreal and focuses on three things:

  • Motion tracking: Using the phone's camera to observe feature points in the room and IMU sensor data, ARCore determines both the position and orientation (pose) of the phone as it moves. Virtual objects remain accurately placed.
  • Environmental understanding: It's common for AR objects to be placed on a floor or a table. ARCore can detect horizontal surfaces using the same feature points it uses for motion tracking.
  • Light estimation: ARCore observes the ambient light in the environment and makes it possible for developers to light virtual objects in ways that match their surroundings, making their appearance even more realistic.

Alongside ARCore, we've been investing in apps and services which will further support developers in creating great AR experiences. We built Blocks and Tilt Brush to make it easy for anyone to quickly create great 3D content for use in AR apps. As we mentioned at I/O, we're also working on Visual Positioning Service (VPS), a service which will enable world scale AR experiences well beyond a tabletop. And we think the Web will be a critical component of the future of AR, so we're also releasing prototype browsers for web developers so they can start experimenting with AR, too. These custom browsers allow developers to create AR-enhanced websites and run them on both Android/ARCore and iOS/ARKit.

ARCore is our next step in bringing AR to everyone, and we'll have more to share later this year. Let us know what you think through GitHub, and check out our new AR Experiments showcase where you can find some fun examples of what's possible. Show us what you build on social media with #ARCore; we'll be resharing some of our favorites.

Expressions in Virtual Reality



Recently Google Machine Perception researchers, in collaboration with Daydream Labs and YouTube Spaces, presented a solution for virtual headset ‘removal’ for mixed reality in order to create a more rich and engaging VR experience. While that work could infer eye-gaze directions and blinks, enabled by a headset modified with eye-tracking technology, a richer set of facial expressions — which are key to understanding a person's experience in VR, as well as conveying important social engagement cues — were missing.

Today we present an approach to infer select facial action units and expressions entirely by analyzing a small part of the face while the user is engaged in a virtual reality experience. Specifically, we show that images of the user’s eyes captured from an infrared (IR) gaze-tracking camera within a VR headset are sufficient to infer at least a subset of facial expressions without the use of any external cameras or additional sensors.
Left: A user wearing a VR HMD modified with eye-tracking used for expression classification (Note that no external camera is used in our method; this is just for visualization). Right: inferred expression from eye images using our model. A video demonstrating the work can be seen here.
We use supervised deep learning to classify facial expressions from images of the eyes and surrounding areas, which typically contain the iris, sclera, eyelids and may include parts of the eyebrows and top of cheeks. Obtaining large scale annotated data from such novel sensors is a challenging task, hence we collected training data by capturing 46 subjects while performing a set of facial expressions.

To perform expression classification, we fine-tuned a variant of the widespread Inception architecture with TensorFlow using weights from a model trained to convergence on Imagenet. We attempted to partially remove variance due to differences in participant appearance (i.e., individual differences that do not depend on expression), inspired by the standard practice of mean image subtraction. Since this variance removal occurs within-subject, it is effectively personalization. Further details, along with examples of eye-images, and results are presented in our accompanying paper.

Results and Extensions
We demonstrate that the information required to classify a variety of facial expressions is reliably present in IR eye images captured by a commercial HMD sensor, and that this information can be decoded using a CNN-based method, even though classifying facial expressions from eye-images alone is non-trivial even for humans. Our model inference can be performed in real-time, and we show this can be used to generate expressive avatars in real-time, which can function as an expressive surrogate for users engaged in VR. This interaction mechanism also yields a more intuitive interface for sharing expression in VR as opposed to gestures or keyboard inputs.

The ability to capture a user’s facial expressions using existing eye-tracking cameras enables a fully mobile solution to facial performance capture in VR, without additional external cameras. This technology extends beyond animating cartoon avatars; it could be used to provide a richer headset removal experience, enhancing communication and social interaction in VR by transmitting far more authentic and emotionally textured information.

Acknowledgements
The research described in this post was performed by Steven Hickson (as an intern), Nick Dufour, Avneesh Sud, Vivek Kwatra and Irfan Essa. We also thank Hayes Raffle and Alex Wong from Daydream, and Chris Bregler, Sergey Ioffe and authors of TF-Slim from Google Research for their guidance and suggestions.

This technology, along with headset removal, will be demonstrated at Siggraph 2017 Emerging Technologies.

Experimenting with VR Ad formats at Area 120

Posted by Aayush Upadhyay and Neel Rao, Area 120

At Area 120, Google's internal workshop for experimental ideas, we're working on early-stage projects and quickly iterate to test concepts. We heard from developers that they're looking at how to make money to fund their VR applications, so we started experimenting with what a native, mobile VR ad format might look like.

Developers and users have told us they want to avoid disruptive, hard-to-implement ad experiences in VR. So our first idea for a potential format presents a cube to users, with the option to engage with it and then see a video ad. By tapping on the cube or gazing at it for a few seconds, the cube opens a video player where the user can watch, and then easily close, the video. Here's how it works:

Our work focuses on a few key principles - VR ad formats should be easy for developers to implement, native to VR, flexible enough to customize, and useful and non-intrusive for users. Our Area 120 team has seen some encouraging results with a few test partners, and would love to work with the developer community as this work evolves - across Cardboard (on Android and iOS), Daydream and Samsung Gear VR.

If you're a VR developer (or want to be one) and are interested in testing this format with us, please fill out this form to apply for our early access program. We have an early-stage SDK available and you can get up and running easily. We're excited to continue experimenting with this format and hope you'll join us for the ride!

Headset “Removal” for Virtual and Mixed Reality



Virtual Reality (VR) enables remarkably immersive experiences, offering new ways to view the world and the ability to explore novel environments, both real and imaginary. However, compared to physical reality, sharing these experiences with others can be difficult, as VR headsets make it challenging to create a complete picture of the people participating in the experience.

Some of this disconnect is alleviated by Mixed Reality (MR), a related medium that shares the virtual context of a VR user in a two dimensional video format allowing other viewers to get a feel for the user’s virtual experience. Even though MR facilitates sharing, the headset continues to block facial expressions and eye gaze, presenting a significant hurdle to a fully engaging experience and complete view of the person in VR.

Google Machine Perception researchers, in collaboration with Daydream Labs and YouTube Spaces, have been working on solutions to address this problem wherein we reveal the user’s face by virtually “removing” the headset and create a realistic see-through effect.
VR user captured in front of a green-screen is blended with the virtual environment to generate the MR output: Traditional MR output has the user face occluded, while our result reveals the face. Note how the headset is modified with a marker to aid tracking.
Our approach uses a combination of 3D vision, machine learning and graphics techniques, and is best explained in the context of enhancing Mixed Reality video (also discussed in the Google-VR blog). It consists of three main components:

Dynamic face model capture
The core idea behind our technique is to use a 3D model of the user’s face as a proxy for the hidden face. This proxy is used to synthesize the face in the MR video, thereby creating an impression of the headset being removed. First, we capture a personalized 3D face model for the user with what we call gaze-dependent dynamic appearance. This initial calibration step requires the user to sit in front of a color+depth camera and a monitor, and then track a marker on the monitor with their eyes. We use this one-time calibration procedure — which typically takes less than a minute — to acquire a 3D face model of the user, and learn a database that maps appearance images (or textures) to different eye-gaze directions and blinks. This gaze database (i.e. the face model with textures indexed by eye-gaze) allows us to dynamically change the appearance of the face during synthesis and generate any desired eye-gaze, thus making the synthesized face look natural and alive
On the left, the user’s face is captured by a camera as she tracks a marker on the monitor with her eyes. On the right, we show the dynamic nature of reconstructed 3D face model: by moving or clicking on the mouse, we are able to simulate both apparent eye gaze and blinking.
Calibration and Alignment
Creating a Mixed Reality video requires a specialized setup consisting of an external camera, calibrated and time-synced with the headset. The camera captures a video stream of the VR user in front of a green screen and then composites a cutout of the user with the virtual world to create the final MR video. An important step here is to accurately estimate the calibration (the fixed 3D transformation) between the camera and headset coordinate systems. These calibration techniques typically involve significant manual intervention and are done in multiple steps. We simplify the process by adding a physical marker to the front of the headset and tracking it visually in 3D, which allows us to optimize for the calibration parameters automatically from the VR session.

For headset “removal”, we need to align the 3D face model with the visible portion of the face in the camera stream, so that they would blend seamlessly with each other. A reasonable proxy to this alignment is to place the face model just behind the headset. The calibration described above, coupled with VR headset tracking, provides sufficient information to determine this placement, allowing us to modify the camera stream by rendering the virtual face into it.

Compositing and Rendering
Having tackled the alignment, the last step involves producing a suitable rendering of the 3D face model, consistent with the content in the camera stream. We are able to reproduce the true eye-gaze of the user by combining our dynamic gaze database with an HTC Vive headset that has been modified by SMI to incorporate eye-tracking technology. Images from these eye trackers lack sufficient detail to directly reproduce the occluded face region, but are well suited to provide fine-grained gaze information. Using the live gaze data from the tracker, we synthesize a face proxy that accurately represents the user’s attention and blinks. At run-time, the gaze database, captured in the preprocessing step, is searched for the most appropriate face image corresponding to the query gaze state, while also respecting aesthetic considerations such as temporal smoothness. Additionally, to account for lighting changes between gaze database acquisition and run-time, we apply color correction and feathering, such that the synthesized face region matches with the rest of the face.

Humans are highly sensitive to artifacts on faces, and even small imperfections in synthesis of the occluded face can feel unnatural and distracting, a phenomenon known as the “uncanny valley.” To mitigate this problem, we do not remove the headset completely, instead we have chosen a user experience that conveys a ‘scuba mask effect’ by compositing the color corrected face proxy with a translucent headset. Reminding the viewer of the presence of the headset helps us avoid the uncanny valley, and also makes our algorithm robust to small errors in alignment and color correction.

This modified camera stream, displaying a see-through headset, with the user’s face revealed and their true eye-gaze recreated, is subsequently merged with the virtual environment to create the final MR video.

Results and Extensions
We have used our headset removal technology to enhance Mixed Reality, allowing the medium to not only convey a VR user’s interaction with the virtual environment but also show their face in a natural and convincing fashion. The example below demonstrates our tech applied to an artist using Google Tilt Brush in a virtual environment:
An artist creates 3D art using Google Tilt Brush, shown in Mixed Reality. On the top is the traditional MR result where the face is hidden behind the headset. On the bottom is our result, which reveals the entire face and eyes for a more natural and engaging experience.
While we have shown the potential of our technology, its applications extend beyond Mixed Reality. Headset removal is poised to enhance communication and social interaction in VR itself with diverse applications like VR video conference meetings, multiplayer VR gaming, and exploration with friends and family. Going from an utterly blank headset to being able to see, with photographic realism, the faces of fellow VR users promises to be a significant transition in the VR world, and we are excited to be a part of it.

YouTube VR: A whole new way to watch… and create

We’re inspired every day by the videos people share on YouTube. The idea that anybody, from any part of the globe, can create something that all of us can watch, laugh at, cry to, or relate to. Not only has this turned YouTube into a home for just about any idea imaginable, it’s made it the place where new video formats, new ways to connect, and new ways to tell stories are born.

We want to continue to provide you with new ways to engage with the world and with your community, and we believe virtual reality will play an important role in the future of storytelling. More than just an amazing new technology, VR allows us to make deep, human connections with people, places and stories. That’s why we’re committed to giving creators the space and resources they need to learn about, experiment with, and create virtual reality video. In fact, we’ve already started working with some awesome creators, recording artists, and partners who are producing VR videos across a wide variety of genres and interest areas on YouTube.

Want to spend some time with beauty vlogger Meredith Foster? Check out an immersive tour of her apartment. More of a foodie? Tastemade’s VR cooking videos breathe new life into learning new recipes. Rooster Teeth reimagines their gaming comedy “Red vs. Blue” for some fresh laughs. You can even watch breaking news in VR from HuffPost RYOT.

We’ve also been working to allow you to have experiences or visit places you might not be able to (or might not dare to!) in real life. Go swimming with sharks thanks to Curiscope, get a first-hand look at a living, breathing dinosaur at the Natural History Museum in London, travel to Belize with StyleHaul, hike a trail a thousand miles away with Daniel and Kelli at Fitness Blender, or watch Tritonal in concert no matter where you are.


And today you can experience all of this amazing content in a more immersive way with the brand new YouTube VR app, available first on Daydream. This new standalone app was built from the ground up and optimized for VR. You just need a Daydream-ready phone like Pixel and the new Daydream View headset and controller to get started. Every single video on the platform becomes an immersive VR experience, from 360-degree videos that let you step inside the content to standard videos shown on a virtual movie screen in the new theater mode. The app even includes some familiar features like voice search and a signed in experience so you can follow the channels you subscribe to, check out your playlists and more.


YTVR-Watch-Spherical-Player.jpg

We can’t tell you how inspired we all are after watching the amazing VR videos made by creators on YouTube. We can't wait to see what you dream up next!

Erin Teague, Product Manager, and Jamie Byrne, Director of YouTube Creators, recently watched "Tour of Man At Arms: Reforged Shop in 360° - The Forging Room!"

Source: YouTube Blog