
Real-Time AR Self-Expression with Machine Learning



Augmented reality (AR) helps you do more with what you see by overlaying digital content and information on top of the physical world. For example, AR features coming to Google Maps will let you find your way with directions overlaid on top of your real world. With Playground, a creative mode in the Pixel camera, you can use AR to see the world differently. And with the latest release of YouTube Stories and ARCore's new Augmented Faces API, you can add objects like animated masks, glasses, 3D hats and more to your own selfies!

One of the key challenges in making these AR features possible is proper anchoring of the virtual content to the real world, a process that requires a unique set of perceptive technologies able to track the highly dynamic surface geometry across every smile, frown or smirk.
Our 3D mesh and some of the effects it enables
To make all this possible, we employ machine learning (ML) to infer approximate 3D surface geometry and enable visual effects, requiring only a single camera input without the need for a dedicated depth sensor. This approach makes these AR effects available at real-time speeds, using TensorFlow Lite for mobile CPU inference or its new mobile GPU functionality where available. This is the same technology that powers YouTube Stories' new creator effects, and it is also available to the broader developer community via the latest ARCore SDK release and the ML Kit Face Contour Detection API.

An ML Pipeline for Selfie AR
Our ML pipeline consists of two real-time deep neural network models that work together: A detector that operates on the full image and computes face locations, and a generic 3D mesh model that operates on those locations and predicts the approximate surface geometry via regression. Having the face accurately cropped drastically reduces the need for common data augmentations like affine transformations consisting of rotations, translation and scale changes. Instead it allows the network to dedicate most of its capacity towards coordinate prediction accuracy, which is critical to achieve proper anchoring of the virtual content.
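As a rough sketch of this two-model structure, the snippet below runs a face detector and a mesh regressor back to back with the TensorFlow Lite Java API. The model buffers, tensor shapes and output layouts here are illustrative assumptions, not the production models.

import org.tensorflow.lite.Interpreter;
import java.nio.MappedByteBuffer;

class FaceMeshPipeline {
  private final Interpreter detector;  // full-image face detector
  private final Interpreter mesh;      // 3D mesh regressor on the cropped face

  FaceMeshPipeline(MappedByteBuffer detectorModel, MappedByteBuffer meshModel) {
    detector = new Interpreter(detectorModel);
    mesh = new Interpreter(meshModel);
  }

  // Stage 1: locate the face in the full frame.
  // Assumed output layout: [xmin, ymin, xmax, ymax, score].
  float[] detectFace(float[][][][] frame) {
    float[][] box = new float[1][5];
    detector.run(frame, box);
    return box[0];
  }

  // Stage 2: regress 3D vertex coordinates from the frame cropped to the detected box.
  // Assumed output layout: 468 vertices, each (x, y, z).
  float[][] predictMesh(float[][][][] faceCrop) {
    float[][][] vertices = new float[1][468][3];
    mesh.run(faceCrop, vertices);
    return vertices[0];
  }
}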

Once the location of interest is cropped, the mesh network is only applied to a single frame at a time, using windowed smoothing to reduce noise when the face is static while avoiding lag during significant movement.
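One simple way to get this behaviour is a motion-adaptive blend: lean heavily on the previous result when the mesh barely moves, and trust the fresh prediction during fast motion. The sketch below is a minimal illustration of that idea (an exponential blend rather than the exact windowed scheme used in the pipeline), with a hypothetical tuning constant.

class MeshSmoother {
  private float[] previous;          // smoothed vertex coordinates from the last frame
  private final float motionScale;   // hypothetical tuning constant

  MeshSmoother(float motionScale) {
    this.motionScale = motionScale;
  }

  float[] smooth(float[] current) {
    if (previous == null) {
      previous = current.clone();
      return previous;
    }
    // Estimate how much the mesh moved since the last frame.
    float motion = 0f;
    for (int i = 0; i < current.length; i++) {
      motion += Math.abs(current[i] - previous[i]);
    }
    motion /= current.length;
    // Small motion -> alpha near 0 (heavy smoothing); large motion -> alpha near 1.
    float alpha = motion / (motion + motionScale);
    for (int i = 0; i < current.length; i++) {
      previous[i] = alpha * current[i] + (1 - alpha) * previous[i];
    }
    return previous;
  }
}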
Our 3D mesh in action
For our 3D mesh we employed transfer learning and trained a network with several objectives: the network simultaneously predicts 3D mesh coordinates on synthetic, rendered data and 2D semantic contours on annotated, real-world data similar to those ML Kit provides. The resulting network provided us with reasonable 3D mesh predictions not just on synthetic but also on real-world data. All models are trained on data sourced from a geographically diverse dataset and subsequently tested on a balanced, diverse test set for qualitative and quantitative performance.

The 3D mesh network receives as input a cropped video frame. It doesn't rely on additional depth input, so it can also be applied to pre-recorded videos. The model outputs the positions of the 3D points, as well as the probability of a face being present and reasonably aligned in the input. A common alternative approach is to predict a 2D heatmap for each landmark, but it is not amenable to depth prediction and has high computational costs for so many points.

We further improve the accuracy and robustness of our model by iteratively bootstrapping and refining predictions. That way we can grow our dataset to increasingly challenging cases, such as grimaces, oblique angles and occlusions. Dataset augmentation techniques also expanded the available ground truth data, improving model resilience to artifacts like camera imperfections or extreme lighting conditions.
Dataset expansion and improvement pipeline
Hardware-tailored Inference
We use TensorFlow Lite for on-device neural network inference. The newly introduced GPU back-end acceleration boosts performance where available and significantly lowers power consumption. Furthermore, to cover a wide range of consumer hardware, we designed a variety of model architectures with different performance and efficiency characteristics. The most important differences between the lighter and heavier networks are the residual block layout and the accepted input resolution (128x128 pixels in the lightest model vs. 256x256 in the most complex). We also vary the number of layers and the subsampling rate (how fast the input resolution decreases with network depth).
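A minimal sketch of how such a selection could be wired up with the TensorFlow Lite Java API is shown below: enable the GPU delegate where it is supported, and otherwise fall back to the lighter, CPU-oriented architecture. The model buffers and thread count are illustrative, not the production configuration.

import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.gpu.GpuDelegate;
import java.nio.MappedByteBuffer;

Interpreter buildMeshInterpreter(MappedByteBuffer fullModel,  // 256x256-input variant
                                 MappedByteBuffer liteModel,  // 128x128-input variant
                                 boolean gpuSupported) {
  Interpreter.Options options = new Interpreter.Options();
  if (gpuSupported) {
    // GPU back-end: faster inference and lower power consumption where available.
    options.addDelegate(new GpuDelegate());
    return new Interpreter(fullModel, options);
  }
  // On CPU-only devices, use the lighter architecture with multithreaded inference.
  options.setNumThreads(4);
  return new Interpreter(liteModel, options);
}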
Inference time per frame: CPU vs. GPU
The result of these optimizations is a substantial speedup from using lighter models, with minimal degradation in AR effect quality.
Comparison of the most complex (left) and the lightest models (right). Temporal consistency as well as lip and eye tracking is slightly degraded on light models.
The end result of these efforts empowers a user experience with convincing, realistic selfie AR effects in YouTube, ARCore, and other clients by:
  • Simulating light reflections via environmental mapping for realistic rendering of glasses
  • Providing natural lighting by casting virtual object shadows onto the face mesh
  • Modelling face occlusions to hide virtual object parts behind a face, e.g. virtual glasses, as shown below.
YouTube Stories includes Creator Effects like realistic virtual glasses, based on our 3D mesh
In addition, we achieve highly realistic makeup effects by:
  • Modelling specular reflections applied on the lips, and
  • Applying face paint using a luminance-aware material
Case study comparing real make-up against our AR make-up on 5 subjects under different lighting conditions.
We are excited to share this new technology with creators, users and developers alike, who can put it to use immediately by downloading the latest ARCore SDK. In the future we plan to bring this technology to more Google products.

Acknowledgements
We would like to thank Yury Kartynnik, Valentin Bazarevsky, Andrey Vakunov, Siargey Pisarchyk, Andrei Tkachenka, and Matthias Grundmann for collaboration on developing the current mesh technology; Nick Dufour, Avneesh Sud and Chris Bregler for an earlier version of the technology based on parametric models; Kanstantsin Sokal, Matsvei Zhdanovich, Gregory Karpiak, Alexander Kanaukou, Suril Shah, Buck Bourdon, Camillo Lugaresi, Siarhei Kazakou and Igor Kibalchich for building the ML pipeline to drive impressive effects; Aleksandra Volf and the annotation team for their diligence and dedication to perfection; Andrei Kulik, Juhyun Lee, Raman Sarokin, Ekaterina Ignasheva, Nikolay Chirkov, and Yury Pisarchyk for careful benchmarking and insights on mobile GPU-centric network architecture optimizations.

Source: Google AI Blog


New UI tools and a richer creative canvas come to ARCore

Posted by Evan Hardesty Parker, Software Engineer

ARCore and Sceneform give developers simple yet powerful tools for creating augmented reality (AR) experiences. In our last update (version 1.6) we focused on making virtual objects appear more realistic within a scene. In version 1.7, we're focusing on creative elements like AR selfies and animation as well as helping you improve the core user experience in your apps.

Creating AR Selfies

Example of 3D face mesh application

ARCore's new Augmented Faces API (available on the front-facing camera) offers a high quality, 468-point 3D mesh that lets users attach fun effects to their faces. From animated masks, glasses, and virtual hats to skin retouching, the mesh provides coordinates and region specific anchors that make it possible to add these delightful effects.

You can get started in Unity or Sceneform by creating an ARCore session with the "front-facing camera" and Augmented Faces "mesh" mode enabled. Note that other AR features such as plane detection aren't currently available when using the front-facing camera. AugmentedFace extends Trackable, so faces are detected and updated just like planes, Augmented Images, and other trackables.

// Create an ARCore session that supports Augmented Faces for use in Sceneform.
public Session createAugmentedFacesSession(Activity activity) throws UnavailableException {
  // Use the front-facing (selfie) camera.
  Session session = new Session(activity, EnumSet.of(Session.Feature.FRONT_CAMERA));
  // Enable Augmented Faces.
  Config config = session.getConfig();
  config.setAugmentedFaceMode(Config.AugmentedFaceMode.MESH3D);
  session.configure(config);
  return session;
}
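
Once the session is configured, detected faces can be read on each frame like any other trackable. The snippet below is a minimal illustration of pulling the mesh vertices and a region-specific anchor pose:

void onUpdateFaces(Session session) {
  for (AugmentedFace face : session.getAllTrackables(AugmentedFace.class)) {
    if (face.getTrackingState() != TrackingState.TRACKING) {
      continue;
    }
    // 468 mesh vertices (x, y, z) in the face's local coordinate space.
    FloatBuffer vertices = face.getMeshVertices();
    // Region-specific anchor poses, e.g. for attaching a virtual hat near the nose.
    Pose noseTip = face.getRegionPose(AugmentedFace.RegionType.NOSE_TIP);
  }
}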

Animating characters in your Sceneform AR apps

Another way version 1.7 expands the AR creative canvas is by letting your objects dance, jump, spin and move around with support for animations in Sceneform. To start an animation, initialize a ModelAnimator (an extension of the existing Android animation support) with animation data from your ModelRenderable.

void startDancing(ModelRenderable andyRenderable) {
  AnimationData data = andyRenderable.getAnimationData("andy_dancing");
  // animator is a ModelAnimator field kept on the enclosing class.
  animator = new ModelAnimator(data, andyRenderable);
  animator.start();
}

Solving common AR UX challenges in Unity with new UI components

In ARCore version 1.7 we also focused on helping you improve your user experience with a simplified workflow. We've integrated "ARCore Elements" -- a set of common AR UI components that have been validated with user testing -- into the ARCore SDK for Unity. You can use ARCore Elements to insert AR interactive patterns in your apps without having to reinvent the wheel. ARCore Elements also makes it easier to follow Google's recommended AR UX guidelines.

ARCore Elements includes two AR UI components that are especially useful:

  • Plane Finding - streamlining the key steps involved in detecting a surface
  • Object Manipulation - using intuitive gestures to rotate, elevate, move, and resize virtual objects

We plan to add more to ARCore Elements over time. You can download the ARCore Elements app available in the Google Play Store to learn more.

Improving the User Experience with Shared Camera Access

ARCore version 1.7 also includes UX enhancements for the smartphone camera -- specifically, the experience of switching in and out of AR mode. Shared Camera access in the ARCore SDK for Java lets users pause an AR experience, access the camera, and jump back in. This can be particularly helpful if users want to take a picture of the action in your app.
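As a rough sketch (using the ARCore Java API names as we understand them; the camera2 capture wiring is omitted), opting in looks something like this:

public Session createSharedCameraSession(Activity activity) throws UnavailableException {
  // Request shared access so ARCore and your camera2 code use the same camera.
  Session session = new Session(activity, EnumSet.of(Session.Feature.SHARED_CAMERA));
  SharedCamera sharedCamera = session.getSharedCamera();
  // Open this camera id through camera2 to keep capturing while the AR session is paused.
  String cameraId = session.getCameraConfig().getCameraId();
  return session;
}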

More details are available in the Shared Camera developer documentation and Java sample.

Learn more and get started

For AR experiences to capture users' imaginations they need to be both immersive and easily accessible. With tools for adding AR selfies, animation, and UI enhancements, ARCore version 1.7 can help with both these objectives.

You can learn more about these new updates on our ARCore developer website.

Using Global Localization to Improve Navigation



One of the consistent challenges when navigating with Google Maps is figuring out the right direction to go: sure, the app tells you to go north - but many times you're left wondering, "Where exactly am I, and which way is north?" Over the years, we've attempted to improve the accuracy of the blue dot with tools like GPS and compass, but found that both have physical limitations that make solving this challenge difficult, especially in urban environments.

We're experimenting with a way to solve this problem using a technique we call global localization, which combines Visual Positioning Service (VPS), Street View, and machine learning to more accurately identify position and orientation. Using the smartphone camera as a sensor, this technology enables a more powerful and intuitive way to help people quickly determine which way to go.
Due to limitations with accuracy and orientation, guidance via GPS alone is limited in urban environments. Using VPS, Street View and machine learning, Global Localization can provide better context on where you are relative to where you're going.
In this post, we'll discuss some of the limitations of navigation in urban environments and how global localization can help overcome them.

Where GPS Falls Short
The process of identifying the position and orientation of a device relative to some reference point is referred to as localization. Various techniques approach localization in different ways. GPS relies on measuring the delay of radio signals from multiple dedicated satellites to determine a precise location. However, in dense urban environments like New York or San Francisco, it can be incredibly hard to pinpoint a geographic location due to low visibility to the sky and signals reflecting off of buildings. This can result in highly inaccurate placements on the map, meaning that your location could appear on the wrong side of the street, or even a few blocks away.
GPS signals bouncing off facades in an urban environment.
GPS has another technical shortcoming: it can only determine the location of the device, not the orientation. Sometimes, sensors in your mobile device can remedy the situation by measuring the magnetic and gravity fields of the earth and the relative motion of the device in order to give rough estimates of your orientation. But these sensors are easily skewed by magnetic objects such as cars, pipes, buildings, and even electrical wires inside the phone, resulting in orientation errors of up to 180 degrees.

A New Approach to Localization
To improve the precision of the blue dot's position and orientation on the map, a new complementary technology is necessary. When walking down the street, you orient yourself by comparing what you see with what you expect to see. Global localization uses a combination of techniques that enable the camera on your mobile device to orient itself much as you would.

VPS determines the location of a device based on imagery rather than GPS signals. VPS first creates a map by taking a series of images which have a known location and analyzing them for key visual features, such as the outline of buildings or bridges, to create a large-scale, quickly searchable index of those visual features. To localize the device, VPS compares the features in imagery from the phone to those in the VPS index. However, the accuracy of localization through VPS is greatly affected by the quality of both the imagery and the location associated with it. And that poses another question: where does one find an extensive source of high-quality global imagery?
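As a toy illustration of the matching step (not the production implementation), each query descriptor from the phone image can be matched to its nearest neighbor in the pre-built index by squared L2 distance:

// Returns the index of the closest reference descriptor, or -1 if the index is empty.
int matchDescriptor(float[] query, float[][] index) {
  int best = -1;
  float bestDistance = Float.MAX_VALUE;
  for (int i = 0; i < index.length; i++) {
    float distance = 0f;
    for (int d = 0; d < query.length; d++) {
      float diff = query[d] - index[i][d];
      distance += diff * diff;
    }
    if (distance < bestDistance) {
      bestDistance = distance;
      best = i;
    }
  }
  return best;
}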

Enter Street View
Over 10 years ago we launched Street View in Google Maps in order to help people explore the world more deeply. In that time, Street View has continued to expand its coverage of the world, empowering people to not only preview their route, but also step inside famous landmarks and museums, no matter where they are. To deliver global localization with VPS, we connected it with Street View data, making use of information gathered and tested from over 93 countries across the globe. This rich dataset provides trillions of strong reference points to apply triangulation, helping more accurately determine the position of a device and guide people towards their destination.
Features matched from multiple images.
Although this approach works well in theory, making it work well in practice is a challenge. The problem is that the imagery from the phone at the time of localization may differ from what the scene looked like when the Street View imagery was collected, perhaps months earlier. For example, trees have lots of rich detail, but change as the seasons change and even as the wind blows. To get a good match, we need to filter out temporary parts of the scene and focus on permanent structure that doesn't change over time. That's why a core ingredient in this new approach is applying machine learning to automatically decide which features to pay attention to, prioritizing features that are likely to be permanent parts of the scene and ignoring things like trees, dynamic light movement, and construction that are likely transient. This is just one of the many ways in which we use machine learning to improve accuracy.

Combining Global Localization with Augmented Reality
Global localization is an additional option that users can enable when they most need accuracy. And, this increased precision has enabled the possibility of a number of new experiences. One of the newest features we're testing is the ability to use ARCore, Google's platform for building augmented reality experiences, to overlay directions right on top of Google Maps when someone is in walking navigation mode. With this feature, a quick glance at your phone shows you exactly which direction you need to go.
Although early results are promising, there's significant work to be done. One outstanding challenge is making this technology work everywhere, in all types of conditions: think late at night, in a snowstorm, or in a torrential downpour. To make sure we're building something that's truly useful, we're starting to test this feature with select Local Guides, a small group of Google Maps enthusiasts around the world who we know will offer us feedback about how this approach can be most helpful.

Like other AI-driven camera experiences such as Google Lens (which uses the camera to let you search what you see), we believe the ability to overlay directions over the real world environment offers an exciting and useful way to use the technology that already exists in your pocket. We look forward to continuing to develop this technology, and the potential for smartphone cameras to add new types of valuable experiences.

Source: Google AI Blog


Creating More Realistic AR experiences with updates to ARCore & Sceneform

Posted by Ashish Shah, Product Manager, Google AR & VR

The magic of augmented reality is in the way it blends the digital and the physical worlds. For AR experiences to feel truly immersive, digital objects need to look realistic -- as if they were actually there with you, in your space. This is something we continue to prioritize as we update ARCore and Sceneform, our 3D rendering library for Java developers.

Today, with the release of ARCore 1.6, we're bringing further improvements to help you build more realistic and compelling experiences, including better plane boundary tracking and several lighting improvements in Sceneform.

With 250M devices now supporting ARCore, developers can bring these experiences to an even larger and growing user base.

More Realistic Lighting in Sceneform

Previous versions of Sceneform defaulted to optimizing ambient light as yellow. Version 1.6 defaults to neutral and white. This aligns more closely to the way light appears in the real world, making digital objects look more natural. You can see the differences below.

Left side image: Sceneform 1.5. Right side image: Sceneform 1.6.

This change will also make objects rendered with Sceneform look as if they're affected more naturally by color and lighting in the surrounding environment. For example, if you're viewing an AR object at sunset, it would appear to be illuminated by the red and orange hues, just like real objects in the scene.

In addition, we've updated Sceneform's built-in environmental image to provide a more neutral scene for your app. This will be most noticeable when viewing reflections in smooth metallic surfaces.

Adding screen capture and recording to the mix

To help you further improve quality and engagement in your AR apps, we're adding screen capture and recording to Sceneform. This is something a number of developers have requested to help with demo recording and prototyping. It can also be used as an external facing feature, allowing your users to share screenshots and videos on social media more easily, which can help get the word out about your app.

You can access this functionality through the surface mirroring API for the SceneView class. The API allows you to display the Sceneform view on a device's screen at the same time it's being rendered to another surface (such as the input surface for the Android MediaRecorder).
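The sketch below shows one way this could be wired up to record video; the mirroring call reflects the SceneView API as we understand it, and the MediaRecorder settings are illustrative rather than a definitive recipe.

void startRecording(SceneView sceneView, String outputPath, int width, int height)
    throws IOException {
  MediaRecorder recorder = new MediaRecorder();
  recorder.setVideoSource(MediaRecorder.VideoSource.SURFACE);
  recorder.setOutputFormat(MediaRecorder.OutputFormat.MPEG_4);
  recorder.setOutputFile(outputPath);
  recorder.setVideoEncoder(MediaRecorder.VideoEncoder.H264);
  recorder.setVideoSize(width, height);
  recorder.prepare();
  recorder.start();
  // Mirror the Sceneform view onto the recorder's input surface while it continues
  // to render on screen.
  sceneView.startMirroringToSurface(recorder.getSurface(), 0, 0, width, height);
}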

Learn more and get started

The new updates to Sceneform and ARCore are available today. With these new versions also comes support for new devices, such as the Samsung Galaxy A3 and the Huawei P20 Lite, which will join the list of ARCore-enabled devices. More information is available on the ARCore developer website.

The Instant Motion Tracking Behind Motion Stills AR



Last summer, we launched Motion Stills on Android, which delivered a great video capture and viewing experience on a wide range of Android phones. Then, we refined our Motion Stills technology further to enable the new motion photos feature in Pixel 2.

Today, we are excited to announce the new Augmented Reality (AR) mode in Motion Stills for Android. With the new AR mode, a user simply touches the viewfinder to place fun, virtual 3D objects on static or moving horizontal surfaces (e.g. tables, floors, or hands), allowing them to seamlessly interact with a dynamic real-world environment. You can also record and share the clips as GIFs and videos.
Motion Stills with instant motion tracking in action
AR mode is powered by instant motion tracking, a six-degree-of-freedom tracking system built upon the technology that powers Motion Text in Motion Stills iOS and the privacy blur on YouTube to accurately track static and moving objects. We refined and enhanced this technology to enable fun AR experiences that can run on any Android device with a gyroscope.

When you touch the viewfinder, Motion Stills AR “sticks” a 3D virtual object to that location, making it look as if it’s part of the real-world scene. By assuming that the tracked surface is parallel to the ground plane, and using the device’s accelerometer sensor to provide the initial orientation of the phone with respect to the ground plane, one can track the six degrees of freedom of the camera (3 for translation and 3 for rotation). This allows us to accurately transform and render the virtual object within the scene.
When the phone is approximately steady, the accelerometer sensor provides the acceleration due to the Earth's gravity. For horizontal planes, the gravity vector is parallel to the normal of the tracked plane and can accurately provide the initial orientation of the phone.
Instant Motion Tracking
The core idea behind instant motion tracking is to decouple the camera’s translation and rotation estimation, treating them instead as independent optimization problems. First, we determine the 3D camera translation solely from the visual signal of the camera. To do this, we observe the target region's apparent 2D translation and relative scale across frames. A simple pinhole camera model relates both translation and scale of a box in the image plane with the final 3D translation of the camera.
The translation and the change in size (relative scale) of the box in the image plane can be used to determine the 3D translation between two camera positions C1 and C2. However, as our camera model doesn't assume the focal length of the camera lens, we do not know the true distance/depth of the tracked plane.
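To make the geometry concrete, here is a minimal sketch (in normalized image coordinates, our own illustration) of how the box's 2D translation and relative scale determine the camera translation:

// Recover the translation of the tracked target relative to the camera, up to an
// unknown global scale, from the box center and apparent size in two frames.
static float[] relativeTranslation(
    float u1, float v1, float size1,   // box center and size in frame 1
    float u2, float v2, float size2) { // box center and size in frame 2
  // Under a pinhole model, apparent size is inversely proportional to depth, so
  // the depth ratio is the inverse of the relative scale. Absolute depth is
  // unknown, so the depth in frame 1 is fixed to 1.0 (arbitrary units).
  float z1 = 1.0f;
  float z2 = z1 * (size1 / size2);
  // Back-project the box center: X = u * Z, Y = v * Z in normalized coordinates.
  float x1 = u1 * z1, y1 = v1 * z1;
  float x2 = u2 * z2, y2 = v2 * z2;
  return new float[] {x2 - x1, y2 - y1, z2 - z1};
}

Note that the recovered translation is only defined up to the unknown depth of the tracked plane.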
To account for this, we added scale estimation to our existing tracker (the one used in Motion Text) as well as region tracking outside the field of view of the camera. When the camera gets closer to the tracked surface, the virtual content scales accurately, which is consistent with the perception of real-world objects. When you pan outside the field of view of the target region and back, the virtual object will reappear in approximately the same spot.
Independent translation (from visual signal only as shown by red box) and rotation tracking (from gyro; not shown)
After all this, we obtain the device’s 3D rotation (roll, pitch and yaw) using the phone’s built-in gyroscope. The estimated 3D translation combined with the 3D rotation provides us with the ability to render the virtual content correctly in the viewfinder. And because we treat rotation and translation separately, our instant motion tracking approach is calibration free and works on any Android device with a gyroscope.
Augmented chicken family with Motion Stills AR mode
We are excited to bring this new mode to Motion Stills for Android, and we hope you’ll enjoy it. Please download the new release of Motion Stills and keep sending us feedback with #motionstills on your favorite social media.

Acknowledgements
For rendering, we are thankful we were able to leverage Google’s Lullaby engine using animated Poly models. A thank you to our team members who worked on the tech and this launch with us: John Nack, Suril Shah, Igor Kibalchich, Siarhei Kazakou, and Matthias Grundmann.

ARCore: Augmented reality at Android scale

Posted by Dave Burke, VP, Android Engineering

With more than two billion active devices, Android is the largest mobile platform in the world. And for the past nine years, we've worked to create a rich set of tools, frameworks and APIs that deliver developers' creations to people everywhere. Today, we're releasing a preview of a new software development kit (SDK) called ARCore. It brings augmented reality capabilities to existing and future Android phones. Developers can start experimenting with it right now.

We've been developing the fundamental technologies that power mobile AR over the last three years with Tango, and ARCore is built on that work. But, it works without any additional hardware, which means it can scale across the Android ecosystem. ARCore will run on millions of devices, starting today with the Pixel and Samsung's S8, running 7.0 Nougat and above. We're targeting 100 million devices at the end of the preview. We're working with manufacturers like Samsung, Huawei, LG, ASUS and others to make this possible with a consistent bar for quality and high performance.

ARCore works with Java/OpenGL, Unity and Unreal and focuses on three things:

  • Motion tracking: Using the phone's camera to observe feature points in the room, combined with IMU sensor data, ARCore determines both the position and orientation (pose) of the phone as it moves. Virtual objects remain accurately placed.
  • Environmental understanding: It's common for AR objects to be placed on a floor or a table. ARCore can detect horizontal surfaces using the same feature points it uses for motion tracking.
  • Light estimation: ARCore observes the ambient light in the environment and makes it possible for developers to light virtual objects in ways that match their surroundings, making their appearance even more realistic (see the sketch after this list).
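
The minimal per-frame sketch below touches each of these three capabilities; it uses API names from later ARCore Java releases, so treat it as illustrative rather than the preview SDK verbatim.

void onDrawFrame(Session session) throws CameraNotAvailableException {
  Frame frame = session.update();
  // Motion tracking: the camera's pose (position and orientation) in world space.
  Pose cameraPose = frame.getCamera().getPose();
  // Environmental understanding: horizontal planes detected from feature points.
  for (Plane plane : session.getAllTrackables(Plane.class)) {
    if (plane.getTrackingState() == TrackingState.TRACKING) {
      // Anchor virtual objects to tracked planes here.
    }
  }
  // Light estimation: scale virtual object lighting by the estimated ambient intensity.
  float ambientIntensity = frame.getLightEstimate().getPixelIntensity();
}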

Alongside ARCore, we've been investing in apps and services which will further support developers in creating great AR experiences. We built Blocks and Tilt Brush to make it easy for anyone to quickly create great 3D content for use in AR apps. As we mentioned at I/O, we're also working on Visual Positioning Service (VPS), a service which will enable world scale AR experiences well beyond a tabletop. And we think the Web will be a critical component of the future of AR, so we're also releasing prototype browsers for web developers so they can start experimenting with AR, too. These custom browsers allow developers to create AR-enhanced websites and run them on both Android/ARCore and iOS/ARKit.

ARCore is our next step in bringing AR to everyone, and we'll have more to share later this year. Let us know what you think through GitHub, and check out our new AR Experiments showcase where you can find some fun examples of what's possible. Show us what you build on social media with #ARCore; we'll be resharing some of our favorites.