Tag Archives: augmented reality

Blending Realities with the ARCore Depth API

Posted by Shahram Izadi, Director of Research and Engineering

ARCore, our developer platform for building augmented reality (AR) experiences, lets your device display content immersively in the context of the world around you, making that content instantly accessible and useful.
Earlier this year, we introduced Environmental HDR, which brings real world lighting to AR objects and scenes, enhancing immersion with more realistic reflections, shadows, and lighting. Today, we're opening a call for collaborators to try another tool that helps improve immersion with the new Depth API in ARCore, enabling experiences that are vastly more natural, interactive, and helpful.
The ARCore Depth API allows developers to use our depth-from-motion algorithms to create a depth map using a single RGB camera. The depth map is created by taking multiple images from different angles and comparing them as you move your phone to estimate the distance to every pixel.

Example depth map, with red indicating areas that are close by, and blue representing areas that are farther away.

One important application for depth is occlusion: the ability for digital objects to accurately appear in front of or behind real-world objects. Occlusion helps digital objects feel as if they are actually in your space by blending them with the scene. Starting today, we're making occlusion available in Scene Viewer, the developer tool that powers AR in Search, on an initial set of over 200 million ARCore-enabled Android devices.

A virtual cat with occlusion off and with occlusion on.
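
To make the idea concrete, here is an illustrative sketch of the per-pixel test behind occlusion. It is not the Depth API itself (in practice this comparison runs in a fragment shader); it only shows the comparison a depth map enables: the virtual object is drawn only where it is closer to the camera than the real surface.

// Illustrative only: the per-pixel occlusion test enabled by a depth map.
static boolean isVirtualPixelVisible(float virtualDepthMeters, float realDepthMeters) {
  return virtualDepthMeters <= realDepthMeters;
}

static void applyOcclusion(float[] virtualDepth, float[] realDepth, int[] virtualColor) {
  for (int i = 0; i < virtualColor.length; i++) {
    if (!isVirtualPixelVisible(virtualDepth[i], realDepth[i])) {
      virtualColor[i] = 0; // fully transparent: the real-world surface occludes the virtual object
    }
  }
}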

We’ve also been working with Houzz, a company that focuses on home renovation and design, to bring the Depth API to the “View in My Room” experience in their app. “Using the ARCore Depth API, people can see a more realistic preview of the products they’re about to buy, visualizing our 3D models right next to the existing furniture in a room,” says Sally Huang, Visual Technologies Lead at Houzz. “Doing this gives our users much more confidence in their purchasing decisions.”
The Houzz app with occlusion is available today.
In addition to enabling occlusion, having a 3D understanding of the world on your device unlocks a myriad of other possibilities. Our team has been exploring some of these, playing with realistic physics, path planning, surface interaction, and more.

Physics, path planning, and surface interaction examples.

When applications of the Depth API are combined together, you can also create experiences in which objects accurately bounce and splash across surfaces and textures, as well as new interactive game mechanics that enable players to duck and hide behind real-world objects.
A demo experience we created where you have to dodge and throw food at a robot chef.
The Depth API is not dependent on specialized cameras and sensors, and it will only get better as hardware improves. For example, the addition of depth sensors, like time-of-flight (ToF) sensors, to new devices will help create more detailed depth maps to improve existing capabilities like occlusion, and unlock new capabilities such as dynamic occlusion—the ability to occlude behind moving objects.
We’ve only begun to scratch the surface of what’s possible with the Depth API and we want to see how you will innovate with this feature. If you are interested in trying the new Depth API, please fill out our call for collaborators form.

ARCore updates to Augmented Faces and Cloud Anchors enable new shared cross-platform experiences

Posted by Christina Tong, Product Manager, Augmented Reality

Two years ago, we launched ARCore, our developer platform for building augmented reality (AR) experiences. Since then, we’ve seen developers create thousands of AR apps across Android and iOS that transform the way people play, shop, learn and create together. To enable even more shared cross-platform AR experiences, we’re announcing new updates to ARCore’s Augmented Faces and Cloud Anchors APIs.

Augmented Faces on iOS

Earlier this year, we announced our Augmented Faces API, which offers a high-quality, 468-point 3D mesh that lets users attach fun effects to their faces — all without a depth sensor on their smartphone. With the addition of iOS support rolling out today, developers can now create effects for more than a billion users. We’ve also made the creation process easier for both iOS and Android developers with a new face effects template.

Improvements to Cloud Anchors

Last year, we introduced the Cloud Anchors API, which lets developers create shared AR experiences across Android and iOS. Cloud Anchors let devices create a 3D feature map from visual data onto which anchors can be placed. The anchors are hosted in the cloud so multiple people can use them to enable shared real world experiences. Cloud Anchors power a wide variety of cross-platform apps, like Just a Line, PHAROS AR and Spacecraft AR.

In our latest ARCore update, we’ve made some improvements to the Cloud Anchors API that make hosting and resolving anchors more efficient and robust. This is due to improved anchor creation and visual processing in the cloud. Now, when creating an anchor, more angles across larger areas in the scene can be captured for a more robust 3D feature map. Once the map is created, the visual data used to create the map is deleted and only anchor IDs are shared with other devices to be resolved. Moreover, multiple anchors in the scene can now be resolved simultaneously, reducing the time needed to start a shared AR experience.
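
For orientation, the developer-facing flow looks roughly like the sketch below. It uses the synchronous hostCloudAnchor and resolveCloudAnchor calls from the ARCore Java SDK of this era; session setup, error handling, and per-frame state polling are omitted, so treat it as a sketch rather than production code.

import com.google.ar.core.Anchor;
import com.google.ar.core.Config;
import com.google.ar.core.HitResult;
import com.google.ar.core.Session;

// Enable Cloud Anchors on an existing ARCore session.
static void enableCloudAnchors(Session session) {
  Config config = session.getConfig();
  config.setCloudAnchorMode(Config.CloudAnchorMode.ENABLED);
  session.configure(config);
}

// Device A: host a locally created anchor. The returned anchor's cloud state must reach
// SUCCESS (poll getCloudAnchorState() each frame) before its getCloudAnchorId() can be shared.
static Anchor hostAnchor(Session session, HitResult hit) {
  return session.hostCloudAnchor(hit.createAnchor());
}

// Device B: resolve a shared anchor ID into an equivalent anchor in its own session.
static Anchor resolveAnchor(Session session, String cloudAnchorId) {
  return session.resolveCloudAnchor(cloudAnchorId);
}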

These updates to Cloud Anchors are available for developers today.

Persistent Cloud Anchors and Call for Collaborators

As we look to the future, we’re taking steps to expand the scale and timeline of shared AR experiences with persistent Cloud Anchors. We see this as enabling a “save button” for AR, so that digital information overlaid on top of the real world can be experienced at any time.

Imagine working together on a redesign of your home throughout the year, leaving AR notes for your friends around an amusement park, or hiding AR objects at specific places around the world to be discovered by others.

Persistent Cloud Anchors are powering Mark AR, a social app being developed by Sybo and iDreamSky that lets people create, discover, and share their AR art with friends and followers in real-world locations. With persistent Cloud Anchors, users can continually return to their pieces as they create and collaborate over time.


Mark AR is an app that lets people create and discover AR art in real-world locations.

Reliably anchoring AR content for every use case—regardless of surface, distance, and time—pushes the limits of computation and computer vision because the real world is diverse and always changing. By enabling a “save button” for AR, we’re taking an important step toward bridging the digital and physical worlds to expand the ways AR can be useful in our day-to-day lives.

We’re currently looking for more developers to help us explore and test persistent Cloud Anchors in real world apps at scale, before making the feature broadly available. If you’re interested in early access, you can apply here.

Giving Lens New Reading Capabilities in Google Go



Around the world, millions of people are coming online for the first time, and many of them are among the 800 million adults worldwide who are unable to read or write, or those who are migrating to towns and cities where they are not able to speak the predominant language. As a smartphone camera-based tool, Google Lens has great potential for helping people who struggle with reading and other language-based challenges. Lens uses computer vision, machine learning and Google’s Knowledge Graph to let people turn the things they see in the real world into a visual search box, enabling them to identify objects like plants and animals, or to copy and paste text from the real world into their phone.

However, in order for Lens to be able to help the greatest number of people, we needed to create a special version that can work on even the most basic smartphones. So at I/O 2019, we announced a new version of Lens designed specifically for use in Google Go—our Search app for entry level devices—and we included a new set of features designed to help people who face reading and other language-based challenges. When users point their camera at text they don’t understand, Lens in Google Go can translate and read it out loud. It even highlights each word as it’s being read so users can follow along. If you want to try out these features for yourself, they are available today via Lens in Google Go. While Google Go was initially available only on Android Go devices and on the Google Play Store in select markets, recently, we made it available globally in the Google Play Store.
To make these reading features work, the Google Go version of Lens needs to be able to capture high quality images on a wide variety of devices, then identify the text, understand its structure, translate and overlay it in context, and finally, read it out loud.

Image Capture
Image capture on entry-level devices, like those that run Android Go, is tricky since it must work on a wide variety of devices, many of which are more resource constrained than flagship phones. To build a universal tool that can reliably capture high-quality images with minimal lag, we made Lens in Google Go an early adopter of a new Android support library called CameraX. Available in Jetpack—a suite of libraries, tools, and guidance for Android developers—CameraX is an abstraction layer over the Android Camera2 API that resolves device compatibility issues so developers don't have to write their own device-specific code.

Using CameraX, we implemented two capture strategies to balance capture latency against performance impact. On higher-end phones, which are powerful enough to provide a constant stream of high-resolution frames from which to select an image, we’ve made capture instantaneous. On less advanced devices, streaming these frames could cause camera lag since the CPU is less powerful, so we process the frame when the user taps capture to produce a single, on-demand high-resolution image.
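
As a rough illustration of that trade-off, the sketch below chooses between a streaming analysis use case and an on-demand capture use case using today's CameraX API. The device-tier check, the target resolution, and the method of picking strategies are assumptions for illustration; this is not the Lens implementation, which predates the stable CameraX release.

import android.util.Size;
import androidx.camera.core.ImageAnalysis;
import androidx.camera.core.ImageCapture;
import androidx.camera.core.UseCase;

// Hypothetical device-tier switch between the two capture strategies described above.
static UseCase buildCaptureUseCase(boolean isHighEndDevice) {
  if (isHighEndDevice) {
    // Stream high-resolution frames continuously and pick one the instant the user taps.
    return new ImageAnalysis.Builder()
        .setTargetResolution(new Size(1920, 1080)) // illustrative value
        .setBackpressureStrategy(ImageAnalysis.STRATEGY_KEEP_ONLY_LATEST)
        .build();
  }
  // Entry-level devices avoid streaming lag by capturing a single high-quality image on demand.
  return new ImageCapture.Builder()
      .setCaptureMode(ImageCapture.CAPTURE_MODE_MAXIMIZE_QUALITY)
      .build();
}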

Text Recognition
After Lens in Google Go captures an image, it needs to make sense of the shapes and letters that constitute the words, sentences and paragraphs. To do this, the image is scaled down and transferred to the Lens server, where the processing will be performed. Next, optical character recognition (OCR) is applied, which utilizes a region proposal network to detect character level bounding boxes that can be merged into lines for text recognition.
Merging these character boxes into words is a two-step, sequential process. The first step is to apply the Hough Transform, which assumes the text is distributed across parallel lines. The second step uses Text Flow, which instead traces text that may follow a curve by finding the shortest path through a graph of detected text boxes. This ensures that text with a variety of distributions, be they straight, curved or mixed, can be identified and processed.

Because the images captured by Lens in Google Go may include sources such as signage, handwriting or documents, a slew of additional challenges can arise. For example, the text can be obscured, scripts can be uniquely stylized, and images can be blurry. All of these issues can cause the OCR engine to misunderstand various characters within each word. To correct mistakes and improve word accuracy, Lens in Google Go uses the context of surrounding words to make corrections. It also utilizes the Knowledge Graph to provide contextual clues, such as whether a word is likely a proper noun and should not be spell-corrected.

All of these steps, from script detection and direction identification to text recognition, are performed by separable convolutional neural networks (CNNs) with an additional quantized long short-term memory (LSTM) network. And the models are trained on data from a variety of sources, ranging from ReCaptcha to scanned images from Google Books.
Left: Image with bounding box around recognized text. The raw OCR output from this image reads, “Cise is beauti640”. Right: By applying Knowledge Graph in addition to context from nearby words, Lens in Google Go recognizes the words, “life is beautiful”.
Understanding Structure
Once the individual words have been recognized, Lens must determine how to fit them together. The text that people come across in the real world is laid out in many different ways. A newspaper, for example, is laid out into columns, with headlines, article text, and advertisements. Meanwhile, a bus schedule has one column for destinations and another with times. While understanding text structure comes very naturally to people, computers need to be taught how to comprehend it. Lens uses CNNs to detect coherent text blocks, like columns, or text in a consistent style or color. Then, within each block, it uses signals like text alignment, language, and the geometric relationship of the paragraphs to determine their final reading order.

One of the other challenges in detecting document structure is that people take pictures of text from different angles, often with a warped perspective. This means we cannot rely on off-the-shelf detectors that use axis-aligned boxes, but must generalize our systems to handle homographic distortions.
Paragraph segmentation on the front page of a newspaper. Notice how “News Analysis”, which is embedded in the middle of a column, has been identified separately due to its distinct style features.
Translations in Context
To provide users with the most helpful information, translations must be both accurate and contextual. Lens uses Google Translate’s neural machine translation (NMT) algorithms to translate entire sentences at a time, rather than going word by word, in order to preserve proper grammar and diction.

For the translation to be most useful, it needs to be placed in the context of the original text. For example, when translating instructions on an ATM, it is important to know which buttons correspond to which instructions. Part of the challenge is accounting for the fact that the translated text can be much shorter or longer than the original. For example, German sentences tend to be longer than English ones. To accomplish this seamless overlay, Lens redistributes the translation into lines of similar length, and chooses an appropriate font size to match. It also matches the color of the translation and its background with the original text through the use of a heuristic that assumes the background and the text differ in luminosity, and that the background takes up the majority of the space. This allows Lens to classify whether a pixel represents background or text, and then sample the average color from these two regions to ensure the translated text matches the original text.
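
A minimal sketch of that luminosity heuristic, assuming simple RGB luminance weights and ignoring edge cases, might look like the following. It is illustrative only, not Lens source code: pixels are split around the mean luminance, the larger group is treated as background and the smaller as text, and each group's average color is used to restyle the translated overlay.

import android.graphics.Color;

static int[] estimateTextAndBackgroundColors(int[] pixels) {
  double meanLuminance = 0;
  for (int p : pixels) {
    meanLuminance += luminance(p);
  }
  meanLuminance /= pixels.length;

  long[] r = new long[2], g = new long[2], b = new long[2], count = new long[2];
  for (int p : pixels) {
    int group = luminance(p) < meanLuminance ? 0 : 1; // 0 = darker pixels, 1 = brighter pixels
    r[group] += Color.red(p);
    g[group] += Color.green(p);
    b[group] += Color.blue(p);
    count[group]++;
  }
  int bg = count[0] >= count[1] ? 0 : 1; // the majority of pixels is assumed to be background
  int fg = 1 - bg;                       // the minority is assumed to be text
  return new int[] {averageColor(r, g, b, count, fg), averageColor(r, g, b, count, bg)};
}

private static double luminance(int color) {
  return 0.299 * Color.red(color) + 0.587 * Color.green(color) + 0.114 * Color.blue(color);
}

private static int averageColor(long[] r, long[] g, long[] b, long[] count, int group) {
  long n = Math.max(count[group], 1);
  return Color.rgb((int) (r[group] / n), (int) (g[group] / n), (int) (b[group] / n));
}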

Reading the Text Out Loud
The final challenge in delivering information in the most helpful way with Lens in Google Go is reading the text aloud. High-fidelity audio is generated using Google Text-to-Speech (TTS), a service that applies machine learning to disambiguate and detect entities such as dates, phone numbers and addresses, and uses that understanding to generate realistic speech based on DeepMind’s WaveNet.

These reading features become more contextual and useful when they are paired with the display. Lens utilizes timing annotations from the TTS service that mark the beginning of each word in order to highlight each word on screen as it’s being read, similar to a karaoke machine. Say, for example, a user takes a picture of an ATM screen with different labels next to different buttons. This karaoke effect allows users to know which label applies to which button. It may also help users learn how to pronounce the words being translated.
Looking Ahead
Taken together, it is our hope that these features will have a positive impact on the day-to-day lives of millions of people. Moving forward, we will continue to work on further updates to these reading features to make the OCR more precise, including improvements to text structure understanding (e.g. multi-column text) and recognition of Indic scripts. As we address these text challenges, we continue to look for new ways that the combination of machine learning and the smartphone camera can help people as they go about their lives.

Source: Google AI Blog


Real-Time AR Self-Expression with Machine Learning



Augmented reality (AR) helps you do more with what you see by overlaying digital content and information on top of the physical world. For example, AR features coming to Google Maps will let you find your way with directions overlaid on top of your real world. With Playground, a creative mode in the Pixel camera, you can use AR to see the world differently. And with the latest release of YouTube Stories and ARCore's new Augmented Faces API, you can add objects like animated masks, glasses, 3D hats and more to your own selfies!

One of the key challenges in making these AR features possible is proper anchoring of the virtual content to the real world, a process that requires a unique set of perceptive technologies able to track the highly dynamic surface geometry across every smile, frown or smirk.
Our 3D mesh and some of the effects it enables
To make all this possible, we employ machine learning (ML) to infer approximate 3D surface geometry and enable visual effects, requiring only a single camera input without the need for a dedicated depth sensor. This approach makes AR effects possible at real-time speeds, using TensorFlow Lite for mobile CPU inference or its new mobile GPU functionality where available. This technology is the same as what powers YouTube Stories' new creator effects, and it is also available to the broader developer community via the latest ARCore SDK release and the ML Kit Face Contour Detection API.

An ML Pipeline for Selfie AR
Our ML pipeline consists of two real-time deep neural network models that work together: A detector that operates on the full image and computes face locations, and a generic 3D mesh model that operates on those locations and predicts the approximate surface geometry via regression. Having the face accurately cropped drastically reduces the need for common data augmentations like affine transformations consisting of rotations, translation and scale changes. Instead it allows the network to dedicate most of its capacity towards coordinate prediction accuracy, which is critical to achieve proper anchoring of the virtual content.

Once the location of interest is cropped, the mesh network is applied to only a single frame at a time, using windowed smoothing to reduce noise when the face is static while avoiding lag during significant movement.
Our 3D mesh in action
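
One generic way to express that kind of motion-adaptive smoothing is sketched below. The actual filter used in the pipeline is not published, so treat this as an illustration of the idea only, with velocityScale as a hypothetical tuning constant.

// Generic motion-adaptive smoothing for one landmark coordinate: a static face is denoised
// heavily, while fast motion passes through with little lag.
static float smooth(float previous, float current, float velocityScale) {
  float velocity = Math.abs(current - previous);
  float alpha = velocity / (velocity + velocityScale); // low velocity -> strong smoothing
  return previous + alpha * (current - previous);
}
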
For our 3D mesh we employed transfer learning and trained a network with several objectives: the network simultaneously predicts 3D mesh coordinates on synthetic, rendered data and 2D semantic contours on annotated, real-world data similar to those ML Kit provides. The resulting network provided us with reasonable 3D mesh predictions not just on synthetic but also on real-world data. All models are trained on data sourced from a geographically diverse dataset and subsequently tested on a balanced, diverse test set for qualitative and quantitative performance.

The 3D mesh network receives as input a cropped video frame. It doesn't rely on additional depth input, so it can also be applied to pre-recorded videos. The model outputs the positions of the 3D points, as well as the probability of a face being present and reasonably aligned in the input. A common alternative approach is to predict a 2D heatmap for each landmark, but it is not amenable to depth prediction and has high computational costs for so many points.

We further improve the accuracy and robustness of our model by iteratively bootstrapping and refining predictions. That way we can grow our dataset to increasingly challenging cases, such as grimaces, oblique angles and occlusions. Dataset augmentation techniques also expanded the available ground truth data, improving model resilience to artifacts like camera imperfections or extreme lighting conditions.
Dataset expansion and improvement pipeline
Hardware-tailored Inference
We use TensorFlow Lite for on-device neural network inference. The newly introduced GPU back-end acceleration boosts performance where available, and significantly lowers the power consumption. Furthermore, to cover a wide range of consumer hardware, we designed a variety of model architectures with different performance and efficiency characteristics. The most important differences of the lighter networks are the residual block layout and the accepted input resolution (128x128 pixels in the lightest model vs. 256x256 in the most complex). We also vary the number of layers and the subsampling rate (how fast the input resolution decreases with network depth).
Inference time per frame: CPU vs. GPU
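
In developer terms, opting into the GPU backend with TensorFlow Lite's Java API looks roughly like the sketch below. The model file, thread count, and fallback logic are assumptions for illustration rather than details from the pipeline.

import java.io.File;
import org.tensorflow.lite.Interpreter;
import org.tensorflow.lite.gpu.GpuDelegate;

// Build a TensorFlow Lite interpreter that uses the GPU delegate where available and
// falls back to multi-threaded CPU inference otherwise.
static Interpreter createInterpreter(File modelFile, boolean gpuAvailable) {
  Interpreter.Options options = new Interpreter.Options();
  if (gpuAvailable) {
    options.addDelegate(new GpuDelegate()); // offload supported ops to the mobile GPU
  } else {
    options.setNumThreads(4);               // CPU fallback; thread count is illustrative
  }
  return new Interpreter(modelFile, options);
}
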
The result of these optimizations is a substantial speedup from using lighter models, with minimal degradation in AR effect quality.
Comparison of the most complex (left) and the lightest models (right). Temporal consistency as well as lip and eye tracking is slightly degraded on light models.
The end result of these efforts empowers a user experience with convincing, realistic selfie AR effects in YouTube, ARCore, and other clients by:
  • Simulating light reflections via environmental mapping for realistic rendering of glasses
  • Natural lighting by casting virtual object shadows onto the face mesh
  • Modelling face occlusions to hide virtual object parts behind a face, e.g. virtual glasses, as shown below.
YouTube Stories includes Creator Effects like realistic virtual glasses, based on our 3D mesh
In addition, we achieve highly realistic makeup effects by:
  • Modelling specular reflections applied on the lips, and
  • Face painting by using a luminance-aware material.
Case study comparing real make-up against our AR make-up on 5 subjects under different lighting conditions.
We are excited to share this new technology with creators, users and developers alike. You can use it immediately by downloading the latest ARCore SDK. In the future, we plan to bring this technology to more Google products.

Acknowledgements
We would like to thank Yury Kartynnik, Valentin Bazarevsky, Andrey Vakunov, Siargey Pisarchyk, Andrei Tkachenka, and Matthias Grundmann for collaboration on developing the current mesh technology; Nick Dufour, Avneesh Sud and Chris Bregler for an earlier version of the technology based on parametric models; Kanstantsin Sokal, Matsvei Zhdanovich, Gregory Karpiak, Alexander Kanaukou, Suril Shah, Buck Bourdon, Camillo Lugaresi, Siarhei Kazakou and Igor Kibalchich for building the ML pipeline to drive impressive effects; Aleksandra Volf and the annotation team for their diligence and dedication to perfection; Andrei Kulik, Juhyun Lee, Raman Sarokin, Ekaterina Ignasheva, Nikolay Chirkov, and Yury Pisarchyk for careful benchmarking and insights on mobile GPU-centric network architecture optimizations.

Source: Google AI Blog


New UI tools and a richer creative canvas come to ARCore

Posted by Evan Hardesty Parker, Software Engineer

ARCore and Sceneform give developers simple yet powerful tools for creating augmented reality (AR) experiences. In our last update (version 1.6) we focused on making virtual objects appear more realistic within a scene. In version 1.7, we're focusing on creative elements like AR selfies and animation as well as helping you improve the core user experience in your apps.

Creating AR Selfies

Example of 3D face mesh application

ARCore's new Augmented Faces API (available on the front-facing camera) offers a high quality, 468-point 3D mesh that lets users attach fun effects to their faces. From animated masks, glasses, and virtual hats to skin retouching, the mesh provides coordinates and region specific anchors that make it possible to add these delightful effects.

You can get started in Unity or Sceneform by creating an ARCore session with the "front-facing camera" and Augmented Faces "mesh" mode enabled. Note that other AR features such as plane detection aren't currently available when using the front-facing camera. AugmentedFace extends Trackable, so faces are detected and updated just like planes, Augmented Images, and other trackables.

// Create an ARCore session that supports Augmented Faces for use in Sceneform.
public Session createAugmentedFacesSession(Activity activity) throws UnavailableException {
  // Use the front-facing (selfie) camera.
  Session session = new Session(activity, EnumSet.of(Session.Feature.FRONT_CAMERA));
  // Enable Augmented Faces.
  Config config = session.getConfig();
  config.setAugmentedFaceMode(Config.AugmentedFaceMode.MESH3D);
  session.configure(config);
  return session;
}

Animating characters in your Sceneform AR apps

Another way version 1.7 expands the AR creative canvas is by letting your objects dance, jump, spin and move around with support for animations in Sceneform. To start an animation, initialize a ModelAnimator (an extension of the existing Android animation support) with animation data from your ModelRenderable.

void startDancing(ModelRenderable andyRenderable) {
  AnimationData data = andyRenderable.getAnimationData("andy_dancing");
  animator = new ModelAnimator(data, andyRenderable);
  animator.start();
}

Solving common AR UX challenges in Unity with new UI components

In ARCore version 1.7 we also focused on helping you improve your user experience with a simplified workflow. We've integrated "ARCore Elements", a set of common AR UI components that have been validated with user testing, into the ARCore SDK for Unity. You can use ARCore Elements to insert AR interactive patterns in your apps without having to reinvent the wheel. ARCore Elements also makes it easier to follow Google's recommended AR UX guidelines.

ARCore Elements includes two AR UI components that are especially useful:

  • Plane Finding - streamlining the key steps involved in detecting a surface
  • Object Manipulation - using intuitive gestures to rotate, elevate, move, and resize virtual objects

We plan to add more to ARCore Elements over time. You can download the ARCore Elements app available in the Google Play Store to learn more.

Improving the User Experience with Shared Camera Access

ARCore version 1.7 also includes UX enhancements for the smartphone camera, specifically the experience of switching in and out of AR mode. Shared Camera access in the ARCore SDK for Java lets users pause an AR experience, access the camera, and jump back in. This can be particularly helpful if users want to take a picture of the action in your app.

More details are available in the Shared Camera developer documentation and Java sample.

Learn more and get started

For AR experiences to capture users' imaginations, they need to be both immersive and easily accessible. With tools for adding AR selfies, animation, and UI enhancements, ARCore version 1.7 can help with both of these objectives.

You can learn more about these new updates on our ARCore developer website.

Using Global Localization to Improve Navigation



One of the consistent challenges when navigating with Google Maps is figuring out the right direction to go: sure, the app tells you to go north, but many times you're left wondering, "Where exactly am I, and which way is north?" Over the years, we've attempted to improve the accuracy of the blue dot with tools like GPS and compass, but found that both have physical limitations that make solving this challenge difficult, especially in urban environments.

We're experimenting with a way to solve this problem using a technique we call global localization, which combines Visual Positioning Service (VPS), Street View, and machine learning to more accurately identify position and orientation. Using the smartphone camera as a sensor, this technology enables a more powerful and intuitive way to help people quickly determine which way to go.
Due to limitations with accuracy and orientation, guidance via GPS alone is limited in urban environments. Using VPS, Street View and machine learning, Global Localization can provide better context on where you are relative to where you're going.
In this post, we'll discuss some of the limitations of navigation in urban environments and how global localization can help overcome them.

Where GPS Falls Short
The process of identifying the position and orientation of a device relative to some reference point is referred to as localization. Various techniques approach localization in different ways. GPS relies on measuring the delay of radio signals from multiple dedicated satellites to determine a precise location. However, in dense urban environments like New York or San Francisco, it can be incredibly hard to pinpoint a geographic location due to low visibility to the sky and signals reflecting off of buildings. This can result in highly inaccurate placements on the map, meaning that your location could appear on the wrong side of the street, or even a few blocks away.
GPS signals bouncing off facades in an urban environment.
GPS has another technical shortcoming: it can only determine the location of the device, not its orientation. Sometimes, sensors in your mobile device can remedy the situation by measuring the magnetic and gravity field of the earth and the relative motion of the device in order to give rough estimates of your orientation. But these sensors are easily skewed by magnetic objects such as cars, pipes, buildings, and even electrical wires inside the phone, resulting in orientation estimates that can be off by as much as 180 degrees.

A New Approach to Localization
To improve the precision of the blue dot's position and orientation on the map, a new, complementary technology is necessary. When walking down the street, you orient yourself by comparing what you see with what you expect to see. Global localization uses a combination of techniques that enable the camera on your mobile device to orient itself much as you would.

VPS determines the location of a device based on imagery rather than GPS signals. VPS first creates a map by taking a series of images which have a known location and analyzing them for key visual features, such as the outline of buildings or bridges, to create a large-scale, quickly searchable index of those visual features. To localize the device, VPS compares the features in imagery from the phone to those in the VPS index. However, the accuracy of localization through VPS is greatly affected by the quality of both the imagery and the location associated with it. And that poses another question: where does one find an extensive source of high-quality global imagery?

Enter Street View
Over 10 years ago we launched Street View in Google Maps in order to help people explore the world more deeply. In that time, Street View has continued to expand its coverage of the world, empowering people to not only preview their route, but also step inside famous landmarks and museums, no matter where they are. To deliver global localization with VPS, we connected it with Street View data, making use of information gathered and tested from over 93 countries across the globe. This rich dataset provides trillions of strong reference points to apply triangulation, helping more accurately determine the position of a device and guide people towards their destination.
Features matched from multiple images.
Although this approach works well in theory, making it work well in practice is a challenge. The problem is that the imagery from the phone at the time of localization may differ from what the scene looked like when the Street View imagery was collected, perhaps months earlier. For example, trees have lots of rich detail, but change as the seasons change and even as the wind blows. To get a good match, we need to filter out temporary parts of the scene and focus on permanent structure that doesn't change over time. That's why a core ingredient in this new approach is applying machine learning to automatically decide which features to pay attention to, prioritizing features that are likely to be permanent parts of the scene and ignoring things like trees, dynamic light movement, and construction that are likely transient. This is just one of the many ways in which we use machine learning to improve accuracy.

Combining Global Localization with Augmented Reality
Global localization is an additional option that users can enable when they most need accuracy. And, this increased precision has enabled the possibility of a number of new experiences. One of the newest features we're testing is the ability to use ARCore, Google's platform for building augmented reality experiences, to overlay directions right on top of Google Maps when someone is in walking navigation mode. With this feature, a quick glance at your phone shows you exactly which direction you need to go.
Although early results are promising, there's significant work to be done. One outstanding challenge is making this technology work everywhere, in all types of conditions: think late at night, in a snowstorm, or in a torrential downpour. To make sure we're building something that's truly useful, we're starting to test this feature with select Local Guides, a small group of Google Maps enthusiasts around the world who we know will offer us feedback about how this approach can be most helpful.

Like other AI-driven camera experiences such as Google Lens (which uses the camera to let you search what you see), we believe the ability to overlay directions over the real world environment offers an exciting and useful way to use the technology that already exists in your pocket. We look forward to continuing to develop this technology, and the potential for smartphone cameras to add new types of valuable experiences.

Source: Google AI Blog


Creating More Realistic AR experiences with updates to ARCore & Sceneform

Posted by Ashish Shah, Product Manager, Google AR & VR

The magic of augmented reality is in the way it blends the digital and the physical worlds. For AR experiences to feel truly immersive, digital objects need to look realistic, as if they were actually there with you, in your space. This is something we continue to prioritize as we update ARCore and Sceneform, our 3D rendering library for Java developers.

Today, with the release of ARCore 1.6, we're bringing further improvements to help you build more realistic and compelling experiences, including better plane boundary tracking and several lighting improvements in Sceneform.

With 250M devices now supporting ARCore, developers can bring these experiences to an even larger and growing user base.

More Realistic Lighting in Sceneform

Previous versions of Sceneform defaulted to optimizing ambient light as yellow. Version 1.6 defaults to neutral and white. This aligns more closely to the way light appears in the real world, making digital objects look more natural. You can see the differences below.

Left: Sceneform 1.5. Right: Sceneform 1.6.

This change will also make objects rendered with Sceneform look as if they're affected more naturally by color and lighting in the surrounding environment. For example, if you're viewing an AR object at sunset, it would appear to be illuminated by the red and orange hues, just like real objects in the scene.

In addition, we've updated Sceneform's built-in environmental image to provide a more neutral scene for your app. This will be most noticeable when viewing reflections in smooth metallic surfaces.

Adding screen capture and recording to the mix

To help you further improve quality and engagement in your AR apps, we're adding screen capture and recording to Sceneform. This is something a number of developers have requested to help with demo recording and prototyping. It can also be used as an external facing feature, allowing your users to share screenshots and videos on social media more easily, which can help get the word out about your app.

You can access this functionality through the surface mirroring API for the SceneView class. The API allows you to display the Sceneform view on a device's screen at the same time it's being rendered to another surface (such as the input surface for the Android MediaRecorder).
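
A rough sketch of wiring this up, assuming a MediaRecorder that has already been configured and prepared with a surface video source (the recorder setup and sizes are assumptions, not spelled out here), might look like:

import android.media.MediaRecorder;
import android.view.Surface;
import com.google.ar.sceneform.SceneView;

// Mirror the Sceneform view into the recorder's input surface while it keeps rendering on screen.
static Surface startRecording(SceneView sceneView, MediaRecorder preparedRecorder, int width, int height) {
  Surface recorderSurface = preparedRecorder.getSurface();
  sceneView.startMirroringToSurface(recorderSurface, 0, 0, width, height);
  preparedRecorder.start();
  return recorderSurface;
}

static void stopRecording(SceneView sceneView, MediaRecorder recorder, Surface recorderSurface) {
  sceneView.stopMirroringToSurface(recorderSurface);
  recorder.stop();
}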

Learn more and get started

The new updates to Sceneform and ARCore are available today. These new versions also bring support for new devices, such as the Samsung Galaxy A3 and the Huawei P20 Lite, which will join the list of ARCore-enabled devices. More information is available on the ARCore developer website.

The Instant Motion Tracking Behind Motion Stills AR



Last summer, we launched Motion Stills on Android, which delivered a great video capture and viewing experience on a wide range of Android phones. Then, we refined our Motion Stills technology further to enable the new motion photos feature in Pixel 2.

Today, we are excited to announce the new Augmented Reality (AR) mode in Motion Stills for Android. With the new AR mode, a user simply touches the viewfinder to place fun, virtual 3D objects on static or moving horizontal surfaces (e.g. tables, floors, or hands), allowing them to seamlessly interact with a dynamic real-world environment. You can also record and share the clips as GIFs and videos.
Motion Stills with instant motion tracking in action
AR mode is powered by instant motion tracking, a six-degree-of-freedom tracking system built upon the technology that powers Motion Text in Motion Stills iOS and the privacy blur on YouTube to accurately track static and moving objects. We refined and enhanced this technology to enable fun AR experiences that can run on any Android device with a gyroscope.

When you touch the viewfinder, Motion Stills AR “sticks” a 3D virtual object to that location, making it look as if it’s part of the real-world scene. By assuming that the tracked surface is parallel to the ground plane, and using the device’s accelerometer sensor to provide the initial orientation of the phone with respect to the ground plane, one can track the six degrees of freedom of the camera (3 for translation and 3 for rotation). This allows us to accurately transform and render the virtual object within the scene.
When the phone is approximately steady, the accelerometer provides the acceleration due to the Earth’s gravity. For horizontal planes, the gravity vector is parallel to the normal of the tracked plane and can accurately provide the initial orientation of the phone.
Instant Motion Tracking
The core idea behind instant motion tracking is to decouple the camera’s translation and rotation estimation, treating them instead as independent optimization problems. First, we determine the 3D camera translation solely from the visual signal of the camera. To do this, we observe the target region's apparent 2D translation and relative scale across frames. A simple pinhole camera model relates both translation and scale of a box in the image plane with the final 3D translation of the camera.
The translation and the change in size (relative scale) of the box in the image plane can be used to determine the 3D translation between two camera positions C1 and C2. However, as our camera model doesn’t assume a known focal length for the camera lens, we do not know the true distance/depth of the tracked plane.
To account for this, we added scale estimation to our existing tracker (the one used in Motion Text), as well as region tracking outside the field of view of the camera. When the camera gets closer to the tracked surface, the virtual content scales accurately, which is consistent with the perception of real-world objects. When you pan outside the field of view of the target region and back, the virtual object will reappear in approximately the same spot.
Independent translation (from visual signal only as shown by red box) and rotation tracking (from gyro; not shown)
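
As a toy illustration of that pinhole relation (not the production tracker), the following sketch recovers the camera translation, up to an unknown global scale, from the box's normalized center position and apparent size in two frames.

// Toy pinhole-model sketch: (x, y) is the box center in normalized image coordinates and
// `size` is its apparent size. The true depth is unknown, so it is fixed to 1 and the
// translation is recovered only up to a global scale.
static float[] cameraTranslationFromBox(float x1, float y1, float size1,
                                        float x2, float y2, float size2) {
  float z1 = 1.0f;                 // unknown true depth of the tracked plane
  float z2 = z1 * size1 / size2;   // the box growing on screen means the camera moved closer
  // Back-project the box center to a 3D point in camera coordinates at each frame.
  float[] p1 = {x1 * z1, y1 * z1, z1};
  float[] p2 = {x2 * z2, y2 * z2, z2};
  // The camera's translation is the opposite of the static target's apparent motion.
  return new float[] {p1[0] - p2[0], p1[1] - p2[1], p1[2] - p2[2]};
}
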
After all this, we obtain the device’s 3D rotation (roll, pitch and yaw) using the phone’s built-in gyroscope. The estimated 3D translation combined with the 3D rotation provides us with the ability to render the virtual content correctly in the viewfinder. And because we treat rotation and translation separately, our instant motion tracking approach is calibration free and works on any Android device with a gyroscope.
Augmented chicken family with Motion Stills AR mode
We are excited to bring this new mode to Motion Stills for Android, and we hope you’ll enjoy it. Please download the new release of Motion Stills and keep sending us feedback with #motionstills on your favorite social media.

Acknowledgements
For rendering, we are thankful we were able to leverage Google’s Lullaby engine using animated Poly models. A thank you to our team members who worked on the tech and this launch with us: John Nack, Suril Shah, Igor Kibalchich, Siarhei Kazakou, and Matthias Grundmann.

ARCore: Augmented reality at Android scale

Posted by Dave Burke, VP, Android Engineering

With more than two billion active devices, Android is the largest mobile platform in the world. And for the past nine years, we've worked to create a rich set of tools, frameworks and APIs that deliver developers' creations to people everywhere. Today, we're releasing a preview of a new software development kit (SDK) called ARCore. It brings augmented reality capabilities to existing and future Android phones. Developers can start experimenting with it right now.

We've been developing the fundamental technologies that power mobile AR over the last three years with Tango, and ARCore is built on that work. But, it works without any additional hardware, which means it can scale across the Android ecosystem. ARCore will run on millions of devices, starting today with the Pixel and Samsung's S8, running 7.0 Nougat and above. We're targeting 100 million devices at the end of the preview. We're working with manufacturers like Samsung, Huawei, LG, ASUS and others to make this possible with a consistent bar for quality and high performance.

ARCore works with Java/OpenGL, Unity and Unreal and focuses on three things, illustrated in the brief sketch after this list:

  • Motion tracking: Using the phone's camera to observe feature points in the room and IMU sensor data, ARCore determines both the position and orientation (pose) of the phone as it moves. Virtual objects remain accurately placed.
  • Environmental understanding: It's common for AR objects to be placed on a floor or a table. ARCore can detect horizontal surfaces using the same feature points it uses for motion tracking.
  • Light estimation: ARCore observes the ambient light in the environment and makes it possible for developers to light virtual objects in ways that match their surroundings, making their appearance even more realistic.
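
The sketch below shows how these three capabilities surface in ARCore's Java API during a per-frame update. It uses the API shape of later stable ARCore releases, so treat it as illustrative of the concepts rather than the exact preview SDK announced here.

import com.google.ar.core.Frame;
import com.google.ar.core.LightEstimate;
import com.google.ar.core.Plane;
import com.google.ar.core.Pose;
import com.google.ar.core.Session;
import com.google.ar.core.TrackingState;
import com.google.ar.core.exceptions.CameraNotAvailableException;

static void onDrawFrame(Session session) throws CameraNotAvailableException {
  Frame frame = session.update();

  // Motion tracking: the camera's position and orientation (pose) for this frame.
  Pose cameraPose = frame.getCamera().getPose();

  // Environmental understanding: horizontal surfaces detected from the same feature points.
  for (Plane plane : session.getAllTrackables(Plane.class)) {
    if (plane.getTrackingState() == TrackingState.TRACKING) {
      // Anchor virtual objects to this plane.
    }
  }

  // Light estimation: match virtual lighting to the ambient light in the scene.
  LightEstimate lightEstimate = frame.getLightEstimate();
  float ambientIntensity = lightEstimate.getPixelIntensity();
}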

Alongside ARCore, we've been investing in apps and services which will further support developers in creating great AR experiences. We built Blocks and Tilt Brush to make it easy for anyone to quickly create great 3D content for use in AR apps. As we mentioned at I/O, we're also working on Visual Positioning Service (VPS), a service which will enable world scale AR experiences well beyond a tabletop. And we think the Web will be a critical component of the future of AR, so we're also releasing prototype browsers for web developers so they can start experimenting with AR, too. These custom browsers allow developers to create AR-enhanced websites and run them on both Android/ARCore and iOS/ARKit.

ARCore is our next step in bringing AR to everyone, and we'll have more to share later this year. Let us know what you think through GitHub, and check out our new AR Experiments showcase where you can find some fun examples of what's possible. Show us what you build on social media with #ARCore; we'll be resharing some of our favorites.