
A Structured Approach to Unsupervised Depth Learning from Monocular Videos



Perceiving the depth of a scene is an important task for an autonomous robot — the ability to accurately estimate how far away objects are is crucial for obstacle avoidance, safe planning and navigation. While depth can be obtained (and learned) from sensor data, such as LIDAR, it is also possible to learn it in an unsupervised manner from a monocular camera only, relying on the motion of the robot and the resulting different views of the scene. In doing so, the “ego-motion” (the motion of the robot/camera between two frames) is also learned, which provides localization of the robot itself. While this approach has a long history — coming from the structure-from-motion and multi-view geometry paradigms — new learning-based techniques, specifically unsupervised learning of depth and ego-motion with deep neural networks, have advanced the state of the art, including work by Zhou et al. and our own prior research, which aligns 3D point clouds of the scene during training.

Despite these efforts, learning to predict scene depth and ego-motion remains an ongoing challenge, specifically when handling highly dynamic scenes and estimating proper depth of moving objects. Because previous research efforts for unsupervised monocular learning do not model moving objects, they consistently misestimate those objects’ depth, often mapping it to infinity.

In “Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos”, to appear in AAAI 2019, we propose a novel approach which is able to model moving objects and produces high quality depth estimation results. Compared to previous methods for unsupervised learning from monocular videos, our approach recovers the correct depth for moving objects. In our paper, we also propose a seamless online refinement technique that can further improve quality and be applied for transfer across datasets. Furthermore, to encourage even more advanced approaches of onboard robotics learning, we have open-sourced the code in TensorFlow.
Previous work (middle row) has not been able to correctly estimate the depth of moving objects, mapping them to infinity (dark blue regions in the heatmap). Our approach (right) provides much better depth estimates.
Structure
A key idea in our approach is to introduce structure into the learning framework. That is, instead of relying on a neural network to learn depth directly, we treat the monocular scene as 3D, composed of moving objects, including the robot itself. The respective motions are modeled as independent transformations — rotations and translations — of the scene, which are then used to model the 3D geometry and estimate all the objects’ motions. Additionally, knowing which objects may potentially move (e.g., cars, people, bicycles, etc.) helps us learn separate motion vectors for them even if they happen to be static. By decomposing the scene into 3D geometry and individual objects, the model learns better depth and ego-motion, especially in very dynamic scenes.
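To make the idea of independent object motions concrete, here is a minimal sketch (not the paper's actual model) of how a per-object rigid motion, a rotation plus a translation, can be applied to the 3D points obtained from a depth map. The function names, the toy intrinsics and the hard-coded motions are all illustrative assumptions; in methods of this kind, such transforms are typically predicted by networks and used to warp one view into another so that the reconstruction error can drive training.

```python
import numpy as np

def backproject(depth, intrinsics):
    """Lift a depth map (H, W) to a 3D point cloud (H, W, 3) in camera coordinates."""
    h, w = depth.shape
    fx, fy, cx, cy = intrinsics
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)

def apply_rigid_motion(points, rotation, translation, mask=None):
    """Apply an independent rotation (3x3) and translation (3,) to the points
    selected by a binary object mask; the rest of the scene is left untouched."""
    moved = points @ rotation.T + translation
    if mask is None:
        return moved
    return np.where(mask[..., None], moved, points)

# Toy usage: the whole scene moves with the camera (ego-motion),
# and one masked object gets its own motion on top of that.
depth = np.full((4, 4), 10.0)                        # flat scene 10 m away
intrinsics = (100.0, 100.0, 2.0, 2.0)                # fx, fy, cx, cy (made up)
cloud = backproject(depth, intrinsics)

ego_R, ego_t = np.eye(3), np.array([0.0, 0.0, 1.0])  # camera moved 1 m forward
cloud = apply_rigid_motion(cloud, ego_R, ego_t)

car_mask = np.zeros((4, 4), dtype=bool)
car_mask[1:3, 1:3] = True                            # hypothetical object segmentation
car_R, car_t = np.eye(3), np.array([0.0, 0.0, 1.0])  # object moving on its own
cloud = apply_rigid_motion(cloud, car_R, car_t, car_mask)
```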

We tested this method on both the KITTI and Cityscapes urban driving datasets and found that it outperforms state-of-the-art approaches, approaching the quality of methods that use stereo video pairs as training supervision. Importantly, we are able to correctly recover the depth of a car moving at the same speed as the ego-motion vehicle. This has been challenging previously — in this case, the moving vehicle appears (in a monocular input) to be static, exhibiting the same behavior as the static horizon, and is therefore inferred to be infinitely far away. While stereo inputs can resolve that ambiguity, our approach is the first that is able to correctly infer this from a monocular input.
Previous work with monocular inputs was not able to extract moving objects and incorrectly mapped them to infinity.
Furthermore, since objects are treated individually in our method, the algorithm can provide a motion vector for each individual object, i.e., an estimate of where it is heading:
Example depth results for a dynamic scene together with estimates of the motion vectors of the individual objects (rotation angles are estimated too, but for simplicity are not shown).
In addition to these results, this research provides motivation for further exploring what an unsupervised learning approach can achieve, as monocular inputs are cheaper and easier to deploy than stereo or LIDAR sensors. As can be seen in the figures below, in both the KITTI and Cityscapes datasets, the supervision sensor (be it stereo or LIDAR) has missing values and may occasionally be misaligned with the camera input due to time delay.
Depth prediction from monocular video input on the KITTI dataset, middle row, compared to ground truth depth from a LIDAR sensor; the latter does not cover the full scene and has missing and noisy values. Ground truth depth is not used during training.
Depth prediction on the Cityscapes dataset. Left to right: image, baseline, our method and ground truth provided by stereo. Note the missing values in the stereo ground truth. Also note that our algorithm is able to achieve these results without any ground truth depth supervision.
Ego-motion
Our results also provide state-of-the-art ego-motion estimates, which is crucial for autonomous robots, as ego-motion provides localization of the robot while it moves through the environment. The video below visualizes the speed and turning angle obtained from the inferred ego-motion. While the outputs of both depth and ego-motion are only valid up to a scale factor, we can see that the model is able to estimate the vehicle’s relative speed as it slows down and stops.
Depth and ego-motion prediction. Follow the speed and the turning angle indicator to see the estimates when the car is taking a turn or stopping for a red light.
Transfer Across Domains
An important characteristic of a learning algorithm is its adaptability when moved to an unknown environment. In this work we further introduce an online refinement approach which continues to learn online while collecting new data. Below are examples of improvement of the estimated depth quality, after training on Cityscapes and online refinement on KITTI.
Online refinement when training on the Cityscapes Data and testing on KITTI. The images show depth prediction of the trained model, and of the trained model with online refinement. Depth prediction with online refinement better outlines the objects in the scene.
We further tested on a notably different dataset and setting: an indoor dataset collected by the Fetch robot, while training was done on the outdoor urban driving Cityscapes dataset. As is to be expected, there is a large discrepancy between these datasets. Despite this, we observe that the online learning technique obtains better depth estimates than the baseline.
Results of online adaptation when transferring the learning model from Cityscapes (an outdoors dataset collected from a moving car) to a dataset collected indoors by the Fetch robot. The bottom row shows improved depth after applying online refinement.
In summary, this work addresses unsupervised learning of depth and ego-motion from a monocular camera and tackles the problem in highly dynamic scenes. It achieves high quality depth and ego-motion results, comparable in quality to stereo-supervised methods, and puts forward the idea of incorporating structure in the learning process. More notably, our proposed combination of unsupervised learning of depth and ego-motion from monocular video only and online adaptation demonstrates a powerful concept: not only can the model learn in an unsupervised manner from simple video, but it can also be transferred easily to other datasets.

Acknowledgements
This research was conducted by Vincent Casser, Soeren Pirk, Reza Mahjourian and Anelia Angelova. We would like to thank Ayzaan Wahid for his help with data collection and Martin Wicke and Vincent Vanhoucke for their support and encouragement.

Source: Google AI Blog


AI brings “dreams” to life at the Walt Disney Concert Hall

Walt Disney Concert Hall (WDCH) has been home to the Los Angeles Philharmonic since 2003. When architect Frank Gehry designed the Concert Hall, he hoped that the beauty of the music created within its walls would one day be reflected on the outside. So to mark the Philharmonic’s 100-year anniversary this fall, artist Refik Anadol collaborated with the Artists and Machine Intelligence Program at Google Arts and Culture and the Philharmonic to pay tribute to the past and to “dream” what’s to come in the future.

Along with Google engineers, Refik used machine learning to interpret nearly 45 terabytes of data, comprising audio recordings of past performances and historic images from the LA Philharmonic’s archive, like photographs and printed programs. Using multiple machine learning algorithms, he identified patterns in the images and created narratives—“dreams”—from these compositions, with the vision of projecting them onto the music hall itself.

WDCH Dreams

To visualize 18,000 hours of audio recordings, computational artist and researcher Parag K. Mital developed an audio browser tool to explore the archive by 256 attributes such as pitch, timbre, amplitude, tempo, tonality and key. Using this tool, Refik and sound designers Kerim Karaoglu and Robert Thomas hand-picked specific “memories” and curated a unique soundtrack to accompany the visual narrative, called WDCH Dreams.

The 12-minute projection premiered this September and illuminated downtown Los Angeles for eight days, every 30 minutes from 7:30 p.m. to 11:30 p.m. Until October 2019, visitors can explore the LA Philharmonic’s archives in an interactive exhibition at the Walt Disney Concert Hall, in the Ira Gershwin Gallery. Discover more about the LA Philharmonic and WDCH Dreams on Google Arts and Culture—or download our free app for iOS or Android.

A tale of a whale song

Like us, whales sing. But unlike us, their songs can travel hundreds of miles underwater. Those songs potentially help them find a partner, communicate and migrate around the world. But what if we could use these songs and machine learning to better protect them?

Despite decades of being protected against whaling, 15 species of whales are still listed under the Endangered Species Act. Even species that are successfully recovering—such as humpback whales—suffer from threats like entanglement in fishing gear and collisions with vessels, which are among the leading causes of non-natural deaths for whales.

To better protect those animals, the first step is to know where they are and when, so that we can mitigate the risks they face—whether that's putting the right marine protected areas in place or giving warnings to vessels. Since most whales and dolphins spend very little time at the surface of the water, visually finding and counting them is very difficult. This is why NOAA’s Pacific Islands Fisheries Science Center, responsible for monitoring populations of whales and other marine mammals in U.S. Pacific waters, relies instead on listening using underwater audio recorders.

NOAA has been using High-frequency Acoustic Recording Packages (HARPs) to record underwater audio at 12 different sites in the Pacific Ocean, some starting as early as 2005. They have accumulated over 170,000 hours of underwater audio recordings. It would take over 19 years for someone to listen to all of it, working 24 hours a day!


Crew members deploy a high-frequency acoustic recording package (HARP) to detect cetacean sounds underwater (Photo credit: NOAA Fisheries).

To help tackle this problem, we teamed up with NOAA to train a deep neural network that automatically identifies which whale species are calling in these very long underwater recordings, starting with humpback whales. The effort fits into our AI for Social Good program, applying the latest in machine learning to the world’s biggest social, humanitarian and environmental challenges.

The problem of picking out humpback whale songs underwater is particularly difficult to solve for several reasons. Underwater noise conditions can vary: for example, the presence of rain or boat noises can confuse a machine learning model. The distance between a recorder and the whales can cause the calls to be very faint. Finally, humpback whale calls are particularly difficult to classify because they are not stereotyped like blue or fin whale calls—instead, humpbacks produce complex songs and a variety of vocalizations that change over time.

https://youtu.be/k6oeR4yK_oo

A spectrogram (visual representation of the sound) of a humpback whale song in Hawaii.

We decided to leverage Google’s existing work on large-scale sound classification and train a humpback whale classifier on NOAA’s partially annotated underwater data set. We started by turning the underwater audio data into a visual representation of the sound called a spectrogram, and then showed our algorithm many example spectrograms that were labeled with the correct species name. The more examples we can show it, the better our algorithm gets at automatically identifying those sounds. For a deeper dive (ahem) into the techniques we used, check out our Google AI blog post.
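As a rough illustration of that pipeline (not NOAA's or Google's actual code), the sketch below converts a clip of audio into a log-spectrogram with SciPy and defines a tiny binary "humpback call vs. background" classifier in TensorFlow. The sample rate, window sizes and network shape are arbitrary assumptions made only to keep the example self-contained.

```python
import numpy as np
from scipy.signal import spectrogram
import tensorflow as tf

def audio_to_spectrogram(waveform, sample_rate=10000):
    """Turn a 1-D audio clip into a log-magnitude spectrogram (time x frequency)."""
    _, _, sxx = spectrogram(waveform, fs=sample_rate, nperseg=256, noverlap=128)
    return np.log(sxx + 1e-6).T  # time on the first axis, 129 frequency bins

# A small binary classifier: "humpback call" vs. "background noise".
model = tf.keras.Sequential([
    tf.keras.Input(shape=(None, 129, 1)),   # variable number of time steps
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Training would pair spectrograms with labels from the annotated recordings, e.g.:
# model.fit(spectrogram_batches, labels, epochs=...)
```

The more labeled example spectrograms such a model sees, the better it gets at picking out the calls automatically, which is exactly the property the paragraph above describes.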

Now that we can find and identify humpback whales in recordings, we can understand where they are and where they are going—as shown in the animation below.


Since 2005, NOAA’s Pacific Islands Fisheries Science Center has deployed, recovered and collected recordings from hydrophones moored on the ocean bottom at 12 sites. On this map, you can see the spots where more whales were found by our classifier in orange and yellow.

In the future, we plan to use our classifier to help NOAA better understand humpback whales by identifying changes in breeding location or migration paths, changes in relative abundance (which can be related to human activity), changes in song over the years and differences in song between populations. This could also help directly protect whales by advising vessels to modify their routes when a lot of whales are present in a certain area. Such work is already being done for right whales, which are easier to monitor because of their relatively simple sounds.

The ocean is big and humpback whales are not the only ones making noise, so we have also started training our classifier on the sounds of more species (like the southern resident killer whale, which is critically endangered). We can’t see the species that live underwater, but we can hear a lot of them. With the help of machine learning, we hope that one day we can detect and classify many of these species’ sounds, giving biologists around the world the information they need to better understand and protect them.


A humpback whale breaching at the surface of the water. (Photo credit: Hawaiian Islands Humpback Whale National Marine Sanctuary.)


AI for Social Good

In pop culture, artificial intelligence (AI) often shows up as a robot companion, like TARS in “Interstellar,” or some far-out superintelligence. But in reality, AI—computer programming tools that help us find patterns in complex data and make everyday products more useful—already powers a lot of technology around us, and is addressing some of society’s biggest unsolved challenges.

For the past few years we’ve been applying core Google AI research and engineering to projects with positive societal impact, including forecasting floods, protecting whales, and predicting famine. Today we’re unifying these efforts in a new program called AI for Social Good. We’re applying AI to a wide range of problems, partnering with external organizations to work toward solutions.


But we’re far from having all the answers—or even knowing all the questions. We want people from as many backgrounds as possible to surface problems that AI can help solve, and to be empowered to create solutions themselves. So as a part of AI for Social Good, we’re also launching the Google AI Impact Challenge, a global call for nonprofits, academics, and social enterprises from around the world to submit proposals on how they could use AI to help address some of the world’s greatest social, humanitarian and environmental problems.


We’ll help selected organizations bring their proposals to life with coaching from Google’s AI experts, Google.org grant funding from a $25 million pool, and credits and consulting from Google Cloud. Grantees will also join a specialized Launchpad Accelerator program, and we’ll tailor additional support to each project’s needs in collaboration with data science nonprofit DataKind. In spring of 2019, an international panel of experts, who work in computer science and the social sector, will help us choose the top proposals.


We don’t expect applicants to be AI experts. For any nonprofit or researcher who has a great idea or wants help brainstorming one, we've built an educational guide with introductions to AI and the types of problems it’s well-suited for, as well as workshops in key locations around the world.


To give you a sense of the potential we see, here are a few examples of how Google and others have already used AI over the past few years:

  • Wildlife conservation: To better protect endangered whales, we have to know where they are. With AI developed at Google—in the same vein as research by college student Daniel de Leon—it’s possible to quickly scan 100,000 hours of audio recorded in the Pacific to identify whale sounds. We hope one day we can not only better identify whales in these recordings, but also accurately deploy this system at scale to find and protect whales.
  • Employment: In South Africa, Harambee Youth Employment Accelerator helps connect unemployed youth with entry-level positions. As a participant in Google Cloud’s Data Solutions for Change program, they’ve used data analytics and ML to match over 50,000 candidates with jobs.
  • Flood prediction: Floods affect up to 250 million people, causing thousands of fatalities and inflicting billions of dollars of economic damage every year. At Google, we’ve combined physics-based modeling and AI to provide earlier and more accurate flood warnings through Google Public Alerts.
  • Wildfire prevention: Two high school students in California built a device that uses AI to identify and predict areas in a forest that are susceptible to wildfires. This technology could one day provide an early warning to fire authorities. 
  • Infant health: Ubenwa is a Canadian company that built an AI system to analyze the sounds of a baby crying and predict the risk of birth asphyxia (when a baby's brain and other organs don’t get enough oxygen and nutrients during birth). It’s a mobile app so it can be widely used even where doctors aren’t readily available.

We’re excited to see what new ideas nonprofits, developers and social entrepreneurs from across the world come up with—and we’re looking forward to supporting them as best we can.


Making creative tools more accessible for everyone

Before I got into the accessibility field, I worked as an art therapist, where I met people from all walks of life. No matter the reason they came to therapy, almost everyone I met seemed to benefit from engaging in the creative process. Art gives us the ability to point beyond spoken or written language, to unite us, to delight and to satisfy. Done right, this process can be enhanced by technology—extending our ability and potential for play.

One of my first sessions as a therapist was with a middle school student on the autism spectrum. He had trouble communicating and socializing with his peers, but in our sessions together he drew, made elaborate scenes with clay, and made music.

Another key moment for me was when I met Chancey Fleet, a blind technology educator and accessibility advocate. I was learning how to program at the time, and together we built a tool to help her plan a dinner event. It was a visual and audio diagramming tool that paired with her screen reader technology. This collaboration got me excited about the potential of technology to make art and creativity more accessible, and it emphasized the importance of collaborative approaches to design.

This sentiment has carried over into the accessibility research and design work that I do at the NYU Ability Project, a research space where we explore the intersection of disability and technology. Our projects bring together engineers, designers, educators, artists and therapists within and beyond the accessibility community. Like so many technological innovations that have begun as assistive and rehabilitative tech, we hope our work will eventually benefit everyone. That’s why when Google reached out to me with an opportunity to explore ideas around creativity and accessibility, I jumped at the chance.

Together, we made Creatability, a set of experiments that explore how creative tools–drawing, music and more–can be made more accessible using web and AI technology. The project is a collaboration with creators and allies in the accessibility community, such as Jay Alan Zimmerman, a composer who is deaf; Josh Miele, a blind scientist, designer, and educator; Chancey Fleet, a blind accessibility advocate and technology educator; and Barry Farrimond and Doug Bott of Open Up Music, a group focused on empowering young disabled musicians to build inclusive youth orchestras.

Creatability keyboard

The experiments explore a diverse set of inputs, from a computer mouse and keystrokes to your body, wrist, nose, or voice. For example, you can make music by moving your face, draw using sight or sound, and experience music visually.

The key technology we used was a machine learning model called PoseNet that can detect key body joints in images and videos. This technology lets you control the experiments with your webcam, simply by moving your body. And it’s powered by TensorFlow.js—a library that runs machine learning models on-device, in your browser, which means your images are never stored or sent to a server.

Creating sound

We hope these experiments inspire others to unleash their inner artist regardless of ability. That’s why we’re open sourcing the code and have created helpful guides as starting points for people to create their own projects. If you create a new experiment or want to share your story of how you used the experiments, you can submit to be featured on the Creatability site at g.co/creatability.

Curiosity and Procrastination in Reinforcement Learning



Reinforcement learning (RL) is one of the most actively pursued research areas in machine learning, in which an artificial agent receives a positive reward when it does something right, and a negative reward otherwise. This carrot-and-stick approach is simple and universal, and allowed DeepMind to teach the DQN algorithm to play vintage Atari games and AlphaGo Zero to play the ancient game of Go. It is also how OpenAI taught its OpenAI Five agent to play the modern video game Dota, and how Google taught robotic arms to grasp new objects. However, despite the successes of RL, there are many challenges to making it an effective technique.

Standard RL algorithms struggle with environments where feedback to the agent is sparse — crucially, such environments are common in the real world. As an example, imagine trying to learn how to find your favorite cheese in a large maze-like supermarket. You search and search but the cheese section is nowhere to be found. If at every step you receive no “carrot” and no “stick”, there’s no way to tell whether you are headed in the right direction or not. In the absence of rewards, what is to stop you from wandering around in circles? Nothing, except perhaps your curiosity, which motivates you to go into a product section that looks unfamiliar in pursuit of your sought-after cheese.

In “Episodic Curiosity through Reachability” — the result of a collaboration between the Google Brain team, DeepMind and ETH Zürich — we propose a novel episodic memory-based model of granting RL rewards, akin to curiosity, which leads the agent to explore the environment. Since we want the agent not only to explore the environment but also to solve the original task, we add the reward bonus provided by our model to the original sparse task reward. The combined reward is no longer sparse, which allows standard RL algorithms to learn from it. Thus, our curiosity method expands the set of tasks which are solvable with RL.
Episodic Curiosity through Reachability: Observations are added to memory, reward is computed based on how far the current observation is from the most similar observation in memory. The agent receives more reward for seeing observations which are not yet represented in memory.
The key idea of our method is to store the agent's observations of the environment in an episodic memory, while also rewarding the agent for reaching observations not yet represented in memory. Being “not in memory” is the definition of novelty in our method — seeking such observations means seeking the unfamiliar. Such a drive to seek the unfamiliar leads the artificial agent to new locations, keeping it from wandering in circles and ultimately helping it stumble upon the goal. As we will discuss later, our formulation can save the agent from undesired behaviours that some other formulations are prone to. Much to our surprise, those behaviours bear some similarity to what a layperson would call “procrastination”.
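A toy sketch of that idea (not the actual model from the paper): keep an episodic memory of observation embeddings, pay a bonus whenever the current observation is dissimilar from everything stored, and add that bonus to the sparse task reward. Here a plain cosine similarity stands in for the learned reachability network described later in this post, and the threshold and bonus scale are made-up values.

```python
import numpy as np

class EpisodicCuriosity:
    """Toy episodic-memory bonus: reward observations that look unlike anything
    stored so far. A stand-in cosine similarity replaces the paper's learned
    reachability network purely for illustration."""

    def __init__(self, novelty_threshold=0.9, bonus_scale=0.5):
        self.memory = []                        # list of embedding vectors
        self.novelty_threshold = novelty_threshold
        self.bonus_scale = bonus_scale

    def _similarity(self, a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def bonus(self, embedding):
        if not self.memory:
            self.memory.append(embedding)
            return self.bonus_scale
        best = max(self._similarity(embedding, m) for m in self.memory)
        if best < self.novelty_threshold:       # nothing in memory looks similar
            self.memory.append(embedding)
            return self.bonus_scale
        return 0.0

# The combined reward handed to a standard RL algorithm:
curiosity = EpisodicCuriosity()
task_reward = 0.0                               # sparse: almost always zero
obs_embedding = np.random.randn(32)             # would come from the agent's encoder
total_reward = task_reward + curiosity.bonus(obs_embedding)
```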

Previous Curiosity Formulations
While there have been many attempts to formulate curiosity in the past[1][2][3][4], in this post we focus on one natural and very popular approach: curiosity through prediction-based surprise, explored in the recent paper “Curiosity-driven Exploration by Self-supervised Prediction” (commonly referred to as the ICM method). To illustrate how surprise leads to curiosity, again consider our analogy of looking for cheese in a supermarket.
Illustration © Indira Pasko, used under CC BY-NC-ND 4.0 license.
As you wander throughout the market, you try to predict the future (“Now I’m in the meat section, so I think the section around the corner is the fish section — those are usually adjacent in this supermarket chain”). If your prediction is wrong, you are surprised (“No, it’s actually the vegetables section. I didn’t expect that!”) and thus rewarded. This makes you more motivated to look around the corner in the future, exploring new locations just to see if your expectations about them meet the reality (and, hopefully, stumble upon the cheese).

Similarly, the ICM method builds a predictive model of the dynamics of the world and gives the agent rewards when the model fails to make good predictions — a marker of surprise or novelty. Note that exploring unvisited locations is not directly a part of the ICM curiosity formulation. For the ICM method, visiting them is only a way to obtain more “surprise” and thus maximize overall rewards. As it turns out, in some environments there could be other ways to inflict self-surprise, leading to unforeseen results.
An agent imbued with surprise-based curiosity gets stuck when it encounters a TV. GIF adapted from a video by © Deepak Pathak, used under CC BY 2.0 license.
The Dangers of “Procrastination”
In "Large-Scale Study of Curiosity-Driven Learning", the authors of the ICM method along with researchers from OpenAI show a hidden danger of surprise maximization: agents can learn to indulge procrastination-like behaviour instead of doing something useful for the task at hand. To see why, consider a common thought experiment the authors call the “noisy TV problem”, in which an agent is put into a maze and tasked with finding a highly rewarding item (akin to “cheese” in our previous supermarket example). The environment also contains a TV for which the agent has the remote control. There is a limited number of channels (each with a distinct show) and every press on the remote control switches to a random channel. How would an agent perform in such an environment?

For the surprise-based curiosity formulation, changing channels would result in a large reward, as each change is unpredictable and surprising. Crucially, even after cycling through all the available channels, the random channel selection ensures that every new change will still be surprising — the agent is making predictions about what will be on the TV after a channel change, and will very likely be wrong, leading to surprise. Even if the agent has already seen every show on every channel, the change is still unpredictable. Because of this, an agent imbued with surprise-based curiosity would eventually stay in front of the TV forever instead of searching for the highly rewarding item — akin to procrastination. So, what would be a definition of curiosity that does not lead to such behaviour?

Episodic Curiosity
In “Episodic Curiosity through Reachability”, we explore an episodic memory-based curiosity model that turns out to be less prone to “self-indulging” instant gratification. Why so? Using our example above, after changing channels for a while, all of the shows will end up in memory. Thus, the TV won’t be so attractive anymore: even if the order of shows appearing on the screen is random and unpredictable, all those shows are already in memory! This is the main difference to the surprise-based methods: our method doesn’t even try to make bets about the future which could be hard (or even impossible) to predict. Instead, the agent examines the past to know if it has seen observations similar to the current one. Thus our agent won’t be drawn that much to the instant gratification provided by the noisy TV. It will have to go and explore the world outside of the TV to get more reward.

But how do we decide whether the agent is seeing the same thing as an existing memory? Checking for an exact match could be meaningless: in a realistic environment, the agent rarely sees exactly the same thing twice. For example, even if the agent returned to exactly the same room, it would still see the room from a different angle than in its memories.

Instead of checking for an exact match in memory, we use a deep neural network trained to measure how similar two experiences are. To train this network, we have it guess whether two observations were experienced close together in time or far apart in time. Temporal proximity is a good proxy for whether two observations should be judged similar. This training leads to a general concept of novelty via reachability, which is illustrated below.
A graph of reachabilities would determine novelty. In practice, this graph is not available — so we train a neural network approximator to estimate the number of steps between observations.
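A minimal sketch of how such an approximator could be trained (assumed details, not the paper's implementation): label pairs of observations from the same trajectory as positive when they are only a few steps apart and negative when they are far apart, then fit a small comparator network on those pairs. The step thresholds, embedding size and architecture here are placeholders.

```python
import numpy as np
import tensorflow as tf

def make_reachability_pairs(embeddings, k=5, gap=25):
    """Build training pairs from one trajectory of observation embeddings:
    label 1 if two observations are at most k steps apart ("reachable"),
    label 0 if they are at least `gap` steps apart. Thresholds are illustrative."""
    pairs, labels = [], []
    n = len(embeddings)
    for i in range(n):
        j_close = min(n - 1, i + np.random.randint(0, k + 1))
        pairs.append((embeddings[i], embeddings[j_close])); labels.append(1)
        if i + gap < n:
            j_far = np.random.randint(i + gap, n)
            pairs.append((embeddings[i], embeddings[j_far])); labels.append(0)
    a, b = map(np.array, zip(*pairs))
    return [a, b], np.array(labels, dtype=np.float32)

dim = 32
x1 = tf.keras.Input(shape=(dim,))
x2 = tf.keras.Input(shape=(dim,))
h = tf.keras.layers.Dense(64, activation="relu")(tf.keras.layers.Concatenate()([x1, x2]))
out = tf.keras.layers.Dense(1, activation="sigmoid")(h)   # P(pair is close in time)
comparator = tf.keras.Model([x1, x2], out)
comparator.compile(optimizer="adam", loss="binary_crossentropy")

trajectory = np.random.randn(200, dim).astype(np.float32)  # stand-in embeddings
inputs, labels = make_reachability_pairs(trajectory)
comparator.fit(inputs, labels, epochs=1, verbose=0)
```

At inference time, the comparator's output for the current observation against the contents of the episodic memory plays the role of the reachability estimate that decides whether a novelty bonus is paid.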
Experimental Results
To compare the performance of different approaches to curiosity, we tested them in two visually rich 3D environments: ViZDoom and DMLab. In those environments, the agent was tasked with various problems like searching for a goal in a maze or collecting good objects and avoiding bad ones. The DMLab environment happens to provide the agent with a laser-like science fiction gadget. The standard setting in previous work on DMLab was to equip the agent with this gadget for all tasks; if the agent does not need the gadget for a particular task, it is free not to use it. Interestingly, similar to the noisy TV experiment described above, the surprise-based ICM method actually uses this gadget a lot even when it is useless for the task at hand! When tasked with searching for a high-reward item in the maze, it instead prefers to spend time tagging walls, because this yields a lot of “surprise” reward. Theoretically, predicting the result of tagging should be possible, but in practice it is too hard, as it apparently requires deeper knowledge of physics than is available to a standard agent.
The surprise-based ICM method persistently tags the wall instead of exploring the maze.
Our method instead learns reasonable exploration behaviour under the same conditions. This is because it does not try to predict the result of its actions, but rather seeks observations which are “harder” to achieve from those already in the episodic memory. In other words, the agent implicitly pursues goals which require more effort to reach from memory than just a single tagging action.
Our method shows reasonable exploration.
It is interesting to see that our approach to granting reward penalizes an agent running in circles. This is because after completing the first circle the agent does not encounter new observations other than those in memory, and thus receives no reward:
Our reward visualization: red means negative reward, green means positive reward. Left to right: map with rewards, map with locations currently in memory, first-person view.
At the same time, our method favors good exploration behavior:
Our reward visualization: red means negative reward, green means positive reward. Left to right: map with rewards, map with locations currently in memory, first-person view.
We hope that our work will help lead to a new wave of exploration methods, going beyond surprise and learning more intelligent exploration behaviours. For an in-depth analysis of our method, please take a look at the preprint of our research paper.

Acknowledgements:
This project is a result of a collaboration between the Google Brain team, DeepMind and ETH Zürich. The core team includes Nikolay Savinov, Anton Raichuk, Raphaël Marinier, Damien Vincent, Marc Pollefeys, Timothy Lillicrap and Sylvain Gelly. We would like to thank Olivier Pietquin, Carlos Riquelme, Charles Blundell and Sergey Levine for the discussions about the paper. We are grateful to Indira Pasko for the help with illustrations.

References:
[1] "Count-Based Exploration with Neural Density Models", Georg Ostrovski, Marc G. Bellemare, Aaron van den Oord, Remi Munos
[2] "#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning", Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
[3] "Unsupervised Learning of Goal Spaces for Intrinsically Motivated Goal Exploration", Alexandre Péré, Sébastien Forestier, Olivier Sigaud, Pierre-Yves Oudeyer
[4] "VIME: Variational Information Maximizing Exploration", Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel

Source: Google AI Blog


Strike a pose with Pixel 3

With Pixel, we want to give you a camera that you can always trust and rely on. That means a camera that is fast, can take photos in any light and has built-in intelligence to capture those moments that only happen once. The camera should also give you a way to get creative with your photos and videos and make them easy to edit and share.

To celebrate Pixel 3 hitting the shelves in the US today, here are 10 things you can do with the Pixel camera.

1. Just point and shoot!

The Pixel camera has HDR+ on by default, which uses computational photography to help you take better pictures in scenes with a range of brightness levels. When you press the shutter button, HDR+ actually captures a rapid burst of pictures, then quickly combines them into one. This improves results in both low-light and high-dynamic-range situations.
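As a heavily simplified illustration of why a burst helps (this is not the actual HDR+ algorithm, which performs robust tile-based alignment, merging and tone mapping), averaging several already-aligned noisy frames cuts noise on its own:

```python
import numpy as np

def merge_burst(frames):
    """Grossly simplified stand-in for burst merging: average a stack of
    already-aligned frames to reduce noise. Real HDR+ does much more."""
    stack = np.stack([f.astype(np.float32) for f in frames])
    return stack.mean(axis=0)

# Averaging N frames reduces random noise roughly by a factor of sqrt(N),
# which is why a burst of short exposures can beat one long exposure in low light.
burst = [np.random.poisson(40, size=(8, 8)) for _ in range(8)]   # fake noisy frames
merged = merge_burst(burst)
```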

2. Top Shot

Get the best shot on the first try. When you take a motion photo, Top Shot captures alternate high-quality shots, then recommends the best one—even if it’s not exactly when you hit the shutter. Behind the scenes, Top Shot looks for those shots where everyone is smiling, with eyes open and facing the camera. Just tap the thumbnail after you take a picture and you’ll get a suggestion to choose a better picture when one is available. You can also find top shots on photos whenever you want by swiping up on the photo in Google Photos. Top Shot works best on people and is getting better all the time.

Top Shot

Top Shot on Pixel 3 

3. Night Sight

In low light scenes when you'd typically use flash—but don't want to because it makes a big scene, blinds your friends, and leaves harsh, uneven lighting—Night Sight can help you take colorful, detailed and low-noise pictures in super low light. Night Sight is coming soon to Pixel. 

4. Super Res Zoom

Pixel 3 lets you zoom in and still get sharp, detailed images. Fun fact: this works by taking advantage of the natural shaking of your hand when you take a photo. For every zoomed shot, we combine a burst of slightly different images, resulting in better resolution and lower noise. So when you pinch-zoom before pressing the shutter, you’ll get noticeably more detail in your picture than if you crop afterwards.

5. Group Selfie Cam

If you’re having trouble fitting everyone in the shot, or you want the beautiful scenery as well as your beautiful face, try our new wide-angle lens that lets you get much more in your selfie. You can get up to 184% more in the shot* (11 people is my own personal record). Wide-angle lenses fit more people in the shot, but they also stretch and distort faces near the edge. The Pixel camera uses AI to correct this, so every face looks natural and you can use the full field of view of the selfie cam.

6. Photobooth

You spend ages getting the selfie at precisely the right angle, but then you try to reach the shutter button and lose the frame. Photobooth mode lets you take photos without pressing the shutter button: simply smile, poke your tongue out, or pucker those lips.

7. Playground

Bring more of your imagination to a scene with Playmoji—augmented reality characters that react to each other and to you—and add animated stickers and fun captions to your photos and videos. Playground also works on the front camera, so you can up your selfie game by standing next to characters you love, like Iron Man from the Marvel Cinematic Universe.

Playground on Pixel 3

Playground on Pixel 3 helps you create and play with the world around you

8. Google Lens Suggestions

Just point the Pixel 3 camera at contact info, URLs, and barcodes and it’ll automatically suggest things to do like calling the number, or sending an email. This all happens without you having to type anything and Lens will show the suggestions even when you’re offline. It’s particularly helpful with business cards, movie posters, and takeout menus.

9. Portrait Mode

Our improved Portrait Mode on Pixel is designed to give you even sharper and more beautiful images this year. Plus we’ve added some fun editing options in Google Photos—like being able to change the blurriness of the background, or change the part of the picture in focus after you’ve taken it. Google Photos can also make the subject of your photo pop by leaving them in color, while changing the background to black and white.

Portrait Mode

Portrait Mode and color pop with Pixel 3 and Google Photos

10. Smooth video

We’ve added new selfie video stabilization so now you can get super smooth video from the front or back cameras. And if you’re recording someone or something that is moving, just tap on them and the video will lock on the subject as they, or you, move—so you don’t lose focus.

Finally, if you’re a pro photographer, we’ve added a bunch of new features to help you manage your photography, from the ability to export RAW, to external mic support, to synthetic fill flash, which mimics professional lighting equipment to bring a beautiful glow to your pictures.

Once you’ve taken all those amazing photos and videos, Pixel comes with unlimited storage so you never get that “storage full” pop-up at a crucial moment.**

Share your pics using #teampixel so we can see what you create with Pixel 3.



*Compared to iPhone Xs

**Free, unlimited online original-quality storage for photos/videos uploaded from Pixel 3 to Google Photos through 1/31/2022, and those photos/videos will remain free at original quality. g.co/help/photostorage

A new course to teach people about fairness in machine learning

In my undergraduate studies, I majored in philosophy with a focus on ethics, spending countless hours grappling with the notion of fairness: both how to define it and how to effect it in society. Little did I know then how critical these studies would be to my current work on the machine learning education team where I support efforts related to the responsible development and use of AI.


As ML practitioners build, evaluate, and deploy machine learning models, they should keep fairness considerations (such as how different demographics of people will be affected by a model’s predictions) in the forefront of their minds. Additionally, they should proactively develop strategies to identify and ameliorate the effects of algorithmic bias.


To help practitioners achieve these goals, Google’s engineering education and ML fairness teams developed a 60-minute self-study training module on fairness, which is now available publicly as part of our popular Machine Learning Crash Course (MLCC).

ML bias

The MLCC Fairness module explores how human biases affect data sets. For example, people asked to describe a photo of bananas may not remark on their color (“yellow bananas”) unless they perceive it as atypical.

Students who complete this training will learn:

  • Different types of human biases that can manifest in machine learning models via data
  • How to identify potential areas of human bias in data before training a model
  • Methods for evaluating a model’s predictions not just for overall performance, but also for bias
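As one small example in the spirit of that last point, the sketch below compares a model's true positive rate across two demographic slices; a large gap between groups is one common signal of bias. The data and group labels are entirely hypothetical.

```python
import numpy as np

def true_positive_rate_by_group(y_true, y_pred, groups):
    """Compare recall (true positive rate) across demographic slices: a large
    gap is one signal that a model's errors fall unevenly on one group."""
    rates = {}
    for g in np.unique(groups):
        mask = (groups == g) & (y_true == 1)
        rates[g] = float(y_pred[mask].mean()) if mask.any() else float("nan")
    return rates

# Hypothetical labels, predictions and group membership for illustration only.
y_true = np.array([1, 1, 0, 1, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 0, 0])
groups = np.array(["a", "a", "a", "a", "b", "b", "b", "b"])
print(true_positive_rate_by_group(y_true, y_pred, groups))
# e.g. {'a': 0.667, 'b': 0.5} -> positives in group "b" are missed more often
```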

In conjunction with the release of this new Fairness module, we’ve added more than a dozen new fairness entries to our Machine Learning Glossary (tagged with a scale icon in the right margin). These entries provide clear, concise definitions of the key fairness concepts discussed in our curriculum, designed to serve as a go-to reference for both beginners and experienced practitioners. We also hope these glossary entries will help further socialize fairness concerns within the ML community.


We’re excited to share this module with you, and hope that it provides additional tools and frameworks that aid in building systems that are fair and inclusive for all. You can learn more about our work in fairness and on other responsible AI practices on our website.

Pixel 3 and on-device AI: Putting superpowers in your pocket

Last week we announced Pixel 3 and Pixel 3 XL, our latest smartphones that combine the best of Google’s AI, software, and hardware to deliver radically helpful experiences. AI is a key ingredient in Pixel that unlocks new, useful capabilities, dramatically changing how we interact with our phones and the world around us.

But what exactly is AI?

Artificial intelligence (AI) is a fancy term for all the technology that lets our devices learn by example and act a bit smarter, from understanding written or spoken language to recognizing people and objects in images. AI is built by “training” machine learning models—a computer learns patterns from lots of example data, and uses these patterns to generate predictions. We’ve built one of the most secure and robust cloud infrastructures for processing this data to make our products smarter. Today, AI helps with everything from filtering spam emails in Gmail to getting answers on Google Search.
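As a toy example of "learning patterns from examples" (using scikit-learn and made-up data, purely for illustration), the few lines below train a tiny spam filter from four example emails and then use the learned patterns to classify new ones:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A handful of made-up example emails and labels (1 = spam, 0 = not spam).
emails = [
    "win a free prize now", "limited offer, claim your reward",
    "meeting notes attached", "lunch tomorrow?",
]
labels = [1, 1, 0, 0]

# The model learns word patterns from the examples...
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)

# ...and uses those patterns to generate predictions on new data.
print(model.predict(["claim your free reward", "notes from the meeting"]))
# expected: [1 0]
```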

What is AI

Machine learned models in the cloud are a secure way to make Google products smarter over time.

Bringing the best AI experiences to Pixel 3 involved some re-thinking from the ground up. Our phones are powerful computers with multiple sensors which enable new helpful and secure experiences when data is processed on your device. These AI-powered features can work offline and don’t require a network connection. And they can keep data on device, private to you. With Pixel 3, we complement our traditional approach to AI, where machine learning and data processing is done in the cloud, with reliable, accessible AI on device, when you’re on the go.

AI on device

The most powerful machine learning models can now run directly on your Pixel to power fast experiences which work even when you’re offline.

Benefits of on-device AI

We’ve been working to miniaturize AI models to bring the power of machine learning and computing in the cloud directly to your Pixel. With on-device AI, new kinds of experiences become possible—that are lightning fast, are more battery efficient, and keep data on your device. We piloted this technology last year with Now Playing, bringing automatic music recognition to Pixel 2. This year, your Phone app and camera both use on-device AI to give you new superpowers, allowing you to interact more seamlessly with the world around you.

AI benefits

On-device AI works without having to go back to a server and consumes less of your battery life.

Take Call Screen, a new feature in the Phone app, initially launching in English in the U.S., where the Google Assistant helps you screen calls, including those from unknown or unrecognized numbers. Anytime you receive an incoming call, just tap the “Screen Call” button and on-device speech recognition transcribes the conversation from the caller (who is calling? why are they calling?) so you can decide whether to pick up, hang up, or mark the call as spam and block it. Because everything happens on your device, neither the audio nor the transcript from a screened call is sent to anyone other than you.

AI Call Screen

Call Screen uses on-device speech recognition to transcribe the caller’s responses in real time, without sending audio or transcripts off your phone.

This year’s Pixel camera helps you capture great moments and do more with what you see by building on-device AI right into your viewfinder. New low-power vision models can recognize facial expressions, objects, and text without having to send images off your device. Photobooth mode is powered by an image-scoring model that analyzes facial expressions and photo quality in real time, automatically capturing smiles and funny faces so you can take selfies without having to reach for the shutter button. Top Shot uses the same kind of image analysis to suggest great, candid moments from a motion photo—recommending alternative shots in HDR+.

Playground creates an intelligent AR experience by using AI models to recommend Playmoji, stickers, and captions so that you can express yourself based on the scene you’re in. And without having to take a photo at all, image recognition lets you act on info from the world around you—surfacing Google Lens suggestions to call phone numbers or show website addresses—right from your camera.

Pixel 3 is just the beginning. We want to empower people with new AI-driven abilities. With our advances in on-device AI, we can develop new, helpful experiences that run right on your phone and are fast, efficient, and private to you.

The Applied Computing Series gets college students into computer science

What do fighting wildfires, searching for dogs in photos and using portrait mode on your phone have in common? Data science and machine learning. Experts across a range of businesses and industries are using data to give machines the ability to “learn” and complete tasks.


But as the field of data science rapidly grows, workforce projections show that there isn’t enough new talent to meet increasing demand for these roles, especially in machine learning. Given the nationwide scarcity of computer science faculty, we’ve been thinking about how to give students a hands-on computer science education without CS Ph.D. educators.


At a handful of colleges across the country, we’re piloting the Applied Computing Series (ACS): two college-level introductory computer science and data science courses and a machine learning intensive. The Series will help students understand how to use the best available tools to manipulate and understand data and then solve critical business problems.


Students at Bay Path University learning Python programming as part of our first ACS cohort of universities.


The machine learning intensive is meant for students who have already taken introductory computer science classes and who want to pursue more advanced coursework. The intensive will ultimately prepare them for opportunities as data engineers, technical program managers, or data analysts in industries ranging from healthcare to insurance to entertainment and media. Through partnerships with colleges and universities, we provide industry-relevant content and projects; and colleges and universities provide experienced faculty to lead in-class project work and provide coaching for students.


The Applied Computing courses are currently available to students at eight colleges and universities: Adrian College, Agnes Scott College, Bay Path University, Heidelberg University, Holy Names University, Lasell College, SUNY Buffalo State, and Sweet Briar College. If you’re a university and want to apply to be a site for the Applied Computing courses in the fall of 2019, find out more on our website.


The machine learning intensive will start in February 2019 at Mills College and again during the summer session at Agnes Scott College, Bay Path University, Heidelberg University and Scripps College and is open for applications from all U.S. students. If you’re a student who has already completed college-level computer and/or data science coursework and want to apply for the machine learning intensive, learn more at our website.