
Five things you (maybe) didn’t know about AI

While there’s plenty of information out there on artificial intelligence, it’s not always easy to distinguish fact from fiction or find explanations that are easy to understand. That’s why we’ve teamed up with Google to create The A to Z of AI. It’s a series of simple, bite-sized explainers to help anyone understand what AI is, how it works and how it’s changing the world around us. Here are a few things you might learn:


A is for Artificial Intelligence

1. AI is already in our everyday lives. 

You’ve probably interacted with AI without even realizing it. If you’ve ever searched for a specific image in Google Photos, asked a smart speaker about the weather or been rerouted by your car’s navigation system, you’ve been helped by AI. Those examples might feel obvious, but there are many other ways it plays a role in your life that you might not realize. AI is also helping solve some bigger, global challenges. For example, there are apps that use AI to help farmers identify issues with crops. And there are now systems that can examine citywide traffic information in real time to help people plan their driving routes efficiently.


C is for Climate

2. AI is being used to help tackle the global climate crisis. 

AI offers us the ability to process large volumes of data and uncover patterns—an invaluable aid when it comes to climate change. One common use case is AI-powered systems that help people regulate the amount of energy they use by turning off the heating and lights when they leave the house. AI is also helping to model glacier melt and predict rising sea levels so that effective action can be taken. Researchers are also considering the environmental impact of data centers and AI computing itself by exploring how to develop more energy-efficient systems and infrastructure.


D is for Datasets

3. AI learns from examples in the real world.

Just as a child learns through examples, the same is true of machine learning algorithms. And that’s what datasets are: large collections of examples, like weather data, photos or music, that we can use to train AI. Due to their scale and complexity (think of a dataset made up of extensive maps covering the whole of the known solar system), datasets can be very challenging to build and refine. For this reason, AI design teams often share datasets for the benefit of the wider scientific community, making it easier to collaborate and build on each other's research.


F is for Fakes

4. AI can help our efforts to spot deepfakes.

“Deepfakes” are AI-generated images, speech, music or videos that look real. They work by studying existing real-world imagery or audio, mapping them in detail, then manipulating them to create works of fiction that are disconcertingly true to life. However, there are often some telltale signs that distinguish them from reality; in a deepfake video, voices might sound a bit robotic, or characters may blink less or repeat their hand gestures. AI can help us spot these inconsistencies.


Y is for You

5. It’s impossible to teach AI what it means to be human. 

As smart as AI is (and will be), it won’t be able to understand everything that humans can. In fact, you could give an AI system all the data in the world and it still wouldn’t reflect, or understand, every human being on the planet. That’s because we’re complex, multidimensional characters that sit outside the data that machines use to make sense of things. AI systems are trained and guided by humans. And it’s up to each person to choose how they interact with AI systems and what information they feel comfortable sharing. You decide how much AI gets to learn about you.

For 22 more bite-sized definitions, visit https://atozofai.withgoogle.com


Alfred Camera: Smart camera features using MediaPipe

Guest post by the Engineering team at Alfred Camera

Please note that the information, uses, and applications expressed in the below post are solely those of our guest author, Alfred Camera.

In this article, we’d like to give you a short overview of Alfred Camera and our experience of using MediaPipe to transform our moving object feature, and how MediaPipe has made it easier for us to achieve our goals.

What is Alfred Camera?

Fig.1 Alfred Camera logo

Alfred Camera is a smart home app for both Android and iOS devices, with over 15 million downloads worldwide. By downloading the app, users can turn their spare phones into security cameras and monitors, which allows them to watch their homes, shops or pets anytime. The mission of Alfred Camera is to provide affordable home security so that everyone can find peace of mind in this busy world.

The Alfred Camera team is composed of professionals in various fields, including an engineering team with several machine learning and computer vision experts. Our aim is to integrate AI technology into devices that are accessible to everyone.

Machine Learning in Alfred Camera

Alfred Camera currently has a feature called Moving Object Detection, which continuously uses the device’s camera to monitor a target scene. Once it identifies a moving object in the area, the app will begin recording the video and send notifications to the device owner. The machine learning models for detection are hand-crafted and trained by our team using TensorFlow, and run on TensorFlow Lite with good performance even on mid-tier devices. This is important because the app is leveraging old phones and we'd like the feature to reach as many users as possible.

The Challenges

We started building AI features at Alfred Camera in 2017. To have a solid foundation that could support our AI feature requirements for the coming years, we decided to rebuild our real-time video analysis pipeline. At the beginning of the project, the goals were to create a new pipeline that would be 1) modular enough that we could swap core algorithms with minimal changes to other parts of the pipeline, 2) designed with GPU acceleration in place, and 3) as cross-platform as possible, so there would be no need to create and maintain separate implementations for different platforms. With those goals in mind, we surveyed several open source projects that had potential, but ended up using none of them because they either fell short on features or did not provide the readiness and stability we were looking for.

We started with a small team to prototype against those goals for the Android platform. What came later were some tough challenges well beyond what we had originally anticipated. We ran into several major design changes because some key design basics had been overlooked. We needed to implement utilities to do things that sounded trivial but required significant effort to make right and fast. Dealing with asynchronous processing also led us into a number of timing issues that took the team considerable effort to address. Not to mention that debugging on real devices was extremely inefficient and painful.

The challenges didn’t stop there. Our product is also on iOS, so we had to tackle them all over again. Moreover, discrepancies in behavior between the platform-specific implementations introduced additional issues that we needed to resolve.

Even though we finally managed to get the implementations to the confidence level we wanted, it was not a pleasant experience, and we never stopped wondering whether there was a better option.

MediaPipe - A Game Changer

Google open sourced the MediaPipe project in June 2019, and it immediately caught our attention. We were surprised by how perfectly it aligned with the goals we had set, and by how it offered functionality that we could not have developed ourselves with the engineering resources we had as a small company.

We immediately decided to start an evaluation project by building a new product feature directly using MediaPipe to see if it could live up to all the promises.

Migrating to MediaPipe

To start the evaluation, we decided to migrate our existing moving object feature to see what exactly MediaPipe can do.

Our current Moving Object Detection pipeline consists of the following main components:

  • (Moving) Object Detection Model
    As explained earlier, a TensorFlow Lite model trained by our team, tailored to run on mid-tier devices.
  • Low-light Detection and Low-light Filter
    Calculates the average luminance of the scene and, based on the result, conditionally processes incoming frames to intensify pixel brightness so our users can see things in the dark. We also use this result to decide whether detection should run at all, because the moving object detection model does not work properly on frames that have been processed by the filter.
  • Motion Detection
    Sending frames through Moving Object Detection consumes a significant amount of power even with a small model like ours, and running inference continuously makes little sense because most of the time there is no moving object in front of the camera. We therefore implemented a gating mechanism in which frames are only sent to the Moving Object Detection model when movement is detected in the scene. The detection works mainly by calculating the differences between two frames, with some additional tricks that also take into account the movement detected in the few preceding frames (see the sketch after this list).
  • Area of Interest
    This is a mechanism that lets users manually mask out areas they do not want the camera to watch. It can also be generated automatically from the regional luminance produced by the aforementioned low-light detection component.
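To make the gating idea concrete, here is a minimal sketch of motion-gated detection using OpenCV and NumPy. This is not Alfred Camera's actual implementation; the threshold, history length and overall structure are illustrative assumptions.

```python
import collections
import cv2
import numpy as np

MOTION_THRESHOLD = 8.0   # illustrative value; would be tuned per camera in practice
HISTORY_LENGTH = 5       # how many recent frame differences to remember

recent_scores = collections.deque(maxlen=HISTORY_LENGTH)
prev_gray = None

def should_run_detection(frame_bgr):
    """Return True when recent inter-frame differences suggest motion."""
    global prev_gray
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)  # suppress sensor noise
    if prev_gray is None:
        prev_gray = gray
        return False
    # Mean absolute difference between consecutive frames.
    score = float(np.mean(cv2.absdiff(gray, prev_gray)))
    prev_gray = gray
    recent_scores.append(score)
    # Gate on the recent history, not just the latest frame.
    return max(recent_scores) > MOTION_THRESHOLD
```

In a flow like this, frames would only be forwarded to the object detection model when the function returns True, which is what saves power when nothing is moving in front of the camera.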

Our implementation takes the GPU into consideration as much as possible. A series of shaders performs the tasks above, and the pipeline is designed to avoid moving pixels between the CPU and GPU frequently, eliminating potential performance hits.

The pipeline involves multiple ML models that are conditionally executed, mixed CPU/GPU processing, etc. All the challenges here make it a perfect showcase for how MediaPipe could help develop a complicated pipeline.

Playing with MediaPipe

MediaPipe provides a lot of code samples for any developer to bootstrap with. We started with the Object Detection on Android sample that comes with the project because of its similarity to the back-end part of our pipeline. It did take us some time to fully understand MediaPipe’s design concepts and the associated tools, but with the thorough documentation and the great responsiveness of the MediaPipe team, we soon got up to speed and could do most of the things we wanted.

That being said, there were a few challenges we needed to overcome on the road to full migration. Our original Moving Object Detection pipeline takes input frames asynchronously, but MediaPipe’s timestamp-bound processing means we cannot simply display results out of step with the frames. We also need to gather data through JNI in a specific data format. We came up with a workaround, described below, that addressed all of these issues.

After wrapping our models and processing logic into calculators and wiring them up, we successfully transformed our existing implementation and created our first MediaPipe Moving Object Detection pipeline, shown in the figure below, running on Android devices:

Fig.2 Moving Object Detection Graph

We do not block video frames in the main calculation loop; instead, the detection result is set as an input stream to show the annotation on the screen. The graph is designed as a multi-function process: the left chunk is the debug annotation and video frame output module, while the rest of the graph performs the remaining calculations, e.g., low-light detection, motion-triggered detection, cropping of the area of interest and the detection process itself. In this way, the graph naturally separates into real-time display and asynchronous calculation.

As a result, we are able to complete full detection processing in under 40 ms on a device with a Snapdragon 660 chipset. MediaPipe’s tight integration with TensorFlow Lite gives us the flexibility to gain even more performance by leveraging whatever acceleration (GPU or DSP) is available on the device.

The following figure shows the current implementation working in action:

Fig.3 Moving Object Detection running in Alfred Camera

After getting things to run on Android, desktop GPU (OpenGL ES) emulation was our next target to evaluate. We already use OpenGL ES shaders for some computer vision operations in our pipeline, so being able to develop an algorithm on the desktop and see it work in action before deploying it to mobile platforms is a huge benefit to us. The feature was not ready when the project was first released, but the MediaPipe team soon added desktop GPU emulation support for Linux in follow-up releases to make this possible. We have used this capability to detect and fix issues in our graphs even before putting them on mobile devices. Although it currently only works on Linux, it is still a big leap forward for us.

Testing the algorithms and making sure they behave as expected is also a challenge for a camera application. MediaPipe helps us simplify this by accepting pre-recorded MP4 files as input, so we can verify the behavior simply by replaying the files, as in the sketch below. There is also built-in profiling support that makes it easy for us to locate potential performance bottlenecks.
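As an illustration of this file-based testing flow (a generic sketch, not MediaPipe's own test harness or API), frames from a pre-recorded clip can be replayed through the same processing callback used for the live camera; `process_frame` is a placeholder for the pipeline under test.

```python
import cv2

def replay_video(path, process_frame):
    """Feed every frame of a pre-recorded MP4 through the pipeline callback."""
    capture = cv2.VideoCapture(path)
    frame_index = 0
    while True:
        ok, frame = capture.read()
        if not ok:  # end of file or unreadable frame
            break
        process_frame(frame_index, frame)
        frame_index += 1
    capture.release()
    return frame_index

# Example: count frames that would trigger detection, reusing the
# hypothetical should_run_detection() gate from the earlier sketch.
# triggered = []
# replay_video("test_clip.mp4",
#              lambda i, f: triggered.append(i) if should_run_detection(f) else None)
```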

MediaPipe - Exactly What We Were Looking For

The result of the evaluation and the feedback from our engineering team were very positive and promising:

  1. We are able to design/verify the algorithm and complete core implementations directly in the desktop emulation environment, and then migrate to the target platforms with minimal effort. As a result, the complexity of debugging on real devices is greatly reduced.
  2. MediaPipe’s modular design of graphs/calculators enables us to better split up the development across different engineers/teams, try out new pipeline designs easily by rewiring the graph, and test the building blocks independently to ensure quality before we put things together.
  3. MediaPipe’s cross-platform design maximizes the reusability and minimizes fragmentation of the implementations we created. Not only are the efforts required to support a new platform greatly reduced, but we are also less worried about the behavior discrepancies on different platforms due to different interpretations of the spec from platform engineers.
  4. Built-in graphics utilities and profiling support saved us a lot of time creating those common facilities and making them right, and we could be more focused on the key designs.
  5. Tight integration with TensorFlow Lite really saves lots of effort for a company like us that heavily depends on TensorFlow, and it still gives us the flexibility to easily interface with other solutions.

After just a few weeks of working with MediaPipe, it has shown strong potential to fundamentally transform how we develop our products. Without MediaPipe, we could have spent months building the same features and still not reached the same level of performance.

Summary

Alfred Camera is designed to bring home security with AI to everyone, and MediaPipe has made achieving that goal significantly easier for our team. From Moving Object Detection to future AI-powered features, we are focused on transforming a basic security camera use case into a smart housekeeper that can provide even more of the context our users care about. With the support of MediaPipe, we have been able to accelerate our development process and bring features to market at unprecedented speed. Our team is really excited about how MediaPipe could help us progress and discover new possibilities, and we look forward to the enhancements that are yet to come to the project.

The new tool helping Asian newsrooms detect fake images

Journalists and fact-checkers face huge challenges in sorting accurate information from fast-spreading misinformation. But it’s not just about the words we read. Viral images and memes flood our feeds and chats, and often they’re out-of-context or fake. In Asia, where there are eight times more social media users than in North America, these issues are magnified.  


There are existing tools that Asian journalists can use to discover the origins and trustworthiness of news images, but they’re relatively old, inconsistent and for the most part only available on desktop. That’s a barrier for fact-checkers and journalists in countries where most people connect to the internet on their mobile. 


For the past two years, the Google News Initiative has worked with journalists to identify manipulated images using technology. At the 2018 Trusted Media Summit in Singapore, a group of experts from Google, Storyful and the broader news industry joined a design sprint to develop a new tool, taking advantage of artificial intelligence and optimized for mobile. With support from the Google News Initiative, the GNI Cloud Program and volunteer Google engineers, the resulting prototype has now been developed into an app called Source, powered by Storyful.

With the app now being used by journalists around the region, we asked Eamonn Kennedy, Storyful’s Chief Product Officer, to tell us a bit more. 


What does Storyful see as the challenges facing journalists and fact-checkers around the world and in Asia in particular?

[Eamonn Kennedy] Sharing on social often happens based on impulse rather than full analysis. Anybody can share a story with thousands of people before they even finish reading what is being said. Bad actors know this and bet on people’s emotions. They’re willing to exploit the free reach of social platforms and pollute conversations with false facts and narratives, including extremist content. For fact-checkers, that means any given conversation is vulnerable to lies and manipulation from anywhere in the world, at any time. 

Can you tell us a bit about the process for developing Source, and how AI helped solve some of the problems?

[EK] At Storyful, we see old, inaccurate or modified images being reshared to push a misleading narrative in news cycles big and small. 

The common way of tackling this for journalists is to use reverse image search to prove that the image is old and has been re-used—but that has a couple of challenges. First, these repurposed images are frequently tampered with, and the journalist needs to be able to identify the manipulation to have the best chance of finding the original. Second, search results are ordered by the most recent, whereas journalists tend to be interested in older results, so that means a lot of scrolling to find the original.

Source uses Google's AI technology to give instant access to an image's public history, allowing you to sort, analyze and understand its provenance, including any manipulation. That’s already useful but it goes a step further. Source helps detect and translate text in images too, which is especially useful for journalists cataloguing or analyzing memes online.

The Source app improves journalists’ ability to verify the origins or authenticity of a particular image and source how a meme evolved. 

How are newsrooms using Source and what are the plans for it in 2020?    

[EK] So far, 130 people from 17 different countries have used the app to check the provenance of images on social media, messaging apps and news sites. It’s been especially good to see that 30 percent of Source users are accessing the site on their mobile, and that our largest base of users is in India, where members of the Digital News Publishers Association—a coalition of leading media companies dedicated to fighting misinformation—have provided important feedback. 

Looking forward, we’ve been listening to fact-checkers as we think about how to build version two of the app. We know Source has been used to interrogate frames from a video, for example, which shows there’s potential to take it beyond just text and images. The ultimate aim would be to build a “toolbox” of public fact-checking resources, with Source at the center, using Google’s AI to support journalists around the world. 


AutoFlip: An Open Source Framework for Intelligent Video Reframing

Originally posted on the AI Blog

Videos filmed and edited for television and desktop are typically created and viewed in landscape aspect ratios (16:9 or 4:3). However, with an increasing number of users creating and consuming content on mobile devices, historical aspect ratios don’t always fit the display being used for viewing. Traditional approaches for reframing video to different aspect ratios usually involve static cropping, i.e., specifying a camera viewport, then cropping visual contents that are outside. Unfortunately, these static cropping approaches often lead to unsatisfactory results due to the variety of composition and camera motion styles. More bespoke approaches, however, typically require video curators to manually identify salient contents on each frame, track their transitions from frame-to-frame, and adjust crop regions accordingly throughout the video. This process is often tedious, time-consuming, and error-prone.

To address this problem, we are happy to announce AutoFlip, an open source framework for intelligent video reframing. AutoFlip is built on top of the MediaPipe framework that enables the development of pipelines for processing time-series multimodal data. Taking a video (casually shot or professionally edited) and a target dimension (landscape, square, portrait, etc.) as inputs, AutoFlip analyzes the video content, develops optimal tracking and cropping strategies, and produces an output video with the same duration in the desired aspect ratio.
Left: Original video (16:9). Middle: Reframed using a standard central crop (9:16). Right: Reframed with AutoFlip (9:16). By detecting the subjects of interest, AutoFlip is able to avoid cropping off important visual content.

AutoFlip Overview

AutoFlip provides a fully automatic solution to smart video reframing, making use of state-of-the-art ML-enabled object detection and tracking technologies to intelligently understand video content. AutoFlip detects changes in the composition that signify scene changes in order to isolate scenes for processing. Within each shot, video analysis is used to identify salient content before the scene is reframed by selecting a camera mode and path optimized for the contents.

Shot (Scene) Detection

A scene or shot is a continuous sequence of video without cuts (or jumps). To detect the occurrence of a shot change, AutoFlip computes the color histogram of each frame and compares this with prior frames. If the distribution of frame colors changes at a different rate than a sliding historical window, a shot change is signaled. AutoFlip buffers the video until the scene is complete before making reframing decisions, in order to optimize the reframing for the entire scene.
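As a rough illustration of this idea (a simplified Python approximation, not AutoFlip's C++ implementation), the sketch below compares each frame's color histogram against the average over a sliding window of prior frames and signals a cut when the distance spikes; the window length and threshold are assumed values.

```python
import collections
import cv2
import numpy as np

WINDOW = 30          # sliding window of recent histograms (assumed)
CUT_THRESHOLD = 0.5  # chi-square distance that signals a cut (assumed)

def frame_histogram(frame_bgr):
    """3D BGR color histogram, normalized so distances are comparable."""
    hist = cv2.calcHist([frame_bgr], [0, 1, 2], None,
                        [8, 8, 8], [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def detect_shot_changes(frames):
    """Yield indices of frames where a shot (scene) change is detected."""
    history = collections.deque(maxlen=WINDOW)
    for index, frame in enumerate(frames):
        hist = frame_histogram(frame)
        if history:
            reference = np.mean(history, axis=0).astype(np.float32)
            distance = cv2.compareHist(reference, hist, cv2.HISTCMP_CHISQR)
            if distance > CUT_THRESHOLD:
                yield index
                history.clear()  # start accumulating the new scene
        history.append(hist)
```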

Video Content Analysis

We utilize deep learning-based object detection models to find interesting, salient content in the frame. This content typically includes people and animals, but other elements may be identified, depending on the application, including text overlays and logos for commercials, or motion and ball detection for sports.

The face and object detection models are integrated into AutoFlip through MediaPipe, which uses TensorFlow Lite on CPU. This structure allows AutoFlip to be extensible, so developers may conveniently add new detection algorithms for different use cases and video content. Each object type is associated with a weight value, which defines its relative importance — the higher the weight, the more influence the feature will have when computing the camera path.


Left: People detection on sports footage. Right: Two face boxes (‘core’ and ‘all’ face landmarks). In narrow portrait crop cases, often only the core landmark box can fit.

Reframing

After identifying the subjects of interest on each frame, logical decisions about how to reframe the content for a new view can be made. AutoFlip automatically chooses an optimal reframing strategy — stationary, panning or tracking — depending on the way objects behave during the scene (e.g., moving around or stationary). In stationary mode, the reframed camera viewport is fixed in a position where important content can be viewed throughout the majority of the scene. This mode can effectively mimic professional cinematography in which a camera is mounted on a stationary tripod or where post-processing stabilization is applied. In other cases, it is best to pan the camera, moving the viewport at a constant velocity. The tracking mode provides continuous and steady tracking of interesting objects as they move around within the frame.

Based on which of these three reframing strategies the algorithm selects, AutoFlip then determines an optimal cropping window for each frame, while best preserving the content of interest. While the bounding boxes track the objects of focus in the scene, they typically exhibit considerable jitter from frame-to-frame and, consequently, are not sufficient to define the cropping window. Instead, we adjust the viewport on each frame through the process of Euclidean-norm optimization, in which we minimize the residuals between a smooth (low-degree polynomial) camera path and the bounding boxes.
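As a simplified stand-in for that optimization (not AutoFlip's actual solver), one can fit a low-degree polynomial to the per-frame box centers by least squares, which minimizes the Euclidean norm of the residuals between the smooth camera path and the jittery detections:

```python
import numpy as np

def smooth_camera_path(box_centers, degree=2):
    """Fit a low-degree polynomial camera path to jittery box centers.

    box_centers: array of shape (num_frames, 2) with (x, y) per frame.
    Returns an array of the same shape giving smoothed viewport centers.
    """
    centers = np.asarray(box_centers, dtype=np.float64)
    t = np.arange(len(centers))
    smoothed = np.empty_like(centers)
    for axis in range(2):  # fit x and y independently
        # polyfit minimizes the squared (Euclidean-norm) residuals.
        coeffs = np.polyfit(t, centers[:, axis], degree)
        smoothed[:, axis] = np.polyval(coeffs, t)
    return smoothed
```

Roughly speaking, degree 0 corresponds to a stationary viewport, degree 1 to a constant-velocity pan, and higher degrees to a path that tracks the subject more closely.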

Top: Camera paths resulting from following the bounding boxes from frame-to-frame. Bottom: Final smoothed camera paths generated using Euclidean-norm path formation. Left: Scene in which objects are moving around, requiring a tracking camera path. Right: Scene where objects stay close to the same position; a stationary camera covers the content for the full duration of the scene.

AutoFlip’s configuration graph provides settings for either best-effort or required reframing. If it becomes infeasible to cover all the required regions (for example, when they are too spread out on the frame), the pipeline will automatically switch to a less aggressive strategy by applying a letterbox effect, padding the image to fill the frame. For cases where the background is detected as being a solid color, this color is used to create seamless padding; otherwise a blurred version of the original frame is used.
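The padding behavior can be approximated as in the sketch below (an assumed illustration, not AutoFlip's code): if the frame's border pixels are nearly uniform, the leftover area is filled with that color; otherwise a blurred, stretched copy of the frame is used as the backdrop. The target dimensions and uniformity tolerance are placeholder parameters.

```python
import cv2
import numpy as np

def letterbox(frame, target_w, target_h, uniform_tolerance=8.0):
    """Fit `frame` inside target_w x target_h, padding the leftover area."""
    h, w = frame.shape[:2]
    scale = min(target_w / w, target_h / h)
    new_w, new_h = int(w * scale), int(h * scale)
    resized = cv2.resize(frame, (new_w, new_h))

    # Sample the frame's border pixels to test for a (nearly) solid background.
    border = np.concatenate([frame[0, :], frame[-1, :], frame[:, 0], frame[:, -1]])
    if border.std(axis=0).max() < uniform_tolerance:
        # Solid-color background: pad seamlessly with that color.
        canvas = np.full((target_h, target_w, 3), border.mean(axis=0), dtype=np.uint8)
    else:
        # Otherwise use a blurred, stretched copy of the frame as the backdrop.
        canvas = cv2.GaussianBlur(cv2.resize(frame, (target_w, target_h)), (51, 51), 0)

    x0 = (target_w - new_w) // 2
    y0 = (target_h - new_h) // 2
    canvas[y0:y0 + new_h, x0:x0 + new_w] = resized
    return canvas
```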

AutoFlip Use Cases

We are excited to release this tool directly to developers and filmmakers, reducing the barriers to their design creativity and reach through the automation of video editing. The ability to adapt any video format to various aspect ratios is becoming increasingly important as the diversity of devices for video content consumption continues to rapidly increase. Whether your use case is portrait to landscape, landscape to portrait, or even small adjustments like 4:3 to 16:9, AutoFlip provides a solution for intelligent, automated and adaptive video reframing.


What’s Next?

Like any machine learning algorithm, AutoFlip can benefit from an improved ability to detect objects relevant to the intent of the video, such as speaker detection for interviews or animated face detection on cartoons. Additionally, a common issue arises when input video has important overlays on the edges of the screen (such as text or logos) as they will often be cropped from the view. By combining text/logo detection and image inpainting technology, we hope that future versions of AutoFlip can reposition foreground objects to better fit the new aspect ratios. Lastly, in situations where padding is required, deep uncrop technology could provide improved ability to expand beyond the original viewable area.

While we work to improve AutoFlip internally at Google, we encourage contributions from developers and filmmakers in the open source communities.

Acknowledgments

We would like to thank our colleagues who contributed to AutoFlip: Alexander Panagopoulos, Jenny Jin, Brian Mulford, Yuan Zhang, Alex Chen, Xue Yang, Mickey Wang, Justin Parra, Hartwig Adam, Jingbin Wang, and Weilong Yang; and the MediaPipe team, who helped with open sourcing: Jiuqiang Tang, Tyler Mullen, Mogan Shieh, Ming Guang Yong, and Chuo-Ling Chang.

By Nathan Frey, Senior Software Engineer, Google Research, Los Angeles and Zheng Sun, Senior Software Engineer, Google Research, Mountain View

Applying AI to big problems: six research projects we’re supporting in public health, education, disaster prevention, and conservation

Whether it’s forecasting floods or detecting diabetic eye disease, we’re increasingly seeing people apply AI to address big challenges. In fact, we believe that some of the biggest issues of our time can be tackled with AI. This is why we’ve made research in AI for Social Good one of the key focus areas of Google Research India, the AI lab we started in Bangalore last September.


As we’re planning to explore applied research in a variety of fields, from healthcare to education, partnering closely with experts in these areas is crucial. Today, we’re kicking off support for six research projects led by organizations from India and across Asia, focusing on addressing social, humanitarian and environmental challenges with AI. Each project is a collaboration between leading academic AI researchers and a nonprofit organization with expertise in the respective area, with support from Google researchers, engineers and program managers. 


In addition to supporting these efforts with expertise in areas such as computer vision, natural language processing, and other deep learning techniques, we are also providing each team with funding and computational resources. 


  • Improving health information for high HIV/AIDS risk communities: Applying AI to identify influencers among marginalized communities at high risk of HIV/AIDS contraction, with the goal of better disseminating health information, providing services, and ultimately reducing the rate of HIV contraction.
  • Predicting risks for expectant mothers: Using AI to predict the risk of expectant mothers dropping out of healthcare programs, to improve targeted interventions and increase positive healthcare outcomes for mothers and their babies.
  • Improving consistency of healthcare information input: Applying AI to help ensure consistency in how healthcare information is captured and monitored, to enable more targeted and actionable healthcare interventions.
  • Predicting human-wildlife conflict: Using AI to predict human-wildlife conflict in the state of Maharashtra to help inform data-driven policy making.
  • Improving dam and barrage water release: Using AI to inform dam and barrage water releases, to help build early warning systems that minimize risk of disasters.
  • Supporting publishing of underserved Indian language content: Building open-source input tools for underserved Indian languages to accelerate publishing of openly licensed content.


Starting on this research journey today, we look forward to supporting academic researchers, organizations and the broader community over the coming months and years to bring these projects to life. Healthcare, conservation, education, and disaster prediction are some of the most difficult challenges of our time. As computer scientists, it’s incredibly humbling and exciting to partner with the community towards making a positive impact for people in India and around the world.

Posted by Manish Gupta, Director, Google Research Team in India and Milind Tambe, Director AI for Social Good, Google Research Team in India

MediaPipe on the Web

Posted by Michael Hays and Tyler Mullen from the MediaPipe team

MediaPipe is a framework for building cross-platform multimodal applied ML pipelines. We have previously demonstrated building and running ML pipelines as MediaPipe graphs on mobile (Android, iOS) and on edge devices like Google Coral. In this article, we are excited to present MediaPipe graphs running live in the web browser, enabled by WebAssembly and accelerated by the XNNPack ML Inference Library. By integrating this preview functionality into our web-based Visualizer tool, we provide a playground for quickly iterating over a graph design. Since everything runs directly in the browser, video never leaves the user’s computer and each iteration can be immediately tested on a live webcam stream (and soon, arbitrary video).

Figure 1: Running the MediaPipe face detection example in the Visualizer

MediaPipe Visualizer

MediaPipe Visualizer (see Figure 2) is hosted at viz.mediapipe.dev. MediaPipe graphs can be inspected by pasting graph code into the Editor tab or by uploading a graph file into the Visualizer. A user can pan and zoom into the graphical representation of the graph using the mouse and scroll wheel. The graph will also react to changes made within the editor in real time.

Figure 2: MediaPipe Visualizer, hosted at https://viz.mediapipe.dev

Demos on MediaPipe Visualizer

We have created several sample Visualizer demos from existing MediaPipe graph examples. These can be seen within the Visualizer by visiting the following addresses in your Chrome browser:

  • Edge Detection
  • Face Detection
  • Hair Segmentation
  • Hand Tracking

Each of these demos can be executed within the browser by clicking on the little running man icon at the top of the editor (it will be greyed out if a non-demo workspace is loaded):

This will open a new tab which will run the current graph (this requires a webcam).

Implementation Details

In order to maximize portability, we use Emscripten to directly compile all of the necessary C++ code into WebAssembly, which is a special form of low-level assembly code designed specifically for web browsers. At runtime, the web browser creates a virtual machine in which it can execute these instructions very quickly, much faster than traditional JavaScript code.

We also created a simple API for all necessary communications back and forth between JavaScript and C++, to allow us to change and interact with the MediaPipe graph directly from JavaScript. For readers familiar with Android development, you can think of this as a similar process to authoring a C++/Java bridge using the Android NDK.

Finally, we packaged up all the requisite demo assets (ML models and auxiliary text/data files) as individual binary data packages, to be loaded at runtime. And for graphics and rendering, we allow MediaPipe to automatically tap directly into WebGL so that most OpenGL-based calculators can “just work” on the web.

Performance

While executing WebAssembly is generally much faster than pure JavaScript, it is also usually much slower than native C++, so we made several optimizations in order to provide a better user experience. We utilize the GPU for image operations when possible, and opt for using the lightest-weight possible versions of all our ML models (giving up some quality for speed). However, since compute shaders are not widely available on the web, we cannot easily make use of TensorFlow Lite GPU machine learning inference, and the resulting CPU inference often ends up being a significant performance bottleneck. To help alleviate this, we automatically augment our “TfLiteInferenceCalculator” by having it use the XNNPack ML Inference Library, which gives us a 2-3x speedup in most of our applications.

Currently, support for web-based MediaPipe has some important limitations:

  • Only calculators in the demo graphs above may be used
  • The user must edit one of the template graphs; they cannot provide their own from scratch
  • The user cannot add or alter assets
  • The executor for the graph must be single-threaded (i.e. ApplicationThreadExecutor)
  • TensorFlow Lite inference on GPU is not supported

We plan to continue to build upon this new platform to provide developers with much more control, removing many if not all of these limitations (e.g. by allowing for dynamic management of assets). Please follow the MediaPipe tag on the Google Developers blog and the Google Developers Twitter account (@googledevs).

Acknowledgements

We would like to thank Marat Dukhan, Chuo-Ling Chang, Jianing Wei, Ming Guang Yong, and Matthias Grundmann for contributing to this blog post.

Detecting hidden signs of anemia from the eye

Beyond helping us navigate the world, the human eye can reveal signs of underlying disease, which care providers can now uncover during a simple, non-invasive screening (a photograph taken of the back of the eye). We’ve previously shown that deep learning applied to these photos can help identify diabetic eye disease as well as cardiovascular risk factors. Today, we’re sharing how we’re continuing to use deep learning to detect anemia.

Anemia is a major public health problem that affects 1.6 billion people globally, and can cause tiredness, weakness, dizziness and drowsiness. The diagnosis of anemia typically involves a blood test to measure the amount of hemoglobin (a critical protein in your red blood cells that carries oxygen). If your hemoglobin is lower than normal, that indicates anemia. Pregnant women are at particularly high risk, with more than 2 in 5 affected, and anemia can also be an early sign of colon cancer in otherwise healthy individuals.

Our findings

In our latest work, “Detection of anemia from retinal fundus images via deep learning,” published in Nature Biomedical Engineering, we find that a deep learning model can quantify hemoglobin using de-identified photographs of the back of the eye and common metadata (e.g. age, self-reported sex) from the UK Biobank, a population-based study. Compared to just using metadata, deep learning improved the detection of anemia (as measured by AUC) from 74 percent to 88 percent.
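To make the reported comparison concrete, here is a small, purely illustrative sketch (not the study's code) of how one might measure that kind of AUC gap between a metadata-only model and a combined metadata-plus-image model using scikit-learn; the labels and scores below are made-up placeholders.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def compare_auc(labels, metadata_scores, combined_scores):
    """Compare anemia-detection AUC with and without the image signal.

    labels: 1 for anemic, 0 otherwise.
    metadata_scores: model scores from age/sex metadata alone.
    combined_scores: model scores from metadata plus fundus image features.
    """
    auc_metadata = roc_auc_score(labels, metadata_scores)
    auc_combined = roc_auc_score(labels, combined_scores)
    return auc_metadata, auc_combined

# Toy example with made-up numbers, only to show the call pattern:
labels = np.array([0, 0, 1, 1, 0, 1])
print(compare_auc(labels,
                  metadata_scores=np.array([0.2, 0.4, 0.5, 0.6, 0.3, 0.4]),
                  combined_scores=np.array([0.1, 0.2, 0.8, 0.9, 0.3, 0.7])))
```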

To ensure these promising findings were not the result of chance or false correlations, other scientists helped to validate the model—which was initially developed on a dataset of primarily Caucasian ancestry—on a separate dataset from Asia. The performance of the model was similar on both datasets, suggesting the model could be useful in a variety of settings.

Multiple “explanation” techniques suggest that the optic disc is important for detecting anemia from images of the back of the eye.

Because this research uncovered new findings about the effects of anemia on the eye, we wanted to identify which parts of the eye contained signs of anemia. Our analysis revealed that much of the information comes from the optic disc and surrounding blood vessels. The optic disc is where nerves and blood vessels enter and exit the eye, and normally appears much brighter than the surrounding areas on a photograph of the back of the eye.

Key takeaways

This method to non-invasively screen for anemia could add value to existing diabetic eye disease screening programs, or support an anemia screening that would be quicker and easier than a blood test. Additionally, this work is another example of using deep learning with explainable insights to discover new biomedical knowledge, extending our previous work on cardiovascular risk factors, refractive error, and progression of macular degeneration. We hope this will inspire additional research to reveal new scientific insights from existing medical tests, and to help improve early interventions and health outcomes.

To read more about our latest research for improving the diagnosis of eye diseases, visit Nature Communications and Ophthalmology. You can find more research from the Google Health team here.

AI’s killer (whale) app

The Salish Sea, which extends from British Columbia to Washington State in the U.S., was once home to hundreds of killer whales, also known as orcas. Now, the population of Southern Resident Killer Whales, a subgroup of orcas, is struggling to survive—there are only 73 of them left. Building on our work using AI for Social Good, we’re partnering with Fisheries and Oceans Canada (DFO) to apply machine learning to protect killer whales in the Salish Sea.

According to DFO, which monitors and protects this endangered population of orcas, the greatest threats to the animals are scarcity of prey (particularly Chinook salmon, their favorite meal), contaminants, and disturbance caused by human activity and passing vessels. Teaming up with DFO and Rainforest Connection, we used deep neural networks to track, monitor and observe the orcas’ behavior in the Salish Sea, and send alerts to Canadian authorities. With this information, marine mammal managers can monitor and treat whales that are injured, sick or distressed. In case of an oil spill, the detection system can allow experts to locate the animals and use specialized equipment to alter the direction of travel of the orcas to prevent exposure.

To teach a machine learning model to recognize orca sounds, DFO provided 1,800 hours of underwater audio and 68,000 labels that identified the origin of the sound. The model is used to analyze live sounds that DFO monitors across 12 locations within the Southern Resident Killer Whales’ habitat. When the model hears a noise that indicates the presence of a killer whale, it’s displayed on the Rainforest Connection (a grantee of the Google AI Impact Challenge) web interface, and live alerts on their location are provided to DFO and key partners through an app that Rainforest Connection developed.
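As a rough, hypothetical sketch of how such an audio classifier could be wired into an alerting flow (this is not the DFO or Rainforest Connection system), a clip can be converted to a log-mel spectrogram and scored by a trained model; `orca_model`, `send_alert` and the alert threshold are assumed placeholders.

```python
import librosa
import numpy as np

ALERT_THRESHOLD = 0.8  # assumed probability cutoff for raising an alert

def clip_to_log_mel(path, sr=16000, n_mels=64):
    """Load an audio clip and convert it to a log-mel spectrogram."""
    audio, _ = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)

def maybe_alert(path, orca_model, send_alert):
    """Score a clip with a trained model and alert if an orca call is likely."""
    # Shape the spectrogram as a single-example batch with one channel
    # (a common input layout for an image-style classifier; assumed here).
    features = clip_to_log_mel(path)[np.newaxis, ..., np.newaxis]
    probability = float(orca_model.predict(features)[0][0])
    if probability >= ALERT_THRESHOLD:
        send_alert(path, probability)
    return probability
```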

Our next steps on this project include distinguishing between the three sub-populations of orcas—Southern Resident Killer Whales, Northern Resident Killer Whales and Biggs Killer Whales—so that we can better monitor their health and protect them in real time. We hope that advances in bioacoustics technology using AI can make a difference in animal conservation.

Discovering millions of datasets on the web

Across the web, there are millions of datasets about nearly any subject that interests you. If you’re looking to buy a puppy, you could find datasets compiling complaints of puppy buyers or studies on puppy cognition. Or if you like skiing, you could find data on revenue of ski resorts or injury rates and participation numbers. Dataset Search has indexed almost 25 million of these datasets, giving you a single place to search for datasets and find links to where the data is. Over the past year, people have tried it out and provided feedback, and now Dataset Search is officially out of beta.

Some of the search results for the query "skiing," which include datasets ranging from speeds of the fastest skiers to revenues of ski resorts.

What's new in Dataset Search?

Based on what we’ve learned from the early adopters of Dataset Search, we’ve added new features. You can now filter the results based on the types of dataset that you want (e.g., tables, images, text), or whether the dataset is available for free from the provider. If a dataset is about a geographic area, you can see the map. Plus, the product is now available on mobile and we’ve significantly improved the quality of dataset descriptions. One thing hasn’t changed, however: anybody who publishes data can make their datasets discoverable in Dataset Search by using an open standard (schema.org) to describe the properties of their dataset on their own web page.

We have also learned how many different types of people look for data. There are academic researchers finding data to develop their hypotheses (e.g., try oxytocin), students looking for free data in a tabular format covering the topic of their senior thesis (e.g., try incarceration rates with the corresponding filters), business analysts and data scientists looking for information on mobile apps or fast food establishments, and so on. There is data on all of that! And what do our users ask? The most common queries include "education," "weather," "cancer," "crime," "soccer," and, yes, "dogs".

Some of the search results for the query "fast food establishment."

What datasets can you find in Dataset Search?

Dataset Search also gives us a snapshot of the data out there on the Web. Here are a few highlights. The largest topics that the datasets cover are geosciences, biology, and agriculture. The majority of governments in the world publish their data and describe it with schema.org. The United States leads in the number of open government datasets available, with more than 2 million. And the most popular data formats? Tables–you can find more than 6 million of them on Dataset Search.

The number of datasets that you can find in Dataset Search continues to grow. If you have a dataset on your site and you describe it using schema.org, an open standard, others can find it in Dataset Search. If you know that a dataset exists, but you can't find it in Dataset Search, ask the provider to add the schema.org descriptions and others will be able to learn about their dataset as well.
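For publishers wondering what that markup involves, the sketch below generates a minimal schema.org Dataset description as JSON-LD; the name, description, URL and license are placeholder values, and the printed output would be embedded in the dataset's web page inside a script tag of type "application/ld+json" so Dataset Search can index it.

```python
import json

# Placeholder values; replace with the real details of your dataset.
dataset_markup = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example ski resort revenue dataset",
    "description": "Annual revenue figures for ski resorts, 2010-2019.",
    "url": "https://example.com/datasets/ski-revenue",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Organization", "name": "Example Research Group"},
}

# Paste the printed JSON into a <script type="application/ld+json"> block
# on the dataset's page so crawlers can pick up the description.
print(json.dumps(dataset_markup, indent=2))
```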

What's next?

Dataset Search is out of beta, but we will continue to improve the product, whether or not it has the "beta" next to it. If you haven't already, take Dataset Search for a spin, and tell us what you think.

Source: Search