Tag Archives: ML Kit

Top 3 things to know for AI on Android at Google I/O ‘25

Posted by Kateryna Semenova – Sr. Developer Relations Engineer

AI is reshaping how users interact with their favorite apps, opening new avenues for developers to create intelligent experiences. At Google I/O, we showcased how Android is making it easier than ever for you to build smart, personalized and creative apps. And we’re committed to providing you with the tools needed to innovate across the full development stack in this evolving landscape.

This year, we focused on making AI accessible across the spectrum, from on-device processing to cloud-powered capabilities. Here are the top 3 announcements you need to know for building with AI on Android from Google I/O ‘25:

#1 Leverage the efficiency of Gemini Nano for on-device AI experiences

For on-device AI, we announced a new set of ML Kit GenAI APIs powered by Gemini Nano, our most efficient and compact model designed and optimized for running directly on mobile devices. These APIs provide high-level, easy integration for common tasks including text summarization, proofreading, rewriting content in different styles, and generating image description. Building on-device offers significant benefits such as local data processing and offline availability at no additional cost for inference. To start integrating these solutions explore the ML Kit GenAI documentation, the sample on GitHub and watch the "Gemini Nano on Android: Building with on-device GenAI" talk.

#2 Seamlessly integrate on-device ML/AI with your own custom models

The Google AI Edge platform enables building and deploying a wide range of pretrained and custom models on edge devices and supports various frameworks like TensorFlow, PyTorch, Keras, and Jax, allowing for more customization in apps. The platform now also offers improved support of on-device hardware accelerators and a new AI Edge Portal service for broad coverage of on-device benchmarking and evals. If you are looking for GenAI language models on devices where Gemini Nano is not available, you can use other open models via the MediaPipe LLM Inference API.

Serving your own custom models on-device can pose challenges related to handling large model downloads and updates, impacting the user experience. To improve this, we’ve launched Play for On-Device AI in beta. This service is designed to help developers manage custom model downloads efficiently, ensuring the right model size and speed are delivered to each Android device precisely when needed.

For more information watch “Small language models with Google AI Edge” talk.

#3 Power your Android apps with Gemini Flash, Pro and Imagen using Firebase AI Logic

For more advanced generative AI use cases, such as complex reasoning tasks, analyzing large amounts of data, processing audio or video, or generating images, you can use larger models from the Gemini Flash and Gemini Pro families, and Imagen running in the cloud. These models are well suited for scenarios requiring advanced capabilities or multimodal inputs and outputs. And since the AI inference runs in the cloud any Android device with an internet connection is supported. They are easy to integrate into your Android app by using Firebase AI Logic, which provides a simplified, secure way to access these capabilities without managing your own backend. Its SDK also includes support for conversational AI experiences using the Gemini Live API or generating custom contextual visual assets with Imagen. To learn more, check out our sample on GitHub and watch "Enhance your Android app with Gemini Pro and Flash, and Imagen" session.

These powerful AI capabilities can also be brought to life in immersive Android XR experiences. You can find corresponding documentation, samples and the technical session: "The future is now, with Compose and AI on Android XR".

Flow cahrt demonstrating Firebase AI Logic integration architecture
Figure 1: Firebase AI Logic integration architecture

Get inspired and start building with AI on Android today

We released a new open source app, Androidify, to help developers build AI-driven Android experiences using Gemini APIs, ML Kit, Jetpack Compose, CameraX, Navigation 3, and adaptive design. Users can create personalized Android bot with Gemini and Imagen via the Firebase AI Logic SDK. Additionally, it incorporates ML Kit pose detection to detect a person in the camera viewfinder. The full code sample is available on GitHub for exploration and inspiration. Discover additional AI examples in our Android AI Sample Catalog.

moving image of the Androidify app on a mobile device, showing a fair-skinned woman with blond hair wearing a red jacket with black shirt and pants and a pair of sunglasses converting into a 3D image of a droid with matching skin tone and blond hair wearing a red jacket with black shirt and pants and a pair of sunglasses
The original image and Androidifi-ed image

Choosing the right Gemini model depends on understanding your specific needs and the model's capabilities, including modality, complexity, context window, offline capability, cost, and device reach. To explore these considerations further and see all our announcements in action, check out the AI on Android at I/O ‘25 playlist on YouTube and check out our documentation.

We are excited to see what you will build with the power of Gemini!

On-device GenAI APIs as part of ML Kit help you easily build with Gemini Nano

Posted by Caren Chang - Developer Relations Engineer, Chengji Yan - Software Engineer, Taj Darra - Product Manager

We are excited to announce a set of on-device GenAI APIs, as part of ML Kit, to help you integrate Gemini Nano in your Android apps.

To start, we are releasing 4 new APIs:

    • Summarization: to summarize articles and conversations
    • Proofreading: to polish short text
    • Rewriting: to reword text in different styles
    • Image Description: to provide short description for images

Key benefits of GenAI APIs

GenAI APIs are high level APIs that allow for easy integration, similar to existing ML Kit APIs. This means you can expect quality results out of the box without extra effort for prompt engineering or fine tuning for specific use cases.

GenAI APIs run on-device and thus provide the following benefits:

    • Input, inference, and output data is processed locally
    • Functionality remains the same without reliable internet connection
    • No additional cost incurred for each API call

To prevent misuse, we also added safety protection in various layers, including base model training, safety-aware LoRA fine-tuning, input and output classifiers and safety evaluations.

How GenAI APIs are built

There are 4 main components that make up each of the GenAI APIs.

  1. Gemini Nano is the base model, as the foundation shared by all APIs.
  2. Small API-specific LoRA adapter models are trained and deployed on top of the base model to further improve the quality for each API.
  3. Optimized inference parameters (e.g. prompt, temperature, topK, batch size) are tuned for each API to guide the model in returning the best results.
  4. An evaluation pipeline ensures quality in various datasets and attributes. This pipeline consists of: LLM raters, statistical metrics and human raters.

Together, these components make up the high-level GenAI APIs that simplify the effort needed to integrate Gemini Nano in your Android app.

Evaluating quality of GenAI APIs

For each API, we formulate a benchmark score based on the evaluation pipeline mentioned above. This score is based on attributes specific to a task. For example, when evaluating the summarization task, one of the attributes we look at is “grounding” (ie: factual consistency of generated summary with source content).

To provide out-of-box quality for GenAI APIs, we applied feature specific fine-tuning on top of the Gemini Nano base model. This resulted in an increase for the benchmark score of each API as shown below:

Use case in English Gemini Nano Base Model ML Kit GenAI API
Summarization 77.2 92.1
Proofreading 84.3 90.2
Rewriting 79.5 84.1
Image Description 86.9 92.3

In addition, this is a quick reference of how the APIs perform on a Pixel 9 Pro:

Prefix Speed
(input processing rate)
Decode Speed
(output generation rate)
Text-to-text 510 tokens/second 11 tokens/second
Image-to-text 510 tokens/second + 0.8 seconds for image encoding 11 tokens/second

Sample usage

This is an example of implementing the GenAI Summarization API to get a one-bullet summary of an article:

val articleToSummarize = "We are excited to announce a set of on-device generative AI APIs..."

// Define task with desired input and output format
val summarizerOptions = SummarizerOptions.builder(context)
    .setInputType(InputType.ARTICLE)
    .setOutputType(OutputType.ONE_BULLET)
    .setLanguage(Language.ENGLISH)
    .build()
val summarizer = Summarization.getClient(summarizerOptions)

suspend fun prepareAndStartSummarization(context: Context) {
    // Check feature availability. Status will be one of the following: 
    // UNAVAILABLE, DOWNLOADABLE, DOWNLOADING, AVAILABLE
    val featureStatus = summarizer.checkFeatureStatus().await()

    if (featureStatus == FeatureStatus.DOWNLOADABLE) {
        // Download feature if necessary.
        // If downloadFeature is not called, the first inference request will 
        // also trigger the feature to be downloaded if it's not already
        // downloaded.
        summarizer.downloadFeature(object : DownloadCallback {
            override fun onDownloadStarted(bytesToDownload: Long) { }

            override fun onDownloadFailed(e: GenAiException) { }

            override fun onDownloadProgress(totalBytesDownloaded: Long) {}

            override fun onDownloadCompleted() {
                startSummarizationRequest(articleToSummarize, summarizer)
            }
        })    
    } else if (featureStatus == FeatureStatus.DOWNLOADING) {
        // Inference request will automatically run once feature is      
        // downloaded.
        // If Gemini Nano is already downloaded on the device, the   
        // feature-specific LoRA adapter model will be downloaded very  
        // quickly. However, if Gemini Nano is not already downloaded, 
        // the download process may take longer.
        startSummarizationRequest(articleToSummarize, summarizer)
    } else if (featureStatus == FeatureStatus.AVAILABLE) {
        startSummarizationRequest(articleToSummarize, summarizer)
    } 
}

fun startSummarizationRequest(text: String, summarizer: Summarizer) {
    // Create task request  
    val summarizationRequest = SummarizationRequest.builder(text).build()

    // Start summarization request with streaming response
    summarizer.runInference(summarizationRequest) { newText -> 
        // Show new text in UI
    }

    // You can also get a non-streaming response from the request
    // val summarizationResult = summarizer.runInference(summarizationRequest)
    // val summary = summarizationResult.get().summary
}

// Be sure to release the resource when no longer needed
// For example, on viewModel.onCleared() or activity.onDestroy()
summarizer.close()

For more examples of implementing the GenAI APIs, check out the official documentation and samples on GitHub:

Use cases

Here is some guidance on how to best use the current GenAI APIs:

For Summarization, consider:

    • Conversation messages or transcripts that involve 2 or more users
    • Articles or documents less than 4000 tokens (or about 3000 English words). Using the first few paragraphs for summarization is usually good enough to capture the most important information.

For Proofreading and Rewriting APIs, consider utilizing them during the content creation process for short content below 256 tokens to help with tasks such as:

    • Refining messages in a particular tone, such as more formal or more casual
    • Polishing personal notes for easier consumption later

For the Image Description API, consider it for:

    • Generating titles of images
    • Generating metadata for image search
    • Utilizing descriptions of images in use cases where the images themselves cannot be displayed, such as within a list of chat messages
    • Generating alternative text to help visually impaired users better understand content as a whole

GenAI API in production

Envision is an app that verbalizes the visual world to help people who are blind or have low vision lead more independent lives. A common use case in the app is for users to take a picture to have a document read out loud. Utilizing the GenAI Summarization API, Envision is now able to get a concise summary of a captured document. This significantly enhances the user experience by allowing them to quickly grasp the main points of documents and determine if a more detailed reading is desired, saving them time and effort.

side by side images of a mobile device showing a document on a table on the left, and the results of the scanned document on the right showing details providing the what, when, and where as written in the document

Supported devices

GenAI APIs are available on Android devices using optimized MediaTek Dimensity, Qualcomm Snapdragon, and Google Tensor platforms through AICore. For a comprehensive list of devices that support GenAI APIs, refer to our official documentation.

Learn more

Start implementing GenAI APIs in your Android apps today with guidance from our official documentation and samples on GitHub: AI Catalog GenAI API Samples with Compose, ML Kit GenAI APIs Quickstart.

WPS uses ML Kit to seamlessly translate 43 languages and net $65M in annual savings

Posted by the Android team

WPS is an office suite software that lets users effortlessly view and edit all their documents, presentations, spreadsheets, and more. As a global product, WPS requires a top-notch and reliable in-suite translation technology that doesn’t require users to leave the app. To ensure all its users can enjoy the full benefits of the suite and its content in their preferred language, WPS uses the translation API from ML Kit, Google's on-device and production-ready machine learning toolkit for Android development.

WPS users rely on text translation

Many WPS users rely on ML Kit’s translation tools when reading, writing, or viewing their documents. According to a WPS data sample on single-day usage, there were 6,762 daily active users using ML Kit to translate 17,808 pages across all 43 of its supported languages. Students, who represent 44% of WPS’s userbase, especially rely on translation technology in WPS. WPS helps students better learn to read and write foreign languages by providing them with instant, offline translations through ML Kit.

Moving image of text bubbles with 'hello' in different languages appear (Spanish, French, Korean, English, Greek, Chinese, Italian, Russian, Portuguese, Tamil)

ML Kit provides free, offline translations

When choosing a translation provider, the WPS team looked at a number of popular options. But the other services the company considered only supported cloud-based translation and couldn’t translate text for some complex languages. The WPS team wanted to ensure all of its users could benefit from text translation, regardless of language or network availability. WPS ultimately chose ML Kit because it could both translate text offline and for each of the languages it serves.

“WPS has many African users, among whom are speakers of Swahili and Tamil, which are complex languages that aren’t supported by other translation services,” said Zhou Qi, Android team leader at WPS. “We’re very happy to provide these users with the translation services they need through ML Kit.”

What’s more, the other translation services WPS considered were expensive. ML Kit is completely free to use, and WPS estimates it's saving roughly $65 million per year by choosing ML Kit over another, paid translation software development kit.

Optimizing WPS for ML Kit’s translation API

ML Kit not only provides powerful multilingual translation but also supports App Bundle and Dynamic Delivery, which gives users the option to download ML Kit's translation module on demand. Without App Bundle and Dynamic Delivery, users who don’t need ML Kit would have had to download it anyway, impacting install-time delivery.

“When a user downloads the WPS app, the basic module is downloaded by default. And when the user needs to use the translation feature, only then will it be downloaded. This reduces the initial download size and ensures users who don't need translation assistance won’t be bothered by downloading the module,” said Zhou.

Quote card with headshot of Zhou Qui and text reads, “By using ML Kit’s free API, we provide our users with very useful functions, adding convenience to their daily lives and making document reading and processing more efficient.” — Zhou Qi, Android team leader, WPS

ML Kit’s resources made the process easy

During implementation the WPS team used ML Kit’s official guides frequently to steer their development processes. These tools allowed them to learn the ins and outs of the API and ensure any changes met all of its users’ needs. With the documentation and recommendations provided directly from the ML Kit site, WPS developers were able to quickly and easily integrate the new toolkit to their workflow.

“With the provided resources, we rarely had to search for help. The documentation was clear and concise. Plus, the API was straightforward and developer-friendly, which greatly reduced the learning curve,” said Zhou.

Streamlining UX with ML Kit

Before implementing ML Kit, WPS users had to open a separate application to translate their documents, creating a burdensome user experience. With ML Kit’s automatic language identification and instant translations, WPS now provides its users a streamlined way to translate text quickly, accurately, and without ever leaving the application, significantly improving platform UX.

Moving forward, WPS plans to expand its use of ML Kit, particularly with text recognition. WPS users continue to request the ability to process text on captured photos, so the company plans to use ML Kit to refine the app’s text recognition abilities as well.

Integrate machine learning into your workflow

Learn more about how ML Kit makes on-device machine learning easy.

ML Kit is now in GA & Introducing Selfie Segmentation

Posted by Kenny Sulaimon, Product Manager, ML Kit Chengji Yan, Suril Shah, Buck Bourdon, Software Engineers, ML Kit, Shiyu Hu, Technical Lead, ML Kit Dong Chen, Technical Lead, ML Kit

At the end of 2020, we introduced the Entity Extraction API to our ML Kit SDK, making it even easier to detect and perform actions on text within mobile apps. Since then, we’ve been hard at work updating our existing APIs with new functionality and also fine tuning Selfie Segmentation with the help of our partners in the ML Kit early access program.

Today we are excited to officially add Selfie Segmentation to the ML Kit lineup, introduce a few enhancements we’ve made to our popular Pose Detection API and announce that ML Kit has graduated to general availability!

Natural language graphic

General Availability

ML Kit image

(ML Kit is now in General Availability)

We launched ML Kit back in 2018 in order to make it easy for developers on Android and iOS to use machine learning within their apps. Over the last two years we have rapidly expanded our set of APIs to help with both vision and natural language processing based use cases.

Thanks to the overwhelmingly positive response, developer feedback and a ton of adoption across both Android and iOS, we are ready to drop the beta label and officially announce the general availability of ML Kit’s APIs. All of our APIs (except Selfie Segmentation, Pose Detection, and Entity Extraction) are now in general availability!

Selfie Segmentation

Selfie Segmentation photo

(Example of ML Kit Selfie Segmentation)

With the increased usage of selfie cameras and webcams in today's world, being able to quickly and easily add effects to camera experiences has become a necessity for many app developers.

ML Kit's Selfie Segmentation API allows developers to easily separate the background from users within a scene and focus on what matters. Adding cool effects to selfies or inserting your users into interesting background environments has never been easier. The model works on live and still images, and both half and full body subjects.

Under The Hood

Under the hood graph

(Diagram of Selfie Segmentation API)

The Selfie Segmentation API takes an input image and produces an output mask. Each pixel of the mask is assigned a float number that has a range between [0.0, 1.0]. The closer the number is to 1.0, the higher the confidence that the pixel represents a person, and vice versa.

The API works with static images and live video use cases. During live video (stream_mode), the API will leverage output from previous frames to return smoother segmentation results.

We’ve also implemented a “RAW_SIZE_MASK” option to give developers more options for mask output. By default, the mask produced will be the same size as the input image. If the RAW_SIZE_MASK option is enabled, then the mask will be the size of the model output (256x256). This option makes it easier to apply customized rescaling logic or reduces latency if rescaling to the input image size is not needed for your use case.

Pose Detection Update

Example of Pose Detection API

(Example of updated Pose Detection API; colors represent the Z value)

Last month, we updated our state-of-the-art Pose Detection API with a new model and new features. A quick summary of the enhancements is listed below:

  • More poses added The API now recognizes more poses, targeting fitness and yoga use cases, especially when a user is directly facing the camera.
  • 50% size reduction The base and accurate pose models are now significantly smaller. This change does not impact the quality of the models.
  • Z Coordinate for depth analysis The API now outputs a depth coordinate Z to help determine whether parts of the user's body are in front or behind the user’s hips.

Z Coordinate

The Z Coordinate is an experimental feature that is calculated for every point (excluding the face). The estimate is provided using synthetic data, obtained via the GHUM model (articulated 3D human shape model).

It is measured in "image pixels" like the X and Y coordinates. The Z axis is perpendicular to the camera and passes between a subject's hips. The origin of the Z axis is approximately the center point between the hips (left/right and front/back relative to the camera). Negative Z values are towards the camera; positive values are away from it. The Z coordinate does not have an upper or lower bound.

For more information on the Pose Detection changes, please see our API documentation.

Pose Classification

After the release of Pose Detection, we’ve received quite a bit of requests from developers to help with classifying specific poses within their apps. To help tackle this problem, we partnered with the MediaPipe team to release a pose classification tutorial and Google Colab. In the classification tutorial, we demonstrate how to build and run a custom pose classifier within the ML Kit Android sample app and also demo a simple push-up and squat rep counter using the classifier.

Example of Pose classification and repetition counting with MLKit Pose

(Example of Pose classification and repetition counting with MLKit Pose)

For a deep dive into building your own pose classifier with different camera angles, environment conditions, body shapes etc, please see the pose classification tutorial.

For more general classification tips, please see our Pose Classification Options page on the ML Kit website.

Beyond General Availability

It has been an exciting two years getting ML Kit to general availability and we couldn’t have gotten here without your help and feedback. As we continue to introduce new APIs such as Selfie Segmentation and Pose Detection, your feedback is more important than ever. Please continue to share your enhancement requests and questions with our development team or reach out through our community channels. Let’s build a smarter future together.

Announcing the Newest Addition to MLKit: Entity Extraction

Posted by Kenny Sulaimon, Product Manager, ML Kit, Tory Voight, Product Manager, ML Kit, Daniel Furlong, Lei Yu, Software Engineers, ML Kit, Dong Chen, Technical Lead, MLKit
ML Kit logo

Six months ago, we introduced the standalone version of the ML Kit SDK, making it even easier to integrate on-device machine learning into mobile apps. Since then, we’ve launched the Digital Ink Recognition and Pose Detection APIs, and also introduced the ML Kit early access program. Today we are excited to add Entity Extraction to the official ML Kit lineup and also debut a new API for our early access program, Selfie Segmentation!

Overview of ML Kit's Entity Extraction API

With ML Kit’s Entity Extraction API, you can now improve the user experience inside your app by understanding text and performing specific actions on it.

The Entity Extraction API allows you to detect and locate entities from raw text, and take action based on those entities. The API works on static text and also in real-time while a user is typing. It supports 11 different entities and 15 different languages (with more coming in the future) to allow developers to make any text interaction a richer experience for the user.

Supported Entities

  • Address (350 third street, cambridge)
  • Date-Time* (12/12/2020, tomorrow at 3pm) (let's meet tomorrow at 6pm)
  • Email ([email protected])
  • Flight Number* (LX37)
  • IBAN* (CH52 0483 0000 0000 0000 9)
  • ISBN* (978-1101904190)
  • Money (including currency)* ($12, 25USD)
  • Payment Card* (4111 1111 1111 1111)
  • Phone Number ((555) 225-3556, 12345)
  • Tracking Number* (1Z204E380338943508)
  • URL (www.google.com, https://en.wikipedia.org/wiki/Platypus, seznam.cz)
Example Results
Table of Input text and Detected entities

Real World Applications

2 phones showing TamTam using Entity Extraction API

(Images courtesy of TamTam)

Our early access partner, TamTam, has been using the Entity Extraction API to provide helpful suggestions to their users during their chat conversations. This feature allows users to quickly perform actions based on the context of their conversations.

While integrating this API, Iurii Dorofeev, Head of TamTam Android Development, mentioned, “We appreciated the ease of integration of the ML Kit ... and it works offline. Clustering the content of messages right on the device allowed us to save resources. ML Kit capabilities will help us develop other features for TamTam messenger in the future.”

Check out their messaging app on the Google Play and App Store today.

Under The Hood

Diagram of underlying Text Classifier API

(Diagram of underlying Text Classifier API)

ML Kit’s Entity Extraction API builds upon the technology powering the Smart Linkify feature in Android 10+ to deliver an easy-to-use and streamlined experience for developers. For an in-depth review of the Text Classifier API, please see our blog post here.

The neural network annotators/models in the Entity Extraction API work as follows: A given input text is first split into words (based on space separation), then all possible word subsequences of certain maximum length (15 words in the example above) are generated, and for each candidate the scoring neural net assigns a value (between 0 and 1) based on whether it represents a valid entity.

Next, the generated entities that overlap are removed, favoring the ones with a higher score over the conflicting ones with a lower score. Then a second neural network is used to classify the type of the entity as a phone number, an address, or in some cases, a non-entity.

The neural network models in the Entity Extraction API are used in combination with other types of models (e.g. rule-based) to identify additional entities in text, such as: flight numbers, currencies and other examples listed above. Therefore, if multiple entities are detected for one text input, the Entity Extraction API can return several overlapping results.

Lastly, ML Kit will automatically download the required language-specific models to the device dynamically. You can also explicitly manage models you want available on the device by using ML Kit’s model management API. This can be useful if you want to download models ahead of time for your users. The API also allows you to delete models that are no longer required.

New APIs Coming Soon

Selfie Segmentation

With the increased usage of selfie cameras and webcams in today's world, being able to quickly and easily add effects to camera experiences has become a necessity for many app developers today.

ML Kit's Selfie Segmentation API allows developers to easily separate the background from a scene and focus on what matters. Adding cool effects to selfies or inserting your users into interesting background environments has never been easier. This API produces great results with low latency on both Android and iOS devices.

Example of ML Kit Selfie Segmentation

(Example of ML Kit Selfie Segmentation)

Key capabilities:

  • Easily allows developers to replace or blur a user’s background
  • Works well with single or multiple people
  • Cross-platform support (iOS and Android)
  • Runs real-time on most modern phones

To join our early access program and request access to ML Kit's Selfie Segmentation API, please fill out this form.

ML Kit Pose Detection Makes Staying Active at Home Easier

Posted by Kenny Sulaimon, Product Manager, ML Kit; Chengji Yan and Areeba Abid, Software Engineers, ML Kit

ML Kit logo

Two months ago we introduced the standalone version of the ML Kit SDK, making it even easier to integrate on-device machine learning into mobile apps. Since then we’ve launched the Digital Ink Recognition API, and also introduced the ML Kit early access program. Our first two early access APIs were Pose Detection and Entity Extraction. We’ve received an overwhelming amount of interest in these new APIs and today, we are thrilled to officially add Pose Detection to the ML Kit lineup.

ML Kit Overview

A New ML Kit API, Pose Detection


Examples of ML Kit Pose Detection

ML Kit Pose Detection is an on-device, cross platform (Android and iOS), lightweight solution that tracks a subject's physical actions in real time. With this technology, building a one-of-a-kind experience for your users is easier than ever.

The API produces a full body 33 point skeletal match that includes facial landmarks (ears, eyes, mouth, and nose), along with hands and feet tracking. The API was also trained on a variety of complex athletic poses, such as Yoga positions.

Skeleton image detailing all 33 landmark points

Skeleton image detailing all 33 landmark points

Under The Hood

Diagram of the ML Kit Pose Detection Pipeline

The power of the ML Kit Pose Detection API is in its ease of use. The API builds on the cutting edge BlazePose pipeline and allows developers to build great experiences on Android and iOS, with little effort. We offer a full body model, support for both video and static image use cases, and have added multiple pre and post processing improvements to help developers get started with only a few lines of code.

The ML Kit Pose Detection API utilizes a two step process for detecting poses. First, the API combines an ultra-fast face detector with a prominent person detection algorithm, in order to detect when a person has entered the scene. The API is capable of detecting a single (highest confidence) person in the scene and requires the face of the user to be present in order to ensure optimal results.

Next, the API applies a full body, 33 landmark point skeleton to the detected person. These points are rendered in 2D space and do not account for depth. The API also contains a streaming mode option for further performance and latency optimization. When enabled, instead of running person detection on every frame, the API only runs this detector when the previous frame no longer detects a pose.

The ML Kit Pose Detection API also features two operating modes, “Fast” and “Accurate”. With the “Fast” mode enabled, you can expect a frame rate of around 30+ FPS on a modern Android device, such as a Pixel 4 and 45+ FPS on a modern iOS device, such as an iPhone X. With the “Accurate” mode enabled, you can expect more stable x,y coordinates on both types of devices, but a slower frame rate overall.

Lastly, we’ve also added a per point “InFrameLikelihood” score to help app developers ensure their users are in the right position and filter out extraneous points. This score is calculated during the landmark detection phase and a low likelihood score suggests that a landmark is outside the image frame.

Real World Applications


Examples of a pushup and squat counter using ML Kit Pose Detection

Keeping up with regular physical activity is one of the hardest things to do while at home. We often rely on gym buddies or physical trainers to help us with our workouts, but this has become increasingly difficult. Apps and technology can often help with this, but with existing solutions, many app developers are still struggling to understand and provide feedback on a user’s movement in real time. ML Kit Pose Detection aims to make this problem a whole lot easier.

The most common applications for Pose detection are fitness and yoga trackers. It’s possible to use our API to track pushups, squats and a variety of other physical activities in real time. These complex use cases can be achieved by using the output of the API, either with angle heuristics, tracking the distance between joints, or with your own proprietary classifier model.

To get you jump started with classifying poses, we are sharing additional tips on how to use angle heuristics to classify popular yoga poses. Check it out here.

Learning to Dance Without Leaving Home

Learning a new skill is always tough, but learning to dance without the aid of a real time instructor is even tougher. One of our early access partners, Groovetime, has set out to solve this problem.

With the power of ML Kit Pose Detection, Groovetime allows users to learn their favorite dance moves from popular short-form dance videos, while giving users automated real time feedback on their technique. You can join their early access beta here.

Groovetime App using ML Kit Pose Detection

Staying Active Wherever You Are

Our Pose Detection API is also helping adidas Training, another one of our early access partners, build a virtual workout experience that will help you stay active no matter where you are. This one-of-a-kind innovation will help analyze and give feedback on the user’s movements, using nothing more than just your phone. Integration into the adidas Training app is still in the early phases of the development cycle, but stay tuned for more updates in the future.

How to get started?

If you would like to start using the Pose Detection API in your mobile app, head over to the developer documentation or check out the sample apps for Android and iOS to see the API in action. For questions or feedback, please reach out to us through one of our community channels.

Full spectrum of on-device machine learning tools on Android

Posted by Hoi Lam, Android Machine Learning



This blog post is part of a weekly series for #11WeeksOfAndroid. Each week we’re diving into a key area of Android so you don’t miss anything. Throughout this week, we covered various aspects of Android on-device machine learning (ML). Whichever stage of development be it starting out or an established app; whatever role you play in design, product and engineering; whatever your skill level from beginner to experts, we have a wide range of ML tools for you.

Design - ML as a differentiator

“Focus on the user and all else will follow” is a Google mantra that becomes even more relevant in our machine learning age. Our Design Advocate, Di Dang, highlighted the importance of finding the unique intersection of user problems and ML strengths. Too often, teams are so keen on the idea of machine learning that they lose sight of their user needs.



Di outlined how the People + AI Guidebook can help you make ML product decisions and used the example of the Read Along app to illustrate topics like precision and recall, which are unique to ML design and development. Check out her interview with the Read Along team together with your team for more inspiration.

New ML Kit fully focused on on-device

When you decide that on-device machine learning is the solution, the easiest way to implement it will be through turnkey SDKs like ML Kit. Sophisticated Google-trained models and processing pipelines are offered through an easy to use interface in Kotlin / Java. ML Kit is designed and built for on-device ML: it works offline, offers enhanced privacy, unlocks high performance for real-time use cases and it is free. We recently made ML Kit a standalone SDK and it no longer requires a Firebase account. Just one line in your build.gradle file and you can start bringing ML functionality into your app.



The team has also added new functionalities such as Jetpack lifecycle support and the option to use the face contour models via Google Play Services saving as much as 20MB in app size. Another much anticipated addition is the support for swapping Google models with your own for both Image Labeling as well as Object Detection and Tracking. This provides one of the easiest ways to add TensorFlow Lite models to your applications without interacting with ByteArray!

Customise with TensorFlow Lite and Android tools

If the base model provided by ML Kit doesn’t quite fit the bill, what should developers do? The first port of call should be TensorFlow Hub where ready-to-use TensorFlow Lite models from both Google and the wider community can be downloaded. From 100,000 US Supermarket products to tomato plant diseases classifiers, the choice is yours.



In addition to Firebase AutoML Vision Edge, you can also build your own model using TensorFlow Model Maker (image classification / text classification) with just a few lines of Python. Once you have a TensorFlow Lite model from either TensorFlow Hub, or the Model Maker, you can easily integrate it with your Android app using ML Kit Image Labelling or Object Detection and Tracking. If you prefer an open source solution, Android Studio 4.1 beta introduces ML model binding that helps wrap around the TensorFlow Lite model with an easy to use Kotlin / Java wrapper. Adding a custom model to your Android app has never been easier. Check out this blog for more details.

Time for on-device ML is now

From the examples of the Android Developer Challenge winners, it is obvious that on-device machine learning has come of age and ML functionalities once reserved for the cloud or supercomputers are now available on your Android phone. Take a step forward with us by trying out our codelabs of the day:

Also checkout the ML Week learning pathway and take the quiz to get your very own ML badge.

Android on-device machine learning is a rapidly evolving platform, if you have any enhancement requests or feedback on how it could be improved, please let us know together with your use-case (TensorFlow Lite / ML Kit). Time for on-device ML is now.

Resources

You can find the entire playlist of #11WeeksOfAndroid video content here, and learn more about each week here. We’ll continue to spotlight new areas each week, so keep an eye out and follow us on Twitter and YouTube. Thanks so much for letting us be a part of this experience with you!

New tools for finding, training, and using custom machine learning models on Android

Posted by Hoi Lam, Android Machine Learning

Yesterday, we talked about turnkey machine learning (ML) solutions with ML Kit. But what if that doesn’t completely address your needs and you need to tweak it a little? Today, we will discuss how to find alternative models, and how to train and use custom ML models in your Android app.

Find alternative ML models

Crop disease models from the wider research community available on tfhub.dev

If the turnkey ML solutions don't suit your needs, TensorFlow Hub should be your first port of call. It is a repository of ML models from Google and the wider research community. The models on the site are ready for use in the cloud, in a web-browser or in an app on-device. For Android developers, the most exciting models are the TensorFlow Lite (TFLite) models that are optimized for mobile.

In addition to key vision models such as MobileNet and EfficientNet, the repository also boast models powered by the latest research such as:

Many of these solutions were previously only available in the cloud, as the models are too large and too power intensive to run on-device. Today, you can run them on Android on-device, offline and live.

Train your own custom model

Besides the large repository of base models, developers can also train their own models. Developer-friendly tools are available for many common use cases. In addition to Firebase’s AutoML Vision Edge, the TensorFlow team launched TensorFlow Lite Model Maker earlier this year to give developers more choices over the base model that support more use cases. TensorFlow Lite Model Maker currently supports two common ML tasks:

The TensorFlow Lite Model Maker can run on your own developer machine or in Google Colab online machine learning notebooks. Going forward, the team plans to improve the existing offerings and to add new use cases.

Using custom model in your Android app

New TFLite Model import screen in Android Studio 4.1 beta

Once you have selected a model or trained your model there are new easy-to-use tools to help you integrate them into your Android app without having to convert everything into ByteArrays. The first new tool is ML Model binding with Android Studio 4.1. This lets developers import any TFLite model, read the input / output signature of the model, and use it with just a few lines of code that calls the open source TensorFlow Lite Android Support Library.

Another way to implement a TensorFlow Lite model is via ML Kit. Starting in June, ML Kit no longer requires a Firebase project for on-device functionality. In addition, the image classification and object detection and tracking (ODT) APIs support custom models. The latter ODT offering is especially useful in use-cases where you need to separate out objects from a busy scene.

So how should you choose between these three solutions? If you are trying to detect a product on a busy supermarket shelf, ML Kit object detection and tracking can help your user select a specific product for processing. The API then performs image classification on just the part of the image that contains the product, which results in better detection performance. On the other hand, if the scene or the object you are trying to detect takes up most of the input image, for example, a landmark such as Big Ben, using ML Model binding or the ML Kit image classification API might be more appropriate.

TensorFlow Hub bird detection model with ML Kit Object Detection & Tracking AP

Two examples of how these tools can fit together

Here are some resources to help you get started:

Customizing your model is easier than ever

Finding, building and using custom models on Android has never been easier. As both Android and TensorFlow teams increase the coverage of machine learning use cases, please let us know how we can improve these tools for your use cases by filing an enhancement request with TensorFlow Lite or ML Kit.

Tomorrow, we will take a step back and focus on how to appropriately use and design for a machine learning first Android app. The content will be appropriate for the entire development team, so bring your product manager and designers along. See you next time.

On-device machine learning solutions with ML Kit, now even easier to use

Posted by Christiaan Prins, Product Manager, ML Kit and Shiyu Hu, Tech Lead Manager, ML Kit

ML Kit logo

Two years ago at I/O 2018 we introduced ML Kit, making it easier for mobile developers to integrate machine learning into your apps. Today, more than 25,000 applications on Android and iOS make use of ML Kit’s features. Now, we are introducing some changes that will make it even easier to use ML Kit. In addition, we have a new feature and a set of improvements we’d like to discuss.

A new ML Kit SDK, fully focused on on-device ML

ML Kit API Overview

ML Kit's APIs are built to help you tackle common challenges in the Vision and Natural Language domains. We make it easy to recognize text, scan barcodes, track and classify objects in real-time, do translation of text, and more.

The original version of ML Kit was tightly integrated with Firebase, and we heard from many of you that you wanted more flexibility when implementing it in your apps. As a result, we are now making all the on-device APIs available in a new standalone ML Kit SDK that no longer requires a Firebase project. You can still use both ML Kit and Firebase to get the best of both products if you choose to.

With this change, ML Kit is now fully focused on on-device machine learning, giving you access to the unique benefits that on-device versus cloud ML offers:

  • It’s fast, unlocking real-time use cases- since processing happens on the device, there is no network latency. This means, we can do inference on a stream of images / video or multiple times a second on text strings.
  • Works offline - you can rely on our APIs even when the network is spotty or your app’s end-user is in an area without connectivity.
  • Privacy is retained: since all processing is performed locally, there is no need to send sensitive user data over the network to a server.

Naturally, you still get access to Google’s on-device models and processing pipelines, all accessible through easy-to-use APIs, and offered at no cost.

All ML Kit resources can now be found on our new website where we made it a lot easier to access sample apps, API reference docs and our community channels that are there to help you if you have questions.

Object detection & tracking gif Text recognition + Language ID + Translate gif

What does this mean if I already use ML Kit today?

If you are using ML Kit for Firebase’s on-device APIs in your app today, we recommend you to migrate to the new standalone ML Kit SDK to benefit from new features and updates. For more information and step-by-step instructions to update your app, please follow our Migration guide. The cloud-based APIs, model deployment and AutoML Vision Edge remain available through Firebase Machine Learning.

Shrink your app footprint with Google Play Services

Apart from making ML Kit easier to use, developers also asked if we can ship ML Kit through Google Play Services resulting in a smaller app footprint and the model can be reused between apps. Apart from Barcode scanning and Text recognition, we have now added Face detection / contour (model size: 20MB) to the list of APIs that support this functionality.

// Face detection / Face contour model
// Delivered via Google Play Services outside your app's APK…
implementation 'com.google.android.gms:play-services-mlkit-face-detection:16.0.0'

// …or bundled with your app's APK
implementation 'com.google.mlkit:face-detection:16.0.0'

Jetpack Lifecycle / CameraX support

Android Jetpack Lifecycle support has been added to all APIs. Developers can use addObserver to automatically manage teardown of ML Kit APIs as the app goes through screen rotation or closure by the user / system. This makes CameraX integration easier. With this release, we are also recommending that developers adopt CameraX in their apps due to the ease of integration and image quality improvements (compared to Camera1) on a wide range of devices.

// ML Kit now supports Lifecycle
val recognizer = TextRecognizer.newInstance()
lifecycle.addObserver(recognizer)

// ...

// Just like CameraX
val camera = cameraProvider.bindToLifecycle( /* lifecycleOwner= */this,
    cameraSelector, previewUseCase, analysisUseCase)

For an overview of all recent changes, check out the release notes for the new SDK.

Codelab of the day - ML Kit x CameraX

To help you get started with the new ML Kit and its support for CameraX, we have created this code lab to Recognize, Identify Language and Translate text. If you have any questions regarding this code lab, please raise them at StackOverflow and tag it with [google-mlkit]. Our team will monitor this.

screenshot of app running

Early access program

Through our early access program, developers have an opportunity to partner with the ML Kit team and get access to upcoming features. Two new APIs are now available as part of this program:

  • Entity Extraction - Detect entities in text & make them actionable. We have support for phone numbers, addresses, payment numbers, tracking numbers, date/time and more.
  • Pose Detection - Low-latency pose detection supporting 33 skeletal points, including hands and feet tracking.

If you are interested, head over to our early access page for details.

pose detection on man jumping rope

Tomorrow - Support for custom models

ML Kit's turn-key solutions are built to help you take common challenges. However, if you needed to have a more tailored solution, one that required custom models, you typically needed to build an implementation from scratch. To help, we are now providing the option to swap out the default Google models with a custom TensorFlow Lite model. We’re starting with the Image Labeling and Object Detection and Tracking APIs, that now support custom image classification models.

Tomorrow, we will dive a bit deeper into how to find or train a TensorFlow Lite model and use it either with ML Kit, or with Android Studio’s new ML binding functionality.

On-device machine learning solutions with ML Kit, now even easier to use

Posted by Christiaan Prins, Product Manager, ML Kit and Shiyu Hu, Tech Lead Manager, ML Kit

ML Kit logo

Two years ago at I/O 2018 we introduced ML Kit, making it easier for mobile developers to integrate machine learning into your apps. Today, more than 25,000 applications on Android and iOS make use of ML Kit’s features. Now, we are introducing some changes that will make it even easier to use ML Kit. In addition, we have a new feature and a set of improvements we’d like to discuss.

A new ML Kit SDK, fully focused on on-device ML

ML Kit API Overview

ML Kit's APIs are built to help you tackle common challenges in the Vision and Natural Language domains. We make it easy to recognize text, scan barcodes, track and classify objects in real-time, do translation of text, and more.

The original version of ML Kit was tightly integrated with Firebase, and we heard from many of you that you wanted more flexibility when implementing it in your apps. As a result, we are now making all the on-device APIs available in a new standalone ML Kit SDK that no longer requires a Firebase project. You can still use both ML Kit and Firebase to get the best of both products if you choose to.

With this change, ML Kit is now fully focused on on-device machine learning, giving you access to the unique benefits that on-device versus cloud ML offers:

  • It’s fast, unlocking real-time use cases- since processing happens on the device, there is no network latency. This means, we can do inference on a stream of images / video or multiple times a second on text strings.
  • Works offline - you can rely on our APIs even when the network is spotty or your app’s end-user is in an area without connectivity.
  • Privacy is retained: since all processing is performed locally, there is no need to send sensitive user data over the network to a server.

Naturally, you still get access to Google’s on-device models and processing pipelines, all accessible through easy-to-use APIs, and offered at no cost.

All ML Kit resources can now be found on our new website where we made it a lot easier to access sample apps, API reference docs and our community channels that are there to help you if you have questions.

Object detection & tracking gif Text recognition + Language ID + Translate gif

What does this mean if I already use ML Kit today?

If you are using ML Kit for Firebase’s on-device APIs in your app today, we recommend you to migrate to the new standalone ML Kit SDK to benefit from new features and updates. For more information and step-by-step instructions to update your app, please follow our Migration guide. The cloud-based APIs, model deployment and AutoML Vision Edge remain available through Firebase Machine Learning.

Shrink your app footprint with Google Play Services

Apart from making ML Kit easier to use, developers also asked if we can ship ML Kit through Google Play Services resulting in a smaller app footprint and the model can be reused between apps. Apart from Barcode scanning and Text recognition, we have now added Face detection / contour (model size: 20MB) to the list of APIs that support this functionality.

// Face detection / Face contour model
// Delivered via Google Play Services outside your app's APK…
implementation 'com.google.android.gms:play-services-mlkit-face-detection:16.0.0'

// …or bundled with your app's APK
implementation 'com.google.mlkit:face-detection:16.0.0'

Jetpack Lifecycle / CameraX support

Android Jetpack Lifecycle support has been added to all APIs. Developers can use addObserver to automatically manage teardown of ML Kit APIs as the app goes through screen rotation or closure by the user / system. This makes CameraX integration easier. With this release, we are also recommending that developers adopt CameraX in their apps due to the ease of integration and image quality improvements (compared to Camera1) on a wide range of devices.

// ML Kit now supports Lifecycle
val recognizer = TextRecognizer.newInstance()
lifecycle.addObserver(recognizer)

// ...

// Just like CameraX
val camera = cameraProvider.bindToLifecycle( /* lifecycleOwner= */this,
    cameraSelector, previewUseCase, analysisUseCase)

For an overview of all recent changes, check out the release notes for the new SDK.

Codelab of the day - ML Kit x CameraX

To help you get started with the new ML Kit and its support for CameraX, we have created this code lab to Recognize, Identify Language and Translate text. If you have any questions regarding this code lab, please raise them at StackOverflow and tag it with [google-mlkit]. Our team will monitor this.

screenshot of app running

Early access program

Through our early access program, developers have an opportunity to partner with the ML Kit team and get access to upcoming features. Two new APIs are now available as part of this program:

  • Entity Extraction - Detect entities in text & make them actionable. We have support for phone numbers, addresses, payment numbers, tracking numbers, date/time and more.
  • Pose Detection - Low-latency pose detection supporting 33 skeletal points, including hands and feet tracking.

If you are interested, head over to our early access page for details.

pose detection on man jumping rope

Tomorrow - Support for custom models

ML Kit's turn-key solutions are built to help you take common challenges. However, if you needed to have a more tailored solution, one that required custom models, you typically needed to build an implementation from scratch. To help, we are now providing the option to swap out the default Google models with a custom TensorFlow Lite model. We’re starting with the Image Labeling and Object Detection and Tracking APIs, that now support custom image classification models.

Tomorrow, we will dive a bit deeper into how to find or train a TensorFlow Lite model and use it either with ML Kit, or with Android Studio’s new ML binding functionality.