ML Kit is now in GA & Introducing Selfie Segmentation

Posted by Kenny Sulaimon, Product Manager, ML Kit; Chengji Yan, Suril Shah, Buck Bourdon, Software Engineers, ML Kit; Shiyu Hu, Technical Lead, ML Kit; Dong Chen, Technical Lead, ML Kit

At the end of 2020, we introduced the Entity Extraction API to our ML Kit SDK, making it even easier to detect and perform actions on text within mobile apps. Since then, we’ve been hard at work updating our existing APIs with new functionality and fine-tuning Selfie Segmentation with the help of our partners in the ML Kit early access program.

Today we are excited to officially add Selfie Segmentation to the ML Kit lineup, introduce a few enhancements we’ve made to our popular Pose Detection API and announce that ML Kit has graduated to general availability!

Natural language graphic

General Availability

ML Kit image

(ML Kit is now in General Availability)

We launched ML Kit back in 2018 in order to make it easy for developers on Android and iOS to use machine learning within their apps. Over the last two years we have rapidly expanded our set of APIs to help with both vision and natural language processing based use cases.

Thanks to the overwhelmingly positive response, developer feedback and a ton of adoption across both Android and iOS, we are ready to drop the beta label and officially announce the general availability of ML Kit’s APIs. All of our APIs (except Selfie Segmentation, Pose Detection, and Entity Extraction) are now in general availability!

Selfie Segmentation

Selfie Segmentation photo

(Example of ML Kit Selfie Segmentation)

With the increased usage of selfie cameras and webcams in today's world, being able to quickly and easily add effects to camera experiences has become a necessity for many app developers.

ML Kit's Selfie Segmentation API allows developers to easily separate the background from users within a scene and focus on what matters. Adding cool effects to selfies or inserting your users into interesting background environments has never been easier. The model works on live and still images, and both half and full body subjects.

Under The Hood

Under the hood graph

(Diagram of Selfie Segmentation API)

The Selfie Segmentation API takes an input image and produces an output mask. Each pixel of the mask is assigned a float value in the range [0.0, 1.0]. The closer the value is to 1.0, the higher the confidence that the pixel represents a person; the closer it is to 0.0, the higher the confidence that the pixel belongs to the background.
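As a rough illustration of how such a confidence mask might be used, the sketch below blends a foreground and a background image pixel by pixel, and also turns the mask into a hard person/background decision. It is plain Python on nested lists of grayscale values, not ML Kit code; the function names and the 0.5 threshold are assumptions made for this example.

```python
def composite_with_mask(foreground, background, mask):
    """Blend two equally sized grayscale images using per-pixel confidence.

    A mask value of 1.0 keeps the foreground pixel, 0.0 keeps the background
    pixel, and values in between mix the two.
    """
    blended = []
    for fg_row, bg_row, mask_row in zip(foreground, background, mask):
        blended.append(
            [m * fg + (1.0 - m) * bg for fg, bg, m in zip(fg_row, bg_row, mask_row)]
        )
    return blended


def binarize(mask, threshold=0.5):
    """Turn confidences into a hard mask (1 = person, 0 = background).

    The 0.5 threshold is an assumption for illustration only.
    """
    return [[1 if m >= threshold else 0 for m in row] for row in mask]


mask = [[1.0, 0.0], [0.5, 0.25]]
selfie = [[200, 200], [200, 200]]
backdrop = [[100, 100], [100, 100]]
print(composite_with_mask(selfie, backdrop, mask))
print(binarize(mask))
```

Blending with the raw confidence gives soft edges around hair and shoulders, while the binarized mask is the simpler choice when a hard cut-out is enough.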

The API works with static images and live video use cases. During live video (stream_mode), the API will leverage output from previous frames to return smoother segmentation results.

We’ve also implemented a “RAW_SIZE_MASK” option to give developers more control over mask output. By default, the mask produced will be the same size as the input image. If the RAW_SIZE_MASK option is enabled, then the mask will be the size of the model output (256x256). This option makes it easier to apply customized rescaling logic, and it reduces latency if rescaling to the input image size is not needed for your use case.
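To give a sense of what "customized rescaling logic" could look like, here is a minimal sketch that stretches a raw-size mask to an arbitrary output size with nearest-neighbor sampling. Nearest-neighbor is chosen only because it is short; it is not necessarily what ML Kit does internally, and the function name is hypothetical.

```python
def resize_mask_nearest(mask, out_width, out_height):
    """Resize a 2D confidence mask using nearest-neighbor sampling."""
    in_height = len(mask)
    in_width = len(mask[0])
    resized = []
    for y in range(out_height):
        # Map each output row back to the closest source row.
        src_y = min(in_height - 1, y * in_height // out_height)
        row = []
        for x in range(out_width):
            src_x = min(in_width - 1, x * in_width // out_width)
            row.append(mask[src_y][src_x])
        resized.append(row)
    return resized


# A tiny 2x2 stand-in for the 256x256 model output, scaled up to 4x4.
raw_mask = [[0.0, 1.0], [1.0, 0.0]]
for row in resize_mask_nearest(raw_mask, 4, 4):
    print(row)
```

An app that only needs a coarse mask, for example to decide whether a person is present, could skip this step entirely and work on the raw output.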

Pose Detection Update

Example of Pose Detection API

(Example of updated Pose Detection API; colors represent the Z value)

Last month, we updated our state-of-the-art Pose Detection API with a new model and new features. A quick summary of the enhancements is listed below:

  • More poses added: The API now recognizes more poses, targeting fitness and yoga use cases, especially when a user is directly facing the camera.
  • 50% size reduction: The base and accurate pose models are now significantly smaller. This change does not impact the quality of the models.
  • Z Coordinate for depth analysis: The API now outputs a depth coordinate Z to help determine whether parts of the user's body are in front of or behind the user’s hips.

Z Coordinate

The Z Coordinate is an experimental feature that is calculated for every point (excluding the face). The estimate is provided using synthetic data, obtained via the GHUM model (articulated 3D human shape model).

It is measured in "image pixels" like the X and Y coordinates. The Z axis is perpendicular to the camera and passes between a subject's hips. The origin of the Z axis is approximately the center point between the hips (left/right and front/back relative to the camera). Negative Z values are towards the camera; positive values are away from it. The Z coordinate does not have an upper or lower bound.
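The sign convention above can be sketched in a few lines. The snippet assumes you have already read a Z value for each landmark; the landmark names, the sample values, and the tolerance parameter are all hypothetical and only serve to show how negative and positive Z might be interpreted.

```python
def depth_relative_to_hips(z, tolerance=0.0):
    """Classify a landmark's Z value relative to the hip midpoint (Z = 0).

    Negative Z is towards the camera, positive Z is away from it.
    `tolerance` (an assumption, in image pixels) treats small values as level.
    """
    if z < -tolerance:
        return "in front of hips"
    if z > tolerance:
        return "behind hips"
    return "level with hips"


# Hypothetical Z values for three landmarks.
landmarks = {"left_wrist": -42.5, "right_ankle": 18.0, "left_knee": 1.5}
for name, z in landmarks.items():
    print(name, depth_relative_to_hips(z, tolerance=5.0))
```

Because Z is experimental and unbounded, a tolerance band like this may help avoid flickering between "in front" and "behind" for landmarks that sit close to the hip plane.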

For more information on the Pose Detection changes, please see our API documentation.

Pose Classification

After the release of Pose Detection, we received quite a few requests from developers to help with classifying specific poses within their apps. To help tackle this problem, we partnered with the MediaPipe team to release a pose classification tutorial and Google Colab. In the classification tutorial, we demonstrate how to build and run a custom pose classifier within the ML Kit Android sample app and also demo a simple push-up and squat rep counter using the classifier.

(Example of Pose classification and repetition counting with MLKit Pose)
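One simple way to count repetitions on top of a pose classifier is to track when the classifier's confidence for a pose rises above one threshold and later falls below a lower one. The sketch below illustrates that idea only; the function name, the threshold values, and the confidence sequence are assumptions for this example and are not taken from the tutorial.

```python
def count_reps(confidences, enter_threshold=0.6, exit_threshold=0.4):
    """Count repetitions from per-frame confidences for a single pose class.

    A rep is counted each time the confidence rises above `enter_threshold`
    and later drops below `exit_threshold`. Using two thresholds keeps noisy
    values near the boundary from being counted as extra reps.
    """
    reps = 0
    in_pose = False
    for confidence in confidences:
        if not in_pose and confidence > enter_threshold:
            in_pose = True
        elif in_pose and confidence < exit_threshold:
            in_pose = False
            reps += 1
    return reps


# Hypothetical per-frame confidences for the "down" phase of a push-up.
frames = [0.1, 0.7, 0.9, 0.5, 0.3, 0.2, 0.8, 0.45, 0.1]
print(count_reps(frames))
```

The gap between the two thresholds is the main tuning knob: a wider gap is more robust to jitter but may miss shallow repetitions.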

For a deep dive into building your own pose classifier with different camera angles, environment conditions, body shapes, etc., please see the pose classification tutorial.

For more general classification tips, please see our Pose Classification Options page on the ML Kit website.

Beyond General Availability

It has been an exciting two years getting ML Kit to general availability and we couldn’t have gotten here without your help and feedback. As we continue to introduce new APIs such as Selfie Segmentation and Pose Detection, your feedback is more important than ever. Please continue to share your enhancement requests and questions with our development team or reach out through our community channels. Let’s build a smarter future together.

Announcing the Newest Addition to MLKit: Entity Extraction

Posted by Kenny Sulaimon, Product Manager, ML Kit; Tory Voight, Product Manager, ML Kit; Daniel Furlong, Lei Yu, Software Engineers, ML Kit; Dong Chen, Technical Lead, ML Kit

Six months ago, we introduced the standalone version of the ML Kit SDK, making it even easier to integrate on-device machine learning into mobile apps. Since then, we’ve launched the Digital Ink Recognition and Pose Detection APIs, and also introduced the ML Kit early access program. Today we are excited to add Entity Extraction to the official ML Kit lineup and also debut a new API for our early access program, Selfie Segmentation!

Overview of ML Kit's Entity Extraction API

With ML Kit’s Entity Extraction API, you can now improve the user experience inside your app by understanding text and performing specific actions on it.

The Entity Extraction API allows you to detect and locate entities from raw text, and take action based on those entities. The API works on static text and also in real-time while a user is typing. It supports 11 different entities and 15 different languages (with more coming in the future) to allow developers to make any text interaction a richer experience for the user.

Supported Entities

  • Address (350 third street, cambridge)
  • Date-Time* (12/12/2020, tomorrow at 3pm) (let's meet tomorrow at 6pm)
  • Email ([email protected])
  • Flight Number* (LX37)
  • IBAN* (CH52 0483 0000 0000 0000 9)
  • ISBN* (978-1101904190)
  • Money (including currency)* ($12, 25USD)
  • Payment Card* (4111 1111 1111 1111)
  • Phone Number ((555) 225-3556, 12345)
  • Tracking Number* (1Z204E380338943508)
  • URL (www.google.com, https://en.wikipedia.org/wiki/Platypus, seznam.cz)

Example Results

(Table of input text and detected entities)

Real World Applications

2 phones showing TamTam using Entity Extraction API

(Images courtesy of TamTam)

Our early access partner, TamTam, has been using the Entity Extraction API to provide helpful suggestions to their users during their chat conversations. This feature allows users to quickly perform actions based on the context of their conversations.

While integrating this API, Iurii Dorofeev, Head of TamTam Android Development, mentioned, “We appreciated the ease of integration of the ML Kit ... and it works offline. Clustering the content of messages right on the device allowed us to save resources. ML Kit capabilities will help us develop other features for TamTam messenger in the future.”

Check out their messaging app on Google Play and the App Store today.

Under The Hood

(Diagram of underlying Text Classifier API)

ML Kit’s Entity Extraction API builds upon the technology powering the Smart Linkify feature in Android 10+ to deliver an easy-to-use and streamlined experience for developers. For an in-depth review of the Text Classifier API, please see our blog post here.

The neural network annotators/models in the Entity Extraction API work as follows: A given input text is first split into words (based on space separation), then all possible word subsequences of certain maximum length (15 words in the example above) are generated, and for each candidate the scoring neural net assigns a value (between 0 and 1) based on whether it represents a valid entity.

Next, the generated entities that overlap are removed, favoring the ones with a higher score over the conflicting ones with a lower score. Then a second neural network is used to classify the type of the entity as a phone number, an address, or in some cases, a non-entity.
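The candidate generation and overlap removal steps described above can be sketched as follows. This is not the ML Kit implementation: the hard-coded scores stand in for the scoring neural network, the 0.5 cut-off is an assumption, and the second network that classifies the entity type is left out.

```python
def generate_candidates(words, max_len):
    """Return every contiguous word span (start, end), end exclusive,
    containing at most `max_len` words."""
    spans = []
    for start in range(len(words)):
        for end in range(start + 1, min(len(words), start + max_len) + 1):
            spans.append((start, end))
    return spans


def select_non_overlapping(scored_spans, min_score=0.5):
    """Keep the highest-scoring spans, dropping any that overlap a better one.

    `min_score` is an assumed cut-off for treating a span as a valid entity.
    """
    chosen = []
    for span, score in sorted(scored_spans, key=lambda item: item[1], reverse=True):
        if score < min_score:
            continue
        start, end = span
        if all(end <= s or start >= e for (s, e), _ in chosen):
            chosen.append((span, score))
    return sorted(chosen)


words = "call 555 1234 tomorrow".split()  # split on spaces, as in the text
candidates = generate_candidates(words, max_len=3)

# Hypothetical scores standing in for the scoring neural net's output.
fake_scores = {(1, 3): 0.9, (1, 2): 0.6, (2, 3): 0.55, (1, 4): 0.3}
scored = [(span, fake_scores.get(span, 0.0)) for span in candidates]

for (start, end), score in select_non_overlapping(scored):
    print(" ".join(words[start:end]), score)
```

In this toy input the span covering "555 1234" wins, and the shorter spans "555" and "1234" are discarded because they overlap it with lower scores.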

The neural network models in the Entity Extraction API are used in combination with other types of models (e.g. rule-based) to identify additional entities in text, such as: flight numbers, currencies and other examples listed above. Therefore, if multiple entities are detected for one text input, the Entity Extraction API can return several overlapping results.

Lastly, ML Kit will dynamically download the required language-specific models to the device. You can also explicitly manage the models you want available on the device by using ML Kit’s model management API. This can be useful if you want to download models ahead of time for your users. The API also allows you to delete models that are no longer required.

New APIs Coming Soon

Selfie Segmentation

With the increased usage of selfie cameras and webcams in today's world, being able to quickly and easily add effects to camera experiences has become a necessity for many app developers.

ML Kit's Selfie Segmentation API allows developers to easily separate the background from a scene and focus on what matters. Adding cool effects to selfies or inserting your users into interesting background environments has never been easier. This API produces great results with low latency on both Android and iOS devices.

(Example of ML Kit Selfie Segmentation)

Key capabilities:

  • Easily allows developers to replace or blur a user’s background
  • Works well with single or multiple people
  • Cross-platform support (iOS and Android)
  • Runs in real time on most modern phones

To join our early access program and request access to ML Kit's Selfie Segmentation API, please fill out this form.