Tag Archives: Pose Detection

ML Kit is now in GA & Introducing Selfie Segmentation

Posted by Kenny Sulaimon, Product Manager, ML Kit Chengji Yan, Suril Shah, Buck Bourdon, Software Engineers, ML Kit, Shiyu Hu, Technical Lead, ML Kit Dong Chen, Technical Lead, ML Kit

At the end of 2020, we introduced the Entity Extraction API to our ML Kit SDK, making it even easier to detect and perform actions on text within mobile apps. Since then, we’ve been hard at work updating our existing APIs with new functionality and also fine tuning Selfie Segmentation with the help of our partners in the ML Kit early access program.

Today we are excited to officially add Selfie Segmentation to the ML Kit lineup, introduce a few enhancements we’ve made to our popular Pose Detection API and announce that ML Kit has graduated to general availability!

Natural language graphic

General Availability

ML Kit image

(ML Kit is now in General Availability)

We launched ML Kit back in 2018 in order to make it easy for developers on Android and iOS to use machine learning within their apps. Over the last two years we have rapidly expanded our set of APIs to help with both vision and natural language processing based use cases.

Thanks to the overwhelmingly positive response, developer feedback and a ton of adoption across both Android and iOS, we are ready to drop the beta label and officially announce the general availability of ML Kit’s APIs. All of our APIs (except Selfie Segmentation, Pose Detection, and Entity Extraction) are now in general availability!

Selfie Segmentation

Selfie Segmentation photo

(Example of ML Kit Selfie Segmentation)

With the increased usage of selfie cameras and webcams in today's world, being able to quickly and easily add effects to camera experiences has become a necessity for many app developers.

ML Kit's Selfie Segmentation API allows developers to easily separate the background from users within a scene and focus on what matters. Adding cool effects to selfies or inserting your users into interesting background environments has never been easier. The model works on live and still images, and both half and full body subjects.

Under The Hood

Under the hood graph

(Diagram of Selfie Segmentation API)

The Selfie Segmentation API takes an input image and produces an output mask. Each pixel of the mask is assigned a float number that has a range between [0.0, 1.0]. The closer the number is to 1.0, the higher the confidence that the pixel represents a person, and vice versa.

The API works with static images and live video use cases. During live video (stream_mode), the API will leverage output from previous frames to return smoother segmentation results.

We’ve also implemented a “RAW_SIZE_MASK” option to give developers more options for mask output. By default, the mask produced will be the same size as the input image. If the RAW_SIZE_MASK option is enabled, then the mask will be the size of the model output (256x256). This option makes it easier to apply customized rescaling logic or reduces latency if rescaling to the input image size is not needed for your use case.

Pose Detection Update

Example of Pose Detection API

(Example of updated Pose Detection API; colors represent the Z value)

Last month, we updated our state-of-the-art Pose Detection API with a new model and new features. A quick summary of the enhancements is listed below:

  • More poses added The API now recognizes more poses, targeting fitness and yoga use cases, especially when a user is directly facing the camera.
  • 50% size reduction The base and accurate pose models are now significantly smaller. This change does not impact the quality of the models.
  • Z Coordinate for depth analysis The API now outputs a depth coordinate Z to help determine whether parts of the user's body are in front or behind the user’s hips.

Z Coordinate

The Z Coordinate is an experimental feature that is calculated for every point (excluding the face). The estimate is provided using synthetic data, obtained via the GHUM model (articulated 3D human shape model).

It is measured in "image pixels" like the X and Y coordinates. The Z axis is perpendicular to the camera and passes between a subject's hips. The origin of the Z axis is approximately the center point between the hips (left/right and front/back relative to the camera). Negative Z values are towards the camera; positive values are away from it. The Z coordinate does not have an upper or lower bound.

For more information on the Pose Detection changes, please see our API documentation.

Pose Classification

After the release of Pose Detection, we’ve received quite a bit of requests from developers to help with classifying specific poses within their apps. To help tackle this problem, we partnered with the MediaPipe team to release a pose classification tutorial and Google Colab. In the classification tutorial, we demonstrate how to build and run a custom pose classifier within the ML Kit Android sample app and also demo a simple push-up and squat rep counter using the classifier.

Example of Pose classification and repetition counting with MLKit Pose

(Example of Pose classification and repetition counting with MLKit Pose)

For a deep dive into building your own pose classifier with different camera angles, environment conditions, body shapes etc, please see the pose classification tutorial.

For more general classification tips, please see our Pose Classification Options page on the ML Kit website.

Beyond General Availability

It has been an exciting two years getting ML Kit to general availability and we couldn’t have gotten here without your help and feedback. As we continue to introduce new APIs such as Selfie Segmentation and Pose Detection, your feedback is more important than ever. Please continue to share your enhancement requests and questions with our development team or reach out through our community channels. Let’s build a smarter future together.

ML Kit Pose Detection Makes Staying Active at Home Easier

Posted by Kenny Sulaimon, Product Manager, ML Kit; Chengji Yan and Areeba Abid, Software Engineers, ML Kit

ML Kit logo

Two months ago we introduced the standalone version of the ML Kit SDK, making it even easier to integrate on-device machine learning into mobile apps. Since then we’ve launched the Digital Ink Recognition API, and also introduced the ML Kit early access program. Our first two early access APIs were Pose Detection and Entity Extraction. We’ve received an overwhelming amount of interest in these new APIs and today, we are thrilled to officially add Pose Detection to the ML Kit lineup.

ML Kit Overview

A New ML Kit API, Pose Detection


Examples of ML Kit Pose Detection

ML Kit Pose Detection is an on-device, cross platform (Android and iOS), lightweight solution that tracks a subject's physical actions in real time. With this technology, building a one-of-a-kind experience for your users is easier than ever.

The API produces a full body 33 point skeletal match that includes facial landmarks (ears, eyes, mouth, and nose), along with hands and feet tracking. The API was also trained on a variety of complex athletic poses, such as Yoga positions.

Skeleton image detailing all 33 landmark points

Skeleton image detailing all 33 landmark points

Under The Hood

Diagram of the ML Kit Pose Detection Pipeline

The power of the ML Kit Pose Detection API is in its ease of use. The API builds on the cutting edge BlazePose pipeline and allows developers to build great experiences on Android and iOS, with little effort. We offer a full body model, support for both video and static image use cases, and have added multiple pre and post processing improvements to help developers get started with only a few lines of code.

The ML Kit Pose Detection API utilizes a two step process for detecting poses. First, the API combines an ultra-fast face detector with a prominent person detection algorithm, in order to detect when a person has entered the scene. The API is capable of detecting a single (highest confidence) person in the scene and requires the face of the user to be present in order to ensure optimal results.

Next, the API applies a full body, 33 landmark point skeleton to the detected person. These points are rendered in 2D space and do not account for depth. The API also contains a streaming mode option for further performance and latency optimization. When enabled, instead of running person detection on every frame, the API only runs this detector when the previous frame no longer detects a pose.

The ML Kit Pose Detection API also features two operating modes, “Fast” and “Accurate”. With the “Fast” mode enabled, you can expect a frame rate of around 30+ FPS on a modern Android device, such as a Pixel 4 and 45+ FPS on a modern iOS device, such as an iPhone X. With the “Accurate” mode enabled, you can expect more stable x,y coordinates on both types of devices, but a slower frame rate overall.

Lastly, we’ve also added a per point “InFrameLikelihood” score to help app developers ensure their users are in the right position and filter out extraneous points. This score is calculated during the landmark detection phase and a low likelihood score suggests that a landmark is outside the image frame.

Real World Applications


Examples of a pushup and squat counter using ML Kit Pose Detection

Keeping up with regular physical activity is one of the hardest things to do while at home. We often rely on gym buddies or physical trainers to help us with our workouts, but this has become increasingly difficult. Apps and technology can often help with this, but with existing solutions, many app developers are still struggling to understand and provide feedback on a user’s movement in real time. ML Kit Pose Detection aims to make this problem a whole lot easier.

The most common applications for Pose detection are fitness and yoga trackers. It’s possible to use our API to track pushups, squats and a variety of other physical activities in real time. These complex use cases can be achieved by using the output of the API, either with angle heuristics, tracking the distance between joints, or with your own proprietary classifier model.

To get you jump started with classifying poses, we are sharing additional tips on how to use angle heuristics to classify popular yoga poses. Check it out here.

Learning to Dance Without Leaving Home

Learning a new skill is always tough, but learning to dance without the aid of a real time instructor is even tougher. One of our early access partners, Groovetime, has set out to solve this problem.

With the power of ML Kit Pose Detection, Groovetime allows users to learn their favorite dance moves from popular short-form dance videos, while giving users automated real time feedback on their technique. You can join their early access beta here.

Groovetime App using ML Kit Pose Detection

Staying Active Wherever You Are

Our Pose Detection API is also helping adidas Training, another one of our early access partners, build a virtual workout experience that will help you stay active no matter where you are. This one-of-a-kind innovation will help analyze and give feedback on the user’s movements, using nothing more than just your phone. Integration into the adidas Training app is still in the early phases of the development cycle, but stay tuned for more updates in the future.

How to get started?

If you would like to start using the Pose Detection API in your mobile app, head over to the developer documentation or check out the sample apps for Android and iOS to see the API in action. For questions or feedback, please reach out to us through one of our community channels.