
Android Mobile Vision restores operation and adds Text API

Posted by Michael Sipe, Product Manager

Mobile Vision, an important framework for finding objects in photos and video, is once again operational on Android devices with Google Play Services v9.2.

This new version of Google Play Services fixes a download issue in Google Play Services v9.0 that caused a service outage. See the release notes for details.

We’re also pleased to announce the Text API, a new component for Android Mobile Vision.

The Text API’s optical character recognition technology reads Latin-script text (for example, English, Spanish, German, and French) in photos and returns the text along with its organizational structure (paragraphs, lines, and words). Mobile apps can now:

  • Organize photos that contain text
  • Automate tedious data entry for credit cards, receipts, and business cards
  • Translate documents (along with the Cloud Translate API)
  • Keep track of real-world objects, for example by reading the numbers on subway trains
  • Provide accessibility features
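Because the Text API returns structure (paragraphs, lines, words) rather than one flat string, an app usually walks that hierarchy before using the text, for example to turn a photographed receipt into plain text for a form. The following is a minimal sketch of that traversal. Paragraph, Line, Word and TextFlattener are hypothetical stand-in types, not the Play services classes, and any bounding-box data the real API exposes is left out.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-ins for the paragraph / line / word structure
// described above. These are NOT the Mobile Vision classes.
record Word(String value) {}
record Line(List<Word> words) {}
record Paragraph(List<Line> lines) {}

final class TextFlattener {
    private TextFlattener() {}

    // Joins words with spaces, lines with newlines, and paragraphs with a
    // blank line, so the output mirrors the recognized layout.
    static String flatten(List<Paragraph> paragraphs) {
        return paragraphs.stream()
                .map(p -> p.lines().stream()
                        .map(l -> l.words().stream()
                                .map(Word::value)
                                .collect(Collectors.joining(" ")))
                        .collect(Collectors.joining("\n")))
                .collect(Collectors.joining("\n\n"));
    }

    // Counts words across all paragraphs, e.g. to ignore photos without text.
    static long wordCount(List<Paragraph> paragraphs) {
        return paragraphs.stream()
                .flatMap(p -> p.lines().stream())
                .mapToLong(l -> l.words().size())
                .sum();
    }

    public static void main(String[] args) {
        List<Paragraph> page = List.of(new Paragraph(List.of(
                new Line(List.of(new Word("Next"), new Word("train"))),
                new Line(List.of(new Word("Line"), new Word("7"))))));
        System.out.println(flatten(page));
        System.out.println(wordCount(page));
    }
}
```

Keeping the hierarchy until the last moment lets an app decide per use case whether it wants whole paragraphs (translation) or individual words (data entry).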

If you want to get started quickly, try our codelab, which gets Android developers reading text with their apps in under an hour.

Like the Mobile Vision Face and Barcode components, the Text API runs on-device and is suitable for real-time applications. For more information, check out the Mobile Vision Developer site.

Barcode Detection in Google Play services

Posted by Laurence Moroney, Developer Advocate

With the release of Google Play services 7.8, we’re excited to announce new Mobile Vision APIs, including the Barcode Scanner API, which reads and decodes a wide range of barcode types quickly, easily, and locally.

Barcode detection

Classes for detecting and parsing barcodes are available in the com.google.android.gms.vision.barcode package. The BarcodeDetector class is the main workhorse: it processes Frame objects and returns a SparseArray<Barcode>.

The Barcode type represents a single recognized barcode and its value. For 1D barcodes such as UPC codes, the value is simply the number encoded in the barcode. It is available in the rawValue property, and the detected encoding type is set in the format field.

For 2D barcodes that contain structured data, such as QR codes, the valueFormat field is set to the detected value type and the corresponding data field is populated. For example, if a URL is detected, valueFormat is set to the constant URL and the URL property contains the decoded value. QR codes support many data types beyond URLs; see the documentation for the full list.
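To make the split between rawValue, format and valueFormat concrete, here is a minimal plain-Java sketch of how an app might route a decoded barcode. BarcodeResult, ValueType and BarcodeRouter are hypothetical stand-ins (the real Barcode class defines its own int constants, and its format is not a String), and the routing policy is only an example.

```java
// Hypothetical value types; the real API defines its own constants
// (URL among them), which are not reproduced here.
enum ValueType { TEXT, PRODUCT, URL, OTHER }

// Hypothetical stand-in for a decoded barcode: the raw payload, the
// detected symbology ("format"), the parsed value type, and the URL data
// field (null unless valueFormat is URL).
record BarcodeResult(String rawValue, String format, ValueType valueFormat, String url) {}

final class BarcodeRouter {
    private BarcodeRouter() {}

    // Chooses an action from the structured value type, falling back to
    // the raw value for plain 1D codes and unrecognized content.
    static String describe(BarcodeResult b) {
        switch (b.valueFormat()) {
            case URL:
                return "open " + b.url();
            case PRODUCT:
                return "look up product " + b.rawValue();
            default:
                return b.format() + ": " + b.rawValue();
        }
    }

    public static void main(String[] args) {
        BarcodeResult upc = new BarcodeResult("036000291452", "UPC-A", ValueType.PRODUCT, null);
        BarcodeResult qr = new BarcodeResult("https://example.com", "QR Code", ValueType.URL, "https://example.com");
        System.out.println(describe(upc));
        System.out.println(describe(qr));
    }
}
```

Switching on the value type first, and only then falling back to rawValue, keeps 1D and structured 2D codes on one code path.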

The API can read barcodes in any orientation: they don’t need to be viewed straight on or pointing upward!

Importantly, all barcode parsing is done locally, which makes it fast. In some cases, such as PDF-417, all the information you need may be contained in the barcode itself, so no further lookups are required.

You can learn more about using the API in the sample on GitHub, which uses the Mobile Vision APIs with a camera preview to detect both faces and barcodes in the same image.

Supported Barcode Types

The API supports both 1D and 2D barcodes in a number of formats.

The supported 1D barcode formats are:

EAN-13
EAN-8
UPC-A
UPC-E
Code-39
Code-93
Code-128
ITF
Codabar

The supported 2D barcode formats are:

QR Code
Data Matrix
PDF-417
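Apps that only need some of these formats can restrict detection to that subset; to our knowledge the real API does this with a bitwise-OR of format constants passed to BarcodeDetector.Builder's setBarcodeFormats. The minimal sketch below illustrates only the bitmask idea. Format and FormatMask are hypothetical, and the ordinal-based flag values are invented for the example, so use the constants from the Barcode class in real code.

```java
import java.util.Set;

// The formats listed above. Flag values are derived from the enum order
// and are hypothetical; they do NOT match the real Barcode constants.
enum Format {
    EAN_13, EAN_8, UPC_A, UPC_E, CODE_39, CODE_93, CODE_128, ITF, CODABAR,
    QR_CODE, DATA_MATRIX, PDF417;

    int flag() {
        return 1 << ordinal();
    }

    boolean is2D() {
        return this == QR_CODE || this == DATA_MATRIX || this == PDF417;
    }
}

final class FormatMask {
    private FormatMask() {}

    // ORs the flags of the requested formats into a single int mask.
    static int maskOf(Set<Format> formats) {
        int mask = 0;
        for (Format f : formats) {
            mask |= f.flag();
        }
        return mask;
    }

    // Mask with every 2D format enabled.
    static int twoDimensionalMask() {
        int mask = 0;
        for (Format f : Format.values()) {
            if (f.is2D()) {
                mask |= f.flag();
            }
        }
        return mask;
    }

    // Tests whether a format is enabled in a mask.
    static boolean enabled(int mask, Format f) {
        return (mask & f.flag()) != 0;
    }
}
```

Restricting the set is mainly useful when an app knows what it expects (for example, only QR codes), since fewer candidate formats means fewer false matches.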

Learn More

It’s easy to build applications with barcode detection using the Barcode Scanner API, and we’ve provided lots of great resources to help you do so. Check them out here:

Follow the Code Lab

Read the Mobile Vision Documentation

Explore the sample

Face Detection in Google Play services

Posted by Laurence Moroney, Developer Advocate

With the release of Google Play services 7.8, we announced new Mobile Vision APIs, which include a new Face API that finds human faces in images and video better and faster than before. This API is also smarter at detecting faces at different orientations and with different facial features and expressions.

Face Detection

Face Detection is a leap forward from the previous Android FaceDetector.Face API. It’s designed to better detect human faces in images and video for easier editing. It’s smart enough to detect faces at different orientations, so even if your subject’s head is turned sideways, the face can still be detected. Specific landmarks can also be detected on faces, such as the eyes, the nose, and the edges of the lips.

Important Note

This is not a face recognition API. Instead, the new API simply detects the areas of an image or video that contain human faces. It also infers, from changes in position from frame to frame, that faces in consecutive frames of video are the same face. If a face leaves the field of view and re-enters, it isn’t recognized as a previously detected face.
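The post doesn’t describe how the API links faces across frames beyond using position changes. As an illustration only, and not the API’s actual algorithm, the minimal sketch below carries an ID forward when a face center in the new frame lies close to an unclaimed face from the previous frame, and assigns a fresh ID otherwise. Tracked, NaiveTracker and the distance threshold are hypothetical. Like the behavior described above, a face that leaves and re-enters gets a new ID.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tracked face: an ID plus the center of its bounding box.
record Tracked(int id, double x, double y) {}

// Naive greedy nearest-neighbor tracker, for illustration only.
final class NaiveTracker {
    private final double maxDistance;
    private List<Tracked> previous = new ArrayList<>();
    private int nextId = 0;

    NaiveTracker(double maxDistance) {
        this.maxDistance = maxDistance;
    }

    // Assigns IDs to the face centers detected in one frame. Each center
    // reuses the ID of the nearest unclaimed face from the previous frame
    // if it lies within maxDistance; otherwise it gets a new ID.
    List<Tracked> update(List<double[]> centers) {
        List<Tracked> current = new ArrayList<>();
        List<Tracked> unclaimed = new ArrayList<>(previous);
        for (double[] c : centers) {
            Tracked best = null;
            double bestDist = maxDistance;
            for (Tracked t : unclaimed) {
                double d = Math.hypot(t.x() - c[0], t.y() - c[1]);
                if (d <= bestDist) {
                    best = t;
                    bestDist = d;
                }
            }
            int id;
            if (best != null) {
                unclaimed.remove(best); // each old face matches at most once
                id = best.id();
            } else {
                id = nextId++;
            }
            current.add(new Tracked(id, c[0], c[1]));
        }
        previous = current; // only the last frame is remembered
        return current;
    }
}
```

Because only the previous frame is remembered, a single frame without the face is enough to lose its identity, which is exactly why this kind of tracking is not recognition.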


Detecting a face

When the API detects a human face, it returns a Face object. The Face object provides the spatial data for the face so you can, for example, draw a bounding rectangle around it or, using facial landmarks, add features in the correct place, such as giving a person a new hat. It includes the following methods:

  • getPosition() - Returns the top-left coordinates of the area where a face was detected
  • getWidth() - Returns the width of the area where a face was detected
  • getHeight() - Returns the height of the area where a face was detected
  • getId() - Returns an ID that the system associated with a detected face
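Since getPosition() gives the top-left corner and getWidth() and getHeight() give the size, the other edges of the bounding rectangle follow by addition. This minimal sketch uses a hypothetical FaceBox record in place of the Face object so it runs without Android; in an app, the four values would come from the methods above.

```java
// Hypothetical stand-in for the values returned by getPosition(),
// getWidth() and getHeight(); not the real Face class.
record FaceBox(float left, float top, float width, float height) {

    float right() { return left + width; }

    float bottom() { return top + height; }

    // Center of the detected area, e.g. for anchoring a label.
    float centerX() { return left + width / 2f; }

    float centerY() { return top + height / 2f; }

    // True if (x, y), e.g. a tap location, falls inside the box.
    boolean contains(float x, float y) {
        return x >= left && x < right() && y >= top && y < bottom();
    }
}
```

With right() and bottom() available, drawing the rectangle is a single call to whatever drawing API the app uses.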

Orientation

The Face API is smart enough to detect faces in multiple orientations. As the head is a solid object that is capable of moving and rotating around multiple axes, the view of a face in an image can vary wildly.

Here’s an example of a human face that is instantly recognizable to a person despite being oriented in very different ways:

The API can detect this as a face even when as much as half of the facial data is missing and the face is oriented at an angle, as in the corners of the above image.

The following Face methods report the face’s orientation:

  • getEulerY() - Returns the rotation of the face around the vertical axis, that is, whether the head has turned so the face is looking left or right [The y degree in the above image]
  • getEulerZ() - Returns the rotation of the face around the Z axis, that is, whether the head is tilted sideways [The r degree in the above image]
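One common use of these angles is to accept only roughly frontal faces before applying an effect. The minimal sketch below classifies a pose from the two angles. HeadPose and its 12-degree tolerance are hypothetical example choices, and the sketch deliberately uses absolute values so it doesn’t assume which sign means left or right; check the documentation for the sign convention.

```java
final class HeadPose {
    private HeadPose() {}

    // Example tolerance in degrees; tune for your app.
    static final float FRONTAL_TOLERANCE_DEG = 12f;

    // eulerY: rotation around the vertical axis (turning left/right).
    // eulerZ: rotation around the Z axis (tilting sideways).
    static boolean isRoughlyFrontal(float eulerY, float eulerZ) {
        return Math.abs(eulerY) <= FRONTAL_TOLERANCE_DEG
                && Math.abs(eulerZ) <= FRONTAL_TOLERANCE_DEG;
    }

    // Coarse, sign-agnostic description of the head pose.
    static String describe(float eulerY, float eulerZ) {
        boolean turned = Math.abs(eulerY) > FRONTAL_TOLERANCE_DEG;
        boolean tilted = Math.abs(eulerZ) > FRONTAL_TOLERANCE_DEG;
        if (turned && tilted) {
            return "turned and tilted";
        }
        if (turned) {
            return "turned";
        }
        if (tilted) {
            return "tilted";
        }
        return "frontal";
    }
}
```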

Landmarks

A landmark is a point of interest within a face. The API provides a getLandmarks() method that returns a List<Landmark>, where each Landmark object gives the coordinates of one of the following landmarks: bottom of mouth, left cheek, left ear, left ear tip, left eye, left mouth, base of nose, right cheek, right ear, right ear tip, right eye, or right mouth.
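Landmarks are what make overlays such as the new hat mentioned earlier easy to place. The minimal sketch below finds both eyes in a landmark list, then centers a hat between them and sizes it relative to the eye distance. LandmarkType, Point, FaceLandmark, HatPlacement and HatPlacer are hypothetical stand-ins (not the real Landmark class), and the 1.5 and 2.5 scale factors are arbitrary example values.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical subset of the landmark kinds listed above.
enum LandmarkType { LEFT_EYE, RIGHT_EYE, NOSE_BASE, BOTTOM_MOUTH }

record Point(float x, float y) {}

record FaceLandmark(LandmarkType type, Point position) {}

// Where to draw a hat overlay: center of its bottom edge, plus its width.
record HatPlacement(float centerX, float bottomY, float width) {}

final class HatPlacer {
    private HatPlacer() {}

    // Returns the position of the first landmark of the given type, if any.
    static Optional<Point> find(List<FaceLandmark> landmarks, LandmarkType type) {
        return landmarks.stream()
                .filter(l -> l.type() == type)
                .map(FaceLandmark::position)
                .findFirst();
    }

    // Returns empty if either eye is missing, e.g. when the head is turned
    // far enough that one eye is hidden.
    static Optional<HatPlacement> place(List<FaceLandmark> landmarks) {
        Optional<Point> left = find(landmarks, LandmarkType.LEFT_EYE);
        Optional<Point> right = find(landmarks, LandmarkType.RIGHT_EYE);
        if (left.isEmpty() || right.isEmpty()) {
            return Optional.empty();
        }
        Point l = left.get();
        Point r = right.get();
        float eyeDistance = (float) Math.hypot(r.x() - l.x(), r.y() - l.y());
        float midX = (l.x() + r.x()) / 2f;
        float midY = (l.y() + r.y()) / 2f;
        // Example factors: hat bottom 1.5 eye-distances above the eyes
        // (y grows downward in image coordinates), width 2.5 eye-distances.
        return Optional.of(new HatPlacement(midX, midY - 1.5f * eyeDistance, 2.5f * eyeDistance));
    }
}
```

Scaling by the eye distance keeps the hat proportional whether the face is near the camera or far away.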

Activity

In addition to detecting landmarks, the API offers the following methods for detecting various facial states:

  • getIsLeftEyeOpenProbability() - Returns a value between 0 and 1 giving the probability that the left eye is open
  • getIsRightEyeOpenProbability() - Returns a value between 0 and 1 giving the probability that the right eye is open
  • getIsSmilingProbability() - Returns a value between 0 and 1 giving the probability that the face is smiling

For example, you could write an app that takes a photo only when all of the subjects in the image are smiling.
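That idea reduces to a check over every detected face. The minimal sketch below also requires both eyes to be open. FaceState and ShutterPolicy are hypothetical stand-ins holding the three probabilities, and the 0.7 threshold is an arbitrary example. As an added assumption, any value outside 0 to 1 is treated as "not computed" and never passes, since a detector may not compute every probability.

```java
import java.util.List;

// Hypothetical per-face classification results; not the real Face class.
record FaceState(float smiling, float leftEyeOpen, float rightEyeOpen) {}

final class ShutterPolicy {
    private ShutterPolicy() {}

    static final float THRESHOLD = 0.7f; // example value

    // Passes only for values in [THRESHOLD, 1]; out-of-range values
    // (assumed to mean "not computed") always fail.
    static boolean passes(float probability) {
        return probability >= THRESHOLD && probability <= 1f;
    }

    // True only if at least one face was found and every face is smiling
    // with both eyes open.
    static boolean readyToShoot(List<FaceState> faces) {
        if (faces.isEmpty()) {
            return false;
        }
        return faces.stream().allMatch(f ->
                passes(f.smiling()) && passes(f.leftEyeOpen()) && passes(f.rightEyeOpen()));
    }
}
```

Requiring at least one face avoids firing the shutter on an empty scene, where "everyone is smiling" would otherwise be vacuously true.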

Learn More

It’s easy to build applications with face detection using the Face API, and we’ve provided lots of great resources to help you do so. Check them out here:

Follow the Code Lab

Read the Documentation

Explore the sample