Enhance Cloud Search results for PDFs containing images with Optical Character Recognition support

What’s changing

Cloud Search now supports Optical Character Recognition (OCR) based text extraction for PDFs that contain images, such as:

Physical contract documents
Engineering documents that contain annotations or labels
Physical customer invoices, and more

This makes PDFs with images containing text, such as scanned documents, easily searchable by users and improves the discoverability of such PDFs.

Who’s impacted

Admins and end users

Why it’s important

Many critical business documents exist either in physical form or as scanned versions of physical documents. With OCR support, admins can now index these documents in Cloud Search directly, making it easier for users to quickly find relevant scanned documents.

In addition, this feature eliminates the need to extract the text offline from PDFs containing images before indexing these documents on Cloud Search.

Getting started

Admins: The feature is ON by default. Use this guide to learn more about how to use enhanced search for PDFs containing images. Important Note: PDFs must be submitted using the Asynchronous Indexing mode and must contain only images.
End Users: No user action is required
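
For admins who index content through a connector, the asynchronous indexing requirement above can be illustrated with a minimal sketch of an index request body. It assumes the Cloud Search Indexing API `items.index` request shape (a `mode` field set to `ASYNCHRONOUS`, and an `item` whose `content.contentDataRef` points at previously uploaded PDF bytes). The helper name `build_async_index_request` and all IDs are hypothetical, ACLs and metadata are omitted for brevity, and the field names should be verified against the current API reference before use.

```python
import base64
import json


def build_async_index_request(datasource_id, item_id, upload_ref_name,
                              connector_name, version="1"):
    """Build a request body for indexing an uploaded PDF asynchronously.

    Hypothetical helper: it only assembles a dictionary and makes no
    network calls. Field names are assumed from the Cloud Search
    Indexing API and should be checked against the reference.
    """
    item_name = f"datasources/{datasource_id}/items/{item_id}"
    # The item version is sent as base64-encoded bytes (assumption).
    encoded_version = base64.b64encode(version.encode("utf-8")).decode("ascii")
    return {
        "connectorName": connector_name,
        # The announcement requires asynchronous indexing for PDFs.
        "mode": "ASYNCHRONOUS",
        "item": {
            "name": item_name,
            "itemType": "CONTENT_ITEM",
            "version": encoded_version,
            "content": {
                # RAW content: the service extracts the text itself.
                "contentFormat": "RAW",
                # Reference to PDF bytes uploaded in an earlier step.
                "contentDataRef": {"name": upload_ref_name},
            },
        },
    }


if __name__ == "__main__":
    body = build_async_index_request(
        datasource_id="my-datasource",          # hypothetical ID
        item_id="contract-001.pdf",             # hypothetical ID
        upload_ref_name="example-upload-ref",   # hypothetical reference
        connector_name="datasources/my-datasource/connectors/my-connector",
    )
    print(json.dumps(body, indent=2))
```

The body would then be sent by whichever client the connector already uses. The sketch only shows where the indexing mode is set, not a complete indexing flow.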

Rollout pace

Rapid Release and Scheduled Release domains: This feature is available now for all users.

Availability

Available to Google Workspace Enterprise Plus and Google Cloud Search customers
Not available to Google Workspace Essentials, Business Starter, Business Standard, Business Plus, Enterprise Essentials, Enterprise Standard, Education Fundamentals, Education Plus, Frontline, and Nonprofits, as well as G Suite Basic and Business customers

Resources

Google Workspace Admin Help: Supported file types for text extraction
