
Ask a techspert: How does Lens turn images to text?

When I was on holiday recently, I wanted to take notes from an ebook I was reading. But instead of taking audio notes or scribbling things down in a notebook, I used Lens to select a section of the book, copy it and paste it into a document. That got me curious: How did all that just happen on my phone? How does a camera recognize words in all their fonts and languages?

I decided to get to the root of the question and speak to Ana Manasovska, a Zurich-based software engineer who is one of the Googlers on the front line of converting an image into text.

Ana, tell us about your work in Lens

I’m involved with the text aspect, so making sure that the app can discern text and copy it for a search or translate it — with no typing needed. For example, if you point your phone’s camera at a poster in a foreign language, the app can translate the text on it. And for people who are blind or have low vision, it can read the text out loud. It’s pretty impressive.

So part of what my team does is get Lens to recognize not just the text, but also the structure of the text. We humans automatically understand writing that is separated into sentences and paragraphs, or blocks and columns, and know what goes together. It’s very difficult for a machine to distinguish that, though.
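
To make that a bit more concrete, here is a minimal sketch, not Lens's actual code, of how that kind of text hierarchy could be modeled in C++ (the language Ana mentions using later in the interview). Every type and field name here is hypothetical.

```cpp
#include <string>
#include <vector>

// Purely illustrative types: words grouped into lines, lines into
// paragraphs, and paragraphs into blocks (e.g. the columns of a page).
struct BoundingBox { int x = 0; int y = 0; int width = 0; int height = 0; };

struct Word      { std::string text; BoundingBox box; };
struct TextLine  { std::vector<Word> words; BoundingBox box; };
struct Paragraph { std::vector<TextLine> lines; BoundingBox box; };
struct TextBlock { std::vector<Paragraph> paragraphs; BoundingBox box; };

// A whole page is then just a list of blocks.
using Page = std::vector<TextBlock>;
```

Grouping recognized characters into a hierarchy like this is what lets the app keep sentences, paragraphs and columns together the way a human reader would.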

Is this machine learning then?

Yes. In other words, it uses systems (we call them models) that we’ve trained to discern characters and structure in images. A traditional computing system would have only a limited ability to do this. But our machine learning model has been built to “teach itself” on enormous datasets and is learning to distinguish text structures the same way a human would.

Can the system work with different languages?

Yes, it can recognize 30 scripts, including Cyrillic, Devanagari, Chinese and Arabic. It’s most accurate in Latin-alphabet languages at the moment, but even there, the many different types of fonts present challenges. Japanese and Chinese are tricky because they have lots of nuances in the characters. What seems like a small variation to the untrained eye can completely change the meaning.

What’s the most challenging part of your job?

There’s lots of complexity and ambiguity, which are challenging, so I’ve had to learn to navigate that. And it’s very fast paced; things are moving constantly and you have to ask a lot of questions and talk to a lot of people to get the answers you need.

When it comes to actual coding, what does that involve?

Mostly I use a programming language called C++, which lets you run the processing steps needed to go from an image to a representation of words and structure.

Hmmm, I sort of understand. What does it look like?

A screenshot of some C++ code: this is what C++ looks like.

The code above shows the processing for extracting only the German from a section of text. So say the image showed German, French and Italian — only the German would be extracted for translation. Does that make sense?
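
Since the screenshot itself isn't reproduced here, the following is a rough, hypothetical C++ sketch of the idea just described: filter the recognized text spans by their detected language so that only the German ones are passed on to translation. None of these types or function names are Lens's real API.

```cpp
#include <string>
#include <vector>

// Hypothetical representation of one recognized span of text and the
// language code the recognizer assigned to it (e.g. "de", "fr", "it").
struct RecognizedSpan {
  std::string text;
  std::string language_code;
};

// Keep only the spans written in the target language.
std::vector<RecognizedSpan> FilterByLanguage(
    const std::vector<RecognizedSpan>& spans,
    const std::string& target_language) {
  std::vector<RecognizedSpan> result;
  for (const auto& span : spans) {
    if (span.language_code == target_language) {
      result.push_back(span);
    }
  }
  return result;
}
```

Calling FilterByLanguage(ocr_output, "de") on a page that mixes German, French and Italian would keep just the German spans for translation.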

Kind of! Tell me what you love about your job

It boils down to my lifelong love of solving problems. But I also really like that I’m building something I can use in my everyday life. I’m based in Zurich but don’t speak German well, so I use Lens for translation into English daily.

Seniors search what they see, using a new Lens

Technology shines when it helps us get things done in our daily lives, and that’s exactly why a group of around 100 very eager seniors gathered in Odense, Denmark. All of them older than 65, and some as old as 85, they had decided to stay on top of the latest technological tools and tricks. On this March day, the eye-opener was the often overlooked potential of searching for information with visual tools like Google Lens.

Now the seniors searched their surroundings directly: they scanned trees, plants, animals and buildings, used Translate to make sense of Turkish menus and Japanese sayings, and pulled up product declarations by scanning barcodes.

The group was taking part in a training session set up by Faglige Seniorer, a Danish organization that counts 300,000 seniors among its members. It first partnered with Google back in 2019 to teach seniors to search by voice, and now the time had come to search with live images.

“Often, when I go for a walk, I stumble upon an unknown flower or a tree. Now I can just take a picture to discover what kind of plant I am standing before,” Verner Madsen, one of the participants, remarked. “I don’t need to bring my encyclopedia. It is really smart and helpful.”

Seniors in a country like Denmark are generally very tech savvy, but with digitization constantly advancing — accelerating even faster during two years of COVID-19 — some seniors risk being left behind, creating gaps between generations. During worldwide lockdowns, technological tools have helped seniors stay connected with their family and friends, and smartphone features have helped improve everyday life. One key element of that is delivering accurate and useful information when needed. And for that, typed words on a smartphone keyboard can often be substituted with a visual search, using a single tap on the screen.

Being able to "search what you see" in this way was an eye-opener to many. As the day ended, another avid participant, Henrik Rasmussen, declared he was heading straight home to continue his practice.

“I thought I was up to speed on digital developments, but after today I realize that I still have a lot to learn and discover,” he said.

Search your world, any way and anywhere

People have always gathered information in a variety of ways — from talking to others, to observing the world around them, to, of course, searching online. Though typing words into a search box has become second nature for many of us, it’s far from the most natural way to express what we need. For example, if I’m walking down the street and see an interesting tree, I might point to it and ask a friend what species it is and if they know of any nearby nurseries that might sell seeds. If I had asked a search engine that question just a few years ago… well, it would have taken a lot of queries.

But we’ve been working hard to change that. We've already started on a journey to make searching more natural. Whether you're humming the tune that's been stuck in your head, or using Google Lens to search visually (which now happens more than 8 billion times per month!), there are more ways to search and explore information than ever before.

Today, we're redefining Google Search yet again, combining our understanding of all types of information — text, voice, visual and more — so you can find helpful information about whatever you see, hear and experience, in whichever ways are most intuitive to you. We envision a future where you can search your whole world, any way and anywhere.

Find local information with multisearch

The recent launch of multisearch, one of our most significant updates to Search in several years, is a milestone on this path. In the Google app, you can search with images and text at the same time — similar to how you might point at something and ask a friend about it.

Now we’re adding a way to find local information with multisearch, so you can uncover what you need from the millions of local businesses on Google. You’ll be able to use a picture or screenshot and add “near me” to see options for local restaurants or retailers that have the apparel, home goods and food you’re looking for.

An animation of a phone showing a search. A photo is taken of Korean cuisine, then Search scans it for restaurants near the user that serve it.

Later this year, you’ll be able to find local information with multisearch.

For example, say you see a colorful dish online you’d like to try – but you don’t know what’s in it, or what it’s called. When you use multisearch to find it near you, Google scans millions of images and reviews posted on web pages, and from our community of Maps contributors, to find results about nearby spots that offer the dish so you can go enjoy it for yourself.

Local information in multisearch will be available globally later this year in English, and will expand to more languages over time.

Get a more complete picture with scene exploration

Today, when you search visually with Google, we’re able to recognize objects captured in a single frame. But sometimes, you might want information about a whole scene in front of you.

In the future, with an advancement called “scene exploration,” you’ll be able to use multisearch to pan your camera and instantly glean insights about multiple objects in a wider scene.

In the future, “scene exploration” will help you uncover insights across multiple objects in a scene at the same time.

Imagine you’re trying to pick out the perfect candy bar for your friend who's a bit of a chocolate connoisseur. You know they love dark chocolate but dislike nuts, and you want to get them something of quality. With scene exploration, you’ll be able to scan the entire shelf with your phone’s camera and see helpful insights overlaid in front of you. Scene exploration is a powerful breakthrough in our devices’ ability to understand the world the way we do – so you can easily find what you’re looking for – and we look forward to bringing it to multisearch in the future.

These are some of the latest steps we’re taking to help you search any way and anywhere. But there’s more we’re doing, beyond Search. AI advancements are helping bridge the physical and digital worlds in Google Maps, and making it possible to interact with the Google Assistant more naturally and intuitively. To ensure information is truly useful for people from all communities, it’s also critical for people to see themselves represented in the results they find. Underpinning all these efforts is our commitment to helping you search safely, with new ways to control your online presence and information.

Go beyond the search box: Introducing multisearch

How many times have you tried to find the perfect piece of clothing, a tutorial to recreate nail art or even instructions on how to take care of a plant someone gifted you — but you didn’t have all the words to describe what you were looking for?

At Google, we’re always dreaming up new ways to help you uncover the information you’re looking for — no matter how tricky it might be to express what you need. That’s why today, we’re introducing an entirely new way to search: using text and images at the same time. With multisearch in Lens, you can go beyond the search box and ask questions about what you see.

Let’s take a look at how you can use multisearch to help with your visual needs, including style and home decor questions. To get started, simply open up the Google app on Android or iOS, tap the Lens camera icon and either search with one of your screenshots or snap a photo of the world around you, like the stylish wallpaper pattern at your local coffee shop. Then, swipe up and tap the "+ Add to your search" button to add text.

Multisearch allows people to search with both images and text at the same time.

With multisearch, you can ask a question about an object in front of you or refine your search by color, brand or a visual attribute. Give it a go yourself by using Lens to:

  • Screenshot a stylish orange dress and add the query “green” to find it in another color
  • Snap a photo of your dining set and add the query “coffee table” to find a matching table
  • Take a picture of your rosemary plant and add the query “care instructions”

All this is made possible by our latest advancements in artificial intelligence, which make it easier to understand the world around you in more natural and intuitive ways. We’re also exploring ways in which this feature might be enhanced by MUM, our latest AI model in Search, to improve results for all the questions you could imagine asking.

Multisearch is available as a beta feature in English in the U.S., with the best results for shopping searches. Try it out today in the Google app, the best way to search with your camera, voice and now text and images at the same time.

Here’s how online shoppers are finding inspiration

People shop across Google more than a billion times a day — and we have a pretty good sense of what they’re browsing for. For instance, our Search data shows that the early 2000s are having a moment. We’re seeing increased search interest in “Y2K fashion” and products like bucket hats and ankle bracelets. Also popular? The iconic Clinique “Happy” perfume, Prada crochet bags and linen pants.

While we know what’s trending, we also wanted to understand how people find inspiration when they’re shopping for lifestyle products. So we surveyed 2,000 U.S. shoppers of apparel, beauty and home decor for our first Inspired Shopping Report. Read on to find out what we learned.

Shopping isn’t always a checklist

According to our findings, most fashion, beauty and home shoppers spend up to two weeks researching products before they buy them. Many, though, are shopping online just for fun — 65% say they often or sometimes shop or browse online when they’re not looking for anything in particular. To help make online shopping even easier and more entertaining, we recently added more browsable search results for fashion and apparel shopping queries. So when you search for chunky loafers, a lime green dress or a raffia bag on Google, you’ll scroll through a visual feed with various colors and styles — alongside other helpful information like local shops, style guides and videos.

Phone screens show animations of a Google search for various clothing items with visual scrolling results

Apparel queries on Search show a more visual display of products

Inspiration can strike anywhere

We know shopping inspiration can strike at any moment. In fact, 60% of shoppers say they often or sometimes get inspired or prompted to buy something even when they aren’t actively shopping. That can come from spotting great street style: 39% of shoppers say they often or sometimes look for a specific outfit online after they see someone wearing it. Or it can come from browsing online: 48% of shoppers have taken a screenshot of a piece of clothing, accessory or home decor item they liked (and 70% of them say they’ve searched for or bought it afterwards). Google Lens can help you shop for looks as soon as you spot them. Just snap a photo or screenshot and you’ll find exact or similar results to shop from.

Sometimes words aren’t enough

We know it can be hard to find what you’re looking for using words alone, even when you do have an image — like that multi-colored, metallic floral wallpaper you took a photo of that would go perfectly with your living room rug. Half of shoppers say they often or sometimes have failed to find a specific piece of clothing or furniture online after trying to describe it with just words. And 66% of shoppers wished they could find an item in a different color or print.

To help you track down those super specific pieces, we’re introducing an entirely new way to search — using text and images at the same time. With multisearch on Lens, you can better uncover the products you’re looking for even when you don’t have all the words to describe them. For example, you might be on the lookout for a scarf in the same pattern as one of your handbags. Just snap a photo of the patterned handbag on Lens and add the query “scarf” to complete your look. Or take a photo of your favorite heels and add the query “flats” to find a more comfortable version.

Phone screen shows the ability to search for a flat version of a pair of yellow high heels, using text and images at the same time.

With multisearch on Lens, you can search with both images and text at the same time

Trying before you buy matters

It’s not always possible to make it to the store and try something on before you buy it — but it matters. Among online beauty shoppers, more than 60% have decided not to purchase a beauty or cosmetic item online because they didn’t know what color or shade to choose, and 41% have decided to return an item because it was the wrong shade. With AR Beauty experiences, you can virtually discover and “try on” thousands of products from brands like Maybelline New York, M.A.C. and Charlotte Tilbury — helping you make more informed decisions. And now, shoppers can try on cosmetics from a variety of brands carried at Ulta Beauty right in Google Search. Just search for a product, like the Morphe Matte Liquid Lipstick or Kylie Cosmetics High Gloss, and find the best shade for you.

Phone screens show animations of models virtually trying on various lipstick and eyeshadow shades.

Google’s AR Beauty experience features products from Ulta Beauty

No matter where you find your shopping inspiration, we hope these features and tools help you discover new products, compare different options and ultimately make the perfect purchase.