Tag Archives: Ask a Techspert

Ask a techspert: How does Lens turn images to text?

When I was on holiday recently, I wanted to take notes from an ebook I was reading. But instead of taking audio notes or scribbling things down in a notebook, I used Lens to select a section of the book, copy it and paste it into a document. That got me curious: How did all that just happen on my phone? How does a camera recognize words in all their fonts and languages?

I decided to get to the root of the question and speak to Ana Manasovska, a Zurich-based software engineer who is one of the Googlers on the front line of converting an image into text.

Ana, tell us about your work in Lens

I’m involved with the text aspect, so making sure that the app can discern text and copy it for a search or translate it — with no typing needed. For example, if you point your phone’s camera at a poster in a foreign language, the app can translate the text on it. And for people who are blind or have low vision, it can read the text out loud. It’s pretty impressive.

So part of what my team does is get Lens to recognize not just the text, but also the structure of the text. We humans automatically understand writing that is separated into sentences and paragraphs, or blocks and columns, and know what goes together. It’s very difficult for a machine to distinguish that, though.
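
One small piece of recovering structure is grouping recognized words into lines and putting them back in reading order. The sketch below assumes an upstream recognizer has already produced each word with a bounding box. The `Word` type, the vertical-overlap rule and the 0.5 threshold are hypothetical choices for illustration, not how Lens actually does it. Words whose boxes sit at roughly the same height join the same line, and each line is then read left to right.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A recognized word and its bounding box (hypothetical upstream output)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def vertical_overlap(a: Word, b: Word) -> float:
    """Fraction of the shorter box's height that the two boxes share vertically."""
    shared = min(a.bottom, b.bottom) - max(a.top, b.top)
    shorter = min(a.bottom - a.top, b.bottom - b.top)
    return max(0.0, shared) / shorter if shorter > 0 else 0.0


def group_into_lines(words: list[Word], min_overlap: float = 0.5) -> list[str]:
    """Group words into text lines, top to bottom, each read left to right."""
    lines: list[list[Word]] = []
    # Visit words from top to bottom so each line is seeded by its highest word.
    for word in sorted(words, key=lambda w: w.top):
        for line in lines:
            # Join an existing line if the word sits at roughly the same height.
            if vertical_overlap(line[0], word) >= min_overlap:
                line.append(word)
                break
        else:
            lines.append([word])
    # Within each line, restore left-to-right reading order.
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.left)) for line in lines]
```

For example, four words detected in scrambled order ("world", "Hello", "Second", "line") come back as the two lines "Hello world" and "Second line". Real layouts with columns, rotated text and paragraphs need much more than this single heuristic, which is part of why the team uses trained models.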

Is this machine learning then?

Yes. It uses systems (we call them models) that we’ve trained to discern characters and structure in images. A traditional computing system would have only a limited ability to do this. But our machine learning model has been built to “teach itself” on enormous datasets and is learning to distinguish text structures the same way a human would.

Can the system work with different languages?

Yes, it can recognize 30 scripts, including Cyrillic, Devanagari, Chinese and Arabic. It’s most accurate in Latin-alphabet languages at the moment, but even there, the many different types of fonts present challenges. Japanese and Chinese are tricky because they have lots of nuances in the characters. What seems like a small variation to the untrained eye can completely change the meaning.

What’s the most challenging part of your job?

There’s lots of complexity and ambiguity, which are challenging, so I’ve had to learn to navigate that. And it’s very fast paced; things are moving constantly and you have to ask a lot of questions and talk to a lot of people to get the answers you need.

When it comes to actual coding, what does that involve?

Mostly I use a programming language called C++ to write the processing steps that take you from an image to a representation of words and their structure.

Hmmm, I sort of understand. What does it look like?


This is what C++ looks like.

The code above shows the processing for extracting only the German from a section of text. So say the image showed German, French and Italian — only the German would be extracted for translation. Does that make sense?

Kind of! Tell me what you love about your job

It boils down to my lifelong love of solving problems. But I also really like that I’m building something I can use in my everyday life. I’m based in Zurich but don’t speak German well, so I use Lens for translation into English daily.
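
Ana’s German-only example can be pictured as a filter over recognized text blocks. The code from her screenshot isn’t reproduced here, so this is not that code. It is a minimal sketch that assumes each block already carries a language code from some upstream language-identification step. The `TextBlock` type, its fields and the ISO 639-1 codes are illustrative assumptions. The sketch keeps the German blocks and joins them in reading order, ready to be sent for translation.

```python
from dataclasses import dataclass


@dataclass
class TextBlock:
    """A recognized block of text (hypothetical structure)."""
    text: str
    language: str       # ISO 639-1 code, assumed to come from an upstream detector
    reading_order: int  # position of the block in the page's reading order


def extract_language(blocks: list[TextBlock], language: str = "de") -> str:
    """Return the text of every block in `language`, joined in reading order."""
    selected = [b for b in blocks if b.language == language]
    selected.sort(key=lambda b: b.reading_order)
    return "\n".join(b.text for b in selected)
```

Given a trilingual sign with blocks such as "Notausgang" (German), "Sortie de secours" (French) and "Uscita di emergenza" (Italian), only the German text would be returned.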

Ask a Techspert: How does Google Assistant understand your questions?

Talking to Google Assistant is a real “wow, we’re officially in the future” moment for me, often to the point that it makes me wonder: How do voice-activated virtual assistants work? Specifically, how do they understand what someone is asking, then provide a correct, useful and even delightful response? For instance, a few weeks ago, I was playing around with Assistant before getting to my actual question, which was, naturally, food-related. I said, “Hey Google, what’s your favorite food?” Assistant’s answer was swift: “I’m always hungry for knowledge,” it said. As the cherry on top, the written version that appeared as Assistant spoke had a fork and knife emoji at the end of the sentence.

Assistant can respond to so many different types of queries. Whether you’re curious about the biggest mammal in the world or if your favorite ice cream shop is open, chances are Assistant can answer that for you. And the team that works on Assistant is constantly thinking about how to make its responses better, faster and more helpful than ever. To learn more, I spoke with Distinguished Scientist Françoise Beaufays, an engineer and researcher on Google’s speech team, for a primer on how Assistant understands voice queries and then delivers satisfying (and often charming) answers.

Françoise, what exactly do you do at Google?

I lead the speech recognition team at Google. My job is to build speech recognition systems for all the products at Google that are powered by voice. The work my team does allows Assistant to hear its users, try to understand what its users want and then take action. It also lets us write captions on YouTube videos and in Meet as people speak and allows users to dictate text messages to their friends and family. Speech recognition technology is behind all of those experiences.

Why is it so key for speech recognition to work as well as possible with Assistant?

Assistant is based on understanding what someone said and then taking action based on that understanding, so it's critical that the interaction is very smooth. You'll only choose to do something by voice instead of with your fingers if speaking gives you a real benefit. If you speak to a machine and you're not confident it can understand you quickly, the delight disappears.

So how does the machine understand what you're asking? How did it learn to recognize spoken words in the first place?

Everything in speech recognition is machine learning. Machine learning is a type of technology where an algorithm is used to help a “model” learn from data. The way we build a speech recognition system is not by writing rules like: If someone is speaking and makes a sound “k” that lasts 10 to 30 milliseconds and then a sound “a” that lasts 50 to 80 milliseconds, maybe the person is about to say “cat.” Machine learning is more intelligent than that. So, instead, we would present a bunch of audio snippets to the model and tell the model, here, someone said, “This cat is happy.” Here, someone said, “That dog is tired.” Progressively, the model will learn the difference. And it will also understand variations of the original snippets, like “This cat is tired” or “This dog is not happy,” no matter who says it.

The models we use nowadays in Assistant to do this are deep neural networks.

What’s a deep neural network?

It’s a kind of model inspired by how the human brain works. Your brain uses neurons to share information and cause the rest of your body to act. In artificial neural networks, the “neurons” are what we call computational units, or bits of code that communicate with each other. These computational units are grouped into layers. These layers can stack on top of each other to create more complex possibilities for understanding and action. You end up with these “neural networks” that can get big and involved — hence, deep neural networks.

For Assistant, a deep neural network can receive an input, like the audio of someone speaking, and process that information across a stack of layers to turn it into text. This is what we call “speech recognition.” Then, the text is processed by another stack of layers to parse it into pieces of information that help the Assistant understand what you need and help you by displaying a result or taking an action on your behalf. This is what we call “natural language processing.”
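
To make “stacked layers” concrete, here is a toy forward pass in plain Python. The weights are made-up numbers rather than anything trained, and real speech models are vastly larger and use specialized architectures. The point is only the shape of the computation: each unit takes a weighted sum of the previous layer’s outputs, passes it through a simple nonlinearity, and each layer’s output becomes the next layer’s input.

```python
def relu(x: float) -> float:
    """A common nonlinearity: keep positive values, clip negatives to zero."""
    return max(0.0, x)


def dense_layer(inputs: list[float], weights: list[list[float]],
                biases: list[float]) -> list[float]:
    """One layer: each unit computes a weighted sum of all inputs plus a bias."""
    return [
        relu(sum(w * x for w, x in zip(unit_weights, inputs)) + bias)
        for unit_weights, bias in zip(weights, biases)
    ]


def forward(inputs: list[float], layers) -> list[float]:
    """Feed the input through a stack of layers, one after another."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations


# Two stacked layers with illustrative, untrained weights:
# 3 inputs -> 2 hidden units -> 1 output unit.
LAYERS = [
    ([[0.5, -1.0, 0.25], [1.0, 1.0, 1.0]], [0.0, -1.0]),
    ([[2.0, 0.5]], [0.1]),
]
```

With these weights, `forward([1.0, 0.5, 2.0], LAYERS)` returns a one-element list of about 2.35. Training is the process of adjusting numbers like these from examples until the outputs become useful.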

Got it. Let’s say I ask Assistant something pretty straightforward, like, “Hey Google, where's the closest dog park?” — how would Assistant understand what I'm saying and respond to my query?

The first step is for Assistant to process that “Hey Google” and realize, “Ah, it looks like this person is now speaking to me and wants something from me.”

Assistant picks up the rest of the audio, processes the question and gets text out of it. As it does that, it tries to understand what your sentence is about. What type of intention do you have?

To determine this, Assistant will parse the text of your question with another neural network that tries to identify the semantics, i.e. the meaning, of your question.

In this case, it will figure out that it's a question it needs to search for — it's not you asking to turn on your lights or anything like that. And since this is a location-based question, if your settings allow it, Assistant can send the geographic data of your device to Google Maps to return the results of which dog park is near you.

Then Assistant will sort its possible answers based on things like how sure it is that it understood you correctly and how relevant its various potential answers are. It will decide on the best answer, then provide it in the appropriate format for your device. It might be just a speaker, in which case it can give you spoken information. If you have a display in front of you, it could show you a map with walking directions.
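
The ranking and formatting steps can be sketched as scoring each candidate answer, picking the best one and then choosing a presentation that suits the device. Everything here is a hypothetical simplification. The `Candidate` fields, multiplying understanding confidence by relevance, and the two-way speaker/display split are illustrative assumptions, not how Assistant actually weighs its answers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A possible answer (hypothetical fields)."""
    answer: str
    understanding_confidence: float  # 0..1: how sure we are we understood the query
    relevance: float                 # 0..1: how well this answer fits that reading
    map_available: bool = False      # whether a map view exists for this answer


def score(candidate: Candidate) -> float:
    """Combine the two signals; a simple product is one possible choice."""
    return candidate.understanding_confidence * candidate.relevance


def best_answer(candidates: list[Candidate]) -> Candidate:
    """Pick the candidate with the highest combined score."""
    return max(candidates, key=score)


def present(candidate: Candidate, has_display: bool) -> str:
    """Choose a presentation that suits the device."""
    if has_display and candidate.map_available:
        return f"[map with walking directions] {candidate.answer}"
    return f"(spoken) {candidate.answer}"
```

With a made-up nearby-park answer scored 0.9 × 0.95 and an off-topic answer scored 0.3 × 0.6, the park answer wins. It is then spoken on a smart speaker or shown as a map on a phone.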

To make it a little more complicated: If I were to ask something a bit more ambiguous, like, “Hey Google, what is the most popular dog?” — how would it know if I meant dog breed, dog name or the most popular famous dog?

In the first example, Assistant has to understand that you’re looking for a location ("where is") and what you’re looking for ("a dog park"), so it makes sense to use Maps to help. In this case, Assistant would recognize it's a more open-ended question and call upon Search instead. What this really comes down to is identifying the best interpretation. One thing that is helpful is that Assistant can rank how satisfied previous users were with similar responses to similar questions — that can help it decide how certain it is of its interpretation. Ultimately, that question would go to Search, and the results would be proposed to you with whatever formatting is best for your device.

It’s also worth noting that there’s a group within the Assistant team that works on developing its personality, including by writing answers to common get-to-know-you questions like the one you posed about Assistant’s favorite food.

One other thing I’ve been wondering about is multi-language queries. If someone asks a question that has bits and bobs of different languages, how does Assistant understand them?

This is definitely more complicated. Roughly half of the world speaks more than one language. I’m a good example of this. I’m Belgian, and my husband is Italian. At home with my family, I speak Italian. But if I'm with just my kids, I may speak to them in French. At work, I speak English. I don't mind speaking English to my Assistant, even when I'm home. But I wouldn't speak to my husband in English because our language is Italian. Those are the kinds of conventions established in multilingual families.

The simplest way of tackling a case where the person speaks two languages is for Assistant to listen to a little bit of what they say and try to recognize which language they’re speaking. Assistant can do this using different models, each dedicated to understanding one specific language. Another way to do it is to train a model that can understand many languages at the same time. That’s the technology we're developing. In many cases, people switch from one language to the other within the same sentence. Having a single model that understands what those languages are is a great solution to that — it can pick up whatever comes to it.
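
The first approach Françoise describes is to identify the language and then hand the speech to a dedicated model. It can be sketched with text standing in for audio. The stopword lists, the overlap-count scoring and the stub “models” below are toy assumptions for illustration only. Real language identification works on acoustic signals and is learned from data.

```python
# Tiny, illustrative stopword lists; a real system would learn from data.
STOPWORDS = {
    "en": {"the", "is", "and", "what", "where"},
    "fr": {"le", "la", "est", "et", "où"},
    "it": {"il", "la", "è", "e", "dove"},
}


def identify_language(text: str) -> str:
    """Guess the language whose stopwords overlap most with the text."""
    tokens = set(text.lower().split())
    return max(STOPWORDS, key=lambda lang: len(tokens & STOPWORDS[lang]))


def recognize(text: str) -> str:
    """Route the whole utterance to a (stub) model dedicated to one language."""
    language = identify_language(text)
    # Stand-in for "send the audio to the model trained on this language".
    return f"[{language} model] {text}"
```

The weakness shows up with code-switching. "where is la gare" mixes English and French, yet the router sends the whole sentence to the English model. A single model that understands several languages at once avoids having to make that all-or-nothing choice.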


Ask a Techspert

We ask experts at Google to explain complicated topics, for the rest of us. 


Ask a Techspert: How does a building become “water positive”?

Our new Bay View campus is on track to be the largest development project in the world to achieve Water Petal certification from the Living Building Challenge, meaning it will meet the definition of being “net water positive.” That important sustainability effort moves us closer to our 2030 company goal of replenishing more water than we consume. But what exactly does “water positive” mean at Bay View? District Systems Water Lead Drew Wenzel dove into that question head first.

Let’s get right to it: What does “water positive” mean?

“Water positive” at Bay View means we will produce more non-potable water than we have demand for at the Bay View site.

Hmm, let’s back up: What’s non-potable water again?

There are a couple of types of water. There is potable water, which is suitable for human contact and consumption, and there is non-potable water, which is not drinking quality but can be used for other water demands like flushing toilets or irrigation.

Typically, buildings use potable water for everything. At Bay View, we have an opportunity to match the right quality of water with the right demand, and only use potable water when it's necessary. And by over-producing non-potable water, we can share such excess with surrounding areas that might otherwise rely on potable water for non-potable needs.

So basically, are we helping the water system save high-quality water?

Right. In California, we’re years into our most recent drought. We're only going to see increasing pressure on water resources. Regional and State water agencies are working hard to secure the potable water supply. We believe we can best support these public-sector efforts and increase potable water supply by using non-potable water where we can.

So how did we get to water positive for Bay View?

This may be surprising, but it actually all starts with the geothermal energy system. Typically, heat is removed from a building by evaporating a lot of water in cooling towers. At Bay View, instead of doing that, the geothermal system removes almost all heat by transferring it into the soil beneath the building. This system eliminates at least 90% of the water needed for cooling, or about 5 million gallons of water per year.
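
Reading the two figures together gives a rough sense of scale. This treats “at least 90%” and “about 5 million gallons” as exactly 90% and 5 million gallons, which is an assumption; since the savings could exceed 90%, the true baseline could be somewhat lower. Under that assumption, the implied cooling-tower demand B and the small remainder left after the geothermal system are:

```latex
B \approx \frac{5\ \text{million gal/yr}}{0.90} \approx 5.6\ \text{million gal/yr},
\qquad
B - 5\ \text{million gal/yr} \approx 0.6\ \text{million gal/yr}
```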

Ok, we start by reducing water use. What’s next?

After improving the water efficiency as much as we could through the geothermal system and other measures, we looked at the water resources we have on site. By collecting all of our stormwater and wastewater and treating it for reuse, we are able to meet all of our onsite non-potable demands.

How do we treat it?

Stormwater treatment starts with the retention ponds that collect rain. Water from these ponds is slowly drawn down and pumped to a central treatment plant, where it goes through several stages of filtration, treatment and disinfection.

All of the wastewater — from our cafes, restrooms and showers — is collected and sent to the central plant, where it undergoes two stages of filtration and treatment. From there, the water goes out to a field of reeds that naturally pull out nutrients, creating a perennial green landscape that supports local wildlife. Finally, the water is sent back to the plant again for final treatment and disinfection.

The final output from both stormwater and wastewater treatment processes is recycled water that meets State regulations for non-potable use.

A technical diagram shows stormwater and wastewater moving from an office building into a central treatment plant for filtration, treatment and disinfection. The wastewater is sent to a green landscape for additional treatment before returning to the plant. From the central plant, recycled water is returned to the building.

What happens if we don’t have enough non-potable water at any given time?

If our recycled water tank doesn’t have enough non-potable water to meet campus needs, it will fill up with potable water. At that point, everything in the tank would be considered non-potable because it has been mixed. Again, this isn’t the best use of high-quality potable water, and it’s something we’ll work to avoid.
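
The tank behavior can be sketched as a simple day-by-day balance. The function names, the one-step-per-day model and the example volumes are hypothetical. The sketch only encodes rules from the answers above: recycled water is used first, any shortfall is topped up with potable water that then counts as non-potable, and “water positive” means producing more non-potable water than the site needs.

```python
def run_tank(recycled_supply: list[float], demand: list[float],
             capacity: float) -> dict[str, float]:
    """Simulate a recycled-water tank one day at a time (hypothetical model).

    Each day the tank receives treated recycled water and spills anything above
    capacity as surplus that could be shared with neighbors. It then serves
    the day's non-potable demand. A shortfall is topped up with potable water;
    once mixed in, that water counts as non-potable, so the top-up is tallied
    as potable water spent on non-potable uses.
    """
    level = 0.0
    potable_top_up = 0.0
    surplus = 0.0
    for supply, need in zip(recycled_supply, demand):
        level += supply
        if level > capacity:
            surplus += level - capacity
            level = float(capacity)
        if level >= need:
            level -= need
        else:
            potable_top_up += need - level
            level = 0.0
    return {"potable_top_up": potable_top_up, "surplus": surplus, "final_level": level}


def is_water_positive(recycled_supply: list[float], demand: list[float]) -> bool:
    """Bay View's definition: produce more non-potable water than is needed."""
    return sum(recycled_supply) > sum(demand)
```

With supplies of 100, 20 and 100 units, demands of 50, 80 and 50, and a 60-unit tank, the site is water positive over the period. It still needs a 50-unit potable top-up on the dry second day, while 80 units of surplus spill over on the other days. That is why a top-up is something to work to avoid rather than something that can never happen.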

If we create extra non-potable water, how can we share it?

This is something we’re thinking through, and there are a few ways we could go about it. The easiest way would be to export it to adjacent properties to irrigate something like a baseball field. That would require a relatively minimal effort of adding pipes.

Another way would involve potentially working with a local municipality that has a recycled water system, creating additional redundancy and resiliency within that system.

Aside from sharing extra water, Bay View already helps by treating and reusing its own water instead of adding demands to municipal systems. That leaves capacity on the system for the rest of the community and allows water providers to focus their time and resources on other needs across the water system.

Thinking beyond Bay View, is “water positive” important in places that aren’t in a drought?

Definitely. There are many cities that have combined sewer and stormwater systems that can be overwhelmed by excess water from buildings or large storm events. That can cause back-ups and flooding into the streets. Water positive systems, like the one at Bay View, can help communities and developers avoid placing additional pressure on municipal water systems.

Can any development site be water positive?

While a system like Bay View’s might not make sense for every project, there are different scales and variations of onsite water capture, treatment and reuse that are valuable, even if a building doesn't get to official water positive status. Every little bit is going to help.

Ask a Techspert: What’s breaking my text conversations?

Not to brag, but I have a pretty excellent group chat with my friends. We use it to plan trips, to send happy birthdays and, obviously, to share lots and lots of GIFs. It’s the best — until it’s not. We don’t all have the same kind of phones; we’ve got both Android phones and iPhones in the mix. And sometimes, they don’t play well together. Enter “green bubble issues”: things like missing read receipts and typing indicators, low-res photos and videos, broken group chats…I could go on describing the various potential communication breakdowns, but you probably know what I’m talking about. Instead, I decided to ask Google’s Elmar Weber: What’s the problem with messaging between different phone platforms?

First, can you tell me what you do at Google?

I lead several engineering organizations including the team that builds Google’s Messages app, which is available on most Android phones today.

OK, then you’re the perfect person to talk to! So my first question: When did this start being a problem? I remember way back when I had my first Android phone, I would text iPhone friends…and it was fine.

Texting has been around for a long time. Basic SMS texting — which is what you’re talking about here — is 30 years old. SMS, which stands for Short Message Service, was originally limited to 160 characters. Back then you couldn’t do things like send photos or reactions or read receipts. In fact, mobile phones weren’t made for messaging, they were designed for making phone calls. To send a message, you had to press the number keys repeatedly to cycle through the letters and spell out each word. But people started using it a ton, and it sort of exploded. So this global messaging industry took off. MMS (Multimedia Messaging Service) was then introduced in the early 2000s, which let people send photos and videos for the first time. But that came with a lot of limitations too.
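
The 160-character limit comes from arithmetic on the message payload. A single SMS carries 140 bytes (1,120 bits), and the GSM 7-bit default alphabet packs 1,120 / 7 = 160 characters into that space. Messages that need characters outside that alphabet, such as most emoji and many non-Latin scripts, switch to 16-bit UCS-2, which fits only 70. Longer texts are split into concatenated segments, and each segment loses 6 bytes to a header, leaving 153 GSM characters or 67 UCS-2 units. The sketch below assumes the caller already knows which encoding applies. Detecting that automatically needs the full GSM character table, which is omitted here.

```python
import math

# Capacity of one SMS, in GSM 7-bit characters or 16-bit UCS-2 units.
SINGLE_LIMIT = {"gsm7": 160, "ucs2": 70}   # the whole 140-byte payload
SEGMENT_LIMIT = {"gsm7": 153, "ucs2": 67}  # 140 bytes minus a 6-byte concatenation header


def units(text: str, encoding: str) -> int:
    """Count the units a message occupies (simplified)."""
    if encoding == "ucs2":
        # Characters outside the Basic Multilingual Plane, such as most emoji,
        # take two 16-bit units (a surrogate pair).
        return len(text.encode("utf-16-le")) // 2
    # Assumes every character is in the GSM basic set; extension-table
    # characters such as '€' or '{' would actually count twice.
    return len(text)


def sms_segments(text: str, encoding: str) -> int:
    """How many SMS segments a message needs under the given encoding."""
    n = units(text, encoding)
    if n <= SINGLE_LIMIT[encoding]:
        return 1
    return math.ceil(n / SEGMENT_LIMIT[encoding])
```

For example, a 161-character plain-text message already needs two segments, and a single emoji pulls a whole message into the 70-unit encoding.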

Got it. Then the messaging apps all started building their own systems to support modern messaging features like emoji reactions and typing indicators, because SMS/MMS were created long before those things were even dreamed of?

Yes, exactly.

I guess…we need a new SMS?

Well, the new SMS is RCS, which stands for Rich Communication Services. It enables things like high-resolution photo and video sharing, read receipts, emoji reactions, better security and privacy with end-to-end encryption and more. Most major carriers support RCS, and Android users have been using it for years.

How long has RCS been around?

Version one of RCS was released December 15, 2008.

Who made it?

RCS isn’t a messaging app like Messages or WhatsApp — it’s an industry-wide standard. Similar to other technical standards (USB, 5G, email), it was developed by a group of different companies. In the case of RCS, it was coordinated by an association of global wireless operators, hardware chip makers and other industry players.

RCS makes messaging better, so if Android phones use this, then why are texts from iPhones still breaking? RCS sounds like an upgrade — so shouldn’t that fix everything?

There’s the hitch! So Android phones use RCS, and iPhones still don’t. iPhones still rely on SMS and MMS for conversations with Android users, which is why your group chats feel so outdated. Think of it like this: Imagine two groups of people who speak different languages. Within each group, everyone communicates fluently, but the two groups can’t understand each other. When they try to talk to one another, they have to act out what they're saying, as though they're playing charades. Now think of RCS as a magic translator that helps multiple groups speak fluently. But every group has to use the translator, and if one doesn’t, they’re each going to need to use motions again.
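
The charades analogy boils down to a lowest-common-denominator rule: a conversation can only use RCS features if every participant’s device supports RCS. The following is a simplified, hypothetical sketch of that rule. Real messaging apps also weigh carrier configuration, connectivity and other factors. The `Participant` type and the SMS-versus-MMS split (media or group conversations go over MMS) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    """Someone in the conversation, including the sender (hypothetical type)."""
    name: str
    supports_rcs: bool


def choose_transport(participants: list[Participant], has_media: bool) -> str:
    """Pick the richest protocol that every participant can use."""
    if all(p.supports_rcs for p in participants):
        return "RCS"  # typing indicators, read receipts, high-res media
    # A single non-RCS device pulls the whole conversation back to legacy messaging.
    if has_media or len(participants) > 2:
        return "MMS"
    return "SMS"
```

One iPhone in an otherwise all-Android group chat is enough to drop the whole chat to MMS. That is why the “translator” only works when every group uses it.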

Do you think iPhones will start using RCS too?

I hope so! It’s not just about things like the typing indicators, read receipts or emoji reactions — everyone should be able to pick up their phone and have a secure, modern messaging experience. Anyone who has a phone number should get that, and that’s been lost a little bit because we’re still finding ourselves using outdated messaging systems. But the good news is that RCS could bring that back and connect all smartphone users, and because so many different companies and carriers are working together on it, the future is bright.

Check out Android.com/GetTheMessage to learn why now is the time for Apple to fix texting.

Ask a Techspert: How do digital wallets work?

In recent months, you may have gone out to dinner only to realize you left your COVID vaccine card at home. Luckily, the host is OK with the photo of it on your phone. In this case, it’s acceptable to show someone a picture of a card, but for other things it isn’t — an image of your driver’s license or credit card certainly won’t work. So what makes digital versions of these items more legit than a photo? To better understand the digitization of what goes into our wallets and purses, I talked to product manager Dong Min Kim, who works on the brand new Google Wallet. Google Wallet, which will be coming soon in over 40 countries, is the new digital wallet for Android and Wear OS devices…but how does it work?

Let’s start with a basic question: What is a digital wallet?

A digital wallet is simply an application that holds digital versions of the physical items you carry around in your actual wallet or purse. We’ve seen this shift where something you physically carry around becomes part of your smartphone before, right?


Look at the camera: You used to carry around a separate item, a camera, to take photos. It was a unique device that did a specific thing. Then, thanks to improvements in computing power, hardware and image processing algorithms, engineers merged the function of the camera — taking photos — into mobile phones. So now, you don’t have to carry around both, if you don’t want to.

Ahhh yes, I am old enough to remember attending college gatherings with my digital camera and my flip phone.

Ha! So think about what else you carry around: your wallet and your keys.

So the big picture here is that digital wallets help us carry around less stuff?

That’s certainly something we’re thinking about, but it’s more about how we can make these experiences — the ones where you need to use a camera, or in our case, items from your wallet — better. For starters, there’s security: It's really hard for someone to take your phone and use your Google Wallet, or to take your card and add it to their own phone. Your financial institution will verify who you are before you can add a card to your phone, and you can set a screen lock so a stranger can’t access what’s on your device. And should you lose your device, you can remotely locate, lock or even wipe it from “Find My Device.”

What else can Google Wallet do that my physical wallet can’t?

If you saved your boarding pass for a flight to Google Wallet, it will notify you of delays and gate changes. When you head to a concert, you’ll receive a notification on your phone beforehand, reminding you of your saved tickets.

Wallet also works with other Google apps — for instance if you’re taking the bus to see a friend and look up directions in Google Maps, your transit card and balance will show up alongside the route. If you're running low on fare, you can tap and add more. We’ll also give you complete control over how items in your wallet are used to enable these experiences; for example, the personal information on your COVID vaccine pass is kept on your device and never shared without your permission, not even with Google.

Plus, even if you lose your credit or debit card and you’re waiting for the replacement to show up, you can still use that card with Google Wallet because of the virtual number attached to it.
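
The “virtual number” idea is usually called tokenization. The phone holds a stand-in number instead of the card’s real number, and the payment system maps it back to the real account when you pay. The sketch below is a heavily simplified, hypothetical token vault. The class and method names are invented for illustration, and real systems add cryptograms, expiry and fraud checks. It shows why a replacement card can sit behind the same token while the phone keeps working.

```python
import secrets


class TokenVault:
    """Toy token vault mapping device tokens to real card numbers (hypothetical)."""

    def __init__(self) -> None:
        self._token_to_card: dict[str, str] = {}

    def provision(self, card_number: str) -> str:
        """Issue a random 16-digit stand-in number; only this is stored on the phone."""
        token = "".join(secrets.choice("0123456789") for _ in range(16))
        self._token_to_card[token] = card_number
        return token

    def replace_card(self, token: str, new_card_number: str) -> None:
        """Point an existing token at a replacement card; the phone's token is unchanged."""
        self._token_to_card[token] = new_card_number

    def authorize(self, token: str) -> str:
        """Resolve a token back to the real card number at payment time."""
        return self._token_to_card[token]
```

Because the phone only ever presents the token, the lost physical card can be swapped out behind it without re-adding anything to the wallet.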

This might be taking a step backwards, but can I pay someone from my Google Wallet? As in can I send money from a debit card, or straight from my bank account?

That’s actually where the Google Pay app — which is available in markets like the U.S., India and Singapore — comes in. We’ll keep growing this app as a companion app where you can do more payments-focused things like send and receive money from friends or businesses, discover offers from your favorite retailers or manage your transactions.

OK, but can I pay with my Google Wallet?

Yes, you can still pay with the cards stored in your Google Wallet in stores where Google Pay is accepted; it’s simple and secure.

Use payment cards in Google Wallet in stores with Google Pay, got it — but how does everything else “get” into Wallet?

We've already partnered with hundreds of transit agencies, retailers, ticket providers, health agencies and airlines so they can create digital versions of their cards or tickets for Google Wallet. You can add a card or ticket directly to Wallet, or within the apps or sites of businesses we partner with, you’ll see an option to add it to Wallet. We’re working on adding more types of content for Wallet, too, like digital IDs, or office and hotel keys.

An image of the Google Wallet app open on a Pixel phone. The app is showing a Chase Freedom Unlimited credit card, a ticket for a flight from SFO to JFK, and a Walgreens cash reward pass. In the bottom right-hand corner, there is an “Add to Wallet” button.

Developers can make almost any item into a digital pass. They can use the templates we’ve created, like those for boarding passes and event tickets — or they can use a generic template if it’s something more unique and we don’t have a specific solution for it yet. This invitation to developers is part of what I think makes Google Wallet interesting; it’s very open.

What exactly do you mean by “open”?

Well, the Android platform is open — any Android developer can use and develop for Wallet. One thing that’s great about that is all these features and tools can be made available on less expensive phones, too, so it isn’t only people who can afford the most expensive, newest phones out there who can use Google Wallet. Even if a phone can’t use some features of Google Wallet, it’s possible for developers to use QR codes or barcodes for their content, which more devices can access.

So working with Google Wallet is easier for developers. Any ways you’re making things easier for users?

Plenty of them! In particular, we’re working on ways to make it easy to add objects directly from your phone too. For instance, today if you take a screenshot of your boarding pass or COVID vaccine card from an Android device, we’ll give you the option to add it directly to your Google Wallet!

Ask a Techspert: What’s that weird box next to my emoji?

A few months ago, I received a message from a friend that, I have to confess, made absolutely no sense. Rows of emoji followed by different boxes — like this □□□□□□ — appeared…so I sent back a simple “huh?” Apparently she’d sent me a string of emoji that were meant to tell me about her weekend and let’s just say that it was all lost in translation.

To find out exactly what caused our communication breakdown, I decided to ask emoji expert Jennifer Daniel.

Why did the emoji my friend typed to me show up as □□□□□□?

Oy boy. No bueno. Sounds like your friend was using some of the new emoji that were released this month. (Not to rub it in but they are so good!!! There’s a salute 🫡, a face holding back tears 🥹 and another face that’s melting 🫠!) Sadly, you’re not the only one who’s losing things in translation. For way too long, 96% of Android users couldn’t see emoji released the year they debuted.

And it isn't just an Android problem: Despite being one of the earliest platforms to include emoji, Gmail received its first emoji update since 2016 last year! (You read that right: Two-thousand-sixteen!) This often resulted in skin-toned and gendered emoji appearing broken.


A few examples of "broken" skin tone and gendered emoji.

What!? That’s crazy. Why?

Yeah, strong agree. Historically, emoji have been at the mercy of operating system updates. New OS? New emoji. If you didn’t update your device, it meant that when new emoji were released, they would display as those black boxes you saw, which are referred to as a “tofu.” It gets worse: What if your phone doesn’t offer OS updates? Well, you’d have to buy a newer phone. Maybe that’d be worth it so you can use the new finger heart emoji (🫰)???

Emoji are fundamental to digital communication. Meanwhile, there is a very real economic divide between people who can afford to get a new phone every year (or who can afford a fancy phone that generously updates the OS) and everyone else in the world. That is lame, absurd and I personally hate it. Now for the good news: Check your phone, I bet you can see the emoji from your friend’s email today.

Whaaaaat! You’re right. Why can I see them now but I couldn’t a few months ago?

Well, this year Google finally decoupled emoji updates from operating system updates. That means YOU get an emoji and YOU get an emoji and YOU get an emoji!

Examples of emoji

What does “decoupled” emoji updates mean?

It basically means emoji can be updated on your phone or your computer without you updating your operating system. As of this month, all apps that use Appcompat (a tool that enables Android apps to be compatible with several Android versions) will automatically get the latest and greatest emoji, so you can send and receive emoji even if you don’t have the newest phone. And this will work across Google: All 3,366 emoji will now appear in Gmail, on Chrome OS and lots of other places when people send them to you. Apps that make their own emoji rather than defaulting to the operating system’s may find themselves falling behind, since taking on the responsibility of maintaining and distributing emoji is a lot of work. This is why we're so thrilled to see Google rely on Noto Emoji so everyone can get the latest emoji quickly.
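
Tofu and “broken” emoji both come from the same lookup. A renderer asks the installed emoji font whether it has a glyph for a whole sequence. If it doesn’t, the renderer falls back to drawing the individual code points, and it draws a tofu box for anything it can’t draw at all. The sketch below models a font as a plain set of supported sequences, which is a hypothetical simplification of real font tables and text shaping. It shows why a skin-toned emoji can split in two, and why shipping a newer font fixes things without an OS update.

```python
TOFU = "\u25a1"  # WHITE SQUARE, standing in for the missing-glyph box
ZWJ = "\u200d"   # ZERO WIDTH JOINER, which glues emoji into sequences


def render(sequence: str, font: set[str]) -> list[str]:
    """Return the glyphs that would be drawn for one emoji sequence."""
    if sequence in font:
        return [sequence]  # the font has a single glyph for the whole sequence
    glyphs = []
    for ch in sequence:
        if ch == ZWJ:
            continue  # the joiner itself is invisible
        glyphs.append(ch if ch in font else TOFU)
    return glyphs


THUMBS_UP = "\U0001F44D"
MEDIUM_SKIN_TONE = "\U0001F3FD"
MELTING_FACE = "\U0001FAE0"

# An older font that knows the pieces but not the combination or the new emoji.
old_font = {THUMBS_UP, MEDIUM_SKIN_TONE}
# The same device after only the emoji font is updated.
new_font = old_font | {THUMBS_UP + MEDIUM_SKIN_TONE, MELTING_FACE}
```

With `old_font`, a medium-skin-tone thumbs up renders as two glyphs (a thumb plus a color swatch), and the melting face renders as tofu. Swapping in `new_font`, with no other change, draws both correctly. That is the essence of decoupling emoji updates from the operating system.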

Since you mentioned Gmail being an early emoji adopter, it makes me wonder…how old are emoji? Where do they come from?

A volunteer-based organization called the Unicode Consortium digitizes the world’s languages. They’re the reason why, when you send Hindi from one computer, the computer on the other end can render it in Hindi. In their mission to ensure different platforms and operating systems can work together, they standardize the underlying technology that Google, Apple, Twitter and others use to render their emoji fonts.

You see, emoji are a font. That’s right. A font. I know. They look like tiny pictures but they operate the same way any other letter of the alphabet does when it enters our digital realm.

Like the letter A (U+0041) or the letter अ (U+0905), each emoji is assigned a code point (for instance, 😤 is U+1F624) by the Unicode Consortium. (Some emoji contain multiple code points; I’m generalizing a bit! Don’t tell the Unicode Consortium.) Point being: Emoji are a font and, like fonts, some emoji on iPhones look different than they do on Pixel phones.
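To see code points in action, here is a minimal Python sketch using only the standard library (the `code_points` helper is a hypothetical name, not part of any Unicode or Google tool). Python strings are sequences of code points, so `ord()` returns each one directly. The last example, a thumbs-up with a medium skin tone modifier, is one case of an emoji built from more than one code point.

```python
def code_points(text: str) -> list[str]:
    """Return each Unicode code point in `text` in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


if __name__ == "__main__":
    print(code_points("A"))                     # ['U+0041']
    print(code_points("\u0905"))                # Devanagari letter a: ['U+0905']
    print(code_points("\U0001F624"))            # ['U+1F624']
    # Thumbs up + medium skin tone modifier: one visible emoji, two code points.
    print(code_points("\U0001F44D\U0001F3FD"))  # ['U+1F44D', 'U+1F3FD']
```

Whichever font a device uses, these code points (encoded as bytes) are what actually get sent; the font only decides how each one is drawn.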

A variety of the new emoji designs that are now visible across Google products including Gmail, Google Chat, YouTube Live Chat and Chrome OS.

So, the Unicode Consortium makes fonts?

No, they manage a universal character encoding set that written languages map to. Google's Noto project is a global font project that supports those existing scripts and languages. Google uses Noto Emoji and provides resources to ensure your emoji render on Android and in desktop environments, including metadata like iconography and shortcodes too! All Google chat products now support this.

We’re also working on ways for you to download or embed Noto Emoji into your website of choice via fonts.google.com. So, stay tuned.

Emoji are a font. Black boxes are tofus. The more you know! I guess I have one final question: Now that I can send (and see!) the melting face emoji, will it look identical no matter who I send it to?

Well, every emoji font has its own flavor. Some of these design variations are minor and you might not even notice them. With others, primarily the smilies, the details really matter: people are hardwired to read micro-expressions! The last thing anyone wants is an emoji you see as a smile and someone else sees as a downward smirk. It can ruin friendships! Fortunately, over the past three years designs have converged, so there’s less chance of being misunderstood.

Ask a Techspert: What does AI do when it doesn’t know?

As humans, we constantly learn from the world around us. We experience inputs that shape our knowledge — including the boundaries of both what we know and what we don’t know.

Many of today’s machines also learn by example. However, these machines are typically trained on datasets and information that don’t always include the rare or out-of-the-ordinary examples that inevitably come up in real-life scenarios. What is an algorithm to do when faced with the unknown?

I recently spoke with Abhijit Guha Roy, an engineer on the Health AI team, and Ian Kivlichan, an engineer on the Jigsaw team, to hear more about using AI in real-world scenarios and better understand the importance of training it to know when it doesn’t know.

Abhijit, tell me about your recent research in the dermatology space.

We’re applying deep learning to a number of areas in health, including in medical imaging where it can be used to aid in the identification of health conditions and diseases that might require treatment. In the dermatological field, we have shown that AI can be used to help identify possible skin issues and are in the process of advancing research and products, including DermAssist, that can support both clinicians and people like you and me.

In these real-world settings, the algorithm might come up against something it's never seen before. Rare conditions, while individually infrequent, might not be so rare in aggregate. These so-called “out-of-distribution” examples are a common problem for AI systems, which can perform less well when they’re exposed to things they haven’t seen in training.

Can you explain what “out-of-distribution” means for AI?

Most traditional machine learning examples of this problem involve fairly obvious changes. For example, if an algorithm that is trained to identify cats and dogs comes across a car, it can typically detect that the car, an “out-of-distribution” example, is an outlier. Building an AI system that can recognize the presence of something it hasn’t seen in training is called “out-of-distribution detection,” and it is an active and promising field of AI research.

Okay, let’s go back to how this applies to AI in medical settings.

Going back to our research in the dermatology space, the differences between skin conditions can be much more subtle than telling a car apart from a dog or a cat, or even telling a previously unseen “pick-up truck” apart from a “truck.” As such, the out-of-distribution detection task in medical AI demands even more of our focused attention.

This is where our latest research comes in. We trained our algorithm to recognize even the most subtle of outliers (a so-called “near-out-of-distribution” detection task). Then, instead of the model inaccurately guessing, it can take a safer course of action, like deferring to human experts.
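Here is a minimal Python sketch of the “know when you don't know” idea, assuming a toy cat-and-dog classifier that outputs raw scores (logits). It uses the maximum softmax probability as its confidence signal, a common baseline for out-of-distribution detection, and defers to a human below a threshold. The function names and the 0.8 threshold are hypothetical; this is not the method from the dermatology research, and a simple confidence score like this can still be overconfident on subtle near-out-of-distribution cases.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw classifier scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_defer(logits: list[float], labels: list[str], threshold: float = 0.8) -> str:
    """Return the top label, or defer when confidence falls below `threshold`.

    Confidence here is the maximum softmax probability: a simple baseline,
    used for illustration only (hypothetical threshold).
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "defer to human expert"
    return labels[best]


if __name__ == "__main__":
    labels = ["cat", "dog"]
    print(predict_or_defer([4.0, 0.5], labels))  # confident: cat
    print(predict_or_defer([1.1, 1.0], labels))  # unsure (say, a car photo): defer
```

In the first call the model puts about 97% of its probability on "cat"; in the second it is close to a coin flip, so the safer choice is to hand the case to a person.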

Ian, you’re working on another area where AI needs to know when it doesn’t know something. What’s that?

The field of content moderation. Our team at Jigsaw used AI to build a free tool called Perspective that scores comments according to how likely they are to be considered toxic by readers. Our AI algorithms help identify toxic language and online harassment at scale so that human content moderators can make better decisions for their online communities. A range of online platforms use Perspective more than 600 million times a day to reduce toxicity and the human time required to moderate content.

In the real world, online conversations — both the things people say and even the ways they say them — are continually changing. For example, two years ago, nobody would have understood the phrase “non-fungible token (NFT).” Our language is always evolving, which means a tool like Perspective doesn't just need to identify potentially toxic or harassing comments, it also needs to “know when it doesn’t know,” and then defer to human moderators when it encounters comments very different from anything it has encountered before.

In our recent research, we trained Perspective to identify comments it was uncertain about and flag them for separate human review. By prioritizing these comments, human moderators can correct more than 80% of the mistakes the AI might otherwise have made.
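One simple way to picture prioritized review, as a hedged sketch: treat a toxicity score near 0.5 as the model's least certain case and send the comments closest to that point to humans first, up to a fixed review budget. The `Comment` class, `uncertainty` and `review_queue` are hypothetical names, not part of the Perspective API, and the research may well measure uncertainty differently.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    text: str
    toxicity: float  # model's probability (0 to 1) that readers find it toxic


def uncertainty(score: float) -> float:
    """1.0 when the score is 0.5 (a coin flip), 0.0 when it is 0 or 1."""
    return 1.0 - 2.0 * abs(score - 0.5)


def review_queue(comments: list[Comment], budget: int) -> list[Comment]:
    """Pick the `budget` comments the model is least sure about for human review."""
    ranked = sorted(comments, key=lambda c: uncertainty(c.toxicity), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    comments = [
        Comment("Have a great day!", 0.02),
        Comment("That NFT take is sick", 0.48),
        Comment("Get lost, loser", 0.93),
        Comment("Nice try, genius", 0.61),
    ]
    for c in review_queue(comments, budget=2):
        print(c.text)
```

With a budget of two, the ambiguous comments go to moderators while the clear-cut ones are left to the model.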

What connects these two examples?

We have more in common with the dermatology problem than you'd expect at first glance — even though the problems we try to solve are so different.

Building AI that knows when it doesn’t know something means you can prevent certain errors that might have unintended consequences. In both cases, the safest course of action for the algorithm entails deferring to human experts rather than trying to make a decision that could lead to potentially negative effects downstream.

There are some fields where this isn’t as important and others where it’s critical. You might not care if an automated vegetable sorter incorrectly sorts a purple carrot after being trained on orange carrots, but you would definitely care if an algorithm didn’t know what to do about an abnormal shadow on an X-ray that a doctor might recognize as an unexpected cancer.

How is AI uncertainty related to AI safety?

Most of us are familiar with safety protocols in the workplace. In safety-critical industries like aviation or medicine, protocols like “safety checklists” are routine and very important in order to prevent harm to both the workers and the people they serve.

It’s important that we also think about safety protocols when it comes to machines and algorithms, especially when they are integrated into our daily workflow and aid in decision-making or triaging that can have a downstream impact.

Teaching algorithms to refrain from guessing in unfamiliar scenarios and to ask for help from human experts falls within these protocols, and is one of the ways we can reduce harm and build trust in our systems. This is something Google is committed to, as outlined in its AI Principles.

Ask a Techspert: What’s a subsea cable?

Whenever I try to picture the internet at work, I see little pixels of information moving through the air and above our heads in space, getting where they need to go thanks to 5G towers and satellites in the sky. But it’s a lot deeper than that — literally. Google Cloud’s Vijay Vusirikala recently talked with me about why the coolest part of the internet is really underwater. So today, we’re diving into one of the best-kept secrets in submarine life: There wouldn’t be an internet without the ocean.

First question: How does the internet get underwater?

We use something called a subsea cable that runs along the ocean floor and transmits bits of information.

What’s a subsea cable made of?

These cables are about the same diameter as the average garden hose, but on the inside they contain thin optical fibers. Those fibers are surrounded by several layers of protection, including two layers of ultra-high strength steel wires, water-blocking structures and a copper sheath. Why so much protection? Imagine the pressure they are under. These cables are laid directly on the sea bed and have tons of ocean water on top of them! They need to be super durable.

A true inside look at subsea cables: On the left, a piece of the Curie subsea cable showing the additional steel armoring for protection close to the beach landing. On the right, a cross-sectional view of a typical deep water subsea cable showing the optical fibers, copper sheath, and steel wires for protection.

Why are subsea cables important?

Subsea cables are faster, can carry higher traffic loads and are more cost-effective than satellite networks. Subsea cables are like a highway that has the right number of lanes to handle rush-hour traffic without getting bogged down in standstill jams. Subsea cables combine high bandwidths (upwards of 300 to 400 terabits of data per second) with low lag time. To put that into context, 300 to 400 terabits per second is roughly the same as 17.5 million people streaming high-quality videos at the same time!
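A back-of-the-envelope check of the streaming comparison, assuming the capacity is measured in terabits per second (the usual unit for cable capacity) and is split evenly across 17.5 million streams. The `per_stream_mbps` helper is a hypothetical name for illustration.

```python
def per_stream_mbps(capacity_tbps: float, streams: float) -> float:
    """Bandwidth each stream gets if a link's capacity is split evenly."""
    bits_per_second = capacity_tbps * 1e12  # terabits -> bits
    return bits_per_second / streams / 1e6  # bits -> megabits per stream


if __name__ == "__main__":
    streams = 17.5e6  # 17.5 million simultaneous viewers
    for capacity in (300, 350, 400):
        print(f"{capacity} Tbps -> {per_stream_mbps(capacity, streams):.1f} Mbps per stream")
```

That works out to roughly 17 to 23 megabits per second per viewer, in the ballpark of a single high-quality video stream.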

So when you send a customer an email, share a YouTube video with a family member or talk with a friend or coworker on Google Meet, these underwater cables are like the "tubes" that deliver those things to the recipient.

Plus, they help increase internet access in places that have had limited connectivity in the past, like countries in South America and Africa. This leads to job creation and economic growth in the places where they’re constructed.

How many subsea cables are there?

There are around 400 subsea cables criss-crossing the planet in total. Currently, Google invests in 19 of them — a mix of cables we build ourselves and projects we’re a part of, where we work together with telecommunications providers and other companies.

Video introducing Curie, a subsea cable.

Wow, 400! Does the world need more of them?

Yes! Telecommunications providers alongside technology companies are still building them around the world. At Google, we invest in subsea cables for a few reasons: One, our Google applications and Cloud services keep growing. This means more network demand from people and businesses in every country around the world. And more demand means building more cables and upgrading existing ones, which have less capacity than their modern counterparts.

Two, you cannot have a single point of failure when you're on a mission to connect the world’s information and make it universally accessible. Repairing a subsea cable that goes down can take weeks, so to guard against this we place multiple cables in each cross section. This gives us sufficient extra cable capacity so that services aren’t affected for people around the world.
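The reasoning behind placing multiple cables on a route can be sketched as a simple capacity check, assuming the goal is that peak demand still fits after losing any one cable (the worst case is losing the largest). The function name and all capacities below are hypothetical; real network planning also weighs routing, repair times and more.

```python
def survives_any_single_failure(capacities_tbps: list[float], peak_demand_tbps: float) -> bool:
    """True if peak demand still fits after losing the largest cable on a route."""
    if not capacities_tbps:
        return False  # no cables means no capacity at all
    worst_case = sum(capacities_tbps) - max(capacities_tbps)
    return worst_case >= peak_demand_tbps


if __name__ == "__main__":
    # Hypothetical route with three cables.
    route = [250.0, 200.0, 150.0]
    print(survives_any_single_failure(route, 300.0))  # True: 350 Tbps remain
    print(survives_any_single_failure(route, 400.0))  # False: not enough spare capacity
```

Adding a cable, or upgrading an existing one, raises the worst-case number, which is what lets services ride out the weeks a repair can take.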

What’s your favorite fact about subsea cables?

Three facts, if I may!

First, I love that we name many of our cables after pioneering women, like Curie for Marie Curie, which connects California to Chile, and Grace Hopper, which links the U.S., Spain and the U.K. Firmina, which links the U.S., Argentina, Brazil and Uruguay, is named after Brazil’s first novelist, Maria Firmina dos Reis.

Second, I’m proud that the cables are kind to their undersea homes. They’re environmentally friendly and are made of chemically inactive materials that don't harm the flora and fauna of the ocean, and they generally don’t move around much! We’re very careful about where we place them; we study each beach’s marine life conditions and we adjust our attachment timeline so we don’t disrupt a natural lifecycle process, like sea turtle nesting season. For the most part they’re stationary and don't disrupt the ocean floor or marine life. Our goal is to integrate into the underwater landscape, not bother it.

And lastly, my favorite fact is actually a myth: Most people think sharks regularly attack our subsea cables, but I’m aware of exactly one shark attack on a subsea cable that took place more than 15 years ago. Truly, the most common problems for our cables are caused by people doing things like fishing, trawling (which is when a fishing net is pulled through the water behind a boat) and anchor drags (when a ship drifts without holding power even though it has been anchored).