Misinformation can have dramatic consequences for people’s lives, making it harder to find reliable information on everything from elections to vaccinations, and the pandemic has only exacerbated the problem at a time when accurate information can save lives. To help fight the rise in misinformation, Full Fact, a nonprofit that provides tools and resources to fact checkers, turned to Google.org for help. Today, ahead of International Fact Checking Day, we’re sharing the impact of this work.
Every day, millions of claims, like where to vote and COVID-19 vaccination rates, are made across a multitude of platforms and media. It was becoming increasingly difficult for fact checkers to identify the most important claims to investigate.
“We’re not just fighting an epidemic; we’re fighting an infodemic. Fake news spreads faster and more easily than this virus and is just as dangerous.” Tedros Adhanom, Director General of the World Health Organization
Last year, Google.org provided Full Fact with $2 million and seven Googlers from the Google.org Fellowship, a pro-bono program that matches teams of Googlers with nonprofits for up to six months to work full-time on technical projects. The Fellows helped Full Fact build AI tools to help fact checkers detect claims made by key politicians, then group them by topic and match them with similar claims from across press, social networks and even radio using speech-to-text technology. Over the past year, Full Fact boosted the number of claims they could process by 1000x, detecting and clustering over 100,000 claims per day. That’s more than 36.5 million total claims per year!
The AI-powered tools empower fact checkers to be more efficient, so that they can spend more time actually checking and debunking claims rather than identifying which claims to check. Using a machine learning BERT-based model, the technology now works across four languages (English, French, Portuguese and Spanish). And Full Fact’s work has expanded to South Africa, Nigeria and Kenya with their partner Africa Check, and to Argentina with Chequeado. In total in 2020, Full Fact’s fact checks appeared 237 million times across the internet.
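To make the idea of grouping similar claims concrete, here is a minimal sketch in Python. It is not Full Fact’s method: the tools described above use a BERT-based model, while this example swaps in simple word-overlap (Jaccard) similarity so that it runs without any machine learning libraries. The function names, the sample claims and the 0.5 threshold are all assumptions made for illustration.

```python
import re


def tokens(text):
    """Lowercase a claim and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_claims(claims, threshold=0.5):
    """Greedily add each claim to the first group whose first claim is similar enough."""
    groups = []  # each group is a list of claims; the first item acts as its representative
    for claim in claims:
        for group in groups:
            if jaccard(tokens(claim), tokens(group[0])) >= threshold:
                group.append(claim)
                break
        else:
            # No existing group matched, so this claim starts a new group.
            groups.append([claim])
    return groups


# Hypothetical sample claims, invented for this example.
claims = [
    "Vaccination rates rose 10 percent last month",
    "Last month vaccination rates rose by 10 percent",
    "Polling stations close at 10pm on election day",
]
groups = group_claims(claims)
for group in groups:
    print(group)
```

The two vaccination claims land in one group and the polling claim in another. A learned model would go further and catch paraphrases that share few words, which plain word overlap cannot do.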
If you’re interested in learning more about how you can use Google to fact check and spot misinformation, check out some of our tips and tricks. Right now more than ever we need to empower citizens to find reliable authoritative information, and we're excited about the impact that Full Fact and its partners have had in making the internet a safer place for everyone.
Sixteen years ago, many of us held a printout of directions in one hand and the steering wheel in the other to get around, without information about the traffic along our route or details about when our favorite restaurant was open. Since then, we’ve been pushing the boundaries of what a map can do, propelled by the latest machine learning. This year, we’re on track to bring over 100 AI-powered improvements to Google Maps so you can get the most accurate, up-to-date information about the world, exactly when you need it. Here's a snapshot of how we're using AI to make Maps work better for you with a number of updates coming this year.
Navigate indoors with Live View
We all know that awkward moment when you're walking in the opposite direction of where you want to go — Live View uses AR cues to avoid just that. Live View is powered by a technology called global localization, which uses AI to scan tens of billions of Street View images to understand your orientation. Thanks to new advancements that help us understand the precise altitude and placement of objects inside a building, we’re now able to bring Live View to some of the trickiest-to-navigate places indoors: airports, transit stations and malls.
If you’re catching a plane or train, Live View can help you find the nearest elevator and escalators, your gate, platform, baggage claim, check-in counters, ticket office, restrooms, ATMs and more. Arrows and accompanying directions will point you the right way. And if you need to pick something up from the mall, use Live View to see what floor a store is on and how to get there so you can get in and out in a snap. Indoor Live View is live now on Android and iOS in a number of malls in Chicago, Long Island, Los Angeles, Newark, San Francisco, San Jose, and Seattle. It starts rolling out in the coming months in select airports, malls, and transit stations in Tokyo and Zurich, with more cities on the way.
Plan ahead with more information about weather and air quality
With the new weather layer, you can quickly see current and forecasted temperature and weather conditions in an area — so you’ll never get caught in the rain without an umbrella. And the new air quality layer shows you how healthy (or unhealthy) the air is — information that’s especially helpful if you have allergies or are in a smoggy or fire-prone area. Data from partners like The Weather Company, AirNow.gov and the Central Pollution Board power these layers that start rolling out on Android and iOS in the coming months. The weather layer will be available globally and the air quality layer will launch in Australia, India, and the U.S., with more countries to come.
Find more eco-friendly options to get around
With insights from the U.S. Department of Energy’s National Renewable Energy Lab, we’re building a new routing model that optimizes for lower fuel consumption based on factors like road incline and traffic congestion. This is all part of the commitment we made last September to help one billion people who use our products take action to reduce their environmental footprint. Soon, Google Maps will default to the route with the lowest carbon footprint when it has approximately the same ETA as the fastest route. In cases where the eco-friendly route could significantly increase your ETA, we’ll let you compare the relative CO2 impact between routes so you can choose. Always want the fastest route? That’s OK too — simply adjust your preferences in Settings. Eco-friendly routes launch in the U.S. on Android and iOS later this year, with a global expansion on the way.
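The default-route rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Maps routing model: the `Route` class, the `choose_default_route` function, the 5% ETA tolerance used to stand in for "approximately the same ETA", and the sample numbers are all hypothetical. Falling back to the fastest route when the greener one is much slower is also an assumption; the post only says that in that case you can compare the CO2 impact and choose.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    eta_minutes: float
    co2_kg: float


def choose_default_route(routes, eta_tolerance=0.05):
    """Default to the lowest-CO2 route if its ETA is close to the fastest ETA."""
    fastest = min(routes, key=lambda r: r.eta_minutes)
    greenest = min(routes, key=lambda r: r.co2_kg)
    # "Approximately the same ETA" is modelled as within eta_tolerance of the fastest (assumed).
    if greenest.eta_minutes <= fastest.eta_minutes * (1 + eta_tolerance):
        return greenest
    # Assumed fallback: keep the fastest route and leave the comparison to the driver.
    return fastest


# Hypothetical routes, invented for this example.
similar_eta = [Route("highway", 30, 4.0), Route("side streets", 31, 3.2)]
much_slower = [Route("highway", 30, 4.0), Route("scenic", 45, 2.5)]

print(choose_default_route(similar_eta).name)
print(choose_default_route(much_slower).name)
```

In the first case the greener route is only a minute slower, so it becomes the default; in the second the greener route adds fifteen minutes, so the fastest route stays the default.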
From Amsterdam to Jakarta, cities around the world have established low emission zones — areas that restrict polluting vehicles like certain diesel cars or cars with specific emissions stickers — to help keep the air clean. To support these efforts, we’re working on alerts to help drivers better understand when they’ll be navigating through one of these zones. You can quickly know if your vehicle is allowed in the area, choose an alternative mode of transportation, or take another route. Low emission zone alerts launch this June in Germany, the Netherlands, France, Spain, and the UK on Android and iOS, with more countries coming soon.
But we know that getting around sustainably goes beyond driving. So we’re making it easier to choose more sustainable options when you’re on the go. Soon you’ll get a comprehensive view of all routes and transportation modes available to your destination — you can compare how long it’ll take to get there by car, transit or bike without toggling between tabs. Using advanced machine learning models, Maps will automatically prioritize your preferred modes — and even boost modes that are popular in your city. For example, if you bike a lot, we’ll automatically show you more biking routes. And if you live in a city like New York, London, Tokyo, or Buenos Aires where taking the subway is popular, we’ll rank that mode higher. This rolls out globally in the coming months on Android and iOS.
Save time with curbside grocery pickup on Maps
Delivery and curbside pickup have grown in popularity during the pandemic — they’re convenient and minimize contact. To make this process easier, we’re bringing helpful shopping information to stores’ Business Profiles on Maps and Search, like delivery providers, pickup and delivery windows, fees, and order minimums. We’re rolling this out on mobile Search starting with Instacart and Albertsons Cos. stores in the U.S., with plans to expand to Maps and other partners.
This summer, we’re also teaming up with U.S. supermarket Fred Meyer, a division of The Kroger Co., on a pilot in select stores in Portland, Oregon to make grocery pickup easier. After you place an order for pickup on the store’s app, you can add it to Maps. We’ll send you a notification when it’s time to leave, and let you share your arrival time with the store. Your ETA is continuously updated, based on location and traffic. This helps the store prioritize your order so it’s ready as soon as you get there. Check in on the Google Maps app, and they’ll bring your order right out for a seamless, fast, no-contact pickup.
All of these updates are possible thanks to AI advancements that have transformed Google Maps into a map that can reflect the millions of changes made around the world every day — in the biggest cities and the smallest towns. Whether you’re getting around, exploring an area, or knocking out errands, let Google Maps help you find your way.
In our JournalismAI report, journalists around the world told researchers they are eager to collaborate and explore the benefits of AI, especially as it applies to newsgathering, production and distribution.
To facilitate their collaboration, the Google News Initiative and Polis – the journalism think tank at the London School of Economics and Political Science – are launching the JournalismAI Collab Challenges, an opportunity for three groups of five newsrooms from the Americas, Europe, the Middle East and Africa, and Asia Pacific to experiment together.
Each cohort – selected by Polis – will have six months to either cover global news stories using AI-powered storytelling techniques or to develop prototypes of new AI-based products and processes.
Participants will receive support from the JournalismAI team and partner institutions in each region: in the Americas, the challenge will be co-hosted with the Knight Lab at Northwestern University; in Europe, the Middle East and Africa, the challenge will be co-hosted with BBC News Labs and Clwstwr. JournalismAI’s partner in Asia Pacific will be announced later this year.
Newsrooms interested in participating in this free, year-long program must have made AI a strategic priority, must guarantee the participation of two staff members – one from editorial and one from the technical department – who can participate two to four hours a week, and must embrace collaboration with other publishers.
The outcome of their work – whose ownership will be shared among participants – will be presented at the second edition of the JournalismAI Festival in November.
Applications for the Americas challenge and the Europe, the Middle East and Africa challenge open today and close at 11:59 PM GMT on April 5. The challenge will open later this year in Asia Pacific.
When Nithya Sambasivan was finishing her undergraduate degree in engineering, she felt slightly unsatisfied. “I wanted to know, ‘how will the technology I build impact people?’” she says. Luckily, she would soon discover the field of Human Computer Interaction (HCI) and pursue her graduate degrees.
She completed her master’s and PhD in HCI focusing on technology design for low-income communities in India. “I worked with sex workers, slum communities, microentrepreneurs, fruit and vegetables sellers on the streetside...” she says. “I wanted to understand what their values, aspirations and struggles are, and how we can build with them in mind.”
Today, Nithya is the founder of the HCI group at the Google Research India lab and an HCI researcher at PAIR, a multidisciplinary team at Google that explores the human side of AI by doing fundamental research, building tools, creating design frameworks, and working with diverse communities. She recently sat down to answer some of our questions about her journey to researching responsible AI, fairness and championing historically underrepresented technology users.
How would you explain your job to someone who isn't in tech?
I’m a human-computer interaction (HCI) researcher, which means I study people to better understand how to build technology that works for them. There’s been a lot of focus in the research community on building AI systems and the possibility of positively impacting the lives of billions of people. I focus on human-centered, responsible AI, specifically looking for ways it can empower communities in the Global South, where over 80% of the world’s population lives. Today, my research outlines a road map for fairness research in India, calling for re-contextualizing datasets and models while empowering communities and enabling an entire fairness ecosystem.
What originally inspired your interest in technology?
I grew up in a middle class family, the younger of two daughters from the South of India. My parents have very progressive views about gender roles and independence, especially in a conservative society — this definitely influenced what and how I research; things like gender, caste and poverty. In school, I started off studying engineering, which is a conventional path in India. Then, I went on to focus on HCI and designing with my own and other under-represented communities around the world.
How do Google’s AI Principles inform your research? And how do you approach your research in general?
Context matters. A general theory of algorithmic fairness cannot be based on “Western” populations alone. My general approach is to research an important long-term, foundational problem. For example, our research on algorithmic fairness reframes the conversation on ethical AI away from focusing mainly on Western, meaning largely European or North American, perspectives. Another project revealed that AI developers have historically focused more on the model — or algorithm — instead of the data. Both deeply affect the eventual AI performance, so being so focused on only one aspect creates downstream problems. For example, data sets may fully miss sub-populations, so when they are deployed, they may have much higher error rates or be unusable. Or they could make outcomes worse for certain groups, by misidentifying them as suspects for crimes or erroneously denying them bank loans they should receive.
These insights not only enable AI systems to be better designed for under-represented communities; they also generate new considerations in the field of computing for humane and inclusive data collection, gender and social status representation, and privacy and safety needs of the most vulnerable. They are then incorporated into Google products that millions of people use, such as Safe Folder on Files Go, Google Go’s incognito mode, Neighbourly’s privacy, Stay Safer by Google Maps and Women in STEM videos.
What are some of the questions you’re seeking to answer with your work?
How do we challenge inherent “West”-centric assumptions about algorithmic fairness and tech norms, and make AI work better for people around the world?
For example, there’s an assumption that algorithmic biases can be fixed by adding more data from different groups. But in India, we've found that data can't always represent individuals or events for many different reasons like economics and access to devices. The data could come mostly from middle class Indian men, since they’re more likely to have internet access. This means algorithms will work well for them. Yet, over half the population — primarily women, rural and tribal communities — lack access to the internet and they’re left out. Caste, religion and other factors can also contribute to new biases for AI models.
How should aspiring AI thinkers and future technologists prepare for a career in this field?
It’s really important that Brown and Black people enter this field. We not only bring technical skills but also lived experiences and values that are so critical to the field of computing. Our communities are the most vulnerable to AI interventions, so it’s important we shape and build these systems. To members of this community: Never play small or let someone make you feel small. Involve yourself in the political, social and ecological aspects of the invisible, not on tech innovation alone. We can’t afford not to.
Almost a year and a half ago, we announced Google Research India, an AI Lab in Bangalore. Along with advancing fundamental research in AI, we sought to support nonprofits and universities to solve big challenges in the field of Public Health, Conservation, Agriculture and Education using AI.
In 2020, we announced that AI for Social Good would support six projects from NGOs and academic collaborations that apply AI to assist underserved communities that have not traditionally benefited from it. Google provided scientific and technical contributions for each project, as well as funding from Google Research and Google.org.
Today, we are pleased to provide an update on some of these projects, and highlight successes and challenges in AI for Social Good.
India accounts for 11 percent of global maternal mortality, and a woman in India dies in childbirth every fifteen minutes. However, almost 90 percent of maternal deaths are avoidable if women receive timely intervention. Access to timely, accurate health information is a significant challenge among women in rural areas and urban slums. ARMMAN runs mMitra, a free mobile voice call service that sends timely and targeted preventive care information to expectant and new mothers. Adherence to such public health programs is a big challenge, but timely intervention to retain participants helps improve maternal health outcomes. Researchers from Google Research and IIT Madras worked with ARMMAN to design an AI technology that could indicate which women were at risk of dropping out of the health information program. This early, targeted identification helps ARMMAN personalise interventions and retain these women, improving maternal health outcomes. Test results demonstrated that the AI technology reduced the risk of drop-offs by up to 32% for women at high risk of dropping out. The team is currently working towards scaling this to 300,000+ women in mMitra, and we are excited to continue to support ARMMAN as the project team increases the reach of this technology to 1M+ mothers and children in 2021. To support ARMMAN’s growing efforts, Google.org is committing another USD 530,000 to ARMMAN to scale the use of AI for social good to reach underserved women and children.
The importance of targeted interventions to improve health outcomes cannot be overstated. AI can play a critical role in advancing them; however, the lack of high-quality public health data is a significant challenge. Frequently, data collection depends on the labour and expertise of frontline health workers, and yet Khushibaby discovered various challenges in the field that inhibited the collection of the high-quality data required. Researchers from Singapore Management University and Google Research collaborated with Khushibaby to develop AI algorithms that predict drops in health workers’ data quality in a timely way, with over 90 percent accuracy. These predictions help Khushibaby assist health workers so that they can record high-quality data. The project team is currently planning to deploy and safely test this technology with 250+ healthcare workers who serve over 15,000 people.
India is home to some of the most biodiverse regions, where human settlements and wildlife co-exist in forests. However, interactions between local communities and wildlife can result in conflicts, leading to loss of crops, cattle, and even human life. Wildlife Conservation Trust needed help to proactively predict human-wildlife conflict to enable them to take timely steps to protect local communities, wildlife, and the forest. With technical and scientific contributions from Google Research and Singapore Management University, Wildlife Conservation Trust designed AI models that help predict human-wildlife conflict in Bramhapuri Forest Division in Tadoba, Maharashtra. In test results, these novel AI techniques achieved over 80 percent accuracy in predicting human-wildlife conflict in the Bramhapuri Forest Division. This work is currently being field-tested in Chandrapur district, Maharashtra, to ensure safe deployment.
Local Language Adoption
Six out of ten children globally do not achieve minimum proficiency levels in reading, despite attending school. Lack of access to reading content in one’s local language is a significant challenge in addressing this problem. Storyweaver, an open-licence driven organization, works towards bridging that gap by developing and curating story books in a multitude of local languages to help children learn new concepts and ideas and open up their imagination. Storyweaver needed help to enable access to creation tools in low-resource languages. Creation tools in low-resource languages suffer from very low accuracy, adding barriers to content creation. The team at AI4Bharat & IIT Madras, with support from Google, developed state-of-the-art Natural Language Understanding tools to develop open-language models for two low-resource languages (Konkani, Maithili), making story reading easier for 70,000+ children.
We are humbled to see the progress in the development and deployment of AI technologies for social good in a short period of time. We are confident in our development and support of a collaborative model that involves experts from Academia and NGOs, as well as contributions from Google, to advance AI for social good. Continuing our scientific, technical, and financial support of organizations working in this space, we are excited to announce an expanded follow-up program to initiate collaborative AI for Social Good projects in Asia Pacific and Sub-Saharan Africa.
We recognize that AI is not a magic wand to solve all the world’s challenges; it is, however, a powerful tool to help experts and social-impact organisations explore and address hard, unanswered questions.
Posted by Milind Tambe, Director of AI for Social Good, Google Research India, and Manish Gupta, Director, Google Research India
Posted by Jason Scott, Head of Startup Developer Ecosystem, USA & Saurabh Sharma, Head of Assistant Investments
In December 2020, we announced our inaugural Google for Startups Accelerator: Voice AI program, a 10-week digital accelerator designed to help North American voice technology startups to take their businesses to the next level. Today, we are proud to announce our cohort of 12 companies - collectively leveraging voice user interfaces to solve complex challenges across accessibility, education, and care:
tinychef is a voice-first Culinary AI™ platform that helps consumers in their kitchen from their dinner dilemma, to grocery planning, grocery shopping, and cooking their meals with interactive experiences on smart speakers.
Voicify’s SaaS platform allows brands and large enterprises to easily design, build, and deploy voice apps, chatbots, and other conversational experiences across voice assistants, chatbots, and social media platforms.
Vowel brings the best of productivity and communication platforms into a single, integrated meeting tool.
The program kicks off on Monday, March 15th and will focus on product design, technical infrastructure, customer acquisition, and leadership development - granting our founders access to an expansive network of mentors, senior executives, and industry leaders.
We are incredibly excited to support this group of entrepreneurs over the next three months, connecting them with the best of our people, products, and programming to advance their companies and solutions.
We look forward to augmenting the work of these 12 innovators and to showcasing their accomplishments on Thursday, May 20th at 12:30pm EST at our Google for Startups Accelerator: Voice AI Demo Day.
Dr. Marian Croak has spent decades working on groundbreaking technology, with over 200 patents in areas such as Voice over IP, which laid the foundation for the calls we all use to get things done and stay in touch during the pandemic. For the past six years she’s been a VP at Google working on everything from site reliability engineering to bringing public Wi-Fi to India’s railroads.
Now, she’s taking on a new project: making sure Google develops artificial intelligence responsibly and that it has a positive impact. To do this, Marian has created and will lead a new center of expertise on responsible AI within Google Research.
I sat down (virtually) with Marian to talk about her new role and her vision for responsible AI at Google. You can watch parts of our conversation in the video above, or read on for a few key points she discussed.
Technology should be designed with people in mind.
“My graduate studies were in both quantitative analysis and social psychology. I did my dissertation on looking at societal factors that influence inter-group bias as well as altruistic behavior. And so I’ve always approached engineering with that kind of mindset, looking at the impact of what we’re doing on users in general. [...] What I believe very, very strongly is that any technology that we’re designing should have a positive impact on society.”
Responsible AI research requires input from many different teams.
“I’m excited to be able to galvanize the brilliant talent that we have at Google working on this. We have to make sure we have the frameworks and the software and the best practices designed by the researchers and the applied engineers [...] so we can proudly say that our systems are behaving in responsible ways. The research that’s going on needs to inform that work, the work we’re doing with engineering better solutions, and it needs to be shared with the outside world as well. I am thrilled to support teams doing both pure research as well as applied research; both are valuable and absolutely necessary to ensure technology has a positive impact on the world.”
This area is new, and there are still growing pains.
“This field, the field of responsible AI and ethics, is new. Most institutions have only developed principles, and they’re very high-level, abstract principles, in the last five years. There’s a lot of dissension, a lot of conflict in terms of trying to standardize on normative definitions of these principles. Whose definition of fairness, or safety, are we going to use? There’s quite a lot of conflict right now within the field, and it can be polarizing at times. And what I’d like to do is have people have the conversation in a more diplomatic way, perhaps, than we’re having it now, so we can truly advance this field.”
Compromise can be tough, but the result is worth it.
“If you look at the work we did on VoIP, it required such a huge organizational and business shift in the company I was working for. We had to bring teams together that were very contentious — people who had domain expertise in the internet and could move in a fast and furious way, along with others who were very methodical and disciplined in their approach. Huge conflicts! But over time it settled, and we were able to really make a huge difference in terms of being able to scale VoIP in a way that allowed it to handle billions and billions of calls in a very robust and resilient way. So it was more than worth it.”
Nearly 75% of India’s population — which possesses the second highest number of internet users in the world — interacts with the web primarily using Indian languages, rather than English. Over the next five years, that number is expected to rise to 90%. In order to make Google Maps as accessible as possible to the next billion users, it must allow people to use it in their preferred language, enabling them to explore anywhere in the world.
However, the names of most Indian places of interest (POIs) in Google Maps are not generally available in the native scripts of the languages of India. These names are often in English and may be combined with acronyms based on the Latin script, as well as Indian language words and names. Addressing such mixed-language representations requires a transliteration system that maps characters from one script to another, based on the source and target languages, while accounting for the phonetic properties of the words as well.
For example, consider a user in Ahmedabad, Gujarat, who is looking for a nearby hospital, KD Hospital. They issue the search query, કેડી હોસ્પિટલ, in the native script of Gujarati, the 6th most widely spoken language in India. Here, કેડી (“kay-dee”) is the sounding out of the acronym KD, and હોસ્પિટલ is “hospital”. In this search, Google Maps knows to look for hospitals, but it doesn't understand that કેડી is KD, hence it finds another hospital, CIMS. As a consequence of the relative sparsity of names available in the Gujarati script for places of interest (POIs) in India, instead of their desired result, the user is shown a result that is further away.
To address this challenge, we have built an ensemble of learned models to transliterate names of Latin script POIs into 10 languages prominent in India: Hindi, Bangla, Marathi, Telugu, Tamil, Gujarati, Kannada, Malayalam, Punjabi, and Odia. Using this ensemble, we have added names in these languages to millions of POIs in India, increasing the coverage nearly twenty-fold in some languages. This will immediately benefit millions of existing Indian users who don't speak English, enabling them to find doctors, hospitals, grocery stores, banks, bus stops, train stations and other essential services in their own language.
Transliteration vs. Transcription vs. Translation
Our goal was to design a system that will transliterate from a reference Latin script name into the scripts and orthographies native to the above-mentioned languages. For example, the Devanagari script is the native script for both Hindi and Marathi (the language native to Nagpur, Maharashtra). Transliterating the Latin script names for NIT Garden and Chandramani Garden, both POIs in Nagpur, results in एनआईटी गार्डन and चंद्रमणी गार्डन, respectively, depending on the specific language’s orthography in that script.
It is important to note that the transliterated POI names are not translations. Transliteration is only concerned with writing the same words in a different script, much like an English language newspaper might choose to write the name Горбачёв from the Cyrillic script as “Gorbachev” for their readers who do not read the Cyrillic script. For example, the second word in both of the transliterated POI names above is still pronounced “garden”, and the second word of the Gujarati example earlier is still “hospital” — they remain the English words “garden” and “hospital”, just written in the other script. Indeed, common English words are frequently used in POI names in India, even when written in the native script. How the name is written in these scripts is largely driven by its pronunciation; so एनआईटी from the acronym NIT is pronounced “en-aye-tee”, not as the English word “nit”. Knowing that NIT is a common acronym from the region is one piece of evidence that can be used when deriving the correct transliteration.
Note also that, while we use the term transliteration, following convention in the NLP community for mapping directly between writing systems, romanization in South Asian languages regardless of the script is generally pronunciation-driven, and hence one could call these methods transcription rather than transliteration. The task remains, however, mapping between scripts, since pronunciation is only relatively coarsely captured in the Latin script for these languages, and there remain many script-specific correspondences that must be accounted for. This, coupled with the lack of standard spelling in the Latin script and the resulting variability, is what makes the task challenging.
We use an ensemble of models to automatically transliterate from the reference Latin script name (such as NIT Garden or Chandramani Garden) into the scripts and orthographies native to the above-mentioned languages. Candidate transliterations are derived from a pair of sequence-to-sequence (seq2seq) models. One is a finite-state model for general text transliteration, trained in a manner similar to models used by Gboard on-device for transliteration keyboards. The other is a neural long short-term memory (LSTM) model trained, in part, on the publicly released Dakshina dataset. This dataset contains Latin and native script data drawn from Wikipedia in 12 South Asian languages, including all but one of the languages mentioned above, and permits training and evaluation of various transliteration methods. Because the two models have such different characteristics, together they produce a greater variety of transliteration candidates.
To deal with the tricky phenomena of acronyms (such as the “NIT” and “KD” examples above), we developed a specialized transliteration module that generates additional candidate transliterations for these cases.
For each native language script, the ensemble makes use of specialized romanization dictionaries of varying provenance that are tailored for place names, proper names, or common words. Examples of such romanization dictionaries are found in the Dakshina dataset.
Scoring in the Ensemble
The ensemble combines scores for the possible transliterations in a weighted mixture, the parameters of which are tuned specifically for POI name accuracy using small targeted development sets for such names.
For each native script token in candidate transliterations, the ensemble also weights the result according to its frequency in a very large sample of on-line text. Additional candidate scoring is based on a deterministic romanization approach derived from the ISO 15919 romanization standard, which maps each native script token to a unique Latin script string. This string allows the ensemble to track certain key correspondences when compared to the original Latin script token being transliterated, even though the ISO-derived mapping itself does not always perfectly correspond to how the given native script word is typically written in the Latin script.
In aggregate, these many moving parts provide substantially higher quality transliterations than possible for any of the individual methods alone.
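As a rough illustration of the weighted mixture described above, the following Python sketch combines per-source scores for each candidate transliteration and picks the highest-scoring one. The source names, weights, and scores are hypothetical placeholders chosen for the example, not values from the actual system.

```python
from typing import Dict


def mix_scores(
    candidate_scores: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
) -> Dict[str, float]:
    """Combine per-source scores for each candidate as a weighted sum.

    candidate_scores maps candidate -> {source name -> score}. A source
    that did not score a candidate contributes 0 to that candidate.
    """
    return {
        cand: sum(weights[src] * scores.get(src, 0.0) for src in weights)
        for cand, scores in candidate_scores.items()
    }


def best_candidate(mixed: Dict[str, float]) -> str:
    """Return the candidate with the highest mixed score."""
    return max(mixed, key=mixed.get)


# Hypothetical scores for two candidate transliterations of "NIT".
scores = {
    "एनआईटी": {"fst": 0.2, "lstm": 0.3, "acronym": 0.9, "frequency": 0.6},
    "निट": {"fst": 0.7, "lstm": 0.6, "acronym": 0.0, "frequency": 0.1},
}
# Hypothetical mixture weights; in practice these would be tuned on a
# development set of POI names.
weights = {"fst": 0.25, "lstm": 0.25, "acronym": 0.3, "frequency": 0.2}

mixed = mix_scores(scores, weights)
winner = best_candidate(mixed)
```

With these made-up numbers the acronym evidence outweighs the two seq2seq models, so the spelled-out form is selected.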
The following table provides the per-language quality and coverage improvements due to the ensemble over existing automatic transliterations of POI names. The coverage improvement measures the increase in items for which an automatic transliteration has been made available. Quality improvement measures the ratio of updated transliterations that were judged to be improvements versus those that were judged to be inferior to existing automatic transliterations.
* Unknown / No Baseline.
As with any machine learned system, the resulting automatic transliterations may contain a few errors or infelicities, but the large increase in coverage in these widely spoken languages marks a substantial expansion of the accessibility of information within Google Maps in India. Future work will include using the ensemble for transliteration of other classes of entities within Maps and its extension to other languages and scripts, including Perso-Arabic scripts, which are also commonly used in the region.
Acknowledgments: This work was a collaboration between the authors and Jacob Farner, Jonathan Herbert, Anna Katanova, Andre Lebedev, Chris Miles, Brian Roark, Anurag Sharma, Kevin Wang, Andy Wildenberg, and many others.
Posted by Cibu Johny, Software Engineer, Google Research and Saumya Dalal, Product Manager, Google Geo
Posted by Jeff Dean, Senior Fellow and SVP of Google Research and Health, on behalf of the entire Google Research community
When I joined Google over 20 years ago, we were just figuring out how to really start on the journey of making a high quality and comprehensive search service for information on the web, using lots of curiously wired computers. Fast forward to today, and while we’re taking on a much broader array of technical challenges, it’s still with the same overarching goal of organizing the world's information and making it universally accessible and useful. In 2020, as the world has been reshaped by COVID-19, we saw the ways research-developed technologies could help billions of people better communicate, understand the world, and get things done. I’m proud of what we’ve accomplished, and excited about new possibilities on the horizon.
The goal of Google Research is to work on long-term, ambitious problems across a wide range of important topics — from predicting the spread of COVID-19, to designing algorithms, to learning to translate more and more languages automatically, to mitigating bias in ML models. In the spirit of our annual reviews for 2019, 2018, and more narrowly focused reviews of some work in 2017 and 2016, this post covers key Google Research highlights from this unusual year. This is a long post, but grouped into many different sections. Hopefully, there’s something interesting in here for everyone! For a more comprehensive look, please see our >750 research publications in 2020.
COVID-19 and Health
As the impact of COVID-19 took a tremendous toll on people’s lives, researchers and developers around the world rallied together to develop tools and technologies to help public health officials and policymakers understand and respond to the pandemic. Apple and Google partnered in 2020 to develop the Exposure Notifications System (ENS), a Bluetooth-enabled privacy-preserving technology that allows people to be notified if they have been exposed to others who have tested positive for COVID-19. ENS supplements traditional contact tracing efforts and has been deployed by public health authorities in more than 50 countries, states and regions to help curb the spread of infection.
In the early days of the pandemic, public health officials signalled their need for more comprehensive data to combat the virus’ rapid spread. Our Community Mobility Reports, which provide anonymized insights into movement trends, are helping researchers not only understand the impact of policies like stay-at-home directives and social distancing, but also conduct economic forecasting.
Community Mobility Reports: Navigate and download a report for regions of interest.
Our own researchers have also explored using this anonymized data to forecast COVID-19 spread using graph neural networks instead of traditional time series-based models.
Although the research community initially knew little about this disease and its secondary effects, we’re learning more every day. Our COVID-19 Search Trends symptoms dataset allows researchers to explore temporal or symptomatic associations, such as anosmia (the loss of smell that is sometimes a symptom of the virus). To further support the broader research community, we launched the Google Health Studies app to give the public ways to participate in research studies.
Our COVID-19 Search Trends are helping researchers study the link between the disease’s spread and symptom-related searches.
Teams across Google are contributing tools and resources to the broader scientific community, which is working to address the health and economic impacts of the virus.
Accurate information is critical in dealing with public health threats. We collaborated with many product teams at Google in order to improve information quality about COVID-19 in Google News and Search through supporting fact checking efforts, as well as similar efforts in YouTube.
To determine the aggressiveness of prostate cancers, pathologists examine a biopsy and assign it a Gleason grade. In published research, our system was able to grade with higher accuracy than a cohort of pathologists who have not had specialist training in prostate cancer. The first stage of the deep learning system assigns a Gleason grade to every region in a biopsy. In this biopsy, green indicates Gleason pattern 3, while yellow indicates Gleason pattern 4.
Our study examines how a deep learning model can quantify hemoglobin levels — a measure doctors use to detect anemia — from retinal images.
This year has also brought exciting demonstrations of how these same technologies can peer into the human genome. Google’s open-source tool, DeepVariant, identifies genomic variants in sequencing data using a convolutional neural network, and this year won the FDA Challenge for best accuracy in 3 out of 4 categories. Using this same tool, a study led by the Dana-Farber Cancer Institute improved diagnostic yield by 14% for genetic variants that lead to prostate cancer and melanoma in a cohort of 2,367 cancer patients.
Research doesn’t end at measurement of experimental accuracy. Ultimately, truly helping patients receive better care requires understanding how ML tools will affect people in the real world. This year we began work with Mayo Clinic to develop a machine learning system to assist in radiotherapy planning and to better understand how this technology could be deployed into clinical practice. With our partners in Thailand, we’ve used diabetic eye disease screening as a test case in how we can build systems with people at the center, and recognize the fundamental role of diversity, equity, and inclusion in building tools for a healthier world.
Weather, Environment and Climate Change
Machine learning can help us better understand the environment and make useful predictions to help people both in their everyday lives and in disaster situations. For weather and precipitation forecasting, computationally intensive physics-based models like NOAA’s HRRR have long reigned supreme. We have been able to show, though, that ML-based forecasting systems can predict current precipitation with much better spatial resolution (“Is it raining in my local park in Seattle?” and not just “Is it raining in Seattle?”), can produce short-term forecasts of up to eight hours that are considerably more accurate than HRRR, and can compute the forecast more quickly, yet with higher temporal and spatial resolution.
A visualization of predictions made over the course of roughly one day. Left: The 1-hour HRRR prediction made at the top of each hour, the limit to how often HRRR provides predictions. Center: The ground truth, i.e., what we are trying to predict. Right: The predictions made by our model. Our predictions are every 2 minutes (displayed here every 15 minutes) at roughly 10 times the spatial resolution made by HRRR. Notice that we capture the general motion and general shape of the storm.
Based on this work, we’re excited to partner with NOAA on using AI and ML to amplify NOAA’s environmental monitoring, weather forecasting and climate research using Google Cloud’s infrastructure.
Accessibility
Machine learning continues to provide amazing opportunities for improving accessibility, because it can learn to transfer one kind of sensory input into others. As one example, we released Lookout, an Android application that can help visually impaired users by identifying packaged foods, both in a grocery store and also in their kitchen cupboard at home. The machine learning system behind Lookout demonstrates that a powerful-but-compact machine learning model can accomplish this in real-time on a phone for nearly 2 million products.
Similarly, people who communicate with sign language find it difficult to use video conferencing systems because even if they are signing, they are not detected as actively speaking by audio-based speaker detection systems. Developing Real-Time, Automatic Sign Language Detection for Video Conferencing presents a real-time sign language detection model and demonstrates how it can be used to provide video conferencing systems with a mechanism to identify the person signing as the active speaker.
Applications of ML to Other Fields
Machine learning continues to prove vital in helping us make progress across many fields of science. In 2020, in collaboration with the FlyEM team at HHMI Janelia Research Campus, we released the drosophila hemibrain connectome, the largest synapse-resolution map of brain connectivity, reconstructed using large-scale machine learning models applied to high-resolution electron microscope imaging of brain tissue. This connectome information will aid neuroscientists in a wide variety of inquiries, helping us all better understand how brains function. Be sure to check out the very fly interactive 3-D UI!
The application of ML to problems in systems biology is also on the rise. Our Google Accelerated Science team, in collaboration with our colleagues at Calico, have been applying machine learning to yeast, to get a better understanding of how genes work together as a whole system. We’ve also been exploring how to use model-based reinforcement learning in order to design biological sequences like DNA or proteins that have desirable properties for medical or industrial uses. Model-based RL is used to improve sample efficiency. At each round of experimentation the policy is trained offline using a simulator fit on functional measurements from prior rounds. On various tasks like designing DNA transcription factor binding sites, designing antimicrobial proteins, and optimizing the energy of Ising models based on protein structures, we find that model-based RL is an attractive alternative to existing methods.
We’ve also seen success applying machine learning to core computer science and computer systems problems, a growing trend that is spawning entire new conferences like MLSys. In Learning-based Memory Allocation for C++ Server Workloads, a neural network-based language model predicts context-sensitive per-allocation site object lifetime information, and then uses this to organize the heap so as to reduce fragmentation. It is able to reduce fragmentation by up to 78% while only using huge pages (which are better for TLB behavior). End-to-End, Transferable Deep RL for Graph Optimization described an end-to-end transferable deep reinforcement learning method for computational graph optimization that shows 33%-60% speedup on three graph optimization tasks compared to TensorFlow default optimization, with 15x faster convergence over prior computation graph optimization methods.
Overview of GO: An end-to-end graph policy network that combines graph embedding and sequential attention.
As described in Chip Design with Deep Reinforcement Learning, we have also been applying reinforcement learning to the problem of place-and-route in computer chip design. This is normally a very time-consuming, labor-intensive process, and is a major reason that going from an idea for a chip to actually having a fully designed and fabricated chip takes so long. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. The system is able to generate placements that usually outperform those of human chip design experts, and we have been using this system (running on TPUs) to do placement and layout for major portions of future generations of TPUs. Menger is a recent infrastructure we’ve built for large-scale distributed reinforcement learning that is yielding promising performance for difficult RL tasks such as chip design.
Macro placements of Ariane, an open-source RISC-V processor, as training progresses. On the left, the policy is being trained from scratch, and on the right, a pre-trained policy is being fine-tuned for this chip. Each rectangle represents an individual macro placement. Notice how the cavity that is occupied by non-macro logic cells that is discovered by the from-scratch policy is already present from the outset in the pre-trained policy’s placement.
The Model Cards work that was introduced in collaboration with the University of Toronto in 2019 has been growing in influence. Indeed, many well-known models like OpenAI’s GPT-2 and GPT-3, many of Google’s MediaPipe models and various Google Cloud APIs have all adopted Model Cards as a way of giving users of a machine learning model more information about the model’s development and the observed behavior of the model under different conditions. To make this easier for others to adopt for their own machine learning models, we also introduced the Model Card Toolkit for easier model transparency reporting. In order to increase transparency in ML development practices, we demonstrate the applicability of a range of best practices throughout the dataset development lifecycle, including data requirements specification and data acceptance testing.
Differential privacy is a way to formally quantify privacy protections and requires a rethinking of the most basic algorithms to operate in a way that they do not leak information about any particular individual. In particular, differential privacy can help in addressing memorization effects and information leakage of the kinds mentioned above. In 2020 there were a number of exciting developments, from more efficient ways of computing private empirical risk minimizers to private clustering methods with tight approximation guarantees and private sketching algorithms. We also open sourced the differential privacy libraries that lie at the core of our internal tools, taking extra care to protect against leakage caused by the floating point representation of real numbers. These are the exact same tools that we use to produce differentially private COVID-19 mobility reports that have been a valuable source of anonymous data for researchers and policymakers.
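To make the idea concrete, here is a minimal sketch of the textbook Laplace mechanism for a differentially private count. It is not the open-sourced library mentioned above, and the function and variable names are illustrative. Note that it samples noise with ordinary floating point arithmetic, which is exactly the kind of naive implementation that can leak information and that a hardened library takes care to avoid.

```python
import random
from typing import Callable, Iterable


def private_count(
    values: Iterable[int],
    predicate: Callable[[int], bool],
    epsilon: float,
    rng: random.Random,
) -> float:
    """Return a noisy count satisfying epsilon-DP in the idealized model.

    Adding or removing one individual changes a count by at most 1, so
    the sensitivity is 1 and the Laplace noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    scale = 1.0 / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is
    # Laplace-distributed with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Illustrative data: three of the four ages are 18 or over.
ages = [15, 22, 37, 41]
noisy = private_count(ages, lambda a: a >= 18, epsilon=1.0, rng=random.Random(0))
```

A smaller epsilon means a larger noise scale and therefore stronger privacy at the cost of accuracy.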
To help developers assess the privacy properties of their classification models we released an ML privacy testing library in TensorFlow. We hope this library will be the starting point of a robust privacy testing suite that can be used by any machine learning developer around the world.
Membership inference attack on models for CIFAR10. The x-axis is the test accuracy of the model, and y-axis is vulnerability score (lower means more private). Vulnerability grows while test accuracy remains the same — better generalization could prevent privacy leakage.
In addition to pushing the state of the art in developing private algorithms, I am excited about the advances we made in weaving privacy into the fabric of our products. One of the best examples is Chrome’s Privacy Sandbox, which changes the underpinnings of the advertising ecosystem and helps systematically protect individuals’ privacy. As part of the project, we proposed and evaluated a number of different APIs, including federated learning of cohorts (FLoC) for interest based targeting, and aggregate APIs for differentially private measurement.
Security for our users is also an area of considerable interest for us. In 2020, we continued to improve protections for Gmail users, by deploying a new ML-based document scanner that provides protection against malicious documents, which increased malicious office document detection by 10% on a daily basis. Thanks to its ability to generalize, this tool has been very effective at blocking some adversarial malware campaigns that elude other detection mechanisms and increased our detection rate by 150% in some cases.
On the account protection side, we released a fully open-source security key firmware to help advance the state of the art in the two-factor authentication space, staying focused on security keys as the best way to protect accounts against phishing.
Natural Language Understanding
Better understanding of language is an area where we saw considerable progress this year. Much of the work in this space from Google and elsewhere now relies on Transformers, a particular style of neural network model originally developed for language problems (but with a growing body of evidence that they are also useful for images, videos, speech, protein folding, and a wide variety of other domains).
One area of excitement is in dialog systems that can chat with a user about something of interest, often encompassing multiple turns of interaction. While successful work in this area to date has involved creating systems that are specialized around particular topics (e.g., Duplex), these systems cannot carry on general conversations. In pursuit of the general research goal of creating systems capable of much more open-ended dialog, in 2020 we described Meena, a learned conversational agent that aspirationally can chat about anything. Meena achieves high scores on a dialog system metric called SSA, which measures both sensibility and specificity of responses. We’ve seen that as we scale up the model size of Meena, it is able to achieve lower perplexity and, as shown in the paper, lower perplexity correlates extremely closely with improved SSA.
A chat between Meena (left) and a person (right).
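The two quantities discussed above can be written down in a few lines. The sketch below assumes the usual definitions: perplexity as the exponential of the average negative log-likelihood per token, and SSA as the average of the fraction of responses rated sensible and the fraction rated specific. The function names and the toy ratings are illustrative only.

```python
import math
from typing import List, Tuple


def perplexity(token_log_probs: List[float]) -> float:
    """Exponential of the mean negative natural-log probability per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def ssa(labels: List[Tuple[bool, bool]]) -> float:
    """Average of the sensibleness rate and the specificity rate.

    Each label is a (sensible, specific) pair from a human rater. A
    response that is not sensible is assumed to be rated not specific.
    """
    sensibleness = sum(1 for s, _ in labels if s) / len(labels)
    specificity = sum(1 for _, p in labels if p) / len(labels)
    return (sensibleness + specificity) / 2


# A model that assigns probability 0.25 to every token has perplexity 4.
ppl = perplexity([math.log(0.25)] * 4)

# Toy ratings: 3 of 4 responses sensible, 2 of 4 specific.
score = ssa([(True, True), (True, False), (False, False), (True, True)])
```

Lower perplexity means the model is less "surprised" by held-out text, which is why it can be measured automatically, whereas SSA requires human ratings.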
One well-known issue with generative language models and dialog systems is that when discussing factual data, the model’s capacity may not be large enough to remember every specific detail about a topic, so they generate language that is plausible but incorrect. (This is not unique to machines — people can commit these errors too.) To address this in dialog systems, we are exploring ways to augment a conversational agent by giving it access to external information sources (e.g., a large corpus of documents or a search engine API), and developing learning techniques to use this as an additional resource in order to generate language that is consistent with the retrieved text. Work in this area includes integrating retrieval into language representation models (and a key underlying technology for this to work well is something like ScaNN, an efficient vector similarity search, to efficiently match the desired information to information in the corpus of text). Once appropriate content is found, it can be better understood with approaches like using neural networks to find answers in tables and extracting structured data from templatic documents. Our work on PEGASUS, a state-of-the-art model for abstractive text summarization can also help to create automatic summaries from any piece of text, a general technique useful in conversations, retrieval systems, and many other places.
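The retrieval step relies on finding the stored vectors most similar to a query vector. The sketch below shows the exact, brute-force version of that search by inner product. It is only meant to illustrate the problem that a library such as ScaNN solves approximately and far more efficiently at scale, and it does not use ScaNN's API. The tiny two-dimensional embeddings are made up.

```python
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k_by_inner_product(
    query: Sequence[float],
    corpus: Sequence[Sequence[float]],
    k: int,
) -> List[Tuple[int, float]]:
    """Return (index, score) for the k corpus vectors with largest inner product."""
    scored = [(i, dot(query, vec)) for i, vec in enumerate(corpus)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up document embeddings and query embedding.
corpus = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query = [1.0, 0.2]
hits = top_k_by_inner_product(query, corpus, k=2)
```

This scan costs time linear in the corpus size for every query, which is what makes approximate methods attractive for large document collections.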
Efficiency of NLP models has also been a significant focus for our work in 2020. Techniques like transfer learning and multi-task learning can dramatically help with making general NLP models usable for new tasks with modest amounts of computation. Work in this vein includes transfer learning explorations in T5, sparse activation of models (as in our GShard work mentioned below), and more efficient model pre-training with ELECTRA. Several threads of work also look to improve on the basic Transformer architecture, including Reformer, which uses locality-sensitive hashing and reversible computation to more efficiently support much larger attention windows, Performers, which use an approach for attention that scales linearly rather than quadratically (and discusses its use in the context of protein modeling), and ETC and BigBird, which utilize global and sparse random connections, to enable linear scaling for larger and structured sequences. We also explored techniques for creating very lightweight NLP models that are 100x smaller than a larger BERT model, but perform nearly as well for some tasks, making them very suitable for on-device NLP. In Encode, Tag and Realize, we also explored new approaches for generative text models that use edit operations rather than fully general text generation, which can have advantages in computation requirements for generation, more control over the generated text, and require less training data.
Language Translation
Effective language translation helps bring the world closer together by enabling us to all communicate, despite speaking different languages. To date, over a billion people around the world use Google Translate, and last year we added support for five new languages (Kinyarwanda, Odia, Tatar, Turkmen and Uyghur, collectively spoken by 75 million people). Translation quality continues to improve, showing an average +5 BLEU point gain across more than 100 languages from May 2019 to May 2020, through a wide variety of techniques like improved model architectures and training, better handling of noise in datasets, multilingual transfer and multi-task learning, and better use of monolingual data to improve low-resource languages (those without much written public content on the web), directly in line with our goals of improving the fairness of machine learning systems to provide benefits to the greatest number of people possible.
We strongly believe that continued scaling of multilingual translation models will bring further quality improvements, especially to the billions of speakers of low-resource languages around the world. In GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, Google researchers showed that training sparsely-activated multilingual translation models of up to 600 billion parameters leads to major improvements in translation quality for 100 languages, as measured by BLEU score improvement over a baseline of a separate 400M parameter monolingual model for each language. Three trends stood out in this work, illustrated by Figure 6 in the paper, reproduced below (see the paper for complete discussion):
The BLEU score improvements from multilingual training are high for all languages but are even higher for low-resource languages (right hand side of graph is higher than the left) whose speakers represent billions of people in some of the world’s most marginalized communities. Each rectangle on the figure represents languages with 1B speakers.
The larger and deeper the model, the larger the BLEU score improvements were across all languages (the lines hardly ever cross).
Large, sparse models also show a ~10x to 100x improvement in computational efficiency for model training over training a large, dense model, while simultaneously matching or significantly exceeding the BLEU scores of the large, dense model (computational efficiency discussed in paper).
We’re actively working on bringing the benefits demonstrated in this GShard research work to Google Translate, as well as training single models that cover 1000 languages, including languages like Dhivehi and Sudanese Arabic (while sharing some challenges that needed solving along the way).
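For readers unfamiliar with sparse activation, the following sketch shows the general idea of a mixture-of-experts layer with top-k gating: a gate scores every expert, but only the k highest-scoring experts are actually evaluated for a given input. This is a simplified illustration under assumed names (the gate logits are passed in directly, and experts are plain Python functions), not the GShard implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[List[float]], List[float]]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_moe(
    x: List[float],
    gate_logits: Sequence[float],
    experts: Sequence[Expert],
    k: int = 2,
) -> Tuple[List[float], List[int]]:
    """Evaluate only the top-k experts and mix their outputs.

    The gate probabilities of the chosen experts are renormalized so
    that the mixing weights sum to 1.
    """
    probs = softmax(gate_logits)
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    output = [0.0] * len(x)
    for i in chosen:
        weight = probs[i] / norm
        output = [o + weight * y for o, y in zip(output, experts[i](x))]
    return output, chosen


calls = {"unused": 0}


def double(v: List[float]) -> List[float]:
    return [2.0 * t for t in v]


def add_one(v: List[float]) -> List[float]:
    return [t + 1.0 for t in v]


def unused(v: List[float]) -> List[float]:
    calls["unused"] += 1  # counts evaluations of the low-scoring expert
    return list(v)


output, chosen = sparse_moe([1.0, 2.0], [2.0, 2.0, -5.0], [double, add_one, unused])
```

Because the third expert is never evaluated, compute per input stays roughly constant as more experts (and therefore more parameters) are added, which is the source of the efficiency gains described above.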
Machine Learning Algorithms
We continue to develop new machine learning algorithms and approaches for training that enable systems to learn more quickly and from less supervised data. By replaying intermediate results during training of neural networks, we find that we can fill idle time on ML accelerators and therefore can train neural networks faster. By changing the connectivity of neurons dynamically during training, we can find better solutions compared with statically-connected neural networks. We also developed SimCLR, a new self-supervised and semi-supervised learning technique that simultaneously maximizes agreement between differently transformed views of the same image and minimizes agreement between transformed views of different images. This approach significantly improves on the best self-supervised learning techniques.
ImageNet top-1 accuracy of linear classifiers trained on representations learned with different self-supervised methods (pretrained on ImageNet). Gray cross indicates supervised ResNet-50.
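The "maximize agreement between views of the same image, minimize agreement between views of different images" objective can be expressed as a contrastive loss over cosine similarities. The sketch below is a plain-Python version of a normalized, temperature-scaled cross-entropy loss of this kind. The embeddings are hand-written stand-ins for encoder outputs, and the temperature value is an arbitrary choice for the example.

```python
import math
from typing import List


def _normalize(v: List[float]) -> List[float]:
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def contrastive_loss(
    views_a: List[List[float]],
    views_b: List[List[float]],
    temperature: float = 0.5,
) -> float:
    """Contrastive loss over a batch of paired views.

    views_a[i] and views_b[i] are embeddings of two augmentations of the
    same image. Every other embedding in the batch acts as a negative.
    """
    z = [_normalize(v) for v in views_a + views_b]
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        sims = [
            sum(p * q for p, q in zip(z[i], z[j])) / temperature
            for j in range(2 * n)
        ]
        # The denominator excludes the similarity of an embedding with itself.
        log_denominator = math.log(
            sum(math.exp(sims[j]) for j in range(2 * n) if j != i)
        )
        total += log_denominator - sims[positive]
    return total / (2 * n)


anchors = [[1.0, 0.0], [0.0, 1.0]]
matching = [[0.9, 0.1], [0.1, 0.9]]    # each view resembles its own anchor
mismatched = [[0.1, 0.9], [0.9, 0.1]]  # each view resembles the other anchor

loss_matching = contrastive_loss(anchors, matching)
loss_mismatched = contrastive_loss(anchors, mismatched)
```

The loss is lower when paired views are close to each other and far from the rest of the batch, which is the behavior the training procedure rewards.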
Reinforcement Learning
Reinforcement learning (RL), which learns to make good long-term decisions from limited experience, has been an important focus area for us. An important challenge in RL is to learn to make decisions from few data points, and we’ve improved RL algorithm efficiency through learning from fixed datasets, learning from other agents, and improving exploration.
Overview of our method and illustration of data processing flow in AttentionAgent. Top: Input transformation — A sliding window segments an input image into smaller patches, and then “flattens” them for future processing. Middle: Patch election — The modified self-attention module holds votes between patches to generate a patch importance vector. Bottom: Action generation — AttentionAgent picks the patches of the highest importance, extracts corresponding features and makes decisions based on them.
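The three stages in the caption above can be sketched in a few functions. This is a simplified illustration, not the published agent: the image is a small grayscale grid, and the flattened patches are used directly as both queries and keys, whereas a real self-attention module would first apply learned projections. All names are illustrative.

```python
import math
from typing import List


def extract_patches(image: List[List[float]], size: int, stride: int) -> List[List[float]]:
    """Slide a size x size window over the image and flatten each patch."""
    height, width = len(image), len(image[0])
    patches = []
    for r in range(0, height - size + 1, stride):
        for c in range(0, width - size + 1, stride):
            patches.append(
                [image[r + dr][c + dc] for dr in range(size) for dc in range(size)]
            )
    return patches


def _softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def patch_importance(patches: List[List[float]]) -> List[float]:
    """Sum each column of the attention matrix: the votes a patch receives."""
    d = len(patches[0])
    attention = [
        _softmax([sum(a * b for a, b in zip(p, q)) / math.sqrt(d) for q in patches])
        for p in patches
    ]
    n = len(patches)
    return [sum(attention[i][j] for i in range(n)) for j in range(n)]


def top_k_patches(importance: List[float], k: int) -> List[int]:
    """Indices of the k patches with the highest importance."""
    return sorted(range(len(importance)), key=lambda j: importance[j], reverse=True)[:k]


# A 4x4 image that is dark except for a bright 2x2 block in the bottom-right.
image = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
patches = extract_patches(image, size=2, stride=2)
importance = patch_importance(patches)
selected = top_k_patches(importance, k=1)
```

In this toy case the bright patch receives the most votes and is the only one passed on to the decision-making stage.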
We shared open source tools for scaling up and productionizing RL. To expand the scope and problems tackled by users, we’ve introduced SEED, a massively parallel RL agent, released a library for measuring RL algorithm reliability, and released a new version of TF-Agents that includes distributed RL, TPU support, and a full set of bandit algorithms. In addition, we performed a large empirical study of RL algorithms to improve hyperparameter selection and algorithm design.
AutoML
Using learning algorithms to develop new machine learning techniques and solutions, or meta-learning, is a very active and exciting area of research. In much of our previous work in this area, we’ve created search spaces that look at how to find ways to combine sophisticated hand-designed components together in interesting ways. In AutoML-Zero: Evolving Code that Learns, we took a different approach, by giving an evolutionary algorithm a search space consisting of very primitive operations (like addition, subtraction, variable assignment, and matrix multiplication) in order to see if it was possible to evolve modern ML algorithms from scratch. The presence of useful learning algorithms in this space is incredibly sparse, so it is remarkable that the system was able to progressively evolve more and more sophisticated ML algorithms. As shown in the figure below, the system reinvents many of the most important ML discoveries over the past 30 years, such as linear models, gradient descent, rectified linear units, effective learning rate settings and weight initializations, and gradient normalization.
We also used meta-learning to discover a variety of new efficient architectures for object detection in both still images and videos. Last year’s work on EfficientNet for efficient image classification architectures showed significant accuracy improvements and computational cost reductions for image classification. In follow-on work this year, EfficientDet: Towards Scalable and Efficient Object Detection builds on top of the EfficientNet work to derive new efficient architectures for object detection and localization, showing remarkable improvements in both highest absolute accuracy, as well as computational cost reductions of 13-42x over previous approaches to achieve a given level of accuracy.
EfficientDet achieves state-of-the-art 52.2 mAP, up 1.5 points from the prior state of the art (not shown since it is at 3045B FLOPs) on COCO test-dev under the same setting. Under the same accuracy constraint, EfficientDet models are 4x-9x smaller and use 13x-42x less computation than previous detectors.
This approach can also be used to develop effective model architectures for time series forecasting. Using AutoML for Time Series Forecasting describes the system that discovers new forecasting models via an automated search over a search space involving many interesting kinds of low-level building blocks, and its effectiveness was demonstrated in the Kaggle M5 Forecasting Competition, by generating an algorithm and system that placed 138th out of 5558 participants (top 2.5%). While many of the competitive forecasting models required months of manual effort to create, our AutoML solution found the model in a short time with only a moderate compute cost (500 CPUs for 2 hours) and no human intervention.
As neural networks are made wider and deeper, they often train faster and generalize better. This is a core mystery in deep learning since classical learning theory suggests that large networks should overfit more. We are working to understand neural networks in this overparameterized regime. In the limit of infinite width, neural networks take on a surprisingly simple form, and are described by a Neural Network Gaussian Process (NNGP) or Neural Tangent Kernel (NTK). We studied this phenomenon theoretically and experimentally, and released Neural Tangents, an open-source software library written in JAX that allows researchers to build and train infinite-width neural networks.
Left: A schematic showing how deep neural networks induce simple input / output maps as they become infinitely wide. Right: As the width of a neural network increases, we see that the distribution of outputs over different random instantiations of the network becomes Gaussian.
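To make the infinite-width limit concrete, here is a minimal sketch in plain Python. It does not use the Neural Tangents API; it assumes a single hidden layer of ReLU units with standard Gaussian weights and no biases, and the function names are hypothetical. Under those assumptions, the average product of hidden activations for two inputs approaches a closed-form kernel (the degree-1 arc-cosine kernel) as the width grows.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def analytic_relu_kernel(x, y):
    """Infinite-width limit of E[relu(w.x) * relu(w.y)] for w ~ N(0, I)."""
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    cos_theta = sum(a * b for a, b in zip(x, y)) / (norm_x * norm_y)
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding error
    theta = math.acos(cos_theta)
    angular = math.sin(theta) + (math.pi - theta) * cos_theta
    return norm_x * norm_y * angular / (2.0 * math.pi)


def empirical_relu_kernel(x, y, width, rng):
    """Average of relu(w.x) * relu(w.y) over `width` random hidden units."""
    total = 0.0
    for _ in range(width):
        w = [rng.gauss(0.0, 1.0) for _ in x]  # one hidden unit's weights
        pre_x = sum(wi * xi for wi, xi in zip(w, x))
        pre_y = sum(wi * yi for wi, yi in zip(w, y))
        total += relu(pre_x) * relu(pre_y)
    return total / width


if __name__ == "__main__":
    rng = random.Random(0)
    x, y = [1.0, 0.0], [0.6, 0.8]
    for width in (10, 1000, 100000):
        print(width, empirical_relu_kernel(x, y, width, rng))
    print("limit", analytic_relu_kernel(x, y))
```

As the width increases, the empirical values should settle near the analytic limit, which is the sense in which a very wide random network behaves like a fixed kernel.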
Lastly, in real-world problems, one often needs to deal with significant label noise. For instance, in large scale learning scenarios, weakly labeled data is available in abundance with large label noise. We have developed new techniques for distilling effective supervision from severe label noise leading to state-of-the-art results. We have further analyzed the effects of training neural networks with random labels, and shown that it leads to alignment between network parameters and input data, enabling faster downstream training than initializing from scratch. We have also explored questions such as whether label smoothing or gradient clipping can mitigate label noise, leading to new insights for developing robust training techniques with noisy labels.
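For readers unfamiliar with the two techniques named above, the following sketch shows their textbook forms in plain Python. It is an illustration only, not the method of the papers referenced here; the function names and the uniform-smoothing choice are assumptions.

```python
import math


def smooth_labels(one_hot, epsilon):
    """Mix a one-hot target with the uniform distribution over classes."""
    num_classes = len(one_hot)
    return [(1.0 - epsilon) * p + epsilon / num_classes for p in one_hot]


def clip_by_global_norm(grad, max_norm):
    """Rescale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)  # already small enough, return a copy
    scale = max_norm / norm
    return [g * scale for g in grad]


if __name__ == "__main__":
    print(smooth_labels([0.0, 0.0, 1.0, 0.0], epsilon=0.1))
    print(clip_by_global_norm([3.0, 4.0], max_norm=1.0))
```

Label smoothing limits how confidently a model can fit any single (possibly wrong) label, while clipping bounds the influence of any single update, which is why both are natural candidates to study under label noise.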
We also continued our work in scalable graph mining and graph-based learning and hosted the Graph Mining & Learning at Scale Workshop at NeurIPS’20, which covered work on scalable graph algorithms including graph clustering, graph embedding, causal inference, and graph neural networks. As part of the workshop, we showed how to solve several fundamental graph problems faster, both in theory and practice, by augmenting standard synchronous computation frameworks like MapReduce with a distributed hash table similar to BigTable. Our extensive empirical study validates the practical relevance of the Adaptive Massively Parallel Computation (AMPC) model, inspired by our use of distributed hash tables in massively parallel algorithms for hierarchical clustering and connected components, and our theoretical results show how to solve many of these problems in a constant number of distributed rounds, greatly improving upon our previous results. We also achieved exponential speedup for computing PageRank and random walks. On the graph-based learning side, we presented Grale, our framework for designing graphs for use in machine learning. Furthermore, we presented our work on more scalable graph neural network models, where we show that PageRank can be used to greatly accelerate inference in GNNs.
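As background for the PageRank results mentioned above, here is a minimal single-machine sketch of the classic power-iteration formulation. It is the textbook baseline, not the distributed algorithm from this work; the function name, the damping value of 0.85 and the dictionary-based graph format are assumptions made for illustration.

```python
def pagerank(out_links, damping=0.85, iterations=50):
    """Power iteration on a graph given as {node: [nodes it links to]}.

    Every link target must also appear as a key of `out_links`.
    """
    nodes = sorted(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the same "teleport" share each round.
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            targets = out_links[v]
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                share = damping * rank[v] / n
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for node, score in sorted(pagerank(graph).items()):
        print(node, round(score, 4))
```

Each iteration here is one full pass over the edges, which is the cost that round-efficient distributed algorithms aim to reduce.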
Machine Perception
Perceiving the world around us (understanding, modeling and acting on visual, auditory and multimodal input) continues to be a research area with tremendous potential to be beneficial in our everyday lives.
In 2020, deep learning powered new approaches that bring 3D computer vision and computer graphics closer together. CvxNet, deep implicit functions for 3D shapes, neural voxel rendering and CoReNet are a few examples of this direction. Furthermore, our research on representing scenes as neural radiance fields (aka NeRF, see also this blog post) is a good example of how Google Research's academic collaborations stimulate rapid progress in the area of neural volume rendering.
In Learning to Factorize and Relight a City, a collaboration with UC Berkeley, we proposed a learning-based framework for disentangling outdoor scenes into temporally-varying illumination and permanent scene factors. This gives the ability to change lighting effects and scene geometry for any Street View panorama, or even turn it into a full-day timelapse video.
Our work on generative human shape and articulated pose models introduces a statistical, articulated 3D human shape modeling pipeline, within a fully trainable, modular, deep learning framework. Such models enable 3D human pose and shape reconstruction of people from a single photo to better understand the scene.
Samples of encoded and cover images for Distortion Agnostic Deep Watermarking. First row: Cover image with no embedded message. Second row: Encoded image from HiDDeN combined distortion model. Third row: Encoded images from our model. Fourth row: Normalized difference of the encoded image and cover image for the HiDDeN combined model. Fifth row: Normalized difference for our model.
Additional important themes in perceptual research included:
We continued to make strides to improve experiences and promote helpfulness on mobile devices through ML-based solutions. Our ability to run sophisticated natural language processing on-device, enabling more natural conversational features, continues to improve. In 2020, we expanded Call Screen and launched Hold for Me to allow users to save time when performing mundane tasks, and we also launched language-based actions and language navigability of our Recorder app to aid productivity.
We have used Google's Duplex technology to make calls to businesses and confirm things like temporary closures. This has enabled us to make 3 million updates to business information globally, which have been seen over 20 billion times on Maps and Search. We also used text-to-speech technology to make web pages easier to access, enabling Google Assistant to read them aloud in 42 languages.
We also continued to make meaningful improvements to imaging applications. We made it easier to capture precious moments on Pixel with innovative controls and new ways to relight, edit, enhance and relive them in Google Photos. For the Pixel camera, beginning with Pixel 4 and 4a, we added Live HDR+, which uses machine learning to approximate the vibrance, balanced exposure and appearance of HDR+ burst photography in real time in the viewfinder. We also created dual exposure controls, which allow the brightness of shadows and highlights in a scene to be adjusted independently, live in the viewfinder.
More recently, we introduced Portrait Light, a new post-capture feature for the Pixel Camera and Google Photos apps that adds a simulated directional light source to portraits. This feature is again powered by machine learning, having been trained on 70 different people, photographed one light at a time, in our pretty cool 331-LED Light Stage computational illumination system.
In the past year, Google researchers were excited to contribute to many new (and timely) ways of using Google products. Here are a few examples:
Robotics
In the area of robotics research, we’ve made tremendous progress in our ability to learn more and more complex, safe and robust robot behaviors with less and less data, using many of the RL techniques described earlier in the post.
Transporter Networks are a novel approach to learning how to represent robotic tasks as spatial displacements. Representing relations between objects and the robot end-effectors, as opposed to absolute positions in the environment, makes learning robust transformations of the workspace very efficient.
In Grounding Language in Play, we demonstrated how a robot can be taught to follow natural language instructions (in many languages!). This required a scalable approach to collecting paired data of natural language instructions and robot behaviors. One key insight is that this can be accomplished by asking robot operators to simply play with the robot, and label after-the-fact what instructions would have led to the robot accomplishing the same task.
One increased emphasis this year has been on safety: how do we deploy safe delivery drones in the real world? How do we explore the world in a way that always allows the robot to recover from its mistakes? How do we certify the stability of learned behaviors? This is a critical area of research on which we expect to see increased focus in the future.
Quantum Computing
Our Quantum AI team continued its work to establish practical uses of quantum computing. We ran experimental algorithms on our Sycamore processors to simulate systems relevant to chemistry and physics. These simulations are approaching a scale at which they can no longer be performed on classical computers, making good on Feynman’s original idea of using quantum computers as an efficient means to simulate systems in which quantum effects are important. We published new quantum algorithms, for instance to perform precise processor calibration, to show an advantage for quantum machine learning or to test quantum enhanced optimization. We also worked on programming models to make it easier to express quantum algorithms. We released qsim, an efficient simulation tool to develop and test quantum algorithms with up to 40 qubits on Google Cloud.
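A rough back-of-the-envelope calculation suggests why classical simulation becomes hard around this scale. The sketch below assumes a full state-vector simulation that stores one complex amplitude per basis state at 8 bytes each (two 32-bit floats); both the storage format and the function name are assumptions for illustration, not a description of how any particular simulator is configured.

```python
def state_vector_bytes(num_qubits, bytes_per_amplitude=8):
    """Memory needed to hold all 2**n complex amplitudes of an n-qubit state."""
    return (2 ** num_qubits) * bytes_per_amplitude


if __name__ == "__main__":
    for n in (20, 30, 40, 50):
        gib = state_vector_bytes(n) / 2 ** 30
        print(f"{n} qubits: {gib:,.3f} GiB")
```

Under these assumptions, 40 qubits already require 8 TiB of memory, and every additional qubit doubles that figure.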
We continued to follow our roadmap towards building a universal error-corrected quantum computer. Our next milestone is the demonstration that quantum error correction can work in practice. To achieve this, we will show that a larger grid of qubits can hold logical information exponentially longer than a smaller grid, even though individual components such as qubits, couplers or I/O devices have imperfections. We are also particularly excited that we now have our own cleanroom, which should significantly increase the speed and quality of our processor fabrication.
Finally, this was a huge year for Colab. Usage doubled, and we launched productivity features to help people do their work more efficiently, including improved Drive integration and access to the Colab VM via the terminal. And we launched Colab Pro to enable users to access faster GPUs, longer runtimes and more memory.
Open Datasets and Dataset Search
Open datasets with clear and measurable goals are often very helpful in driving forward the field of machine learning. To help the research community find interesting datasets, we continue to index a wide variety of open datasets sourced from many different organizations with Google Dataset Search. We also think it's important to create new datasets for the community to explore and to develop new techniques, while ensuring that we share open data responsibly. This year, in addition to open datasets to help address the COVID crisis, we released a number of open datasets across many different areas:
Google Compute Cluster Trace Data: in 2011, Google published a trace of 29 days of compute activity on one of our compute clusters, which has proven very useful for the computer systems community to explore job scheduling policies, better understand utilization in these clusters, etc. This year we published a larger and more extensive version of this data, covering eight of our compute clusters with much finer-grained information.
Announcing the Objectron Dataset: a collection of 15,000 short, object-centric video clips annotated with 3-D bounding boxes, capturing a larger set of common objects from different angles, as well as 4M annotated images collected from a geo-diverse sample (covering 10 countries across five continents).
Open Images V6 — Now Featuring Localized Narratives: in addition to the 9M images annotated with 36M image-level labels, 15.8M bounding boxes, 2.8M instance segmentations, and 391k visual relationships found in version 5, this new release adds localized narratives, a completely new form of multimodal annotations that consist of synchronized voice, text, and mouse traces over the objects being described. In Open Images V6, these localized narratives are available for 500k of its images. Additionally, in order to facilitate comparison to previous works, we also release localized narratives annotations for the full 123k images of the COCO dataset.
TyDi QA: A Multilingual Question Answering Benchmark discusses a new benchmark for measuring effectiveness of multilingual question answering (since many benchmarks in this area are English-only or otherwise monolingual, and we feel answering questions in any language is important).
Wiki-40B: Multilingual Language Model Dataset is a new multilingual language model benchmark that is composed of 40+ languages spanning several scripts and linguistic families. With around 40 billion characters, we hope this new resource will accelerate the research of multilingual modeling. We also released high-quality language models trained on this dataset, enabling easy comparison of different techniques with these baselines.
Meta-Dataset: A Dataset of Datasets for Few-Shot Learning introduces a dataset of datasets. One of the long-term goals in ML is to build systems that generalize not just from one example to another within the same task, but also across tasks, solving new problems with little or no training. This meta-dataset allows us to measure progress towards this ultimate goal.
Google Landmarks Dataset v2 - A Large-Scale Benchmark for Instance-Level Recognition and Retrieval introduces the Google Landmarks Dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval in the domain of human-made and natural landmarks. GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels. Its test set consists of 118k images with ground truth annotations for both the retrieval and recognition tasks.
Research Community Interaction
We are proud to enthusiastically support and participate in the broader research community. In 2020, Google researchers presented over 500 papers at leading research conferences, while also serving on program committees and organizing workshops, tutorials and numerous other activities aimed at collectively progressing the state of the art in the field. To learn more about our contributions to some of the larger research conferences this year, please see our blog posts for ICLR 2020, CVPR 2020, ACL 2020, ICML 2020, ECCV 2020 and NeurIPS 2020.
In 2020 we supported external research with $37M in funding, including $8.5M in COVID research, $8M in research inclusion and equity, and $2M in responsible AI research. In February, we announced the 2019 Google Faculty Research Award Recipients, funding research proposals from 150 faculty members throughout the world. Among this group, 27% self-identified as members of historically underrepresented groups within technology. We also announced a new Research Scholar Program to support early-career professors who are pursuing research in fields relevant to Google via unrestricted gifts. As we have for more than a decade, we selected a group of incredibly talented PhD student researchers to receive Google PhD Fellowships, which provide funding for graduate studies, as well as mentorship as they pursue their research, and opportunities to interact with other Google PhD Fellows.
In 2019, Google’s CS Research Mentorship Program (CSRMP) helped provide mentoring to 37 undergraduate students to introduce them to conducting computer science research. Based on the success of the program in 2019/2020, we’re excited to greatly expand this program in 2020/2021 and will have hundreds of Google researchers mentoring hundreds of undergraduate students in order to encourage more people from underrepresented backgrounds to pursue computer science research careers. Finally, in October we provided exploreCSR awards to 50 institutions around the world for the 2020 academic year. These awards fund faculty to host workshops for undergraduates from underrepresented groups in order to encourage them to pursue CS research.
Looking Forward to 2021 and Beyond
I’m excited about what’s to come, from our technical work on next-generation AI models, to the very human work of growing our community of researchers.
We’ll keep ensuring our research is done responsibly and has a positive impact, using our AI Principles as a guiding framework and applying particular scrutiny to topics that can have broad societal impact. This post covers just a few of the many papers on responsible AI that Google published in the past year. While pursuing our research, we’ll focus on:
Promoting research integrity: We’ll make sure Google keeps conducting a wide range of research in an appropriate manner, and provides comprehensive, scientific views on a variety of challenging, interesting topics.
Responsible AI development: Tackling tough topics will remain core to our work, and Google will continue creating new ML algorithms to make machine learning more efficient and accessible, developing approaches to combat unfair bias in language models, devising new techniques for ensuring privacy in learning systems, and much more. And importantly, beyond looking at AI development with a suitably critical eye, we’re eager to see what techniques we and others in the community can develop to mitigate risks and make sure new technologies have equitable, positive impacts on society.
Advancing diversity, equity, and inclusion: We care deeply that the people who are building influential products and computing systems better reflect the people using these products all around the world. Our efforts here are both within Google Research, as well as within the wider research and academic communities — we’ll be calling upon the academic and industry partners we work with to advance these efforts together. On a personal level, I am deeply committed to improving representation in computer science, having spent hundreds of hours working towards these goals over the last few years, as well as supporting universities like Berkeley, CMU, Cornell, Georgia Tech, Howard, UW, and numerous other organizations that work to advance inclusiveness. This is important to me, to Google, and to the broader computer science community.
Finally, looking ahead to the year, I’m particularly enthusiastic about the possibilities of building more general-purpose machine learning models that can handle a variety of modalities and that can automatically learn to accomplish new tasks with very few training examples. Advances in this area will empower people with dramatically more capable products, bringing better translation, speech recognition, language understanding and creative tools to billions of people all around the world. This kind of exploration and impact is what keeps us excited about our work!
Acknowledgements
Thanks to Martin Abadi, Marc Bellemare, Elie Bursztein, Zhifeng Chen, Ed Chi, Charina Chou, Katherine Chou, Eli Collins, Greg Corrado, Corinna Cortes, Tiffany Deng, Tulsee Doshi, Robin Dua, Kemal El Moujahid, Aleksandra Faust, Orhan Firat, Jen Gennai, Till Hennig, Ben Hutchinson, Alex Ingerman, Tomáš Ižo, Matthew Johnson, Been Kim, Sanjiv Kumar, Yul Kwon, Steve Langdon, James Laudon, Quoc Le, Yossi Matias, Brendan McMahan, Aranyak Mehta, Vahab Mirrokni, Meg Mitchell, Hartmut Neven, Mohammad Norouzi, Timothy Novikoff, Michael Piatek, Florence Poirel, David Salesin, Nithya Sambasivan, Navin Sarma, Tom Small, Jascha Sohl-Dickstein, Zak Stone, Rahul Sukthankar, Mukund Sundararajan, Andreas Terzis, Sergei Vassilvitskii, Vincent Vanhoucke, and Leslie Yeh and others for helpful feedback and for drafting portions of this post, and to the entire Research and Health communities at Google for everyone’s contributions towards this work.
When you’ve got your hands full, so you use your voice to ask your phone to play your favorite song, it can feel like magic. In reality, it’s a more complicated combination of engineering, design and natural language processing at work, making it easier for many of us to use our smartphones. But what happens when this voice technology isn’t available in our own language?
This is something Google India researcher Shachi Dave considers as part of her day-to-day work. While English is the most widely spoken language globally, it ranks third as the most widely spoken native language (behind Mandarin and Spanish)—just ahead of Hindi, Bengali and a number of other languages that are official in India. Home to more than one billion people and an impressive number of official languages—22, to be exact—India is at the cutting edge of Google’s language localization or L10n (10 represents the number of letters between ‘l’ and ‘n’) efforts.
Shachi, who is a founding member of the Google India Research team, works on natural language understanding, a field of artificial intelligence (AI) which builds computer algorithms to understand our everyday speech and language. Working with Google’s AI principles, she aims to ensure teams build our products to be socially beneficial and inclusive. Born and raised in India, Shachi graduated with a master’s degree in computer science from the University of Southern California. After working at a few U.S. startups, she joined Google over 12 years ago and returned to India to take on more research and leadership responsibilities. Since she joined the company, she has worked closely with teams in Mountain View, New York, Zurich and Tel Aviv. She also actively contributes towards improving diversity and inclusion at Google through mentoring fellow female software engineers.
How would you explain your job to someone who isn't in tech?
My job is to make sure computers can understand and interact with humans naturally, a field of computer science we call natural language processing (NLP). Our research has found that many Indian users tend to use a mix of English and their native language when interacting with our technology. That’s why understanding natural language is so important: it’s key to localization, our effort to provide our services in every language and culture, while making sure our technology is fun to use and natural-sounding along the way.
What are some of the biggest challenges you’re tackling in your work now?
The biggest challenge is that India is a multilingual country, with 22 official languages. I have seen friends, family and even strangers struggle with technology that doesn’t work for them in their language, even though it can work so well in other languages.
Let’s say one of our users is a shop owner and lives in a small village in the southern Indian state of Telangana. She goes online for the first time with her phone. But since she has never used a computer or smartphone before, using her voice is the most natural way for her to interact with her phone. While she knows some English, she is more comfortable speaking in her native language, Telugu. Our job is to make sure that she has a positive experience and does not have to struggle to get the information she needs. Perhaps she’s able to order more goods for her shop through the web, or maybe she decides to have her services listed online to grow her business.
So that’s part of my motivation to do my research, and that’s one of Google’s AI Principles, too—to make sure our technology is socially beneficial.
Speaking of the AI Principles, what other principles help inform your research?
Another one of Google’s AI Principles is avoiding creating or reinforcing unfair bias. AI systems are good at recognizing patterns within data. Given that most data that we feed into training an AI system is generated by humans, it tends to have human biases and prejudices. I look for systematic ways to remove these biases. This requires constant awareness: being aware of how people have different languages, backgrounds and financial statuses. Our society has people from the entire financial spectrum, from super rich to low-income, so what works on the most expensive phones might not work on lower-cost devices. Also, some of our users might not be able to read or write, so we need to provide some audio and visual tools for them to have a better internet experience.
What led you to this career and inspired you to join Google?
I took an Introduction to Artificial Intelligence course as an undergraduate, and it piqued my interest and curiosity. That ultimately led to research on machine translation at the Indian Institute of Technology Bombay and then an advanced degree at the University of Southern California. After that, I spent some time working at U.S. startups that were using NLP and machine learning.
But I wanted more. I wanted to be intellectually challenged, solving hard problems. Since Google had the computing power and reputation for solving problems at scale, it became one of my top choices for places to work.
Now you’ve been at Google for over 12 years. What are some of the most rewarding moments of your career?
Definitely when I saw the quality improvements I worked on go live on Google Search and Assistant, positively impacting millions of people. I remember I was able to help launch local features like getting the Assistant to play the songs people wanted to hear. Playing music upon request makes people happy, and it’s a feature that still works today.
Over the years, I have gone through difficult situations as someone from an underrepresented group. I was fortunate to have a great support network—women peers as well as allies—who helped me. I try to pay it forward by being a mentor for underrepresented groups both within and outside Google.
How should aspiring AI researchers prepare for a career in this field?
First, be a lifelong learner: The industry is moving at a fast pace. It’s important to carve out time to keep yourself well-read about the latest research in your field as well as related fields.
Second, know your motivation: When a problem is super challenging and super hard, you need to have that focus and belief that what you’re doing is going to contribute positively to our society.