Tag Archives: Indian languages

Progress from a year of AI for Social Good at Google Research India

Almost a year and a half ago, we announced Google Research India, an AI Lab in Bangalore. Along with advancing fundamental research in AI, we sought to support nonprofits and universities in solving big challenges in the fields of Public Health, Conservation, Agriculture and Education using AI.

In 2020, we announced that AI for Social Good would support six projects from NGOs and academic collaborations that apply AI to assist underserved communities that have not traditionally benefited from it. Google provided scientific and technical contributions for each project, as well as funding from Google Research and Google.org.

Today, we are pleased to provide an update on some of these projects, and highlight successes and challenges in AI for Social Good. 

Maternal Healthcare

India accounts for 11 percent of global maternal mortality, and a woman in India dies in childbirth every fifteen minutes. However, almost 90 percent of maternal deaths are avoidable if women receive timely intervention. Access to timely, accurate health information is a significant challenge among women in rural areas and urban slums. ARMMAN runs mMitra, a free mobile voice call service that sends timely and targeted preventive care information to expectant and new mothers. Adherence to such public health programs is a big challenge, and timely intervention to retain participants improves maternal health outcomes. Researchers from Google Research and IIT Madras worked with ARMMAN to design an AI technology that indicates which women are at risk of dropping out of the health information program. This early, targeted identification helps ARMMAN personalise interventions and retain these women. Test results demonstrated that the AI technology was able to bring down the risk of drop-offs by up to 32% for women at high risk of dropping out. The team is currently working towards scaling this to 300,000+ women in mMitra, and we are excited to continue to support ARMMAN as the project team increases the reach of this technology to 1M+ mothers and children in 2021. To support ARMMAN’s growing efforts, Google.org is committing another USD 530,000 to ARMMAN to scale the use of AI for social good to reach underserved women and children.

The importance of targeted interventions to improve health outcomes cannot be overstated. AI can play a critical role in advancing them, but the lack of high-quality public health data is a significant challenge. Data collection frequently depends on the labour and expertise of frontline health workers, yet Khushibaby discovered various challenges in the field that inhibited the collection of the high-quality data required. Researchers from Singapore Management University and Google Research collaborated with Khushibaby to develop AI algorithms that predict, with over 90 percent accuracy, a drop in the quality of the data a health worker records. These timely predictions help Khushibaby assist health workers so that they can record high-quality data. The project team is currently planning to deploy and safely test this technology with 250+ healthcare workers who serve over 15,000 people.

Wildlife Conservation

India is home to some of the most biodiverse regions, where human settlements and wildlife co-exist in forests. However, interactions between local communities and wildlife can result in conflicts, leading to loss of crops, cattle, and even human life. Wildlife Conservation Trust needed help to proactively predict human-wildlife conflict so that it could take timely steps to protect local communities, wildlife, and the forest. With technical and scientific contributions from Google Research and Singapore Management University, Wildlife Conservation Trust designed AI models that help predict human-wildlife conflict in Bramhapuri Forest Division in Tadoba, Maharashtra. In test results, these novel AI techniques predicted human-wildlife conflict in the division with over 80 percent accuracy. This work is currently being field-tested in Chandrapur district, Maharashtra, to ensure safe deployment.

Local Language Adoption

Six out of ten children globally do not achieve minimum proficiency levels in reading, despite attending school. Lack of access to reading content in one’s local language is a significant challenge in addressing this problem. Storyweaver, an open-licence driven organization, works towards bridging that gap by developing and curating story books in a multitude of local languages to help children learn new concepts and new ideas and open up their imagination. Storyweaver needed help to enable access to creation tools in low-resource languages, where such tools suffer from very low accuracy and so add barriers to content creation. The team at AI4Bharat & IIT Madras, with support from Google, developed state-of-the-art Natural Language Understanding tools to build open language models for two low-resource languages (Konkani and Maithili), making story reading easier for 70,000+ children.

We are humbled to see the progress in the development and deployment of AI technologies for social good in a short period of time. We are confident in our development and support of a collaborative model that involves experts from Academia and NGOs, as well as contributions from Google, to advance AI for social good. Continuing our scientific, technical, and financial support of organizations working in this space, we are excited to announce an expanded follow-up program to initiate collaborative AI for Social Good projects in Asia Pacific and Sub-Saharan Africa. 

We recognize that AI is not a magic wand to solve all the world’s challenges. It is, however, a powerful tool to help experts and social-impact organisations explore and address hard, unanswered questions.

Posted by Milind Tambe, Director of AI for Social Good, Google Research India, and Manish Gupta, Director, Google Research India


Improving Indian Language Transliterations in Google Maps

Nearly 75% of India’s population — which possesses the second highest number of internet users in the world — interacts with the web primarily using Indian languages, rather than English. Over the next five years, that number is expected to rise to 90%. In order to make Google Maps as accessible as possible to the next billion users, it must allow people to use it in their preferred language, enabling them to explore anywhere in the world.

However, the names of most Indian places of interest (POIs) in Google Maps are not generally available in the native scripts of the languages of India. These names are often in English and may be combined with acronyms based on the Latin script, as well as Indian language words and names. Addressing such mixed-language representations requires a transliteration system that maps characters from one script to another, based on the source and target languages, while accounting for the phonetic properties of the words as well.

For example, consider a user in Ahmedabad, Gujarat, who is looking for a nearby hospital, KD Hospital. They issue the search query, કેડી હોસ્પિટલ, in the native script of Gujarati, the 6th most widely spoken language in India. Here, કેડી (“kay-dee”) is the sounding out of the acronym KD, and હોસ્પિટલ is “hospital”. In this search, Google Maps knows to look for hospitals, but it doesn't understand that કેડી is KD, hence it finds another hospital, CIMS. As a consequence of the relative sparsity of names available in the Gujarati script for places of interest (POIs) in India, instead of their desired result, the user is shown a result that is further away.

To address this challenge, we have built an ensemble of learned models to transliterate names of Latin script POIs into 10 languages prominent in India: Hindi, Bangla, Marathi, Telugu, Tamil, Gujarati, Kannada, Malayalam, Punjabi, and Odia. Using this ensemble, we have added names in these languages to millions of POIs in India, increasing the coverage nearly twenty-fold in some languages. This will immediately benefit millions of existing Indian users who don't speak English, enabling them to find doctors, hospitals, grocery stores, banks, bus stops, train stations and other essential services in their own language.

Transliteration vs. Transcription vs. Translation

Our goal was to design a system that will transliterate from a reference Latin script name into the scripts and orthographies native to the above-mentioned languages. For example, the Devanagari script is the native script for both Hindi and Marathi (the language native to Nagpur, Maharashtra). Transliterating the Latin script names for NIT Garden and Chandramani Garden, both POIs in Nagpur, results in एनआईटी गार्डन and चंद्रमणी गार्डन, respectively, depending on the specific language’s orthography in that script.

It is important to note that the transliterated POI names are not translations. Transliteration is only concerned with writing the same words in a different script, much like an English language newspaper might choose to write the name Горбачёв from the Cyrillic script as “Gorbachev” for their readers who do not read the Cyrillic script. For example, the second word in both of the transliterated POI names above is still pronounced “garden”, and the second word of the Gujarati example earlier is still “hospital” — they remain the English words “garden” and “hospital”, just written in the other script. Indeed, common English words are frequently used in POI names in India, even when written in the native script. How the name is written in these scripts is largely driven by its pronunciation; so एनआईटी from the acronym NIT is pronounced “en-aye-tee”, not as the English word “nit”. Knowing that NIT is a common acronym from the region is one piece of evidence that can be used when deriving the correct transliteration.
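The distinction above, same words in a different script, can be made concrete with a small check. The following is a minimal sketch, not part of the system described in this post: it uses Python's standard `unicodedata` module to report which script a string is written in, taking the first word of each character's Unicode name as a rough script label (an assumption that holds for the examples used here). The function name `scripts_used` is hypothetical.

```python
import unicodedata
from collections import Counter


def scripts_used(text: str) -> Counter:
    """Count characters per script, using the first word of the Unicode name.

    Only letters (category L*) and combining marks (category M*) are counted,
    so spaces, digits and punctuation are ignored.
    """
    counts = Counter()
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "M"):
            name = unicodedata.name(ch, "UNKNOWN")
            counts[name.split()[0]] += 1
    return counts


# The same English word "garden", written in two scripts.
print(scripts_used("Garden"))      # Latin letters only
print(scripts_used("गार्डन"))      # Devanagari letters and signs only
print(scripts_used("હોસ્પિટલ"))    # "hospital" in the Gujarati script
```

The words stay "garden" and "hospital" throughout; only the script label reported for each string changes.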

Note also that, while we use the term transliteration, following convention in the NLP community for mapping directly between writing systems, romanization in South Asian languages regardless of the script is generally pronunciation-driven, and hence one could call these methods transcription rather than transliteration. The task remains, however, mapping between scripts, since pronunciation is only relatively coarsely captured in the Latin script for these languages, and there remain many script-specific correspondences that must be accounted for. This, coupled with the lack of standard spelling in the Latin script and the resulting variability, is what makes the task challenging.

Transliteration Ensemble

We use an ensemble of models to automatically transliterate from the reference Latin script name (such as NIT Garden or Chandramani Garden) into the scripts and orthographies native to the above-mentioned languages. Candidate transliterations are derived from a pair of sequence-to-sequence (seq2seq) models. One is a finite-state model for general text transliteration, trained in a manner similar to models used by Gboard on-device for transliteration keyboards. The other is a neural long short-term memory (LSTM) model trained, in part, on the publicly released Dakshina dataset. This dataset contains Latin and native script data drawn from Wikipedia in 12 South Asian languages, including all but one of the languages mentioned above, and permits training and evaluation of various transliteration methods. Because the two models have such different characteristics, together they produce a greater variety of transliteration candidates.

To deal with the tricky phenomena of acronyms (such as the “NIT” and “KD” examples above), we developed a specialized transliteration module that generates additional candidate transliterations for these cases.
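One simple way to picture such an acronym module is a lookup that spells out each Latin letter by its spoken name in the target script. The sketch below is an illustration under stated assumptions, not the module used in Maps: the letter-name tables are hand-written and deliberately partial (they cover only the letters needed for the "NIT" and "KD" examples in this post), the all-uppercase test for acronyms is a naive heuristic, and the names `LETTER_NAMES`, `looks_like_acronym` and `acronym_candidate` are hypothetical.

```python
from typing import Optional

# Partial, illustrative letter-name spellings keyed by language code.
# "hi" covers N, I, T (as in एनआईटी); "gu" covers K, D (as in કેડી).
LETTER_NAMES = {
    "hi": {"N": "एन", "I": "आई", "T": "टी"},
    "gu": {"K": "કે", "D": "ડી"},
}


def looks_like_acronym(token: str) -> bool:
    # Naive assumption for this sketch: short, all-uppercase ASCII tokens.
    return token.isascii() and token.isalpha() and token.isupper() and len(token) <= 5


def acronym_candidate(token: str, lang: str) -> Optional[str]:
    """Spell an acronym letter by letter in the target script.

    Returns None when the token does not look like an acronym or when a
    letter is missing from the (partial) table, so the caller can fall
    back to the general transliteration models.
    """
    if not looks_like_acronym(token):
        return None
    table = LETTER_NAMES.get(lang, {})
    if any(ch not in table for ch in token):
        return None
    return "".join(table[ch] for ch in token)


print(acronym_candidate("NIT", "hi"))     # spelled letter by letter
print(acronym_candidate("KD", "gu"))
print(acronym_candidate("Garden", "hi"))  # not an acronym, so None
```

A candidate produced this way would only be one more entry in the pool; whether it wins is left to the ensemble's scoring.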

For each native language script, the ensemble makes use of specialized romanization dictionaries of varying provenance that are tailored for place names, proper names, or common words. Examples of such romanization dictionaries are found in the Dakshina dataset.

Scoring in the Ensemble

The ensemble combines scores for the possible transliterations in a weighted mixture, the parameters of which are tuned specifically for POI name accuracy using small targeted development sets for such names.
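As a rough illustration of a weighted mixture tuned on a development set, the sketch below mixes two per-candidate scores with a single weight and picks the weight by grid search. Everything specific here is an assumption made for the example: the real ensemble combines more signals, and the scores, the tiny development set, and the names `mix`, `best_candidate`, `accuracy` and `tune_weight` are hypothetical.

```python
from typing import Dict, List, Tuple

# Each candidate maps to (finite_state_score, lstm_score); values are made up.
Candidates = Dict[str, Tuple[float, float]]
DevSet = List[Tuple[Candidates, str]]  # (candidates, correct transliteration)


def mix(scores: Tuple[float, float], weight: float) -> float:
    """Weighted mixture of the two model scores."""
    fst_score, lstm_score = scores
    return weight * fst_score + (1.0 - weight) * lstm_score


def best_candidate(candidates: Candidates, weight: float) -> str:
    return max(candidates, key=lambda c: mix(candidates[c], weight))


def accuracy(dev_set: DevSet, weight: float) -> float:
    hits = sum(best_candidate(cands, weight) == gold for cands, gold in dev_set)
    return hits / len(dev_set)


def tune_weight(dev_set: DevSet, steps: int = 10) -> float:
    """Grid search over the mixing weight; keeps the first best value."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda w: accuracy(dev_set, w))


# Toy development set with hypothetical scores.
dev = [
    ({"नित": (0.9, 0.2), "एनआईटी": (0.1, 0.8)}, "एनआईटी"),
    ({"गार्डन": (0.9, 0.3), "गर्डन": (0.1, 0.7)}, "गार्डन"),
]

w = tune_weight(dev)
print(w, accuracy(dev, w))
```

In this toy set each model is wrong on one item, so neither extreme weight gets both right; only an intermediate mixture does, which is the motivation for tuning the mixture rather than trusting a single model.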

For each native script token in candidate transliterations, the ensemble also weights the result according to its frequency in a very large sample of on-line text. Additional candidate scoring is based on a deterministic romanization approach derived from the ISO 15919 romanization standard, which maps each native script token to a unique Latin script string. This string allows the ensemble to track certain key correspondences when compared to the original Latin script token being transliterated, even though the ISO-derived mapping itself does not always perfectly correspond to how the given native script word is typically written in the Latin script.
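To show how a deterministic romanization can serve as a scoring signal even when it does not match the usual Latin spelling, here is a toy sketch. It is not an implementation of ISO 15919: the mapping covers only the six characters of गार्डन, the treatment of the inherent vowel is simplified, and the comparison via `difflib.SequenceMatcher` is a choice made for this example rather than the measure used by the ensemble. The names `toy_romanize`, `strip_diacritics` and `romanization_similarity` are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher

# Toy ISO-15919-style mapping for a handful of Devanagari characters.
CONSONANTS = {"ग": "g", "र": "r", "ड": "ḍ", "न": "n"}
VOWEL_SIGNS = {"\u093e": "ā"}   # DEVANAGARI VOWEL SIGN AA
VIRAMA = "\u094d"               # suppresses the inherent vowel


def toy_romanize(word: str) -> str:
    """Map each character deterministically; consonants carry an inherent 'a'
    unless followed by a vowel sign or a virama."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                out.append("a")
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        # The virama itself produces no output.
    return "".join(out)


def strip_diacritics(text: str) -> str:
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def romanization_similarity(native_token: str, source_latin: str) -> float:
    """Similarity in [0, 1] between the romanized candidate and the source."""
    romanized = strip_diacritics(toy_romanize(native_token)).lower()
    return SequenceMatcher(None, romanized, source_latin.lower()).ratio()


print(toy_romanize("गार्डन"))
print(romanization_similarity("गार्डन", "Garden"))
```

The romanized string differs from the everyday spelling "garden", yet the consonant skeleton lines up, which is the kind of key correspondence such a signal can track.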

In aggregate, these many moving parts provide substantially higher quality transliterations than possible for any of the individual methods alone.

Coverage

The following table provides the per-language quality and coverage improvements due to the ensemble over existing automatic transliterations of POI names. The coverage improvement measures the increase in items for which an automatic transliteration has been made available. Quality improvement measures the ratio of updated transliterations that were judged to be improvements versus those that were judged to be inferior to existing automatic transliterations.

Language     Coverage Improvement    Quality Improvement
Hindi        3.2x                    1.8x
Bengali      19x                     3.3x
Marathi      19x                     2.9x
Telugu       3.9x                    2.6x
Tamil        19x                     3.6x
Gujarati     19x                     2.5x
Kannada      24x                     2.3x
Malayalam    24x                     1.7x
Odia         960x                    *
Punjabi      24x                     *

* Unknown / No Baseline.
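For readers who want to see how figures of this kind could be computed, the following sketch reads the two metrics as simple ratios. That reading is an assumption based on the definitions given before the table, and the counts in the example are invented, not data from this work; the function names are hypothetical.

```python
def coverage_improvement(items_before: int, items_after: int) -> float:
    """Ratio of items with an automatic transliteration after vs. before."""
    if items_before <= 0:
        raise ValueError("no baseline coverage to compare against")
    return items_after / items_before


def quality_improvement(judged_better: int, judged_worse: int) -> float:
    """Ratio of updated transliterations judged better vs. judged worse."""
    if judged_worse <= 0:
        raise ValueError("no inferior judgments to form a ratio")
    return judged_better / judged_worse


# Invented counts, chosen only to show the arithmetic.
print(coverage_improvement(1_000, 19_000))   # 19.0, i.e. "19x"
print(quality_improvement(330, 100))         # about 3.3, i.e. "3.3x"
```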


Conclusion

As with any machine learned system, the resulting automatic transliterations may contain a few errors or infelicities, but the large increase in coverage in these widely spoken languages marks a substantial expansion of the accessibility of information within Google Maps in India. Future work will include using the ensemble for transliteration of other classes of entities within Maps and its extension to other languages and scripts, including Perso-Arabic scripts, which are also commonly used in the region.


Acknowledgments: This work was a collaboration between the authors and Jacob Farner, Jonathan Herbert, Anna Katanova, Andre Lebedev, Chris Miles, Brian Roark, Anurag Sharma, Kevin Wang, Andy Wildenberg, and many others.

Posted by Cibu Johny, Software Engineer, Google Research and Saumya Dalal, Product Manager, Google Geo


Supporting India’s startups to accelerate the country’s digital transformation

Over the last few years, improved connectivity and more affordable data have paved the way for India’s startup ecosystem to scale and solve for the needs of the country’s growing number of internet users. And now, in a matter of a few months, the pandemic has not only accelerated internet adoption, it has also expanded how people use the internet to get things done in their daily lives. All over the country, people are embracing new ways of doing things like virtual learning, making online payments and buying groceries online. 


In the last two years alone, 100 million new internet users have come online from rural India. Data shows that rural consumption now accounts for roughly 45 percent of overall mobile data usage in the country, and is primarily focused on online video. But many of these internet users continue to have trouble finding content to read or services they can use confidently, in their own language. And this significantly limits the value of the internet for them, particularly at a time like this when the internet is the lifeline of so many people.  


Teams at Google have been working over the years to solve this challenge in a number of ways. We’ve built new products and features that enable people to create, consume and communicate more effortlessly across more Indic languages, and through that, better serve not just the needs of over a billion people in India, but many more people around the world. 


And we’re also eager to support the wider ecosystem in India, particularly local startups innovating in this space. When we shared details of the India Digitization Fund in July this year, we identified enabling affordable access and information for every Indian in their own language, whether Hindi, Tamil, Malayalam, Gujarati or any other, as a key pillar for driving India’s digitization forward.


This is why we’re pleased to announce investments in leading Indian startups Glance Inmobi and VerSe Innovation, enabling them to further scale the availability of relevant and engaging content in different formats across various Indic languages. Glance Inmobi delivers visual, immersive and localized content experiences across products like Glance and Roposo, while VerSe Innovation serves vernacular content in 14 languages through platforms like the Dailyhunt and Josh apps.  


These investments underline our strong belief in partnering deeply with India’s innovative startups, and our commitment to working towards the shared goal of building a truly inclusive digital economy that will benefit everyone. 


Posted by Caesar Sengupta, VP, Google

Use Google to read and translate text—now on KaiOS

Google’s philosophy has always been to build for everyone -- to break down language barriers, make knowledge accessible, and enable people to communicate how they want and what they want, effortlessly. In India, our rich diversity of languages presents an exciting challenge especially in the context of millions of new users coming online every day. Nine out of ten of these new users are non-English speakers. While many would be fluent at speaking and understanding their native language, there are others who might struggle when it comes to reading and writing it.


Google Assistant has made it easy for users in India to find answers and get things done on their devices using their voice. Since its launch at Google for India in 2017, we’ve worked hard to bring more helpful features like integrated voice typing on KaiOS, voice-based language selection, and support for Indian languages to help first-time internet users overcome barriers to literacy and interact with technology and their devices more naturally. 


At Google I/O in 2019, we brought camera-based translation to Google Lens to help you understand information you find in the real world. With Lens, you can point your camera at text you see and translate it into more than 100 languages. Lens can even speak the words out loud in your preferred language. We brought these Lens capabilities to Google Go, too, so even those on the most affordable smartphones can access them.




Today we are extending this capability to the millions of Google Assistant users on KaiOS devices in India. From Assistant, they can click the camera icon and point their phone at real-world text (such as a product label, street sign, or document) to have it read back in their preferred language, translated, or defined. Just long press the center button from the home screen to get started with Assistant.

Within Google Assistant, KaiOS users can now use Google Lens to read, translate and define words in the real world


It is currently available for English and several Indian languages including Hindi, Bengali, Telugu, Marathi and Tamil, and will soon be available in Kannada and Gujarati. Users can simply press the right soft key once within Assistant to access and use this feature.


This is another step in our commitment to make language more accessible to everyone, and we hope this will enable millions of KaiOS users across the country to have a more seamless language experience.

Posted by Shriya Raghunathan, Product Manager Google Assistant, and Harsh Kharbanda, Product Manager Google Lens

Shopping on Google brings new features to connect more users to retailers

Last year we announced a new shopping experience on Search -- a frictionless way for Indian shoppers to discover new products on Google. We also enabled online merchants to list their products for free in the Merchant Center, along with auto feeds that made the process quick and easy.  

We’ve witnessed incredible momentum since then. Indian shoppers engage with this shopping experience more often, and for longer periods of time, compared to other markets, and there are now over 200 million offers available on Google Shopping. Not only that, the share of clicks on listings that direct to small and medium business websites has increased by 30 percent! We’re committed to helping small and medium sized businesses succeed in India and are excited to announce new tools to help them connect with shoppers, online and off.



Getting local stores online with Google My Business

Although online shopping in India continues to grow in popularity, about 96 percent of shopping still happens offline. Soon, any local retailer will be able to create an online store through Google My Business and connect with the millions of shoppers searching for their products online. When retailers post photos of their in-store products, those products will automatically be surfaced as product listings on Search and in the Google Shopping tab. We’re excited to welcome the 20,000 local businesses that are already on Google My Business in India into the Shopping experience when it launches early next year.

Shopping in Indian languages

The rate at which Indian language users are coming online cannot be overstated: 9 out of every 10 new users coming online are Indian language users, largely using their mobile phones and from tier 2 and 3 cities. And these users are searching online more than ever: at Google for India earlier this year, we announced that 20 percent of Search queries in India are in Hindi.

So we are glad to share that we are extending the power of Google Translate to the Shopping tab as well as the Shopping home page for Indian languages.



The Shopping experience in Hindi, Telugu and Gujarati

Over the next two to three years, approximately 500 million non-English speaking users will be online in India, and we hope that this step will enable them to more easily find products in their own language. And on the merchant side, it requires no extra effort -- the products that will be showcased to online shoppers in India will seamlessly be displayed in their preferred Indian language. This feature will also be available to shoppers in India early next year.

As we shared when we launched Shopping on Search in India last year, our endeavor is to enable India’s small and medium retailers to grow and thrive, and to open a world of new online experiences for Indian shoppers. With the integration of Google My Business and Google Translate, we are excited to bring the full power of Google to Shopping. 
Posted by Surojit Chatterjee, Vice President - Product Management, Google Shopping

Announcing the Google Search Conference 2018


Regional language Internet usage is growing at an unprecedented pace in India and generating great demand for local language content. While many publishers have focused on Hindi, there is now growing demand for other regional languages. We are delighted to share that we are bringing back the Google Search conference with an objective to help Indian language publishers make the most of the opportunity and bridge the Indian language content gap that exists on the web today.


We will be hosting conferences across 11 cities in India to help local language publishers and webmasters better understand how they can make their content easily accessible to the growing number of Internet users across India. In addition to Hindi, this year we will cover four other Indian languages: Tamil, Telugu, Marathi and Bengali.


Conference cities and dates -


  • Gurgaon on June 20 (Wednesday)
  • Pune on June 22 (Friday)
  • Indore on July 2 (Monday)
  • Patna on July 4 (Wednesday)
  • Lucknow on July 6 (Friday)
  • Hyderabad on July 16 (Monday)
  • Visakhapatnam on July 18 (Wednesday)
  • Kolkata on July 20 (Friday)
  • Coimbatore on July 30 (Monday)
  • Chennai on Aug 1 (Wednesday)
  • Bengaluru on Aug 3 (Friday) - this event is focused on women webmasters


At each day-long conference, members of the Google Webmaster Outreach team will talk about a host of topics, including ‘How Search Works’, ‘Tips for better visibility of Indian language websites in Google Search’, ‘Best practices for mobile-friendly websites’ and ‘Google’s Search quality guidelines’. We are also including a session from the Google AdSense team to explain their policies and share how to avoid mistakes while running AdSense on your sites.


Please note that this is an invite-only event and filling out the form does not guarantee your spot at the conference. Please register your interest here! Registration for each of the cities listed above closes 10 days before the listed date. Once selected, you will receive an invitation email from us confirming the venue details.



Posted by Vandana Bharvani, Director, Research & Outreach - Trust & Safety