The Stable channel is being updated to 96.0.4664.77 (Platform version: 14268.51.0) for most Chrome OS devices.
Google Chrome OS
For many concepts, there is no direct one-to-one translation from one language to another, and even when there is, such translations often carry different associations and connotations that are easily lost on a non-native speaker. In such cases, however, the meaning may be more obvious when grounded in visual examples. Take, for instance, the word "wedding". In English, one often associates a bride in a white dress and a groom in a tuxedo, but when translated into Hindi (शादी), a more appropriate association may be a bride wearing vibrant colors and a groom wearing a sherwani. What each person associates with the word may vary considerably, but if they are shown an image of the intended concept, the meaning becomes clearer.
|The word “wedding” in English and Hindi conveys different mental images. Images are taken from Wikipedia, credited to Psoni2402 (left) and David McCandless (right) with CC BY-SA 4.0 license.|
With current advances in neural machine translation and image recognition, it is possible to reduce this sort of ambiguity in translation by presenting a text paired with a supporting image. Prior research has made much progress in learning image–text joint representations for high-resource languages, such as English. These representation models strive to encode the image and text into vectors in a shared embedding space, such that the image and the text describing it are close to each other in that space. For example, ALIGN and CLIP have shown that training a dual-encoder model (i.e., one trained with two separate encoders) on image–text pairs using a contrastive learning loss works remarkably well when provided with ample training data.
Unfortunately, such image–text pair data does not exist at the same scale for the majority of languages. In fact, more than 90% of this type of web data belongs to the top-10 highly-resourced languages, such as English and Chinese, with much less data for under-resourced languages. To overcome this issue, one could either try to manually collect image–text pair data for under-resourced languages, which would be prohibitively difficult due to the scale of the undertaking, or one could seek to leverage pre-existing datasets (e.g., translation pairs) that could inform the necessary learned representations for multiple languages.
In “MURAL: Multimodal, Multitask Representations Across Languages”, presented at Findings of EMNLP 2021, we describe a representation model for image–text matching that uses multitask learning applied to image–text pairs in combination with translation pairs covering 100+ languages. This technology could allow users to express words that may not have a direct translation into a target language using images instead. For example, the word “valiha” refers to a type of tube zither played by the Malagasy people that lacks a direct translation into most languages, but could be easily described using images. Empirically, MURAL shows consistent improvements over state-of-the-art models and other competitive baselines across the board. Moreover, MURAL does remarkably well for the majority of the under-resourced languages on which it was tested. Additionally, we discover interesting linguistic correlations learned by MURAL representations.
The MURAL architecture is based on the structure of ALIGN, but employed in a multitask fashion. Whereas ALIGN uses a dual-encoder architecture to draw together representations of images and associated text descriptions, MURAL employs the dual-encoder structure for the same purpose while also extending it across languages by incorporating translation pairs. The dataset of image–text pairs is the same as that used for ALIGN, and the translation pairs are those used for LaBSE.
MURAL solves two contrastive learning tasks: 1) image–text matching and 2) text–text (bitext) matching, with both tasks sharing the text encoder module. The model learns associations between images and text from the image–text data, and learns the representations of hundreds of diverse languages from the translation pairs. The idea is that a shared encoder will transfer the image–text association learned from high-resource languages to under-resourced languages. We find that the best model employs an EfficientNet-B7 image encoder and a BERT-large text encoder, both trained from scratch. The learned representation can be used for downstream visual and vision-language tasks.
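To make the contrastive objective described above more concrete, here is a toy, self-contained sketch of an in-batch image–text matching loss (InfoNCE-style). The vectors, temperature value, and function names are illustrative assumptions, not MURAL's actual implementation:

```kotlin
import kotlin.math.exp
import kotlin.math.ln

// Toy in-batch contrastive loss for illustration only: for each image
// embedding, the matching caption in the batch is the positive and all
// other captions in the batch act as negatives.
fun dot(a: DoubleArray, b: DoubleArray): Double =
    a.indices.sumOf { a[it] * b[it] }

fun contrastiveLoss(
    imageEmb: List<DoubleArray>,
    textEmb: List<DoubleArray>,
    temperature: Double = 0.1
): Double {
    var loss = 0.0
    for (i in imageEmb.indices) {
        // Similarity of image i to every caption in the batch.
        val logits = textEmb.map { dot(imageEmb[i], it) / temperature }
        val maxLogit = logits.maxOrNull()!!
        // Numerically stable log-sum-exp, then subtract the positive's logit:
        // this is the negative log-probability of the matching caption.
        val logSumExp = maxLogit + ln(logits.sumOf { exp(it - maxLogit) })
        loss += logSumExp - logits[i]
    }
    return loss / imageEmb.size
}
```

With perfectly matched pairs the loss approaches zero, while mismatched pairs are penalized, which is what drives an image and its description close together in the shared embedding space.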
|The architecture of MURAL depicts dual encoders with a shared text-encoder between the two tasks trained using a contrastive learning loss.|
Multilingual Image-to-Text and Text-to-Image Retrieval
To demonstrate MURAL’s capabilities, we choose the task of cross-modal retrieval (i.e., retrieving relevant images given a text and vice versa) and report the scores on various academic image–text datasets covering well-resourced languages, such as MS-COCO (and its Japanese variant, STAIR), Flickr30K (in English) and Multi30K (extended to German, French and Czech), as well as XTD (a test-only set with seven well-resourced languages: Italian, Spanish, Russian, Chinese, Polish, Turkish, and Korean). In addition to well-resourced languages, we also evaluate MURAL on the recently published Wikipedia Image–Text (WIT) dataset, which covers 108 languages, with a broad range of both well-resourced (English, French, Chinese, etc.) and under-resourced (Swahili, Hindi, etc.) languages.
MURAL consistently outperforms prior state-of-the-art models, including M3P, UC2, and ALIGN, in both zero-shot and fine-tuned settings evaluated on well-resourced and under-resourced languages. We see remarkable performance gains for under-resourced languages when compared to the state-of-the-art model, ALIGN.
|Mean recall on various multilingual image–text retrieval benchmarks. Mean recall is a common metric used to evaluate cross-modal retrieval performance on image–text datasets (higher is better). It measures Recall@N (i.e., the chance that the ground truth image appears in the first N retrieved images) averaged over six measurements: Image→Text and Text→Image retrieval for N=[1, 5, 10]. Note that XTD scores report Recall@10 for Text→Image retrieval.|
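As a concrete illustration of the metric, mean recall could be computed as in the following sketch; the rankings and function names here are hypothetical, not the benchmark code:

```kotlin
// Toy mean-recall computation: Recall@N asks whether the ground-truth item
// appears among the top N retrieved results; mean recall averages Recall@N
// over N in {1, 5, 10} for both retrieval directions.
fun recallAtN(rankings: List<List<Int>>, groundTruth: List<Int>, n: Int): Double =
    rankings.indices.count { groundTruth[it] in rankings[it].take(n) }
        .toDouble() / rankings.size

fun meanRecall(
    imageToText: List<List<Int>>, gtText: List<Int>,
    textToImage: List<List<Int>>, gtImage: List<Int>
): Double {
    val ns = listOf(1, 5, 10)
    val scores = ns.map { recallAtN(imageToText, gtText, it) } +
        ns.map { recallAtN(textToImage, gtImage, it) }
    // Average over the six measurements (two directions x three values of N).
    return scores.average()
}
```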
We also analyzed zero-shot retrieved examples on the WIT dataset comparing ALIGN and MURAL for English (en) and Hindi (hi). For under-resourced languages like Hindi, MURAL shows improved retrieval performance compared to ALIGN that reflects a better grasp of the text semantics.
|Comparison of the top-5 images retrieved by ALIGN and by MURAL for the Text→Image retrieval task on the WIT dataset for the Hindi text, “एक तश्तरी पर बिना मसाले या सब्ज़ी के रखी हुई सादी स्पगॅत्ती”, which translates to the English, “A bowl containing plain noodles without any spices or vegetables”.|
Even for Image→Text retrieval in a well-resourced language, like French, MURAL shows better understanding for some words. For example, MURAL returns better results for the query “cadran solaire” (“sundial”, in French) than ALIGN, which doesn’t retrieve any text describing sundials (below).
|Comparison of the top-5 text results from ALIGN and from MURAL on the Image→Text retrieval task for the same image of a sundial.|
Previously, researchers have shown that visualizing model embeddings can reveal interesting connections among languages — for instance, representations learned by a neural machine translation (NMT) model have been shown to form clusters based on their membership to a language family. We perform a similar visualization for a subset of languages belonging to the Germanic, Romance, Slavic, Uralic, Finnic, Celtic, and Finno-Ugric language families (widely spoken in Europe and Western Asia). We compare MURAL’s text embeddings with LaBSE’s, which is a text-only encoder.
A plot of LaBSE’s embeddings shows distinct clusters of languages influenced by language families. For instance, Romance languages (in purple, below) fall into a different region than Slavic languages (in brown, below). This finding is consistent with prior work that investigates intermediate representations learned by an NMT system.
|Visualization of text representations of LaBSE for 35 languages. Languages are color coded based on their genealogical association. Representative languages include: Germanic (red) — German, English, Dutch; Uralic (orange) — Finnish, Estonian; Slavic (brown) — Polish, Russian; Romance (purple) — Italian, Portuguese, Spanish; Gaelic (blue) — Welsh, Irish.|
In contrast to LaBSE’s visualization, MURAL’s embeddings, which are learned with a multimodal objective, show some clusters that are in line with areal linguistics (where elements are shared by languages or dialects in a geographic area) and contact linguistics (where languages or dialects interact and influence each other). Notably, in the MURAL embedding space, Romanian (ro) is closer to Slavic languages like Bulgarian (bg) and Macedonian (mk) than it is in LaBSE, which is in line with the Balkan sprachbund. Another possible language contact brings the Finnic languages, Estonian (et) and Finnish (fi), closer to the Slavic languages cluster. The fact that MURAL pivots on images as well as translations appears to add an additional view on language relatedness as learned in deep representations, beyond the language family clustering observed in a text-only setting.
|Visualization of text representations of MURAL for 35 languages. Color coding is the same as the figure above.|
Our findings show that training jointly using translation pairs helps overcome the scarcity of image–text pairs for many under-resourced languages and improves cross-modal performance. Additionally, it is interesting to observe hints of areal linguistics and contact linguistics in the text representations learned by using a multimodal model. This warrants more probing into different connections learned implicitly by multimodal models, such as MURAL. Finally, we hope this work promotes further research in the multimodal, multilingual space where models learn representations of and connections between languages (expressed via images and text), beyond well-resourced languages.
This research is in collaboration with Mandy Guo, Krishna Srinivasan, Ting Chen, Sneha Kudugunta, Chao Jia, and Jason Baldridge. We thank Zarana Parekh, Orhan Firat, Yuqing Chen, Apu Shah, Anosh Raj, Daphne Luong, and others who provided feedback for the project. We are also grateful for general support from Google Research teams.
This summer, we shared an update about how we’re continuing to improve video calling on Chromebooks, thanks to performance improvements across Google Meet, Zoom and more. And the camera on your Chromebook is good for more than just video chatting. Hundreds of millions of images and videos have been captured using the Chromebook Camera app so far this year.
Today, we’re sharing a few features that make your Chromebook’s camera even more useful.
Have you ever wanted to use your Chromebook to share a physical document or image, but weren’t sure how without the help of a scanner? You can now use your Chromebook’s built-in camera to scan any document and turn it into a PDF or JPEG file. If your Chromebook has both front- and back-facing cameras, you can use either one to scan.
Open the Camera app and select “Scan” mode. When you hold out the document you want to scan in front of the camera, the edges will be automatically detected. Once it’s done, it’s easy to share through Gmail, to social media or to nearby Android phones or Chromebooks using Nearby Share.
You can now scan files using your Chromebook’s built-in camera.
If you use an external camera with your Chromebook, you can use the Pan-Tilt-Zoom feature to have more control over what your camera captures. You can now crop and angle your camera view exactly how you want it. Whether you want to show your furry friend napping in the background or just want to zoom in on yourself, your Chromebook’s got you covered.
With your external camera plugged in and configured, open the Camera app to adjust the angle you want to capture. Your selections will automatically save so when you jump from a Google Meet work call to making a video with your new puppy, your camera angle preferences will stay the same.
With Pan-Tilt-Zoom you can adjust your camera angle to capture only what you want.
In addition to taking pictures or scanning documents with your Chromebook’s camera, here are a few other features to test out:
Starting early next year, you’ll be able to create GIFs on the Camera app. Just record a five-second video dancing around with friends, hugging your loved ones, or playing with your favorite pet, and it will automatically turn into a shareable GIF.
If you’re interested in getting a sneak peek and providing feedback on Chromebook features before they launch, join our Chrome OS Beta Community. Sign up here to be a Chrome OS Beta Tester Product Expert. Currently in Beta is a feature that integrates the Camera app with the Google Assistant. Just say “take a photo,” “record video” or “take a selfie” – you can even use Google Assistant to open the Camera app, so you don’t have to lift a finger.
We’ll be back in the new year to share more new Chromebook features.
The Dev channel has been updated to 98.0.4736.0 for Linux and Windows, with Mac coming soon.
A partial list of changes is available in the log. Interested in switching release channels? Find out how. If you find a new issue, please let us know by filing a bug. The community help forum is also a great place to reach out for help or learn about common issues.
Images can be an integral part of many people’s online experiences. We rely on them to help bring news stories to life, see what our family and friends are up to, or help us decide which couch to buy. However, for 338 million people who are blind or have moderate to severe vision impairment, knowing what's in a web image that isn’t properly labeled can be a challenge. Screen reader technology relies on the efforts of content creators and developers who manually label images in order to make them accessible through spoken feedback or braille. Yet, billions of web images remain unlabelled, rendering them inaccessible for these users.
To help close this gap, the Chrome Accessibility and Google Research teams collaborated on developing a feature that automatically describes unlabelled images using AI. This feature was first released in 2019 supporting English only and was subsequently extended to five new languages in 2020 – French, German, Hindi, Italian and Spanish.
Today, we are expanding this feature to support ten additional languages: Croatian, Czech, Dutch, Finnish, Indonesian, Norwegian, Portuguese, Russian, Swedish and Turkish.
The major innovation behind this launch is the development of a single machine learning model that generates descriptions in each of the supported languages. This enables a more equitable user experience across languages in the sense that the generated image descriptions in any two languages can often be regarded as translations that respect the image details (Thapliyal and Soricut, 2020).
Auto-generated image descriptions can be incredibly helpful and their quality has come a long way, but it’s important to note they still can’t caption all images as well as a human. Our system was built to describe natural images and is unlikely to generate a description for other types of images, such as sketches, cartoons, memes or screenshots. We considered fairness, safety and quality when developing this feature and implemented a process to evaluate the images and captions along these dimensions before they're eligible to be shown to users.
We are excited to take this next step towards improving accessibility for more people around the world and look forward to expanding support to more languages in the future.
To activate this feature, you first need to turn on your screen reader (here's how to do that in Chrome). From there, you can activate the “Get image descriptions from Google” feature either by opening the context menu when browsing a web page or under your browser’s Accessibility settings. Chrome will then automatically generate descriptions for unlabelled web images in your preferred language.
Posted by Jessica Dene Earley-Cha, Developer Relations Engineer
Like many other people who use their smartphone to make their lives easier, I’m way more likely to use an app that adapts to my behavior and is customized to fit me. Android apps can already support some personalization, such as the ability to long-press an app icon to see a list of common user journeys. When I long-press my Audible app (an online audiobook and podcast service), it gives me a shortcut to the book I’m currently listening to; right now that is Daring Greatly by Brené Brown.
Now, imagine if these shortcuts could also be triggered by a voice command – and, when relevant to the user, show up in Google Assistant for easy use.
Wouldn't that be lovely?
Dynamic shortcuts on a mobile device
Well, now you can do that with App Actions by pushing dynamic shortcuts to the Google Assistant. Let’s go over what Shortcuts are, what happens when you push dynamic shortcuts to Google Assistant, and how to do just that!
As an Android developer, you're most likely familiar with shortcuts. Shortcuts give your users the ability to jump into a specific part of your app. For cases where the destination in your app is based on individual user behavior, you can use a dynamic shortcut to jump to a specific thing the user was previously working with. For example, let’s consider a ToDo app, where users can create and maintain their ToDo lists. Since each item in the ToDo list is unique to each user, you can use dynamic shortcuts so that each user’s shortcuts are based on the items on their own ToDo list.
Below is a snippet of an Android dynamic shortcut for the fictional ToDo app.
val shortcut = ShortcutInfoCompat.Builder(context, task.id)
Dynamic Shortcuts for App Actions
If you're pushing dynamic shortcuts, it's a short hop to make those same shortcuts available for use by Google Assistant. You can do that by adding the Google Shortcuts Integration library and a few lines of code.
To extend a dynamic shortcut to Google Assistant through App Actions, two Jetpack modules need to be added, and the dynamic shortcut needs to include addCapabilityBinding.
val shortcut = ShortcutInfoCompat.Builder(context, task.id)
.addCapabilityBinding("actions.intent.GET_THING", "thing.name", listOf(task.title))
The addCapabilityBinding method binds the dynamic shortcut to a capability, which is a declared way a user can launch your app to the requested section. If you don’t already have App Actions implemented, you’ll need to add capabilities to your shortcuts.xml file. Capabilities are an expression of the relevant features of an app and contain a Built-In Intent (BII). BIIs are a language model for a voice command that Assistant already understands, and linking a BII to a shortcut allows Assistant to use the shortcut as the fulfillment for a matching command. In other words, by having capabilities, Assistant knows what to listen for and how to launch the app.
In the example above, addCapabilityBinding binds that dynamic shortcut to the actions.intent.GET_THING BII. When a user requests one of the items in their ToDo app, Assistant will process the request and trigger the capability with the GET_THING BII listed in their shortcuts.xml (here, the BII’s name parameter corresponds to the ToDo item).
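For context, a capability declaration in shortcuts.xml might look like the sketch below. The package and class names are hypothetical placeholders, and the parameter mapping mirrors the addCapabilityBinding call shown earlier:

```xml
<!-- Hypothetical shortcuts.xml sketch: declares the GET_THING capability
     and maps the BII's thing.name parameter to an intent extra named "name".
     Package and class names are placeholders, not a real app. -->
<shortcuts xmlns:android="http://schemas.android.com/apk/res/android">
  <capability android:name="actions.intent.GET_THING">
    <intent
        android:action="android.intent.action.VIEW"
        android:targetPackage="com.example.todo"
        android:targetClass="com.example.todo.TaskActivity">
      <parameter
          android:name="thing.name"
          android:key="name" />
      <!-- Eg. name = the ToDo item -->
    </intent>
  </capability>
</shortcuts>
```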
So in summary, the process to add dynamic shortcuts looks like this:
1. Configure App Actions by adding two Jetpack modules (the ShortcutManagerCompat library and the Google Shortcuts Integration library). Then associate the shortcut with a Built-In Intent (BII) in your shortcuts.xml file. Finally, push the dynamic shortcut from your app.
2. Two major things happen when you push your dynamic shortcuts to Assistant:
Not too bad. I don’t know about you, but I like to test out new functionality in a small app first. You're in luck! We recently launched a codelab that walks you through this whole process.
Looking for more resources to help improve your understanding of App Actions? We have a new learning pathway that walks you through the product, including the dynamic shortcuts that you just read about. Let us know what you think!
Thanks for reading! To share your thoughts or questions, join us on Reddit at r/GoogleAssistantDev.
Follow @ActionsOnGoogle on Twitter for more of our team's updates, and tweet using #AppActions to share what you’re working on. Can’t wait to see what you build!
Welcome to #IamaGDE - a series of spotlights presenting Google Developer Experts (GDEs) from across the globe. Discover their stories, passions, and highlights of their community work.
In college, Krupal Modi programmed a robot to catch a ball based on the ball’s color, and he enjoyed it enough that he became a developer. Now, he leads machine learning initiatives at Haptik, a conversational AI platform. He is a Google Developer Expert in Machine Learning and recently built the MyGov Corona Helpdesk module for the Indian government, to help Indians around the country schedule COVID-19 vaccinations. He lives in Gujarat, India.
Meet Krupal Modi, Google Developer Expert in Machine Learning.
GDE Krupal Modi
The early days
Krupal Modi didn’t set out to become a developer, but he got hooked when he worked on college projects related to pattern recognition, building and programming a robot to catch a ball based on its color.
“Then, it just happened organically that I liked those problems and became a developer,” he says.
Now, he has been a developer for ten years and is proficient in Natural Language Processing, Image Processing, and unstructured data analysis, using conventional machine learning and deep learning algorithms. He leads machine learning initiatives at Haptik, a conversational AI platform where developers can program virtual AI assistants and chat bots.
“I have been there almost seven years now,” he says. “I like that most of my time goes into solving some of the open problems in the state of natural language and design.”
Krupal has been doing machine learning for nine years and says advances in hardware, especially in the past eight years, have made machine learning much more accessible to a wider range of developers. “We’ve come very far with so many advances in hardware,” he says. “I was fortunate enough to have a great community around me.”
Krupal is currently invested in solving the open problems of language understanding.
“Today, nobody really prefers talking with a bot or a virtual assistant,” he says. “Given a choice, you’d rather communicate with a human at a particular business.”
Krupal aims to take language understanding to a new level, where people might prefer to talk to an AI, rather than a human. To do that, his team needs to get technology to the point where it becomes a preferred and faster mode of communication.
Ultimately, Krupal’s dream is to make sure whatever technology he builds can impact some of the fundamental aspects of human life, like health care, education, and digital well being.
“These are a few places where there’s a long way to go, and where the technology I work on could create an impact,” he says. “That would be a dream come true for me.”
COVID in India/Government Corona Help Desk Module
One way Krupal has aimed to use technology to impact health care is in the creation of the MyGov Corona Helpdesk module in India, a WhatsApp bot authorized by the Indian government to combat the spread of COVID-19 misinformation. Indian citizens could text MyGov Corona Helpdesk to get instant information on symptoms, how to seek treatment, and to schedule a vaccine.
“There was a lot of incorrect information on various channels related to the symptoms of COVID and treatments for COVID,” he explains. “Starting this initiative was to have a reliable source of information to combat the spread of misinformation.”
To date, the app has responded to over 100 million queries. Over ten million people have downloaded their vaccination certificates using the app, and over one million people have used it to book vaccination appointments.
Watch this video of how it works.
Becoming a GDE
As a GDE, Krupal focuses on Machine Learning and appreciates the network of self-motivated, passionate developers.
“That’s one of the things I admire the most about the program—the passionate, motivated people in the community,” Krupal says. “If you’re surrounded by such a great community, you take on and learn a lot from them.”
Advice to other developers
“If you are passionate about a specific technology; you find satisfaction in writing about it and sharing it with other developers across the globe; and you look forward to learning from them, then GDE is the right program for you.”
If you make online video content, you’ve probably heard of VidCon, an event where creators, brands, industry experts and fans from around the world converge to celebrate the latest and greatest in digital media. The next VidCon takes place December 3-6 in Abu Dhabi featuring panel discussions, meet and greets and performances with some of the world’s most influential video content creators.
Google for Creators will speak at two sessions at VidCon Abu Dhabi, both of which will focus on helping creators build their brands and monetize their content. On December 3, Google’s Head of Creator Relations, Paul Bakaus, and cosplay designer Yaya Han will discuss how creators can have more control over their futures and businesses. Later that day, Google for Creators writer Crystal Lambert and creator Kaya Marriott will speak at “Get the Most from Your Post — How to Create Powerful and Efficient Content Bundles.”
On the Google for Creators team, Crystal writes the educational guides for Creators.google. A liaison between the creator community and Google’s expert sources, Crystal researches, compiles and organizes vast troves of information into digestible, easy-to-follow and fun-to-read guides.
We spoke with Crystal to hear more about her upcoming VidCon appearance, and why content bundling is such an important strategy for creators.
Why focus on content bundling at VidCon?
We’re giving two talks at VidCon, and we wanted to focus both of them on the biggest needs in the creator economy. What we’ve learned from creators is that many want to know how to continuously make content without burning out. Content bundling — creating multiple pieces of content on one topic for different formats and platforms — is about tackling content creation in a holistic way. It’s not about approaching all these platforms as individual entities, but grouping what you’re doing together and building on it. It’s one of the easiest and most effective things a creator can do. It’s about content strategy, cross promotion and dealing with brands. Content bundles give you more visibility as a business and credibility when you reach out to brands, or when brands reach out to you.
Who is your VidCon co-presenter?
Kaya Marriott is the founder and content creator behind lifestyle and beauty blog Comfy Girl with Curls. I was super excited to connect with her because she’s on her way to becoming a successful, full-time content creator, and her journey has been so inspiring. She started Comfy Girl with Curls as a natural hair blog, but because so many other creators have come to her for advice, she also shares tips about creating content.
Kaya’s built her own business and she’s been very proficient and proactive about it. She brings a lot of credibility and first-hand knowledge about how and why content bundles work. We’re both excited to speak together.
What else are you excited to see at VidCon?
VidSummit was the first creator-geared conference I went to, and it was inspiring to see how helpful the video creator community is and how enthusiastic they are about what they do. They’re willing to help other creators by teaching them what they’ve learned. I’m excited to see that community at VidCon.
I’m also excited to experience VidCon in another country. I’ve never been to Abu Dhabi, and I’m looking forward to seeing who will be there and what the Abu Dhabi creator community is like.
This May at Google I/O 2021, we shared our vision for Project Starline, a technology project that combines advances in hardware and software to enable friends, families and coworkers to feel together, even when they're cities (or countries) apart.
Project Starline is the culmination of advances we've made across 3D imaging, real-time compression, spatial audio and our breakthrough light field display system that, when combined, enables a sense of depth and realism that feels like in-person communication. We recently described some of these advancements in a technical paper, Project Starline: A high-fidelity telepresence system, which we're honored to have had accepted for publication at SIGGRAPH Asia.
As we’ve started expanding Project Starline’s availability in more Google offices around the United States, we’ve been encouraged by the promising feedback. Google employees have spent thousands of hours using Project Starline to onboard, interview and meet new teammates, pitch ideas to colleagues and engage in one-on-one collaboration. Many users noted how powerful the ability to make eye contact was, and how much more engaged and connected they felt. One user compared their experience to a coffee chat - a genuine interaction that makes you want to lean in and focus on the other person.
We measured the impact of hundreds of Google employees' experiences with Project Starline, and the results showed that it feels much closer to being in the same room with someone than traditional video calls. We saw an increase in some of the most important signals that are often lost in video calls, such as attentiveness, memory recall and overall sense of presence. Here’s what we found when comparing Project Starline to traditional video calls:
These early results show promise for Project Starline's ability to facilitate more personal connections from afar. As Google and more companies navigate the future of work, we are optimistic about the potential to deepen connection and collaboration among employees in the modern-day workplace. We look forward to continuing to expand Project Starline and sharing more on our progress.