Monthly Archives: June 2022

Chrome Dev for Desktop Update

The Dev channel has been updated to 105.0.5148.2 for Windows, Mac and Linux.

A partial list of changes is available in the Git log. Interested in switching release channels? Find out how. If you find a new issue, please let us know by filing a bug. The community help forum is also a great place to reach out for help or learn about common issues.

Prudhvi Bommana
Google Chrome

Identifying Disfluencies in Natural Speech

People don’t write in the same way that they speak. Written language is controlled and deliberate, whereas transcripts of spontaneous speech (like interviews) are hard to read because speech is disorganized and less fluent. One aspect that makes speech transcripts particularly difficult to read is disfluency, which includes self-corrections, repetitions, and filled pauses (e.g., words like “umm”, and “you know”). Following is an example of a spoken sentence with disfluencies from the LDC CALLHOME corpus:

But that's it's not, it's not, it's, uh, it's a word play on what you just said.

It takes some time to understand this sentence — the listener must filter out the extraneous words and resolve all of the nots. Removing the disfluencies makes the sentence much easier to read and understand:

But it’s a word play on what you just said.

While people generally don't even notice disfluencies in day-to-day conversation, early foundational work in computational linguistics demonstrated how common they are. In 1994, using the Switchboard corpus, Elizabeth Shriberg demonstrated that there is a 50% probability for a sentence of 10–13 words to include a disfluency and that the probability increases with sentence length.

The proportion of sentences from the Switchboard dataset with at least one disfluency plotted against sentence length, measured in non-disfluent (i.e., fluent) tokens in the sentence. The longer a sentence gets, the more likely it is to contain a disfluency.

In “Teaching BERT to Wait: Balancing Accuracy and Latency for Streaming Disfluency Detection”, we present research findings on how to “clean up” transcripts of spoken text. We create more readable transcripts and captions of human speech by finding and removing disfluencies in people’s speech. Using labeled data, we created machine learning (ML) algorithms that identify disfluencies in human speech. Once those are identified, we can remove the extra words to make transcripts more readable. This also improves the performance of natural language processing (NLP) algorithms that work on transcripts of human speech. Our work puts special priority on ensuring that these models are able to run on mobile devices so that we can protect user privacy and preserve performance in scenarios with low connectivity.

Base Model Overview
At the core of our base model is a pre-trained BERTBASE encoder with 108.9 million parameters. We use the standard per-token classifier configuration, with a binary classification head being fed by the sequence encodings for each token.

Illustration of how tokens in text become numerical embeddings, which then lead to output labels.
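
For illustration, here is a minimal sketch of such a per-token classifier, assuming the open-source Hugging Face transformers and PyTorch libraries and the public bert-base-uncased checkpoint as stand-ins for the setup described above:

    # Sketch of a per-token disfluency classifier: a pre-trained BERT encoder
    # whose per-token encodings feed a binary classification head.
    import torch
    from torch import nn
    from transformers import BertModel, BertTokenizerFast

    class TokenDisfluencyClassifier(nn.Module):
        def __init__(self, encoder_name="bert-base-uncased"):
            super().__init__()
            self.encoder = BertModel.from_pretrained(encoder_name)
            # Binary head: logits for {fluent, disfluent} at every token position.
            self.head = nn.Linear(self.encoder.config.hidden_size, 2)

        def forward(self, input_ids, attention_mask):
            encodings = self.encoder(input_ids=input_ids,
                                     attention_mask=attention_mask).last_hidden_state
            return self.head(encodings)  # shape: (batch, seq_len, 2)

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = TokenDisfluencyClassifier()
    batch = tokenizer(["but that's it's not it's a word play"], return_tensors="pt")
    logits = model(batch["input_ids"], batch["attention_mask"])
    predicted = logits.argmax(dim=-1)  # 1 where a token is predicted disfluent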

We refined the BERT encoder by continuing the pretraining on the comments from the Pushshift Reddit dataset from 2019. Reddit comments are not speech data, but are more informal and conversational than the wiki and book data. This trains the encoder to better understand informal language, but may run the risk of internalizing some of the biases inherent in the data. For our particular use case, however, the model only captures the syntax or overall form of the text, not its content, which avoids potential issues related to semantic-level biases in the data.

We fine-tune our model for disfluency classification on hand-labeled corpora, such as the Switchboard corpus mentioned above. Hyperparameters (batch size, learning rate, number of training epochs, etc.) were optimized using Vizier.

We also produce a range of “small” models for use on mobile devices using a knowledge distillation technique known as “self training”. Our best small model is based on the Small-vocab BERT variant with 3.1 million parameters. This smaller model achieves comparable results to our baseline at 1% the size (in MiB). You can read more about how we achieved this model miniaturization in our 2021 Interspeech paper.
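
As a rough sketch of this recipe, the large “teacher” model labels unlabeled transcripts and a much smaller “student” model is trained on the union of human-labeled and teacher-labeled data; the teacher, student, and dataset objects below are hypothetical placeholders, not the actual models or corpora:

    # Self-training sketch: the teacher annotates unlabeled transcripts, and the
    # small student model learns from both human labels and teacher labels.
    def self_train(teacher, student, labeled_data, unlabeled_sentences):
        pseudo_labeled = [(sentence, teacher.predict(sentence))
                          for sentence in unlabeled_sentences]
        student.fit(labeled_data + pseudo_labeled)
        return student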

Streaming
Some of the latest use cases for automatic speech transcription include automated live captioning, such as that produced by the Android “Live Captions” feature, which automatically transcribes spoken language in audio being played on the device. For disfluency removal to be of use in improving the readability of the captions in this setting, it must happen quickly and in a stable manner. That is, the model should not change its past predictions as it sees new words in the transcript.

We call this live token-by-token processing streaming. Accurate streaming is difficult because of temporal dependencies; most disfluencies are only recognizable later. For example, a repetition does not actually become a repetition until the second time the word or phrase is said.

To investigate whether our disfluency detection model is effective in streaming applications, we split the utterances in our training set into prefix segments, where only the first N tokens of the utterance were provided at training time, for all values of N up to the full length of the utterance. We evaluated the model by simulating a stream of spoken text, feeding prefixes to the model and measuring performance with several metrics that capture accuracy, stability, and latency, including streaming F1, time to detection (TTD), edit overhead (EO), and average wait time (AWT). We experimented with look-ahead windows of either one or two tokens, allowing the model to “peek” ahead at additional tokens for which the model is not required to produce a prediction. In essence, we’re asking the model to “wait” for one or two more tokens of evidence before making a decision.
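
To make this evaluation concrete, here is a small illustrative sketch of prefix generation and a simplified edit-overhead measure; model.predict is a hypothetical callable that returns one label per token seen so far, and the exact metric definitions in the paper may differ:

    # Streaming evaluation sketch: feed growing prefixes of an utterance to the
    # model and count how often earlier predictions are revised.
    def prefixes(tokens):
        return [tokens[:n] for n in range(1, len(tokens) + 1)]

    def edit_overhead(model, tokens):
        revisions, previous = 0, []
        for prefix in prefixes(tokens):
            labels = model.predict(prefix)  # one 0/1 label per token seen so far
            # Count earlier positions whose label changed since the previous step.
            revisions += sum(a != b for a, b in zip(labels, previous))
            previous = labels
        return revisions / len(tokens)  # extra edits per token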

While adding this fixed look-ahead did improve the stability and streaming F1 scores in many contexts, we found that in some cases the label was already clear even without looking ahead to the next token and the model did not necessarily benefit from waiting. Other times, waiting for just one extra token was sufficient. We hypothesized that the model itself could learn when it should wait for more context. Our solution was a modified model architecture that includes a “wait” classification head that decides when the model has seen enough evidence to trust the disfluency classification head.

Diagram showing how the model labels input tokens as they arrive. The BERT embedding layers feed into two separate classification heads, which are combined for the output.

We constructed a training loss function that is a weighted sum of three factors:

  1. The traditional cross-entropy loss for the disfluency classification head
  2. A cross-entropy term that only considers up to the first token with a “wait” classification
  3. A latency penalty that discourages the model from waiting too long to make a prediction
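
Below is a minimal PyTorch-style sketch of such a combined loss, with illustrative tensor shapes, weights, and a deliberately simple latency penalty; the actual formulation follows the paper rather than this sketch:

    # Three-part streaming loss (sketch). Assumed shapes:
    #   disfluency_logits, wait_logits: (batch, seq_len, 2)
    #   disfluency_labels, wait_labels: (batch, seq_len), integer class labels
    import torch
    import torch.nn.functional as F

    def streaming_loss(disfluency_logits, wait_logits,
                       disfluency_labels, wait_labels,
                       alpha=1.0, beta=1.0, gamma=0.1):
        # 1. Standard cross-entropy for the disfluency head over all tokens.
        ce_disfluency = F.cross_entropy(disfluency_logits.flatten(0, 1),
                                        disfluency_labels.flatten())

        # 2. Cross-entropy for the wait head, restricted to tokens up to and
        #    including the first token labeled "wait" in each sequence.
        is_wait = (wait_labels == 1).long()
        waits_before = is_wait.cumsum(dim=1) - is_wait
        mask = waits_before == 0
        ce_wait = F.cross_entropy(wait_logits[mask], wait_labels[mask])

        # 3. Latency penalty: discourage assigning high probability to "wait".
        p_wait = wait_logits.softmax(dim=-1)[..., 1]
        latency_penalty = p_wait.mean()

        return alpha * ce_disfluency + beta * ce_wait + gamma * latency_penalty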

We evaluated this streaming model as well as the standard baseline with no look-ahead and with both 1- and 2-token look-ahead values:

Graph of the streaming F1 score versus the average wait time in tokens. Three data points indicate F1 scores above 0.82 across multiple wait times. The proposed streaming model achieves near top performance with much shorter wait times than the fixed look ahead models.

The streaming model achieved a better streaming F1 score than both a standard baseline with no look ahead and a model with a look ahead of 1. It performed nearly as well as the variant with fixed look ahead of 2, but with much less waiting. On average the model waited for only 0.21 tokens of context.

Internationalization
Our best outcomes so far have been with English transcripts. This is mostly due to resourcing issues: while there are a number of relatively large labeled conversational datasets that include disfluencies in English, other languages often have very few such datasets available. So, in order to make disfluency detection models available outside English, a method is needed to build models in a way that does not require finding and labeling hundreds of thousands of utterances in each target language. A promising solution is to leverage multi-language versions of BERT to transfer what a model has learned about English disfluencies to other languages in order to achieve similar performance with much less data. This is an area of active research, but we do have some promising results to outline here.

As a first effort to validate this approach, we added labels to about 10,000 lines of dialogue from the German CALLHOME dataset. We then started with the Geotrend English and German Bilingual BERT model (extracted from Multilingual BERT) and fine-tuned it with approximately 77,000 disfluency-labeled English Switchboard examples and 1.3 million examples of self-labeled transcripts from the Fisher Corpus. Then, we did further fine-tuning with about 7,500 in-house–labeled examples from the German CALLHOME dataset.

Diagram illustrating the flow of labeled data and self-trained output in our best multilingual training setup. By training on both English and German data we are able to improve performance via transfer learning.

Our results indicate that fine-tuning on a large English corpus can produce acceptable precision using zero-shot transfer to similar languages like German, but at least a modest amount of German labels was needed to improve recall from less than 60% to greater than 80%. Two-stage fine-tuning of an English-German bilingual model produced the highest precision and overall F1 score.

Approach | Precision | Recall | F1
German BERTBASE model fine-tuned on 7,300 human-labeled German CALLHOME examples | 89.1% | 81.3% | 85.0
Same as above, but with an additional 7,500 self-labeled German CALLHOME examples | 91.5% | 83.3% | 87.2
English/German bilingual BERTBASE model fine-tuned on English Switchboard+Fisher, evaluated on German CALLHOME (zero-shot language transfer) | 87.2% | 59.1% | 70.4
Same as above, but subsequently fine-tuned with 14,800 German CALLHOME (human- and self-labeled) examples | 95.5% | 82.6% | 88.6

Conclusion
Cleaning up disfluencies from transcripts can improve not just their readability for people, but also the performance of other models that consume transcripts. We demonstrate effective methods for identifying disfluencies and expand our disfluency model to resource-constrained environments, new languages, and more interactive use cases.

Acknowledgements
Thank you to Vicky Zayats, Johann Rocholl, Angelica Chen, Noah Murad, Dirk Padfield, and Preeti Mohan for writing the code, running the experiments, and composing the papers discussed here. We also thank our technical product manager Aaron Schneider, Bobby Tran from the Cerebra Data Ops team, and Chetan Gupta from Speech Data Ops for their support obtaining additional data labels.

Source: Google AI Blog



Get started scripting with Google Ads scripts templates

Today we’re launching scripts templates, a quick and easy way to get started with scripting in Google Ads. Instead of starting your script from scratch, you can choose from a list of templates, edit them to fit your account, and deploy them.

To get started, visit the scripts page and click on the plus (+) button, then Start from a template:

Next, pick a template and click Customize to begin editing.

If you have any questions or feedback regarding the new experience, please leave a post on our forum so that we can help.

Easily share profile links via Contacts

Quick summary

In 2021, we introduced a new Google Contacts (contacts.google.com) experience that provides rich information about your colleagues and stakeholders. Starting today, every contact with a Workspace email has a new profile link that is easy to copy, share, and send within an organization. This new profile link helps everyone in your organization get in touch and stay connected. 

shareable profile links


Getting started 

  • Admins: 
  • End users: 
    • Once available, access “Copy profile link” in one of two ways: 
      • In the detailed view for a contact, select the "3-dot-menu" and click the "Copy profile link" button. 
      • In a contact list view (e.g. contacts.google.com), select the "3-dot-menu" for a contact row and click the "Copy profile link" button. 
    • You can also enter "contacts.google.com/emailaddress" into the web browser address bar to look for a person in your organization. Email formats include: [email protected], [email protected], and jane.smith. 

Rollout pace 

  • Rapid Release and Scheduled Release domains: Extended rollout (potentially longer than 15 days for feature visibility) starting on June 30, 2022, with anticipated completion by July 18, 2022. Due to the staggered nature of this rollout, end users may be able to copy profile links from their browser address bar before having access to the "Copy profile link" button.

Availability 

  • Available to all Google Workspace customers, as well as legacy G Suite Basic and Business customers 
  • Not available to users with personal Google Accounts 

Resources 

New Google Workspace features to help solo business owners

Over the past few years, we’ve seen more people forging their own path and turning their personal passions into businesses. These individual business owners, sometimes called “solopreneurs,” wear many hats as they run and grow their businesses: salesperson, marketer, accountant, the list goes on.

That’s why one year ago, we launched Google Workspace Individual as a new offering to help these solo business owners grow their businesses with the familiar apps they’re likely already using in their personal life. We’ve heard from customers that Google Workspace Individual helps them focus their time on doing what they love — like meeting with customers and designing personalized services — and less time on recurring tasks like scheduling appointments and sending emails. Since launch, we’ve delivered a number of improvements to provide even more value to customers, and today we’re announcing what’s coming next – electronic signatures right within Google Docs.

Coming soon: Easily sign important documents right in Google Docs

Whether you’re an event planner or digital creator, it can be a challenge to stay on top of contracts and customer agreements that need to be signed as you’re constantly context switching and jumping between different apps to get work done. That’s why we’re natively integrating eSignature in Google Docs, so you can quickly execute agreements from the familiar interface of Docs without having to switch tabs or apps.

Animation of the process of inserting electronic signature fields in Google Docs

Coming soon: Easily request electronic signatures directly in Google Docs

eSignature in Google Docs will take advantage of the same secure-by-design infrastructure and built-in protections Google uses to help secure your information and safeguard your privacy. Let’s take a look at how eSignature can help you create agreements:

  • Collaborate in documents: Collaborate on changes directly in Google Docs with comments and suggestions — no need to export the file to send a draft contract over email.
  • Add fields to documents: Within the familiar Google Docs interface, you can easily drag and drop signature and date fields in branded documents you create.
  • Request a signature: Once you resolve all comments and suggestions, requesting a signature is as easy as sharing a file in Drive.
  • Add signatures: When ready to sign, the signee can easily add their signature, no downloads needed. Once the signature is added, a completed PDF contract is emailed to both parties.
  • Monitor and track progress: Quickly see the status of pending signatures and easily find completed, signed contracts.
  • Create copies of contracts: For signature workflows that need to be repeated regularly, you can streamline the process by creating copies of existing contracts and then modifying as needed.

eSignature in Google Docs is coming soon in Beta to Google Workspace Individual users and is the latest in a series of improvements we’ve announced for the subscription in the past year. If you’re already using a dedicated eSignature solution, Google Workspace integrates with a number of leading providers. Learn more about how these eSignature and other integrations can help you optimize your workspace on our blog post.

ICYMI: Google Workspace Individual updates from this past year

Email marketing updates for engaging campaigns

For any business, it’s vital to connect with customers and prospects, both on a one-to-one basis and at a large scale. Google Workspace Individual makes it easy to do both, so you can easily send communications like monthly newsletters and also offer items like scheduled consultations.

Animation of the process of creating and sending customized marketing emails from Gmail

Create and send customized marketing emails from Gmail

To help you reach many customers at once, last year we added a way to run simple email campaigns directly in Gmail. We started first by providing professionally designed templates that you can customize with your own branding and images in just a few clicks. Then earlier this year, we added multi-send, which allows you to deliver individual emails to a large number of recipients with a unique unsubscribe link for each recipient. With the combination of these improvements, it’s easy to make communications as targeted as you like, because you can create multiple email mailing lists within Google Contacts for different audiences and easily tailor the message to each audience. Gmail layouts and multi-send are generally available in Google Workspace Individual today.

Appointment scheduling updates for easier bookings

For scheduling in-person appointments or virtual meetings, Google Calendar helps streamline the appointment scheduling process and avoid back-and-forth communication to find a time that works. Since launching, we’ve made a number of enhancements that improve the experience for both the business owner and scheduler, including the ability to:

  • Help prevent no-shows by customizing the timing of reminder emails and having users verify their email before booking for added security.
  • Reflect your operational needs by setting flexible appointment durations, adding buffer time between appointments and limiting the number of bookings per day.
  • Easily update your availability with one-off exceptions like regional holidays and customizable start and end dates.
Animation of creating a shareable appointment schedule that clients can use to book appointments online by setting your availability and appointment offerings directly in Google Calendar.

Get your own professional booking page that stays up to date

Customized appointment scheduling with the above features is generally available in Google Workspace Individual today, on the web and your mobile device.

Google Meet updates for your customer and partner calls

Once an appointment is on the books and it’s time to connect, Google Meet provides an easy way for you to deepen customer and partner relationships through secure video meetings. Helpful features in Meet ensure you can be clearly seen and heard. Noise cancellation removes background distractions like barking dogs, while low-light mode automatically adjusts your video in poorly lit settings. Here are a few notable Meet announcements from this past year:

  • Mimic taking your call from a real-life cafe or condo with immersive backgrounds.
  • Filter out the echoes in spaces with hard surfaces so that you can have conference-room audio quality whether you’re in a basement, a kitchen, or a big empty room.
  • Clearly see participants on a call while you’re presenting or multi-tasking with picture-in-picture on Chrome browsers.
  • Review your forecast or business proposal with meetings directly in Docs, Sheets and Slides.
Animation of joining a Google Meet video call directly from Google Docs.

Quickly join a Google Meet call from Google Docs, Sheets and Slides

Sign up today to take advantage of promotional pricing

Save 20% until October 2022 when you sign up for Google Workspace Individual today, or learn more about Google Workspace Individual on our website.

Google for Mexico: Economic recovery through technology

During the pandemic, different technological tools allowed us to stay connected, collaborate and find the best responses to overcome the challenges in front of us.

As we move forward, we want to become Mexico's trusted technology ally and contribute to the country with programs, products and initiatives that promote economic, social and cultural development. Today, at our second Google for Mexico event, we aim to accelerate the country's economic recovery by helping people find more and better jobs, making it easier for businesses to grow, reducing the gender gap and promoting financial inclusion.

Improving Mexicans’ lives through technology

In collaboration with the Ministry of Public Education, we helped students across the country to continue their school year by providing more than 20 million free Google for Education accounts. We have trained more than 1.9 million people in Mexico through Grow with Google and Google.org grants. And we have worked together with the Ministry of Tourism to create a joint strategy to digitize the travel sector, and partnered with the Ministry of Economy on gender gap reduction projects and a technological innovation program for manufacturing companies in the southeast region of the country.

According to a study we conducted with AlphaBeta, we estimate that in 2021 companies in the country obtained annual economic benefits worth more than $7.7 billion from Google products (Google Search & Ads, AdSense, Google Play and YouTube), approximately three times the impact in 2018 ($2.3 billion).

Today, more people in the world are using their smartphones to save credit and debit cards and to buy new things. Over the last few years, we have seen rapid digitization of essentials that we carry with us every day, such as car keys, digital IDs and vaccine records.

That’s why we are announcing that Mexico is part of the global launch of Google Wallet on Android and Wear OS. Google Wallet will initially launch with support for payment cards and loyalty passes and eventually expand to new experiences like transit and event tickets, boarding passes, car keys and digital IDs.

$10 million from Google.org

Mexico's Southeast region is home to more than 50% of the country's indigenous population; it is also an area affected by poverty and significant social vulnerability. Google.org, the philanthropic arm of Google, is allocating $10 million — the largest amount of funding provided by the organization in the country — to this region’s transformation. Over the upcoming three years, this initiative will mostly benefit women, supporting programs focused on promoting economic opportunities that accelerate financial inclusion and reduce the gender gap.

A Mexican woman wearing a red dress with a white ruffle stands in front of hills, looking slightly away from the camera.

Women from Mexico's Southeast region will benefit from Google.org's $10 million fund through local and regional NGOs.

Technology as a booster for jobs

In 2019, during the first edition of Google for Mexico, we announced the launch of Google Career Certificates alongside a grant of $1.1 million for International Youth Foundation Mexico (IYF). Through this grant, IYF has trained 1,200 young people. Seventy percent of the graduates managed to get a new job, while the participants who were already employed raised their income by more than 30%. To expand this initiative, and as part of the $10M fund to support Mexico’s Southeast region, we are announcing a $2 million grant to support IYF in taking their project into the region and training 2,300 women from the community.

Supporting the news industry

In late 2020, we launched Google News Showcase, an initiative that offers a better experience for readers and news editors. Google News Showcase is a licensing program to pay publishers for high-quality content. This program will help participating publishers monetize their content through an enhanced storytelling experience that lets people go deeper into more complex stories and stay informed about different issues and interests.

Today we are announcing the beginning of negotiations with local media to soon launch a News Showcase in México. We are excited to continue contributing to the country’s media ecosystem, and offer our users relevant, truthful and quality information on local, national and international news.

Illustration of a finger swiping through Google News panels on a screen

Google News Showcase will bring a better experience for readers and news publishers in Mexico.

Preserving and promoting native languages

Every 14 days, a language becomes extinct. This means that out of the 7,000 existing tongues in the world, more than 3,000 are in danger of vanishing. To support the efforts of groups dedicated to language preservation, Google Arts & Culture is collaborating with partners around the world to launch Woolaroo, an experiment that uses machine learning to identify objects and show them in native languages.

Using their phone’s camera, users can take a photo or scan their surroundings to receive a translation and its correct pronunciation. Woolaroo launched with 10 languages, and today seven more have been added, including Maya and Tepehua.

Animated GIF of a hand holding a phone that shows nature pictures that reflect the background.

Woolaroo, a language preservation experiment powered by machine learning, will include ancestral languages Maya and Tepehua.

At Google, we believe technology can be a helpful force for Mexicans across the country, providing intelligent solutions for millions of people.

Minerva: Solving Quantitative Reasoning Problems with Language Models

Language models have demonstrated remarkable performance on a variety of natural language tasks — indeed, a general lesson from many works, including BERT, GPT-3, Gopher, and PaLM, has been that neural networks trained on diverse data at large scale in an unsupervised way can perform well on a variety of tasks.

Quantitative reasoning is one area in which language models still fall far short of human-level performance. Solving mathematical and scientific questions requires a combination of skills, including correctly parsing a question with natural language and mathematical notation, recalling relevant formulas and constants, and generating step-by-step solutions involving numerical calculations and symbolic manipulation. Due to these challenges, it is often believed that solving quantitative reasoning problems using machine learning will require significant advancements in model architecture and training techniques, granting models access to external tools such as Python interpreters, or possibly a more profound paradigm shift.

In “Solving Quantitative Reasoning Problems With Language Models” (to be released soon on the arXiv), we present Minerva, a language model capable of solving mathematical and scientific questions using step-by-step reasoning. We show that by focusing on collecting training data that is relevant for quantitative reasoning problems, training models at scale, and employing best-in-class inference techniques, we achieve significant performance gains on a variety of difficult quantitative reasoning tasks. Minerva solves such problems by generating solutions that include numerical calculations and symbolic manipulation without relying on external tools such as a calculator. The model parses and answers mathematical questions using a mix of natural language and mathematical notation. Minerva combines several techniques, including few-shot prompting, chain of thought or scratchpad prompting, and majority voting, to achieve state-of-the-art performance on STEM reasoning tasks. You can explore Minerva’s output with our interactive sample explorer!

Solving a multi-step problem: A question from the MATH dataset and Minerva’s solution. The model writes down a line equation, simplifies it, substitutes a variable, and solves for y.

A Model Built for Multi-step Quantitative Reasoning
To promote quantitative reasoning, Minerva builds on the Pathways Language Model (PaLM), with further training on a 118GB dataset of scientific papers from the arXiv preprint server and web pages that contain mathematical expressions using LaTeX, MathJax, or other mathematical typesetting formats. Standard text cleaning procedures often remove symbols and formatting that are essential to the semantic meaning of mathematical expressions. By maintaining this information in the training data, the model learns to converse using standard mathematical notation.
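
As a toy illustration of why this matters (not Minerva’s actual preprocessing), compare a naive cleaner that strips punctuation with a math-aware pipeline that keeps LaTeX spans verbatim:

    import re

    raw = r"The solutions of $x^2 - 5x + 6 = 0$ are $x \in \{2, 3\}$."

    # Naive cleaning destroys the mathematical content of the expression.
    naive = re.sub(r"[^A-Za-z0-9 ]", "", raw)
    # -> "The solutions of x2  5x  6  0 are x in 2 3"

    # A math-aware pipeline keeps the LaTeX/MathJax markup intact in the training text.
    math_aware = raw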

Example questions from the Joint Entrance Examination Main Math 2020 exam, taken each year by almost 2M Indian high-school students who intend to study engineering and similar fields (left), and the National Math Exam in Poland (May 2022), taken by approximately 270K high-school students every year (right).
A dataset for quantitative reasoning: Careful data processing preserves mathematical information, allowing the model to learn mathematics at a higher level.

Minerva also incorporates recent prompting and evaluation techniques to better solve mathematical questions. These include chain of thought or scratchpad prompting — where Minerva is prompted with several step-by-step solutions to existing questions before being presented with a new question — and majority voting. Like most language models, Minerva assigns probabilities to different possible outputs. When answering a question, rather than taking the single solution Minerva scores as most likely, multiple solutions are generated by sampling stochastically from all possible outputs. These solutions are different (e.g., the steps are not identical), but often arrive at the same final answer. Minerva uses majority voting on these sampled solutions, taking the most common result as the conclusive final answer.
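
A minimal sketch of this majority-voting step, assuming hypothetical sample_solution and extract_final_answer helpers rather than Minerva’s actual interface:

    # Majority voting sketch: sample k step-by-step solutions stochastically,
    # extract each final answer, and return the most common one.
    from collections import Counter

    def majority_vote(question, sample_solution, extract_final_answer, k=64):
        answers = [extract_final_answer(sample_solution(question)) for _ in range(k)]
        most_common_answer, _count = Counter(answers).most_common(1)[0]
        return most_common_answer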

Majority voting: Minerva generates multiple solutions to each question and chooses the most common answer as the solution, improving performance significantly.

Evaluation on STEM Benchmarks
To test Minerva’s quantitative reasoning abilities, we evaluated the model on STEM benchmarks ranging in difficulty from grade school level problems to graduate level coursework.

  • MATH: High school math competition level problems
  • MMLU-STEM: A subset of the Massive Multitask Language Understanding benchmark focused on STEM, covering topics such as engineering, chemistry, math, and physics at high school and college level.
  • GSM8k: Grade school level math problems involving basic arithmetic operations that should all be solvable by a talented middle school student.

We also evaluated Minerva on OCWCourses, a collection of college and graduate level problems covering a variety of STEM topics such as solid state chemistry, astronomy, differential equations, and special relativity that we collected from MIT OpenCourseWare.

In all cases, Minerva obtains state-of-the-art results, sometimes by a wide margin.

Evaluation results on MATH and MMLU-STEM, which include high school and college level questions covering a range of STEM topics.
Model | MATH | MMLU-STEM | OCWCourses | GSM8k
Minerva | 50.3% | 75% | 30.8% | 78.5%
Published state of the art | 6.9% | 55% | - | 74.4%
Minerva 540B significantly improves state-of-the-art performance on STEM evaluation datasets.

What Minerva Gets Wrong
Minerva still makes its fair share of mistakes. To better identify areas where the model can be improved, we analyzed a sample of questions the model gets wrong, and found that most mistakes are easily interpretable. About half are calculation mistakes, and the other half are reasoning errors, where the solution steps do not follow a logical chain of thought.

It is also possible for the model to arrive at a correct final answer but with faulty reasoning. We call such cases “false positives”, as they erroneously count toward a model’s overall performance score. In our analysis, we find that the rate of false positives is relatively low (Minerva 62B produces less than 8% false positives on MATH).

Below are a couple of example mistakes the model makes.

Calculation mistake: The model incorrectly cancels the square root on both sides of the equation.
Reasoning mistake: The model computes the number of free throws at the fourth practice, but then uses this number as the final answer for the first practice.

Limitations
Our approach to quantitative reasoning is not grounded in formal mathematics. Minerva parses questions and generates answers using a mix of natural language and LaTeX mathematical expressions, with no explicit underlying mathematical structure. This approach has an important limitation, in that the model’s answers cannot be automatically verified. Even when the final answer is known and can be verified, the model can arrive at a correct final answer using incorrect reasoning steps, which cannot be automatically detected. This limitation is not present in formal methods for theorem proving (e.g., see Coq, Isabelle, HOL, Lean, Metamath, and Mizar). On the other hand, an advantage of the informal approach is that it can be applied to a highly diverse set of problems which may not lend themselves to formalization.

Future Directions
While machine learning models have become impressive tools in many scientific disciplines, they are often narrowly scoped to solve specific tasks. We hope that general models capable of solving quantitative reasoning problems will help push the frontiers of science and education. Models capable of quantitative reasoning have many potential applications, including serving as useful aids for researchers, and enabling new learning opportunities for students. We present Minerva as a small step in this direction. To see more samples from Minerva, such as the one below, please visit the interactive sample explorer!

Solving a problem using calculus and trigonometry: A question from the MATH dataset asking for the speed of a particle in circular motion. Minerva finds a correct step-by-step solution. In the process, Minerva computes a time derivative and applies a trigonometric identity.

Acknowledgements
Minerva was a collaborative effort that spanned multiple teams in Google Research. We would like to thank our coauthors Aitor Lewkowycz, Ambrose Slone, Anders Andreassen, Behnam Neyshabur, Cem Anil, David Dohan, Henryk Michalewski, Imanol Schlag, Theo Gutman-Solo, Vedant Misra, Vinay Ramasesh, and Yuhuai Wu, as well as our collaborators Erik Zelikman and Yasaman Razeghi. Minerva builds upon the work of many others at Google, and we would like to thank the PaLM team, the T5X team, the Flaxformer team, and the JAX team for their efforts. We thank Tom Small for designing the animation in this post. We would also like to especially thank Vedant Misra for developing the Minerva sample explorer.

Source: Google AI Blog


Mahima Pushkarna is making data easier to understand

Five years ago, information designer Mahima Pushkarna joined Google to make data easier to understand. As a senior interaction designer on the People + AI Research (PAIR) team, she designed Data Cards to help everyone better understand the contexts of the data they are using. The Data Cards Playbook puts Google’s AI Principles into practice by providing opportunities for feedback, relevant explanations and appeal.

Recently, Mahima’s paper on Data Cards (co-written with Googlers Andrew Zaldivar and Oddur Kjartansson) was accepted to the ACM Conference on Fairness, Accountability and Transparency (ACM FAccT). Let’s catch up with her and find out more about what brought her to Google.

How did your background lead you to the work you’re doing now?

I've always been fascinated by conjuring up solutions to things. The kind of questions that I’ve found meaningful are those that are never truly solved, or never have one correct answer. (The kind of questions that exasperate us!) Those have been the problems I am always drawn towards.

Early in my career, I realized the power in visualizing data, but spreadsheets were intimidating. I wondered how design could make communicating complexity easier. So I found myself in grad school in Boston studying information design and data visualization. I focused on how people experience data and how our relationships to each other and our contexts are mediated.

I joined Google Brain as the first visual designer in a full-time capacity, though I had no background in artificial intelligence or machine learning — this was the deep end of the pool. This opened up the space to explore human-AI interaction, and make AI more accessible to a broader class of developers. At PAIR, my work focuses on making information experiences more meaningful for developers, researchers and others who build AI technologies.

What’s it like to have a unique background as a designer on a technical AI research team?

When you're an engineer and immersed in building technology, it's easy to assume everyone has a similar experience to your own — especially when you’re surrounded by peers who share your expertise. The actual user experience is very personal and varies drastically across users and contexts. That particular clarity is what designers bring to the table.

I’ve been able to engage my engineering and research colleagues with simple, people-centered questions right in the very beginning. How are people using an AI tool? What are they learning from it? Who else might be involved in the conversation? Do they have the proficiency we assume they have?

Pull quote: “Identifying what we don’t know about data is just as important as articulating what we do know.”

How did you begin designing Data Cards?

This project started when I was working on another visualization toolkit, Facets, to communicate the skews and imbalances within datasets to help machine learning practitioners make informed decisions. At the time, transparency was a moving target. Andrew, Tulsee Doshi and I started to proactively think about fairness in data, and saw a huge gap in the documentation of human decisions that dot a dataset's lifecycle.

This “invisible” information shapes how we use data and the outcomes of models trained on them. For example, a model trained on a dataset that captures age in just two or three buckets will have very different outcomes compared to a dataset with ten buckets. The goal of Data Cards is to make both visible and invisible information about datasets available and simple to understand, so people from a variety of backgrounds can knowledgeably make decisions.

As we cover in our FAccT paper, Andrew, Oddur, and I arrived at two insights. The first is that identifying what we don’t know about data is just as important as articulating what we do know. In capturing these nuances, it is possible to narrow those knowledge gaps before even collecting data. The second thing that surprised us was the sheer number of people involved in a dataset’s life cycle, and how fragile knowledge is. Context is easily lost in translation both between and within teams, across documents, emails, people and time.

Data Cards stand on the shoulders of giants, like Datasheets (Gebru et al.) and Model Cards (Mitchell et al.). We've been immensely lucky to have had the support of many original authors on these seminal papers that have paved our path to FAccT.

How do you hope the paper is used across the tech industry?

Imagine a world in which finding verifiable information about the motivations of a dataset’s creators or performance of a model is as easy as learning about the ethical beliefs of a celebrity or the rating of a movie. Our vision for Data Cards is that they become a cultural mainstay — invisible, but their absence would be missed by ML practitioners.

In this paper, we introduce frameworks that other teams can use in their work. Alongside that, we’ve open-sourced the Data Cards Playbook, so we're trying to lower the barrier to access in every way possible.