Monthly Archives: January 2022

Dev Channel Update for Desktop

The Dev channel has been updated to 99.0.4840.0 for Windows, Mac and Linux.

A partial list of changes is available in the log. Interested in switching release channels? Find out how. If you find a new issue, please let us know by filing a bug. The community help forum is also a great place to reach out for help or learn about common issues.

Prudhvikumar Bommana

Google Chrome

LaMDA: Towards Safe, Grounded, and High-Quality Dialog Models for Everything

Language models are becoming more capable than ever before and are helpful in a variety of tasks — translating one language into another, summarizing a long document into a brief highlight, or answering information-seeking questions. Among these, open-domain dialog, where a model needs to be able to converse about any topic, is probably one of the most difficult, with a wide range of potential applications and open challenges. In addition to producing responses that humans judge as sensible, interesting, and specific to the context, dialog models should adhere to Responsible AI practices, and avoid making factual statements that are not supported by external information sources.

Today we’re excited to share recent advances in our “LaMDA: Language Models for Dialog Applications” project. In this post, we’ll give an overview of how we’re making progress towards safe, grounded, and high-quality dialog applications. LaMDA is built by fine-tuning a family of Transformer-based neural language models specialized for dialog, with up to 137B model parameters, and teaching the models to leverage external knowledge sources.

Objectives & Metrics
Defining objectives and metrics is critical to guiding the training of dialog models. LaMDA has three key objectives — Quality, Safety, and Groundedness — each of which we measure using carefully designed metrics:

Quality: We decompose Quality into three dimensions, Sensibleness, Specificity, and Interestingness (SSI), which are evaluated by human raters. Sensibleness refers to whether the model produces responses that make sense in the dialog context (e.g., no common sense mistakes, no absurd responses, and no contradictions with earlier responses). Specificity is measured by judging whether the system's response is specific to the preceding dialog context, and not a generic response that could apply to most contexts (e.g., “ok” or “I don’t know”). Finally, Interestingness measures whether the model produces responses that are also insightful, unexpected or witty, and are therefore more likely to create better dialog.

Safety: We’re also making progress towards addressing important questions related to the development and deployment of Responsible AI. Our Safety metric is composed of an illustrative set of safety objectives that captures the behavior that the model should exhibit in a dialog. These objectives attempt to constrain the model’s output to avoid any unintended results that create risks of harm for the user, and to avoid reinforcing unfair bias. For example, these objectives train the model to avoid producing outputs that contain violent or gory content, promote slurs or hateful stereotypes towards groups of people, or contain profanity. Our research towards developing a practical Safety metric represents very early work, and there is still a great deal of progress for us to make in this area.

Groundedness: The current generation of language models often generates statements that seem plausible but actually contradict facts established in known external sources. This motivates our study of groundedness in LaMDA. Groundedness is defined as the percentage of responses with claims about the external world that can be supported by authoritative external sources, as a share of all responses containing claims about the external world. A related metric, Informativeness, is defined as the percentage of responses with information about the external world that can be supported by known sources, as a share of all responses. Therefore, casual responses that do not carry any real-world information (e.g., “That’s a great idea”) affect Informativeness but not Groundedness. While grounding LaMDA-generated responses in known sources does not in itself guarantee factual accuracy, it allows users or external systems to judge the validity of a response based on the reliability of its source.
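
To make these two definitions concrete, here is a minimal Python sketch (not from LaMDA itself; the RatedResponse structure is a hypothetical stand-in for rater annotations) that computes Groundedness and Informativeness:

from dataclasses import dataclass

@dataclass
class RatedResponse:
    has_external_claim: bool   # the response makes a claim about the external world
    supported_by_source: bool  # that claim is backed by an authoritative source

def groundedness(responses):
    """Supported claims as a share of responses that make external claims."""
    with_claims = [r for r in responses if r.has_external_claim]
    return sum(r.supported_by_source for r in with_claims) / len(with_claims) if with_claims else 0.0

def informativeness(responses):
    """Source-supported external information as a share of all responses."""
    return sum(r.has_external_claim and r.supported_by_source for r in responses) / len(responses) if responses else 0.0

# A casual reply such as "That's a great idea" carries no external claim, so it
# lowers Informativeness but is excluded from the Groundedness denominator.
ratings = [RatedResponse(True, True), RatedResponse(True, False), RatedResponse(False, False)]
print(groundedness(ratings), informativeness(ratings))  # 0.5 0.333...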

LaMDA Pre-Training
With the objectives and metrics defined, we describe LaMDA’s two-stage training: pre-training and fine-tuning. In the pre-training stage, we first created a dataset of 1.56T words — nearly 40 times more words than were used to train previous dialog models — from public dialog data and other public web documents. After tokenizing the dataset into 2.81T SentencePiece tokens, we pre-train the model using GSPMD to predict every next token in a sentence, given the previous tokens. The pre-trained LaMDA model has also been widely used for natural language processing research across Google, including program synthesis, zero-shot learning, and style transfer, as well as in the BIG-bench workshop.
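
As a rough illustration of the pre-training objective, next-token prediction can be sketched in a few lines of PyTorch. The toy model below is a hypothetical stand-in; LaMDA itself is a decoder-only Transformer trained with GSPMD at vastly larger scale:

import torch
import torch.nn as nn

vocab_size, d_model = 32000, 64   # toy sizes for illustration only

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.body = nn.GRU(d_model, d_model, batch_first=True)  # stand-in for the Transformer stack
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        hidden, _ = self.body(self.embed(tokens))
        return self.head(hidden)                # logits over the vocabulary

model = TinyLM()
tokens = torch.randint(0, vocab_size, (2, 16))  # stand-in for SentencePiece token ids
logits = model(tokens[:, :-1])                  # predict each token from its prefix
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
loss.backward()                                 # the model is trained to predict the next token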

LaMDA Fine-Tuning
In the fine-tuning stage, we train LaMDA to perform a mix of generative tasks to generate natural-language responses to given contexts, and classification tasks on whether a response is safe and high-quality, resulting in a single multi-task model that can do both. The LaMDA generator is trained to predict the next token on a dialog dataset restricted to back-and-forth dialog between two authors, while the LaMDA classifiers are trained to predict the Safety and Quality (SSI) ratings for the response in context using annotated data. During a dialog, the LaMDA generator first generates several candidate responses given the current multi-turn dialog context, and the LaMDA classifiers predict the SSI and Safety scores for every response candidate. Candidate responses with low Safety scores are first filtered out. Remaining candidates are re-ranked by their SSI scores, and the top result is selected as the response. We further filter the training data used for the generation task with LaMDA classifiers to increase the density of high-quality response candidates.
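
The serving-time behavior described above can be summarized in a short Python sketch. All names, thresholds, and the fallback reply are hypothetical; the point is only the order of operations: generate candidates, filter on Safety, then re-rank on SSI:

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Candidate:
    text: str
    safety: float  # classifier-predicted Safety score
    ssi: float     # combined Sensibleness/Specificity/Interestingness score

def respond(context: List[str],
            generate: Callable[[List[str], int], List[str]],  # generator: context -> candidate texts
            score: Callable[[List[str], str], Candidate],     # classifiers: context, text -> scores
            num_candidates: int = 16,
            safety_threshold: float = 0.8) -> str:
    candidates = [score(context, text) for text in generate(context, num_candidates)]
    safe = [c for c in candidates if c.safety >= safety_threshold]  # filter unsafe candidates first
    if not safe:
        return "I'm not sure how to respond to that."               # hypothetical fallback
    return max(safe, key=lambda c: c.ssi).text                      # re-rank by SSI, return the top result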

LaMDA generates and then scores a response candidate.
LaMDA handles arbitrary user input in a way that is sensible, specific, and interesting. Only LaMDA’s very first statement “Hello, I’m a friendly...” was hard-coded to set the purpose of the dialog.

Factual Grounding
While people are capable of checking their facts by using tools and referencing established knowledge bases, many language models draw their knowledge from their internal model parameters alone. To improve the groundedness of LaMDA’s original responses, we collect a dataset of dialogs between people and LaMDA, annotated with information retrieval queries and the retrieved results where applicable. We then fine-tune LaMDA’s generator and classifier on this dataset so that the model learns to call an external information retrieval system during its interaction with the user, improving the groundedness of its responses. While this is very early work, we’re seeing promising results.
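
A hypothetical sketch of that interaction loop is below. The query marker, function names, and round limit are illustrative assumptions rather than LaMDA internals; the idea is simply that the generator can emit a retrieval query instead of a final reply, and the retrieved results are appended to the context before it tries again:

from typing import Callable, List

QUERY_PREFIX = "SEARCH:"  # illustrative convention for a model-issued retrieval query

def grounded_reply(context: List[str],
                   generate: Callable[[List[str]], str],  # fine-tuned generator
                   retrieve: Callable[[str], List[str]],  # external information retrieval system
                   max_rounds: int = 3) -> str:
    turns = list(context)
    for _ in range(max_rounds):
        draft = generate(turns)
        if not draft.startswith(QUERY_PREFIX):            # no query requested: the draft is the reply
            return draft
        query = draft[len(QUERY_PREFIX):].strip()
        results = retrieve(query)                         # e.g., snippets with their source URLs
        turns.append(f"Results for '{query}': " + " | ".join(results))
    return generate(turns)                                # stop querying after max_rounds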

Zero-shot domain adaptation: a cherry-picked, but real, example of LaMDA pretending to be Mount Everest, by simply setting its initial message to be “Hi, I’m Mount Everest. What would you like to know about me?” Everest LaMDA is shown providing educational and factually correct responses.

Evaluation
In order to quantify progress against our key metrics, we collect responses from the pre-trained model, fine-tuned model, and human raters (i.e., human-generated responses) to multi-turn two-author dialogs, and then ask a different set of human raters a series of questions to evaluate these responses against the Quality, Safety, and Groundedness metrics.

We observe that LaMDA significantly outperforms the pre-trained model in every dimension and across all model sizes. Quality metrics (Sensibleness, Specificity, and Interestingness, in the first column below) generally improve with the number of model parameters, with or without fine-tuning. Safety does not seem to benefit from model scaling alone, but it does improve with fine-tuning. Groundedness improves as model size increases, perhaps because larger models have a greater capacity to memorize uncommon knowledge, but fine-tuning allows the model to access external knowledge sources and effectively shift some of the load of remembering knowledge to an external knowledge source. With fine-tuning, the quality gap to human levels can be narrowed, though the model’s performance remains below human levels in safety and groundedness.

Comparing the pre-trained model (PT), fine-tuned model (LaMDA) and human-rater-generated dialogs (Human) across Sensibleness, Specificity, Interestingness, Safety, Groundedness, and Informativeness. The test sets used to measure Safety and Groundedness were designed to be especially difficult.

Future Research & Challenges
LaMDA’s level of Sensibleness, Specificity and Interestingness unlocks new avenues for understanding the benefits and risks of open-ended dialog agents. It also presents encouraging evidence that key challenges with neural language models, such as safety and groundedness, can be improved with larger models and fine-tuning on more well-labeled data. However, this is very early work, and there are significant limitations. Exploring new ways to improve our Safety metric and LaMDA’s groundedness, aligned with our AI Principles, will continue to be our main areas of focus going forward.

Acknowledgements
We'd like to thank everyone for contributing to the project and paper, including: Blaise Aguera-Arcas, Javier Alberca, Thushan Amarasiriwardena, Lora Aroyo, Martin Baeuml, Leslie Baker, Rachel Bernstein, Taylor Bos, Maarten Bosma, Jonas Bragagnolo, Alena Butryna, Bill Byrne, Chung-Ching Chang, Zhifeng Chen, Dehao Chen, Heng-Tze Cheng, Ed Chi, Aaron Cohen, Eli Collins, Marian Croak, Claire Cui, Andrew Dai, Dipanjan Das, Daniel De Freitas, Jeff Dean, Rajat Dewan, Mark Diaz, Tulsee Doshi, Yu Du, Toju Duke, Doug Eck, Joe Fenton, Noah Fiedel, Christian Frueh, Harish Ganapathy, Saravanan Ganesh, Amin Ghafouri, Zoubin Ghahramani, Kourosh Gharachorloo, Jamie Hall, Erin Hoffman-John, Sissie Hsiao, Yanping Huang, Ben Hutchinson, Daphne Ippolito, Alicia Jin, Thomas Jurdi, Ashwin Kakarla, Nand Kishore, Maxim Krikun, Karthik Krishnamoorthi, Igor Krivokon, Apoorv Kulshreshtha, Ray Kurzweil, Viktoriya Kuzmina, Vivek Kwatra, Matthew Lamm, Quoc Le, Max Lee, Katherine Lee, Hongrae Lee, Josh Lee, Dmitry Lepikhin, YaGuang Li, Yifeng Lu, David Luan, Daphne Luong, Laichee Man, Jianchang (JC) Mao, Yossi Matias, Kathleen Meier-Hellstern, Marcelo Menegali, Muqthar Mohammad, Alejandra Molina, Erica Moreira, Meredith Ringel Morris, Maysam Moussalem, Jiaqi Mu, Tyler Mullen, Eric Ni, Kristen Olson, Alexander Passos, Fernando Pereira, Slav Petrov, Marc Pickett, Roberto Pieraccini, Christian Plagemann, Sahitya Potluri, Vinodkumar Prabhakaran, Andy Pratt, James Qin, Ravi Rajakumar, Adam Roberts, Will Rusch, Renelito Delos Santos, Noam Shazeer, RJ Skerry-Ryan, Grigori Somin, Johnny Soraker, Pranesh Srinivasan, Amarnag Subramanya, Mustafa Suleyman, Romal Thoppilan, Song Wang, Sheng Wang, Chris Wassman, Yuanzhong Xu, Ni Yan, Ben Zevenbergen, Vincent Zhao, Huaixiu Steven Zheng, Denny Zhou, Hao Zhou, Yanqi Zhou, and more.

Source: Google AI Blog


AG Paxton’s false claims still don’t add up

Today we’re filing a motion asking the court to dismiss Texas Attorney General Ken Paxton’s antitrust lawsuit over our advertising technology (“ad tech”) business.

This lawsuit has now been rewritten three times. With each version, AG Paxton follows the same pattern: make inaccurate and inflammatory allegations, publicize them widely, and repeat. This playbook may generate attention, but it doesn’t make for a credible antitrust lawsuit.

AG Paxton's allegations are more heat than light, and we don't believe they meet the legal standard to send this case to trial. The complaint misrepresents our business, products and motives, and we are moving to dismiss it based on its failure to offer plausible antitrust claims.

Why this lawsuit misses the law and the facts

At its heart, AG Paxton is asking the court to force us to share user data and design our products in a way that helps our competitors rather than our customers or consumers. But American antitrust law doesn’t require companies to give information to, or design products specifically for, rivals. This lawsuit fails to acknowledge that ad tech is a highly dynamic industry with countless competitors. It’s been recognized that competition in ad tech has reduced fees, encouraged new entry, increased investment, and expanded options for advertisers and publishers alike.

Correcting AG Paxton’s false and misleading allegations

AG Paxton overlooks, or misstates, a litany of clear facts. We want to publicly and unequivocally refute the more egregious allegations:

  • We don’t force “tying”: A central allegation in AG Paxton’s lawsuit is that publishers are forced to use our ad server in order to access our ad exchange. This allegation is simply wrong, and AG Paxton offers no evidence to prove otherwise. If a publisher wants to use our ad exchange with a different ad server, they are free to do so.
  • Header bidding is thriving: A core claim is that we prevented rivals from using a technology, header bidding, through our Open Bidding program. But again, the facts don’t support that. Since we launched Open Bidding, header bidding’s popularity has continued to grow. Recent surveys show a vast majority of publishers currently use header bidding. We simply haven’t held header bidding back.
  • Our auctions are fair: The complaint uses deliberately inflammatory rhetoric to accuse us of a litany of wrongdoings: “misleading” publishers, “rigging” auctions through special data access, running “third price auctions,” “pocketing the difference.” But quite simply, we have – provably – done none of these things. AG Paxton is distorting various optimizations that we have created to improve publisher yields and returns for advertisers. To be clear, contrary to his claims, these optimizations did not and do not result in Google “pocketing” additional revenue share, and they do not make auctions unfair. And our auction was a second-price auction until 2019, when it became a first-price auction.
  • Out-of-date claims: More broadly, much of AG Paxton's lawsuit is based on outdated information that bears little relation to our current products or business in this dynamic industry (and in any event never amounted to a violation of antitrust laws).

Facebook Audience Network’s participation in Open Bidding

The allegation that has generated the most attention is that we somehow “colluded” with Facebook Audience Network (FAN) through our Open Bidding agreement. That’s simply not true.

To set the record straight, we are today including the full text of our agreement with FAN in our motion to the court. Here are some facts that contradict AG Paxton's claims:

  • This is far from a secret deal: We announced FAN’s participation as one of over 25 partners in our Open Bidding program, all of whom have signed their own agreements to participate.
  • This is a procompetitive agreement: FAN’s participation benefits advertisers because it gives them additional ways to reach their desired audiences. And it benefits publishers because it introduces additional bidders to compete for their ad space, earning them higher returns. In fact, if FAN weren’t a part of Open Bidding, AG Paxton might well have claimed we were preventing a rival from accessing our products and depriving publishers of additional revenue.
  • FAN’s involvement is not exclusive: The agreement doesn’t prevent FAN from participating in header bidding or other competing auctions. In fact, FAN participates in several similar auctions on rival platforms. The agreement also doesn’t prevent FAN from building a competing product. Our agreement explicitly states that FAN’s participation is not exclusive (and nowhere in our agreement is header bidding even mentioned). And the entire Open Bidding program (of which Facebook is one of 25 participants) accounts for a small fraction of the display ads we place.
  • We do not manipulate the auction: Finally, this agreement does not provide FAN with an advantage in the Open Bidding auction. FAN competes in the auction just like other bidders: FAN must make the highest bid to win a given impression, period. If another eligible network or exchange bids higher, they win the auction. We don’t allocate ad space to FAN, they don’t receive speed advantages, and we don’t guarantee that they win any auctions.

Our advertising technology helps fund digital content that benefits everyone, and it supports thousands of businesses, from small advertisers to major publishers. Our work in this space is designed to balance and support the needs of publishers, advertisers and consumers.

We’re confident that this case is wrong on the facts and the law, and should be dismissed. However, if it does move forward, we’ll continue to vigorously defend ourselves.

Introducing Ephemeral Containers

Ephemeral containers in Kubernetes started with a simple question: is it feasible to run a service on Kubernetes without bundling a Linux distribution userland with every binary?

It was early 2016. Kubernetes had just released version 1.2, and my SRE team was evaluating using Google Kubernetes Engine for internal workloads. Docker and Kubernetes examples always seemed to build images on top of Linux distributions like Debian or CentOS, but our build system produced a binary with its minimum set of library dependencies, so that's what I wanted to deploy as a container image.

This minimal container image worked fine, but only if I never made a mistake. Since the container image had no shell to use with kubectl exec, I had to log into the node with administrator privileges to interactively troubleshoot any problems. This produced an unfortunate debugging experience and was unacceptable from a security perspective.

What's more, kubectl exec had changed little from docker exec even though Kubernetes introduced new abstractions such as a Pod, where multiple containers share resources. How should Kubernetes native troubleshooting work?

Debugging on Borg

Providing userspace utilities for cluster applications wasn't a new problem for Google. Google's existing cluster orchestration system, Borg, provides a common userland for processes. Rather than including system and debugging utilities with the application binary, Borg provides a basic set of userland utilities that applications can expect in their runtime environment. Another team maintains and updates these utilities independent from application binaries.

There are downsides to this approach: Updates to the common utilities can take weeks or months to roll out, application owners can't specify which utilities they need, and the utilities needed at run time may be completely different from the ones needed at debug time. We could do better for Kubernetes.

Extensibility for Kubernetes

I wanted a solution that felt native for Kubernetes and gave users the freedom to customize to their use case, but I was still new to Kubernetes. I reached out to SIG Node and discovered a welcoming, helpful open source community.

Together we considered different ways of deploying tools to a Pod at debugging time. Implementing the feature entirely on the client side would be easiest, but solutions such as copying binaries into the running container image didn't make debugging feel like a feature of the platform. Kubernetes deploys binaries using containers, so it's natural to use containers for troubleshooting as well.

Existing container types were tied to the Pod lifecycle, though. Containers and Init Containers run when a Pod starts, and neither may be added after a Pod is created. For administrative actions we needed a lifecycle more like kubectl exec. We needed a new type of container: the Ephemeral Container.

What are Ephemeral Containers?

Ephemeral containers are a new type of container that is part of the Kubernetes core API. An Ephemeral Container may be added to an existing Pod for administrative actions like debugging; it runs until it exits, and it won't be restarted. An ephemeral container runs within the Pod's existing resource allocation and shares common container namespaces.

How are Ephemeral Containers used?

Here are some debugging scenarios that are made easier using ephemeral containers.

Troubleshooting Clusters

I run a service named "apples" that consists of a Go binary running in a distroless container image. One of its pods is suddenly having trouble connecting to a backend service, but since it's a distroless image I can't use kubectl exec to troubleshoot:

% kubectl exec -it apples-57bcf49487-ddmpn -- sh
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "sh": executable file not found in $PATH: unknown

We can use kubectl debug to add an ephemeral container and test the backend service:

% kubectl debug -it --image=busybox apples-57bcf49487-ddmpn -- sh
Defaulting debug container name to debugger-5wvgc.
/ # ps ax
PID   USER     TIME  COMMAND
    1 65535     0:00 /pause
    7 root      0:00 /app
   19 root      0:00 sh
   26 root      0:00 ps ax
/ # wget -S -O - http://bananas:8080
Connecting to bananas:8080 (10.0.0.237:8080)
  HTTP/1.1 500 Internal Server Error
wget: server returned error: HTTP/1.1 500 Internal Server Error

Technical Support

To make this easier to detect next time, I'll add this check to my operations team's autodiagnose script. The ops team doesn't have access to attach to production pods, but they have access to run the autodiagnose image, which attaches its logs to a bug report:

% kubectl debug --image=gcr.io/apples/autodiagnose apples-57bcf49487-ddmpn -- --bug=1234

What's Next for Ephemeral Containers?

Ephemeral Containers are available as a beta feature in Kubernetes 1.23, but we still have plenty of work to do to polish the rough edges and improve kubectl debug to support more debugging journeys, such as configuring the container security context to allow attaching a debugger.

Try out Ephemeral Containers and let us know in the Ephemeral Containers and kubectl debug enhancement issues how they work for you.

Contributor Experience

Working with the Kubernetes community has been incredibly rewarding. When I started I didn't know nearly enough to contribute something like this, but I discovered a community that works hard to welcome contributions at all levels.

I want to thank the community, and especially Dawn Chen, Yu-Ju Hong, Jordan Liggitt, Clayton Coleman, Maciej Szulik, and Tim Hockin, for providing the support and guidance that made this feature possible.

Kubernetes will welcome your contribution as well! See kubernetes.dev for how to get started.


By Lee Verberne, Site Reliability Engineer – Google Cloud Platform

Speaking our language: The importance of a Mandarin Pixel 6 ad with Simu Liu