How we’re helping developers with differential privacy

Posted by Miguel Guevara, Product Manager, Privacy and Data Protection Office

At Google, we believe that innovation and privacy must go hand in hand. Earlier this month, we shared our work to keep people safe online, including our investments in leading privacy technologies such as differential privacy. Today, on Data Privacy Day, we want to share some updates on new ways we're applying differential privacy in our own products and making it more accessible to developers and businesses globally, providing them with greater access to data and insights while keeping people's personal information private and secure.

Strengthening our core products with differential privacy

We first deployed our world-class differential privacy anonymization technology in Chrome nearly seven years ago and are continually expanding its use across our products, including Google Maps and the Assistant. And as the world combats COVID-19, last year we published our COVID-19 Community Mobility Reports, which use differential privacy to help public health officials, economists and policymakers globally as they make critical decisions for their communities, while ensuring no personally identifiable information is made available at any point.

This year in the Google Play Console, we'll provide new app metrics and benchmarks to developers in a differentially private manner. When launched, developers will be able to easily access metrics related to how successfully their apps are engaging users, such as Daily Active Users and Revenue per Active User, in a manner that helps ensure individual users cannot be identified or re-identified. By adding differential privacy to these new app metrics, we'll provide meaningful insights that help developers improve their apps without compromising user privacy or developer confidentiality. Moving forward, we plan to expand the number of metrics we provide to developers using differential privacy.
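
To give a sense of how this works, here is a minimal sketch of the core idea, the Laplace mechanism, applied to a daily-active-users count. The function below is a hypothetical NumPy illustration, not the actual Play Console pipeline, and it assumes each user contributes to the count at most once per day.

```python
import numpy as np

def noisy_daily_active_users(true_count: int, epsilon: float = 1.0) -> int:
    """Return a differentially private daily-active-users count.

    Illustrative sketch only (not the Play Console implementation).
    Assumes each user contributes at most once per day, so the
    sensitivity of the count is 1.
    """
    sensitivity = 1.0
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return max(0, round(true_count + noise))

# Example: report a noisy version of a true count of 12,340 users.
print(noisy_daily_active_users(12_340, epsilon=0.5))
```

A smaller epsilon adds more noise and therefore more protection; the reported figure stays useful in aggregate while any single user's presence or absence is masked.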

As we did last year, we'll continue to make our existing differential privacy library even easier for developers to use. For example, this month we're open-sourcing a new differentially private SQL query language extension that is used in thousands of queries run every day at Google. These queries help our analysts obtain business insights and observe product trends. This is a step forward in democratizing privacy-safe data analysis, empowering data scientists around the world to uncover powerful insights while protecting and respecting the privacy of individuals.
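
Conceptually, a differentially private aggregation of this kind bounds how much any one user can contribute to a query result and then adds calibrated noise. The sketch below is a simplified pure-Python illustration of that idea, not the open-sourced query engine itself; the function name and bounds are hypothetical.

```python
import numpy as np
from collections import defaultdict

def dp_sum(rows, epsilon=1.0, lower=0.0, upper=10.0):
    """Differentially private sum over (user_id, value) rows.

    Simplified illustration: clamp each user's total contribution to
    [lower, upper], then add Laplace noise scaled to the maximum
    influence a single user can have on the result.
    """
    per_user = defaultdict(float)
    for user_id, value in rows:
        per_user[user_id] += value

    clamped_total = sum(min(max(v, lower), upper) for v in per_user.values())
    sensitivity = max(abs(lower), abs(upper))  # one user's maximum influence
    noise = np.random.laplace(0.0, sensitivity / epsilon)
    return clamped_total + noise

rows = [("u1", 3.0), ("u1", 4.5), ("u2", 2.0), ("u3", 12.0)]
print(dp_sum(rows, epsilon=0.5))
```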

Partnering with OpenMined to make differential privacy more widely accessible

As we continue to make advancements with privacy-preserving technologies in our own products, it's also important to us that developers have access to this technology. That's why in 2019 we open-sourced our differential privacy library and made it freely accessible, easy to deploy and useful to developers globally. Since then, hundreds of developers, researchers and institutions have incorporated Google's differential privacy algorithms into their work, enabling them to tackle new problems while using data in a responsible and privacy-protective way. One of these companies is the French healthcare startup Arkhn. For Arkhn, differential privacy makes it possible to pursue its mission to revolutionize the healthcare industry with artificial intelligence, enabling it to gather, query and analyze cross-department hospital data in a secure and safe way.

To help bring our world-class differential privacy library to more developer teams, like the one at Arkhn, today we're excited to announce a new partnership with OpenMined, a group of open-source developers focused on expanding the use of privacy-preserving technologies around the world. Together with OpenMined, we will develop a version of our differential privacy library specifically for Python developers. By replicating Google's differentially private infrastructure in Python, this library will give developers a new and unique way to treat their data with world-class privacy.
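
To illustrate the kind of interface such a Python library could expose, here is a toy stand-in for a bounded-mean aggregation. The class below is purely hypothetical and self-contained; the real Python bindings wrap Google's C++ differential privacy library and will differ in their API and numerical details.

```python
import numpy as np

class BoundedMean:
    """Toy stand-in for a differentially private bounded-mean aggregation.

    Hypothetical sketch of the interface only; the real Python bindings
    built with OpenMined wrap Google's C++ library and differ in detail.
    """

    def __init__(self, epsilon: float, lower: float, upper: float):
        self.epsilon = epsilon
        self.lower = lower
        self.upper = upper

    def quick_result(self, values) -> float:
        clamped = np.clip(values, self.lower, self.upper)
        # Changing one record moves the mean by at most (upper - lower) / n.
        sensitivity = (self.upper - self.lower) / max(len(clamped), 1)
        noise = np.random.laplace(0.0, sensitivity / self.epsilon)
        return float(clamped.mean() + noise)

ages = [21, 34, 29, 41, 37, 52, 45, 30]
print(BoundedMean(epsilon=1.0, lower=18, upper=80).quick_result(ages))
```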

A collaborative approach to improving the state of privacy in machine learning

Two years ago, we introduced TensorFlow Privacy (GitHub), an open-source library that makes it easier not only for developers to train machine-learning models with privacy, but also for researchers to advance the state of the art in machine learning with strong privacy guarantees. In the past year, we've expanded the library to include support for TensorFlow 2, as well as both the Keras Model interface and TensorFlow's premade estimators. Thanks to a collaboration with researchers from the University of Waterloo, we've also improved performance, with our new release making training on common workloads four times faster or more.
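
As a rough sketch of what training with TensorFlow Privacy looks like, the snippet below swaps a standard Keras optimizer for the library's DP-SGD optimizer, which clips per-example gradients and adds noise before they are averaged. The model architecture and hyperparameter values are illustrative, and the import path reflects the library's Keras tutorials at the time of writing.

```python
import tensorflow as tf
from tensorflow_privacy.privacy.optimizers.dp_optimizer_keras import (
    DPKerasSGDOptimizer,
)

# A small Keras model; the architecture is purely illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(2),
])

optimizer = DPKerasSGDOptimizer(
    l2_norm_clip=1.0,      # clip each per-example gradient to this L2 norm
    noise_multiplier=1.1,  # Gaussian noise added relative to the clip norm
    num_microbatches=32,   # must evenly divide the training batch size
    learning_rate=0.15,
)

# Per-example (unreduced) loss is needed so gradients can be clipped
# individually before noise is added.
loss = tf.keras.losses.CategoricalCrossentropy(
    from_logits=True, reduction=tf.losses.Reduction.NONE)

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
```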

We also recognize that training with privacy can be expensive, or not feasible at all, so we set out to understand how private machine-learning models actually are. Last year we open-sourced our attack library to help address this and to give anyone using it a broader picture of their models' privacy. Since then, we've partnered with researchers at Princeton University and the National University of Singapore, who have added new features that expand the library's scope to test generative models and non-neural-network models. Recently, researchers at Stanford Medical School tried it on some of their models to test for memorization. This testing helped them understand the privacy behavior of their models, something that wasn't possible before.
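
The core idea behind this kind of membership-inference testing is simple: a model that memorizes its training data tends to assign lower loss to training examples than to unseen ones. The following is a conceptual sketch of a loss-threshold test written from scratch, not the attack library's actual API.

```python
import numpy as np

def loss_threshold_attack_auc(train_losses, test_losses) -> float:
    """AUC of a simple loss-threshold membership-inference attack.

    Conceptual sketch: returns the probability that a random training
    example has lower loss than a random held-out example. A score
    near 0.5 means the attack does no better than random guessing;
    scores well above 0.5 indicate potential memorization.
    """
    train_losses = np.asarray(train_losses)
    test_losses = np.asarray(test_losses)
    wins = (train_losses[:, None] < test_losses[None, :]).mean()
    ties = (train_losses[:, None] == test_losses[None, :]).mean()
    return float(wins + 0.5 * ties)

rng = np.random.default_rng(0)
train = rng.normal(0.3, 0.1, 1000)  # lower loss on training data
test = rng.normal(0.6, 0.2, 1000)
print(loss_threshold_attack_auc(train, test))  # well above 0.5 -> leakage
```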

We’ve also published new research studying the trade-offs between differential privacy and robustness, another property at the core of AI ethics, privacy and safety.

Our work continues as we invest in world-class privacy that provides algorithmic protections to the people who use our products, while nurturing and expanding a healthy open-source ecosystem. We strongly believe that everyone around the world deserves world-class privacy, and we'll continue partnering with organizations to fulfill that mission.