How anonymized data helps fight against disease

Data has always been a vital tool in understanding and fighting disease, from Florence Nightingale’s hand-drawn illustrations in the 1800s, which showed how poor sanitation contributed to preventable diseases, to the first open source repository of data developed in response to the 2014 Ebola crisis in West Africa. When the first cases of COVID-19 were reported in Wuhan, data again became one of the most critical tools to combat the pandemic.

A group of researchers who documented the initial outbreak quickly joined forces and started collecting data that could help epidemiologists around the world model the trajectory of the novel coronavirus outbreak. The researchers came from the University of Oxford, Tsinghua University, Northeastern University and Boston Children’s Hospital, among others.

However, their initial workflow was not designed for the exponential rise in cases, so the researchers sought outside help. As part of Google’s $100 million contribution to COVID relief, the project received $1.25 million in funding and a team of 10 full-time Fellows and 7 part-time Google volunteers.

Google volunteers worked with the researchers to create a scalable, open-access platform that pulls together millions of anonymized COVID-19 cases from over 100 countries. This platform helps epidemiologists around the world model the trajectory of COVID-19 and track both its variants and future infectious diseases.

The need for trusted and anonymized case data

When an outbreak occurs, timely access to organized, trustworthy and anonymized data is critical for public health leaders to inform early policy decisions, medical interventions, and allocations of resources, all of which can slow disease spread and save lives. The insights derived from “line-list” data (i.e. anonymized case-level information), as opposed to aggregated data such as case counts, are essential for epidemiologists to perform more detailed statistical analyses and model the effectiveness of interventions.
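To make the difference between line-list and aggregated data concrete, here is a minimal sketch in Python. The `CaseRecord` fields and the three sample records are made up for illustration; they are assumptions, not the platform's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """One anonymized case. Field names are hypothetical."""
    country: str
    age_range: str
    confirmation_date: str  # ISO 8601 date


# Invented sample records, for illustration only.
LINE_LIST = [
    CaseRecord("BR", "30-39", "2020-03-01"),
    CaseRecord("BR", "60-69", "2020-03-02"),
    CaseRecord("KE", "30-39", "2020-03-02"),
]


def cases_per_country(records):
    """Aggregated view: only a count per country survives."""
    return dict(Counter(r.country for r in records))


def cases_by_age_range(records, country):
    """Finer breakdown that needs case-level rows to compute."""
    return dict(Counter(r.age_range for r in records if r.country == country))
```

The aggregated view can always be derived from the line list, but not the other way around: once only counts per country are published, a breakdown such as `cases_by_age_range` can no longer be recovered.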

Volunteers at the University of Oxford started manually curating this data, but it was spread over hundreds of websites, in dozens of formats, in multiple languages. The HealthMap team at Boston Children’s Hospital also identified early reports of COVID-19 through automated indexing of news sites and official sources. These two teams joined forces, shared the data, and published peer-reviewed findings to create a trusted resource for the global community.

Enter the Fellowship

To help the global community of researchers in this meaningful endeavour, the project received the support of 10 Fellows, who spent 6 months working on it full-time, in addition to $1.25M in grant funding. Working hand in hand with the University of Oxford and Boston Children’s Hospital, the team spoke to researchers and public health officials working on the frontline to understand the real-life challenges they faced when finding and using high-quality, trusted data, a tedious and manual process that often takes hours.

Upholding data privacy is key to the platform’s design. The anonymized data used on the platform comes from open-access authoritative public health sources, and a panel of data experts rigorously checks it to make sure it meets strict anonymity requirements. The Fellows helped the team design the data ingestion flow so that it implements best practices for data verification and quality checks, ensuring that no personal data makes its way into the platform. (All line-list data added to the platform is stored and hosted in Boston Children’s Hospital’s secure data infrastructure, not Google’s.)
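One common way to keep personal data out of an ingestion flow is an allowlist check: a record is accepted only if every field it carries is explicitly permitted. The sketch below illustrates that idea in Python. It is an assumption about how such a check could look, not a description of the platform's actual pipeline, and the field names and function names are hypothetical.

```python
# Hypothetical field names: the only fields a record may carry.
ALLOWED_FIELDS = {"country", "age_range", "confirmation_date", "source_url"}
# Hypothetical minimum needed to trace a record back to its source.
REQUIRED_FIELDS = {"country", "confirmation_date", "source_url"}


def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    unexpected = set(record) - ALLOWED_FIELDS
    if unexpected:
        # Anything outside the allowlist (e.g. a name) blocks the record.
        problems.append(f"unexpected fields: {sorted(unexpected)}")
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        problems.append(f"missing fields: {sorted(missing)}")
    return problems


def ingest(records):
    """Split records into accepted and rejected lists."""
    accepted, rejected = [], []
    for record in records:
        (rejected if validate_record(record) else accepted).append(record)
    return accepted, rejected
```

An allowlist fails closed: a new, unreviewed field is rejected by default rather than silently stored. A check like this would complement, not replace, the expert review described above.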

Looking to the future

With support from funders including The Rockefeller Foundation, the project has grown into an international consortium of researchers at leading universities curating the most comprehensive line-list COVID-19 database in the world. It includes millions of anonymized records from trusted sources spanning over 100 countries, including India.

Today, the platform helps researchers across the globe access data in a matter of minutes and a few clicks. The flexibility of the platform means that it can be adapted to any infectious disease data and local context as new outbreaks occur. It lays a foundation for researchers and public health officials to access this data no matter their location, be it New York, São Paulo, Munich, Kyoto or Nairobi.

Posted by Stephen Ratcliffe, Fellow and the team