Tag Archives: Health

How I’m giving thanks (and staying safe) this Thanksgiving

I love Thanksgiving. It’s a time to be with those you love, eating fabulous food and sharing memories. In my family, my mother always made the holiday a time when we welcomed people into our home who had nowhere else to go that day. And then we’d take long afternoon walks after our big meal.  

With COVID-19 infections rising to record levels across the U.S., families are changing how they celebrate Thanksgiving this year. Like much else in 2020, we’ll need to develop new and creative traditions to replace the ones that put those we love at risk for COVID.

This year, please follow the tips from the Centers for Disease Control and Prevention, and avoid large family gatherings.

This disease is highly contagious and getting together physically with extended family is a real risk. Every event that brings people together creates yet another chance for transmission. I’m often asked, “Can’t I just get a COVID test and then see my family?” Unfortunately, the answer I give my friends and family is an unequivocal “No.” Tests are often negative early in the course of disease, which means you can test negative today but be highly infectious tomorrow. So even if you have a negative test, still practice these measures. The best way to show your love is to not have a big family gathering.

There are many ways to celebrate from a distance. You can video call friends and family from the Thanksgiving table. You could spend in-person time outdoors at a distance, wearing masks and avoiding shared dishes. I have even heard of some families getting creative and offering “curbside pickup” of their signature pumpkin pie, green bean casserole or oyster dressing for loved ones to pick up and enjoy in the safety of their own homes.


Tips for celebrating Thanksgiving safely

The Centers for Disease Control and Prevention have shared some tips on how you could celebrate Thanksgiving this year and limit the spread of COVID-19:

  • Wear a mask

  • Rethink traveling

  • Keep gatherings small

  • Celebrate virtually if you can


This year, my immediate family is planning a small meal with just our household followed by a brief, outdoor visit with our grandmother. We will also have a virtual Friendsgiving with friends across the country, which is actually allowing us to share memories with more people than we usually do. I will miss the meals, hugs and in-person laughter, but am willing to sacrifice that for this one year so we can have many more memories together in years to come.   

Though this has been a difficult year for so many around the world, I find I have much to be grateful for this holiday. I am thankful for my medical colleagues—the doctors, nurses, respiratory techs and other responders who are going to work on Thanksgiving to care for COVID-19 patients. I am thankful for my public health colleagues who have worked tirelessly for nearly a year to keep us safe, as they do even when pandemics aren’t raging.

I am thankful for the many unsung first responders working to see that we have safe water to drink, food to eat and electricity to light and heat our homes. I am particularly thankful for the committed scientists who have advanced sound research so we have efficacious and safe treatments, and yes, COVID vaccines in sight. They are giving us so much optimism about the potential for robust countermeasures to bring this pandemic to an end.

And I am thankful for everyone who is putting the public’s health as a priority, and doing all they can to not be a link in the chain of COVID transmission. I know everyone is weary and wants to go back to normal, or at least a new normal. But I encourage everyone to be patient and dig deep inside for the stamina to carry us through these next few months. Now is not the time to let up—it is a time to double down. If scientific progress continues, then by this time next year we might be able to have family gatherings with those we love.

This Thanksgiving, I see staying home as the ultimate form of giving thanks and showing love to your family. So I hope you will join me in following the tips from the Centers for Disease Control and Prevention. They are what I am recommending to family and friends, what I would recommend to my patients and what I am asking of our community. This year, let’s give thanks. Not COVID.

A Q&A on coronavirus vaccines

Since the outbreak of the coronavirus pandemic, Dr. Karen DeSalvo, Google Health’s chief health officer, has been a trusted source for learning about its impact and implications. She's advised Google teams on everything from how to respond to the pandemic in our own workplaces, to how we can build products and features that help everyone navigate COVID-19, such as the COVID-19 layer in Maps. Recently, we shared an update on how we’re doing just that, as well as helping businesses around the world get back up and running.

With lots of discussion worldwide about COVID-19 vaccines, today we published for our employees an interview with Dr. Karen about this topic. We’re sharing a version of that interview more broadly in case it’s helpful or informative for others to read.

As the former director of the United States’ national vaccine program, Dr. Karen is intimately familiar with the subject of vaccines. In this interview, she tells us more about what happens in a vaccine trial, when we can expect to have access to one and what it takes to vaccinate the entire world’s population in record time—a feat the human race has never before undertaken.

Let’s start with the basics. How are vaccines created? 

It’s a rigorous scientific process. It typically involves starting with a concept in animal models to understand if we can identify proteins on an infectious agent, and then stimulating the body’s immune system to create a response. Vaccines move through a series of defined phases to test their safety and efficacy in humans. These trials are very large and involve thousands of subjects, and the results lead to a regulatory process that differs from country to country. Then comes the approval process, and then they’re manufactured and deployed.

It sounds like it could take years for all of that to happen. 

For COVID, some of these steps are happening in parallel rather than serially. We’re already manufacturing vaccines that have not yet finished their clinical trials. If they don’t meet the bar for safety and efficacy, they will be disposed of. Deployment of the first generation of approved vaccines will have some challenges. They will require special cold storage at all times, including in transit and warehousing, at -73 degrees Fahrenheit. This may mean they will only be available at specialized centers that have that kind of freezer system. But over time, it’s expected that they will become easier to deploy and administer.

For those of us who haven’t been following every detail in the news, when can we expect to have a COVID-19 vaccine available?

Based upon the pace of science, we’re anticipating that in the U.S. there will be an approved vaccine this winter, and very near that for other parts of the world. More than 200 vaccines are in development, and more than 40 are in human trials. There are two leading candidates in the U.S.—one of them made by Pfizer, and one by Moderna. Pfizer just released some preliminary data this week; they will still need to go through the formal scientific and regulatory review with final results. Other vaccines people should be paying attention to are the AstraZeneca/Oxford vaccine, based in the U.K., and one made by Johnson & Johnson. But there is a lot of exciting science in this area, and the New York Times keeps a great tracker.

You’ve said before that once a vaccine is available, though, it will not be like flipping a switch. 

It will take years to get the world vaccinated. This has never been done before at the pace we are attempting. There will be different “generations” and types of COVID vaccines as the science evolves. They all come with their own special characteristics and may target special populations. Those which come out early will likely require two doses, and it will take six weeks until you build sufficient immunity. Another important point: The conventional wisdom is that more advanced vaccines are expected to reduce symptoms and spread, but not fully prevent or eliminate disease. The vaccines in the current pipeline are designed to prevent disease rather than prevent infection; it’s more like the influenza vaccine—you might still get it, but it will be a less serious case. This means that in reality, we will all have to integrate vaccines as another layer into our public health hygiene, like masking and social distancing.

Who’s participating in vaccine trials right now? 

People around the world have been enthusiastically signing up to participate. My husband is one of them! When he got a call from our local health care system, he marched himself over there and enrolled in the trial. He’s an ER doctor, and he’ll want to get vaccinated because of his ongoing exposure to COVID patients. The trials are randomized, controlled and double-blind: when he got his shot, the nurse turned her head so she couldn’t see what it looked like, and he couldn’t look, either. You sign up through a website, and if you’re eligible, you get a call. Generally, people have to be 18 or older to participate as a volunteer, but the studies are looking for volunteers of all backgrounds and identities.

It wasn’t too long ago that we learned that a late-stage clinical trial for a vaccine was paused due to an “unexplained illness” in a volunteer. Can you tell us what that means? 

When there is any kind of abnormal event, the trial Data Safety and Monitoring Board gets a chance to pause and make sure it isn’t a consequence of the drug. So, a pause like that one means good news; it shows the scientific process is working. There have now been two phase-three clinical trials that have been paused due to a potential event. Both have resumed. Don’t be surprised if it happens again. But there’s nothing so far that indicates there’s a problem with these vaccines. People enrolled in trials will still have the normal life course of health events. I know firsthand that the scientists who work on this are extraordinarily ethical, highly capable and really hard workers.

Lastly, the question that’s on everyone’s mind: Does any of this give us a clearer sense of when we might be able to get these vaccines ourselves? 

If everything continues to roll out the way we think it will, the general population would begin having access to a COVID vaccine by late spring or early summer 2021. That’s pending the manufacturing, that we have enough supplies like medical glass and dry ice, and that we’ve figured out how to manage the cold chain expectations. We should all be encouraged by the degree of global cooperation, including the focus on ensuring low- and middle-income countries and communities have access.

Releasing the Healthcare Text Annotation Guidelines

The Healthcare Text Annotation Guidelines are blueprints for capturing a structured representation of the medical knowledge stored in digital text. In order to automatically map the textual insights to structured knowledge, the annotations generated using these guidelines are fed into a machine learning algorithm that learns to systematically extract the medical knowledge in the text. We’re pleased to release to the public the Healthcare Text Annotation Guidelines as a standard.

Google Cloud recently launched AutoML Entity Extraction for Healthcare, a low-code tool used to build information extraction models for healthcare applications. There remains a significant execution roadblock on AutoML DIY initiatives caused by the complexity of translating the human cognitive process into machine-readable instructions. Today, this translation occurs thanks to human annotators who annotate text for relevant insights. Yet, training human annotators is a complex endeavor which requires knowledge across fields like linguistics and neuroscience, as well as a good understanding of the business domain. With AutoML, Google wanted to democratize who can build AI. The Healthcare Text Annotation Guidelines are a starting point for annotation projects deployed for healthcare applications.

The guidelines provide a reference for training annotators in addition to explicit blueprints for several healthcare annotation tasks. The annotation guidelines cover the following:
  • The task of medical entity extraction with examples from medical entity types like medications, procedures, and body vitals.
  • Additional tasks with defined examples, such as entity relation annotation and entity attribute annotation. For instance, the guidelines specify how to relate a medical procedure entity to the source medical condition entity, or how to capture the attributes of a medication entity like dosage, frequency, and route of administration.
  • Guidance for annotating an entity’s contextual information like temporal assessment (e.g., current, family history, clinical history), certainty assessment (e.g., unlikely, somewhat likely, likely), and subject (e.g., patient, family member, other).
Google consulted with industry experts and academic institutions in the process of assembling the Healthcare Text Annotation Guidelines. We took inspiration from other open source and research projects like i2b2 and added context to the guidelines to support information extraction needs for industry applications like Healthcare Effectiveness Data and Information Set (HEDIS) quality reporting. The data types contained in the Healthcare Text Annotation Guidelines are a common denominator across information extraction applications. Each industry application can have additional information extraction needs that are not captured in the current version of the guidelines. We chose to open source this asset so the community can tailor this project to their needs.
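
To make these annotation tasks concrete, here is a minimal, hedged sketch of what an annotation produced under such guidelines might look like, expressed as a Python structure. The field names and label values are illustrative assumptions, not the exact schema defined in the guidelines.

```python
# Illustrative only: the field names and label values below are hypothetical,
# not the exact schema defined in the Healthcare Text Annotation Guidelines.
note = "Patient was started on metformin 500 mg twice daily for type 2 diabetes."

def span(mention):
    """Return [start, end) character offsets of a mention within the note."""
    start = note.index(mention)
    return [start, start + len(mention)]

annotation = {
    "text": note,
    "entities": [
        {"id": "e1", "type": "MEDICATION", "span": span("metformin")},
        {"id": "e2", "type": "MEDICAL_CONDITION", "span": span("type 2 diabetes")},
    ],
    "attributes": [
        # Attributes of the medication entity: dosage and frequency.
        {"entity": "e1", "name": "dosage", "value": "500 mg"},
        {"entity": "e1", "name": "frequency", "value": "twice daily"},
    ],
    "relations": [
        # Relate the medication entity to the condition it treats.
        {"type": "treats", "from": "e1", "to": "e2"},
    ],
    "context": [
        # Contextual assessments: temporal, certainty and subject.
        {"entity": "e2", "temporal": "current", "certainty": "likely", "subject": "patient"},
    ],
}
```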

We’re thrilled to open source this project. We hope the community will contribute to the refinement and expansion of the Healthcare Text Annotation Guidelines, so they mirror the ever-evolving nature of healthcare.

By Andreea Bodnari, Product Manager and Mikhail Begun, Program Manager—Google Cloud AI

Exploring AI for radiotherapy planning with Mayo Clinic

More than 18 million new cancer cases are diagnosed globally each year, and radiotherapy is one of the most common cancer treatments—used to treat over half of cancers in the United States. But planning for a course of radiotherapy treatment is often a time-consuming and manual process for clinicians. The most labor-intensive step in planning is a technique called “contouring,” which involves segmenting both the areas of cancer and nearby healthy tissues that are susceptible to radiation damage during treatment. Clinicians have to painstakingly draw lines around sensitive organs on scans—a time-intensive process that can take up to seven hours for a single patient.
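
For readers curious what automated contouring looks like computationally, here is a toy sketch of a per-pixel segmentation network in Keras. It is only an illustration under assumed inputs (128x128 CT slices and a hypothetical handful of organ-at-risk classes), not the model being developed in this research.

```python
# Toy sketch only: a minimal per-pixel segmentation network, not the model
# being developed with Mayo Clinic. Input size and class count are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 4  # hypothetical: background plus three organs-at-risk

inputs = tf.keras.Input(shape=(128, 128, 1))                        # one grayscale CT slice
x = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D()(x)                                        # downsample
x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
x = layers.UpSampling2D()(x)                                        # back to input resolution
outputs = layers.Conv2D(NUM_CLASSES, 1, activation="softmax")(x)    # per-pixel class probabilities

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```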

Technology has the potential to augment the work of doctors and other care providers, like the specialists who plan radiotherapy treatment. We’re collaborating with Mayo Clinic on research to develop an AI system that can support physicians, help reduce treatment planning time and improve the efficiency of radiotherapy. In this research partnership, Mayo Clinic and Google Health will work to develop an algorithm to assist clinicians in contouring healthy tissue and organs from tumors, and conduct research to better understand how this technology could be deployed effectively in clinical practice. 

Mayo Clinic is an international center of excellence for cancer treatment with world-renowned radiation oncologists. Google researchers have studied how AI can potentially be used to augment other areas of healthcare—from mammographies to the early deployment of an AI system that detects diabetic retinopathy using eye scans. 

In a previous collaboration with University College London Hospitals, Google researchers demonstrated how an AI system could analyze and segment medical scans of patients with head and neck cancer, similar to how expert clinicians would. Our research with Mayo Clinic will also focus on head and neck cancers, which are particularly challenging areas to contour, given the many delicate structures that sit close together.

In this first phase of research with Mayo Clinic, we hope to develop and validate a model as well as study how an AI system could be deployed in practice. The technology will not be used in a clinical setting and algorithms will be developed using only de-identified data. 

While cancer rates continue to rise, the shortage of radiotherapy experts continues to grow as well. Waiting for a radiotherapy treatment plan can be an agonizing experience for cancer patients, and we hope this research will eventually support a faster planning process and potentially help patients to access treatment sooner.

This researcher is tracking COVID with help from Google

A research team at Carnegie Mellon University (CMU) has been working to make epidemiological forecasting as universal as weather forecasting. When COVID hit, they launched COVIDcast to develop data monitoring and forecasting resources that can help public health officials, researchers, and the public make informed decisions. 

Last month, CMU received $1 million from Google.org and a team of thirteen Google.org Fellows to work pro bono for six months to help continue building out COVIDcast. This was part of Google.org’s $100 million commitment to COVID relief.

We caught up with Ryan Tibshirani, a research lead at CMU, to learn more about the project and what the Google.org fellows will work on. 

Tell us a little bit about yourself.  

I'm a faculty member at CMU, jointly appointed in Statistics and Machine Learning, and I’m very interested in epidemiological forecasting and tracking. In 2012, I co-founded Delphi, a research group centered on this topic, with Roni Rosenfeld, Professor and Head of Machine Learning at CMU.

What do you focus on most these days?

Since the pandemic began I’ve spent all of my time on COVID-19 research. Delphi has quadrupled the number of researchers in just eight months and we’re laser-focused on COVID. Leading Delphi's pandemic response effort has been both a challenge (I've never done anything like this before) and a joy (the group is full of amazing people).

How did you come up with the idea for COVIDcast? 

To back up just a bit: Roni and I formed Delphi in 2012 with the goal to develop the theory and practice of epidemiological forecasting, primarily for seasonal influenza in the U.S. We want this technology to become as universally accepted and useful as today’s weather forecasting. 

Our forecasting system has been a top performer at the Centers for Disease Control's (CDC) annual forecasting challenges, and last year Delphi Group was named one of the two Centers of Excellence for Influenza Forecasting. I like to think of COVIDcast as a replica of what we’ve done for the flu but better and faster.

Break it down for us, what is COVIDcast?

The COVIDcast project is about building and providing an ecosystem for COVID-19 tracking and forecasting. Our aim is to support informed decision-making at federal, state, and local levels of government, in the healthcare sector, and beyond. 

The project has many parts: 

  • Unique relationships with tech and healthcare partners that give us access to data with different views of pandemic activity in the U.S.;

  • Code and infrastructure to build new, geographically-detailed, continuously-updated COVID-19 indicators;

  • A historical database of all indicators, including revision tracking;

  • A public API that serves new indicators daily, along with interactive maps and graphics to display them;

  • And lastly, modeling work that builds on the indicators to improve nowcasting and forecasting the spread of COVID-19.

A key element of COVIDcast is that we make all of our work as open and accessible as possible to other researchers and the public to help amplify its impact. We share both our data and a range of software tools—from data processing and visualization to sophisticated statistical tools. 
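
For instance, the public API mentioned above can be queried with a few lines of Python. The sketch below is hedged: the endpoint and parameter names are assumptions based on Delphi's public Epidata API documentation and may differ from the current interface.

```python
# Hedged sketch: fetching one COVIDcast indicator over HTTP with Python. The
# endpoint and parameter names are assumptions based on Delphi's public
# Epidata API documentation and may differ from the current interface.
import requests

resp = requests.get(
    "https://api.delphi.cmu.edu/epidata/covidcast/",  # assumed endpoint
    params={
        "data_source": "fb-survey",    # assumed: symptom-survey data source
        "signal": "smoothed_cli",      # assumed: COVID-like illness signal
        "time_type": "day",
        "geo_type": "county",
        "time_values": "20201101",
        "geo_value": "42003",          # Allegheny County, PA (FIPS code)
    },
    timeout=30,
)
resp.raise_for_status()
for row in resp.json().get("epidata", []):
    print(row["geo_value"], row["time_value"], row["value"])
```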

How will the Google.org funding and fellowship help?

This support will help Delphi expand our efforts to provide a geographically-detailed view of various aspects of the pandemic and to develop an early warning system for health officials, for example, when the number of cases in a locale is expected to rise. There will be more pandemics and epidemics after COVID-19. We want to be prepared, and we believe Delphi's work can help us do that. 

The Google.org Fellowship just kicked off. What are you most excited about?  

Everything! We're excited to embed all the Google.org Fellows—engineers, user experience designers and researchers, program and product managers—into our workstreams. We hope they can help accelerate our progress and introduce us to leading industry product and software development techniques. Each and every one of the fellows has special skills that will be put to good use. We can't wait to see what we can achieve, together. 

More broadly, what role does the tech sector play in COVID-19 response efforts? 

An enormous role. The tech sector is uniquely positioned to provide data and platforms that even governments can't provide. It also has the skills and experience to quickly assemble large-scale systems, in real time. Google has been extraordinarily helpful to us on all of these fronts.

Improving the Accuracy of Genomic Analysis with DeepVariant 1.0

Sequencing genomes involves sampling short pieces of the DNA from the ~6 billion nucleobases — i.e., adenine (A), thymine (T), guanine (G), and cytosine (C) — we inherit from our parents. Genome sequencing is enabled by two key technologies: DNA sequencers (hardware) that "read" relatively small fragments of DNA, and variant callers (software) that combine the reads to identify where and how an individual's genome differs from a reference genome, like the one assembled in the Human Genome Project. Such variants may be indicators of genetic disorders, such as an elevated risk for breast cancer, pulmonary arterial hypertension, or neurodevelopmental disorders.

In 2017, we released DeepVariant, an open-source tool which identifies genome variants in sequencing data using a convolutional neural network (CNN). The sequencing process begins with a physical sample being sequenced by any of a handful of instruments, depending on the end goal of the sequencing. The raw data, which consists of numerous reads of overlapping fragments of the genome, are then mapped to a reference genome. DeepVariant analyzes these mappings to identify variant locations and distinguish them from sequencing errors.

Soon after it was first published in 2018, DeepVariant underwent a number of updates and improvements, including significant changes to improve accuracy for whole exome sequencing and polymerase chain reaction (PCR) sequencing.

We are now releasing DeepVariant v1.0, which incorporates a large number of improvements for all sequencing types. DeepVariant v1.0 is an improved version of our submission to the PrecisionFDA v2 Truth Challenge, which achieved Best Overall accuracy for 3 of 4 instrument categories. Compared to previous state-of-the-art models, DeepVariant v1.0 significantly reduces the errors for widely-used sequencing data types, including Illumina and Pacific Biosciences. In addition, through a collaboration with the UCSC Genomics Institute, we have also released a model that combines DeepVariant with the UCSC’s PEPPER method, called PEPPER-DeepVariant, which extends coverage to Oxford Nanopore data for the first time.

Sequencing Technologies and DeepVariant
For the last decade, the majority of sequence data were generated using Illumina instruments, which produce short (75-250 bases) and accurate sequences. In recent years, new technologies have become available that can sequence much longer pieces, including Pacific Biosciences, which can produce long and accurate sequences up to ~15,000 bases in length, and Oxford Nanopore, which can produce reads up to 1 million bases long, but with higher error rates. The particular type of sequencing data a researcher might use depends on the ultimate use-case.

Because DeepVariant is a deep learning method, we can quickly re-train it for these new instrument types, ensuring highly accurate sequence identification. Accuracy is important because a missed variant call could mean missing the causal variant for a disorder, while a false positive variant call could lead to identifying an incorrect one. Earlier state-of-the-art methods could reach ~99.1% accuracy (~73,000 errors) on a 35-fold coverage Illumina whole genome, whereas an early version of DeepVariant (v0.10) had ~99.4% accuracy (46,000 errors), corresponding to a 38% error reduction. DeepVariant v1.0 reduces Illumina errors by another ~22% and PacBio errors by another ~52% relative to the last DeepVariant release (v0.10).

DeepVariant Overview
DeepVariant is a convolutional neural network (CNN) that treats the task of identifying genetic variants as an image classification problem. DeepVariant constructs tensors, essentially multi-channel images, where each channel represents an aspect of the sequence, such as the bases in the sequence (called read base), the quality of alignment between different reads (mapping quality), whether a given read supports an alternate allele (read supports variant), etc. It then analyzes these data and outputs three genotype likelihoods, corresponding to how many copies (0, 1, or 2) of a given alternate allele are present.

Example of DeepVariant data. Each row of pixels in each panel corresponds to a single read, i.e., a short genetic sequence. The top, middle, and bottom rows of panels present examples with a different number of variant alleles. Only two of the six data channels are shown: Read base — the pixel value is mapped to each of the four bases, A, C, G, or T; Read supports variant — white means that the read is consistent with a given allele and grey means it is not. Top: Classified by DeepVariant as a "2", which means that both chromosomes match the variant allele. Middle: Classified as a “1”, meaning that one chromosome matches the variant allele. Bottom: Classified as a “0”, implying that the variant allele is missing from both chromosomes.
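
To make the "multi-channel image" idea concrete, here is a toy numpy sketch of building such a tensor. The window size, base encodings and channel count (two instead of the six described above) are simplified assumptions for illustration only.

```python
# Toy sketch of a DeepVariant-style "pileup image" as a numpy tensor. The
# encodings, window size and channel count (two instead of six) are
# simplified assumptions for illustration only.
import numpy as np

NUM_READS, WINDOW, NUM_CHANNELS = 5, 11, 2        # reads x positions x channels
BASE_TO_VALUE = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}

reads = ["ACGTACGTACG", "ACGTAAGTACG", "ACGTAAGTACG", "ACGTACGTACG", "ACGTAAGTACG"]
alt_support = [0, 1, 1, 0, 1]                      # does each read carry the alternate allele?

pileup = np.zeros((NUM_READS, WINDOW, NUM_CHANNELS), dtype=np.float32)
for i, (seq, supports) in enumerate(zip(reads, alt_support)):
    for j, base in enumerate(seq):
        pileup[i, j, 0] = BASE_TO_VALUE[base]      # channel 0: read base
        pileup[i, j, 1] = float(supports)          # channel 1: read supports variant

# A CNN (not shown) maps this tensor to three genotype likelihoods:
# P(0 copies), P(1 copy) and P(2 copies) of the alternate allele.
print(pileup.shape)  # (5, 11, 2)
```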

Technical Improvements in DeepVariant v1.0
Because DeepVariant uses the same codebase for each data type, improvements apply to each of Illumina, PacBio, and Oxford Nanopore. Below, we show the numbers for Illumina and PacBio for two types of small variants: SNPs (single nucleotide polymorphisms, which change a single base without changing sequence length) and INDELs (insertions and deletions).

  • Training on an extended truth set

    The Genome in a Bottle consortium from the National Institute of Standards and Technology (NIST) creates gold-standard samples with known variants covering large regions of the genome. These are used as labels to train DeepVariant. Using long-read technologies, the Genome in a Bottle consortium expanded the set of confident variants, increasing the portion of the genome described by the standard set from 85% to 92%. These more difficult regions were already used in training the PacBio models, and including them in the Illumina models reduced errors by 11%. By relaxing the filter for reads of lower mapping quality, we further reduced errors by 4% for Illumina and 13% for PacBio.

  • Haplotype sorting of long reads

    We inherit one copy of DNA from our mother and another from our father. PacBio and Oxford Nanopore sequences are long enough to separate sequences by parental origin, which is called a haplotype. By providing this information to the neural network, DeepVariant improves its identification of random sequence errors and can better determine whether a variant has a copy from one or both parents.

  • Re-aligning reads to the alternate (ALT) allele

    DeepVariant uses input sequence fragments that have been aligned to a reference genome. The optimal alignment for variants that include insertions or deletions could be different if the aligner knew they were present. To capture this information, we implemented an additional alignment step relative to the candidate variant. The figure below shows an additional second row where the reads are aligned to the candidate variant, which is a large insertion. You can see sequences that abruptly stop in the first row can now be fully aligned, providing additional information.

    Example of DeepVariant data with realignment to ALT allele. DeepVariant is presented the information in both rows of data for the same example. Only two of the six data channels are shown: Read base (channel #1) and Read supports variant (channel #5). Top: Shows the reads aligned to the reference (in DeepVariant v0.10 and earlier this is all DeepVariant sees). Bottom: Shows the reads aligned to the candidate variant, in this case a long insertion of sequence. The red arrow indicates where the inserted sequence begins.
  • Use a small network to post-process outputs

    Variants can have multiple alleles, with a different base inherited from each parent. DeepVariant’s classifier only generates a probability for one potential variant at a time. In previous versions, simple hand-written rules converted the probabilities into a composite call, but these rules failed in some edge cases. This approach also separated the way a final call was made from the backpropagation used to train the network. By adding a small, fully-connected neural network to the post-processing step, we are able to better handle these tricky multi-allelic cases. (An illustrative sketch of such a post-processing network appears below.)

  • Adding data to train the release model

    The timeframe for the competition was compressed, so we trained only with data similar to the challenge data (PCR-Free NovaSeq) to speed model training. In our production releases, we seek high accuracy for multiple instruments as well as PCR+ preparations. Training with data from these diverse classes helps the model generalize, so our DeepVariant v1.0 release model outperforms the one submitted.

The charts below show the error reduction achieved by each improvement.
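
As a hedged illustration of the post-processing idea, here is a minimal Keras sketch of a small fully-connected network that maps per-allele genotype likelihoods to a composite call. The feature layout and number of output classes are assumptions for illustration, not DeepVariant's actual configuration.

```python
# Hedged sketch of the post-processing idea: a small fully-connected network
# that maps per-allele genotype likelihoods to a composite call. The feature
# layout and output classes are illustrative assumptions, not DeepVariant's
# actual configuration.
import tensorflow as tf
from tensorflow.keras import layers

NUM_FEATURES = 6        # assumed: three genotype likelihoods for each of two candidate alleles
NUM_OUTPUT_CLASSES = 6  # assumed: a small set of composite multi-allelic genotype calls

model = tf.keras.Sequential([
    layers.Dense(32, activation="relu", input_shape=(NUM_FEATURES,)),
    layers.Dense(NUM_OUTPUT_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```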

Training a Hybrid model
DeepVariant v1.0 also includes a hybrid model for PacBio and Illumina reads. In this case, the model leverages the strengths of both input types, without needing new logic.

Example of DeepVariant merging data from both PacBio and Illumina. Only two of the six data channels are shown: Read base (channel #1) and Read supports variant (channel #5). The longer PacBio reads (at the upper part of the image) span the region being called entirely, while the shorter Illumina reads span only a portion of the region.

We observed no change in SNP errors, suggesting that PacBio reads are strictly superior for SNP calling. We observed a further 49% reduction in Indel errors relative to the PacBio model, suggesting that the Indel error modes of Illumina and PacBio HiFi can be used in a complementary manner.

PEPPER-DeepVariant: A Pipeline for Oxford Nanopore Data Using DeepVariant
Until the PrecisionFDA competition, a DeepVariant model was not available for Oxford Nanopore data, because the higher base error rate created too many candidates for DeepVariant to classify. We partnered with the UC Santa Cruz Genomics Institute, which has extensive expertise with Nanopore data. They had previously trained a deep learning method called PEPPER, which could narrow down the candidates to a more tractable number. The larger neural network of DeepVariant can then accurately characterize the remaining candidates with a reasonable runtime.

The combined PEPPER-DeepVariant pipeline with the Oxford Nanopore model is open-source and available on GitHub. In the PrecisionFDA challenge, this pipeline achieved better SNP calling accuracy than DeepVariant on Illumina data, the first time Nanopore has been shown to outperform Illumina in this way.

Conclusion
DeepVariant v1.0 isn’t the end of development. We look forward to working with the genomics community to further maximize the value of genomic data to patients and researchers.

Source: Google AI Blog


Making data useful for public health

Researchers around the world have used modelling techniques to find patterns in data and map the spread of COVID-19, in order to combat the disease. Modelling a complex global event is challenging, particularly when there are many variables—human behavior, evolving science and policy, and socio-economic issues—as well as unknowns about the virus itself. Teams across Google are contributing tools and resources to the broader scientific community of epidemiologists, analysts and researchers who are working with policymakers and public health officials to address the public health and economic crisis.

Organizing the world’s data for epidemiological researchers

Lack of access to useful high-quality data has posed a significant challenge, and much of the publicly available data is scattered, incomplete, or compiled in many different formats. To help researchers spend more of their time understanding the disease instead of wrangling data, we've developed a set of tools and processes to make it simpler for researchers to discover and work with normalized high-quality public datasets. 


With the help of Google Cloud, we developed a COVID-19 Open Data repository—a comprehensive, open-source resource of COVID-19 epidemiological data and related variables like economic indicators or population statistics from over 50 countries. Each data source contains information on its origin, and how it’s processed so that researchers can confirm its validity and reliability. It can also be used with Data Commons, BigQuery datasets, as well as other initiatives which aggregate regional datasets. 
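
As a hedged illustration of how researchers can work with the repository, here is a short pandas sketch that loads the aggregated epidemiology table. The file URL and column names are assumptions based on the repository's published layout; check its documentation for current paths and schemas.

```python
# Hedged sketch: loading the aggregated epidemiology table with pandas. The
# file URL and column names are assumptions based on the repository's published
# layout; check its documentation for current paths and schemas.
import pandas as pd

URL = "https://storage.googleapis.com/covid19-open-data/v2/epidemiology.csv"  # assumed path
df = pd.read_csv(URL, parse_dates=["date"])

# Keep one location key (e.g., "US" at the country level) and inspect new cases.
us = df[df["key"] == "US"].sort_values("date")
print(us[["date", "new_confirmed"]].tail())
```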


This repository also includes two Google datasets developed to help researchers study the impact of the disease in a privacy-preserving manner. In April, we began publishing the COVID-19 Community Mobility Reports, which provide anonymized insights into movement trends to understand the impact of policies like shelter in place. These reports have been downloaded over 16 million times and are now updated three times a week in 64 languages, with localized insights covering 12,000 regions, cities and counties for 135 countries. Groups including the OECD, World Bank and Bruegel have used these reports in their research, and the insights inform strategies like how public health could safely unwind social distancing policies.


The latest addition to the repository is the Search Trends symptoms dataset, which aggregates anonymized search trends for over 400 symptoms. This will help researchers better understand the spread of COVID-19 and its potential secondary health impacts.

Tools for managing complex prediction modeling

The data that models rely upon may be imperfect due to a range of factors, including a lack of widespread testing or inconsistent reporting. That’s why COVID-19 models need to account for uncertainty in order for their predictions to be reliable and useful. To help address this challenge, we’re providing researchers examples of how to implement bespoke epidemiological models using TensorFlow Probability (TFP), a library for building probabilistic models that can measure confidence in their own predictions. With TFP, researchers can use a range of data sources with different granularities, properties, or confidence levels, and factor that uncertainty into the overall prediction models. This could be particularly useful in fine-tuning the increasingly complex models that epidemiologists are using to understand the spread of COVID-19, particularly in gaining city or county-level insights when only state or national-level datasets exist.
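
To illustrate the kind of model TFP makes possible, here is a minimal, hedged sketch of a toy case-count model in which the growth rate itself is uncertain. The priors and structure are illustrative assumptions, not one of the published example models.

```python
# Hedged, toy case-count model in TensorFlow Probability: the growth rate and
# initial case count are uncertain, and observed counts are noisy around the
# implied trajectory. Priors and structure are illustrative assumptions.
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def model():
    # Uncertain daily growth rate and initial case count.
    growth_rate = yield tfd.Normal(loc=0.05, scale=0.02, name="growth_rate")
    initial_cases = yield tfd.LogNormal(loc=tf.math.log(100.0), scale=0.5, name="initial_cases")
    # Expected cases over 14 days under exponential growth.
    days = tf.range(14, dtype=tf.float32)
    expected = initial_cases * tf.exp(growth_rate * days)
    # Observed counts are Poisson-distributed around the expectation.
    yield tfd.Independent(tfd.Poisson(rate=expected),
                          reinterpreted_batch_ndims=1, name="observed_cases")

joint = tfd.JointDistributionCoroutineAutoBatched(model)
sample = joint.sample()        # one simulated trajectory with its latent parameters
print(joint.log_prob(sample))  # log-probability of that draw under the model
```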


While models can help predict what happens next, researchers and policymakers are also turning to simulations to better understand the potential impact of their interventions. Simulating these “what if” scenarios involves calculating highly variable social interactions at a massive scale. Simulators can help trial different social distancing techniques and gauge how changes to the movement of people may impact the spread of disease.


Google researchers have developed an open-source agent-based simulator that utilizes real-world data to simulate populations to help public health organizations fine tune their exposure notification parameters. For example, the simulator can consider different disease and transmission characteristics, the number of places people visit, as well as the time spent in those locations. We also contributed to Oxford’s agent-based simulator by factoring in real world mobility and representative models of interactions within different workplace sectors to understand the effect of an exposure notification app on the COVID-19 pandemic.
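
To give a flavor of the agent-based approach, here is a toy, heavily simplified simulation in plain Python: agents visit shared places each day and transmission can occur among co-visitors. It illustrates the general idea only; it is not the open-source simulator's actual model or API, and every parameter is made up.

```python
# Toy, heavily simplified agent-based simulation: agents visit shared places
# each day, and infection may spread among co-visitors. This illustrates the
# general idea only; it is not the open-source simulator's model or API, and
# every parameter here is made up.
import random

random.seed(0)

NUM_AGENTS, NUM_PLACES, DAYS = 200, 20, 30
P_TRANSMIT = 0.05          # assumed per-contact transmission probability
INFECTIOUS_DAYS = 7

state = ["S"] * NUM_AGENTS             # S = susceptible, I = infectious, R = recovered
days_left = [0] * NUM_AGENTS
state[0], days_left[0] = "I", INFECTIOUS_DAYS   # seed one infection

for day in range(DAYS):
    # Each agent visits one random place per day.
    place_of = [random.randrange(NUM_PLACES) for _ in range(NUM_AGENTS)]
    for place in range(NUM_PLACES):
        visitors = [a for a in range(NUM_AGENTS) if place_of[a] == place]
        n_infectious = sum(1 for a in visitors if state[a] == "I")
        for a in visitors:
            # Each infectious co-visitor is an independent chance of transmission.
            if state[a] == "S" and random.random() < 1 - (1 - P_TRANSMIT) ** n_infectious:
                state[a], days_left[a] = "I", INFECTIOUS_DAYS
    # Advance disease progression.
    for a in range(NUM_AGENTS):
        if state[a] == "I":
            days_left[a] -= 1
            if days_left[a] == 0:
                state[a] = "R"

print("Agents ever infected:", sum(s != "S" for s in state))
```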


The scientific and developer communities are doing important work to understand and manage the pandemic. Whether it’s by contributing to open source initiatives or funding data science projects and providing Google.org Fellows, we’re committed to collaborating with researchers on efforts to build a more equitable and resilient future.


How sobriety has helped me cope through a pandemic

I never considered myself an addict until the day I found myself huddled under my covers at four in the afternoon, hungover and wishing my surroundings would disappear. This wasn’t the first time that had happened—in fact, it had become a weekly occurrence—but as I curled up into a ball, feeling pathetic and utterly alone, I realized I had no other options. I grabbed my phone from my nightstand and searched “rehab centers near me.”

I’d been dealing with major depression for years, and up until that moment I thought I had tried everything to find a cure. Special diets, an alphabet soup of antidepressant regimens, group therapy, solo therapy, transcranial magnetic stimulation, ketamine infusions. The only thing I hadn’t tried was sobriety. Drugs and alcohol were my only escape. I couldn’t fathom giving up the one thing that freed myself from the darkest grips of my own mind.

My Google search surfaced a number of local treatment centers, and after making some calls, I found one with a program that could help me. That was more than two years ago. Since then, thanks to hard work that continues today, I’ve remained sober and depression-free. 

Most people in recovery would agree: you can’t do it alone. It’s a reciprocal relationship—my recovery community helps to keep me sober, and my sobriety allows me to play an active role in that community. Twelve-step programs, new habits and the support of others with similar experiences provide a foundation, and then I can build a life I never thought was possible to live when depression controlled my every moment.

That foundation has carried me through COVID-19. Staying sober during a global pandemic is a bit of a paradox. During a time when people are more isolated than ever before, turning to substances to self-soothe seems like a natural response. And the data backs that up: Google searches for “how to get clean” reached an all-time high in June, and “how to get sober” surged in June and then again in August. In the past 30 days, searches for “rehab near me” hit their second-highest peak in recorded history.

And yet sobriety—in an era where it’s harder than ever to stay sober—is precisely what’s gotten me through this time. Staying sober has let me be present with my emotions, to face my anxieties and difficulties head-on. While I can’t numb my feelings, I can protect my mental health. My recovery practice has allowed me to do just that: Daily gratitude lists remind me how fortunate I still am, my sponsor regularly offers wisdom and advice, my peers hold space for my challenges and I do the same for them.

In the throes of my own crisis, the first place I turned to for help was Google. I ended up at a rehab center that profoundly transformed the way I move through the world. Last September, as part of National Recovery Month, Google made these resources even easier to find with its Recover Together site. This year, Google is adding even more features, including a mapping tool that allows you to search for local support groups by simply typing in your zip code. Of course, the search results also include virtual meetings, now that many programs have moved online. 

Our new Recover Together map shows nearby (and virtual) support groups.

I’m proud to work for a company that prioritizes an issue that affects an estimated one in eight American adults and their loved ones. I’m proud to work for a company where I can take time from my day to attend 12-step meetings, no questions asked, and where I can bring my whole self to work and speak freely about my struggles. And I’m proud to work for a company that celebrates my experience as one of triumph rather than shame, and that’s committed to reducing the stigma around addiction by providing resources for people like me. 

Recovery doesn’t happen in a vacuum. I can’t do it all by myself, which is why I’m sharing my story today. I hope that even one person who has fought similar battles will read what I have to say and realize that they, too, aren’t in this alone.