Author Archives: Andrew Carroll

A new genome sequencing tool powered with our technology

Genome sequencing provides a more complete description of cells and organisms, allowing scientists to uncover serious genetic conditions such as the elevated risk for breast cancer or pulmonary arterial hypertension. While researching genomics has the potential to save lives and preserve people’s quality of life, it's incredibly challenging work.

Back in January, we announced a partnership with PacBio, a developer of genome sequencing instruments, to further advance genomic technologies. Today, PacBio is introducing the Revio sequencing system, an instrument that runs on our deep learning technology, DeepConsensus. With DeepConsensus built right into Revio, researchers can quickly and accurately identify genetic variants that cause diseases.

How Google Health’s technology works

Genome sequencing requires observing individual DNA molecules amidst a complex and noisy background. To address the problem, Google Health worked to adapt Transformer, one of our most influential existing deep learning methods that was developed in 2017 primarily to understand languages. We then applied it to PacBio’s data, which uses fluorescence light to encode DNA sequences. The result was DeepConsensus.

Last year, we demonstrated that DeepConsensus was capable of reducing sequencing errors by 42%, resulting in better genome assemblies and more accurate identification of genetic variants. Although this was a promising research demonstration, we knew that DeepConsensus could have the greatest impact if it was running directly on PacBio’s sequencing instrument. Over the last year, we’ve worked closely with PacBio to speed up DeepConsensus by over 500x from its initial release. We’ve also further improved its accuracy to reduce errors by 59%.

Combining our AI methods and genomics work with PacBio’s instruments and expertise, we were able to build better methods for the research community. Our partnership with PacBio doesn’t stop with Revio. There’s limitless potential to make an impact on the research community and improve healthcare and access for people around the world.

Advancing genomics to better understand and treat disease

Genome sequencing can help us better understand, diagnose and treat disease. For example, healthcare providers are increasingly using genome sequencing to diagnose rare genetic diseases, such as elevated risk for breast cancer or pulmonary arterial hypertension, which are estimated to affect roughly 8% of the population.

At Google Health, we’re applying our technology and expertise to the field of genomics. Here are recent research and industry developments we’ve made to help quickly identify genetic disease and foster the equity of genomic tests across ancestries. This includes an exciting new partnership with Pacific Biosciences to further advance genomic technologies in research and the clinic.

Helping identify life-threatening disease when minutes matter

Genetic diseases can cause critical illness, and in many cases, a timely identification of the underlying issue can allow for life-saving intervention. This is especially true in the case of newborns. Genetic or congenital conditions affect nearly 6% of births, but clinical sequencing tests to identify these conditions typically take days or weeks to complete.

We recently worked with the University of California Santa Cruz Genomics Institute to build a method – called PEPPER-Margin-DeepVariant – that can analyze data for Oxford Nanopore sequencers, one of the fastest commercial sequencing technologies used today. This week, the New England Journal of Medicine published a study led by the Stanford University School of Medicine detailing the use of this method to identify suspected disease-causing variants in five critical newborn intensive care unit (NICU) cases.

In the fastest cases, a likely disease-causing variant was identified less than 8 hours after sequencing began, compared to the prior fastest time of 13.5 hours. In five cases, the method influenced patient care. For example, the team quickly turned around a diagnosis of Poirier–Bienvenu neurodevelopmental disorder for one infant, allowing for timely, disease-specific treatment.

Time required to sequence and analyze individuals in the pilot study. Disease-causing variants were identified in patient IDs 1, 2, 8, 9, and 11.

Applying machine learning to maximize the potential in sequencing data

Looking forward, new sequencing instruments can lead to dramatic breakthroughs in the field. We believe machine learning (ML) can further unlock the potential of these instruments. Our new research partnership with Pacific Biosciences (PacBio), a developer of genomic sequence platforms, is a great example of how Google’s machine learning and algorithm development tools can help researchers unlock more information from sequencing data.

PacBio’s long-read HiFi sequencing provides the most comprehensive view of genomes, transcriptomes and epigenomes. Using PacBio’s technology in combination with DeepVariant, our award-winning variant detection method, researchers have been able to accurately identify diseases that are otherwise difficult to diagnose with alternative methods.

Additionally, we developed a new open source method called DeepConsensus that, in combination with PacBio’s sequencing platforms, creates more accurate reads of sequencing data. This boost in accuracy will help researchers apply PacBio’s technology to more challenges, such as the final completion of the Human Genome and assembling the genomes of all vertebrate species.

Supporting more equitable genomics resources and methods

Like other areas of health and medicine, the genomics field grapples with health equity issues that, if not addressed, could exclude certain populations. For example, the overwhelming majority of participants in genomic studies have historically been of European ancestry. As a result, the genomics resources that scientists and clinicians use to identify and filter genetic variants and to interpret the significance of these variants are not equally powerful across individuals of all ancestries.

In the past year, we’ve supported two initiatives aimed at improving methods and genomics resources for under-represented populations. We collaborated with 23andMe to develop an improved resource for individuals of African ancestry, and we worked with the UCSC Genomics Institute to develop pangenome methods with this work recently published in Science.

In addition, we recently published two open-source methods that improve genetic discovery by more accurately identifying disease labels and improving the use of health measurements in genetic association studies.

We hope that our work developing and sharing these methods with those in the field of genomics will improve overall health and the understanding of biology for everyone. Working together with our collaborators, we can apply this work to real-world applications.