Tag Archives: Health

Connecting people with COVID-19 information and resources

Since the beginning of the year, search interest in COVID-19 has continued to climb around the world. Right now the disease is the largest topic people are looking for globally, surpassing even some of the most common and consistent queries we see in Search.

COVID trends

As this public health crisis has evolved into a pandemic, information needs are continuing to change, differing from region to region. When COVID-19 was declared a public health emergency by the World Health Organization (WHO) in late January, we launched an SOS Alert with resources and safety information from the WHO, along with the latest news. The alert has launched in 25 languages across dozens of countries, and people in more than 50 countries can access localized public health guidance from health authorities. 

Expanding our COVID-19 Search experience
Now, as we continue to see people’s information needs expanding, we’re introducing a more comprehensive experience for COVID-19 in Search, providing easy access to authoritative information from health authorities alongside new data and visualizations. This new format organizes the search results page to help people easily navigate information and resources, and it will also make it possible to add more information over time as it becomes available.


In addition to links to helpful resources from national and local health authorities, people will also find a carousel of Twitter accounts from local civic organizations and health authorities to help connect them with the latest local guidance as it’s shared. We’ve also introduced a feature to surface some of the most common questions about the pandemic, with relevant snippets sourced from the WHO and the Centers for Disease Control and Prevention (CDC). 

To help people track the latest information about the spread of the disease, we’re adding modules with statistics and a map showing COVID-19 prevalence in countries around the world. This new COVID-19 experience on Search will roll out in the coming days in English in the U.S., and we plan to add more information and expand to other languages and countries soon.

A website dedicated to help and resources
In addition to launching new features on Google Search that provide easy access to more authoritative information, we’ve worked with relevant agencies and authorities to roll out a website—available at google.com/covid19—focused on education, prevention and local resources. People can find state-based information, safety and prevention tips, search trends related to COVID-19, and further resources for individuals, educators and businesses. Launching today in the U.S., the site will be available in more languages and countries in the coming days and we’ll update the website as more resources become available. Along with our other products and initiatives, we hope these resources will help people find answers to the questions they’re asking and get the help they need.

Guidance around local health services
We’re also looking for more ways we can help people follow authoritative public health guidance and locate appropriate health services through our products. Right now in the U.S., people seeking out urgent care, hospitals and other medical services in Search or Maps will see an alert reminding them of the CDC’s recommendation that symptomatic individuals call ahead in order to avoid overwhelming health systems and increasing the risk of exposure.

Urgent Care COVID

As coronavirus becomes a challenge in more communities and as authorities around the world develop new guidance and tools to address the pandemic, we’ll continue to find more opportunities to connect people with key information to keep themselves, their families, and their communities safe.

Get helpful health info from the NHS, right in Search

People come to Search for all types of information to navigate their lives and look after themselves and their families. When it comes to important topics like health, high-quality information is critical, and we aim to connect people with the most reliable sources on the web as quickly as possible.

Now, we’re making it even easier for people in the U.K. to find trusted information from the National Health Service (NHS). Beginning this week, when you search for health conditions like chickenpox, back pain, or the common cold, you can find Knowledge Panels with information from the NHS website that help you understand more about common causes, treatments and more. 

Knowledge panel in Search

These Knowledge Panels aim to give people authoritative, locally trusted health information, based on open source content. The NHS has formatted their content so that it’s easy to find on the web and available publicly to anyone via the NHS website—Google is one of more than 2,000 organizations using NHS website content to provide trusted information to people looking for it. 

To start, these Knowledge Panels will be available for more than 250 health conditions. Of course, they’re not intended to provide medical advice, and we encourage anyone searching for health information to seek guidance from a doctor if they have a medical concern or, in an emergency, call local emergency services immediately. But we hope this feature will help people find reliable information and have more informed conversations with medical professionals to improve their care.

Source: Search

Generating Diverse Synthetic Medical Image Data for Training Machine Learning Models

The progress in machine learning (ML) for medical imaging that helps doctors provide better diagnoses has partially been driven by the use of large, meticulously labeled datasets. However, dataset size can be limited in real life due to privacy concerns, low patient volume at partner institutions, or by virtue of studying rare diseases. Moreover, to ensure that ML models generalize well, they need training data that span a range of subgroups, such as skin type, demographics, and imaging devices. Requiring that the size of each combinatorial subgroup (e.g., skin type A with skin condition B, taken by camera C) is also sufficiently large can quickly become impractical.

Today we are happy to share two projects aimed at both improving the diversity of ML training data and increasing the effective amount of available training data for medical applications. The first project is a configurable method for generating synthetic skin lesion images in order to improve coverage of rarer skin types and conditions. The second project uses synthetic images as training data to develop an ML model that can better interpret different biological tissue types across a range of imaging devices.

Generating Diverse Images of Skin Conditions
In “DermGAN: Synthetic Generation of Clinical Skin Images with Pathology”, published in the Machine Learning for Health (ML4H) workshop at NeurIPS 2019, we address problems associated with data diversity in de-identified dermatology images taken by consumer-grade cameras. This work addresses (1) the scarcity of imaging data representative of rare skin conditions, and (2) the lower frequency of data covering certain Fitzpatrick skin types. Fitzpatrick skin types range from Type I (“pale white, always burns, never tans”) to Type VI (“darkest brown, never burns”), with datasets generally containing relatively few cases at the “boundaries”. In both cases, data scarcity problems are exacerbated by the low signal-to-noise ratio common in the target images, due to the lack of standardized lighting, contrast and field-of-view; variability of the background, such as furniture and clothing; and the fine details of the skin, like hair and wrinkles.

To improve diversity in the skin images, we developed a model, called DermGAN, which generates skin images that exhibit the characteristics of a given pre-specified skin condition, location, and underlying skin color. DermGAN uses an image-to-image translation approach, based on the pix2pix generative adversarial network (GAN) architecture, to learn the underlying mapping from one type of image to another.

DermGAN takes as input a real image and its corresponding, pre-generated semantic map representing the underlying characteristics of the real image (e.g., the skin condition, location of the lesion, and skin type), from which it will generate a new synthetic example with the requested characteristics. The generator is based on the U-Net architecture, but in order to mitigate checkerboard artifacts, the deconvolution layers are replaced with a resizing layer, followed by a convolution. A few customized losses are introduced to improve the quality of the synthetic images, especially within the pathological region. The discriminator component of DermGAN is solely used for training, whereas the generator is evaluated both visually and for use in augmenting the training dataset for a skin condition classifier.
Overview of the generator component of DermGAN. The model takes an RGB semantic map (red box) annotated with the skin condition's size and location (smaller orange rectangle), and outputs a realistic skin image. Colored boxes represent various neural network layers, such as convolutions and ReLU; the skip connections resemble the U-Net and enable information to be propagated at the appropriate scales.
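To make the "resizing layer, followed by a convolution" idea concrete, here is a minimal pure-Python sketch of that upsampling pattern. It is not the DermGAN implementation: the function names, the nearest-neighbor resize, the zero padding, and the fixed kernel are all assumptions chosen for illustration (in a real generator the kernel weights would be learned).

```python
def upsample_nearest(image, factor=2):
    """Nearest-neighbor resize: repeat every pixel `factor` times along both axes."""
    out = []
    for row in image:
        wide = [value for value in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def conv2d_same(image, kernel):
    """2-D cross-correlation with zero padding, so the output keeps the input size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    if 0 <= yy < h and 0 <= xx < w:
                        total += image[yy][xx] * kernel[i][j]
            out[y][x] = total
    return out


def resize_conv(image, kernel, factor=2):
    """Upsample first, then convolve (the replacement for a deconvolution layer)."""
    return conv2d_same(upsample_nearest(image, factor), kernel)


# Hypothetical 2x2 feature map and an identity kernel, used only to show the shapes.
IDENTITY = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
feature_map = [[1.0, 2.0], [3.0, 4.0]]
upsampled = resize_conv(feature_map, IDENTITY)
```

Because every output pixel is produced by the same kernel applied to an evenly resized input, no output location receives more kernel contributions than its neighbors, which is the usual explanation for why this pattern avoids checkerboard artifacts.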
The top row shows generated synthetic examples and the bottom row illustrates real images of basal cell carcinoma (left) and melanocytic nevus (right). More examples can be found in the paper.
In addition to generating visually realistic images, our method enables generation of images of skin conditions or skin types that are more rare and that suffer from a paucity of dermatologic images.
DermGAN can be used to generate skin images (all with melanocytic nevus in this case) with different background skin types (top, by changing the input skin color) and different-sized lesions (bottom, by changing the input lesion size). As the input skin color changes, the lesion changes appearance to match what the lesion would look like on different skin types.
Early results indicated that using the generated images as additional data to train a skin condition classifier may improve performance at detecting rare malignant conditions, such as melanoma. However, more work is needed to explore how best to utilize such generated images to improve accuracy more generally across rarer skin types and conditions.

Generating Pathology Images with Different Labels Across Diverse Scanners
The focus quality of medical images is important for accurate diagnoses. Poor focus quality can trigger both false positives and false negatives, even in otherwise accurate ML-based metastatic breast cancer detection algorithms. Determining whether or not pathology images are in-focus is difficult due to factors such as the complexity of the image acquisition process. Digitized whole-slide images could have poor focus across the entire image, but since they are essentially stitched together from thousands of smaller fields of view, they could also have subregions with different focus properties than the rest of the image. This makes manual screening for focus quality impractical and motivates the desire for an automated approach to detect poorly-focused slides and locate out-of-focus regions. Identifying regions with poor focus might enable re-scanning, or yield opportunities to improve the focusing algorithms used during the scanning process.

In our second project, presented in “Whole-slide image focus quality: Automatic assessment and impact on AI cancer detection”, published in the Journal of Pathology Informatics, we develop a method of evaluating de-identified, large gigapixel pathology images for focus quality issues. This involved training a convolutional neural network on semi-synthetic training data that represent different tissue types and slide scanner optical properties. However, a key barrier towards developing such an ML-based system was the lack of labeled data — focus quality is difficult to grade reliably and labeled datasets were not available. To exacerbate the problem, because focus quality affects minute details of the image, any data collected for a specific scanner may not be representative of other scanners, which may have differences in the physical optical systems, the stitching procedure used to recreate a large pathology image from captured image tiles, white-balance and post-processing algorithms, and more. This led us to develop a novel multi-step system for generating synthetic images that exhibit realistic out-of-focus characteristics.

We deconstructed the process of collecting training data into multiple steps. The first step was to collect images from various scanners and to label in-focus regions. This task is substantially easier than trying to determine the degree to which an image is out of focus, and can be completed by non-experts. Next, we generated synthetic out-of-focus images, inspired by the sequence of events that happens before a real out-of-focus image is captured: the optical blurring effect happens first, followed by those photons being collected by a sensor (a process that adds sensor noise), and finally software compression adds noise.

A sequence of images showing step-wise out-of-focus image generation. Images are shown in grayscale to accentuate the difference between steps. First, an in-focus image is collected (a) and a bokeh effect is added to produce a blurry image (b). Next, sensor noise is added to simulate a real image sensor (c), and finally JPEG compression is added to simulate the sharp edges introduced by post-acquisition software processing (d). A real out-of-focus image is shown for comparison (e).
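The three steps described above (blur, then sensor noise, then compression) can be sketched in a few lines of pure Python. This is an illustrative approximation, not the pipeline from the paper: the disk-shaped kernel, the Gaussian noise model, and the parameter names are assumptions, and the final step uses plain intensity quantization as a crude stand-in for JPEG compression, which this sketch does not implement.

```python
import random


def disk_kernel(radius):
    """Normalized disk-shaped kernel approximating a defocus (bokeh) blur."""
    size = 2 * radius + 1
    mask = [[1.0 if (y - radius) ** 2 + (x - radius) ** 2 <= radius ** 2 else 0.0
             for x in range(size)] for y in range(size)]
    total = sum(map(sum, mask))
    return [[v / total for v in row] for row in mask]


def blur(image, kernel):
    """Convolve with edge replication so that flat regions stay flat."""
    h, w = len(image), len(image[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i, kernel_row in enumerate(kernel):
                for j, k in enumerate(kernel_row):
                    yy = min(max(y + i - r, 0), h - 1)
                    xx = min(max(x + j - r, 0), w - 1)
                    acc += image[yy][xx] * k
            out[y][x] = acc
    return out


def add_sensor_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise (an assumed noise model), clipped to [0, 255]."""
    return [[min(255.0, max(0.0, v + rng.gauss(0.0, sigma))) for v in row]
            for row in image]


def quantize(image, step):
    """Coarse intensity quantization: a stand-in for compression, NOT real JPEG."""
    return [[min(255.0, float(round(v / step) * step)) for v in row] for row in image]


def synthesize_out_of_focus(image, radius, sigma, step, seed=0):
    """Apply the steps in the order described: blur, sensor noise, compression."""
    rng = random.Random(seed)
    blurred = blur(image, disk_kernel(radius))
    noisy = add_sensor_noise(blurred, sigma, rng)
    return quantize(noisy, step)


# Hypothetical in-focus patch: a sharp vertical edge in an 8x8 grayscale image.
sharp = [[0.0] * 4 + [255.0] * 4 for _ in range(8)]
blurred_only = blur(sharp, disk_kernel(2))
synthetic = synthesize_out_of_focus(sharp, radius=2, sigma=2.0, step=8)
```

Varying `radius` in such a scheme would give a range of blur severities from the same in-focus patch, which is how a single labeled in-focus region could yield many graded training examples.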
Our study shows that modeling each step is essential for optimal results across multiple scanner types. Remarkably, it also enabled the detection of spectacular out-of-focus patterns in real data:
An example of a particularly interesting out-of-focus pattern across a biological tissue slice. Areas in blue were recognized by the model to be in-focus, whereas areas highlighted in yellow, orange, or red were more out of focus. The gradation in focus here (represented by concentric circles: a red/orange out-of-focus center surrounded by green/cyan mildly out-of-focus, and then a blue in-focus ring) was caused by a hard “stone” in the center that lifted the surrounding biological tissue.
Implications and Future Outlook
Though the volume of data used to develop ML systems is seen as a fundamental bottleneck, we have presented techniques for generating synthetic data that can be used to improve the diversity of training data for ML models and thereby improve the ability of ML to work well on more diverse datasets. We should caution though that these methods are not appropriate for validation data, so as to avoid bias such as an ML model performing well only on synthetic data. To ensure unbiased, statistically-rigorous evaluation, real data of sufficient volume and diversity will still be needed, though techniques such as inverse probability weighting (for example, as leveraged in our work on ML for chest X-rays) may be useful there. We continue to explore other approaches to more efficiently leverage de-identified data to improve data diversity and reduce the need for large datasets in the development of ML models for healthcare.
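As a rough illustration of inverse probability weighting, the sketch below corrects an estimate computed on an evaluation set that was deliberately enriched for abnormal cases. The data, the sampling probabilities, and the function name are hypothetical; the point is only that each case is weighted by the inverse of its probability of being sampled.

```python
def ipw_mean(values, sampling_probs):
    """Self-normalized weighted mean: each case counts 1 / (its sampling probability)."""
    weights = [1.0 / p for p in sampling_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical enriched evaluation set: all 4 abnormal cases were kept (p = 1.0),
# while only 1 in 10 normal cases was sampled (p = 0.1).
is_abnormal = [1, 1, 1, 1, 0, 0, 0, 0]
sampling_probs = [1.0] * 4 + [0.1] * 4

naive_prevalence = sum(is_abnormal) / len(is_abnormal)        # 0.5 in the enriched set
adjusted_prevalence = ipw_mean(is_abnormal, sampling_probs)   # 4 / 44 after reweighting
```

The same weights can be applied to any per-case quantity (for example, whether a prediction was correct), so that metrics reflect the original population rather than the enriched sample.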

These projects involved the efforts of multidisciplinary teams of software engineers, researchers, clinicians and cross-functional contributors. Key contributors to these projects include Timo Kohlberger, Yun Liu, Melissa Moran, Po-Hsuan Cameron Chen, Trissia Brown, Jason Hipp, Craig Mermel, Martin Stumpe, Amirata Ghorbani, Vivek Natarajan, David Coz, and Yuan Liu. The authors would also like to acknowledge Daniel Fenner, Samuel Yang, Susan Huang, Kimberly Kanada, Greg Corrado and Erica Brand for their advice, members of the Google Health dermatology and pathology teams for their support, and Ashwin Kakarla and Shivamohan Reddy Garlapati and their team for image labeling.

Source: Google AI Blog

Detecting hidden signs of anemia from the eye

Beyond helping us navigate the world, the human eye can reveal signs of underlying disease, which care providers can now uncover during a simple, non-invasive screening (a photograph taken of the back of the eye). We’ve previously shown that deep learning applied to these photos can help identify diabetic eye disease as well as cardiovascular risk factors. Today, we’re sharing how we’re continuing to use deep learning to detect anemia.

Anemia is a major public health problem that affects 1.6 billion people globally, and can cause tiredness, weakness, dizziness and drowsiness. The diagnosis of anemia typically involves a blood test to measure the amount of hemoglobin (a critical protein in your red blood cells that carries oxygen). If your hemoglobin is lower than normal, that indicates anemia. Women during pregnancy are at particularly high risk of anemia with more than 2 in 5 affected, and anemia can also be an early sign of colon cancer in otherwise healthy individuals. 

Our findings

In our latest work, “Detection of anemia from retinal fundus images via deep learning”, published in Nature Biomedical Engineering, we find that a deep learning model can quantify hemoglobin using de-identified photographs of the back of the eye and common metadata (e.g. age, self-reported sex) from the UK Biobank, a population-based study. Compared to just using metadata, deep learning improved the detection of anemia (as measured using the AUC) from 74 percent to 88 percent.
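For readers unfamiliar with the AUC, it can be read as the probability that a randomly chosen person with the condition receives a higher model score than a randomly chosen person without it. The sketch below computes it directly from that definition; the scores and labels are made up for illustration and have nothing to do with the study data.

```python
def auc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical model scores for five people; label 1 means the condition is present.
example_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
example_labels = [1, 0, 1, 0, 0]
example_auc = auc(example_scores, example_labels)  # 5 of 6 positive/negative pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for the reported move from 74 percent to 88 percent.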

To ensure these promising findings were not the result of chance or false correlations, other scientists helped to validate the model—which was initially developed on a dataset of primarily Caucasian ancestry—on a separate dataset from Asia. The performance of the model was similar on both datasets, suggesting the model could be useful in a variety of settings.

Optic disc

Multiple “explanation” techniques suggest that the optic disc is important for detecting anemia from images of the back of the eye.

Because this research uncovered new findings about the effects of anemia on the eye, we wanted to identify which parts of the eye contained signs of anemia. Our analysis revealed that much of the information comes from the optic disc and surrounding blood vessels. The optic disc is where nerves and blood vessels enter and exit the eye, and normally appears much brighter than the surrounding areas on a photograph of the back of the eye.

Key takeaways

This method to non-invasively screen for anemia could add value to existing diabetic eye disease screening programs, or support an anemia screening that would be quicker and easier than a blood test. Additionally, this work is another example of using deep learning with explainable insights to discover new biomedical knowledge, extending our previous work on cardiovascular risk factors, refractive error, and progression of macular degeneration. We hope this will inspire additional research to reveal new scientific insights from existing medical tests, and to help improve early interventions and health outcomes.

To read more about our latest research for improving the diagnosis of eye diseases, visit Nature Communications and Ophthalmology. You can find more research from the Google Health team here.

Using AI to improve breast cancer screening

Breast cancer is a condition that affects far too many women across the globe. More than 55,000 people in the U.K. are diagnosed with breast cancer each year, and about 1 in 8 women in the U.S. will develop the disease in their lifetime. 

Digital mammography, or X-ray imaging of the breast, is the most common method to screen for breast cancer, with over 42 million exams performed each year in the U.S. and U.K. combined. But despite the wide usage of digital mammography, spotting and diagnosing breast cancer early remains a challenge. 

Reading these X-ray images is a difficult task, even for experts, and can often result in both false positives and false negatives. In turn, these inaccuracies can lead to delays in detection and treatment, unnecessary stress for patients and a higher workload for radiologists who are already in short supply.

Over the last two years, we’ve been working with leading clinical research partners in the U.K. and U.S. to see if artificial intelligence could improve the detection of breast cancer. Today, we’re sharing our initial findings, which have been published in Nature. These findings show that our AI model spotted breast cancer in de-identified screening mammograms (where identifiable information has been removed) with greater accuracy, fewer false positives, and fewer false negatives than experts. This sets the stage for future applications where the model could potentially support radiologists performing breast cancer screenings.

Our research

In collaboration with colleagues at DeepMind, Cancer Research UK Imperial Centre, Northwestern University and Royal Surrey County Hospital, we set out to see if artificial intelligence could support radiologists to spot the signs of breast cancer more accurately. 

The model was trained and tuned on a representative data set composed of de-identified mammograms from more than 76,000 women in the U.K. and more than 15,000 women in the U.S., to see if it could learn to spot signs of breast cancer in the scans. The model was then evaluated on a separate de-identified data set of more than 25,000 women in the U.K. and over 3,000 women in the U.S. In this evaluation, our system produced a 5.7 percent reduction of false positives in the U.S., and a 1.2 percent reduction in the U.K. It produced a 9.4 percent reduction in false negatives in the U.S., and a 2.7 percent reduction in the U.K.
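For context on the terms used here, the sketch below shows one common way to compute false positive and false negative rates for two sets of decisions on the same cases. The post does not spell out exactly how the reductions above were calculated, so this is only an illustration of the quantities involved; the cases and decisions are hypothetical.

```python
def error_rates(predictions, truths):
    """Return (false positive rate, false negative rate) for binary decisions."""
    false_positives = sum(1 for p, t in zip(predictions, truths) if p == 1 and t == 0)
    false_negatives = sum(1 for p, t in zip(predictions, truths) if p == 0 and t == 1)
    return false_positives / truths.count(0), false_negatives / truths.count(1)


# Hypothetical ground truth (1 = cancer confirmed) and two sets of screening decisions.
truths = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
reader_decisions = [1, 1, 0, 0, 1, 1, 0, 0, 0, 0]
model_decisions = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

reader_fpr, reader_fnr = error_rates(reader_decisions, truths)
model_fpr, model_fnr = error_rates(model_decisions, truths)

# Differences expressed in percentage points (an assumed convention for this sketch).
fp_reduction_points = 100 * (reader_fpr - model_fpr)
fn_reduction_points = 100 * (reader_fnr - model_fnr)
```

A false positive here is a healthy case flagged for follow-up, and a false negative is a cancer that was not flagged; a screening system has to trade the two off, which is why both are reported.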

We also wanted to see if the model could generalize to other healthcare systems. To do this, we trained the model only on the data from the women in the U.K. and then evaluated it on the data set from women in the U.S. In this separate experiment, there was a 3.5 percent reduction in false positives and an 8.1 percent reduction in false negatives, showing the model’s potential to generalize to new clinical settings while still performing at a higher level than experts. 

This is a visualization of tumor growth and metastatic spread in breast cancer. Screening aims to detect breast cancer early, before symptoms develop.

Notably, when making its decisions, the model received less information than human experts did. The human experts (in line with routine practice) had access to patient histories and prior mammograms, while the model only processed the most recent anonymized mammogram with no extra information. Despite working from these X-ray images alone, the model surpassed individual experts in accurately identifying breast cancer.

Next steps

Looking forward to future applications, there are some promising signs that the model could potentially increase the accuracy and efficiency of screening programs, as well as reduce wait times and stress for patients. Google’s Chief Financial Officer Ruth Porat shared her optimism around potential technological breakthroughs in this area in a post in October reflecting on her personal experience with breast cancer.

But getting there will require continued research, prospective clinical studies and regulatory approval to understand and prove how software systems inspired by this research could improve patient care.

This work is the latest strand of our research looking into detection and diagnosis of breast cancer, not just within the scope of radiology, but also pathology. In 2017, we published early findings showing how our models can accurately detect metastatic breast cancer from lymph node specimens. Last year, we also developed a deep learning algorithm that could help doctors spot breast cancer more quickly and accurately in pathology slides.

We’re looking forward to working with our partners in the coming years to translate our machine learning research into tools that benefit clinicians and patients.

Lessons Learned from Developing ML for Healthcare

Machine learning (ML) methods are not new in medicine -- traditional techniques, such as decision trees and logistic regression, were commonly used to derive established clinical decision rules (for example, the TIMI Risk Score for estimating patient risk after a coronary event). In recent years, however, there has been a tremendous surge in leveraging ML for a variety of medical applications, such as predicting adverse events from complex medical records, and improving the accuracy of genomic sequencing. In addition to detecting known diseases, ML models can tease out previously unknown signals, such as cardiovascular risk factors and refractive error from retinal fundus photographs.

Beyond developing these models, it’s important to understand how they can be incorporated into medical workflows. Previous research indicates that doctors assisted by ML models can be more accurate than either doctors or models alone in grading diabetic eye disease and diagnosing metastatic breast cancer. Similarly, doctors are able to leverage ML-based tools in an interactive fashion to search for similar medical images, providing further evidence that doctors can work effectively with ML-based assistive tools.

In an effort to improve guidance for research at the intersection of ML and healthcare, we have written a pair of articles, published in Nature Materials and the Journal of the American Medical Association (JAMA). The first is for ML practitioners to better understand how to develop ML solutions for healthcare, and the other is for doctors who desire a better understanding of whether ML could help improve their clinical work.

How to Develop Machine Learning Models for Healthcare
In “How to develop machine learning models for healthcare” (pdf), published in Nature Materials, we discuss the importance of ensuring that the needs specific to the healthcare environment inform the development of ML models for that setting. This should be done throughout the process of developing technologies for healthcare applications, from problem selection, data collection and ML model development to validation and assessment, deployment and monitoring.

The first consideration is how to identify a healthcare problem for which there is both an urgent clinical need and for which predictions based on ML models will provide actionable insight. For example, ML for detecting diabetic eye disease can help alleviate the screening workload in parts of the world where diabetes is prevalent and the number of medical specialists is insufficient. Once the problem has been identified, one must be careful with data curation to ensure that the ground truth labels, or “reference standard”, applied to the data are reliable and accurate. This can be accomplished by validating labels via comparison to expert interpretation of the same data, such as retinal fundus photographs, or through an orthogonal procedure, such as a biopsy to confirm radiologic findings. This is particularly important since a high-quality reference standard is essential both for training useful models and for accurately measuring model performance. Therefore, it is critical that ML practitioners work closely with clinical experts to ensure the rigor of the reference standard used for training and evaluation.

Validation of model performance is also substantially different in healthcare, because the problem of distributional shift can be pronounced. In contrast to typical ML studies where a single random test split is common, the medical field values validation using multiple independent evaluation datasets, each with different patient populations that may exhibit differences in demographics or disease subtypes. Because the specifics depend on the problem, ML practitioners should work closely with clinical experts to design the study, with particular care in ensuring that the model validation and performance metrics are appropriate for the clinical setting.

Integration of the resulting assistive tools also requires thoughtful design to ensure seamless workflow integration, with consideration for measurement of the impact of these tools on diagnostic accuracy and workflow efficiency. Importantly, there is substantial value in prospective study of these tools in real patient care to better understand their real-world impact.

Finally, even after validation and workflow integration, the journey towards deployment is just beginning: regulatory approval and continued monitoring for unexpected error modes or adverse events in real use remain ahead.
Two examples of the translational process of developing, validating, and implementing ML models for healthcare based on our work in detecting diabetic eye disease and metastatic breast cancer.
Empowering Doctors to Better Understand Machine Learning for Healthcare
In “Users’ Guide to the Medical Literature: How to Read Articles that use Machine Learning,” published in JAMA, we summarize key ML concepts to help doctors evaluate ML studies for suitability of inclusion in their workflow. The goal of this article is to demystify ML, to assist doctors who need to use ML systems to understand their basic functionality, when to trust them, and their potential limitations.

The central questions doctors ask when evaluating any study, whether ML or not, remain: Was the reference standard reliable? Was the evaluation unbiased, such as assessing for both false positives and false negatives, and performing a fair comparison with clinicians? Does the evaluation apply to the patient population that I see? How does the ML model help me in taking care of my patients?

In addition to these questions, ML models should also be scrutinized to determine whether the hyperparameters used in their development were tuned on a dataset independent of that used for final model evaluation. This is particularly important, since inappropriate tuning can lead to substantial overestimation of performance, e.g., a sufficiently sophisticated model can be trained to completely memorize the training dataset and generalize poorly to new data. Ensuring that tuning was done appropriately requires being mindful of ambiguities in dataset naming, and in particular, using the terminology with which the audience is most familiar:
The intersection of two fields, ML and healthcare, creates ambiguity in the term “validation dataset”. An ML validation set is typically used to refer to the dataset used for hyperparameter tuning, whereas a “clinical” validation set is typically used for final evaluation. To reduce confusion, we have opted to refer to the (ML) validation set as the “tuning” set.
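The naming convention is easier to keep straight with a concrete split in hand. The following is a minimal sketch, assuming that data are split at the patient level and that the fractions shown are acceptable; the function name, the identifiers, and the default fractions are all hypothetical.

```python
import random


def three_way_split(patient_ids, tune_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle once, then carve out disjoint train, tuning, and test sets."""
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_tune = int(len(shuffled) * tune_fraction)
    test_ids = shuffled[:n_test]                   # final ("clinical validation") evaluation only
    tuning_ids = shuffled[n_test:n_test + n_tune]  # hyperparameter tuning (the "ML validation" set)
    train_ids = shuffled[n_test + n_tune:]         # model fitting
    return train_ids, tuning_ids, test_ids


# Hypothetical patient identifiers.
patients = [f"patient-{i:03d}" for i in range(100)]
train_ids, tuning_ids, test_ids = three_way_split(patients)
```

Splitting by patient rather than by image is an assumption of this sketch; it is meant to keep images from the same person from appearing on both sides of a split. Hyperparameters would be chosen using `tuning_ids` only, and `test_ids` would be touched once, for the final reported numbers.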
Future outlook
It is an exciting time to work on AI for healthcare. The “bench-to-bedside” path is a long one that requires researchers and experts from multiple disciplines to work together in this translational process. We hope that these two articles will promote mutual understanding of what is important for ML practitioners developing models for healthcare and what is emphasized by doctors evaluating these models, thus driving further collaborations between the fields and towards eventual positive impact on patient care.

Key contributors to these projects include Yun Liu, Po-Hsuan Cameron Chen, Jonathan Krause, and Lily Peng. The authors would like to acknowledge Greg Corrado and Avinash Varadarajan for their advice, and the Google Health team for their support.

Source: Google AI Blog

Developing Deep Learning Models for Chest X-rays with Adjudicated Image Labels

With millions of diagnostic examinations performed annually, chest X-rays are an important and accessible clinical imaging tool for the detection of many diseases. However, their usefulness can be limited by challenges in interpretation, which requires rapid and thorough evaluation of a two-dimensional image depicting complex, three-dimensional organs and disease processes. Indeed, early-stage lung cancers or pneumothoraces (collapsed lungs) can be missed on chest X-rays, leading to serious adverse outcomes for patients.

Advances in machine learning (ML) present an exciting opportunity to create new tools to help experts interpret medical images. Recent efforts have shown promise in improving lung cancer detection in radiology, prostate cancer grading in pathology, and differential diagnoses in dermatology. For chest X-ray images in particular, large, de-identified public image sets are available to researchers across disciplines, and have facilitated several valuable efforts to develop deep learning models for X-ray interpretation. However, obtaining accurate clinical labels for the very large image sets needed for deep learning can be difficult. Most efforts have either applied rule-based natural language processing (NLP) to radiology reports or relied on image review by individual readers, both of which may introduce inconsistencies or errors that can be especially problematic during model evaluation. Another challenge involves assembling datasets that represent an adequately diverse spectrum of cases (i.e., ensuring inclusion of both “hard” cases and “easy” cases that represent the full spectrum of disease presentation). Finally, some chest X-ray findings are non-specific and depend on clinical information about the patient to fully understand their significance. As such, establishing labels that are clinically meaningful and have consistent definitions can be a challenging component of developing machine learning models that use only the image as input. Without standardized and clinically meaningful datasets as well as rigorous reference standard methods, successful application of ML to interpretation of chest X-rays will be hindered.

To help address these issues, we recently published “Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation” in the journal Radiology. In this study we developed deep learning models to classify four clinically important findings on chest X-rays — pneumothorax, nodules and masses, fractures, and airspace opacities. These target findings were selected in consultation with radiologists and clinical colleagues, so as to focus on conditions that are both critical for patient care and for which chest X-ray images alone are an important and accessible first-line imaging study. Selection of these findings also allowed model evaluation using only de-identified images without additional clinical data.

Models were evaluated using thousands of held-out images from each dataset for which we collected high-quality labels using a panel-based adjudication process among board-certified radiologists. Four separate radiologists also independently reviewed the held-out images in order to compare radiologist accuracy to that of the deep learning models (using the panel-based image labels as the reference standard). For all four findings and across both datasets, the deep learning models demonstrated radiologist-level performance. We are sharing the adjudicated labels for the publicly available data here to facilitate additional research.

Data Overview
This work leveraged over 600,000 images sourced from two de-identified datasets. The first dataset was developed in collaboration with co-authors at the Apollo Hospitals, and consists of a diverse set of chest X-rays obtained over several years from multiple locations across the Apollo Hospitals network. The second dataset is the publicly available ChestX-ray14 image set released by the National Institutes of Health (NIH). This second dataset has served as an important resource for many machine learning efforts, yet has limitations stemming from issues with the accuracy and clinical interpretation of the currently available labels.
Chest X-ray depicting an upper left lobe pneumothorax identified by the model and the adjudication panel, but missed by the individual radiologist readers. Left: The original image. Right: The same image with the most important regions for the model prediction highlighted in orange.
Training Set Labels Using Deep Learning and Visual Image Review
For very large datasets consisting of hundreds of thousands of images, such as those needed to train highly accurate deep learning models, it is impractical to manually assign image labels. As such, we developed a separate, text-based deep learning model to extract image labels using the de-identified radiology reports associated with each X-ray. This NLP model was then applied to provide labels for over 560,000 images from the Apollo Hospitals dataset used for training the computer vision models.

To reduce noise from any errors introduced by the text-based label extraction and also to provide the relevant labels for a substantial number of the ChestX-ray14 images, approximately 37,000 images across the two datasets were visually reviewed by radiologists. These were separate from the NLP-based labels and helped to ensure high quality labels across such a large, diverse set of training images.

Creating and Sharing Improved Reference Standard Labels
To generate high-quality reference standard labels for model evaluation, we utilized a panel-based adjudication process, whereby three radiologists reviewed all final tune and test set images and resolved disagreements through discussion. This often allowed difficult findings that were initially only detected by a single radiologist to be identified and documented appropriately. To reduce the risk of bias based on any individual radiologist’s personality or seniority, the discussions took place anonymously via an online discussion and adjudication system.

Because the lack of available adjudicated labels was a significant initial barrier to our work, we are sharing with the research community all of the adjudicated labels for the publicly available ChestX-ray14 dataset, including 2,412 training/validation set images and 1,962 test set images (4,374 images in total). We hope that these labels will facilitate future machine learning efforts and enable better apples-to-apples comparisons between machine learning models for chest X-ray interpretation.

Future Outlook
This work presents several contributions: (1) releasing adjudicated labels for images from a publicly available dataset; (2) a method to scale accurate labeling of training data using a text-based deep learning model; (3) evaluation using a diverse set of images with expert-adjudicated reference standard labels; and ultimately (4) radiologist-level performance of deep learning models for clinically important findings on chest X-rays.

However, with regard to model performance, achieving expert-level accuracy on average is just a part of the story. Even though overall accuracy for the deep learning models was consistently similar to that of radiologists for any given finding, performance for both varied across datasets. For example, the sensitivity for detecting pneumothorax among radiologists was approximately 79% for the ChestX-ray14 images, but was only 52% for the same radiologists on the other dataset, suggesting a more difficult collection of cases in the latter. This highlights the importance of validating deep learning tools on multiple, diverse datasets and eventually across the patient populations and clinical settings in which any model is intended to be used.
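For readers less familiar with the metric, sensitivity is the fraction of truly positive cases that a reader (or model) flags as positive. The sketch below computes it separately per dataset. The labels and reads are toy values invented for illustration, not data from the study, and the dataset names are placeholders.

```python
def sensitivity(labels, predictions):
    """True positive rate: TP / (TP + FN), counted over reference-positive cases."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp + fn == 0:
        raise ValueError("no positive cases in the reference standard")
    return tp / (tp + fn)

# Toy example (hypothetical): the same reader evaluated on two datasets.
datasets = {
    "dataset_a": {"labels": [1, 1, 1, 1, 0, 0], "reads": [1, 1, 1, 0, 0, 1]},
    "dataset_b": {"labels": [1, 1, 1, 1, 0, 0], "reads": [1, 1, 0, 0, 0, 0]},
}

per_dataset = {
    name: sensitivity(d["labels"], d["reads"]) for name, d in datasets.items()
}
```

Reporting the metric per dataset, rather than pooled, is what exposes differences in case difficulty like the one described above.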

The performance differences between datasets also emphasize the need for standardized evaluation image sets with accurate reference standards in order to allow comparison across studies. For example, if two different models for the same finding were evaluated using different datasets, comparing performance would be of minimal value without knowing additional details such as the case mix, model error modes, or radiologist performance on the same cases.

Finally, the model often identified findings that were consistently missed by radiologists, and vice versa. As such, strategies that combine the unique “skills” of both the deep learning systems and human experts are likely to hold the most promise for realizing the potential of AI applications in medical image interpretation.

Key contributors to this project at Google include Sid Mittal, Gavin Duggan, Anna Majkowska, Scott McKinney, Andrew Sellergren, David Steiner, Krish Eswaran, Po-Hsuan Cameron Chen, Yun Liu, Shravya Shetty, and Daniel Tse. Significant contributions and input were also made by radiologist collaborators Joshua Reicher, Alexander Ding, and Sreenivasa Raju Kalidindi. The authors would also like to acknowledge many members of the Google Health radiology team including Jonny Wong, Diego Ardila, Zvika Ben-Haim, Rory Sayres, Shahar Jamshy, Shabir Adeel, Mikhail Fomitchev, Akinori Mitani, Quang Duong, William Chen and Sahar Kazemzadeh. Sincere appreciation also goes to the many radiologists who enabled this work through their expert image interpretation efforts throughout the project.

Source: Google AI Blog

Tools to help healthcare providers deliver better care

There has been a lot of interest around our collaboration with Ascension. As a physician, I understand. Health is incredibly personal, and your health information should be private to you and the people providing your care. 

That’s why I want to clarify what our teams are doing, why we’re doing it, and how it will help your healthcare providers—and you. 

Doctors and nurses love caring for patients, but aren’t always equipped with the tools they need to thrive in their mission. We have all seen headlines like "Why doctors hate their computers," with complaints about having to use "a disconnected patchwork" that makes finding critical health information like finding a needle in the haystack. The average U.S. health system has 18 electronic medical record systems, and our doctors and nurses feel like they are "data clerks" rather than healers. 

Google has spent two decades on similar problems for consumers, building products such as Search, Translate and Gmail, and we believe we can adapt our technology to help. That’s why we’re building an intelligent suite of tools to help doctors, nurses, and other providers take better care of patients, leveraging our expertise in organizing information. 

One of those tools aims to make health records more useful, more accessible and more searchable by pulling them into a single, easy-to-use interface for doctors. I mentioned this during my presentation last month at the HLTH Conference. Ascension is the first partner where we are working with the frontline staff to pilot this tool.

Google Health - Tools to help healthcare providers deliver better care

This effort is challenging. Health information is incredibly complex—there are misspellings, different ways of saying the same thing, handwritten scribbles, and faxes. Healthcare IT systems also don’t talk well to each other and this keeps doctors and nurses from taking the best possible care of you. 

Policymakers and regulators across the world (e.g., CMS, HHS, the NHS, and EC) have called this out as an important issue. We’ve committed to help, and it’s why we built this system on interoperable standards.

To deliver such a tool to providers, the system must operate on patients' records. This is what people have been asking about in the context of our Ascension partnership, and why we want to clarify how we handle that data.

As we noted in an earlier post, our work adheres to strict regulations on handling patient data, and our Business Associate Agreement with Ascension ensures their patient data cannot be used for any other purpose than for providing our services—this means it’s never used for advertising. We’ve also published a white paper around how customer data is encrypted and isolated in the cloud. 

To ensure that our tools are safe for Ascension doctors and nurses treating real patients, members of our team might come into contact with identifiable patient data. Because of this, we have strict controls for the limited Google employees who handle such data:

  • We develop and test our system on synthetic (fake) data and openly available datasets.

  • To configure, test, tune and maintain the service in a clinical setting, a limited number of screened and qualified Google staff may be exposed to real data. These staff undergo HIPAA and medical ethics training, and are individually and explicitly approved by Ascension for a limited time.

  • We have technical controls to further enhance data privacy. Data is accessible in a strictly controlled environment with audit trails—these controls are designed to prevent the data from leaving this environment and access to patient data is monitored and auditable.

  • We will further prioritize the development of technology that reduces the number of engineers who need access to patient data (similar to our external redaction technology).

  • We also participate in external certifications, like ISO 27001, where independent third-party auditors come and check our processes, including information security controls for these tools.

I graduated from medical school in 1989. I've seen tremendous progress in healthcare over the ensuing decades, but this progress has also brought with it challenges of information overload that have taken doctors’ and nurses’ attentions away from the patients they are called to serve. I believe technology has a major role to play in reversing this trend, while also improving how care is delivered in ways that can save lives. 

New Insights into Human Mobility with Privacy Preserving Aggregation

Understanding human mobility is crucial for predicting epidemics, urban and transit infrastructure planning, understanding people’s responses to conflict and natural disasters and other important domains. Formerly, the state-of-the-art in mobility data was based on cell carrier logs or location "check-ins", and was therefore available only in limited areas — where the telecom provider is operating. As a result, cross-border movement and long-distance travel were typically not captured, because users tend not to use their SIM card outside the country covered by their subscription plan and datasets are often bound to specific regions. Additionally, such measures involved considerable time lags and were available only within limited time ranges and geographical areas.

In contrast, de-identified aggregate flows of populations around the world can now be computed from phones' location sensors at a uniform spatial resolution. This metric has the potential to be extremely useful for urban planning since it can be measured in a direct and timely way. The use of de-identified and aggregated population flow data collected at a global level via smartphones could shed additional light on city organization, for example, while requiring significantly fewer resources than existing methods.

In “Hierarchical Organization of Urban Mobility and Its Connection with City Livability”, we show that these mobility patterns — statistics on how populations move about in aggregate — indicate a higher use of public transportation, improved walkability, lower pollutant emissions per capita, and better health indicators, including easier accessibility to hospitals. This work, which appears in Nature Communications, contributes to a better characterization of city organization and supports a stronger quantitative perspective in the efforts to improve urban livability and sustainability.
Visualization of privacy-first computation of the mobility map. Individual data points are automatically aggregated together with differential privacy noise added. Then, flows of these aggregate and obfuscated populations are studied.
Computing a Global Mobility Map While Preserving User Privacy
In line with our AI principles, we have designed a method for analyzing population mobility with privacy-preserving techniques at its core. To ensure that no individual user’s journey can be identified, we create representative models of aggregate data by employing a technique called differential privacy, together with k-anonymity, to aggregate population flows over time. Initially implemented in 2014, this approach to differential privacy intentionally adds random “noise” to the data in a way that maintains both users' privacy and the data's accuracy at an aggregate level. We use this method to aggregate data collected from smartphones of users who have deliberately chosen to opt in to Location History, in order to better understand global patterns of population movements.

The model only considers de-identified location readings aggregated to geographical areas of predetermined sizes (e.g., S2 cells). It "snaps" each reading into a spacetime bucket by discretizing time into longer intervals (e.g., weeks) and latitude/longitude into a unique identifier of the geographical area. Aggregating into these large spacetime buckets goes beyond protecting individual privacy — it can even protect the privacy of communities.
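As a rough illustration of the "snapping" step, the sketch below maps a reading to a (cell, week) bucket. It is not the production pipeline: a fixed latitude/longitude grid stands in for S2 cells, and the cell size, the week definition (7-day blocks counted from the Unix epoch), and all function names are assumptions made for this example.

```python
import math
from datetime import datetime, timezone

CELL_DEGREES = 0.05  # hypothetical cell size; a plain grid stands in for S2 cells
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def cell_id(lat, lng, cell_degrees=CELL_DEGREES):
    """Map a coordinate to the identifier of the grid cell that contains it."""
    row = math.floor((lat + 90.0) / cell_degrees)
    col = math.floor((lng + 180.0) / cell_degrees)
    return f"{row}:{col}"

def week_index(timestamp):
    """Number of whole 7-day blocks between the epoch and the timestamp."""
    return (timestamp - EPOCH).days // 7

def snap(lat, lng, timestamp):
    """Discretize one reading into a coarse spacetime bucket."""
    return (cell_id(lat, lng), week_index(timestamp))

# Two nearby readings a day apart land in the same bucket.
a = snap(48.8566, 2.3522, datetime(2019, 11, 12, 9, 0, tzinfo=timezone.utc))
b = snap(48.8600, 2.3600, datetime(2019, 11, 13, 18, 30, tzinfo=timezone.utc))
# A reading one week later lands in a different bucket.
c = snap(48.8566, 2.3522, datetime(2019, 11, 20, 9, 0, tzinfo=timezone.utc))
```

After snapping, downstream steps only ever see the bucket, never the original coordinate or time of day.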

Finally, for each pair of geographical areas, the system computes the relative flow between the areas over a given time interval, applies differential privacy filters, and outputs the global, anonymized, and aggregated mobility map. The dataset is generated only once and only mobility flows involving a sufficiently large number of accounts are processed by the model. This design is limited to heavily aggregated flows of populations, such as that already used as a vital source of information for estimates of live traffic and parking availability, which protects individual data from being manually identified. The resulting map is indexed for efficient lookup and used to fuel the modeling described below.
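The aggregation step can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the system described above and not a formally verified differential privacy mechanism: the `min_accounts` threshold, the `epsilon` value, the Laplace noise scale, and the trip tuple layout are all hypothetical, and the sketch treats each account as contributing at most once to a given flow while ignoring the fact that one account can appear in many flows.

```python
import random
from collections import defaultdict

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def aggregate_flows(trips, min_accounts=100, epsilon=1.0, seed=None):
    """Aggregate (account_id, origin, destination, week) tuples into noisy flows.

    Flows backed by fewer than min_accounts distinct accounts are dropped,
    and Laplace noise is added to the remaining counts.
    """
    rng = random.Random(seed)
    accounts_per_flow = defaultdict(set)
    for account_id, origin, destination, week in trips:
        accounts_per_flow[(origin, destination, week)].add(account_id)

    flows = {}
    for key, account_ids in accounts_per_flow.items():
        if len(account_ids) < min_accounts:
            continue  # too few accounts: never emit this flow
        noisy = len(account_ids) + laplace_noise(1.0 / epsilon, rng)
        flows[key] = max(0.0, noisy)  # clamp so a count is never negative
    return flows

# Toy input (hypothetical): 150 accounts move A -> B, only 5 move A -> C.
trips = [(i, "A", "B", 2601) for i in range(150)]
trips += [(i, "A", "C", 2601) for i in range(5)]
flows = aggregate_flows(trips, min_accounts=100, epsilon=1.0, seed=7)
```

Only the heavily populated A to B flow survives, and its published count is close to, but not exactly, the true value.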

Mobility Map Applications
Aggregate mobility of people in cities around the globe defines the city and, in turn, its impact on the people who live there. We define a metric, the flow hierarchy (Φ), derived entirely from the mobility map, that quantifies the hierarchical organization of cities. While hierarchies across cities have been extensively studied since Christaller’s work in the 1930s, for individual cities, the focus has been primarily on the differences between core and peripheral structures, as well as whether cities are mono- or poly-centric. Our results instead show that the reality is much more rich than previously thought. The mobility map enables a quantitative demonstration that cities lie across a spectrum of hierarchical organization that strongly correlates with a series of important quality of life indicators, including health and transportation.

Below we see an example of two cities — Paris and Los Angeles. Though they have almost the same population size, those two populations move in very different ways. Paris is mono-centric, with an "onion" structure that has a distinct high-mobility city center (red), which progressively decreases as we move away from the center (in order: orange, yellow, green, blue). On the other hand, Los Angeles is truly poly-centric, with a large number of high-mobility areas scattered throughout the region.
Mobility maps of Paris (left) and Los Angeles (right). Both cities have similar population sizes, but very different mobility patterns. Paris has an "onion" structure exhibiting a distinct center with a high degree of mobility (red) that progressively decreases as we move away from the center (in order: orange, yellow, green, blue). In contrast, Los Angeles has a large number of high-mobility areas scattered throughout the region.
More hierarchical cities — in terms of flows being primarily between hotspots of similar activity levels — have values of flow hierarchy Φ closer to the upper limit of 1 and tend to have greater levels of uniformity in their spatial distribution of movements, wider use of public transportation, higher levels of walkability, lower pollution emissions, and better indicators of various measures of health. Returning to our example, the flow hierarchy of Paris is Φ=0.93 (in the top quartile across all 174 cities sampled), while that of Los Angeles is 0.86 (bottom quartile).

We find that existing measures of urban structure, such as population density and sprawl composite indices, correlate with flow hierarchy, but in addition the flow hierarchy conveys comparatively more information that includes behavioral and socioeconomic factors.
Connecting flow hierarchy Φ with urban indicators in a sample of US cities. Proportion of trips as a function of Φ, broken down by modal share: private car, public transportation, and walking. Sample city names that appear in the plot: ATL (Atlanta), CHA (Charlotte), CHI (Chicago), HOU (Houston), LA (Los Angeles), MIN (Minneapolis), NY (New York City), and SF (San Francisco). We see that cities with higher flow hierarchy exhibit significantly higher rates of public transportation use, less car use, and more walkability.
Measures of urban sprawl require composite indices built up from much more detailed information on land use, population, density of jobs, and street geography among others (sometimes up to 20 different variables). In addition to the extensive data requirements, such metrics are also costly to obtain. For example, censuses and surveys require a massive deployment of resources in terms of interviews, and are only standardized at a country level, hindering the correct quantification of sprawl indices at a global scale. On the other hand, the flow hierarchy, being constructed from mobility information alone, is significantly less expensive to compile (involving only computer processing cycles), and is available in real-time.

Given the ongoing debate on the optimal structure of cities, the flow hierarchy introduces a different conceptual perspective compared to existing measures, and can shed new light on the organization of cities. From a public-policy point of view, we see that cities with a greater degree of mobility hierarchy tend to have more desirable urban indicators. Given that this hierarchy is a measure of proximity and direct connectivity between socioeconomic hubs, a possible direction could be to shape opportunity and demand in a way that facilitates a greater degree of hub-to-hub movement than a hub-to-spoke architecture. The proximity of hubs can be generated through appropriate land use, which can be shaped by data-driven zoning laws in terms of business, residence or service areas. The presence of efficient public transportation and lower use of cars is another important factor. Perhaps a combination of policies, such as congestion pricing used to disincentivize private transportation to socioeconomic hubs, along with building public transportation in a targeted fashion to directly connect the hubs, may well prove useful.

Next Steps
This work is part of our larger AI for Social Good efforts, a program that focuses Google's expertise on addressing humanitarian and environmental challenges. These mobility maps are only the first step toward making an impact in epidemiology, infrastructure planning, and disaster response, while ensuring high privacy standards.

The work discussed here goes to great lengths to ensure privacy is maintained. We are also working on newer techniques, such as on-device federated learning, to go a step further and enable computing aggregate flows without personal data leaving the device at all. By using distributed secure aggregation protocols or randomized responses, global flows can be computed without even the aggregator having knowledge of individual data points being aggregated. This technique has also been applied to help secure Chrome from malicious attacks.
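To give a feel for how randomized responses let an aggregator learn a population-level rate without trusting any single report, here is a minimal sketch of the classic technique. It is a textbook illustration, not the protocol used in this work or in Chrome; the `p_truth` parameter, the function names, and the toy population are assumptions.

```python
import random

def randomize(bit, p_truth, rng):
    """On-device step: report the true bit with probability p_truth,
    otherwise report a fair coin flip."""
    if rng.random() < p_truth:
        return bit
    return rng.randint(0, 1)

def estimate_rate(reports, p_truth):
    """Aggregator step: invert E[report] = p_truth * rate + (1 - p_truth) * 0.5."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) * 0.5) / p_truth

# Toy population (hypothetical): 30% of 20,000 devices hold a 1.
rng = random.Random(123)
true_bits = [1] * 6000 + [0] * 14000
reports = [randomize(bit, p_truth=0.5, rng=rng) for bit in true_bits]
estimated = estimate_rate(reports, p_truth=0.5)
```

Each individual report is deniable, since it may be a coin flip, yet the debiased average over many devices recovers the aggregate rate to within sampling error.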

This work resulted from a collaboration of Aleix Bassolas and José J. Ramasco from the Institute for Cross-Disciplinary Physics and Complex Systems (IFISC, CSIC-UIB), Brian Dickinson, Hugo Barbosa-Filho, Gourab Ghoshal, Surendra A. Hazarie, and Henry Kautz from the Computer Science Department and Ghoshal Lab at the University of Rochester, Riccardo Gallotti from the Bruno Kessler Foundation, and Xerxes Dotiwalla, Paul Eastham, Bryant Gipson, Onur Kucuktunc, Allison Lieber, and Adam Sadilek at Google.

The differential privacy library used in this work is open source and available on our GitHub repo.

Source: Google AI Blog

Breast cancer and tech…a reason for optimism

I was diagnosed with breast cancer twice, in 2001 and again in 2004. Thanks to early detection and access to extraordinary care—including multiple rounds of chemo, radiation and more surgery than any one person should ever have in a lifetime—I’m still here and able to write this piece. In fact, I’ve probably never been healthier. 

I remember receiving the news. I was initially terrified. Our three kids were only five, seven, and nine at the time of my first diagnosis, and all I wanted was to live to see them grow up. I’m grateful I had options and access to treatments, but no aspect of it was pleasant. Last year, I had the joy of seeing our youngest son graduate from college. In the years since I first learned of my cancer, there’s been remarkable progress in global health care, augmented with pioneering work from medical researchers and technology companies. I know how incredibly fortunate I am, but I also know that for far too many, a diagnosis comes too late and the best care is beyond reach. 

And that’s where Google has focused its work: to bring healthcare innovations to everyone. Working at Google, I have had a front-row seat to these technological breakthroughs.

During the past few years, teams at Google have applied artificial intelligence (AI) to problems in healthcare—from predicting patient outcomes in medical records to helping detect diseases like lung cancer. We’re still early on in developing these technologies, but the results are promising. 

When it comes to breast cancer, Google is looking at how AI can help specialists improve detection and diagnosis. Breast cancer is one of the most common cancers among women worldwide, taking the lives of more than 600,000 people each year. Thankfully, that number is on the decline because of huge advances in care. However, that number could be even lower if we continue to accelerate progress and make sure that progress reaches as many people as possible. Google hopes AI research will further fuel progress on both detection and diagnosis. 

Early detection depends on patients and technologies, such as mammography. Currently, we rely on mammograms to screen for cancer in otherwise healthy women, but thousands of cases go undiagnosed each year and thousands more result in confusing or worrying findings that are not cancer or are low risk. Today we can’t easily distinguish the cancers we need to find from those that are unlikely to cause further harm. We believe that technology can help with detection, and thus improve the experience for both patients and doctors.

Just as important as detecting cancer is determining how advanced and aggressive the cancer is. A process called staging helps determine how far the cancer has spread, which impacts the course of treatment. Staging largely depends on clinicians and radiologists looking at patient histories, physical examinations and images. In addition, pathologists examine tissue samples obtained from a biopsy to assess the microscopic appearance and biological properties of each individual patient’s cancer and judge aggressiveness. However, pathologic assessment is a laborious and costly process that, incredibly, continues to rely on an individual evaluating microscopic features in biological tissue with the human eye and microscope!

Last year, Google created a deep learning algorithm that could help pathologists assess tissue and detect the spread and extent of disease better in virtually every case. By pinpointing the location of the cancer more accurately, quickly and at a lower cost, care providers might be able to deliver better treatment for more patients. But doing this will require that these insights be paired with human intelligence and placed in the hands of skilled researchers, surgeons, oncologists, radiologists and others. Google’s research showed that the best results come when medical professionals and technology work together, rather than either working alone. 

During my treatment, I was taken care of by extraordinary teams at Memorial Sloan Kettering in New York where they had access to the latest developments in breast cancer care. My oncologist (and now good friend), Dr. Clifford Hudis, is now CEO of the American Society of Clinical Oncology (ASCO), which has developed a nonprofit big data initiative, CancerLinQ, to give oncologists and researchers access to health information to inform better care for everyone. He told me: “CancerLinQ seeks to identify hidden signals in the routine record of care from millions of de-identified patients so that doctors have deeper and faster insights into their own practices and opportunities for improvement.” He and his colleagues don't think they’ll be able to deliver optimally without robust AI. 

What medical professionals, like Dr. Hudis and his colleagues across ASCO and CancerLinQ, and engineers at companies like Google have accomplished since the time I joined the Club in 2001 is remarkable. 

I will always remember words passed on to me by another cancer survivor, which helped me throughout my treatment. He said when you’re having a good day and you’ve temporarily pushed the disease out of your mind, a little bird might land on your shoulder to remind you that you have cancer. Eventually, that bird comes around less and less. It took many years but I am relieved to say that I haven’t seen that bird in a long time, and I am incredibly grateful for that. I am optimistic that the combination of great doctors and technology could allow us to get rid of those birds for so many more people.