
An ML-Based Framework for COVID-19 Epidemiology

Over the past 20 months, the COVID-19 pandemic has had a profound impact on daily life, presented logistical challenges for businesses planning for supply and demand, and created difficulties for governments and organizations working to support communities with timely public health responses. While there have been well-studied epidemiology models that can help predict COVID-19 cases and deaths to help with these challenges, this pandemic has generated an unprecedented amount of real-time publicly-available data, which makes it possible to use more advanced machine learning techniques in order to improve results.

In "A prospective evaluation of AI-augmented epidemiology to forecast COVID-19 in the USA and Japan", accepted to npj Digital Medicine, we continued our previous work [1, 2, 3, 4] and proposed a framework designed to simulate the effect of certain policy changes on COVID-19 deaths and cases, such as school closings or a state-of-emergency at a US-state, US-county, and Japan-prefecture level, using only publicly-available data. We conducted a 2-month prospective assessment of our public forecasts, during which our US model tied or outperformed all other 33 models on COVID19 Forecast Hub. We also released a fairness analysis of the performance on protected sub-groups in the US and Japan. Like other Google initiatives to help with COVID-19 [1, 2, 3], we are releasing daily forecasts based on this work to the public for free, on the web [us, ja] and through BigQuery.

Prospective forecasts for the USA and Japan models. Ground truth cumulative deaths counts (green lines) are shown alongside the forecasts for each day. Each daily forecast contains a predicted increase in deaths for each day during the prediction window of 4 weeks (shown as colored dots, where shading shifting to yellow indicates days further from the date of prediction in the forecasting horizon, up to 4 weeks). Predictions of deaths are shown for the USA (above) and Japan (below).

The Model
Models for infectious diseases have been studied by epidemiologists for decades. Compartmental models are the most common, as they are simple, interpretable, and can fit different disease phases effectively. In a compartmental model, a population is divided into mutually exclusive groups, or compartments, based on disease status (such as susceptible, exposed, or recovered), and people flow between compartments as their disease status changes; the rates of change between compartments are modeled to fit the past data.

In this work, we propose a few extensions to the Susceptible-Exposed-Infectious-Removed (SEIR) type compartmental model. In an SEIR model, for example, susceptible people becoming exposed causes the susceptible compartment to decrease and the exposed compartment to increase, at a rate that depends on disease-spreading characteristics. Observed data for COVID-19 associated outcomes, such as confirmed cases, hospitalizations, and deaths, are used to train the compartmental model.
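To make the compartmental dynamics concrete, here is a minimal sketch of a plain discrete-time SEIR simulation, not the extended model from the paper; the rate parameters beta, sigma, and gamma are illustrative placeholders rather than learned values.

```python
import numpy as np

def simulate_seir(S0, E0, I0, R0, beta, sigma, gamma, days):
    """Minimal discrete-time SEIR simulation (illustrative only).

    beta:  transmission rate (S -> E), sigma: incubation rate (E -> I),
    gamma: removal rate (I -> R). The paper's framework learns such rates
    from covariates instead of fixing them as constants.
    """
    N = S0 + E0 + I0 + R0
    S, E, I, R = [S0], [E0], [I0], [R0]
    for _ in range(days):
        s, e, i, r = S[-1], E[-1], I[-1], R[-1]
        new_exposed = beta * s * i / N   # susceptible -> exposed
        new_infectious = sigma * e       # exposed -> infectious
        new_removed = gamma * i          # infectious -> removed
        S.append(s - new_exposed)
        E.append(e + new_exposed - new_infectious)
        I.append(i + new_infectious - new_removed)
        R.append(r + new_removed)
    return np.array([S, E, I, R])

# Example: 1M people, 100 initially infectious, 28-day horizon.
trajectory = simulate_seir(1_000_000, 0, 100, 0,
                           beta=0.4, sigma=0.2, gamma=0.1, days=28)
```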

Visual explanation of “compartmental” models in epidemiology. People “flow” between compartments. Real-world events, like policy changes and more ICU beds, change the rate of flow between compartments.

Our framework proposes a number of novel technical innovations:

  1. Learned transition rates: Instead of using static rates for transitions between compartments across all locations and times, we learn the rates with machine learning models that map informative covariates to rate values (a toy sketch appears after this list). This allows us to take advantage of the vast amount of available data with informative signals, such as Google's COVID-19 Community Mobility Reports, healthcare supply, demographics, and econometric features.
  2. Explainability: Our framework provides explainability for decision makers, offering insights on disease propagation trends via its compartmental structure, and suggesting which factors may be most important for driving compartmental transitions.
  3. Expanded compartments: We add hospitalization, ICU, ventilator, and vaccine compartments and demonstrate efficient training despite data sparsity.
  4. Information sharing across locations: As opposed to fitting to an individual location, we have a single model for all locations in a country (e.g., >3000 US counties) with distinct dynamics and characteristics, and we show the benefit of transferring information across locations.
  5. Seq2seq modeling: We use a sequence-to-sequence model with a novel partial teacher forcing approach that minimizes amplified growth of errors into the future.
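
To illustrate the first innovation, the sketch below replaces a fixed transmission rate with a bounded function of per-location covariates; the covariate names, weights, and functional form are hypothetical and are only meant to convey the idea of learned, covariate-dependent rates.

```python
import numpy as np

def learned_transmission_rate(covariates, weights, bias, rate_max=1.0):
    """Toy version of a machine-learned compartmental transition rate.

    Instead of a single fixed beta, the rate for each location/time is a
    bounded function of covariates (e.g., mobility, interventions,
    demographics). The weights would be fit to observed case/death data;
    here they are placeholders.
    """
    logit = covariates @ weights + bias
    return rate_max / (1.0 + np.exp(-logit))  # squash into (0, rate_max)

# Hypothetical covariates: [retail mobility change, % vaccinated, NPI stringency]
covariates = np.array([-0.25, 0.40, 0.70])
weights = np.array([1.2, -0.8, -1.5])         # illustrative "learned" weights
beta_t = learned_transmission_rate(covariates, weights, bias=0.1)
```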

Forecast Accuracy
Each day, we train models to predict COVID-19 associated outcomes (primarily deaths and cases) 28 days into the future. We report the mean absolute percentage error (MAPE) for both a country-wide score and a location-level score, with both cumulative values and weekly incremental values for COVID-19 associated outcomes.
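
For reference, MAPE on a set of forecasts can be computed as follows; the numbers are illustrative, not real forecast values.

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, lower is better.

    Assumes actual values are nonzero (e.g., cumulative deaths or cases).
    """
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return 100.0 * np.mean(np.abs((forecast - actual) / actual))

# Illustrative 4-week cumulative death counts for one location.
actual = [1200, 1350, 1480, 1600]
forecast = [1150, 1380, 1525, 1700]
print(f"MAPE: {mape(actual, forecast):.1f}%")
```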

We compare our framework with alternatives for the US from the COVID-19 Forecast Hub. In MAPE, our models outperform all 33 other models except one — the ensemble forecast that also includes our model’s predictions, where the difference is not statistically significant.

We also used prediction uncertainty to estimate whether a forecast is likely to be accurate. If we reject forecasts that the model considers uncertain, we can improve the accuracy of the forecasts that we do release. This is possible because our model has well-calibrated uncertainty.
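
One simple way to sketch this filtering, assuming each forecast comes with a predictive interval (the rule and threshold below are illustrative, not the exact criterion used for the public forecasts):

```python
import numpy as np

def filter_uncertain_forecasts(point, lower, upper, max_rel_width=0.5):
    """Keep only forecasts whose predictive interval is narrow enough.

    point / lower / upper are arrays of point forecasts and interval bounds.
    The relative-width rule is illustrative; the model relies on calibrated
    uncertainty rather than this exact cutoff.
    """
    point, lower, upper = map(np.asarray, (point, lower, upper))
    rel_width = (upper - lower) / np.maximum(point, 1e-9)
    return rel_width <= max_rel_width

point = np.array([100., 220., 80.])
lower = np.array([90., 120., 75.])
upper = np.array([115., 360., 88.])
print(filter_uncertain_forecasts(point, lower, upper))  # [ True False  True]
```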

Mean absolute percentage error (MAPE, the lower the better) decreases as we remove uncertain forecasts, increasing accuracy.

What-If Tool to Simulate Pandemic Management Policies and Strategies
In addition to understanding the most probable scenario given past data, decision makers are interested in how different decisions could affect future outcomes, for example, understanding the impact of school closures, mobility restrictions and different vaccination strategies. Our framework allows counterfactual analysis by replacing the forecasted values for selected variables with their counterfactual counterparts. The results of our simulations reinforce the risk of prematurely relaxing non-pharmaceutical interventions (NPIs) before rapid disease spread has been brought under control. Similarly, the Japan simulations show that maintaining the State of Emergency while having a high vaccination rate greatly reduces infection rates.
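
Conceptually, the what-if analysis re-runs the forecast after swapping the forecasted values of selected input variables for counterfactual ones; below is a minimal sketch, where the model object, its predict method returning NumPy arrays, and the covariate names are placeholders.

```python
def what_if_forecast(model, features, overrides):
    """Re-run a forecast with selected covariates replaced by counterfactuals.

    `features` maps covariate name -> future time series used by the model;
    `overrides` holds the counterfactual values (e.g., a stricter NPI level
    or a higher vaccination rate). `model.predict` is assumed to return a
    NumPy array of forecasted outcomes. All names here are hypothetical.
    """
    counterfactual = dict(features)
    counterfactual.update(overrides)
    baseline = model.predict(features)        # most probable scenario
    scenario = model.predict(counterfactual)  # what-if scenario
    pct_change = 100.0 * (scenario - baseline) / baseline
    return pct_change

# e.g. what_if_forecast(model, features, {"npi_stringency": stricter_series})
```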

What-if simulations on the percent change of predicted exposed individuals assuming different non-pharmaceutical interventions (NPIs) for the prediction date of March 1, 2021 in Texas, Washington and South Carolina. Increased NPI restrictions are associated with a larger % reduction in the number of exposed people.
What-if simulations on the percent change of predicted exposed individuals assuming different vaccination rates for the prediction date of March 1, 2021 in Texas, Washington and South Carolina. Increased vaccination rate also plays a key role to reduce exposed count in these cases.

Fairness Analysis
To ensure that our models do not create or reinforce unfairly biased decision making, in alignment with our AI Principles, we performed a fairness analysis separately for forecasts in the US and Japan by quantifying whether the model's accuracy was worse on protected sub-groups. These categories include age, gender, income, and ethnicity in the US, and age, gender, income, and country of origin in Japan. In all cases, we demonstrated no consistent pattern of errors among these groups once we controlled for the number of COVID-19 deaths and cases that occur in each subgroup.
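
A sketch of the normalization step: each subgroup's absolute forecast error is divided by that subgroup's actual COVID-19 burden, so groups are not penalized simply for having larger counts. The column names and data below are illustrative.

```python
import pandas as pd

def normalized_subgroup_errors(df):
    """Per-subgroup absolute error normalized by the subgroup's case burden.

    `df` is assumed to have columns: subgroup, actual, forecast.
    """
    df = df.copy()
    df["abs_error"] = (df["forecast"] - df["actual"]).abs()
    grouped = df.groupby("subgroup").agg(
        total_error=("abs_error", "sum"),
        total_actual=("actual", "sum"),
    )
    grouped["normalized_error"] = grouped["total_error"] / grouped["total_actual"]
    return grouped

df = pd.DataFrame({
    "subgroup": ["low_income", "low_income", "high_income", "high_income"],
    "actual": [500, 650, 120, 90],
    "forecast": [540, 610, 130, 100],
})
print(normalized_subgroup_errors(df))
```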

Normalized errors by median income. The comparison between the two shows that patterns of errors don't persist once errors are normalized by cases. Left: Normalized errors by median income for the US. Right: Normalized errors by median income for Japan.

Real-World Use Cases
In addition to quantitative analyses to measure the performance of our models, we conducted a structured survey in the US and Japan to understand how organizations were using our model forecasts. In total, seven organizations responded with the following results on the applicability of the model.

  • Organization type: Academia (3), Government (2), Private industry (2)
  • Main user job role: Analyst/Scientist (3), Healthcare professional (1), Statistician (2), Managerial (1)
  • Location: USA (4), Japan (3)
  • Predictions used: Confirmed cases (7), Death (4), Hospitalizations (4), ICU (3), Ventilator (2), Infected (2)
  • Model use case: Resource allocation (2), Business planning (2), Scenario planning (1), General understanding of COVID spread (1), Confirm existing forecasts (1)
  • Frequency of use: Daily (1), Weekly (1), Monthly (1)
  • Was the model helpful?: Yes (7)

To share a few examples, in the US, the Harvard Global Health Institute and Brown School of Public Health used the forecasts to help create COVID-19 testing targets that were used by the media to help inform the public. The US Department of Defense used the forecasts to help determine where to allocate resources, and to help take specific events into account. In Japan, the model was used to make business decisions. One large company with stores in more than 20 prefectures used the forecasts to improve its sales planning and to adjust store hours.

Limitations and next steps
Our approach has a few limitations. First, it is limited by available data, and we are only able to release daily forecasts as long as there is reliable, high-quality public data. For instance, public transportation usage could be very useful, but that information is not publicly available. Second, compartmental models have limited model capacity and cannot capture very complex dynamics of COVID-19 disease propagation. Third, the distribution of case counts and deaths is very different between the US and Japan. For example, most of Japan's COVID-19 cases and deaths have been concentrated in a few of its 47 prefectures, with the others experiencing low values. This means that our per-prefecture models, which are trained to perform well across all Japanese prefectures, often have to strike a delicate balance between avoiding overfitting to noise and extracting useful supervision from these relatively COVID-19-free prefectures.

We have updated our models to take into account large changes in disease dynamics, such as the increasing number of vaccinations. We are also expanding to new engagements with city governments, hospitals, and private organizations. We hope that our public releases continue to help the public and policy-makers address the challenges of the ongoing pandemic, and we hope that our method will be useful to epidemiologists and public health officials in this and future health crises.

Acknowledgements
This paper was the result of hard work from a variety of teams within Google and collaborators around the globe. We'd especially like to thank our paper co-authors from the School of Medicine at Keio University, Graduate School of Public Health at St Luke’s International University, and Graduate School of Medicine at The University of Tokyo.

Source: Google AI Blog


Self-Supervised Learning Advances Medical Image Classification

In recent years, there has been increasing interest in applying deep learning to medical imaging tasks, with exciting progress in various applications like radiology, pathology and dermatology. Despite the interest, it remains challenging to develop medical imaging models, because high-quality labeled data is often scarce due to the time-consuming effort needed to annotate medical images. Given this, transfer learning is a popular paradigm for building medical imaging models. With this approach, a model is first pre-trained using supervised learning on a large labeled dataset (like ImageNet) and then the learned generic representation is fine-tuned on in-domain medical data.

Other more recent approaches that have proven successful in natural image recognition tasks, especially when labeled examples are scarce, use self-supervised contrastive pre-training, followed by supervised fine-tuning (e.g., SimCLR and MoCo). In pre-training with contrastive learning, generic representations are learned by simultaneously maximizing agreement between differently transformed views of the same image and minimizing agreement between transformed views of different images. Despite their successes, these contrastive learning methods have received limited attention in medical image analysis and their efficacy is yet to be explored.
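
As a refresher on the contrastive objective behind SimCLR, the following is a simplified NumPy version of the NT-Xent loss; it is a generic illustration, not the training code used in the paper.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.1):
    """Simplified NT-Xent contrastive loss for a batch of paired embeddings.

    z1[i] and z2[i] are embeddings of two augmented views of the same image
    (a positive pair); all other images in the batch act as negatives.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    z = np.concatenate([z1, z2], axis=0)          # 2N x d
    sim = z @ z.T / temperature                   # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                # exclude self-similarity
    n = len(z1)
    positives = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), positives].mean()

loss = nt_xent_loss(np.random.randn(8, 128), np.random.randn(8, 128))
```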

In “Big Self-Supervised Models Advance Medical Image Classification”, to appear at the International Conference on Computer Vision (ICCV 2021), we study the effectiveness of self-supervised contrastive learning as a pre-training strategy within the domain of medical image classification. We also propose Multi-Instance Contrastive Learning (MICLe), a novel approach that generalizes contrastive learning to leverage special characteristics of medical image datasets. We conduct experiments on two distinct medical image classification tasks: dermatology condition classification from digital camera images (27 categories) and multilabel chest X-ray classification (5 categories). We observe that self-supervised learning on ImageNet, followed by additional self-supervised learning on unlabeled domain-specific medical images, significantly improves the accuracy of medical image classifiers. Specifically, we demonstrate that self-supervised pre-training outperforms supervised pre-training, even when the full ImageNet dataset (14M images and 21.8K classes) is used for supervised pre-training.

SimCLR and Multi Instance Contrastive Learning (MICLe)
Our approach consists of three steps: (1) self-supervised pre-training on unlabeled natural images (using SimCLR); (2) further self-supervised pre-training using unlabeled medical data (using either SimCLR or MICLe); followed by (3) task-specific supervised fine-tuning using labeled medical data.

Our approach comprises three steps: (1) Self-supervised pre-training on unlabeled ImageNet using SimCLR (2) Additional self-supervised pre-training using unlabeled medical images. If multiple images of each medical condition are available, a novel Multi-Instance Contrastive Learning (MICLe) strategy is used to construct more informative positive pairs based on different images. (3) Supervised fine-tuning on labeled medical images. Note that unlike step (1), steps (2) and (3) are task and dataset specific.

After the initial pre-training with SimCLR on unlabeled natural images is complete, we train the model to capture the special characteristics of medical image datasets. This, too, can be done with SimCLR, but this method constructs positive pairs only through augmentation and does not readily leverage patients' metadata for positive pair construction. Alternatively, we use MICLe, which uses multiple images of the underlying pathology for each patient case, when available, to construct more informative positive pairs for self-supervised learning. Such multi-instance data is often available in medical imaging datasets — e.g., frontal and lateral views of mammograms, retinal fundus images from each eye, etc.

Given multiple images of a given patient case, MICLe constructs a positive pair for self-supervised contrastive learning by drawing two crops from two distinct images from the same patient case. Such images may be taken from different viewing angles and show different body parts with the same underlying pathology. This presents a great opportunity for self-supervised learning algorithms to learn representations that are robust to changes of viewpoint, imaging conditions, and other confounding factors in a direct way. MICLe does not require class label information and only relies on different images of an underlying pathology, the type of which may be unknown.
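
The pair-construction logic can be sketched as below; the data layout (`images_by_case`) and the `crop_fn` augmentation are hypothetical stand-ins for the dataset and augmentation pipeline actually used.

```python
import random

def micle_positive_pair(images_by_case, crop_fn):
    """Draw a MICLe-style positive pair for one patient case.

    `images_by_case` maps a case id to a list of images of the same
    underlying pathology (e.g., different viewing angles). When a case has
    multiple images, the two crops come from two *distinct* images; with a
    single image this reduces to standard augmentation-based pairing.
    """
    case_id = random.choice(list(images_by_case))
    images = images_by_case[case_id]
    if len(images) >= 2:
        img_a, img_b = random.sample(images, 2)  # two distinct images
    else:
        img_a = img_b = images[0]                # fall back to one image
    return crop_fn(img_a), crop_fn(img_b)
```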

MICLe generalizes contrastive learning to leverage special characteristics of medical image datasets (patient metadata) to create realistic augmentations, yielding further performance boost of image classifiers.

Combining these self-supervised learning strategies, we show that even in a highly competitive production setting we can achieve a sizable gain of 6.7% in top-1 accuracy on dermatology skin condition classification and an improvement of 1.1% in mean AUC on chest X-ray classification, outperforming strong supervised baselines pre-trained on ImageNet (the prevailing protocol for training medical image analysis models). In addition, we show that self-supervised models are robust to distribution shift and can learn efficiently with only a small number of labeled medical images.

Comparison of Supervised and Self-Supervised Pre-training
Despite its simplicity, we observe that pre-training with MICLe consistently improves the performance of dermatology classification over the original method of pre-training with SimCLR under different pre-training dataset and base network architecture choices. Using MICLe for pre-training translates to a (1.18 ± 0.09)% increase in top-1 accuracy for dermatology classification over using SimCLR. The results demonstrate the benefit accrued from utilizing additional metadata or domain knowledge to construct more semantically meaningful augmentations for contrastive pre-training. In addition, our results suggest that wider and deeper models yield greater performance gains, with ResNet-152 (2x width) models often outperforming ResNet-50 (1x width) models or smaller counterparts.

Comparison of supervised and self-supervised pre-training, followed by supervised fine-tuning using two architectures on dermatology and chest X-ray classification. Self-supervised learning utilizes unlabeled domain-specific medical images and significantly outperforms supervised ImageNet pre-training.

Improved Generalization with Self-Supervised Models
For each task, we perform pre-training and fine-tuning using the in-domain unlabeled and labeled data, respectively. We also use another dataset obtained in a different clinical setting as a shifted dataset to further evaluate the robustness of our method to out-of-domain data. For the chest X-ray task, we note that self-supervised pre-training with either ImageNet or CheXpert data improves generalization, but stacking them both yields further gains. As expected, we also note that when only using ImageNet for self-supervised pre-training, the model performs worse compared to using only in-domain data for pre-training.

To test the performance under distribution shift, for each task, we held out additional labeled datasets for testing that were collected under different clinical settings. We find that the performance improvement in the distribution-shifted dataset (ChestX-ray14) by using self-supervised pre-training (both using ImageNet and CheXpert data) is more pronounced than the original improvement on the CheXpert dataset. This is a valuable finding, as generalization under distribution shift is of paramount importance to clinical applications. On the dermatology task, we observe similar trends for a separate shifted dataset that was collected in skin cancer clinics and had a higher prevalence of malignant conditions. This demonstrates that the robustness of the self-supervised representations to distribution shifts is consistent across tasks.

Evaluation of models on distribution-shifted datasets for the chest X-ray interpretation task. We use the model trained on in-domain data to make predictions on an additional shifted dataset without any further fine-tuning (zero-shot transfer learning). We observe that self-supervised pre-training leads to better representations that are more robust to distribution shifts.
Evaluation of models on distribution-shifted datasets for the dermatology task. Our results generally suggest that self-supervised pre-trained models can generalize better to distribution shifts with MICLe pre-training leading to the most gains.

Improved Label Efficiency
We further investigate the label-efficiency of the self-supervised models for medical image classification by fine-tuning the models on different fractions of labeled training data. We use label fractions ranging from 10% to 90% for both the Derm and CheXpert training datasets and examine how performance on the dermatology task varies across these label fractions. First, we observe that pre-training using self-supervised models can compensate for low label efficiency for medical image classification, and across the sampled label fractions, self-supervised models consistently outperform the supervised baseline. These results also suggest that MICLe yields proportionally higher gains when fine-tuning with fewer labeled examples. In fact, MICLe is able to match baselines using only 20% of the training data for ResNet-50 (4x) and 30% of the training data for ResNet-152 (2x).

Top-1 accuracy for dermatology condition classification for MICLe, SimCLR, and supervised models under different unlabeled pre-training datasets and varied sizes of label fractions. MICLe is able to match baselines using only 20% of the training data for ResNet-50 (4x).

Conclusion
Supervised pre-training on natural image datasets is commonly used to improve medical image classification. We investigate an alternative strategy based on self-supervised pre-training on unlabeled natural and medical images and find that it can significantly improve upon supervised pre-training, the standard paradigm for training medical image analysis models. This approach can lead to models that are more accurate and label efficient and are robust to distribution shifts. In addition, our proposed Multi-Instance Contrastive Learning method (MICLe) enables the use of additional metadata to create realistic augmentations, yielding further performance boost of image classifiers.

Self-supervised pre-training is much more scalable than supervised pre-training because class label annotation is not required. We hope this paper will help popularize the use of self-supervised approaches in medical image analysis yielding label efficient and robust models suited for clinical deployment at scale in the real world.

Acknowledgements
This work involved collaborative efforts from a multidisciplinary team of researchers, software engineers, clinicians, and cross-functional contributors across Google Health and Google Brain. We thank our co-authors: Basil Mustafa, Fiona Ryan, Zach Beaver, Jan Freyberg, Jon Deaton, Aaron Loh, Alan Karthikesalingam, Simon Kornblith, Ting Chen, Vivek Natarajan, and Mohammad Norouzi. We also thank Yuan Liu from Google Health for valuable feedback and our partners for access to the datasets used in the research.

Source: Google AI Blog


These researchers are driving health equity with Fitbit

Under-resourced communities across the country have long faced disparities in health due to structural and long-standing inequities. Unfortunately, the pandemic has further widened many of these gaps.

Still, health equity research in digital health remains limited. To help address these issues, we announced the Fitbit Health Equity Research Initiative earlier this year to help support underrepresented researchers who are early in their careers and working to address health disparities in communities.

Over the past decade, researchers have used Fitbit devices in over 900 health studies, in areas like diabetes, heart disease, oncology, mental health, infectious disease and more. Today, we’re awarding six researchers a total of more than $300,000 in Fitbit devices and services to support their research projects. Additionally, Fitbit’s long-time partner, Fitabase, will provide all projects with access to their data management platform to help researchers maximize study participation and analysis.

Learn more about the awardees and their research:

A photo of Sherilyn Francis of Georgia Tech

Improving postpartum care for rural black women

Black women in the U.S. are two to three times more likely to die from pregnancy or childbirth when compared to their white counterparts. And in Georgia, the disparities are more pronounced among rural populations. “As Black women who reside in Georgia, we’re more likely to die simply by becoming pregnant,” shares Sherilyn Francis, a PhD student in Georgia Tech’s Human-Centered Computing program. Her research aims to improve postpartum care for rural Black mothers through a culturally informed mobile health intervention. As part of the study, participants will receive a Fitbit Sense smartwatch and Fitbit Aria Air scale. By combining insight into physical activity, heart rate, sleep, weight and nutritional data with health outcomes, Sherilyn and her colleagues hope to shed light on ways to reduce the risk of severe maternal morbidity for Black mothers.

A photo of Jessee Dietch of Oregon State University

A look at sleep health in transgender youth

Transgender youth (ages 14-19) are at elevated risk for poor sleep health and associated physical and mental health outcomes. However, there’s no research to date that examines how medical transition and the use of gender-affirming hormone therapy impact sleep health. Jessee Dietch, PhD, who is an assistant professor of psychology at Oregon State University, will analyze participants’ sleep using a Fitbit Charge 5. The hope is that the findings will highlight potential points for sleep health intervention that could lead to improved wellbeing for a community that is already at an elevated risk for poor health outcomes.

A photo of Rony F. Santiago of Sansum Diabetes Research Institute

Preventing the progression of type 2 diabetes in Latino adults

The causes and complications of type 2 diabetes (T2D) disproportionately impact Latinos. Motivated by personal experiences, Rony F. Santiago, MA, is an early-career researcher at Sansum Diabetes Research Institute and manages T2D programs that support the Santa Barbara community. Rony and his team, in collaboration with researchers at Texas A&M University, aim to recruit healthy Latino participants and those with pre-diabetes or T2D who will each receive a continuous glucose monitor and a Fitbit Sense smartwatch. They hope to analyze physical activity, nutrition tracking and sleep patterns to better understand the impact these behaviors can have on blood sugar and the potential to improve health outcomes, including the progression from pre-diabetes to T2D.

A photo of Toluwalase Ajayi of Scripps Research

Investigating how systemic racism impacts maternal and fetal health

Black and Hispanic pregnant people experience higher rates of pregnancy-related mortality in comparison to their non-Hispanic white counterparts. And Black infants are twice as likely to die within their first year of life in comparison to white infants. Toluwalase Ajayi, MD, pediatrician, palliative care physician and clinical researcher at Scripps Research is the principal researcher for this study, PowerMom FIRST, which is part of her larger research study PowerMom. PowerMom FIRST aims to answer questions about how systemic racism and discrimination may have a negative impact on maternal and fetal health in these vulnerable populations. In this study, 500 Black and Hispanic mothers will receive a Fitbit Luxe tracker and Aria Air scale. Researchers will assess participant survey data for health inequities, disproportionate health outcomes, disparities in quality of care, and other factors that may influence maternal health alongside biometric data from Fitbit devices. Data, like sleep and heart rate, will help researchers better understand the impact that systemic racism experienced by Black and Hispanic pregnant people may have on their health.

A photo of Susan Ramsundarsingh of SKY Schools

Building healthy habits in adolescents facing health disparities

Experiences of trauma, such as the COVID-19 pandemic and social inequity, are linked to poor health habits among marginalized student populations. Although there is a known relationship between unhealthy habits, such as physical inactivity and poor nutrition, and socioeconomic status, there is little clarity on effective interventions. Susan Ramsundarsingh, PhD, is the National Director of Research at SKY Schools, which develops evidence-based programs aimed at increasing the wellbeing and academic performance of under-resourced students. In this study, researchers will pair Fitbit Inspire 2 devices with the SKY School program, which teaches children social-emotional skills and resilience to improve health and wellbeing through tools like breathing techniques. Six hundred adolescent students will be assigned to three groups to measure the impact of the interventions on heart rate, sleep and physical activity during the 2021-22 school year.

A photo of Victoria Bandera of UCHealth

Reducing cardiovascular disease risk factors in Hispanic families in Colorado

Hispanics have a disproportionately higher prevalence of cardiovascular disease risk factors relative to non-Hispanic whites, as well as higher rates of modifiable risk factors such as diabetes and hypertension. Victoria Bandera, M.S., is an exercise physiologist and early-career researcher at UCHealth Healthy Hearts in Loveland, Colo., whose research aims to combat health inequities that impact the Hispanic community. Participants enrolled in the Healthy Hearts Family Program will receive a Fitbit Charge 5 and take part in a 6-month program that includes an educational series on cardiovascular disease risks, healthy behaviors and health screenings. Researchers will encourage participants, ages 13 and older, to use their new Fitbit device to monitor and modify their health behaviors, such as eating habits and physical activity. They will then analyze changes in physical activity levels, body composition and biometric variables to assess the impact of the Healthy Hearts Family Program.

For the past 14 years at Fitbit, our mission has been to help everyone around the world live active, healthier lives, and along with Google, we’re committed to using tech to improve health equity. We hope the Fitbit Health Equity Research Initiative will continue to encourage wearable research and generate new evidence and methods for addressing health disparities.

The promise of using AI to help prostate cancer care

In 2021, nearly 250,000 Americans will be diagnosed with prostate cancer, which remains the second most common cancer among men in the U.S. Even as we make advancements in cancer research and treatment, diagnosing and treating prostate cancer remains difficult. This National Prostate Cancer Awareness Month, we’re sharing how Google researchers are looking at ways artificial intelligence (AI) can improve prostate cancer care and the lessons learned along the way.  

Our AI research to date 

Currently, pathologists rely on a process called the ‘Gleason grading system’ to grade prostate cancer and inform the selection of an effective treatment option. This process involves examining tumor samples under a microscope for tissue growth patterns that indicate the aggressiveness of the cancer. Over the past few years, research teams at Google have developed AI systems that can help pathologists grade prostate cancer with more objectivity and ease. 

These AI systems can help identify the aggressiveness of prostate cancer for tumors at different steps of the clinical timeline — from smaller biopsy samples during initial diagnosis to larger samples from prostate removal surgery. In prior studies published in JAMA Oncology and Nature Partner Journal Digital Medicine, we found our AI system for Gleason grading prostate cancer samples performed at a higher rate of agreement with subspecialists (pathologists who have specialized training in prostate cancer) as compared to general pathologists. These results suggest that AI systems have the potential to support high-quality prostate cancer diagnosis for more patients. 

To understand this system's potential impact within a clinical workflow, we also studied how general pathologists could use our AI system during their assessments. In a randomized study involving 20 pathologists reviewing 240 retrospective prostate biopsies, we found that the use of an AI system as an assistive tool was associated with an increase in grading agreement between general pathologists and subspecialists. This indicated that AI tools may help general pathologists grade prostate biopsies with greater accuracy. The AI system also improved both pathologists’ efficiency and their self-reported diagnostic confidence.

In our latest study in Nature Communications Medicine, we directly examined whether the AI’s grading was able to identify high-risk patients by comparing the system’s grading against mortality outcomes. This is important because mortality outcomes are one of the most clinically relevant results for evaluating the value of Gleason grading, ensuring greater confidence in the AI’s grading. We found that the AI’s grades were more strongly associated with patient outcomes than the grades from general pathologists, suggesting that the AI could potentially help inform decision-making on treatment plans. 


Contributing to reducing variability in AI research 

We first began training our AI system using Gleason grades from both general pathologists and subspecialists. As we continued to develop AI systems for assisting prostate cancer grading, we learned that both training the AI and evaluating the model’s performance can be challenging because often the “ground truth” or reference standard is based on expert opinion. Because of this subjectivity, for some cases, two pathologists examining the same sample may arrive at a different Gleason grade.

To improve the quality of the “ground truth”, we developed a set of best practices that we have shared this week in Lancet Digital Health. These recommendations include involving experienced prostate pathology experts, making sure that multiple experts look at each sample, and designing an unbiased disagreement resolution process. By sharing these learnings, we hope to encourage and accelerate further work in this area, particularly in earlier-phase research when it’s impractical to train or validate a model using patient outcomes data.

Our research has shown that AI can be most helpful when it's built to support clinicians with the right problem, in the right way, at the right time. With that in mind, we plan to further validate the role of AI and other novel technologies in helping improve prostate cancer diagnosis, treatment planning and patient outcomes. 

Detecting Abnormal Chest X-rays using Deep Learning

The adoption of machine learning (ML) for medical imaging applications presents an exciting opportunity to improve the availability, latency, accuracy, and consistency of chest X-ray (CXR) image interpretation. Indeed, a plethora of algorithms have already been developed to detect specific conditions, such as lung cancer, tuberculosis and pneumothorax. By virtue of being trained to detect a specific disease, however, the utility of these algorithms may be limited in a general clinical setting, where a wide variety of abnormalities could surface. For example, a pneumothorax detector is not expected to highlight nodules suggestive of cancer, and a tuberculosis detector may not identify findings specific to pneumonia. Since an initial triaging step is to determine whether a CXR contains concerning abnormalities, a general-purpose algorithm that identifies X-rays containing any sort of abnormality could significantly facilitate the workflow. However, developing a classifier to do this is challenging due to the wide variety of abnormal findings that present on CXRs.

In “Deep Learning for Distinguishing Normal versus Abnormal Chest Radiographs and Generalization to Two Unseen Diseases Tuberculosis and COVID-19”, published in Scientific Reports, we present a model that can distinguish between normal and abnormal CXRs across multiple de-identified datasets and settings. We find that the model performs well on general abnormalities, as well as unseen examples of tuberculosis and COVID-19. We are also releasing our set of radiologists’ labels1 for the test set used in this study for the publicly available ChestX-ray14 dataset.

A Deep Learning System for Detecting Abnormal Chest X-rays
The deep learning system we used is based on the EfficientNet-B7 architecture, pre-trained on ImageNet. We trained the model using over 200,000 de-identified CXRs from the Apollo Hospitals in India. Each CXR was assigned a label of either “normal” or “abnormal” using a regular expression–based natural language processing approach on the associated radiology reports.
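
Below is a toy illustration of regular-expression-based labeling of radiology reports; the two patterns are far simpler than the rules used in the study and are shown only to convey the approach.

```python
import re

# Illustrative patterns only; the study used a much more extensive rule set.
NORMAL_PATTERNS = [
    r"\bno acute cardiopulmonary (abnormality|disease|process)\b",
    r"\bnormal (chest|study|examination)\b",
]

def label_report(report_text):
    """Assign 'normal' or 'abnormal' from a radiology report (toy version)."""
    text = report_text.lower()
    if any(re.search(pattern, text) for pattern in NORMAL_PATTERNS):
        return "normal"
    return "abnormal"

print(label_report("No acute cardiopulmonary abnormality."))  # normal
print(label_report("Right lower lobe consolidation noted."))  # abnormal
```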

To evaluate how well the system generalizes to new patient populations, we compared its performance on two datasets consisting of a wide spectrum of abnormalities: the test split from the Apollo Hospitals dataset (DS-1), and the publicly available ChestX-ray14 (CXR-14). The labels for these two test sets were annotated for the purposes of this project by a group of US board-certified radiologists. The system achieved areas under the receiver operating characteristic curve (AUROC) of 0.87 on DS-1 and 0.94 on CXR-14 (higher is better).

Though the evaluations on DS-1 and CXR-14 contained a wide range of abnormalities, a possible use-case would be to utilize such an abnormality detector in novel or unforeseen settings with diseases that it had not encountered before. To evaluate the generalizability of the system to new patient populations and in the presence of diseases not seen in the training set, we used four de-identified datasets from three countries, including two publicly available tuberculosis datasets and two COVID-19 datasets from Northwestern Medicine. The system achieved AUCs of 0.95-0.97 in detecting tuberculosis, and 0.65-0.68 in detecting COVID-19. Because CXRs that are negative for these diseases could still contain other concerning abnormalities, we further evaluated the system for its ability to detect abnormalities more broadly (instead of disease positive vs. negative), finding AUCs of 0.91-0.93 for the tuberculosis dataset, and AUCs of 0.86 for the COVID-19 dataset.

The purpose of multiple evaluations (abnormality detection and disease detection) is the distinction between the two: a given disease can present with a certain abnormality or not; and a certain abnormality can arise from multiple diseases. Our study evaluates for both.

The large drop in performance for COVID-19 is because many cases flagged by the system as “positive” for abnormalities were negative for COVID-19, but nevertheless contained abnormal CXR findings that needed attention. This further highlights the usefulness of abnormality detectors even if disease-specific models are available.

In addition, it’s important to note that there is a difference between generalization to unseen diseases (i.e., tuberculosis and COVID-19) versus generalization to unseen CXR findings (e.g., pleural effusion, consolidation/infiltrate). In this study, we demonstrated the generalizability of the system to unseen diseases but not necessarily unseen CXR findings.

Sample chest X-rays of true and false positives, and true and false negatives for (A) general abnormalities, (B) tuberculosis, and (C) COVID-19. On each CXR, we outline in red the areas on which the model focused to identify abnormalities (i.e., the class activation map), and outline the regions of interest indicated by a radiologist in yellow.

Potential Benefits in the Clinic
To understand the potential utility of the deep learning model in improving clinical workflow, we simulated its use for case prioritization, where abnormal cases are “expedited” ahead of normal cases. In these simulations, the system reduced the turnaround time for abnormal cases by up to 28%. This reprioritization setup could be used to divert complex abnormal cases to cardiothoracic specialist radiologists, enable rapid triage of cases that may need urgent decisions, and provide the opportunity to batch negative CXRs for streamlined review.
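
A minimal sketch of such a prioritization simulation: cases are read one at a time, either in arrival order or in descending model-score order, and the average turnaround time of abnormal cases is compared. The fixed per-case reading time is an assumption made for illustration.

```python
import numpy as np

def simulate_turnaround(is_abnormal, model_score, minutes_per_case=5.0):
    """Compare turnaround time of abnormal cases under two review orders.

    is_abnormal: ground-truth flags per case; model_score: the model's
    abnormality score per case. Cases are read sequentially.
    """
    n = len(is_abnormal)

    def mean_abnormal_turnaround(order):
        finish = np.arange(1, n + 1) * minutes_per_case  # finish time of k-th read
        turnaround = np.empty(n)
        turnaround[order] = finish                       # map back to case index
        return turnaround[np.asarray(is_abnormal, bool)].mean()

    fifo = np.arange(n)                                  # arrival order
    prioritized = np.argsort(-np.asarray(model_score))   # highest score first
    return mean_abnormal_turnaround(fifo), mean_abnormal_turnaround(prioritized)

is_abnormal = [0, 1, 0, 0, 1, 0, 0, 1]
score = [0.1, 0.9, 0.2, 0.3, 0.7, 0.1, 0.2, 0.8]
print(simulate_turnaround(is_abnormal, score))  # FIFO vs. prioritized minutes
```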

Impact of a simulated deep learning model–based prioritization in comparison with random review order for (A) general abnormalities, (B) tuberculosis, and (C) COVID-19. The bars show sequences of abnormal CXRs (red) and normal CXRs (pink); a greater density of red towards the left indicates abnormal CXRs are reviewed sooner than normal ones. The histograms indicate the average improvement in turnaround time.

Additionally, we found that the system can be used as a pre-trained model to improve other ML algorithms for chest X-rays, especially when data is limited. For example, we used the normal/abnormal classifier in our recent study to detect pulmonary tuberculosis from chest X-rays. Abnormality and tuberculosis detectors can play a critical role in supporting early diagnosis in regions that lack access to resources like trained radiologists or molecular testing.

Sharing Improved Reference Standard Labels
Much work remains to be done to realize the potential of ML to aid chest X-ray interpretation around the world. In particular, obtaining high-quality labels on de-identified data can be a significant barrier to developing and evaluating ML algorithms in healthcare. To accelerate these efforts, we are expanding upon our previous label release by releasing the labels used in this study for the publicly available ChestX-ray14 dataset. We look forward to future machine learning projects by the community in this space.

Acknowledgements
Key contributors to this project at Google include Zaid Nabulsi, Andrew Sellergren, Shahar Jamshy, Charles Lau, Eddie Santos, Atilla P. Kiraly, Wenxing Ye, Jie Yang, Rory Pilgrim, Sahar Kazemzadeh, Jin Yu, Greg S. Corrado, Lily Peng, Krish Eswaran, Daniel Tse, Neeral Beladia, Yun Liu, Po-Hsuan Cameron Chen, Shravya Shetty. Significant contributions and input were also made by radiologist collaborators Sreenivasa Raju Kalidindi, Mozziyar Etemadi, Florencia Garcia Vicente, David Melnick. For the CXR-14 dataset, we thank the NIH Clinical Center for making it publicly available. For tuberculosis data collection, thanks go to Sameer Antani, Stefan Jaeger, Sema Candemir, Zhiyun Xue, Alex Karargyris, George R. Thomas, Pu-Xuan Lu, Yi-Xiang Wang, Michael Bonifant, Ellan Kim, Sonia Qasba, and Jonathan Musco. The authors would also like to acknowledge many members of the Google Health Radiology and labeling software teams, in particular Shruthi Prabhakara, Scott McKinney, and Akib Uddin. Sincere appreciation also goes to the radiologists who enabled this work with their image interpretation and annotation efforts throughout the study; Jonny Wong for coordinating the imaging annotation work; Gavin Bee, Mikhail Fomitchev, Shabir Adeel, Jeff Bertram, and Benedict Noero for data releasing; David F. Steiner, Kunal Nagpal, and Michael D. Howell for providing feedback on the manuscript; Craig Mermel, Lauren Winer, Johnny Luu, Adrienne Welch, Annisah Um'rani, and Ashley Zlatinov for feedback on the blogpost.


1Labels include atelectasis, cardiomegaly, effusion, infiltration, mass, nodule, pneumonia, pneumothorax, consolidation, edema, emphysema, fibrosis, pleural thickening, hernia, other abnormality, and normal vs abnormal. 

Source: Google AI Blog


Recreating Natural Voices for People with Speech Impairments

On June 2nd, 2021, Major League Baseball in the United States celebrated Lou Gehrig Day, commemorating both the day in 1925 that Lou Gehrig became the Yankees’ starting first baseman, and the day in 1941 that he passed away from amyotrophic lateral sclerosis (ALS, also known as Lou Gehrig’s disease) at the age of 37. ALS is a progressive neurodegenerative disease that affects motor neurons, which connect the brain with the muscles throughout the body, and govern muscle control and voluntary movements. When voluntary muscle control is affected, people may lose their ability to speak, eat, move and breathe.

In honor of Lou Gehrig, former NFL player and ALS advocate Steve Gleason, who lost his ability to speak due to ALS, recited Gehrig’s famous “Luckiest Man” speech at the June 2nd event using a recreation of his voice generated by a machine learning (ML) model. Gleason’s voice recreation was developed in collaboration with Google’s Project Euphonia, which aims to empower people who have impaired speaking ability due to ALS to better communicate using their own voices.

Steve Gleason, who lost his voice to ALS, worked with Google’s Project Euphonia to generate a speech in his own voice in honor of Lou Gehrig. A portion of Gleason’s speech was broadcast in ballparks across the country during the 4th inning on June 2nd, 2021.

Today we describe PnG NAT, the model adopted by Project Euphonia to recreate Steve Gleason’s voice. PnG NAT is a new text-to-speech synthesis (TTS) model that merges two state-of-the-art technologies, PnG BERT and Non-Attentive Tacotron (NAT), into a single model. It demonstrates significantly better quality and fluency than previous technologies, and represents a promising approach that can be extended to a wider array of users.

Recreating a Voice
Non-Attentive Tacotron (NAT) is the successor to Tacotron 2, a sequence-to-sequence neural TTS model proposed in 2017. Tacotron 2 used an attention module to connect the input text sequence and the output speech spectrogram frame sequence, so that the model knows which part of the text to pay attention to when generating each time step of the synthesized speech spectrogram. Tacotron 2 was the first TTS model that was able to synthesize speech that sounds as natural as a person speaking. However, with extensive experimentation we discovered that there is a small probability that the model can suffer from robustness issues — such as babbling, repeating, or skipping part of the text — due to the inherent flexibility of the attention mechanism.

NAT improves upon Tacotron 2 by replacing the attention module with a duration-based upsampler, which predicts a duration for each input phoneme and upsamples the encoded phoneme representation so that the output length corresponds to the length of the predicted speech spectrogram. Such a change both resolves the robustness issue, and improves the naturalness of the synthesized speech. This approach also enables precise control of the speech duration for each phoneme of the input text while still maintaining highly natural synthesis quality. Because recordings of people with ALS often exhibit disfluent speech, this ability to exert per-phoneme control is key for achieving the fluency of the recreated voice.
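
The essence of a duration-based upsampler can be sketched by repeating each phoneme's encoding for its predicted number of spectrogram frames; NAT itself uses a smoother (Gaussian) upsampling, so the version below is a simplification for illustration.

```python
import numpy as np

def upsample_by_duration(phoneme_encodings, durations):
    """Repeat each phoneme encoding by its predicted duration in frames.

    phoneme_encodings: array of shape (num_phonemes, dim)
    durations: predicted integer frame counts, one per phoneme
    Returns an array of shape (sum(durations), dim), aligned with the
    target spectrogram length.
    """
    return np.repeat(phoneme_encodings, durations, axis=0)

encodings = np.random.randn(4, 8)     # 4 phonemes, 8-dim encodings
durations = np.array([7, 12, 9, 15])  # predicted frames per phoneme
upsampled = upsample_by_duration(encodings, durations)
print(upsampled.shape)                # (43, 8)
```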

Non-Attentive Tacotron (NAT) model.

While NAT addresses the robustness issue and enables precise duration control in neural TTS, we build upon it to further improve the natural language understanding of the TTS input. For this, we apply PnG BERT, which uses an approach similar to BERT, but is specifically designed for TTS. It is pre-trained with self-supervision on both the phoneme representation and the grapheme representation of the same content from a large text corpus, and then is used as the encoder of the TTS model. This results in a significant improvement of the prosody and pronunciation of the synthesized speech, especially in difficult cases.

Take, for example, the following audio, which was synthesized from a regular NAT model that takes only phonemes as input:

In comparison, the audio synthesized from PnG NAT on the same input text includes an additional pause that makes the meaning more clear.

The input text to both models is, “To cancel the payment, press one; or to continue, two.” Notice the different pause lengths before the ending “two” in the two versions. The word “two” in the version output by the regular NAT model could be confused for “too”. Because “too” and “two” have identical pronunciation (and thus the same phoneme representation), the regular NAT model does not understand which of the two is appropriate, and assumes it to be the word that more frequently follows a comma, “too”. In contrast, the PnG NAT model can more easily tell the difference, because it takes graphemes in addition to phonemes as input, and thus inserts a more appropriate pause.

The PnG NAT model integrates the pre-trained PnG BERT model as the encoder to the NAT model. The hidden representations output from the encoder are used by NAT to predict the duration of each phoneme, and are then upsampled to match the length of the audio spectrogram, as outlined above. In the final step, a non-attentive decoder converts the upsampled hidden representations into audio speech spectrograms, which are finally converted into audio waveforms by a neural vocoder.

PnG BERT and the pre-training objectives. Yellow boxes represent phonemes, and pink boxes represent graphemes.
PnG NAT: PnG BERT replaces the original encoder in the NAT model. The random masking for the Masked Language Model (MLM) pre-training is removed.

To recreate Steve Gleason’s voice, we first trained a PnG NAT model with recordings from 31 professional speakers, and then fine-tuned it with 30 minutes of Gleason’s recordings. Because these latter recordings were made after he was diagnosed with ALS, they exhibit signs of slurring. The fine-tuned model was able to synthesize speech that sounds very similar to these recordings. However, because the symptoms of ALS were already present in Gleason’s recordings, the synthesized speech exhibited some of the same disfluencies.

To mitigate this, we leveraged the phoneme duration control of NAT as well as the model trained with professional speakers. We first predicted the durations of each phoneme for both a professional speaker and for Gleason, and then used the geometric mean of the two durations for each phoneme to guide the NAT output. As a result, the model is able to speak in Gleason’s voice, but more fluently than in the original recordings.
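
The duration-blending step reduces to a per-phoneme geometric mean; here is a small sketch, assuming aligned duration arrays for the professional speaker and for Gleason.

```python
import numpy as np

def blended_durations(professional_durations, gleason_durations):
    """Per-phoneme geometric mean of two speakers' predicted durations.

    The blended durations guide the NAT upsampler so the output keeps the
    target voice but with more fluent timing than the disfluent recordings
    alone would produce.
    """
    pro = np.asarray(professional_durations, float)
    target = np.asarray(gleason_durations, float)
    return np.sqrt(pro * target)

print(blended_durations([8, 10, 6], [14, 22, 9]))  # e.g. [10.58 14.83  7.35]
```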

Here is the full version of the synthesized Lou Gehrig speech in Gleason’s voice:

Besides recreating voices for people with ALS, PnG NAT is also powering voices for a variety of customers through Google Cloud Custom Voice.

Project Euphonia
Of the millions of people around the world who have neurologic conditions that may impact their speech, such as ALS, cerebral palsy or Down syndrome, many may find it difficult to be understood, which can make face-to-face communication challenging. Using voice-activated technologies can be frustrating too, as they don’t always work reliably. Project Euphonia is a Google Research initiative focused on helping people with impaired speech be better understood. The team is researching ways to improve speech recognition for individuals with speech impairments (see recent blog post and segment in TODAY show), as well as customized text-to-speech technology (see Age of AI documentary featuring former NFL player Tim Shaw).

Acknowledgements
Many people across Google Research, Google Cloud and Consumer Apps, and Google Accessibility teams contributed to this project and the event, including Michael Brenner, Bob MacDonald, Heiga Zen, Yu Zhang, Jonathan Shen, Isaac Elias‎, Yonghui Wu, Anne Keck, Danielle Notaro, Kevin Hogan, Zack Kaplan, KR Liu, Kyndra Price, Zoe Ortiz.

Source: Google AI Blog


Improved Detection of Elusive Polyps via Machine Learning

With the increasing ability to consistently and accurately process large amounts of data, particularly visual data, computer-aided diagnostic systems are more frequently being used to assist physicians in their work. This, in turn, can lead to meaningful improvements in health care. An example of where this could be especially useful is in the diagnosis and treatment of colorectal cancer (CRC), which is particularly deadly and results in over 900K deaths per year, globally. CRC originates in small pre-cancerous lesions in the colon, called polyps, the identification and removal of which is very successful in preventing CRC-related deaths.

The standard procedure used by gastroenterologists (GIs) to detect and remove polyps is the colonoscopy, and about 19 million such procedures are performed annually in the US alone. During a colonoscopy, the gastroenterologist uses a camera-containing probe to check the intestine for pre-cancerous polyps and early signs of cancer, and removes tissue that looks worrisome. However, complicating factors, such as incomplete detection (in which the polyp appears within the field of view, but is missed by the GI, perhaps due to its size or shape) and incomplete exploration (in which the polyp does not appear in the camera’s field of view), can lead to a high fraction of missed polyps. In fact, studies suggest that 22%–28% of polyps are missed during colonoscopies, of which 20%–24% have the potential to become cancerous (adenomas).

Today, we are sharing progress made in using machine learning (ML) to help GIs fight colorectal cancer by making colonoscopies more effective. In “Detection of Elusive Polyps via a Large Scale AI System”, we present an ML model designed to combat the problem of incomplete detection by helping the GI detect polyps that are within the field of view. This work adds to our previously published work that maximizes the coverage of the colon during the colonoscopy by flagging for GI follow-up areas that may have been missed. Using clinical studies, we show that these systems significantly improve polyp detection rates.

Incomplete Exploration
To help the GI detect polyps that are outside the field of view, we previously developed an ML system that reduces the rate of incomplete exploration by estimating the fractions of covered and non-covered regions of a colon during a colonoscopy. This earlier work uses computer vision and geometry in a technique we call colonoscopy coverage deficiency via depth, to compute segment-by-segment coverage for the colon. It does so in two phases: first computing depth maps for each frame of the colonoscopy video, and then using these depth maps to compute the coverage in real time.

The ML system computes a depth image (middle) from a single RGB image (left). Then, based on the computation of depth images for a video sequence, it calculates local coverage (right), and detects where the coverage has been deficient and a second look is required (blue color indicates observed segments where red indicates uncovered ones). You can learn more about this work in our previous blog post.

This segment-by-segment work yields the ability to estimate what fraction of the current segment has been covered. The helpfulness of such functionality is clear: during the procedure itself, a physician may be alerted to segments with deficient coverage, and can immediately return to review these areas, potentially reducing the rates of missed polyps due to incomplete exploration.

Incomplete Detection
In our most recent paper, we look into the problem of incomplete detection. We describe an ML model that aids a GI in detecting polyps that are within the field of view, so as to reduce the rate of incomplete detection. We developed a system that is based on convolutional neural networks (CNN) with an architecture that combines temporal logic with a single frame detector, resulting in more accurate detection.

This new system has two principal advantages. The first is that the system improves detection performance by reducing the number of false negative detections of elusive polyps, those polyps that are particularly difficult for GIs to detect. The second advantage is the very low false positive rate of the system. This low false positive rate makes these systems more likely to be adopted in the clinic.

Examples of the variety of polyps detected by the ML system.

We trained the system on 3600 procedures (86M video frames) and tested it on 1400 procedures (33M frames). All the videos and metadata were de-identified. The system detected 97% of the polyps (i.e., it yielded 97% sensitivity) at 4.6 false alarms per procedure, which is a substantial improvement over previously published results. Of the false alarms, follow-up review showed that some were, in fact, valid polyp detections, indicating that the system was able to detect polyps that were missed by the performing endoscopist and by those who annotated the data. The performance of the system on these elusive polyps suggests its generalizability in that the system has learned to detect examples that were initially missed by all who viewed the procedure.
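
For clarity, the headline numbers (sensitivity and false alarms per procedure) can be computed from per-procedure counts as sketched below; the field names are hypothetical.

```python
def detection_metrics(procedures):
    """Compute sensitivity and mean false alarms per procedure.

    `procedures` is a list of dicts with per-procedure counts: ground-truth
    polyps, correctly detected polyps, and false alarms. Field names are
    illustrative.
    """
    total_polyps = sum(p["num_polyps"] for p in procedures)
    detected = sum(p["num_detected"] for p in procedures)
    false_alarms = sum(p["num_false_alarms"] for p in procedures)
    sensitivity = detected / total_polyps
    fa_per_procedure = false_alarms / len(procedures)
    return sensitivity, fa_per_procedure

procedures = [
    {"num_polyps": 2, "num_detected": 2, "num_false_alarms": 5},
    {"num_polyps": 1, "num_detected": 1, "num_false_alarms": 4},
]
print(detection_metrics(procedures))  # (1.0, 4.5)
```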

We evaluated the system performance on polyps that are in the field of view for less than five seconds, which makes them more difficult for the GI to detect, and for which models typically have much lower sensitivity. In this case the system attained a sensitivity that is about three times that of the sensitivity that the original procedure achieved. When the polyps were present in the field of view for less than 2 seconds, the difference was even more stark — the system exhibited a 4x improvement in sensitivity.

It is also interesting to note that the system is fairly insensitive to the choice of neural network architecture. We used two architectures: RetinaNet and LSTM-SSD. RetinaNet is a leading technique for object detection on static images (used for video by applying it to frames in a consecutive fashion). It is one of the top performers on a variety of benchmarks, given a fixed computational budget, and is known for balancing speed of computation with accuracy. LSTM-SSD is a true video object detection architecture, which can explicitly account for the temporal character of the video (e.g., temporal consistency of detections, ability to deal with blur and fast motion, etc.). It is known for being robust and very computationally lightweight, and can therefore run on less expensive processors. Comparable results were also obtained on the much heavier Faster R-CNN architecture. The fact that results are similar across different architectures implies that one can choose the network that best fits the available hardware.

Prospective Clinical Research Study
As part of the research reported in our detection paper, we ran a clinical validation on 100 procedures in collaboration with Shaare Zedek Medical Center in Jerusalem, where our system was used in real time to help GIs. The system helped detect an average of one polyp per procedure that would otherwise have been missed by the GI performing the procedure, while not missing any of the polyps detected by the GIs, and with 3.8 false alarms per procedure. The feedback from the GIs was consistently positive.

We are encouraged by the potential helpfulness of this system for improving polyp detection, and we look forward to working together with the doctors in the procedure room to further validate this research.

Acknowledgements
The research was conducted by teams from Google Health and Google Research, Israel with support from Verily Life Sciences, and in collaboration with Shaare Zedek Medical Center. Verily is advancing this research via a newly established center in Israel, led by Ehud Rivlin. This research was conducted by Danny Veikherman, Tomer Golany, Dan M. Livovsky, Amit Aides, Valentin Dashinsky, Nadav Rabani, David Ben Shimol, Yochai Blau, Liran Katzir, Ilan Shimshoni, Yun Liu, Ori Segol, Eran Goldin, Greg Corrado, Jesse Lachter, Yossi Matias, Ehud Rivlin, and Daniel Freedman. Our appreciation also goes to several institutions and GIs who provided advice along the way and tested our system prototype. We would like to thank all of our team members and collaborators who worked on this project with us, including: Chen Barshai, Nia Stoykova, and many others.

Source: Google AI Blog


Mental health trends & how they affect communities of color

Editor’s note: July is Bebe Moore Campbell National Minority Mental Health Awareness Month. To bring awareness to mental health, Asad Abdullah II — a Google engineer, trauma-informed meditation instructor and mental health advocate — chatted with licensed psychologist Dr. Ghynecee Temple about mental health trends, how they affect communities of color and ways to cope.  

Search interest for anxiety reached a record high across the U.S. this year. As we begin to reintegrate into life after an extended period of social distancing and self-isolation, people across the country are looking for ways to cope. 


Marginalized communities in particular have been disproportionately affected and continue to face challenges and stigmas when it comes to accessing resources and talking openly about mental wellbeing. According to Mental Health First Aid, 48% of White Americans with mental illness received mental health services, compared to 31% of Black Americans and Latino/Hispanic Americans and 22% of AAPI populations.


Curious to talk more about the mental health trends we’re seeing for marginalized groups, I sat down with Dr. Ghynecee Temple of the Ladipo Group, a Black-owned company dedicated to the emotional wellness of Black and African-American people and communities. Dr. Temple sifted through these trends, discussed lingering mental health stigmas and shared ways we can take care of our wellbeing and support others. 

Search interest for “why do I feel anxious for no reason” spiked 400% in the U.S. in 2021 compared to 2020. How is this affecting communities of color specifically?


There's always a reason you feel anxious; you just may not have uncovered it yet. For communities of color, both before and during the pandemic, there are unique experiences that affect mental wellbeing. You may be navigating daily discrimination, feeling a lack of autonomy within a system that suppresses you, or grappling with intersecting identities.


Fast forward to COVID-19, and you have a massive loss of control. You can’t see, smell or touch it, but it’s ever-looming and ever-present. So of course you’re going to feel anxious.


Still, in some communities, getting mental health help is stigmatized. What I tell people is: your brain is the control center for your entire body. If your thoughts are off, it's going to impact every facet of your functioning. And if something is off and not feeling right, why wouldn't we get help?


As people prepare to return to work and school, what would you say to those who are experiencing uneasiness or anxiety? 


Your feelings about the transition are valid. Some people are excited to socialize again, others are relieved as home may not always be the safest place for them, and still others are nervous about interacting with people outside of their bubble. Don’t judge your feelings, and accept that you’re going to experience different moods each day.


What are some practical steps we can take to manage those feelings?


There’s still a lot of uncertainty, but part of what we can do to weather that storm is to be present. Instead of thinking about what’s happening in two months or 12 months, ask yourself how you are feeling right now and what you need at this moment. Set boundaries and goals for yourself. For example, if you feel safer wearing a mask, continue to do so even if it’s not required. If you’re struggling with social anxiety, set a goal to socialize for 15 minutes at lunch before allowing yourself to go back to your desk to decompress. Exposure is one of the most helpful things to improve social anxiety. Start small and challenge yourself to build upon it every day.


A lot of people are turning to therapy, and search interest for “black therapists” spiked last summer. How can people within the BIPOC community go about finding a therapist?


A quick Google search will show you resources near you, and even a self-assessment to help you learn more about anxiety. When you’re looking for a therapist, many will have an online bio where they talk about the identities that feel salient to them or the communities they’ve worked with before; start there. Then ask for a consultation and evaluate them for yourself. I love when new clients ask me questions! You don’t have to pick the first therapist you find. Remember that you’re shopping and want to feel comfortable and safe.


I’m a Blue Dot Listener at Google. Our aim is to de-stigmatize mental health conversations in the workplace through allyship, peer support and education. I’d love to know from you: how can we be better mental health allies at work?


As allies, we need to check our own beliefs and biases, and embrace a continuous posture of learning and unlearning. I’d also encourage people to know their limits. There are often instances where we try to support people, but it’s out of our scope. Know when to connect people to the right resources.


You’ve been in the mental health space for almost a decade. What makes you hopeful about mental wellbeing for historically underrepresented groups?


The fact that people are even searching for mental health topics is encouraging. It makes me hopeful that people are willing to learn and unlearn things. 

