Tag Archives: Health

Using AI to improve breast cancer screening

Breast cancer is a condition that affects far too many women across the globe. More than 55,000 people in the U.K. are diagnosed with breast cancer each year, and about 1 in 8 women in the U.S. will develop the disease in their lifetime. 

Digital mammography, or X-ray imaging of the breast, is the most common method to screen for breast cancer, with over 42 million exams performed each year in the U.S. and U.K. combined. But despite the wide usage of digital mammography, spotting and diagnosing breast cancer early remains a challenge. 

Reading these X-ray images is a difficult task, even for experts, and can often result in both false positives and false negatives. In turn, these inaccuracies can lead to delays in detection and treatment, unnecessary stress for patients and a higher workload for radiologists who are already in short supply.

Over the last two years, we’ve been working with leading clinical research partners in the U.K. and U.S. to see if artificial intelligence could improve the detection of breast cancer. Today, we’re sharing our initial findings, which have been published in Nature. These findings show that our AI model spotted breast cancer in de-identified screening mammograms (where identifiable information has been removed) with greater accuracy, fewer false positives, and fewer false negatives than experts. This sets the stage for future applications where the model could potentially support radiologists performing breast cancer screenings.

Our research

In collaboration with colleagues at DeepMind, Cancer Research UK Imperial Centre, Northwestern University and Royal Surrey County Hospital, we set out to see if artificial intelligence could support radiologists to spot the signs of breast cancer more accurately. 

The model was trained and tuned on a representative data set composed of de-identified mammograms from more than 76,000 women in the U.K. and more than 15,000 women in the U.S., to see if it could learn to spot signs of breast cancer in the scans. The model was then evaluated on a separate de-identified data set of more than 25,000 women in the U.K. and over 3,000 women in the U.S. In this evaluation, our system produced a 5.7 percent reduction in false positives in the U.S. and a 1.2 percent reduction in the U.K. It produced a 9.4 percent reduction in false negatives in the U.S., and a 2.7 percent reduction in the U.K.
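
For readers curious how such comparisons are typically expressed in code, here is a minimal, hypothetical sketch (not the code from the Nature paper) of computing false-positive and false-negative rates for a model and for human readers on the same de-identified cases, and reporting the difference. The data and variable names are invented for illustration.

```python
# Illustrative sketch only: comparing a model's screening decisions against
# human readers on the same cases. Hypothetical data, not the study's code.
import numpy as np

def fp_fn_rates(labels, calls):
    """labels: 1 = cancer confirmed, 0 = no cancer; calls: 1 = recalled, 0 = not."""
    labels, calls = np.asarray(labels), np.asarray(calls)
    fp_rate = np.mean(calls[labels == 0])        # fraction of negatives recalled
    fn_rate = np.mean(1 - calls[labels == 1])    # fraction of cancers missed
    return fp_rate, fn_rate

# Hypothetical ground-truth outcomes plus reader and model decisions.
y_true      = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
reader_call = [1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
model_call  = [0, 0, 1, 0, 1, 1, 0, 1, 0, 0]

r_fp, r_fn = fp_fn_rates(y_true, reader_call)
m_fp, m_fn = fp_fn_rates(y_true, model_call)
print(f"false-positive reduction: {100 * (r_fp - m_fp):.1f} percentage points")
print(f"false-negative reduction: {100 * (r_fn - m_fn):.1f} percentage points")
```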

We also wanted to see if the model could generalize to other healthcare systems. To do this, we trained the model only on the data from the women in the U.K. and then evaluated it on the data set from women in the U.S. In this separate experiment, there was a 3.5 percent reduction in false positives and an 8.1 percent reduction in false negatives, showing the model’s potential to generalize to new clinical settings while still performing at a higher level than experts. 


This is a visualization of tumor growth and metastatic spread in breast cancer. Screening aims to detect breast cancer early, before symptoms develop.

Notably, when making its decisions, the model received less information than human experts did. The human experts (in line with routine practice) had access to patient histories and prior mammograms, while the model only processed the most recent anonymized mammogram with no extra information. Despite working from these X-ray images alone, the model surpassed individual experts in accurately identifying breast cancer.

Next steps

Looking forward to future applications, there are some promising signs that the model could potentially increase the accuracy and efficiency of screening programs, as well as reduce wait times and stress for patients. Google’s Chief Financial Officer Ruth Porat shared her optimism around potential technological breakthroughs in this area in a post in October reflecting on her personal experience with breast cancer.

But getting there will require continued research, prospective clinical studies and regulatory approval to understand and prove how software systems inspired by this research could improve patient care.

This work is the latest strand of our research looking into detection and diagnosis of breast cancer, not just within the scope of radiology, but also pathology. In 2017, we published early findings showing how our models can accurately detect metastatic breast cancer from lymph node specimens. Last year, we also developed a deep learning algorithm that could help doctors spot breast cancer more quickly and accurately in pathology slides.

We’re looking forward to working with our partners in the coming years to translate our machine learning research into tools that benefit clinicians and patients.

Lessons Learned from Developing ML for Healthcare



Machine learning (ML) methods are not new in medicine -- traditional techniques, such as decision trees and logistic regression, were commonly used to derive established clinical decision rules (for example, the TIMI Risk Score for estimating patient risk after a coronary event). In recent years, however, there has been a tremendous surge in leveraging ML for a variety of medical applications, such as predicting adverse events from complex medical records, and improving the accuracy of genomic sequencing. In addition to detecting known diseases, ML models can tease out previously unknown signals, such as cardiovascular risk factors and refractive error from retinal fundus photographs.

Beyond developing these models, it’s important to understand how they can be incorporated into medical workflows. Previous research indicates that doctors assisted by ML models can be more accurate than either doctors or models alone in grading diabetic eye disease and diagnosing metastatic breast cancer. Similarly, doctors are able to leverage ML-based tools in an interactive fashion to search for similar medical images, providing further evidence that doctors can work effectively with ML-based assistive tools.

In an effort to improve guidance for research at the intersection of ML and healthcare, we have written a pair of articles, published in Nature Materials and the Journal of the American Medical Association (JAMA). The first is for ML practitioners to better understand how to develop ML solutions for healthcare, and the other is for doctors who desire a better understanding of whether ML could help improve their clinical work.

How to Develop Machine Learning Models for Healthcare
In “How to develop machine learning models for healthcare” (pdf), published in Nature Materials, we discuss the importance of ensuring that the needs specific to the healthcare environment inform the development of ML models for that setting. This should be done throughout the process of developing technologies for healthcare applications, from problem selection, data collection and ML model development to validation and assessment, deployment and monitoring.

The first consideration is how to identify a healthcare problem for which there is both an urgent clinical need and for which predictions based on ML models will provide actionable insight. For example, ML for detecting diabetic eye disease can help alleviate the screening workload in parts of the world where diabetes is prevalent and the number of medical specialists is insufficient. Once the problem has been identified, one must be careful with data curation to ensure that the ground truth labels, or “reference standard”, applied to the data are reliable and accurate. This can be accomplished by validating labels via comparison to expert interpretation of the same data, such as retinal fundus photographs, or through an orthogonal procedure, such as a biopsy to confirm radiologic findings. This is particularly important since a high-quality reference standard is essential both for training useful models and for accurately measuring model performance. Therefore, it is critical that ML practitioners work closely with clinical experts to ensure the rigor of the reference standard used for training and evaluation.

Validation of model performance is also substantially different in healthcare, because the problem of distributional shift can be pronounced. In contrast to typical ML studies where a single random test split is common, the medical field values validation using multiple independent evaluation datasets, each with different patient populations that may exhibit differences in demographics or disease subtypes. Because the specifics depend on the problem, ML practitioners should work closely with clinical experts to design the study, with particular care in ensuring that the model validation and performance metrics are appropriate for the clinical setting.
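
As a rough illustration of this practice, the sketch below (using synthetic data and an assumed scikit-learn setup, not any specific clinical model) trains a single classifier and then reports its performance separately on several independent evaluation datasets whose feature distributions differ from one another.

```python
# Minimal sketch (assumed setup, not the authors' code): evaluating one trained
# classifier on several independent datasets drawn from different populations,
# rather than on a single random split.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_site(n, shift):
    """Hypothetical site-level data with a distributional shift in the features."""
    X = rng.normal(loc=shift, size=(n, 5))
    y = (X[:, 0] + 0.5 * rng.normal(size=n) > shift).astype(int)
    return X, y

X_train, y_train = make_site(2000, shift=0.0)
model = LogisticRegression().fit(X_train, y_train)

# Evaluate on independent datasets that differ in demographics / case mix.
for name, shift in [("site_A", 0.0), ("site_B", 0.5), ("site_C", 1.0)]:
    X_eval, y_eval = make_site(500, shift)
    auc = roc_auc_score(y_eval, model.predict_proba(X_eval)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```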

Integration of the resulting assistive tools also requires thoughtful design to ensure seamless workflow integration, with consideration for measurement of the impact of these tools on diagnostic accuracy and workflow efficiency. Importantly, there is substantial value in prospective study of these tools in real patient care to better understand their real-world impact.

Finally, even after validation and workflow integration, the journey towards deployment is just beginning: regulatory approval and continued monitoring for unexpected error modes or adverse events in real use remains ahead.
Two examples of the translational process of developing, validating, and implementing ML models for healthcare based on our work in detecting diabetic eye disease and metastatic breast cancer.
Empowering Doctors to Better Understand Machine Learning for Healthcare
In “Users’ Guide to the Medical Literature: How to Read Articles that use Machine Learning,” published in JAMA, we summarize key ML concepts to help doctors evaluate ML studies for suitability of inclusion in their workflow. The goal of this article is to demystify ML and to assist doctors who may use ML systems in understanding their basic functionality, when to trust them, and their potential limitations.

The central questions doctors ask when evaluating any study, whether ML or not, remain: Was the reference standard reliable? Was the evaluation unbiased, such as assessing for both false positives and false negatives, and performing a fair comparison with clinicians? Does the evaluation apply to the patient population that I see? How does the ML model help me in taking care of my patients?

In addition to these questions, ML models should also be scrutinized to determine whether the hyperparameters used in their development were tuned on a dataset independent of that used for final model evaluation. This is particularly important, since inappropriate tuning can lead to substantial overestimation of performance, e.g., a sufficiently sophisticated model can be trained to completely memorize the training dataset and generalize poorly to new data. Ensuring that tuning was done appropriately requires being mindful of ambiguities in dataset naming, and in particular, using the terminology with which the audience is most familiar:
The intersection of the two fields, ML and healthcare, creates ambiguity around the term “validation dataset”. In ML, a validation set typically refers to the dataset used for hyperparameter tuning, whereas a “clinical” validation set is typically used for final evaluation. To reduce confusion, we have opted to refer to the (ML) validation set as the “tuning” set.
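
The snippet below is a minimal sketch of this convention using synthetic data: hyperparameters are selected on the tuning set, and the held-out test set is touched exactly once for the final estimate. The split sizes and model choice are illustrative assumptions, not recommendations.

```python
# Sketch of the train / tuning / test convention described above, on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Three-way split: train / tuning / test.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_tune, X_test, y_tune, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

# Hyperparameters are chosen using the tuning set only.
best_depth, best_acc = None, -1.0
for depth in (2, 4, 8, 16):
    clf = RandomForestClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    acc = accuracy_score(y_tune, clf.predict(X_tune))
    if acc > best_acc:
        best_depth, best_acc = depth, acc

# The test set is used exactly once, for the final report.
final = RandomForestClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, final.predict(X_test)))
```
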
Future outlook
It is an exciting time to work on AI for healthcare. The “bench-to-bedside” path is a long one that requires researchers and experts from multiple disciplines to work together in this translational process. We hope that these two articles will promote mutual understanding of what is important for ML practitioners developing models for healthcare and what is emphasized by doctors evaluating these models, thus driving further collaborations between the fields and towards eventual positive impact on patient care.

Acknowledgements
Key contributors to these projects include Yun Liu, Po-Hsuan Cameron Chen, Jonathan Krause, and Lily Peng. The authors would like to acknowledge Greg Corrado and Avinash Varadarajan for their advice, and the Google Health team for their support.

Source: Google AI Blog


Developing Deep Learning Models for Chest X-rays with Adjudicated Image Labels



With millions of diagnostic examinations performed annually, chest X-rays are an important and accessible clinical imaging tool for the detection of many diseases. However, their usefulness can be limited by challenges in interpretation, which requires rapid and thorough evaluation of a two-dimensional image depicting complex, three-dimensional organs and disease processes. Indeed, early-stage lung cancers or pneumothoraces (collapsed lungs) can be missed on chest X-rays, leading to serious adverse outcomes for patients.

Advances in machine learning (ML) present an exciting opportunity to create new tools to help experts interpret medical images. Recent efforts have shown promise in improving lung cancer detection in radiology, prostate cancer grading in pathology, and differential diagnoses in dermatology. For chest X-ray images in particular, large, de-identified public image sets are available to researchers across disciplines, and have facilitated several valuable efforts to develop deep learning models for X-ray interpretation. However, obtaining accurate clinical labels for the very large image sets needed for deep learning can be difficult. Most efforts have either applied rule-based natural language processing (NLP) to radiology reports or relied on image review by individual readers, both of which may introduce inconsistencies or errors that can be especially problematic during model evaluation. Another challenge involves assembling datasets that represent an adequately diverse spectrum of cases (i.e., ensuring inclusion of both “hard” cases and “easy” cases that represent the full spectrum of disease presentation). Finally, some chest X-ray findings are non-specific and depend on clinical information about the patient to fully understand their significance. As such, establishing labels that are clinically meaningful and have consistent definitions can be a challenging component of developing machine learning models that use only the image as input. Without standardized and clinically meaningful datasets as well as rigorous reference standard methods, successful application of ML to interpretation of chest X-rays will be hindered.

To help address these issues, we recently published “Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation” in the journal Radiology. In this study we developed deep learning models to classify four clinically important findings on chest X-rays — pneumothorax, nodules and masses, fractures, and airspace opacities. These target findings were selected in consultation with radiologists and clinical colleagues, so as to focus on conditions that are both critical for patient care and for which chest X-ray images alone are an important and accessible first-line imaging study. Selection of these findings also allowed model evaluation using only de-identified images without additional clinical data.

Models were evaluated using thousands of held-out images from each dataset for which we collected high-quality labels using a panel-based adjudication process among board-certified radiologists. Four separate radiologists also independently reviewed the held-out images in order to compare radiologist accuracy to that of the deep learning models (using the panel-based image labels as the reference standard). For all four findings and across both datasets, the deep learning models demonstrated radiologist-level performance. We are sharing the adjudicated labels for the publicly available data here to facilitate additional research.

Data Overview
This work leveraged over 600,000 images sourced from two de-identified datasets. The first dataset was developed in collaboration with co-authors at the Apollo Hospitals, and consists of a diverse set of chest X-rays obtained over several years from multiple locations across the Apollo Hospitals network. The second dataset is the publicly available ChestX-ray14 image set released by the National Institutes of Health (NIH). This second dataset has served as an important resource for many machine learning efforts, yet has limitations stemming from issues with the accuracy and clinical interpretation of the currently available labels.
Chest X-ray depicting a left upper lobe pneumothorax identified by the model and the adjudication panel, but missed by the individual radiologist readers. Left: The original image. Right: The same image with the most important regions for the model prediction highlighted in orange.
Training Set Labels Using Deep Learning and Visual Image Review
For very large datasets consisting of hundreds of thousands of images, such as those needed to train highly accurate deep learning models, it is impractical to manually assign image labels. As such, we developed a separate, text-based deep learning model to extract image labels using the de-identified radiology reports associated with each X-ray. This NLP model was then applied to provide labels for over 560,000 images from the Apollo Hospitals dataset used for training the computer vision models.
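
As a toy illustration of the general idea (the study’s actual text model is not shown here), the following sketch trains a simple bag-of-words classifier on hypothetical report snippets and then uses it to assign a per-finding label that could be attached to the corresponding image.

```python
# Hedged sketch only: a toy text classifier standing in for the text-based deep
# learning model described above, which extracted image labels from de-identified
# radiology reports. Report snippets and labels here are invented.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "large right-sided pneumothorax with mediastinal shift",
    "no pneumothorax or pleural effusion identified",
    "small apical pneumothorax, unchanged from prior",
    "clear lungs, no acute cardiopulmonary abnormality",
]
labels = [1, 0, 1, 0]  # 1 = pneumothorax mentioned as present

label_extractor = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
label_extractor.fit(reports, labels)

# The fitted extractor can then assign training labels to images via their reports.
print(label_extractor.predict(["tiny left pneumothorax noted"]))
```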

To reduce noise from any errors introduced by the text-based label extraction and also to provide the relevant labels for a substantial number of the ChestX-ray14 images, approximately 37,000 images across the two datasets were visually reviewed by radiologists. These were separate from the NLP-based labels and helped to ensure high quality labels across such a large, diverse set of training images.

Creating and Sharing Improved Reference Standard Labels
To generate high-quality reference standard labels for model evaluation, we utilized a panel-based adjudication process, whereby three radiologists reviewed all final tune and test set images and resolved disagreements through discussion. This often allowed difficult findings that were initially only detected by a single radiologist to be identified and documented appropriately. To reduce the risk of bias based on any individual radiologist’s personality or seniority, the discussions took place anonymously via an online discussion and adjudication system.
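
A highly simplified sketch of this kind of label aggregation is shown below. The real process used an online discussion and adjudication system; here, unanimous cases are accepted automatically and any disagreement is simply flagged for panel review.

```python
# Simplified sketch, not the adjudication tool itself: three radiologists label
# each case independently; unanimous cases become the reference standard, and
# disagreements are flagged for anonymous panel discussion.
from collections import Counter

def adjudicate(case_labels):
    """case_labels: dict mapping case_id -> list of three radiologist labels."""
    resolved, needs_discussion = {}, []
    for case_id, labels in case_labels.items():
        top_label, count = Counter(labels).most_common(1)[0]
        if count == len(labels):          # unanimous agreement
            resolved[case_id] = top_label
        else:                             # any disagreement goes to the panel
            needs_discussion.append(case_id)
    return resolved, needs_discussion

labels = {"case_001": ["pneumothorax"] * 3,
          "case_002": ["normal", "nodule", "nodule"]}
print(adjudicate(labels))
```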

Because the lack of available adjudicated labels was a significant initial barrier to our work, we are sharing with the research community all of the adjudicated labels for the publicly available ChestX-ray14 dataset, including 2,412 training/validation set images and 1,962 test set images (4,374 images in total). We hope that these labels will facilitate future machine learning efforts and enable better apples-to-apples comparisons between machine learning models for chest X-ray interpretation.

Future Outlook
This work presents several contributions: (1) releasing adjudicated labels for images from a publicly available dataset; (2) a method to scale accurate labeling of training data using a text-based deep learning model; (3) evaluation using a diverse set of images with expert-adjudicated reference standard labels; and ultimately (4) radiologist-level performance of deep learning models for clinically important findings on chest X-rays.

However, in regards to model performance, achieving expert-level accuracy on average is just a part of the story. Even though overall accuracy for the deep learning models was consistently similar to that of radiologists for any given finding, performance for both varied across datasets. For example, the sensitivity for detecting pneumothorax among radiologists was approximately 79% for the ChestX-ray14 images, but was only 52% for the same radiologists on the other dataset, suggesting a more difficult collection of cases in the latter. This highlights the importance of validating deep learning tools on multiple, diverse datasets and eventually across the patient populations and clinical settings in which any model is intended to be used.

The performance differences between datasets also emphasize the need for standardized evaluation image sets with accurate reference standards in order to allow comparison across studies. For example, if two different models for the same finding were evaluated using different datasets, comparing performance would be of minimal value without knowing additional details such as the case mix, model error modes, or radiologist performance on the same cases.

Finally, the model often identified findings that were consistently missed by radiologists, and vice versa. As such, strategies that combine the unique “skills” of both the deep learning systems and human experts are likely to hold the most promise for realizing the potential of AI applications in medical image interpretation.

Acknowledgements
Key contributors to this project at Google include Sid Mittal, Gavin Duggan, Anna Majkowska, Scott McKinney, Andrew Sellergren, David Steiner, Krish Eswaran, Po-Hsuan Cameron Chen, Yun Liu, Shravya Shetty, and Daniel Tse. Significant contributions and input were also made by radiologist collaborators Joshua Reicher, Alexander Ding, and Sreenivasa Raju Kalidindi. The authors would also like to acknowledge many members of the Google Health radiology team including Jonny Wong, Diego Ardila, Zvika Ben-Haim, Rory Sayres, Shahar Jamshy, Shabir Adeel, Mikhail Fomitchev, Akinori Mitani, Quang Duong, William Chen and Sahar Kazemzadeh. Sincere appreciation also goes to the many radiologists who enabled this work through their expert image interpretation efforts throughout the project.

Source: Google AI Blog


Tools to help healthcare providers deliver better care

There has been a lot of interest around our collaboration with Ascension. As a physician, I understand. Health is incredibly personal, and your health information should be private to you and the people providing your care. 

That’s why I want to clarify what our teams are doing, why we’re doing it, and how it will help your healthcare providers—and you. 

Doctors and nurses love caring for patients, but aren’t always equipped with the tools they need to thrive in their mission. We have all seen headlines like "Why doctors hate their computers," with complaints about having to use "a disconnected patchwork" that makes finding critical health information as hard as finding a needle in a haystack. The average U.S. health system has 18 electronic medical record systems, and our doctors and nurses feel like they are "data clerks" rather than healers. 


Google has spent two decades on similar problems for consumers, building products such as Search, Translate and Gmail, and we believe we can adapt our technology to help. That’s why we’re building an intelligent suite of tools to help doctors, nurses, and other providers take better care of patients, leveraging our expertise in organizing information. 


One of those tools aims to make health records more useful, more accessible and more searchable by pulling them into a single, easy-to-use interface for doctors. I mentioned this during my presentation last month at the HLTH Conference. Ascension is the first partner where we are working with frontline staff to pilot this tool.


This effort is challenging. Health information is incredibly complex—there are misspellings, different ways of saying the same thing, handwritten scribbles, and faxes. Healthcare IT systems also don’t talk well to each other and this keeps doctors and nurses from taking the best possible care of you. 

Policymakers and regulators across the world (e.g., CMS, HHS, the NHS, and the EC) have called this out as an important issue. We’ve committed to help, and it’s why we built this system on interoperable standards.

To deliver such a tool to providers, the system must operate on patients' records. This is what people have been asking about in the context of our Ascension partnership, and why we want to clarify how we handle that data.

As we noted in an earlier post, our work adheres to strict regulations on handling patient data, and our Business Associate Agreement with Ascension ensures their patient data cannot be used for any other purpose than for providing our services—this means it’s never used for advertising. We’ve also published a white paper around how customer data is encrypted and isolated in the cloud. 

To ensure that our tools are safe for Ascension doctors and nurses treating real patients, members of our team might come into contact with identifiable patient data. Because of this, we have strict controls for the limited Google employees who handle such data:

  • We develop and test our system on synthetic (fake) data and openly available datasets.

  • To configure, test, tune and maintain the service in a clinical setting, a limited number of screened and qualified Google staff may be exposed to real data. These staff undergo HIPAA and medical ethics training, and are individually and explicitly approved by Ascension for a limited time.

  • We have technical controls to further enhance data privacy. Data is accessible in a strictly controlled environment with audit trails—these controls are designed to prevent the data from leaving this environment and access to patient data is monitored and auditable.

  • We will further prioritize the development of technology that reduces the number of engineers that need access to patient data (similar to our external redaction technology).

  • We also participate in external certifications, like ISO 27001, where independent third-party auditors come and check our processes, including information security controls for these tools.

I graduated from medical school in 1989. I've seen tremendous progress in healthcare over the ensuing decades, but this progress has also brought with it challenges of information overload that have taken doctors’ and nurses’ attentions away from the patients they are called to serve. I believe technology has a major role to play in reversing this trend, while also improving how care is delivered in ways that can save lives. 

New Insights into Human Mobility with Privacy Preserving Aggregation



Understanding human mobility is crucial for predicting epidemics, urban and transit infrastructure planning, understanding people’s responses to conflict and natural disasters and other important domains. Formerly, the state-of-the-art in mobility data was based on cell carrier logs or location "check-ins", and was therefore available only in limited areas — where the telecom provider is operating. As a result, cross-border movement and long-distance travel were typically not captured, because users tend not to use their SIM card outside the country covered by their subscription plan and datasets are often bound to specific regions. Additionally, such measures involved considerable time lags and were available only within limited time ranges and geographical areas.

In contrast, de-identified aggregate flows of populations around the world can now be computed from phones' location sensors at a uniform spatial resolution. This metric has the potential to be extremely useful for urban planning since it can be measured in a direct and timely way. The use of de-identified and aggregated population flow data collected at a global level via smartphones could shed additional light on city organization, for example, while requiring significantly fewer resources than existing methods.

In “Hierarchical Organization of Urban Mobility and Its Connection with City Livability”, we show that these mobility patterns — statistics on how populations move about in aggregate — indicate a higher use of public transportation, improved walkability, lower pollutant emissions per capita, and better health indicators, including easier accessibility to hospitals. This work, which appears in Nature Communications, contributes to a better characterization of city organization and supports a stronger quantitative perspective in the efforts to improve urban livability and sustainability.
Visualization of privacy-first computation of the mobility map. Individual data points are automatically aggregated together with differential privacy noise added. Then, flows of these aggregate and obfuscated populations are studied.
Computing a Global Mobility Map While Preserving User Privacy
In line with our AI principles, we have designed a method for analyzing population mobility with privacy-preserving techniques at its core. To ensure that no individual user’s journey can be identified, we create representative models of aggregate data by employing a technique called differential privacy, together with k-anonymity, to aggregate population flows over time. Initially implemented in 2014, this approach to differential privacy intentionally adds random “noise” to the data in a way that maintains both users' privacy and the data's accuracy at an aggregate level. We use this method to aggregate data collected from smartphones of users who have deliberately chosen to opt-in to Location History, in order to better understand global patterns of population movements.

The model only considers de-identified location readings aggregated to geographical areas of predetermined sizes (e.g., S2 cells). It "snaps" each reading into a spacetime bucket by discretizing time into longer intervals (e.g., weeks) and latitude/longitude into a unique identifier of the geographical area. Aggregating into these large spacetime buckets goes beyond protecting individual privacy — it can even protect the privacy of communities.
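
The following toy sketch shows what such snapping might look like, with a coarse latitude/longitude grid standing in for the S2 cells and weekly time buckets; the grid size and parameters are illustrative assumptions, not the production configuration.

```python
# Toy sketch of the "snapping" step with assumed parameters: each de-identified
# reading is discretized into a (week, area) bucket. A coarse lat/lng grid stands
# in here for the S2 cells mentioned above.
from datetime import datetime

def spacetime_bucket(lat, lng, timestamp, grid_deg=0.1):
    """Map a reading to a coarse spacetime bucket: (ISO year, ISO week, grid cell)."""
    year, week, _ = timestamp.isocalendar()
    cell = (round(lat / grid_deg), round(lng / grid_deg))  # stand-in for an S2 cell id
    return (year, week, cell)

print(spacetime_bucket(48.8566, 2.3522, datetime(2019, 7, 3)))
```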

Finally, for each pair of geographical areas, the system computes the relative flow between the areas over a given time interval, applies differential privacy filters, and outputs the global, anonymized, and aggregated mobility map. The dataset is generated only once and only mobility flows involving a sufficiently large number of accounts are processed by the model. This design is limited to heavily aggregated population flows, such as those already used as a vital source of information for estimates of live traffic and parking availability, and protects individual data from being manually identified. The resulting map is indexed for efficient lookup and used to fuel the modeling described below.
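
As a rough, illustrative sketch (not Google’s production pipeline), the code below aggregates origin-destination transitions, drops any pair supported by fewer than k accounts, and adds Laplace noise before release. The threshold and noise parameters are assumptions chosen for the example.

```python
# Illustrative sketch: k-anonymity thresholding plus Laplace noise on aggregated
# origin-destination counts. Parameters are assumed, not the real system's.
from collections import Counter
import numpy as np

rng = np.random.default_rng(0)

def aggregate_flows(transitions, k=100, epsilon=1.0, sensitivity=1.0):
    """transitions: iterable of (origin_bucket, destination_bucket) pairs."""
    counts = Counter(transitions)
    released = {}
    for pair, count in counts.items():
        if count < k:                    # drop flows supported by too few accounts
            continue
        noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
        released[pair] = max(0, round(count + noise))
    return released

# Hypothetical aggregated transitions between area buckets.
flows = [("cell_A", "cell_B")] * 250 + [("cell_A", "cell_C")] * 12
print(aggregate_flows(flows))
```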

Mobility Map Applications
Aggregate mobility of people in cities around the globe defines the city and, in turn, its impact on the people who live there. We define a metric, the flow hierarchy (Φ), derived entirely from the mobility map, that quantifies the hierarchical organization of cities. While hierarchies across cities have been extensively studied since Christaller’s work in the 1930s, for individual cities, the focus has been primarily on the differences between core and peripheral structures, as well as whether cities are mono- or poly-centric. Our results instead show that the reality is much richer than previously thought. The mobility map enables a quantitative demonstration that cities lie across a spectrum of hierarchical organization that strongly correlates with a series of important quality of life indicators, including health and transportation.

Below we see an example of two cities — Paris and Los Angeles. Though they have almost the same population size, those two populations move in very different ways. Paris is mono-centric, with an "onion" structure that has a distinct high-mobility city center (red), which progressively decreases as we move away from the center (in order: orange, yellow, green, blue). On the other hand, Los Angeles is truly poly-centric, with a large number of high-mobility areas scattered throughout the region.
Mobility maps of Paris (left) and Los Angeles (right). Both cities have similar population sizes, but very different mobility patterns. Paris has an "onion" structure exhibiting a distinct center with a high degree of mobility (red) that progressively decreases as we move away from the center (in order: orange, yellow, green, blue). In contrast, Los Angeles has a large number of high-mobility areas scattered throughout the region.
More hierarchical cities — in terms of flows being primarily between hotspots of similar activity levels — have values of flow hierarchy Φ closer to the upper limit of 1 and tend to have greater levels of uniformity in their spatial distribution of movements, wider use of public transportation, higher levels of walkability, lower pollution emissions, and better indicators of various measures of health. Returning to our example, the flow hierarchy of Paris is Φ=0.93 (in the top quartile across all 174 cities sampled), while that of Los Angeles is 0.86 (bottom quartile).
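
The paper defines Φ precisely; purely as an illustration of the intuition, the toy function below bins areas into activity tiers and measures the fraction of total flow that stays within a tier. This is not the published definition, only a sketch of "flows primarily between hotspots of similar activity levels."

```python
# Toy illustration only, not the paper's definition of flow hierarchy.
import numpy as np

def toy_flow_hierarchy(flow, n_tiers=5):
    """flow: square matrix, flow[i, j] = aggregate flow from area i to area j."""
    flow = np.asarray(flow, dtype=float)
    activity = flow.sum(axis=0) + flow.sum(axis=1)   # total flow touching each area
    cuts = np.quantile(activity, np.linspace(0, 1, n_tiers + 1)[1:-1])
    tiers = np.digitize(activity, cuts)
    same_tier = tiers[:, None] == tiers[None, :]
    return flow[same_tier].sum() / flow.sum()        # fraction of flow within a tier

rng = np.random.default_rng(0)
print(toy_flow_hierarchy(rng.integers(0, 50, size=(20, 20))))
```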

We find that existing measures of urban structure, such as population density and sprawl composite indices, correlate with flow hierarchy, but in addition the flow hierarchy conveys comparatively more information that includes behavioral and socioeconomic factors.
Connecting flow hierarchy Φ with urban indicators in a sample of US cities. Proportion of trips as a function of Φ, broken down by mode share: private car, public transportation, and walking. Sample city names that appear in the plot: ATL (Atlanta), CHA (Charlotte), CHI (Chicago), HOU (Houston), LA (Los Angeles), MIN (Minneapolis), NY (New York City), and SF (San Francisco). We see that cities with higher flow hierarchy exhibit significantly higher rates of public transportation use, less car use, and more walkability.
Measures of urban sprawl require composite indices built up from much more detailed information on land use, population, density of jobs, and street geography among others (sometimes up to 20 different variables). In addition to the extensive data requirements, such metrics are also costly to obtain. For example, censuses and surveys require a massive deployment of resources in terms of interviews, and are only standardized at a country level, hindering the correct quantification of sprawl indices at a global scale. On the other hand, the flow hierarchy, being constructed from mobility information alone, is significantly less expensive to compile (involving only computer processing cycles), and is available in real-time.

Given the ongoing debate on the optimal structure of cities, the flow hierarchy introduces a different conceptual perspective compared to existing measures, and can shed new light on the organization of cities. From a public-policy point of view, we see that cities with a greater degree of mobility hierarchy tend to have more desirable urban indicators. Given that this hierarchy is a measure of proximity and direct connectivity between socioeconomic hubs, a possible direction could be to shape opportunity and demand in a way that facilitates a greater degree of hub-to-hub movement than a hub-to-spoke architecture. The proximity of hubs can be generated through appropriate land use, which can be shaped by data-driven zoning laws in terms of business, residence or service areas. The presence of efficient public transportation and lower use of cars is another important factor. Perhaps a combination of policies, such as congestion pricing to disincentivize private transportation to socioeconomic hubs, along with building public transportation in a targeted fashion to directly connect the hubs, may well prove useful.

Next Steps
This work is part of our larger AI for Social Good efforts, a program that focuses Google's expertise on addressing humanitarian and environmental challenges. These mobility maps are only the first step toward making an impact in epidemiology, infrastructure planning, and disaster response, while ensuring high privacy standards.

The work discussed here goes to great lengths to ensure privacy is maintained. We are also working on newer techniques, such as on-device federated learning, to go a step further and enable computing aggregate flows without personal data leaving the device at all. By using distributed secure aggregation protocols or randomized responses, global flows can be computed without even the aggregator having knowledge of individual data points being aggregated. This technique has also been applied to help secure Chrome from malicious attacks.

Acknowledgements
This work resulted from a collaboration of Aleix Bassolas and José J. Ramasco from the Institute for Cross-Disciplinary Physics and Complex Systems (IFISC, CSIC-UIB), Brian Dickinson, Hugo Barbosa-Filho, Gourab Ghoshal, Surendra A. Hazarie, and Henry Kautz from the Computer Science Department and Ghoshal Lab at the University of Rochester, Riccardo Gallotti from the Bruno Kessler Foundation, and Xerxes Dotiwalla, Paul Eastham, Bryant Gipson, Onur Kucuktunc, Allison Lieber, Adam Sadilek at Google.

The differential privacy library used in this work is open source and available on our GitHub repo.

Source: Google AI Blog


Breast cancer and tech…a reason for optimism

I was diagnosed with breast cancer twice, in 2001 and again in 2004. Thanks to early detection and access to extraordinary care—including multiple rounds of chemo, radiation and more surgery than any one person should ever have in a lifetime—I’m still here and able to write this piece. In fact, I’ve probably never been healthier. 

I remember receiving the news. I was initially terrified. Our three kids were only five, seven, and nine at the time of my first diagnosis, and all I wanted was to live to see them grow up. I’m grateful I had options and access to treatments, but no aspect of it was pleasant. Last year, I had the joy of seeing our youngest son graduate from college. In the years since I first learned of my cancer, there’s been remarkable progress in global health care, augmented with pioneering work from medical researchers and technology companies. I know how incredibly fortunate I am, but I also know that for far too many, a diagnosis comes too late and the best care is beyond reach. 

And that’s where Google has focused its work: to bring healthcare innovations to everyone. Working at Google, I have had a front-row seat to these technological breakthroughs. 

During the past few years, teams at Google have applied artificial intelligence (AI) to problems in healthcare—from predicting patient outcomes in medical records to helping detect diseases like lung cancer. We’re still early on in developing these technologies, but the results are promising. 

When it comes to breast cancer, Google is looking at how AI can help specialists improve detection and diagnosis. Breast cancer is one of the most common cancers among women worldwide, taking the lives of more than 600,000 people each year. Thankfully, that number is on the decline because of huge advances in care. However, that number could be even lower if we continue to accelerate progress and make sure that progress reaches as many people as possible. Google hopes AI research will further fuel progress on both detection and diagnosis. 

Early detection depends on patients and technologies, such as mammography. Currently, we rely on mammograms to screen for cancer in otherwise healthy women, but thousands of cases go undiagnosed each year and thousands more result in  confusing or worrying findings that are not cancer or are low risk. Today we can’t easily distinguish the cancers we need to find from those that are unlikely to cause further harm. We believe that technology can help with detection, and thus improve the experience for both patients and doctors.  

Just as important as detecting cancer is determining how advanced and aggressive the cancer is. A process called staging helps determine how far the cancer has spread, which impacts the course of treatment. Staging largely depends on clinicians and radiologists looking at patient histories, physical examinations and images. In addition, pathologists examine tissue samples obtained from a biopsy to assess the microscopic appearance and biological properties of each individual patient’s cancer and judge aggressiveness. However, pathologic assessment is a laborious and costly process that--incredibly--continues to rely on an individual evaluating microscopic features in biological tissue with the human eye and microscope!

Last year, Google created a deep learning algorithm that could help pathologists assess tissue and detect the spread and extent of disease better in virtually every case. By pinpointing the location of the cancer more accurately, quickly and at a lower cost, care providers might be able to deliver better treatment for more patients. But doing this will require that these insights be paired with human intelligence and placed in the hands of skilled researchers, surgeons, oncologists, radiologists and others. Google’s research showed that the best results come when medical professionals and technology work together, rather than either working alone. 

During my treatment, I was taken care of by extraordinary teams at Memorial Sloan Kettering in New York where they had access to the latest developments in breast cancer care. My oncologist (and now good friend), Dr. Clifford Hudis, is now CEO of the American Society of Clinical Oncology (ASCO), which has developed a nonprofit big data initiative, CancerLinQ, to give oncologists and researchers access to health information to inform better care for everyone. He told me: “CancerLinQ seeks to identify hidden signals in the routine record of care from millions of de-identified patients so that doctors have deeper and faster insights into their own practices and opportunities for improvement.” He and his colleagues don't think they’ll be able to deliver optimally without robust AI. 

What medical professionals, like Dr. Hudis and his colleagues across ASCO and CancerLinQ, and engineers at companies like Google have accomplished since the time I joined the Club in 2001 is remarkable. 

I will always remember words passed on to me by another cancer survivor, which helped me throughout my treatment. He said when you’re having a good day and you’ve temporarily pushed the disease out of your mind, a little bird might land on your shoulder to remind you that you have cancer. Eventually, that bird comes around less and less. It took many years but I am relieved to say that I haven’t seen that bird in a long time, and I am incredibly grateful for that. I am optimistic that the combination of great doctors and technology could allow us to get rid of those birds for so many more people. 


Putting your heart first on World Heart Day

World Heart Day is this Sunday, and it raises awareness around the cause and prevention of cardiovascular diseases around the world. As part of these efforts, the World Heart Federation recognizes “people from all walks of life who have shown commitment, courage, empathy and care in relation to heart health” as heart heroes. It’s an honor to have been included this year for my focus on using technology to promote lifestyle interventions such as increasing physical activity to help people lead healthier lives.

Heart disease continues to be the number one cause of death in the U.S., so it’s more important than ever to identify and share simple ways to keep your heart healthy. I have two kids under the age of five and life can get really busy. When juggling patients, children, work and errands, it’s easy to feel active when, in reality, I’ve lost track of healthy habits.

With Google Fit’s smart activity goals and Heart Point tracking, I realized I wasn’t reaching the American Heart Association and World Health Organization’s recommended amount of weekly physical activity, and that I needed to make changes to earn more Heart Points throughout the week.

Meeting weekly Heart Point goals improves overall wellness and health

On busy days, I’ve started to use a 7-minute workout app every evening that provides video overviews and audio descriptions of each exercise. It’s quick, easy and fun. And to top it off, my kids will often join in on a wall sit or climb on me for some extra weight during a plank. I’ve found these exercises to be a quick and efficient way to earn 14 Heart Points, which quickly adds up to help me reach my weekly goal.

7 minute workout with kids

Using a workout app may not be for everyone—there are many ways to incorporate incremental changes throughout your week that will help you be more active. Here are a few other things to try out: 

  • Get your body moving and rake the leaves outside or mow the lawn.
  • Pick up the pace when you’re on a walk, with yourself, your friends or your dog.
  • Wear sneakers and make it a walking meeting—this way you and your co-workers get health benefits. 
  • Sign up for a workout class! A 45-minute indoor cycling class earns you 90 Heart Points.
  • Before you shower, take a few minutes to do simple exercises like jumping jacks, squats, wall sits, push ups or planks.

The beauty of it all is that you don’t have to go to a gym or buy special equipment. Just getting moving can have health benefits that add up. For World Heart Day, I challenge you to find opportunities that work with your schedule to earn more Heart Points.

DeepMind’s health team joins Google Health

Over the last three years, DeepMind has built a team to tackle some of healthcare’s most complex problems—developing AI research and mobile tools that are already having a positive impact on patients and care teams. Today, with our healthcare partners, the team is excited to officially join the Google Health family. Under the leadership of Dr. David Feinberg, and alongside other teams at Google, we’ll now be able to tap into global expertise in areas like app development, data security, cloud storage and user-centered design to build products that support care teams and improve patient outcomes. 

During my time working in the UK National Health Service (NHS) as a surgeon and researcher, I saw first-hand how technology could help, or hinder, the important work of nurses and doctors. It’s remarkable that many frontline clinicians, even in the world’s most advanced hospitals, are still reliant on clunky desktop systems and pagers that make delivering fast and safe patient care challenging. Thousands of people die in hospitals every year from avoidable conditions like sepsis and acute kidney injury and we believe that better tools could save lives. That’s why I joined DeepMind, and why I will continue this work with Google Health. 

We’ve already seen how our mobile medical assistant for clinicians is helping patients and the clinicians looking after them, and we are looking forward to continuing our partnerships with The Royal Free London NHS Foundation Trust, Imperial College Healthcare NHS Trust and Taunton and Somerset NHS Foundation Trust.

On the research side, we’ve seen major advances with Moorfields Eye Hospital NHS Foundation Trust in detecting eye disease from scans as accurately as experts; with University College London Hospitals NHS Foundation Trust on planning cancer radiotherapy treatment; and with the US Department of Veterans Affairs to predict patient deterioration up to 48 hours earlier than currently possible. We see enormous potential in continuing, and scaling, our work with all three partners in the coming years as part of Google Health. 

It’s clear that a transition like this takes time. Health data is sensitive, and we gave proper time and care to make sure that we had the full consent and cooperation of our partners. This included giving them the time to ask questions and fully understand our plans and to choose whether to continue our partnerships. As has always been the case, our partners are in full control of all patient data and we will only use patient data to help improve care, under their oversight and instructions.

I know DeepMind is proud of our healthcare work to date. With the expertise and reach of Google behind us, we’ll now be able to develop tools and technology capable of helping millions of patients around the world. 

Using Deep Learning to Inform Differential Diagnoses of Skin Diseases



An estimated 1.9 billion people worldwide suffer from a skin condition at any given time, and due to a shortage of dermatologists, many cases are seen by general practitioners instead. In the United States alone, up to 37% of patients seen in the clinic have at least one skin complaint and more than half of those patients are seen by non-dermatologists. However, studies demonstrate a significant gap in the accuracy of skin condition diagnoses between general practitioners and dermatologists, with the accuracy of general practitioners between 24% and 70%, compared to 77-96% for dermatologists. This can lead to suboptimal referrals, delays in care, and errors in diagnosis and treatment.

Existing strategies for non-dermatologists to improve diagnostic accuracy include the use of reference textbooks, online resources, and consultation with a colleague. Machine learning tools have also been developed with the aim of helping to improve diagnostic accuracy. Previous research has largely focused on early screening of skin cancer, in particular, whether a lesion is malignant or benign, or whether a lesion is melanoma. However, upwards of 90% of skin problems are not malignant, and addressing these more common conditions is also important to reduce the global burden of skin disease.

In “A Deep Learning System for Differential Diagnosis of Skin Diseases,” we developed a deep learning system (DLS) to address the most common skin conditions seen in primary care. Our results showed that a DLS can achieve an accuracy across 26 skin conditions that is on par with U.S. board-certified dermatologists, when presented with identical information about a patient case (images and metadata). This study highlights the potential of the DLS to augment the ability of general practitioners who did not have additional specialty training to accurately diagnose skin conditions.

DLS Design
Clinicians often face ambiguous cases for which there is no clear cut answer. For example, is this patient’s rash stasis dermatitis or cellulitis, or perhaps both superimposed? Rather than giving just one diagnosis, clinicians generate a differential diagnosis, which is a ranked list of possible diagnoses. A differential diagnosis frames the problem so that additional workup (laboratory tests, imaging, procedures, consultations) and treatments can be systematically applied until a diagnosis is confirmed. As such, a deep learning system (DLS) that produces a ranked list of possible skin conditions for a skin complaint closely mimics how clinicians think and is key to prompt triage, diagnosis and treatment for patients.

To render this prediction, the DLS processes inputs, including one or more clinical images of the skin abnormality and up to 45 types of metadata (self-reported components of the medical history such as age, sex, symptoms, etc.). For each case, multiple images were processed using the Inception-v4 neural network architecture and combined with feature-transformed metadata, for use in the classification layer. In our study, we developed and evaluated the DLS with 17,777 de-identified cases that were primarily referred from primary care clinics to a teledermatology service. Data from 2010-2017 were used for training and data from 2017-2018 for evaluation. During model training, the DLS leveraged over 50,000 differential diagnoses provided by over 40 dermatologists.
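
To make the architecture concrete, here is a hedged sketch of the image-plus-metadata fusion idea in Python (PyTorch). A ResNet-18 backbone stands in for the Inception-v4 network used in the study, and the layer sizes and view handling are illustrative assumptions rather than the published implementation.

```python
# Hedged sketch of fusing CNN image embeddings with metadata features for a
# 26-way skin-condition classifier. ResNet-18 is a stand-in backbone.
import torch
import torch.nn as nn
from torchvision import models

class ImageMetadataClassifier(nn.Module):
    def __init__(self, num_metadata_features=45, num_classes=26):
        super().__init__()
        backbone = models.resnet18()
        backbone.fc = nn.Identity()              # keep the 512-dim image embedding
        self.backbone = backbone
        self.metadata_mlp = nn.Sequential(nn.Linear(num_metadata_features, 64), nn.ReLU())
        self.classifier = nn.Linear(512 + 64, num_classes)

    def forward(self, images, metadata):
        # images: (batch, n_views, 3, H, W); average the embeddings of multiple views.
        b, v = images.shape[:2]
        img_feat = self.backbone(images.flatten(0, 1)).view(b, v, -1).mean(dim=1)
        meta_feat = self.metadata_mlp(metadata)
        return self.classifier(torch.cat([img_feat, meta_feat], dim=1))

model = ImageMetadataClassifier()
logits = model(torch.randn(2, 3, 3, 224, 224), torch.randn(2, 45))
print(logits.shape)  # torch.Size([2, 26])
```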

To evaluate the DLS’s accuracy, we compared it to a rigorous reference standard based on the diagnoses from three U.S. board-certified dermatologists. In total, dermatologists provided differential diagnoses for 3,756 cases (“Validation set A”), and these diagnoses were aggregated via a voting process to derive the ground truth labels. The DLS’s ranked list of skin conditions was compared with this dermatologist-derived differential diagnosis, achieving 71% and 93% top-1 and top-3 accuracies, respectively.
Schematic of the DLS and how the reference standard (ground truth) was derived via the voting of three board-certified dermatologists for each case in the validation set.
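As an illustration of this evaluation, the sketch below shows one way top-k accuracy against a dermatologist-derived reference standard could be computed, assuming each case's ground truth has already been reduced, via the voting step, to a set of accepted condition labels. The function and condition names are ours, chosen for illustration.

```python
# Top-k accuracy against a reference standard: a case counts as correct
# if any of the model's top-k ranked conditions is in the accepted set.
from typing import Sequence, Set

def top_k_accuracy(ranked_predictions: Sequence[Sequence[str]],
                   reference_labels: Sequence[Set[str]],
                   k: int) -> float:
    hits = sum(
        1 for preds, truth in zip(ranked_predictions, reference_labels)
        if set(preds[:k]) & truth)
    return hits / len(reference_labels)

# Toy usage with invented cases:
preds = [["eczema", "psoriasis", "tinea"],
         ["acne", "rosacea", "folliculitis"]]
truth = [{"psoriasis"}, {"perioral dermatitis"}]
print(top_k_accuracy(preds, truth, k=1))  # 0.0
print(top_k_accuracy(preds, truth, k=3))  # 0.5
```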
Comparison to Professional Evaluations
In this study, we also compared the accuracy of the DLS to that of three categories of clinicians on a subset of the validation A dataset (“Validation set B”): dermatologists, primary care physicians (PCPs), and nurse practitioners (NPs), all chosen randomly and representing a range of experience, training, and diagnostic accuracy. Because typical differential diagnoses provided by clinicians contain at most three diagnoses, we compared only the top three predictions of the DLS with those of the clinicians. On the validation B dataset, the DLS achieved a top-3 diagnostic accuracy of 90%, which was comparable to dermatologists and substantially higher than PCPs and NPs (75%, 60%, and 55%, respectively, for the six clinicians in each group). This high top-3 accuracy suggests that the DLS may help prompt clinicians (including dermatologists) to consider possibilities that were not originally in their differential diagnoses, thus improving diagnostic accuracy and condition management.
The accuracy of the DLS’s leading (top-1) diagnosis is substantially higher than that of PCPs and NPs, and on par with dermatologists. The accuracy increases substantially when we look at the DLS’s top-3 predictions, suggesting that in the majority of cases the DLS’s ranked list of diagnoses contains the correct ground truth answer for the case.
Assessing Demographic Performance
Skin type, in particular, is highly relevant to dermatology, where visual assessment of the skin itself is crucial to diagnosis. To evaluate potential bias towards skin type, we examined DLS performance based on the Fitzpatrick skin type, a scale that ranges from Type I (“pale white, always burns, never tans”) to Type VI (“darkest brown, never burns”). To ensure sufficient numbers of cases on which to draw convincing conclusions, we focused on skin types that represented at least 5% of the data: Fitzpatrick skin types II through IV. On these categories, the DLS’s accuracy was similar, with a top-1 accuracy ranging from 69-72% and a top-3 accuracy from 91-94%. Encouragingly, the DLS also remained accurate in patient subgroups that made up at least 5% of the dataset based on other self-reported demographic information: age, sex, and race/ethnicity. As a further qualitative analysis, we used saliency (explanation) techniques to check that the DLS was reassuringly “focusing” on the abnormalities rather than on skin tone.
Left: An example of a case with hair loss for which it was challenging for non-specialists to arrive at the specific diagnosis, which is necessary for determining appropriate treatment. Right: An image with regions highlighted in green showing the areas that the DLS identified as important and used to make its prediction. Center: The combined image, which indicates that the DLS mostly focused on the area with hair loss to make this prediction, rather than, for example, on forehead skin color, which might indicate potential bias.
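To make the subgroup analysis concrete, here is a hypothetical sketch of how accuracy could be broken down by Fitzpatrick skin type while excluding subgroups below the 5% threshold. The DataFrame layout and column names are assumptions for illustration, not the study's actual analysis code.

```python
# Per-subgroup accuracy: keep only subgroups covering at least 5% of
# cases, then average per-case top-1 / top-3 correctness within each.
import pandas as pd

def subgroup_accuracy(cases: pd.DataFrame,
                      group_col: str = "fitzpatrick_type",
                      min_fraction: float = 0.05) -> pd.DataFrame:
    """`cases` has one row per case with boolean top1_correct / top3_correct."""
    fractions = cases[group_col].value_counts(normalize=True)
    keep = fractions[fractions >= min_fraction].index
    rows = cases[cases[group_col].isin(keep)]
    return (rows.groupby(group_col)[["top1_correct", "top3_correct"]]
                .mean()
                .rename(columns={"top1_correct": "top1_acc",
                                 "top3_correct": "top3_acc"}))
```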
Incorporating Multiple Data Types
We also studied the effect of different types of input data on DLS performance. Much like how having images from several angles can help a teledermatologist more accurately diagnose a skin condition, the accuracy of the DLS improves with an increasing number of images. If metadata (e.g., the medical history) is missing, the model does not perform as well. This accuracy gap, which may occur in scenarios where no medical history is available, can be partially mitigated by training the DLS with images only. Nevertheless, these data suggest that providing the answers to a few questions about the skin condition can substantially improve DLS accuracy.
The DLS performance improves when more images (blue line) or metadata (blue compared with red line) are present. In the absence of metadata as input, training a separate DLS using images alone leads to a marginal improvement compared to the current DLS (green line).
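One way to approximate the input ablation described above is sketched below: the illustrative model from the earlier sketch is evaluated with only the first few image slots filled and, optionally, with the metadata vector zeroed out to mimic a missing medical history. The data arrays and the zero-masking strategy are assumptions, not the study's protocol.

```python
# Evaluate top-3 accuracy while ablating inputs: unused image slots are
# blanked with zeros and metadata can be replaced by an all-zero vector.
import numpy as np

def evaluate_with_ablation(model, images, metadata, labels,
                           n_images, use_metadata=True):
    imgs = images.copy()
    imgs[:, n_images:] = 0.0  # crude stand-in for "fewer images per case"
    meta = metadata if use_metadata else np.zeros_like(metadata)
    probs = model.predict([imgs, meta], verbose=0)
    top3 = np.argsort(-probs, axis=1)[:, :3]
    return float(np.mean([labels[i] in top3[i] for i in range(len(labels))]))
```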
Future Work and Applications
Though these results are very promising, much work remains ahead. First, reflecting real-world practice, the relative rarity of skin cancers such as melanoma in our dataset hindered our ability to train an accurate system to detect cancer. Relatedly, the skin cancer labels in our dataset were not biopsy-proven, limiting the quality of the ground truth in this regard. Second, while our dataset did contain a variety of Fitzpatrick skin types, some skin types were too rare in this dataset to allow meaningful training or analysis. Finally, the validation dataset was from a single teledermatology service. Though 17 primary care locations across two states were included, additional validation on cases from a wider geographical region will be critical. We believe these limitations can be addressed by including more cases of biopsy-proven skin cancers in the training and validation sets, and by including cases representative of additional Fitzpatrick skin types and from other clinical centers.

The success of deep learning in informing the differential diagnosis of skin disease is highly encouraging for such a tool’s potential to assist clinicians. For example, such a DLS could help triage cases to guide prioritization for clinical care, or could help non-dermatologists initiate dermatologic care more accurately and potentially improve access. Though significant work remains, we are excited about future efforts to examine the usefulness of such a system for clinicians. For research collaboration inquiries, please contact [email protected].

Acknowledgements
This work involved the efforts of a multidisciplinary team of software engineers, researchers, clinicians and cross-functional contributors. Key contributors to this project include Yuan Liu, Ayush Jain, Clara Eng, David H. Way, Kang Lee, Peggy Bui, Kimberly Kanada, Guilherme de Oliveira Marinho, Jessica Gallegos, Sara Gabriele, Vishakha Gupta, Nalini Singh, Vivek Natarajan, Rainer Hofmann-Wellenhof, Greg S. Corrado, Lily H. Peng, Dale R. Webster, Dennis Ai, Susan Huang, Yun Liu, R. Carter Dunn and David Coz. The authors would like to acknowledge William Chen, Jessica Yoshimi, Xiang Ji and Quang Duong for software infrastructure support for data collection. Thanks also go to Genevieve Foti, Ken Su, T Saensuksopa, Devon Wang, Yi Gao and Linh Tran. Last but not least, this work would not have been possible without the participation of the dermatologists, primary care physicians, and nurse practitioners who reviewed cases for this study, Sabina Bis, who helped to establish the skin condition mapping, and Amy Paller, who provided feedback on the manuscript.

Source: Google AI Blog