
Working with the WHO to power digital health apps

Nearly 4 billion people around the world don’t have access to the essential healthcare services they need, like immunizations or pediatric care. Complicating matters, the World Health Organization (WHO) estimates a global shortage of 18 million healthcare workers by 2030 — primarily in low- and middle-income countries (LMICs).

In many countries, healthcare workers use smartphone applications to manage data specific to certain diseases like malaria and tuberculosis. However, the data is often stored across multiple applications using different data formats, making it difficult for healthcare workers to have all the information they need. Additionally, it’s difficult for healthcare providers and organizations to exchange data, so they often don’t have a holistic view of individual or community health data to inform health decisions.

To give healthcare workers access to advanced mobile digital health solutions, we’re collaborating with the WHO to build an open source software development kit (SDK). This SDK will help Android developers around the world, including in LMICs, build secure mobile solutions using Fast Healthcare Interoperability Resources (FHIR), a global standard framework for healthcare data that is being widely adopted to address fragmentation and foster more patient-centered care. With Android powering 3 billion active devices worldwide, this collaboration provides an opportunity to support more healthcare workers on the frontlines.

Supporting developers and frontline health workers

Frontline health workers often work in areas where connectivity is unreliable. The SDK allows Android applications to run offline by storing and processing data locally, so health workers can deliver care without worrying about connectivity. When connectivity is available, the SDK sends the latest data collected on the device to the server and receives new updates to patient records.
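To make this offline-first behavior concrete, here is a minimal Python sketch of the pattern: local writes always succeed, and a sync step pushes queued edits and pulls updates once a connection exists. The actual SDK is an Android library written in Kotlin, so the class and the server object below are hypothetical illustrations of the pattern rather than the SDK’s API.

```python
# Minimal sketch of the offline-first pattern; names are illustrative only.
import json
import sqlite3

class OfflineResourceStore:
    """Stores FHIR-style resources locally and queues edits for later sync."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS resources (id TEXT PRIMARY KEY, body TEXT)")
        self.db.execute("CREATE TABLE IF NOT EXISTS pending (id TEXT PRIMARY KEY)")

    def save(self, resource):
        # Local writes succeed immediately, with or without connectivity.
        self.db.execute("REPLACE INTO resources VALUES (?, ?)",
                        (resource["id"], json.dumps(resource)))
        self.db.execute("REPLACE INTO pending VALUES (?)", (resource["id"],))
        self.db.commit()

    def sync(self, server):
        # When connectivity returns: push queued local edits, then pull updates.
        for (rid,) in self.db.execute("SELECT id FROM pending").fetchall():
            body, = self.db.execute(
                "SELECT body FROM resources WHERE id = ?", (rid,)).fetchone()
            server.upload(json.loads(body))           # hypothetical server API
            self.db.execute("DELETE FROM pending WHERE id = ?", (rid,))
        for resource in server.download_updates():    # hypothetical server API
            self.db.execute("REPLACE INTO resources VALUES (?, ?)",
                            (resource["id"], json.dumps(resource)))
        self.db.commit()
```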

The SDK is being designed to provide healthcare workers with access to decision support tools. For example, the WHO is using the SDK to develop EmCare, an app for healthcare workers in emergency settings. This application provides clinical decision support, based on the WHO SMART Guidelines content, which ensures compliance with evidence-based recommendations at the point of care.

By providing a common set of application components, such as on-device storage and data-access and search APIs, the SDK reduces the time and effort it takes to build FHIR-based, interoperable digital health applications on Android, maximizing the efforts of local developers and unlocking their potential to meet their communities’ needs.

The FHIR SDK facilitates interoperability and high-quality data exchange and is designed with a high level of security. Interoperability not only opens up the ability for healthcare workers to more easily gather community health data, but also makes it possible to use high-quality data to understand health trends, better prioritize high-risk patients and deliver more patient-centered care to everyone. All data stored by apps built on the SDK is strongly encrypted, and the SDK does not send or share any data with Google.

Extending interoperability globally

The global digital health community is rallying around FHIR to help improve health data interoperability, and we are committed to helping developers everywhere safely use our SDK to build secure and interoperable digital health solutions for their communities.

We are collaborating with WHO and a group of developers to make sure the SDK meets the needs of the community. We plan to release it more widely in the coming months and look forward to supporting developers as they build digital health tools for healthcare workers everywhere.

This year, we searched for ways to stay healthy

Every day, millions of people come to Google Search to ask important questions about their wellbeing. The COVID-19 pandemic drove even more concern for our health and the health of our loved ones – and this year, searches for ways to heal reached record highs. We saw questions about vaccinations, therapists, body positivity and mental wellbeing, to name a few. Today, we launched our annual Year in Search, which takes a look back at the top-trending searches of the year. Here’s a glimpse into some of the trending searches of 2021, a year we looked for ways to feel better and heal together.

Finding resources near me

Across the world, people searched for information on COVID-19 vaccinations and testing. The top trending "near me" queries in 2021 were "covid vaccine near me" and "covid testing near me." To help people find credible, timely testing and vaccine information, we updated Google Search information panels and worked with national and international partners to help people get vaccinated and tested.

Learning how to help

Helping ourselves and our communities was a priority for many of us. We asked questions about how to help others with anxiety and depression, and we also looked for help with our own mental wellbeing. Search interest for “therapists near me” hit record highs in 2021, and the phrase "why do I feel anxious for no reason" also hit an all-time high this year, spiking more than 400%. In addition to providing mental health resources and helplines, a quick Google Search also surfaces self-assessments to help you learn more about mental health topics like depression, anxiety, PTSD and postpartum depression.

Evaluating information effectively

Is it allergies or COVID? A sinus infection or COVID? Pfizer or Moderna? As many of us searched for health-related information online, we wanted to know whether what we found was trustworthy. Connecting people with critical, timely and authoritative health information has been a crucial part of our role over the last year, and our team is constantly working to find ways to help people everywhere find credible and actionable information to manage their health. To help people evaluate information online, we launched a new tool called About This Result, so you can learn more about the pages you see across a range of topics. About This Result helps people evaluate the credibility of sources and decide which results are useful for them.

Search continues to be one of the first stops people make when making decisions, big and small, about their health — and so much more. To dive deeper into some of the other trending topics that defined 2021, visit yearinsearch.google/trends.

Making healthcare options more accessible on Search

Navigating the U.S. healthcare system can be quite challenging, so it’s no wonder three in four people turn to the internet first in their search for health information. By providing timely and authoritative health information, plus relevant resources and tools on Google Search, we’re always exploring ways to help people make more informed choices about their health. Here are a few new ways we’re helping.

New ways to find insurance information on Google

In the U.S., finding a doctor who accepts your health insurance is often a top priority. When searching for a specific provider, people can check which insurance networks that provider may accept. And if they’re searching for a new provider altogether, on mobile they can now filter for nearby providers who accept Medicare — a health plan predominantly for people over the age of 65.

Mobile image showing Accepts Medicare filter on Healthcare Business Profiles.

How providers can keep patients up to date

To help people get connected to the care they need, we’re conducting checks to ensure details of local doctors are up to date, and giving all healthcare providers the ability to update their information by claiming and updating their Google Business Profile.

We continue to expand the features and tools that doctors can use to communicate about the services they offer. After claiming their profile, health professionals can edit and update information about their hours, services, and more.

Whether helping people find information to self-assess their symptoms for mental health conditions like depression or getting real-time information on COVID-19 vaccine availability nearby, we continue to explore ways to connect people around the world with relevant and actionable information to better manage their health.

Enhanced Sleep Sensing in Nest Hub

Earlier this year, we launched Contactless Sleep Sensing in Nest Hub, an opt-in feature that can help users better understand their sleep patterns and nighttime wellness. While some of the most critical sleep insights can be derived from a person’s overall schedule and duration of sleep, that alone does not tell the complete story. The human brain has special neurocircuitry to coordinate sleep cycles — transitions between deep, light, and rapid eye movement (REM) stages of sleep — vital not only for physical and emotional wellbeing, but also for optimal physical and cognitive performance. Combining such sleep staging information with disturbance events can help you better understand what’s happening while you’re sleeping.

Today we announced enhancements to Sleep Sensing that provide deeper sleep insights. While not intended for medical purposes1, these enhancements allow better understanding of sleep through sleep stages and the separation of the user’s coughs and snores from other sounds in the room. Here we describe how we developed these novel technologies, through transfer learning techniques to estimate sleep stages and sensor fusion of radar and microphone signals to disambiguate the source of sleep disturbances.

To help people understand their sleep patterns, Nest Hub displays a hypnogram, plotting the user’s sleep stages over the course of a sleep session. Potential sound disturbances during sleep will now include “Other sounds” in the timeline to separate the user’s coughs and snores from other sound disturbances detected from sources in the room outside of the calibrated sleeping area.

Training and Evaluating the Sleep Staging Classification Model
Most people cycle through sleep stages 4-6 times a night, about every 80-120 minutes, sometimes with a brief awakening between cycles. Recognizing the value for users of understanding their sleep stages, we have extended Nest Hub’s sleep-wake algorithms using Soli to distinguish between light, deep, and REM sleep. We employed a design that is generally similar to Nest Hub’s original sleep detection algorithm: sliding windows of raw radar samples are processed to produce spectrogram features, which are continuously fed into a TensorFlow Lite model. The key difference is that this new model was trained to predict sleep stages rather than simple sleep-wake status, and thus required new data and a more sophisticated training process.
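As a rough illustration of this inference loop, the sketch below turns one sliding window of radar samples into a log-spectrogram and runs it through a TensorFlow Lite classifier. The model file, sampling rate, and window parameters are assumptions for illustration, not the shipped model.

```python
# Sketch of sliding-window spectrogram inference with a TFLite model.
import numpy as np
import tensorflow as tf
from scipy.signal import spectrogram

# Hypothetical trained sleep-staging model; input shape must match `features`.
interpreter = tf.lite.Interpreter(model_path="sleep_stage_model.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

def classify_window(radar_samples, fs=2000):
    """Maps one window of raw radar samples to sleep-stage probabilities."""
    _, _, spec = spectrogram(radar_samples, fs=fs, nperseg=256, noverlap=128)
    features = np.log1p(spec).astype(np.float32)[np.newaxis, ..., np.newaxis]
    interpreter.set_tensor(inp["index"], features)
    interpreter.invoke()
    return interpreter.get_tensor(out["index"])[0]  # e.g., P(wake/light/deep/REM)
```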

In order to assemble a rich and diverse dataset suitable for training high-performing ML models, we leveraged existing non-radar datasets and applied transfer learning techniques to train the model. The gold standard for identifying sleep stages is polysomnography (PSG), which employs an array of wearable sensors to monitor a number of body functions during sleep, such as brain activity, heartbeat, respiration, eye movement, and motion. These signals can then be interpreted by trained sleep technologists to determine sleep stages.

To develop our model, we used publicly available data from the Sleep Heart Health Study (SHHS) and the Multi-Ethnic Study of Atherosclerosis (MESA), comprising over 10,000 sessions of raw PSG sensor data with corresponding sleep staging ground-truth labels, from the National Sleep Research Resource. The thoracic respiratory inductance plethysmography (RIP) sensor data within these PSG datasets is collected through a strap worn around the patient’s chest to measure motion due to breathing. While this is a very different sensing modality from radar, both RIP and radar provide signals that can be used to characterize a participant’s breathing and movement. This similarity between the two domains makes it possible to leverage a plethysmography-based model and adapt it to work with radar.

To do so, we first computed spectrograms from the RIP time series signals and used these as features to train a convolutional neural network (CNN) to predict the ground-truth sleep stages. This model successfully learned to identify breathing and motion patterns in the RIP signal that could be used to distinguish between different sleep stages. This indicated to us that the same should also be possible when using radar-based signals.
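A minimal Keras version of this source-domain step might look like the following; the architecture and input shape are illustrative assumptions rather than the published model.

```python
# Sketch: train a small CNN to predict sleep stages from RIP spectrograms.
import tensorflow as tf

NUM_STAGES = 4  # wake, light, deep, REM

def build_stage_classifier(input_shape=(128, 120, 1)):
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=input_shape),        # spectrogram patch
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(NUM_STAGES, activation="softmax"),
    ])

model = build_stage_classifier()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(rip_spectrograms, sleep_stage_labels, epochs=10)  # placeholder data
```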

To test the generality of this model, we substituted similar spectrogram features computed from Nest Hub’s Soli sensor and evaluated how well the model was able to generalize to a different sensing modality. As expected, the model trained to predict sleep stages from a plethysmograph sensor was much less accurate when given radar sensor data instead. However, the model still performed much better than chance, which demonstrated that it had learned features that were relevant across both domains.

To improve on this, we collected a smaller secondary dataset of radar sensor data with corresponding PSG-based ground-truth labels, and then used a portion of this dataset to fine-tune the weights of the initial model. This smaller amount of additional training data allowed the model to adapt the original features it had learned from plethysmography-based sleep staging and successfully generalize them to our domain. When evaluated on an unseen test set of new radar data, we found the fine-tuned model produced sleep staging results comparable to those of other consumer sleep trackers.
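In Keras terms, this fine-tuning step could be sketched as follows, reusing build_stage_classifier from the sketch above; the weights file and the choice of which layers to freeze are illustrative assumptions.

```python
# Sketch: adapt the RIP-pretrained model to radar spectrograms.
import tensorflow as tf

model = build_stage_classifier()                  # same architecture as above
model.load_weights("rip_pretrained.weights.h5")   # hypothetical checkpoint

# Optionally freeze the earliest layers so the small radar dataset only
# adapts the higher-level features.
for layer in model.layers[:2]:
    layer.trainable = False

model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(radar_spectrograms, radar_stage_labels, epochs=5)  # placeholder data
```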

The custom ML model efficiently processes a continuous stream of 3D radar tensors (as shown in the spectrogram at the top of the figure) to automatically compute probabilities of each sleep stage — REM, light, and deep — or detect if the user is awake or restless.

More Intelligent Audio Sensing Through Audio Source Separation
Soli-based sleep tracking gives users a convenient and reliable way to see how much sleep they are getting and when sleep disruptions occur. However, to understand and improve their sleep, users also need to understand why their sleep may be disrupted. We’ve previously discussed how Nest Hub can help monitor coughing and snoring, frequent sources of sleep disturbances of which people are often unaware. To provide deeper insight into these disturbances, it is important to understand if the snores and coughs detected are your own.

The original algorithms on Nest Hub used an on-device, CNN-based detector to process Nest Hub’s microphone signal and detect coughing or snoring events, but this audio-only approach did not attempt to distinguish where a sound originated. By combining audio sensing with Soli-based motion and breathing cues, we updated our algorithms to separate sleep disturbances originating from the user-specified sleeping area from those coming from other sources in the room. For example, when the primary user is snoring, the snoring in the audio signal will correspond closely with the inhalations and exhalations detected by Nest Hub’s radar sensor. Conversely, when snoring is detected outside the calibrated sleeping area, the two signals will vary independently. When Nest Hub detects coughing or snoring but determines that there is insufficient correlation between the audio and motion features, it will exclude these events from the user’s coughing or snoring timeline and instead note them as “Other sounds” on Nest Hub’s display. The updated model continues to use entirely on-device audio processing with privacy-preserving analysis, with no raw audio data sent to Google’s servers. A user can then opt to save the outputs of the processing (sound occurrences, such as the number of coughs and snore minutes) in Google Fit, in order to view their nighttime wellness over time.
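The gating idea can be sketched in a few lines of Python: attribute a detected snore to the sleeper only if the audio envelope tracks the radar-derived breathing signal. The threshold and the zero-lag correlation below are illustrative simplifications of the on-device fusion logic.

```python
# Sketch: decide whether a detected snore belongs to the calibrated sleeper.
import numpy as np

def is_users_snore(audio_envelope, radar_breathing, threshold=0.5):
    """Both inputs are 1-D arrays resampled onto a common time base."""
    a = (audio_envelope - audio_envelope.mean()) / (audio_envelope.std() + 1e-8)
    r = (radar_breathing - radar_breathing.mean()) / (radar_breathing.std() + 1e-8)
    corr = float(np.dot(a, r)) / len(a)   # normalized correlation at zero lag
    return corr > threshold               # low correlation -> "Other sounds"
```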

Snoring sounds that are synchronized with the user’s breathing pattern (left) will be displayed in the user’s Nest Hub’s Snoring timeline. Snoring sounds that do not align with the user’s breathing pattern (right) will be displayed in Nest Hub’s “Other sounds” timeline.

Since Nest Hub with Sleep Sensing launched, researchers have expressed interest in investigational studies using Nest Hub’s digital quantification of nighttime cough. For example, a small study supported by the Cystic Fibrosis Foundation2 is currently underway to evaluate the feasibility of measuring nighttime cough using Nest Hub in families of children with cystic fibrosis (CF), a rare inherited disease that can result in a chronic cough due to mucus in the lungs. Researchers are exploring whether quantifying cough at night could be a proxy for monitoring response to treatment.

Conclusion
Based on privacy-preserving radar and audio signals, these improved sleep staging and audio sensing features on Nest Hub provide deeper insights that we hope will help users translate their nighttime wellness into actionable improvements for their overall wellbeing.

Acknowledgements
This work involved collaborative efforts from a multidisciplinary team of software engineers, researchers, clinicians, and cross-functional contributors. Special thanks to Dr. Logan Schneider, a sleep neurologist whose clinical expertise and contributions were invaluable in continuously guiding this research. In addition to the authors, key contributors to this research include Anupam Pathak, Jeffrey Yu, Arno Charton, Jian Cui, Sinan Hersek, Jonathan Hsu, Andi Janti, Linda Lei, Shao-Po Ma, Jo Schaeffer, Neil Smith, Siddhant Swaroop, Bhavana Koka, Dr. Jim Taylor, and the extended team. Thanks to Mark Malhotra and Shwetak Patel for their ongoing leadership, as well as the Nest, Fit, and Assistant teams we collaborated with to build and validate these enhancements to Sleep Sensing on Nest Hub.


1Not intended to diagnose, cure, mitigate, prevent or treat any disease or condition. 
2Google did not have any role in study design, execution, or funding. 

Source: Google AI Blog


Daylight Saving Time tips from Google’s sleep scientist

As the days get shorter and colder, it’s getting much harder for us to step out from under our bed covers and into the dark morning. When Daylight Saving Time ends this weekend in Europe and the weekend after in North America, we’ll need to adjust ourselves even more. So, what’s the best way to deal with the new sleeping schedule?

The Nest team spoke to Dr. Logan Schneider who gave us five tips to get your winter sleep schedule ready. Originally a sleep scientist at Stanford Medicine, Logan is now the sleep expert at Google Health. He’s also the brain behind Sleep Sensing on the new Google Nest Hub, the smart screen that helps you get a better night's sleep.

Start adjusting on time… or don’t adjust at all

That extra hour of sleep this weekend can feel like jet lag for some. Soon, your sleep rhythm might make you want to go to bed earlier than usual. Logan’s advice is to start preparing a few days in advance to make the transition easier for your body: “Rather than shifting your bedtime and wake time by an hour at once, you could try shifting them over four days, so that’s 15 minutes a day. Start two days before the clocks change, and wrap up two days after.”

The time change can be even harder for kids and their parents. Dr. Logan applies the same principles to kids, but makes the night of the time change extra fun: “I allow my kids to wake up 15 minutes later on the Friday before the time change, and again on Saturday morning. On Saturday night, the kids get to stay up an hour later than usual. I make sure we’re watching a movie in a bright light environment, because that helps push the clock a bit later. They wake up at the usual time on Sunday.”

For adults, there might be an even better way: why adjust to the new schedule at all? “You could simply take advantage of being an early bird and just stay on the earlier schedule,” Logan says. Nest Hub with Sleep Sensing can help you monitor your sleep schedule and suggest a new bedtime and wake time after the transition.

Find your perfect room temperature

People often think that a cool room (16-19°C or 61-66°F) is better for sleeping, but according to Dr. Logan, there is no one-size-fits-all temperature in the bedroom. He recommends finding a temperature that is comfortable for you throughout the night. An uncomfortably cold or warm bedroom can affect the quality of your REM sleep, which is an important phase of your night's rest.

Nest Hub keeps track of the average temperature at night. Did you sleep well? Great! Take note of the temperature that Nest Hub measured for you on the Sleep Quality page and make sure that your bedroom is set to that temperature from now on.

Embrace the winter cold once you wake up

We’ve all been there: the alarm goes off, your eyes won't open and the thought of walking in the cold to the bathroom makes you want to stay in bed even more. However, embracing a cold winter’s day is actually a good idea.

Dr. Logan says: “The cold can serve as a cue to your body that it’s time to wake up. So, while you may not want to leave your cozy bed, walking around on a cool floor or washing your face with cold water can be just the invigorating experience your body needs to get going in the morning.”

Never snooze again

As the saying goes: You snooze, you lose. Dr. Logan says: “When using the snooze function, not only are you delaying the inevitable, you’re also not using the extra time well. Falling back to sleep after an alarm takes time. Between each ring of the alarm you’re not getting as much sleep as you think. Your brain can spend up to half of the time falling back to sleep!”

In short, your snoozy nap isn’t really that helpful. It’s better to get up immediately when your alarm goes off.

Imitate a sunrise

Humans are naturally accustomed to waking up to sunlight. Yet, in the winter months, waking up during a dark morning might feel like waking up in the middle of the night. Light plays a key role in your sleep rhythm, says Dr. Logan: “It’s important to use light to help wake up, because your body relies on exposure to light when you’re waking up to set its internal clock for the next sleep period.”

A picture of the Nest Hub with an orange morning glow, sitting on a night stand next to a bed.

Fortunately, Nest Hub’s Sunrise Alarm can help you wake up from your deepest sleep peacefully. It gradually brightens up the screen, just like a sunrise, and then slowly increases the alarm volume. Good morning sunshine!

How Underspecification Presents Challenges for Machine Learning

Machine learning (ML) models are being used more widely today than ever before and are becoming increasingly impactful. However, they often exhibit unexpected behavior when they are used in real-world domains. For example, computer vision models can exhibit surprising sensitivity to irrelevant features, while natural language processing models can depend unpredictably on demographic correlations not directly indicated by the text. Some reasons for these failures are well-known: for example, training ML models on poorly curated data, or training models to solve prediction problems that are structurally mismatched with the application domain. Yet, even when these known problems are handled, model behavior can still be inconsistent in deployment, varying even between training runs.

In “Underspecification Presents Challenges for Credibility in Modern Machine Learning”, to be published in the Journal of Machine Learning Research, we show that a key failure mode especially prevalent in modern ML systems is underspecification. The idea behind underspecification is that while ML models are validated on held-out data, this validation is often insufficient to guarantee that the models will have well-defined behavior when they are used in a new setting. We show that underspecification appears in a wide variety of practical ML systems and suggest some strategies for mitigation.

Underspecification
ML systems have been successful largely because they incorporate validation of the model on held-out data to ensure high performance. However, for a fixed dataset and model architecture, there are often many distinct ways that a trained model can achieve high validation performance. But under standard practice, models that encode distinct solutions are often treated as equivalent because their held-out predictive performance is approximately equivalent.

Importantly, the distinctions between these models do become clear when they are measured on criteria beyond standard predictive performance, such as fairness or robustness to irrelevant input perturbations. For example, among models that perform equally well on standard validations, some may exhibit greater performance disparities between social groups than others, or rely more heavily on irrelevant information. These differences, in turn, can translate to real differences in behavior when the model is used in real-world scenarios.

Underspecification refers to this gap between the requirements that practitioners often have in mind when they build an ML model, and the requirements that are actually enforced by the ML pipeline (i.e., the design and implementation of a model). An important consequence of underspecification is that even if the pipeline could in principle return a model that meets all of these requirements, there is no guarantee that in practice the model will satisfy any requirement beyond accurate prediction on held-out data. In fact, the model that is returned may have properties that instead depend on arbitrary or opaque choices made in the implementation of the ML pipeline, such as those arising from random initialization seeds, data ordering, hardware, etc. Thus, ML pipelines that do not include explicit defects may still return models that behave unexpectedly in real-world settings.

Identifying Underspecification in Real Applications
In this work, we investigated concrete implications of underspecification in the kinds of ML models that are used in real-world applications. Our empirical strategy was to construct sets of models using nearly identical ML pipelines, to which we only applied small changes that had no practical effect on standard validation performance. Here, we focused on the random seed used to initialize training and determine data ordering. If important properties of the model can be substantially influenced by these changes, it indicates that the pipeline does not fully specify this real-world behavior. In every domain where we conducted this experiment, we found that these small changes induced substantial variation on axes that matter in real-world use.
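A sketch of this protocol in Python: train otherwise-identical pipelines that differ only in seed, then compare the spread of validation scores with the spread of stress-test scores. Here, build_model and the datasets are placeholders for a real pipeline.

```python
# Sketch: probe a pipeline for underspecification by varying only the seed.
import numpy as np
import tensorflow as tf

def run_pipeline(seed, train_ds, val_ds, stress_ds):
    tf.keras.utils.set_random_seed(seed)   # fixes init and data shuffling
    model = build_model()                  # placeholder: identical architecture
    model.fit(train_ds, epochs=10, verbose=0)
    # Assumes accuracy is the single compiled metric.
    return (model.evaluate(val_ds, verbose=0)[1],
            model.evaluate(stress_ds, verbose=0)[1])

results = [run_pipeline(s, train_ds, val_ds, stress_ds) for s in range(50)]
val_acc, stress_acc = map(np.array, zip(*results))
# Underspecification shows up as stress-test variance far exceeding
# validation variance across seeds.
print(f"val std: {val_acc.std():.4f}  stress std: {stress_acc.std():.4f}")
```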

Underspecification in Computer Vision
As an example, consider underspecification and its relationship to robustness in computer vision. A central challenge in computer vision is that deep models often suffer from brittleness under distribution shifts that humans do not find challenging. For instance, image classification models that perform well on the ImageNet benchmark are known to perform poorly on benchmarks like ImageNet-C, which apply common image corruptions, such as pixelization or motion blur, to the standard ImageNet test set.

In our experiment, we showed that model sensitivity to these corruptions is underspecified by standard pipelines. Following the strategy discussed above, we generated fifty ResNet-50 image classification models using the same pipeline and the same data. The only difference between these models was the random seed used in training. When evaluated on the standard ImageNet validation set, these models achieved practically equivalent performance. However, when the models were evaluated on different test sets in the ImageNet-C benchmark (i.e., on corrupted data), performance on some tests varied by orders of magnitude more than on standard validations. This pattern persisted for larger-scale models that were pre-trained on much larger datasets (e.g., a BiT-L model pre-trained on the 300 million image JFT-300M dataset). For these models, varying the random seed at the fine-tuning stage of training produced a similar pattern of variations.

Left: Parallel axis plots showing the variation in accuracy between identical, randomly initialized ResNet-50 models on strongly corrupted ImageNet-C data. Lines represent the performance of each model in the ensemble on classification tasks using uncorrupted test data, as well as corrupted data (pixelation, contrast, motion blur, and brightness). Given values are the deviation in accuracy from the ensemble mean, scaled by the standard deviation of accuracies on the “clean” ImageNet test set. The solid black line highlights the performance of an arbitrarily selected model to show how performance on one test may not be a good indication of performance on others. Right: Example images from the standard ImageNet test set, with corrupted versions from the ImageNet-C benchmark.

We also showed that underspecification can have practical implications in special-purpose computer vision models built for medical imaging, where deep learning models have shown great promise. We considered two research pipelines intended as precursors for medical applications: one ophthalmology pipeline for building models that detect diabetic retinopathy and referable diabetic macular edema from retinal fundus images, and one dermatology pipeline for building models to recognize common dermatological conditions from photographs of skin. In our experiments, we considered pipelines that were validated only on randomly held-out data.

We then stress-tested models produced by these pipelines on practically important dimensions. For the ophthalmology pipeline, we tested how models trained with different random seeds performed when applied to images taken from a new camera type not encountered during training. For the dermatology pipeline, the stress test was similar, but for patients with different estimated skin types (i.e., non-dermatologist evaluation of tone and response to sunlight). In both cases, we found that standard validations were not enough to fully specify the trained model’s performance on these axes. In the ophthalmology application, the random seed used in training induced wider variability in performance on a new camera type than would have been expected from standard validations, and in the dermatology application, the random seed induced similar variation in performance in skin-type subgroups, even though the overall performance of the models was stable across seeds.

These results reiterate that standard hold-out testing alone is not sufficient to ensure acceptable model behavior in medical applications, underscoring the need for expanded testing protocols for ML systems intended for application in the medical domain. In the medical literature, such validations are termed "external validation" and have historically been part of reporting guidelines such as STARD and TRIPOD; they are being emphasized in updates such as STARD-AI and TRIPOD-AI. Finally, as part of regulated medical device development processes (see, e.g., US and EU regulations), there are other forms of safety- and performance-related considerations, such as mandatory compliance with standards for risk management, human factors engineering, clinical validations and accredited body reviews, that aim to ensure acceptable medical application performance.

Relative variability of medical imaging models on stress tests, using the same conventions as the figure above. Top left: Variation in AUC between diabetic retinopathy classification models trained using different random seeds when evaluated on images from different camera types. In this experiment, camera type 5 was not encountered during training. Bottom left: Variation in accuracy between skin condition classification models trained using different random seeds when evaluated on different estimated skin types (approximated by dermatologist-trained laypersons from retrospective photographs and potentially subject to labeling errors). Right: example images from the original test set (left) and the stress test set (right).

Underspecification in Other Applications

The cases discussed above are a small subset of models that we probed for underspecification. Other cases we examined include:

  • Natural Language Processing: We showed that on a variety of NLP tasks, underspecification affected how models derived from BERT processed sentences. For example, depending on the random seed, a pipeline could produce a model that depends more or less on correlations involving gender (e.g., between gender and occupation) when making predictions.
  • Acute Kidney Injury (AKI) prediction: We showed that underspecification affects reliance on operational versus physiological signals in AKI prediction models based on electronic health records.
  • Polygenic Risk Scores (PRS): We showed that underspecification influences the ability of PRS models, which predict clinical outcomes based on patient genomic data, to generalize across different patient populations.

In each case, we showed that these important properties are left ill-defined by standard training pipelines, making them sensitive to seemingly innocuous choices.

Conclusion
Addressing underspecification is a challenging problem. It requires full specification and testing of requirements for a model beyond standard predictive performance. Doing this well needs full engagement with the context in which the model will be used, an understanding of how the training data were collected, and often, incorporation of domain expertise when the available data fall short. These aspects of ML system design are often underemphasized in ML research today. A key goal of this work is to show how underinvestment in this area can manifest concretely, and to encourage the development of processes for fuller specification and testing of ML pipelines.

Some important first steps in this area are to specify stress testing protocols for any applied ML pipeline that is meant to see real-world use. Once these criteria are codified in measurable metrics, a number of different algorithmic strategies may be useful for improving them, including data augmentation, pretraining, and incorporation of causal structure. It should be noted, however, that ideal stress testing and improvement processes will usually require iteration: both the requirements for ML systems, and the world in which they are used, are constantly changing.

Acknowledgements
We would like to thank all of our co-authors, Dr. Nenad Tomasev (DeepMind), Prof. Finale Doshi-Velez (Harvard SEAS), UK Biobank, and our partners, EyePACS, Aravind Eye Hospital and Sankara Nethralaya.

Source: Google AI Blog


HLTH: Building on our commitments in health

Tonight, the HLTH event kicked off in Boston, bringing together leaders from across health to discuss healthcare's most pressing problems and how we can tackle them to improve care delivery and outcomes.

Over the past two years, the pandemic shined a light on the importance of our collective health — and the role the private sector, payers, healthcare delivery organizations, governments and public health play in keeping communities healthy. For us at Google, we saw Search, Maps and YouTube become critical ways for people to learn about COVID-19. So we partnered with public health organizations to provide information that helped people stay safe, find testing and get vaccinated. In addition, we provided healthcare organizations, researchers and non-profits with tools, data and resources to support pandemic response and research efforts.

As I mentioned on the opening night of HLTH, Google Health is our company-wide effort to help billions of people be healthier by leaning on our strengths: organizing information and developing innovative technology. Beyond the pandemic, we have an opportunity to continue helping people to address health more holistically through the Google products they use every day and equipping healthcare teams with tools and solutions that help them improve care.

Throughout the conference, leaders from Google Health will share more about the work we’re doing and the partnerships needed across the health industry to improve health outcomes.

Meeting people in their everyday moments and empowering them to be healthier

People are increasingly turning to technology to manage their daily health and wellbeing — from using wearables and apps to track fitness goals, to researching conditions and building community around those with similar health experiences. At Google, we’re working to connect people with accurate, timely and actionable information and tools that can help them manage their health and achieve their goals.

On Monday, Dr. Garth Graham, who leads healthcare and public health partnerships for YouTube, will join the panel “Impactful Health Information Sharing” to discuss video as a powerful medium to connect people with engaging and high-quality health information. YouTube has been working closely with organizations, like the American College of Physicians, the National Alliance on Mental Illness and Mass General Brigham, to increase authoritative video content.

On Tuesday, Fitbit’s Dr. John Moore will join a panel on “The Next Generation of Health Consumers” focusing on how tools and technologies can help people take charge of their health and wellness between doctors’ visits — especially for younger generations. Regardless of age, there’s a huge opportunity for products like Fitbit to deliver daily, actionable insights into issues that can have a huge impact on overall health, like fitness, stress and sleep.

Helping health systems unlock the potential of healthcare data

Across Google Health, we’re building solutions and tools to help unlock the potential of healthcare data and transform care delivery. Care Studio, for example, helps clinicians at the point of care by bringing together patient information from different EHR systems into an integrated view. We’ve been piloting this tool at select hospital sites in the U.S., and soon clinicians in the pilot will have access to the Care Studio Mobile app so they can quickly access the critical patient information they need, wherever they are — whether that’s bedside, in a clinic or in a hospital corridor.

In addition to Care Studio, we’re developing solutions that will bring greater interoperability to healthcare data, helping organizations deliver better care. Hear more from Aashima Gupta, Google Cloud’s global head of healthcare solutions, at HLTH in two sessions. On Monday, October 18, Aashima will discuss how digital strategies can reboot healthcare operations, and on Tuesday, October 19 she will join the panel “Turning of the Data Tides” to discuss different approaches to data interoperability and patient access to health records.

Building for everyone

Where people live, work and learn can greatly impact their experience with health. Behind many of our products and initiatives are industry experts and leaders who are making sure we build for everyone, and create an inclusive environment for that work to take place. During the Women at HLTH Luncheon on Tuesday, Dr. Ivor Horn, our Director of Health Equity, will share her career journey rooted in advocacy, entrepreneurship and activism.

From our early days as a company, Google has sought to improve the lives of as many people as possible. Helping people live healthier lives is one of the most impactful ways we can do that. It will take more than a single feature, product or initiative to improve health outcomes for everyone. If we work together across the healthcare industry and embed health into all our work, we can make the greatest impact.

For more information about speakers at HLTH, check out the full agenda.

An ML-Based Framework for COVID-19 Epidemiology

Over the past 20 months, the COVID-19 pandemic has had a profound impact on daily life, presented logistical challenges for businesses planning for supply and demand, and created difficulties for governments and organizations working to support communities with timely public health responses. While there have been well-studied epidemiology models that can help predict COVID-19 cases and deaths to help with these challenges, this pandemic has generated an unprecedented amount of real-time publicly-available data, which makes it possible to use more advanced machine learning techniques in order to improve results.

In "A prospective evaluation of AI-augmented epidemiology to forecast COVID-19 in the USA and Japan", accepted to npj Digital Medicine, we continued our previous work [1, 2, 3, 4] and proposed a framework designed to simulate the effect of certain policy changes on COVID-19 deaths and cases, such as school closings or a state-of-emergency at a US-state, US-county, and Japan-prefecture level, using only publicly-available data. We conducted a 2-month prospective assessment of our public forecasts, during which our US model tied or outperformed all other 33 models on COVID19 Forecast Hub. We also released a fairness analysis of the performance on protected sub-groups in the US and Japan. Like other Google initiatives to help with COVID-19 [1, 2, 3], we are releasing daily forecasts based on this work to the public for free, on the web [us, ja] and through BigQuery.

Prospective forecasts for the USA and Japan models. Ground-truth cumulative death counts (green lines) are shown alongside the forecasts for each day. Each daily forecast contains a predicted increase in deaths for each day during the prediction window of 4 weeks (shown as colored dots, where shading shifting to yellow indicates days further from the date of prediction in the forecasting horizon, up to 4 weeks). Predictions of deaths are shown for the USA (above) and Japan (below).

The Model
Models for infectious diseases have been studied by epidemiologists for decades. Compartmental models are the most common, as they are simple, interpretable, and can fit different disease phases effectively. In compartmental models, individuals are separated into mutually exclusive groups, or compartments, based on their disease status (such as susceptible, exposed, or recovered), and the rates of change between these compartments are modeled to fit the past data. A population is assigned to compartments representing disease states, with people flowing between states as their disease status changes.

In this work, we propose a few extensions to the Susceptible-Exposed-Infectious-Removed (SEIR) type compartmental model. In such a model, for example, susceptible people becoming exposed causes the susceptible compartment to decrease and the exposed compartment to increase, at a rate that depends on disease-spreading characteristics. Observed data for COVID-19 associated outcomes, such as confirmed cases, hospitalizations and deaths, are used to train the compartmental models.
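For intuition, a minimal discrete-time SEIR step looks like the sketch below. In the framework described here, the constant rates are replaced by learned, time- and location-varying functions, but the compartment flows are the same.

```python
# Minimal discrete-time SEIR dynamics; rates here are constants for clarity.
def seir_step(S, E, I, R, beta, sigma, gamma, N):
    new_exposed = beta * S * I / N   # susceptible -> exposed
    new_infectious = sigma * E       # exposed -> infectious
    new_removed = gamma * I          # infectious -> removed (recovered/deceased)
    return (S - new_exposed,
            E + new_exposed - new_infectious,
            I + new_infectious - new_removed,
            R + new_removed)

N = 1_000_000
state = (N - 10, 0, 10, 0)           # S, E, I, R
for day in range(120):               # simulate 120 days
    state = seir_step(*state, beta=0.3, sigma=1 / 5, gamma=1 / 7, N=N)
```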

Visual explanation of “compartmental” models in epidemiology. People “flow” between compartments. Real-world events, like policy changes and more ICU beds, change the rate of flow between compartments.

Our framework introduces a number of technical innovations:

  1. Learned transition rates: Instead of using static transition rates between compartments across all locations and times, we learn these rates with ML models. This allows us to take advantage of the vast amount of available data with informative signals, such as Google's COVID-19 Community Mobility Reports, healthcare supply, demographics, and econometrics features.
  2. Explainability: Our framework provides explainability for decision makers, offering insights on disease propagation trends via its compartmental structure, and suggesting which factors may be most important for driving compartmental transitions.
  3. Expanded compartments: We add hospitalization, ICU, ventilator, and vaccine compartments and demonstrate efficient training despite data sparsity.
  4. Information sharing across locations: As opposed to fitting to an individual location, we have a single model for all locations in a country (e.g., >3000 US counties) with distinct dynamics and characteristics, and we show the benefit of transferring information across locations.
  5. Seq2seq modeling: We use a sequence-to-sequence model with a novel partial teacher forcing approach that minimizes amplified growth of errors into the future.

Forecast Accuracy
Each day, we train models to predict COVID-19 associated outcomes (primarily deaths and cases) 28 days into the future. We report the mean absolute percentage error (MAPE) for both a country-wide score and a location-level score, with both cumulative values and weekly incremental values for COVID-19 associated outcomes.
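For reference, MAPE can be computed as in the short sketch below; aggregation details such as location weighting follow the paper rather than this snippet.

```python
# Mean absolute percentage error (MAPE), lower is better.
import numpy as np

def mape(actual, forecast):
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    mask = actual != 0  # avoid division by zero
    return 100.0 * np.mean(np.abs((actual[mask] - forecast[mask]) / actual[mask]))
```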

We compare our framework with alternatives for the US from the COVID-19 Forecast Hub. In MAPE, our models outperform all 33 other models except one — the ensemble forecast that also includes our model’s predictions, where the difference is not statistically significant.

We also used prediction uncertainty to estimate whether a forecast is likely to be accurate. If we reject forecasts that the model considers uncertain, we can improve the accuracy of the forecasts that we do release. This is possible because our model has well-calibrated uncertainty.
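One simple way to realize this filtering, sketched below using the mape helper above: rank forecasts by predictive-interval width and score only the most certain fraction. The 80% cutoff is an illustrative choice, not the published procedure.

```python
# Sketch: drop the least certain forecasts, then rescore the remainder.
import numpy as np

def filtered_mape(actual, forecast, interval_width, keep_fraction=0.8):
    order = np.argsort(interval_width)               # most certain first
    keep = order[: int(keep_fraction * len(order))]
    return mape(np.asarray(actual)[keep], np.asarray(forecast)[keep])
```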

Mean absolute percentage error (MAPE; lower is better) decreases as we remove uncertain forecasts, increasing accuracy.

What-If Tool to Simulate Pandemic Management Policies and Strategies
In addition to understanding the most probable scenario given past data, decision makers are interested in how different decisions could affect future outcomes, for example, understanding the impact of school closures, mobility restrictions and different vaccination strategies. Our framework allows counterfactual analysis by replacing the forecasted values for selected variables with their counterfactual counterparts. The results of our simulations underscore the risk of prematurely relaxing non-pharmaceutical interventions (NPIs) before rapid disease spread has subsided. Similarly, the Japan simulations show that maintaining the State of Emergency while having a high vaccination rate greatly reduces infection rates.

What-if simulations on the percent change of predicted exposed individuals assuming different non-pharmaceutical interventions (NPIs) for the prediction date of March 1, 2021 in Texas, Washington and South Carolina. Increased NPI restrictions are associated with a larger % reduction in the number of exposed people.
What-if simulations on the percent change of predicted exposed individuals assuming different vaccination rates for the prediction date of March 1, 2021 in Texas, Washington and South Carolina. Increased vaccination rate also plays a key role to reduce exposed count in these cases.

Fairness Analysis
To ensure that our models do not create or reinforce unfairly biased decision making, in alignment with our AI Principles, we performed a fairness analysis separately for forecasts in the US and Japan by quantifying whether the model's accuracy was worse on protected sub-groups. These categories include age, gender, income, and ethnicity in the US, and age, gender, income, and country of origin in Japan. In all cases, we demonstrated no consistent pattern of errors among these groups once we controlled for the number of COVID-19 deaths and cases that occur in each subgroup.

Normalized errors by median income. The comparison between the two shows that patterns of errors don't persist once errors are normalized by cases. Left: Normalized errors by median income for the US. Right: Normalized errors by median income for Japan.

Real-World Use Cases
In addition to quantitative analyses to measure the performance of our models, we conducted a structured survey in the US and Japan to understand how organizations were using our model forecasts. In total, seven organizations responded with the following results on the applicability of the model.

  • Organization type: Academia (3), Government (2), Private industry (2)
  • Main user job role: Analyst/Scientist (3), Healthcare professional (1), Statistician (2), Managerial (1)
  • Location: USA (4), Japan (3)
  • Predictions used: Confirmed cases (7), Death (4), Hospitalizations (4), ICU (3), Ventilator (2), Infected (2)
  • Model use case: Resource allocation (2), Business planning (2), Scenario planning (1), General understanding of COVID spread (1), Confirming existing forecasts (1)
  • Frequency of use: Daily (1), Weekly (1), Monthly (1)
  • Was the model helpful?: Yes (7)

To share a few examples, in the US, the Harvard Global Health Institute and Brown School of Public Health used the forecasts to help create COVID-19 testing targets that were used by the media to help inform the public. The US Department of Defense used the forecasts to help determine where to allocate resources, and to help take specific events into account. In Japan, the model was used to make business decisions. One large, multi-prefecture company with stores in more than 20 prefectures used the forecasts to better plan their sales forecasting, and to adjust store hours.

Limitations and next steps
Our approach has a few limitations. First, it is limited by available data, and we are only able to release daily forecasts as long as there is reliable, high-quality public data. For instance, public transportation usage could be very useful, but that information is not publicly available. Second, there are limitations due to the model capacity of compartmental models, which cannot capture very complex dynamics of COVID-19 disease propagation. Third, the distributions of case counts and deaths are very different between the US and Japan. For example, most of Japan's COVID-19 cases and deaths have been concentrated in a few of its 47 prefectures, with the others experiencing low values. This means that our per-prefecture models, which are trained to perform well across all Japanese prefectures, often have to strike a delicate balance between avoiding overfitting to noise and drawing supervision from these relatively COVID-19-free prefectures.

We have updated our models to take into account large changes in disease dynamics, such as the increasing number of vaccinations. We are also expanding to new engagements with city governments, hospitals, and private organizations. We hope that our public releases continue to help the public and policymakers address the challenges of the ongoing pandemic, and we hope that our method will be useful to epidemiologists and public health officials in this and future health crises.

Acknowledgements
This paper was the result of hard work from a variety of teams within Google and collaborators around the globe. We'd especially like to thank our paper co-authors from the School of Medicine at Keio University, Graduate School of Public Health at St Luke’s International University, and Graduate School of Medicine at The University of Tokyo.

Source: Google AI Blog


Self-Supervised Learning Advances Medical Image Classification

In recent years, there has been increasing interest in applying deep learning to medical imaging tasks, with exciting progress in various applications like radiology, pathology and dermatology. Despite the interest, it remains challenging to develop medical imaging models, because high-quality labeled data is often scarce due to the time-consuming effort needed to annotate medical images. Given this, transfer learning is a popular paradigm for building medical imaging models. With this approach, a model is first pre-trained using supervised learning on a large labeled dataset (like ImageNet) and then the learned generic representation is fine-tuned on in-domain medical data.

Other more recent approaches that have proven successful in natural image recognition tasks, especially when labeled examples are scarce, use self-supervised contrastive pre-training, followed by supervised fine-tuning (e.g., SimCLR and MoCo). In pre-training with contrastive learning, generic representations are learned by simultaneously maximizing agreement between differently transformed views of the same image and minimizing agreement between transformed views of different images. Despite their successes, these contrastive learning methods have received limited attention in medical image analysis and their efficacy is yet to be explored.
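The core of such methods is a contrastive objective; a simplified SimCLR-style (NT-Xent) loss is sketched below. For clarity it scores each first view only against the batch of second views, omitting details such as projection heads and symmetrized negatives.

```python
# Simplified contrastive (NT-Xent-style) loss over a batch of paired views.
import tensorflow as tf

def contrastive_loss(z1, z2, temperature=0.1):
    """z1, z2: [batch, dim] embeddings of two augmented views of each image."""
    z1 = tf.math.l2_normalize(z1, axis=1)
    z2 = tf.math.l2_normalize(z2, axis=1)
    logits = tf.matmul(z1, z2, transpose_b=True) / temperature
    # Row i scores view 1 of image i against every view 2 in the batch;
    # the diagonal entry is the positive pair, all others are negatives.
    labels = tf.range(tf.shape(z1)[0])
    return tf.reduce_mean(tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True))
```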

In “Big Self-Supervised Models Advance Medical Image Classification”, to appear at the International Conference on Computer Vision (ICCV 2021), we study the effectiveness of self-supervised contrastive learning as a pre-training strategy within the domain of medical image classification. We also propose Multi-Instance Contrastive Learning (MICLe), a novel approach that generalizes contrastive learning to leverage special characteristics of medical image datasets. We conduct experiments on two distinct medical image classification tasks: dermatology condition classification from digital camera images (27 categories) and multilabel chest X-ray classification (5 categories). We observe that self-supervised learning on ImageNet, followed by additional self-supervised learning on unlabeled domain-specific medical images, significantly improves the accuracy of medical image classifiers. Specifically, we demonstrate that self-supervised pre-training outperforms supervised pre-training, even when the full ImageNet dataset (14M images and 21.8K classes) is used for supervised pre-training.

SimCLR and Multi Instance Contrastive Learning (MICLe)
Our approach consists of three steps: (1) self-supervised pre-training on unlabeled natural images (using SimCLR); (2) further self-supervised pre-training using unlabeled medical data (using either SimCLR or MICLe); followed by (3) task-specific supervised fine-tuning using labeled medical data.

Our approach comprises three steps: (1) self-supervised pre-training on unlabeled ImageNet using SimCLR; (2) additional self-supervised pre-training using unlabeled medical images, where, if multiple images of each medical condition are available, a novel Multi-Instance Contrastive Learning (MICLe) strategy is used to construct more informative positive pairs from different images; and (3) supervised fine-tuning on labeled medical images. Note that unlike step (1), steps (2) and (3) are task and dataset specific.

After the initial pre-training with SimCLR on unlabeled natural images is complete, we train the model to capture the special characteristics of medical image datasets. This, too, can be done with SimCLR, but that method constructs positive pairs only through augmentation and does not readily leverage patient metadata for positive pair construction. Alternatively, we use MICLe, which uses multiple images of the underlying pathology for each patient case, when available, to construct more informative positive pairs for self-supervised learning. Such multi-instance data is often available in medical imaging datasets — e.g., frontal and lateral views of mammograms, retinal fundus images from each eye, etc.

Given multiple images of a given patient case, MICLe constructs a positive pair for self-supervised contrastive learning by drawing two crops from two distinct images from the same patient case. Such images may be taken from different viewing angles and show different body parts with the same underlying pathology. This presents a great opportunity for self-supervised learning algorithms to learn representations that are robust to changes of viewpoint, imaging conditions, and other confounding factors in a direct way. MICLe does not require class label information and only relies on different images of an underlying pathology, the type of which may be unknown.
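The pair-construction difference from SimCLR can be sketched in a few lines; the sampling and fallback policy below are illustrative assumptions.

```python
# Sketch of MICLe positive pairs: crops of two distinct images per patient case.
import random

def micle_positive_pair(case_images, augment):
    """case_images: images for one patient case; augment: crop/jitter function."""
    if len(case_images) >= 2:
        img_a, img_b = random.sample(case_images, 2)  # two different images
    else:
        img_a = img_b = case_images[0]                # SimCLR-style fallback
    return augment(img_a), augment(img_b)
```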

MICLe generalizes contrastive learning to leverage special characteristics of medical image datasets (patient metadata) to create realistic augmentations, yielding a further performance boost for image classifiers.

Combining these self-supervised learning strategies, we show that even in a highly competitive production setting we can achieve a sizable gain of 6.7% in top-1 accuracy on dermatology skin condition classification and an improvement of 1.1% in mean AUC on chest X-ray classification, outperforming strong supervised baselines pre-trained on ImageNet (the prevailing protocol for training medical image analysis models). In addition, we show that self-supervised models are robust to distribution shift and can learn efficiently with only a small number of labeled medical images.

Comparison of Supervised and Self-Supervised Pre-training
Despite its simplicity, we observe that pre-training with MICLe consistently improves the performance of dermatology classification over the original method of pre-training with SimCLR under different pre-training dataset and base network architecture choices. Using MICLe for pre-training translates to a (1.18 ± 0.09)% increase in top-1 accuracy for dermatology classification over using SimCLR. The results demonstrate the benefit accrued from utilizing additional metadata or domain knowledge to construct more semantically meaningful augmentations for contrastive pre-training. In addition, our results suggest that wider and deeper models yield greater performance gains, with ResNet-152 (2x width) models often outperforming ResNet-50 (1x width) models or smaller counterparts.

Comparison of supervised and self-supervised pre-training, followed by supervised fine-tuning using two architectures on dermatology and chest X-ray classification. Self-supervised learning utilizes unlabeled domain-specific medical images and significantly outperforms supervised ImageNet pre-training.

Improved Generalization with Self-Supervised Models
For each task, we perform pre-training and fine-tuning using the in-domain unlabeled and labeled data, respectively. We also use a dataset obtained in a different clinical setting as a shifted dataset to further evaluate the robustness of our method to out-of-domain data. For the chest X-ray task, we note that self-supervised pre-training with either ImageNet or CheXpert data improves generalization, but stacking them both yields further gains. As expected, we also note that when only ImageNet is used for self-supervised pre-training, the model performs worse than when only in-domain data is used for pre-training.
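In code, this stacking amounts to chaining two runs of the same self-supervised objective, with the second stage initialized from the first. The sketch below is illustrative only; nt_xent, imagenet_pair_loader, and chexpert_pair_loader are hypothetical stand-ins for a contrastive loss and positive-pair data loaders, not real APIs.

    # Illustrative "stacked" self-supervised pre-training (nt_xent and the
    # pair loaders are hypothetical stand-ins, not real APIs).
    import torch
    import torchvision

    def pretrain_stage(encoder, pair_loader, epochs=10):
        # Runs the same contrastive objective over positive pairs from the loader.
        opt = torch.optim.SGD(encoder.parameters(), lr=0.1, momentum=0.9)
        for _ in range(epochs):
            for view_a, view_b in pair_loader:
                loss = nt_xent(encoder(view_a), encoder(view_b))  # contrastive loss
                opt.zero_grad()
                loss.backward()
                opt.step()
        return encoder

    encoder = torchvision.models.resnet50(weights=None)
    encoder = pretrain_stage(encoder, imagenet_pair_loader)   # stage 1: natural images
    encoder = pretrain_stage(encoder, chexpert_pair_loader)   # stage 2: medical images
    # The resulting encoder is then fine-tuned on labeled medical data (step 3).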

To test performance under distribution shift, for each task we held out additional labeled datasets collected under different clinical settings. We find that the performance improvement on the distribution-shifted dataset (ChestX-ray14) from self-supervised pre-training (using both ImageNet and CheXpert data) is more pronounced than the original improvement on the CheXpert dataset. This is a valuable finding, as generalization under distribution shift is of paramount importance to clinical applications. On the dermatology task, we observe similar trends for a separate shifted dataset that was collected in skin cancer clinics and had a higher prevalence of malignant conditions. This demonstrates that the robustness of self-supervised representations to distribution shift is consistent across tasks.

Evaluation of models on distribution-shifted datasets for the chest X-ray interpretation task. We use the model trained on in-domain data to make predictions on an additional shifted dataset without any further fine-tuning (zero-shot transfer learning). We observe that self-supervised pre-training leads to better representations that are more robust to distribution shifts.
Evaluation of models on distribution-shifted datasets for the dermatology task. Our results generally suggest that self-supervised pre-trained models generalize better under distribution shift, with MICLe pre-training leading to the largest gains.

Improved Label Efficiency
We further investigate the label efficiency of the self-supervised models for medical image classification by fine-tuning the models on different fractions of the labeled training data. We use label fractions ranging from 10% to 90% for both the Derm and CheXpert training datasets and examine how performance on the dermatology task varies with the available label fraction. First, we observe that pre-training using self-supervised models can compensate for low label efficiency in medical image classification, and across the sampled label fractions, self-supervised models consistently outperform the supervised baseline. These results also suggest that MICLe yields proportionally higher gains when fine-tuning with fewer labeled examples. In fact, MICLe is able to match baselines using only 20% of the training data for ResNet-50 (4x) and 30% of the training data for ResNet-152 (2x).

Top-1 accuracy for dermatology condition classification for MICLe, SimCLR, and supervised models under different unlabeled pre-training datasets and varied sizes of label fractions. MICLe is able to match baselines using only 20% of the training data for ResNet-50 (4x).
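A label-fraction sweep of this kind can be set up as sketched below. Here fine_tune and evaluate are hypothetical helpers standing in for the supervised fine-tuning and test-set evaluation steps, and the simple random subsample is our assumption (a class-stratified sample would be a common alternative).

    # Illustrative label-efficiency sweep (fine_tune, evaluate, train_examples,
    # test_examples, and pretrained_encoder are hypothetical placeholders).
    import random

    def subsample(examples, fraction, seed=0):
        # Draw a fixed random fraction of the labeled training examples.
        rng = random.Random(seed)
        k = max(1, int(len(examples) * fraction))
        return rng.sample(examples, k)

    for fraction in (0.1, 0.2, 0.3, 0.5, 0.7, 0.9):
        subset = subsample(train_examples, fraction)
        model = fine_tune(pretrained_encoder, subset)
        print(fraction, evaluate(model, test_examples))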

Conclusion
Supervised pre-training on natural image datasets is commonly used to improve medical image classification. We investigate an alternative strategy based on self-supervised pre-training on unlabeled natural and medical images and find that it can significantly improve upon supervised pre-training, the standard paradigm for training medical image analysis models. This approach can lead to models that are more accurate, more label-efficient, and more robust to distribution shifts. In addition, our proposed Multi-Instance Contrastive Learning method (MICLe) enables the use of additional metadata to create realistic augmentations, yielding a further performance boost for image classifiers.

Self-supervised pre-training is much more scalable than supervised pre-training because it does not require class label annotation. We hope this work helps popularize the use of self-supervised approaches in medical image analysis, yielding label-efficient, robust models suited for clinical deployment at scale in the real world.

Acknowledgements
This work involved collaborative efforts from a multidisciplinary team of researchers, software engineers, clinicians, and cross-functional contributors across Google Health and Google Brain. We thank our co-authors: Basil Mustafa, Fiona Ryan, Zach Beaver, Jan Freyberg, Jon Deaton, Aaron Loh, Alan Karthikesalingam, Simon Kornblith, Ting Chen, Vivek Natarajan, and Mohammad Norouzi. We also thank Yuan Liu from Google Health for valuable feedback and our partners for access to the datasets used in the research.

Source: Google AI Blog


These researchers are driving health equity with Fitbit

Under-resourced communities across the country have long faced disparities in health due to structural and long-standing inequities. Unfortunately, the pandemic has further widened many of these gaps.

Still, health equity research in digital health remains limited. To help close these gaps, earlier this year we announced the Fitbit Health Equity Research Initiative, which supports underrepresented early-career researchers working to address health disparities in their communities.

Over the past decade, researchers have used Fitbit devices in over 900 health studies, in areas like diabetes, heart disease, oncology, mental health, infectious disease and more. Today, we’re awarding six researchers a total of more than $300,000 in Fitbit devices and services to support their research projects. Additionally, Fitbit’s long-time partner, Fitabase, will provide all projects with access to their data management platform to help researchers maximize study participation and analysis.

Learn more about the awardees and their research:

A photo of Sherilyn Francis of Georgia Tech

Improving postpartum care for rural black women

Black women in the U.S. are two to three times more likely to die from pregnancy or childbirth when compared to their white counterparts. And in Georgia, the disparities are more pronounced among rural populations. “As Black women who reside in Georgia, we’re more likely to die simply by becoming pregnant,” shares Sherilyn Francis, a PhD student in Georgia Tech’s Human-Centered Computing program. Her research aims to improve postpartum care for rural Black mothers through a culturally informed mobile health intervention. As part of the study, participants will receive a Fitbit Sense smartwatch and Fitbit Aria Air scale. By combining insight into physical activity, heart rate, sleep, weight and nutritional data with health outcomes, Sherilyn and her colleagues hope to shed light on ways to reduce the risk of severe maternal morbidity for Black mothers.

A photo of Jessee Dietch of Oregon State University

A look at sleep health in transgender youth

Transgender youth (ages 14-19) are at elevated risk for poor sleep health and associated physical and mental health outcomes. However, there’s no research to date that examines how medical transition and the use of gender-affirming hormone therapy impact sleep health. Jessee Dietch, PhD, who is an assistant professor of psychology at Oregon State University, will analyze participants’ sleep using a Fitbit Charge 5. The hope is that the findings will highlight potential points for sleep health intervention that could lead to improved wellbeing for a community that is already at an elevated risk for poor health outcomes.

A photo of Rony F. Santiago of Sansum Diabetes Research Institute

Preventing the progression of type 2 diabetes in Latino adults

The causes and complications of type 2 diabetes (T2D) disproportionately impact Latinos. Rony F. Santiago, MA, an early-career researcher at Sansum Diabetes Research Institute motivated by his personal experiences, manages T2D programs that support the Santa Barbara community. Rony and his team, in collaboration with researchers at Texas A&M University, aim to recruit healthy Latino participants and those with pre-diabetes or T2D, each of whom will receive a continuous glucose monitor and a Fitbit Sense smartwatch. They hope to analyze physical activity, nutrition tracking and sleep patterns to better understand the impact these behaviors can have on blood sugar and the potential to improve health outcomes, including preventing the progression from pre-diabetes to T2D.

A photo of Toluwalase Ajayi of Scripps Research

Investigating how systemic racism impacts maternal and fetal health

Black and Hispanic pregnant people experience higher rates of pregnancy-related mortality in comparison to their non-Hispanic white counterparts. And Black infants are twice as likely to die within their first year of life in comparison to white infants. Toluwalase Ajayi, MD, pediatrician, palliative care physician and clinical researcher at Scripps Research, is the principal researcher for this study, PowerMom FIRST, which is part of her larger research study, PowerMom. PowerMom FIRST aims to answer questions about how systemic racism and discrimination may have a negative impact on maternal and fetal health in these vulnerable populations. In this study, 500 Black and Hispanic mothers will receive a Fitbit Luxe tracker and Aria Air scale. Researchers will assess participant survey data, alongside biometric data from Fitbit devices, for health inequities, disproportionate health outcomes, disparities in quality of care, and other factors that may influence maternal health. Data, like sleep and heart rate, will help researchers better understand the impact that systemic racism experienced by Black and Hispanic pregnant people may have on their health.

A photo of Susan Ramsundarsingh of SKY Schools

Building healthy habits in adolescents facing health disparities

Experiences of trauma, such as the COVID-19 pandemic and social inequity, are linked to poor health habits among marginalized student populations. Although there is a known relationship between socioeconomic status and unhealthy habits such as physical inactivity and poor nutrition, there is little clarity on effective interventions. Susan Ramsundarsingh, PhD, is the National Director of Research at SKY Schools, which develops evidence-based programs aimed at increasing the wellbeing and academic performance of under-resourced students. In this study, researchers will pair Fitbit Inspire 2 devices with the SKY School program, which teaches children social-emotional skills and resilience to improve health and wellbeing through tools like breathing techniques. Six hundred adolescent students will be assigned to three groups to measure the impact of the interventions on heart rate, sleep and physical activity during the 2021-22 school year.

A photo of Victoria Bandera of UCHealth

Reducing cardiovascular disease risk factors in Hispanic families in Colorado

Hispanics have a disproportionately high prevalence of cardiovascular disease risk factors relative to non-Hispanic whites, as well as higher rates of modifiable risk factors such as diabetes and hypertension. Victoria Bandera, M.S., is an exercise physiologist and early-career researcher at UCHealth Healthy Hearts in Loveland, Colo., whose research aims to combat health inequities that impact the Hispanic community. Participants enrolled in the Healthy Hearts Family Program will receive a Fitbit Charge 5 and take part in a 6-month program that includes an educational series on cardiovascular disease risks, healthy behaviors and health screenings. Researchers will encourage participants, ages 13 and older, to use their new Fitbit device to monitor and modify their health behaviors, such as eating habits and physical activity. They will then analyze changes in physical activity levels, body composition and biometric variables to assess the impact of the Healthy Hearts Family Program.

For the past 14 years at Fitbit, our mission has been to help everyone around the world live healthier, more active lives, and along with Google, we’re committed to using tech to improve health equity. We hope the Fitbit Health Equity Research Initiative will continue to encourage wearable research and generate new evidence and methods for addressing health disparities.