Machine Learning-based Damage Assessment for Disaster Relief

Natural disasters, such as earthquakes, hurricanes, and floods, affect large areas and millions of people, but responding to such disasters is a massive logistical challenge. Crisis responders, including governments, NGOs, and UN organizations, need fast access to comprehensive and accurate assessments in the aftermath of disasters to plan how best to allocate limited resources.To this end, very high resolution (VHR) satellite imagery, with up to 0.3 meter resolution, is becoming an increasingly important tool for crisis response, giving responders an unprecedented breadth of visual information about how terrain, infrastructure, and populations are changed by disasters.

However, intensive manual labor is still required to extract operationally-relevant information — collapsed buildings, cracks in bridges, where people have set up temporary shelters — from the raw satellite imagery. As an example, for the 2010 Haiti earthquake, analysts manually examined over 90,000 buildings in the Port-au-Prince area alone, rating the damage each one incurred on a five point scale. Many of these manual analyses take teams of experts many weeks to complete, whereas they are most needed within 48-72 hours after the disaster, when the most urgent decisions are made.

To help mitigate the impact of such disasters, we present "Building Damage Detection in Satellite Imagery Using Convolutional Neural Networks", which details a machine learning (ML) approach to automatically process satellite data to generate building damage assessments. Developed in partnership with the United Nations World Food Program (WFP) Innovation Accelerator, we believe this work has the potential to drastically reduce the time and effort required for crisis workers to produce damage assessment reports. In turn, this would reduce the turnaround times needed to deliver timely disaster aid to the most severely affected areas, while increasing the overall coverage of such critical services.

The Approach
The automatic damage assessment process is split into two steps: building detection and damage classification. In the building detection step, our approach uses an object detection model to draw bounding boxes around each building in the image. We then extract pre-disaster and post-disaster images centered on each detected building and use a classification model to determine whether the building is damaged.

The classification model consists of a convolutional neural network to which is input two 161 pixel x 161 pixel RGB images, corresponding to a 50 m x 50 m ground footprint, centered on a given building. One image is from before the disaster event, and the other image is from after the disaster event. The model analyzes differences in the two images and outputs a score from 0.0 to 1.0, where 0.0 means the building was not damaged, and 1.0 means the building was damaged.

Because the before and after images are taken on different dates, at different times of day, and in some cases by different satellites altogether, there can be a host of different problems that arise. For example, the brightness, contrast, color saturation, and lighting conditions of the images may differ significantly, and the pixels in the image may be misaligned.

To correct for differences in color and illumination, we use histogram equalization to normalize the colors in the before and after images. We also make the model more robust to insignificant color differences by using standard data augmentation techniques, such as randomly perturbing the contrast and saturation of the images, during training.

Training Data
One of the main challenges of this work is assembling a training data set. Data availability in this application is inherently limited because there are only a handful of disasters that have high resolution satellite images and an even smaller number that have existing damage assessments. For labels, we use publicly available damage assessments manually generated by humanitarian organizations operating in this space, such as UNOSAT and REACH. We obtain the original satellite images on which the manual assessments are performed and then use Google Earth Engine to spatially join the damage assessment labels with the satellite images in order to produce the final training examples. All images used to train the model were sourced from commercially available sources.
Examples of individual image patches that capture before and after images of damaged and undamaged buildings from different disasters.
We evaluated this technology for 3 major past earthquakes: the 2010 earthquake in Haiti (magnitude 7.0), the 2017 event in Mexico City (magnitude 7.1), and the series of earthquakes occuring in Indonesia in 2018 (magnitudes 5.9 - 7.5). For each event, we trained the model on buildings in one part of the region affected by the quake and tested it on buildings in another part of the region. We used human expert damage assessments performed by UNOSAT and REACH as the ground truth for evaluation. We measure the model’s quality using both true accuracy (compared to expert assessment) and the area under the ROC curve (AUROC), which captures the trade-off between the model’s true positive and false positive rates of detection, and is a common way to measure quality when the number of positive and negative examples in the test dataset is imbalanced. An AUROC value of 0.5 means that the model’s predictions are random, while a value of 1.0 means the model is perfectly accurate. According to crisis responder feedback, 70% accuracy is the threshold needed for making high-level decisions in the first 72 hours after the disaster.
Area under the
Event Accuracy ROC curve
2010 Haiti earthquake 77% 0.83
2017 Mexico City earthquake 71% 0.79
2018 Indonesia earthquake 78% 0.86
Evaluation of model predictions against human expert assessments (higher is better).
Example model predictions from the 2010 Haiti earthquake. Prediction values closer to 1.0 means the model is more confident that the building is damaged. Values closer to 0.0 means the building is not damaged. A threshold value of 0.5 is typically used to distinguish between damaged/undamaged predictions, but this can be tuned to make the predictions more or less sensitive.
Future Work
While the current model works reasonably well when trained and tested on buildings from the same regions (e.g., same city or country), the ultimate goal is to have a model that can accurately assess building damage for disasters that happen anywhere in the world, and not just those that look similar to the ones the model has been trained on. This is challenging because the variety of the available training data for past disasters is inherently limited to a handful of events that occurred in a few geographic locations. Generalizing to future disasters that will likely occur in new locations is therefore still a challenge for our model and is the focus of ongoing work. We envision a system that can be interactively trained, validated, and deployed by expert analysts so that important aid distribution decisions are always verified by experienced crisis responders. Our hope is that this technology can help communities get the aid that they need in times of most critical need in a timely fashion.

This post reflects the work of our co-authors Wenhan Lu and Zebo Li. We would also like to thank Maolin Zuo for his contributions to the project. In tackling this problem, we have had a very productive partnership with the United Nations World Food Programme (WFP) Innovation Accelerator, an organization that identifies, funds, and supports startups and innovative projects to disrupt world hunger.

Source: Google AI Blog