How to automatically scan Cloud Storage buckets for sensitive data: Taking charge of your security



Security in the cloud is often a matter of identifying—and sticking to—some simple best practices. A few months ago, we discussed some steps you can take to harden the security of your Cloud Storage buckets. We covered how to set up proper access permissions and provided tips on using tools like the Data Loss Prevention API to monitor your buckets for sensitive data. Today, let’s talk about how to automate data classification using the DLP API and Cloud Functions, Google Cloud Platform’s event-driven serverless compute platform that makes it easy for you to integrate and extend cloud services with code.

Imagine you need to regularly share data with a partner outside of your company, and this data cannot contain any sensitive elements such as Personally Identifiable Information (PII). You could just create a bucket, upload data to it, and grant access to your partner, but what if someone uploads the wrong file or doesn’t know that they aren’t supposed to upload PII? With the DLP API and Cloud Functions, you can automatically scan this data before it’s uploaded to the shared storage bucket.

Setting this up is easy: Simply create three buckets—one in which to upload data, one to share and one for any sensitive data that gets flagged. Then:

  1. Configure access appropriately so that relevant users can put data in the “upload” bucket 
  2. Write a Cloud Function triggered by an upload that scans the data using the DLP API 
  3. Based on any DLP findings, automatically move data into the share bucket or into a restricted bucket for further review.

To get you started, here’s a tutorial with detailed instructions including the Cloud Functions script. You can get it up and running in just a few minutes from the Cloud Console or via Cloud Shell. You can then easily modify the script for your environment, and add more advanced actions such as sending notification emails, creating a redacted copy or triggering approval workflows.

We hope we’ve showed you some proactive steps you can take to prevent sensitive data from getting into the wrong hands. To learn more, check out the documentation for the DLP API and Cloud Functions.