Cloud computing has changed many traditional IT practices, and one particularly useful change has been in disaster recovery (DR). Our team helps Google Cloud Platform (GCP) users build their infrastructure in the cloud, and we’ve seen great results when they use GCP as the DR target environment.
When you integrate a cloud provider like GCP into your DR plan, you no longer have to invest up front in mostly idle backup infrastructure. Testing that DR plan no longer seems so daunting: you can bring up your DR environment automatically, shut it all down when it’s no longer needed, and have it ready for the next test. In the event of an actual disaster, the DR environment can be made ready in the same way.
However, planning for, testing, and recovering from a disaster involves more than getting your application restored and available within your Recovery Time Objective (RTO). You need to ensure that the security controls you implemented on-premises also apply to your recovered environment. This post provides tips to help you maintain your security controls in your cloud DR environment.
1. Grant users the same access they’re used to
If your production environment is running on GCP, it’s easy to replicate the identity and access management (IAM) policies already in place. Use infrastructure-as-code methods and employ tools such as Cloud Deployment Manager to recreate your IAM policies. You can then bind the policies to the corresponding resources when you’re standing up the DR environment.
If your production environment is on-premises, map functional roles such as your network administrator and auditor roles to appropriate IAM policies. Use the IAM documentation, which has some example configurations such as the networking and audit logging functional roles. You’ll want to configure IAM policies to grant appropriate permissions to GCP products. For example, you might want to restrict access to specific Google Cloud Storage buckets.
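As a rough illustration of that mapping step, the sketch below turns on-premises functional roles into IAM-policy-style bindings. The role IDs are predefined IAM roles, but the choice of which role fits which function, the functional role names, and the group addresses are all assumptions made for this example, not a recommended configuration.

```python
# Hypothetical mapping from on-premises functional roles to predefined IAM roles.
# Which IAM roles actually fit each function is an assumption; review it
# against the IAM documentation for your own organization.
FUNCTIONAL_ROLE_TO_IAM = {
    "network-admin": ["roles/compute.networkAdmin"],
    "auditor": ["roles/logging.privateLogViewer", "roles/iam.securityReviewer"],
    "backup-reader": ["roles/storage.objectViewer"],
}


def build_bindings(assignments):
    """Turn {functional_role: [members]} into IAM-policy-style bindings."""
    members_by_role = {}
    for functional_role, members in assignments.items():
        for iam_role in FUNCTIONAL_ROLE_TO_IAM[functional_role]:
            members_by_role.setdefault(iam_role, set()).update(members)
    # Sort roles and members so the output is stable and easy to diff.
    return [
        {"role": role, "members": sorted(members)}
        for role, members in sorted(members_by_role.items())
    ]


# Placeholder group addresses, for illustration only.
bindings = build_bindings({
    "network-admin": ["group:netops@example.com"],
    "auditor": ["group:audit@example.com"],
})
```

Keeping the mapping in one table like this makes it easy to review with the people who own the on-premises roles before any policy is applied.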
Once you’ve created a test environment, ensure that the access granted to your users confers the same permissions as those they are granted on-premises.
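One way to make that check repeatable is to diff the access you expect against the access the test environment actually grants. The sketch below assumes both sides are available as IAM-policy-style binding lists; how you export them is left out, and the members and roles shown are placeholders.

```python
def access_gaps(expected, granted):
    """Compare two binding lists and return (missing, extra) grants.

    Each grant is a (member, role) pair. "missing" grants are expected but
    absent in the DR environment; "extra" grants exist there unexpectedly.
    """
    def pairs(bindings):
        return {(m, b["role"]) for b in bindings for m in b["members"]}

    want, have = pairs(expected), pairs(granted)
    return want - have, have - want


# Placeholder data for illustration.
expected = [{"role": "roles/compute.networkAdmin",
             "members": ["user:a@example.com", "user:b@example.com"]}]
granted = [{"role": "roles/compute.networkAdmin",
            "members": ["user:a@example.com"]},
           {"role": "roles/owner", "members": ["user:c@example.com"]}]

missing, extra = access_gaps(expected, granted)
```

Both sets should be empty before the test environment is signed off.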
If your production environment runs on another cloud, you will need to understand how to map that cloud’s access controls to Google Cloud Identity and Access Management (IAM) policies. The GCP for AWS professionals management doc can help if your other cloud is AWS.
2. Ensure user understanding
Do not wait for a disaster to occur before checking that your users (whether developers, operators, data scientists, security or network admins) can access the DR environment. Make sure their accounts have been granted the appropriate access rights. If you are using an alternative identity system, ensure accounts have been synced with your Cloud Identity account. Make sure users who will need access to the DR environment (which will be the production environment for a while) are able to log in to it, and resolve any authentication issues. When you’re conducting regular DR tests, include users logging in to the DR environment as part of the test process.
Enable the GCP OS Login feature on the projects that constitute your DR environment so you can centrally manage who has SSH access to VMs launched in the DR environment.
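OS Login is switched on through the `enable-oslogin` metadata key. As a minimal sketch, assuming your automation holds project metadata as a list of key/value items, a helper like the hypothetical `with_os_login` below could make sure the key is set before the DR projects are stood up. Applying the result to a project is not shown.

```python
def with_os_login(metadata_items):
    """Return a copy of the metadata items with enable-oslogin set to TRUE.

    Any existing enable-oslogin entry is replaced; other items are kept.
    """
    items = [dict(i) for i in metadata_items if i["key"] != "enable-oslogin"]
    items.append({"key": "enable-oslogin", "value": "TRUE"})
    return items


# Placeholder metadata for illustration.
current = [
    {"key": "enable-oslogin", "value": "FALSE"},
    {"key": "env", "value": "dr"},
]
updated = with_os_login(current)
```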
It’s also important to train users on the DR environment so that they understand how to undertake their usual actions in GCP. Use the test environment for this.
3. Double-check blocking and compliance requirements
It’s essential that your network controls provide the same separation and blocking in the DR environment as in the source production environment. Learn how to configure Shared VPCs and GCP firewalls, and take advantage of service accounts in your firewall rules. Understand how to use service accounts to implement least privilege for applications accessing GCP APIs.
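To give a feel for firewall rules keyed on service accounts, here is a hedged sketch that builds a rule body in the shape of a Compute Engine firewall resource. The field names follow that resource, but the helper name, rule name, network path, and service account addresses are made up for illustration, and nothing here calls the API.

```python
def service_account_firewall_rule(name, network, source_sa, target_sa, ports):
    """Build an ingress rule that allows TCP traffic between two service accounts.

    The rule matches on VM identity (service account) rather than on
    IP ranges or network tags.
    """
    return {
        "name": name,
        "network": network,
        "direction": "INGRESS",
        "sourceServiceAccounts": [source_sa],
        "targetServiceAccounts": [target_sa],
        "allowed": [{"IPProtocol": "tcp", "ports": [str(p) for p in ports]}],
    }


# Placeholder names for illustration.
rule = service_account_firewall_rule(
    name="dr-allow-web-to-db",
    network="projects/example-dr-project/global/networks/dr-vpc",
    source_sa="web@example-dr-project.iam.gserviceaccount.com",
    target_sa="db@example-dr-project.iam.gserviceaccount.com",
    ports=[5432],
)
```

Generating production and DR rules from the same function is one way to keep the two environments from drifting apart.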
In addition, make sure your DR environment meets your compliance requirements. Validate that access to your DR environment is restricted to only those who need it. Ensure personally identifiable information (PII) is appropriately redacted and encrypted. If you carry out regular penetration tests on your production environment, extend them to your DR environment by standing it up regularly for those tests.
4. Capture log data
When the DR environment is in service, the logs collected during that time should be backfilled into your production environment log archive. As part of your GCP DR environment, make sure you can export the audit logs collected via Stackdriver to your main log archive by using the export sink facilities. For application logs, stand up a mirror of your on-premises logging and monitoring environment. If production runs on another cloud, map its logging services to the equivalent GCP services. Have a process in place to format these logs and load them into your production environment.
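As a sketch of what an audit-log export could look like, the snippet below builds a sink definition with a name, a Cloud Storage destination, and a filter that matches audit log names. The helper, sink name, and bucket name are placeholders, and creating the sink through the logging API or tooling is not shown.

```python
def audit_log_sink(sink_name, archive_bucket):
    """Describe a log sink that exports audit logs to a Cloud Storage bucket."""
    return {
        "name": sink_name,
        # Cloud Storage destinations are written as storage.googleapis.com/BUCKET.
        "destination": f"storage.googleapis.com/{archive_bucket}",
        # Match entries whose log name contains the audit log prefix.
        "filter": 'logName:"cloudaudit.googleapis.com"',
    }


# Placeholder names for illustration.
sink = audit_log_sink("dr-audit-export", "example-log-archive")
```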
5. Use Cloud Storage for day-to-day backup routines
Use Cloud Storage to store backups that result from the DR environment. Ensure the storage buckets containing your backups have limited permissions applied to them.
The same security controls should apply to your recovered data as if it were production data: the same permissions, encryption and audit requirements. Know where your backups are located, where they were restored, and who restored them.
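A simple guard for backup buckets is to flag any member in the bucket's IAM policy that is public or not on an approved list. The sketch below assumes the policy has already been fetched as a dictionary with a `bindings` list; the members shown are placeholders.

```python
# Special IAM identifiers that make a resource publicly accessible.
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def unexpected_members(policy, allowed_members):
    """Return policy members that are public or not on the approved list."""
    found = set()
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            if member in PUBLIC_MEMBERS or member not in allowed_members:
                found.add(member)
    return sorted(found)


# Placeholder policy for illustration.
policy = {"bindings": [
    {"role": "roles/storage.objectViewer",
     "members": ["allUsers", "group:backup-ops@example.com"]},
    {"role": "roles/storage.objectAdmin",
     "members": ["user:contractor@example.com"]},
]}
flagged = unexpected_members(policy, {"group:backup-ops@example.com"})
```

Here `flagged` contains `allUsers` and the contractor account, both of which would need to be removed or justified.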
6. Consider secrets and key management
Manage application-level secrets and keys using GCP to host the key or secret management service. You can use Cloud Key Management Service (KMS) or a third-party solution like HashiCorp Vault with a GCP backend such as Cloud Spanner or Cloud Storage.
7. Manage VM images and snapshots
If you have predefined controls for VM configurations, such as who can use them or make changes, reflect those controls in your GCP DR site. Follow the guidance outlined in restricting access to images.
These tips can help make sure you don’t lose the security you’ve built into your production environment when you stand up a DR site. You’ll be on your way more quickly to having a cloud DR site that works for your users and the business.
Next steps: Read our guide on How to Design a DR plan.