Verifying PostgreSQL backups made easier with new open-source tool

When was the last time you verified a database backup? If that question causes you to break into a cold sweat, rest assured you’re not alone.

Verifying backups should be a common practice, but it often isn’t. This can be an issue if there’s a disaster or—as is more likely at most companies—if someone makes a mistake when deploying database changes. One industry survey indicates that data loss is one of the biggest risks when making database changes.

PostgreSQL Page Verification Tool

At Google Cloud Platform (GCP), we recently wrote a tool to fight data loss and help detect data corruption early in the change process. We made it open source, because data corruption can happen to anybody, and we’re committed to making code available to ensure secure, reliable backups. If you use Google Cloud SQL for PostgreSQL, then you’re in luck—we’re already running the PostgreSQL Page Verification Tool on your behalf. It’s also available now as open source code.

This new PostgreSQL Page Verification tool is a command-line tool that you can execute against a Postgres database. Since PostgreSQL version 9.3, it’s been possible to enable checksums on data pages to avoid ignoring data corruption. However, with the release of this utility, you can now verify all data files, online or offline. The Page Verification Tool can calculate and verify checksums for each data page.

How the Page Verification tool works

To use the PostgreSQL Page Verification tool, you must enable checksums during initialization of a new PostgreSQL database cluster. You can’t go back in and do it after the fact. Once checksums are turned on, the Page Verification tool computes its own checksum and compares it to the Postgres checksum to confirm that they are identical. If the checksum does not match, the tool identifies which data page is at fault and causing the corruption.

The Page Verification Tool can be run against a database that’s online or offline. It verifies checksums on PostgreSQL data pages without having to load each page into a shared buffer cache, and supports subsequent segments for tables larger than 1GB.

The tool skips Free Space Map, Visibility Map and pg_internal.init files, since they can be regenerated. While the tool can run against a database continuously, it does have a performance overhead associated with it, so we advise incorporating the tool into your backup process and running it on a separate server.

How to start using the PostgreSQL Page Verification tool

The Page Verification tool is integrated into Google Cloud SQL, so it runs automatically. We’re using the tool at scale to validate our customers’ backups. We do the verification process on internal instances of Cloud SQL to make sure your database doesn’t take a performance hit.

The value of the PostgreSQL Page Verification Tool comes from detecting data corruption early to minimize data loss resulting from data corruption. Organizations that use the tool and achieve a successful verification have assurance of a useful backup in case disaster strikes.

At Google, when we make a database better, we make it better for everyone, so the PostgreSQL Page Verification tool is available to you via open source. We encourage Postgres users to download the tool at Google Open Source or GitHub. The best detection is early detection, not when you need to restore a backup.