Tag Archives: Databases

MySQL to Cloud Spanner via HarbourBridge



Today we’re announcing that HarbourBridge—an open source toolkit that automates much of the manual work of evaluating and assessing Cloud Spanner—supports migrations from MySQL, in addition to existing support for PostgreSQL. This provides a zero-configuration path for MySQL users to try out Cloud Spanner. HarbourBridge bootstraps early stages of migration, and helps get you to the meaty issues as quickly as possible.

Core capabilities

At its core, HarbourBridge provides an automated workflow for loading the contents of an existing MySQL or PostgreSQL database into Spanner. It requires zero configuration—no manifests or data maps to write. Instead, it imports the source database, builds a Spanner schema, creates a new Spanner database populated with data from the source database, and generates a detailed assessment report. HarbourBridge can either import dump files (from mysqldump or pg_dump) or directly connect to the source database. It is intended for loading databases up to a few tens of GB for evaluation purposes, not full-scale migrations.

Bootstrap early-stage migration

HarbourBridge bootstraps early-stage migration to Spanner by using an existing MySQL or PostgreSQL source database to quickly get you running on Spanner. It generates an assessment report with an overall migration-fitness score for Spanner, a table-by-table analysis of type mappings and a list of features used in the source database that aren't supported by Spanner.

View HarbourBridge as a way to get up and running quickly, so you can focus on critical things like tuning performance and getting the most out of Spanner. You will need to tweak and enhance what HarbourBridge produces—more on that later.

Getting started

HarbourBridge can be used with the Cloud Spanner Emulator, or directly with a Cloud Spanner instance. The Emulator is a local, in-memory emulation of Spanner that implements the same APIs as Cloud Spanner’s production service, and allows you to try out Spanner’s functionality without creating a GCP Project. The HarbourBridge README contains a step-by-step quick-start guide for using the tool with a Cloud Spanner instance.

Together, HarbourBridge and the Cloud Spanner Emulator provide a lightweight, open source toolchain to experiment with Cloud Spanner. Moreover, when you want to proceed to performance testing and tuning, switching to a production Cloud Spanner instance is a simple configuration change.

To get started on using HarbourBridge with the Emulator, follow the Emulator instructions. In particular, start the Emulator using Docker and configure the SPANNER_EMULATOR_HOST environment variable (this tells the Cloud Spanner Client libraries to use the Emulator).

Next, install Go and configure the GOPATH environment variable if they are not already part of your environment. Now you can download and install HarbourBridge using
 
GO111MODULE=on \
go get github.com/cloudspannerecosystem/harbourbridge

It should be installed as $GOPATH/bin/harbourbridge. To use HarbourBridge on a MySQL database, run mysqldump and pipe its output to HarbourBridge

mysqldump <opts> db | $GOPATH/bin/harbourbridge -driver=mysqldump

where <opts> are the standard options you pass to mysqldump or mysql to specify host, port, etc., and db is the name of the database to dump.
Similarly, to use HarbourBridge on a PostgreSQL database, run

 pg_dump <opts> db | $GOPATH/bin/harbourbridge -driver=pg_dump

See the Troubleshooting guide if you run into any issues. In addition to creating a new Spanner database with data from the source database, HarbourBridge also generates a schema file, the assessment report, and a bad data file (if any data is dropped). See Files generated by HarbourBridge.

Sample dump files

If you don’t have ready access to a MySQL or PostgreSQL database, the HarbourBridge github repository has some samples. The files cart.mysqldump and cart.pg_dump contain mysqldump and pg_dump output for a very basic shopping cart application (just two tables, one for products and one for user carts). The files singers.mysqldump and singers.pg_dump contain mysqldump and pg_dump output for a version of the Cloud Spanner singers example. To use HarbourBridge on cart.mysqldump, download the file locally and run

$GOPATH/bin/harbourbridge -driver=mysqldump < cart.mysqldump

Next steps

The schema created by HarbourBridge provides a starting point for evaluation of Spanner. While it preserves much of the core structure of your MySQL or PostgreSQL schema, data types will be mapped based on the types supported by Spanner, and unsupported features will be dropped e.g. functions, sequences, procedures, triggers and views. See the assessment report as well as HarbourBridge’s Schema conversion documentation for details.

To test Spanner’s performance, you will need to switch from the Emulator to a Cloud Spanner instance. The HarbourBridge quick-start guide provides details of how to set up a Cloud Spanner instance. To have HarbourBridge use your Cloud Spanner instance instead of the Emulator, simply unset the SPANNER_EMULATOR_HOST environment variable (see the Emulator documentation for context).

To optimize your Spanner performance, carefully review choices of primary keys and indexes—see Keys and indexes. Note that HarbourBridge preserves primary keys from the source database but drops all other indexes. This means that the out-of-the-box performance you get from the schema created by HarbourBridge can be significantly impacted. If this is the case, add appropriate Secondary indexes. In addition, consider using Interleaved tables to optimize table layout and improve the performance of joins.

Recap

HarbourBridge is an open source toolkit for evaluating and assessing Cloud Spanner using an existing MySQL or PostgreSQL database. It automates many of the manual steps so that you can quickly get to important design, evaluation and performance issues, such as. refining choice of primary keys, tuning of indexes, and other optimizations.

We encourage you to try out HarbourBridge, send feedback, file issues, fork and modify the codebase, and send PRs for fixes and new functionality. We have big plans for HarbourBridge, including the addition of user-guided schema conversion (to customize type mappings and provide a guided exploration of indexing, primary key choices, and use of interleaved tables), as well as support for more databases. HarbourBridge is part of the Cloud Spanner Ecosystem, owned and maintained by the Cloud Spanner user community. It is not officially supported by Google as part of Cloud Spanner.

By Nevin Heintze, Cloud Spanner

HarbourBridge: From PostgreSQL to Cloud Spanner

Would you like to try out Cloud Spanner with data from an existing PostgreSQL database? Maybe you’ve wanted to ‘kick the tires’ on Spanner, but have been discouraged by the effort involved?

Today, we’re announcing a tool that makes trying out Cloud Spanner using PostgreSQL data simple and easy.

HarbourBridge is a tool that loads Spanner with the contents of an existing PostgreSQL database. It requires zero configuration—no manifests or data maps to write. Instead, it ingests pg_dump output, automatically builds a Spanner schema, and creates a new Spanner database populated with data from pg_dump.

HarbourBridge is part of the Cloud Spanner Ecosystem, a collection of public, open source repositories contributed to, owned, and maintained by the Cloud Spanner user community. None of these repositories are officially supported by Google as part of Cloud Spanner.

Get up and running fast

HarbourBridge is designed to simplify Spanner evaluation, and in particular to bootstrap the process by getting moderate-size PostgreSQL datasets into Spanner (up to a few GB). Many features of PostgreSQL, especially those that don't map directly to Spanner features, are ignored, e.g. (non-primary) indexes, functions and sequences.

View HarbourBridge as a way to get up and running fast, so you can focus on critical things like tuning performance and getting the most out of Spanner. Expect that you'll need to tweak and enhance what HarbourBridge produces—More on this later.

Quick-start guide

The HarbourBridge README contains a step-by-step quick-start guide. We’ll quickly review the main steps. Before you begin, you'll need a Cloud Spanner instance, Cloud Spanner API enabled for your Google Cloud project, authentication credentials configured to use the Cloud API, and Go installed on your development machine.

To download HarbourBridge and install it, run
go get -u github.com/cloudspannerecosystem/harbourbridge
The tool should now be installed as $GOPATH/bin/harbourbridge. To use HarbourBridge on a PostgreSQL database called mydb, run
pg_dump mydb | $GOPATH/bin/harbourbridge
The tool will use the cloud project specified by the GCLOUD_PROJECT environment variable, automatically determine the Cloud Spanner instance associated with this project, convert the PostgreSQL schema for mydb to a Spanner schema, create a new Cloud Spanner database with this schema, and finally, populate this new database with the data from mydb. HarbourBridge also generates several files when it runs: a schema file, a report file (with details of the conversion), and a bad data file (if any data is dropped). See Files Generated by HarbourBridge.

Take care with ACLs

Note that PostgreSQL table-level and row-level ACLs are dropped during conversion since they are not supported by Spanner (Spanner manages access control at the database level). All data written to Spanner will be visible to anyone who can access the database created by HarbourBridge (which inherits default permissions from your Cloud Spanner instance).

Next steps

The tables created by HarbourBridge provide a starting point for evaluation of Spanner. While they preserve much of the core structure of your PostgreSQL schema and data, many important PostgreSQL features have been dropped.

In particular, HarbourBridge preserves primary keys but drops all other indexes. This means that the out-of-the-box performance you get from the tables created by HarbourBridge can be significantly slower than PostgreSQL performance. If HarbourBridge has dropped indexes that are important to the performance of your SQL queries, consider adding Secondary Indexes to the tables created by HarbourBridge. Use the existing PostgreSQL indexes as a guide. In addition, Spanner's Interleaved Tables can provide a significant performance boost.

Other dropped features include functions, sequences, procedures, triggers, and views. In addition, types have been mapped based on the types supported by Spanner. Types such as integers, floats, char/text, bools, timestamps and (some) array types map fairly directly to Spanner, but many other types do not and instead are mapped to Spanner's STRING(MAX). See Schema Conversion for details of the type conversions and their tradeoffs.

Recap

HarbourBridge automates much of the manual work of trying out Cloud Spanner using PostgreSQL data. The goal is to bootstrap your evaluation and help get you to the meaty issues as quickly as possible. The tables generated by HarbourBridge provide a starting point, but they will likely need to be tweaked and enhanced to support a full evaluation.

We encourage you to try out the tool, send feedback, file issues, fork and modify the codebase, and send PRs for fixes and new functionality. Our plans and aspirations for developing HarbourBridge further are outlined in the HarbourBridge Whitepaper. HarbourBridge is part of the Cloud Spanner Ecosystem, owned and maintained by the Cloud Spanner user community. It is not officially supported by Google as part of Cloud Spanner.

By Nevin Heintze, Cloud Spanner

NoSQL for the serverless age: Announcing Cloud Firestore general availability and updates

Posted by Amit Ganesh, VP Engineering & Dan McGrath, Product Manager

As modern application development moves away from managing infrastructure and toward a serverless future, we're pleased to announce the general availability of Cloud Firestore, our serverless, NoSQL document database. We're also making it available in 10 new locations to complement the existing three, announcing a significant price reduction for regional instances, and enabling integration with Stackdriver for monitoring.

Cloud Firestore is a fully managed, cloud-native database that makes it simple to store, sync, and query data for web, mobile, and IoT applications. It focuses on providing a great developer experience and simplifying app development with live synchronization, offline support, and ACID transactions across hundreds of documents and collections. Cloud Firestore is integrated with both Google Cloud Platform (GCP) and Firebase, Google's mobile development platform. You can learn more about how Cloud Firestore works with Firebase here. With Cloud Firestore, you can build applications that move swiftly into production, thanks to flexible database security rules, real-time capabilities, and a completely hands-off auto-scaling infrastructure.

Cloud Firestore does more than just core database tasks. It's designed to be a complete data backend that handles security and authorization, infrastructure, edge data storage, and synchronization. Identity and Access Management (IAM) and Firebase Auth are built in to help make sure your application and its data remain secure. Tight integration with Cloud Functions, Cloud Storage, and Firebase's SDK accelerates and simplifies building end-to-end serverless applications. You can also easily export data into BigQuery for powerful analysis, post-processing of data, and machine learning.

Building with Cloud Firestore means your app can seamlessly transition from online to offline and back at the edge of connectivity. This helps lead to simpler code and fewer errors. You can serve rich user experiences and push data updates to more than a million concurrent clients, all without having to set up and maintain infrastructure. Cloud Firestore's strong consistency guarantee helps to minimize application code complexity and reduces bugs. A client-side application can even talk directly to the database, because enterprise-grade security is built right in. Unlike most other NoSQL databases, Cloud Firestore supports modifying up to 500 collections and documents in a single transaction while still automatically scaling to exactly match your workload.

What's new with Cloud Firestore

  • New regional instance pricing. This new pricing takes effect on March 3, 2019 for most regional instances, and is as low as 50% of multi-region instance prices.
    • Data in regional instances is replicated across multiple zones within a region. This is optimized for lower cost and lower write latency. We recommend multi-region instances when you want to maximize the availability and durability of your database.
  • SLA now available. You can now take advantage of Cloud Firestore's SLA: 99.999% availability for multi-region instances and 99.99% availability for regional instances.
  • New locations available. There are 10 new locations for Cloud Firestore:
    • Multi-region
      • Europe (eur3)
    • North America (Regional)
      • Los Angeles (us-west2)
      • Montréal (northamerica-northeast1)
      • Northern Virginia (us-east4)
    • South America (Regional)
      • São Paulo (southamerica-east1)
    • Europe (Regional)
      • London (europe-west2)
    • Asia (Regional)
      • Mumbai (asia-south1)
      • Hong Kong (asia-east2)
      • Tokyo (asia-northeast1)
    • Australia (Regional)
      • Sydney (australia-southeast1)

Cloud Firestore is now available in 13 regions.

  • Stackdriver integration (in beta). You can now monitor Cloud Firestore read, write and delete operations in near-real time with Stackdriver.
  • More features coming soon. We're working on adding some of the most requested features to Cloud Firestore from our developer community, such as querying for documents across collections and incrementing database values without needing a transaction.

As the next generation of Cloud Datastore, Cloud Firestore is compatible with all Cloud Datastore APIs and client libraries. Existing Cloud Datastore users will be live-upgraded to Cloud Firestore automatically later in 2019. You can learn more about this upgrade here.

Adding flexibility and scalability across industries

Cloud Firestore is already changing the way companies build apps in media, IoT, mobility, digital agencies, real estate, and many others. The unifying themes among these workloads include: the need for mobility even when connectivity lapses, scalability for many users, and the ability to move quickly from prototype to production. Here are a few of the stories we've heard from Cloud Firestore users.

When opportunity strikes...

In the highly competitive world of shared, on-demand personal mobility via cars, bikes, and scooters, the ability to deliver a differentiated user experience, iterate rapidly, and scale are critical. The prize is huge. Skip provides a scooter-sharing system where shipping fast can have a big impact. Mike Wadhera, CTO and Co-founder, says, "Cloud Firestore has enabled our engineering and product teams to ship at the clock-speed of a startup while leveraging Google-scale infrastructure. We're delighted to see continued investment in Firebase and the broader GCP platform."

Another Cloud Firestore user, digital consultancy The Nerdery, has to deliver high-quality results in a short period of time, often needing to integrate with existing third-party data sources. They can't build up and tear down complicated, expensive infrastructure for every client app they create. "Cloud Firestore was a great fit for the web and mobile applications we built because it required a solution to keep 40,000-plus users apprised of real-time data updates," says Jansen Price, Principal Software Architect. "The reliability and speed of Cloud Firestore coupled with its real-time capabilities allowed us to deliver a great product for the Google Cloud Next conferences."

Reliable information delivery

Incident response company Now IMS uses real-time data to keep citizens safe in crowded places, where cell service can get spotty when demand is high. "As an incident management company, real-time and offline capabilities are paramount to our customers," says John Rodkey, Co-founder. "Cloud Firestore, along with the Firebase Javascript SDK, provides us with these capabilities out of the box. This new 100% serverless architecture on Google Cloud enables us to focus on rapid application development to meet our customers' needs instead of worrying about infrastructure or server management like with our previous cloud."

Regardless of the app, users want the latest information right away, without having to click refresh. The QuintoAndar mobile application connects tenants and landlords in Brazil for easier apartment rentals. "Being able to deliver constantly changing information to our customers allows us to provide a truly engaging experience. Cloud Firestore enables us to do this without additional infrastructure and allows us to focus on the core challenges of our business," says Guilherme Salerno, Engineering Manager at QuintoAndar.

Real-time, responsive apps, happy users

Famed broadsheet and media company The Telegraph uses Cloud Firestore so registered users can easily discover and engage with relevant content. The Telegraph wanted to make the user experience better without having to become infrastructure experts in serving and managing data to millions of concurrent connections. "Cloud Firestore allowed us to build a real-time personalized news feed, keeping readers informed with synchronized content state across all of their devices," says Alex Mansfield-Scaddan, Solution Architect. "It allowed The Telegraph engineering teams to focus on improving engagement with our customers, rather than becoming real-time database and infrastructure experts."

On the other side of the Atlantic, The New York Times used Cloud Firestore to build a feature in The Times' mobile app to send push notifications updated in real time for the 2018 Winter Olympics. In previous approaches to this feature, scaling had been a challenge. The team needed to track each reader's history of interactions in order to provide tailored content for particular events or sports. Cloud Firestore allowed them to query data dynamically, then send the real-time updates to readers. The team was able to send more targeted content faster.

Delivering powerful edge storage for IoT devices

Athlete testing technology company Hawkin Dynamics was an early, pre-beta adopter of Cloud Firestore. Their pressure pads are used by many professional sports teams to measure and track athlete performance. In the fast-paced, high-stakes world of professional sports, athletes can't wait around for devices to connect or results to calculate. They demand instant answers even if the WiFi is temporarily down. Hawkin Dynamics uses Cloud Firestore to bring real-time data to athletes through their app dashboard, shown below.

"Our core mission at Hawkin Dynamics is to help coaches make informed decisions regarding their athletes through the use of actionable data. With real-time updates, our users can get the data they need to adjust an athlete's training on a moment-by-moment basis," says Chris Wales, CTO. "By utilizing the powerful querying ability of Cloud Firestore, we can provide them the insights they need to evaluate the overall efficacy of their programs. The close integrations with Cloud Functions and the other Firebase products have allowed us to constantly improve on our product and stay responsive to our customers' needs. In an industry that is rapidly changing, the flexibility afforded to us by Cloud Firestore in extending our applications has allowed us to stay ahead of the game."

Getting started with Cloud Firestore

We've heard from many of you that Cloud Firestore is helping solve some of your most timely development challenges by simplifying real-time data and data synchronization, eliminating server-side code, and providing flexible yet secure database authentication rules. This reflects the state of the cloud app market, where developers are exploring lots of options to help them build better and faster while also providing modern user experiences. This glance at Stack Overflow questions gives a good picture of some of these trends, where Cloud Firestore is a hot topic among cloud databases.

Source: StackExchange

We've seen close to a million Cloud Firestore databases created since its beta launch. The platform is designed to serve databases ranging in size from kilobytes to multiple petabytes of data. Even a single application running on Cloud Firestore is delivering more than 1 million real-time updates per second to users. These apps are just the beginning. To learn more about serverless application development, take a look through the archive of the recent application development digital conference.

We'd love to hear from you, and we can't wait to see what you build next. Try Cloud Firestore today for your apps.