Tag Archives: Storage & Databases

Why you should pick strong consistency, whenever possible

Do you like complex application logic? We don’t either. One of the things we’ve learned here at Google is that application code is simpler and development schedules are shorter when developers can rely on underlying data stores to handle complex transaction processing and keeping data ordered. To quote the original Spanner paper, “we believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.”1

Put another way, data stores that provide transactions and consistency across the entire dataset by default lead to fewer bugs, fewer headaches and easier-to-maintain application code.

Defining database consistency

But to have an interesting discussion about consistency, it’s important to first define our terms. A quick look at different databases on the market shows that not all consistency models are created equal, and that some of the related terms can intimidate even the bravest database developer. Below is a short primer on consistency:

What Cloud Spanner Supports
Consistency in database systems refers to the requirement that any given database transaction must change affected data only in allowed ways. Any data written to the database must be valid according to all defined rules.2
Cloud Spanner provides external consistency, which is strong consistency + additional properties (including serializability and linearizability). All transactions across a Cloud Spanner database satisfy this consistency property, not just those within a replica or region.
Serializability is an isolation property of transactions, where every transaction may read and write multiple objects. It guarantees that transactions behave the same as if they had executed in some serial order. It's okay for that serial order to be different from the order in which transactions were actually run.3
Cloud Spanner provides external consistency, which is a stronger property than serializability, which means that all transactions appear as if they executed in a serial order, even if some of the reads, writes and other operations of distinct transactions actually occurred in parallel.
Linearizability is a recency guarantee on reads and writes of a register (an individual object). It doesn’t group operations together into transactions, so it does not prevent problems such as write skew, unless you take additional measures such as materializing conflicts.4
Cloud Spanner provides external consistency, which is a stronger property than linearizability, because linearizability does not say anything about the behavior of transactions.
Strong Consistency
All accesses are seen by all parallel processes (or nodes, processors, etc.) in the same order (sequentially)5

In some definitions, a replication protocol exhibits "strong consistency" if the replicated objects are linearizable.
The default mode for reads in Cloud Spanner is "strong," which guarantees that they observe the effects of all transactions that committed before the start of the operation, independent of which replica receives the read.
Eventual Consistency
Eventual consistency means that if you stop writing to the database and wait for some unspecified length of time, then
eventually all read requests will return the same value.6
Cloud Spanner supports bounded stale reads, which offer similar performance benefits as eventual consistency but with much stronger consistency guarantees.

Cloud Spanner, in particular, provides external consistency, which provides all the benefits of strong consistency plus serializability. All transactions (across rows, regions and continents) in a Cloud Spanner database satisfy the external consistency property, not just those within a replica. External consistency states that Cloud Spanner executes transactions in a manner that's indistinguishable from a system in which the transactions are executed serially, and furthermore, that the serial order is consistent with the order in which transactions can be observed to commit. External consistency is a stronger property than both linearizability and serializability.

Consistency in the wild

There are lots of use cases that call for external consistency. For example, a financial application might need to show users' account balances. When users make a deposit, they want to see the result of this deposit reflected immediately when they view their balance (otherwise they may fear their money has been lost!). There should never appear to be more or less money in aggregate in the bank than there really is. Another example might be a mail or messaging app: You click "send" on your message, then immediately view "sent messages" because you want to double check what you wrote. Without external consistency, the app’s request to retrieve your sent messages may go to a different replica that's behind on getting all state changes, and have no record of your message, resulting in a confusing and reduced user experience.

But what does it really mean from a technical standpoint to have external consistency? When performing read operations, external consistency means that you're reading the latest copy of your data in global order. It provides the ability to read the latest change to your data across rows, regions and continents. From a developer’s perspective, it means you can read a consistent view of the state of the entire database (not just a row or object) at any point in time. Anything less introduces tradeoffs and complexity in the application design. That in turn can lead to brittle, hard-to-maintain software and can cause innumerable maintenance headaches for developers and operators. Multi-master architectures and multiple levels of consistency are workarounds for not being able to provide the external consistency that Cloud Spanner does.

What’s the problem with using something less than external consistency? When you choose a relaxed/eventual consistency mode, you have to understand which consistency mode you need to use for each use case and have to hard code rigid transactional logic into your apps to guarantee the correctness and ordering of operations. To take advantage of "transactions" in database systems that have limited or no strong consistency across documents/objects/rows, you have to design your application schema such that you never need to make a change that involves multiple "things" at the same time. That’s a huge restriction and workarounds at the application layer are painful, complex, and often buggy.

Further, these workarounds have to be carried everywhere in the system. For example, take the case of adding a button to set your color scheme in an admin preferences panel. Even a simple feature like this is expected to be carried over immediately across the app and other devices and sessions. It needs a synchronous, strongly consistent update—or a makeshift way to obtain the same result. Using a workaround to achieve strong consistency at the application level adds a velocity-tax to every subsequent new feature—no matter how small. It also makes it really hard to scale the application dev team, because everyone needs to be an expert in these edge cases. With this example, a unit test that passes on a developer workstation does not imply it will work in production at scale, especially in high concurrency applications. Adding workarounds to an eventually consistent data store often introduces bugs that go unnoticed until they bite a real customer and corrupt data. In fact, you may not even recognize the workaround is needed in the first place.

Lots of application developers are under the impression that the performance hit of external or strong consistency is too high. And in some systems, that might be true. Additionally, we're firm believers that having choice is a good thing—as long as the database does not introduce unnecessary complexity or introduce potential bugs in the application. Inside Google, we aim to give application developers the performance they need while avoiding unnecessary complexity in their application code. To that end, we’ve been researching advanced distributed database systems for many years and have built a wide variety of data stores to get strong consistency just right. Some examples are Cloud Bigtable, which is strongly consistent within a row; Cloud Datastore, which is strongly consistent within a document or object; and Cloud Spanner, which offers strong consistency across rows, regions and continents with serializability. [Note: In fact, Cloud Spanner offers a stronger guarantee of external consistency (strong consistency + serializability), but we tend to talk about Cloud Spanner having strong consistency because it's a more broadly accepted term.]

Strongly consistent reads and Cloud Spanner

Cloud Spanner was designed from the ground up to serve strong reads (i.e., strongly consistent reads) by default with low latency and high throughput. Thanks to the unique power of TrueTime, Spanner provides strong reads for arbitrary queries without complex multi-phase consensus protocols and without locks of any kind. Cloud Spanner’s use of TrueTime also provides the added benefit of being able do global bounded-staleness reads.

Better yet, Cloud Spanner offers strong consistency for multi-region and regional configurations. Other globally distributed databases present a dilemma to developers: If they want to read the data from geographically distributed regions, they forfeit the ability to do strongly consistent reads. In these other systems, if a customer opts to have strongly consistent reads, then they forfeit the ability to do reads from remote regions.

To take maximum advantage of the external consistency guarantees that Cloud Spanner provides and to maximize your application's performance, we offer the following two recommendations:
  1. Always use strong reads, whenever possible. Strong reads, which provide strong consistency, ensure that you are reading the latest copy of your data. Strong consistency makes application code simpler and applications more trustworthy.
  2. If latency makes strong reads infeasible in some situations, then use reads with bounded-staleness to improve performance, in places where strong reads with the latest data are not necessary. Bounded-staleness semantics ensures you read a guaranteed prefix of the data (for example, within a specified period of time) that is consistent, as opposed to eventual consistency where you have no guarantees and your app can read almost anything forwards or back in time from when you queried it.
Foregoing strong consistency has some real risks. Strong reads across a database ensure that you're reading the latest copy of your data and that it maintains the referential integrity of the entire dataset, making it easier to reason about concurrent requests. Using weaker consistency models introduces the risk of software bugs and can be a waste of developer hours—and potentially—customer trust.

What about writes?

Strong consistency is even more important for write operations—especially read-modify-write transactions. Systems that don't provide strong consistency in such situations create a burden for application developers, as there's always a risk of putting your data into an inconsistent state.

Perhaps the most insidious type of problem is write skew. In write skew, two transactions read a set of objects and make changes to some of those objects. However, the modifications that each transaction makes affect what the other transaction should have read. For example, consider a database for an airline based in San Francisco. It’s the airline’s policy to always have a free plane in San Francisco, in the event that this spare plane is needed to replace another plane with maintenance problems or for some other need. Imagine two transactions that are both reserving planes for upcoming flights out of San Francisco:

Begin Transaction
SELECT * FROM Airplanes WHERE location = "San Francisco" AND Availability = "Free";
If number of airplanes is > 1:  # to enforce "one free plane" rule
Pick 1 airplane
Set its Availability to "InUse"
Else: Rollback

Without strong consistency (and, in particular, serializable isolation for these transactions), both transactions could successfully commit, thus potentially breaking our one free plane rule. There are many more situations where write skew can cause problems7.

Because Cloud Spanner was built from the ground up to be a relational database with strong, transactional consistency—even for complex multi-row and multi-table transactions—it can be used in many situations where a NoSQL database would cause headaches for application developers. Cloud Spanner protects applications from problems like write skew, which makes it appropriate for mission-critical applications in many domains including finance, logistics, gaming and merchandising.

How does Cloud Spanner differ from multi-master replication?

One topic that's often combined with scalability and consistency discussions is multi-master replication. At its core, multi-master replication is a strategy used to reduce mean time to recovery for vertically scalable database systems. In other words, it’s a disaster recovery solution, and not a solution for global, strong consistency. With a multi-master system, each machine contains the entire dataset, and changes are replicated to other machines for read-scaling and disaster recovery.

In contrast, Cloud Spanner is a truly distributed system, where data is distributed across multiple machines within a replica, and also replicated across multiple machines and multiple data centers. The primary distinction between Cloud Spanner and multi-master replication is that Cloud Spanner uses paxos to synchronously replicate writes out of region, while still making progress in the face of single server/cluster/region failures. Synchronous out-of-region replication means that consistency can be maintained, and strongly consistent data can be served without downtime, even when a region is unavailable—no acknowledged writes are delayed/lost due to the unavailable region. Cloud Spanner’s paxos implementation elects a leader so that it's not necessary to do time-intensive quorum reads to obtain strong consistency. Additionally, Cloud Spanner shards data horizontally across servers, so individual machine failures impact less data. While a node is recovering, replicated nodes on other clusters that contain that dataset can assume mastership easily, and serve strong reads without any visible downtime to the user.

A strongly consistent solution for your mission-critical data

For storing critical, transactional data in the cloud, Cloud Spanner offers a unique combination of external, strong consistency, relational semantics, high availability and horizontal scale. Stringent consistency guarantees are critical to delivering trustworthy services. Cloud Spanner was built from the ground up to provide those guarantees in a high-performance, intuitive way. We invite you to try it out and learn more.

See more on Cloud Spanner and external consistency.

1 https://static.googleusercontent.com/media/research.google.com/en//archive/spanner-osdi2012.pdf
2 https://en.wikipedia.org/wiki/Consistency_(database_systems)
3 Kleppmann, Martin. Designing Data-Intensive Applications. O’Reilly, 2017, p. 329.
4 Kleppmann, Martin. Designing Data-Intensive Applications. O’Reilly, 2017, p. 329.
5 https://en.wikipedia.org/wiki/Strong_consistency
6 Kleppmann, Martin. Designing Data-Intensive Applications. O’Reilly, 2017, p. 322.
7 Kleppmann, Martin. Designing Data-Intensive Applications. O'Reilly, 2017, p. 246.

With Multi-Region support in Cloud Spanner, have your cake and eat it too

Today, we’re thrilled to announce the general availability of Cloud Spanner Multi-Region configurations. With this release, we’ve extended Cloud Spanner’s transactions and synchronous replication across regions and continents. That means no matter where your users may be, apps backed by Cloud Spanner can read and write up-to-date (strongly consistent) data globally and do so with minimal latency for end users. In other words, your app now has an accurate, consistent view of the data it needs to support users whether they’re around the corner or around the globe. Additionally, when running a Multi-Region instance, your database is able to survive a regional failure.

This release also delivers an industry-leading 99.999% availability SLA with no planned downtime. That’s 10x less downtime (< 5min / year) than database services with four nines of availability.

Cloud Spanner is the first and only enterprise-grade, globally distributed and strongly consistent database service built specifically for the cloud that combines the benefits and familiarity of relational database semantics with non-relational scale and performance. It now supports a wider range of application workloads, from a single node in a single region to massive instances that span regions and continents. At any scale, Cloud Spanner behaves the same, delivering a single database experience.

Since we announced the general availability of Cloud Spanner in May, customers, from startups to enterprises, have rethought what a database can do, and have been migrating their mission critical production workloads to it. For example, Mixpanel, a business analytics service, moved their sharded MySQL database to Cloud Spanner to handle user-id lookups when processing events from their customers' end-users web browser and mobile devices.

No more trade-offs

For years, developers and IT organizations were forced to make painful compromises between the horizontal scalability of non-relational databases and the transactions, structured schema and complex SQL queries offered by traditional relational databases. With the increase in volume, variety and velocity of data, companies had to layer additional technologies and scale-related workarounds to keep up. These compromises introduced immense complexity and only addressed the symptoms of the problem, not the actual problem.

This summer, we announced an alliance with marketing automation provider Marketo, Inc., which is migrating to GCP and Cloud Spanner. Companies around the world rely on Marketo to orchestrate, automate, and adapt their marketing campaigns via the Marketo Engagement Platform. To meet the demands of its customers today, and tomorrow, Marketo needed to be able to process trillions of activities annually, creating an extreme-scale big data challenge. When it came time to scale its platform, Marketo did what many companies do  it migrated to a non-relational database stack. But if your data is inherently transactional, going to a system without transactions and keeping data ordered and readers consistent is very hard.

"It was essential for us to have order sequence in our app logic, and with Cloud Spanner, it’s built in. When we started looking at GCP, we quickly identified Cloud Spanner as the solution, as it provided relational semantics and incredible scalability within a managed service. We hadn’t found a Cloud Spanner-like product in other clouds. We ran a successful POC and plan to move several massive services to Cloud Spanner. We look forward to Multi-Region configurations, as they give us the ability to expand globally and reduce latencies for customers on the other side of the world" 
— Manoj Goyal, Marketo Chief Product Officer

Mission-critical high availability

For global businesses, reliability is expected but maintaining that reliability while also rapidly scaling can be a challenge. Evernote, a cross-platform app for individuals and teams to create, assemble, nurture and share ideas in any form, migrated to GCP last year. In the coming months, it will mark the next phase of its move to the cloud by migrating to a single Cloud Spanner instance to manage over 8 billion plus pieces of its customers’ notes, replacing over 750 MySQL instances in the process. Cloud Spanner Multi-Region support gives Evernote the confidence it needs to make this bold move.
"At our size, problems such as scalability and reliability don't have a simple answer, Cloud Spanner is a transformational technology choice for us. It will give us a regionally distributed database storage layer for our customers’ data that can scale as we continue to grow. Our whole technology team is excited to bring this into production in the coming months."
Ben McCormack, Evernote Vice President of Operations

Strong consistency with scalability and high performance

Cloud Spanner delivers scalability and global strong consistency so apps can rely on an accurate and ordered view of their data around the world with low latency. Redknee, for example, provides enterprise software to mobile operators to help them charge their subscribers for their data, voice and texts. Its customers' network traffic currently runs through traditional database systems that are expensive to operate and come with processing capacity limitations.
“We want to move from our current on-prem per-customer deployment model to the cloud to improve performance and reliability, which is extremely important to us and our customers. With Cloud Spanner, we can process ten times more transactions per second (using a current benchmark of 55k transactions per second), allowing us to better serve customers, with a dramatically reduced total cost of ownership." 
— Danielle Royston, CEO, Redknee

Revolutionize the database admin and management experience

Standing up a globally consistent, scalable relational database instance is usually prohibitively complex. With Cloud Spanner, you can create an instance in just a few clicks and then scale it simply using the Google Console or programmatically. This simplicity revolutionizes database administration, freeing up time for activities that drive the business forward, and enabling new and unique end-user experiences.

A different way of thinking about databases

We believe Cloud Spanner is unique among databases and cloud database services, offering a global relational database, not just a feature to eventually copy or replicate data around the world. At Google, Spanner powers apps that process billions of transactions per day across many Google services. In fact, it has become the default database internally for apps of all sizes. We’re excited to see what your company can do with Cloud Spanner as your database foundation.

Want to learn more? Check out the many whitepapers discussing the technology behind Cloud Spanner. Then, when you’re ready to get started, follow our Quickstart guide to Cloud Spanner, or Kelsey Hightower’s post How to get started with Cloud Spanner in 5 minutes.

Commvault and Google Cloud partner on cloud-based data protection and simpler “lift and shift” to the cloud

Today at Commvault Go 2017, we announced a new strategic alliance with Commvault to enable you to benefit from advanced data protection in the cloud as well as on-premises, and to make it easier to “lift-and-shift” workloads to Google Cloud Platform (GCP).

At Google Cloud, we strive to provide you with the best offerings not just to store but also to use your data. For example, if you’re looking for data protection, you can benefit from our unique Coldline class as part of Google Cloud Storage, which provides immediate access to your data at archival storage prices. You can test this for free. Try serving an image or video directly from the Coldline storage tier and it will return within milliseconds. Then there’s our partner Forsythe, whose data analytics-as-a-service offering allows you to bring your backup data from Commvault to Google Cloud Storage and then analyze it using GCP machine learning and data loss prevention services.

We work hard with our technology partners to deliver solutions that are easy to use and cost-effective. We're working with Commvault on a number of initiatives, specifically:
  • Backup to Google Cloud Storage Coldline: If you use Commvault, you can now use Coldline in addition to Regional and Nearline classes as your storage target. Check out this video to see how easy it is to set up Cloud Storage with Commvault.
  • Protect workloads in the cloud: As enterprises move their applications to Google Compute Engine, you can use the same data protection policies that you use on-premises with Commvault’s data protection software. Commvault supports a wide range of common enterprise applications from SAP, Exchange, SQL, DB2, and PostgreSQL, to big data applications such as GPFS, MongoDB, Hadoop and many more.
  • G Suite backup with Commvault: You can now use the Commvault platform to backup and recover data from G Suite applications such as Gmail and Drive.
We're excited to work with Commvault to bring more capabilities to our joint customers in the future, such as enhanced data visibility via analytics and the ability to migrate and/or recover VMs in Compute Engine for on-premises workloads.

If you’re planning to attend Commvault Go this week, visit our booth to learn more about our partnership with Commvault and how to use GCP for backup and disaster recovery with Commvault!

Cloud SQL for PostgreSQL adds high availability and replication

Cloud SQL for PostgreSQL users, we heard you loud and clear, and added support for high availability (HA) and read replicas, helping you ensure your database workloads are fault tolerant.

The beta release of high availability provides isolation from failures, and read replicas provide additional read performance requirements for demanding workloads.
"As a global retail company, who uses digital innovation and data collaboration to enrich consumers' experiences with retailers and venues, we have very high availability requirements and we trust Cloud SQL for PostgreSQL with our data." 
— Peter McInerney, Senior Director of Technical Operations at Westfield Retail Solutions
"We love Postgres and rely upon it for many production workloads. While our Compute Engine VMs running Postgres have never gone down, the added peace of mind provided by HA and read replicas combined with reduction in operations makes the decision to move to the new Cloud SQL for PostgreSQL a simple one." 
— Jason Vertrees, CTO at RealMassive
Additional enhancements include database instance cloning and higher performance instances with up to 64 vCPU cores and 416GB of RAM. Cloud SQL for PostgreSQL is also now part of the Google Cloud Business Associates Agreement (BAA) for HIPAA covered customers. And in case you missed it, we added support for 19 extensions this summer.

Understanding the high availability configuration

The high availability configuration on Cloud SQL for PostgreSQL is backed by Google's new Regional Disks, which synchronously replicate data at the block-level between two zones in a region. Cloud SQL continuously health-checks HA instances and automatically fails over if an instance is not healthy. The combination of synchronous disk replication and automatic failover provides isolation from many types of infrastructure, hardware and software failures.

What triggers a failover?

Both primary and standby instances of Cloud SQL for PostgreSQL send a heartbeat signal, which is evaluated to define an availability state for the master instance. Cloud SQL monitors the heartbeats, and if it doesn’t detect multiple heartbeats from the primary instance (and the standby instance is healthy), starts a failover operation.

What happens during and after a failover?

During failover, Cloud SQL transfers the IP address and name of the primary instance to the standby instance and initializes the database. After failover, an application resumes connection to the new master instance without needing to change its connection string because the IP address moved automatically. Regional disks, meanwhile, ensure that all previously committed database transactions, right up to the time of the failure, were persisted and available after failover.

How to create a high availability instance

Creating a new HA instance is easy, as is upgrading existing Cloud SQL for PostgreSQL single instances. When creating or editing an instance, expand the "Configuration options" and select "High availability (regional)" in the Availability section:

Quickly scale out with read replicas

Read replicas are useful for scaling out read load and for ad-hoc reporting. To create a read replica, select your primary Cloud SQL for PostgreSQL instance and click "Replicas" on the Instance details page.

By combining high availability instances and read replicas, you can build a fault-tolerant, high performance PostgreSQL database cluster:

Get started

Sign up for a $300 credit to try Cloud SQL and the rest of GCP. Start with inexpensive micro instances for testing and development, and then, when you’re ready, you can easily scale them up to serve performance-intensive applications. As a bonus, everyone gets the 100% sustained use discount during the beta period, regardless of usage.

With Cloud SQL, we still feel like we’re just getting started. We hope you’ll come along for the ride and let us know what you need to be successful at Issue Tracker and by joining the Cloud SQL discussion group. Please keep the feedback coming!

Cloud SQL for PostgreSQL updated with new extensions

Among relational databases, PostgreSQL is the open-source solution of choice for a wide range of workloads. Back in March, we added support for PostgreSQL in Cloud SQL, our managed database service, with a limited set of features and extensions. Since then, we’ve been amazed by your interest, with many of you taking the time to suggest desired PostgreSQL extensions on the Issue Tracker and the Cloud SQL discussion group. This feedback has resulted in us adding the following 19 extensions, across four categories:
  • PostGIS: better support for geographic applications
  • Data type: a variety of new data types
  • Language: enhanced functionality with new processing languages
  • Miscellaneous: text search, cryptographic capabilities and integer aggregators, to name but a few
An extension is a piece of software that adds functionality, often data types and procedural languages, to PostgreSQL itself. If you already have a Cloud SQL for PostgreSQL database instance running, you can enable one or more of these extensions.

We're continuing our journey with PostgreSQL on Cloud SQL. As we prepare for general availability, we’re working on automatic failover for high availability, read replicas, additional extensions and precise restores with point-in-time recovery. Stay tuned!

Thanks for your feedback and please keep it coming on the Issue Tracker and in the Cloud SQL discussion group! Your input helps shape the future of Cloud SQL and all Google Cloud products.

Guest post: How Seenit uses Google Cloud Platform and Couchbase to power our video collaboration platform

Editor’s Note: In this guest post, Seenit CTO Dave Starling walks us through how they use Google Cloud Platform (GCP) and Couchbase to build their innovative crowdsourced video platform.

Since we started Seenit in 2014, our goal has been to give businesses the tools to tell interesting stories through crowdsourced video. But getting there wasn’t simple. What we envisioned for Seenit didn’t exist at the time we started, challenging us to define our product architecture from ground zero. We learned a lot, which is why today I thought I’d share a little on how we’re using Couchbase and GCP to bring Seenit to life.

When we first began looking at what we wanted to build as a platform, we came up with a list of requirements for our database and cloud provider. We chose to run Couchbase on GCP because it offered us distributed architecture that’s highly scalable and available globally. Our clients are typically large enterprises, sometimes in dozens of countries all over the world. We wanted to make sure that everyone, no matter where they are, could get a consistently good user experience.

By applying Couchbase’s N1QL and Full Text Search (FTS) with Google Cloud Machine Learning APIs, our customers can easily filter submissions by objects, words or phrases. And because everything is on GCP, we can duplicate our entire platform within minutes on 12 VMs.

Here’s how it works:

  1. We use Google Compute Engine to autoscale between two and 20 servers.
  2. Google Cloud Storage allows for unified object storage and retrieval. Near-infinite scalability means the service is capable of handling everything from small applications to builds of exabyte-scale systems.
  3. Couchbase’s Full Text Search (FTS) enables us to examine all the words in every document and match them with designated criteria.
  4. Cloud Machine Learning APIs sort clips by objects, gender of speakers and sentiment. The APIs all speak the same language so communication is seamless.

Last year, when we began looking for a machine learning platform, we wanted something that would talk JSON, store JSON and search JSON. We knew a machine learning platform that did all of that would integrate nicely into our Couchbase system. TensorFlow fit our criteria. We love that it isn’t restricted. We can build our own domain-specific models and use Google tools to train them.

Although TensorFlow is an open source machine learning platform, we use it through Cloud Machine Learning Engine. It’s a fully managed service, which is great for us because that way we don’t need to build and manage our own hardware. This allows us to do a lot of manipulation and extract a lot of really interesting data. It’s fully integrated in Couchbase, especially in full text search but also into N1QL, so we can search and extract intelligence and provide value to our customers. It’s a serverless architecture with the advantage of the custom hardware that Google started doing.

It’s also been great that we feel engaged with the community and product and engineering teams. As a startup, it’s important to feel like you can stand on the shoulders of giants, so to speak. The support we get from organizations like Google and Couchbase allow us to do lots of things that we otherwise wouldn’t be able to do with the resources we had.

There’s plenty more to share, but I’ll stop here. If you want to learn more, you might want to check out the joint talk GCP Product Manager Anil Dhawan and I recently gave at Couchbase Connect.

I also recommend checking out Couchbase and other tools on Cloud Launcher. You can use free trial credits to play around and even deploy something of your own. Good luck!

How to get started with Cloud Spanner in 5 minutes

The general availability of Cloud Spanner is really good news for developers. For the first time, you have direct access to horizontally-scaling, cloud-native infrastructure that provides global transactions (think apps that involve payments, inventory, ticketing, or financial trading) and that defaults to the “gold standard” of strong/external consistency without compromising latency. Try that with either a traditional RDBMS or non-relational database.

Thanks to the GCP Free Trial that offers $300 in credits for one year, you can get your feet wet with a single-node Cloud Spanner cluster over the course of a couple weeks. Here’s how to do that, using the Spanner API via gcloud. (Click here for the console-based approach.)

  1. In Cloud Console, go to the Projects page and either create a new project or open an existing project by clicking on the project name.
  2. Open a terminal window and set your project as the default for gcloud. Do this by substituting your project ID (not project name) with the command:

gcloud config set project [MY_PROJECT_ID]

  1. Enable billing for your project.
  2. Enable the Cloud Spanner API for your project.
  3. Set up authentication and authorization (Cloud Spanner uses OAuth 2.0 out of the box) with the following command:

    gcloud auth application-default login

    API client libraries now automatically pick up the created credentials. You need to run the command only once per local user environment. (Note: This approach is suitable for local development; for production use, you’ll want to use a different method for auth.)
  4. Next, create a single-node instance:

    gcloud spanner instances create test-instance
    --config=regional-us-central1 \
    --description="Test Instance" --nodes=1

  5. Finally, create a database. To create a database called test-db:

    gcloud spanner databases create test-db --instance=test-instance

Alternatively, you can download sample data and interact with it using the language of your choice.

That’s it — you now have your very own Cloud Spanner database. Again, your GCP credit should allow you to run it cost-free for a couple weeks. From there, you can download sample data and interact with it using the language of your choice.

Introducing Transfer Appliance: Sneakernet for the cloud era

Back in the eighties, when network constraints limited data transfers, people took to the streets and walked their floppy disks where they needed to go. And Sneakernet was born.

In the world of cloud and exponential data growth, the size of the disk and the speed of your sneakers may have changed, but the solution is the same: Sometimes the best way to move data is to ship it on physical media.

Today, we’re excited to introduce Transfer Appliance, to help you ingest large amounts of data to Google Cloud Platform (GCP).
Transfer Appliance offers up to 480TB in 4U or 100TB in 2U of raw data capacity in a single rackmount device
Transfer Appliance is a rackable high-capacity storage server that you set up in your data center. Fill it up with data and then ship it to us, and we upload your data to Google Cloud Storage. With capacity of up to one-petabyte compressed, Transfer Appliance helps you migrate your data orders-of-magnitude faster than over a typical network. The appliance encrypts your data at capture, and you decrypt it when it reaches its final cloud destination, helping to get it to the cloud safely.

Like many organizations we talk to, you probably have large amounts of data that you want to use to train machine learning models. You have huge archives and backup libraries taking up expensive space in your data center. Or IoT devices flooding your storage arrays. There’s all this data waiting to get to the cloud, but it’s impeded by expensive, limited bandwidth. With Transfer Appliance, you can finally take advantage of all that GCP has to offer  machine learning, advanced analytics, content serving, archive and disaster recovery  without upgrading your network infrastructure or acquiring third-party data migration tools.

Working with customers, we’ve found that the typical enterprise has many petabytes of data, and available network bandwidth between 100 Mbps and 1 Gbps. Depending on the available bandwidth, transferring 10 PB of that data would take between three and 34 years  much too long.

Estimated transfer times for given capacity and bandwidth
That’s where Transfer Appliance comes in. In a matter of weeks, you can have a petabyte of your data accessible in Google Cloud Storage, without consuming a single bit of precious outbound network bandwidth. Simply put, Transfer Appliance is the fastest way to move large amounts of data into GCP.

Compare the transfer times for 1 petabyte of data.
Customers tell us that space inside the data center is at a premium, and what space there is comes in the form of server racks. In developing Transfer Appliance, we built a device designed for the data center, that slides into a standard 19” rack. Transfer Appliance will only live in your data center for a few days, but we want it to be a good houseguest while it’s there.

Customers have been testing Transfer Appliance for several months, and love what they see:
"Google Transfer Appliance moves petabytes of environmental and geographic data for Makani so we can find out where the wind is the most windy." Ruth Marsh, Technical Program Manager at Makani

"Using a service like Google Transfer Appliance meant I could transfer hundreds of terabytes of data in days not weeks. Now we can leverage all that Google Cloud Platform has to offer as we bring narratives to life for our clients."  Tom Taylor, Head of Engineering at The Mill
Transfer Appliance joins the growing family of Google Cloud Data Transfer services. Initially available in the US, the service comes in two configurations: 100TB or 480TB of raw storage capacity, or up to 200TB or 1PB compressed. The 100TB model is priced at $300, plus shipping via Fedex (approximately $500); the 480TB model is priced at $1800, plus shipping (approximately $900). To learn more visit the documentation.

We think you’re going to love getting to cloud in a matter of weeks rather than years. Sign up to reserve a Transfer Appliance today. You can also sign up here for a GCP free trial.

From NoSQL to new SQL: How Spanner became a global, mission-critical database

Now that Cloud Spanner is generally available for mission-critical production workloads, it’s time to tell how Spanner evolved into a global, strongly consistent relational database service.
Recently the Spanner team presented a new paper at SIGMOD ‘17 that offers some fascinating insights into this aspect of Spanner’s “database DNA” and how it developed over time.

Spanner was originally designed to meet Google’s internal requirements for a global, fault-tolerant service to power massive business-critical applications. Today Spanner also embraces the SQL functionality, strong consistency and ACID transactions of a relational database. For critical use cases like financial transactions, inventory management, account authorization and ticketing/reservations, customers will accept no substitute for that functionality.

For example, there's no “spectrum” of less-than-strong consistency levels that will satisfy the mission-critical requirement for a single transaction state that's maintained worldwide; only strong consistency will do. Hence, few if any customers would choose to use an eventually-consistent database for critical OLTP. For Cloud Spanner customers like JDA, Snap and Quizlet, this unique feature set is already resonating.

Here are a few highlights from the paper:

  • Although Spanner was initially designed as a NoSQL key-value store, new requirements led to an embrace of the relational model, as well. Spanner’s architects had a relatively specific goal: to provide a service that could support fault-tolerant, multi-row transactions and strong consistency across data centers (with significant influence  and code  from Bigtable). At the same time, internal customers building OLTP applications also needed a database schema, cross-row transactions and an expressive query language. Thus early in Spanner’s lifecycle, the team drew on Google’s experience building the F1 distributed relational database to bring robust relational semantics and SQL functionality into the Spanner architecture. “These changes have allowed us to preserve the massive scalability of Spanner, while offering customers a powerful platform for database applications,” the authors wrote, adding that, “From the perspective of many engineers working on the Google infrastructure, the SQL vs. NoSQL dichotomy may no longer be relevant.”
  • The Spanner SQL query processor, while recognizable as a standard implementation, has unique capabilities that contribute to low-latency queries. Features such as query range extraction (for runtime analysis of complex expressions that are not easily re-written) and query restarts (compensating for failures, resharding, and other anomalies without significant latency impact) mitigate the complexities of highly distributed queries that would otherwise contribute to latency. Furthermore, the query processor serves both transactional and analytical workloads for low-latency or long-running queries.
  • Long-term investments in SQL tooling have produced a familiar RDBMS-like user experience. As part of a companywide effort to standardize on common SQL functionality for all its relational services (Spanner, Dremel/BigQuery, F1, and so on), Spanner’s user experience emphasizes ANSI SQL constructs and support for nested data as a first-class citizen. “SQL has provided significant additional value in expressing more complex data access patterns and pushing computation to the data, ” the authors wrote.
  • Spanner will soon rely on a new columnar format called Ressi designed for database-like access patterns (for hybrid OLAP/OLTP workloads). Ressi is optimized for time-versioned (rapidly changing) data, allowing queries to more efficiently find the most recent values. Later in 2017, Ressi will replace the SSTables format inherited from Bigtable, which although highly robust, are not explicitly designed for performance.

All in all, “Our path to making Spanner a SQL system led us through the milestones of addressing scalability, manageability, ACID transactions, relational model, schema DDL with indexing of nested data, to SQL,” the authors wrote.

For more details, read the full paper here.

How to do serverless pixel tracking with GCP

Whether they’re opening a newsletter or visiting a shopping cart page, how users interact with web content is very interesting to publishers. One way to understand user behavior is by using pixels, small 1x1 transparent images embedded into the web property. When loaded, the pixel calls a web server that records the request parameters passed in the URL that can be processed later.

Adding a pixel is easy, but hosting it and processing the request can be challenging for various reasons:
  • You need to set up, manage and monitor your ad servers
  • Users are usually global, which means that you need ad servers around the world
  • User visits are spiky, so pixel servers must scale up to sustain the load and scale down to limit the spend.
Google Cloud Platform (GCP) services such as Container Engine and managed autoscaled instance groups can help with those challenges. But at Google Cloud, we think companies should avoid managing infrastructure whenever possible.

For example, we recently worked with GCP partner and professional services firm DoiT International to build a pixel tracking platform that relieves the administrator from setting up or managing any servers. Instead, this serverless pixel tracking solution leverages managed GCP services, including:
  • Google Cloud Storage: A global or regional object store that offers different options such as Standard, Nearline, Cold with various prices and SLAs depending on your needs. In our case, we used Standard, which offers low millisecond latency
  • Google HTTP(s) Load Balancer: A global anycast IP load balancer service that can scale to millions of QPS with integrated logging. It also can be leveraged by Cloud CDN to prevent useless access to Google Cloud Storage by caching pixels closer to the user in Google edges
  • BigQuery: Google's fully managed, petabyte-scale, low-cost enterprise data warehouse for analytics
  • Stackdriver Logging: A logging system that allows you to store, search, analyze, monitor and alert on log data and events from GCP and Amazon Web Services (AWS). It supports Google load balancers and can export data to Cloud Storage, BigQuery or Pub/Sub
Tracking pixels with these services works as follows:
  1. A client calls a pixel URL that's served directly by Cloud Storage.
  2. A Google Cloud Load Balancer in front of Cloud Storage records the request to Stackdriver Logging, whether there was a cache hit or not.
  3. Stackdriver Logging exports every request to BigQuery as they come in, which acts as a storage and querying engine for ad-hoc analytics that can help business analysts better understand their users.

All those services are fully managed and do not require you to set up any instances or VMs. You can learn more about this solution by:
Going forward, we look forward to building more serverless solutions on top of GCP managed offerings. Let us know in the comments if there’s a solution that you’d like us to build!