Now that Cloud Spanner is generally available for mission-critical production workloads, it’s time to tell how Spanner evolved into a global, strongly consistent relational database service.
Recently the Spanner team presented a new paper at SIGMOD ‘17 that offers some fascinating insights into this aspect of Spanner’s “database DNA” and how it developed over time.
Spanner was originally designed to meet Google’s internal requirements for a global, fault-tolerant service to power massive business-critical applications. Today Spanner also embraces the SQL functionality, strong consistency and ACID transactions of a relational database. For critical use cases like financial transactions, inventory management, account authorization and ticketing/reservations, customers will accept no substitute for that functionality.
For example, there's no “spectrum” of less-than-strong consistency levels that will satisfy the mission-critical requirement for a single transaction state that's maintained worldwide; only strong consistency will do. Hence, few if any customers would choose to use an eventually-consistent database for critical OLTP. For Cloud Spanner customers like JDA, Snap and Quizlet, this unique feature set is already resonating.
Here are a few highlights from the paper:
- Although Spanner was initially designed as a NoSQL key-value store, new requirements led to an embrace of the relational model, as well. Spanner’s architects had a relatively specific goal: to provide a service that could support fault-tolerant, multi-row transactions and strong consistency across data centers (with significant influence — and code — from Bigtable). At the same time, internal customers building OLTP applications also needed a database schema, cross-row transactions and an expressive query language. Thus early in Spanner’s lifecycle, the team drew on Google’s experience building the F1 distributed relational database to bring robust relational semantics and SQL functionality into the Spanner architecture. “These changes have allowed us to preserve the massive scalability of Spanner, while offering customers a powerful platform for database applications,” the authors wrote, adding that, “From the perspective of many engineers working on the Google infrastructure, the SQL vs. NoSQL dichotomy may no longer be relevant.”
- The Spanner SQL query processor, while recognizable as a standard implementation, has unique capabilities that contribute to low-latency queries. Features such as query range extraction (for runtime analysis of complex expressions that are not easily re-written) and query restarts (compensating for failures, resharding, and other anomalies without significant latency impact) mitigate the complexities of highly distributed queries that would otherwise contribute to latency. Furthermore, the query processor serves both transactional and analytical workloads for low-latency or long-running queries.
- Long-term investments in SQL tooling have produced a familiar RDBMS-like user experience. As part of a companywide effort to standardize on common SQL functionality for all its relational services (Spanner, Dremel/BigQuery, F1, and so on), Spanner’s user experience emphasizes ANSI SQL constructs and support for nested data as a first-class citizen. “SQL has provided significant additional value in expressing more complex data access patterns and pushing computation to the data, ” the authors wrote.
- Spanner will soon rely on a new columnar format called Ressi designed for database-like access patterns (for hybrid OLAP/OLTP workloads). Ressi is optimized for time-versioned (rapidly changing) data, allowing queries to more efficiently find the most recent values. Later in 2017, Ressi will replace the SSTables format inherited from Bigtable, which although highly robust, are not explicitly designed for performance.
All in all, “Our path to making Spanner a SQL system led us through the milestones of addressing scalability, manageability, ACID transactions, relational model, schema DDL with indexing of nested data, to SQL,” the authors wrote.
For more details, read the full paper here.