Tag Archives: Storage & Databases

Cloud SQL for PostgreSQL updated with new extensions

Among relational databases, PostgreSQL is the open-source solution of choice for a wide range of workloads. Back in March, we added support for PostgreSQL in Cloud SQL, our managed database service, with a limited set of features and extensions. Since then, we’ve been amazed by your interest, with many of you taking the time to suggest desired PostgreSQL extensions on the Issue Tracker and the Cloud SQL discussion group. This feedback has resulted in us adding the following 19 extensions, across four categories:
  • PostGIS: better support for geographic applications
  • Data type: a variety of new data types
  • Language: enhanced functionality with new processing languages
  • Miscellaneous: text search, cryptographic capabilities and integer aggregators, to name but a few
An extension is a piece of software that adds functionality, often data types and procedural languages, to PostgreSQL itself. If you already have a Cloud SQL for PostgreSQL database instance running, you can enable one or more of these extensions.

We're continuing our journey with PostgreSQL on Cloud SQL. As we prepare for general availability, we’re working on automatic failover for high availability, read replicas, additional extensions and precise restores with point-in-time recovery. Stay tuned!

Thanks for your feedback and please keep it coming on the Issue Tracker and in the Cloud SQL discussion group! Your input helps shape the future of Cloud SQL and all Google Cloud products.

Guest post: How Seenit uses Google Cloud Platform and Couchbase to power our video collaboration platform

Editor’s Note: In this guest post, Seenit CTO Dave Starling walks us through how they use Google Cloud Platform (GCP) and Couchbase to build their innovative crowdsourced video platform.

Since we started Seenit in 2014, our goal has been to give businesses the tools to tell interesting stories through crowdsourced video. But getting there wasn’t simple. What we envisioned for Seenit didn’t exist at the time we started, challenging us to define our product architecture from the ground up. We learned a lot, which is why today I thought I’d share a little on how we’re using Couchbase and GCP to bring Seenit to life.

When we first began looking at what we wanted to build as a platform, we came up with a list of requirements for our database and cloud provider. We chose to run Couchbase on GCP because it offered us a distributed architecture that’s highly scalable and available globally. Our clients are typically large enterprises, sometimes in dozens of countries all over the world. We wanted to make sure that everyone, no matter where they are, could get a consistently good user experience.

By applying Couchbase’s N1QL and Full Text Search (FTS) with Google Cloud Machine Learning APIs, our customers can easily filter submissions by objects, words or phrases. And because everything is on GCP, we can duplicate our entire platform within minutes on 12 VMs.

Here’s how it works:

  1. We use Google Compute Engine to autoscale between two and 20 servers.
  2. Google Cloud Storage allows for unified object storage and retrieval. Near-infinite scalability means the service is capable of handling everything from small applications to exabyte-scale systems.
  3. Couchbase’s Full Text Search (FTS) enables us to examine all the words in every document and match them with designated criteria.
  4. Cloud Machine Learning APIs sort clips by objects, gender of speakers and sentiment. The APIs all speak the same language so communication is seamless.

Last year, when we began looking for a machine learning platform, we wanted something that would talk JSON, store JSON and search JSON. We knew a machine learning platform that did all of that would integrate nicely into our Couchbase system. TensorFlow fit our criteria. We love that it isn’t restricted. We can build our own domain-specific models and use Google tools to train them.

Although TensorFlow is an open source machine learning platform, we use it through Cloud Machine Learning Engine. It’s a fully managed service, which is great for us because that way we don’t need to build and manage our own hardware. This allows us to do a lot of manipulation and extract a lot of really interesting data. It’s fully integrated into Couchbase, especially into full text search but also into N1QL, so we can search and extract intelligence and provide value to our customers. It’s a serverless architecture with the advantage of the custom hardware that Google has started building.

It’s also been great that we feel engaged with the community and product and engineering teams. As a startup, it’s important to feel like you can stand on the shoulders of giants, so to speak. The support we get from organizations like Google and Couchbase allows us to do lots of things that we otherwise wouldn’t be able to do with the resources we have.

There’s plenty more to share, but I’ll stop here. If you want to learn more, you might want to check out the joint talk GCP Product Manager Anil Dhawan and I recently gave at Couchbase Connect.

I also recommend checking out Couchbase and other tools on Cloud Launcher. You can use free trial credits to play around and even deploy something of your own. Good luck!

How to get started with Cloud Spanner in 5 minutes

The general availability of Cloud Spanner is really good news for developers. For the first time, you have direct access to horizontally-scaling, cloud-native infrastructure that provides global transactions (think apps that involve payments, inventory, ticketing, or financial trading) and that defaults to the “gold standard” of strong/external consistency without compromising latency. Try that with either a traditional RDBMS or non-relational database.

Thanks to the GCP Free Trial that offers $300 in credits for one year, you can get your feet wet with a single-node Cloud Spanner cluster over the course of a couple of weeks. Here’s how to do that, using the Spanner API via gcloud. (Click here for the console-based approach.)

  1. In Cloud Console, go to the Projects page and either create a new project or open an existing project by clicking on the project name.
  2. Open a terminal window and set your project as the default for gcloud. Do this by substituting your project ID (not project name) in the command:

gcloud config set project [MY_PROJECT_ID]

  3. Enable billing for your project.
  4. Enable the Cloud Spanner API for your project.
  5. Set up authentication and authorization (Cloud Spanner uses OAuth 2.0 out of the box) with the following command:

    gcloud auth application-default login

    API client libraries now automatically pick up the created credentials. You need to run the command only once per local user environment. (Note: This approach is suitable for local development; for production use, you’ll want to use a different method for auth.)
  6. Next, create a single-node instance:

    gcloud spanner instances create test-instance \
    --config=regional-us-central1 \
    --description="Test Instance" --nodes=1

  7. Finally, create a database. To create a database called test-db:

    gcloud spanner databases create test-db --instance=test-instance

That’s it: you now have your very own Cloud Spanner database. Again, your GCP credit should allow you to run it cost-free for a couple of weeks. From there, you can download sample data and interact with it using the language of your choice.

Introducing Transfer Appliance: Sneakernet for the cloud era

Back in the eighties, when network constraints limited data transfers, people took to the streets and walked their floppy disks where they needed to go. And Sneakernet was born.

In the world of cloud and exponential data growth, the size of the disk and the speed of your sneakers may have changed, but the solution is the same: Sometimes the best way to move data is to ship it on physical media.

Today, we’re excited to introduce Transfer Appliance, to help you ingest large amounts of data to Google Cloud Platform (GCP).

Figure: Transfer Appliance offers up to 480TB in 4U or 100TB in 2U of raw data capacity in a single rackmount device.

Transfer Appliance is a rackable high-capacity storage server that you set up in your data center. Fill it up with data and then ship it to us, and we upload your data to Google Cloud Storage. With a capacity of up to one petabyte compressed, Transfer Appliance helps you migrate your data orders of magnitude faster than over a typical network. The appliance encrypts your data at capture, and you decrypt it when it reaches its final cloud destination, helping to get it to the cloud safely.

Like many organizations we talk to, you probably have large amounts of data that you want to use to train machine learning models. You have huge archives and backup libraries taking up expensive space in your data center. Or IoT devices flooding your storage arrays. There’s all this data waiting to get to the cloud, but it’s impeded by expensive, limited bandwidth. With Transfer Appliance, you can finally take advantage of all that GCP has to offer (machine learning, advanced analytics, content serving, archive and disaster recovery) without upgrading your network infrastructure or acquiring third-party data migration tools.

Working with customers, we’ve found that the typical enterprise has many petabytes of data, and available network bandwidth between 100 Mbps and 1 Gbps. Depending on the available bandwidth, transferring 10 PB of that data would take between three and 34 years, which is much too long.
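The arithmetic behind those figures can be sketched in a few lines of Python. At the full nominal line rate, 10 PB works out to roughly 2.5 years at 1 Gbps and roughly 25 years at 100 Mbps; the three-to-34-year range quoted above is consistent with a link that sustains about 75% of its nominal rate. That 75% figure, the `efficiency` parameter and the use of decimal petabytes are assumptions made for this illustration, not numbers stated in the post.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def transfer_years(data_bytes: float, link_bits_per_second: float,
                   efficiency: float = 1.0) -> float:
    """Return the years needed to move data_bytes over a network link.

    efficiency is the fraction of the nominal line rate actually sustained
    (a hypothetical knob for this sketch, not a figure from the article).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = data_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_YEAR


TEN_PB = 10 * 10**15  # assumes decimal petabytes, expressed in bytes

for label, rate in (("100 Mbps", 100e6), ("1 Gbps", 1e9)):
    ideal = transfer_years(TEN_PB, rate)
    derated = transfer_years(TEN_PB, rate, efficiency=0.75)
    print(f"{label}: {ideal:.1f} years at line rate, {derated:.1f} years at 75%")
```

Changing `efficiency` shows how sensitive the estimate is to real-world throughput, which is rarely the full nominal bandwidth.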

Figure: Estimated transfer times for given capacity and bandwidth

That’s where Transfer Appliance comes in. In a matter of weeks, you can have a petabyte of your data accessible in Google Cloud Storage, without consuming a single bit of precious outbound network bandwidth. Simply put, Transfer Appliance is the fastest way to move large amounts of data into GCP.

Figure: Compare the transfer times for 1 petabyte of data.

Customers tell us that space inside the data center is at a premium, and what space there is comes in the form of server racks. In developing Transfer Appliance, we built a device designed for the data center that slides into a standard 19” rack. Transfer Appliance will only live in your data center for a few days, but we want it to be a good houseguest while it’s there.

Customers have been testing Transfer Appliance for several months, and love what they see:

"Google Transfer Appliance moves petabytes of environmental and geographic data for Makani so we can find out where the wind is the most windy." - Ruth Marsh, Technical Program Manager at Makani

"Using a service like Google Transfer Appliance meant I could transfer hundreds of terabytes of data in days not weeks. Now we can leverage all that Google Cloud Platform has to offer as we bring narratives to life for our clients." - Tom Taylor, Head of Engineering at The Mill

Transfer Appliance joins the growing family of Google Cloud Data Transfer services. Initially available in the US, the service comes in two configurations: 100TB or 480TB of raw storage capacity, or up to 200TB or 1PB compressed. The 100TB model is priced at $300, plus shipping via FedEx (approximately $500); the 480TB model is priced at $1800, plus shipping (approximately $900). To learn more, visit the documentation.

We think you’re going to love getting to the cloud in a matter of weeks rather than years. Sign up to reserve a Transfer Appliance today. You can also sign up here for a GCP free trial.

From NoSQL to new SQL: How Spanner became a global, mission-critical database

Now that Cloud Spanner is generally available for mission-critical production workloads, it’s time to tell how Spanner evolved into a global, strongly consistent relational database service.
Recently the Spanner team presented a new paper at SIGMOD ’17 that offers some fascinating insights into this aspect of Spanner’s “database DNA” and how it developed over time.

Spanner was originally designed to meet Google’s internal requirements for a global, fault-tolerant service to power massive business-critical applications. Today Spanner also embraces the SQL functionality, strong consistency and ACID transactions of a relational database. For critical use cases like financial transactions, inventory management, account authorization and ticketing/reservations, customers will accept no substitute for that functionality.

For example, there's no “spectrum” of less-than-strong consistency levels that will satisfy the mission-critical requirement for a single transaction state that's maintained worldwide; only strong consistency will do. Hence, few if any customers would choose to use an eventually-consistent database for critical OLTP. For Cloud Spanner customers like JDA, Snap and Quizlet, this unique feature set is already resonating.

Here are a few highlights from the paper:

  • Although Spanner was initially designed as a NoSQL key-value store, new requirements led to an embrace of the relational model, as well. Spanner’s architects had a relatively specific goal: to provide a service that could support fault-tolerant, multi-row transactions and strong consistency across data centers (with significant influence, and code, from Bigtable). At the same time, internal customers building OLTP applications also needed a database schema, cross-row transactions and an expressive query language. Thus early in Spanner’s lifecycle, the team drew on Google’s experience building the F1 distributed relational database to bring robust relational semantics and SQL functionality into the Spanner architecture. “These changes have allowed us to preserve the massive scalability of Spanner, while offering customers a powerful platform for database applications,” the authors wrote, adding that, “From the perspective of many engineers working on the Google infrastructure, the SQL vs. NoSQL dichotomy may no longer be relevant.”
  • The Spanner SQL query processor, while recognizable as a standard implementation, has unique capabilities that contribute to low-latency queries. Features such as query range extraction (for runtime analysis of complex expressions that are not easily re-written) and query restarts (compensating for failures, resharding, and other anomalies without significant latency impact) mitigate the complexities of highly distributed queries that would otherwise contribute to latency. Furthermore, the query processor serves both transactional and analytical workloads for low-latency or long-running queries.
  • Long-term investments in SQL tooling have produced a familiar RDBMS-like user experience. As part of a companywide effort to standardize on common SQL functionality for all its relational services (Spanner, Dremel/BigQuery, F1, and so on), Spanner’s user experience emphasizes ANSI SQL constructs and support for nested data as a first-class citizen. “SQL has provided significant additional value in expressing more complex data access patterns and pushing computation to the data,” the authors wrote.
  • Spanner will soon rely on a new columnar format called Ressi designed for database-like access patterns (for hybrid OLAP/OLTP workloads). Ressi is optimized for time-versioned (rapidly changing) data, allowing queries to more efficiently find the most recent values. Later in 2017, Ressi will replace the SSTable format inherited from Bigtable, which, although highly robust, is not explicitly designed for performance.

All in all, “Our path to making Spanner a SQL system led us through the milestones of addressing scalability, manageability, ACID transactions, relational model, schema DDL with indexing of nested data, to SQL,” the authors wrote.

For more details, read the full paper here.

How to do serverless pixel tracking with GCP

Whether they’re opening a newsletter or visiting a shopping cart page, how users interact with web content is very interesting to publishers. One way to understand user behavior is by using pixels, small 1x1 transparent images embedded into the web property. When loaded, the pixel calls a web server that records the request parameters passed in the URL so that they can be processed later.

Adding a pixel is easy, but hosting it and processing the request can be challenging for various reasons:
  • You need to set up, manage and monitor your ad servers
  • Users are usually global, which means that you need ad servers around the world
  • User visits are spiky, so pixel servers must scale up to sustain the load and scale down to limit the spend.
Google Cloud Platform (GCP) services such as Container Engine and managed autoscaled instance groups can help with those challenges. But at Google Cloud, we think companies should avoid managing infrastructure whenever possible.

For example, we recently worked with GCP partner and professional services firm DoiT International to build a pixel tracking platform that relieves administrators of setting up or managing any servers. Instead, this serverless pixel tracking solution leverages managed GCP services, including:
  • Google Cloud Storage: A global or regional object store that offers different options such as Standard, Nearline and Coldline, with various prices and SLAs depending on your needs. In our case, we used Standard, which offers low millisecond latency
  • Google HTTP(S) Load Balancer: A global anycast IP load balancer service that can scale to millions of QPS with integrated logging. It can also be combined with Cloud CDN, which avoids unnecessary requests to Google Cloud Storage by caching pixels closer to the user at Google’s edge locations
  • BigQuery: Google's fully managed, petabyte-scale, low-cost enterprise data warehouse for analytics
  • Stackdriver Logging: A logging system that allows you to store, search, analyze, monitor and alert on log data and events from GCP and Amazon Web Services (AWS). It supports Google load balancers and can export data to Cloud Storage, BigQuery or Pub/Sub
Tracking pixels with these services works as follows:
  1. A client calls a pixel URL that's served directly by Cloud Storage.
  2. A Google Cloud Load Balancer in front of Cloud Storage records the request to Stackdriver Logging, whether there was a cache hit or not.
  3. Stackdriver Logging exports every request to BigQuery as it comes in. BigQuery then acts as a storage and querying engine for ad-hoc analytics that can help business analysts better understand their users.
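
To make the flow above concrete, here is a minimal Python sketch of the two ends of it: building a pixel URL that carries tracking parameters in its query string, and recovering those parameters from a logged request URL. The host name, object path and parameter names (`event`, `user_id`) are hypothetical placeholders for this illustration and are not part of the solution described here.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical pixel location; substitute your own load balancer host and object path.
PIXEL_BASE_URL = "https://pixel.example.com/pixel.png"


def build_pixel_url(params: dict) -> str:
    """Return the pixel URL with tracking parameters encoded in the query string."""
    return f"{PIXEL_BASE_URL}?{urlencode(params)}"


def extract_params(request_url: str) -> dict:
    """Recover the tracking parameters from a logged request URL."""
    query = urlsplit(request_url).query
    # parse_qs returns a list of values per key; keep the first one for simplicity.
    return {key: values[0] for key, values in parse_qs(query).items()}


url = build_pixel_url({"event": "newsletter_open", "user_id": "u123"})
print(url)
print(extract_params(url))
```

In the architecture described above, the parsing step would more likely be expressed as a BigQuery query over the exported logs; the Python version only shows which information travels in the URL.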

All those services are fully managed and do not require you to set up any instances or VMs. You can learn more about this solution by:

We look forward to building more serverless solutions on top of GCP managed offerings. Let us know in the comments if there’s a solution that you’d like us to build!

Cloud Spanner is now production-ready; let the migrations begin!

Cloud Spanner, the world’s first horizontally-scalable and strongly-consistent relational database service, is now generally available for your mission-critical OLTP applications.

We’ve carefully designed Cloud Spanner to meet customer requirements for enterprise databases — including ANSI 2011 SQL support, ACID transactions, 99.999% availability and strong consistency — without compromising latency. As a combined software/hardware solution that includes atomic clocks and GPS receivers across Google’s global network, Cloud Spanner also offers additional accuracy, reliability and performance in the form of a fully-managed cloud database service. Thanks to this unique combination of qualities, Cloud Spanner is already delivering long-term value for our customers with mission-critical applications in the cloud, including customer authentication systems, business-transaction and inventory-management systems, and high-volume media systems that require low latency and high throughput. For example, Snap uses Cloud Spanner to power part of its search infrastructure.
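
To put the availability figure in perspective, the short Python calculation below converts an availability percentage into the downtime it allows per year. It takes the 99.999% figure above at face value and assumes a 365-day year; it says nothing about how availability is actually measured or guaranteed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability: float) -> float:
    """Return the yearly downtime, in minutes, implied by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


print(f"{downtime_minutes_per_year(0.99999):.2f} minutes per year")
```

Five nines therefore leaves a little over five minutes of downtime per year.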

Looking toward migration

In preparation for general availability, we’ve been working closely with our partners to make adoption as smooth and easy as possible. Thus today, we're also announcing our initial data integration partners: Alooma, Informatica and Xplenty.

Now that these partners are in the early stages of Cloud Spanner “lift-and-shift” migration projects for customers, we asked a couple of them to pass along some of their insights about the customer value of Cloud Spanner, as well as any advice about planning for a successful migration:

From Alooma:

"Cloud Spanner is a game-changer because it offers horizontally scalable, strongly consistent, highly available OLTP infrastructure in the cloud for the first time. To accelerate migrations, we recommend that customers replicate their data continuously between the source OLTP database and Cloud Spanner, thereby maintaining both infrastructures in the same state; this allows them to migrate their workloads gradually in a predictable manner."

From Informatica:
“Informatica customers are stretching the limits of latency and data volumes, and need innovative enterprise-scale capabilities to help them outperform their competition. We are excited about Cloud Spanner because it provides a completely new way for our mutual customers to disrupt their markets. For integration, migration and other use cases, we are partnering with Google to help them ingest data into Cloud Spanner and integrate a variety of heterogeneous batch, real-time, and streaming data in a highly scalable, performant and secure way.”

From Xplenty:
"Cloud Spanner is one of those cloud-based technologies for which businesses have been waiting: With its horizontal scalability and ACID compliance, it’s ideal for those who seek the lower TCO of a fully managed cloud-based service without sacrificing the features of a legacy, on-premises database. In our experience with customers migrating to Cloud Spanner, important considerations include accounting for data types, embedded code and schema definitions, as well as understanding Cloud Spanner’s security model to efficiently migrate your current security and access-control implementation."

Next steps

We encourage you to dive into a no-cost trial to experience first-hand the value of a relational database service that offers strong consistency, mission-critical availability and global scale (contact us about multi-regional instances) with no workarounds — and with no infrastructure for you to deploy, scale or manage. (Read more about Spanner’s evolution inside Google in this new paper presented at the SIGMOD ‘17 conference today.) If you like what you see, a growing partner ecosystem is standing by for migration help, and to add further value to Cloud Spanner use cases via data analytics and visualization tooling.

Compute Engine machine types with up to 64 vCPUs now ready for your production workloads

Today, we're happy to announce general availability for our largest virtual machine shapes, including both predefined and custom machine types, with up to 64 virtual CPUs and 416 GB of memory.

64 vCPU machine types are available on our Haswell, Broadwell and Skylake (currently in Alpha) generation Intel processor host machines.

Tim Kelton, co-founder and Cloud Architect of Descartes Labs, an early adopter of our 64 vCPU machine types, had this to say:
"Recently we used the 64 vCPU instances during the building of both our global composite imagery layers and GeoVisual Search. In both cases, our parallel processing jobs needed tens of thousands of CPU hours to complete the task. The new 64 vCPU instances allow us to work across more satellite imagery scenes simultaneously on a single instance, dramatically speeding up our total processing times."

The new 64-core machines are available for use today. If you're new to GCP and want to give these larger virtual machines a try, it’s easy to get started with our $300 credit for 12 months.

Google Cloud Natural Language API launches new features and Cloud Spanner graduating to GA

Today at Google Cloud Next London we're excited to announce product news that will help customers innovate and transform their businesses faster via the cloud: first, that Google Cloud Natural Language API is adding support for new languages and entity sentiment analysis, and second, that Google Cloud Spanner is graduating to general availability (GA).

Cloud Natural Language API beta

Since we launched Cloud Natural Language API, a fully managed service for extracting meaning from text via machine learning, we’ve seen customers such as Evernote and Ocado enhance their businesses in fascinating ways. For example, they use Cloud Natural Language API to analyze customer feedback and sentiment, extract key entities and metadata from unstructured text such as emails or web articles, and enable novel features (such as deriving action items from meeting notes).

These use cases, among many others, highlighted the need to expand language support and improve the quality of our base NLU technology. We've incorporated this feedback into the product and are pleased to announce the following new capabilities in beta:

  • Expanded language support for entity, document sentiment and syntax analysis for the following languages: Chinese (Simplified and Traditional), French, German, Italian, Korean and Portuguese. This is in addition to existing support for English, Spanish and Japanese.
  • Understand sentiment for specific entities, not just the whole document or sentence: We're introducing a new method that identifies entities in a block of text and also determines sentiment for those entities. Entity sentiment analysis is currently only available for the English language. For more information, see Analyzing Entity Sentiment.
  • Improved quality for sentiment and entity analysis: As part of the continuous effort to improve quality of our base models, we're also launching improved models for sentiment and entity analysis as part of this release.

Early access users of this new functionality such as Wootric are already using the expanded language support and new entity sentiment analysis feature to better understand customer sentiment around brands and products. For example, for customer feedback such as “the phone is expensive but has great battery life,” users can now parse that the sentiment for phone is negative while the sentiment for battery life is positive.

As the API becomes more widely adopted, we're looking forward to seeing more interesting and useful applications of it.

Cloud Spanner enters GA

Announced in March at Google Cloud Next ’17, Cloud Spanner is the world’s first fully managed, horizontally scalable relational database service for mission-critical online transaction processing (OLTP) applications. Cloud Spanner is specifically designed to meet customer requirements in this area for strong consistency, high availability and global scale, qualities that make it unique as a service.

During the beta period, we were thrilled to see customers unlock new use cases in the cloud with Cloud Spanner, including:

  • Powering mission-critical applications like customer authentication and provisioning for multi-national businesses
  • Building consistent systems for business transactions and inventory management in the financial services and retail industries
  • Supporting incredibly high-volume systems that need low latency and high throughput in the advertising and media industries

As with all our other services, GCP handles all the performance, scalability and availability needs automatically in a pay-as-you-go way.

On May 16, Cloud Spanner will reach a further milestone by becoming generally available for the first time. Currently we're offering regional instances, with multi-regional instances coming later this year. We've been Spanner users ourselves for more than five years to support a variety of mission-critical global apps, and we can’t wait to see what new workloads you bring to the cloud, and which new ones you build next!

Google Cloud Storage introduces Cloud Pub/Sub notifications

Google Cloud Storage has always been a high-performance and cost-effective place to store data objects. Now it’s also easy to build workflows around those objects that are triggered by creating or deleting them, or changing their metadata.

Suppose you want to take some action every time a change occurs in one of your Cloud Storage buckets. You might want to automatically update sales projections every day when sales uploads its new daily totals. You might need to remove a resource from a search index when an object is deleted. Or perhaps you want to update the thumbnail when someone makes a change to an image. The ability to respond to changes in a Cloud Storage bucket gives you increased responsiveness, control and flexibility.

Cloud Pub/Sub Support

We’re pleased to announce that Cloud Storage can now register changes by sending change notifications to a Google Cloud Pub/Sub topic. Cloud Pub/Sub is a powerful messaging platform that allows you to build fast, reliable and more secure messaging solutions. Cloud Pub/Sub support introduces many new capabilities to Cloud Storage notifications, such as pulling from subscriptions instead of requiring users to configure webhooks, multiplexing copies of each message to many subscribers and filtering messages by event type or prefix.
You can get started sending Cloud Storage notifications to Cloud Pub/Sub by reading our getting started guide. Once you’ve enabled the Cloud Pub/Sub API and downloaded the latest version of the gcloud SDK, you can set up notification triggers from your Cloud Storage bucket to your Cloud Pub/Sub topic with the following command:

$> gsutil notification create -f json -t your-topic gs://your-bucket

From that point on, any changes to the contents of your Cloud Storage bucket trigger a message to your Cloud Pub/Sub topic. You can then create Cloud Pub/Sub subscriptions on that topic and pull messages from those subscriptions in your programs, as in this example Python app.
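
As a rough illustration of what a subscriber might do with such a message, the Python sketch below filters notifications by event type and object-name prefix, using only the standard library. The attribute names (`eventType`, `bucketId`, `objectId`), the event type strings and the JSON payload fields reflect my understanding of the notification format and should be treated as assumptions to check against the Cloud Storage documentation; the sample message is hand-written, not captured from a real bucket, and the Pub/Sub client code that would deliver it is omitted.

```python
import json
from typing import Optional

# Attribute names and event types are assumptions about the notification
# format; verify them against the Cloud Storage documentation.
WATCHED_EVENTS = {"OBJECT_FINALIZE", "OBJECT_DELETE"}


def wanted(attributes: dict, prefix: str = "") -> bool:
    """Return True for watched event types on objects under the given prefix."""
    return (attributes.get("eventType") in WATCHED_EVENTS
            and attributes.get("objectId", "").startswith(prefix))


def describe(attributes: dict, data: bytes, prefix: str = "") -> Optional[str]:
    """Summarize a notification, or return None if it is filtered out."""
    if not wanted(attributes, prefix):
        return None
    payload = json.loads(data.decode("utf-8")) if data else {}
    size = payload.get("size", "unknown")
    return (f"{attributes['eventType']}: gs://{attributes.get('bucketId')}/"
            f"{attributes.get('objectId', '')} ({size} bytes)")


# Hand-written sample message for illustration only.
sample_attributes = {
    "eventType": "OBJECT_FINALIZE",
    "bucketId": "your-bucket",
    "objectId": "sales/2017-05-16.csv",
}
sample_data = json.dumps({"name": "sales/2017-05-16.csv", "size": "1024"}).encode("utf-8")
print(describe(sample_attributes, sample_data, prefix="sales/"))
```

In a real subscriber, `describe` would be called from the message callback with the message's attributes and data, and the message acknowledged afterwards.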

Cloud Functions

Cloud Pub/Sub is a powerful and flexible way to respond to changes in a bucket. However, for some tasks you may prefer the simplicity of deploying a small, serverless function that just describes the action you want to take in response to a change. For that, Google Cloud Functions supports Cloud Storage triggers.

Cloud Functions is a quick way to deploy cloud-based scripts in response to a wide variety of events, for example an HTTP request to a certain URL, or a new object in a Cloud Storage bucket.

Once you get started with Google Cloud Functions, you can learn about setting up a Cloud Storage Trigger for your function. It’s as simple as adding a “--trigger-bucket” parameter to your deploy command:

$> gcloud beta functions deploy helloWorld --stage-bucket cloud-functions --trigger-bucket your-bucket

It’s fun to think about what’s possible when Cloud Storage objects aren’t just static entities, but can trigger a wide variety of tasks. We hope you’re as excited as we are!